Sharpened cosine similarity torch - A Sharpened Cosine Similarity layer for PyTorch

Overview

Sharpened Cosine Similarity

A layer implementation for PyTorch
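
At its core, sharpened cosine similarity replaces the dot product in a convolution with a cosine similarity between each input patch and the kernel, stabilized by a small constant and raised to a learned exponent while keeping the sign. A minimal single-window sketch of that idea (the function name, defaults, and exact stabilization here are illustrative assumptions, not this repo's API):

import torch

def scs_single_window(x, w, p=2.0, q=1e-3):
    # x: one flattened input patch, w: one flattened kernel.
    dot = (x * w).sum()
    # Cosine similarity, with small q terms keeping the denominator away from zero.
    cos = dot / ((x.norm() + q) * (w.norm() + q))
    # "Sharpen" by raising the magnitude to the power p while keeping the sign.
    return torch.sign(cos) * cos.abs() ** p

# Example with a random 3x3 patch and kernel, both flattened to length 9.
print(scs_single_window(torch.randn(9), torch.randn(9)))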

Install

At your command line:

git clone https://github.com/brohrer/sharpened_cosine_similarity_torch.git

You'll need to install or upgrade PyTorch if you haven't already. If python3 is the command you use to invoke Python at your command line:

python3 -m pip install torch torchvision --upgrade

Demo

Run the Fashion MNIST demo to see sharpened cosine similarity in action.

cd sharpened_cosine_similarity_torch
python3 demo_fashion_mnist.py

When you run this it will take a few extra minutes the first time through to download and extract the Fashion MNIST data set. It's less than 100 MB when fully extracted.

I run this entirely on laptop CPUs. I have a dual-core i7 that takes about 90 seconds per epoch and an 8-core i7 that takes about 45 seconds per epoch. Your mileage may vary.

Monitor

You can check on the status of your runs at any time. In another console, navigate to the same directory and run

python3 show_results.py

This will give a little console summary like this

testing errors for version test
mean  : 14.08%
stddev: 0.1099%
stderr: 0.03887%
n runs: 8

and drop a couple of plots like this in the plots directory showing how the classification error on the test data set decreases with each pass through the training data set.

A sample of testing error results over several runs

The demo will keep running for a long time if you let it. Kill it when you get bored of it. If you want to pick the sequence of runs back up, re-run the demo and it will load all the results it's generated so far and append to them.

Track

If you'd like to experiment with the sharpened cosine similarity code, the demo, or with other data sets, you can keep track of each new run by adding a version argument at the command line.

To start a run with version string "v37" run

python3 demo_fashion_mnist.py v37

To check on its progress

python3 show_results.py v37

The version string can be arbitrarily descriptive, for example "3_scs_layer_2_fully_connected_layer_learning_rate_003", but keep it alphanumeric with underscores.

Credit where it's due

Based on (and copy/pasted heavily from) code from Ze Wang, code from Oliver Batchelor, and the TensorFlow implementation and blog post from Raphael Pisoni.

Comments
  • Single alpha per layer. Defaults changed.

    Playing with alpha, I found very strong results with a single shared alpha value across the layer. Initial observations suggest that one alpha per kernel doesn't add anything to performance, while adding extra parameters.

    Inspecting the alpha values chosen by LSUV_like_init(), they fall roughly in the range 10-50 for a small model I'm running on Fashion MNIST. Choosing an initial alpha of 10 and disabling auto init generates the strongest results I've seen from my model yet.
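
    For concreteness, the parameter-shape difference under discussion looks roughly like this (a sketch with assumed names, not the repo's code):

        import torch
        import torch.nn as nn

        # One alpha shared by the whole layer: a single scalar parameter.
        alpha_shared = nn.Parameter(torch.tensor(10.0))

        # Versus one alpha per kernel: one value for each output channel.
        out_channels = 32
        alpha_per_kernel = nn.Parameter(torch.full((out_channels,), 10.0))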

    @ducha-aiki, what do you think about moving to a single alpha per layer and removing the LSUV init? You've spent the most time thinking about it, so it's possible you've built an intuition around it I don't have yet.

    @4rtemi5 based on what I've seen, the alpha boosts performance noticeably. Whatever implementation we settle on in PyTorch, I plan to mimic it in the Jax code. It might make a difference in getting to 90% on CIFAR-10.

    This diff is a little messy; it also has some changes to the demo scripts to make them compatible with the new API.

    opened by brohrer 10
  • Reimplement SCS in terms of conv2d

    This PR proposes a change to the implementation of SCS that uses conv2d, a much more performant primitive. The gist of the idea is to:

    1. Normalize the kernel as we already do
    2. Compute per-window normalization factor coming from the image by sqrt(avg_pool2d(input ** 2) * window_size)
    3. Compute the dot products via conv2d between the input and normalized kernel
    4. Normalize the output elementwise by the array obtained in step 2.
    5. Proceed as before

    On my laptop this achieves more than a 2x performance improvement. I attach a quick test (compat_test.py) to verify that the two implementations yield (almost) the same results. This PR will obviously need some code unification and renaming before being merged; I submit it as is to facilitate easy comparisons by the maintainers :)
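
    A rough sketch of those steps (argument names, shapes, and epsilon values here are assumptions, not the PR's exact code):

        import torch
        import torch.nn.functional as F

        def scs_via_conv2d(inp, weight, p, stride=1, padding=0, eps=1e-6):
            # Step 1: normalize each kernel to unit length.
            w_unit = weight / (weight.square().sum(dim=(1, 2, 3), keepdim=True).sqrt() + eps)

            # Step 2: per-window input norm via avg_pool2d. Summing the squared input
            # over channels, averaging over the spatial window, and multiplying by the
            # window size recovers the squared L2 norm of each window.
            kh, kw = weight.shape[-2:]
            sq = inp.square().sum(dim=1, keepdim=True)
            win_norm = (F.avg_pool2d(sq, (kh, kw), stride=stride, padding=padding) * (kh * kw)).sqrt()

            # Step 3: dot products via conv2d between the input and the normalized kernels.
            dots = F.conv2d(inp, w_unit, stride=stride, padding=padding)

            # Step 4: elementwise normalization yields the cosine similarity.
            cos = dots / (win_norm + eps)

            # Step 5: proceed as before -- sharpen the magnitude, keep the sign.
            return torch.sign(cos) * cos.abs() ** p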

    opened by jatentaki 4
  • Confused about validation loss calculation

    Hello, I am unable to understand the testing loss. Why isn't loss = F.cross_entropy(preds, labels) also used for the test loss with the test preds?

    https://github.com/brohrer/sharpened_cosine_similarity_torch/blob/28c135adfb6062b0e9e29bb906736ceea9c08162/model_cifar10_18_4.py#L152-L163

    opened by quickgrid 3
  • Add absolute pooling

    Debug and test a maximum-magnitude pooling layer.

    @StephenHogg wrote a nice looking bit of code for this, but I haven't been able to get it to work just yet.

    Original code https://github.com/StephenHogg/SCS/blob/main/SCS/layer.py

    Copy/pasted to https://github.com/brohrer/sharpened_cosine_similarity_torch/blob/main/absolute_pooling.py
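
    For reference, the intended behavior might look something like this minimal sketch (assumed semantics: keep the entry with the largest magnitude in each window, sign included; this is not the code linked above):

        import torch
        import torch.nn.functional as F

        def max_abs_pool2d(x, kernel_size, stride=None):
            # Locate the largest-magnitude entry in each pooling window ...
            _, idx = F.max_pool2d(x.abs(), kernel_size, stride=stride, return_indices=True)
            # ... then gather the original signed values at those positions.
            flat = x.flatten(start_dim=2)
            return flat.gather(2, idx.flatten(start_dim=2)).view_as(idx)

        # Example: -5 wins over +4 because its magnitude is larger.
        x = torch.tensor([[[[1.0, -5.0], [4.0, 2.0]]]])
        print(max_abs_pool2d(x, 2))  # tensor([[[[-5.]]]])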

    opened by brohrer 3
  • Parameter handling

    • Updated PyTorch and Jax implementations
    • Removed p parameter exponentiation
    • Added weight clipping and p parameter lower limit
    • Changed parameter initialization
    • Removed alpha parameter

    Demo scripts and MNIST test all run.

    Any thoughts or comments? @ducha-aiki @4rtemi5

    opened by brohrer 1
  • Added an example using PyTorch Lightning

    This is an example application of the SCS Fashion MNIST demo with PyTorch Lightning. Lightning removes a lot of the boilerplate of PyTorch and helps you focus on the network. Furthermore, Lightning makes training on GPUs very simple.

    opened by p-sodmann 1
  • Currently creates a lot of NaNs

    I am a physician and don't know what I am doing, but this helps prevent the NaNs; maybe it helps you create a more sophisticated solution.

            out = inp.square()
            if torch.any(torch.isnan(out)):
                raise Exception("out")
                
            if self.groups == 1:
                out = out.sum(1, keepdim=True)
    
            norm = F.conv2d(
                out,
                torch.ones_like(self.weight[:1, :1]),
                None,
                self.stride,
                self.padding,
                self.dilation) + 1e-6
            
            if torch.any(torch.isnan(norm)):
                raise Exception("norm")
                
            # prevent 0 and inf
            q = torch.exp(-self.q / (self.q_scale**2 + 0.1))
            if torch.any(torch.isnan(q)):
                raise Exception("q")
                
    
            weight = self.weight / (self.weight.square().sum(dim=(1, 2, 3), keepdim=True).sqrt() + q + 1e-6)
            if torch.any(torch.isnan(weight)):
                raise Exception("weight")
                
    
            out = F.conv2d(
                inp,
                weight,
                self.bias,
                self.stride,
                self.padding,
                self.dilation,
                self.groups
            ) / ((norm**2).sqrt() + 1e-1)
    
            if torch.any(torch.isnan(out)):
                raise Exception("out2")
    
            # Comment these lines out for vanilla cosine similarity.
            # It's ~200x faster.
            abs = (out.square() + 1e-6).sqrt()
            sign = out / abs
            # prevent 0 and inf
            p = torch.exp(self.p / (self.p_scale**2 + 0.1))
            out = abs ** p
            out = out * sign
            if torch.any(torch.isnan(out)):
                raise Exception("out3")
            return out
    opened by p-sodmann 1
  • Reimplementation & slight improvements.

    Hi! I played around with your code yesterday and ended up refactoring it quite heavily to ease usage. The code is nowhere near ready for a pull request, but I hope you find the changes somewhat useful.

    You can find the code here.

    Notes:

    • I was able to achieve 91.3% CIFAR-10 accuracy with a 1.2M-parameter model (see here for the model implementation).
    • Residual connections seem to help deeper networks somewhat; however, training a much deeper model does not work out of the box (see here for the implementation).
    • Either annotating the forward pass of the SCS layer or scripting the model will improve runtime by at least 30%, measured on an NVIDIA 1060 GPU (see the TorchScript sketch at the end of this comment).

    The code automatically uses a CUDA-capable device if available, is considerably cleaner than the original training script, and will hopefully help you iterate on the idea :)

    You can view the tensorboard logs of my experiments here: https://tensorboard.dev/experiment/AY27LbxrRpaBHNMO0m9Wkw/#scalars
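
    To illustrate the scripting point above, a generic TorchScript sketch (a stand-in module, not the linked code):

        import torch
        import torch.nn as nn

        # Stand-in for a network containing SCS layers.
        class TinyNet(nn.Module):
            def __init__(self):
                super().__init__()
                self.conv = nn.Conv2d(3, 8, 3)

            def forward(self, x):
                return self.conv(x)

        scripted = torch.jit.script(TinyNet())     # compile the whole model
        out = scripted(torch.randn(1, 3, 32, 32))  # runs the compiled forward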

    opened by hukkelas 1
  • Fix for issue #4 with train progress

    Due to the progress bar, after every epoch it prints something like,

    0it [00:00, ?it/s]run: 0   epoch: 0   duration: 75.29   training loss: 2.249   testing loss: 2.137   training accuracy: 40.17%   testing accuracy: 57.66%
    

    Instead of,

    run: 0   epoch: 0   duration: 75.29   training loss: 2.249   testing loss: 2.137   training accuracy: 40.17%   testing accuracy: 57.66%
    
    opened by quickgrid 1
  • Create a CIFAR 10 demo

    So far sharpened cosine similarity has performed well on Fashion MNIST. The next step is to demonstrate it on CIFAR 10.

    Here are some partial implementations in case they help:

    • from Abhiraj Kanse: https://colab.research.google.com/drive/1lKx9wiY2bMg5rH5EvJKymj7nb8I9ESc4?usp=sharing
    • from @oliver-batchelor: https://github.com/oliver-batchelor/scs_cifar/blob/main/src/main.py

    opened by brohrer 1
  • Unused calculation fix

    Enhancement

    • Needs a progress bar, at least in the train loader.

    Issues

    Multiple files have the following issues.

    • Duplicate calculation of the epoch time inside the mini-batch loop: there is an extra epoch_duration = time.time() - epoch_start_time calculation inside the train loader (see the sketch at the end of this comment).

    https://github.com/brohrer/sharpened_cosine_similarity_torch/blob/690909ff360c67581ab2745a8a8516c4ee133ad8/demo_fashion_mnist.py#L127-L131

    • The test preds calculation is not used and can be removed.

    https://github.com/brohrer/sharpened_cosine_similarity_torch/blob/690909ff360c67581ab2745a8a8516c4ee133ad8/demo_fashion_mnist.py#L138-L147

    • Make the following consistent:

    https://github.com/brohrer/sharpened_cosine_similarity_torch/blob/690909ff360c67581ab2745a8a8516c4ee133ad8/model_cifar10_18_4.py#L127-L130

    https://github.com/brohrer/sharpened_cosine_similarity_torch/blob/690909ff360c67581ab2745a8a8516c4ee133ad8/model_cifar10_18_4.py#L154-L155
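
    For reference, the first point amounts to computing the epoch duration once, after the mini-batch loop rather than inside it (a schematic sketch of an assumed loop structure, not the repo's exact code):

        import time

        epoch_start_time = time.time()
        for batch in range(3):       # stands in for iterating the train loader
            pass                     # forward / backward / optimizer step here
        # Compute the duration once per epoch, after the loop, not inside it.
        epoch_duration = time.time() - epoch_start_time
        print(f"epoch duration: {epoch_duration:.2f} sec")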

    opened by quickgrid 0
Owner
Brandon Rohrer
My latest and most interesting work has been migrated to GitLab. Come say hi. https://gitlab.com/brohrer