Sharpened cosine similarity torch - A Sharpened Cosine Similarity layer for PyTorch

Overview

Sharpened Cosine Similarity

A layer implementation for PyTorch
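
At its core, sharpened cosine similarity replaces the dot product in a convolution with a cosine similarity between each input patch and the kernel, stabilized by a small constant and raised to a learned exponent while keeping the sign. A minimal single-window sketch of that idea (the function name, defaults, and exact stabilization here are illustrative assumptions, not this repo's API):

import torch

def scs_single_window(x, w, p=2.0, q=1e-3):
    # x: one flattened input patch, w: one flattened kernel.
    dot = (x * w).sum()
    # Cosine similarity, with small q terms keeping the denominator away from zero.
    cos = dot / ((x.norm() + q) * (w.norm() + q))
    # "Sharpen" by raising the magnitude to the power p while keeping the sign.
    return torch.sign(cos) * cos.abs() ** p

# Example with a random 3x3 patch and kernel, both flattened to length 9.
print(scs_single_window(torch.randn(9), torch.randn(9)))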

Install

At your command line:

git clone https://github.com/brohrer/sharpened_cosine_similarity_torch.git

You'll need to install or upgrade PyTorch if you haven't already. If python3 is the command you use to invoke Python at your command line:

python3 -m pip install torch torchvision --upgrade

Demo

Run the Fashion MNIST demo to see sharpened cosine similarity in action.

cd sharpened_cosine_similarity_torch
python3 demo_fashion_mnist.py

When you run this it will take a few extra minutes the first time through to download and extract the Fashion MNIST data set. It's less than 100 MB when fully extracted.

I run this entirely on laptop CPUs. I have a dual-core i7 that takes about 90 seconds per epoch and an 8-core i7 that takes about 45 seconds per epoch. Your mileage may vary.

Monitor

You can check on the status of your runs at any time. In another console, navigate to the same directory and run

python3 show_results.py

This will give a little console summary like this

testing errors for version test
mean  : 14.08%
stddev: 0.1099%
stderr: 0.03887%
n runs: 8

and drop a couple of plots like this in the plots directory showing how the classification error on the test data set decreases with each pass through the training data set.

A sample of testing error results over several runs

The demo will keep running for a long time if you let it. Kill it when you get bored of it. If you want to pick the sequence of runs back up, re-run the demo and it will load all the results it's generated so far and append to them.

Track

If you'd like to experiment with the sharpened cosine similarity code, the demo, or with other data sets, you can keep track of each new run by adding a version argument at the command line.

To start a run with version string "v37" run

python3 demo_fashion_mnist.py v37

To check on its progress

python3 show_results.py v37

The version string can be arbitrarily descriptive, for example "3_scs_layer_2_fully_connected_layer_learning_rate_003", but keep it alphanumeric with underscores.

Credit where it's due

Based on (and copy/pasted heavily from) code from Ze Wang, code from Oliver Batchelor, and the TensorFlow implementation and blog post from Raphael Pisoni.

Comments
  • Single alpha per layer. Defaults changed.

    Playing with alpha, I found very strong results with a single shared alpha value across the layer. Initial observations suggest that one alpha per kernel doesn't add anything to performance, while adding extra parameters.

    Inspecting the alpha values chosen by LSUV_like_init(), they fall roughly in the range 10-50 for a small model I'm running on Fashion MNIST. Choosing an initial alpha of 10 and disabling auto init generates the strongest results I've seen from my model yet.
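
    For concreteness, the parameter-shape difference under discussion looks roughly like this (a sketch with assumed names, not the repo's code):

        import torch
        import torch.nn as nn

        # One alpha shared by the whole layer: a single scalar parameter.
        alpha_shared = nn.Parameter(torch.tensor(10.0))

        # Versus one alpha per kernel: one value for each output channel.
        out_channels = 32
        alpha_per_kernel = nn.Parameter(torch.full((out_channels,), 10.0))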

    @ducha-aiki, what do you think about moving to a single alpha per layer and removing the LSUV init? You've spent the most time thinking about it, so it's possible you've built an intuition around it I don't have yet.

    @4rtemi5 based on what I've seen, the alpha boosts performance noticeably. Whatever implementation we settle on in PyTorch, I plan to mimic it in the Jax code. It might make a difference in getting to 90% on CIFAR-10.

    This diff is a little messy; it also has some changes to the demo scripts to make them compatible with the new API.

    opened by brohrer 10
  • Reimplement SCS in terms of conv2d

    This PR proposes a change to the implementation of SCS that uses conv2d, a much more performant primitive. The gist of the idea is to:

    1. Normalize the kernel as we already do
    2. Compute per-window normalization factor coming from the image by sqrt(avg_pool2d(input ** 2) * window_size)
    3. Compute the dot products via conv2d between the input and normalized kernel
    4. Normalize the output elementwise by the array obtained in step 2.
    5. Proceed as before

    On my laptop this achieves more than a 2x performance improvement. I attach a quick test (compat_test.py) to verify that the two implementations yield (almost) the same results. This PR will obviously need some code unification and renaming before being merged; I submit it as is to facilitate easy comparisons by the maintainers :)
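
    A rough sketch of those steps (argument names, shapes, and epsilon values here are assumptions, not the PR's exact code):

        import torch
        import torch.nn.functional as F

        def scs_via_conv2d(inp, weight, p, stride=1, padding=0, eps=1e-6):
            # Step 1: normalize each kernel to unit length.
            w_unit = weight / (weight.square().sum(dim=(1, 2, 3), keepdim=True).sqrt() + eps)

            # Step 2: per-window input norm via avg_pool2d. Summing the squared input
            # over channels, averaging over the spatial window, and multiplying by the
            # window size recovers the squared L2 norm of each window.
            kh, kw = weight.shape[-2:]
            sq = inp.square().sum(dim=1, keepdim=True)
            win_norm = (F.avg_pool2d(sq, (kh, kw), stride=stride, padding=padding) * (kh * kw)).sqrt()

            # Step 3: dot products via conv2d between the input and the normalized kernels.
            dots = F.conv2d(inp, w_unit, stride=stride, padding=padding)

            # Step 4: elementwise normalization yields the cosine similarity.
            cos = dots / (win_norm + eps)

            # Step 5: proceed as before -- sharpen the magnitude, keep the sign.
            return torch.sign(cos) * cos.abs() ** p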

    opened by jatentaki 4
  • Confused about validation loss calculation

    Hello, I am unable to understand the testing loss. Why isn't loss = F.cross_entropy(preds, labels) also used for the test loss with the test preds?

    https://github.com/brohrer/sharpened_cosine_similarity_torch/blob/28c135adfb6062b0e9e29bb906736ceea9c08162/model_cifar10_18_4.py#L152-L163

    opened by quickgrid 3
  • Add absolute pooling

    Debug and test a maximum-magnitude pooling layer.

    @StephenHogg wrote a nice looking bit of code for this, but I haven't been able to get it to work just yet.

    Original code https://github.com/StephenHogg/SCS/blob/main/SCS/layer.py

    Copy/pasted to https://github.com/brohrer/sharpened_cosine_similarity_torch/blob/main/absolute_pooling.py
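
    For reference, the intended behavior might look something like this minimal sketch (assumed semantics: keep the entry with the largest magnitude in each window, sign included; this is not the code linked above):

        import torch
        import torch.nn.functional as F

        def max_abs_pool2d(x, kernel_size, stride=None):
            # Locate the largest-magnitude entry in each pooling window ...
            _, idx = F.max_pool2d(x.abs(), kernel_size, stride=stride, return_indices=True)
            # ... then gather the original signed values at those positions.
            flat = x.flatten(start_dim=2)
            return flat.gather(2, idx.flatten(start_dim=2)).view_as(idx)

        # Example: -5 wins over +4 because its magnitude is larger.
        x = torch.tensor([[[[1.0, -5.0], [4.0, 2.0]]]])
        print(max_abs_pool2d(x, 2))  # tensor([[[[-5.]]]])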

    opened by brohrer 3
  • Parameter handling

    • Updated PyTorch and Jax implementations
    • Removed p parameter exponentiation
    • Added weight clipping and p parameter lower limit
    • Changed parameter initialization
    • Removed alpha parameter

    Demo scripts and MNIST test all run.

    Any thoughts or comments? @ducha-aiki @4rtemi5

    opened by brohrer 1
  • Added an example using PyTorch Lightning

    This is an example application of the SCS Fashion MNIST demo with PyTorch Lightning. Lightning removes a lot of the boilerplate of PyTorch and helps you focus on the network. Furthermore, Lightning makes training on GPUs very simple.

    opened by p-sodmann 1
  • Currently creates a lot of NaNs

    I am a physician and don't know what I am doing, but this helps prevent the NaNs; maybe it helps you create a more sophisticated solution.

            out = inp.square()
            if torch.any(torch.isnan(out)):
                raise Exception("out")
                
            if self.groups == 1:
                out = out.sum(1, keepdim=True)
    
            norm = F.conv2d(
                out,
                torch.ones_like(self.weight[:1, :1]),
                None,
                self.stride,
                self.padding,
                self.dilation) + 1e-6
            
            if torch.any(torch.isnan(norm)):
                raise Exception("norm")
                
            # prevent 0 and inf
            q = torch.exp(-self.q / (self.q_scale**2 + 0.1))
            if torch.any(torch.isnan(q)):
                raise Exception("q")
                
    
            weight = self.weight / (self.weight.square().sum(dim=(1, 2, 3), keepdim=True).sqrt() + q + 1e-6)
            if torch.any(torch.isnan(weight)):
                raise Exception("weight")
                
    
            out = F.conv2d(
                inp,
                weight,
                self.bias,
                self.stride,
                self.padding,
                self.dilation,
                self.groups
            ) / ((norm**2).sqrt() + 1e-1)
    
            if torch.any(torch.isnan(out)):
                raise Exception("out2")
    
            # Comment these lines out for vanilla cosine similarity.
            # It's ~200x faster.
            abs = (out.square() + 1e-6).sqrt()
            sign = out / abs
            # prevent 0 and inf
            p = torch.exp(self.p / (self.p_scale**2 + 0.1))
            out = abs ** p
            out = out * sign
            if torch.any(torch.isnan(out)):
                raise Exception("out3")
            return out
    opened by p-sodmann 1
  • Reimplementation & slight improvements.

    Hi! I played around with your code yesterday and ended up refactoring it quite heavily to ease usage. The code is nowhere near ready for a pull request, but I hope you find the changes somewhat useful.

    You can find the code here.

    Notes:

    • I was able to achieve 91.3% CIFAR-10 accuracy with a 1.2M-parameter model (see here for the model implementation).
    • Residual connections seem to help deeper networks somewhat; however, training a much deeper model does not work out of the box (see here for the implementation).
    • Either annotating the forward pass of the SCS layer or scripting the model will improve runtime by at least 30%, measured on an NVIDIA 1060 GPU (see the TorchScript sketch at the end of this comment).

    The code automatically uses a CUDA-capable device if available, is considerably cleaner than the original training script, and will hopefully help you iterate on the idea :)

    You can view the tensorboard logs of my experiments here: https://tensorboard.dev/experiment/AY27LbxrRpaBHNMO0m9Wkw/#scalars
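
    To illustrate the scripting point above, a generic TorchScript sketch (a stand-in module, not the linked code):

        import torch
        import torch.nn as nn

        # Stand-in for a network containing SCS layers.
        class TinyNet(nn.Module):
            def __init__(self):
                super().__init__()
                self.conv = nn.Conv2d(3, 8, 3)

            def forward(self, x):
                return self.conv(x)

        scripted = torch.jit.script(TinyNet())     # compile the whole model
        out = scripted(torch.randn(1, 3, 32, 32))  # runs the compiled forward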

    opened by hukkelas 1
  • Fix for issue #4 with train progress

    Due to the progress bar, after every epoch it prints something like,

    0it [00:00, ?it/s]run: 0   epoch: 0   duration: 75.29   training loss: 2.249   testing loss: 2.137   training accuracy: 40.17%   testing accuracy: 57.66%
    

    Instead of,

    run: 0   epoch: 0   duration: 75.29   training loss: 2.249   testing loss: 2.137   training accuracy: 40.17%   testing accuracy: 57.66%
    
    opened by quickgrid 1
  • Create a CIFAR 10 demo

    So far sharpened cosine similarity has performed well on Fashion MNIST. The next step is to demonstrate it on CIFAR 10.

    Here are some partial implementations in case they help:

    • from Abhiraj Kanse: https://colab.research.google.com/drive/1lKx9wiY2bMg5rH5EvJKymj7nb8I9ESc4?usp=sharing
    • from @oliver-batchelor: https://github.com/oliver-batchelor/scs_cifar/blob/main/src/main.py

    opened by brohrer 1
  • Unused calculation fix

    Enhancement

    • Needs a progress bar, at least in the train loader.

    Issues

    Multiple files have the following issues.

    • Duplicate calculation of the epoch time inside the mini-batch loop: there is an extra epoch_duration = time.time() - epoch_start_time calculation inside the train loader (see the sketch at the end of this comment).

    https://github.com/brohrer/sharpened_cosine_similarity_torch/blob/690909ff360c67581ab2745a8a8516c4ee133ad8/demo_fashion_mnist.py#L127-L131

    • The test preds calculation is not used and can be removed.

    https://github.com/brohrer/sharpened_cosine_similarity_torch/blob/690909ff360c67581ab2745a8a8516c4ee133ad8/demo_fashion_mnist.py#L138-L147

    • Make the following consistent:

    https://github.com/brohrer/sharpened_cosine_similarity_torch/blob/690909ff360c67581ab2745a8a8516c4ee133ad8/model_cifar10_18_4.py#L127-L130

    https://github.com/brohrer/sharpened_cosine_similarity_torch/blob/690909ff360c67581ab2745a8a8516c4ee133ad8/model_cifar10_18_4.py#L154-L155
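
    For reference, the first point amounts to computing the epoch duration once, after the mini-batch loop rather than inside it (a schematic sketch of an assumed loop structure, not the repo's exact code):

        import time

        epoch_start_time = time.time()
        for batch in range(3):       # stands in for iterating the train loader
            pass                     # forward / backward / optimizer step here
        # Compute the duration once per epoch, after the loop, not inside it.
        epoch_duration = time.time() - epoch_start_time
        print(f"epoch duration: {epoch_duration:.2f} sec")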

    opened by quickgrid 0
Owner
Brandon Rohrer
My latest and most interesting work has been migrated to GitLab. Come say hi. https://gitlab.com/brohrer