Compare neural networks by their feature similarity

Related tags

Deep Learning PyTorch-Model-Compare

Overview

PyTorch Model Compare

A tiny package to compare two neural networks in PyTorch. There are many ways to compare two neural networks, but one robust and scalable way is using the Centered Kernel Alignment (CKA) metric, where the features of the networks are compared.

Centered Kernel Alignment

Centered Kernel Alignment (CKA) is a representation similarity metric that is widely used for understanding the representations learned by neural networks. Specifically, CKA takes two feature maps / representations X and Y as input and computes their normalized similarity (in terms of the Hilbert-Schmidt Independence Criterion (HSIC)) as

Where K and L are similarity matrices of X and Y respectively. However, the above formula is not scalable against deep architectures and large datasets. Therefore, a minibatch version can be constructed that uses an unbiased estimator of the HSIC as

The above form of CKA is from the 2021 ICLR paper by Nguyen T., Raghu M, Kornblith S.

Getting Started

Installation

pip install torch_cka

Usage

from torch_cka import CKA
model1 = resnet18(pretrained=True)  # Or any neural network of your choice
model2 = resnet34(pretrained=True)

dataloader = DataLoader(your_dataset, 
                        batch_size=batch_size, # according to your device memory
                        shuffle=False)  # Don't forget to seed your dataloader

cka = CKA(model1, model2,
          model1_name="ResNet18",   # good idea to provide names to avoid confusion
          model2_name="ResNet34",   
          model1_layers=layer_names_resnet18, # List of layers to extract features from
          model2_layers=layer_names_resnet34, # extracts all layer features by default
          device='cuda')

cka.compare(dataloader) # secondary dataloader is optional

results = cka.export()  # returns a dict that contains model names, layer names
                        # and the CKA matrix

Examples

torch_cka can be used with any pytorch model (subclass of nn.Module) and can be used with pretrained models available from popular sources like torchHub, timm, huggingface etc. Some examples of where this package can come in handy are illustrated below.

Comparing the effect of Depth

A simple experiment is to analyse the features learned by two architectures of the same family - ResNets but of different depths. Taking two ResNets - ResNet18 and ResNet34 - pre-trained on the Imagenet dataset, we can analyse how they produce their features on, say CIFAR10 for simplicity. This comparison is shown as a heatmap below.

We see high degree of similarity between the two models in lower layers as they both learn similar representations from the data. However at higher layers, the similarity reduces as the deeper model (ResNet34) learn higher order features which the is elusive to the shallower model (ResNet18). Yet, they do indeed have certain similarity in their last fc layer which acts as the feature classifier.

Comparing Two Similar Architectures

Another way of using CKA is in ablation studies. We can go further than those ablation studies that only focus on resultant performance and employ CKA to study the internal representations. Case in point - ResNet50 and WideResNet50 (k=2). WideResNet50 has the same architecture as ResNet50 except having wider residual bottleneck layers (by a factor of 2 in this case).

We clearly notice that the learned features are indeed different after the first few layers. The width has a more pronounced effect in deeper layers as compared to the earlier layers as both networks seem to learn similar features in the initial layers.

As a bonus, here is a comparison between ViT and the latest SOTA model Swin Transformer pretrained on ImageNet22k.

Comparing quite different architectures

CNNs have been analysed a lot over the past decade since AlexNet. We somewhat know what sort of features they learn across their layers (through visualizations) and we have put them to good use. One interesting approach is to compare these understandable features with newer models that don't permit easy visualizations (like recent vision transformer architectures) and study them. This has indeed been a hot research topic (see Raghu et.al 2021).

Comparing Datasets

Yet another application is to compare two datasets - preferably two versions of the data. This is especially useful in production where data drift is a known issue. If you have an updated version of a dataset, you can study how your model will perform on it by comparing the representations of the datasets. This can be more telling about actual performance than simply comparing the datasets directly.

This can also be quite useful in studying the performance of a model on downstream tasks and fine-tuning. For instance, if the CKA score is high for some features on different datasets, then those can be frozen during fine-tuning. As an example, the following figure compares the features of a pretrained Resnet50 on the Imagenet test data and the VOC dataset. Clearly, the pretrained features have little correlation with the VOC dataset. Therefore, we have to resort to fine-tuning to get at least satisfactory results.

Tips

If your model is large (lots of layers or large feature maps), try to extract from select layers. This is to avoid out of memory issues.
If you still want to compare the entire feature map, you can run it multiple times with few layers at each iteration and export your data using cka.export(). The exported data can then be concatenated to produce the full CKA matrix.
Give proper model names to avoid confusion when interpreting the results. The code automatically extracts the model name for you by default, but it is good practice to label the models according to your use case.
When providing your dataloader(s) to the compare() function, it is important that they are seeded properly for reproducibility.
When comparing datasets, be sure to set drop_last=True when building the dataloader. This resolves shape mismatch issues - especially in differently sized datasets.

Citation

If you use this repo in your project or research, please cite as -

@software{subramanian2021torch_cka,
    author={Anand Subramanian},
    title={torch_cka},
    url={https://github.com/AntixK/PyTorch-Model-Compare},
    year={2021}
}

Comments

The X means what in the formulation of HSIC

I notice that in the formulation of HSIC , the K are calculated with X . And X denote a matrix of activateions of p1 neurons for n examples . But in the code , it seems like the feature map after this layer . I don't know if i am wrong , could you help me ?
Take an example , use CIFAR-10 as dataset , which images are 3232 . And the first layer is conv2(in_channels=3,out_channels=6,kernel_size=5) , the feature map after this layer could be 62828 , and the neurons could be 65*5 . Which should i take for calculate CKA? I read the paper and i think it should be the representation , do you think what i said is right? But it some other project of CKA on githubs , they use the weight of neurons which confuses me . If i want to use traditional CKA to calculate the CKA similarities , where can i find the needed feature map?

opened by Zhaihaoyu 2
AssertionError: Input image size (32*32) doesn't match model (224*224).

Dear Author： This error occurred when I tried to run resnet_vit_compare.py： AssertionError: Input image size (3232) doesn't match model (224224). Where do I need to modify it?

opened by DDxk369 1
Works fine with the whole model but raise "NANs" on selected layers.

When I was trying to compare the same model trained on different datasets, I encountered a weird problem:

It works fine when I compare all layers: cka = CKA(model1, model2, device='cuda', model1_name='model1', model2_name='model2')

But, when I try to compare a selected subset of layers: cka = CKA(model1, model2, device='cuda', model1_name='model1', model2_name='model2', model1_layers=list(model1.state_dict().keys())[:5], model2_layers=list(model2.state_dict().keys())[:5]) It raises:

HSIC computation resulted in NANs

Do you have any idea how to fix this? Thank you very much.

opened by PengHongyiNTU 2
getting spurious "HSIC computation resulted in NANs"

Thanks for this great little module! I was able to adapt the code to deal with models suitable for speech recognition (mostly transformers and conformers) and I'm learning a lot from the CKA outputs.

One problem I face is that for some models, some layers hit this assert after a certain number of batches. Basically, if I try to pass 300 batches of 32 through the model, I end up with NaN exception around 150 or so. It doesn't seem related to the data because I shuffle the data and get the same exception after the same number of batches.

I guess this is a numerical stability problem perhaps. Is there some assumptions about the range of the layer features and outputs?

opened by jprobichaud 5

Companion code for the paper "An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their Asymptotic Overconfidence" (NeurIPS 2021)

ReLU-GP Residual (RGPR) This repository contains code for reproducing the following NeurIPS 2021 paper: @inproceedings{kristiadi2021infinite, title=

4 Dec 26, 2021

Diabet Feature Engineering - Predict whether people have diabetes when their characteristics are specified

6 Jan 18, 2022

Diabetes-Feature-Engineering - A machine learning model that can predict whether people have diabetes when their characteristics are specified

Diabetes-Feature-Engineering Aim Developing a machine learning model that can pr

0 Feb 23, 2022

Complex-Valued Neural Networks (CVNN)Complex-Valued Neural Networks (CVNN)

Complex-Valued Neural Networks (CVNN) Done by @NEGU93 - J. Agustin Barrachina Using this library, the only difference with a Tensorflow code is that y

1 Nov 12, 2021

A PyTorch implementation of "SimGNN: A Neural Network Approach to Fast Graph Similarity Computation" (WSDM 2019).

SimGNN ⠀⠀⠀ A PyTorch implementation of SimGNN: A Neural Network Approach to Fast Graph Similarity Computation (WSDM 2019). Abstract Graph similarity s

534 Dec 25, 2022

[CIKM 2019] Code and dataset for "Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction"

FiGNN for CTR prediction The code and data for our paper in CIKM2019: Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Predicti

Big Data and Multi-modal Computing Group, CRIPAC

75 Dec 30, 2022

Code accompanying our paper Feature Learning in Infinite-Width Neural Networks

Empirical Experiments in "Feature Learning in Infinite-width Neural Networks" This repo contains code to replicate our experiments (Word2Vec, MAML) in

37 Dec 14, 2022

This repository contains notebook implementations of the following Neural Process variants: Conditional Neural Processes (CNPs), Neural Processes (NPs), Attentive Neural Processes (ANPs).

The Neural Process Family This repository contains notebook implementations of the following Neural Process variants: Conditional Neural Processes (CN

892 Dec 28, 2022

Classify bird species based on their songs using SIamese Networks and 1D dilated convolutions.

The goal is to classify different birds species based on their songs/calls. Spectrograms have been extracted from the audio samples and used as features for classification.

9 Dec 27, 2022

Compare neural networks by their feature similarity

Related tags

Overview

PyTorch Model Compare

Centered Kernel Alignment

Getting Started

Installation

Usage

Examples

Comparing the effect of Depth

Comparing Two Similar Architectures

Comparing quite different architectures

Comparing Datasets

Tips

Citation

You might also like...

Companion code for the paper "An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their Asymptotic Overconfidence" (NeurIPS 2021)

Diabet Feature Engineering - Predict whether people have diabetes when their characteristics are specified

Diabetes-Feature-Engineering - A machine learning model that can predict whether people have diabetes when their characteristics are specified

Complex-Valued Neural Networks (CVNN)Complex-Valued Neural Networks (CVNN)

A PyTorch implementation of "SimGNN: A Neural Network Approach to Fast Graph Similarity Computation" (WSDM 2019).

[CIKM 2019] Code and dataset for "Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction"

Code accompanying our paper Feature Learning in Infinite-Width Neural Networks

This repository contains notebook implementations of the following Neural Process variants: Conditional Neural Processes (CNPs), Neural Processes (NPs), Attentive Neural Processes (ANPs).

Classify bird species based on their songs using SIamese Networks and 1D dilated convolutions.

Comments

The X means what in the formulation of HSIC

AssertionError: Input image size (32*32) doesn't match model (224*224).

Works fine with the whole model but raise "NANs" on selected layers.

getting spurious "HSIC computation resulted in NANs"

Owner

Anand Krishnamoorthy

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

Sharpened cosine similarity torch - A Sharpened Cosine Similarity layer for PyTorch

This is the official pytorch implementation for the paper: Instance Similarity Learning for Unsupervised Feature Representation.

This YoloV5 based model is fit to detect people and different types of land vehicles, and displaying their density on a fitted map, according to their coordinates and detected labels.

Character-Input - Create a program that asks the user to enter their name and their age

Code for "Single-view robot pose and joint angle estimation via render & compare", CVPR 2021 (Oral).

Compare outputs between layers written in Tensorflow and layers written in Pytorch

Pytorch Implementation for CVPR2018 Paper: Learning to Compare: Relation Network for Few-Shot Learning

A lightweight library to compare different PyTorch implementations of the same network architecture.

Local Similarity Pattern and Cost Self-Reassembling for Deep Stereo Matching Networks

AssertionError: Input image size (3232) doesn't match model (224224).