Generalized Shape Metrics on Neural Representations

In neuroscience and in deep learning, quantifying the (dis)similarity of neural representations across networks is a topic of substantial interest.

This code package computes metrics — notions of distance that satisfy the triangle inequality — between neural representations. If we record the activity of K networks, we can compute all pairwise distances and collect them into a K × K distance matrix. The triangle inequality ensures that all of these distance relationships are, in some sense, self-consistent. This self-consistency enables us to apply off-the-shelf algorithms for clustering and dimensionality reduction, which are available through many open-source packages such as scikit-learn.
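To make the self-consistency point concrete, here is a minimal sketch (not part of netrep) that checks whether a K × K distance matrix satisfies the triangle inequality. The helper name `satisfies_triangle_inequality` and the toy matrices are illustrative assumptions.

```python
import numpy as np

def satisfies_triangle_inequality(D, tol=1e-9):
    """Return True if D[i, k] <= D[i, j] + D[j, k] for all i, j, k."""
    D = np.asarray(D, dtype=float)
    # via[i, j, k] = D[i, j] + D[j, k]: the length of the detour i -> j -> k.
    via = D[:, :, None] + D[None, :, :]
    # Shortest detour from i to k through any intermediate point j.
    shortest_detour = via.min(axis=1)
    return bool(np.all(D <= shortest_detour + tol))

# Euclidean distances between K = 4 points form a valid metric.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
D_euclid = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Squared Euclidean distances are a common dissimilarity that is NOT a metric.
D_squared = D_euclid ** 2

print(satisfies_triangle_inequality(D_euclid))   # True
print(satisfies_triangle_inequality(D_squared))  # False
```

Dissimilarities that fail this check can still be computed pairwise, but algorithms that assume a metric space may behave unexpectedly on them.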

We published a conference paper (NeurIPS '21) describing these ideas.

@inproceedings{neural_shape_metrics,
  author = {Alex H. Williams and Erin Kunz and Simon Kornblith and Scott W. Linderman},
  title = {Generalized Shape Metrics on Neural Representations},
  year = {2021},
  booktitle = {Advances in Neural Information Processing Systems},
  volume = {34},
  url = {https://arxiv.org/abs/2110.14739}
}

We also presented an early version of this work at Cosyne in early 2021 (see the 7-minute summary on YouTube).

Note: This research code remains a work-in-progress to some extent. It could use more documentation and examples. Please use at your own risk and reach out to us (alex.h.willia@gmail.com) if you have questions.

A short and preliminary guide

To install, first set up the standard Python scientific libraries (https://ipython.org/install.html), then clone the repository and install it with pip:

git clone https://github.com/ahwillia/netrep
cd netrep/
pip install -e .

Since the code is preliminary and installed in editable mode (pip install -e), you can run git pull inside the netrep/ directory to get updates as we release them.

Computing the distance between two networks

The metrics implemented in this library are extensions of Procrustes distance. Some useful background can be found in Dryden & Mardia's textbook on Statistical Shape Analysis. A forthcoming preprint will describe the various metrics in more detail. For now, please see the short video description above and reach out to us if you have more questions.

The code uses an API similar to scikit-learn, so we recommend familiarizing yourself with that package.

We start by defining a metric object. The simplest metric to use is LinearMetric, which has a hyperparameter alpha that regularizes the alignment operation:

from netrep.metrics import LinearMetric

# Rotationally invariant metric (fully regularized).
proc_metric = LinearMetric(alpha=1.0, center_columns=True)

# Linearly invariant metric (no regularization).
cca_metric = LinearMetric(alpha=0.0, center_columns=True)

Valid values for the regularization term are 0 <= alpha <= 1. When alpha == 0, the resulting metric is similar to CCA and allows for an invertible linear transformation to align the activations. When alpha == 1, the model is fully regularized and only allows for rotational alignments.

We recommend starting with the fully regularized model where alpha == 1.
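For intuition about the fully regularized case, the following is a minimal NumPy sketch of the textbook angular Procrustes distance between two activation matrices. It is an illustration under our own assumptions, not netrep's implementation, and we do not claim it returns the same number as LinearMetric.score. The function name `angular_procrustes_sketch` is hypothetical.

```python
import numpy as np

def angular_procrustes_sketch(X, Y):
    """Angle between X and Y after the best orthogonal alignment."""
    # Center each column (neuron), mirroring the idea of center_columns=True.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Rescale to unit Frobenius norm so that overall scale is ignored.
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    # The best orthogonal alignment achieves an inner product equal to the
    # nuclear norm (sum of singular values) of X.T @ Y.
    best_cosine = np.linalg.svd(X.T @ Y, compute_uv=False).sum()
    return np.arccos(np.clip(best_cosine, -1.0, 1.0))

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
Q = np.linalg.qr(rng.standard_normal((5, 5)))[0]  # random orthogonal matrix

# A rotated copy of X should be (numerically) at distance zero from X.
dist_rotated = angular_procrustes_sketch(X, X @ Q)
# Unrelated random activations should be far apart (close to pi / 2).
dist_random = angular_procrustes_sketch(X, rng.standard_normal((1000, 5)))
```

The design point is that any rotation of the neural axes leaves the distance unchanged, which is what "rotationally invariant" means above.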

Next, we define the data, which are stored in matrices X and Y that hold paired activations from two networks. Each row of X and Y contains a matched sample of neural activations. For example, we might record the activity of 500 neurons in visual cortex in response to 1000 images (or, analogously, feed 1000 images into a deep network and store the activations of 500 hidden units). We would collect the neural responses into a 1000 x 500 matrix X. We'd then repeat the experiment in a second animal and store the responses in a second matrix Y.

By default, if the numbers of neurons in X and Y do not match, we zero-pad the dataset with fewer neurons to match the size of the larger one. This is justified because zero-padding does not distort the geometry of the dataset; it simply embeds it in a higher-dimensional space so that the two datasets can be compared. Alternatively, one could preprocess the data by using PCA (for example) to project both datasets into a common, lower-dimensional space. The default zero-padding behavior can be deactivated as follows:

LinearMetric(alpha=1.0, zero_pad=True)  # default behavior

LinearMetric(alpha=1.0, zero_pad=False)  # throws an error if number of columns in X and Y don't match
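The claim that zero-padding leaves the geometry untouched can be checked directly. This small sketch pads a toy activation matrix by hand (it does not call netrep) and compares distances between samples before and after padding; the array sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))  # 6 samples, 3 neurons
n_extra = 2                      # pad up to 5 neurons

# Append columns of zeros, i.e. "silent" neurons.
X_padded = np.hstack([X, np.zeros((X.shape[0], n_extra))])

def sample_distances(A):
    """Euclidean distance between every pair of rows (samples) of A."""
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

# Pairwise distances between samples are identical before and after padding.
same_geometry = np.allclose(sample_distances(X), sample_distances(X_padded))
print(same_geometry)  # True
```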

Now we are ready to fit alignment transformations (which account for the neurons being mismatched across networks). Then, we evaluate the distance in the aligned space. These are respectively done by calling fit(...) and score(...) functions on the metric instance.

# Given
# -----
# X : ndarray, (num_samples x num_neurons), activations from first network.
#
# Y : ndarray, (num_samples x num_neurons), activations from second network.
#
# metric : an instance of LinearMetric(...)

# Fit alignment transformations.
metric.fit(X, Y)

# Evaluate distance between X and Y, using alignments fit above.
dist = metric.score(X, Y)

Since the model is fit and evaluated by separate function calls, it is very easy to cross-validate the estimated distances:

# Given
# -----
# X_train : ndarray, (num_train_samples x num_neurons), training data from first network.
#
# Y_train : ndarray, (num_train_samples x num_neurons), training data from second network.
#
# X_test : ndarray, (num_test_samples x num_neurons), test data from first network.
#
# Y_test : ndarray, (num_test_samples x num_neurons), test data from second network.
#
# metric : an instance of LinearMetric(...)

# Fit alignment transformations to the training set.
metric.fit(X_train, Y_train)

# Evaluate distance on the test set.
dist = metric.score(X_test, Y_test)
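One detail worth spelling out is that the train/test split must use the same sample indices for both networks, otherwise the rows of X and Y are no longer matched. Below is a minimal sketch with synthetic stand-in data; the array sizes and the 80/20 split are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples = 1000

# Synthetic stand-ins for two recordings of the same 1000 stimuli.
X = rng.standard_normal((num_samples, 50))
Y = 2.0 * X[:, :40]  # row i of Y is paired with row i of X

# Draw ONE permutation of sample indices and reuse it for both networks.
perm = rng.permutation(num_samples)
train_idx, test_idx = perm[:800], perm[800:]

X_train, X_test = X[train_idx], X[test_idx]
Y_train, Y_test = Y[train_idx], Y[test_idx]

# These arrays can then be passed to metric.fit(X_train, Y_train) and
# metric.score(X_test, Y_test) as described above.
```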

In fact, we can use scikit-learn's built-in cross-validation tools, since LinearMetric extends the sklearn.base.BaseEstimator class. So, if you'd like to do 10-fold cross-validation, for example:

from sklearn.model_selection import cross_validate
results = cross_validate(metric, X, Y, return_train_score=True, cv=10)
results["train_score"]  # holds 10 distance estimates between X and Y, using training data.
results["test_score"]   # holds 10 distance estimates between X and Y, using heldout data.

We can also call the transform(...) method to align the activations:

# Fit alignment transformations.
metric.fit(X, Y)

# Apply alignment transformations.
X_aligned, Y_aligned = metric.transform(X, Y)

# Now, e.g., you could use PCA to visualize the data in the aligned space...
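As one possible way to do the visualization step, the sketch below projects two aligned activation matrices onto their top two shared principal components using plain NumPy. The arrays X_aligned and Y_aligned here are synthetic stand-ins for the output of metric.transform(X, Y), and fitting PCA jointly on both networks is our assumption rather than a recommendation from the package.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the output of metric.transform(X, Y).
X_aligned = rng.standard_normal((200, 10))
Y_aligned = X_aligned + 0.1 * rng.standard_normal((200, 10))

# Fit PCA on both networks jointly so that they share one set of axes.
stacked = np.vstack([X_aligned, Y_aligned])
mean = stacked.mean(axis=0)
_, _, Vt = np.linalg.svd(stacked - mean, full_matrices=False)
top2 = Vt[:2].T  # (num_neurons, 2) projection onto the top two PCs

X_2d = (X_aligned - mean) @ top2
Y_2d = (Y_aligned - mean) @ top2
# X_2d and Y_2d can now be drawn as two overlaid scatter plots.
```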

Computing distances between many networks

Things get really interesting when we consider larger cohorts containing more than two networks. The netrep.multiset module contains some useful functions. Let Xs = [X1, X2, X3, ..., Xk] be a list of num_samples x num_neurons matrices similar to those described above. We can do the following:

1) Computing all pairwise distances. The following returns a symmetric k x k matrix of distances.

from netrep.multiset import pairwise_distances
metric = LinearMetric(alpha=1.0)
dist_matrix = pairwise_distances(metric, Xs, verbose=False)

By setting verbose=True, we print out a progress bar which might be useful for very large datasets.
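Conceptually, building the matrix amounts to looping over all unordered pairs of networks. The sketch below shows that loop with a placeholder distance function; `pairwise_matrix_sketch` and `frobenius_distance` are hypothetical names, and the real pairwise_distances additionally fits an alignment for each pair, which this toy version does not do.

```python
import itertools
import numpy as np

def pairwise_matrix_sketch(dist_fn, Xs):
    """Fill a symmetric k x k matrix with dist_fn(Xs[i], Xs[j])."""
    k = len(Xs)
    D = np.zeros((k, k))
    # Each unordered pair is computed once and mirrored across the diagonal.
    for i, j in itertools.combinations(range(k), 2):
        D[i, j] = D[j, i] = dist_fn(Xs[i], Xs[j])
    return D

def frobenius_distance(X, Y):
    """Placeholder distance with no alignment step (illustration only)."""
    return np.linalg.norm(X - Y)

rng = np.random.default_rng(0)
Xs = [rng.standard_normal((100, 8)) for _ in range(4)]
dist_matrix = pairwise_matrix_sketch(frobenius_distance, Xs)
```

Only k(k-1)/2 distances are computed, so the cost grows quadratically with the number of networks.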

We can also split data into training sets and test sets.

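import numpy as np

# Split each network's samples into a training half and a testing half.
# np.array_split keeps row order, so samples stay matched across networks.
splitdata = [np.array_split(X, 2) for X in Xs]
traindata = [X_train for (X_train, X_test) in splitdata]
testdata = [X_test for (X_train, X_test) in splitdata]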

# Compute all pairwise train and test distances.
train_dists, test_dists = pairwise_distances(metric, traindata, testdata=testdata)

2) Using the pairwise distance matrix. Many of the methods in sklearn.cluster and sklearn.manifold will work and operate directly on these distance matrices.

For example, to perform clustering over the cohort of networks, we could do:

# Given
# -----
# dist_matrix : (num_networks x num_networks) symmetric distance matrix, computed as described above.

# DBSCAN clustering
from sklearn.cluster import DBSCAN
cluster_ids = DBSCAN(metric="precomputed").fit_predict(dist_matrix)

# Agglomerative clustering. Ward linkage (the default) does not accept a
# precomputed matrix, so choose another linkage. On scikit-learn < 1.2,
# pass affinity="precomputed" instead of metric="precomputed".
from sklearn.cluster import AgglomerativeClustering
cluster_ids = AgglomerativeClustering(n_clusters=5, metric="precomputed", linkage="average").fit_predict(dist_matrix)

# OPTICS
from sklearn.cluster import OPTICS
cluster_ids = OPTICS(metric="precomputed").fit_predict(dist_matrix)

# Scipy hierarchical clustering
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
linkage = hierarchy.ward(squareform(dist_matrix))  # returns a linkage matrix
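The SciPy call above returns a linkage matrix rather than cluster labels. As a small sketch of the remaining step, the example below cuts the tree into a fixed number of clusters with scipy.cluster.hierarchy.fcluster. The toy distance matrix (two groups of three networks) is made up for illustration.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

# Toy cohort: networks 0-2 are close to each other, as are networks 3-5.
group = np.array([0, 0, 0, 1, 1, 1])
dist_matrix = np.where(group[:, None] == group[None, :], 0.1, 1.0)
np.fill_diagonal(dist_matrix, 0.0)

# squareform converts the square matrix to the condensed form SciPy expects.
linkage = hierarchy.ward(squareform(dist_matrix))

# Cut the tree so that at most two clusters remain.
cluster_ids = hierarchy.fcluster(linkage, t=2, criterion="maxclust")
```

The labels returned by fcluster start at 1, unlike the 0-based labels returned by scikit-learn.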

We can also visualize the set of networks in 2D space by using manifold learning methods:

# Given
# -----
# dist_matrix : (num_networks x num_networks) symmetric distance matrix, computed as described above.

# Multi-dimensional scaling
from sklearn.manifold import MDS
lowd_embedding = MDS(dissimilarity="precomputed").fit_transform(dist_matrix)

# t-distributed Stochastic Neighbor Embedding.
# With a precomputed matrix, TSNE needs init="random", and perplexity must
# be smaller than the number of networks.
from sklearn.manifold import TSNE
lowd_embedding = TSNE(metric="precomputed", init="random").fit_transform(dist_matrix)

# Isomap
from sklearn.manifold import Isomap
lowd_embedding = Isomap(metric="precomputed").fit_transform(dist_matrix)

# etc., etc.
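To show what such an embedding does with a precomputed distance matrix, here is a minimal NumPy sketch of classical (Torgerson) MDS. This is a simpler relative of the iterative method in sklearn.manifold.MDS, not a reimplementation of it, and the function name `classical_mds_sketch` is our own.

```python
import numpy as np

def classical_mds_sketch(dist_matrix, n_components=2):
    """Embed points so Euclidean distances approximate dist_matrix."""
    D2 = np.asarray(dist_matrix, dtype=float) ** 2
    k = D2.shape[0]
    J = np.eye(k) - np.ones((k, k)) / k   # centering matrix
    B = -0.5 * J @ D2 @ J                 # implied Gram (inner product) matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Sanity check on distances that really do come from 2D points.
rng = np.random.default_rng(0)
points = rng.standard_normal((7, 2))
dist_matrix = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

lowd_embedding = classical_mds_sketch(dist_matrix)
recovered = np.linalg.norm(
    lowd_embedding[:, None, :] - lowd_embedding[None, :, :], axis=-1
)
```

When the distances are exactly Euclidean in two dimensions, as in this toy check, the embedding reproduces them up to rotation and reflection.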

3) K-means clustering and averaging across networks

We can average across networks using the metric spaces defined above. Specifically, we can compute a Fréchet/Karcher mean in the metric space. See also the section on "Generalized Procrustes Analysis" in Gower & Dijksterhuis (2004).

from netrep.multiset import procrustes_average
Xbar = procrustes_average(Xs, max_iter=100, tol=1e-4)
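For readers who want to see the idea behind such an average, the sketch below alternates between rotating every network onto the current mean and re-averaging, in the spirit of generalized Procrustes analysis. It is an illustration under our own assumptions (rotations only, initialization at the first network) and is not claimed to match what procrustes_average does internally. The names `best_rotation` and `procrustes_mean_sketch` are hypothetical.

```python
import numpy as np

def best_rotation(X, target):
    """Orthogonal matrix R minimizing ||X @ R - target|| (Frobenius norm)."""
    U, _, Vt = np.linalg.svd(X.T @ target)
    return U @ Vt

def procrustes_mean_sketch(Xs, max_iter=100, tol=1e-4):
    """Alternate between aligning each X to the mean and re-averaging."""
    Xbar = Xs[0].copy()
    for _ in range(max_iter):
        aligned = [X @ best_rotation(X, Xbar) for X in Xs]
        new_Xbar = np.mean(aligned, axis=0)
        converged = np.linalg.norm(new_Xbar - Xbar) < tol
        Xbar = new_Xbar
        if converged:
            break
    return Xbar

# Toy cohort: four randomly rotated copies of one base matrix.
rng = np.random.default_rng(0)
base = rng.standard_normal((50, 5))
Xs = []
for _ in range(4):
    Q = np.linalg.qr(rng.standard_normal((5, 5)))[0]  # random orthogonal matrix
    Xs.append(base @ Q)

Xbar = procrustes_mean_sketch(Xs)
# After rotating the mean back onto the base matrix, the two should coincide.
residual = np.linalg.norm(Xbar @ best_rotation(Xbar, base) - base)
```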

Further, we can extend the well-known k-means clustering algorithm to the metric space defined by Procrustes distance.

from netrep.multiset import procrustes_kmeans

# Fit 3 clusters
n_clusters = 3
centroids, labels, cent_dists = procrustes_kmeans(Xs, n_clusters)

An incomplete list of related work

Dabagia, M., Kording, K. P., & Dyer, E. L. (forthcoming). Comparing high-dimensional neural recordings by aligning their low-dimensional latent representations. Nature Biomedical Engineering.

Degenhart, A. D., Bishop, W. E., Oby, E. R., Tyler-Kabara, E. C., Chase, S. M., Batista, A. P., & Yu, B. M. (2020). Stabilization of a brain–computer interface via the alignment of low-dimensional spaces of neural activity. Nature Biomedical Engineering, 4(7), 672-685.

Gower, J. C., & Dijksterhuis, G. B. (2004). Procrustes problems (Vol. 30). Oxford University Press.

Gallego, J. A., Perich, M. G., Chowdhury, R. H., Solla, S. A., & Miller, L. E. (2020). Long-term stability of cortical population dynamics underlying consistent behavior. Nature neuroscience, 23(2), 260-270.

Haxby, J. V., Guntupalli, J. S., Nastase, S. A., & Feilong, M. (2020). Hyperalignment: Modeling shared information encoded in idiosyncratic cortical topographies. eLife, 9, e56601.

Kornblith, S., Norouzi, M., Lee, H., & Hinton, G. (2019, May). Similarity of neural network representations revisited. In International Conference on Machine Learning (pp. 3519-3529). PMLR.

Kriegeskorte, N., Mur, M., & Bandettini, P. A. (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2, 4.

Maheswaranathan, N., Williams, A. H., Golub, M. D., Ganguli, S., & Sussillo, D. (2019). Universality and individuality in neural dynamics across large populations of recurrent networks. Advances in neural information processing systems, 2019, 15629.

Raghu, M., Gilmer, J., Yosinski, J., & Sohl-Dickstein, J. (2017). SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. arXiv preprint arXiv:1706.05806.

Comments

Possible error in euclidean metric?

I'm looking at this line which defines the Euclidean distance: https://github.com/ahwillia/netrep/blob/main/netrep/metrics/linear.py#L150

From your paper and the Dryden book, I understood the size-and-shape distance to be calculated using the Frobenius norm, which would be proportional to the root-mean-squared of np.linalg.norm(..., axis=1) rather than its mean as you've implemented it. (Or, simply np.linalg.norm(..., ord='fro'))

Did I misunderstand something here, or is this an error?

opened by wrongu

License

Hi everyone,

thank you very much for this amazing resource!

I was wondering if you were planning to add a license to the repo soon(-ish), outlining the respective usage rights?

Thanks again. Cheers, Peer

opened by PeerHerholz

Questions about netrep

I’m trying to use your netrep repo for my own analyses but I’m getting stuck on some errors and I just wanted to make sure I’m implementing it correctly. Essentially, I’m trying to compare the network activations at certain layers between a trained resnet152 model and an untrained resnet152 model in keras. My current workflow is as follows:

1) Get the activation model, stopped at a certain layer, for both the untrained and trained models.
2) Show the activation models an image from ImageNet and store the activations in (n x w x h x d) tensors (U, T).
3) Call conv_layers.convolve_metric(LinearMetric(), T, U) and get the resulting (w x h) matrix.

My questions are:

Is the score output by Metric.score based on how close they are to each other or how far apart they are? I.e., if the score is higher, does that mean the representations are close together or far apart?

Is what I described above the correct workflow, or is there a better one you would recommend?

As a sanity check I called conv_layers.convolve_metric(LinearMetric(), T, T), assuming this should return a matrix of 0s, right? But the resulting matrix has a mean value of 0.878, and that doesn't seem right.

What exactly do the barycenter.py, conv_layers.py, and rbf_sampler.py scripts do (or where can I find a description of them in your paper or elsewhere), and when should I use them?

Is the cka.py metric script fully implemented? I noticed that in the fit() function you just call pass; is that because you don’t need to align the tensors in the shape space? Also, the CKA you calculate seems to be just the angular distance between the covariance matrices, which seems to be different from the CKA calculation described here, no?

A similar question for the kernel.py script: the whole script is commented out. Do I just need to uncomment it to use it, or is there a reason it's left commented out?

What is permutation.py calculating? Actually in general, could you point me to where the metrics calculated in each of these scripts is described so I can get a better understanding? In the README, you wrote that

A forthcoming preprint will describe the various metrics in more detail

do you have a copy of the preprint you might be able to share with me, or another resource I can use?

Thank you for taking the time to answer my questions; I know you must be busy.

opened by aaprasad
