# Grounding Representation Similarity with Statistical Testing
This repo contains code to replicate the results in our paper, which evaluates representation similarity measures with a series of benchmark tasks. The experiments in the paper first require computing neural network embeddings of a dataset and the accuracy scores of those networks; we provide both pre-computed. Given these embeddings and performance scores, this repo contains the code that implements our benchmark evaluation.
## File descriptions
### This repo: `sim_metric`

This repo is organized as follows:
- `experiments/` contains code to run the experiments in part 4 of the paper:
  - `layer_exp` is the first experiment in part 4, with different random seeds and layer depths
  - `pca_deletion` is the second experiment in part 4, with different numbers of principal components deleted
  - `feather` is the first experiment in part 4.1, with different finetuning seeds
  - `pretrain_finetune` is the second experiment in part 4.2, with different pretraining and finetuning seeds
- `dists/` contains functions to compute dissimilarities between representations.
### Pre-computed resources: `sim_metric_resources`

The pre-computed embeddings and scores available at https://zenodo.org/record/5117844 can be downloaded and unzipped into a folder titled `sim_metric_resources`, which is organized as follows:
- `embeddings` contains the embeddings between which we are computing dissimilarities
- `dists` contains, for every experiment, the dissimilarities between the corresponding embeddings, for every metric:
  - `dists.csv` contains the precomputed dissimilarities
  - `dists_self_computed.csv` contains the dissimilarities computed by running `compute_dists.py` (see below)
- `scores` contains, for every experiment, the accuracy scores of the embeddings
- `full_dfs` contains, for every experiment, a csv file aggregating the dissimilarities and accuracy differences between the embeddings
## Instructions
- clone this repository
- go to https://zenodo.org/record/5117844 and download `sim_metric_resources.tar`
- untar it with `tar -xvf sim_metric_resources.tar`
- in `sim_metric/paths.py`, modify the path to `sim_metric_resources`
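For example, after this last step `sim_metric/paths.py` might look roughly like the sketch below. The variable name is an assumption; use whatever name the file actually defines.

```python
# Sketch of sim_metric/paths.py after editing; the real file may use a
# different variable name for this path.
from pathlib import Path

# Point this at the folder obtained by untarring sim_metric_resources.tar
RESOURCES_PATH = Path("/home/username/sim_metric_resources")
```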
### Replicating the results
For every experiment (e.g. `feather`, `pretrain_finetune`, `layer_exp`, or `pca_deletion`):
- the relevant dissimilarities and accuracy differences have already been precomputed and aggregated in a dataframe `full_df` (see the sketch after this list for a quick way to inspect it)
- make sure that `dists_path` and `full_df_path` in `compute_full_df.py`, `script.py` and `notebook.ipynb` are set to `dists.csv` and `full_df.csv`, and not `dists_self_computed.csv` and `full_df_self_computed.csv`
- to get the results, you can:
  - run the notebook `notebook.ipynb`, or
  - run `script.py` in the experiment's folder, and find the results in `results.txt`, in the same folder

To run the scripts for all four experiments, run `experiments/script.py`.
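To inspect an experiment's precomputed `full_df` directly, a minimal sketch (assuming pandas is installed; the csv path below is a placeholder and should be adapted to where that experiment's aggregated dataframe lives inside `sim_metric_resources/full_dfs`):

```python
import pandas as pd

# Placeholder path: adapt it to the experiment's aggregated dataframe
# inside sim_metric_resources/full_dfs.
full_df = pd.read_csv("path/to/sim_metric_resources/full_dfs/full_df.csv")

# Look at the aggregated dissimilarities and accuracy differences.
print(full_df.columns)
print(full_df.head())
```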
### Recomputing dissimilarities
For every experiment, you can:
- recompute the dissimilarities between embeddings by running `compute_dists.py` in this experiment's folder
- use these and the accuracy scores to recompute the aggregate dataframe by running `compute_full_df.py` in this experiment's folder
- change `dists_path` and `full_df_path` in `compute_full_df.py`, `script.py` and `notebook.ipynb` from `dists.csv` and `full_df.csv` to `dists_self_computed.csv` and `full_df_self_computed.csv` (see the sketch after this list)
- run the experiments with `script.py` or `notebook.ipynb` as above.
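For concreteness, the change in the third step amounts to switching the two path variables, roughly as follows. This is only a sketch: the exact assignments in `compute_full_df.py`, `script.py` and `notebook.ipynb` may differ (for instance they may build full paths from `paths.py`).

```python
# Before (precomputed files shipped with sim_metric_resources):
# dists_path = "dists.csv"
# full_df_path = "full_df.csv"

# After (files you recomputed yourself):
dists_path = "dists_self_computed.csv"
full_df_path = "full_df_self_computed.csv"
```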
### Adding a new metric
This repo also allows you to test a new representational similarity metric and see how it compares according to our benchmark. To add a new metric:
- add the corresponding function at the end of `dists/scoring.py` (a sketch is given after this list)
- add a condition in `dists/score_pair.py`, around line 160
- for every experiment in `experiments`, add the name of the metric to the `metrics` list in `compute_dists.py`
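As an illustration, the function added to `dists/scoring.py` in the first step could look like the toy dissimilarity below. This is a sketch only: the name is hypothetical, and the signature and conventions expected by the scoring code are assumptions, so mirror how the existing metrics in `dists/scoring.py` are written.

```python
import numpy as np

def gram_correlation_dist(rep_a: np.ndarray, rep_b: np.ndarray) -> float:
    """Toy dissimilarity between two representations of the same examples.

    rep_a, rep_b: arrays of shape (num_examples, num_features); the feature
    dimensions may differ, but the rows must correspond to the same examples.
    Returns 1 minus the Pearson correlation between the flattened
    example-by-example Gram matrices of the two representations.
    """
    gram_a = rep_a @ rep_a.T
    gram_b = rep_b @ rep_b.T
    corr = np.corrcoef(gram_a.ravel(), gram_b.ravel())[0, 1]
    return 1.0 - corr
```

You would then add a condition for this metric in `dists/score_pair.py` and append its name to the `metrics` list in each experiment's `compute_dists.py`, as described above.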