Grounding Representation Similarity with Statistical Testing

Overview

Grounding Representation Similarity with Statistical Testing

This repo contains code to replicate the results in our paper, which evaluates representation similarity measures with a series of benchmark tasks. The experiments in the paper require first computing neural network embeddings of a dataset and computing accuracy scores of that neural network, which we provide pre-computed. This repo contains the code that implements our benchmark evaluation, given these embeddings and performance scores.

File descriptions

This repo: sim_metric

This repo is organized as follows:

  • experiments/ contains code to run the experiments in part 4 of the paper:
    • layer_exp is the first experiment in part 4, with different random seeds and layer depths
    • pca_deletion is the second experiment in part 4, with different numbers of principal components deleted
    • feather is the first experiment in part 4.1, with different finetuning seeds
    • pretrain_finetune is the second experiment in part 4.2, with different pretraining and finetuning seeds
  • dists/ contains functions to compute dissimilarities between representations.

Pre-computed resources: sim_metric_resources

The pre-computed embeddings and scores available at https://zenodo.org/record/5117844 can be downloaded and unzipped into a folder titled sim_metric_resources, which is organized as follows:

  • embeddings contains the embeddings between which we are computing dissimilarities
  • dists contains, for every experiment, the dissimilarities between the corresponding embeddings, for every metric:
    • dists.csv contains the precomputed dissimilarities
    • dists_self_computed.csv contains the dissimilarities computed by running compute_dists.py (see below)
  • scores contains, for every experiment, the accuracy scores of the embeddings
  • full_dfs contains, for every experiment, a csv file aggregating the dissimilarities and accuracy differences between the embeddings

Instructions

  • clone this repository
  • go to https://zenodo.org/record/5117844 and download sim_metric_resources.tar
  • untar it with tar -xvf sim_metric_resources sim_metric_resources.tar
  • in sim_metric/paths.py, modify the path to sim_metric_resources

Replicating the results

For every experiment (eg feather, pretrain_finetune, layer_exp, or pca_deletion):

  • the relevant dissimilarities and accuracies differences have already been precomputed and aggregated in a dataframe full_df
  • make sure that dists_path and full_df_path in compute_full_df.py, script.py and notebook.ipynb are set to dists.csv and full_df.csv, and not dists_self_computed.csv and full_df_self_computed.csv.
  • to get the results, you can:
    • run the notebook notebook.ipynb, or
    • run script.py in the experiment's folder, and find the results in results.txt, in the same folder To run the scripts for all four experiments, run experiments/script.py.

Recomputing dissimilarities

For every experiment, you can:

  • recompute the dissimilarities between embeddings by running compute_dists.py in this experiment's folder
  • use these and the accuracy scores to recompute the aggregate dataframe by running compute_full_df.py in this experiment's folder
  • change dists_path and full_df_path in compute_full_df.py, script.py and notebook.ipynb from dists.csv and full_df.csv to dists_self_computed.csv and full_df_self_computed.csv
  • run the experiments with script.py or notebook.ipynb as above.

Adding a new metric

This repo also allows you to test a new representational similarity metric and see how it compares according to our benchmark. To add a new metric:

  • add the corresponding function at the end of dists/scoring.py
  • add a condition in dists/score_pair.py, around line 160
  • for every experiment in experiments, add the name of the metric to the metrics list in compute_dists.py
You might also like...
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers

TubeDETR: Spatio-Temporal Video Grounding with Transformers Website • STVG Demo • Paper This repository provides the code for our paper. This includes

SeqTR: A Simple yet Universal Network for Visual Grounding
SeqTR: A Simple yet Universal Network for Visual Grounding

SeqTR This is the official implementation of SeqTR: A Simple yet Universal Network for Visual Grounding, which simplifies and unifies the modelling fo

Eff video representation - Efficient video representation through neural fields

Neural Residual Flow Fields for Efficient Video Representations 1. Download MPI

Python Library for learning (Structure and Parameter) and inference (Statistical and Causal) in Bayesian Networks.

pgmpy pgmpy is a python library for working with Probabilistic Graphical Models. Documentation and list of algorithms supported is at our official sit

Statsmodels: statistical modeling and econometrics in Python

About statsmodels statsmodels is a Python package that provides a complement to scipy for statistical computations including descriptive statistics an

PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

PyStan NOTE: This documentation describes a BETA release of PyStan 3. PyStan is a Python interface to Stan, a package for Bayesian inference. Stan® is

Probabilistic Programming and Statistical Inference in PyTorch

PtStat Probabilistic Programming and Statistical Inference in PyTorch. Introduction This project is being developed during my time at Cogent Labs. The

Code to run experiments in SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression.

Code to run experiments in SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression. Not an official Google product. Me

Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

Noise Contrastive Estimation for pyTorch Overview This repository contains a re-implementation of the Noise Contrastive Estimation algorithm, implemen

Comments
  • how to normalize?

    how to normalize?

    Can you clarify what this means in an equation?

     Specifically, for a raw representation A we first
    subtract the mean value from each column, then divide by the Frobenius norm, to produce the
    normalized representation A∗
    , used in all our dissimilarity computations. In this work we study
    dissimilarity measures d(A∗
    , B∗
    ) that allow for quantitative comparisons of representations both
    within and across different networks.
    

    which of these two is it:

    
    def _matrix_normalize(input: Tensor,
                          dim: int
                          ) -> Tensor:
        """
        Center and normalize according to the forbenius norm (not the standard deviation).
    
        Warning: this does not create standardized random variables in a random vectors.
    
        Note: careful with this, it makes CCA behave in unexpected ways
        :param input:
        :param dim:
        :return:
        """
        from torch.linalg import norm
        return (input - input.mean(dim=dim, keepdim=True)) / norm(input, 'fro')
    
    def _matrix_normalize_using_centered_data(X: Tensor, dim: int = 1) -> Tensor:
        """
        Normalize matrix of size wrt to the data dimension according to the similarity preprocessing standard.
        Assumption is that X is of size [n, d].
        Otherwise, specify which simension to normalize with dim.
    
        ref: https://stats.stackexchange.com/questions/544812/how-should-one-normalize-activations-of-batches-before-passing-them-through-a-si
        """
        from torch.linalg import norm
        X_centered: Tensor = (X - X.mean(dim=dim, keepdim=True))
        X_star: Tensor = X_centered / norm(X_centered, "fro")
        return X_star
    

    ref: https://stats.stackexchange.com/questions/544812/how-should-one-normalize-activations-of-batches-before-passing-them-through-a-si

    opened by brando90 5
  • normalizing by forbenius norm might break CCA

    normalizing by forbenius norm might break CCA

    Hi, first I want to thank you for the great work!

    I started dividing by the forbenius norm and I am not sure if that is correct for CCA. It breaks the very basic santity checks one would expect if one understands CCA in depth.

    You can see in detail dicussion here: https://github.com/brando90/ultimate-anatome/issues/3 and here https://github.com/moskomule/anatome/pull/11

    Feel free do ping me directly anywhere if you want to discuss.

    opened by brando90 2
  • Fix imports

    Fix imports

    I added back in a utils.py file we had for loading embeddings and renamed it embedding_utils.py, and I switched the order of import statements so that everything now runs on my local machine. Let me know what you think!

    opened by francesding 0
  • changing the layer experiment

    changing the layer experiment

    Now the layer experiment just uses the last layer as the reference (no more averaging over all 12 layers)

    Since the results are pretty consistent across layers for the layer experiment, and using middle layers as references give us the slightly weird negative correlation results, it might be simpler just to report the results using the 12th layer as our reference representation. I changed script.py and notebook.ipynb to reflect this change, and I think those are all the changes that will be necessary.

    Let me know if you think this change makes sense, and also if any other files need to be changed!

    opened by francesding 0
Owner
null
Sharpened cosine similarity torch - A Sharpened Cosine Similarity layer for PyTorch

Sharpened Cosine Similarity A layer implementation for PyTorch Install At your c

Brandon Rohrer 203 Nov 30, 2022
PyTorch Implementation of Region Similarity Representation Learning (ReSim)

ReSim This repository provides the PyTorch implementation of Region Similarity Representation Learning (ReSim) described in this paper: @Article{xiao2

Tete Xiao 74 Jan 3, 2023
This is the official pytorch implementation for the paper: Instance Similarity Learning for Unsupervised Feature Representation.

ISL This is the official pytorch implementation for the paper: Instance Similarity Learning for Unsupervised Feature Representation, which is accepted

null 19 May 4, 2022
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [ACL 2021]

piglet PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [ACL 2021] This repo contains code and data for PIGLeT. If you like

Rowan Zellers 51 Oct 8, 2022
[CVPR2021] Look before you leap: learning landmark features for one-stage visual grounding.

LBYL-Net This repo implements paper Look Before You Leap: Learning Landmark Features For One-Stage Visual Grounding CVPR 2021. Getting Started Prerequ

SVIP Lab 45 Dec 12, 2022
A PyTorch implementation of the baseline method in Panoptic Narrative Grounding (ICCV 2021 Oral)

A PyTorch implementation of the baseline method in Panoptic Narrative Grounding (ICCV 2021 Oral)

Biomedical Computer Vision @ Uniandes 52 Dec 19, 2022
A Fast and Accurate One-Stage Approach to Visual Grounding, ICCV 2019 (Oral)

One-Stage Visual Grounding ***** New: Our recent work on One-stage VG is available at ReSC.***** A Fast and Accurate One-Stage Approach to Visual Grou

Zhengyuan Yang 118 Dec 5, 2022
The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation This repository is the official implementation of CVPR 2021 paper:

null 9 Nov 14, 2022
Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding

2D-TAN (Optimized) Introduction This is an optimized re-implementation repository for AAAI'2020 paper: Learning 2D Temporal Localization Networks for

Joya Chen 112 Dec 31, 2022
[ICCV2021] 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

3DVG-Transformer This repository is for the ICCV 2021 paper "3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds" Our method "3DV

null 22 Dec 11, 2022