Code for reproducible experiments presented in KSD Aggregated Goodness-of-fit Test.

Antonin Schrab

Last update: Dec 15, 2022

Related tags

Deep Learning ksdagg-paper

Overview

Code for KSDAgg: a KSD aggregated goodness-of-fit test

This GitHub repository contains the code for the reproducible experiments presented in our paper KSD Aggregated Goodness-of-fit Test:

Gamma distribution experiment,
Gaussian-Bernoulli Restricted Boltzmann Machine experiment,
MNIST Normalizing Flow experiment.

We provide the code to run the experiments to generate Figures 1-4 and Table 1 from our paper, those can be found in figures.

Our aggregated test KSDAgg is implemented in ksdagg.py. We provide code for two quantile estimation methods: the wild bootstrap and the parametric bootstrap. Our implementation uses the IMQ (inverse multiquadric) kernel with a collection of bandwidths consisting of the median bandwidth scaled by powers of 2, and with one of the four types of weights proposed in MMD Aggregated Two-Sample Test. We also provide custom KSDAgg functions in ksdagg.py which allow for the use of any kernel collections and weights.

Requirements

python 3.9

Installation

In a chosen directory, clone the repository and change to its directory by executing

git clone [email protected]:antoninschrab/ksdagg-paper.git
cd ksdagg-paper

We then recommend creating and activating a virtual environment by either

using venv:

python3 -m venv ksdagg-env
source ksdagg-env/bin/activate
# can be deactivated by running:
# deactivate

or using conda:

conda create --name ksdagg-env python=3.9
conda activate ksdagg-env
# can be deactivated by running:
# conda deactivate

The required packages can then be installed in the virtual environment by running

python -m pip install -r requirements.txt

Generating or downloading the data

The data for the Gaussian-Bernoulli Restricted Boltzmann Machine experiment and for the MNIST Normalizing Flow experiment can

be obtained by executing

python generate_data_rbm.py
python generate_data_nf.py

or, as running the above scripts can be computationally expensive, we also provide the option to download their outputs directly
```
python download_data.py
```

Those scripts generate samples and compute their associated scores under the model for the different settings considered in our experiments, the data is saved in the new directory data.

Reproducing the experiments of the paper

First, for our three experiments, we compute KSD values to be used for the parametric bootstrap and save them in the directory parametric. This can be done by running

python generate_parametric.py

For convenience, we directly provide the directory parametric obtained by running this script.

To run the three experiments, the following commands can be executed

python experiment_gamma.py 
python experiment_rbm.py 
python experiment_nf.py

Those commands run all the tests necessary for our experiments, the results are saved in dedicated .csv and .pkl files in the directory results (which is already provided for ease of use). Note that our expeiments are comprised of 'embarrassingly parallel for loops', for which significant speed up can be obtained by using parallel computing libraries such as joblib or dask.

The actual figures of the paper can be obtained from the saved dataframes in results by using the command

python figures.py

The figures are saved in the directory figures and correspond to the ones used in our paper.

References

Our KSDAgg code is based our MMDAgg implementation which can be found at https://github.com/antoninschrab/mmdagg-paper.

For the Gaussian-Bernoulli Restricted Boltzmann Machine experiment, we obtain the samples and scores in generate_data_rbm.py by relying on Wittawat Jitkrittum's implementation which can be found at https://github.com/wittawatj/kernel-gof under the MIT License. The relevant files we use are in the directory kgof.

For the MNIST Normalizing Flow experiment, we use in generate_data_nf.py a multiscale Normalizing Flow trained on the MNIST dataset as implemented by Phillip Lippe in Tutorial 11: Normalizing Flows for image modeling as part of the UvA Deep Learning Tutorials under the MIT License.

Author

Antonin Schrab

Centre for Artificial Intelligence, Department of Computer Science, University College London

Gatsby Computational Neuroscience Unit, University College London

Inria, Lille - Nord Europe research centre and Inria London Programme

Bibtex

@unpublished{schrab2022ksd,
    title={{KSD} Aggregated Goodness-of-fit Test},
    author={Antonin Schrab and Benjamin Guedj and Arthur Gretton},
    year={2022},
    note = "Submitted.",
    abstract = {We investigate properties of goodness-of-fit tests based on the Kernel Stein Discrepancy (KSD). We introduce a strategy to construct a test, called KSDAgg, which aggregates multiple tests with different kernels. KSDAgg avoids splitting the data to perform kernel selection (which leads to a loss in test power), and rather maximises the test power over a collection of kernels. We provide theoretical guarantees on the power of KSDAgg: we show it achieves the smallest uniform separation rate of the collection, up to a logarithmic term. KSDAgg can be computed exactly in practice as it relies either on a parametric bootstrap or on a wild bootstrap to estimate the quantiles and the level corrections. In particular, for the crucial choice of bandwidth of a fixed kernel, it avoids resorting to arbitrary heuristics (such as median or standard deviation) or to data splitting. We find on both synthetic and real-world data that KSDAgg outperforms other state-of-the-art adaptive KSD-based goodness-of-fit testing procedures.},
    url = {https://arxiv.org/abs/2202.00824},
    url_PDF = {https://arxiv.org/pdf/2202.00824.pdf},
    url_Code = {https://github.com/antoninschrab/ksdagg-paper},
    eprint={2202.00824},
    archivePrefix={arXiv},
    primaryClass={stat.ML}
}

License

MIT License (see LICENSE.md)

You might also like...

Projects for AI/ML and IoT integration for games and other presented at re:Invent 2021.

Playground4AWS Projects for AI/ML and IoT integration for games and other presented at re:Invent 2021. Architecture Minecraft and Lamps This project i

5 Nov 30, 2022

Prototypical python implementation of the trust-region algorithm presented in Sequential Linearization Method for Bound-Constrained Mathematical Programs with Complementarity Constraints by Larson, Leyffer, Kirches, and Manns.

Prototypical python implementation of the trust-region algorithm presented in Sequential Linearization Method for Bound-Constrained Mathematical Programs with Complementarity Constraints by Larson, Leyffer, Kirches, and Manns.

3 Dec 2, 2022

Collection of TensorFlow2 implementations of Generative Adversarial Network varieties presented in research papers.

TensorFlow2-GAN Collection of tf2.0 implementations of Generative Adversarial Network varieties presented in research papers. Model architectures will

41 Apr 28, 2022

This is our ARTS test set, an enriched test set to probe Aspect Robustness of ABSA.

This is the repository for our 2020 paper "Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis". Data We provide

35 Nov 16, 2022

Lightweight, Python library for fast and reproducible experimentation :microscope:

Steppy What is Steppy? Steppy is a lightweight, open-source, Python 3 library for fast and reproducible experimentation. Steppy lets data scientist fo

134 Jul 10, 2022

Open-L2O: A Comprehensive and Reproducible Benchmark for Learning to Optimize Algorithms

Open-L2O This repository establishes the first comprehensive benchmark efforts of existing learning to optimize (L2O) approaches on a number of proble

161 Jan 2, 2023

MQBench: Towards Reproducible and Deployable Model Quantization Benchmark

MQBench: Towards Reproducible and Deployable Model Quantization Benchmark We propose a benchmark to evaluate different quantization algorithms on vari

494 Dec 29, 2022

Applications using the GTN library and code to reproduce experiments in "Differentiable Weighted Finite-State Transducers"

gtn_applications An applications library using GTN. Current examples include: Offline handwriting recognition Automatic speech recognition Installing

68 Dec 29, 2022

Code to reproduce the experiments in the paper "Transformer Based Multi-Source Domain Adaptation" (EMNLP 2020)

Transformer Based Multi-Source Domain Adaptation Dustin Wright and Isabelle Augenstein To appear in EMNLP 2020. Read the preprint: https://arxiv.org/a

36 Dec 5, 2022

Comments

Undefined variable in KSDAGG + wild bootstrap

Hi,

It seems the variable bandwidth_multipliers on the following LOC is undefined.

https://github.com/antoninschrab/ksdagg-paper/blob/99a0179f4a16e824aad3e15c448b42ac924d6b83/ksdagg.py#L50

I presume this line can be fixed by replacing it with N = 1 + l_plus - l_minus?

Thanks in advance.

opened by XingLLiu 1

Code for reproducible experiments presented in KSD Aggregated Goodness-of-fit Test.

Related tags

Overview

Code for KSDAgg: a KSD aggregated goodness-of-fit test

Requirements

Installation

Generating or downloading the data

Reproducing the experiments of the paper

References

Author

Bibtex

License

You might also like...

Projects for AI/ML and IoT integration for games and other presented at re:Invent 2021.

Prototypical python implementation of the trust-region algorithm presented in Sequential Linearization Method for Bound-Constrained Mathematical Programs with Complementarity Constraints by Larson, Leyffer, Kirches, and Manns.

Collection of TensorFlow2 implementations of Generative Adversarial Network varieties presented in research papers.

This is our ARTS test set, an enriched test set to probe Aspect Robustness of ABSA.

Lightweight, Python library for fast and reproducible experimentation :microscope:

Open-L2O: A Comprehensive and Reproducible Benchmark for Learning to Optimize Algorithms

MQBench: Towards Reproducible and Deployable Model Quantization Benchmark

Applications using the GTN library and code to reproduce experiments in "Differentiable Weighted Finite-State Transducers"

Code to reproduce the experiments in the paper "Transformer Based Multi-Source Domain Adaptation" (EMNLP 2020)

Comments

Undefined variable in KSDAGG + wild bootstrap

Owner

Antonin Schrab

This is code to fit per-pixel environment map with spherical Gaussian lobes, using LBFGS optimization

Provided is code that demonstrates the training and evaluation of the work presented in the paper: "On the Detection of Digital Face Manipulation" published in CVPR 2020.

Code for the Population-Based Bandits Algorithm, presented at NeurIPS 2020.

This YoloV5 based model is fit to detect people and different types of land vehicles, and displaying their density on a fitted map, according to their coordinates and detected labels.

Minimisation of a negative log likelihood fit to extract the lifetime of the D^0 meson (MNLL2ELDM)

Fit Fast, Explain Fast

Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

Official implementation of GraphMask as presented in our paper Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking.

Official implementation of the network presented in the paper "M4Depth: A motion-based approach for monocular depth estimation on video sequences"

The materials used in the SaxonJS tutorial presented at Declarative Amsterdam, 2021