Code for Deterministic Neural Networks with Appropriate Inductive Biases Capture Epistemic and Aleatoric Uncertainty

Related tags

Deep Learning DDU
Overview

Deep Deterministic Uncertainty

arXiv Pytorch 1.8.1 License: MIT

This repository contains the code for Deterministic Neural Networks with Appropriate Inductive Biases Capture Epistemic and Aleatoric Uncertainty.

If the code or the paper has been useful in your research, please add a citation to our work:

@article{mukhoti2021deterministic,
  title={Deterministic Neural Networks with Appropriate Inductive Biases Capture Epistemic and Aleatoric Uncertainty},
  author={Mukhoti, Jishnu and Kirsch, Andreas and van Amersfoort, Joost and Torr, Philip HS and Gal, Yarin},
  journal={arXiv preprint arXiv:2102.11582},
  year={2021}
}

Dependencies

The code is based on PyTorch and requires a few further dependencies, listed in environment.yml. It should work with newer versions as well.

OoD Detection

Datasets

For OoD detection, you can train on CIFAR-10/100. You can also train on Dirty-MNIST by downloading Ambiguous-MNIST (amnist_labels.pt and amnist_samples.pt) from here and using the following training instructions.

Training

In order to train a model for the OoD detection task, use the train.py script. Following are the main parameters for training:

--seed: seed for initialization
--dataset: dataset used for training (cifar10/cifar100/dirty_mnist)
--dataset-root: /path/to/amnist_labels.pt and amnist_samples.pt/ (if training on dirty-mnist)
--model: model to train (wide_resnet/vgg16/resnet18/resnet50/lenet)
-sn: whether to use spectral normalization (available for wide_resnet, vgg16 and resnets)
--coeff: Coefficient for spectral normalization
-mod: whether to use architectural modifications (leaky ReLU + average pooling in skip connections)
--save-path: path/for/saving/model/

As an example, in order to train a Wide-ResNet-28-10 with spectral normalization and architectural modifications on CIFAR-10, use the following:

python train.py \
       --seed 1 \
       --dataset cifar10 \
       --model wide_resnet \
       -sn -mod \
       --coeff 3.0 

Similarly, to train a ResNet-18 with spectral normalization on Dirty-MNIST, use:

python train.py \
       --seed 1 \
       --dataset dirty-mnist \
       --dataset-root /home/user/amnist/ \
       --model resnet18 \
       -sn \
       --coeff 3.0

Evaluation

To evaluate trained models, use evaluate.py. This script can evaluate and aggregate results over multiple experimental runs. For example, if the pretrained models are stored in a directory path /home/user/models, store them using the following directory structure:

models
├── Run1
│   └── wide_resnet_1_350.model
├── Run2
│   └── wide_resnet_2_350.model
├── Run3
│   └── wide_resnet_3_350.model
├── Run4
│   └── wide_resnet_4_350.model
└── Run5
    └── wide_resnet_5_350.model

For an ensemble of models, store the models using the following directory structure:

model_ensemble
├── Run1
│   ├── wide_resnet_1_350.model
│   ├── wide_resnet_2_350.model
│   ├── wide_resnet_3_350.model
│   ├── wide_resnet_4_350.model
│   └── wide_resnet_5_350.model
├── Run2
│   ├── wide_resnet_10_350.model
│   ├── wide_resnet_6_350.model
│   ├── wide_resnet_7_350.model
│   ├── wide_resnet_8_350.model
│   └── wide_resnet_9_350.model
├── Run3
│   ├── wide_resnet_11_350.model
│   ├── wide_resnet_12_350.model
│   ├── wide_resnet_13_350.model
│   ├── wide_resnet_14_350.model
│   └── wide_resnet_15_350.model
├── Run4
│   ├── wide_resnet_16_350.model
│   ├── wide_resnet_17_350.model
│   ├── wide_resnet_18_350.model
│   ├── wide_resnet_19_350.model
│   └── wide_resnet_20_350.model
└── Run5
    ├── wide_resnet_21_350.model
    ├── wide_resnet_22_350.model
    ├── wide_resnet_23_350.model
    ├── wide_resnet_24_350.model
    └── wide_resnet_25_350.model

Following are the main parameters for evaluation:

--seed: seed used for initializing the first trained model
--dataset: dataset used for training (cifar10/cifar100)
--ood_dataset: OoD dataset to compute AUROC
--load-path: /path/to/pretrained/models/
--model: model architecture to load (wide_resnet/vgg16)
--runs: number of experimental runs
-sn: whether the model was trained using spectral normalization
--coeff: Coefficient for spectral normalization
-mod: whether the model was trained using architectural modifications
--ensemble: number of models in the ensemble
--model-type: type of model to load for evaluation (softmax/ensemble/gmm)

As an example, in order to evaluate a Wide-ResNet-28-10 with spectral normalization and architectural modifications on CIFAR-10 with OoD dataset as SVHN, use the following:

python evaluate.py \
       --seed 1 \
       --dataset cifar10 \
       --ood_dataset svhn \
       --load-path /path/to/pretrained/models/ \
       --model wide_resnet \
       --runs 5 \
       -sn -mod \
       --coeff 3.0 \
       --model-type softmax

Similarly, to evaluate the above model using feature density, set --model-type gmm. The evaluation script assumes that the seeds of models trained in consecutive runs differ by 1. The script stores the results in a json file with the following structure:

{
    "mean": {
        "accuracy": mean accuracy,
        "ece": mean ECE,
        "m1_auroc": mean AUROC using log density / MI for ensembles,
        "m1_auprc": mean AUPRC using log density / MI for ensembles,
        "m2_auroc": mean AUROC using entropy / PE for ensembles,
        "m2_auprc": mean AUPRC using entropy / PE for ensembles,
        "t_ece": mean ECE (post temp scaling)
        "t_m1_auroc": mean AUROC using log density / MI for ensembles (post temp scaling),
        "t_m1_auprc": mean AUPRC using log density / MI for ensembles (post temp scaling),
        "t_m2_auroc": mean AUROC using entropy / PE for ensembles (post temp scaling),
        "t_m2_auprc": mean AUPRC using entropy / PE for ensembles (post temp scaling)
    },
    "std": {
        "accuracy": std error accuracy,
        "ece": std error ECE,
        "m1_auroc": std error AUROC using log density / MI for ensembles,
        "m1_auprc": std error AUPRC using log density / MI for ensembles,
        "m2_auroc": std error AUROC using entropy / PE for ensembles,
        "m2_auprc": std error AUPRC using entropy / PE for ensembles,
        "t_ece": std error ECE (post temp scaling),
        "t_m1_auroc": std error AUROC using log density / MI for ensembles (post temp scaling),
        "t_m1_auprc": std error AUPRC using log density / MI for ensembles (post temp scaling),
        "t_m2_auroc": std error AUROC using entropy / PE for ensembles (post temp scaling),
        "t_m2_auprc": std error AUPRC using entropy / PE for ensembles (post temp scaling)
    },
    "values": {
        "accuracy": accuracy list,
        "ece": ece list,
        "m1_auroc": AUROC list using log density / MI for ensembles,
        "m2_auroc": AUROC list using entropy / PE for ensembles,
        "t_ece": ece list (post temp scaling),
        "t_m1_auroc": AUROC list using log density / MI for ensembles (post temp scaling),
        "t_m1_auprc": AUPRC list using log density / MI for ensembles (post temp scaling),
        "t_m2_auroc": AUROC list using entropy / PE for ensembles (post temp scaling),
        "t_m2_auprc": AUPRC list using entropy / PE for ensembles (post temp scaling)
    },
    "info": {dictionary of args}
}

Results

Dirty-MNIST

To visualise DDU's performance on Dirty-MNIST (i.e., Fig. 1 of the paper), use fig_1_plot.ipynb. The notebook requires a pretrained LeNet, VGG-16 and ResNet-18 with spectral normalization trained on Dirty-MNIST and visualises the softmax entropy and feature density for Dirty-MNIST (iD) samples vs Fashion-MNIST (OoD) samples. The notebook also visualises the softmax entropies of MNIST vs Ambiguous-MNIST samples for the ResNet-18+SN model (Fig. 2 of the paper). The following figure shows the output of the notebook for the LeNet, VGG-16 and ResNet18+SN model we trained on Dirty-MNIST.

CIFAR-10 vs SVHN

The following table presents results for a Wide-ResNet-28-10 architecture trained on CIFAR-10 with SVHN as the OoD dataset. For the full set of results, refer to the paper.

Method Aleatoric Uncertainty Epistemic Uncertainty Test Accuracy Test ECE AUROC
Softmax Softmax Entropy Softmax Entropy 95.98+-0.02 0.85+-0.02 94.44+-0.43
Energy-based Softmax Entropy Softmax Density 95.98+-0.02 0.85+-0.02 94.56+-0.51
5-Ensemble Predictive Entropy Predictive Entropy 96.59+-0.02 0.76+-0.03 97.73+-0.31
DDU (ours) Softmax Entropy GMM Density 95.97+-0.03 0.85+-0.04 98.09+-0.10

Active Learning

To run active learning experiments, use active_learning_script.py. You can run active learning experiments on both MNIST as well as Dirty-MNIST. When running with Dirty-MNIST, you will need to provide a pretrained model on Dirty-MNIST to distinguish between clean MNIST and Ambiguous-MNIST samples. The following are the main command line arguments for active_learning_script.py.

--seed: seed used for initializing the first model (later experimental runs will have seeds incremented by 1)
--model: model architecture to train (resnet18)
-ambiguous: whether to use ambiguous MNIST during training. If this is set to True, the models will be trained on Dirty-MNIST, otherwise they will train on MNIST.
--dataset-root: /path/to/amnist_labels.pt and amnist_samples.pt/
--trained-model: model architecture of pretrained model to distinguish clean and ambiguous MNIST samples
-tsn: if pretrained model has been trained using spectral normalization
--tcoeff: coefficient of spectral normalization used on pretrained model
-tmod: if pretrained model has been trained using architectural modifications (leaky ReLU and average pooling on skip connections)
--saved-model-path: /path/to/saved/pretrained/model/
--saved-model-name: name of the saved pretrained model file
--threshold: Threshold of softmax entropy to decide if a sample is ambiguous (samples having higher softmax entropy than threshold will be considered ambiguous)
--subsample: number of clean MNIST samples to use to subsample clean MNIST
-sn: whether to use spectral normalization during training
--coeff: coefficient of spectral normalization during training
-mod: whether to use architectural modifications (leaky ReLU and average pooling on skip connections) during training
--al-type: type of active learning acquisition model (softmax/ensemble/gmm)
-mi: whether to use mutual information for ensemble al-type
--num-initial-samples: number of initial samples in the training set
--max-training-samples: maximum number of training samples
--acquisition-batch-size: batch size for each acquisition step

As an example, to run the active learning experiment on MNIST using the DDU method, use:

python active_learning_script.py \
       --seed 1 \
       --model resnet18 \
       -sn -mod \
       --al-type gmm

Similarly, to run the active learning experiment on Dirty-MNIST using the DDU baseline, with a pretrained ResNet-18 with SN to distinguish clean and ambiguous MNIST samples, use the following:

python active_learning_script.py \
       --seed 1 \
       --model resnet18 \
       -sn -mod \
       -ambiguous \
       --dataset-root /home/user/amnist/ \
       --trained-model resnet18 \
       -tsn \
       --saved-model-path /path/to/pretrained/model \
       --saved-model-name resnet18_sn_3.0_1_350.model \
       --threshold 1.0 \
       --subsample 1000 \
       --al-type gmm

Results

The active learning script stores all results in json files. The MNIST test set accuracy is stored in a json file with the following structure:

{
    "experiment run": list of MNIST test set accuracies one per acquisition step
}

When using ambiguous samples in the pool set, the script also stores the fraction of ambiguous samples acquired in each step in the following json:

{
    "experiment run": list of fractions of ambiguous samples in the acquired training set
}

Visualisation

To visualise results from the above json files, use the al_plot.ipynb notebook. The following diagram shows the performance of different baselines (softmax, ensemble PE, ensemble MI and DDU) on MNIST and Dirty-MNIST.

Questions

For any questions, please feel free to raise an issue or email us directly. Our emails can be found on the paper.

Comments
  • ECE much lower than reported

    ECE much lower than reported

    Thanks for the great repo! I've been curious about DDU so I've been playing around with some examples. I fit a wide resnet to CIFAR-10 with spectral normalization, and while I got a reasonable test accuracy (0.9587), my ECE is much lower than the values reported in the DDU paper.

    I see test ECE's of 0.023 on CIFAR-10, but the paper reports ECE's nearly 2 orders of magnitude higher, of 0.85. Does the ECE method in this repository report a metric that is somehow different than the metric used in the DDU paper?

    Should the ece reported by the expected_calibration_error method be multiplied by 100, so my test ECE is actually 2.3 and higher than expected?

    I believe I am using all the same hyperparameters:

    batch_size=128
    val_size = 0.1
    spec_norm = True
    coeff = 3
    mod = True
    
    lr = 0.1
    momentum = 0.9
    weight_decay = 5e-4
    
    epochs = 350
    milestone_1 = 150
    milestone_2 = 250 
    opened by kheuton 2
  • Multiplication of prior with log likelihood - Code not found !!

    Multiplication of prior with log likelihood - Code not found !!

    @BlackHC Can you please tell me where you are multiplying the prior with the log_prob of the logits during evaluation. grafik

    As mentioned in the line 12: We have to multiply the probability of the feature vector with the prior obtained from training. I saw a part where you perform calulcation of log probability

    https://github.com/omegafragger/DDU/blob/f597744c65df4ff51615ace5e86e82ffefe1cd0f/utils/gmm_utils.py#L79

    https://github.com/omegafragger/DDU/blob/f597744c65df4ff51615ace5e86e82ffefe1cd0f/metrics/uncertainty_confidence.py#L16 and https://github.com/omegafragger/DDU/blob/f597744c65df4ff51615ace5e86e82ffefe1cd0f/evaluate.py#L211

    but didnt actually find the code snip where you were multiplying with the prior. Is it required or is it enough to just calculate the logsumexp() of the logits obtained.

    Thanks @BlackHC

    opened by jeethesh-pai 4
  • DDU for semantic segmentation

    DDU for semantic segmentation

    Hi Thanks for this interesting work. I read you other article on applying DDU to semantic segmentation, and I was wondering if there's another repository specific for that one ? Otherwise how could one adapt the code in this repository to make it work with segmentation task mentioned in the other paper

    opened by AlkaSaliss 0
  • Error in computing gaussian_model

    Error in computing gaussian_model

    The gaussian mixture model computations from mean and covariance matrix (obtained from feature's of model - nn.feature), is constantly throwing the error that covariance matrix is not positive definite

    gmm = torch.distributions.MultivariateNormal(
          loc=classwise_mean_features, covariance_matrix=(classwise_cov_features + jitter),
    )
    

    Error:

    Expected parameter covariance_matrix (Tensor of shape (4, 512, 512)) of distribution MultivariateNormal(loc: torch.Size([4, 512]), covariance_matrix: torch.Size([4, 512, 512])) to satisfy th
    e constraint PositiveDefinite(), but found invalid values:
    

    I am not sure if I am doing anything wrong, I am using ResNet18 BTW

    opened by Karthik-Ragunath 9
  • Feature Densities Always Zero

    Feature Densities Always Zero

    Hello,

    I'm having an issue with the active learning script. I'm running:

    CUDA_VISIBLE_DEVICES=7 python active_learning_script.py --seed 1 --model resnet18 -sn -mod --al-type gmm

    and I have set a breakpoint right before acquisition in active_learning_script.py here:

    Screen Shot 2022-02-04 at 1 12 13 PM

    whenever I inspect the result of compute_density(logits, class_prob), density for all instances is always zero like so:

    Screen Shot 2022-02-04 at 1 18 38 PM

    Upon digging deeper, I noticed that the feature Gaussians always assign extremely low probability to every class, leading to all of these zeros. This happens at every acquisition step for me, regardless of the dataset size and seed (I've tried 2 seeds). I've even tried programmatically setting up a check to tell me if the densities are ever not zero for entire training runs, and it never happens. Thus, for me, the following call to torch.topk amounts to random selection at every acquisition step. I'm wondering if this is an issue that you've experienced.

    Thanks, Jacob

    opened by JacobOaks 1
Owner
Jishnu Mukhoti
Graduate Student in Computer Science
Jishnu Mukhoti
Study of human inductive biases in CNNs and Transformers.

Are Convolutional Neural Networks or Transformers more like human vision? This repository contains the code and fine-tuned models of popular Convoluti

Shikhar Tuli 39 Dec 8, 2022
Neural Logic Inductive Learning

Neural Logic Inductive Learning This is the implementation of the Neural Logic Inductive Learning model (NLIL) proposed in the ICLR 2020 paper: Learn

null 36 Nov 28, 2022
Official implementation of the ICCV 2021 paper "Joint Inductive and Transductive Learning for Video Object Segmentation"

JOINT This is the official implementation of Joint Inductive and Transductive learning for Video Object Segmentation, to appear in ICCV 2021. @inproce

Yunyao 35 Oct 16, 2022
A library for uncertainty representation and training in neural networks.

Epistemic Neural Networks A library for uncertainty representation and training in neural networks. Introduction Many applications in deep learning re

DeepMind 211 Dec 12, 2022
"Inductive Entity Representations from Text via Link Prediction" @ The Web Conference 2021

Inductive entity representations from text via link prediction This repository contains the code used for the experiments in the paper "Inductive enti

Daniel Daza 45 Jan 9, 2023
Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach

This repository holds the implementation for paper Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach Download our preproc

Qitian Wu 42 Dec 27, 2022
[ICML 2021] Towards Understanding and Mitigating Social Biases in Language Models

Towards Understanding and Mitigating Social Biases in Language Models This repo contains code and data for evaluating and mitigating bias from generat

Paul Liang 42 Jan 3, 2023
Towards Debiasing NLU Models from Unknown Biases

Towards Debiasing NLU Models from Unknown Biases Abstract: NLU models often exploit biased features to achieve high dataset-specific performance witho

Ubiquitous Knowledge Processing Lab 22 Jun 14, 2022
How to Become More Salient? Surfacing Representation Biases of the Saliency Prediction Model

How to Become More Salient? Surfacing Representation Biases of the Saliency Prediction Model

Bogdan Kulynych 49 Nov 5, 2022
PyTorch implementation of Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.

ALiBi PyTorch implementation of Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. Quickstart Clone this reposit

Jake Tae 4 Jul 27, 2022
This library provides an abstraction to perform Model Versioning using Weight & Biases.

Description This library provides an abstraction to perform Model Versioning using Weight & Biases. Features Version a new trained model Promote a mod

Hector Lopez Almazan 2 Jan 28, 2022
MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

null 187 Dec 26, 2022
A Pytorch implementation of the multi agent deep deterministic policy gradients (MADDPG) algorithm

Multi-Agent-Deep-Deterministic-Policy-Gradients A Pytorch implementation of the multi agent deep deterministic policy gradients(MADDPG) algorithm This

Phil Tabor 159 Dec 28, 2022
This tool converts a Nondeterministic Finite Automata (NFA) into a Deterministic Finite Automata (DFA)

This tool converts a Nondeterministic Finite Automata (NFA) into a Deterministic Finite Automata (DFA)

Quinn Herden 1 Feb 4, 2022
Official pytorch implementation for Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion (CVPR 2022)

Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion This repository contains a pytorch implementation of "Learning to Listen: Modeling

null 50 Dec 17, 2022
Fast and scalable uncertainty quantification for neural molecular property prediction, accelerated optimization, and guided virtual screening.

Evidential Deep Learning for Guided Molecular Property Prediction and Discovery Ava Soleimany*, Alexander Amini*, Samuel Goldman*, Daniela Rus, Sangee

Alexander Amini 75 Dec 15, 2022
Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

Rubicon Purpose Rubicon is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a r

Capital One 97 Jan 3, 2023
This is the official code release for the paper Shape and Material Capture at Home

This is the official code release for the paper Shape and Material Capture at Home. The code enables you to reconstruct a 3D mesh and Cook-Torrance BRDF from one or more images captured with a flashlight or camera with flash.

null 89 Dec 10, 2022
Code of Adverse Weather Image Translation with Asymmetric and Uncertainty aware GAN

Adverse Weather Image Translation with Asymmetric and Uncertainty-aware GAN (AU-GAN) Official Tensorflow implementation of Adverse Weather Image Trans

Jeong-gi Kwak 36 Dec 26, 2022