Propose a principled and practically effective framework for unsupervised accuracy estimation and error detection tasks with theoretical analysis and state-of-the-art performance.

Overview

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles

This project is for the paper: Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles.

Experimental Results

Main Results

Preliminaries

It is tested under Ubuntu Linux 16.04.1 and Python 3.6 environment, and requries some packages to be installed:

Downloading Datasets

  • MNIST-M: download it from the Google drive. Extract the files and place them in ./dataset/mnist_m/.
  • SVHN: need to download Format 2 data (*.mat). Place the files in ./dataset/svhn/.
  • USPS: download the usps.h5 file. Place the file in ./dataset/usps/.

Overview of the Code

  • train_model.py: train standard models via supervised learning.
  • train_dann.py: train domain adaptive (DANN) models.
  • eval_pipeline.py: evaluate various methods on all tasks.

Running Experiments

Examples

  • To train a standard model via supervised learning, you can use the following command:

python train_model.py --source-dataset {source dataset} --model-type {model type} --base-dir {directory to save the model}

{source dataset} can be mnist, mnist-m, svhn or usps.

{model type} can be typical_dnn or dann_arch.

  • To train a domain adaptive (DANN) model, you can use the following command:

python train_dann.py --source-dataset {source dataset} --target-dataset {target dataset} --base-dir {directory to save the model} [--test-time]

{source dataset} (or {target dataset}) can be mnist, mnist-m, svhn or usps.

The argument --test-time is to indicate whether to replace the target training dataset with the target test dataset.

  • To evaluate a method on all training-test dataset pairs, you can use the following command:

python eval_pipeline.py --model-type {model type} --method {method}

{model type} can be typical_dnn or dann_arch.

{method} can be conf_avg, ensemble_conf_avg, conf, trust_score, proxy_risk, our_ri or our_rm.

Train All Models

You can run the following scrips to pre-train all models needed for the experiments.

  • run_all_model_training.sh: train all supervised learning models.
  • run_all_dann_training.sh: train all DANN models.
  • run_all_ensemble_training.sh: train all ensemble models.

Evaluate All Methods

You can run the following script to get the results reported in the paper.

  • run_all_evaluation.sh: evaluate all methods on all tasks.

Acknowledgements

Part of this code is inspired by estimating-generalization and TrustScore.

Citation

Please cite our work if you use the codebase:

@article{chen2021detecting,
  title={Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles},
  author={Chen, Jiefeng and Liu, Frederick and Avci, Besim and Wu, Xi and Liang, Yingyu and Jha, Somesh},
  journal={arXiv preprint arXiv:2106.15728},
  year={2021}
}

License

Please refer to the LICENSE.

You might also like...
Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

Noise Contrastive Estimation for pyTorch Overview This repository contains a re-implementation of the Noise Contrastive Estimation algorithm, implemen

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

TorchDistiller - a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

Implementation for the paper 'YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs'

YOLO-ReT This is the original implementation of the paper: YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs. Prakhar Ganesh, Ya

MEDS: Enhancing Memory Error Detection for Large-Scale Applications

MEDS: Enhancing Memory Error Detection for Large-Scale Applications Prerequisites cmake and clang Build MEDS supporting compiler $ make Build Using Do

tsai is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series classification, regression and forecasting.
tsai is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series classification, regression and forecasting.

Time series Timeseries Deep Learning Pytorch fastai - State-of-the-art Deep Learning with Time Series and Sequences in Pytorch / fastai

Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).
Deep Text Search is an AI-powered multilingual text search and recommendation engine with state-of-the-art transformer-based multilingual text embedding (50+ languages).

Deep Text Search - AI Based Text Search & Recommendation System Deep Text Search is an AI-powered multilingual text search and recommendation engine w

A selection of State Of The Art research papers (and code) on human locomotion (pose + trajectory) prediction (forecasting)

A selection of State Of The Art research papers (and code) on human trajectory prediction (forecasting). Papers marked with [W] are workshop papers.

PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+

PaddlePaddle Vision Transformers State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 🤖 PaddlePaddle Visual Transformers (PaddleViT or

Owner
Jiefeng Chen
Phd student at UW-Madision, working on trustworthy machine learning.
Jiefeng Chen
OpenFace – a state-of-the art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

OpenFace 2.2.0: a facial behavior analysis toolkit Over the past few years, there has been an increased interest in automatic facial behavior analysis

Tadas Baltrusaitis 5.8k Dec 31, 2022
We propose a new method for effective shadow removal by regarding it as an exposure fusion problem.

Auto-exposure fusion for single-image shadow removal We propose a new method for effective shadow removal by regarding it as an exposure fusion proble

Qing Guo 146 Dec 31, 2022
Implementation of the state of the art beat-detection, downbeat-detection and tempo-estimation model

The ISMIR 2020 Beat Detection, Downbeat Detection and Tempo Estimation Model Implementation. This is an implementation in TensorFlow to implement the

Koen van den Brink 1 Nov 12, 2021
Principled Detection of Out-of-Distribution Examples in Neural Networks

ODIN: Out-of-Distribution Detector for Neural Networks This is a PyTorch implementation for detecting out-of-distribution examples in neural networks.

null 189 Nov 29, 2022
Code for the paper A Theoretical Analysis of the Repetition Problem in Text Generation

A Theoretical Analysis of the Repetition Problem in Text Generation This repository share the code for the paper "A Theoretical Analysis of the Repeti

Zihao Fu 37 Nov 21, 2022
LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models

LaneDet is an open source lane detection toolbox based on PyTorch that aims to pull together a wide variety of state-of-the-art lane detection models. Developers can reproduce these SOTA methods and build their own methods.

TuZheng 405 Jan 4, 2023
To propose and implement a multi-class classification approach to disaster assessment from the given data set of post-earthquake satellite imagery.

To propose and implement a multi-class classification approach to disaster assessment from the given data set of post-earthquake satellite imagery.

Kunal Wadhwa 2 Jan 5, 2022
Video Instance Segmentation with a Propose-Reduce Paradigm (ICCV 2021)

Propose-Reduce VIS This repo contains the official implementation for the paper: Video Instance Segmentation with a Propose-Reduce Paradigm Huaijia Li

DV Lab 39 Nov 23, 2022
Neural Reprojection Error: Merging Feature Learning and Camera Pose Estimation

Neural Reprojection Error: Merging Feature Learning and Camera Pose Estimation This is the official repository for our paper Neural Reprojection Error

Hugo Germain 78 Dec 1, 2022