# Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles

This repository contains the code for the paper "Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles".
## Experimental Results
## Preliminaries

The code is tested under Ubuntu Linux 16.04.1 with Python 3.6, and requires some packages to be installed.
## Downloading Datasets

- MNIST-M: download it from the Google drive. Extract the files and place them in `./dataset/mnist_m/`.
- SVHN: download the Format 2 data (`*.mat`). Place the files in `./dataset/svhn/`.
- USPS: download the `usps.h5` file. Place the file in `./dataset/usps/`.
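Before training, it can help to confirm the layout above is in place. The following pre-flight check is a hypothetical convenience, not part of the repository; it assumes you run it from the repository root:

```shell
# Hypothetical pre-flight check (not part of the repo): verify that the
# dataset files described above exist before launching any training run.
for p in dataset/mnist_m dataset/svhn dataset/usps/usps.h5; do
  [ -e "$p" ] && echo "found:   $p" || echo "missing: $p"
done
```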
## Overview of the Code

- `train_model.py`: train standard models via supervised learning.
- `train_dann.py`: train domain adaptive (DANN) models.
- `eval_pipeline.py`: evaluate various methods on all tasks.
## Running Experiments

### Examples

- To train a standard model via supervised learning, you can use the following command:

  ```
  python train_model.py --source-dataset {source dataset} --model-type {model type} --base-dir {directory to save the model}
  ```

  `{source dataset}` can be `mnist`, `mnist-m`, `svhn`, or `usps`. `{model type}` can be `typical_dnn` or `dann_arch`.
- To train a domain adaptive (DANN) model, you can use the following command:

  ```
  python train_dann.py --source-dataset {source dataset} --target-dataset {target dataset} --base-dir {directory to save the model} [--test-time]
  ```

  `{source dataset}` (or `{target dataset}`) can be `mnist`, `mnist-m`, `svhn`, or `usps`. The optional `--test-time` flag indicates whether to replace the target training dataset with the target test dataset.
- To evaluate a method on all training-test dataset pairs, you can use the following command:

  ```
  python eval_pipeline.py --model-type {model type} --method {method}
  ```

  `{model type}` can be `typical_dnn` or `dann_arch`. `{method}` can be `conf_avg`, `ensemble_conf_avg`, `conf`, `trust_score`, `proxy_risk`, `our_ri`, or `our_rm`.
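As a concrete illustration of the placeholders above, one possible sequence of invocations is sketched below. The `--base-dir` checkpoint paths are arbitrary examples, not paths prescribed by the repository:

```shell
# Example instantiations of the three commands above; the checkpoint
# directories are assumptions chosen for illustration.
python train_model.py --source-dataset mnist --model-type typical_dnn --base-dir ./checkpoints/mnist_dnn
python train_dann.py --source-dataset mnist --target-dataset usps --base-dir ./checkpoints/mnist_usps_dann
python eval_pipeline.py --model-type typical_dnn --method our_rm
```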
### Train All Models

You can run the following scripts to pre-train all models needed for the experiments.

- `run_all_model_training.sh`: train all supervised learning models.
- `run_all_dann_training.sh`: train all DANN models.
- `run_all_ensemble_training.sh`: train all ensemble models.
### Evaluate All Methods

You can run the following script to reproduce the results reported in the paper.

- `run_all_evaluation.sh`: evaluate all methods on all tasks.
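Taken together, the scripts above suggest the following end-to-end reproduction order. This assumes the scripts sit in the repository root and are run with `bash`:

```shell
# Assumed end-to-end order: pre-train all models first, then evaluate.
bash run_all_model_training.sh      # supervised learning models
bash run_all_dann_training.sh       # DANN models
bash run_all_ensemble_training.sh   # ensemble models
bash run_all_evaluation.sh          # all methods on all tasks
```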
## Acknowledgements

Part of this code is inspired by estimating-generalization and TrustScore.
## Citation

Please cite our work if you use this codebase:

```
@article{chen2021detecting,
  title={Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles},
  author={Chen, Jiefeng and Liu, Frederick and Avci, Besim and Wu, Xi and Liang, Yingyu and Jha, Somesh},
  journal={arXiv preprint arXiv:2106.15728},
  year={2021}
}
```
## License

Please refer to the LICENSE file.