PyTorch implementation of "A Simple Baseline for Low-Budget Active Learning".

Overview

A Simple Baseline for Low-Budget Active Learning

This repository is the implementation of A Simple Baseline for Low-Budget Active Learning. In this paper, we are interested in low-budget active learning where only a small subset of unlabeled data, e.g. 0.2% of ImageNet, can be annotated. We show that although the state-of-the-art active learning methods work well given a large budget of data labeling, a simple k-means clustering algorithm can outperform them on low budgets. Our code is modified from CompRess [1].

@article{pourahmadi2021simple,
  title={A Simple Baseline for Low-Budget Active Learning},
  author={Pourahmadi, Kossar and Nooralinejad, Parsa and Pirsiavash, Hamed},
  journal={arXiv preprint arXiv:2110.12033},
  year={2021}
}

Benchmarks

We implemented the following query strategies in strategies.py on CIFAR-10, CIFAR-100, ImageNet, and ImageNet-LT datasets:

a) Single-batch k-means: At each round, it clusters the whole dataset into as many clusters as the budget size and sends the nearest neighbor of each cluster center directly to the oracle to be annotated.

b) Multi-batch k-means: Uses the difference between two consecutive budget sizes as the number of clusters and picks the examples nearest to the centers that have not previously been labeled by the oracle.

c) Core-set [2]

d) Max-Entropy [3]: Treats the entropy of the predicted class probability distribution of each example as an uncertainty score and samples the most uncertain points for annotation.

e) Uniform: Selects an equal number of samples randomly from each class.

f) Random: Samples are selected randomly (uniformly) from the entire dataset.
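
The two k-means strategies, (a) and (b), can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the code in strategies.py: the helper names (sq_dist, kmeans, select_nearest_to_centers) and the toy 2-D features are hypothetical, and the actual implementation operates on backbone features and may handle details such as duplicate picks differently.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centers as lists of floats."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assign every point to its closest center.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[j].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        for j, g in enumerate(groups):
            if g:
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def select_nearest_to_centers(features, n_clusters, labeled=(), seed=0):
    """For each center, pick the nearest example that is not already taken."""
    centers = kmeans(features, n_clusters, seed=seed)
    taken = set(labeled)
    picked = []
    for c in centers:
        candidates = [i for i in range(len(features)) if i not in taken]
        best = min(candidates, key=lambda i: sq_dist(features[i], c))
        picked.append(best)
        taken.add(best)
    return picked

# Toy 2-D "features": three well-separated blobs of four points each.
features = [[x + dx, y + dy]
            for x, y in [(0, 0), (10, 0), (0, 10)]
            for dx, dy in [(0, 0), (0.2, 0), (0, 0.2), (0.2, 0.2)]]

# Single-batch: cluster into `budget` clusters and label the nearest points.
round1 = select_nearest_to_centers(features, n_clusters=3)

# Multi-batch: the next budget is 6, so cluster into 6 - 3 = 3 clusters
# and skip the examples that were labeled in the previous round.
round2 = select_nearest_to_centers(features, n_clusters=6 - 3, labeled=round1)
```

In this sketch, round1 and round2 together form the labeled pool for a budget of 6, and no example is sent to the oracle twice.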

Requirements

Usage

This implementation supports multi-GPU training with DataParallel as well as single-GPU training.

You have the following options to run commands:

  • --arch We use pre-trained ResNet-18 with CompRess (download weights) or pre-trained ResNet-50 with MoCo-v2 (download weights). Use one of resnet18 or resnet50 as the argument accordingly.
  • --backbone Use one of compress or moco.
  • --splits Define budget sizes with a comma as the separator. For instance, --splits 10,20.
  • --name Specify the query strategy name by using one of uniform, random, kmeans, accu_kmeans, coreset.
  • --dataset Indicate the unlabeled dataset name by using one of cifar10 cifar100 imagenet imagenet_lt.
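
As an illustration of the --splits format, the sketch below parses a comma-separated list of budgets and derives how many new labels each round requests. It is an assumption about how such an argument could be handled, not the repository's actual parser; parse_splits and increments are hypothetical names.

```python
import argparse

def parse_splits(text):
    """Turn a string such as '10,20' into [10, 20]."""
    return [int(s) for s in text.split(",") if s.strip()]

parser = argparse.ArgumentParser()
parser.add_argument("--splits", type=parse_splits, default=[])
args = parser.parse_args(["--splits", "10,20"])

budgets = args.splits
# New labels requested per round: the difference of consecutive budgets.
increments = [b - a for a, b in zip([0] + budgets, budgets)]
```

With --splits 10,20, the first round requests 10 labels and the second round requests 10 more, which is the cluster count a multi-batch strategy would use in that round.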

Sample selection

If the strategy needs an initial pool (accu_kmeans or coreset), pass the path to the initial pool indices file with --resume-indices.

python sampler.py \
--arch resnet18 \
--weights [path to weights] \
--backbone compress \
--batch-size 4 \
--workers 4 \
--splits 100 \
--load_cache \
--name kmeans \
--dataset cifar10 \
[path to dataset file]

Linear classification

python eval_lincls.py \
--arch resnet18 \
--weights [path to weights] \
--backbone compress \
--batch-size 128 \
--workers 4 \
--lr 0.01 \
--lr_schedule 50,75 \
--epochs 100 \
--splits 1000 \
--load_cache \
--name random \
--dataset imagenet \
[path to dataset file]

Nearest neighbor classification

python eval_knn.py \
--arch resnet18 \
--weights [path to weights] \
--backbone compress \
--batch-size 128 \
--workers 8 \
--splits 1000 \
--load_cache \
--name random \
--dataset cifar10 \
[path to dataset file]

Entropy sampling

To sample data using Max-Entropy, run active_sampler.py with --name entropy. Pass the path to the initial pool indices file with --resume-indices.

python active_sampler.py \
--arch resnet18 \
--weights [path to weights] \
--backbone compress \
--batch-size 128 \
--workers 4 \
--lr 0.001 \
--lr_schedule 50,75 \
--epochs 100 \
--splits 2000 \
--load_cache \
--name entropy \
--resume-indices [path to random initial pool file] \
--dataset imagenet \
[path to dataset file]
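
The Max-Entropy selection rule itself can be sketched in a few lines. This is a minimal illustration, not the code in active_sampler.py: the function names and the toy probabilities are assumptions, and in practice the probabilities would come from a classifier trained on the current labeled pool.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def max_entropy_query(probabilities, labeled, n_query):
    """Return the indices of the n_query most uncertain unlabeled examples."""
    labeled = set(labeled)
    unlabeled = [i for i in range(len(probabilities)) if i not in labeled]
    unlabeled.sort(key=lambda i: entropy(probabilities[i]), reverse=True)
    return unlabeled[:n_query]

probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # almost uniform, highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],  # already in the labeled pool, so it is skipped
]
picked = max_entropy_query(probs, labeled=[3], n_query=2)
print(picked)  # [1, 2]
```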

Fine-tuning

This script is implemented only for the CompRess ResNet-18 backbone on ImageNet. --lr is the learning rate of the backbone and --lr-lin is the learning rate of the linear classifier.

python finetune.py \
--arch resnet18 \
--weights [path to weights] \
--batch-size 128 \
--workers 16 \
--epochs 100 \
--lr_schedule 50,75 \
--lr 0.0001 \
--lr-lin 0.01 \
--splits 1000 \
--name kmeans \
--dataset imagenet \
[path to dataset file]

Training from scratch

Starting from a randomly initialized network, you can train the model on CIFAR-100 or ImageNet.

python trainer_DP.py \
--arch resnet18 \
--batch-size 128 \
--workers 4 \
--epochs 100 \
--lr 0.1 \
--lr_schedule 30,60,90 \
--splits 1000 \
--name kmeans \
--dataset imagenet \
[path to dataset file]

References

[1] CompRess: Self-Supervised Learning by Compressing Representations, NeurIPS, 2020

[2] Active Learning for Convolutional Neural Networks: A Core-Set Approach, ICLR, 2018

[3] A new active labeling method for deep learning, IJCNN, 2014
