DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning

This repository contains the code for DABS, a benchmark for domain-agnostic self-supervised learning algorithms. The benchmark's basic components are its datasets, encoders, and pretraining algorithms, described in the sections below. Training is implemented with the PyTorch Lightning framework, logging with Weights & Biases, and configuration management with Hydra.

Usage

We provide support for Python >= 3.7. Install requirements with

python -m pip install -r requirements.txt

For instructions on installing a PyTorch version compatible with your CUDA version, see pytorch.org.
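
To quickly confirm that the installed PyTorch build can see your GPU, you can run the following check (it relies only on the standard PyTorch API):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"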

Datasets

We provide a set of dataset implementations (in src/datasets) from the natural image, English text, multilingual text, speech, sensor, medical imaging, and image-text domains. Preprocessing on these datasets is minimal and hard-coded: simple resizing (e.g. of images) and truncation (e.g. of text and audio). These operations should not be changed, so that comparisons remain fair across users of the benchmark.

See conf/datasets/*.yaml for all dataset configs, including the loss, metrics, and batch size used for each dataset.

Almost all datasets will download automatically when the dataset class is instantiated. The exceptions are the CheXpert, ImageNet, and CU Birds datasets, where manual registration or download is required. See the respective dataset files for specific instructions.
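
For a quick look at a resolved configuration without launching training, you can use Hydra's compose API. The snippet below is an illustrative sketch rather than part of the benchmark: it assumes Hydra >= 1.1, is run from the repository root, and uses dataset=cifar10 as an example override (substitute any dataset name from the configs).

from hydra import compose, initialize
from omegaconf import OmegaConf

# Compose the pretraining config with a dataset override and print the
# resulting configuration (loss, metrics, batch size, etc.).
with initialize(config_path="conf"):
    cfg = compose(config_name="pretrain", overrides=["dataset=cifar10"])
print(OmegaConf.to_yaml(cfg))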

| Pretraining Dataset (unlabeled) | Transfer Dataset (labeled) |
| --- | --- |
| CIFAR10 | Aircraft, CIFAR10, CU Birds, DTD, Traffic Sign, VGG Flower |
| PAMAP2 | PAMAP2 |
| MSCOCO | MSCOCO (mismatched detection), VQA (binary classification) |
| WikiText-103 | GLUE (10 tasks) |
| mC4 | PAWS-X (7 tasks) |
| CheXpert | CheXpert (atelectasis, cardiomegaly, consolidation, edema, and pleural effusion), ChestX-ray8 (atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax) |
| LibriSpeech | Audio MNIST, Fluent Speech (Action, Object, Location), Google Speech Commands, LibriSpeech, VoxCeleb1 |

Pretraining

During the pretraining phase, self-supervised encoders are trained to learn good representations from unlabeled data. We currently support seven datasets for pretraining, one per domain: MS COCO, ImageNet, CheXpert, PAMAP2, mC4, WikiText-103, and LibriSpeech. If the pretraining dataset has associated labels, an online linear evaluator is trained jointly with the encoder to provide a heuristic estimate of transfer performance.
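
The online evaluator follows the usual linear-probing recipe; the sketch below illustrates the general idea (it is our own simplification, not the repository's implementation): a linear head is trained on detached encoder features, so its gradients never reach the encoder, and its accuracy serves as a running estimate of transfer quality.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OnlineLinearProbe(nn.Module):
    """Linear head trained alongside the encoder on detached features."""

    def __init__(self, feature_dim: int, num_classes: int):
        super().__init__()
        self.head = nn.Linear(feature_dim, num_classes)

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # detach() blocks gradients so the probe never updates the encoder
        logits = self.head(features.detach())
        return F.cross_entropy(logits, labels)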

Run pretraining with commands like

python pretrain.py exp.name=<experiment-name> dataset=<dataset> algorithm=<algorithm>

Each dataset and encoder has its own config file. For example, to train a Transformer on the CheXpert dataset with the e-mix algorithm, run

python pretrain.py exp.name=emix-chexpert encoder=transformer dataset=chexpert algorithm=emix

See conf/pretrain.yaml for all pretraining configuration fields.

For more information on the datasets, encoders, and algorithms, see the following sections.

| Pretraining Dataset | Modality | Label type (unused) | Input Type |
| --- | --- | --- | --- |
| CIFAR10 | Natural images | Single label | 2d |
| PAMAP2 | Sensor | Single label | 2d |
| MSCOCO | Captioned images | Single label | 2d + tokens |
| WikiText-103 | English Text | No label | tokens |
| mC4 | Multilingual Text | No label | tokens |
| CheXpert | Medical images | Multi label | 2d |
| LibriSpeech | Speech | No label | 2d |

Transfer Learning

After pretraining, a small linear classifier is trained on top of the frozen encoder. Run transfer learning from a randomly initialized encoder with

python transfer.py exp.name=<experiment-name> dataset=<dataset> ckpt=null 

To transfer from a pretrained model instead, replace null with the path to your pretrained encoder checkpoint. See conf/transfer.yaml for all transfer learning configuration fields.
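
For example, to transfer from a pretrained checkpoint (the experiment name and checkpoint path below are placeholders):

python transfer.py exp.name=chexpert-transfer dataset=chexpert ckpt=/path/to/pretrained/checkpoint.ckpt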

| Dataset | Modality | Label type | Evaluation metric | Input Type |
| --- | --- | --- | --- | --- |
| Aircraft | Natural images | Single label | Accuracy | 2d |
| CU Birds | Natural images | Single label | Accuracy | 2d |
| DTD | Natural images | Single label | Accuracy | 2d |
| Traffic Sign | Natural images | Single label | Accuracy | 2d |
| VGG Flower | Natural images | Single label | Accuracy | 2d |
| PAMAP2 | Sensor | Single label | Accuracy | 2d |
| MS COCO | Captioned images | Binary label | Accuracy | 2d + tokens |
| VQA | Captioned images | Binary label | Accuracy | 2d + tokens |
| CheXpert | Medical images | Multi label | AUROC | 2d |
| ChestX-ray8 | Medical images | Multi label | AUROC | 2d |
| PAWS-X | Multilingual Text | Binary label | Accuracy | tokens |
| COLA | English Text | Binary label | Pearson correlation | tokens |
| MNLI Matched | English Text | Single label | Accuracy | tokens |
| MNLI Mismatched | English Text | Single label | Accuracy | tokens |
| MRPC | English Text | Binary label | Accuracy | tokens |
| QNLI | English Text | Binary label | Accuracy | tokens |
| QQP | English Text | Binary label | Accuracy | tokens |
| RTE | English Text | Binary label | Accuracy | tokens |
| SST2 | English Text | Binary label | Accuracy | tokens |
| STSB | English Text | Regression | Spearman correlation | tokens |
| WNLI | English Text | Binary label | Accuracy | tokens |
| Audio MNIST | Speech | Single label | Accuracy | 2d |
| Fluent Speech | Speech | Single label | Accuracy | 2d |
| Google Speech Commands | Speech | Single label | Accuracy | 2d |
| LibriSpeech | Speech | Single label | Accuracy | 2d |
| VoxCeleb1 | Speech | Single label | Accuracy | 2d |

Encoders

A domain-agnostic SSL method should use an encoder that remains as constant as possible across domains. We provide a general Transformer encoder baseline (in src/encoders). The Transformer operates on a sequence of vectors produced by a small set of domain-specific embedding modules (e.g. patch or token embeddings).
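
As a rough illustration of this design, the sketch below (a conceptual example, not the encoder in src/encoders; the class names, dimensions, and patch-embedding choice are our own) shows a small domain-specific embedding module mapping raw inputs to a sequence of vectors that a shared Transformer then processes identically regardless of domain.

import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Illustrative 2d embedding: split an image into patches and project them."""

    def __init__(self, patch_size: int = 16, in_channels: int = 3, dim: int = 256):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # (batch, channels, height, width) -> (batch, num_patches, dim)
        return self.proj(images).flatten(2).transpose(1, 2)

class DomainAgnosticEncoder(nn.Module):
    """Shared Transformer that consumes any sequence of embedding vectors."""

    def __init__(self, dim: int = 256, depth: int = 4, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        return self.transformer(embeddings)  # (batch, seq_len, dim)

# The same encoder accepts patch embeddings, token embeddings, etc.
encoder = DomainAgnosticEncoder()
patches = PatchEmbedding()(torch.randn(2, 3, 224, 224))
features = encoder(patches)  # (2, 196, 256)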

Pretraining Algorithms

The pretraining algorithm is the framework and objective with which the encoder is trained. Examples of domain-specific algorithms include SimCLR, BYOL, and MoCo, but these are not domain-agnostic because they rely on vision-specific augmentations. We provide our own domain-agnostic implementations of recent algorithms, including e-mix (a generalization of i-mix) and Shuffled Embedding Detection (ShED; a generalization of ELECTRA), which randomly permutes a subset of the input embeddings and trains the model to identify which embeddings were permuted.
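
To make the ShED objective concrete, the following sketch illustrates the general idea (our own simplification, not the repository's implementation; the function and argument names are ours): a random subset of embedding positions is permuted among themselves, and a per-position binary classifier on the encoder output learns to detect which positions were shuffled.

import torch
import torch.nn.functional as F

def shed_style_loss(encoder, classifier, embeddings, shuffle_prob=0.15):
    """Shuffle a random subset of embedding positions and train a detector.

    encoder:    maps (batch, seq_len, dim) -> (batch, seq_len, dim)
    classifier: maps (batch, seq_len, dim) -> (batch, seq_len, 1), e.g. nn.Linear(dim, 1)
    """
    batch, seq_len, _ = embeddings.shape
    mask = torch.rand(batch, seq_len, device=embeddings.device) < shuffle_prob

    # Permute the selected embeddings among themselves.
    corrupted = embeddings.clone()
    idx = mask.nonzero(as_tuple=False)                        # (K, 2) selected positions
    perm = torch.randperm(idx.size(0), device=embeddings.device)
    corrupted[idx[:, 0], idx[:, 1]] = embeddings[idx[perm, 0], idx[perm, 1]]

    # Per-position binary classification: was this position shuffled?
    logits = classifier(encoder(corrupted)).squeeze(-1)       # (batch, seq_len)
    return F.binary_cross_entropy_with_logits(logits, mask.float())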

Results

Below are results for algorithms trained on each dataset in DABS. The baseline performance is obtained via a randomly initialized encoder.

| Pretrain Dataset | Transfer Dataset | Encoder | Baseline Performance | e-mix Performance | ShED Performance |
| --- | --- | --- | --- | --- | --- |
| ImageNet | CIFAR10 | Transformer | 24.20% | 39.43% | 39.63% |
| ImageNet | CU Birds | Transformer | 1.62% | 3.86% | 2.95% |
| ImageNet | VGG Flowers | Transformer | 9.03% | 25.96% | 13.03% |
| ImageNet | DTD | Transformer | 7.39% | 8.83% | 18.35% |
| ImageNet | Traffic Sign | Transformer | 14.33% | 65.07% | 27.51% |
| ImageNet | Aircraft | Transformer | 2.70% | 10.15% | 5.60% |
| PAMAP2 | PAMAP2 | Transformer | 69.81% | 79.48% | 88.69% |
| MSCOCO | VQA | Transformer | 57.50% | 48.90% | 54.30% |
| CheXpert | CheXpert | Transformer | 68.14% | 72.40% | 72.40% |
| CheXpert | ChestX-ray8 | Transformer | 57.00% | 63.00% | 63.70% |
| WikiText-103 | GLUE (average) | Transformer | 42.29% | 44.08% | 48.37% |
| mC4 | PAWS-X (average) | Transformer | 58.11% | 56.16% | 59.91% |
| LibriSpeech | Audio MNIST | Transformer | 33.13% | 80.35% | 67.33% |
| LibriSpeech | Fluent Locations | Transformer | 62.09% | 60.93% | 60.24% |
| LibriSpeech | Fluent Actions | Transformer | 26.15% | 29.87% | 30.53% |
| LibriSpeech | Fluent Objects | Transformer | 30.13% | 39.89% | 39.36% |
| LibriSpeech | Google Speech Commands | Transformer | 4.87% | 19.22% | 20.73% |
| LibriSpeech | LibriSpeech | Transformer | 17.12% | 60.18% | 34.77% |
| LibriSpeech | VoxCeleb1 | Transformer | 0.59% | 2.43% | 2.81% |