Revisiting Weakly Supervised Pre-Training of Visual Perception Models

Overview

SWAG: Supervised Weakly from hashtAGs

This repository contains SWAG models from the paper Revisiting Weakly Supervised Pre-Training of Visual Perception Models.

Requirements

This code has been tested to work with Python 3.8, PyTorch 1.10.1 and torchvision 0.11.2.

Note that CUDA support is not required for the tutorials.

To set up PyTorch and torchvision, please follow PyTorch's getting started instructions. If you are using conda on a Linux machine, you can use the following setup instructions -

conda create --name swag python=3.8
conda activate swag
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
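
To quickly confirm that the installed versions match the tested configuration, you can run the following one-liner (a sanity check we suggest here, not part of the official instructions):

python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"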

Model Zoo

We share checkpoints for all the pretrained models in the paper, and their ImageNet-1k finetuned counterparts. The models are available via torch.hub, and we also share URLs to all the checkpoints.

The details of the models, their torch.hub names / checkpoint links, and their performance on ImageNet-1k (IN-1K) are listed below.

| Model | Pretrain Resolution | Pretrained Model | Finetune Resolution | IN-1K Finetuned Model | IN-1K Top-1 | IN-1K Top-5 |
| --- | --- | --- | --- | --- | --- | --- |
| RegNetY 16GF | 224 x 224 | regnety_16gf | 384 x 384 | regnety_16gf_in1k | 86.02% | 98.05% |
| RegNetY 32GF | 224 x 224 | regnety_32gf | 384 x 384 | regnety_32gf_in1k | 86.83% | 98.36% |
| RegNetY 128GF | 224 x 224 | regnety_128gf | 384 x 384 | regnety_128gf_in1k | 88.23% | 98.69% |
| ViT B/16 | 224 x 224 | vit_b16 | 384 x 384 | vit_b16_in1k | 85.29% | 97.65% |
| ViT L/16 | 224 x 224 | vit_l16 | 512 x 512 | vit_l16_in1k | 88.07% | 98.51% |
| ViT H/14 | 224 x 224 | vit_h14 | 518 x 518 | vit_h14_in1k | 88.55% | 98.69% |

The models can be loaded via torch.hub using the following command -

import torch

model = torch.hub.load("facebookresearch/swag", model="vit_b16_in1k")
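
The same call works for any torch.hub name in the table above. As a minimal sketch, here is how you might load a weakly supervised pretrained trunk alongside its ImageNet-1k finetuned counterpart (remember to switch to eval mode before running inference):

import torch

# Weakly supervised pretrained model (the trunk used in the paper).
pretrained = torch.hub.load("facebookresearch/swag", model="regnety_16gf")

# The corresponding ImageNet-1k finetuned classifier.
finetuned = torch.hub.load("facebookresearch/swag", model="regnety_16gf_in1k")
finetuned.eval()  # disable dropout / batchnorm updates for inference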

Inference Tutorial

For a tutorial with step-by-step instructions to perform inference, follow our inference tutorial and run it locally or on Google Colab.
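
If you want a quick end-to-end sketch before opening the tutorial, the following illustrates the typical flow. The resize/crop/normalization values are our assumption of standard ImageNet preprocessing, and "example.jpg" is a placeholder path; defer to the tutorial for the exact transforms:

import torch
from torchvision import transforms
from PIL import Image

# Assumed ImageNet-style preprocessing at the 384 x 384 finetune resolution of vit_b16_in1k.
preprocess = transforms.Compose([
    transforms.Resize(384, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(384),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

model = torch.hub.load("facebookresearch/swag", model="vit_b16_in1k")
model.eval()

# "example.jpg" is a placeholder image path.
image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    logits = model(image)
print(logits.topk(5).indices)  # indices of the top-5 predicted ImageNet-1k classes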

Live Demo

SWAG has been integrated into Hugging Face Spaces 🤗 using Gradio. Try out the web demo on Hugging Face Spaces.

Credits: AK391

ImageNet 1K Evaluation

We also provide a script to evaluate the accuracy of our models on ImageNet 1K, imagenet_1k_eval.py. This script is a slightly modified version of the PyTorch ImageNet example which supports our models.

To evaluate the RegNetY 16GF IN-1K model on a single node (one or more GPUs), one can simply run the following command -

python imagenet_1k_eval.py -m regnety_16gf_in1k -r 384 -b 400 /path/to/imagenet_1k/root/

Note that we specify a 384 x 384 resolution since that was the model's fine-tuning resolution, and also specify a mini-batch size of 400, which is distributed over all the GPUs in the node. For larger models, or with fewer GPUs, the batch size will need to be reduced. See the PyTorch ImageNet example README for more details.
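
For reference, the core of what the script measures can be sketched in a few lines. This is a simplified single-process version with assumed paths and transforms, not the script itself:

import torch
from torchvision import datasets, transforms

model = torch.hub.load("facebookresearch/swag", model="regnety_16gf_in1k").eval()

# Assumed ImageNet-style validation transforms at the 384 x 384 finetune resolution.
preprocess = transforms.Compose([
    transforms.Resize(384, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(384),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# "/path/to/imagenet_1k/root/val" is a placeholder for the validation split.
val = datasets.ImageFolder("/path/to/imagenet_1k/root/val", transform=preprocess)
loader = torch.utils.data.DataLoader(val, batch_size=64, num_workers=8)

top1 = top5 = total = 0
with torch.no_grad():
    for images, targets in loader:
        preds = model(images).topk(5, dim=1).indices  # top-5 class predictions per image
        top1 += (preds[:, 0] == targets).sum().item()
        top5 += (preds == targets.unsqueeze(1)).any(dim=1).sum().item()
        total += targets.numel()
print(f"top-1: {top1 / total:.2%}  top-5: {top5 / total:.2%}")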

Citation

If you use the SWAG models or find this work useful in your research, please give us a star and cite:

@article{singh2022revisiting,
      title={Revisiting Weakly Supervised Pre-Training of Visual Perception Models}, 
      author={Singh, Mannat and Gustafson, Laura and Adcock, Aaron and Reis, Vinicius de Freitas and Gedik, Bugra and Kosaraju, Raj Prateek and Mahajan, Dhruv and Girshick, Ross and Doll{\'a}r, Piotr and van der Maaten, Laurens},
      journal={arXiv preprint arXiv:2201.08371},
      year={2022}
}

License

SWAG models are released under the CC-BY-NC 4.0 license. See LICENSE for additional details.

Comments
  • Details around Platt scaling and open-sourcing zero-shot transfer

    Thank you for open-sourcing this work.

    From the paper:

    [screenshot of the paper's description of zero-shot transfer]

    • What does $C$ contain? The classes on which SWAG pre-training was performed?
    • How is $p$ obtained?

    Is it possible to also accompany this repository with a notebook that shows how to adapt the pre-trained models for zero-shot transfer? I don't think there is a good example available on the internet that shows how to extend an image recognition model like SWAG for zero-shot classification.

    @mannatsingh

    opened by sayakpaul 4
  • Add Docker environment & web demo

    This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to try it out. We're using an open-source tool called Cog to make this process easier.

    This also means we can make a web page where other people can try out your model! View it here: https://replicate.com/facebookresearch/swag. The demo supports selecting different models for inference, and you can find the Dockerfile under the 'Run model with Docker' tab.

    We have added some examples to the page, but do claim it so you can own it, customise the example gallery as you like, and push any future updates to the web demo; we'll also feature it on our website and tweet about it.

    In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible. We got frustrated that we couldn't run all the really interesting ML work being done. So, we're going round implementing models we like. 😊

    opened by chenxwh 4
  • Unable to reproduce fine-tuning results

    Hello,

    I've tried reproducing the reported fine-tuning results for RegNetY-16GF but obtained only 50% top-1 accuracy on ImageNet-1K at the end of training. I used the following hyper-parameters, taken from the paper:

    # lr: 6e-3
    # lr schedule: cosine
    # batch size: 512
    # weight-decay: 0
    # resolution: 384x384
    # training steps: 20000 ~= 8 epochs
    # mixup: 0.1
    # sync batchnorm
    # EMA: decay 1e-4
    

    I have some doubts regarding the number of training steps, which seems too low. 20K training steps is equivalent to 8 epochs:

    num_epochs = batch_size x num_steps / num_images = 512 x 20000 / 1281167 ~= 8

    Could you please confirm if this was indeed your setting?

    Thank you in advance!

    opened by netw0rkf10w 2
  • Question regarding image dimension at fine-tuning

    Hello,

    This is great work, congratulations! And thank you for having released the models.

    I would like to ask some questions please:

    1. Is there a particular reason for choosing 518x518 as the crop size for fine-tuning ViT H/14? This value doesn't seem very standard (I would have expected something like 500x500 or 512x512). Could you tell me if you did a hyper-parameter search for this value?
    2. How much do you expect the accuracy to drop if we use 500x500 instead of 518x518?

    Thank you very much in advance for your response.

    opened by netw0rkf10w 2
  • Will the set of canonical hashtags be released?

    The paper uses 28K canonical hashtags (each corresponding to a set of WordNet synsets). I was wondering whether they could be released to foster future research in this area. Thanks!

    opened by linzhiqiu 1
  • Fine-tuning recipes

    I would like to know if you could share your Classy Vision fine-tuning recipes (i.e., the *.json configs) for the trained weights listed in the README.

    Thank you in advance. Best regards.

    opened by netw0rkf10w 1