Official PyTorch Implementation of the paper "ImageNet-21K Pretraining for the Masses" (2021)

Overview

ImageNet-21K Pretraining for the Masses


Paper | Pretrained models

Official PyTorch Implementation

Tal Ridnik, Emanuel Ben-Baruch, Asaf Noy, Lihi Zelnik-Manor
DAMO Academy, Alibaba Group

Abstract

ImageNet-1K serves as the primary dataset for pretraining deep learning models for computer vision tasks. ImageNet-21K dataset, which contains more pictures and classes, is used less frequently for pretraining, mainly due to its complexity, and underestimation of its added value compared to standard ImageNet-1K pretraining. This paper aims to close this gap, and make high-quality efficient pretraining on ImageNet-21K available for everyone. Via a dedicated preprocessing stage, utilizing WordNet hierarchies, and a novel training scheme called semantic softmax, we show that different models, including small mobile-oriented models, significantly benefit from ImageNet-21K pretraining on numerous datasets and tasks. We also show that we outperform previous ImageNet-21K pretraining schemes for prominent new models like ViT. Our proposed pretraining pipeline is efficient, accessible, and leads to SoTA reproducible results, from a publicly available dataset.

Getting Started

Note: this repo is under construction; more content will be added.

(1) Pretrained Models on ImageNet-21K-P Dataset

| Backbone | ImageNet-21K-P semantic top-1 accuracy [%] | ImageNet-1K top-1 accuracy [%] | Maximal batch size | Maximal training speed (img/sec) | Maximal inference speed (img/sec) |
|---|---|---|---|---|---|
| MobilenetV3_large_100 | 73.1 | 78.0 | 488 | 1210 | 5980 |
| Ofa_flops_595m_s | 75.0 | 81.0 | 288 | 500 | 3240 |
| ResNet50 | 75.6 | 82.0 | 320 | 720 | 2760 |
| TResNet-M | 76.4 | 83.1 | 520 | 670 | 2970 |
| TResNet-L (V2) | 76.7 | 83.9 | 240 | 300 | 1460 |
| ViT_base_patch16_224 | 77.6 | 84.4 | 160 | 340 | 1140 |

See this link for more details.
We highly recommend starting your work with ImageNet-21K by testing these weights against standard ImageNet-1K pretraining and comparing the results on your relevant downstream tasks. Once you see a significant improvement (you will), proceed to pretraining new models.

(2) Obtaining and Processing the Dataset

See the instructions for obtaining and processing the dataset here.

(3) Training Code

To use the training code, first download the ImageNet-21K-P semantic tree to your local ./resources/ folder.

Example of semantic softmax training:

python train_semantic_softmax.py \
--batch_size=4 \
--data_path=/mnt/datasets/21k \
--model_name=mobilenetv3_large_100 \
--model_path=/mnt/models/mobilenetv3_large_100.pth \
--epochs=80

To shorten training, we initialize the weights from standard ImageNet-1K pretraining. We recommend using the ImageNet-1K weights from this excellent repo.
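To give a feel for what a semantic softmax loss computes, here is a minimal pure-Python sketch. It is not the code in train_semantic_softmax.py. It assumes that the classes are grouped by WordNet hierarchy level, that each level gets its own softmax, and that the per-level cross-entropy terms are simply averaged. The names `semantic_softmax_loss`, `level_slices`, and `level_targets`, the contiguous index layout, and the plain average are all illustrative assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def semantic_softmax_loss(logits, level_slices, level_targets):
    """Average cross-entropy over the hierarchy levels that carry a label.

    logits        : flat list of scores, one per class
    level_slices  : (start, end) index range per hierarchy level
                    (hypothetical layout: each level's classes are contiguous)
    level_targets : per level, the target index inside that level, or None
                    when the sample has no label at that level
    """
    losses = []
    for (start, end), target in zip(level_slices, level_targets):
        if target is None:
            continue  # this level does not contribute for this sample
        log_probs = log_softmax(logits[start:end])  # softmax restricted to one level
        losses.append(-log_probs[target])
    if not losses:
        raise ValueError("sample has no label at any hierarchy level")
    # Assumption: unweighted mean over the active levels.
    return sum(losses) / len(losses)


# Toy example: 2 classes at level 0, 4 classes at level 1.
toy_logits = [2.0, 0.5, 1.0, 0.1, 0.2, 3.0]
toy_slices = [(0, 2), (2, 6)]
print(semantic_softmax_loss(toy_logits, toy_slices, [0, 3]))
```

Because each level is normalized separately, a class only competes with other classes at the same hierarchy level, which is the intuition behind using the WordNet hierarchy during training.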

To be added soon

  • KD training code
  • Inference code
  • Model weights after transferred to ImageNet-1K
  • More...

Citation

@misc{ridnik2021imagenet21k,
      title={ImageNet-21K Pretraining for the Masses}, 
      author={Tal Ridnik and Emanuel Ben-Baruch and Asaf Noy and Lihi Zelnik-Manor},
      year={2021},
      eprint={2104.10972},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Comments
  • imagenet1k pretrain model

    https://github.com/Alibaba-MIIL/ImageNet21K/blob/00ef9989825bbcb8dedc91eac18638c129eb5ad8/src_files/models/utils/factory.py#L54

    Why do we need to load pretrained ImageNet-1K model if we are going to pretrain based on the imagenet21k dataset?

    opened by cissoidx 24
  • General fail in preprocessing

    I ran the full preprocessing with your script on the winter21_whole.tar.gz dataset. Is it normal to get tons of "general fail" messages? Will these failed images affect the final training results?

    opened by cissoidx 13
  • Val Result

    Dear author: when I test the ResNet50 model you provide on the ImageNet-21K-P validation set, the semantic top-1 accuracy is just 69%, whereas you report 75.6%. I do see the improvement on downstream tasks, though. What is the problem?

    opened by wtt0213 10
  • can't download metadata

    I tried to run visualize_detector.py, but it fails with an HTTP error at:

    urllib.request.urlretrieve(url, filename)

    Exception has occurred: HTTPError HTTP Error 403: Forbidden

    How can this be solved?

    opened by ghkhk 9
  • cant run visualize_detector.py

    https://github.com/Alibaba-MIIL/ImageNet21K/blob/72c822a7a30290f078a2611139a1f0bcdb668606/visualize_detector.py#L25

    error:

    Unknown model (vit_base_patch16_224_miil_in21k)

    How can I run it with my own weights?

    opened by jaffe-fly 7
  • How to use DDP to start train_single_label.py

    I have 1 machine with 2 GPUs. I can't launch from the command line; how do I start DDP from inside the code?

    Also, I can't find where the model is saved. How do I save it under DDP?

    opened by jaffe-fly 7
  • The strategy of transferring ImageNet-21k ViT model to cifar100

    Hi @mrT23, thanks for your great work! Currently I use timm train.py to finetune the 'vit_base_patch16_224_miil_in21k' model on cifar100, however I can't get the reported result 94.2%. Here is my running script.

    python -m torch.distributed.launch --nproc_per_node=8 --master_port 6016 train.py \
    /data/cifar-100-images/ \
    -b=64 \
    --img-size=224 \
    --epochs=50 \
    --color-jitter=0 \
    --amp \
    --lr=2e-4 \
    --sched='cosine' \
    --model-ema --model-ema-decay=0.995 --reprob=0.5 --smoothing=0.1 \
    --min-lr=1e-8 --warmup-epochs=3 --train-interpolation=bilinear --aa=v0 \
    --model=vit_base_patch16_224_miil_in21k \
    --pretrained \
    --num-classes=100 \
    --opt=adamw --weight-decay=1e-4 \
    --checkpoint-hist=1
    

    I tried several settings:

      • adam, lr=2e-4, wd=1e-4: 92.44%
      • adamw, lr=2e-4, wd=1e-4: 92.90%
      • sgd, lr=2e-4, wd=1e-4: 87.49%
      • adamw, lr=4e-4, wd=1e-4: 92.24%
      • adamw, lr=2e-4, wd=1e-2: 93.08%

    Could you give me some suggestions?

    opened by Dyfine 5
  • pretrain head of vit

    In the vit paper, it says:

    The classification head is implemented by a MLP with one hidden layer at pre-training time and by a single linear layer at fine-tuning time

    So if you are using timm package, they define the head like this: https://github.com/rwightman/pytorch-image-models/blob/d3f744065088ca9b6b3a0f968c70e90ed37de75b/timm/models/vision_transformer.py#L293

    Did you reach the stats in your paper using single layer head or head of one hidden layer?

    opened by cissoidx 5
  • The strategy of transferring ImageNet-21k models to ImageNet-1K.

    Hi, I try to transfer the ImageNet-21k model (resnet50_miil_21k) to ImageNet-1K according to the details provided in your paper and https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/Transfer_learning.md. But I only get 79.5% top-1 acc on ImageNet-1K, lower than the 82.0% in your readme. Could you give me some suggestions?

    My training command is as follows:

    python3 -m torch.distributed.launch --nproc_per_node=8 train.py \
    data/ImageNet-1k/ \
    -b=128 \
    --img-size=224 \
    --epochs=100 \
    --color-jitter=0 \
    --amp \
    --sched='cosine' \
    --model-ema --model-ema-decay=0.995 --reprob=0.5 --smoothing=0.1 \
    --min-lr=1e-8 --warmup-epochs=3 --train-interpolation=bilinear --aa=v0 \
    --pretrained \
    --lr=2e-4 \
    --model=resnet50 \
    --opt=adam --weight-decay=1e-4

    opened by whai362 5
  • About Training Setting Parameters

    Could you provide the training parameters when using 8 * V100 with DDP? When I use the following command line:

    python3 -u -m torch.distributed.launch --nnodes 1 --node_rank 0 --nproc_per_node 8 --master_port 2221 train_semantic_softmax.py \
    --data_path /data/imagenet22kp_fall/ \
    --model_name mobilenetv3_large_100 \
    --epochs 80 \
    --weight_decay 1e-4 \
    --batch_size 1024 \
    --lr 3e-4 \
    --num_classes 11221 \
    --tree_path ./data/imagenet21k_miil_tree.pth \
    --model_path=./mobilenetv3_large_100.pth

    the accuracy is only 71.366%, which is far lower than the 73.1% reported in the paper.

    opened by SJLeo 5
  • Model weights trained on Stanford cars

    Hi,

    Can you please share the link to the weights file of model trained on Stanford cars data set. I am unable to get the expected results using https://miil-public-eu.oss-eu-central-1.aliyuncs.com/model-zoo/ImageNet_21K_P/models/tresnet_l_v2_miil_21k.pth

    Please guide.

    Thanks,

    opened by ma-siddiqui 5
  • Anyone here have trouble reaching the mentioned accuracy for ViT-B?

    Anyone here have trouble reaching the mentioned accuracy for ViT-B? For some reason, the best accuracy I can get is 77% top1 without KD. While in the paper they said they reach 81% top1 without KD and 84.4% top1 with KD. Anyone manage to get that accuracy? If so, can you tell me what hyperparameters did you use? Thanks!

    opened by CharlesLeeeee 0
  • What is the teacher model when using semantic softmax with KD?

    What is the teacher model when using semantic softmax with KD? The figure in the paper is not clear about what the teacher is. Also, there is no code example showing how to use KD.

    opened by CharlesLeeeee 0
  • Missing details on Dropout and momentum value used for SGD when fine tuning on ImageNet1k

    Neither the paper nor the code mentions which dropout probability and momentum value were used for SGD when fine-tuning on ImageNet-1K. Are they the same as in https://arxiv.org/pdf/2010.11929.pdf ? Also, can you provide the code for the ImageNet-1K stage? I would like to see how the images are normalized there.

    opened by CharlesLeeeee 0
  • Hyperparameters to finetune ResNet50 from IN21k to IN1k

    Good morning, thank you very much for your work. Can you share the hyperparameters/training procedure that have been used to perform the finetuning of resnet50 to standard ImageNet?

    Thank you very much in advance!

    opened by andreamaracani 0