Official PyTorch implementation of the paper "ImageNet-21K Pretraining for the Masses" (2021)

Last update: Jan 2, 2023

Overview

ImageNet-21K Pretraining for the Masses

Paper | Pretrained models

Official PyTorch Implementation

Tal Ridnik, Emanuel Ben-Baruch, Asaf Noy, Lihi Zelnik-Manor
DAMO Academy, Alibaba Group

Abstract

ImageNet-1K serves as the primary dataset for pretraining deep learning models for computer vision tasks. ImageNet-21K dataset, which contains more pictures and classes, is used less frequently for pretraining, mainly due to its complexity, and underestimation of its added value compared to standard ImageNet-1K pretraining. This paper aims to close this gap, and make high-quality efficient pretraining on ImageNet-21K available for everyone. Via a dedicated preprocessing stage, utilizing WordNet hierarchies, and a novel training scheme called semantic softmax, we show that different models, including small mobile-oriented models, significantly benefit from ImageNet-21K pretraining on numerous datasets and tasks. We also show that we outperform previous ImageNet-21K pretraining schemes for prominent new models like ViT. Our proposed pretraining pipeline is efficient, accessible, and leads to SoTA reproducible results, from a publicly available dataset.

Getting Started

Note: this repo is under construction; more content will be added.

(1) Pretrained Models on ImageNet-21K-P Dataset

Backbone	ImageNet-21K-P semantic top-1 Accuracy [%]	ImageNet-1K top-1 Accuracy [%]	Maximal batch size	Maximal training speed (img/sec)	Maximal inference speed (img/sec)
MobilenetV3_large_100	73.1	78.0	488	1210	5980
Ofa_flops_595m_s	75.0	81.0	288	500	3240
ResNet50	75.6	82.0	320	720	2760
TResNet-M	76.4	83.1	520	670	2970
TResNet-L (V2)	76.7	83.9	240	300	1460
ViT_base_patch16_224	77.6	84.4	160	340	1140

See this link for more details.
We highly recommend starting to work with ImageNet-21K by testing these weights against standard ImageNet-1K pretraining and comparing results on your relevant downstream tasks. After you see a significant improvement (you will), proceed to pretraining new models.

(2) Obtaining and Processing the Dataset

See the instructions for obtaining and processing the dataset here.

(3) Training Code

To use the training code, first download the ImageNet-21K-P semantic tree to your local ./resources/ folder. Example of semantic softmax training:

python train_semantic_softmax.py \
--batch_size=4 \
--data_path=/mnt/datasets/21k \
--model_name=mobilenetv3_large_100 \
--model_path=/mnt/models/mobilenetv3_large_100.pth \
--epochs=80

To shorten the training, we initialize the weights from standard ImageNet-1K pretraining. We recommend using ImageNet-1K weights from this excellent repo.
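
To make the idea behind semantic softmax more concrete, here is a minimal, framework-free sketch. It is not the repository's implementation. It assumes a toy two-level hierarchy, and the names `log_softmax`, `semantic_softmax_loss`, and `LEVEL_CLASSES` are hypothetical. The sketch applies a separate softmax to the classes of each hierarchy level, skips levels where the sample has no ground-truth label, and averages the per-level cross-entropy uniformly. The actual training scheme may aggregate the levels differently and include further details that are omitted here.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax over a list of floats."""
    m = max(values)
    log_z = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_z for v in values]


def semantic_softmax_loss(logits, level_classes, level_targets):
    """Average cross-entropy over hierarchy levels (simplified, uniform weights).

    logits:        one float per class, across all hierarchy levels.
    level_classes: for each level, the list of class indices on that level.
    level_targets: for each level, the ground-truth class index, or None
                   when the label has no ground truth at that level.
    """
    losses = []
    for classes, target in zip(level_classes, level_targets):
        if target is None:
            continue  # this level is ignored for this sample
        # Softmax is taken only over the classes of the current level.
        log_probs = log_softmax([logits[c] for c in classes])
        losses.append(-log_probs[classes.index(target)])
    return sum(losses) / len(losses) if losses else 0.0


# Toy hierarchy (hypothetical):
#   level 0: 0 = animal, 1 = plant
#   level 1: 2 = dog, 3 = cat, 4 = tree
LEVEL_CLASSES = [[0, 1], [2, 3, 4]]
logits = [2.0, 0.5, 1.5, 0.2, -1.0]

# A "dog" sample is supervised at both levels: "animal" and "dog".
loss_dog = semantic_softmax_loss(logits, LEVEL_CLASSES, [0, 2])
# An "animal" sample is supervised at level 0 only.
loss_animal = semantic_softmax_loss(logits, LEVEL_CLASSES, [0, None])
print(f"dog: {loss_dog:.4f}, animal: {loss_animal:.4f}")
```

Because each level has its own softmax, a coarse label such as "animal" still gives a training signal at its own level without penalizing any of the finer classes below it.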

To be added soon

KD training code
Inference code
Model weights after transferred to ImageNet-1K
More...

Citation

@misc{ridnik2021imagenet21k,
      title={ImageNet-21K Pretraining for the Masses}, 
      author={Tal Ridnik and Emanuel Ben-Baruch and Asaf Noy and Lihi Zelnik-Manor},
      year={2021},
      eprint={2104.10972},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Comments

imagenet1k pretrain model

https://github.com/Alibaba-MIIL/ImageNet21K/blob/00ef9989825bbcb8dedc91eac18638c129eb5ad8/src_files/models/utils/factory.py#L54

Why do we need to load pretrained ImageNet-1K model if we are going to pretrain based on the imagenet21k dataset?

opened by cissoidx 24

General fail in preprocessing

I did a full preprocessing run using your script on the winter21_whole.tar.gz dataset. Is it normal to get tons of "general fail" messages? Will these failed images affect the final training results?

opened by cissoidx 13

Val Result

Dear author: when I test the ResNet50 model you provide on the ImageNet-21K-P validation set, the semantic top-1 accuracy is only 69%, whereas you report 75.6%. I do see the improvement on downstream tasks, though. What is the problem?

opened by wtt0213 10

can't download metadata

I tried to run visualize_detector.py, but it fails with an HTTP error:

urllib.request.urlretrieve(url, filename)

Exception has occurred: HTTPError HTTP Error 403: Forbidden

how can it be solved?

opened by ghkhk 9

can't run visualize_detector.py

https://github.com/Alibaba-MIIL/ImageNet21K/blob/72c822a7a30290f078a2611139a1f0bcdb668606/visualize_detector.py#L25

error:

Unknown model (vit_base_patch16_224_miil_in21k)

How can I run it with my own weights?

opened by jaffe-fly 7

how to use DDP to start `train_single_label.py`

I have one machine with 2 GPUs. I can't start the training from the command line; how do I start it with DDP from inside the code?

I also can't find where the model is saved. How do I save it under DDP?

opened by jaffe-fly 7

The strategy of transferring ImageNet-21k ViT model to cifar100

Hi @mrT23, thanks for your great work! Currently I use timm's train.py to finetune the 'vit_base_patch16_224_miil_in21k' model on cifar100, but I can't reproduce the reported result of 94.2%. Here is my running script.

python -m torch.distributed.launch --nproc_per_node=8 --master_port 6016 train.py \
/data/cifar-100-images/ \
-b=64 \
--img-size=224 \
--epochs=50 \
--color-jitter=0 \
--amp \
--lr=2e-4 \
--sched='cosine' \
--model-ema --model-ema-decay=0.995 --reprob=0.5 --smoothing=0.1 \
--min-lr=1e-8 --warmup-epochs=3 --train-interpolation=bilinear --aa=v0 \
--model=vit_base_patch16_224_miil_in21k \
--pretrained \
--num-classes=100 \
--opt=adamw --weight-decay=1e-4 \
--checkpoint-hist=1

I tried several settings:

adam, lr=2e-4, wd=1e-4: 92.44%
adamw, lr=2e-4, wd=1e-4: 92.90%
sgd, lr=2e-4, wd=1e-4: 87.49%
adamw, lr=4e-4, wd=1e-4: 92.24%
adamw, lr=2e-4, wd=1e-2: 93.08%

Could you give me some suggestions?

opened by Dyfine 5

pretrain head of vit

In the vit paper, it says:

The classification head is implemented by a MLP with one hidden layer at pre-training time and by a single linear layer at fine-tuning time

So if you are using timm package, they define the head like this: https://github.com/rwightman/pytorch-image-models/blob/d3f744065088ca9b6b3a0f968c70e90ed37de75b/timm/models/vision_transformer.py#L293

Did you reach the stats in your paper using single layer head or head of one hidden layer?

opened by cissoidx 5

The strategy of transferring ImageNet-21k models to ImageNet-1K.

Hi, I tried to transfer the ImageNet-21k model (resnet50_miil_21k) to ImageNet-1K according to the details provided in your paper and https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/Transfer_learning.md. However, I only get 79.5% top-1 accuracy on ImageNet-1K, lower than the 82.0% in your readme. Could you give me some suggestions?

My training command is as follows:

python3 -m torch.distributed.launch --nproc_per_node=8 train.py \
data/ImageNet-1k/ \
-b=128 \
--img-size=224 \
--epochs=100 \
--color-jitter=0 \
--amp \
--sched='cosine' \
--model-ema --model-ema-decay=0.995 --reprob=0.5 --smoothing=0.1 \
--min-lr=1e-8 --warmup-epochs=3 --train-interpolation=bilinear --aa=v0 \
--pretrained \
--lr=2e-4 \
--model=resnet50 \
--opt=adam --weight-decay=1e-4

opened by whai362 5

About Training Setting Parameters

Could you provide the training parameters when using 8 * V100 with DDP? When I use the following command line:

python3 -u -m torch.distributed.launch --nnodes 1 --node_rank 0 --nproc_per_node 8 --master_port 2221 train_semantic_softmax.py \
--data_path /data/imagenet22kp_fall/ \
--model_name mobilenetv3_large_100 \
--epochs 80 \
--weight_decay 1e-4 \
--batch_size 1024 \
--lr 3e-4 \
--num_classes 11221 \
--tree_path ./data/imagenet21k_miil_tree.pth \
--model_path=./mobilenetv3_large_100.pth

The accuracy is only 71.366%, which is far lower than the 73.1% reported in the paper.

opened by SJLeo 5

Model weights trained on Stanford cars

Hi,

Can you please share the link to the weights file of model trained on Stanford cars data set. I am unable to get the expected results using https://miil-public-eu.oss-eu-central-1.aliyuncs.com/model-zoo/ImageNet_21K_P/models/tresnet_l_v2_miil_21k.pth

Please guide.

Thanks,

opened by ma-siddiqui 5

Anyone here have trouble reaching the mentioned accuracy for ViT-B?

Anyone here have trouble reaching the mentioned accuracy for ViT-B? For some reason, the best accuracy I can get is 77% top1 without KD. While in the paper they said they reach 81% top1 without KD and 84.4% top1 with KD. Anyone manage to get that accuracy? If so, can you tell me what hyperparameters did you use? Thanks!

opened by CharlesLeeeee 0

What is the teacher model when using semantic softmax with KD?

What is the teacher model when using semantic softmax with KD? The figure in the paper is not clear on what the teacher is, and there is no code example showing how to use KD.

opened by CharlesLeeeee 0

Missing details on Dropout and momentum value used for SGD when fine tuning on ImageNet1k

Neither the paper nor the code mentions what dropout probability and momentum value were used for SGD when fine-tuning on ImageNet-1K. Are they the same as in https://arxiv.org/pdf/2010.11929.pdf? Also, can you provide the code for ImageNet-1K? I would like to see how the images are normalized for that stage.

opened by CharlesLeeeee 0

Hyperparameters to finetune ResNet50 from IN21k to IN1k

Good morning, thank you very much for your work. Can you share the hyperparameters/training procedure that have been used to perform the finetuning of resnet50 to standard ImageNet?

Thank you very much in advance!

opened by andreamaracani 0
