ImageNet-21K Pretraining for the Masses
Official PyTorch Implementation
Tal Ridnik, Emanuel Ben-Baruch, Asaf Noy, Lihi Zelnik-Manor
DAMO Academy, Alibaba Group
Abstract
ImageNet-1K serves as the primary dataset for pretraining deep learning models for computer vision tasks. The ImageNet-21K dataset, which contains more pictures and classes, is used less frequently for pretraining, mainly due to its complexity, and underestimation of its added value compared to standard ImageNet-1K pretraining. This paper aims to close this gap, and make high-quality efficient pretraining on ImageNet-21K available for everyone. Via a dedicated preprocessing stage, utilizing WordNet hierarchies, and a novel training scheme called semantic softmax, we show that different models, including small mobile-oriented models, significantly benefit from ImageNet-21K pretraining on numerous datasets and tasks. We also show that we outperform previous ImageNet-21K pretraining schemes for prominent new models like ViT. Our proposed pretraining pipeline is efficient, accessible, and leads to SoTA reproducible results, from a publicly available dataset.
Getting Started
Note - repo under construction, more content will be added.
(1) Pretrained Models on ImageNet-21K-P Dataset
Backbone | ImageNet-21K-P semantic top-1 Accuracy [%] | ImageNet-1K top-1 Accuracy [%] | Maximal batch size | Maximal training speed (img/sec) | Maximal inference speed (img/sec) |
---|---|---|---|---|---|
MobilenetV3_large_100 | 73.1 | 78.0 | 488 | 1210 | 5980 |
Ofa_flops_595m_s | 75.0 | 81.0 | 288 | 500 | 3240 |
ResNet50 | 75.6 | 82.0 | 320 | 720 | 2760 |
TResNet-M | 76.4 | 83.1 | 520 | 670 | 2970 |
TResNet-L (V2) | 76.7 | 83.9 | 240 | 300 | 1460 |
ViT_base_patch16_224 | 77.6 | 84.4 | 160 | 340 | 1140 |
See this link for more details.
We highly recommend starting with ImageNet-21K by testing these weights against standard ImageNet-1K pretraining and comparing results on your relevant downstream tasks. Once you see a significant improvement (you will), proceed to pretraining new models.
(2) Obtaining and Processing the Dataset
See instructions for obtaining and processing the dataset here.
(3) Training Code
To use the training code, first download the ImageNet-21K-P semantic tree to your local ./resources/ folder.
Example of semantic softmax training:
python train_semantic_softmax.py \
--batch_size=4 \
--data_path=/mnt/datasets/21k \
--model_name=mobilenetv3_large_100 \
--model_path=/mnt/models/mobilenetv3_large_100.pth \
--epochs=80
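For readers unfamiliar with the idea behind semantic softmax, the following is a minimal, dependency-free sketch of one way such a loss could be computed, not the code used by `train_semantic_softmax.py`. It assumes that classes are grouped by WordNet hierarchy level, that a separate softmax is applied within each level, and that each sample supplies a target (the ground-truth class or one of its ancestors) for every level it reaches. The names `semantic_softmax_loss`, `level_slices`, and `level_targets` are hypothetical, and the uniform averaging over levels is a simplifying assumption; the actual training scheme may weight levels differently.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(values)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in values))
    return [v - log_norm for v in values]


def semantic_softmax_loss(logits, level_slices, level_targets):
    """Average cross-entropy over hierarchy levels (illustrative sketch).

    logits        -- flat list of scores, with classes grouped by level
    level_slices  -- (start, end) index pair for each hierarchy level
    level_targets -- target index *within* each level, or None when the
                     sample's label does not reach that level
    """
    per_level = []
    for (start, end), target in zip(level_slices, level_targets):
        if target is None:
            continue  # no ground truth at this depth, so skip the level
        log_probs = log_softmax(logits[start:end])
        per_level.append(-log_probs[target])
    if not per_level:
        return 0.0
    # Assumption: uniform weighting across the levels that have a target.
    return sum(per_level) / len(per_level)


# Toy example: 7 classes split into three levels of sizes 2, 3 and 2.
logits = [2.0, 0.5, 1.0, 3.0, 0.1, 0.2, 0.3]
level_slices = [(0, 2), (2, 5), (5, 7)]
level_targets = [0, 1, None]  # the label reaches only the first two levels
loss = semantic_softmax_loss(logits, level_slices, level_targets)
print(f"semantic softmax loss: {loss:.4f}")
```

Because each level is normalized on its own, a sample labeled with a coarse class still contributes a training signal at the levels it does reach, without being penalized at deeper levels where it has no label.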
To shorten training, we initialize the weights from standard ImageNet-1K pretraining. We recommend using the ImageNet-1K weights from this excellent repo.
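When a model is initialized from ImageNet-1K weights, the classification head generally will not match the larger ImageNet-21K-P label space. One common way to handle this, sketched below under the assumption of plain PyTorch, is to copy only the tensors whose names and shapes agree and leave the rest at their fresh initialization. The helper `load_matching_weights` is hypothetical and is not part of this repo; the training script may handle the mismatch differently.

```python
import torch
from torch import nn


def load_matching_weights(model: nn.Module, checkpoint_state: dict) -> list:
    """Copy tensors whose name and shape match; return the skipped keys."""
    own_state = model.state_dict()
    matched = {
        name: tensor
        for name, tensor in checkpoint_state.items()
        if name in own_state and tensor.shape == own_state[name].shape
    }
    # strict=False tolerates the keys deliberately left out (e.g. the head).
    model.load_state_dict(matched, strict=False)
    return [name for name in checkpoint_state if name not in matched]


# Toy stand-ins: identical backbone, different number of output classes.
source = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 10))
target = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 25))
skipped = load_matching_weights(target, source.state_dict())
print("skipped keys:", skipped)
```

In this toy case only the final layer is skipped, since its output dimension differs between the two models; the shared first layer is copied over.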
To be added soon
- KD training code
- Inference code
- Model weights after transfer to ImageNet-1K
- More...
Citation
@misc{ridnik2021imagenet21k,
title={ImageNet-21K Pretraining for the Masses},
author={Tal Ridnik and Emanuel Ben-Baruch and Asaf Noy and Lihi Zelnik-Manor},
year={2021},
eprint={2104.10972},
archivePrefix={arXiv},
primaryClass={cs.CV}
}