SOTA model in CIFAR10

PJDong

Last update: Dec 21, 2022

Related tags

Deep Learning pytorch-cifar-tricks

Overview

A PyTorch Implementation of CIFAR Tricks

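This project surveys a range of tricks, data augmentation methods, and regularization methods on the CIFAR10 dataset and implements them. The project is wrapped up for now; if you have better ideas or would like to help maintain it, please open an issue or find my contact details on my homepage.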

0. Requirements

Python 3.6+
torch=1.8.0+cu111
torchvision=0.9.0+cu111
tqdm=4.26.0
PyYAML=6.0

1. Implemented features

1.1 Tricks

Warmup
Cosine LR Decay
SAM
Label Smoothing
KD (knowledge distillation)
AdaBound
Xavier / Kaiming init
LR finder
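
As an illustration of one item in the list above, here is a minimal sketch of label smoothing in plain Python. It is not taken from this repository: the function names `smooth_labels` and `cross_entropy` and the value `eps=0.1` are assumptions, and it shows one common variant in which the smoothing mass is spread uniformly over all classes.

```python
import math


def smooth_labels(target, num_classes, eps=0.1):
    """Smoothed one-hot target: eps is spread uniformly over all classes,
    and the remaining 1 - eps is added to the true class."""
    base = eps / num_classes
    return [base + (1.0 - eps if k == target else 0.0) for k in range(num_classes)]


def cross_entropy(logits, probs):
    """Cross-entropy between a target distribution and softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(p * (v - log_z) for p, v in zip(probs, logits))


# Example: class 3 out of 10 CIFAR-10 classes (eps=0.1 is an assumed value).
soft = smooth_labels(target=3, num_classes=10, eps=0.1)
loss = cross_entropy([0.0] * 10, soft)
```

With smoothing, the model is no longer rewarded for pushing the true-class probability all the way to 1, which tends to reduce over-confident predictions.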

1.2 Augmentation

Auto Augmentation
Cutout
Mixup
RICAP
Random Erase
ShakeDrop

2. Training

2.1 CIFAR-10 training examples

WideResNet28-10 baseline on CIFAR-10:

python train.py --dataset cifar10

WideResNet28-10 +RICAP on CIFAR-10:

python train.py --dataset cifar10 --ricap True

WideResNet28-10 +Random Erasing on CIFAR-10:

python train.py --dataset cifar10 --random-erase True

WideResNet28-10 +Mixup on CIFAR-10:

python train.py --dataset cifar10 --mixup True
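
For readers unfamiliar with the `--ricap` option used above, here is a minimal sketch of the RICAP idea (random image cropping and patching): four images are cropped and tiled into one, and the label becomes a mix weighted by patch area. This is an illustration only, not the code behind `train.py`; the function name `ricap`, the single-channel nested-list images, and `beta=0.3` are assumptions.

```python
import random


def ricap(images, labels, beta=0.3, rng=random):
    """Patch four equal-sized 2-D images into one.

    Returns the patched image and a dict mapping each label to its
    area-proportional weight. beta=0.3 is an assumed hyperparameter.
    """
    H, W = len(images[0]), len(images[0][0])
    # Boundary position (w, h) splits the canvas into four rectangles.
    w = int(round(W * rng.betavariate(beta, beta)))
    h = int(round(H * rng.betavariate(beta, beta)))
    widths = [w, W - w, w, W - w]
    heights = [h, h, H - h, H - h]

    patches, weights = [], {}
    for img, lab, pw, ph in zip(images, labels, widths, heights):
        # Random crop of size (ph, pw) from the source image.
        x0 = rng.randint(0, W - pw)
        y0 = rng.randint(0, H - ph)
        patches.append([row[x0:x0 + pw] for row in img[y0:y0 + ph]])
        weights[lab] = weights.get(lab, 0.0) + (pw * ph) / (W * H)

    # Tile: patches 0 and 1 on top, patches 2 and 3 below.
    top = [a + b for a, b in zip(patches[0], patches[1])]
    bottom = [a + b for a, b in zip(patches[2], patches[3])]
    return top + bottom, weights


# Four constant 8x8 "images" whose pixel value equals their label.
toy_images = [[[k] * 8 for _ in range(8)] for k in range(4)]
patched, label_weights = ricap(toy_images, [0, 1, 2, 3], rng=random.Random(1))
```

The training loss is then the weighted sum of the per-label losses, using `label_weights` as the coefficients.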

3. Results

3.1 Results from the original pytorch-ricap

Model	Error rate % (accuracy %)	Loss	Error rate % (paper)
WideResNet28-10 baseline	3.82 (96.18)	0.158	3.89
WideResNet28-10 +RICAP	2.82 (97.18)	0.141	2.85
WideResNet28-10 +Random Erasing	3.18 (96.82)	0.114	4.65
WideResNet28-10 +Mixup	3.02 (96.98)	0.158	3.02

3.2 Reimplementation results

Model	Error rate % (accuracy %)	Loss	Error rate % (paper)
WideResNet28-10 baseline	3.78 (96.22)		3.89
WideResNet28-10 +RICAP	2.81 (97.19)		2.85
WideResNet28-10 +Random Erasing	3.03 (96.97)	0.113	4.65
WideResNet28-10 +Mixup	2.93 (97.07)	0.158	3.02

3.3 Quick validation of network architectures on half the data

Reimplemented models (no augmentation, half data, 200 epochs, batch size 128)

Model	Error rate % (accuracy %)	Loss
lenet (CPU maxed out)	(70.76)
wideresnet	3.78 (96.22)
resnet20	(89.72)
senet	(92.34)
resnet18	(92.08)
resnet34	(92.48)
resnet50	(91.72)
regnet	(92.58)
nasnet	out of memory
shake_resnet26_2x32d	(93.06)
shake_resnet26_2x64d	(94.14)
densenet	(92.06)
dla	(92.58)
googlenet	(91.90)	0.2675
efficientnetb0 (low utilization and slow)	(86.82)	0.5024
mobilenet (low utilization)	(89.18)
mobilenetv2	(91.06)
pnasnet	(90.44)
preact_resnet	(90.76)
resnext	(92.30)
vgg (high CPU and GPU utilization)	(88.38)
inceptionv3	(91.84)
inceptionv4	(91.10)
inception_resnet_v2	(83.46)
rir	(92.34)	0.3932
squeezenet (high CPU utilization)	(89.16)	0.4311
stochastic_depth_resnet18	(90.22)
xception
dpn	(92.06)	0.3002
ge_resnext29_8x64d (extremely slow)	(93.86)

3.4 Testing CPU and GPU effects

TEST: scale/kernel ToyNet

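Changing the depth of the network's convolutional layers and retraining leads to the following conclusion:

Conclusion: a network like lenet, which has few convolutions (only two layers), shows high CPU utilization and low GPU utilization. Starting from that network and adding depth in the plain, straight-through VGG style, the deeper the network gets, the lower the CPU utilization and the higher the GPU utilization.

Changing the batch size used during training leads to the following conclusion:

Conclusion: the batch size affects convergence.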

3.5 Testing cutout and mixup with StepLR

architecture	epoch	cutout	mixup	C10 test acc (%)
shake_resnet26_2x64d	200			96.33
shake_resnet26_2x64d	200	√		96.99
shake_resnet26_2x64d	200		√	96.60
shake_resnet26_2x64d	200	√	√	96.46
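
For reference, the cutout augmentation compared in the table above can be sketched as follows. This is a minimal illustration in plain Python, not this repository's code: the name `cutout`, the single-channel nested-list image, and the 16-pixel patch size are assumptions.

```python
import random


def cutout(image, size=16, rng=random):
    """Return a copy of a 2-D image with one square patch set to zero.

    The patch is centered at a random pixel and clipped at the image
    border, so the masked area can be smaller than size * size.
    """
    H, W = len(image), len(image[0])
    cy, cx = rng.randrange(H), rng.randrange(W)
    y0, y1 = max(0, cy - size // 2), min(H, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(W, cx + size // 2)
    return [
        [0.0 if (y0 <= y < y1 and x0 <= x < x1) else v for x, v in enumerate(row)]
        for y, row in enumerate(image)
    ]


# A 32x32 all-ones toy image, matching the CIFAR spatial resolution.
toy = [[1.0] * 32 for _ in range(32)]
masked = cutout(toy, size=16, rng=random.Random(0))
num_zeroed = sum(row.count(0.0) for row in masked)
```

Unlike mixup, cutout leaves the label unchanged; it only removes part of the input.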

3.6 Testing SAM, ASAM, Cosine, LabelSmooth

architecture	epoch	SAM	ASAM	Cosine LR Decay	LabelSmooth	C10 test acc (%)
shake_resnet26_2x64d	200	√				96.51
shake_resnet26_2x64d	200		√			96.80
shake_resnet26_2x64d	200			√		96.61
shake_resnet26_2x64d	200				√	96.57
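
As background for the SAM row above, here is a minimal sketch of one SAM (sharpness-aware minimization) update on a toy problem in plain Python. It is not this repository's optimizer: the name `sam_step`, the gradient callback `grad_fn`, and the values `lr=0.1` and `rho=0.05` are assumptions chosen for illustration.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: climb to a nearby 'worst-case' point, take the
    gradient there, and apply it to the original weights."""
    g = grad_fn(w)
    norm = math.sqrt(sum(v * v for v in g)) + 1e-12  # avoid division by zero
    # Step 1: perturb the weights by rho along the normalized gradient.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Step 2: gradient at the perturbed point, applied to the original weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def grad_half_squared_norm(w):
    """Gradient of the toy loss 0.5 * ||w||^2, which is w itself."""
    return list(w)


w_next = sam_step([3.0, 4.0], grad_half_squared_norm)
```

Each SAM step needs two gradient evaluations, so it roughly doubles the cost per iteration compared with plain SGD.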

PS: With a much longer training schedule (epoch=1800), another library reports that shake_resnet26_2x64d achieved 97.71% test accuracy with cutout and mixup.

3.7 Testing cosine lr + shake

architecture	epoch	cutout	mixup	C10 test acc (%)
shake_resnet26_2x64d	300			96.66
shake_resnet26_2x64d	300	√		97.21
shake_resnet26_2x64d	300		√	96.90
shake_resnet26_2x64d	300	√	√	96.73

Results at 1800 epochs as reported by CIFAR-ZOO; not reproduced here because training takes too long.

architecture	epoch	cutout	mixup	C10 test acc (%)
shake_resnet26_2x64d	1800			96.94 (CIFAR-ZOO)
shake_resnet26_2x64d	1800	√		97.20 (CIFAR-ZOO)
shake_resnet26_2x64d	1800		√	97.42 (CIFAR-ZOO)
shake_resnet26_2x64d	1800	√	√	97.71 (CIFAR-ZOO)

3.8 Study of the Divide and Co-training setup

lr:
- warmup (20 epoch)
- cosine lr decay
- lr=0.1
- total epoch(300 epoch)
bs=128
aug:
- Random Crop and resize
- Random left-right flipping
- AutoAugment
- Normalization
- Random Erasing
- Mixup
weight decay=5e-4 (bias and bn undecayed)
kaiming weight init
optimizer: nesterov

Reproduction (V100, 1 GPU; 4 min * 300 / 60 = 20 h): top-1 97.59%, the highest result in this project so far.

python train.py --model 'pyramidnet272' \
                --name 'divide-co-train' \
                --autoaugmentation True \
                --random-erase True \
                --mixup True \
                --epochs 300 \
                --sched 'warmcosine' \
                --optims 'nesterov' \
                --bs 128 \
                --root '/home/dpj/project/data'
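
The learning-rate settings listed in this section (20 warmup epochs, cosine decay, lr=0.1, 300 epochs in total) can be sketched as a plain Python function. This is an illustration under stated assumptions, not the repository's `warmcosine` scheduler: the linear warmup shape, the decay to exactly zero, and the name `warmup_cosine_lr` are all assumed.

```python
import math


def warmup_cosine_lr(epoch, base_lr=0.1, warmup_epochs=20, total_epochs=300):
    """Learning rate for a 0-indexed epoch.

    Linear warmup up to base_lr (assumed shape), then cosine decay
    from base_lr down to zero at total_epochs (assumed floor).
    """
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Learning rate for every epoch of a 300-epoch run.
schedule = [warmup_cosine_lr(e) for e in range(300)]
```

The warmup phase avoids applying the full learning rate of 0.1 to freshly initialized weights, and the cosine phase lowers the rate smoothly instead of in discrete steps as StepLR does.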

3.9 Testing multiple data augmentations

architecture	epoch	cutout	mixup	autoaugment	random-erase	C10 test acc (%)
shake_resnet26_2x64d	200					96.42
shake_resnet26_2x64d	200	√				96.49
shake_resnet26_2x64d	200		√			96.17
shake_resnet26_2x64d	200			√		96.25
shake_resnet26_2x64d	200				√	96.20
shake_resnet26_2x64d	200	√	√			95.82
shake_resnet26_2x64d	200	√		√		96.02
shake_resnet26_2x64d	200	√			√	96.00
shake_resnet26_2x64d	200		√	√		95.83
shake_resnet26_2x64d	200		√		√	95.89
shake_resnet26_2x64d	200			√	√	96.25

python train.py --model 'shake_resnet26_2x64d' --name 'ss64_orgin' --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_c' --cutout True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_m' --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_a' --autoaugmentation True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_r' --random-erase True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cm'  --cutout True --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ca' --cutout True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cr' --cutout True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ma' --mixup True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_mr' --mixup True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ar' --autoaugmentation True --random-erase True  --bs 64

4. Reference

[1] https://github.com/BIGBALLON/CIFAR-ZOO

[2] https://github.com/pprp/MutableNAS

[3] https://github.com/clovaai/CutMix-PyTorch

[4] https://github.com/4uiiurz1/pytorch-ricap

[5] https://github.com/NUDTNASLab/pytorch-image-models

[6] https://github.com/facebookresearch/LaMCTS

[7] https://github.com/Alibaba-MIIL/ImageNet21K


Comments

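Question about mobilevit

Hello, did you test mobilevit on CIFAR at the same scale as the original?

In your MobileViT table, the weights of mobilevit_s are close to 20M. However, the original MobileViT authors used it for 256*256 ImageNet classification with only about 3M. The original network downsamples by a factor of 32, so it can just about run on CIFAR without structural changes, but why would the weights increase? I find this puzzling.

If possible, I would appreciate an answer.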
good first issue

opened by Jerryme-xxm 2
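Question about how Vision Transformer models are trained

Hello, thank you very much for sharing your code. I have been working on vision transformers recently and found your code very helpful. I ran into a few questions I would like to ask.

Taking ViT and Swin Transformer as examples, these two models are usually trained by first loading a pretrained model (typically pretrained on ImageNet) and then fine-tuning on a specific downstream task. This code trains on the CIFAR dataset, so I would like to ask whether the ViT and Swin results on CIFAR given in the README were obtained by training from scratch on CIFAR, or by loading a pretrained model and then fine-tuning on CIFAR. I could not find any part of the code that loads a pretrained model, hence my confusion.

The Swin Transformer implementation here also seems to differ from the official Swin Transformer code. What are the differences?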
opened by lostsword 2
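Question about RAM and GPU memory usage

Hello, during training I found that your implementation places heavy demands on the CPU and GPU. I have 16 GB of RAM and a 6 GB GPU. The dataset is my own, about 400 images in total, each around 200*200, resized to 224*224 during preprocessing, with the batch size set to 4. With resnet20, training fails with a CPU out-of-memory error at around epoch 80, while the shake_resnet26_2x64d model fails immediately with a GPU out-of-memory error. Could you give me some pointers?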
good first issue

opened by missbook520 1
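Accuracy from my own runs is lower than in the README

Hello, and thank you for sharing this. I ran the code, mainly with the mobilevit and mobilenet networks, and have a few questions.

1. With the original code downloaded as-is, without changing the network structure or parameters, the accuracy differs from the table in the README. Mobilevit_s at epoch 160: train_acc 0.931 (98.83% in the table), valid_acc 0.906 (92.50% in the table); with further training it overfits and accuracy drops. Mobilevit_xs at epoch 160: train_acc 0.925 (98.22% in the table), valid_acc 0.895 (91.77% in the table). Mobilenet v1: train_acc 0.891, valid_acc 0.869 (89.18% in the table). My GPU is an Nvidia 1080Ti, and I do not know why the gap is this large. What parameters did you use for each network to get these results? The args.py file as downloaded is set to resnet20. resnet20 actually does a bit better than the table, valid_acc 0.906 (89.72% in the table), perhaps because this happens to be the network you tuned?

2. I tried tuning it myself. Following the parameters of the original mobilevit project, I changed lr to 0.01 and weight-decay to 1e-5, and it still performed poorly. mobilevit_xxs: train_acc 0.984 (96.4% in the table), valid_acc 0.870 (90.17% in the table).

3. This one is less important: when I ran mobilenet v2, training was very slow, much slower than mobilevit, and GPU utilization was only slightly higher than with mobilevit. train_acc 0.989 and valid_acc 0.899 are also lower than the table.

My goal in running this project is to compress mobilevit, so I ran some other experiments; so far only hyperparameter tuning, without distillation, pruning, or similar techniques. If you are interested, could we discuss this? I tried replacing the mobilenet v2 module in mobilevit with mobilenet v1, changing patch_size from (2, 2) to (4, 4), removing the residual connections, and changing the number of heads (you use heads=1, dimheads=32, and I am not sure what your reasoning was; the blog post that reproduces mobilevit uses heads=4, dimheads=8). No matter what I change, accuracy stays around 0.87. Even when I removed the transformer from the mobilevit block, which reduced the parameter count from 1014672 to 339408, accuracy was still 0.863.

Sorry for writing so much, but my experience is limited and I really cannot work out the cause. I look forward to your reply.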

opened by Jerryme-xxm 2
