Trained WRN-28-10 with batch size 64 (128 in paper).
Trained DenseNet-BC-100 (k=12) with batch size 32 and initial learning rate 0.05 (batch size 64 and initial learning rate 0.1 in paper).
Trained ResNeXt-29 4x64d with a single GPU, batch size 32 and initial learning rate 0.025 (8 GPUs, batch size 128 and initial learning rate 0.1 in paper).
Trained shake-shake models with a single GPU (2 GPUs in paper).
Trained shake-shake 26 2x64d (S-S-I) with batch size 64, and initial learning rate 0.1.
Test errors reported above are the ones at last epoch.
Experiments with only 1 run were done on a different computer from the one used for experiments with 3 runs.
Results reported in the tables are the test errors at last epochs.
All models are trained using cosine annealing with initial learning rate 0.2.
The following data augmentations are applied to the training data:
Images are padded with 4 pixels on each side, and 28x28 patches are randomly cropped from the padded images.
Images are randomly flipped horizontally.
GeForce GTX 1080 Ti was used in these experiments.
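The two augmentations listed in the notes above can be sketched in plain Python as follows. This is only an illustration: the function name `pad_crop_flip`, the zero fill value for the padding, and the flip probability of 0.5 are assumptions, not details taken from this repository.

```python
import random


def pad_crop_flip(image, pad=4, crop=28, rng=random):
    """Pad, randomly crop, and randomly flip one grayscale image.

    `image` is a list of rows (H x W) of pixel values.
    """
    h, w = len(image), len(image[0])
    # Pad `pad` pixels on every side (zero fill is an assumption).
    padded = [[0] * (w + 2 * pad) for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in image]
    padded += [[0] * (w + 2 * pad) for _ in range(pad)]
    # Sample the top-left corner of a crop x crop patch.
    top = rng.randint(0, h + 2 * pad - crop)
    left = rng.randint(0, w + 2 * pad - crop)
    patch = [row[left:left + crop] for row in padded[top:top + crop]]
    # Flip horizontally with probability 0.5 (assumed).
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]
    return patch
```

For a 28x28 input, the padded image is 36x36, so the crop offset ranges over 0 to 8 pixels in each direction.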
Results on MNIST
| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
|:---|:---:|:---:|:---:|
| ResNet-preact-20 | 0.40 | 100 | 12m |
| ResNet-preact-20, Cutout 6 | 0.32 | 100 | 12m |
| ResNet-preact-20, Cutout 8 | 0.25 | 100 | 12m |
| ResNet-preact-20, Cutout 10 | 0.27 | 100 | 12m |
| ResNet-preact-20, Cutout 12 | 0.26 | 100 | 12m |
| ResNet-preact-20, Cutout 14 | 0.26 | 100 | 12m |
| ResNet-preact-20, Cutout 16 | 0.25 | 100 | 12m |
| ResNet-preact-20, Mixup (alpha=1) | 0.40 | 100 | 12m |
| ResNet-preact-20, Mixup (alpha=0.5) | 0.38 | 100 | 12m |
| ResNet-preact-20, widening factor 4, Cutout 14 | 0.26 | 100 | 45m |
| ResNet-preact-50, Cutout 14 | 0.29 | 100 | 28m |
| ResNet-preact-50, widening factor 4, Cutout 14 | 0.25 | 100 | 1h50m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 0.24 | 100 | 3h22m |
Note
Results reported in the table are the test errors at last epochs.
All models are trained using cosine annealing with initial learning rate 0.2.
GeForce GTX 1080 Ti was used in these experiments.
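For readers unfamiliar with Cutout, the number after "Cutout" in the table (for example 14) is the side length of the square that is masked out. A minimal sketch in plain Python follows. The function name `cutout`, the zero fill value, and the choice to sample the centre anywhere in the image (so the square may be clipped at the border) are assumptions in the spirit of the Cutout paper, and this repository's implementation may differ.

```python
import random


def cutout(image, mask_size, rng=random, fill=0):
    """Return a copy of `image` (list of rows) with one square masked out."""
    h, w = len(image), len(image[0])
    # Sample the centre of the square anywhere in the image (assumed).
    cy = rng.randrange(h)
    cx = rng.randrange(w)
    # Clip the square to the image boundaries.
    top = max(0, cy - mask_size // 2)
    left = max(0, cx - mask_size // 2)
    bottom = min(h, cy - mask_size // 2 + mask_size)
    right = min(w, cx - mask_size // 2 + mask_size)
    out = [list(row) for row in image]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = fill
    return out
```

The input image is left unmodified, so the same sample can be masked differently in each epoch.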
Results on Kuzushiji-MNIST
| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
|:---|:---:|:---:|:---:|
| ResNet-preact-20, Cutout 14 | 0.82 (best 0.67) | 200 | 24m |
| ResNet-preact-20, widening factor 4, Cutout 14 | 0.72 (best 0.67) | 200 | 1h30m |
| PyramidNet-110-270, Cutout 14 | 0.72 (best 0.70) | 200 | 10h05m |
| shake-shake-26 2x96d (S-S-I), Cutout 14 | 0.66 (best 0.63) | 200 | 6h46m |
Note
Results reported in the table are the test errors at last epochs.
All models are trained using cosine annealing with initial learning rate 0.2.
GeForce GTX 1080 Ti was used in these experiments.
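The cosine annealing schedule mentioned in the notes can be written as a one-line formula. The sketch below assumes annealing without restarts from the initial learning rate down to a minimum of 0; the function name, the `min_lr` parameter, and the choice to update per step rather than per epoch are illustrative assumptions.

```python
import math


def cosine_annealing_lr(step, total_steps, base_lr=0.2, min_lr=0.0):
    """Learning rate at `step` for cosine annealing without restarts."""
    progress = step / total_steps  # goes from 0 to 1 over training
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With `base_lr=0.2`, the rate starts at 0.2, passes through 0.1 at the halfway point, and reaches `min_lr` at the end of training.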
Experiments
Experiment on residual units, learning rate scheduling, and data augmentation
In this experiment, the effects of the following on classification accuracy are investigated:
PyramidNet-like residual units
Cosine annealing of learning rate
Cutout
Random Erasing
Mixup
Preactivation of shortcuts after downsampling
ResNet-preact-56 is trained on CIFAR-10 with initial learning rate 0.2 in this experiment.
Note
The PyramidNet paper (arXiv:1610.02915) showed that removing the first ReLU in residual units and adding BN after the last convolution in residual units both improve classification accuracy.
The SGDR paper (arXiv:1608.03983) showed that cosine annealing improves classification accuracy even without restarts.
Results
PyramidNet-like units work.
It might be better not to preactivate shortcuts after downsampling when using PyramidNet-like units.
Cosine annealing slightly improves accuracy.
Cutout, RandomErasing, and Mixup all work great.
Mixup needs longer training.
| Model | Test Error (median of 5 runs) | Training Time |
|:---|:---:|:---:|
| w/ 1st ReLU, w/o last BN, preactivate shortcut after downsampling | 6.45 | 95 min |
| w/ 1st ReLU, w/o last BN | 6.47 | 95 min |
| w/o 1st ReLU, w/o last BN | 6.14 | 89 min |
| w/ 1st ReLU, w/ last BN | 6.43 | 104 min |
| w/o 1st ReLU, w/ last BN | 5.85 | 98 min |
| w/o 1st ReLU, w/ last BN, preactivate shortcut after downsampling | | |
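Mixup, one of the augmentations investigated in this experiment, trains on convex combinations of pairs of samples and their labels. The following is a minimal plain-Python sketch for a single pair. The function name `mixup` and the use of flat lists for inputs and one-hot labels are assumptions made for illustration, and the repository's batched implementation may differ.

```python
import random


def mixup(x1, y1, x2, y2, alpha=1.0, rng=random):
    """Mix two samples; x* are flat feature lists, y* are one-hot label lists."""
    # The mixing weight is drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

With `alpha=1` the weight is uniform on [0, 1]; smaller values of `alpha` push the weight towards 0 or 1, so the mixed samples stay closer to the originals.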
Experiments on label smoothing, Mixup, RICAP, and Dual-Cutout
Results on CIFAR-10
| Model | Test Error (median of 3 runs) | # of Epochs | Training Time |
|:---|:---:|:---:|:---:|
| ResNet-preact-20 | 7.60 | 200 | 24m |
| ResNet-preact-20, label smoothing (epsilon=0.001) | 7.51 | 200 | 25m |
| ResNet-preact-20, label smoothing (epsilon=0.01) | 7.21 | 200 | 25m |
| ResNet-preact-20, label smoothing (epsilon=0.1) | 7.57 | 200 | 25m |
| ResNet-preact-20, mixup (alpha=1) | 7.24 | 200 | 26m |
| ResNet-preact-20, RICAP (beta=0.3), w/ random crop | 6.88 | 200 | 28m |
| ResNet-preact-20, RICAP (beta=0.3) | 6.77 | 200 | 28m |
| ResNet-preact-20, Dual-Cutout 16 (alpha=0.1) | 6.24 | 200 | 45m |
| ResNet-preact-20 | 7.05 | 400 | 49m |
| ResNet-preact-20, label smoothing (epsilon=0.001) | 7.20 | 400 | 49m |
| ResNet-preact-20, label smoothing (epsilon=0.01) | 6.97 | 400 | 49m |
| ResNet-preact-20, label smoothing (epsilon=0.1) | 7.16 | 400 | 49m |
| ResNet-preact-20, mixup (alpha=1) | 6.66 | 400 | 51m |
| ResNet-preact-20, RICAP (beta=0.3), w/ random crop | 6.30 | 400 | 56m |
| ResNet-preact-20, RICAP (beta=0.3) | 6.19 | 400 | 56m |
| ResNet-preact-20, Dual-Cutout 16 (alpha=0.1) | 5.55 | 400 | 1h36m |
Note
Results reported in the table are the test errors at last epochs.
All models are trained using cosine annealing with initial learning rate 0.2.
GeForce GTX 1080 Ti was used in these experiments.
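To make the `epsilon` parameter of label smoothing concrete, here is a small sketch of how a smoothed target distribution can be built. It assumes the formulation from the Inception paper (arXiv:1512.00567), where a fraction `epsilon` of the probability mass is spread uniformly over all classes. The function name is hypothetical, and the repository may use a slightly different variant.

```python
def smooth_labels(target, num_classes, epsilon=0.1):
    """Return the smoothed target distribution for class index `target`."""
    # Every class receives epsilon / K; the true class keeps the rest.
    off_value = epsilon / num_classes
    probs = [off_value] * num_classes
    probs[target] = 1.0 - epsilon + off_value
    return probs
```

For CIFAR-10 (10 classes) and `epsilon=0.1`, the true class gets a target of about 0.91 and every other class about 0.01.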
Experiments on batch size and learning rate
The following experiments were done on the CIFAR-10 dataset using a GeForce 1080 Ti.
Results reported in the table are the test errors at last epochs.
Linear scaling rule for learning rate
| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 4096 | 3.2 | cosine | 200 | 10.57 | 22m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.87 | 21m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.40 | 21m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 200 | 8.22 | 20m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 200 | 8.61 | 22m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 200 | 8.22 | 28m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 200 | 8.00 | 43m |
| ResNet-preact-20 | 16 | 0.0125 | cosine | 200 | 7.75 | 1h17m |
| ResNet-preact-20 | 8 | 0.006125 | cosine | 200 | 7.70 | 2h32m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 4096 | 3.2 | multistep | 200 | 28.97 | 22m |
| ResNet-preact-20 | 2048 | 1.6 | multistep | 200 | 9.07 | 21m |
| ResNet-preact-20 | 1024 | 0.8 | multistep | 200 | 8.62 | 21m |
| ResNet-preact-20 | 512 | 0.4 | multistep | 200 | 8.23 | 20m |
| ResNet-preact-20 | 256 | 0.2 | multistep | 200 | 8.40 | 21m |
| ResNet-preact-20 | 128 | 0.1 | multistep | 200 | 8.28 | 24m |
| ResNet-preact-20 | 64 | 0.05 | multistep | 200 | 8.13 | 28m |
| ResNet-preact-20 | 32 | 0.025 | multistep | 200 | 7.58 | 43m |
| ResNet-preact-20 | 16 | 0.0125 | multistep | 200 | 7.93 | 1h18m |
| ResNet-preact-20 | 8 | 0.006125 | multistep | 200 | 8.31 | 2h34m |
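The linear scaling rule used for the "initial lr" column can be sketched as below. The reference point of learning rate 0.1 at batch size 128 is read off the tables; the function name and default arguments are illustrative assumptions.

```python
def linear_scaled_lr(batch_size, base_lr=0.1, base_batch_size=128):
    """Scale the learning rate in proportion to the batch size."""
    return base_lr * batch_size / base_batch_size


if __name__ == "__main__":
    # Print the scaled learning rate for each batch size used in the tables.
    for batch_size in (4096, 2048, 1024, 512, 256, 128, 64, 32, 16, 8):
        print(batch_size, linear_scaled_lr(batch_size))
```

Note that strict linear scaling gives 0.00625 for batch size 8, whereas the tables list 0.006125 for that row.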
Linear scaling + longer training
| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 4096 | 3.2 | cosine | 400 | 8.97 | 44m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 400 | 7.85 | 43m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 400 | 7.20 | 42m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 400 | 7.83 | 40m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 400 | 7.65 | 42m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 400 | 7.09 | 47m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 400 | 7.17 | 44m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 400 | 7.24 | 2h11m |
| ResNet-preact-20 | 16 | 0.0125 | cosine | 400 | 7.26 | 4h10m |
| ResNet-preact-20 | 8 | 0.006125 | cosine | 400 | 7.02 | 7h53m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 4096 | 3.2 | cosine | 800 | 8.14 | 1h29m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 800 | 7.74 | 1h23m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 800 | 7.15 | 1h31m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 800 | 7.27 | 1h25m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 800 | 7.22 | 1h26m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 800 | 6.68 | 1h35m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 800 | 7.18 | 2h20m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 800 | 7.03 | 4h16m |
| ResNet-preact-20 | 16 | 0.0125 | cosine | 800 | 6.78 | 8h37m |
| ResNet-preact-20 | 8 | 0.006125 | cosine | 800 | 6.89 | 16h47m |
Effect of initial learning rate
| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 4096 | 3.2 | cosine | 200 | 10.57 | 22m |
| ResNet-preact-20 | 4096 | 1.6 | cosine | 200 | 10.32 | 22m |
| ResNet-preact-20 | 4096 | 0.8 | cosine | 200 | 10.71 | 22m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 2048 | 3.2 | cosine | 200 | 11.34 | 21m |
| ResNet-preact-20 | 2048 | 2.4 | cosine | 200 | 8.69 | 21m |
| ResNet-preact-20 | 2048 | 2.0 | cosine | 200 | 8.81 | 21m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.73 | 22m |
| ResNet-preact-20 | 2048 | 0.8 | cosine | 200 | 9.62 | 21m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 1024 | 3.2 | cosine | 200 | 9.12 | 21m |
| ResNet-preact-20 | 1024 | 2.4 | cosine | 200 | 8.42 | 22m |
| ResNet-preact-20 | 1024 | 2.0 | cosine | 200 | 8.38 | 22m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 200 | 8.07 | 22m |
| ResNet-preact-20 | 1024 | 1.2 | cosine | 200 | 8.25 | 21m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.08 | 22m |
| ResNet-preact-20 | 1024 | 0.4 | cosine | 200 | 8.49 | 22m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 512 | 3.2 | cosine | 200 | 8.51 | 21m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 200 | 7.73 | 20m |
| ResNet-preact-20 | 512 | 0.8 | cosine | 200 | 7.73 | 21m |
| ResNet-preact-20 | 512 | 0.4 | cosine | 200 | 8.22 | 20m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 256 | 3.2 | cosine | 200 | 9.64 | 22m |
| ResNet-preact-20 | 256 | 1.6 | cosine | 200 | 8.32 | 22m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 200 | 7.45 | 21m |
| ResNet-preact-20 | 256 | 0.4 | cosine | 200 | 7.68 | 22m |
| ResNet-preact-20 | 256 | 0.2 | cosine | 200 | 8.61 | 22m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 128 | 1.6 | cosine | 200 | 9.03 | 24m |
| ResNet-preact-20 | 128 | 0.8 | cosine | 200 | 7.54 | 24m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 200 | 7.28 | 24m |
| ResNet-preact-20 | 128 | 0.2 | cosine | 200 | 7.96 | 24m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |
| ResNet-preact-20 | 128 | 0.05 | cosine | 200 | 8.81 | 24m |
| ResNet-preact-20 | 128 | 0.025 | cosine | 200 | 10.07 | 24m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 64 | 0.4 | cosine | 200 | 7.42 | 35m |
| ResNet-preact-20 | 64 | 0.2 | cosine | 200 | 7.52 | 36m |
| ResNet-preact-20 | 64 | 0.1 | cosine | 200 | 7.78 | 37m |
| ResNet-preact-20 | 64 | 0.05 | cosine | 200 | 8.22 | 28m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 32 | 0.2 | cosine | 200 | 7.64 | 1h05m |
| ResNet-preact-20 | 32 | 0.1 | cosine | 200 | 7.25 | 1h08m |
| ResNet-preact-20 | 32 | 0.05 | cosine | 200 | 7.45 | 1h07m |
| ResNet-preact-20 | 32 | 0.025 | cosine | 200 | 8.00 | 43m |
Good learning rate + longer training
| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 4096 | 1.6 | cosine | 200 | 10.32 | 22m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 200 | 8.73 | 22m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 200 | 8.07 | 22m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 200 | 8.08 | 22m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 200 | 7.73 | 20m |
| ResNet-preact-20 | 512 | 0.8 | cosine | 200 | 7.73 | 21m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 200 | 7.45 | 21m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 200 | 7.28 | 24m |
| ResNet-preact-20 | 128 | 0.2 | cosine | 200 | 7.96 | 24m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 200 | 8.09 | 24m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 4096 | 1.6 | cosine | 800 | 8.36 | 1h33m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 800 | 7.53 | 1h27m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 800 | 7.30 | 1h30m |
| ResNet-preact-20 | 1024 | 0.8 | cosine | 800 | 7.42 | 1h30m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 800 | 6.69 | 1h26m |
| ResNet-preact-20 | 512 | 0.8 | cosine | 800 | 6.77 | 1h26m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 800 | 6.84 | 1h28m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 800 | 6.86 | 1h35m |
| ResNet-preact-20 | 128 | 0.2 | cosine | 800 | 7.05 | 1h38m |
| ResNet-preact-20 | 128 | 0.1 | cosine | 800 | 6.68 | 1h35m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 4096 | 1.6 | cosine | 1600 | 8.25 | 3h10m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 1600 | 7.34 | 2h50m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 1600 | 6.94 | 2h52m |
| ResNet-preact-20 | 512 | 1.6 | cosine | 1600 | 6.99 | 2h44m |
| ResNet-preact-20 | 256 | 0.8 | cosine | 1600 | 6.95 | 2h50m |
| ResNet-preact-20 | 128 | 0.4 | cosine | 1600 | 6.64 | 3h09m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 4096 | 1.6 | cosine | 3200 | 9.52 | 6h15m |
| ResNet-preact-20 | 2048 | 1.6 | cosine | 3200 | 6.92 | 5h42m |
| ResNet-preact-20 | 1024 | 1.6 | cosine | 3200 | 6.96 | 5h43m |

| Model | batch size | initial lr | lr schedule | # of Epochs | Test Error (1 run) | Training Time |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| ResNet-preact-20 | 2048 | 1.6 | cosine | 6400 | 7.45 | 11h44m |
LARS
The original papers (arXiv:1708.03888, arXiv:1801.03137) use polynomial decay learning rate scheduling, but cosine annealing is used in these experiments.
In this implementation, the LARS coefficient is not used, so the learning rate should be adjusted accordingly.
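To illustrate what leaving out the LARS coefficient means, here is a rough plain-Python sketch of a single layer-wise LARS step without momentum, following the local learning rate formula in arXiv:1708.03888. The function name, the `trust_coef` and `eps` parameters, and the flat-list representation of a layer are assumptions; `trust_coef=1.0` is meant to mimic the coefficient not being applied, which is why the global learning rate then has to absorb that factor.

```python
import math


def lars_step(weights, grads, lr, weight_decay=0.0, trust_coef=1.0, eps=1e-9):
    """Apply one LARS update to a single layer given as flat lists."""
    w_norm = math.sqrt(sum(w * w for w in weights))
    g_norm = math.sqrt(sum(g * g for g in grads))
    # Layer-wise local learning rate: ratio of weight norm to gradient norm.
    local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)
    # Gradient step with weight decay, scaled by global and local rates.
    return [
        w - lr * local_lr * (g + weight_decay * w)
        for w, g in zip(weights, grads)
    ]
```

Because the local rate normalizes the gradient by its norm, the size of each layer's update depends on the weight norm rather than on the raw gradient magnitude.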
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. link, arXiv:1512.03385
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Identity Mappings in Deep Residual Networks." In European Conference on Computer Vision (ECCV). 2016. arXiv:1603.05027, Torch implementation
Zagoruyko, Sergey, and Nikos Komodakis. "Wide Residual Networks." Proceedings of the British Machine Vision Conference (BMVC), 2016. arXiv:1605.07146, Torch implementation
Huang, Gao, Zhuang Liu, Kilian Q Weinberger, and Laurens van der Maaten. "Densely Connected Convolutional Networks." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. link, arXiv:1608.06993, Torch implementation
Xie, Saining, Ross Girshick, Piotr Dollar, Zhuowen Tu, and Kaiming He. "Aggregated Residual Transformations for Deep Neural Networks." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. link, arXiv:1611.05431, Torch implementation
Gastaldi, Xavier. "Shake-Shake regularization of 3-branch residual networks." In International Conference on Learning Representations (ICLR) Workshop, 2017. link, arXiv:1705.07485, Torch implementation
Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-Excitation Networks." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132-7141. link, arXiv:1709.01507, Caffe implementation
Huang, Gao, Zhuang Liu, Geoff Pleiss, Laurens van der Maaten, and Kilian Q. Weinberger. "Convolutional Networks with Dense Connectivity." IEEE transactions on pattern analysis and machine intelligence (2019). arXiv:2001.02394
Regularization, data augmentation
Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. "Rethinking the Inception Architecture for Computer Vision." The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. link, arXiv:1512.00567
DeVries, Terrance, and Graham W. Taylor. "Improved Regularization of Convolutional Neural Networks with Cutout." arXiv preprint arXiv:1708.04552 (2017). arXiv:1708.04552, PyTorch implementation
Zhong, Zhun, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. "Random Erasing Data Augmentation." arXiv preprint arXiv:1708.04896 (2017). arXiv:1708.04896, PyTorch implementation
Zhang, Hongyi, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. "mixup: Beyond Empirical Risk Minimization." In International Conference on Learning Representations (ICLR), 2017. link, arXiv:1710.09412
Takahashi, Ryo, Takashi Matsubara, and Kuniaki Uehara. "Data Augmentation using Random Image Cropping and Patching for Deep CNNs." Proceedings of The 10th Asian Conference on Machine Learning (ACML), 2018. link, arXiv:1811.09030
Yun, Sangdoo, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. "CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features." arXiv preprint arXiv:1905.04899 (2019). arXiv:1905.04899
Large batch
Keskar, Nitish Shirish, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima." In International Conference on Learning Representations (ICLR), 2017. link, arXiv:1609.04836
Hoffer, Elad, Itay Hubara, and Daniel Soudry. "Train longer, generalize better: closing the generalization gap in large batch training of neural networks." In Advances in Neural Information Processing Systems (NIPS), 2017. link, arXiv:1705.08741, PyTorch implementation
Goyal, Priya, Piotr Dollar, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour." arXiv preprint arXiv:1706.02677 (2017). arXiv:1706.02677
You, Yang, Igor Gitman, and Boris Ginsburg. "Large Batch Training of Convolutional Networks." arXiv preprint arXiv:1708.03888 (2017). arXiv:1708.03888
You, Yang, Zhao Zhang, Cho-Jui Hsieh, James Demmel, and Kurt Keutzer. "ImageNet Training in Minutes." arXiv preprint arXiv:1709.05011 (2017). arXiv:1709.05011
Smith, Samuel L., Pieter-Jan Kindermans, Chris Ying, and Quoc V. Le. "Don't Decay the Learning Rate, Increase the Batch Size." In International Conference on Learning Representations (ICLR), 2018. link, arXiv:1711.00489
Gitman, Igor, Deepak Dilipkumar, and Ben Parr. "Convergence Analysis of Gradient Descent Algorithms with Proportional Updates." arXiv preprint arXiv:1801.03137 (2018). arXiv:1801.03137, TensorFlow implementation
Jia, Xianyan, Shutao Song, Wei He, Yangzihao Wang, Haidong Rong, Feihu Zhou, Liqiang Xie, Zhenyu Guo, Yuanzhou Yang, Liwei Yu, Tiegang Chen, Guangxiao Hu, Shaohuai Shi, and Xiaowen Chu. "Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes." arXiv preprint arXiv:1807.11205 (2018). arXiv:1807.11205
Shallue, Christopher J., Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, and George E. Dahl. "Measuring the Effects of Data Parallelism on Neural Network Training." arXiv preprint arXiv:1811.03600 (2018). arXiv:1811.03600
Ying, Chris, Sameer Kumar, Dehao Chen, Tao Wang, and Youlong Cheng. "Image Classification at Supercomputer Scale." In Advances in Neural Information Processing Systems (NeurIPS) Workshop, 2018. link, arXiv:1811.06992
Others
Loshchilov, Ilya, and Frank Hutter. "SGDR: Stochastic Gradient Descent with Warm Restarts." In International Conference on Learning Representations (ICLR), 2017. link, arXiv:1608.03983, Lasagne implementation
Micikevicius, Paulius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu. "Mixed Precision Training." In International Conference on Learning Representations (ICLR), 2018. link, arXiv:1710.03740
Recht, Benjamin, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. "Do CIFAR-10 Classifiers Generalize to CIFAR-10?" arXiv preprint arXiv:1806.00451 (2018). arXiv:1806.00451
He, Tong, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, and Mu Li. "Bag of Tricks for Image Classification with Convolutional Neural Networks." arXiv preprint arXiv:1812.01187 (2018). arXiv:1812.01187
I'm having the following problem when running the command from the README and would really appreciate your help:
Traceback (most recent call last):
File "train.py", line 445, in <module>
main()
File "train.py", line 371, in main
model, optimizer, opt_level=config.train.precision)
File "/mnt/cephfs/training/users/lilujun/miniconda3/envs/py/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/frontend.py", line 358, in initialize
return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
File "/mnt/cephfs/training/users/lilujun/miniconda3/envs/py/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/_initialize.py", line 171, in _initialize
check_params_fp32(models)
File "/mnt/cephfs/training/users/lilujun/miniconda3/envs/py/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/_initialize.py", line 116, in check_params_fp32
name, buf.type()))
File "/mnt/cephfs/training/users/lilujun/miniconda3/envs/py/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/_amp_state.py", line 32, in warn_or_err
raise RuntimeError(msg)
RuntimeError: Found buffer total_ops with type torch.DoubleTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with buffers
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.
I have a folder containing the images that I need to evaluate. The images are stored as follows.
Train dataset folder:
/path/female_bag/female_bag*.jpg,
/path/makeup_bag/makeup_bag*.jpg,
Test dataset folder:
/path/female_bag/female_bag*.jpg,
/path/makeup_bag/makeup_bag*.jpg,
When I try to use my modified evaluate.py to evaluate my own dataset, this error always comes up first. Do you know how I could change the code? I have already added transforms.ToTensor() in evaluate.py and changed the code in dataset.py to return self.transform(self.x[index]), self.transform(self.y[index]).
Is there any other way to eliminate the error?
The results with PyramidNet and ResNeXt are very good!
However, I want to try some more algorithms such as EfficientNet. The GitHub project pytorch-image-models that you mentioned before has many algorithms, but it is built for training and testing on open datasets and is not well suited to running on our own dataset. Even after I changed the code to run on my own dataset, the results were worse than with your project using the same algorithms.
So, could you give me some advice on how to add EfficientNet from pytorch-image-models to your project for ImageNet?
Thanks a lot!
AssertionError: Key env_info.cuda_version with value <class 'NoneType'> is not a valid type;
valid types: {<class 'bool'>, <class 'float'>, <class 'tuple'>, <class 'int'>, <class 'list'>, <class 'str'>}
Below is the traceback message:
Traceback (most recent call last):
File "train.py", line 436, in <module>
main()
File "train.py", line 340, in main
save_config(get_env_info(config), output_dir / 'env.yaml')
File "/Users/charlotte/Desktop/classification/pytorch_image_classification/utils/env_info.py", line 19, in get_env_info
return ConfigNode({'env_info': info})
File "/Users/charlotte/Desktop/classification/pytorch_image_classification/config/config_node.py", line 6, in __init__
super().__init__(init_dict, key_list, new_allowed)
File "/Users/charlotte/opt/miniconda3/lib/python3.7/site-packages/yacs/config.py", line 86, in __init__
init_dict = self._create_config_tree_from_dict(init_dict, key_list)
File "/Users/charlotte/opt/miniconda3/lib/python3.7/site-packages/yacs/config.py", line 126, in _create_config_tree_from_dict
dic[k] = cls(v, key_list=key_list + [k])
File "/Users/charlotte/Desktop/classification/pytorch_image_classification/config/config_node.py", line 6, in __init__
super().__init__(init_dict, key_list, new_allowed)
File "/Users/charlotte/opt/miniconda3/lib/python3.7/site-packages/yacs/config.py", line 86, in __init__
init_dict = self._create_config_tree_from_dict(init_dict, key_list)
File "/Users/charlotte/opt/miniconda3/lib/python3.7/site-packages/yacs/config.py", line 132, in _create_config_tree_from_dict
".".join(key_list + [str(k)]), type(v), _VALID_TYPES
File "/Users/charlotte/opt/miniconda3/lib/python3.7/site-packages/yacs/config.py", line 525, in _assert_with_logging
assert cond, msg
Can you please give me some hints about how to fix this? Thanks!
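One possible workaround for the assertion above, sketched under assumptions: the error message says that yacs rejects values whose type is not in its list of valid types, so the environment info could be converted before it is wrapped in a `ConfigNode`. The helper name `sanitize_env_info` and the example keys other than `cuda_version` are hypothetical, and the list of valid types is copied from the error message rather than from the yacs source.

```python
# Types accepted by yacs according to the error message above.
VALID_TYPES = (bool, float, int, list, str, tuple)


def sanitize_env_info(info):
    """Return a copy of `info` with unsupported values converted to strings."""
    return {
        key: value if type(value) in VALID_TYPES else str(value)
        for key, value in info.items()
    }


if __name__ == "__main__":
    # Hypothetical environment info on a machine without CUDA.
    info = {"cuda_version": None, "num_gpus": 0}
    print(sanitize_env_info(info))
```

In this sketch, a `None` value such as a missing CUDA version becomes the string `'None'`, which yacs accepts.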
Thank you for your excellent work and for sharing it!
I have my own dataset of single-channel 64x64 grayscale images.
For all networks (vgg16, resnet18, ...), if I set n_channels: 1 in the yaml file, the following error is shown:
Traceback (most recent call last):
File "train.py", line 436, in <module>
main()
File "train.py", line 404, in main
validate(0, config, model, val_loss, val_loader, logger,
File "train.py", line 259, in validate
outputs = model(data)
File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/apex/amp/_initialize.py", line 196, in new_fwd
output = old_fwd(*applier(args, input_caster),
File "/media/zzks/xi/2020PJ/lab/pytorch_image_classification/pytorch_image_classification/models/imagenet/vgg.py", line 80, in forward
x = self._forward_conv(x)
File "/media/zzks/xi/2020PJ/lab/pytorch_image_classification/pytorch_image_classification/models/imagenet/vgg.py", line 72, in _forward_conv
x = self.stage1(x)
File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 345, in forward
return self.conv2d_forward(input, self.weight)
File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 341, in conv2d_forward
return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Given groups=1, weight of size 64 1 3 3, expected input[100, 3, 64, 64] to have 1 channels, but got 3 channels instead
Hi @hysts, I've trained and evaluated successfully with my own dataset. Now I want to test on a single image, but I'm not sure which files I need to change. Can you help me with this problem? I've tried changing some files and functions, but it doesn't work.
Thank you so much!
I have a dataset including many folders, each folder (as a class) contains images. So I want to train with my own dataset but I don't know how to set up my data structure. Thank you so much!
I notice that if no_weight_decay_on_bn is set to True, weight decay is only applied to conv.weight. It seems that weight decay on fc layers is also removed at the same time. Is there any reason for this?
I'm on my Ubuntu server (without a GPU), trying to run your script.
pip install -r requirements.txt was successful (I ignored the apex requirement, since I don't have a GPU right now),
but when I ran the following command, I hit the issue below,
and all the other .yaml configs gave me the same issue.
~/code/pytorch_image_classification# python train.py --config configs/cifar/resnet.yaml
Traceback (most recent call last):
File "train.py", line 449, in <module>
main()
File "train.py", line 353, in main
save_config(get_env_info(config), output_dir / 'env.yaml')
File "/root/code/pytorch_image_classification/pytorch_image_classification/utils/env_info.py", line 19, in get_env_info
return ConfigNode({'env_info': info})
File "/root/code/pytorch_image_classification/pytorch_image_classification/config/config_node.py", line 6, in __init__
super().__init__(init_dict, key_list, new_allowed)
File "/root/archiconda3/envs/python38/lib/python3.8/site-packages/yacs/config.py", line 86, in __init__
init_dict = self._create_config_tree_from_dict(init_dict, key_list)
File "/root/archiconda3/envs/python38/lib/python3.8/site-packages/yacs/config.py", line 126, in _create_config_tree_from_dict
dic[k] = cls(v, key_list=key_list + [k])
File "/root/code/pytorch_image_classification/pytorch_image_classification/config/config_node.py", line 6, in __init__
super().__init__(init_dict, key_list, new_allowed)
File "/root/archiconda3/envs/python38/lib/python3.8/site-packages/yacs/config.py", line 86, in __init__
init_dict = self._create_config_tree_from_dict(init_dict, key_list)
File "/root/archiconda3/envs/python38/lib/python3.8/site-packages/yacs/config.py", line 129, in _create_config_tree_from_dict
_assert_with_logging(
File "/root/archiconda3/envs/python38/lib/python3.8/site-packages/yacs/config.py", line 545, in _assert_with_logging
assert cond, msg
AssertionError: Key env_info.pytorch_version with value <class 'torch.torch_version.TorchVersion'> is not a valid type; valid types: {<class 'NoneType'>, <class 'list'>, <class 'bool'>, <class 'int'>, <class 'tuple'>, <class 'float'>, <class 'str'>}