Hi authors,
I have reproduced all the results using your code. Most of them are consistent with the reported numbers, except for Swin Transformer. Below are some of my results (reported results in brackets):
Trained with 8 GPUs (A100):
CIFAR10: 75.00 (59.47), CIFAR100: 52.26 (53.28), SVHN: 38.10 (71.60)
Trained with 4 GPUs:
CIFAR10: 81.91 (59.47), CIFAR100: 62.30 (53.28), SVHN: 91.29 (71.60)
From the results above, it seems that the batch size affects Swin a lot. All the reproduced results are comparable with ViT (e.g. ViT on CIFAR10 with 8 GPUs: 77.00 (71.70)). Do you have any idea what the reason might be?
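For context on why I suspect batch size: under data-parallel training, the effective batch size doubles going from 4 to 8 GPUs unless the per-GPU batch size or learning rate is rescaled. A minimal sketch of that arithmetic and the linear LR scaling rule (the per-GPU batch size of 128 and base LR of 1e-3 are hypothetical placeholders, not values from your repo):

```python
# Sketch of effective batch size under data-parallel training and the
# linear learning-rate scaling rule. The concrete numbers (128 per-GPU
# batch, 1e-3 base LR) are assumptions for illustration only.

def effective_batch(per_gpu_batch: int, num_gpus: int) -> int:
    """Total samples consumed per optimizer step across all GPUs."""
    return per_gpu_batch * num_gpus

def scaled_lr(base_lr: float, base_batch: int, batch: int) -> float:
    """Linear scaling rule: scale LR proportionally to batch size."""
    return base_lr * batch / base_batch

b4 = effective_batch(128, 4)  # 512 samples per step on 4 GPUs
b8 = effective_batch(128, 8)  # 1024 samples per step on 8 GPUs
print(b4, b8)                       # -> 512 1024
print(scaled_lr(1e-3, b4, b8))      # -> 0.002 (LR doubled for 8 GPUs)
```

If the training script fixes the per-GPU batch size and keeps the LR constant, the 8-GPU run effectively trains with twice the batch at the same LR, which could plausibly explain the gap I see between the 4-GPU and 8-GPU Swin numbers.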