# Accuracy-Aligned, Concise Implementation of Swin Transformer
This repository contains an implementation of Swin Transformer and the code to train it on ImageNet. We have aligned the output of our network with the official implementation: given the same input and random seed, our network produces exactly the same output as the official one.
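For reference, the alignment claim can be checked along the lines below. This is a minimal sketch; `SwinT_ours` and `SwinT_official` are hypothetical stand-ins for this repo's and the official repo's model constructors:

```python
import torch

def check_alignment(SwinT_ours, SwinT_official, seed=0):
    # Seed both constructions identically so the random initialization matches.
    torch.manual_seed(seed)
    ours = SwinT_ours().eval()        # .eval() disables stochastic depth and
    torch.manual_seed(seed)           # dropout, making the comparison deterministic
    official = SwinT_official().eval()
    # Compare a forward pass on the same input.
    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        return torch.equal(ours(x), official(x))
```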
Our implementation relies heavily on einops, which makes the code more concise and easier to understand. (Concretely, we need only ~200 lines of code, compared with ~600 lines in the official implementation.) Despite being shorter, our implementation matches the official training speed.
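To give a flavor of the einops style, here is a minimal sketch of window partitioning, the core reshaping step in Swin attention. The function names and pattern strings are illustrative, not necessarily the ones used in this repo:

```python
import torch
from einops import rearrange

def window_partition(x, ws):
    # (B, H, W, C) -> (B*num_windows, ws*ws, C): cut the feature map into
    # non-overlapping ws x ws windows and flatten each window into a sequence.
    return rearrange(x, 'b (h ws1) (w ws2) c -> (b h w) (ws1 ws2) c', ws1=ws, ws2=ws)

def window_reverse(windows, ws, H, W):
    # Exact inverse: (B*num_windows, ws*ws, C) -> (B, H, W, C).
    return rearrange(windows, '(b h w) (ws1 ws2) c -> b (h ws1) (w ws2) c',
                     h=H // ws, w=W // ws, ws1=ws, ws2=ws)

x = torch.randn(2, 56, 56, 96)      # e.g. a Swin-T stage-1 feature map
windows = window_partition(x, 7)    # -> (2*64, 49, 96)
assert torch.equal(window_reverse(windows, 7, 56, 56), x)
```

Each pattern string documents the whole reshape in one line, which is where much of the line-count savings over view/permute-based code comes from.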
| Model | Epochs | acc@1 (ours) | acc@5 (ours) | acc@1 (official) | acc@5 (official) | Pretrained model |
|---|---|---|---|---|---|---|
| Swin-T | 300 | 81.3 | 95.5 | 81.2 | 95.5 | here |
## Usage
Train on ImageNet:
Train Swin-T:

```bash
python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --model Swin_T \
    --batch-size 128 --drop-path 0.2 --data-path ~/ILSVRC2012/ --output_dir /data/SwinTransformer_exp/SwinT/
```
Train Swin-S:

```bash
python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --model Swin_S \
    --batch-size 128 --drop-path 0.3 --data-path ~/ILSVRC2012/ --output_dir /data/SwinTransformer_exp/SwinS/
```
Train Swin-B:

```bash
python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py --model Swin_B \
    --batch-size 128 --drop-path 0.5 --data-path ~/ILSVRC2012/ --output_dir /data/SwinTransformer_exp/SwinB/
```
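Note: on recent PyTorch releases, `torch.distributed.launch` is deprecated; `torchrun --nproc_per_node=8 train.py ...` should work as a drop-in replacement for `python -m torch.distributed.launch --nproc_per_node=8 --use_env`. The `--drop-path` value is the stochastic depth rate, which increases with model size following the official recipe.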
## Reference
The training recipe involves many training and augmentation tricks, such as stochastic depth, mixup, CutMix, and random erasing. We borrow heavily from DeiT (https://github.com/facebookresearch/deit).
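As an illustration, mixup and CutMix are typically wired up in DeiT-style training via timm's `Mixup` helper. The hyperparameters below are illustrative, not necessarily this repo's exact settings; stochastic depth corresponds to the `--drop-path` flag above, and random erasing is applied inside the data transform:

```python
import torch
from timm.data import Mixup
from timm.loss import SoftTargetCrossEntropy

# Illustrative DeiT-style settings, not necessarily this repo's exact values.
mixup_fn = Mixup(mixup_alpha=0.8, cutmix_alpha=1.0,  # mixup / CutMix strengths
                 prob=1.0, switch_prob=0.5,          # always augment; randomly pick one of the two
                 label_smoothing=0.1, num_classes=1000)
criterion = SoftTargetCrossEntropy()  # mixed labels are soft, so use a soft-target loss

images = torch.randn(8, 3, 224, 224)            # dummy batch (Mixup requires an even batch size)
labels = torch.randint(0, 1000, (8,))
images, soft_labels = mixup_fn(images, labels)  # pixels mixed, labels turned into soft targets
loss = criterion(torch.randn(8, 1000), soft_labels)  # stand-in for criterion(model(images), ...)
```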
## Citations
```bibtex
@misc{liu2021swin,
    title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
    author={Ze Liu and Yutong Lin and Yue Cao and Han Hu and Yixuan Wei and Zheng Zhang and Stephen Lin and Baining Guo},
    year={2021},
    eprint={2103.14030},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```