This is the official implementation of "Video Swin Transformer".

Overview

Video Swin Transformer

By Ze Liu*, Jia Ning*, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin and Han Hu.

This repo is the official implementation of "Video Swin Transformer". It is based on mmaction2.

Updates

06/25/2021 Initial commits

Introduction

Video Swin Transformer is initially described in "Video Swin Transformer", which advocates an inductive bias of locality in video Transformers, leading to a better speed-accuracy trade-off compared to previous approaches that compute self-attention globally even with spatial-temporal factorization. The locality of the proposed video architecture is realized by adapting the Swin Transformer designed for the image domain, while continuing to leverage the power of pre-trained image models. Our approach achieves state-of-the-art accuracy on a broad range of video recognition benchmarks, including action recognition (84.9 top-1 accuracy on Kinetics-400 and 86.1 top-1 accuracy on Kinetics-600 with ~20x less pre-training data and ~3x smaller model size) and temporal modeling (69.6 top-1 accuracy on Something-Something v2).

Results and Models

Kinetics 400

Backbone Pretrain Lr Schd spatial crop acc@1 acc@5 #params FLOPs config model
Swin-T ImageNet-1K 30ep 224 78.8 93.6 28M 87.9G config github/baidu
Swin-S ImageNet-1K 30ep 224 80.6 94.5 50M 165.9G config github/baidu
Swin-B ImageNet-1K 30ep 224 80.6 94.6 88M 281.6G config github/baidu
Swin-B ImageNet-22K 30ep 224 82.7 95.5 88M 281.6G config github/baidu

Kinetics 600

Backbone Pretrain Lr Schd spatial crop acc@1 acc@5 #params FLOPs config model
Swin-B ImageNet-22K 30ep 224 84.0 96.5 88M 281.6G config github/baidu

Something-Something V2

Backbone Pretrain Lr Schd spatial crop acc@1 acc@5 #params FLOPs config model
Swin-B Kinetics 400 60ep 224 69.6 92.7 89M 320.6G config github/baidu

Notes:

The pre-trained image models used for initialization can be downloaded from the Swin Transformer image classification repository (https://github.com/microsoft/Swin-Transformer).

Usage

Installation

Please refer to install.md for installation.

We also provide Dockerfiles for CUDA 10.1 and CUDA 11.0 for convenient usage.

Data Preparation

Please refer to data_preparation.md for general guidance on data preparation. The supported datasets are listed in supported_datasets.md.

Inference

# single-gpu testing
python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --eval top_k_accuracy

# multi-gpu testing
bash tools/dist_test.sh <CONFIG_FILE> <CHECKPOINT_FILE> <GPU_NUM> --eval top_k_accuracy
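
For example, to evaluate the Swin-T Kinetics-400 model on a single GPU, you can run the command below; the checkpoint path is a placeholder for wherever you saved the downloaded weights, not a file shipped with the repo.

# single-gpu testing of Swin-T on Kinetics-400 (checkpoint path is a placeholder)
python tools/test.py configs/recognition/swin/swin_tiny_patch244_window877_kinetics400_1k.py checkpoints/swin_tiny_patch244_window877_kinetics400_1k.pth --eval top_k_accuracy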

Training

To train a video recognition model with pre-trained image models (for the Kinetics-400 and Kinetics-600 datasets), run:

# single-gpu training
python tools/train.py <CONFIG_FILE> --cfg-options model.backbone.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-gpu training
bash tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options model.backbone.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

For example, to train a Swin-T model on the Kinetics-400 dataset with 8 GPUs, run:

bash tools/dist_train.sh configs/recognition/swin/swin_tiny_patch244_window877_kinetics400_1k.py 8 --cfg-options model.backbone.pretrained=<PRETRAIN_MODEL> 

To train a video recognizer with pre-trained video models (for the Something-Something v2 dataset), run:

# single-gpu training
python tools/train.py <CONFIG_FILE> --cfg-options load_from=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-gpu training
bash tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options load_from=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

For example, to train a Swin-B model on the SSv2 dataset with 8 GPUs, run:

bash tools/dist_train.sh configs/recognition/swin/swin_base_patch244_window1677_sthv2.py 8 --cfg-options load_from=<PRETRAIN_MODEL>

Note: use_checkpoint is used to save GPU memory. Please refer to this page for more details.

Apex (optional):

We use apex for mixed precision training by default. To install apex, use our provided docker or run:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
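
As a quick sanity check after installation, you can verify that the apex.amp module imports correctly (this assumes the standard NVIDIA apex package built above):

# verify that apex was built and is importable
python -c "from apex import amp; print('apex amp is available')"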

If you would like to disable apex, comment out the following code block in the configuration files:

# do not use mmcv version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
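
Equivalently, instead of deleting the block, you can replace it with the stock mmcv optimizer hook. A minimal sketch, assuming plain fp32 training with no gradient clipping (adjust grad_clip as needed):

# plain fp32 training without the apex-based DistOptimizerHook
fp16 = None
optimizer_config = dict(grad_clip=None)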

Citation

If you find our work useful in your research, please cite:

@article{liu2021video,
  title={Video Swin Transformer},
  author={Liu, Ze and Ning, Jia and Cao, Yue and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Hu, Han},
  journal={arXiv preprint arXiv:2106.13230},
  year={2021}
}

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

Other Links

Image Classification: See Swin Transformer for Image Classification.

Object Detection: See Swin Transformer for Object Detection.

Semantic Segmentation: See Swin Transformer for Semantic Segmentation.

Self-Supervised Learning: See MoBY with Swin Transformer.

Issues
  • KeyError: "Recognizer3D: 'SwinTransformer3D is not in the models registry'"

    When I use python tools/train.py configs/recognition/swin/swin_base_patch244_window877_kinetics400_22k.py,

    an error occurred:

    Traceback (most recent call last):
      File "tools/train.py", line 199, in <module>
        main()
      File "tools/train.py", line 154, in main
        model = build_model(
      File "/home/pytorch/lib/python3/site-packages/mmaction/models/builder.py", line 70, in build_model
        return build_localizer(cfg)
      File "/home/pytorch/lib/python3/site-packages/mmaction/models/builder.py", line 62, in build_localizer
        return LOCALIZERS.build(cfg)
      File "/home/pytorch/lib/python3/site-packages/mmcv/utils/registry.py", line 210, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "/home/pytorch/lib/python3/site-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "/home/pytorch/lib/python3/site-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    KeyError: "Recognizer3D: 'SwinTransformer3D is not in the models registry'"

    How to solve it?

    opened by Note-Liu 8
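
    A quick diagnostic for this class of error, assuming the mmaction2 0.x layout (BACKBONES registry exported from mmaction.models, mmcv-style Registry with a module_dict attribute): the traceback above resolves mmaction from site-packages, so a pip-installed mmaction2 is likely shadowing this repository's code, which is where SwinTransformer3D is registered.

    # which mmaction is being imported? It should point into this repository, not site-packages.
    python -c "import mmaction; print(mmaction.__file__)"

    # is the backbone present in the BACKBONES registry?
    python -c "from mmaction.models import BACKBONES; print('SwinTransformer3D' in BACKBONES.module_dict)"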
  • Inaccessible Download Links

    The download links for the Kinetics-400 pretrained models are on pan.baidu.com. Many people are not able to download these at all, because you need to create an account (with a phone number) to download files from that site. If you are in Germany or the UK, like me, it is not possible to create an account to download these. Please host them somewhere else to make them available to the general public.

    opened by RaivoKoot 2
  • Where can I find the <PRETRAIN_MODEL>?

    Hi, thanks for this fascinating work! I want to follow the instructions bash tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options load_from=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments] to run the program, but I don't know where I can find the pretrained model. I need some help, thanks to all of you!

    opened by wsh-nie 2
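
    For the ImageNet-pretrained image backbones, see the Owner note at the bottom of this page: they are released in the Swin Transformer image-classification repository (https://github.com/microsoft/Swin-Transformer). A hedged example, assuming the Swin-T ImageNet-1K checkpoint has been downloaded locally as swin_tiny_patch4_window7_224.pth (the exact filename may differ by release):

    bash tools/dist_train.sh configs/recognition/swin/swin_tiny_patch244_window877_kinetics400_1k.py 8 --cfg-options model.backbone.pretrained=./swin_tiny_patch4_window7_224.pth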
  • About the 3D relative position bias

    In the subsection "3D relative position bias" of your paper, a bias is added in the self-attention computation. I don't fully understand it. [screenshot of the attention equation omitted]

    According to your description, Q, K, V are all matrices with P*M^2 rows and d columns, so QK^T will be a square matrix with P*M^2 rows and P*M^2 columns. To make the summation valid, the 3D relative position bias B should also be a square matrix with P*M^2 rows and P*M^2 columns. So how are the values in B set? Specifically, how is the entry B(i,j) set? I can't see the link between B and the quantity shown in the second screenshot. [screenshot omitted]

    opened by TangMinLeo 2
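
    For readers with the same question, the sketch below shows how such a bias table is typically indexed in the Swin family: a learnable table with (2P-1)(2M-1)(2M-1) entries per head is built, and B(i, j) is a lookup of the relative (temporal, height, width) offset between tokens i and j inside the P x M x M window, so B ends up as the P*M^2 x P*M^2 matrix described above. This is an illustrative reconstruction following the public 2D Swin recipe, not code copied from this repository; P and M below are assumed window sizes.

    import torch

    P, M = 2, 7  # assumed temporal and spatial window sizes
    coords = torch.stack(torch.meshgrid(
        torch.arange(P), torch.arange(M), torch.arange(M)))           # (3, P, M, M)
    coords_flat = torch.flatten(coords, 1)                            # (3, N) with N = P*M*M
    rel = coords_flat[:, :, None] - coords_flat[:, None, :]           # (3, N, N) pairwise offsets
    rel = rel.permute(1, 2, 0).contiguous()                           # (N, N, 3)
    rel[:, :, 0] += P - 1                                             # shift offsets to start at 0
    rel[:, :, 1] += M - 1
    rel[:, :, 2] += M - 1
    rel[:, :, 0] *= (2 * M - 1) * (2 * M - 1)                         # flatten 3D offset to a 1D index
    rel[:, :, 1] *= 2 * M - 1
    relative_position_index = rel.sum(-1)                             # (N, N) lookup indices

    bias_table = torch.zeros((2 * P - 1) * (2 * M - 1) * (2 * M - 1), 1)  # one head; learnable in practice
    B = bias_table[relative_position_index.view(-1)].view(P * M * M, P * M * M)
    print(B.shape)  # torch.Size([98, 98]) == (P*M^2, P*M^2)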
  • Performance Reproducing of Swin-S

    Hi, thanks for your great work. I'm trying to reproduce the performance of Swin-S on K-400. Using the released checkpoint for evaluation, I got 80.11% accuracy; evaluating a Swin-S model trained by myself, I got 80.35% accuracy (still ~0.2% worse than the paper reports). I wonder if anything is wrong. I suspect different validation data causes this, as some videos are missing from the current K-400 dataset. My validation set contains 19,870 videos and my training set contains 239,687 videos; how about the ones you use? Thanks a lot in advance. Best.

    opened by JaminFong 1
  • Which version of Kinetics400 do you use?

    There are many different versions of Kinetics-400, and some have more videos than others. Can I know which version you use and what the statistics of your train and test sets are, i.e., how many train and test videos you have?

    opened by yxchng 1
  • swin config file work_dir = work_dir  = ..?

    Why do the Swin Transformer config files have work_dir = work_dir = ... at https://github.com/SwinTransformer/Video-Swin-Transformer/blob/d13b5a30ce1d2398c376f228ab43759ebbe601d5/configs/recognition/swin/swin_base_patch244_window877_kinetics400_22k.py#L109?

    opened by g1910 1
  • Drop path rate

    Hi,

    https://github.com/SwinTransformer/Video-Swin-Transformer/blob/db018fb8896251711791386bbd2127562fd8d6a6/configs/recognition/swin/swin_small_patch244_window877_kinetics400_1k.py#L4

    The config above sets a drop path rate of 0.1 for Swin-S, but does that match the report, which reads 0.2? Swin-T and Swin-B use 0.1 and 0.3, respectively, as follows:

    https://github.com/SwinTransformer/Video-Swin-Transformer/blob/db018fb8896251711791386bbd2127562fd8d6a6/configs/recognition/swin/swin_tiny_patch244_window877_kinetics400_1k.py#L4 https://github.com/SwinTransformer/Video-Swin-Transformer/blob/db018fb8896251711791386bbd2127562fd8d6a6/configs/recognition/swin/swin_base_patch244_window877_kinetics400_1k.py#L4

    Thanks,

    opened by minostauros 1
  • What head to use ?

    Hi!

    There is a problem with the Video Swin Transformer code at the moment: it is written in a way that makes it impossible to change the number of target classes in an end-to-end fashion. Wanting to use your model on another dataset containing, for example, 10 or 50 classes, the network only gives me the backbone output rather than class predictions from a head.

    I build a model:

    from mmaction.models.backbones.swin_transformer import SwinTransformer3D  # import path taken from this repo

    model_VST = SwinTransformer3D()
    model_VST.cuda()

    You can see that I don't pass any class-number argument in the constructor parentheses; indeed, your code doesn't take that as an argument. Looking at the arguments that your Video Swin Transformer model takes as input, there is nothing for the number of target classes.
    Right now, for an input shape of torch.Size([1, 8, 3, 64, 64]), I get an output shape of torch.Size([1, 768, 2, 2, 2]) from model_VST (which is the SwinTransformer3D).

    I understand that I need to add a head to it, but it is not clear at all in your code how to properly manage that. What head should I use?

    Maybe you can provide something like Facebook did with their TimeSformer: they made an end-to-end version of it for video classification.

    opened by Scienceseb 1
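
    A minimal sketch of one way to attach a classification head on top of the backbone's (N, C, T', H', W') output, mirroring the role of the cls_head entries of type I3DHead in this repo's configs (global average pooling, dropout, then a linear layer). The 768-channel width matches the Swin-T output shape quoted above (Swin-B uses 1024, as in the config dump further down); num_classes=10 is just an example.

    import torch
    import torch.nn as nn

    class SimpleVideoClsHead(nn.Module):
        def __init__(self, in_channels=768, num_classes=10, dropout=0.5):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool3d(1)      # global spatio-temporal average pooling
            self.dropout = nn.Dropout(dropout)
            self.fc = nn.Linear(in_channels, num_classes)

        def forward(self, feats):                    # feats: (N, C, T', H', W')
            x = self.pool(feats).flatten(1)          # (N, C)
            return self.fc(self.dropout(x))          # (N, num_classes)

    head = SimpleVideoClsHead(in_channels=768, num_classes=10)
    logits = head(torch.randn(1, 768, 2, 2, 2))      # backbone output shape reported in this issue
    print(logits.shape)                              # torch.Size([1, 10])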
  • Keeping the temporal dimension

    Hi, thanks for your fascinating work!

    I want to use the Video Swin Transformer as a backbone, but my model should produce an output for each input frame. Thus I want to keep the temporal dimension of the input after the forward pass.

    So I'm thinking of changing the parameter to patch_size=(1,4,4), but I am concerned about whether this could violate the authors' intention of building spatio-temporal features.

    Apart from the memory usage issue, is it okay to set the temporal patch size of the patch embedding to 1?

    opened by hongsukchoi 1
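
    For reference, a sketch of what this change looks like; the class name and import path are taken from the tracebacks elsewhere on this page, the argument name is assumed from the configs, and the printed shape is what is expected under the default embed_dim and depths. Since patch merging downsamples only spatially, the temporal length after patch embedding is preserved through all stages.

    import torch
    from mmaction.models.backbones.swin_transformer import SwinTransformer3D

    # a temporal patch size of 1 keeps every input frame as a temporal position
    backbone = SwinTransformer3D(patch_size=(1, 4, 4))
    x = torch.randn(1, 3, 8, 64, 64)   # (N, C, T, H, W)
    feats = backbone(x)
    print(feats.shape)                 # expected (1, 768, 8, 2, 2) under these assumptions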
  • Details about input frames

    Hi there,

    Could you please explain "we sample a clip of 32 frames from each full-length video using a temporal stride of 2 and spatial size of 224×224, resulting in 16×56×56 input 3D tokens" in detail? How do you sample a clip? Does the temporal stride of 2 mean 2 FPS?

    opened by luohwu 0
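
    For context, "temporal stride of 2" here refers to the spacing between sampled frames (every second frame of the raw video), not a fixed 2 FPS; the 16 temporal tokens then come from the patch embedding's temporal patch size of 2 (32 / 2 = 16). In mmaction2-style configs this sampling is typically written as in the sketch below; the exact pipelines in this repo's configs may contain additional transforms.

    # sketch of the frame-sampling part of an mmaction2-style training pipeline
    train_pipeline_excerpt = [
        dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='RandomResizedCrop'),
        dict(type='Resize', scale=(224, 224), keep_ratio=False),
    ]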
  • skeleton based

    Does this repo support skeleton-based action recognition?

    opened by henbucuoshanghai 0
  • INSTALL

    Dear the Authors,

    I would like to ask how we can install Video-Swin-Transformer, and whether there is a notebook tutorial for training?

    Thank you very much.

    opened by wenjun90 1
  • Do you have any result on other video dataset, like Charades?

    I fine-tuned Swin-B on Charades with the following settings:

    1. optimizer: AdamW with lr=75e-6, betas=(0.9, 0.999), weight_decay=5e-2; other settings just follow the config that you provided.
    2. learning policy: CosineAnnealing with linear warmup for 2.5 epochs.
    3. loss function: AsymmetricLoss [1] with neg=4 and pos=1
    4. train_pipeline: clip_len=32, frame_interval=2, num_clips=1, with RandomRescale (256, 340) following the setting in the SlowFast network, RandomResizedCrop, Resize(224, 224) and Flip(0.5)
    5. val_pipeline: clip_len=32, frame_interval=2, num_clips=10, Resize(-1, 256), CenterCrop(256), Flip(0.5)

    With 30 total epochs I got a final val mAP of 44.96; with 60 total epochs I got a final val mAP of 45.88. Is my result correct? Do you have any suggestions for fine-tuning Swin on other datasets?

    ref: [1] Ben-Baruch, E., Ridnik, T., Zamir, N., Noy, A., Friedman, I., Protter, M., & Zelnik-Manor, L. (2020). Asymmetric loss for multi-label classification. arXiv preprint arXiv:2009.14119. code: https://github.com/Alibaba-MIIL/ASL

    opened by visaVita 0
  • How to get parameters and FLOPs values in video swin transformer model?

    Does mmaction2 provide this functionality through its config files or tools? Thanks for your work.

    opened by discaptain 1
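
    A hedged pointer: upstream mmaction2 ships a FLOPs/parameter counter at tools/analysis/get_flops.py, so if that tool is present in this fork, something like the command below should work; the --shape format for 3D recognizers is an assumption, so check the script's help in your checkout.

    python tools/analysis/get_flops.py configs/recognition/swin/swin_tiny_patch244_window877_kinetics400_1k.py --shape 1 3 32 224 224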
  • KeyError: 'filename'

    I referred to https://github.com/SwinTransformer/Video-Swin-Transformer/blob/master/docs/tutorials/3_new_dataset.md to prepare a custom dataset. My annotations look like [screenshot omitted], but I get the error [screenshot omitted]. I changed results['filename'] to results['filename_tmpl'], but then other errors occur. I want to know how to solve it, thanks!

    opened by clannadcl 3
  • ValueError: batch_size should be a positive integer value, but got batch_size=0

    If you feel we have helped you, give us a STAR! :satisfied:

    Notice

    There are several common situations in the reimplementation issues as below

    1. Reimplement a model in the model zoo using the provided configs
    2. Reimplement a model in the model zoo on other dataset (e.g., custom datasets)
    3. Reimplement a custom model but all the components are implemented in MMAction2
    4. Reimplement a custom model with new modules implemented by yourself

    There are several things to do for different cases as below.

    • For cases 1 & 3, please follow the steps in the following sections so we can quickly identify the issue.
    • For cases 2 & 4, please understand that we are not able to help much here, because we usually do not know the full code and users are responsible for the code they write.
    • One suggestion for cases 2 & 4 is to first check whether the bug lies in the self-implemented code or the original code. For example, users can first make sure that the same model runs well on supported datasets. If you still need help, please describe what you have done and what you obtained in the issue, follow the steps in the following sections, and be as clear as possible so that we can better help you.

    Checklist

    1. I have searched related issues but cannot get the expected help.
    2. The issue has not been fixed in the latest version.

    Describe the issue

    The problem of CUDA running out of memory appeared during model reimplementation. I adjusted videos_per_gpu to 1 (https://github.com/SwinTransformer/Video-Swin-Transformer/blob/db018fb8896251711791386bbd2127562fd8d6a6/configs/recognitionow py#L66), and a new problem occurred.

    Reproduction

    1. What command or script did you run? python tools/train.py 'configs/recognition/swin/swin_base_patch244_window1677_sthv2.py'
    2. What config dir you run? configs/recognition/swin/swin_base_patch244_window1677_sthv2.py
    3. Did you make any modifications on the code or config? Did you understand what you have modified? I adjusted videos_per_gpu to 1 (https://github.com/SwinTransformer/Video-Swin-Transformer/blob/db018fb8896251711791386bbd2127562fd8d6a6/configs/recognitionow py#L66)
    4. What dataset did you use? sthv2

    Environment

    1. Please run PYTHONPATH=${PWD}:$PYTHONPATH python mmaction/utils/collect_env.py to collect necessary environment information and paste it here.

    fatal: Not a git repository (or any parent up to mount point /home)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    sys.platform: linux
    Python: 3.6.10 (default, Dec 19 2019, 23:04:32) [GCC 5.4.0 20160609]
    CUDA available: True
    GPU 0,1,2,3,4,5: TITAN Xp
    CUDA_HOME: /usr/local/cuda-10.2
    NVCC: Cuda compilation tools, release 10.2, V10.2.89
    GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
    PyTorch: 1.6.0
    PyTorch compiling details: PyTorch built with:

    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 10.2
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75
    • CuDNN 7.6.5
    • Magma 2.5.2
    • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

    TorchVision: 0.7.0 OpenCV: 4.4.0 MMCV: 1.3.14 MMCV Compiler: GCC 5.4 MMCV CUDA Compiler: 10.2 MMAction2: 0.18.0+

    2. You may add additional information that may be helpful for locating the problem, such as
      1. How you installed PyTorch [e.g., pip, conda, source]
      2. Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

    Results

    "but got batch_size={}".format(batch_size)) ValueError: batch_size should be a positive integer value, but got batch_size=0


    Issue fix

    If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

    opened by YZW-explorer 1
  • Official Pytorch API or model?

    Hi!

    I'm a researcher planning to use this to classify time-lapse videos of biomedical data. Is there any official PyTorch API with pretrained weights?

    I'm currently using ResNet 3D that is available off-the-shelf in Pytorch https://pytorch.org/vision/stable/models.html#video-classification

    But I believe transformers will give me better results.

    There are also these repos: https://github.com/haofanwang/video-swin-transformer-pytorch https://github.com/berniwal/swin-transformer-pytorch

    But I'm having trouble getting it to work; I'd like to use official code if possible. I have also searched here without results: https://paperswithcode.com/paper/video-swin-transformer#code

    We only have grayscale images, so it would be great if it were possible to choose the number of channels (and classes).

    opened by cjh9 0
  • KeyError: 'patch_embed.proj.weight'

    Describe the bug

    When trying to fine-tune a pretrained model, the following error occurs: KeyError: 'patch_embed.proj.weight' for the line: state_dict['patch_embed.proj.weight'] = state_dict['patch_embed.proj.weight'].unsqueeze(2).repeat(1,1,self.patch_size[0],1,1) / self.patch_size[0]

    Reproduction

    1. What command or script did you run?
    python3 tools/train.py configs/recognition/swin/swin_small_patch244_window877_kinetics400_1k.py --cfg-options model.backbone.pretrained=pretrained/swin_small_patch244_window877_kinetics400_1k.pth model.backbone.use_checkpoint=True
    
    2. Did you make any modifications on the code or config? Did you understand what you have modified?
    • Only changed the following:
    --- a/mmaction/models/backbones/swin_transformer.py
    +++ b/mmaction/models/backbones/swin_transformer.py
    -        state_dict = checkpoint['model']
    +       state_dict = checkpoint['state_dict'] #checkpoint['model']
    
    
    3. What dataset did you use? kinetics-based

    Environment

    2. Please run PYTHONPATH=${PWD}:$PYTHONPATH python mmaction/utils/collect_env.py to collect necessary environment information and paste it here. sys.platform: linux Python: 3.8.5 (default, Jan 27 2021, 15:41:15) [GCC 9.3.0] CUDA available: True GPU 0,1,2,3: Quadro RTX 8000 CUDA_HOME: /usr/local/cuda NVCC: Build cuda_11.1.TC455_06.29190527_0 GCC: gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0 PyTorch: 1.9.0+cu102 PyTorch compiling details: PyTorch built with:

    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 10.2
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
    • CuDNN 7.6.5
    • Magma 2.5.2
    • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PT HREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing- field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-st rict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -falig ned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TO RCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

    TorchVision: 0.10.0+cu102 OpenCV: 4.5.3 MMCV: 1.3.13 MMCV Compiler: GCC 7.3 MMCV CUDA Compiler: 10.2 MMAction2: 0.15.0+db018fb

    Error traceback

    If applicable, paste the error traceback here.

    2021-09-12 22:44:38,014 - mmaction - INFO - load model from: pretrained/swin_small_patch244_window877_kinetics400_1k.pth
    Traceback (most recent call last):
      File "<venv dir>/lib/python3.8/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
      File "Video-Swin-Transformer/mmaction/models/recognizers/base.py", line 109, in __init__
        self.init_weights()
      File "Video-Swin-Transformer/mmaction/models/recognizers/base.py", line 126, in init_weights
        self.backbone.init_weights()
      File "Video-Swin-Transformer/mmaction/models/backbones/swin_transformer.py", line 641, in init_weights
        self.inflate_weights(logger)
      File "Video-Swin-Transformer/mmaction/models/backbones/swin_transformer.py", line 588, in inflate_weights
        state_dict['patch_embed.proj.weight'] = state_dict['patch_embed.proj.weight'].unsqueeze(2).repeat(1,1,self.patch_size[0],1,1) / self.patch_size[0]
    KeyError: 'patch_embed.proj.weight'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "tools/train.py", line 201, in <module>
        main()
      File "tools/train.py", line 156, in main
        model = build_model(
      File "Video-Swin-Transformer/mmaction/models/builder.py", line 70, in build_model
        return build_localizer(cfg)
      File "Video-Swin-Transformer/mmaction/models/builder.py", line 62, in build_localizer
        return LOCALIZERS.build(cfg)
      File "<venv_dir>/lib/python3.8/site-packages/mmcv/utils/registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "<venv dir>/lib/python3.8/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "<venv dir>/lib/python3.8/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    KeyError: "Recognizer3D: 'patch_embed.proj.weight'"
    
    

    Bug fix

    Looks like the pretrained models are compatible with an older version of mmaction - but I couldn't find which.

    Thanks!

    opened by r-kellerm 3
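
    One plausible cause, based on the README's own distinction above: model.backbone.pretrained goes through inflate_weights, which expects a 2D ImageNet checkpoint (with a 2D patch_embed.proj.weight to inflate), while already-3D video checkpoints such as the released Kinetics models are meant to be loaded with load_from instead. A hedged example using the paths from this issue:

    python tools/train.py configs/recognition/swin/swin_small_patch244_window877_kinetics400_1k.py --cfg-options load_from=pretrained/swin_small_patch244_window877_kinetics400_1k.pth model.backbone.use_checkpoint=True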
  • KeyError: "Recognizer3D: 'SwinTransformer3D is not in the models registry'"

    Describe the bug

    While running the training script "tools/train.py" this error occurs.

    Reproduction

    Run the command:

    python Video-Swin-Transformer/tools/train.py Video-Swin-Transformer/configs/recognition/swin/swin_base_patch244_window877_kinetics600_22k.py
    
    1. Did you make any modifications on the code or config? - No. Did you understand what you have modified? - No
    2. What dataset did you use? - Kinetics600

    Environment

    1. Please run PYTHONPATH=${PWD}:$PYTHONPATH python mmaction/utils/collect_env.py to collect necessary environment information and paste it here.
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    sys.platform: linux
    Python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
    CUDA available: True
    GPU 0: Tesla P100-PCIE-16GB
    CUDA_HOME: /usr/local/cuda
    NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
    GCC: gcc (Debian 8.3.0-6) 8.3.0
    PyTorch: 1.7.0
    PyTorch compiling details: PyTorch built with:
      - GCC 7.3
      - C++ Version: 201402
      - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 10.2
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75
      - CuDNN 7.6.5
      - Magma 2.5.2
      - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 
    
    TorchVision: 0.10.0+cu102
    OpenCV: 4.5.3
    MMCV: 1.3.12
    MMCV Compiler: GCC 7.3
    MMCV CUDA Compiler: 10.2
    MMAction2: 0.17.0+
    
    2. You may add additional information that may be helpful for locating the problem, such as
      • How you installed PyTorch [e.g., pip, conda, source] -- using pip
      • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.) - None

    Error traceback

    {'type': 'Recognizer3D', 'backbone': {'type': 'SwinTransformer3D', 'patch_size': (2, 4, 4), 'embed_dim': 128, 'depths': [2, 2, 18, 2], 'num_heads': [4, 8, 16, 32], 'window_size': (8, 7, 7), 'mlp_ratio': 4.0, 'qkv_bias': True, 'qk_scale': None, 'drop_rate': 0.0, 'attn_drop_rate': 0.0, 'drop_path_rate': 0.2, 'patch_norm': True}, 'cls_head': {'type': 'I3DHead', 'in_channels': 1024, 'num_classes': 600, 'spatial_type': 'avg', 'dropout_ratio': 0.5}, 'test_cfg': {'average_clips': 'prob', 'max_testing_views': 2}}
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
      File "/opt/conda/lib/python3.7/site-packages/mmaction/models/recognizers/base.py", line 75, in __init__
        self.backbone = builder.build_backbone(backbone)
      File "/opt/conda/lib/python3.7/site-packages/mmaction/models/builder.py", line 29, in build_backbone
        return BACKBONES.build(cfg)
      File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "/opt/conda/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 45, in build_from_cfg
        f'{obj_type} is not in the {registry.name} registry')
    KeyError: 'SwinTransformer3D is not in the models registry'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "Video-Swin-Transformer/tools/train.py", line 196, in <module>
        main()
      File "Video-Swin-Transformer/tools/train.py", line 154, in main
        model = build_model(cfg.model,train_cfg=cfg.get('train_cfg'),test_cfg=cfg.get('test_cfg'))
      File "/opt/conda/lib/python3.7/site-packages/mmaction/models/builder.py", line 70, in build_model
        return build_localizer(cfg)
      File "/opt/conda/lib/python3.7/site-packages/mmaction/models/builder.py", line 62, in build_localizer
        return LOCALIZERS.build(cfg)
      File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "/opt/conda/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    KeyError: "Recognizer3D: 'SwinTransformer3D is not in the models registry'"
    

    Other packages versions

    mmcv-full == 1.3.12 pytorch==1.7.0 mmaction2==0.18.0 mmdet == 2.16.0 scipy==1.6.3 numpy==1.19.5

    opened by yugrocks 6
Owner
Swin Transformer
This organization maintains repositories built on Swin Transformer. The pretrained models are located at https://github.com/microsoft/Swin-Transformer