UniFormer - official implementation of UniFormer

Overview


This repo is the official implementation of "Uniformer: Unified Transformer for Efficient Spatiotemporal Representation Learning". It currently includes code and models for image classification and video classification.

Updates

01/13/2022

[Initial commits]:

  1. Pretrained models on ImageNet-1K, Kinetics-400, Kinetics-600, Something-Something V1&V2

  2. Code and models for image classification and video classification are provided.

Introduction

UniFormer (Unified transFormer) is introduced in our arXiv paper. It effectively unifies 3D convolution and spatiotemporal self-attention in a concise transformer format. We adopt local MHRA (Multi-Head Relation Aggregation) in shallow layers to greatly reduce the computation burden, and global MHRA in deep layers to learn global token relations.
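
A minimal PyTorch sketch of this design is shown below. It is not the repository code: the layer names, kernel size, and head count are illustrative assumptions, with a depthwise 3D convolution standing in for local MHRA and multi-head self-attention over all spatiotemporal tokens standing in for global MHRA.

    import torch
    import torch.nn as nn

    class LocalMHRA(nn.Module):
        """Local relation aggregation: a depthwise 3D convolution over nearby tokens."""
        def __init__(self, dim, kernel_size=5):
            super().__init__()
            self.conv = nn.Conv3d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)

        def forward(self, x):          # x: (B, C, T, H, W)
            return x + self.conv(x)    # cheap local aggregation with a residual connection

    class GlobalMHRA(nn.Module):
        """Global relation aggregation: self-attention over all spatiotemporal tokens."""
        def __init__(self, dim, num_heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        def forward(self, x):          # x: (B, C, T, H, W)
            b, c, t, h, w = x.shape
            tokens = x.flatten(2).transpose(1, 2)      # (B, T*H*W, C)
            out, _ = self.attn(tokens, tokens, tokens)
            return x + out.transpose(1, 2).reshape(b, c, t, h, w)

    video = torch.randn(1, 64, 8, 14, 14)  # (batch, channels, frames, height, width)
    x = LocalMHRA(64)(video)               # shallow stages: local MHRA, low cost
    x = GlobalMHRA(64)(x)                  # deep stages: global MHRA, long-range relations
    print(x.shape)                         # torch.Size([1, 64, 8, 14, 14])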

UniFormer achieves strong performance on video classification. With only ImageNet-1K pretraining, it achieves 82.9%/84.8% top-1 accuracy on Kinetics-400/Kinetics-600 while requiring 10x fewer GFLOPs than other comparable methods (e.g., 16.7x fewer GFLOPs than ViViT with JFT-300M pretraining). On Something-Something V1 and V2, UniFormer achieves 60.9% and 71.2% top-1 accuracy respectively, which are new state-of-the-art results.

[teaser figure]

Main results on ImageNet-1K

Please see image_classification for more details.

More models with large resolution and token labeling will be released soon.

| Model | Pretrain | Resolution | Top-1 (%) | #Param. | FLOPs |
| UniFormer-S | ImageNet-1K | 224x224 | 82.9 | 22M | 3.6G |
| UniFormer-S† | ImageNet-1K | 224x224 | 83.4 | 24M | 4.2G |
| UniFormer-B | ImageNet-1K | 224x224 | 83.9 | 50M | 8.3G |
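
For reference, loading one of these checkpoints for single-image inference can look roughly like the hedged sketch below. The working directory, file names, and the 'model'-key handling are assumptions based on common usage of this repo; please follow image_classification for the exact recipe.

    # Assumed to run from inside image_classification/ with uniformer_small_in1k.pth downloaded.
    import torch
    import torch.nn.functional as F
    import torchvision.transforms as T
    from PIL import Image
    from models import uniformer

    model = uniformer.uniformer_small()
    weights = torch.load('uniformer_small_in1k.pth', map_location='cpu')
    model.load_state_dict(weights['model'] if 'model' in weights else weights)
    model.eval()

    transform = T.Compose([
        T.Resize(224), T.CenterCrop(224), T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    image = transform(Image.open('example.jpg').convert('RGB')).unsqueeze(0)  # (1, 3, 224, 224)
    with torch.no_grad():
        probs = F.softmax(model(image), dim=1)
    print(probs.argmax(dim=1))  # predicted ImageNet-1K class index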

Main results on Kinetics-400

Please see video_classification for more details.

| Model | Pretrain | #Frame | Sampling Method | FLOPs | K400 Top-1 (%) | K600 Top-1 (%) |
| UniFormer-S | ImageNet-1K | 16x1x4 | 16x4 | 167G | 80.8 | 82.8 |
| UniFormer-S | ImageNet-1K | 16x1x4 | 16x8 | 167G | 80.8 | 82.7 |
| UniFormer-S | ImageNet-1K | 32x1x4 | 32x4 | 438G | 82.0 | - |
| UniFormer-B | ImageNet-1K | 16x1x4 | 16x4 | 387G | 82.0 | 84.0 |
| UniFormer-B | ImageNet-1K | 16x1x4 | 16x8 | 387G | 81.7 | 83.4 |
| UniFormer-B | ImageNet-1K | 32x1x4 | 32x4 | 1036G | 82.9 | 84.5* |

* Since Kinetics-600 is too large to train on a single node (more than 1 month with 8 A100 GPUs), we provide a model trained on multiple nodes (around 2 weeks with 32 V100 GPUs). The result is lower because the hyperparameters were not fully tuned.

Main results on Something-Something

Please see video_classification for more details.

| Model | Pretrain | #Frame | FLOPs | SSV1 Top-1 (%) | SSV2 Top-1 (%) |
| UniFormer-S | K400 | 16x3x1 | 125G | 57.2 | 67.7 |
| UniFormer-S | K600 | 16x3x1 | 125G | 57.6 | 69.4 |
| UniFormer-S | K400 | 32x3x1 | 329G | 58.8 | 69.0 |
| UniFormer-S | K600 | 32x3x1 | 329G | 59.9 | 70.4 |
| UniFormer-B | K400 | 16x3x1 | 290G | 59.1 | 70.4 |
| UniFormer-B | K600 | 16x3x1 | 290G | 58.8 | 70.2 |
| UniFormer-B | K400 | 32x3x1 | 777G | 60.9 | 71.1 |
| UniFormer-B | K600 | 32x3x1 | 777G | 61.0 | 71.2 |

Main results on downstream tasks

We have conducted extensive experiments on downstream tasks and achieved results comparable with state-of-the-art models.

Code and models will be released in two weeks.

Cite Uniformer

If you find this repository useful, please use the following BibTeX entry for citation.

@misc{li2022uniformer,
      title={Uniformer: Unified Transformer for Efficient Spatiotemporal Representation Learning}, 
      author={Kunchang Li and Yali Wang and Peng Gao and Guanglu Song and Yu Liu and Hongsheng Li and Yu Qiao},
      year={2022},
      eprint={2201.04676},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Contributors and Contact Information

UniFormer is maintained by Kunchang Li.

For help or issues using UniFormer, please submit a GitHub issue.

For other communications related to UniFormer, please contact Kunchang Li ([email protected]).

Comments
  • Huggingface Spaces

    Hi, would you be interested in sharing a web demo on Huggingface Spaces for UniFormer?

    It would make this model more accessible as it would allow people to try out the model directly from the browser. Some other recent machine learning model repos have set up Spaces for easy access:

    github: https://github.com/salesforce/BLIP Spaces: https://huggingface.co/spaces/akhaliq/BLIP

    github: https://github.com/facebookresearch/omnivore Spaces: https://huggingface.co/spaces/akhaliq/omnivore

    Spaces is completely free, and I can help set up a Gradio Space. Here are some getting-started instructions if you'd prefer to do it yourself: https://huggingface.co/blog/gradio-spaces

    opened by AK391 18
  • Training hangs at the end of the first epoch in image classification task.

    Dear author:

    When I train the UniFormer model with 8 GPUs, I start the code with the following run.sh:

    work_path=$(dirname $0)
    PYTHONPATH=$PYTHONPATH:../../ \
    python -m torch.distributed.launch --nproc_per_node=8 --master_port=22335 --use_env main.py \
        --model uniformer_base \
        --batch-size 64 \
        --num_workers 8 \
        --drop-path 0.3 \
        --epoch 300 \
        --dist-eval \
        --output_dir ${work_path}/ckpt \
        2>&1 | tee -a ${work_path}/log.txt
    

    The logs are as follows (I have removed the model details):

    /home/data/user/local/anaconda3/envs/uniformer/lib/python3.9/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
      logger.warn(
    The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
    INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
      entrypoint       : main.py
      min_nodes        : 1
      max_nodes        : 1
      nproc_per_node   : 8
      run_id           : none
      rdzv_backend     : static
      rdzv_endpoint    : 127.0.0.1:22335
      rdzv_configs     : {'rank': 0, 'timeout': 900}
      max_restarts     : 3
      monitor_interval : 5
      log_dir          : None
      metrics_cfg      : {}
    
    INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_yejsfquq/none_6nw5du9x
    INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
    INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
    /home/data/user/local/anaconda3/envs/uniformer/lib/python3.9/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
      warnings.warn(
    INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
      restart_count=0
      master_addr=127.0.0.1
      master_port=22335
      group_rank=0
      group_world_size=1
      local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
      role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
      global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
      role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
      global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
    
    INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
    INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/0/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/1/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/2/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/3/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/4/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/5/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/6/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_yejsfquq/none_6nw5du9x/attempt_0/7/error.json
    Please update your PyTorchVideo to latest master
    Please update your PyTorchVideo to latest master
    Please update your PyTorchVideo to latest master
    Please update your PyTorchVideo to latest master
    Please update your PyTorchVideo to latest master
    Please update your PyTorchVideo to latest master
    Please update your PyTorchVideo to latest master
    Please update your PyTorchVideo to latest master
    | distributed init (rank 3): env://
    | distributed init (rank 6): env://
    | distributed init (rank 1): env://
    | distributed init (rank 0): env://
    | distributed init (rank 5): env://
    | distributed init (rank 2): env://
    | distributed init (rank 7): env://
    | distributed init (rank 4): env://
    [W ProcessGroupNCCL.cpp:1569] Rank 0 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
    [W ProcessGroupNCCL.cpp:1569] Rank 4 using best-guess GPU 4 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
    [W ProcessGroupNCCL.cpp:1569] Rank 5 using best-guess GPU 5 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
    [W ProcessGroupNCCL.cpp:1569] Rank 3 using best-guess GPU 3 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
    [W ProcessGroupNCCL.cpp:1569] Rank 2 using best-guess GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
    [W ProcessGroupNCCL.cpp:1569] Rank 6 using best-guess GPU 6 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
    [W ProcessGroupNCCL.cpp:1569] Rank 1 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
    [W ProcessGroupNCCL.cpp:1569] Rank 7 using best-guess GPU 7 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
    Warning: Enabling distributed evaluation with an eval dataset not divisible by process number. This will slightly alter validation results as extra duplicate entries are added to achieve equal num of samples per-process.
    Creating model: uniformer_base
    number of params: 49468752
    Start training for 300 epochs
    [W reducer.cpp:1158] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration,  which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
    [... the same find_unused_parameters warning is printed once by each of the 8 ranks ...]
    Epoch: [0]  [0/9]  eta: 0:01:24  lr: 0.000001  loss: 6.0915 (6.0915)  time: 9.3436  data: 3.3803  max mem: 11905
    Epoch: [0]  [8/9]  eta: 0:00:02  lr: 0.000001  loss: 6.0828 (6.0922)  time: 2.0037  data: 0.3758  max mem: 12141
    Epoch: [0] Total time: 0:00:18 (2.0372 s / it)
    Averaged stats: lr: 0.000001  loss: 6.0828 (6.0883)
    [W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
    [... the same pthreadpool warning is repeated many more times ...]
    

    The displayed logs are very messy and maddening, and I have no clue whether the code is running correctly. These warnings only occur with distributed training. Have you ever encountered this situation? I would appreciate any advice you can give.

    opened by realgump 17
  • How to use the pretrained model uniformer_base_in1k.pth as my backbone ?

    There are some problems when I use the pre-trained model uniformer_base_in1k.pth as my backbone. Missing keys: ['patch_embed1.norm.weight', 'patch_embed1.norm.bias', 'patch_embed1.proj.weight', 'patch_embed1.proj.bias', 'patch_embed2.norm.weight', ..... Unexpected keys: ['model']
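
    For what it is worth, a hedged sketch of the kind of loading that usually addresses the "unexpected keys: ['model']" part is shown below: the checkpoint appears to store its weights under a top-level 'model' key, so it needs to be unwrapped before calling load_state_dict. The module and file names are assumptions.

    import torch
    from models import uniformer  # the image_classification models package

    backbone = uniformer.uniformer_base()
    checkpoint = torch.load('uniformer_base_in1k.pth', map_location='cpu')
    state_dict = checkpoint.get('model', checkpoint)  # unwrap the top-level 'model' key if present

    missing, unexpected = backbone.load_state_dict(state_dict, strict=False)
    print('missing keys:', missing)        # e.g. layers of a task-specific head you add yourself
    print('unexpected keys:', unexpected)  # e.g. the original classification head you do not need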

    opened by hongsheng-Z 16
  • AssertionError: The `num_classes` (54) in ConvFCBBoxHead of MMDistributedDataParallel does not matches the length of `CLASSES` 80) in CocoDataset

    CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh ./exp/cascade_mask_rcnn_3x_ms_hybrid_base/config.py 4 --cfg-options model.pretrained='/home/lbc/UniFormer/object_detection/pretrained/cascade_mask_rcnn_3x_ms_hybrid_base.pth'

    `_base_ = [ '../../configs/_base_/models/cascade_mask_rcnn_uniformer_fpn.py', '../../configs/_base_/datasets/coco_instance.py', '../../configs/_base_/schedules/schedule_1x.py', '../../configs/_base_/default_runtime.py' ]

    model = dict( backbone=dict( embed_dim=[64, 128, 320, 512], layers=[5, 8, 20, 7], head_dim=64, drop_path_rate=0.4, use_checkpoint=True, checkpoint_num=[0, 0, 20, 0], windows=False, hybrid=True, window_size=14 ), neck=dict(in_channels=[64, 128, 320, 512]), roi_head=dict( bbox_head=[ dict( type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=54, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0., 0., 0., 0.], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='SyncBN', requires_grad=True), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)), dict( type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=54, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0., 0., 0., 0.], target_stds=[0.05, 0.05, 0.1, 0.1]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='SyncBN', requires_grad=True), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)), dict( type='ConvFCBBoxHead', num_shared_convs=4, num_shared_fcs=1, in_channels=256, conv_out_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=54, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0., 0., 0., 0.], target_stds=[0.033, 0.033, 0.067, 0.067]), reg_class_agnostic=False, reg_decoded_bbox=True, norm_cfg=dict(type='SyncBN', requires_grad=True), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='GIoULoss', loss_weight=10.0)) ]))

    img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

    # augmentation strategy originates from DETR / Sparse RCNN

    train_pipeline = [ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='RandomFlip', flip_ratio=0.5), dict(type='AutoAugment', policies=[ [ dict(type='Resize', img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], multiscale_mode='value', keep_ratio=True) ], [ dict(type='Resize', img_scale=[(400, 1333), (500, 1333), (600, 1333)], multiscale_mode='value', keep_ratio=True), dict(type='RandomCrop', crop_type='absolute_range', crop_size=(384, 600), allow_negative_crop=True), dict(type='Resize', img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], multiscale_mode='value', override=True, keep_ratio=True) ] ]), dict(type='Normalize', **img_norm_cfg), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), ] data = dict(train=dict(pipeline=train_pipeline))

    optimizer = dict(delete=True, type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05, paramwise_cfg=dict(custom_keys={'absolute_pos_embed': dict(decay_mult=0.), 'relative_position_bias_table': dict(decay_mult=0.), 'norm': dict(decay_mult=0.)})) lr_config = dict(step=[27, 33]) runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)

    # do not use mmdet version fp16

    fp16 = None optimizer_config = dict( type="DistOptimizerHook", update_interval=1, grad_clip=None, coalesce=True, bucket_size_mb=-1, use_fp16=True, )`

    Using /home/lbc/miniconda3/envs/openmmlab_new/lib/python3.7/site-packages
    Finished processing dependencies for mmdet==2.11.0

    opened by Williamlizl 9
  • Video classification, k400 test, top1 is only 0.03

    I ran the test code and found that top-1 was only 0.03. I can guarantee that my data and labels are aligned. Since the author does not provide the dataset, I could only find it on the Internet, so the labels I used were all generated by myself from that dataset, but the results were very disappointing. In previous issues I found that other people had this problem too, so I regenerated test.csv based on the kinetics_400_categroies.txt provided by the author; in other words, the test.csv file must be regenerated using the author's kinetics_400_categroies.txt. I then ran the test again with the new kinetics_400_categroies.txt and test.csv, and this time top-1 was 0.81. This is really weird.
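
    A hedged sketch of that regeneration step is below. The file formats are assumptions (one class name per line, in index order, in kinetics_400_categroies.txt, and "path,class_name" rows in a hypothetical my_test_list.csv), so treat it as an outline rather than a drop-in script.

    # Build the class-name -> index mapping from the author's category file,
    # then rewrite the test list so each row carries the matching index.
    with open('kinetics_400_categroies.txt') as f:
        categories = [line.strip() for line in f if line.strip()]
    label_to_idx = {name: idx for idx, name in enumerate(categories)}

    with open('my_test_list.csv') as src, open('test.csv', 'w') as dst:
        for line in src:
            path, class_name = line.strip().split(',', 1)
            dst.write(f'{path} {label_to_idx[class_name]}\n')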

    opened by LeiYiNuist 8
  • Some questions about video classification.

    Hello, this is great work, but there are some things I want to ask:

    1. How should the train and val list files for the Something-Something V1 dataset be prepared? Are they in the same format as the Something-Something V2 files mentioned in dataset.md? Could you please provide them?

    2. I previously used the tools in the TSM project to extract frames for Something-Something V2. The difference is that the extracted frames are sparser: according to the train.csv you provide in dataset.md, video 1 has 117 frames in total, while in the TSM project the number is 45. So why do you adopt a much denser extraction rate, and how do different rates influence the final performance?
    opened by RongchangLi 8
  • Pretrained window/hybrid SABlock backbone model for Detection task

    Hi, thank you for the contribution to this super-rad work!

    I wonder whether, in your experiments, the backbone models used for the detection task with stage-3 window/hybrid SABlocks (S-h14, B-h14) need to be pretrained on ImageNet.

    If so, could these backbones with window/hybrid SABlocks be released? And if not, are the weights loaded directly from the regular model with a global window in stage-3?

    Thanks!

    opened by CanyonWind 8
  • Some questions about loading pretrain model and training.

    Dear author:

    I want to fine-tune your model on my own dataset for a video classification task, using the provided UniFormer-B model. However, I have met some problems.

    When I start training, the top-1 error is always 100 (see the attached screenshot).

    I have successfully fine-tuned on my dataset with the Facebook SlowFast code and different models before.

    Before training UniFormer, I have:

    • replaced the dataset folder with the one from my previous SlowFast project;
    • disabled AUG and MIXUP.

    Have I possibly made any mistakes?

    Besides that, I notice that the SlowFast code uses slowfast/utils/checkpoint.py to load the pretrained model, while the UniFormer code adds a new function in slowfast/model/uniformer.py and copies the original slowfast/utils/checkpoint.py to slowfast/utils/checkpoint_amp.py. What is the difference between loading the pretrained model with these two functions?

    opened by realgump 7
  • Strange test results using provided model checkpoints

    We (@omubarek, @VenerableSpace) have been running the UniFormer test only (TRAIN.ENABLE=FALSE) on the Kinetics-400 test split. We also specified the following provided model checkpoints via TEST.CHECKPOINT_FILE_PATH for two different runs, one with UniFormer-S and another with UniFormer-B:

    • Uniformer-S, #Frame: 8x1x4, Sampling Stride: 8, filename: uniformer_small_k400_8x8.pth
    • Uniformer-B, #Frame: 8x1x4, Sampling Stride: 8, filename: uniformer_base_k400_8x8.pth

    In both cases, we obtained very similar low top1 / top5 accuracies, such as: {"split": "test_final", "top1_acc": "0.04", "top5_acc": "0.85"}.

    We have tested the pre-processed data with models other than UniFormer and could reproduce their results.

    Could you please help us figure out what we may be missing?

    opened by omubarek 7
  • New dataset

    Hello, I ran into problems when trying to transfer to a new dataset (e.g., UCF101) for a trial training run. I have already read issue#56 and issue#17 but still could not solve it. In fact, following the dataset-creation method mentioned in issue#56, I successfully created and loaded the dataset. However, when loading the K400 pretrained weights for training, it reports the error shown in the attached screenshot.

    I did copy kinetics.py and replace every 'kinetics' with 'ucf101' to get ucf101.py, and I set MODEL.NUM_CLASSES = 101 in train.yaml, but the optimizer still seems to keep the Kinetics class count of 400. I do not know where this value should be modified and would appreciate your help.

    opened by LEM0NTE 6
  • Training time for kinetics-400

    Hello,

    Thank you for sharing the codebase of your exciting work.

    Could you please let me know the training time for pretraining & training on Kinetics-400 and the resources you used?

    Thank you!

    opened by AbdelrahmanShakerYousef 6
  • How to correctly load ImageNet pretrained weights when training video models

    Hello, as in the title: how should I correctly load weights pretrained on an image dataset? As shown in the attached screenshot, I set TRAIN.CHECKPOINT_INFLATE to True and set TRAIN.CHECKPOINT_FILE_PATH to my pretrained 2D weights, but judging from the current training log the pretraining does not seem to have any effect, so I would like to ask for the correct way to configure this.

    opened by LEM0NTE 1
  • Basic image classifier usage of token label models

    I hesitate to ask this basic question, but what is the correct way to use the token-label models for basic image classification? I followed your instructions on huggingface.co uniformer_image, but the result does not seem right:

    # cd image_classification
    import torch
    import torch.nn.functional as F
    import torchvision.transforms as T
    # from models import uniformer as torch_uniformer
    from token_labeling.tlt.models import uniformer as torch_uniformer
    
    def inference(model, image):
        image_transform = T.Compose([T.Resize(224), T.CenterCrop(224), T.ToTensor(), T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
        image = image_transform(image)
        image = image.unsqueeze(0)
        prediction = model(image)
        prediction = F.softmax(prediction, dim=1).flatten()
        return prediction
    
    model = torch_uniformer.uniformer_small()
    weights = torch.load('uniformer_small_tl_224.pth')
    model.load_state_dict(weights['model'] if "model" in weights else weights, strict=True)
    model = model.eval()
    
    # Run prediction
    from skimage.data import chelsea
    from PIL import Image
    imm = Image.fromarray(chelsea()) # Chelsea the cat
    out = inference(model, imm)
    print(out.argsort()[-5:])
    # tensor([224, 196, 223, 410, 599])
    
    # Decode, any method just getting the label output
    from tensorflow import keras
    keras.applications.imagenet_utils.decode_predictions(out.detach().numpy()[None])
    # [[('n03530642', 'honeycomb', 0.55872005),
    #   ('n02727426', 'apiary', 0.011748945),
    #   ('n02104365', 'schipperke', 0.0044726683),
    #   ('n02097047', 'miniature_schnauzer', 0.003748106),
    #   ('n02105056', 'groenendael', 0.0033460185)]]
    

    The correct output, for example when using the non-token-label uniformer_small, looks like:

    from models import uniformer as torch_uniformer
    ...
    weights = torch.load('uniformer_small_in1k.pth')
    ...
    print(out.argsort()[-5:])
    # tensor([284, 287, 281, 282, 285])
    ...
    keras.applications.imagenet_utils.decode_predictions(out.detach().numpy()[None])
    # [[('n02124075', 'Egyptian_cat', 0.7029501),
    #   ('n02123159', 'tiger_cat', 0.08705652),
    #   ('n02123045', 'tabby', 0.056305394),
    #   ('n02127052', 'lynx', 0.0035495553),
    #   ('n02123597', 'Siamese_cat', 0.0008160392)]]
    

    Besides, in my testing the ImageNet evaluation accuracy for the non-token-label uniformer_small is top1: 0.82986, top5: 0.96358, while for the token-label one using the same method it is top1: 0.00136, top5: 0.00622. I think something is wrong in my usage.

    opened by leondgarse 5