# UniFormer
This repo is the official implementation of "UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning". It currently includes code and models for the following tasks:
- Object Detection (code will be released soon)
- Semantic Segmentation (code will be released soon)
- Pose Estimation (code will be released soon)
## Updates

**01/13/2022** Initial commits:
- Pretrained models on ImageNet-1K, Kinetics-400, Kinetics-600, and Something-Something V1&V2
- Code and models for image classification and video classification
## Introduction
UniFormer (Unified transFormer) is introduced in our [arXiv paper](https://arxiv.org/abs/2201.04676). It effectively unifies 3D convolution and spatiotemporal self-attention in a concise transformer format: we adopt local MHRA (Multi-Head Relation Aggregator) in the shallow layers to largely reduce the computation burden, and global MHRA in the deep layers to learn global token relations.
UniFormer achieves strong performance on video classification. With only ImageNet-1K pretraining, it reaches 82.9%/84.8% top-1 accuracy on Kinetics-400/Kinetics-600 while requiring 10x fewer GFLOPs than comparable methods (e.g., 16.7x fewer GFLOPs than ViViT with JFT-300M pretraining). On Something-Something V1 and V2, UniFormer achieves 60.9% and 71.2% top-1 accuracy respectively, setting new state-of-the-art results.
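For intuition, below is a minimal, self-contained PyTorch sketch of this design, with local MHRA approximated by a depthwise 3D convolution and global MHRA by standard spatiotemporal self-attention. All class and parameter names here are illustrative, not the repo's API; the official implementation additionally uses normalization layers, position encoding, and stage-wise downsampling.

```python
# Illustrative sketch of a UniFormer-style block (not the official code).
import torch
import torch.nn as nn


class LocalMHRA(nn.Module):
    """Local relation aggregation, approximated by a depthwise 3D convolution."""

    def __init__(self, dim, kernel_size=(3, 5, 5)):
        super().__init__()
        padding = tuple(k // 2 for k in kernel_size)
        self.conv = nn.Conv3d(dim, dim, kernel_size, padding=padding, groups=dim)

    def forward(self, x):  # x: (B, C, T, H, W)
        return self.conv(x)


class GlobalMHRA(nn.Module):
    """Global relation aggregation via spatiotemporal self-attention."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):  # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)       # (B, T*H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)  # every token attends to all tokens
        return out.transpose(1, 2).reshape(b, c, t, h, w)


class UniFormerBlock(nn.Module):
    """MHRA + MLP with residual connections; local or global depending on depth."""

    def __init__(self, dim, use_global):
        super().__init__()
        self.mhra = GlobalMHRA(dim) if use_global else LocalMHRA(dim)
        self.mlp = nn.Sequential(nn.Conv3d(dim, 4 * dim, 1), nn.GELU(),
                                 nn.Conv3d(4 * dim, dim, 1))

    def forward(self, x):
        x = x + self.mhra(x)
        return x + self.mlp(x)


# Shallow stages use local MHRA; deep stages switch to global MHRA.
x = torch.randn(1, 64, 8, 14, 14)  # (B, C, T, H, W)
shallow, deep = UniFormerBlock(64, use_global=False), UniFormerBlock(64, use_global=True)
print(deep(shallow(x)).shape)      # torch.Size([1, 64, 8, 14, 14])
```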
## Main results on ImageNet-1K
Please see image_classification for more details.
More models trained with larger resolution and token labeling will be released soon.
| Model | Pretrain | Resolution | Top-1 | #Param. | FLOPs |
| --- | --- | --- | --- | --- | --- |
| UniFormer-S | ImageNet-1K | 224x224 | 82.9 | 22M | 3.6G |
| UniFormer-S† | ImageNet-1K | 224x224 | 83.4 | 24M | 4.2G |
| UniFormer-B | ImageNet-1K | 224x224 | 83.9 | 50M | 8.3G |
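As a quick hypothetical usage sketch, single-image inference with a pretrained UniFormer-S might look like the following. The `uniformer_small` constructor, module path, and checkpoint filename are assumptions; check image_classification for the actual entry points.

```python
# Hypothetical inference sketch for UniFormer-S on ImageNet-1K.
import torch
from models import uniformer_small  # assumed module layout

model = uniformer_small()
state = torch.load("uniformer_small_in1k.pth", map_location="cpu")  # assumed filename/layout
model.load_state_dict(state)
model.eval()

x = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image, matching the table above
with torch.no_grad():
    logits = model(x)            # (1, 1000) ImageNet-1K class scores
print(logits.argmax(dim=1).item())
```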
## Main results on Kinetics-400/600
Please see video_classification for more details.
| Model | Pretrain | #Frame | Sampling Method | FLOPs | K400 Top-1 | K600 Top-1 |
| --- | --- | --- | --- | --- | --- | --- |
| UniFormer-S | ImageNet-1K | 16x1x4 | 16x4 | 167G | 80.8 | 82.8 |
| UniFormer-S | ImageNet-1K | 16x1x4 | 16x8 | 167G | 80.8 | 82.7 |
| UniFormer-S | ImageNet-1K | 32x1x4 | 32x4 | 438G | 82.0 | - |
| UniFormer-B | ImageNet-1K | 16x1x4 | 16x4 | 387G | 82.0 | 84.0 |
| UniFormer-B | ImageNet-1K | 16x1x4 | 16x8 | 387G | 81.7 | 83.4 |
| UniFormer-B | ImageNet-1K | 32x1x4 | 32x4 | 1036G | 82.9 | 84.5* |
\* Since Kinetics-600 is too large to train on a single node (more than one month with 8 A100 GPUs), we provide a model trained on multiple nodes (around two weeks with 32 V100 GPUs). Its result is slightly lower because the hyperparameters were not tuned.
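In the table above, #Frame reads as frames x spatial crops x temporal clips, and a sampling method of KxN denotes K frames sampled with temporal stride N. The sketch below illustrates that sampling scheme; the function name and defaults are illustrative, not the repo's API.

```python
# Illustrative sketch of "16x4" sampling: 16 frames with temporal stride 4,
# evaluated as one of 4 evenly spaced temporal clips (the "16x1x4" in #Frame).
import numpy as np


def sample_clip(num_video_frames, num_frames=16, stride=4, clip_idx=0, num_clips=4):
    """Pick frame indices for one of `num_clips` evenly spaced clips."""
    clip_len = num_frames * stride                   # frames spanned by one clip
    max_start = max(num_video_frames - clip_len, 0)
    start = int(max_start * clip_idx / max(num_clips - 1, 1))
    idx = start + stride * np.arange(num_frames)     # 16 indices, stride 4
    return np.minimum(idx, num_video_frames - 1)     # clamp for short videos


# 4 temporal clips from a 300-frame video; predictions are averaged over clips.
for clip_idx in range(4):
    print(sample_clip(300, clip_idx=clip_idx))
```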
## Main results on Something-Something
Please see video_classification for more details.
| Model | Pretrain | #Frame | FLOPs | SSV1 Top-1 | SSV2 Top-1 |
| --- | --- | --- | --- | --- | --- |
| UniFormer-S | K400 | 16x3x1 | 125G | 57.2 | 67.7 |
| UniFormer-S | K600 | 16x3x1 | 125G | 57.6 | 69.4 |
| UniFormer-S | K400 | 32x3x1 | 329G | 58.8 | 69.0 |
| UniFormer-S | K600 | 32x3x1 | 329G | 59.9 | 70.4 |
| UniFormer-B | K400 | 16x3x1 | 290G | 59.1 | 70.4 |
| UniFormer-B | K600 | 16x3x1 | 290G | 58.8 | 70.2 |
| UniFormer-B | K400 | 32x3x1 | 777G | 60.9 | 71.1 |
| UniFormer-B | K600 | 32x3x1 | 777G | 61.0 | 71.2 |
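Here, #Frame = 16x3x1 (or 32x3x1) means each video is evaluated with 3 spatial crops and 1 temporal clip, averaging the per-view scores. A minimal sketch of such multi-view testing, with a dummy stand-in for the video model:

```python
# Illustrative multi-view testing for the "16x3x1" setting (not the repo's API).
import torch
import torch.nn as nn


def multi_view_predict(model, views):
    """Average softmax scores over views: (num_views, 3, T, H, W) -> (classes,)."""
    with torch.no_grad():
        scores = torch.softmax(model(views), dim=1)  # (num_views, num_classes)
    return scores.mean(dim=0)                        # average over the 3 crops


# Dummy stand-in for a video model; 174 = number of Something-Something classes.
dummy = nn.Sequential(nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(3, 174))
views = torch.randn(3, 3, 16, 224, 224)  # 3 spatial crops, 16 frames each
print(multi_view_predict(dummy, views).shape)  # torch.Size([174])
```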
## Main results on downstream tasks
We have conducted extensive experiments on downstream tasks and achieved results comparable to state-of-the-art models.
Code and models will be released in two weeks.
## Cite UniFormer
If you find this repository useful, please use the following BibTeX entry for citation.
```
@misc{li2022uniformer,
      title={Uniformer: Unified Transformer for Efficient Spatiotemporal Representation Learning},
      author={Kunchang Li and Yali Wang and Peng Gao and Guanglu Song and Yu Liu and Hongsheng Li and Yu Qiao},
      year={2022},
      eprint={2201.04676},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
## License
This project is released under the MIT license. Please see the LICENSE file for more information.
## Contributors and Contact Information
UniFormer is maintained by Kunchang Li.
For help or issues using UniFormer, please submit a GitHub issue.
For other communications related to UniFormer, please contact Kunchang Li ([email protected]).