This is an official implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" for object detection and instance segmentation.

Overview

Swin Transformer for Object Detection

This repo contains the supported code and configuration files to reproduce object detection results of Swin Transformer. It is based on mmdetection.

Updates

04/12/2021 Initial commits

Results and Models

Mask R-CNN

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| -------- | -------- | ------- | ------- | -------- | ------- | ----- | ------ | --- | ----- |
| Swin-T | ImageNet-1K | 3x | 46.0 | 41.6 | 48M | 267G | config | github/baidu | github/baidu |
| Swin-S | ImageNet-1K | 3x | 48.5 | 43.3 | 69M | 359G | config | github/baidu | github/baidu |

Cascade Mask R-CNN

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| -------- | -------- | ------- | ------- | -------- | ------- | ----- | ------ | --- | ----- |
| Swin-T | ImageNet-1K | 3x | 50.4 | 43.7 | 86M | 745G | config | github/baidu | github/baidu |
| Swin-S | ImageNet-1K | 3x | 51.9 | 45.0 | 107M | 838G | config | github/baidu | github/baidu |
| Swin-B | ImageNet-1K | 3x | 51.9 | 45.0 | 145M | 982G | config | github/baidu | github/baidu |

RepPoints V2

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs |
| -------- | -------- | ------- | ------- | -------- | ------- | ----- |
| Swin-T | ImageNet-1K | 3x | 50.0 | - | 45M | 283G |

Mask RepPoints V2

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs |
| -------- | -------- | ------- | ------- | -------- | ------- | ----- |
| Swin-T | ImageNet-1K | 3x | 50.3 | 43.6 | 47M | 292G |


Usage

Installation

Please refer to get_started.md for installation and dataset preparation.

Inference

# single-gpu testing
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox segm

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm
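
For quick single-image checks you can also call the mmdetection Python API directly instead of the test scripts. The snippet below is only a minimal sketch: the config path, checkpoint filename, and demo image are example values, so substitute the files you actually downloaded.

from mmdet.apis import init_detector, inference_detector, show_result_pyplot

# Example paths -- replace with the config and checkpoint you downloaded.
config_file = 'configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py'
checkpoint_file = 'mask_rcnn_swin_tiny_patch4_window7.pth'

# Build the detector from the config and load the trained weights onto GPU 0.
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on one image and visualize detections above a score threshold.
result = inference_detector(model, 'demo/demo.jpg')
show_result_pyplot(model, 'demo/demo.jpg', result, score_thr=0.3)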

Training

To train a detector with pre-trained models, run:

# single-gpu training
python tools/train.py <CONFIG_FILE> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments] 

For example, to train a Cascade Mask R-CNN model with a Swin-T backbone on 8 GPUs, run:

tools/dist_train.sh configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 8 --cfg-options model.pretrained=<PRETRAIN_MODEL> 

Note: use_checkpoint is used to save GPU memory. Please refer to this page for more details.
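
The same two settings can also be written into the config file instead of being passed via --cfg-options. Below is an abridged sketch of the relevant backbone fields for Swin-T, using the <PRETRAIN_MODEL> placeholder from the commands above; see the full configs under configs/swin/ for all fields.

# Backbone section of a Swin-T detection config (abridged sketch).
model = dict(
    pretrained='<PRETRAIN_MODEL>',   # ImageNet-pretrained backbone weights
    backbone=dict(
        type='SwinTransformer',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        use_checkpoint=True))        # gradient checkpointing to save GPU memory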

Apex (optional):

We use apex for mixed precision training by default. To install apex, run:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
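
If the build succeeded with the CUDA extensions, the amp and fused-optimizer modules should import cleanly. A quick, optional smoke test (assuming the install command above completed without errors):

# Optional check that apex was built with its CUDA extensions.
from apex import amp                      # mixed-precision utilities
from apex.optimizers import FusedAdam     # importable only if --cuda_ext compiled
print('apex with CUDA extensions is available')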

If you would like to disable apex, change the runner type to EpochBasedRunner and comment out the following code block in the configuration files:

# do not use mmdet version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
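
For reference, a sketch of how those settings might look once apex is disabled; EpochBasedRunner follows the instruction above, while the max_epochs value is only an example matching the 3x schedule used elsewhere in this repo:

# Without apex: use the standard mmcv runner and a plain optimizer hook.
runner = dict(type='EpochBasedRunner', max_epochs=36)
optimizer_config = dict(grad_clip=None)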

Citing Swin Transformer

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

Other Links

Image Classification: See Swin Transformer for Image Classification.

Semantic Segmentation: See Swin Transformer for Semantic Segmentation.

Comments
  • TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'

    Checklist

    1. I have searched related issues but cannot get the expected help.
    2. The issue has not been fixed in the latest version.

    Describe the issue

    When I run the following command:

    python tools/train.py configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py --cfg-options model.pretrained=check_points/mask_rcnn_swin_tiny_patch4_window7_1x.pth

    I encounter a TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'.

    Reproduction

    1. What command or script did you run?

    python tools/train.py configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py --cfg-options model.pretrained=check_points/mask_rcnn_swin_tiny_patch4_window7_1x.pth

    ![image](https://user-images.githubusercontent.com/31770250/139524042-0f4f1d17-640d-4ab7-ab5f-6527a34e968b.png)
    
    
    2. Which config did you run?
    
    /home/zhaogy/code/Swin-Transformer-Object-Detection-master/configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py
    
    3. Did you make any modifications to the code or config? Do you understand what you have modified? I did not change anything.

    4. What dataset did you use? COCO.

    Environment

    1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.

    fatal: not a git repository (or any of the parent directories): .git
    sys.platform: linux
    Python: 3.7.0 (default, Oct 9 2018, 10:31:47) [GCC 7.3.0]
    CUDA available: True
    GPU 0,1,2,3: Tesla V100-SXM2-16GB
    CUDA_HOME: /usr/local/cuda
    NVCC: Build cuda_11.0_bu.TC445_37.28540450_0
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.7.1
    PyTorch compiling details: PyTorch built with:

    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 11.0
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_37,code=compute_37
    • CuDNN 8.0.5
    • Magma 2.5.2
    • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

    TorchVision: 0.8.2
    OpenCV: 4.4.0
    MMCV: 1.3.16
    MMCV Compiler: GCC 7.3
    MMCV CUDA Compiler: 11.0
    MMDetection: 2.18.0+

    2. You may add additional information that may be helpful for locating the problem, such as:

    1. How you installed PyTorch [e.g., pip, conda, source] conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
    2. Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

    output: fatal: not a git repository (or any of the parent directories): .git
    2021-10-30 14:57:51,909 - mmdet - INFO - Environment info:

    sys.platform: linux
    Python: 3.7.0 (default, Oct 9 2018, 10:31:47) [GCC 7.3.0]
    CUDA available: True
    GPU 0,1,2,3: Tesla V100-SXM2-16GB
    CUDA_HOME: /usr/local/cuda
    NVCC: Build cuda_11.0_bu.TC445_37.28540450_0
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.7.1
    PyTorch compiling details: PyTorch built with:

    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 11.0
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_37,code=compute_37
    • CuDNN 8.0.5
    • Magma 2.5.2
    • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

    TorchVision: 0.8.2
    OpenCV: 4.4.0
    MMCV: 1.3.16
    MMCV Compiler: GCC 7.3
    MMCV CUDA Compiler: 11.0
    MMDetection: 2.18.0+

    2021-10-30 14:57:54,436 - mmdet - INFO - Distributed training: False 2021-10-30 14:57:57,051 - mmdet - INFO - Config: model = dict( type='MaskRCNN', pretrained='check_points/mask_rcnn_swin_tiny_patch4_window7_1x.pth', backbone=dict( type='SwinTransformer', embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], window_size=7, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, ape=False, patch_norm=True, out_indices=(0, 1, 2, 3), use_checkpoint=False), neck=dict( type='FPN', in_channels=[96, 192, 384, 768], out_channels=256, num_outs=5), rpn_head=dict( type='RPNHead', in_channels=256, feat_channels=256, anchor_generator=dict( type='AnchorGenerator', scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), loss_bbox=dict(type='L1Loss', loss_weight=1.0)), roi_head=dict( type='StandardRoIHead', bbox_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]), bbox_head=dict( type='Shared2FCBBoxHead', in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='L1Loss', loss_weight=1.0)), mask_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]), mask_head=dict( type='FCNMaskHead', num_convs=4, in_channels=256, conv_out_channels=256, num_classes=80, loss_mask=dict( type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), train_cfg=dict( rpn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, debug=False), rpn_proposal=dict( nms_pre=2000, max_per_img=1000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=True, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False)), test_cfg=dict( rpn=dict( nms_pre=1000, max_per_img=1000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=dict( score_thr=0.05, nms=dict(type='nms', iou_threshold=0.5), max_per_img=100, mask_thr_binary=0.5))) dataset_type = 'CocoDataset' data_root = 'data/coco/' img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='AutoAugment', policies=[[{ 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }], [{ 'type': 'Resize', 'img_scale': [(400, 1333), (500, 1333), (600, 1333)], 
'multiscale_mode': 'value', 'keep_ratio': True }, { 'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True }, { 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'override': True, 'keep_ratio': True }]]), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']) ] test_pipeline = [ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ] data = dict( samples_per_gpu=2, workers_per_gpu=2, train=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_train2017.json', img_prefix='data/coco/train2017/', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='AutoAugment', policies=[[{ 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }], [{ 'type': 'Resize', 'img_scale': [(400, 1333), (500, 1333), (600, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }, { 'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True }, { 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'override': True, 'keep_ratio': True }]]), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict( type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']) ]), val=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_val2017.json', img_prefix='data/coco/val2017/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ]), test=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_val2017.json', img_prefix='data/coco/val2017/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ])) evaluation = dict(metric=['bbox', 'segm']) optimizer = dict( type='AdamW', lr=0.0001, betas=(0.9, 0.999), 
weight_decay=0.05, paramwise_cfg=dict( custom_keys=dict( absolute_pos_embed=dict(decay_mult=0.0), relative_position_bias_table=dict(decay_mult=0.0), norm=dict(decay_mult=0.0)))) optimizer_config = dict( grad_clip=None, type='DistOptimizerHook', update_interval=1, coalesce=True, bucket_size_mb=-1, use_fp16=True) lr_config = dict( policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.001, step=[8, 11]) runner = dict(type='EpochBasedRunnerAmp', max_epochs=12) checkpoint_config = dict(interval=1) log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) custom_hooks = [dict(type='NumClassCheckHook')] dist_params = dict(backend='nccl') log_level = 'INFO' load_from = None resume_from = None workflow = [('train', 1)] fp16 = None work_dir = './work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco' gpu_ids = range(0, 1)

    /home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmdet/models/detectors/two_stage.py:29: UserWarning: DeprecationWarning: pretrained is deprecated, please use "init_cfg" instead
      warnings.warn('DeprecationWarning: pretrained is deprecated, '
    Traceback (most recent call last):
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
    TypeError: __init__() got an unexpected keyword argument 'embed_dim'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmdet/models/detectors/mask_rcnn.py", line 27, in __init__
        init_cfg=init_cfg)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmdet/models/detectors/two_stage.py", line 32, in __init__
        self.backbone = build_backbone(backbone)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmdet/models/builder.py", line 20, in build_backbone
        return BACKBONES.build(cfg)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "tools/train.py", line 187, in <module>
        main()
      File "tools/train.py", line 161, in main
        test_cfg=cfg.get('test_cfg'))
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmdet/models/builder.py", line 59, in build_detector
        cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'

    please help!

    opened by GY-CAS 37
  • TypeError: __init__() got an unexpected keyword argument 'embed_dim'


    Traceback (most recent call last):
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\utils\registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
    TypeError: __init__() got an unexpected keyword argument 'embed_dim'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\utils\registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
      File "f:\peizhiwenjian\mmdetection-master2.19.1\mmdet\models\detectors\mask_rcnn.py", line 19, in __init__
        super(MaskRCNN, self).__init__(
      File "f:\peizhiwenjian\mmdetection-master2.19.1\mmdet\models\detectors\two_stage.py", line 32, in __init__
        self.backbone = build_backbone(backbone)
      File "f:\peizhiwenjian\mmdetection-master2.19.1\mmdet\models\builder.py", line 20, in build_backbone
        return BACKBONES.build(cfg)
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\utils\registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\cnn\builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\utils\registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "F:\PYcharm-project\deep-learning-for-image-processing\pytorch_object_detection\Swin-Transformer-Object-Detection-master\demo\image_demo.py", line 26, in <module>
        main()
      File "F:\PYcharm-project\deep-learning-for-image-processing\pytorch_object_detection\Swin-Transformer-Object-Detection-master\demo\image_demo.py", line 18, in main
        model = init_detector(args.config, args.checkpoint, device=args.device)
      File "f:\peizhiwenjian\mmdetection-master2.19.1\mmdet\apis\inference.py", line 40, in init_detector
        model = build_detector(config.model, test_cfg=config.get('test_cfg'))
      File "f:\peizhiwenjian\mmdetection-master2.19.1\mmdet\models\builder.py", line 58, in build_detector
        return DETECTORS.build(
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\utils\registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\cnn\builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\utils\registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'

    opened by LUO77123 27
  • KeyError: 'SwinTransformer is not in the models registry'

    I am using this model for custom training on my dataset in Colab. As I started training, I got this error:

    Traceback (most recent call last):
    File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
    return obj_cls(**args)
    File "/content/drive/MyDrive/mmdetection/mmdet/models/detectors/cascade_rcnn.py", line 27, in __init__
    init_cfg=init_cfg)
    File "/content/drive/MyDrive/mmdetection/mmdet/models/detectors/two_stage.py", line 26, in __init__
    self.backbone = build_backbone(backbone)
    File "/content/drive/MyDrive/mmdetection/mmdet/models/builder.py", line 19, in build_backbone
    return BACKBONES.build(cfg)
    File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 210, in build
    return self.build_func(*args, **kwargs, registry=self)
    File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
    File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 44, in build_from_cfg
    f'{obj_type} is not in the {registry.name} registry')
    KeyError: 'SwinTransformer is not in the models registry'
    
    During handling of the above exception, another exception occurred:
    
     Traceback (most recent call last):
     File "tools/train.py", line 187, in <module>
     main()
      File "tools/train.py", line 161, in main
    test_cfg=cfg.get('test_cfg'))
    File "/content/drive/MyDrive/mmdetection/mmdet/models/builder.py", line 58, in build_detector
    cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
    File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 210, in build
    return self.build_func(*args, **kwargs, registry=self)
    File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
    File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
     raise type(e)(f'{obj_cls.__name__}: {e}')
    KeyError: "CascadeRCNN: 'SwinTransformer is not in the models registry'"
    

    Here is my config file -

    2021-05-13 12:30:00,473 - mmdet - INFO - Environment info:
    ------------------------------------------------------------
    sys.platform: linux
    Python: 3.7.10 (default, May  3 2021, 02:48:31) [GCC 7.5.0]
    CUDA available: True
    GPU 0: Tesla P100-PCIE-16GB
    CUDA_HOME: /usr/local/cuda
    NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.8.1+cu101
    PyTorch compiling details: PyTorch built with:
      - GCC 7.3
      - C++ Version: 201402
      - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 10.1
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
      - CuDNN 7.6.3
      - Magma 2.5.2
      - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.1, CUDNN_VERSION=7.6.3, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 
    
    TorchVision: 0.9.1+cu101
    OpenCV: 4.1.2
    MMCV: 1.3.3
    MMCV Compiler: GCC 7.5
    MMCV CUDA Compiler: 11.0
    MMDetection: 2.12.0+41bb93f
    ------------------------------------------------------------
    
    2021-05-13 12:30:04,393 - mmdet - INFO - Distributed training: False
    2021-05-13 12:30:08,323 - mmdet - INFO - Config:
    model = dict(
        type='CascadeRCNN',
        pretrained='./moby_cascade_mask_rcnn_swin_tiny_patch4_window7_3x.pth',
        backbone=dict(
            type='SwinTransformer',
            embed_dim=96,
            depths=[2, 2, 6, 2],
            num_heads=[3, 6, 12, 24],
            window_size=7,
            mlp_ratio=4.0,
            qkv_bias=True,
            qk_scale=None,
            drop_rate=0.0,
            attn_drop_rate=0.0,
            drop_path_rate=0.2,
            ape=False,
            patch_norm=True,
            out_indices=(0, 1, 2, 3),
            use_checkpoint=False),
        neck=dict(
            type='FPN',
            in_channels=[96, 192, 384, 768],
            out_channels=256,
            num_outs=5),
        rpn_head=dict(
            type='RPNHead',
            in_channels=256,
            feat_channels=256,
            anchor_generator=dict(
                type='AnchorGenerator',
                scales=[8],
                ratios=[0.5, 1.0, 2.0],
                strides=[4, 8, 16, 32, 64]),
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[1.0, 1.0, 1.0, 1.0]),
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
            loss_bbox=dict(
                type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
        roi_head=dict(
            type='CascadeRoIHead',
            num_stages=3,
            stage_loss_weights=[1, 0.5, 0.25],
            bbox_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            bbox_head=[
                dict(
                    type='ConvFCBBoxHead',
                    num_shared_convs=4,
                    num_shared_fcs=1,
                    in_channels=256,
                    conv_out_channels=256,
                    fc_out_channels=1024,
                    roi_feat_size=7,
                    num_classes=80,
                    bbox_coder=dict(
                        type='DeltaXYWHBBoxCoder',
                        target_means=[0.0, 0.0, 0.0, 0.0],
                        target_stds=[0.1, 0.1, 0.2, 0.2]),
                    reg_class_agnostic=False,
                    reg_decoded_bbox=True,
                    norm_cfg=dict(type='SyncBN', requires_grad=True),
                    loss_cls=dict(
                        type='CrossEntropyLoss',
                        use_sigmoid=False,
                        loss_weight=1.0),
                    loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
                dict(
                    type='ConvFCBBoxHead',
                    num_shared_convs=4,
                    num_shared_fcs=1,
                    in_channels=256,
                    conv_out_channels=256,
                    fc_out_channels=1024,
                    roi_feat_size=7,
                    num_classes=80,
                    bbox_coder=dict(
                        type='DeltaXYWHBBoxCoder',
                        target_means=[0.0, 0.0, 0.0, 0.0],
                        target_stds=[0.05, 0.05, 0.1, 0.1]),
                    reg_class_agnostic=False,
                    reg_decoded_bbox=True,
                    norm_cfg=dict(type='SyncBN', requires_grad=True),
                    loss_cls=dict(
                        type='CrossEntropyLoss',
                        use_sigmoid=False,
                        loss_weight=1.0),
                    loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
                dict(
                    type='ConvFCBBoxHead',
                    num_shared_convs=4,
                    num_shared_fcs=1,
                    in_channels=256,
                    conv_out_channels=256,
                    fc_out_channels=1024,
                    roi_feat_size=7,
                    num_classes=80,
                    bbox_coder=dict(
                        type='DeltaXYWHBBoxCoder',
                        target_means=[0.0, 0.0, 0.0, 0.0],
                        target_stds=[0.033, 0.033, 0.067, 0.067]),
                    reg_class_agnostic=False,
                    reg_decoded_bbox=True,
                    norm_cfg=dict(type='SyncBN', requires_grad=True),
                    loss_cls=dict(
                        type='CrossEntropyLoss',
                        use_sigmoid=False,
                        loss_weight=1.0),
                    loss_bbox=dict(type='GIoULoss', loss_weight=10.0))
            ],
            mask_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            mask_head=dict(
                type='FCNMaskHead',
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=80,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
        train_cfg=dict(
            rpn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.3,
                    min_pos_iou=0.3,
                    match_low_quality=True,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=256,
                    pos_fraction=0.5,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=False),
                allowed_border=0,
                pos_weight=-1,
                debug=False),
            rpn_proposal=dict(
                nms_across_levels=False,
                nms_pre=2000,
                nms_post=2000,
                max_per_img=2000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=[
                dict(
                    assigner=dict(
                        type='MaxIoUAssigner',
                        pos_iou_thr=0.5,
                        neg_iou_thr=0.5,
                        min_pos_iou=0.5,
                        match_low_quality=False,
                        ignore_iof_thr=-1),
                    sampler=dict(
                        type='RandomSampler',
                        num=512,
                        pos_fraction=0.25,
                        neg_pos_ub=-1,
                        add_gt_as_proposals=True),
                    mask_size=28,
                    pos_weight=-1,
                    debug=False),
                dict(
                    assigner=dict(
                        type='MaxIoUAssigner',
                        pos_iou_thr=0.6,
                        neg_iou_thr=0.6,
                        min_pos_iou=0.6,
                        match_low_quality=False,
                        ignore_iof_thr=-1),
                    sampler=dict(
                        type='RandomSampler',
                        num=512,
                        pos_fraction=0.25,
                        neg_pos_ub=-1,
                        add_gt_as_proposals=True),
                    mask_size=28,
                    pos_weight=-1,
                    debug=False),
                dict(
                    assigner=dict(
                        type='MaxIoUAssigner',
                        pos_iou_thr=0.7,
                        neg_iou_thr=0.7,
                        min_pos_iou=0.7,
                        match_low_quality=False,
                        ignore_iof_thr=-1),
                    sampler=dict(
                        type='RandomSampler',
                        num=512,
                        pos_fraction=0.25,
                        neg_pos_ub=-1,
                        add_gt_as_proposals=True),
                    mask_size=28,
                    pos_weight=-1,
                    debug=False)
            ]),
        test_cfg=dict(
            rpn=dict(
                nms_across_levels=False,
                nms_pre=1000,
                nms_post=1000,
                max_per_img=1000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                score_thr=0.05,
                nms=dict(type='nms', iou_threshold=0.5),
                max_per_img=100,
                mask_thr_binary=0.5)))
    dataset_type = 'COCODataset'
    data_root = '/content/drive/MyDrive/layout/'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(
            type='AutoAugment',
            policies=[[{
                'type':
                'Resize',
                'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                              (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                              (736, 1333), (768, 1333), (800, 1333)],
                'multiscale_mode':
                'value',
                'keep_ratio':
                True
            }],
                      [{
                          'type': 'Resize',
                          'img_scale': [(400, 1333), (500, 1333), (600, 1333)],
                          'multiscale_mode': 'value',
                          'keep_ratio': True
                      }, {
                          'type': 'RandomCrop',
                          'crop_type': 'absolute_range',
                          'crop_size': (384, 600),
                          'allow_negative_crop': True
                      }, {
                          'type':
                          'Resize',
                          'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                        (576, 1333), (608, 1333), (640, 1333),
                                        (672, 1333), (704, 1333), (736, 1333),
                                        (768, 1333), (800, 1333)],
                          'multiscale_mode':
                          'value',
                          'override':
                          True,
                          'keep_ratio':
                          True
                      }]]),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1333, 800),
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img'])
            ])
    ]
    data = dict(
        samples_per_gpu=2,
        workers_per_gpu=2,
        train=dict(
            type='COCODataset',
            ann_file='/content/drive/MyDrive/layout/train.json',
            img_prefix='/content/drive/MyDrive/layout/train/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
                dict(type='RandomFlip', flip_ratio=0.5),
                dict(
                    type='AutoAugment',
                    policies=[[{
                        'type':
                        'Resize',
                        'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                      (576, 1333), (608, 1333), (640, 1333),
                                      (672, 1333), (704, 1333), (736, 1333),
                                      (768, 1333), (800, 1333)],
                        'multiscale_mode':
                        'value',
                        'keep_ratio':
                        True
                    }],
                              [{
                                  'type': 'Resize',
                                  'img_scale': [(400, 1333), (500, 1333),
                                                (600, 1333)],
                                  'multiscale_mode': 'value',
                                  'keep_ratio': True
                              }, {
                                  'type': 'RandomCrop',
                                  'crop_type': 'absolute_range',
                                  'crop_size': (384, 600),
                                  'allow_negative_crop': True
                              }, {
                                  'type':
                                  'Resize',
                                  'img_scale': [(480, 1333), (512, 1333),
                                                (544, 1333), (576, 1333),
                                                (608, 1333), (640, 1333),
                                                (672, 1333), (704, 1333),
                                                (736, 1333), (768, 1333),
                                                (800, 1333)],
                                  'multiscale_mode':
                                  'value',
                                  'override':
                                  True,
                                  'keep_ratio':
                                  True
                              }]]),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='DefaultFormatBundle'),
                dict(
                    type='Collect',
                    keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
            ]),
        val=dict(
            type='COCODataset',
            ann_file='/content/drive/MyDrive/layout/valid.json',
            img_prefix='/content/drive/MyDrive/layout/valid/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]),
        test=dict(
            type='COCODataset',
            ann_file='/content/drive/MyDrive/layout/valid.json',
            img_prefix='/content/drive/MyDrive/layout/valid/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]))
    evaluation = dict(metric=['bbox', 'segm'])
    optimizer = dict(
        type='AdamW',
        lr=0.0001,
        betas=(0.9, 0.999),
        weight_decay=0.05,
        paramwise_cfg=dict(
            custom_keys=dict(
                absolute_pos_embed=dict(decay_mult=0.0),
                relative_position_bias_table=dict(decay_mult=0.0),
                norm=dict(decay_mult=0.0))))
    optimizer_config = dict(
        grad_clip=None,
        type='DistOptimizerHook',
        update_interval=1,
        coalesce=True,
        bucket_size_mb=-1,
        use_fp16=True)
    lr_config = dict(
        policy='step',
        warmup='linear',
        warmup_iters=500,
        warmup_ratio=0.001,
        step=[27, 33])
    runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)
    checkpoint_config = dict(interval=5)
    log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
    custom_hooks = [dict(type='NumClassCheckHook')]
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    load_from = '/content/drive/MyDrive/Swin-Transformer-Object-Detection/moby_cascade_mask_rcnn_swin_tiny_patch4_window7_3x.pth'
    resume_from = None
    workflow = [('train', 1)]
    fp16 = None
    work_dir = './work_dirs/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco'
    gpu_ids = range(0, 1)
    
    opened by Atul997 18
  • KeyError: "CascadeRCNN: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'"

    Thanks for your work!

    I encountered the following error when I ran the code (screenshot: Snipaste_2021-04-14_15-43-29).

    I ran the command: python tools/train.py configs/swin/mydef_cascade_mask_rcnn_swin_small_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py --cfg-options model.pretrained=./models/cascade_mask_rcnn_swin_small_patch4_window7.pth

    How to solve it?

    opened by luooofan 11
  • Can't see train loss when train on custom dataset.

    I trained on a custom dataset with 45 images in the training set. I can't see the training loss while training on this dataset; only the metric results show up in the terminal. Here is my config file:

    _base_ = './cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py'
    
    dataset_type = 'CocoDataset'
    classes = ('counter_weight',)
    data_root = './MTDATA'
    data = dict(
        samples_per_gpu=2,
        workers_per_gpu=2,
        train=dict(
            type=dataset_type,
            # explicitly add your class names to the field `classes`
            classes=classes,
            ann_file=data_root + 'annotations/gray_train.json',
            img_prefix=data_root + 'train'),
        val=dict(
            type=dataset_type,
            # explicitly add your class names to the field `classes`
            classes=classes,
            ann_file=data_root + 'annotations/gray_val.json',
            img_prefix=data_root + 'val'),
        test=dict(
            type=dataset_type,
            # explicitly add your class names to the field `classes`
            classes=classes,
            ann_file=data_root + 'annotations/gray_val.json',
            img_prefix=data_root + 'val'))
    
    model = dict(
        roi_head=dict(
            bbox_head=[
                dict(
                    type='Shared2FCBBoxHead',
                    # explicitly over-write all the `num_classes` field.
                    num_classes=1),
                dict(
                    type='Shared2FCBBoxHead',
                    # explicitly over-write all the `num_classes` field.
                    num_classes=1),
                dict(
                    type='Shared2FCBBoxHead',
                    # explicitly over-write all the `num_classes` field.
                    num_classes=1)],
        # explicitly over-write all the `num_classes` field.
        mask_head=dict(num_classes=1)))
    

    And I run this config file using: CUDA_VISIBLE_DEVICES=2 python ./tools/train.py configs/swin/counter_weight_config.py --work-dir ./weights/counterweight_20211207/

    opened by LZYmixiu 8
  • torch.nn.modules.module.ModuleAttributeError: MaskRCNN: 'Sequential' object has no attribute 'init_weights'

    When I train a Mask R-CNN model, I get the error in the title. Does anyone know what's going on, or has anyone met this error before? The problem seems to be in mmcv. The traceback is:

    Traceback (most recent call last):
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/models/detectors/mask_rcnn.py", line 24, in __init__    pretrained=pretrained)
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/models/detectors/two_stage.py", line 48, in __init__                                             rixzerogoki" 10:54 16-12月-21
        self.init_weights(pretrained=pretrained)
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/models/detectors/two_stage.py", line 78, in init_weights
        self.roi_head.init_weights(pretrained)
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/models/roi_heads/standard_roi_head.py", line 48, in init_weights
        self.bbox_head.init_weights()
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/torch/nn/modules/module.py", line 779, in __getattr__
        type(self).__name__, name))
    torch.nn.modules.module.ModuleAttributeError: 'Sequential' object has no attribute 'init_weights'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "./tools/train.py", line 187, in <module>
        main()
      File "./tools/train.py", line 161, in main
        test_cfg=cfg.get('test_cfg'))
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/models/builder.py", line 77, in build_detector
        return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/models/builder.py", line 34, in build
        return build_from_cfg(cfg, registry, default_args)
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    torch.nn.modules.module.ModuleAttributeError: MaskRCNN: 'Sequential' object has no attribute 'init_weights'
    

    ps: I haven't modified the test_cfg in my own config.

    opened by LZYmixiu 7
  • The model and loaded state dict do not match exactly

    Hi,

    I try to load the Swin backbone using the configuration mask_rcnn_swin_small_patch4_window7_mstrain_480-800_adamw_3x_coco.py and weights swin_small_patch4_window7_224.pth and I get the following warnings:

    mmdet - WARNING - The model and loaded state dict do not match exactly
    
    unexpected key in source state_dict: norm.weight, norm.bias, head.weight, head.bias, layers.0.blocks.1.attn_mask, layers.1.blocks.1.attn_mask, layers.2.blocks.1.attn_mask, layers.2.blocks.3.attn_mask, layers.2.blocks.5.attn_mask, layers.2.blocks.7.attn_mask, layers.2.blocks.9.attn_mask, layers.2.blocks.11.attn_mask, layers.2.blocks.13.attn_mask, layers.2.blocks.15.attn_mask, layers.2.blocks.17.attn_mask
    
    missing keys in source state_dict: norm0.weight, norm0.bias, norm1.weight, norm1.bias, norm2.weight, norm2.bias, norm3.weight, norm3.bias
    

    I understand the problem with the norm layers. In the original Swin backbone, there is only one normalization layer at the output while there is a norm layer at every output stage in the Swin backbone used for detection. However, I am not sure why there is the problem with attn_masks.

    Can you please help me?

    Thank you very much in advance.

    opened by vobecant 7
  • missing keys in source state_dict

    patch_embed.proj.weight, patch_embed.proj.bias, patch_embed.norm.weight, patch_embed.norm.bias, layers.0.blocks.0.norm1.weight, layers.0.blocks.0.norm1.bias, layers.0.blocks.0.attn.relative_position_bias_table, layers.0.blocks.0.attn.relative_position_index, layers.0.blocks.0.attn.qkv.weight, layers.0.blocks.0.attn.qkv.bias, layers.0.blocks.0.attn.proj.weight, layers.0.blocks.0.attn.proj.bias, layers.0.blocks.0.norm2.weight, layers.0.blocks.0.norm2.bias, layers.0.blocks.0.mlp.fc1.weight, layers.0.blocks.0.mlp.fc1.bias, layers.0.blocks.0.mlp.fc2.weight, layers.0.blocks.0.mlp.fc2.bias, layers.0.blocks.1.norm1.weight, layers.0.blocks.1.norm1.bias, layers.0.blocks.1.attn.relative_position_bias_table, layers.0.blocks.1.attn.relative_position_index, layers.0.blocks.1.attn.qkv.weight, layers.0.blocks.1.attn.qkv.bias, layers.0.blocks.1.attn.proj.weight, layers.0.blocks.1.attn.proj.bias, layers.0.blocks.1.norm2.weight, layers.0.blocks.1.norm2.bias, layers.0.blocks.1.mlp.fc1.weight, layers.0.blocks.1.mlp.fc1.bias, layers.0.blocks.1.mlp.fc2.weight, layers.0.blocks.1.mlp.fc2.bias, layers.0.downsample.reduction.weight, layers.0.downsample.norm.weight, layers.0.downsample.norm.bias, layers.1.blocks.0.norm1.weight, layers.1.blocks.0.norm1.bias, layers.1.blocks.0.attn.relative_position_bias_table, layers.1.blocks.0.attn.relative_position_index, layers.1.blocks.0.attn.qkv.weight, layers.1.blocks.0.attn.qkv.bias, layers.1.blocks.0.attn.proj.weight, layers.1.blocks.0.attn.proj.bias, layers.1.blocks.0.norm2.weight, layers.1.blocks.0.norm2.bias, layers.1.blocks.0.mlp.fc1.weight, layers.1.blocks.0.mlp.fc1.bias, layers.1.blocks.0.mlp.fc2.weight, layers.1.blocks.0.mlp.fc2.bias, layers.1.blocks.1.norm1.weight, layers.1.blocks.1.norm1.bias, layers.1.blocks.1.attn.relative_position_bias_table, layers.1.blocks.1.attn.relative_position_index, layers.1.blocks.1.attn.qkv.weight, layers.1.blocks.1.attn.qkv.bias, layers.1.blocks.1.attn.proj.weight, layers.1.blocks.1.attn.proj.bias, layers.1.blocks.1.norm2.weight, layers.1.blocks.1.norm2.bias, layers.1.blocks.1.mlp.fc1.weight, layers.1.blocks.1.mlp.fc1.bias, layers.1.blocks.1.mlp.fc2.weight, layers.1.blocks.1.mlp.fc2.bias, layers.1.downsample.reduction.weight, layers.1.downsample.norm.weight, layers.1.downsample.norm.bias, layers.2.blocks.0.norm1.weight, layers.2.blocks.0.norm1.bias, layers.2.blocks.0.attn.relative_position_bias_table, layers.2.blocks.0.attn.relative_position_index, layers.2.blocks.0.attn.qkv.weight, layers.2.blocks.0.attn.qkv.bias, layers.2.blocks.0.attn.proj.weight, layers.2.blocks.0.attn.proj.bias, layers.2.blocks.0.norm2.weight, layers.2.blocks.0.norm2.bias, layers.2.blocks.0.mlp.fc1.weight, layers.2.blocks.0.mlp.fc1.bias, layers.2.blocks.0.mlp.fc2.weight, layers.2.blocks.0.mlp.fc2.bias, layers.2.blocks.1.norm1.weight, layers.2.blocks.1.norm1.bias, layers.2.blocks.1.attn.relative_position_bias_table, layers.2.blocks.1.attn.relative_position_index, layers.2.blocks.1.attn.qkv.weight, layers.2.blocks.1.attn.qkv.bias, layers.2.blocks.1.attn.proj.weight, layers.2.blocks.1.attn.proj.bias, layers.2.blocks.1.norm2.weight, layers.2.blocks.1.norm2.bias, layers.2.blocks.1.mlp.fc1.weight, layers.2.blocks.1.mlp.fc1.bias, layers.2.blocks.1.mlp.fc2.weight, layers.2.blocks.1.mlp.fc2.bias, layers.2.blocks.2.norm1.weight, layers.2.blocks.2.norm1.bias, layers.2.blocks.2.attn.relative_position_bias_table, layers.2.blocks.2.attn.relative_position_index, layers.2.blocks.2.attn.qkv.weight, layers.2.blocks.2.attn.qkv.bias, layers.2.blocks.2.attn.proj.weight, 
layers.2.blocks.2.attn.proj.bias, layers.2.blocks.2.norm2.weight, layers.2.blocks.2.norm2.bias, layers.2.blocks.2.mlp.fc1.weight, layers.2.blocks.2.mlp.fc1.bias, layers.2.blocks.2.mlp.fc2.weight, layers.2.blocks.2.mlp.fc2.bias, layers.2.blocks.3.norm1.weight, layers.2.blocks.3.norm1.bias, layers.2.blocks.3.attn.relative_position_bias_table, layers.2.blocks.3.attn.relative_position_index, layers.2.blocks.3.attn.qkv.weight, layers.2.blocks.3.attn.qkv.bias, layers.2.blocks.3.attn.proj.weight, layers.2.blocks.3.attn.proj.bias, layers.2.blocks.3.norm2.weight, layers.2.blocks.3.norm2.bias, layers.2.blocks.3.mlp.fc1.weight, layers.2.blocks.3.mlp.fc1.bias, layers.2.blocks.3.mlp.fc2.weight, layers.2.blocks.3.mlp.fc2.bias, layers.2.blocks.4.norm1.weight, layers.2.blocks.4.norm1.bias, layers.2.blocks.4.attn.relative_position_bias_table, layers.2.blocks.4.attn.relative_position_index, layers.2.blocks.4.attn.qkv.weight, layers.2.blocks.4.attn.qkv.bias, layers.2.blocks.4.attn.proj.weight, layers.2.blocks.4.attn.proj.bias, layers.2.blocks.4.norm2.weight, layers.2.blocks.4.norm2.bias, layers.2.blocks.4.mlp.fc1.weight, layers.2.blocks.4.mlp.fc1.bias, layers.2.blocks.4.mlp.fc2.weight, layers.2.blocks.4.mlp.fc2.bias, layers.2.blocks.5.norm1.weight, layers.2.blocks.5.norm1.bias, layers.2.blocks.5.attn.relative_position_bias_table, layers.2.blocks.5.attn.relative_position_index, layers.2.blocks.5.attn.qkv.weight, layers.2.blocks.5.attn.qkv.bias, layers.2.blocks.5.attn.proj.weight, layers.2.blocks.5.attn.proj.bias, layers.2.blocks.5.norm2.weight, layers.2.blocks.5.norm2.bias, layers.2.blocks.5.mlp.fc1.weight, layers.2.blocks.5.mlp.fc1.bias, layers.2.blocks.5.mlp.fc2.weight, layers.2.blocks.5.mlp.fc2.bias, layers.2.downsample.reduction.weight, layers.2.downsample.norm.weight, layers.2.downsample.norm.bias, layers.3.blocks.0.norm1.weight, layers.3.blocks.0.norm1.bias, layers.3.blocks.0.attn.relative_position_bias_table, layers.3.blocks.0.attn.relative_position_index, layers.3.blocks.0.attn.qkv.weight, layers.3.blocks.0.attn.qkv.bias, layers.3.blocks.0.attn.proj.weight, layers.3.blocks.0.attn.proj.bias, layers.3.blocks.0.norm2.weight, layers.3.blocks.0.norm2.bias, layers.3.blocks.0.mlp.fc1.weight, layers.3.blocks.0.mlp.fc1.bias, layers.3.blocks.0.mlp.fc2.weight, layers.3.blocks.0.mlp.fc2.bias, layers.3.blocks.1.norm1.weight, layers.3.blocks.1.norm1.bias, layers.3.blocks.1.attn.relative_position_bias_table, layers.3.blocks.1.attn.relative_position_index, layers.3.blocks.1.attn.qkv.weight, layers.3.blocks.1.attn.qkv.bias, layers.3.blocks.1.attn.proj.weight, layers.3.blocks.1.attn.proj.bias, layers.3.blocks.1.norm2.weight, layers.3.blocks.1.norm2.bias, layers.3.blocks.1.mlp.fc1.weight, layers.3.blocks.1.mlp.fc1.bias, layers.3.blocks.1.mlp.fc2.weight, layers.3.blocks.1.mlp.fc2.bias, norm0.weight, norm0.bias, norm1.weight, norm1.bias, norm2.weight, norm2.bias, norm3.weight, norm3.bias

    model = dict(
        pretrained='/storage/wjb/AlignPS/pretrained/swin_tiny_patch4_window7_224.pth',
        backbone=dict(
            type='SwinTransformer',
            embed_dim=96,
            depths=[2, 2, 6, 2],
            num_heads=[3, 6, 12, 24],
            window_size=7,
            mlp_ratio=4.,
            qkv_bias=True,
            qk_scale=None,
            drop_rate=0.,
            attn_drop_rate=0.,
            drop_path_rate=0.2,
            ape=False,
            patch_norm=True,
            out_indices=(0, 1, 2, 3),
            use_checkpoint=False),
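
    A quick way to see which keys the checkpoint actually provides, and to compare them against the "missing keys" list above, is a small diagnostic sketch like the one below (the path is the one used in the config; adjust as needed):

    import torch

    # Load the ImageNet-pretrained Swin-T checkpoint on the CPU and inspect its contents.
    ckpt = torch.load(
        '/storage/wjb/AlignPS/pretrained/swin_tiny_patch4_window7_224.pth',
        map_location='cpu')
    print(list(ckpt.keys()))  # top-level keys, e.g. ['model'] or ['state_dict', 'meta']
    state_dict = ckpt.get('state_dict', ckpt.get('model', ckpt))
    print(list(state_dict.keys())[:10])  # parameter names to compare with the missing keys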

    opened by jiabeiwangTJU 5
  • TypeError: MaskRCNN: SwinTransformer: empty() received an invalid combination of arguments

    TypeError: MaskRCNN: SwinTransformer: empty() received an invalid combination of arguments

    Hello, I'm now trying to apply Swin Transformer for object detection. I encounter this error both with and without a GPU when training with the Swin models (other models train fine).

    Checklist

    1. I have searched related issues but cannot get the expected help.
    2. I have read the FAQ documentation but cannot get the expected help.
    3. The bug has not been fixed in the latest version.

    Describe the bug At first, I ran into these errors: TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'; TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'ape'; TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'use_checkpoint'. I then followed an answered issue here and changed "embed_dim" to "embed_dims" in the configs, and also commented out the "ape=False" and "use_checkpoint=False" lines. After that, these errors no longer occurred.

    But right after that, I got into the error below:

    TypeError: MaskRCNN: SwinTransformer: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:

    • (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
    • (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)

    Reproduction

    1. What command or script did you run?
    python3 tools/train.py configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py
    

    This also happened with the other Swin configs I tried, as well as with my own Swin-based config!

    2. OUTPUT:
    2022-02-16 16:19:04,508 - mmdet - INFO - Distributed training: False
    2022-02-16 16:19:07,995 - mmdet - INFO - Config:
    model = dict(
        type='MaskRCNN',
        pretrained=None,
        backbone=dict(
            type='SwinTransformer',
            embed_dims=96,
            depths=[2, 2, 6, 2],
            num_heads=[3, 6, 12, 24],
            window_size=7,
            mlp_ratio=4.0,
            qkv_bias=True,
            qk_scale=None,
            drop_rate=0.0,
            attn_drop_rate=0.0,
            drop_path_rate=0.1,
            patch_norm=True,
            out_indices=(0, 1, 2, 3)),
        neck=dict(
            type='FPN',
            in_channels=[96, 192, 384, 768],
            out_channels=256,
            num_outs=5),
        rpn_head=dict(
            type='RPNHead',
            in_channels=256,
            feat_channels=256,
            anchor_generator=dict(
                type='AnchorGenerator',
                scales=[8],
                ratios=[0.5, 1.0, 2.0],
                strides=[4, 8, 16, 32, 64]),
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[1.0, 1.0, 1.0, 1.0]),
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        roi_head=dict(
            type='StandardRoIHead',
            bbox_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            bbox_head=dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=False,
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
            mask_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            mask_head=dict(
                type='FCNMaskHead',
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=80,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
        train_cfg=dict(
            rpn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.3,
                    min_pos_iou=0.3,
                    match_low_quality=True,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=256,
                    pos_fraction=0.5,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=False),
                allowed_border=-1,
                pos_weight=-1,
                debug=False),
            rpn_proposal=dict(
                nms_pre=2000,
                max_per_img=1000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=True,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False)),
        test_cfg=dict(
            rpn=dict(
                nms_pre=1000,
                max_per_img=1000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                score_thr=0.05,
                nms=dict(type='nms', iou_threshold=0.5),
                max_per_img=100,
                mask_thr_binary=0.5)))
    dataset_type = 'CocoDataset'
    data_root = 'data/coco/'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(
            type='AutoAugment',
            policies=[[{
                'type':
                'Resize',
                'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                              (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                              (736, 1333), (768, 1333), (800, 1333)],
                'multiscale_mode':
                'value',
                'keep_ratio':
                True
            }],
                      [{
                          'type': 'Resize',
                          'img_scale': [(400, 1333), (500, 1333), (600, 1333)],
                          'multiscale_mode': 'value',
                          'keep_ratio': True
                      }, {
                          'type': 'RandomCrop',
                          'crop_type': 'absolute_range',
                          'crop_size': (384, 600),
                          'allow_negative_crop': True
                      }, {
                          'type':
                          'Resize',
                          'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                        (576, 1333), (608, 1333), (640, 1333),
                                        (672, 1333), (704, 1333), (736, 1333),
                                        (768, 1333), (800, 1333)],
                          'multiscale_mode':
                          'value',
                          'override':
                          True,
                          'keep_ratio':
                          True
                      }]]),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1333, 800),
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img'])
            ])
    ]
    data = dict(
        samples_per_gpu=2,
        workers_per_gpu=2,
        train=dict(
            type='CocoDataset',
            ann_file='data/coco/annotations/instances_train2017.json',
            img_prefix='data/coco/train2017/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
                dict(type='RandomFlip', flip_ratio=0.5),
                dict(
                    type='AutoAugment',
                    policies=[[{
                        'type':
                        'Resize',
                        'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                      (576, 1333), (608, 1333), (640, 1333),
                                      (672, 1333), (704, 1333), (736, 1333),
                                      (768, 1333), (800, 1333)],
                        'multiscale_mode':
                        'value',
                        'keep_ratio':
                        True
                    }],
                              [{
                                  'type': 'Resize',
                                  'img_scale': [(400, 1333), (500, 1333),
                                                (600, 1333)],
                                  'multiscale_mode': 'value',
                                  'keep_ratio': True
                              }, {
                                  'type': 'RandomCrop',
                                  'crop_type': 'absolute_range',
                                  'crop_size': (384, 600),
                                  'allow_negative_crop': True
                              }, {
                                  'type':
                                  'Resize',
                                  'img_scale': [(480, 1333), (512, 1333),
                                                (544, 1333), (576, 1333),
                                                (608, 1333), (640, 1333),
                                                (672, 1333), (704, 1333),
                                                (736, 1333), (768, 1333),
                                                (800, 1333)],
                                  'multiscale_mode':
                                  'value',
                                  'override':
                                  True,
                                  'keep_ratio':
                                  True
                              }]]),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='DefaultFormatBundle'),
                dict(
                    type='Collect',
                    keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
            ]),
        val=dict(
            type='CocoDataset',
            ann_file='data/coco/annotations/instances_val2017.json',
            img_prefix='data/coco/val2017/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]),
        test=dict(
            type='CocoDataset',
            ann_file='data/coco/annotations/instances_val2017.json',
            img_prefix='data/coco/val2017/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]))
    evaluation = dict(metric=['bbox', 'segm'])
    optimizer = dict(
        type='AdamW',
        lr=0.0001,
        betas=(0.9, 0.999),
        weight_decay=0.05,
        paramwise_cfg=dict(
            custom_keys=dict(
                absolute_pos_embed=dict(decay_mult=0.0),
                relative_position_bias_table=dict(decay_mult=0.0),
                norm=dict(decay_mult=0.0))))
    optimizer_config = dict(
        grad_clip=None,
        type='DistOptimizerHook',
        update_interval=1,
        coalesce=True,
        bucket_size_mb=-1,
        use_fp16=True)
    lr_config = dict(
        policy='step',
        warmup='linear',
        warmup_iters=500,
        warmup_ratio=0.001,
        step=[8, 11])
    runner = dict(type='EpochBasedRunnerAmp', max_epochs=12)
    checkpoint_config = dict(interval=1)
    log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
    custom_hooks = [dict(type='NumClassCheckHook')]
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]
    fp16 = None
    work_dir = './work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco'
    gpu_ids = range(0, 1)
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/backbones/swin.py", line 631, in __init__
        init_cfg=None)
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/backbones/swin.py", line 450, in __init__
        init_cfg=None)
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/backbones/swin.py", line 356, in __init__
        init_cfg=None)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/misc.py", line 340, in new_func
        output = old_func(*args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/bricks/transformer.py", line 607, in __init__
        Linear(in_channels, feedforward_channels), self.activate,
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py", line 85, in __init__
        self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
    TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
     * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
     * (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
    
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/detectors/mask_rcnn.py", line 27, in __init__
        init_cfg=init_cfg)
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/detectors/two_stage.py", line 32, in __init__
        self.backbone = build_backbone(backbone)
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/builder.py", line 20, in build_backbone
        return BACKBONES.build(cfg)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: SwinTransformer: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
     * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
     * (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
    
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "tools/train.py", line 187, in <module>
        main()
      File "tools/train.py", line 161, in main
        test_cfg=cfg.get('test_cfg'))
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/builder.py", line 59, in build_detector
        cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: MaskRCNN: SwinTransformer: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
     * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
     * (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
    
    
    3. What dataset did you use? I tried training with the default COCO 2017 dataset, then on my own dataset, but the errors always occurred.

    Environment

    2022-02-16 16:19:01,022 - mmdet - INFO - Environment info:

    sys.platform: linux
    Python: 3.7.12 (default, Jan 15 2022, 18:48:18) [GCC 7.5.0]
    CUDA available: True
    GPU 0: Tesla K80
    CUDA_HOME: /usr/local/cuda
    NVCC: Build cuda_11.1.TC455_06.29190527_0
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.10.0+cu111
    PyTorch compiling details: PyTorch built with:

    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • LAPACK is enabled (usually provided by MKL)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 11.1
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
    • CuDNN 8.0.5
    • Magma 2.5.2
    • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

    TorchVision: 0.11.1+cu111
    OpenCV: 4.1.2
    MMCV: 1.4.5
    MMCV Compiler: GCC 7.3
    MMCV CUDA Compiler: 11.1
    MMDetection: 2.21.0+461e003

    Please help! This is one of my first experiences working with GitHub and computer vision projects. Thanks for any help!

    opened by queman 4
  • RuntimeError: Default process group has not been initialized, please make sure to call init_process_group for cascade_mask_rcnn_swin_base_patch4

    RuntimeError: Default process group has not been initialized, please make sure to call init_process_group for cascade_mask_rcnn_swin_base_patch4

    Thanks for your error report and we appreciate it a lot.

    Describe the bug

    I am getting a runtime error, "RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.", when training with the command below:

    python tools/train.py configs/swin/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py --cfg-options model.pretrained=swin_base_patch4_window12_384_22k.pth
    

    Error:

    Traceback (most recent call last):
      File "C:\Users\user\Swin-Transformer-Object-Detection\tools\train.py", line 187, in <module>
        main()
      File "C:\Users\user\Swin-Transformer-Object-Detection\tools\train.py", line 176, in main
        train_detector(
      File "c:\users\user\swin-transformer-object-detection\mmdet\apis\train.py", line 185, in train_detector
        runner.run(data_loaders, cfg.workflow)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 127, in run
        epoch_runner(data_loaders[i], **kwargs)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 50, in train
        self.run_iter(data_batch, train_mode=True, **kwargs)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 29, in run_iter
        outputs = self.model.train_step(data_batch, self.optimizer,
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\mmcv\parallel\data_parallel.py", line 75, in train_step
        return self.module.train_step(*inputs[0], **kwargs[0])
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\detectors\base.py", line 247, in train_step
        losses = self(**data)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\mmcv\runner\fp16_utils.py", line 98, in new_func
        return old_func(*args, **kwargs)
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\detectors\base.py", line 181, in forward
        return self.forward_train(img, img_metas, **kwargs)
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\detectors\two_stage.py", line 161, in forward_train
        roi_losses = self.roi_head.forward_train(x, img_metas, proposal_list,
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\roi_heads\cascade_roi_head.py", line 257, in forward_train
        bbox_results = self._bbox_forward_train(i, x, sampling_results,
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\roi_heads\cascade_roi_head.py", line 157, in _bbox_forward_train
        bbox_results = self._bbox_forward(stage, x, rois)
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\roi_heads\cascade_roi_head.py", line 147, in _bbox_forward
        cls_score, bbox_pred = bbox_head(bbox_feats)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\roi_heads\bbox_heads\convfc_bbox_head.py", line 139, in forward
        x = conv(x)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\mmcv\cnn\bricks\conv_module.py", line 203, in forward
        x = self.norm(x)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\nn\modules\batchnorm.py", line 732, in forward
        world_size = torch.distributed.get_world_size(process_group)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\distributed\distributed_c10d.py", line 845, in get_world_size
        return _get_group_size(group)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\distributed\distributed_c10d.py", line 306, in _get_group_size
        default_pg = _get_default_group()
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\distributed\distributed_c10d.py", line 410, in _get_default_group
    
    


    Reproduction

    Training with a single GPU:

    python tools/train.py configs/swin/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py --cfg-options model.pretrained=swin_base_patch4_window12_384_22k.pth

    2. Did you make any modifications on the code or config? Did you understand what you have modified?

    Changed the number of classes in the configs to 2 classes.

    3. What dataset did you use?

    A custom COCO dataset.

    Environment

    OS: Windows 10
    Torch: 1.1.0
    MMCV: 1.4.0

    Please advise. Thanks.
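
    For reference, the failing call in the traceback comes from a SyncBatchNorm layer in the bbox head (torch.nn.modules.batchnorm calling torch.distributed.get_world_size), which requires an initialized process group. Two workarounds are commonly used in this situation, neither of which is confirmed in this thread: launch the single-GPU run through the distributed launcher (tools/dist_test.sh-style, i.e. tools/dist_train.sh <CONFIG_FILE> 1), or swap SyncBN for plain BN in the config, roughly as sketched below.

    # Sketch of the config edit (the exact location of norm_cfg depends on the config
    # file; in the 4conv1f cascade bbox heads it is typically set per bbox_head entry).
    # Before (requires torch.distributed to be initialized):
    norm_cfg = dict(type='SyncBN', requires_grad=True)
    # After, for non-distributed single-GPU training:
    norm_cfg = dict(type='BN', requires_grad=True)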
    opened by rgkannan676 4
  • Attempt for SwinTransformer + Faster RCNN

    Attempt for SwinTransformer + Faster RCNN

    Hello, I tried to use Swin Transformer as a backbone feeding feature maps into Faster R-CNN. I used the Swin Transformer and FPN config from the file "configs/_base_/models/mask_rcnn_swin_fpn.py" and the rpn_head and roi_head from "configs/_base_/models/faster_rcnn_r50_fpn.py" to construct a new architecture for object detection. Then I loaded the pretrained file "swin_tiny_patch4_window7_224.pth". But as shown in the console, "The model and loaded state dict do not match exactly". I wonder whether I made a mistake in constructing the architecture or in using the pretrained model?
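
    For reference, a minimal sketch of one way such a config can be composed with mmdetection's _base_ inheritance. The two model base files are the ones named in the question; the dataset/schedule/runtime bases are the standard mmdetection ones, the output file name and pretrained path are placeholders, and the backbone keys follow this repo's mask_rcnn_swin_fpn.py:

    # configs/swin/faster_rcnn_swin_tiny_fpn_1x_coco.py  (hypothetical file name)
    _base_ = [
        '../_base_/models/faster_rcnn_r50_fpn.py',
        '../_base_/datasets/coco_detection.py',
        '../_base_/schedules/schedule_1x.py',
        '../_base_/default_runtime.py'
    ]

    model = dict(
        pretrained='swin_tiny_patch4_window7_224.pth',  # placeholder path
        backbone=dict(
            _delete_=True,  # drop the ResNet-50 backbone inherited from the base config
            type='SwinTransformer',
            embed_dim=96,
            depths=[2, 2, 6, 2],
            num_heads=[3, 6, 12, 24],
            window_size=7,
            ape=False,
            drop_path_rate=0.2,
            patch_norm=True,
            out_indices=(0, 1, 2, 3),
            use_checkpoint=False),
        neck=dict(in_channels=[96, 192, 384, 768]))

    Note that some "model and loaded state dict do not match exactly" warnings are usually expected when an ImageNet classification checkpoint is loaded into a detector: the RPN and RoI heads have no pretrained weights and the classification head of the checkpoint is unused, so the warning by itself does not prove the architecture is wrong, as long as the backbone keys (patch_embed.*, layers.*) are matched.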

    opened by wliebtzy 4
  • UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 5096: invalid start byte

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 5096: invalid start byte

    I ran python tools/train.py configs\swin\cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py to train on my dataset, but it raises UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 5096: invalid start byte. How can I solve this problem? Thanks.

    opened by blurmemo 3
  • Failed to debug the code with VSCode (Unavailable Break point)

    Failed to debug the code with VSCode (Unavailable Break point)

    Hello, I am trying to debug the inference process and monitor the variables in Swin Transformer object detection. When I set a breakpoint in the function "single_gpu_test" in "/Swin-Transformer-Object-Detection/mmdet/apis", it has no effect, and I find that the code actually runs the "single_gpu_test" in "lib/python3.8/site-packages/mmdet-2.11.0-py3.8.egg/mmdet/apis". How can I set a working breakpoint to inspect the variables in "single_gpu_test"? Why is "single_gpu_test" defined in "/Swin-Transformer-Object-Detection/mmdet/apis" but executed from "lib/python3.8/site-packages/mmdet-2.11.0-py3.8.egg/mmdet/apis"?
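
    A quick way to check which copy of the package actually runs is the small diagnostic sketch below (the printed paths will differ per machine):

    import mmdet
    import mmdet.apis

    # If these paths point into site-packages (the mmdet-2.11.0 egg) rather than the
    # Swin-Transformer-Object-Detection checkout, the installed copy shadows the local
    # sources, so breakpoints set in the checkout are never reached.
    print(mmdet.__version__)
    print(mmdet.__file__)
    print(mmdet.apis.__file__)

    In that case, reinstalling the repository in development/editable mode (for example, pip install -v -e . from the repository root) makes the local mmdet the one that is imported, so breakpoints in /Swin-Transformer-Object-Detection/mmdet/apis become effective.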

    Thank you!

    opened by wliebtzy 0
  • Inference Score is low AP - on different dataset

    Inference Score is low AP - on different dataset

    Hello, I tried the model and the AP is above 40 percent, but when I test it on a different dataset the AP is around 0.04. What could be the possible reasons for this failure? I trained the model on 256x256 images and tested on 256x256 and larger sizes, but nothing changed; the AP is still very low on the test set. During training I also have a separate validation set, and there the AP is good enough, around 40.

    opened by Mahmood-Hussain 0
  • Running error on model Swin-ReppointV2 : IndexError: The shape of the mask [3070240] at index 0 does not match the shape of the indexed tensor [383780] at index 0

    Running error on model Swin-ReppointV2 : IndexError: The shape of the mask [3070240] at index 0 does not match the shape of the indexed tensor [383780] at index 0

    Prerequisite

    1. I have searched related issues but cannot get the expected help.
    2. I have read the FAQ documentation but cannot get the expected help.
    3. The bug has not been fixed in the latest version.

    Describe the bug

    When I run the command python tools/train.py configs/swin/reppoitsv2_swin_tiny_patch4_window7_mstrain_480_960_giou_gfocal_bifpn_adamw_3x_coco.py, the error traceback below appeared:

    Traceback (most recent call last):
      File "tools/train.py", line 194, in <module>
        main()
      File "tools/train.py", line 183, in main
        train_detector(
      File "d:\swin-transformer-object-detection\mmdet\apis\train.py", line 185, in train_detector
        runner.run(data_loaders, cfg.workflow)
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 127, in run
        epoch_runner(data_loaders[i], **kwargs)
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 50, in train
        self.run_iter(data_batch, train_mode=True, **kwargs)
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 29, in run_iter
        outputs = self.model.train_step(data_batch, self.optimizer,
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\mmcv\parallel\data_parallel.py", line 75, in train_step
        return self.module.train_step(*inputs[0], **kwargs[0])
      File "d:\swin-transformer-object-detection\mmdet\models\detectors\base.py", line 247, in train_step
        losses = self(**data)
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\mmcv\runner\fp16_utils.py", line 128, in new_func
        output = old_func(*new_args, **new_kwargs)
      File "d:\swin-transformer-object-detection\mmdet\models\detectors\base.py", line 181, in forward
        return self.forward_train(img, img_metas, **kwargs)
      File "d:\swin-transformer-object-detection\mmdet\models\detectors\reppoints_v2_detector.py", line 34, in forward_train
        losses = self.bbox_head.loss(
      File "d:\swin-transformer-object-detection\mmdet\models\dense_heads\reppoints_v2_head.py", line 1096, in loss
        loss_sem = self.loss_sem(concat_sem_scores, concat_gt_sem_map, concat_gt_sem_weights, avg_factor=(concat_gt_sem_map > 0).sum())
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "d:\swin-transformer-object-detection\mmdet\models\losses\focal_loss.py", line 237, in forward
        loss_cls = self.loss_weight * separate_sigmoid_focal_loss(
      File "d:\swin-transformer-object-detection\mmdet\models\losses\focal_loss.py", line 75, in separate_sigmoid_focal_loss
        pos_pred = pred_sigmoid[pos_inds]
    IndexError: The shape of the mask [3070240] at index 0 does not match the shape of the indexed tensor [383780] at index 0
    

    As for modifications to the code, I only changed 'num_classes' in the model config and 'CLASSES' in the script 'coco.py'.

    However, when I run train.py to train the Mask R-CNN model as below, it works well without this error: python tools/train.py configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py
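
    For context, the two modifications mentioned above are of this kind (the class count and names here are placeholders, not the reporter's actual values):

    # In the model part of the config: every classification head must use the new count.
    num_classes = 3  # placeholder; replaces the default 80

    # In mmdet/datasets/coco.py (or a custom dataset class): CLASSES must list exactly
    # the categories present in the annotation file, in a fixed order.
    CLASSES = ('class_a', 'class_b', 'class_c')  # placeholder names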

    Environment

    sys.platform: win32
    Python: 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)]
    CUDA available: True
    GPU 0: NVIDIA GeForce RTX 3090
    CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
    NVCC: Build cuda_11.1.relgpu_drvr455TC455_06.29190527_0
    GCC: n/a
    PyTorch: 1.8.1+cu111
    PyTorch compiling details: PyTorch built with:

    • C++ Version: 199711
    • MSVC 192829913
    • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
    • OpenMP 2019
    • CPU capability usage: AVX2
    • CUDA Runtime 11.1
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
    • CuDNN 8.0.5
    • Magma 2.5.4
    • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=C:/w/b/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -DNDEBUG -DUSE_FBGEMM -DUSE_XNNPACK, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON,

    TorchVision: 0.9.1+cu111
    OpenCV: 4.6.0
    MMCV: 1.4.1
    MMCV Compiler: MSVC 192930137
    MMCV CUDA Compiler: 11.1
    MMDetection: 2.11.0+c7b2011

    Look forward to the reply. Thanks!

    opened by busy-wong 2
  • Train common backbone with multiple detection heads

    Train common backbone with multiple detection heads

    Hi, I want to train on COCO and VOC together using the same backbone. How can I ensure that the concatenated dataset has separate object detection heads catering to the different numbers of classes in the two datasets, 80 and 20 respectively? Is there a better workaround? Any help will be appreciated!

    Thanks

    opened by jinga-lala 0