This is an official implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" for object detection and instance segmentation.

Overview

Swin Transformer for Object Detection

This repo contains the supported code and configuration files to reproduce object detection results of Swin Transformer. It is based on mmdetection.

Updates

04/12/2021 Initial commits

Results and Models

Mask R-CNN

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| -------- | -------- | ------- | ------- | -------- | ------- | ----- | ------ | --- | ----- |
| Swin-T | ImageNet-1K | 3x | 46.0 | 41.6 | 48M | 267G | config | github/baidu | github/baidu |
| Swin-S | ImageNet-1K | 3x | 48.5 | 43.3 | 69M | 359G | config | github/baidu | github/baidu |

Cascade Mask R-CNN

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs | config | log | model |
| -------- | -------- | ------- | ------- | -------- | ------- | ----- | ------ | --- | ----- |
| Swin-T | ImageNet-1K | 3x | 50.4 | 43.7 | 86M | 745G | config | github/baidu | github/baidu |
| Swin-S | ImageNet-1K | 3x | 51.9 | 45.0 | 107M | 838G | config | github/baidu | github/baidu |
| Swin-B | ImageNet-1K | 3x | 51.9 | 45.0 | 145M | 982G | config | github/baidu | github/baidu |

RepPoints V2

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs |
| -------- | -------- | ------- | ------- | -------- | ------- | ----- |
| Swin-T | ImageNet-1K | 3x | 50.0 | - | 45M | 283G |

Mask RepPoints V2

| Backbone | Pretrain | Lr Schd | box mAP | mask mAP | #params | FLOPs |
| -------- | -------- | ------- | ------- | -------- | ------- | ----- |
| Swin-T | ImageNet-1K | 3x | 50.3 | 43.6 | 47M | 292G |


Usage

Installation

Please refer to get_started.md for installation and dataset preparation.

Inference

# single-gpu testing
python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE> --eval bbox segm

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <DET_CHECKPOINT_FILE> <GPU_NUM> --eval bbox segm
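
For quick single-image checks you can also call the mmdetection Python API directly instead of the test scripts. The snippet below is only a minimal sketch: the config path, checkpoint filename, and demo image are example values, so substitute the files you actually downloaded.

from mmdet.apis import init_detector, inference_detector, show_result_pyplot

# Example paths -- replace with the config and checkpoint you downloaded.
config_file = 'configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_3x_coco.py'
checkpoint_file = 'mask_rcnn_swin_tiny_patch4_window7.pth'

# Build the detector from the config and load the trained weights onto GPU 0.
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Run inference on one image and visualize detections above a score threshold.
result = inference_detector(model, 'demo/demo.jpg')
show_result_pyplot(model, 'demo/demo.jpg', result, score_thr=0.3)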

Training

To train a detector with pre-trained models, run:

# single-gpu training
python tools/train.py <CONFIG_FILE> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --cfg-options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments] 

For example, to train a Cascade Mask R-CNN model with a Swin-T backbone on 8 GPUs, run:

tools/dist_train.sh configs/swin/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py 8 --cfg-options model.pretrained=<PRETRAIN_MODEL> 

Note: use_checkpoint is used to save GPU memory. Please refer to this page for more details.
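
The same two settings can also be written into the config file instead of being passed via --cfg-options. Below is an abridged sketch of the relevant backbone fields for Swin-T, using the <PRETRAIN_MODEL> placeholder from the commands above; see the full configs under configs/swin/ for all fields.

# Backbone section of a Swin-T detection config (abridged sketch).
model = dict(
    pretrained='<PRETRAIN_MODEL>',   # ImageNet-pretrained backbone weights
    backbone=dict(
        type='SwinTransformer',
        embed_dim=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        use_checkpoint=True))        # gradient checkpointing to save GPU memory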

Apex (optional):

We use apex for mixed precision training by default. To install apex, run:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
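
If the build succeeded with the CUDA extensions, the amp and fused-optimizer modules should import cleanly. A quick, optional smoke test (assuming the install command above completed without errors):

# Optional check that apex was built with its CUDA extensions.
from apex import amp                      # mixed-precision utilities
from apex.optimizers import FusedAdam     # importable only if --cuda_ext compiled
print('apex with CUDA extensions is available')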

If you would like to disable apex, change the runner type to EpochBasedRunner and comment out the following code block in the configuration files:

# do not use mmdet version fp16
fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
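
For reference, a sketch of how those settings might look once apex is disabled; EpochBasedRunner follows the instruction above, while the max_epochs value is only an example matching the 3x schedule used elsewhere in this repo:

# Without apex: use the standard mmcv runner and a plain optimizer hook.
runner = dict(type='EpochBasedRunner', max_epochs=36)
optimizer_config = dict(grad_clip=None)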

Citing Swin Transformer

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

Other Links

Image Classification: See Swin Transformer for Image Classification.

Semantic Segmentation: See Swin Transformer for Semantic Segmentation.

Comments
  • TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'

    Checklist

    1. I have searched related issues but cannot get the expected help.
    2. The issue has not been fixed in the latest version.

    Describe the issue

    When I run the following command:

    python tools/train.py configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py --cfg-options model.pretrained=check_points/mask_rcnn_swin_tiny_patch4_window7_1x.pth

    I encounter a TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'.

    Reproduction

    1. What command or script did you run?

    python tools/train.py configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py --cfg-options model.pretrained=check_points/mask_rcnn_swin_tiny_patch4_window7_1x.pth

    ![image](https://user-images.githubusercontent.com/31770250/139524042-0f4f1d17-640d-4ab7-ab5f-6527a34e968b.png)
    
    
    2. Which config did you run?
    
    /home/zhaogy/code/Swin-Transformer-Object-Detection-master/configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py
    
    3. Did you make any modifications to the code or config? Do you understand what you have modified? I did not change anything.

    4. What dataset did you use? COCO.

    Environment

    1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.

    fatal: not a git repository (or any of the parent directories): .git
    sys.platform: linux
    Python: 3.7.0 (default, Oct 9 2018, 10:31:47) [GCC 7.3.0]
    CUDA available: True
    GPU 0,1,2,3: Tesla V100-SXM2-16GB
    CUDA_HOME: /usr/local/cuda
    NVCC: Build cuda_11.0_bu.TC445_37.28540450_0
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.7.1
    PyTorch compiling details: PyTorch built with:

    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 11.0
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_37,code=compute_37
    • CuDNN 8.0.5
    • Magma 2.5.2
    • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

    TorchVision: 0.8.2
    OpenCV: 4.4.0
    MMCV: 1.3.16
    MMCV Compiler: GCC 7.3
    MMCV CUDA Compiler: 11.0
    MMDetection: 2.18.0+

    2. You may add additional information that may be helpful for locating the problem, such as:

    1. How you installed PyTorch [e.g., pip, conda, source] conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
    2. Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

    output: fatal: not a git repository (or any of the parent directories): .git
    2021-10-30 14:57:51,909 - mmdet - INFO - Environment info:

    sys.platform: linux
    Python: 3.7.0 (default, Oct 9 2018, 10:31:47) [GCC 7.3.0]
    CUDA available: True
    GPU 0,1,2,3: Tesla V100-SXM2-16GB
    CUDA_HOME: /usr/local/cuda
    NVCC: Build cuda_11.0_bu.TC445_37.28540450_0
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.7.1
    PyTorch compiling details: PyTorch built with:

    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 11.0
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_37,code=compute_37
    • CuDNN 8.0.5
    • Magma 2.5.2
    • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

    TorchVision: 0.8.2
    OpenCV: 4.4.0
    MMCV: 1.3.16
    MMCV Compiler: GCC 7.3
    MMCV CUDA Compiler: 11.0
    MMDetection: 2.18.0+

    2021-10-30 14:57:54,436 - mmdet - INFO - Distributed training: False 2021-10-30 14:57:57,051 - mmdet - INFO - Config: model = dict( type='MaskRCNN', pretrained='check_points/mask_rcnn_swin_tiny_patch4_window7_1x.pth', backbone=dict( type='SwinTransformer', embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], window_size=7, mlp_ratio=4.0, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, ape=False, patch_norm=True, out_indices=(0, 1, 2, 3), use_checkpoint=False), neck=dict( type='FPN', in_channels=[96, 192, 384, 768], out_channels=256, num_outs=5), rpn_head=dict( type='RPNHead', in_channels=256, feat_channels=256, anchor_generator=dict( type='AnchorGenerator', scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]), bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]), loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), loss_bbox=dict(type='L1Loss', loss_weight=1.0)), roi_head=dict( type='StandardRoIHead', bbox_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]), bbox_head=dict( type='Shared2FCBBoxHead', in_channels=256, fc_out_channels=1024, roi_feat_size=7, num_classes=80, bbox_coder=dict( type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]), reg_class_agnostic=False, loss_cls=dict( type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), loss_bbox=dict(type='L1Loss', loss_weight=1.0)), mask_roi_extractor=dict( type='SingleRoIExtractor', roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), out_channels=256, featmap_strides=[4, 8, 16, 32]), mask_head=dict( type='FCNMaskHead', num_convs=4, in_channels=256, conv_out_channels=256, num_classes=80, loss_mask=dict( type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), train_cfg=dict( rpn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.3, min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False), allowed_border=-1, pos_weight=-1, debug=False), rpn_proposal=dict( nms_pre=2000, max_per_img=1000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=dict( assigner=dict( type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5, match_low_quality=True, ignore_iof_thr=-1), sampler=dict( type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True), mask_size=28, pos_weight=-1, debug=False)), test_cfg=dict( rpn=dict( nms_pre=1000, max_per_img=1000, nms=dict(type='nms', iou_threshold=0.7), min_bbox_size=0), rcnn=dict( score_thr=0.05, nms=dict(type='nms', iou_threshold=0.5), max_per_img=100, mask_thr_binary=0.5))) dataset_type = 'CocoDataset' data_root = 'data/coco/' img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) train_pipeline = [ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='AutoAugment', policies=[[{ 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }], [{ 'type': 'Resize', 'img_scale': [(400, 1333), (500, 1333), (600, 1333)], 
'multiscale_mode': 'value', 'keep_ratio': True }, { 'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True }, { 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'override': True, 'keep_ratio': True }]]), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']) ] test_pipeline = [ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ] data = dict( samples_per_gpu=2, workers_per_gpu=2, train=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_train2017.json', img_prefix='data/coco/train2017/', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True, with_mask=True), dict(type='RandomFlip', flip_ratio=0.5), dict( type='AutoAugment', policies=[[{ 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }], [{ 'type': 'Resize', 'img_scale': [(400, 1333), (500, 1333), (600, 1333)], 'multiscale_mode': 'value', 'keep_ratio': True }, { 'type': 'RandomCrop', 'crop_type': 'absolute_range', 'crop_size': (384, 600), 'allow_negative_crop': True }, { 'type': 'Resize', 'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333), (608, 1333), (640, 1333), (672, 1333), (704, 1333), (736, 1333), (768, 1333), (800, 1333)], 'multiscale_mode': 'value', 'override': True, 'keep_ratio': True }]]), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='DefaultFormatBundle'), dict( type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']) ]), val=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_val2017.json', img_prefix='data/coco/val2017/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ]), test=dict( type='CocoDataset', ann_file='data/coco/annotations/instances_val2017.json', img_prefix='data/coco/val2017/', pipeline=[ dict(type='LoadImageFromFile'), dict( type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False, transforms=[ dict(type='Resize', keep_ratio=True), dict(type='RandomFlip'), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='ImageToTensor', keys=['img']), dict(type='Collect', keys=['img']) ]) ])) evaluation = dict(metric=['bbox', 'segm']) optimizer = dict( type='AdamW', lr=0.0001, betas=(0.9, 0.999), 
weight_decay=0.05, paramwise_cfg=dict( custom_keys=dict( absolute_pos_embed=dict(decay_mult=0.0), relative_position_bias_table=dict(decay_mult=0.0), norm=dict(decay_mult=0.0)))) optimizer_config = dict( grad_clip=None, type='DistOptimizerHook', update_interval=1, coalesce=True, bucket_size_mb=-1, use_fp16=True) lr_config = dict( policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.001, step=[8, 11]) runner = dict(type='EpochBasedRunnerAmp', max_epochs=12) checkpoint_config = dict(interval=1) log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')]) custom_hooks = [dict(type='NumClassCheckHook')] dist_params = dict(backend='nccl') log_level = 'INFO' load_from = None resume_from = None workflow = [('train', 1)] fp16 = None work_dir = './work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco' gpu_ids = range(0, 1)

    /home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmdet/models/detectors/two_stage.py:29: UserWarning: DeprecationWarning: pretrained is deprecated, please use "init_cfg" instead
      warnings.warn('DeprecationWarning: pretrained is deprecated, '
    Traceback (most recent call last):
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
    TypeError: __init__() got an unexpected keyword argument 'embed_dim'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmdet/models/detectors/mask_rcnn.py", line 27, in __init__
        init_cfg=init_cfg)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmdet/models/detectors/two_stage.py", line 32, in __init__
        self.backbone = build_backbone(backbone)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmdet/models/builder.py", line 20, in build_backbone
        return BACKBONES.build(cfg)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "tools/train.py", line 187, in <module>
        main()
      File "tools/train.py", line 161, in main
        test_cfg=cfg.get('test_cfg'))
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmdet/models/builder.py", line 59, in build_detector
        cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "/home/zhaogy/anaconda3/envs/mmdet_py/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'

    please help!

    opened by GY-CAS 37
  • TypeError: __init__() got an unexpected keyword argument 'embed_dim'


    Traceback (most recent call last):
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\utils\registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
    TypeError: __init__() got an unexpected keyword argument 'embed_dim'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\utils\registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
      File "f:\peizhiwenjian\mmdetection-master2.19.1\mmdet\models\detectors\mask_rcnn.py", line 19, in __init__
        super(MaskRCNN, self).__init__(
      File "f:\peizhiwenjian\mmdetection-master2.19.1\mmdet\models\detectors\two_stage.py", line 32, in __init__
        self.backbone = build_backbone(backbone)
      File "f:\peizhiwenjian\mmdetection-master2.19.1\mmdet\models\builder.py", line 20, in build_backbone
        return BACKBONES.build(cfg)
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\utils\registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\cnn\builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\utils\registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "F:\PYcharm-project\deep-learning-for-image-processing\pytorch_object_detection\Swin-Transformer-Object-Detection-master\demo\image_demo.py", line 26, in <module>
        main()
      File "F:\PYcharm-project\deep-learning-for-image-processing\pytorch_object_detection\Swin-Transformer-Object-Detection-master\demo\image_demo.py", line 18, in main
        model = init_detector(args.config, args.checkpoint, device=args.device)
      File "f:\peizhiwenjian\mmdetection-master2.19.1\mmdet\apis\inference.py", line 40, in init_detector
        model = build_detector(config.model, test_cfg=config.get('test_cfg'))
      File "f:\peizhiwenjian\mmdetection-master2.19.1\mmdet\models\builder.py", line 58, in build_detector
        return DETECTORS.build(
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\utils\registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\cnn\builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "f:\peizhiwenjian\mmcv-1.4.0\mmcv\utils\registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'

    opened by LUO77123 27
  • KeyError: 'SwinTransformer is not in the models registry'

    I am using this model for custom training on my dataset in Colab. As I started training, I got this error:

    Traceback (most recent call last):
    File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
    return obj_cls(**args)
    File "/content/drive/MyDrive/mmdetection/mmdet/models/detectors/cascade_rcnn.py", line 27, in __init__
    init_cfg=init_cfg)
    File "/content/drive/MyDrive/mmdetection/mmdet/models/detectors/two_stage.py", line 26, in __init__
    self.backbone = build_backbone(backbone)
    File "/content/drive/MyDrive/mmdetection/mmdet/models/builder.py", line 19, in build_backbone
    return BACKBONES.build(cfg)
    File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 210, in build
    return self.build_func(*args, **kwargs, registry=self)
    File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
    File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 44, in build_from_cfg
    f'{obj_type} is not in the {registry.name} registry')
    KeyError: 'SwinTransformer is not in the models registry'
    
    During handling of the above exception, another exception occurred:
    
     Traceback (most recent call last):
     File "tools/train.py", line 187, in <module>
     main()
      File "tools/train.py", line 161, in main
    test_cfg=cfg.get('test_cfg'))
    File "/content/drive/MyDrive/mmdetection/mmdet/models/builder.py", line 58, in build_detector
    cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
    File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 210, in build
    return self.build_func(*args, **kwargs, registry=self)
    File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
    File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
     raise type(e)(f'{obj_cls.__name__}: {e}')
    KeyError: "CascadeRCNN: 'SwinTransformer is not in the models registry'"
    

    Here is my config file -

    2021-05-13 12:30:00,473 - mmdet - INFO - Environment info:
    ------------------------------------------------------------
    sys.platform: linux
    Python: 3.7.10 (default, May  3 2021, 02:48:31) [GCC 7.5.0]
    CUDA available: True
    GPU 0: Tesla P100-PCIE-16GB
    CUDA_HOME: /usr/local/cuda
    NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.8.1+cu101
    PyTorch compiling details: PyTorch built with:
      - GCC 7.3
      - C++ Version: 201402
      - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 10.1
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
      - CuDNN 7.6.3
      - Magma 2.5.2
      - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.1, CUDNN_VERSION=7.6.3, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 
    
    TorchVision: 0.9.1+cu101
    OpenCV: 4.1.2
    MMCV: 1.3.3
    MMCV Compiler: GCC 7.5
    MMCV CUDA Compiler: 11.0
    MMDetection: 2.12.0+41bb93f
    ------------------------------------------------------------
    
    2021-05-13 12:30:04,393 - mmdet - INFO - Distributed training: False
    2021-05-13 12:30:08,323 - mmdet - INFO - Config:
    model = dict(
        type='CascadeRCNN',
        pretrained='./moby_cascade_mask_rcnn_swin_tiny_patch4_window7_3x.pth',
        backbone=dict(
            type='SwinTransformer',
            embed_dim=96,
            depths=[2, 2, 6, 2],
            num_heads=[3, 6, 12, 24],
            window_size=7,
            mlp_ratio=4.0,
            qkv_bias=True,
            qk_scale=None,
            drop_rate=0.0,
            attn_drop_rate=0.0,
            drop_path_rate=0.2,
            ape=False,
            patch_norm=True,
            out_indices=(0, 1, 2, 3),
            use_checkpoint=False),
        neck=dict(
            type='FPN',
            in_channels=[96, 192, 384, 768],
            out_channels=256,
            num_outs=5),
        rpn_head=dict(
            type='RPNHead',
            in_channels=256,
            feat_channels=256,
            anchor_generator=dict(
                type='AnchorGenerator',
                scales=[8],
                ratios=[0.5, 1.0, 2.0],
                strides=[4, 8, 16, 32, 64]),
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[1.0, 1.0, 1.0, 1.0]),
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
            loss_bbox=dict(
                type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
        roi_head=dict(
            type='CascadeRoIHead',
            num_stages=3,
            stage_loss_weights=[1, 0.5, 0.25],
            bbox_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            bbox_head=[
                dict(
                    type='ConvFCBBoxHead',
                    num_shared_convs=4,
                    num_shared_fcs=1,
                    in_channels=256,
                    conv_out_channels=256,
                    fc_out_channels=1024,
                    roi_feat_size=7,
                    num_classes=80,
                    bbox_coder=dict(
                        type='DeltaXYWHBBoxCoder',
                        target_means=[0.0, 0.0, 0.0, 0.0],
                        target_stds=[0.1, 0.1, 0.2, 0.2]),
                    reg_class_agnostic=False,
                    reg_decoded_bbox=True,
                    norm_cfg=dict(type='SyncBN', requires_grad=True),
                    loss_cls=dict(
                        type='CrossEntropyLoss',
                        use_sigmoid=False,
                        loss_weight=1.0),
                    loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
                dict(
                    type='ConvFCBBoxHead',
                    num_shared_convs=4,
                    num_shared_fcs=1,
                    in_channels=256,
                    conv_out_channels=256,
                    fc_out_channels=1024,
                    roi_feat_size=7,
                    num_classes=80,
                    bbox_coder=dict(
                        type='DeltaXYWHBBoxCoder',
                        target_means=[0.0, 0.0, 0.0, 0.0],
                        target_stds=[0.05, 0.05, 0.1, 0.1]),
                    reg_class_agnostic=False,
                    reg_decoded_bbox=True,
                    norm_cfg=dict(type='SyncBN', requires_grad=True),
                    loss_cls=dict(
                        type='CrossEntropyLoss',
                        use_sigmoid=False,
                        loss_weight=1.0),
                    loss_bbox=dict(type='GIoULoss', loss_weight=10.0)),
                dict(
                    type='ConvFCBBoxHead',
                    num_shared_convs=4,
                    num_shared_fcs=1,
                    in_channels=256,
                    conv_out_channels=256,
                    fc_out_channels=1024,
                    roi_feat_size=7,
                    num_classes=80,
                    bbox_coder=dict(
                        type='DeltaXYWHBBoxCoder',
                        target_means=[0.0, 0.0, 0.0, 0.0],
                        target_stds=[0.033, 0.033, 0.067, 0.067]),
                    reg_class_agnostic=False,
                    reg_decoded_bbox=True,
                    norm_cfg=dict(type='SyncBN', requires_grad=True),
                    loss_cls=dict(
                        type='CrossEntropyLoss',
                        use_sigmoid=False,
                        loss_weight=1.0),
                    loss_bbox=dict(type='GIoULoss', loss_weight=10.0))
            ],
            mask_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            mask_head=dict(
                type='FCNMaskHead',
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=80,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
        train_cfg=dict(
            rpn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.3,
                    min_pos_iou=0.3,
                    match_low_quality=True,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=256,
                    pos_fraction=0.5,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=False),
                allowed_border=0,
                pos_weight=-1,
                debug=False),
            rpn_proposal=dict(
                nms_across_levels=False,
                nms_pre=2000,
                nms_post=2000,
                max_per_img=2000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=[
                dict(
                    assigner=dict(
                        type='MaxIoUAssigner',
                        pos_iou_thr=0.5,
                        neg_iou_thr=0.5,
                        min_pos_iou=0.5,
                        match_low_quality=False,
                        ignore_iof_thr=-1),
                    sampler=dict(
                        type='RandomSampler',
                        num=512,
                        pos_fraction=0.25,
                        neg_pos_ub=-1,
                        add_gt_as_proposals=True),
                    mask_size=28,
                    pos_weight=-1,
                    debug=False),
                dict(
                    assigner=dict(
                        type='MaxIoUAssigner',
                        pos_iou_thr=0.6,
                        neg_iou_thr=0.6,
                        min_pos_iou=0.6,
                        match_low_quality=False,
                        ignore_iof_thr=-1),
                    sampler=dict(
                        type='RandomSampler',
                        num=512,
                        pos_fraction=0.25,
                        neg_pos_ub=-1,
                        add_gt_as_proposals=True),
                    mask_size=28,
                    pos_weight=-1,
                    debug=False),
                dict(
                    assigner=dict(
                        type='MaxIoUAssigner',
                        pos_iou_thr=0.7,
                        neg_iou_thr=0.7,
                        min_pos_iou=0.7,
                        match_low_quality=False,
                        ignore_iof_thr=-1),
                    sampler=dict(
                        type='RandomSampler',
                        num=512,
                        pos_fraction=0.25,
                        neg_pos_ub=-1,
                        add_gt_as_proposals=True),
                    mask_size=28,
                    pos_weight=-1,
                    debug=False)
            ]),
        test_cfg=dict(
            rpn=dict(
                nms_across_levels=False,
                nms_pre=1000,
                nms_post=1000,
                max_per_img=1000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                score_thr=0.05,
                nms=dict(type='nms', iou_threshold=0.5),
                max_per_img=100,
                mask_thr_binary=0.5)))
    dataset_type = 'COCODataset'
    data_root = '/content/drive/MyDrive/layout/'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(
            type='AutoAugment',
            policies=[[{
                'type':
                'Resize',
                'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                              (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                              (736, 1333), (768, 1333), (800, 1333)],
                'multiscale_mode':
                'value',
                'keep_ratio':
                True
            }],
                      [{
                          'type': 'Resize',
                          'img_scale': [(400, 1333), (500, 1333), (600, 1333)],
                          'multiscale_mode': 'value',
                          'keep_ratio': True
                      }, {
                          'type': 'RandomCrop',
                          'crop_type': 'absolute_range',
                          'crop_size': (384, 600),
                          'allow_negative_crop': True
                      }, {
                          'type':
                          'Resize',
                          'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                        (576, 1333), (608, 1333), (640, 1333),
                                        (672, 1333), (704, 1333), (736, 1333),
                                        (768, 1333), (800, 1333)],
                          'multiscale_mode':
                          'value',
                          'override':
                          True,
                          'keep_ratio':
                          True
                      }]]),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1333, 800),
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img'])
            ])
    ]
    data = dict(
        samples_per_gpu=2,
        workers_per_gpu=2,
        train=dict(
            type='COCODataset',
            ann_file='/content/drive/MyDrive/layout/train.json',
            img_prefix='/content/drive/MyDrive/layout/train/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
                dict(type='RandomFlip', flip_ratio=0.5),
                dict(
                    type='AutoAugment',
                    policies=[[{
                        'type':
                        'Resize',
                        'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                      (576, 1333), (608, 1333), (640, 1333),
                                      (672, 1333), (704, 1333), (736, 1333),
                                      (768, 1333), (800, 1333)],
                        'multiscale_mode':
                        'value',
                        'keep_ratio':
                        True
                    }],
                              [{
                                  'type': 'Resize',
                                  'img_scale': [(400, 1333), (500, 1333),
                                                (600, 1333)],
                                  'multiscale_mode': 'value',
                                  'keep_ratio': True
                              }, {
                                  'type': 'RandomCrop',
                                  'crop_type': 'absolute_range',
                                  'crop_size': (384, 600),
                                  'allow_negative_crop': True
                              }, {
                                  'type':
                                  'Resize',
                                  'img_scale': [(480, 1333), (512, 1333),
                                                (544, 1333), (576, 1333),
                                                (608, 1333), (640, 1333),
                                                (672, 1333), (704, 1333),
                                                (736, 1333), (768, 1333),
                                                (800, 1333)],
                                  'multiscale_mode':
                                  'value',
                                  'override':
                                  True,
                                  'keep_ratio':
                                  True
                              }]]),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='DefaultFormatBundle'),
                dict(
                    type='Collect',
                    keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
            ]),
        val=dict(
            type='COCODataset',
            ann_file='/content/drive/MyDrive/layout/valid.json',
            img_prefix='/content/drive/MyDrive/layout/valid/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]),
        test=dict(
            type='COCODataset',
            ann_file='/content/drive/MyDrive/layout/valid.json',
            img_prefix='/content/drive/MyDrive/layout/valid/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]))
    evaluation = dict(metric=['bbox', 'segm'])
    optimizer = dict(
        type='AdamW',
        lr=0.0001,
        betas=(0.9, 0.999),
        weight_decay=0.05,
        paramwise_cfg=dict(
            custom_keys=dict(
                absolute_pos_embed=dict(decay_mult=0.0),
                relative_position_bias_table=dict(decay_mult=0.0),
                norm=dict(decay_mult=0.0))))
    optimizer_config = dict(
        grad_clip=None,
        type='DistOptimizerHook',
        update_interval=1,
        coalesce=True,
        bucket_size_mb=-1,
        use_fp16=True)
    lr_config = dict(
        policy='step',
        warmup='linear',
        warmup_iters=500,
        warmup_ratio=0.001,
        step=[27, 33])
    runner = dict(type='EpochBasedRunnerAmp', max_epochs=36)
    checkpoint_config = dict(interval=5)
    log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
    custom_hooks = [dict(type='NumClassCheckHook')]
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    load_from = '/content/drive/MyDrive/Swin-Transformer-Object-Detection/moby_cascade_mask_rcnn_swin_tiny_patch4_window7_3x.pth'
    resume_from = None
    workflow = [('train', 1)]
    fp16 = None
    work_dir = './work_dirs/cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco'
    gpu_ids = range(0, 1)
    
    opened by Atul997 18
  • KeyError: "CascadeRCNN: 'backbone.layers.0.blocks.0.attn.relative_position_bias_table'"

    Thanks for your work!

    I encountered the following error when I ran the code (screenshot: Snipaste_2021-04-14_15-43-29).

    I ran the command: python tools/train.py configs/swin/mydef_cascade_mask_rcnn_swin_small_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py --cfg-options model.pretrained=./models/cascade_mask_rcnn_swin_small_patch4_window7.pth

    How to solve it?

    opened by luooofan 11
  • Can't see train loss when train on custom dataset.

    I trained on a custom dataset with 45 images in the training set. I can't see the training loss while training on this dataset; only the metric results show up in the terminal. Here is my config file:

    _base_ = './cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py'
    
    dataset_type = 'CocoDataset'
    classes = ('counter_weight',)
    data_root = './MTDATA'
    data = dict(
        samples_per_gpu=2,
        workers_per_gpu=2,
        train=dict(
            type=dataset_type,
            # explicitly add your class names to the field `classes`
            classes=classes,
            ann_file=data_root + 'annotations/gray_train.json',
            img_prefix=data_root + 'train'),
        val=dict(
            type=dataset_type,
            # explicitly add your class names to the field `classes`
            classes=classes,
            ann_file=data_root + 'annotations/gray_val.json',
            img_prefix=data_root + 'val'),
        test=dict(
            type=dataset_type,
            # explicitly add your class names to the field `classes`
            classes=classes,
            ann_file=data_root + 'annotations/gray_val.json',
            img_prefix=data_root + 'val'))
    
    model = dict(
        roi_head=dict(
            bbox_head=[
                dict(
                    type='Shared2FCBBoxHead',
                    # explicitly over-write all the `num_classes` field.
                    num_classes=1),
                dict(
                    type='Shared2FCBBoxHead',
                    # explicitly over-write all the `num_classes` field.
                    num_classes=1),
                dict(
                    type='Shared2FCBBoxHead',
                    # explicitly over-write all the `num_classes` field.
                    num_classes=1)],
        # explicitly over-write all the `num_classes` field.
        mask_head=dict(num_classes=1)))
    

    And I run this config file using: CUDA_VISIBLE_DEVICES=2 python ./tools/train.py configs/swin/counter_weight_config.py --work-dir ./weights/counterweight_20211207/

    opened by LZYmixiu 8
  • torch.nn.modules.module.ModuleAttributeError: MaskRCNN: 'Sequential' object has no attribute 'init_weights'

    When I train a Mask R-CNN model, I get the error in the title. Does anyone know what's going on, or has anyone met this error before? The problem seems to be in mmcv. The traceback is:

    Traceback (most recent call last):
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/models/detectors/mask_rcnn.py", line 24, in __init__    pretrained=pretrained)
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/models/detectors/two_stage.py", line 48, in __init__                                             rixzerogoki" 10:54 16-12月-21
        self.init_weights(pretrained=pretrained)
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/models/detectors/two_stage.py", line 78, in init_weights
        self.roi_head.init_weights(pretrained)
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/models/roi_heads/standard_roi_head.py", line 48, in init_weights
        self.bbox_head.init_weights()
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/torch/nn/modules/module.py", line 779, in __getattr__
        type(self).__name__, name))
    torch.nn.modules.module.ModuleAttributeError: 'Sequential' object has no attribute 'init_weights'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "./tools/train.py", line 187, in <module>
        main()
      File "./tools/train.py", line 161, in main
        test_cfg=cfg.get('test_cfg'))
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/models/builder.py", line 77, in build_detector
        return build(cfg, DETECTORS, dict(train_cfg=train_cfg, test_cfg=test_cfg))
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmdet-2.11.0-py3.7.egg/mmdet/models/builder.py", line 34, in build
        return build_from_cfg(cfg, registry, default_args)
      File "/home/ljo4sgh/miniconda3/envs/new-swin-T/lib/python3.7/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    torch.nn.modules.module.ModuleAttributeError: MaskRCNN: 'Sequential' object has no attribute 'init_weights'
    

    ps: I haven't modified the test_cfg in my own config.

    opened by LZYmixiu 7
  • The model and loaded state dict do not match exactly

    Hi,

    I try to load the Swin backbone using the configuration mask_rcnn_swin_small_patch4_window7_mstrain_480-800_adamw_3x_coco.py and weights swin_small_patch4_window7_224.pth and I get the following warnings:

    mmdet - WARNING - The model and loaded state dict do not match exactly
    
    unexpected key in source state_dict: norm.weight, norm.bias, head.weight, head.bias, layers.0.blocks.1.attn_mask, layers.1.blocks.1.attn_mask, layers.2.blocks.1.attn_mask, layers.2.blocks.3.attn_mask, layers.2.blocks.5.attn_mask, layers.2.blocks.7.attn_mask, layers.2.blocks.9.attn_mask, layers.2.blocks.11.attn_mask, layers.2.blocks.13.attn_mask, layers.2.blocks.15.attn_mask, layers.2.blocks.17.attn_mask
    
    missing keys in source state_dict: norm0.weight, norm0.bias, norm1.weight, norm1.bias, norm2.weight, norm2.bias, norm3.weight, norm3.bias
    

    I understand the problem with the norm layers. In the original Swin backbone, there is only one normalization layer at the output while there is a norm layer at every output stage in the Swin backbone used for detection. However, I am not sure why there is the problem with attn_masks.

    Can you please help me?

    Thank you very much in advance.

    opened by vobecant 7
  • missing keys in source state_dict

    patch_embed.proj.weight, patch_embed.proj.bias, patch_embed.norm.weight, patch_embed.norm.bias, layers.0.blocks.0.norm1.weight, layers.0.blocks.0.norm1.bias, layers.0.blocks.0.attn.relative_position_bias_table, layers.0.blocks.0.attn.relative_position_index, layers.0.blocks.0.attn.qkv.weight, layers.0.blocks.0.attn.qkv.bias, layers.0.blocks.0.attn.proj.weight, layers.0.blocks.0.attn.proj.bias, layers.0.blocks.0.norm2.weight, layers.0.blocks.0.norm2.bias, layers.0.blocks.0.mlp.fc1.weight, layers.0.blocks.0.mlp.fc1.bias, layers.0.blocks.0.mlp.fc2.weight, layers.0.blocks.0.mlp.fc2.bias, layers.0.blocks.1.norm1.weight, layers.0.blocks.1.norm1.bias, layers.0.blocks.1.attn.relative_position_bias_table, layers.0.blocks.1.attn.relative_position_index, layers.0.blocks.1.attn.qkv.weight, layers.0.blocks.1.attn.qkv.bias, layers.0.blocks.1.attn.proj.weight, layers.0.blocks.1.attn.proj.bias, layers.0.blocks.1.norm2.weight, layers.0.blocks.1.norm2.bias, layers.0.blocks.1.mlp.fc1.weight, layers.0.blocks.1.mlp.fc1.bias, layers.0.blocks.1.mlp.fc2.weight, layers.0.blocks.1.mlp.fc2.bias, layers.0.downsample.reduction.weight, layers.0.downsample.norm.weight, layers.0.downsample.norm.bias, layers.1.blocks.0.norm1.weight, layers.1.blocks.0.norm1.bias, layers.1.blocks.0.attn.relative_position_bias_table, layers.1.blocks.0.attn.relative_position_index, layers.1.blocks.0.attn.qkv.weight, layers.1.blocks.0.attn.qkv.bias, layers.1.blocks.0.attn.proj.weight, layers.1.blocks.0.attn.proj.bias, layers.1.blocks.0.norm2.weight, layers.1.blocks.0.norm2.bias, layers.1.blocks.0.mlp.fc1.weight, layers.1.blocks.0.mlp.fc1.bias, layers.1.blocks.0.mlp.fc2.weight, layers.1.blocks.0.mlp.fc2.bias, layers.1.blocks.1.norm1.weight, layers.1.blocks.1.norm1.bias, layers.1.blocks.1.attn.relative_position_bias_table, layers.1.blocks.1.attn.relative_position_index, layers.1.blocks.1.attn.qkv.weight, layers.1.blocks.1.attn.qkv.bias, layers.1.blocks.1.attn.proj.weight, layers.1.blocks.1.attn.proj.bias, layers.1.blocks.1.norm2.weight, layers.1.blocks.1.norm2.bias, layers.1.blocks.1.mlp.fc1.weight, layers.1.blocks.1.mlp.fc1.bias, layers.1.blocks.1.mlp.fc2.weight, layers.1.blocks.1.mlp.fc2.bias, layers.1.downsample.reduction.weight, layers.1.downsample.norm.weight, layers.1.downsample.norm.bias, layers.2.blocks.0.norm1.weight, layers.2.blocks.0.norm1.bias, layers.2.blocks.0.attn.relative_position_bias_table, layers.2.blocks.0.attn.relative_position_index, layers.2.blocks.0.attn.qkv.weight, layers.2.blocks.0.attn.qkv.bias, layers.2.blocks.0.attn.proj.weight, layers.2.blocks.0.attn.proj.bias, layers.2.blocks.0.norm2.weight, layers.2.blocks.0.norm2.bias, layers.2.blocks.0.mlp.fc1.weight, layers.2.blocks.0.mlp.fc1.bias, layers.2.blocks.0.mlp.fc2.weight, layers.2.blocks.0.mlp.fc2.bias, layers.2.blocks.1.norm1.weight, layers.2.blocks.1.norm1.bias, layers.2.blocks.1.attn.relative_position_bias_table, layers.2.blocks.1.attn.relative_position_index, layers.2.blocks.1.attn.qkv.weight, layers.2.blocks.1.attn.qkv.bias, layers.2.blocks.1.attn.proj.weight, layers.2.blocks.1.attn.proj.bias, layers.2.blocks.1.norm2.weight, layers.2.blocks.1.norm2.bias, layers.2.blocks.1.mlp.fc1.weight, layers.2.blocks.1.mlp.fc1.bias, layers.2.blocks.1.mlp.fc2.weight, layers.2.blocks.1.mlp.fc2.bias, layers.2.blocks.2.norm1.weight, layers.2.blocks.2.norm1.bias, layers.2.blocks.2.attn.relative_position_bias_table, layers.2.blocks.2.attn.relative_position_index, layers.2.blocks.2.attn.qkv.weight, layers.2.blocks.2.attn.qkv.bias, layers.2.blocks.2.attn.proj.weight, 
layers.2.blocks.2.attn.proj.bias, layers.2.blocks.2.norm2.weight, layers.2.blocks.2.norm2.bias, layers.2.blocks.2.mlp.fc1.weight, layers.2.blocks.2.mlp.fc1.bias, layers.2.blocks.2.mlp.fc2.weight, layers.2.blocks.2.mlp.fc2.bias, layers.2.blocks.3.norm1.weight, layers.2.blocks.3.norm1.bias, layers.2.blocks.3.attn.relative_position_bias_table, layers.2.blocks.3.attn.relative_position_index, layers.2.blocks.3.attn.qkv.weight, layers.2.blocks.3.attn.qkv.bias, layers.2.blocks.3.attn.proj.weight, layers.2.blocks.3.attn.proj.bias, layers.2.blocks.3.norm2.weight, layers.2.blocks.3.norm2.bias, layers.2.blocks.3.mlp.fc1.weight, layers.2.blocks.3.mlp.fc1.bias, layers.2.blocks.3.mlp.fc2.weight, layers.2.blocks.3.mlp.fc2.bias, layers.2.blocks.4.norm1.weight, layers.2.blocks.4.norm1.bias, layers.2.blocks.4.attn.relative_position_bias_table, layers.2.blocks.4.attn.relative_position_index, layers.2.blocks.4.attn.qkv.weight, layers.2.blocks.4.attn.qkv.bias, layers.2.blocks.4.attn.proj.weight, layers.2.blocks.4.attn.proj.bias, layers.2.blocks.4.norm2.weight, layers.2.blocks.4.norm2.bias, layers.2.blocks.4.mlp.fc1.weight, layers.2.blocks.4.mlp.fc1.bias, layers.2.blocks.4.mlp.fc2.weight, layers.2.blocks.4.mlp.fc2.bias, layers.2.blocks.5.norm1.weight, layers.2.blocks.5.norm1.bias, layers.2.blocks.5.attn.relative_position_bias_table, layers.2.blocks.5.attn.relative_position_index, layers.2.blocks.5.attn.qkv.weight, layers.2.blocks.5.attn.qkv.bias, layers.2.blocks.5.attn.proj.weight, layers.2.blocks.5.attn.proj.bias, layers.2.blocks.5.norm2.weight, layers.2.blocks.5.norm2.bias, layers.2.blocks.5.mlp.fc1.weight, layers.2.blocks.5.mlp.fc1.bias, layers.2.blocks.5.mlp.fc2.weight, layers.2.blocks.5.mlp.fc2.bias, layers.2.downsample.reduction.weight, layers.2.downsample.norm.weight, layers.2.downsample.norm.bias, layers.3.blocks.0.norm1.weight, layers.3.blocks.0.norm1.bias, layers.3.blocks.0.attn.relative_position_bias_table, layers.3.blocks.0.attn.relative_position_index, layers.3.blocks.0.attn.qkv.weight, layers.3.blocks.0.attn.qkv.bias, layers.3.blocks.0.attn.proj.weight, layers.3.blocks.0.attn.proj.bias, layers.3.blocks.0.norm2.weight, layers.3.blocks.0.norm2.bias, layers.3.blocks.0.mlp.fc1.weight, layers.3.blocks.0.mlp.fc1.bias, layers.3.blocks.0.mlp.fc2.weight, layers.3.blocks.0.mlp.fc2.bias, layers.3.blocks.1.norm1.weight, layers.3.blocks.1.norm1.bias, layers.3.blocks.1.attn.relative_position_bias_table, layers.3.blocks.1.attn.relative_position_index, layers.3.blocks.1.attn.qkv.weight, layers.3.blocks.1.attn.qkv.bias, layers.3.blocks.1.attn.proj.weight, layers.3.blocks.1.attn.proj.bias, layers.3.blocks.1.norm2.weight, layers.3.blocks.1.norm2.bias, layers.3.blocks.1.mlp.fc1.weight, layers.3.blocks.1.mlp.fc1.bias, layers.3.blocks.1.mlp.fc2.weight, layers.3.blocks.1.mlp.fc2.bias, norm0.weight, norm0.bias, norm1.weight, norm1.bias, norm2.weight, norm2.bias, norm3.weight, norm3.bias

    model = dict(
        pretrained='/storage/wjb/AlignPS/pretrained/swin_tiny_patch4_window7_224.pth',
        backbone=dict(
            type='SwinTransformer',
            embed_dim=96,
            depths=[2, 2, 6, 2],
            num_heads=[3, 6, 12, 24],
            window_size=7,
            mlp_ratio=4.,
            qkv_bias=True,
            qk_scale=None,
            drop_rate=0.,
            attn_drop_rate=0.,
            drop_path_rate=0.2,
            ape=False,
            patch_norm=True,
            out_indices=(0, 1, 2, 3),
            use_checkpoint=False),
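
    A quick way to see which keys the checkpoint actually provides, and to compare them against the "missing keys" list above, is a small diagnostic sketch like the one below (the path is the one used in the config; adjust as needed):

    import torch

    # Load the ImageNet-pretrained Swin-T checkpoint on the CPU and inspect its contents.
    ckpt = torch.load(
        '/storage/wjb/AlignPS/pretrained/swin_tiny_patch4_window7_224.pth',
        map_location='cpu')
    print(list(ckpt.keys()))  # top-level keys, e.g. ['model'] or ['state_dict', 'meta']
    state_dict = ckpt.get('state_dict', ckpt.get('model', ckpt))
    print(list(state_dict.keys())[:10])  # parameter names to compare with the missing keys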

    opened by jiabeiwangTJU 5
  • TypeError: MaskRCNN: SwinTransformer: empty() received an invalid combination of arguments

    TypeError: MaskRCNN: SwinTransformer: empty() received an invalid combination of arguments

    Hello, I'm now trying to apply Swin Transformer for object detection. I encounter this error both with and without a GPU when training with the Swin models (other models train fine).

    Checklist

    1. I have searched related issues but cannot get the expected help.
    2. I have read the FAQ documentation but cannot get the expected help.
    3. The bug has not been fixed in the latest version.

    Describe the bug At first, I ran into these errors: TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'embed_dim'; TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'ape'; TypeError: MaskRCNN: SwinTransformer: __init__() got an unexpected keyword argument 'use_checkpoint'. I then followed an answered issue here and changed "embed_dim" to "embed_dims" in the configs, and also commented out the "ape=False" and "use_checkpoint=False" lines. After that, these errors no longer occurred.

    But right after that, I got into the error below:

    TypeError: MaskRCNN: SwinTransformer: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:

    • (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
    • (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)

    Reproduction

    1. What command or script did you run?
    python3 tools/train.py configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py
    

    This also happened with the other Swin configs I tried, as well as with my own Swin-based config!

    2. OUTPUT:
    2022-02-16 16:19:04,508 - mmdet - INFO - Distributed training: False
    2022-02-16 16:19:07,995 - mmdet - INFO - Config:
    model = dict(
        type='MaskRCNN',
        pretrained=None,
        backbone=dict(
            type='SwinTransformer',
            embed_dims=96,
            depths=[2, 2, 6, 2],
            num_heads=[3, 6, 12, 24],
            window_size=7,
            mlp_ratio=4.0,
            qkv_bias=True,
            qk_scale=None,
            drop_rate=0.0,
            attn_drop_rate=0.0,
            drop_path_rate=0.1,
            patch_norm=True,
            out_indices=(0, 1, 2, 3)),
        neck=dict(
            type='FPN',
            in_channels=[96, 192, 384, 768],
            out_channels=256,
            num_outs=5),
        rpn_head=dict(
            type='RPNHead',
            in_channels=256,
            feat_channels=256,
            anchor_generator=dict(
                type='AnchorGenerator',
                scales=[8],
                ratios=[0.5, 1.0, 2.0],
                strides=[4, 8, 16, 32, 64]),
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[1.0, 1.0, 1.0, 1.0]),
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
        roi_head=dict(
            type='StandardRoIHead',
            bbox_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            bbox_head=dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=80,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=False,
                loss_cls=dict(
                    type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
                loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
            mask_roi_extractor=dict(
                type='SingleRoIExtractor',
                roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
                out_channels=256,
                featmap_strides=[4, 8, 16, 32]),
            mask_head=dict(
                type='FCNMaskHead',
                num_convs=4,
                in_channels=256,
                conv_out_channels=256,
                num_classes=80,
                loss_mask=dict(
                    type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
        train_cfg=dict(
            rpn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.3,
                    min_pos_iou=0.3,
                    match_low_quality=True,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=256,
                    pos_fraction=0.5,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=False),
                allowed_border=-1,
                pos_weight=-1,
                debug=False),
            rpn_proposal=dict(
                nms_pre=2000,
                max_per_img=1000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=True,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                mask_size=28,
                pos_weight=-1,
                debug=False)),
        test_cfg=dict(
            rpn=dict(
                nms_pre=1000,
                max_per_img=1000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                score_thr=0.05,
                nms=dict(type='nms', iou_threshold=0.5),
                max_per_img=100,
                mask_thr_binary=0.5)))
    dataset_type = 'CocoDataset'
    data_root = 'data/coco/'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
        dict(type='RandomFlip', flip_ratio=0.5),
        dict(
            type='AutoAugment',
            policies=[[{
                'type':
                'Resize',
                'img_scale': [(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                              (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                              (736, 1333), (768, 1333), (800, 1333)],
                'multiscale_mode':
                'value',
                'keep_ratio':
                True
            }],
                      [{
                          'type': 'Resize',
                          'img_scale': [(400, 1333), (500, 1333), (600, 1333)],
                          'multiscale_mode': 'value',
                          'keep_ratio': True
                      }, {
                          'type': 'RandomCrop',
                          'crop_type': 'absolute_range',
                          'crop_size': (384, 600),
                          'allow_negative_crop': True
                      }, {
                          'type':
                          'Resize',
                          'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                        (576, 1333), (608, 1333), (640, 1333),
                                        (672, 1333), (704, 1333), (736, 1333),
                                        (768, 1333), (800, 1333)],
                          'multiscale_mode':
                          'value',
                          'override':
                          True,
                          'keep_ratio':
                          True
                      }]]),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1333, 800),
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img'])
            ])
    ]
    data = dict(
        samples_per_gpu=2,
        workers_per_gpu=2,
        train=dict(
            type='CocoDataset',
            ann_file='data/coco/annotations/instances_train2017.json',
            img_prefix='data/coco/train2017/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
                dict(type='RandomFlip', flip_ratio=0.5),
                dict(
                    type='AutoAugment',
                    policies=[[{
                        'type':
                        'Resize',
                        'img_scale': [(480, 1333), (512, 1333), (544, 1333),
                                      (576, 1333), (608, 1333), (640, 1333),
                                      (672, 1333), (704, 1333), (736, 1333),
                                      (768, 1333), (800, 1333)],
                        'multiscale_mode':
                        'value',
                        'keep_ratio':
                        True
                    }],
                              [{
                                  'type': 'Resize',
                                  'img_scale': [(400, 1333), (500, 1333),
                                                (600, 1333)],
                                  'multiscale_mode': 'value',
                                  'keep_ratio': True
                              }, {
                                  'type': 'RandomCrop',
                                  'crop_type': 'absolute_range',
                                  'crop_size': (384, 600),
                                  'allow_negative_crop': True
                              }, {
                                  'type':
                                  'Resize',
                                  'img_scale': [(480, 1333), (512, 1333),
                                                (544, 1333), (576, 1333),
                                                (608, 1333), (640, 1333),
                                                (672, 1333), (704, 1333),
                                                (736, 1333), (768, 1333),
                                                (800, 1333)],
                                  'multiscale_mode':
                                  'value',
                                  'override':
                                  True,
                                  'keep_ratio':
                                  True
                              }]]),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='DefaultFormatBundle'),
                dict(
                    type='Collect',
                    keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
            ]),
        val=dict(
            type='CocoDataset',
            ann_file='data/coco/annotations/instances_val2017.json',
            img_prefix='data/coco/val2017/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]),
        test=dict(
            type='CocoDataset',
            ann_file='data/coco/annotations/instances_val2017.json',
            img_prefix='data/coco/val2017/',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(1333, 800),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='Pad', size_divisor=32),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]))
    evaluation = dict(metric=['bbox', 'segm'])
    optimizer = dict(
        type='AdamW',
        lr=0.0001,
        betas=(0.9, 0.999),
        weight_decay=0.05,
        paramwise_cfg=dict(
            custom_keys=dict(
                absolute_pos_embed=dict(decay_mult=0.0),
                relative_position_bias_table=dict(decay_mult=0.0),
                norm=dict(decay_mult=0.0))))
    optimizer_config = dict(
        grad_clip=None,
        type='DistOptimizerHook',
        update_interval=1,
        coalesce=True,
        bucket_size_mb=-1,
        use_fp16=True)
    lr_config = dict(
        policy='step',
        warmup='linear',
        warmup_iters=500,
        warmup_ratio=0.001,
        step=[8, 11])
    runner = dict(type='EpochBasedRunnerAmp', max_epochs=12)
    checkpoint_config = dict(interval=1)
    log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
    custom_hooks = [dict(type='NumClassCheckHook')]
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]
    fp16 = None
    work_dir = './work_dirs/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco'
    gpu_ids = range(0, 1)
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/backbones/swin.py", line 631, in __init__
        init_cfg=None)
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/backbones/swin.py", line 450, in __init__
        init_cfg=None)
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/backbones/swin.py", line 356, in __init__
        init_cfg=None)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/misc.py", line 340, in new_func
        output = old_func(*args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/bricks/transformer.py", line 607, in __init__
        Linear(in_channels, feedforward_channels), self.activate,
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py", line 85, in __init__
        self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
    TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
     * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
     * (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
    
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
        return obj_cls(**args)
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/detectors/mask_rcnn.py", line 27, in __init__
        init_cfg=init_cfg)
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/detectors/two_stage.py", line 32, in __init__
        self.backbone = build_backbone(backbone)
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/builder.py", line 20, in build_backbone
        return BACKBONES.build(cfg)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: SwinTransformer: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
     * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
     * (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
    
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "tools/train.py", line 187, in <module>
        main()
      File "tools/train.py", line 161, in main
        test_cfg=cfg.get('test_cfg'))
      File "/usr/local/lib/python3.7/dist-packages/mmdet/models/builder.py", line 59, in build_detector
        cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: MaskRCNN: SwinTransformer: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
     * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
     * (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
    
    
    3. What dataset did you use? I tried training with the default COCO 2017 dataset, then on my own dataset, but the errors always occurred.

    Environment

    2022-02-16 16:19:01,022 - mmdet - INFO - Environment info:

    sys.platform: linux
    Python: 3.7.12 (default, Jan 15 2022, 18:48:18) [GCC 7.5.0]
    CUDA available: True
    GPU 0: Tesla K80
    CUDA_HOME: /usr/local/cuda
    NVCC: Build cuda_11.1.TC455_06.29190527_0
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.10.0+cu111
    PyTorch compiling details: PyTorch built with:

    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • LAPACK is enabled (usually provided by MKL)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 11.1
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
    • CuDNN 8.0.5
    • Magma 2.5.2
    • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

    TorchVision: 0.11.1+cu111
    OpenCV: 4.1.2
    MMCV: 1.4.5
    MMCV Compiler: GCC 7.3
    MMCV CUDA Compiler: 11.1
    MMDetection: 2.21.0+461e003

    Please help! This is one of my first experiences working with GitHub and computer vision projects. Thanks for any help!

    opened by queman 4
  • RuntimeError: Default process group has not been initialized, please make sure to call init_process_group for cascade_mask_rcnn_swin_base_patch4

    RuntimeError: Default process group has not been initialized, please make sure to call init_process_group for cascade_mask_rcnn_swin_base_patch4

    Thanks for your error report and we appreciate it a lot.

    Describe the bug

    I am getting a runtime error, "RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.", when training with the command below:

    python tools/train.py configs/swin/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py --cfg-options model.pretrained=swin_base_patch4_window12_384_22k.pth
    

    Error:

    Traceback (most recent call last):
      File "C:\Users\user\Swin-Transformer-Object-Detection\tools\train.py", line 187, in <module>
        main()
      File "C:\Users\user\Swin-Transformer-Object-Detection\tools\train.py", line 176, in main
        train_detector(
      File "c:\users\user\swin-transformer-object-detection\mmdet\apis\train.py", line 185, in train_detector
        runner.run(data_loaders, cfg.workflow)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 127, in run
        epoch_runner(data_loaders[i], **kwargs)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 50, in train
        self.run_iter(data_batch, train_mode=True, **kwargs)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 29, in run_iter
        outputs = self.model.train_step(data_batch, self.optimizer,
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\mmcv\parallel\data_parallel.py", line 75, in train_step
        return self.module.train_step(*inputs[0], **kwargs[0])
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\detectors\base.py", line 247, in train_step
        losses = self(**data)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\mmcv\runner\fp16_utils.py", line 98, in new_func
        return old_func(*args, **kwargs)
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\detectors\base.py", line 181, in forward
        return self.forward_train(img, img_metas, **kwargs)
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\detectors\two_stage.py", line 161, in forward_train
        roi_losses = self.roi_head.forward_train(x, img_metas, proposal_list,
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\roi_heads\cascade_roi_head.py", line 257, in forward_train
        bbox_results = self._bbox_forward_train(i, x, sampling_results,
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\roi_heads\cascade_roi_head.py", line 157, in _bbox_forward_train
        bbox_results = self._bbox_forward(stage, x, rois)
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\roi_heads\cascade_roi_head.py", line 147, in _bbox_forward
        cls_score, bbox_pred = bbox_head(bbox_feats)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "c:\users\user\swin-transformer-object-detection\mmdet\models\roi_heads\bbox_heads\convfc_bbox_head.py", line 139, in forward
        x = conv(x)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\mmcv\cnn\bricks\conv_module.py", line 203, in forward
        x = self.norm(x)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\nn\modules\batchnorm.py", line 732, in forward
        world_size = torch.distributed.get_world_size(process_group)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\distributed\distributed_c10d.py", line 845, in get_world_size
        return _get_group_size(group)
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\distributed\distributed_c10d.py", line 306, in _get_group_size
        default_pg = _get_default_group()
      File "C:\Users\user\anaconda3\envs\py_env\lib\site-packages\torch\distributed\distributed_c10d.py", line 410, in _get_default_group
    
    


    Reproduction

    Training with a single GPU:

    python tools/train.py configs/swin/cascade_mask_rcnn_swin_base_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py --cfg-options model.pretrained=swin_base_patch4_window12_384_22k.pth

    2. Did you make any modifications on the code or config? Did you understand what you have modified?

    Changed the number of classes in the configs to 2 classes.

    3. What dataset did you use?

    A custom COCO dataset.

    Environment

    OS: Windows 10
    Torch: 1.1.0
    MMCV: 1.4.0

    Please advise. Thanks.
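
    For reference, the failing call in the traceback comes from a SyncBatchNorm layer in the bbox head (torch.nn.modules.batchnorm calling torch.distributed.get_world_size), which requires an initialized process group. Two workarounds are commonly used in this situation, neither of which is confirmed in this thread: launch the single-GPU run through the distributed launcher (tools/dist_test.sh-style, i.e. tools/dist_train.sh <CONFIG_FILE> 1), or swap SyncBN for plain BN in the config, roughly as sketched below.

    # Sketch of the config edit (the exact location of norm_cfg depends on the config
    # file; in the 4conv1f cascade bbox heads it is typically set per bbox_head entry).
    # Before (requires torch.distributed to be initialized):
    norm_cfg = dict(type='SyncBN', requires_grad=True)
    # After, for non-distributed single-GPU training:
    norm_cfg = dict(type='BN', requires_grad=True)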
    opened by rgkannan676 4
  • Attempt for SwinTransformer + Faster RCNN

    Attempt for SwinTransformer + Faster RCNN

    Hello, I tried to use Swin Transformer as a backbone feeding feature maps into Faster R-CNN. I used the Swin Transformer and FPN config from the file "configs/_base_/models/mask_rcnn_swin_fpn.py" and the rpn_head and roi_head from "configs/_base_/models/faster_rcnn_r50_fpn.py" to construct a new architecture for object detection. Then I loaded the pretrained file "swin_tiny_patch4_window7_224.pth". But as shown in the console, "The model and loaded state dict do not match exactly". I wonder whether I made a mistake in constructing the architecture or in using the pretrained model?
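
    For reference, a minimal sketch of one way such a config can be composed with mmdetection's _base_ inheritance. The two model base files are the ones named in the question; the dataset/schedule/runtime bases are the standard mmdetection ones, the output file name and pretrained path are placeholders, and the backbone keys follow this repo's mask_rcnn_swin_fpn.py:

    # configs/swin/faster_rcnn_swin_tiny_fpn_1x_coco.py  (hypothetical file name)
    _base_ = [
        '../_base_/models/faster_rcnn_r50_fpn.py',
        '../_base_/datasets/coco_detection.py',
        '../_base_/schedules/schedule_1x.py',
        '../_base_/default_runtime.py'
    ]

    model = dict(
        pretrained='swin_tiny_patch4_window7_224.pth',  # placeholder path
        backbone=dict(
            _delete_=True,  # drop the ResNet-50 backbone inherited from the base config
            type='SwinTransformer',
            embed_dim=96,
            depths=[2, 2, 6, 2],
            num_heads=[3, 6, 12, 24],
            window_size=7,
            ape=False,
            drop_path_rate=0.2,
            patch_norm=True,
            out_indices=(0, 1, 2, 3),
            use_checkpoint=False),
        neck=dict(in_channels=[96, 192, 384, 768]))

    Note that some "model and loaded state dict do not match exactly" warnings are usually expected when an ImageNet classification checkpoint is loaded into a detector: the RPN and RoI heads have no pretrained weights and the classification head of the checkpoint is unused, so the warning by itself does not prove the architecture is wrong, as long as the backbone keys (patch_embed.*, layers.*) are matched.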

    opened by wliebtzy 4
  • UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 5096: invalid start byte

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 5096: invalid start byte

    I ran python tools/train.py configs\swin\cascade_mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_giou_4conv1f_adamw_3x_coco.py to train on my dataset, but it raises UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 5096: invalid start byte. How can I solve this problem? Thanks.

    opened by blurmemo 3
  • Failed to debug the code with VSCode (Unavailable Break point)

    Failed to debug the code with VSCode (Unavailable Break point)

    Hello, I am trying to debug the inference process and monitor the variables in Swin Transformer object detection. When I set a breakpoint in the function "single_gpu_test" in "/Swin-Transformer-Object-Detection/mmdet/apis", it has no effect, and I find that the code actually runs the "single_gpu_test" in "lib/python3.8/site-packages/mmdet-2.11.0-py3.8.egg/mmdet/apis". How can I set a working breakpoint to inspect the variables in "single_gpu_test"? Why is "single_gpu_test" defined in "/Swin-Transformer-Object-Detection/mmdet/apis" but executed from "lib/python3.8/site-packages/mmdet-2.11.0-py3.8.egg/mmdet/apis"?
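
    A quick way to check which copy of the package actually runs is the small diagnostic sketch below (the printed paths will differ per machine):

    import mmdet
    import mmdet.apis

    # If these paths point into site-packages (the mmdet-2.11.0 egg) rather than the
    # Swin-Transformer-Object-Detection checkout, the installed copy shadows the local
    # sources, so breakpoints set in the checkout are never reached.
    print(mmdet.__version__)
    print(mmdet.__file__)
    print(mmdet.apis.__file__)

    In that case, reinstalling the repository in development/editable mode (for example, pip install -v -e . from the repository root) makes the local mmdet the one that is imported, so breakpoints in /Swin-Transformer-Object-Detection/mmdet/apis become effective.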

    Thank you!

    opened by wliebtzy 0
  • Inference Score is low AP - on different dataset

    Inference Score is low AP - on different dataset

    Hello, I tried the model and the AP is above 40 percent, but when I test it on a different dataset the AP is around 0.04. What could be the possible reasons for this failure? I trained the model on 256x256 images and tested on 256x256 and larger sizes, but nothing changed; the AP is still very low on the test set. During training I also have a separate validation set, and there the AP is good enough, around 40.

    opened by Mahmood-Hussain 0
  • Running error on model Swin-ReppointV2 : IndexError: The shape of the mask [3070240] at index 0 does not match the shape of the indexed tensor [383780] at index 0

    Running error on model Swin-ReppointV2 : IndexError: The shape of the mask [3070240] at index 0 does not match the shape of the indexed tensor [383780] at index 0

    Prerequisite

    1. I have searched related issues but cannot get the expected help.
    2. I have read the FAQ documentation but cannot get the expected help.
    3. The bug has not been fixed in the latest version.

    Describe the bug

    When I run the command python tools/train.py configs/swin/reppoitsv2_swin_tiny_patch4_window7_mstrain_480_960_giou_gfocal_bifpn_adamw_3x_coco.py, the error traceback below appeared:

    Traceback (most recent call last):
      File "tools/train.py", line 194, in <module>
        main()
      File "tools/train.py", line 183, in main
        train_detector(
      File "d:\swin-transformer-object-detection\mmdet\apis\train.py", line 185, in train_detector
        runner.run(data_loaders, cfg.workflow)
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 127, in run
        epoch_runner(data_loaders[i], **kwargs)
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 50, in train
        self.run_iter(data_batch, train_mode=True, **kwargs)
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\mmcv\runner\epoch_based_runner.py", line 29, in run_iter
        outputs = self.model.train_step(data_batch, self.optimizer,
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\mmcv\parallel\data_parallel.py", line 75, in train_step
        return self.module.train_step(*inputs[0], **kwargs[0])
      File "d:\swin-transformer-object-detection\mmdet\models\detectors\base.py", line 247, in train_step
        losses = self(**data)
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\mmcv\runner\fp16_utils.py", line 128, in new_func
        output = old_func(*new_args, **new_kwargs)
      File "d:\swin-transformer-object-detection\mmdet\models\detectors\base.py", line 181, in forward
        return self.forward_train(img, img_metas, **kwargs)
      File "d:\swin-transformer-object-detection\mmdet\models\detectors\reppoints_v2_detector.py", line 34, in forward_train
        losses = self.bbox_head.loss(
      File "d:\swin-transformer-object-detection\mmdet\models\dense_heads\reppoints_v2_head.py", line 1096, in loss
        loss_sem = self.loss_sem(concat_sem_scores, concat_gt_sem_map, concat_gt_sem_weights, avg_factor=(concat_gt_sem_map > 0).sum())
      File "C:\Users\hlj\.conda\envs\torch_110\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "d:\swin-transformer-object-detection\mmdet\models\losses\focal_loss.py", line 237, in forward
        loss_cls = self.loss_weight * separate_sigmoid_focal_loss(
      File "d:\swin-transformer-object-detection\mmdet\models\losses\focal_loss.py", line 75, in separate_sigmoid_focal_loss
        pos_pred = pred_sigmoid[pos_inds]
    IndexError: The shape of the mask [3070240] at index 0 does not match the shape of the indexed tensor [383780] at index 0
    

    As for modifications to the code, I only changed 'num_classes' in the model config and 'CLASSES' in the script 'coco.py'.

    However, when I run train.py to train the Mask R-CNN model as below, it works well without this error: python tools/train.py configs/swin/mask_rcnn_swin_tiny_patch4_window7_mstrain_480-800_adamw_1x_coco.py
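
    For context, the two modifications mentioned above are of this kind (the class count and names here are placeholders, not the reporter's actual values):

    # In the model part of the config: every classification head must use the new count.
    num_classes = 3  # placeholder; replaces the default 80

    # In mmdet/datasets/coco.py (or a custom dataset class): CLASSES must list exactly
    # the categories present in the annotation file, in a fixed order.
    CLASSES = ('class_a', 'class_b', 'class_c')  # placeholder names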

    Environment

    sys.platform: win32
    Python: 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)]
    CUDA available: True
    GPU 0: NVIDIA GeForce RTX 3090
    CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
    NVCC: Build cuda_11.1.relgpu_drvr455TC455_06.29190527_0
    GCC: n/a
    PyTorch: 1.8.1+cu111
    PyTorch compiling details: PyTorch built with:

    • C++ Version: 199711
    • MSVC 192829913
    • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
    • OpenMP 2019
    • CPU capability usage: AVX2
    • CUDA Runtime 11.1
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
    • CuDNN 8.0.5
    • Magma 2.5.4
    • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=C:/w/b/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -DNDEBUG -DUSE_FBGEMM -DUSE_XNNPACK, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON,

    TorchVision: 0.9.1+cu111
    OpenCV: 4.6.0
    MMCV: 1.4.1
    MMCV Compiler: MSVC 192930137
    MMCV CUDA Compiler: 11.1
    MMDetection: 2.11.0+c7b2011

    Look forward to the reply. Thanks!

    opened by busy-wong 2
  • Train common backbone with multiple detection heads

    Train common backbone with multiple detection heads

    Hi, I want to train on COCO and VOC together using the same backbone. How can I ensure that the concatenated dataset has separate object detection heads catering to the different numbers of classes in the two datasets, 80 and 20 respectively? Is there a better workaround? Any help will be appreciated!

    Thanks

    opened by jinga-lala 0