Official Implementation of DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

Overview

DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

[Arxiv] [Paper]

As acquiring pixel-wise annotations of real-world images for semantic segmentation is a costly process, a model can instead be trained with more accessible synthetic data and adapted to real images without requiring their annotations. This process is studied in Unsupervised Domain Adaptation (UDA).

Even though a large number of methods propose new UDA strategies, they are mostly based on outdated network architectures. In this work, we particularly study the influence of the network architecture on UDA performance and propose DAFormer, a network architecture tailored for UDA. It consists of a Transformer encoder and a multi-level context-aware feature fusion decoder.
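
As a rough sketch of this decoder design (illustrative only, not the exact DAFormer implementation; the input channels follow the MiT-B5 encoder, while the module names and fusion details are simplified), each encoder stage is projected to a shared embedding dimension, upsampled to the highest feature resolution, concatenated, and fused with an ASPP-like context-aware module:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelFusionDecoder(nn.Module):
    # Illustrative sketch: project all encoder levels to a common embedding,
    # upsample to the highest resolution, and fuse with dilated convolutions.
    def __init__(self, in_channels=(64, 128, 320, 512), embed_dim=256, num_classes=19):
        super().__init__()
        # 1x1 projections of each encoder level to the shared embedding dimension
        self.embeds = nn.ModuleList(
            nn.Conv2d(c, embed_dim, kernel_size=1) for c in in_channels)
        # ASPP-like context-aware fusion over the concatenated levels
        self.fusion = nn.ModuleList(
            nn.Conv2d(4 * embed_dim, embed_dim, kernel_size=3, padding=d, dilation=d)
            for d in (1, 6, 12, 18))
        self.classifier = nn.Conv2d(4 * embed_dim, num_classes, kernel_size=1)

    def forward(self, feats):
        # feats: list of 4 feature maps from the encoder stages
        size = feats[0].shape[2:]  # highest-resolution feature map
        x = torch.cat([
            F.interpolate(embed(f), size=size, mode='bilinear', align_corners=False)
            for embed, f in zip(self.embeds, feats)], dim=1)
        x = torch.cat([branch(x) for branch in self.fusion], dim=1)
        return self.classifier(x)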

DAFormer is enabled by three simple but crucial training strategies that stabilize training and avoid overfitting to the source domain: Rare Class Sampling on the source domain improves the quality of the pseudo-labels by mitigating the confirmation bias of self-training toward common classes, while the Thing-Class ImageNet Feature Distance and a learning rate warmup promote feature transfer from ImageNet pretraining.
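
For intuition, Rare Class Sampling draws a class c with a probability that increases as its source-domain pixel frequency f_c decreases, and then samples a source image containing that class. A minimal sketch of the sampling distribution (the temperature 0.01 matches the rcs0.01 tag visible in the generated experiment names; the function name is illustrative):

import numpy as np

def rare_class_sampling_probs(class_pixel_counts, temperature=0.01):
    # Rarer classes receive a higher sampling probability; a lower temperature
    # skews the distribution more strongly toward rare classes.
    freq = class_pixel_counts / class_pixel_counts.sum()
    logits = np.exp((1.0 - freq) / temperature)
    return logits / logits.sum()

# Example: class 2 is rare and therefore sampled most often.
p = rare_class_sampling_probs(np.array([1e8, 5e7, 1e6]))
# A class is drawn from p, and a source image containing enough pixels of that
# class (cf. min_pixels in the config) is then sampled for training.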

DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA→Cityscapes and by 5.4 mIoU for Synthia→Cityscapes and enables learning even difficult classes such as train, bus, and truck well.

UDA over time

The strengths of DAFormer, compared to the previous state-of-the-art UDA method ProDA, can also be observed in qualitative examples from the Cityscapes validation set.

Demo Color Palette

For more information on DAFormer, please check our [Paper].

If you find this project useful in your research, please consider citing:

@article{hoyer2021daformer,
  title={DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation},
  author={Hoyer, Lukas and Dai, Dengxin and Van Gool, Luc},
  journal={arXiv preprint arXiv:2111.14887},
  year={2021}
}

Setup Environment

For this project, we used Python 3.8.5. We recommend setting up a new virtual environment:

python -m venv ~/venv/daformer
source ~/venv/daformer/bin/activate

In that environment, the requirements can be installed with:

pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.3.7  # requires the other packages to be installed first

Further, please download the MiT weights and a pretrained DAFormer using the following script. If problems occur with the automatic download, please follow the instructions for a manual download within the script.

sh tools/download_checkpoints.sh

All experiments were executed on an NVIDIA RTX 2080 Ti.

Inference Demo

Already at this point, the provided DAFormer model (downloaded by tools/download_checkpoints.sh) can be applied to a demo image:

python -m demo.image_demo demo/demo.png work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/211108_1622_gta2cs_daformer_s0_7f24c.json work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/latest.pth

When judging the predictions, please keep in mind that DAFormer had no access to real-world labels during training.
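
Internally, the demo follows the standard mmsegmentation 0.16 inference API. A minimal sketch of the same steps (the actual demo script additionally handles the UDA-specific parts of the checkpoint, so treat this as an approximation):

from mmseg.apis import inference_segmentor, init_segmentor, show_result_pyplot

cfg = 'work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/211108_1622_gta2cs_daformer_s0_7f24c.json'
ckpt = 'work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/latest.pth'

model = init_segmentor(cfg, ckpt, device='cuda:0')    # build model, load weights
result = inference_segmentor(model, 'demo/demo.png')  # list with one HxW train-ID map
show_result_pyplot(model, 'demo/demo.png', result)    # overlay prediction on the image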

Setup Datasets

Cityscapes: Please download leftImg8bit_trainvaltest.zip and gt_trainvaltest.zip from here and extract them to data/cityscapes.

GTA: Please download all image and label packages from here and extract them to data/gta.

Synthia: Please download SYNTHIA-RAND-CITYSCAPES from here and extract it to data/synthia.

The final folder structure should look like this:

DAFormer
├── ...
├── data
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── gta
│   │   ├── images
│   │   ├── labels
│   ├── synthia
│   │   ├── RGB
│   │   ├── GT
│   │   │   ├── LABELS
├── ...

Data Preprocessing: Finally, please run the following scripts to convert the label IDs to the train IDs and to generate the class index for RCS:

python tools/convert_datasets/gta.py data/gta --nproc 8
python tools/convert_datasets/cityscapes.py data/cityscapes --nproc 8
python tools/convert_datasets/synthia.py data/synthia/ --nproc 8
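
For reference, the conversion scripts essentially remap the Cityscapes label IDs stored in each annotation PNG to the 19 train IDs (255 marks ignored pixels) and save the result as *_labelTrainIds.png. A minimal sketch of that remapping, assuming the annotations contain Cityscapes label IDs and using the id-to-trainId table from cityscapesscripts (file discovery, argument parsing, and multiprocessing omitted):

import numpy as np
from PIL import Image
from cityscapesscripts.helpers.labels import id2label

def convert_to_train_ids(label_path, out_path):
    label = np.array(Image.open(label_path))
    out = np.full_like(label, 255)  # 255 = ignore index
    for label_id, entry in id2label.items():
        if 0 <= entry.trainId < 19:  # keep only the 19 evaluated classes
            out[label == label_id] = entry.trainId
    Image.fromarray(out).save(out_path)

# Hypothetical example file name:
convert_to_train_ids('data/gta/labels/00001.png',
                     'data/gta/labels/00001_labelTrainIds.png')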

Training

For convenience, we provide an annotated config file of the final DAFormer. A training job can be launched using:

python run_experiments.py --config configs/daformer/gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0.py

For the experiments in our paper (e.g. network architecture comparison, component ablations, ...), we use a system to automatically generate and train the configs:

python run_experiments.py --exp <ID>

More information about the available experiments and their assigned IDs can be found in experiments.py. The generated configs will be stored in configs/generated/.

Testing & Predictions

The provided DAFormer checkpoint trained on GTA->Cityscapes (already downloaded by tools/download_checkpoints.sh) can be tested on the Cityscapes validation set using:

sh test.sh work_dirs/211108_1622_gta2cs_daformer_s0_7f24c

The predictions are saved for inspection to work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/preds and the mIoU of the model is printed to the console. The provided checkpoint should achieve 68.85 mIoU. Refer to the end of work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/20211108_164105.log for more information such as the class-wise IoU.

Similarly, other models can be tested after their training has finished:

sh test.sh path/to/checkpoint_directory

Framework Structure

This project is based on mmsegmentation version 0.16.0. For more information about the framework structure and the config system, please refer to the mmsegmentation documentation and the mmcv documentation.
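
Because the config system comes from mmcv, both the provided and the generated configs can be loaded and inspected programmatically. A small sketch (the attribute names follow the config structure used in this repository):

from mmcv import Config

cfg = Config.fromfile(
    'configs/daformer/gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0.py')
print(cfg.model.decode_head.type)  # decoder, e.g. 'DAFormerHead'
print(cfg.uda.type)                # UDA method, e.g. 'DACS'
cfg.data.samples_per_gpu = 1       # values can be overridden before training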

The most relevant files for DAFormer are:

Acknowledgements

This project is based on the following open-source projects. We thank their authors for making the source code publicly available.

Comments
  • About accuracy

    About accuracy

    Hi, thank you for your wonderful work, but I have a question for you. I would appreciate it if you could answer it.

When I run “python run_experiments.py --config configs/daformer/gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0.py”, I find that the best mIoU in the console output is 66.21%. Is this normal? If not, how can I reach the mIoU reported in the paper (68.3%)?

Another question: when evaluating performance, do you use the teacher network or the student network?

    opened by BUAA-LKG 9
  • KeyError: 'data_time'

    KeyError: 'data_time'

Thanks for your detailed code. I am (just) able to fit DAFormer into 10 GiB of GPU memory by disabling FD and reducing the crop size from 512x512 to 480x480. However, when training reached iteration [4000/40000], a KeyError occurred. Looking forward to your reply!

    Best.

    2022-01-09 23:01:50,074 - mmseg - INFO - Iter [3950/40000]	lr: 5.408e-05, eta: 11:05:38, time: 1.110, data_time: 0.014, memory: 8163, decode.loss_seg: 0.2515, decode.acc_seg: 86.1694, mix.decode.loss_seg: 0.2489, mix.decode.acc_seg: 86.0454
    2022-01-09 23:02:46,390 - mmseg - INFO - Exp name: 220109_2148_gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0_33f34
    2022-01-09 23:02:46,390 - mmseg - INFO - Iter [4000/40000]	lr: 5.400e-05, eta: 11:04:51, time: 1.127, data_time: 0.015, memory: 8163, decode.loss_seg: 0.2198, decode.acc_seg: 85.9952, mix.decode.loss_seg: 0.2331, mix.decode.acc_seg: 86.3791
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 500/500, 7.2 task/s, elapsed: 70s, ETA:     0s
    2022-01-09 23:04:41,465 - mmseg - INFO - per class results:
    2022-01-09 23:04:41,468 - mmseg - INFO - 
    +---------------+-------+-------+
    |     Class     |  IoU  |  Acc  |
    +---------------+-------+-------+
    |      road     | 88.63 | 92.44 |
    |    sidewalk   | 46.11 | 71.19 |
    |    building   | 83.71 | 94.21 |
    |      wall     | 27.07 | 33.25 |
    |     fence     |  7.09 |  7.36 |
    |      pole     | 29.68 | 32.26 |
    | traffic light | 37.11 | 50.64 |
    |  traffic sign |  26.5 | 27.16 |
    |   vegetation  | 87.65 | 94.58 |
    |    terrain    | 44.26 | 52.82 |
    |      sky      | 85.91 | 97.84 |
    |     person    | 62.97 | 84.29 |
    |     rider     | 34.35 | 54.13 |
    |      car      | 85.22 | 93.03 |
    |     truck     | 48.87 | 65.72 |
    |      bus      | 47.98 | 77.11 |
    |     train     | 16.64 | 17.55 |
    |   motorcycle  | 40.64 | 61.24 |
    |    bicycle    | 47.22 | 51.02 |
    +---------------+-------+-------+
    2022-01-09 23:04:41,468 - mmseg - INFO - Summary:
    2022-01-09 23:04:41,468 - mmseg - INFO - 
    +-------+-------+-------+
    |  aAcc |  mIoU |  mAcc |
    +-------+-------+-------+
    | 88.67 | 49.87 | 60.94 |
    +-------+-------+-------+
    
    2022-01-09 23:04:41,562 - mmseg - INFO - Exp name: 220109_2148_gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0_33f34
    Traceback (most recent call last):
      File "run_experiments.py", line 104, in <module>
        train.main([config_files[i]])
      File "/home/data/liuhao/experiments/DAFormer-master/tools/train.py", line 173, in main
        meta=meta)
      File "/home/data/liuhao/experiments/DAFormer-master/mmseg/apis/train.py", line 131, in train_segmentor
        runner.run(data_loaders, cfg.workflow)
      File "/home/cv428/anaconda3/envs/liuhaommlab/lib/python3.6/site-packages/mmcv/runner/iter_based_runner.py", line 133, in run
        iter_runner(iter_loaders[i], **kwargs)
      File "/home/cv428/anaconda3/envs/liuhaommlab/lib/python3.6/site-packages/mmcv/runner/iter_based_runner.py", line 66, in train
        self.call_hook('after_train_iter')
      File "/home/cv428/anaconda3/envs/liuhaommlab/lib/python3.6/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
        getattr(hook, fn_name)(self)
      File "/home/cv428/anaconda3/envs/liuhaommlab/lib/python3.6/site-packages/mmcv/runner/hooks/logger/base.py", line 152, in after_train_iter
        self.log(runner)
      File "/home/cv428/anaconda3/envs/liuhaommlab/lib/python3.6/site-packages/mmcv/runner/hooks/logger/text.py", line 234, in log
        self._log_info(log_dict, runner)
      File "/home/cv428/anaconda3/envs/liuhaommlab/lib/python3.6/site-packages/mmcv/runner/hooks/logger/text.py", line 153, in _log_info
        log_str += f'time: {log_dict["time"]:.3f}, ' \
    KeyError: 'data_time'
    
    opened by leo-hao 6
  • About RandomCrop on target image

    About RandomCrop on target image

Hi @lhoyer. Thank you for your wonderful work and detailed code.

Here I have some questions about one of the data transforms. For the target image, RandomCrop is applied as follows:

    https://github.com/lhoyer/DAFormer/blob/21c5499f0ee1ea0ecd991003ba4598782d42ec04/mmseg/datasets/pipelines/transforms.py#L547-L556

I find that when cat_max_ratio < 1, the ground-truth label of the target image is also used. However, target ground-truth labels are unavailable in the UDA setting.
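
    (For context, the linked cat_max_ratio logic re-samples the crop location up to a fixed number of times until no single class covers more than cat_max_ratio of the crop, which is why it reads the semantic label map. A rough standalone sketch of that behavior:)

    import numpy as np

    def crop_respecting_cat_max_ratio(seg, crop_size=(512, 512),
                                      cat_max_ratio=0.75, ignore_index=255):
        # Re-sample the crop location until the most frequent class occupies
        # less than cat_max_ratio of the crop (or give up after 10 attempts).
        h, w = crop_size
        for _ in range(10):
            y = np.random.randint(0, seg.shape[0] - h + 1)
            x = np.random.randint(0, seg.shape[1] - w + 1)
            patch = seg[y:y + h, x:x + w]
            labels, counts = np.unique(patch, return_counts=True)
            counts = counts[labels != ignore_index]
            if len(counts) > 1 and counts.max() / counts.sum() < cat_max_ratio:
                break
        return y, x  # offsets applied to both the image and its label map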

Could you please help me out? Thanks again!

    opened by BinhuiXie 6
  • DarkZurich

    DarkZurich

    Hi, thank you for providing the code!

This is how the DarkZurich dataset is arranged in the readme:

    ├── dark_zurich (optional)
    │   ├── gt
    │   │   ├── val
    │   ├── rgb_anon
    │   │   ├── train
    │   │   ├── val

but configs/_base_/datasets/uda_cityscapes_to_darkzurich_512x512.py contains this:

target=dict(
        type='DarkZurichDataset',
        data_root='data/dark_zurich/',
        img_dir='rgb_anon/train/night/',
        ann_dir='gt/train/night/',

However, I cannot find the ground truth for train/night in the Dark Zurich dataset.

    I am very confused about this. Could you help me solve it? Thank you very much!

    opened by 932765523 4
  • KeyError: 'gta\\labels\\13432_labelTrainIds.png'

    KeyError: 'gta\\labels\\13432_labelTrainIds.png'

    Hi, thank you for your wonderful work, but I have a question for you. I would appreciate it if you could answer it.

When I run “python run_experiments.py --config configs/daformer/gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0.py” on Windows, I get this error: KeyError: 'gta\labels\13438_labelTrainIds.png'. The full traceback is as follows:

Traceback (most recent call last):
      File "run_experiments.py", line 103, in <module>
        train.main([config_files[i]])
      File "E:\Project\DAFormer-master\tools\train.py", line 166, in main
        train_segmentor(
      File "E:\Project\DAFormer-master\mmseg\apis\train.py", line 131, in train_segmentor
        runner.run(data_loaders, cfg.workflow)
      File "c:\programdata\mmcv-1.3.7\mmcv\runner\iter_based_runner.py", line 131, in run
        iter_runner(iter_loaders[i], **kwargs)
      File "c:\programdata\mmcv-1.3.7\mmcv\runner\iter_based_runner.py", line 58, in train
        data_batch = next(data_loader)
      File "c:\programdata\mmcv-1.3.7\mmcv\runner\iter_based_runner.py", line 32, in __next__
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "C:\ProgramData\Anaconda3\envs\torch3.8\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "E:\Project\DAFormer-master\mmseg\datasets\uda_dataset.py", line 113, in __getitem__
        return self.get_rare_class_sample()
      File "E:\Project\DAFormer-master\mmseg\datasets\uda_dataset.py", line 90, in get_rare_class_sample
        i1 = self.file_to_idx[f1]
    KeyError: 'gta\\labels\\13432_labelTrainIds.png'

    opened by 932765523 4
  • src.loss_imnet_feat_dist is nan

    src.loss_imnet_feat_dist is nan

Hi, thanks for your excellent work! However, when I tried to train DAFormer on the GTA5→Cityscapes benchmark, I found that src.loss_imnet_feat_dist is nan during training. Since the final result is as expected (68.3 mIoU), I'm a little confused.

    duplicate 
    opened by Haochen-Wang409 4
  • 'src.loss_imnet_feat_dist' is nan

    'src.loss_imnet_feat_dist' is nan

Thanks for your great work. During training, src.loss_imnet_feat_dist is nan from the very beginning. Is this expected?

2022-02-19 08:15:52,120 - mmseg - INFO - Iter [50/40000] lr: 1.958e-06, eta: 1 day, 2:59:33, time: 2.432, data_time: 0.081, memory: 9808, decode.loss_seg: 2.6839, decode.acc_seg: 10.5387, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 1.4047, mix.decode.acc_seg: 19.1339
    2022-02-19 08:17:38,012 - mmseg - INFO - Iter [100/40000] lr: 3.950e-06, eta: 1 day, 1:12:58, time: 2.118, data_time: 0.033, memory: 9808, decode.loss_seg: 2.3862, decode.acc_seg: 47.4850, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 1.3233, mix.decode.acc_seg: 41.3826
    2022-02-19 08:19:28,666 - mmseg - INFO - Iter [150/40000] lr: 5.938e-06, eta: 1 day, 0:57:19, time: 2.213, data_time: 0.034, memory: 9808, decode.loss_seg: 2.0347, decode.acc_seg: 62.5967, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 1.0585, mix.decode.acc_seg: 59.3449
    2022-02-19 08:21:15,993 - mmseg - INFO - Iter [200/40000] lr: 7.920e-06, eta: 1 day, 0:37:33, time: 2.147, data_time: 0.033, memory: 9808, decode.loss_seg: 1.6078, decode.acc_seg: 68.1829, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 0.7838, mix.decode.acc_seg: 68.8032
    2022-02-19 08:23:03,135 - mmseg - INFO - Iter [250/40000] lr: 9.898e-06, eta: 1 day, 0:24:29, time: 2.143, data_time: 0.032, memory: 9808, decode.loss_seg: 1.3028, decode.acc_seg: 68.6837, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 0.6529, mix.decode.acc_seg: 70.9704
    2022-02-19 08:24:50,133 - mmseg - INFO - Iter [300/40000] lr: 1.187e-05, eta: 1 day, 0:14:51, time: 2.140, data_time: 0.034, memory: 9808, decode.loss_seg: 1.0986, decode.acc_seg: 70.4091, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 0.5765, mix.decode.acc_seg: 72.7845
    2022-02-19 08:26:36,420 - mmseg - INFO - Iter [350/40000] lr: 1.384e-05, eta: 1 day, 0:06:07, time: 2.126, data_time: 0.031, memory: 9808, decode.loss_seg: 0.9639, decode.acc_seg: 71.2223, src.loss_imnet_feat_dist: nan, mix.decode.loss_seg: 0.5049, mix.decode.acc_seg: 75.3486

    Looking forward to your reply!

    opened by yuki-no-hana 4
  • GPU Out of memory errors with RTX 3080

    GPU Out of memory errors with RTX 3080

    Hi, thanks for your excellent work.

I see that your experiments were run on an RTX 2080 Ti (11 GB?). I am getting the following error with an RTX 3080 (10 GB); I wonder whether this is expected. Do you have any tips for reducing GPU memory usage?

    Full terminal output:

    Run job sim2forest_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0
    2021-12-23 22:42:52,516 - mmseg - INFO - Environment info:
    ------------------------------------------------------------
    sys.platform: linux
    Python: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
    CUDA available: True
    GPU 0: NVIDIA GeForce RTX 3080
    CUDA_HOME: /usr/local/cuda
    NVCC: Build cuda_11.2.r11.2/compiler.29618528_0
    GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
    PyTorch: 1.9.0+cu111
    PyTorch compiling details: PyTorch built with:
      - GCC 7.3
      - C++ Version: 201402
      - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 11.1
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
      - CuDNN 8.0.5
      - Magma 2.5.2
      - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 
    
    TorchVision: 0.10.0+cu111
    OpenCV: 4.4.0
    MMCV: 1.3.7
    MMCV Compiler: GCC 9.3
    MMCV CUDA Compiler: 11.2
    MMSegmentation: 0.16.0+21c5499
    ------------------------------------------------------------
    
    2021-12-23 22:42:52,516 - mmseg - INFO - Distributed training: False
    2021-12-23 22:42:52,984 - mmseg - INFO - Config:
    log_config = dict(
        interval=50, hooks=[dict(type='TextLoggerHook', by_epoch=False)])
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]
    cudnn_benchmark = True
    norm_cfg = dict(type='BN', requires_grad=True)
    find_unused_parameters = True
    model = dict(
        type='EncoderDecoder',
        pretrained='pretrained/mit_b5.pth',
        backbone=dict(type='mit_b5', style='pytorch'),
        decode_head=dict(
            type='DAFormerHead',
            in_channels=[64, 128, 320, 512],
            in_index=[0, 1, 2, 3],
            channels=256,
            dropout_ratio=0.1,
            num_classes=19,
            norm_cfg=dict(type='BN', requires_grad=True),
            align_corners=False,
            decoder_params=dict(
                embed_dims=256,
                embed_cfg=dict(type='mlp', act_cfg=None, norm_cfg=None),
                embed_neck_cfg=dict(type='mlp', act_cfg=None, norm_cfg=None),
                fusion_cfg=dict(
                    type='aspp',
                    sep=True,
                    dilations=(1, 6, 12, 18),
                    pool=False,
                    act_cfg=dict(type='ReLU'),
                    norm_cfg=dict(type='BN', requires_grad=True))),
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
        train_cfg=dict(
            work_dir=
            'work_dirs/local-basic/211223_2242_sim2forest_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0_dfad9'
        ),
        test_cfg=dict(mode='whole'))
    dataset_type = 'ForestRealDataset'
    data_root = 'data/forest/'
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    crop_size = (512, 512)
    sim_train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations'),
        dict(type='Resize', img_scale=(640, 480)),
        dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
        dict(type='RandomFlip', prob=0.5),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True),
        dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_semantic_seg'])
    ]
    forest_train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations'),
        dict(type='Resize', img_scale=(640, 480)),
        dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
        dict(type='RandomFlip', prob=0.5),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True),
        dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
        dict(type='DefaultFormatBundle'),
        dict(type='Collect', keys=['img', 'gt_semantic_seg'])
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(640, 480),
            flip=False,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(
                    type='Normalize',
                    mean=[123.675, 116.28, 103.53],
                    std=[58.395, 57.12, 57.375],
                    to_rgb=True),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img'])
            ])
    ]
    data = dict(
        samples_per_gpu=2,
        workers_per_gpu=4,
        train=dict(
            type='UDADataset',
            source=dict(
                type='ForestSimDataset',
                data_root='data/sim/',
                img_dir='images',
                ann_dir='labels',
                pipeline=[
                    dict(type='LoadImageFromFile'),
                    dict(type='LoadAnnotations'),
                    dict(type='Resize', img_scale=(640, 480)),
                    dict(
                        type='RandomCrop',
                        crop_size=(512, 512),
                        cat_max_ratio=0.75),
                    dict(type='RandomFlip', prob=0.5),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
                ]),
            target=dict(
                type='ForestRealDataset',
                data_root='data/forest/',
                img_dir='images',
                ann_dir='labels',
                pipeline=[
                    dict(type='LoadImageFromFile'),
                    dict(type='LoadAnnotations'),
                    dict(type='Resize', img_scale=(640, 480)),
                    dict(
                        type='RandomCrop',
                        crop_size=(512, 512),
                        cat_max_ratio=0.75),
                    dict(type='RandomFlip', prob=0.5),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size=(512, 512), pad_val=0, seg_pad_val=255),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
                ]),
            rare_class_sampling=dict(
                min_pixels=3000, class_temp=0.01, min_crop_ratio=0.5)),
        val=dict(
            type='ForestRealDataset',
            data_root='data/forest/',
            img_dir='images',
            ann_dir='labels',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(640, 480),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]),
        test=dict(
            type='ForestRealDataset',
            data_root='data/forest/',
            img_dir='images',
            ann_dir='labels',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(
                    type='MultiScaleFlipAug',
                    img_scale=(640, 480),
                    flip=False,
                    transforms=[
                        dict(type='Resize', keep_ratio=True),
                        dict(type='RandomFlip'),
                        dict(
                            type='Normalize',
                            mean=[123.675, 116.28, 103.53],
                            std=[58.395, 57.12, 57.375],
                            to_rgb=True),
                        dict(type='ImageToTensor', keys=['img']),
                        dict(type='Collect', keys=['img'])
                    ])
            ]))
    uda = dict(
        type='DACS',
        alpha=0.999,
        pseudo_threshold=0.968,
        pseudo_weight_ignore_top=15,
        pseudo_weight_ignore_bottom=120,
        imnet_feature_dist_lambda=0.005,
        imnet_feature_dist_classes=[6, 7, 11, 12, 13, 14, 15, 16, 17, 18],
        imnet_feature_dist_scale_min_ratio=0.75,
        mix='class',
        blur=True,
        color_jitter_strength=0.2,
        color_jitter_probability=0.2,
        debug_img_interval=1000,
        print_grad_magnitude=False)
    use_ddp_wrapper = True
    optimizer = dict(
        type='AdamW',
        lr=6e-05,
        betas=(0.9, 0.999),
        weight_decay=0.01,
        paramwise_cfg=dict(
            custom_keys=dict(
                head=dict(lr_mult=10.0),
                pos_block=dict(decay_mult=0.0),
                norm=dict(decay_mult=0.0))))
    optimizer_config = None
    lr_config = dict(
        policy='poly',
        warmup='linear',
        warmup_iters=1500,
        warmup_ratio=1e-06,
        power=1.0,
        min_lr=0.0,
        by_epoch=False)
    seed = 0
    n_gpus = 1
    runner = dict(type='IterBasedRunner', max_iters=40000)
    checkpoint_config = dict(by_epoch=False, interval=40000, max_keep_ckpts=1)
    evaluation = dict(interval=4000, metric='mIoU')
    name = '211223_2242_sim2forest_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0_dfad9'
    exp = 'basic'
    name_dataset = 'sim2forest'
    name_architecture = 'daformer_sepaspp_mitb5'
    name_encoder = 'mitb5'
    name_decoder = 'daformer_sepaspp'
    name_uda = 'dacs_a999_fd_things_rcs0.01_cpl'
    name_opt = 'adamw_6e-05_pmTrue_poly10warm_1x2_40k'
    work_dir = 'work_dirs/local-basic/211223_2242_sim2forest_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0_dfad9'
    git_rev = '21c5499f0ee1ea0ecd991003ba4598782d42ec04'
    gpu_ids = range(0, 1)
    
    2021-12-23 22:42:52,984 - mmseg - INFO - Set random seed to 0, deterministic: False
    /home/hans/Documents/part3project/Models/DAFormer/mmseg/models/backbones/mix_transformer.py:214: UserWarning: DeprecationWarning: pretrained is a deprecated, please use "init_cfg" instead
      warnings.warn('DeprecationWarning: pretrained is a deprecated, '
    2021-12-23 22:42:54,161 - mmseg - INFO - Load mit checkpoint.
    2021-12-23 22:42:54,161 - mmseg - INFO - Use load_from_local loader
    /home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/mmcv/cnn/utils/weight_init.py:118: UserWarning: init_cfg without layer key, if you do not define override key either, this init_cfg will do nothing
      warnings.warn(
    2021-12-23 22:42:54,332 - mmseg - INFO - Load mit checkpoint.
    2021-12-23 22:42:54,332 - mmseg - INFO - Use load_from_local loader
    2021-12-23 22:42:54,469 - mmseg - INFO - Load mit checkpoint.
    2021-12-23 22:42:54,470 - mmseg - INFO - Use load_from_local loader
    2021-12-23 22:42:54,606 - mmseg - INFO - DACS(
      (model): EncoderDecoder(
        (backbone): mit_b5(
          (patch_embed1): OverlapPatchEmbed(
            (proj): Conv2d(3, 64, kernel_size=(7, 7), stride=(4, 4), padding=(3, 3))
            (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          )
          (patch_embed2): OverlapPatchEmbed(
            (proj): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (patch_embed3): OverlapPatchEmbed(
            (proj): Conv2d(128, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (patch_embed4): OverlapPatchEmbed(
            (proj): Conv2d(320, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
          (block1): ModuleList(
            (0): Block(
              (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=64, out_features=64, bias=True)
                (kv): Linear(in_features=64, out_features=128, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=64, out_features=64, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
                (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): Identity()
              (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=64, out_features=256, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
                )
                (act): GELU()
                (fc2): Linear(in_features=256, out_features=64, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (1): Block(
              (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=64, out_features=64, bias=True)
                (kv): Linear(in_features=64, out_features=128, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=64, out_features=64, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
                (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=64, out_features=256, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
                )
                (act): GELU()
                (fc2): Linear(in_features=256, out_features=64, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (2): Block(
              (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=64, out_features=64, bias=True)
                (kv): Linear(in_features=64, out_features=128, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=64, out_features=64, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
                (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=64, out_features=256, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
                )
                (act): GELU()
                (fc2): Linear(in_features=256, out_features=64, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
          )
          (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (block2): ModuleList(
            (0): Block(
              (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=128, out_features=128, bias=True)
                (kv): Linear(in_features=128, out_features=256, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=128, out_features=128, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
                (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=128, out_features=512, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
                )
                (act): GELU()
                (fc2): Linear(in_features=512, out_features=128, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (1): Block(
              (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=128, out_features=128, bias=True)
                (kv): Linear(in_features=128, out_features=256, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=128, out_features=128, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
                (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=128, out_features=512, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
                )
                (act): GELU()
                (fc2): Linear(in_features=512, out_features=128, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (2): Block(
              (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=128, out_features=128, bias=True)
                (kv): Linear(in_features=128, out_features=256, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=128, out_features=128, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
                (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=128, out_features=512, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
                )
                (act): GELU()
                (fc2): Linear(in_features=512, out_features=128, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (3): Block(
              (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=128, out_features=128, bias=True)
                (kv): Linear(in_features=128, out_features=256, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=128, out_features=128, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
                (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=128, out_features=512, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
                )
                (act): GELU()
                (fc2): Linear(in_features=512, out_features=128, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (4): Block(
              (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=128, out_features=128, bias=True)
                (kv): Linear(in_features=128, out_features=256, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=128, out_features=128, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
                (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=128, out_features=512, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
                )
                (act): GELU()
                (fc2): Linear(in_features=512, out_features=128, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (5): Block(
              (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=128, out_features=128, bias=True)
                (kv): Linear(in_features=128, out_features=256, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=128, out_features=128, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
                (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=128, out_features=512, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
                )
                (act): GELU()
                (fc2): Linear(in_features=512, out_features=128, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
          )
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
          (block3): ModuleList(
            (0): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (1): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (2): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (3): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (4): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (5): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (6): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (7): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (8): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (9): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (10-39): 30 x Block(
              ... identical structure to Block (9) above; repeated definitions omitted for brevity ...
            )
          )
          (norm3): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
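# --- Hedged sketch (not the repo's verbatim code): the Attention modules
# printed above follow the SegFormer/PVT "spatial-reduction" pattern. The
# (sr) Conv2d (kernel = stride = 2 in stage 3) downsamples the token grid
# before K/V are computed, which is what keeps 40 blocks at 320 channels
# tractable. Shapes match the printout (q: C->C, kv: C->2C); num_heads and
# sr_ratio values are illustrative assumptions.
import torch
import torch.nn as nn

class SRAttention(nn.Module):
    def __init__(self, dim, num_heads, sr_ratio):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads).transpose(1, 2)
        # downsample the token grid by sr_ratio before computing keys/values
        x_ = x.transpose(1, 2).reshape(B, C, H, W)
        x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)
        kv = self.kv(self.norm(x_)).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        k, v = kv[0], kv[1]
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        return self.proj((attn @ v).transpose(1, 2).reshape(B, N, C))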
          (block4): ModuleList(
            (0): Block(
              (norm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=512, out_features=512, bias=True)
                (kv): Linear(in_features=512, out_features=1024, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=512, out_features=512, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=512, out_features=2048, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048)
                )
                (act): GELU()
                (fc2): Linear(in_features=2048, out_features=512, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (1-2): 2 x Block(
              ... identical to Block (0) above; omitted for brevity ...
            )
          )
          (norm4): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
        )
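# --- Hedged sketch of the printed Mlp ("Mix-FFN"): a 3x3 depth-wise conv
# between fc1 and the GELU lets the network encode positional information
# without explicit position embeddings. Also note that the stage-4
# Attention above has no (sr) conv: at the coarsest resolution no
# key/value reduction is applied. Layer names follow the printout.
import torch.nn as nn

class MixFFN(nn.Module):
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.dwconv = nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1, groups=hidden_dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)

    def forward(self, x, H, W):
        B, N, C = x.shape
        x = self.fc1(x)
        x = x.transpose(1, 2).reshape(B, -1, H, W)  # tokens -> feature map
        x = self.dwconv(x)
        x = x.flatten(2).transpose(1, 2)            # feature map -> tokens
        return self.fc2(self.act(x))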
        (decode_head): DAFormerHead(
          input_transform=multiple_select, ignore_index=255, align_corners=False
          (loss_decode): CrossEntropyLoss()
          (conv_seg): Conv2d(256, 19, kernel_size=(1, 1), stride=(1, 1))
          (dropout): Dropout2d(p=0.1, inplace=False)
          (embed_layers): ModuleDict(
            (0): MLP(
              (proj): Linear(in_features=64, out_features=256, bias=True)
            )
            (1): MLP(
              (proj): Linear(in_features=128, out_features=256, bias=True)
            )
            (2): MLP(
              (proj): Linear(in_features=320, out_features=256, bias=True)
            )
            (3): MLP(
              (proj): Linear(in_features=512, out_features=256, bias=True)
            )
          )
          (fuse_layer): ASPPWrapper(
            (aspp_modules): DepthwiseSeparableASPPModule(
              (0): ConvModule(
                (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (activate): ReLU(inplace=True)
              )
              (1): DepthwiseSeparableConvModule(
                (depthwise_conv): ConvModule(
                  (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6), groups=1024, bias=False)
                  (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
                (pointwise_conv): ConvModule(
                  (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
              )
              (2): DepthwiseSeparableConvModule(
                (depthwise_conv): ConvModule(
                  (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), groups=1024, bias=False)
                  (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
                (pointwise_conv): ConvModule(
                  (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
              )
              (3): DepthwiseSeparableConvModule(
                (depthwise_conv): ConvModule(
                  (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(18, 18), dilation=(18, 18), groups=1024, bias=False)
                  (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
                (pointwise_conv): ConvModule(
                  (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
              )
            )
            (bottleneck): ConvModule(
              (conv): Conv2d(1024, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        init_cfg={'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}
      )
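# --- Hedged sketch of the decode-head data flow implied by the repr above:
# the four backbone feature maps (64/128/320/512 channels) are each
# projected to 256 channels by an MLP embed layer, resized to the stride-4
# resolution, concatenated (4 x 256 = 1024 channels, matching the
# ASPPWrapper input), fused, and classified into the 19 Cityscapes classes
# by conv_seg. The plain 3x3 fuse conv below is a simplified stand-in for
# the printed depthwise-separable ASPP (dilations 1, 6, 12, 18).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DAFormerHeadSketch(nn.Module):
    def __init__(self, in_channels=(64, 128, 320, 512), dim=256, num_classes=19):
        super().__init__()
        self.embed_layers = nn.ModuleList(nn.Linear(c, dim) for c in in_channels)
        self.fuse = nn.Conv2d(dim * len(in_channels), dim, 3, padding=1)  # stand-in for ASPPWrapper
        self.dropout = nn.Dropout2d(0.1)
        self.conv_seg = nn.Conv2d(dim, num_classes, 1)

    def forward(self, feats):  # feats: stride-4/8/16/32 maps of shape (B, C_i, H_i, W_i)
        target = feats[0].shape[2:]
        outs = []
        for f, embed in zip(feats, self.embed_layers):
            B, C, H, W = f.shape
            f = embed(f.flatten(2).transpose(1, 2)).transpose(1, 2).reshape(B, -1, H, W)
            outs.append(F.interpolate(f, size=target, mode='bilinear', align_corners=False))
        return self.conv_seg(self.dropout(self.fuse(torch.cat(outs, dim=1))))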
      (ema_model): EncoderDecoder(
        (backbone): mit_b5(
          (patch_embed1): OverlapPatchEmbed(
            (proj): Conv2d(3, 64, kernel_size=(7, 7), stride=(4, 4), padding=(3, 3))
            (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          )
          (patch_embed2): OverlapPatchEmbed(
            (proj): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (patch_embed3): OverlapPatchEmbed(
            (proj): Conv2d(128, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (patch_embed4): OverlapPatchEmbed(
            (proj): Conv2d(320, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
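# --- Hedged sketch of OverlapPatchEmbed as printed: a strided conv with an
# overlapping window (7x7/stride-4 for stage 1, 3x3/stride-2 afterwards)
# produces the token grid, and LayerNorm is applied over the channel dim
# of the flattened tokens. Together the four embeds give the 4/8/16/32
# stride pyramid with 64/128/320/512 channels seen above.
import torch.nn as nn

class OverlapPatchEmbed(nn.Module):
    def __init__(self, in_ch, embed_dim, patch_size, stride):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim, patch_size, stride, padding=patch_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        x = self.proj(x)                    # (B, C, H', W')
        H, W = x.shape[2], x.shape[3]
        x = x.flatten(2).transpose(1, 2)    # (B, H'*W', C)
        return self.norm(x), H, W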
          (block1): ModuleList(
            (0): Block(
              (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=64, out_features=64, bias=True)
                (kv): Linear(in_features=64, out_features=128, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=64, out_features=64, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
                (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): Identity()
              (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=64, out_features=256, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
                )
                (act): GELU()
                (fc2): Linear(in_features=256, out_features=64, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (1-2): 2 x Block(
              ... same as Block (0) above, except (drop_path): DropPath() instead of Identity() ...
            )
          )
          (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (block2): ModuleList(
            (0): Block(
              (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=128, out_features=128, bias=True)
                (kv): Linear(in_features=128, out_features=256, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=128, out_features=128, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
                (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=128, out_features=512, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
                )
                (act): GELU()
                (fc2): Linear(in_features=512, out_features=128, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (1-5): 5 x Block(
              ... identical to Block (0) above; omitted for brevity ...
            )
          )
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
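# --- Hedged sketch: this (ema_model) is the teacher used to generate
# pseudo-labels for self-training. It shares the student's architecture
# (hence the repeated repr) and is typically updated as an exponential
# moving average of the student weights rather than by gradients; the
# alpha below is an illustrative smoothing factor, not the repo's exact
# configured value.
import torch

@torch.no_grad()
def update_ema(ema_model, model, alpha=0.999):
    # theta_ema <- alpha * theta_ema + (1 - alpha) * theta_student
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(alpha).add_(p.detach(), alpha=1 - alpha)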
          (block3): ModuleList(
            (0): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (1-5): 5 x Block(
              ... identical to Block (0) above; this 40-block stage repeats the same structure throughout ...
            )
            (6): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (7): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (8): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (9): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (10): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (11): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (12): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (13): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (14): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (15): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (16): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (17): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (18): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (19): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (20): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (21): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (22): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (23): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (24): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (25): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (26): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (27): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (28): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (29): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (30): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (31): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (32): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (33): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (34): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (35): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (36): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (37): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (38): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (39): Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
          )
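# Annotation (not part of the printout): each Block above pairs LayerNorm with
# a SegFormer-style efficient self-attention in which keys/values are computed
# on a spatially reduced copy of the tokens via the (sr) strided conv
# (stride 2 in stage 3, 4 in stage 2, 8 in stage 1). A minimal sketch under
# these assumptions (illustrative names/shapes, not the repository's exact code):
import torch
import torch.nn as nn

class EfficientSelfAttention(nn.Module):
    def __init__(self, dim, num_heads, sr_ratio):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q = nn.Linear(dim, dim)            # matches (q) above
        self.kv = nn.Linear(dim, dim * 2)       # matches (kv): dim -> 2*dim
        self.proj = nn.Linear(dim, dim)         # matches (proj)
        self.sr = nn.Conv2d(dim, dim, sr_ratio, stride=sr_ratio)  # matches (sr)
        self.norm = nn.LayerNorm(dim)           # matches (norm)

    def forward(self, x, H, W):                 # x: (B, N, C) with N = H * W
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        # reduce the key/value resolution by sr_ratio before attention
        x_ = self.sr(x.transpose(1, 2).reshape(B, C, H, W))
        x_ = self.norm(x_.flatten(2).transpose(1, 2))
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(2, 0, 3, 1, 4)        # each: (B, heads, N/R^2, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)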
          (norm3): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
          (block4): ModuleList(
            (0): Block(
              (norm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=512, out_features=512, bias=True)
                (kv): Linear(in_features=512, out_features=1024, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=512, out_features=512, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=512, out_features=2048, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048)
                )
                (act): GELU()
                (fc2): Linear(in_features=2048, out_features=512, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
            (1)-(2): 2 x Block( ... ), identical to (0) above (repeated printout condensed)
          )
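# Annotation (not part of the printout): the stage-4 Attention above has no
# (sr) reduction conv; at 1/32 resolution the token sequence is short enough
# for full attention (spatial-reduction ratio 1), unlike stages 1-3 (8/4/2).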
          (norm4): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
        )
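# Annotation (not part of the printout): the Mlp printed in every Block is a
# "Mix-FFN": two linear layers with a 4x expansion (e.g. 512 -> 2048 in stage
# 4) and a 3x3 depthwise conv in between, which injects positional information
# into the tokens so no explicit positional encoding is needed. A minimal
# sketch under these assumptions (illustrative, not the repository's code):
import torch.nn as nn

class MixFFN(nn.Module):
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)          # matches (fc1)
        self.dwconv = nn.Conv2d(hidden_dim, hidden_dim, 3, padding=1,
                                groups=hidden_dim)     # matches (dwconv)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)          # matches (fc2)

    def forward(self, x, H, W):                        # x: (B, N, C)
        B, N, _ = x.shape
        x = self.fc1(x)
        # reshape tokens to a feature map for the depthwise conv
        x = self.dwconv(x.transpose(1, 2).reshape(B, -1, H, W))
        x = self.act(x.flatten(2).transpose(1, 2))
        return self.fc2(x)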
        (decode_head): DAFormerHead(
          input_transform=multiple_select, ignore_index=255, align_corners=False
          (loss_decode): CrossEntropyLoss()
          (conv_seg): Conv2d(256, 19, kernel_size=(1, 1), stride=(1, 1))
          (dropout): Dropout2d(p=0.1, inplace=False)
          (embed_layers): ModuleDict(
            (0): MLP(
              (proj): Linear(in_features=64, out_features=256, bias=True)
            )
            (1): MLP(
              (proj): Linear(in_features=128, out_features=256, bias=True)
            )
            (2): MLP(
              (proj): Linear(in_features=320, out_features=256, bias=True)
            )
            (3): MLP(
              (proj): Linear(in_features=512, out_features=256, bias=True)
            )
          )
          (fuse_layer): ASPPWrapper(
            (aspp_modules): DepthwiseSeparableASPPModule(
              (0): ConvModule(
                (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (activate): ReLU(inplace=True)
              )
              (1): DepthwiseSeparableConvModule(
                (depthwise_conv): ConvModule(
                  (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6), groups=1024, bias=False)
                  (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
                (pointwise_conv): ConvModule(
                  (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
              )
              (2): DepthwiseSeparableConvModule(
                (depthwise_conv): ConvModule(
                  (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), groups=1024, bias=False)
                  (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
                (pointwise_conv): ConvModule(
                  (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
              )
              (3): DepthwiseSeparableConvModule(
                (depthwise_conv): ConvModule(
                  (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(18, 18), dilation=(18, 18), groups=1024, bias=False)
                  (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
                (pointwise_conv): ConvModule(
                  (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
              )
            )
            (bottleneck): ConvModule(
              (conv): Conv2d(1024, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        init_cfg={'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}
      )
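# Annotation (not part of the printout): DAFormerHead above embeds each of the
# four backbone feature maps (64/128/320/512 channels) to 256 channels,
# resizes them to the 1/4-scale resolution, concatenates them (4 x 256 = 1024
# channels, matching the ASPP input), fuses them with the depthwise-separable
# ASPP (dilation rates 1/6/12/18) and its 1024 -> 256 bottleneck, and finally
# classifies into 19 classes with the 1x1 conv_seg. A minimal sketch under
# these assumptions (illustrative helper signature, not the repository code):
import torch
import torch.nn.functional as F

def daformer_head_forward(feats, embed_layers, fuse_layer, dropout, conv_seg):
    """feats: list of 4 maps with shapes (B, C_i, H/2^{i+2}, W/2^{i+2})."""
    target_hw = feats[0].shape[2:]               # 1/4-scale resolution
    embedded = []
    for i, f in enumerate(feats):
        B, C, H, W = f.shape
        e = embed_layers[str(i)](f.flatten(2).transpose(1, 2))  # (B, H*W, 256)
        e = e.transpose(1, 2).reshape(B, 256, H, W)
        embedded.append(F.interpolate(e, size=target_hw, mode='bilinear',
                                      align_corners=False))
    x = fuse_layer(torch.cat(embedded, dim=1))   # (B, 1024, .) -> (B, 256, .)
    return conv_seg(dropout(x))                  # (B, 19, H/4, W/4)

# The imnet_model printed below is a second EncoderDecoder with the same
# architecture; it presumably keeps its ImageNet-pretrained weights and serves
# the Thing-Class ImageNet Feature Distance rather than producing predictions.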
      (imnet_model): EncoderDecoder(
        (backbone): mit_b5(
          (patch_embed1): OverlapPatchEmbed(
            (proj): Conv2d(3, 64, kernel_size=(7, 7), stride=(4, 4), padding=(3, 3))
            (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          )
          (patch_embed2): OverlapPatchEmbed(
            (proj): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          )
          (patch_embed3): OverlapPatchEmbed(
            (proj): Conv2d(128, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
          )
          (patch_embed4): OverlapPatchEmbed(
            (proj): Conv2d(320, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
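# Annotation (not part of the printout): the four OverlapPatchEmbed modules
# above tokenize with overlapping strided convs (7x7/stride 4, then 3x3/stride
# 2), producing feature scales 1/4, 1/8, 1/16, 1/32 with 64/128/320/512
# channels. A minimal sketch under these assumptions (illustrative only):
import torch.nn as nn

class OverlapPatchEmbed(nn.Module):
    def __init__(self, in_ch, embed_dim, patch_size, stride):
        super().__init__()
        # padding = patch_size // 2 keeps neighbouring patches overlapping
        self.proj = nn.Conv2d(in_ch, embed_dim, patch_size, stride=stride,
                              padding=patch_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                        # x: (B, C_in, H, W)
        x = self.proj(x)                         # (B, embed_dim, H', W')
        B, C, H, W = x.shape
        x = self.norm(x.flatten(2).transpose(1, 2))  # (B, H'*W', embed_dim)
        return x, H, W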
          (block1): ModuleList(
            (0): Block(
              (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=64, out_features=64, bias=True)
                (kv): Linear(in_features=64, out_features=128, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=64, out_features=64, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(64, 64, kernel_size=(8, 8), stride=(8, 8))
                (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): Identity()
              (norm2): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=64, out_features=256, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=256)
                )
                (act): GELU()
                (fc2): Linear(in_features=256, out_features=64, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
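# Annotation (not part of the printout): (drop_path) is Identity() in Block
# (0) above but DropPath() in later blocks, since the stochastic-depth drop
# rate increases linearly with block depth and starts at 0 for the first block.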
            (1)-(2): 2 x Block( ... ), identical to (0) above except (drop_path): DropPath() instead of Identity() (repeated printout condensed)
          )
          (norm1): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
          (block2): ModuleList(
            (0-5): 6 x Block(
              (norm1): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=128, out_features=128, bias=True)
                (kv): Linear(in_features=128, out_features=256, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=128, out_features=128, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(128, 128, kernel_size=(4, 4), stride=(4, 4))
                (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=128, out_features=512, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=512)
                )
                (act): GELU()
                (fc2): Linear(in_features=512, out_features=128, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
          )
          (norm2): LayerNorm((128,), eps=1e-06, elementwise_affine=True)
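          # Stage 3: 40 Blocks with embedding dim 320 (spatial-reduction ratio 2);
          # this stage carries the bulk of the MiT-B5 encoder's depth.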
          (block3): ModuleList(
            (0-39): 40 x Block(
              (norm1): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=320, out_features=320, bias=True)
                (kv): Linear(in_features=320, out_features=640, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=320, out_features=320, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
                (sr): Conv2d(320, 320, kernel_size=(2, 2), stride=(2, 2))
                (norm): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=320, out_features=1280, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1280)
                )
                (act): GELU()
                (fc2): Linear(in_features=1280, out_features=320, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
          )
          (norm3): LayerNorm((320,), eps=1e-06, elementwise_affine=True)
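          # Stage 4: 3 Blocks with embedding dim 512; no (sr) conv appears here,
          # as the last stage attends at full resolution (spatial-reduction ratio 1).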
          (block4): ModuleList(
            (0-2): 3 x Block(
              (norm1): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
              (attn): Attention(
                (q): Linear(in_features=512, out_features=512, bias=True)
                (kv): Linear(in_features=512, out_features=1024, bias=True)
                (attn_drop): Dropout(p=0.0, inplace=False)
                (proj): Linear(in_features=512, out_features=512, bias=True)
                (proj_drop): Dropout(p=0.0, inplace=False)
              )
              (drop_path): DropPath()
              (norm2): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
              (mlp): Mlp(
                (fc1): Linear(in_features=512, out_features=2048, bias=True)
                (dwconv): DWConv(
                  (dwconv): Conv2d(2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2048)
                )
                (act): GELU()
                (fc2): Linear(in_features=2048, out_features=512, bias=True)
                (drop): Dropout(p=0.0, inplace=False)
              )
            )
          )
          (norm4): LayerNorm((512,), eps=1e-06, elementwise_affine=True)
        )
        (decode_head): DAFormerHead(
          input_transform=multiple_select, ignore_index=255, align_corners=False
          (loss_decode): CrossEntropyLoss()
          (conv_seg): Conv2d(256, 19, kernel_size=(1, 1), stride=(1, 1))
          (dropout): Dropout2d(p=0.1, inplace=False)
          (embed_layers): ModuleDict(
            (0): MLP(
              (proj): Linear(in_features=64, out_features=256, bias=True)
            )
            (1): MLP(
              (proj): Linear(in_features=128, out_features=256, bias=True)
            )
            (2): MLP(
              (proj): Linear(in_features=320, out_features=256, bias=True)
            )
            (3): MLP(
              (proj): Linear(in_features=512, out_features=256, bias=True)
            )
          )
          (fuse_layer): ASPPWrapper(
            (aspp_modules): DepthwiseSeparableASPPModule(
              (0): ConvModule(
                (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                (activate): ReLU(inplace=True)
              )
              (1): DepthwiseSeparableConvModule(
                (depthwise_conv): ConvModule(
                  (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(6, 6), dilation=(6, 6), groups=1024, bias=False)
                  (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
                (pointwise_conv): ConvModule(
                  (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
              )
              (2): DepthwiseSeparableConvModule(
                (depthwise_conv): ConvModule(
                  (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(12, 12), dilation=(12, 12), groups=1024, bias=False)
                  (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
                (pointwise_conv): ConvModule(
                  (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
              )
              (3): DepthwiseSeparableConvModule(
                (depthwise_conv): ConvModule(
                  (conv): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(18, 18), dilation=(18, 18), groups=1024, bias=False)
                  (bn): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
                (pointwise_conv): ConvModule(
                  (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
                  (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                  (activate): ReLU(inplace=True)
                )
              )
            )
            (bottleneck): ConvModule(
              (conv): Conv2d(1024, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
              (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
              (activate): ReLU(inplace=True)
            )
          )
        )
        init_cfg={'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}
      )
    )
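
    The decode_head printout above summarizes DAFormer's fusion pattern: the four encoder scales (64/128/320/512 channels) are each projected to 256 channels by an MLP, resized to a common resolution, concatenated to 4 x 256 = 1024 channels, and fused by a depthwise-separable ASPP (a 1x1 branch plus dilations 6/12/18) back down to 256 channels before the 1x1 conv_seg classifier. A minimal PyTorch sketch of that data flow, simplified under the assumption that a single 3x3 convolution stands in for the ASPP fuse layer:

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class FusionSketch(nn.Module):
          # Simplified sketch of the DAFormerHead data flow printed above;
          # a plain 3x3 conv replaces the depthwise-separable ASPP (assumption).
          def __init__(self, in_channels=(64, 128, 320, 512), embed_dim=256, num_classes=19):
              super().__init__()
              # one projection per encoder scale; a 1x1 conv is equivalent to
              # the per-pixel Linear projections in the 'embed_layers' above
              self.proj = nn.ModuleList([nn.Conv2d(c, embed_dim, kernel_size=1) for c in in_channels])
              # fuse the concatenated 4*256=1024 channels back down to 256
              self.fuse = nn.Conv2d(len(in_channels) * embed_dim, embed_dim, kernel_size=3, padding=1)
              # final 1x1 classifier (the 'conv_seg' above)
              self.conv_seg = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

          def forward(self, feats):
              # resize every projected scale to the finest (first) feature map
              size = feats[0].shape[2:]
              x = [F.interpolate(p(f), size=size, mode='bilinear', align_corners=False)
                   for p, f in zip(self.proj, feats)]
              return self.conv_seg(self.fuse(torch.cat(x, dim=1)))
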
    2021-12-23 22:42:54,654 - mmseg - INFO - Loaded 1001 images from data/sim/images
    2021-12-23 22:42:54,664 - mmseg - INFO - Loaded 1001 images from data/forest/images
    2021-12-23 22:42:54,666 - mmseg - INFO - RCS Classes: [6, 5, 1, 0, 2, 4, 3]
    2021-12-23 22:42:54,666 - mmseg - INFO - RCS ClassProb: [3.9898804e-01 3.4685844e-01 2.4818756e-01 5.9657078e-03 1.8850398e-07
     4.6506953e-16 2.8859502e-18]
    2021-12-23 22:42:58,769 - mmseg - INFO - Loaded 1001 images from data/forest/images
    2021-12-23 22:42:58,769 - mmseg - INFO - Start running, host: hans@hans-3080-desktop, work_dir: /home/hans/Documents/part3project/Models/DAFormer/work_dirs/local-basic/211223_2242_sim2forest_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0_dfad9
    2021-12-23 22:42:58,769 - mmseg - INFO - workflow: [('train', 1)], max: 40000 iters
    Traceback (most recent call last):
      File "run_experiments.py", line 101, in <module>
        train.main([config_files[i]])
      File "/home/hans/Documents/part3project/Models/DAFormer/tools/train.py", line 166, in main
        train_segmentor(
      File "/home/hans/Documents/part3project/Models/DAFormer/mmseg/apis/train.py", line 131, in train_segmentor
        runner.run(data_loaders, cfg.workflow)
      File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 131, in run
        iter_runner(iter_loaders[i], **kwargs)
      File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
        outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
      File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
        return self.module.train_step(*inputs[0], **kwargs[0])
      File "/home/hans/Documents/part3project/Models/DAFormer/mmseg/models/uda/dacs.py", line 138, in train_step
        log_vars = self(**data_batch)
      File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 97, in new_func
        return old_func(*args, **kwargs)
      File "/home/hans/Documents/part3project/Models/DAFormer/mmseg/models/segmentors/base.py", line 109, in forward
        return self.forward_train(img, img_metas, **kwargs)
      File "/home/hans/Documents/part3project/Models/DAFormer/mmseg/models/uda/dacs.py", line 232, in forward_train
        clean_loss.backward(retain_graph=self.enable_fdist)
      File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/hans/Documents/part3project/Models/DAFormer/env/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: CUDA out of memory. Tried to allocate 624.00 MiB (GPU 0; 9.78 GiB total capacity; 6.61 GiB already allocated; 624.69 MiB free; 6.79 GiB reserved in total by PyTorch)
    
    
    opened by DJayalath 4
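
    The out-of-memory error above occurs on a roughly 10 GiB GPU; note that the traceback shows clean_loss.backward(retain_graph=self.enable_fdist), so the ImageNet feature-distance loss keeps the source graph alive through a second backward pass, which raises peak memory. Common mitigations are reducing the crop size or the per-GPU batch size. A hypothetical mmsegmentation-style config override (field names follow the usual mmseg conventions and are not verified against this repo's configs):

      # hypothetical memory-saving overrides; a sketch only
      data = dict(
          samples_per_gpu=1,   # fewer images per GPU lowers activation memory
          workers_per_gpu=1,
      )
      crop_size = (384, 384)  # reduced crop (assumption: the default crops are larger)
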
  • Invitation to contribute to MMSegmentation

    Hi, thanks for your excellent work.

    We are members of OpenMMLab, whose codebase MMSegmentation is related to your excellent work. Would you like to join us by making a PR for your repo? Currently, our tasks all concern fully supervised segmentation. I think if DAFormer were supported, more researchers could use and cite this method.

    Looking forward to your reply!

    Best,

    opened by MengzhangLI 4
  • Losses are backpropagated separately

    Dear authors, thank you for your outstanding work. While reading the code, I noticed that the losses are backpropagated separately, whereas many other works accumulate the losses and backpropagate once. What is the difference between the two? Looking forward to your reply.

    opened by HKQX 3
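
    For context on the question above: calling backward() on each loss separately produces exactly the same gradients as one backward() on the summed loss, because PyTorch accumulates gradients into the .grad buffers. The practical difference is peak memory, since each computation graph can be freed right after its own backward pass instead of all graphs being held until a single combined backward. A minimal sketch with toy losses (illustrative only, not this repo's code):

      import torch

      def grads(separate: bool) -> torch.Tensor:
          w = torch.ones(3, requires_grad=True)
          loss_a = (w ** 2).sum()   # toy stand-in for, e.g., the source loss
          loss_b = (3 * w).sum()    # toy stand-in for, e.g., the mixed-sample loss
          if separate:
              loss_a.backward()     # gradients accumulate into w.grad ...
              loss_b.backward()     # ... so the grouping does not change them
          else:
              (loss_a + loss_b).backward()
          return w.grad

      assert torch.allclose(grads(True), grads(False))  # identical gradients
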
  • Why not support multi-GPU? Is it not necessary for UDA?

    opened by fuweifu-vtoo 3
  • Code about "Rare Class Sampling (RCS)"

    Hi, I want to find the code for the Rare Class Sampling module, but I cannot locate it. I am not good at coding, so I would like to ask where I can find your Rare Class Sampling module. I would like to use it in other work, because this module has a lot of potential.

    opened by yuheyuan 4
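
    For reference, the DAFormer paper defines Rare Class Sampling via P(c) = exp((1 - f_c)/T) / sum_c' exp((1 - f_c')/T), where f_c is the pixel frequency of class c over the source data and T is a temperature; the strongly skewed "RCS ClassProb" values logged earlier on this page come from this kind of computation. A hedged stand-alone sketch (the function name and input are illustrative, not the repo's API):

      import numpy as np

      def rcs_class_probs(class_pixel_counts: np.ndarray, temperature: float = 0.01) -> np.ndarray:
          # Sampling probability per class, following P(c) proportional to exp((1 - f_c) / T).
          # class_pixel_counts[c] = total source pixels labeled c (hypothetical input;
          # the repo derives these statistics from the dataset annotations).
          freq = class_pixel_counts / class_pixel_counts.sum()
          logits = (1.0 - freq) / temperature
          logits -= logits.max()           # subtract max for numerical stability
          probs = np.exp(logits)
          return probs / probs.sum()
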
  • MiT-B3 is much better than MiT-B4

    Dear authors, thank you for your outstanding work. While reproducing your work, I obtained results that puzzle me: in the GTAV→Cityscapes experiment, with the MiT-B5 backbone I get results similar to the paper (68.3); with MiT-B4 I get an mIoU of 66.69; and with MiT-B3 I get an mIoU of 67.91. I am confused as to why MiT-B3 is so much better than MiT-B4. Have you conducted similar experiments, and what were your results?

    opened by 5as4as 2
  • How to tune hyper-parameters?

    It's really an awesome Transformer-based UDA work; thanks for sharing your code.

    I want to ask an open question about UDA.

    Considering that the labels of the target domain are not available, it is impossible to directly evaluate model performance on the target domain, so how should one tune hyper-parameters? To be clear, the validation set of the target domain cannot be used for tuning hyper-parameters either.

    Looking forward to your reply. :-)

    opened by super233 0
Owner
Lukas Hoyer, Doctoral student at ETH Zurich