Code for TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization.

Overview

TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

This is the official implementation of the paper TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization.

This repository contains PyTorch training code, evaluation code, pretrained models, and Jupyter notebooks for visualization.

Illustration

Built on DeiT, TS-CAM couples the attention maps of a vision transformer with semantic-aware maps to obtain accurate localization maps (Token Semantic Coupled Attention Maps, ts-cams).

(Figure: ts-cam)
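
As a rough sketch of the idea (not the repository code; the tensor names and shapes below are illustrative assumptions), the coupling can be written as an element-wise product between the class-agnostic attention map collected from the transformer and the semantic-aware maps predicted from the patch tokens:

import torch

def couple_maps(attn_map, semantic_maps):
    # attn_map:      (B, H*W)    class-token-to-patch attention, e.g. aggregated over layers
    # semantic_maps: (B, C, H*W) semantic-aware maps predicted from the patch tokens (C classes)
    # returns:       (B, C, H*W) coupled localization maps (ts-cams)
    ts_cam = attn_map.unsqueeze(1) * semantic_maps
    # Normalize each map to [0, 1] for thresholding/visualization.
    ts_cam = ts_cam - ts_cam.min(dim=-1, keepdim=True).values
    ts_cam = ts_cam / (ts_cam.max(dim=-1, keepdim=True).values + 1e-8)
    return ts_cam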

Model Zoo

We provide TS-CAM models trained on the CUB-200-2011 and ILSVRC2012 datasets.

Dataset      | Loc.Acc@1 | Loc.Acc@5 | Loc.Gt-Known | Cls.Acc@1 | Cls.Acc@5 | Baidu Drive | Google Drive
CUB-200-2011 | 71.3      | 83.8      | 87.7         | 80.3      | 94.8      | model       | model
ILSVRC2012   | 53.4      | 64.3      | 67.6         | 74.3      | 92.1      | model       | model

Note: the extraction code for the Baidu Drive links is as follows:

Usage

First clone the repository locally:

git clone https://github.com/vasgaowei/TS-CAM.git

Then install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2:


conda create -n pytorch1.7 python=3.6
conda activate pytorch1.7
conda install anaconda
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=10.2 -c pytorch
pip install timm==0.3.2
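
To sanity-check the environment (optional), you can verify the installed versions from Python:

import torch
import torchvision
import timm

print("torch:", torch.__version__)              # expected 1.7.x
print("torchvision:", torchvision.__version__)  # expected 0.8.x
print("timm:", timm.__version__)                # expected 0.3.2
print("CUDA available:", torch.cuda.is_available())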

Data preparation

CUB-200-2011 dataset

Please download and extract the CUB-200-2011 dataset.

The directory structure is the following:

TS-CAM/
  data/
    CUB-200-2011/
      attributes/
      images/
      parts/
      bounding_boxes.txt
      classes.txt
      image_class_labels.txt
      images.txt
      image_sizes.txt
      README
      train_test_split.txt
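
Optionally, the extracted metadata can be spot-checked with a short script like the one below (a sketch based on the standard CUB-200-2011 annotation format; adjust root if your folder name differs):

import os

root = "data/CUB-200-2011"

# images.txt lines: "<image_id> <relative_path>"; bounding_boxes.txt lines: "<image_id> <x> <y> <w> <h>"
with open(os.path.join(root, "images.txt")) as f:
    images = dict(line.strip().split(" ", 1) for line in f)
with open(os.path.join(root, "bounding_boxes.txt")) as f:
    boxes = {line.split()[0]: [float(v) for v in line.split()[1:]] for line in f}

print(len(images), "images,", len(boxes), "bounding boxes")
missing = [p for p in images.values() if not os.path.isfile(os.path.join(root, "images", p))]
print(len(missing), "image files missing")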

ImageNet1k

Download the ILSVRC2012 dataset and extract the train and val images.

The directory structure is organized as follows:

TS-CAM/
  data/
    ImageNet_ILSVRC2012/
      train/
        n01440764/
          n01440764_18.JPEG
          ...
        n01514859/
          n01514859_1.JPEG
          ...
      val/
        n01440764/
          ILSVRC2012_val_00000293.JPEG
          ...
        n01531178/
          ILSVRC2012_val_00000570.JPEG
          ...
      ILSVRC2012_list/
        train.txt
        val_folder.txt
        val_folder_new.txt
The training and validation data are expected to be in the train/ and val/ folders, respectively.
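
As with CUB, a quick structural check (an optional sketch, assuming the layout above) can catch common mistakes such as val images that have not been sorted into per-class folders:

import os

root = "data/ImageNet_ILSVRC2012"

for split in ("train", "val"):
    split_dir = os.path.join(root, split)
    classes = [c for c in os.listdir(split_dir) if os.path.isdir(os.path.join(split_dir, c))]
    n_images = sum(len(os.listdir(os.path.join(split_dir, c))) for c in classes)
    print(split, ":", len(classes), "class folders,", n_images, "images")  # expect 1000 class folders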

For training:

On CUB-200-2011 dataset:

bash train_val_cub.sh ${GPU_ID} ${NET}

On ImageNet1k dataset:

bash train_val_ilsvrc.sh ${GPU_ID} ${NET}

Please note that the ImageNet-1k pretrained weights of DeiT-tiny, DeiT-small, and DeiT-base will be downloaded the first time you train a model, so an Internet connection is required.

For evaluation:

On CUB-200-2011 dataset:

bash val_cub.sh ${GPU_ID} ${NET} ${MODEL_PATH}

On ImageNet1k dataset:

bash val_ilsvrc.sh ${GPU_ID} ${NET} ${MODEL_PATH}

GPU_ID should be specified; multiple GPUs can be used to accelerate training and evaluation.

NET should be chosen from tiny, small, and base.

MODEL_PATH is the path to the pretrained model.

Visualization

We provide Jupyter notebooks in the tools_cam folder.

TS-CAM/
  tools_cam/
    visualization_attention_map_cub.ipynb
    visualization_attention_map_imaget.ipynb

Please download the pretrained TS-CAM model weights and explore more visualization results (attention maps produced by our method and by the Attention Rollout method). You can also try other interesting images to visualize their localization maps (ts-cams).
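
For reference, attention rollout (the comparison method mentioned above) can be sketched as follows; this is a generic implementation of the rollout idea, not the notebook code, and the input shapes are assumptions:

import torch

def attention_rollout(attn_per_layer, add_residual=True):
    # attn_per_layer: list of (B, num_heads, N, N) attention matrices, one per transformer block
    # returns:        (B, N, N) rollout matrix; row 0 (class token) gives attention over the patch tokens
    device = attn_per_layer[0].device
    B, _, N, _ = attn_per_layer[0].shape
    eye = torch.eye(N, device=device)
    rollout = eye.repeat(B, 1, 1)
    for attn in attn_per_layer:
        a = attn.mean(dim=1)                     # fuse heads by averaging -> (B, N, N)
        if add_residual:
            a = 0.5 * a + 0.5 * eye              # account for the skip connection
            a = a / a.sum(dim=-1, keepdim=True)  # keep rows normalized
        rollout = torch.bmm(a, rollout)          # compose attention across layers
    return rollout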

Visualize localization results

We provide some visualization results as follows.

(Figure: localization results)

Visualize attention maps

We can also visualize attention maps from different transformer layers.

(Figures: attention maps on CUB-200-2011 and ILSVRC2012)

Contacts

If you have any questions about our work or this repository, please don't hesitate to contact us by email.

You can also open an issue under this project.

Citation

If you use this code in a paper, please cite:

@article{Gao2021TSCAMTS,
  title={TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization},
  author={Wei Gao and Fang Wan and Xingjia Pan and Zhiliang Peng and Qi Tian and Zhenjun Han and Bolei Zhou and Qixiang Ye},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.14862}
}
Comments
  • Have the authors tried using attention rollout?

    Have the authors tried using attention rollout?

    Thanks to the authors for releasing the code of their work! I noticed that the visualization part uses the attention rollout method and the results look good. Have the authors tried computing the final localization results from the attention-rollout maps, either on top of the plain transformer or on top of TS-CAM? Does this bring an improvement, or is it used only as a visualization reference because it does not improve accuracy?

    opened by ustcjinggg 2
  • TransAttention performance

    TransAttention performance

    Thanks for the great work.

    I have a question about how to reproduce TransAttention performance on CUB-200 (Table 5). I got much higher performance by changing https://github.com/vasgaowei/TS-CAM/blob/aeb823ee097ce0c5594b8cb10d14c0aa03652df3/lib/models/deit.py#L62 to cams = cams.repeat((1, 200, 1, 1)):

    Cls@1:0.803 Cls@5:0.948 Loc@1:0.690 Loc@5:0.816 Loc_gt:0.859 wrong_details:3998 1139 0 556 96 5

    And I got Loc@1:0.154 Loc@5:0.177 Loc_gt:0.183 for TransCAM; it seems there is a mistake in the table. Personally, I feel it is unfair to compare with TransCAM without tuning CAM_THR to its optimum; I can get Loc@1:0.333 Loc@5:0.379 Loc_gt:0.387 by setting CAM_THR to around 0.8. I wonder what your thoughts are here.

    opened by liruiwen 1
  • Paper Question about Table 1

    Paper Question about Table 1

    Dear authors,

    In Table 1, the compared methods and the proposed method use different backbones. Could you explain whether this comparison is fair?

    Thank you.

    opened by AmingWu 0
  • "creat_model" function call adjusted

    "creat_model" is returning three values but at function call it's assigning to four tuples. So, adjusting the assignment to three values (tuple)

    opened by shakeebmurtaza 0
  • The function of joint_attns_skip

    The function of joint_attns_skip

    Hi, thanks for your wonderful work! However, I don't understand the role of joint_attns_skip in line 471 of conformer.py. Is the mean tensor of joint_attns_skip an optional CAM?

    opened by xujianglan 1
  • Question about visualization

    Question about visualization

    Hello. First of all, thank you for your code and paper; I really appreciate your work. We ran into a problem when visualizing results on our own dataset.

    An error occurred when I tried to call the model's forward function; a screenshot of the error is attached.

    I also noticed that when running the code, our model is reported as VisionTransformer, while the output shown on GitHub names the model TSCAM. Can you tell me what makes them different?

    I'm hoping for your answer. Thank you!

    opened by Grand-ou 1
  • Question about the training process

    Question about the training process

    Greetings! Thanks for your excellent work! When running your code, I ran into a problem: the performance is poor. My running command is

     bash train_val_cub.sh 3 deit small 224
    

    and I got the log like:

    {'BASIC': {'BACKUP_CODES': True,
               'BACKUP_LIST': ['lib', 'tools_cam', 'configs'],
               'DISP_FREQ': 10,
               'GPU_ID': [0],
               'NUM_WORKERS': 40,
               'ROOT_DIR': './tools_cam/..',
               'SAVE_DIR': 'ckpt/CUB/deit_tscam_small_patch16_224_CAM-NORMAL_SEED26_CAM-THR0.1_BS128_2022-03-25-01-46',
               'SEED': 26,
               'TIME': '2022-03-25-01-46'},
     'CUDNN': {'BENCHMARK': False, 'DETERMINISTIC': True, 'ENABLE': True},
     'DATA': {'CROP_SIZE': 224,
              'DATADIR': 'data/CUB_200_2011',
              'DATASET': 'CUB',
              'IMAGE_MEAN': [0.485, 0.456, 0.406],
              'IMAGE_STD': [0.229, 0.224, 0.225],
              'NUM_CLASSES': 200,
              'RESIZE_SIZE': 256,
              'SCALE_LENGTH': 15,
              'SCALE_SIZE': 196,
              'TRAIN_AUG_PATH': '',
              'VAL_PATH': ''},
     'MODEL': {'ARCH': 'deit_tscam_small_patch16_224',
               'CAM_THR': 0.1,
               'LOCALIZER_DIR': '',
               'TOP_K': 1},
     'SOLVER': {'LR_FACTOR': 0.1,
                'LR_STEPS': [30],
                'MUMENTUM': 0.9,
                'NUM_EPOCHS': 60,
                'START_LR': 0.001,
                'WEIGHT_DECAY': 0.0005},
     'TEST': {'BATCH_SIZE': 128,
              'CKPT_DIR': '',
              'SAVE_BOXED_IMAGE': False,
              'SAVE_CAMS': False,
              'TEN_CROPS': False},
     'TRAIN': {'ALPHA': 1.0,
               'BATCH_SIZE': 128,
               'BETA': 1.0,
               'IF_FIX_WEIGHT': False}}
    ==> Preparing data...
    done!
    ==> Preparing networks for baseline...
    Removing key head.weight from pretrained checkpoint
    Removing key head.bias from pretrained checkpoint
    TSCAM(
      (patch_embed): PatchEmbed(
        (proj): Conv2d(3, 384, kernel_size=(16, 16), stride=(16, 16))
      )
      (pos_drop): Dropout(p=0.0, inplace=False)
      (blocks): ModuleList(
        (0): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): Identity()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (3): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (4): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (5): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (6): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (7): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (8): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (9): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (10): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (11): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
      (head): Conv2d(384, 200, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (avgpool): AdaptiveAvgPool2d(output_size=1)
    )
    {'BASIC': {'BACKUP_CODES': True,
               'BACKUP_LIST': ['lib', 'tools_cam', 'configs'],
               'DISP_FREQ': 10,
               'GPU_ID': [0],
               'NUM_WORKERS': 40,
               'ROOT_DIR': './tools_cam/..',
               'SAVE_DIR': 'ckpt/CUB/deit_tscam_small_patch16_224_CAM-NORMAL_SEED26_CAM-THR0.1_BS128_2022-03-25-01-46',
               'SEED': 26,
               'TIME': '2022-03-25-01-46'},
     'CUDNN': {'BENCHMARK': False, 'DETERMINISTIC': True, 'ENABLE': True},
     'DATA': {'CROP_SIZE': 224,
              'DATADIR': 'data/CUB_200_2011',
              'DATASET': 'CUB',
              'IMAGE_MEAN': [0.485, 0.456, 0.406],
              'IMAGE_STD': [0.229, 0.224, 0.225],
              'NUM_CLASSES': 200,
              'RESIZE_SIZE': 256,
              'SCALE_LENGTH': 15,
              'SCALE_SIZE': 196,
              'TRAIN_AUG_PATH': '',
              'VAL_PATH': ''},
     'MODEL': {'ARCH': 'deit_tscam_small_patch16_224',
               'CAM_THR': 0.1,
               'LOCALIZER_DIR': '',
               'TOP_K': 1},
     'SOLVER': {'LR_FACTOR': 0.1,
                'LR_STEPS': [30],
                'MUMENTUM': 0.9,
                'NUM_EPOCHS': 60,
                'START_LR': 0.001,
                'WEIGHT_DECAY': 0.0005},
     'TEST': {'BATCH_SIZE': 128,
              'CKPT_DIR': '',
              'SAVE_BOXED_IMAGE': False,
              'SAVE_CAMS': False,
              'TEN_CROPS': False},
     'TRAIN': {'ALPHA': 1.0, 'BATCH_SIZE': 128, 'BETA': 1.0, 'IF_FIX_WEIGHT': True}}
    ==> Preparing data...
    done!
    ==> Preparing networks for baseline...
    Removing key head.weight from pretrained checkpoint
    Removing key head.bias from pretrained checkpoint
    TSCAM(
      (patch_embed): PatchEmbed(
        (proj): Conv2d(3, 384, kernel_size=(16, 16), stride=(16, 16))
      )
      (pos_drop): Dropout(p=0.0, inplace=False)
      (blocks): ModuleList(
        (0): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): Identity()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (3): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (4): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (5): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (6): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (7): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (8): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (9): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (10): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (11): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
      (head): Conv2d(384, 200, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (avgpool): AdaptiveAvgPool2d(output_size=1)
    )
    Preparing networks done!
    Train Epoch: [1][1/47],lr: 0.00005	Loss 5.3910 (5.3910)	Prec@1 0.781 (0.781)	Prec@5 3.125 (3.125)
    Train Epoch: [1][11/47],lr: 0.00005	Loss 5.2794 (5.3558)	Prec@1 1.562 (0.781)	Prec@5 7.031 (2.983)
    Train Epoch: [1][21/47],lr: 0.00005	Loss 5.2877 (5.3156)	Prec@1 0.781 (0.818)	Prec@5 4.688 (3.162)
    Train Epoch: [1][31/47],lr: 0.00005	Loss 5.1760 (5.2851)	Prec@1 0.781 (0.882)	Prec@5 7.031 (3.931)
    Train Epoch: [1][41/47],lr: 0.00005	Loss 5.1290 (5.2508)	Prec@1 6.250 (1.524)	Prec@5 12.500 (5.812)
    Train Epoch: [1][47/47],lr: 0.00005	Loss 5.1377 (5.2344)	Prec@1 3.774 (1.668)	Prec@5 11.321 (6.540)
    Val Epoch: [1][1/46]	Loss 4.9150 (4.9150)	
    Cls@1:0.125	Cls@5:0.320
    Loc@1:0.031	Loc@5:0.055	Loc_gt:0.258
    
    Val Epoch: [1][11/46]	Loss 4.7868 (5.0117)	
    Cls@1:0.065	Cls@5:0.232
    Loc@1:0.011	Loc@5:0.044	Loc_gt:0.214
    
    Val Epoch: [1][21/46]	Loss 5.0634 (5.0060)	
    Cls@1:0.066	Cls@5:0.232
    Loc@1:0.015	Loc@5:0.052	Loc_gt:0.217
    
    Val Epoch: [1][31/46]	Loss 5.1113 (5.0342)	
    Cls@1:0.061	Cls@5:0.206
    Loc@1:0.014	Loc@5:0.046	Loc_gt:0.198
    
    Val Epoch: [1][41/46]	Loss 5.0010 (5.0245)	
    Cls@1:0.059	Cls@5:0.204
    Loc@1:0.014	Loc@5:0.046	Loc_gt:0.192
    
    Val Epoch: [1][46/46]	Loss 4.8866 (5.0296)	
    Cls@1:0.059	Cls@5:0.200
    Loc@1:0.013	Loc@5:0.045	Loc_gt:0.189
    
    wrong_details:75 5454 0 6 254 5
    Best GT_LOC: 0.18916120124266483
    Best TOP1_LOC: 0.18916120124266483
    2022-03-25-01-49
    Train Epoch: [2][1/47],lr: 0.00005	Loss 5.0064 (5.0064)	Prec@1 6.250 (6.250)	Prec@5 17.969 (17.969)
    Train Epoch: [2][11/47],lr: 0.00005	Loss 4.9585 (4.9966)	Prec@1 6.250 (6.818)	Prec@5 21.875 (22.656)
    Train Epoch: [2][21/47],lr: 0.00005	Loss 4.9573 (4.9768)	Prec@1 8.594 (6.734)	Prec@5 28.906 (24.479)
    Train Epoch: [2][31/47],lr: 0.00005	Loss 4.9050 (4.9509)	Prec@1 10.938 (7.737)	Prec@5 28.125 (25.932)
    Train Epoch: [2][41/47],lr: 0.00005	Loss 4.8085 (4.9271)	Prec@1 14.844 (8.841)	Prec@5 37.500 (27.458)
    Train Epoch: [2][47/47],lr: 0.00005	Loss 4.8456 (4.9160)	Prec@1 8.491 (9.059)	Prec@5 31.132 (28.195)
    Val Epoch: [2][1/46]	Loss 4.5358 (4.5358)	
    Cls@1:0.258	Cls@5:0.523
    Loc@1:0.078	Loc@5:0.164	Loc_gt:0.344
    
    Val Epoch: [2][11/46]	Loss 4.3821 (4.7243)	
    Cls@1:0.164	Cls@5:0.431
    Loc@1:0.045	Loc@5:0.109	Loc_gt:0.240
    
    Val Epoch: [2][21/46]	Loss 4.8342 (4.6906)	
    Cls@1:0.173	Cls@5:0.453
    Loc@1:0.059	Loc@5:0.135	Loc_gt:0.251
    
    Val Epoch: [2][31/46]	Loss 4.9996 (4.7545)	
    Cls@1:0.153	Cls@5:0.403
    Loc@1:0.050	Loc@5:0.115	Loc_gt:0.225
    
    Val Epoch: [2][41/46]	Loss 4.8124 (4.7559)	
    Cls@1:0.138	Cls@5:0.385
    Loc@1:0.045	Loc@5:0.108	Loc_gt:0.217
    
    Val Epoch: [2][46/46]	Loss 4.8159 (4.7612)	
    Cls@1:0.142	Cls@5:0.391
    Loc@1:0.045	Loc@5:0.108	Loc_gt:0.213
    
    wrong_details:263 4971 0 21 536 3
    Best GT_LOC: 0.2126337590610977
    Best TOP1_LOC: 0.2126337590610977
    2022-03-25-01-54
    Train Epoch: [3][1/47],lr: 0.00005	Loss 4.7283 (4.7283)	Prec@1 21.094 (21.094)	Prec@5 46.875 (46.875)
    Train Epoch: [3][11/47],lr: 0.00005	Loss 4.7234 (4.7402)	Prec@1 11.719 (15.483)	Prec@5 45.312 (43.111)
    Train Epoch: [3][21/47],lr: 0.00005	Loss 4.6686 (4.7088)	Prec@1 15.625 (16.332)	Prec@5 45.312 (43.824)
    Train Epoch: [3][31/47],lr: 0.00005	Loss 4.6701 (4.6906)	Prec@1 20.312 (16.608)	Prec@5 46.875 (43.800)
    Train Epoch: [3][41/47],lr: 0.00005	Loss 4.5544 (4.6702)	Prec@1 26.562 (17.073)	Prec@5 50.000 (44.284)
    Train Epoch: [3][47/47],lr: 0.00005	Loss 4.5622 (4.6585)	Prec@1 26.415 (17.718)	Prec@5 49.057 (44.745)
    Val Epoch: [3][1/46]	Loss 4.1796 (4.1796)	
    Cls@1:0.336	Cls@5:0.711
    Loc@1:0.156	Loc@5:0.312	Loc_gt:0.438
    
    Val Epoch: [3][11/46]	Loss 4.0685 (4.4652)	
    Cls@1:0.263	Cls@5:0.551
    Loc@1:0.078	Loc@5:0.164	Loc_gt:0.273
    
    Val Epoch: [3][21/46]	Loss 4.6838 (4.4194)	
    Cls@1:0.264	Cls@5:0.570
    Loc@1:0.098	Loc@5:0.198	Loc_gt:0.294
    
    Val Epoch: [3][31/46]	Loss 4.8199 (4.5032)	
    Cls@1:0.232	Cls@5:0.515
    Loc@1:0.083	Loc@5:0.167	Loc_gt:0.260
    
    Val Epoch: [3][41/46]	Loss 4.6710 (4.5206)	
    Cls@1:0.209	Cls@5:0.494
    Loc@1:0.073	Loc@5:0.156	Loc_gt:0.250
    
    Val Epoch: [3][46/46]	Loss 4.4396 (4.5273)	
    Cls@1:0.213	Cls@5:0.501
    Loc@1:0.072	Loc@5:0.153	Loc_gt:0.243
    
    wrong_details:420 4557 0 45 757 15
    Best GT_LOC: 0.24318260269244046
    Best TOP1_LOC: 0.24318260269244046
    2022-03-25-01-59
    Train Epoch: [4][1/47],lr: 0.00005	Loss 4.4849 (4.4849)	Prec@1 21.875 (21.875)	Prec@5 58.594 (58.594)
    Train Epoch: [4][11/47],lr: 0.00005	Loss 4.5143 (4.4929)	Prec@1 28.125 (25.284)	Prec@5 47.656 (55.185)
    Train Epoch: [4][21/47],lr: 0.00005	Loss 4.3787 (4.4674)	Prec@1 22.656 (25.744)	Prec@5 56.250 (55.357)
    Train Epoch: [4][31/47],lr: 0.00005	Loss 4.3940 (4.4535)	Prec@1 31.250 (25.731)	Prec@5 52.344 (54.940)
    Train Epoch: [4][41/47],lr: 0.00005	Loss 4.3730 (4.4333)	Prec@1 21.875 (26.067)	Prec@5 59.375 (55.259)
    Train Epoch: [4][47/47],lr: 0.00005	Loss 4.3386 (4.4240)	Prec@1 28.302 (26.376)	Prec@5 64.151 (55.672)
    Val Epoch: [4][1/46]	Loss 3.8875 (3.8875)	
    Cls@1:0.383	Cls@5:0.758
    Loc@1:0.203	Loc@5:0.398	Loc_gt:0.484
    
    Val Epoch: [4][11/46]	Loss 3.8129 (4.2537)	
    Cls@1:0.312	Cls@5:0.580
    Loc@1:0.114	Loc@5:0.204	Loc_gt:0.298
    
    Val Epoch: [4][21/46]	Loss 4.5173 (4.1790)	
    Cls@1:0.326	Cls@5:0.619
    Loc@1:0.137	Loc@5:0.244	Loc_gt:0.330
    
    Val Epoch: [4][31/46]	Loss 4.6776 (4.2892)	
    Cls@1:0.285	Cls@5:0.564
    Loc@1:0.115	Loc@5:0.205	Loc_gt:0.290
    
    Val Epoch: [4][41/46]	Loss 4.4627 (4.3164)	
    Cls@1:0.263	Cls@5:0.547
    Loc@1:0.102	Loc@5:0.190	Loc_gt:0.277
    
    Val Epoch: [4][46/46]	Loss 4.2653 (4.3204)	
    Cls@1:0.270	Cls@5:0.557
    Loc@1:0.100	Loc@5:0.186	Loc_gt:0.269
    
    wrong_details:580 4232 0 75 889 18
    Best GT_LOC: 0.26855367621677595
    Best TOP1_LOC: 0.26855367621677595
    2022-03-25-02-01
    Train Epoch: [5][1/47],lr: 0.00005	Loss 4.3349 (4.3349)	Prec@1 32.812 (32.812)	Prec@5 57.031 (57.031)
    Train Epoch: [5][11/47],lr: 0.00005	Loss 4.2210 (4.2754)	Prec@1 31.250 (33.239)	Prec@5 62.500 (62.713)
    Train Epoch: [5][21/47],lr: 0.00005	Loss 4.2603 (4.2594)	Prec@1 31.250 (32.626)	Prec@5 57.812 (61.793)
    Train Epoch: [5][31/47],lr: 0.00005	Loss 4.2397 (4.2502)	Prec@1 29.688 (31.678)	Prec@5 62.500 (61.164)
    Train Epoch: [5][41/47],lr: 0.00005	Loss 4.2377 (4.2285)	Prec@1 26.562 (31.155)	Prec@5 60.156 (61.300)
    Train Epoch: [5][47/47],lr: 0.00005	Loss 4.1144 (4.2206)	Prec@1 34.906 (31.365)	Prec@5 59.434 (61.328)
    Val Epoch: [5][1/46]	Loss 3.6491 (3.6491)	
    Cls@1:0.398	Cls@5:0.789
    Loc@1:0.203	Loc@5:0.445	Loc_gt:0.516
    
    Val Epoch: [5][11/46]	Loss 3.5341 (4.0492)	
    Cls@1:0.343	Cls@5:0.620
    Loc@1:0.140	Loc@5:0.247	Loc_gt:0.333
    
    Val Epoch: [5][21/46]	Loss 4.3516 (3.9736)	
    Cls@1:0.353	Cls@5:0.653
    Loc@1:0.156	Loc@5:0.279	Loc_gt:0.361
    
    Val Epoch: [5][31/46]	Loss 4.5599 (4.1005)	
    Cls@1:0.306	Cls@5:0.605
    Loc@1:0.132	Loc@5:0.238	Loc_gt:0.318
    
    Val Epoch: [5][41/46]	Loss 4.3230 (4.1360)	
    Cls@1:0.286	Cls@5:0.593
    Loc@1:0.121	Loc@5:0.225	Loc_gt:0.306
    
    Val Epoch: [5][46/46]	Loss 4.1012 (4.1362)	
    Cls@1:0.295	Cls@5:0.602
    Loc@1:0.119	Loc@5:0.220	Loc_gt:0.297
    
    wrong_details:688 4082 0 88 912 24
    Best GT_LOC: 0.2965136347946151
    Best TOP1_LOC: 0.2965136347946151
    2022-03-25-02-02
    Train Epoch: [6][1/47],lr: 0.00005	Loss 4.1231 (4.1231)	Prec@1 30.469 (30.469)	Prec@5 66.406 (66.406)
    Train Epoch: [6][11/47],lr: 0.00005	Loss 4.0252 (4.0962)	Prec@1 37.500 (35.085)	Prec@5 70.312 (67.756)
    Train Epoch: [6][21/47],lr: 0.00005	Loss 3.9509 (4.0630)	Prec@1 40.625 (36.533)	Prec@5 67.969 (67.671)
    Train Epoch: [6][31/47],lr: 0.00005	Loss 3.8919 (4.0431)	Prec@1 45.312 (36.215)	Prec@5 64.844 (67.137)
    Train Epoch: [6][41/47],lr: 0.00005	Loss 3.9957 (4.0417)	Prec@1 40.625 (36.128)	Prec@5 70.312 (66.749)
    Train Epoch: [6][47/47],lr: 0.00005	Loss 3.9811 (4.0315)	Prec@1 41.509 (36.303)	Prec@5 64.151 (67.050)
    Val Epoch: [6][1/46]	Loss 3.4300 (3.4300)	
    Cls@1:0.438	Cls@5:0.781
    Loc@1:0.219	Loc@5:0.453	Loc_gt:0.531
    
    Val Epoch: [6][11/46]	Loss 3.3890 (3.8868)	
    Cls@1:0.357	Cls@5:0.643
    Loc@1:0.145	Loc@5:0.259	Loc_gt:0.343
    
    Val Epoch: [6][21/46]	Loss 4.1725 (3.7921)	
    Cls@1:0.377	Cls@5:0.680
    Loc@1:0.170	Loc@5:0.299	Loc_gt:0.379
    
    Val Epoch: [6][31/46]	Loss 4.4162 (3.9271)	
    Cls@1:0.331	Cls@5:0.634
    Loc@1:0.146	Loc@5:0.259	Loc_gt:0.336
    
    Val Epoch: [6][41/46]	Loss 4.2253 (3.9698)	
    Cls@1:0.313	Cls@5:0.623
    Loc@1:0.136	Loc@5:0.245	Loc_gt:0.325
    
    Val Epoch: [6][46/46]	Loss 3.9466 (3.9713)	
    Cls@1:0.321	Cls@5:0.632
    Loc@1:0.134	Loc@5:0.239	Loc_gt:0.314
    
    wrong_details:776 3935 0 118 940 25
    Best GT_LOC: 0.3139454608215395
    Best TOP1_LOC: 0.3139454608215395
    Preparing networks done!
    Train Epoch: [1][1/47],lr: 0.00005	Loss 5.3910 (5.3910)	Prec@1 0.781 (0.781)	Prec@5 3.125 (3.125)
    Train Epoch: [1][11/47],lr: 0.00005	Loss 5.2530 (5.3511)	Prec@1 1.562 (0.923)	Prec@5 3.906 (3.125)
    Train Epoch: [1][21/47],lr: 0.00005	Loss 5.2631 (5.3216)	Prec@1 0.781 (0.781)	Prec@5 3.906 (3.646)
    Train Epoch: [1][31/47],lr: 0.00005	Loss 5.1785 (5.2905)	Prec@1 0.781 (0.907)	Prec@5 3.125 (4.461)
    Train Epoch: [1][41/47],lr: 0.00005	Loss 5.1472 (5.2599)	Prec@1 1.562 (0.953)	Prec@5 7.031 (4.668)
    Train Epoch: [1][47/47],lr: 0.00005	Loss 5.1461 (5.2453)	Prec@1 1.887 (1.001)	Prec@5 12.264 (4.805)
    Val Epoch: [1][1/46]	Loss 4.8300 (4.8300)	
    Cls@1:0.000	Cls@5:0.094
    Loc@1:0.000	Loc@5:0.023	Loc_gt:0.312
    
    Val Epoch: [1][11/46]	Loss 4.7840 (5.0671)	
    Cls@1:0.010	Cls@5:0.077
    Loc@1:0.002	Loc@5:0.014	Loc_gt:0.224
    
    Val Epoch: [1][21/46]	Loss 5.3839 (5.0786)	
    Cls@1:0.010	Cls@5:0.070
    Loc@1:0.002	Loc@5:0.016	Loc_gt:0.218
    
    Val Epoch: [1][31/46]	Loss 5.3107 (5.1220)	
    Cls@1:0.010	Cls@5:0.061
    Loc@1:0.003	Loc@5:0.014	Loc_gt:0.199
    
    Val Epoch: [1][41/46]	Loss 4.7929 (5.0675)	
    Cls@1:0.016	Cls@5:0.069
    Loc@1:0.005	Loc@5:0.016	Loc_gt:0.195
    
    Val Epoch: [1][46/46]	Loss 5.0628 (5.0798)	
    Cls@1:0.016	Cls@5:0.066
    Loc@1:0.005	Loc@5:0.015	Loc_gt:0.192
    
    wrong_details:27 5704 0 3 58 2
    Best GT_LOC: 0.1924404556437694
    Best TOP1_LOC: 0.1924404556437694
    2022-03-25-01-49
    Train Epoch: [2][1/47],lr: 0.00005	Loss 5.0344 (5.0344)	Prec@1 2.344 (2.344)	Prec@5 7.812 (7.812)
    Train Epoch: [2][11/47],lr: 0.00005	Loss 4.9748 (5.0317)	Prec@1 0.781 (1.634)	Prec@5 9.375 (7.812)
    Train Epoch: [2][21/47],lr: 0.00005	Loss 4.8753 (4.9973)	Prec@1 3.906 (2.083)	Prec@5 10.938 (8.557)
    Train Epoch: [2][31/47],lr: 0.00005	Loss 4.8447 (4.9587)	Prec@1 1.562 (2.419)	Prec@5 9.375 (9.199)
    Train Epoch: [2][41/47],lr: 0.00005	Loss 4.9252 (4.9204)	Prec@1 0.781 (2.591)	Prec@5 10.938 (10.061)
    Train Epoch: [2][47/47],lr: 0.00005	Loss 4.7829 (4.8979)	Prec@1 4.717 (2.669)	Prec@5 16.981 (10.777)
    Val Epoch: [2][1/46]	Loss 4.8567 (4.8567)	
    Cls@1:0.008	Cls@5:0.109
    Loc@1:0.008	Loc@5:0.078	Loc_gt:0.352
    
    Val Epoch: [2][11/46]	Loss 4.4700 (4.4981)	
    Cls@1:0.062	Cls@5:0.232
    Loc@1:0.018	Loc@5:0.061	Loc_gt:0.241
    
    Val Epoch: [2][21/46]	Loss 4.9166 (4.5804)	
    Cls@1:0.057	Cls@5:0.199
    Loc@1:0.015	Loc@5:0.049	Loc_gt:0.231
    
    Val Epoch: [2][31/46]	Loss 4.8543 (4.6311)	
    Cls@1:0.055	Cls@5:0.184
    Loc@1:0.013	Loc@5:0.044	Loc_gt:0.230
    
    Val Epoch: [2][41/46]	Loss 4.4745 (4.5779)	
    Cls@1:0.052	Cls@5:0.182
    Loc@1:0.013	Loc@5:0.046	Loc_gt:0.217
    
    Val Epoch: [2][46/46]	Loss 3.8181 (4.5981)	
    Cls@1:0.051	Cls@5:0.177
    Loc@1:0.012	Loc@5:0.044	Loc_gt:0.215
    
    wrong_details:71 5500 0 116 98 9
    Best GT_LOC: 0.21453227476700035
    Best TOP1_LOC: 0.21453227476700035
    2022-03-25-01-54
    Train Epoch: [3][1/47],lr: 0.00005	Loss 4.4202 (4.4202)	Prec@1 7.031 (7.031)	Prec@5 21.875 (21.875)
    Train Epoch: [3][11/47],lr: 0.00005	Loss 4.2909 (4.5166)	Prec@1 8.594 (5.114)	Prec@5 26.562 (19.389)
    Train Epoch: [3][21/47],lr: 0.00005	Loss 4.5282 (4.5042)	Prec@1 1.562 (5.320)	Prec@5 11.719 (19.085)
    Train Epoch: [3][31/47],lr: 0.00005	Loss 4.3235 (4.4662)	Prec@1 9.375 (5.872)	Prec@5 21.875 (19.481)
    Train Epoch: [3][41/47],lr: 0.00005	Loss 4.2087 (4.4195)	Prec@1 8.594 (6.250)	Prec@5 31.250 (21.418)
    Train Epoch: [3][47/47],lr: 0.00005	Loss 4.1792 (4.3886)	Prec@1 7.547 (6.390)	Prec@5 22.642 (21.955)
    Val Epoch: [3][1/46]	Loss 3.8698 (3.8698)	
    Cls@1:0.172	Cls@5:0.289
    Loc@1:0.055	Loc@5:0.102	Loc_gt:0.375
    
    Val Epoch: [3][11/46]	Loss 4.1646 (3.9793)	
    Cls@1:0.118	Cls@5:0.327
    Loc@1:0.029	Loc@5:0.101	Loc_gt:0.257
    
    Val Epoch: [3][21/46]	Loss 4.8005 (4.0862)	
    Cls@1:0.106	Cls@5:0.299
    Loc@1:0.022	Loc@5:0.076	Loc_gt:0.228
    
    Val Epoch: [3][31/46]	Loss 4.6957 (4.3865)	
    Cls@1:0.085	Cls@5:0.241
    Loc@1:0.017	Loc@5:0.061	Loc_gt:0.225
    
    Val Epoch: [3][41/46]	Loss 4.3491 (4.2824)	
    Cls@1:0.082	Cls@5:0.253
    Loc@1:0.017	Loc@5:0.062	Loc_gt:0.210
    
    Val Epoch: [3][46/46]	Loss 3.9240 (4.3113)	
    Cls@1:0.079	Cls@5:0.246
    Loc@1:0.016	Loc@5:0.057	Loc_gt:0.203
    
    wrong_details:90 5334 0 314 42 14
    Best GT_LOC: 0.21453227476700035
    Best TOP1_LOC: 0.21453227476700035
    2022-03-25-02-00
    Train Epoch: [4][1/47],lr: 0.00005	Loss 3.9914 (3.9914)	Prec@1 12.500 (12.500)	Prec@5 34.375 (34.375)
    Train Epoch: [4][11/47],lr: 0.00005	Loss 4.1547 (4.0524)	Prec@1 10.156 (11.364)	Prec@5 29.688 (30.753)
    Train Epoch: [4][21/47],lr: 0.00005	Loss 4.3899 (4.0732)	Prec@1 7.031 (10.528)	Prec@5 19.531 (30.432)
    Train Epoch: [4][31/47],lr: 0.00005	Loss 3.7553 (4.0368)	Prec@1 11.719 (10.459)	Prec@5 38.281 (31.401)
    Train Epoch: [4][41/47],lr: 0.00005	Loss 3.9345 (4.0095)	Prec@1 8.594 (10.690)	Prec@5 30.469 (31.879)
    Train Epoch: [4][47/47],lr: 0.00005	Loss 3.7705 (3.9801)	Prec@1 16.981 (11.195)	Prec@5 38.679 (32.733)
    Val Epoch: [4][1/46]	Loss 4.2226 (4.2226)	
    Cls@1:0.141	Cls@5:0.344
    Loc@1:0.086	Loc@5:0.195	Loc_gt:0.461
    
    Val Epoch: [4][11/46]	Loss 3.6754 (3.8854)	
    Cls@1:0.148	Cls@5:0.378
    Loc@1:0.048	Loc@5:0.126	Loc_gt:0.303
    
    Val Epoch: [4][21/46]	Loss 4.4785 (3.9334)	
    Cls@1:0.134	Cls@5:0.353
    Loc@1:0.036	Loc@5:0.102	Loc_gt:0.283
    
    Val Epoch: [4][31/46]	Loss 4.5506 (3.9342)	
    Cls@1:0.123	Cls@5:0.345
    Loc@1:0.035	Loc@5:0.108	Loc_gt:0.294
    
    Val Epoch: [4][41/46]	Loss 3.8479 (3.8611)	
    Cls@1:0.121	Cls@5:0.354
    Loc@1:0.030	Loc@5:0.100	Loc_gt:0.267
    
    Val Epoch: [4][46/46]	Loss 3.7787 (3.8731)	
    Cls@1:0.118	Cls@5:0.345
    Loc@1:0.028	Loc@5:0.094	Loc_gt:0.265
    
    wrong_details:164 5109 0 465 41 15
    Best GT_LOC: 0.26458405246807043
    Best TOP1_LOC: 0.26458405246807043
    2022-03-25-02-02
    Train Epoch: [5][1/47],lr: 0.00005	Loss 3.6695 (3.6695)	Prec@1 11.719 (11.719)	Prec@5 38.281 (38.281)
    Train Epoch: [5][11/47],lr: 0.00005	Loss 3.5674 (3.5657)	Prec@1 19.531 (16.548)	Prec@5 40.625 (42.827)
    Train Epoch: [5][21/47],lr: 0.00005	Loss 3.5742 (3.5837)	Prec@1 13.281 (16.481)	Prec@5 40.625 (43.043)
    Train Epoch: [5][31/47],lr: 0.00005	Loss 3.5905 (3.5552)	Prec@1 19.531 (17.339)	Prec@5 44.531 (43.901)
    Train Epoch: [5][41/47],lr: 0.00005	Loss 3.4860 (3.5470)	Prec@1 22.656 (17.492)	Prec@5 46.875 (44.284)
    Train Epoch: [5][47/47],lr: 0.00005	Loss 3.6738 (3.5731)	Prec@1 13.208 (17.017)	Prec@5 38.679 (43.694)
    Val Epoch: [5][1/46]	Loss 3.6581 (3.6581)	
    Cls@1:0.133	Cls@5:0.398
    Loc@1:0.039	Loc@5:0.164	Loc_gt:0.328
    
    Val Epoch: [5][11/46]	Loss 3.4856 (3.7789)	
    Cls@1:0.167	Cls@5:0.392
    Loc@1:0.037	Loc@5:0.097	Loc_gt:0.261
    
    Val Epoch: [5][21/46]	Loss 3.7290 (3.6464)	
    Cls@1:0.188	Cls@5:0.424
    Loc@1:0.036	Loc@5:0.093	Loc_gt:0.240
    
    Val Epoch: [5][31/46]	Loss 4.2331 (3.7345)	
    Cls@1:0.165	Cls@5:0.397
    Loc@1:0.036	Loc@5:0.094	Loc_gt:0.242
    
    Val Epoch: [5][41/46]	Loss 4.0443 (3.7508)	
    Cls@1:0.156	Cls@5:0.393
    Loc@1:0.034	Loc@5:0.089	Loc_gt:0.227
    
    Val Epoch: [5][46/46]	Loss 3.5688 (3.7679)	
    Cls@1:0.150	Cls@5:0.389
    Loc@1:0.032	Loc@5:0.086	Loc_gt:0.226
    
    wrong_details:185 4926 0 622 34 27
    Best GT_LOC: 0.26458405246807043
    Best TOP1_LOC: 0.26458405246807043
    
    

    Looking for your help!

    opened by Unrealluver 1
  • How can I train my custom dataset with a no-object class?

    How can I train my custom dataset with a no-object class?

    I really appreciate your work. I am currently trying to train TS-CAM on my own dataset. First, I converted my dataset into the CUB dataset format and set the bounding box of no-object images to (0, 0, 0, 0), but I failed to train the model. Can you recommend a way to train the model on a dataset that contains no-object images?

    opened by kimwin2 1
  • How can I get performance on ImageNet-1K like in your paper?

    How can I get performance on ImageNet-1K like in your paper?

    Hello. First of all, thank you for your code and paper. Your paper caught my attention! But I have a problem: with the default training settings, the performance is not as good as yours. If possible, could you share your settings, as in https://github.com/vasgaowei/TS-CAM/issues/7#issuecomment-927473246?

    I'm hoping for your answer :+1: :+1:

    I am looking forward to your great papers. Thank you :)

    This is my hyper-parameter setting.

    {'BASIC': {'BACKUP_CODES': True, 'BACKUP_LIST': ['lib', 'tools_cam', 'configs'], 'DISP_FREQ': 10, 'GPU_ID': [0], 'NUM_WORKERS': 8, 'ROOT_DIR': './tools_cam/..', 'SAVE_DIR': 'ckpt/ImageNet/deit_tscam_small_patch16_224_CAM-NORMAL_SEED26_CAM-THR0.12_BS256_2022-02-02-14-27', 'SEED': 26, 'TIME': '2022-02-02-14-27'}, 'CUDNN': {'BENCHMARK': False, 'DETERMINISTIC': True, 'ENABLE': True}, 'DATA': {'CROP_SIZE': 224, 'DATADIR': 'data/ImageNet_ILSVRC2012', 'DATASET': 'ImageNet', 'IMAGE_MEAN': [0.485, 0.456, 0.406], 'IMAGE_STD': [0.229, 0.224, 0.225], 'NUM_CLASSES': 1000, 'RESIZE_SIZE': 512, 'SCALE_LENGTH': 15, 'SCALE_SIZE': 196}, 'MODEL': {'ARCH': 'deit_tscam_small_patch16_224', 'CAM_THR': 0.12, 'LOCALIZER_DIR': '', 'TOP_K': 1}, 'SOLVER': {'LR_FACTOR': 0.1, 'LR_STEPS': [10, 12], 'MUMENTUM': 0.9, 'NUM_EPOCHS': 20, 'START_LR': 0.004, 'WEIGHT_DECAY': 0.0005}, 'TEST': {'BATCH_SIZE': 512, 'CKPT_DIR': '', 'SAVE_BOXED_IMAGE': False, 'SAVE_CAMS': False, 'TEN_CROPS': False}, 'TRAIN': {'ALPHA': 1.0, 'BATCH_SIZE': 256, 'BETA': 1.0}} ==> Preparing data... done! ==> Preparing networks for baseline... Removing key head.weight from pretrained checkpoint TSCAM(

    And the result is like this:

    Val Epoch: [12][98/98] Loss 1.4334 (1.4654) Cls@1:0.657 Cls@5:0.858 Loc@1:0.451 Loc@5:0.564 Loc_gt:0.609

    opened by SejinPark99 1