Code for TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization.

Overview

TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

This is the official implementation of the paper TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization.

This repository contains PyTorch training code, evaluation code, pretrained models, and Jupyter notebooks for visualization.

Illustration

Built on DeiT, TS-CAM couples the attention maps of a vision transformer with semantic-aware maps to obtain accurate localization maps (Token Semantic Coupled Attention Maps, ts-cams).

(Figure: ts-cam)
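
As a rough sketch of the idea (not the repository code; the tensor names and shapes below are illustrative assumptions), the coupling can be written as an element-wise product between the class-agnostic attention map collected from the transformer and the semantic-aware maps predicted from the patch tokens:

import torch

def couple_maps(attn_map, semantic_maps):
    # attn_map:      (B, H*W)    class-token-to-patch attention, e.g. aggregated over layers
    # semantic_maps: (B, C, H*W) semantic-aware maps predicted from the patch tokens (C classes)
    # returns:       (B, C, H*W) coupled localization maps (ts-cams)
    ts_cam = attn_map.unsqueeze(1) * semantic_maps
    # Normalize each map to [0, 1] for thresholding/visualization.
    ts_cam = ts_cam - ts_cam.min(dim=-1, keepdim=True).values
    ts_cam = ts_cam / (ts_cam.max(dim=-1, keepdim=True).values + 1e-8)
    return ts_cam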

Model Zoo

We provide TS-CAM models trained on the CUB-200-2011 and ILSVRC2012 datasets.

Dataset      | Loc.Acc@1 | Loc.Acc@5 | Loc.Gt-Known | Cls.Acc@1 | Cls.Acc@5 | Baidu Drive | Google Drive
CUB-200-2011 | 71.3      | 83.8      | 87.7         | 80.3      | 94.8      | model       | model
ILSVRC2012   | 53.4      | 64.3      | 67.6         | 74.3      | 92.1      | model       | model

Note: the extraction code for the Baidu Drive links is as follows:

Usage

First clone the repository locally:

git clone https://github.com/vasgaowei/TS-CAM.git

Then install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2:


conda create -n pytorch1.7 python=3.6
conda activate pytorch1.7
conda install anaconda
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=10.2 -c pytorch
pip install timm==0.3.2
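
To sanity-check the environment (optional), you can verify the installed versions from Python:

import torch
import torchvision
import timm

print("torch:", torch.__version__)              # expected 1.7.x
print("torchvision:", torchvision.__version__)  # expected 0.8.x
print("timm:", timm.__version__)                # expected 0.3.2
print("CUDA available:", torch.cuda.is_available())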

Data preparation

CUB-200-2011 dataset

Please download and extract the CUB-200-2011 dataset.

The directory structure is the following:

TS-CAM/
  data/
    CUB-200-2011/
      attributes/
      images/
      parts/
      bounding_boxes.txt
      classes.txt
      image_class_labels.txt
      images.txt
      image_sizes.txt
      README
      train_test_split.txt
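
Optionally, the extracted metadata can be spot-checked with a short script like the one below (a sketch based on the standard CUB-200-2011 annotation format; adjust root if your folder name differs):

import os

root = "data/CUB-200-2011"

# images.txt lines: "<image_id> <relative_path>"; bounding_boxes.txt lines: "<image_id> <x> <y> <w> <h>"
with open(os.path.join(root, "images.txt")) as f:
    images = dict(line.strip().split(" ", 1) for line in f)
with open(os.path.join(root, "bounding_boxes.txt")) as f:
    boxes = {line.split()[0]: [float(v) for v in line.split()[1:]] for line in f}

print(len(images), "images,", len(boxes), "bounding boxes")
missing = [p for p in images.values() if not os.path.isfile(os.path.join(root, "images", p))]
print(len(missing), "image files missing")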

ImageNet1k

Download the ILSVRC2012 dataset and extract the train and val images.

The directory structure is organized as follows:

TS-CAM/
  data/
    ImageNet_ILSVRC2012/
      train/
        n01440764/
          n01440764_18.JPEG
          ...
        n01514859/
          n01514859_1.JPEG
          ...
      val/
        n01440764/
          ILSVRC2012_val_00000293.JPEG
          ...
        n01531178/
          ILSVRC2012_val_00000570.JPEG
          ...
      ILSVRC2012_list/
        train.txt
        val_folder.txt
        val_folder_new.txt
The training and validation data are expected to be in the train/ and val/ folders, respectively.
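
As with CUB, a quick structural check (an optional sketch, assuming the layout above) can catch common mistakes such as val images that have not been sorted into per-class folders:

import os

root = "data/ImageNet_ILSVRC2012"

for split in ("train", "val"):
    split_dir = os.path.join(root, split)
    classes = [c for c in os.listdir(split_dir) if os.path.isdir(os.path.join(split_dir, c))]
    n_images = sum(len(os.listdir(os.path.join(split_dir, c))) for c in classes)
    print(split, ":", len(classes), "class folders,", n_images, "images")  # expect 1000 class folders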

For training:

On CUB-200-2011 dataset:

bash train_val_cub.sh ${GPU_ID} ${NET}

On ImageNet1k dataset:

bash train_val_ilsvrc.sh ${GPU_ID} ${NET}

Please note that the ImageNet-1k pretrained weights of DeiT-tiny, DeiT-small, and DeiT-base will be downloaded the first time you train a model, so an Internet connection is required.

For evaluation:

On CUB-200-2011 dataset:

bash val_cub.sh ${GPU_ID} ${NET} ${MODEL_PATH}

On ImageNet1k dataset:

bash val_ilsvrc.sh ${GPU_ID} ${NET} ${MODEL_PATH}

GPU_ID should be specified; multiple GPUs can be used to accelerate training and evaluation.

NET should be chosen from tiny, small, and base.

MODEL_PATH is the path to the pretrained model.

Visualization

We provide Jupyter notebooks in the tools_cam folder.

TS-CAM/
  tools_cam/
    visualization_attention_map_cub.ipynb
    visualization_attention_map_imaget.ipynb

Please download the pretrained TS-CAM model weights and explore more visualization results (attention maps produced by our method and by the Attention Rollout method). You can also try other interesting images to visualize their localization maps (ts-cams).
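
For reference, attention rollout (the comparison method mentioned above) can be sketched as follows; this is a generic implementation of the rollout idea, not the notebook code, and the input shapes are assumptions:

import torch

def attention_rollout(attn_per_layer, add_residual=True):
    # attn_per_layer: list of (B, num_heads, N, N) attention matrices, one per transformer block
    # returns:        (B, N, N) rollout matrix; row 0 (class token) gives attention over the patch tokens
    device = attn_per_layer[0].device
    B, _, N, _ = attn_per_layer[0].shape
    eye = torch.eye(N, device=device)
    rollout = eye.repeat(B, 1, 1)
    for attn in attn_per_layer:
        a = attn.mean(dim=1)                     # fuse heads by averaging -> (B, N, N)
        if add_residual:
            a = 0.5 * a + 0.5 * eye              # account for the skip connection
            a = a / a.sum(dim=-1, keepdim=True)  # keep rows normalized
        rollout = torch.bmm(a, rollout)          # compose attention across layers
    return rollout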

Visualize localization results

We provide some visualization results as follows.

(Figure: localization results)

Visualize attention maps

We can also visualize attention maps from different transformer layers.

(Figures: attention maps on CUB-200-2011 and ILSVRC2012)

Contacts

If you have any questions about our work or this repository, please don't hesitate to contact us by email.

You can also open an issue under this project.

Citation

If you use this code in a paper, please cite:

@article{Gao2021TSCAMTS,
  title={TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization},
  author={Wei Gao and Fang Wan and Xingjia Pan and Zhiliang Peng and Qi Tian and Zhenjun Han and Bolei Zhou and Qixiang Ye},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.14862}
}
Comments
  • Have the authors tried using attention rollout?

    Have the authors tried using attention rollout?

    Thanks to the authors for releasing the code of their work! I noticed that the visualization part uses the attention rollout method and the results look good. Have the authors tried computing the final localization results from the attention-rollout maps, either on top of the plain transformer or on top of TS-CAM? Does this bring an improvement, or is it used only as a visualization reference because it does not improve accuracy?

    opened by ustcjinggg 2
  • TransAttention performance

    TransAttention performance

    Thanks for the great work.

    I have a question about how to reproduce TransAttention performance on CUB-200 (Table 5). I got much higher performance by changing https://github.com/vasgaowei/TS-CAM/blob/aeb823ee097ce0c5594b8cb10d14c0aa03652df3/lib/models/deit.py#L62 to cams = cams.repeat((1, 200, 1, 1)):

    Cls@1:0.803 Cls@5:0.948 Loc@1:0.690 Loc@5:0.816 Loc_gt:0.859 wrong_details:3998 1139 0 556 96 5

    And I got Loc@1:0.154 Loc@5:0.177 Loc_gt:0.183 for TransCAM; it seems there is a mistake in the table. Personally, I feel it is unfair to compare with TransCAM without tuning CAM_THR to its optimum; I can get Loc@1:0.333 Loc@5:0.379 Loc_gt:0.387 by setting CAM_THR to around 0.8. I wonder what your thoughts are here.

    opened by liruiwen 1
  • Paper Question about Table 1

    Paper Question about Table 1

    Dear authors,

    In Table 1, the compared methods and the proposed method use different backbones. Could you explain whether this comparison is fair?

    Thank you.

    opened by AmingWu 0
  • "creat_model" function call adjusted

    "creat_model" is returning three values but at function call it's assigning to four tuples. So, adjusting the assignment to three values (tuple)

    opened by shakeebmurtaza 0
  • The function of joint_attns_skip

    The function of joint_attns_skip

    Hi, thanks for your wonderful work! However, I don't understand the role of joint_attns_skip in line 471 of conformer.py. Is the mean tensor of joint_attns_skip an optional CAM?

    opened by xujianglan 1
  • Question about visualization

    Question about visualization

    Hello. First of all, thank you for your code and paper; I really appreciate your work. We ran into a problem when visualizing results on our own dataset.

    An error occurred when I tried to call the model's forward function; a screenshot of the error is attached.

    I also noticed that when running the code, our model is reported as VisionTransformer, while the output shown on GitHub names the model TSCAM. Can you tell me what makes them different?

    I'm hoping for your answer. Thank you!

    opened by Grand-ou 1
  • Question about the training process

    Question about the training process

    Greetings! Thanks for your excellent work! When running your code, I ran into a problem: the performance is poor. My running command is

     bash train_val_cub.sh 3 deit small 224
    

    and I got the log like:

    {'BASIC': {'BACKUP_CODES': True,
               'BACKUP_LIST': ['lib', 'tools_cam', 'configs'],
               'DISP_FREQ': 10,
               'GPU_ID': [0],
               'NUM_WORKERS': 40,
               'ROOT_DIR': './tools_cam/..',
               'SAVE_DIR': 'ckpt/CUB/deit_tscam_small_patch16_224_CAM-NORMAL_SEED26_CAM-THR0.1_BS128_2022-03-25-01-46',
               'SEED': 26,
               'TIME': '2022-03-25-01-46'},
     'CUDNN': {'BENCHMARK': False, 'DETERMINISTIC': True, 'ENABLE': True},
     'DATA': {'CROP_SIZE': 224,
              'DATADIR': 'data/CUB_200_2011',
              'DATASET': 'CUB',
              'IMAGE_MEAN': [0.485, 0.456, 0.406],
              'IMAGE_STD': [0.229, 0.224, 0.225],
              'NUM_CLASSES': 200,
              'RESIZE_SIZE': 256,
              'SCALE_LENGTH': 15,
              'SCALE_SIZE': 196,
              'TRAIN_AUG_PATH': '',
              'VAL_PATH': ''},
     'MODEL': {'ARCH': 'deit_tscam_small_patch16_224',
               'CAM_THR': 0.1,
               'LOCALIZER_DIR': '',
               'TOP_K': 1},
     'SOLVER': {'LR_FACTOR': 0.1,
                'LR_STEPS': [30],
                'MUMENTUM': 0.9,
                'NUM_EPOCHS': 60,
                'START_LR': 0.001,
                'WEIGHT_DECAY': 0.0005},
     'TEST': {'BATCH_SIZE': 128,
              'CKPT_DIR': '',
              'SAVE_BOXED_IMAGE': False,
              'SAVE_CAMS': False,
              'TEN_CROPS': False},
     'TRAIN': {'ALPHA': 1.0,
               'BATCH_SIZE': 128,
               'BETA': 1.0,
               'IF_FIX_WEIGHT': False}}
    ==> Preparing data...
    done!
    ==> Preparing networks for baseline...
    Removing key head.weight from pretrained checkpoint
    Removing key head.bias from pretrained checkpoint
    TSCAM(
      (patch_embed): PatchEmbed(
        (proj): Conv2d(3, 384, kernel_size=(16, 16), stride=(16, 16))
      )
      (pos_drop): Dropout(p=0.0, inplace=False)
      (blocks): ModuleList(
        (0): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): Identity()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (3): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (4): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (5): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (6): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (7): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (8): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (9): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (10): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (11): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
      (head): Conv2d(384, 200, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (avgpool): AdaptiveAvgPool2d(output_size=1)
    )
    {'BASIC': {'BACKUP_CODES': True,
               'BACKUP_LIST': ['lib', 'tools_cam', 'configs'],
               'DISP_FREQ': 10,
               'GPU_ID': [0],
               'NUM_WORKERS': 40,
               'ROOT_DIR': './tools_cam/..',
               'SAVE_DIR': 'ckpt/CUB/deit_tscam_small_patch16_224_CAM-NORMAL_SEED26_CAM-THR0.1_BS128_2022-03-25-01-46',
               'SEED': 26,
               'TIME': '2022-03-25-01-46'},
     'CUDNN': {'BENCHMARK': False, 'DETERMINISTIC': True, 'ENABLE': True},
     'DATA': {'CROP_SIZE': 224,
              'DATADIR': 'data/CUB_200_2011',
              'DATASET': 'CUB',
              'IMAGE_MEAN': [0.485, 0.456, 0.406],
              'IMAGE_STD': [0.229, 0.224, 0.225],
              'NUM_CLASSES': 200,
              'RESIZE_SIZE': 256,
              'SCALE_LENGTH': 15,
              'SCALE_SIZE': 196,
              'TRAIN_AUG_PATH': '',
              'VAL_PATH': ''},
     'MODEL': {'ARCH': 'deit_tscam_small_patch16_224',
               'CAM_THR': 0.1,
               'LOCALIZER_DIR': '',
               'TOP_K': 1},
     'SOLVER': {'LR_FACTOR': 0.1,
                'LR_STEPS': [30],
                'MUMENTUM': 0.9,
                'NUM_EPOCHS': 60,
                'START_LR': 0.001,
                'WEIGHT_DECAY': 0.0005},
     'TEST': {'BATCH_SIZE': 128,
              'CKPT_DIR': '',
              'SAVE_BOXED_IMAGE': False,
              'SAVE_CAMS': False,
              'TEN_CROPS': False},
     'TRAIN': {'ALPHA': 1.0, 'BATCH_SIZE': 128, 'BETA': 1.0, 'IF_FIX_WEIGHT': True}}
    ==> Preparing data...
    done!
    ==> Preparing networks for baseline...
    Removing key head.weight from pretrained checkpoint
    Removing key head.bias from pretrained checkpoint
    TSCAM(
      (patch_embed): PatchEmbed(
        (proj): Conv2d(3, 384, kernel_size=(16, 16), stride=(16, 16))
      )
      (pos_drop): Dropout(p=0.0, inplace=False)
      (blocks): ModuleList(
        (0): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): Identity()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (1): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (2): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (3): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (4): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (5): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (6): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (7): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (8): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (9): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (10): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
        (11): Block(
          (norm1): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (attn): Attention(
            (qkv): Linear(in_features=384, out_features=1152, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=384, out_features=384, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
          )
          (drop_path): DropPath()
          (norm2): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=384, out_features=1536, bias=True)
            (act): GELU()
            (fc2): Linear(in_features=1536, out_features=384, bias=True)
            (drop): Dropout(p=0.0, inplace=False)
          )
        )
      )
      (norm): LayerNorm((384,), eps=1e-06, elementwise_affine=True)
      (head): Conv2d(384, 200, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (avgpool): AdaptiveAvgPool2d(output_size=1)
    )
    Preparing networks done!
    Train Epoch: [1][1/47],lr: 0.00005	Loss 5.3910 (5.3910)	Prec@1 0.781 (0.781)	Prec@5 3.125 (3.125)
    Train Epoch: [1][11/47],lr: 0.00005	Loss 5.2794 (5.3558)	Prec@1 1.562 (0.781)	Prec@5 7.031 (2.983)
    Train Epoch: [1][21/47],lr: 0.00005	Loss 5.2877 (5.3156)	Prec@1 0.781 (0.818)	Prec@5 4.688 (3.162)
    Train Epoch: [1][31/47],lr: 0.00005	Loss 5.1760 (5.2851)	Prec@1 0.781 (0.882)	Prec@5 7.031 (3.931)
    Train Epoch: [1][41/47],lr: 0.00005	Loss 5.1290 (5.2508)	Prec@1 6.250 (1.524)	Prec@5 12.500 (5.812)
    Train Epoch: [1][47/47],lr: 0.00005	Loss 5.1377 (5.2344)	Prec@1 3.774 (1.668)	Prec@5 11.321 (6.540)
    Val Epoch: [1][1/46]	Loss 4.9150 (4.9150)	
    Cls@1:0.125	Cls@5:0.320
    Loc@1:0.031	Loc@5:0.055	Loc_gt:0.258
    
    Val Epoch: [1][11/46]	Loss 4.7868 (5.0117)	
    Cls@1:0.065	Cls@5:0.232
    Loc@1:0.011	Loc@5:0.044	Loc_gt:0.214
    
    Val Epoch: [1][21/46]	Loss 5.0634 (5.0060)	
    Cls@1:0.066	Cls@5:0.232
    Loc@1:0.015	Loc@5:0.052	Loc_gt:0.217
    
    Val Epoch: [1][31/46]	Loss 5.1113 (5.0342)	
    Cls@1:0.061	Cls@5:0.206
    Loc@1:0.014	Loc@5:0.046	Loc_gt:0.198
    
    Val Epoch: [1][41/46]	Loss 5.0010 (5.0245)	
    Cls@1:0.059	Cls@5:0.204
    Loc@1:0.014	Loc@5:0.046	Loc_gt:0.192
    
    Val Epoch: [1][46/46]	Loss 4.8866 (5.0296)	
    Cls@1:0.059	Cls@5:0.200
    Loc@1:0.013	Loc@5:0.045	Loc_gt:0.189
    
    wrong_details:75 5454 0 6 254 5
    Best GT_LOC: 0.18916120124266483
    Best TOP1_LOC: 0.18916120124266483
    2022-03-25-01-49
    Train Epoch: [2][1/47],lr: 0.00005	Loss 5.0064 (5.0064)	Prec@1 6.250 (6.250)	Prec@5 17.969 (17.969)
    Train Epoch: [2][11/47],lr: 0.00005	Loss 4.9585 (4.9966)	Prec@1 6.250 (6.818)	Prec@5 21.875 (22.656)
    Train Epoch: [2][21/47],lr: 0.00005	Loss 4.9573 (4.9768)	Prec@1 8.594 (6.734)	Prec@5 28.906 (24.479)
    Train Epoch: [2][31/47],lr: 0.00005	Loss 4.9050 (4.9509)	Prec@1 10.938 (7.737)	Prec@5 28.125 (25.932)
    Train Epoch: [2][41/47],lr: 0.00005	Loss 4.8085 (4.9271)	Prec@1 14.844 (8.841)	Prec@5 37.500 (27.458)
    Train Epoch: [2][47/47],lr: 0.00005	Loss 4.8456 (4.9160)	Prec@1 8.491 (9.059)	Prec@5 31.132 (28.195)
    Val Epoch: [2][1/46]	Loss 4.5358 (4.5358)	
    Cls@1:0.258	Cls@5:0.523
    Loc@1:0.078	Loc@5:0.164	Loc_gt:0.344
    
    Val Epoch: [2][11/46]	Loss 4.3821 (4.7243)	
    Cls@1:0.164	Cls@5:0.431
    Loc@1:0.045	Loc@5:0.109	Loc_gt:0.240
    
    Val Epoch: [2][21/46]	Loss 4.8342 (4.6906)	
    Cls@1:0.173	Cls@5:0.453
    Loc@1:0.059	Loc@5:0.135	Loc_gt:0.251
    
    Val Epoch: [2][31/46]	Loss 4.9996 (4.7545)	
    Cls@1:0.153	Cls@5:0.403
    Loc@1:0.050	Loc@5:0.115	Loc_gt:0.225
    
    Val Epoch: [2][41/46]	Loss 4.8124 (4.7559)	
    Cls@1:0.138	Cls@5:0.385
    Loc@1:0.045	Loc@5:0.108	Loc_gt:0.217
    
    Val Epoch: [2][46/46]	Loss 4.8159 (4.7612)	
    Cls@1:0.142	Cls@5:0.391
    Loc@1:0.045	Loc@5:0.108	Loc_gt:0.213
    
    wrong_details:263 4971 0 21 536 3
    Best GT_LOC: 0.2126337590610977
    Best TOP1_LOC: 0.2126337590610977
    2022-03-25-01-54
    Train Epoch: [3][1/47],lr: 0.00005	Loss 4.7283 (4.7283)	Prec@1 21.094 (21.094)	Prec@5 46.875 (46.875)
    Train Epoch: [3][11/47],lr: 0.00005	Loss 4.7234 (4.7402)	Prec@1 11.719 (15.483)	Prec@5 45.312 (43.111)
    Train Epoch: [3][21/47],lr: 0.00005	Loss 4.6686 (4.7088)	Prec@1 15.625 (16.332)	Prec@5 45.312 (43.824)
    Train Epoch: [3][31/47],lr: 0.00005	Loss 4.6701 (4.6906)	Prec@1 20.312 (16.608)	Prec@5 46.875 (43.800)
    Train Epoch: [3][41/47],lr: 0.00005	Loss 4.5544 (4.6702)	Prec@1 26.562 (17.073)	Prec@5 50.000 (44.284)
    Train Epoch: [3][47/47],lr: 0.00005	Loss 4.5622 (4.6585)	Prec@1 26.415 (17.718)	Prec@5 49.057 (44.745)
    Val Epoch: [3][1/46]	Loss 4.1796 (4.1796)	
    Cls@1:0.336	Cls@5:0.711
    Loc@1:0.156	Loc@5:0.312	Loc_gt:0.438
    
    Val Epoch: [3][11/46]	Loss 4.0685 (4.4652)	
    Cls@1:0.263	Cls@5:0.551
    Loc@1:0.078	Loc@5:0.164	Loc_gt:0.273
    
    Val Epoch: [3][21/46]	Loss 4.6838 (4.4194)	
    Cls@1:0.264	Cls@5:0.570
    Loc@1:0.098	Loc@5:0.198	Loc_gt:0.294
    
    Val Epoch: [3][31/46]	Loss 4.8199 (4.5032)	
    Cls@1:0.232	Cls@5:0.515
    Loc@1:0.083	Loc@5:0.167	Loc_gt:0.260
    
    Val Epoch: [3][41/46]	Loss 4.6710 (4.5206)	
    Cls@1:0.209	Cls@5:0.494
    Loc@1:0.073	Loc@5:0.156	Loc_gt:0.250
    
    Val Epoch: [3][46/46]	Loss 4.4396 (4.5273)	
    Cls@1:0.213	Cls@5:0.501
    Loc@1:0.072	Loc@5:0.153	Loc_gt:0.243
    
    wrong_details:420 4557 0 45 757 15
    Best GT_LOC: 0.24318260269244046
    Best TOP1_LOC: 0.24318260269244046
    2022-03-25-01-59
    Train Epoch: [4][1/47],lr: 0.00005	Loss 4.4849 (4.4849)	Prec@1 21.875 (21.875)	Prec@5 58.594 (58.594)
    Train Epoch: [4][11/47],lr: 0.00005	Loss 4.5143 (4.4929)	Prec@1 28.125 (25.284)	Prec@5 47.656 (55.185)
    Train Epoch: [4][21/47],lr: 0.00005	Loss 4.3787 (4.4674)	Prec@1 22.656 (25.744)	Prec@5 56.250 (55.357)
    Train Epoch: [4][31/47],lr: 0.00005	Loss 4.3940 (4.4535)	Prec@1 31.250 (25.731)	Prec@5 52.344 (54.940)
    Train Epoch: [4][41/47],lr: 0.00005	Loss 4.3730 (4.4333)	Prec@1 21.875 (26.067)	Prec@5 59.375 (55.259)
    Train Epoch: [4][47/47],lr: 0.00005	Loss 4.3386 (4.4240)	Prec@1 28.302 (26.376)	Prec@5 64.151 (55.672)
    Val Epoch: [4][1/46]	Loss 3.8875 (3.8875)	
    Cls@1:0.383	Cls@5:0.758
    Loc@1:0.203	Loc@5:0.398	Loc_gt:0.484
    
    Val Epoch: [4][11/46]	Loss 3.8129 (4.2537)	
    Cls@1:0.312	Cls@5:0.580
    Loc@1:0.114	Loc@5:0.204	Loc_gt:0.298
    
    Val Epoch: [4][21/46]	Loss 4.5173 (4.1790)	
    Cls@1:0.326	Cls@5:0.619
    Loc@1:0.137	Loc@5:0.244	Loc_gt:0.330
    
    Val Epoch: [4][31/46]	Loss 4.6776 (4.2892)	
    Cls@1:0.285	Cls@5:0.564
    Loc@1:0.115	Loc@5:0.205	Loc_gt:0.290
    
    Val Epoch: [4][41/46]	Loss 4.4627 (4.3164)	
    Cls@1:0.263	Cls@5:0.547
    Loc@1:0.102	Loc@5:0.190	Loc_gt:0.277
    
    Val Epoch: [4][46/46]	Loss 4.2653 (4.3204)	
    Cls@1:0.270	Cls@5:0.557
    Loc@1:0.100	Loc@5:0.186	Loc_gt:0.269
    
    wrong_details:580 4232 0 75 889 18
    Best GT_LOC: 0.26855367621677595
    Best TOP1_LOC: 0.26855367621677595
    2022-03-25-02-01
    Train Epoch: [5][1/47],lr: 0.00005	Loss 4.3349 (4.3349)	Prec@1 32.812 (32.812)	Prec@5 57.031 (57.031)
    Train Epoch: [5][11/47],lr: 0.00005	Loss 4.2210 (4.2754)	Prec@1 31.250 (33.239)	Prec@5 62.500 (62.713)
    Train Epoch: [5][21/47],lr: 0.00005	Loss 4.2603 (4.2594)	Prec@1 31.250 (32.626)	Prec@5 57.812 (61.793)
    Train Epoch: [5][31/47],lr: 0.00005	Loss 4.2397 (4.2502)	Prec@1 29.688 (31.678)	Prec@5 62.500 (61.164)
    Train Epoch: [5][41/47],lr: 0.00005	Loss 4.2377 (4.2285)	Prec@1 26.562 (31.155)	Prec@5 60.156 (61.300)
    Train Epoch: [5][47/47],lr: 0.00005	Loss 4.1144 (4.2206)	Prec@1 34.906 (31.365)	Prec@5 59.434 (61.328)
    Val Epoch: [5][1/46]	Loss 3.6491 (3.6491)	
    Cls@1:0.398	Cls@5:0.789
    Loc@1:0.203	Loc@5:0.445	Loc_gt:0.516
    
    Val Epoch: [5][11/46]	Loss 3.5341 (4.0492)	
    Cls@1:0.343	Cls@5:0.620
    Loc@1:0.140	Loc@5:0.247	Loc_gt:0.333
    
    Val Epoch: [5][21/46]	Loss 4.3516 (3.9736)	
    Cls@1:0.353	Cls@5:0.653
    Loc@1:0.156	Loc@5:0.279	Loc_gt:0.361
    
    Val Epoch: [5][31/46]	Loss 4.5599 (4.1005)	
    Cls@1:0.306	Cls@5:0.605
    Loc@1:0.132	Loc@5:0.238	Loc_gt:0.318
    
    Val Epoch: [5][41/46]	Loss 4.3230 (4.1360)	
    Cls@1:0.286	Cls@5:0.593
    Loc@1:0.121	Loc@5:0.225	Loc_gt:0.306
    
    Val Epoch: [5][46/46]	Loss 4.1012 (4.1362)	
    Cls@1:0.295	Cls@5:0.602
    Loc@1:0.119	Loc@5:0.220	Loc_gt:0.297
    
    wrong_details:688 4082 0 88 912 24
    Best GT_LOC: 0.2965136347946151
    Best TOP1_LOC: 0.2965136347946151
    2022-03-25-02-02
    Train Epoch: [6][1/47],lr: 0.00005	Loss 4.1231 (4.1231)	Prec@1 30.469 (30.469)	Prec@5 66.406 (66.406)
    Train Epoch: [6][11/47],lr: 0.00005	Loss 4.0252 (4.0962)	Prec@1 37.500 (35.085)	Prec@5 70.312 (67.756)
    Train Epoch: [6][21/47],lr: 0.00005	Loss 3.9509 (4.0630)	Prec@1 40.625 (36.533)	Prec@5 67.969 (67.671)
    Train Epoch: [6][31/47],lr: 0.00005	Loss 3.8919 (4.0431)	Prec@1 45.312 (36.215)	Prec@5 64.844 (67.137)
    Train Epoch: [6][41/47],lr: 0.00005	Loss 3.9957 (4.0417)	Prec@1 40.625 (36.128)	Prec@5 70.312 (66.749)
    Train Epoch: [6][47/47],lr: 0.00005	Loss 3.9811 (4.0315)	Prec@1 41.509 (36.303)	Prec@5 64.151 (67.050)
    Val Epoch: [6][1/46]	Loss 3.4300 (3.4300)	
    Cls@1:0.438	Cls@5:0.781
    Loc@1:0.219	Loc@5:0.453	Loc_gt:0.531
    
    Val Epoch: [6][11/46]	Loss 3.3890 (3.8868)	
    Cls@1:0.357	Cls@5:0.643
    Loc@1:0.145	Loc@5:0.259	Loc_gt:0.343
    
    Val Epoch: [6][21/46]	Loss 4.1725 (3.7921)	
    Cls@1:0.377	Cls@5:0.680
    Loc@1:0.170	Loc@5:0.299	Loc_gt:0.379
    
    Val Epoch: [6][31/46]	Loss 4.4162 (3.9271)	
    Cls@1:0.331	Cls@5:0.634
    Loc@1:0.146	Loc@5:0.259	Loc_gt:0.336
    
    Val Epoch: [6][41/46]	Loss 4.2253 (3.9698)	
    Cls@1:0.313	Cls@5:0.623
    Loc@1:0.136	Loc@5:0.245	Loc_gt:0.325
    
    Val Epoch: [6][46/46]	Loss 3.9466 (3.9713)	
    Cls@1:0.321	Cls@5:0.632
    Loc@1:0.134	Loc@5:0.239	Loc_gt:0.314
    
    wrong_details:776 3935 0 118 940 25
    Best GT_LOC: 0.3139454608215395
    Best TOP1_LOC: 0.3139454608215395
    Preparing networks done!
    Train Epoch: [1][1/47],lr: 0.00005	Loss 5.3910 (5.3910)	Prec@1 0.781 (0.781)	Prec@5 3.125 (3.125)
    Train Epoch: [1][11/47],lr: 0.00005	Loss 5.2530 (5.3511)	Prec@1 1.562 (0.923)	Prec@5 3.906 (3.125)
    Train Epoch: [1][21/47],lr: 0.00005	Loss 5.2631 (5.3216)	Prec@1 0.781 (0.781)	Prec@5 3.906 (3.646)
    Train Epoch: [1][31/47],lr: 0.00005	Loss 5.1785 (5.2905)	Prec@1 0.781 (0.907)	Prec@5 3.125 (4.461)
    Train Epoch: [1][41/47],lr: 0.00005	Loss 5.1472 (5.2599)	Prec@1 1.562 (0.953)	Prec@5 7.031 (4.668)
    Train Epoch: [1][47/47],lr: 0.00005	Loss 5.1461 (5.2453)	Prec@1 1.887 (1.001)	Prec@5 12.264 (4.805)
    Val Epoch: [1][1/46]	Loss 4.8300 (4.8300)	
    Cls@1:0.000	Cls@5:0.094
    Loc@1:0.000	Loc@5:0.023	Loc_gt:0.312
    
    Val Epoch: [1][11/46]	Loss 4.7840 (5.0671)	
    Cls@1:0.010	Cls@5:0.077
    Loc@1:0.002	Loc@5:0.014	Loc_gt:0.224
    
    Val Epoch: [1][21/46]	Loss 5.3839 (5.0786)	
    Cls@1:0.010	Cls@5:0.070
    Loc@1:0.002	Loc@5:0.016	Loc_gt:0.218
    
    Val Epoch: [1][31/46]	Loss 5.3107 (5.1220)	
    Cls@1:0.010	Cls@5:0.061
    Loc@1:0.003	Loc@5:0.014	Loc_gt:0.199
    
    Val Epoch: [1][41/46]	Loss 4.7929 (5.0675)	
    Cls@1:0.016	Cls@5:0.069
    Loc@1:0.005	Loc@5:0.016	Loc_gt:0.195
    
    Val Epoch: [1][46/46]	Loss 5.0628 (5.0798)	
    Cls@1:0.016	Cls@5:0.066
    Loc@1:0.005	Loc@5:0.015	Loc_gt:0.192
    
    wrong_details:27 5704 0 3 58 2
    Best GT_LOC: 0.1924404556437694
    Best TOP1_LOC: 0.1924404556437694
    2022-03-25-01-49
    Train Epoch: [2][1/47],lr: 0.00005	Loss 5.0344 (5.0344)	Prec@1 2.344 (2.344)	Prec@5 7.812 (7.812)
    Train Epoch: [2][11/47],lr: 0.00005	Loss 4.9748 (5.0317)	Prec@1 0.781 (1.634)	Prec@5 9.375 (7.812)
    Train Epoch: [2][21/47],lr: 0.00005	Loss 4.8753 (4.9973)	Prec@1 3.906 (2.083)	Prec@5 10.938 (8.557)
    Train Epoch: [2][31/47],lr: 0.00005	Loss 4.8447 (4.9587)	Prec@1 1.562 (2.419)	Prec@5 9.375 (9.199)
    Train Epoch: [2][41/47],lr: 0.00005	Loss 4.9252 (4.9204)	Prec@1 0.781 (2.591)	Prec@5 10.938 (10.061)
    Train Epoch: [2][47/47],lr: 0.00005	Loss 4.7829 (4.8979)	Prec@1 4.717 (2.669)	Prec@5 16.981 (10.777)
    Val Epoch: [2][1/46]	Loss 4.8567 (4.8567)	
    Cls@1:0.008	Cls@5:0.109
    Loc@1:0.008	Loc@5:0.078	Loc_gt:0.352
    
    Val Epoch: [2][11/46]	Loss 4.4700 (4.4981)	
    Cls@1:0.062	Cls@5:0.232
    Loc@1:0.018	Loc@5:0.061	Loc_gt:0.241
    
    Val Epoch: [2][21/46]	Loss 4.9166 (4.5804)	
    Cls@1:0.057	Cls@5:0.199
    Loc@1:0.015	Loc@5:0.049	Loc_gt:0.231
    
    Val Epoch: [2][31/46]	Loss 4.8543 (4.6311)	
    Cls@1:0.055	Cls@5:0.184
    Loc@1:0.013	Loc@5:0.044	Loc_gt:0.230
    
    Val Epoch: [2][41/46]	Loss 4.4745 (4.5779)	
    Cls@1:0.052	Cls@5:0.182
    Loc@1:0.013	Loc@5:0.046	Loc_gt:0.217
    
    Val Epoch: [2][46/46]	Loss 3.8181 (4.5981)	
    Cls@1:0.051	Cls@5:0.177
    Loc@1:0.012	Loc@5:0.044	Loc_gt:0.215
    
    wrong_details:71 5500 0 116 98 9
    Best GT_LOC: 0.21453227476700035
    Best TOP1_LOC: 0.21453227476700035
    2022-03-25-01-54
    Train Epoch: [3][1/47],lr: 0.00005	Loss 4.4202 (4.4202)	Prec@1 7.031 (7.031)	Prec@5 21.875 (21.875)
    Train Epoch: [3][11/47],lr: 0.00005	Loss 4.2909 (4.5166)	Prec@1 8.594 (5.114)	Prec@5 26.562 (19.389)
    Train Epoch: [3][21/47],lr: 0.00005	Loss 4.5282 (4.5042)	Prec@1 1.562 (5.320)	Prec@5 11.719 (19.085)
    Train Epoch: [3][31/47],lr: 0.00005	Loss 4.3235 (4.4662)	Prec@1 9.375 (5.872)	Prec@5 21.875 (19.481)
    Train Epoch: [3][41/47],lr: 0.00005	Loss 4.2087 (4.4195)	Prec@1 8.594 (6.250)	Prec@5 31.250 (21.418)
    Train Epoch: [3][47/47],lr: 0.00005	Loss 4.1792 (4.3886)	Prec@1 7.547 (6.390)	Prec@5 22.642 (21.955)
    Val Epoch: [3][1/46]	Loss 3.8698 (3.8698)	
    Cls@1:0.172	Cls@5:0.289
    Loc@1:0.055	Loc@5:0.102	Loc_gt:0.375
    
    Val Epoch: [3][11/46]	Loss 4.1646 (3.9793)	
    Cls@1:0.118	Cls@5:0.327
    Loc@1:0.029	Loc@5:0.101	Loc_gt:0.257
    
    Val Epoch: [3][21/46]	Loss 4.8005 (4.0862)	
    Cls@1:0.106	Cls@5:0.299
    Loc@1:0.022	Loc@5:0.076	Loc_gt:0.228
    
    Val Epoch: [3][31/46]	Loss 4.6957 (4.3865)	
    Cls@1:0.085	Cls@5:0.241
    Loc@1:0.017	Loc@5:0.061	Loc_gt:0.225
    
    Val Epoch: [3][41/46]	Loss 4.3491 (4.2824)	
    Cls@1:0.082	Cls@5:0.253
    Loc@1:0.017	Loc@5:0.062	Loc_gt:0.210
    
    Val Epoch: [3][46/46]	Loss 3.9240 (4.3113)	
    Cls@1:0.079	Cls@5:0.246
    Loc@1:0.016	Loc@5:0.057	Loc_gt:0.203
    
    wrong_details:90 5334 0 314 42 14
    Best GT_LOC: 0.21453227476700035
    Best TOP1_LOC: 0.21453227476700035
    2022-03-25-02-00
    Train Epoch: [4][1/47],lr: 0.00005	Loss 3.9914 (3.9914)	Prec@1 12.500 (12.500)	Prec@5 34.375 (34.375)
    Train Epoch: [4][11/47],lr: 0.00005	Loss 4.1547 (4.0524)	Prec@1 10.156 (11.364)	Prec@5 29.688 (30.753)
    Train Epoch: [4][21/47],lr: 0.00005	Loss 4.3899 (4.0732)	Prec@1 7.031 (10.528)	Prec@5 19.531 (30.432)
    Train Epoch: [4][31/47],lr: 0.00005	Loss 3.7553 (4.0368)	Prec@1 11.719 (10.459)	Prec@5 38.281 (31.401)
    Train Epoch: [4][41/47],lr: 0.00005	Loss 3.9345 (4.0095)	Prec@1 8.594 (10.690)	Prec@5 30.469 (31.879)
    Train Epoch: [4][47/47],lr: 0.00005	Loss 3.7705 (3.9801)	Prec@1 16.981 (11.195)	Prec@5 38.679 (32.733)
    Val Epoch: [4][1/46]	Loss 4.2226 (4.2226)	
    Cls@1:0.141	Cls@5:0.344
    Loc@1:0.086	Loc@5:0.195	Loc_gt:0.461
    
    Val Epoch: [4][11/46]	Loss 3.6754 (3.8854)	
    Cls@1:0.148	Cls@5:0.378
    Loc@1:0.048	Loc@5:0.126	Loc_gt:0.303
    
    Val Epoch: [4][21/46]	Loss 4.4785 (3.9334)	
    Cls@1:0.134	Cls@5:0.353
    Loc@1:0.036	Loc@5:0.102	Loc_gt:0.283
    
    Val Epoch: [4][31/46]	Loss 4.5506 (3.9342)	
    Cls@1:0.123	Cls@5:0.345
    Loc@1:0.035	Loc@5:0.108	Loc_gt:0.294
    
    Val Epoch: [4][41/46]	Loss 3.8479 (3.8611)	
    Cls@1:0.121	Cls@5:0.354
    Loc@1:0.030	Loc@5:0.100	Loc_gt:0.267
    
    Val Epoch: [4][46/46]	Loss 3.7787 (3.8731)	
    Cls@1:0.118	Cls@5:0.345
    Loc@1:0.028	Loc@5:0.094	Loc_gt:0.265
    
    wrong_details:164 5109 0 465 41 15
    Best GT_LOC: 0.26458405246807043
    Best TOP1_LOC: 0.26458405246807043
    2022-03-25-02-02
    Train Epoch: [5][1/47],lr: 0.00005	Loss 3.6695 (3.6695)	Prec@1 11.719 (11.719)	Prec@5 38.281 (38.281)
    Train Epoch: [5][11/47],lr: 0.00005	Loss 3.5674 (3.5657)	Prec@1 19.531 (16.548)	Prec@5 40.625 (42.827)
    Train Epoch: [5][21/47],lr: 0.00005	Loss 3.5742 (3.5837)	Prec@1 13.281 (16.481)	Prec@5 40.625 (43.043)
    Train Epoch: [5][31/47],lr: 0.00005	Loss 3.5905 (3.5552)	Prec@1 19.531 (17.339)	Prec@5 44.531 (43.901)
    Train Epoch: [5][41/47],lr: 0.00005	Loss 3.4860 (3.5470)	Prec@1 22.656 (17.492)	Prec@5 46.875 (44.284)
    Train Epoch: [5][47/47],lr: 0.00005	Loss 3.6738 (3.5731)	Prec@1 13.208 (17.017)	Prec@5 38.679 (43.694)
    Val Epoch: [5][1/46]	Loss 3.6581 (3.6581)	
    Cls@1:0.133	Cls@5:0.398
    Loc@1:0.039	Loc@5:0.164	Loc_gt:0.328
    
    Val Epoch: [5][11/46]	Loss 3.4856 (3.7789)	
    Cls@1:0.167	Cls@5:0.392
    Loc@1:0.037	Loc@5:0.097	Loc_gt:0.261
    
    Val Epoch: [5][21/46]	Loss 3.7290 (3.6464)	
    Cls@1:0.188	Cls@5:0.424
    Loc@1:0.036	Loc@5:0.093	Loc_gt:0.240
    
    Val Epoch: [5][31/46]	Loss 4.2331 (3.7345)	
    Cls@1:0.165	Cls@5:0.397
    Loc@1:0.036	Loc@5:0.094	Loc_gt:0.242
    
    Val Epoch: [5][41/46]	Loss 4.0443 (3.7508)	
    Cls@1:0.156	Cls@5:0.393
    Loc@1:0.034	Loc@5:0.089	Loc_gt:0.227
    
    Val Epoch: [5][46/46]	Loss 3.5688 (3.7679)	
    Cls@1:0.150	Cls@5:0.389
    Loc@1:0.032	Loc@5:0.086	Loc_gt:0.226
    
    wrong_details:185 4926 0 622 34 27
    Best GT_LOC: 0.26458405246807043
    Best TOP1_LOC: 0.26458405246807043
    
    

    Looking for your help!

    opened by Unrealluver 1
  • How can I train my custom dataset with a no-object class?

    How can I train my custom dataset with a no-object class?

    I really appreciate your work. I am currently trying to train TS-CAM on my own dataset. First, I converted my dataset into the CUB dataset format and set the bounding box of no-object images to (0, 0, 0, 0), but I failed to train the model. Can you recommend a way to train the model on a dataset that contains no-object images?

    opened by kimwin2 1
  • How can I get performance on ImageNet-1K like in your paper?

    How can I get performance on ImageNet-1K like in your paper?

    Hello. First of all, thank you for your code and paper. Your paper caught my attention! But I have a problem: with the default training settings, the performance is not as good as yours. If possible, could you share your settings, as in https://github.com/vasgaowei/TS-CAM/issues/7#issuecomment-927473246?

    I'm hoping for your answer :+1: :+1:

    I am looking forward to your great papers. Thank you :)

    This is my hyper-parameter setting.

    {'BASIC': {'BACKUP_CODES': True, 'BACKUP_LIST': ['lib', 'tools_cam', 'configs'], 'DISP_FREQ': 10, 'GPU_ID': [0], 'NUM_WORKERS': 8, 'ROOT_DIR': './tools_cam/..', 'SAVE_DIR': 'ckpt/ImageNet/deit_tscam_small_patch16_224_CAM-NORMAL_SEED26_CAM-THR0.12_BS256_2022-02-02-14-27', 'SEED': 26, 'TIME': '2022-02-02-14-27'}, 'CUDNN': {'BENCHMARK': False, 'DETERMINISTIC': True, 'ENABLE': True}, 'DATA': {'CROP_SIZE': 224, 'DATADIR': 'data/ImageNet_ILSVRC2012', 'DATASET': 'ImageNet', 'IMAGE_MEAN': [0.485, 0.456, 0.406], 'IMAGE_STD': [0.229, 0.224, 0.225], 'NUM_CLASSES': 1000, 'RESIZE_SIZE': 512, 'SCALE_LENGTH': 15, 'SCALE_SIZE': 196}, 'MODEL': {'ARCH': 'deit_tscam_small_patch16_224', 'CAM_THR': 0.12, 'LOCALIZER_DIR': '', 'TOP_K': 1}, 'SOLVER': {'LR_FACTOR': 0.1, 'LR_STEPS': [10, 12], 'MUMENTUM': 0.9, 'NUM_EPOCHS': 20, 'START_LR': 0.004, 'WEIGHT_DECAY': 0.0005}, 'TEST': {'BATCH_SIZE': 512, 'CKPT_DIR': '', 'SAVE_BOXED_IMAGE': False, 'SAVE_CAMS': False, 'TEN_CROPS': False}, 'TRAIN': {'ALPHA': 1.0, 'BATCH_SIZE': 256, 'BETA': 1.0}} ==> Preparing data... done! ==> Preparing networks for baseline... Removing key head.weight from pretrained checkpoint TSCAM(

    And the result is like this:

    Val Epoch: [12][98/98] Loss 1.4334 (1.4654) Cls@1:0.657 Cls@5:0.858 Loc@1:0.451 Loc@5:0.564 Loc_gt:0.609

    opened by SejinPark99 1