Code for TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization.


TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

This is the official implementation of the paper TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization.

This repository contains PyTorch training code, evaluation code, pretrained models, and a Jupyter notebook for additional visualization.


Built on DeiT, TS-CAM couples the attention maps of a vision transformer with semantic-aware maps to obtain accurate localization maps (Token Semantic Coupled Attention Maps, TS-CAMs).
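As a rough illustration of the coupling idea, the sketch below multiplies a class-agnostic attention map element-wise with a class-specific semantic map and rescales the result to [0, 1]. It is a minimal pure-Python toy under stated assumptions, not the repository's implementation: the function names `couple_maps` and `min_max_normalize`, the 2x2 grid, and the choice of min-max normalization are all hypothetical.

```python
def couple_maps(attention_map, semantic_map):
    """Element-wise product of two equally sized 2-D maps (lists of rows)."""
    if len(attention_map) != len(semantic_map):
        raise ValueError("maps must have the same number of rows")
    coupled = []
    for attn_row, sem_row in zip(attention_map, semantic_map):
        if len(attn_row) != len(sem_row):
            raise ValueError("maps must have the same number of columns")
        coupled.append([a * s for a, s in zip(attn_row, sem_row)])
    return coupled


def min_max_normalize(score_map):
    """Rescale a 2-D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in score_map for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in score_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in score_map]


# Toy 2x2 patch grid (hypothetical values): attention says "where the model
# looks", the semantic map says "where this class responds".
attention = [[0.1, 0.4], [0.3, 0.2]]
semantic = [[2.0, 1.0], [0.0, 4.0]]
ts_cam = min_max_normalize(couple_maps(attention, semantic))
```

In this toy case the bottom-right cell ends up strongest: it has only moderate attention but a high semantic response, while the bottom-left cell is suppressed because its semantic response is zero.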


Model Zoo

We provide pretrained TS-CAM models trained on CUB-200-2011 and ImageNet_ILSVRC2012 datasets.

Dataset        Loc.Acc@1   Loc.Acc@5   Loc.Gt-Known   Cls.Acc@1   Cls.Acc@5   Baidu Drive   Google Drive
CUB-200-2011   71.3        83.8        87.7           80.3        94.8        model         model
ILSVRC2012     53.4        64.3        67.6           74.3        92.1        model         model
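For readers unfamiliar with the localization columns, the sketch below shows the convention commonly used in weakly supervised object localization: a prediction counts as correct for Gt-Known when the predicted box overlaps the ground-truth box with IoU of at least 0.5, and for Loc.Acc@1 when, in addition, the top-1 class is correct. The helper names, the continuous-coordinate box convention, and the 0.5 default are assumptions for illustration; the repository's evaluation code may differ in detail.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def is_localized(pred_box, gt_box, pred_label=None, gt_label=None, iou_thr=0.5):
    """Gt-Known style check when labels are omitted; otherwise the class must match too."""
    box_ok = box_iou(pred_box, gt_box) >= iou_thr
    if pred_label is None or gt_label is None:
        return box_ok
    return box_ok and pred_label == gt_label


# Hypothetical boxes: the prediction is shifted by 10 pixels in both directions.
pred, gt = (10, 10, 110, 110), (20, 20, 120, 120)
gt_known_hit = is_localized(pred, gt)
top1_hit = is_localized(pred, gt, pred_label=3, gt_label=7)
```

With these boxes the Gt-Known check passes, while the top-1 check fails because the class labels differ.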

Note: the extraction code for Baidu Drive is as follows:


First clone the repository locally:

git clone

Then install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models 0.3.2:

conda create -n pytorch1.7 python=3.6
conda activate pytorch1.7
conda install anaconda
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=10.2 -c pytorch
pip install timm==0.3.2

Data preparation

CUB-200-2011 dataset

Please download and extract the CUB-200-2011 dataset.

The directory structure is the following:



Download the ILSVRC2012 dataset and extract the train and val images.

The directory structure is organized as follows:


The training and validation data are expected to be in the train/ and val/ folders, respectively:

For training:

On CUB-200-2011 dataset:

bash {GPU_ID} ${NET}

On ImageNet1k dataset:

bash {GPU_ID} ${NET}

Please note that the ImageNet-1k pretrained weights of DeiT-tiny, DeiT-small, and DeiT-base are downloaded the first time you train a model, so an Internet connection is required.

For evaluation:

On CUB-200-2011 dataset:

bash {GPU_ID} ${NET} ${MODEL_PATH}

On ImageNet1k dataset:

bash {GPU_ID} ${NET} ${MODEL_PATH}

GPU_ID must be specified; multiple GPUs can be used to accelerate training and evaluation.

NET should be chosen from tiny, small and base.

MODEL_PATH is the path to the pretrained model.
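The sketch below shows one way the NET argument could be validated and turned into an architecture name. The only architecture string that appears on this page is `deit_tscam_small_patch16_224` (in a configuration dump quoted further down); that the tiny and base variants follow the same naming pattern is an assumption, and `arch_name` is a hypothetical helper rather than a function from the repository.

```python
VALID_NETS = ("tiny", "small", "base")


def arch_name(net, patch_size=16, image_size=224):
    """Build an architecture string such as 'deit_tscam_small_patch16_224'.

    The naming pattern is assumed from a single example and may not hold
    for every variant.
    """
    if net not in VALID_NETS:
        raise ValueError(f"NET must be one of {VALID_NETS}, got {net!r}")
    return f"deit_tscam_{net}_patch{patch_size}_{image_size}"


small_arch = arch_name("small")
```

Rejecting an unknown NET value early gives a clearer error than a failed model lookup later in training.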


We provide a Jupyter notebook in the tools_cam folder.


Please download the pretrained TS-CAM model weights and try more visualization results (attention maps from our method and from the Attention Rollout method). You can also try other interesting images to show their localization maps (TS-CAMs).
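For context, Attention Rollout is usually described as follows: average the attention over heads in each layer, mix in the identity matrix to account for the residual connection, and multiply the resulting matrices across layers. The pure-Python sketch below follows that generic description on a toy example; it is not taken from the notebook, and the function names and the 0.5/0.5 mixing weights are assumptions.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def attention_rollout(layer_attentions):
    """Roll out head-averaged attention matrices, ordered first layer to last.

    Each matrix is (tokens x tokens) with rows summing to 1.
    """
    n = len(layer_attentions[0])
    # Start from the identity: before any layer, each token attends to itself.
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layer_attentions:
        # Mix in the identity for the residual connection. Rows stay
        # normalized because both terms have rows summing to 1.
        mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
        rollout = matmul(mixed, rollout)
    return rollout


# Two layers, three tokens (token 0 plays the role of the class token).
layers = [
    [[0.6, 0.2, 0.2], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
    [[0.5, 0.25, 0.25], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]],
]
rolled = attention_rollout(layers)
class_to_patches = rolled[0][1:]  # how much the class token draws on each patch
```

Reshaping `class_to_patches` to the patch grid would give a class-agnostic map that can be displayed next to a TS-CAM.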

Visualize localization results

We provide some visualization results as follows.


Visualize attention maps

We can also visualize attention maps from different transformer layers.


If you have any questions about our work or this repository, please don't hesitate to contact us by email.

You can also open an issue under this project.


If you use this code for a paper, please cite:

  title={TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization},
  author={Wei Gao and Fang Wan and Xingjia Pan and Zhiliang Peng and Qi Tian and Zhenjun Han and Bolei Zhou and Qixiang Ye},
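  Have the authors tried using attention rollout?

    Thanks to the authors for releasing the code of their work! I noticed that the visualization part uses the attention rollout method, and the results look good. Have the authors tried computing the final localization results from the attention rollout maps, both on the plain transformer and on TSCAM? Does this bring any improvement, or is rollout used only as a visualization reference because it gives no gain in accuracy?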

    opened by ustcjinggg 2
  TransAttention performance

    Thanks for the great work.

    I have a question about how to reproduce TransAttention performance on CUB-200 (Table 5). I got much higher performance by changing to cams = cams.repeat((1, 200, 1, 1)):

    Cls@1:0.803 Cls@5:0.948 Loc@1:0.690 Loc@5:0.816 Loc_gt:0.859 wrong_details:3998 1139 0 556 96 5

    I also got Loc@1:0.154 Loc@5:0.177 Loc_gt:0.183 for TransCAM, so there seems to be a mistake in the table. Personally, I feel it is unfair to compare with TransCAM without tuning CAM_THR to its optimum: I can get Loc@1:0.333 Loc@5:0.379 Loc_gt:0.387 by setting CAM_THR to around 0.8. I would like to hear your thoughts on this.
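To make the role of a threshold such as CAM_THR concrete, the sketch below turns a localization map into a box by keeping the cells whose value is at least `cam_thr` times the map's maximum and bounding them. This is a simplified illustration under assumptions: `cam_to_box` is a hypothetical helper, the threshold is taken relative to the maximum, and all surviving cells are bounded together, whereas an actual evaluation pipeline may, for example, keep only the largest connected region.

```python
def cam_to_box(cam, cam_thr):
    """Return (x_min, y_min, x_max, y_max) around cells >= cam_thr * max, or None."""
    peak = max(v for row in cam for v in row)
    if peak <= 0:
        return None  # nothing is activated
    cutoff = cam_thr * peak
    xs, ys = [], []
    for y, row in enumerate(cam):
        for x, v in enumerate(row):
            if v >= cutoff:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # threshold above 1 leaves no cells
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical 4x4 map with a strong response in the centre-right cells.
cam = [
    [0.0, 0.1, 0.1, 0.0],
    [0.1, 0.5, 0.9, 0.1],
    [0.1, 0.6, 1.0, 0.2],
    [0.0, 0.1, 0.2, 0.0],
]
loose_box = cam_to_box(cam, 0.12)
tight_box = cam_to_box(cam, 0.8)
```

A low threshold yields a larger box that includes weak responses, and a high threshold shrinks the box to the peak region, which is why the reported localization numbers depend on how the threshold is tuned for each method.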

    opened by liruiwen 1
  Paper Question about Table 1

    Dear authors,

    In Table1, the compared methods and the proposed method use different backbones. Could you interpret whether this comparison is fair?

    Thank you.

    opened by AmingWu 0
  "creat_model" function call adjusted

    "creat_model" returns three values, but the function call assigns them to four variables. This change adjusts the assignment to three values (a tuple).

    opened by shakeebmurtaza 0
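The kind of mismatch described in the "creat_model" entry above can be reproduced in isolation. In the sketch below, `returns_three` is a hypothetical stand-in for any function that returns three values; it does not reflect what "creat_model" actually returns.

```python
def returns_three():
    """Hypothetical stand-in for a function that returns three values."""
    return "a", "b", "c"


def unpack_into_four():
    """Reproduce the mismatch: four targets for a three-value return."""
    try:
        w, x, y, z = returns_three()
    except ValueError as err:
        return str(err)
    return "no error"


first, second, third = returns_three()  # matches the number of returned values
message = unpack_into_four()            # holds the ValueError text
```

Python raises the ValueError at the assignment itself, so the fix is to make the number of targets match the number of returned values.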
  The function of joint_attns_skip

    Hi, thanks for your wonderful work! However, I don't understand the purpose of joint_attns_skip in Line 471 of. Is the mean tensor of joint_attns_skip an optional CAM?

    opened by xujianglan 1
  Question about visualization

    Hello. First of all, thank you for your code and paper. I really appreciate your work. I have a problem: some questions came up when visualizing the training log on our own dataset.

    An error occurs when I try to call the forward function of the model. The error is shown below.

    I also notice that when running the code, our model is reported as visionTransformer, whereas the output on GitHub shows a model named TSCAM. Can you tell me what makes the difference?

    I'm hoping for your answer. Thank you!

    opened by Grand-ou 1
  Question about the training process

    Greetings! Thanks for your excellent work! When running your code, I met a problem that the performance is poor. My running command is

     bash 3 deit small 224

    and I got the log like:

    Looking for your help!

    opened by Unrealluver 1
  How can i train my custom dataset with no object class?

    I really appreciate your work. Nowadays I try to train TS-CAM with my own dataset. First, I make my own dataset into the CUB dataset format. And I set the no object image to the bounding box value (0,0,0,0). But I failed to train the model. Can you recommend any way to train the model with no object dataset?

    opened by kimwin2 1
  How could I get an performance(imageNet-1K) like your paper?

    Hello. First of all, thank you for your code and paper. Your paper caught my attention! But I have a problem: when training with the default settings, the performance is not as good as yours. If possible, could you tell me your settings?

    I'm hoping for your answer :+1: :+1:

    I am looking forward to your great papers. Thank you :)

    It is my hyper-parameter setting.

{'BASIC': {'BACKUP_CODES': True,
           'BACKUP_LIST': ['lib', 'tools_cam', 'configs'],
           'DISP_FREQ': 10,
           'GPU_ID': [0],
           'NUM_WORKERS': 8,
           'ROOT_DIR': './tools_cam/..',
           'SAVE_DIR': 'ckpt/ImageNet/deit_tscam_small_patch16_224_CAM-NORMAL_SEED26_CAM-THR0.12_BS256_2022-02-02-14-27',
           'SEED': 26,
           'TIME': '2022-02-02-14-27'},
 'CUDNN': {'BENCHMARK': False, 'DETERMINISTIC': True, 'ENABLE': True},
 'DATA': {'CROP_SIZE': 224,
          'DATADIR': 'data/ImageNet_ILSVRC2012',
          'DATASET': 'ImageNet',
          'IMAGE_MEAN': [0.485, 0.456, 0.406],
          'IMAGE_STD': [0.229, 0.224, 0.225],
          'NUM_CLASSES': 1000,
          'RESIZE_SIZE': 512,
          'SCALE_LENGTH': 15,
          'SCALE_SIZE': 196},
 'MODEL': {'ARCH': 'deit_tscam_small_patch16_224',
           'CAM_THR': 0.12,
           'LOCALIZER_DIR': '',
           'TOP_K': 1},
 'SOLVER': {'LR_FACTOR': 0.1,
            'LR_STEPS': [10, 12],
            'MUMENTUM': 0.9,
            'NUM_EPOCHS': 20,
            'START_LR': 0.004,
            'WEIGHT_DECAY': 0.0005},
 'TEST': {'BATCH_SIZE': 512,
          'CKPT_DIR': '',
          'SAVE_BOXED_IMAGE': False,
          'SAVE_CAMS': False,
          'TEN_CROPS': False},
 'TRAIN': {'ALPHA': 1.0, 'BATCH_SIZE': 256, 'BETA': 1.0}}
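One plausible reading of the SOLVER entries above is a multi-step decay: the learning rate starts at START_LR and is multiplied by LR_FACTOR once each epoch listed in LR_STEPS is reached. The sketch below computes the rate per epoch under that assumption; `lr_at_epoch` is a hypothetical helper, and whether the repository applies the decay at exactly these epoch boundaries is not confirmed here.

```python
def lr_at_epoch(epoch, start_lr=0.004, lr_steps=(10, 12), lr_factor=0.1):
    """Learning rate under an assumed multi-step decay.

    The rate is multiplied by lr_factor once for every step epoch that has
    been reached (epoch >= step).
    """
    passed = sum(1 for step in lr_steps if epoch >= step)
    return start_lr * (lr_factor ** passed)


# Rates for the 20 epochs of the quoted configuration, under the assumption above.
schedule = [lr_at_epoch(epoch) for epoch in range(20)]
```

Under this reading, the rate would already be at its lowest value from epoch 12 onward, which is the epoch of the validation log line quoted in this report.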

    And the result is like this :

    Val Epoch: [12][98/98] Loss 1.4334 (1.4654) Cls@1:0.657 Cls@5:0.858 Loc@1:0.451 Loc@5:0.564 Loc_gt:0.609

    opened by SejinPark99 1
