CATs: Semantic Correspondence with Transformers

For more information, check out the paper on [arXiv].

Training with other backbones, and evaluations of them, will be added soon.


Our model CATs is illustrated below:

[Figure: overview of the CATs model]

Environment Settings

git clone
cd CATs

conda create -n CATs python=3.6
conda activate CATs

pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f
pip install -U scikit-image
pip install git+
pip install tensorboardX termcolor timm tqdm requests pandas
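
After installing, a quick sanity check can confirm that the packages named above are visible to the active environment. The snippet below is a minimal sketch, not part of the CATs repository: the helper `missing_modules` and the list `REQUIRED_MODULES` are hypothetical names, and the list covers only the packages spelled out in the commands above (scikit-image is imported as `skimage`).

```python
import importlib.util

# Import names (not pip names) for the packages listed in the install commands.
# This list is an assumption made for illustration; adjust it to your setup.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "skimage",
    "tensorboardX", "termcolor", "timm", "tqdm", "requests", "pandas",
]


def missing_modules(names):
    """Return the names that cannot be located on the import path."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

An empty result only means the modules can be found; it does not verify the pinned versions or CUDA support.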


  • Download the pre-trained weights from Link
  • All datasets are downloaded automatically into the directory specified by the datapath argument

Result on SPair-71k: (PCK 49.9%)

  python --pretrained "/path_to_pretrained_model/spair" --benchmark spair

Result on SPair-71k, feature backbone frozen: (PCK 42.4%)

  python --pretrained "/path_to_pretrained_model/spair_frozen" --benchmark spair

Results on PF-PASCAL: (PCK 75.4%, 92.6%, 96.4%)

  python --pretrained "/path_to_pretrained_model/pfpascal" --benchmark pfpascal

Results on PF-PASCAL, feature backbone frozen: (PCK 67.5%, 89.1%, 94.9%)

  python --pretrained "/path_to_pretrained_model/pfpascal_frozen" --benchmark pfpascal
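
For readers unfamiliar with the metric, PCK (percentage of correct keypoints) counts a transferred keypoint as correct when it lies within a fraction alpha of a reference size from its ground-truth location. The sketch below illustrates that general definition and is not the evaluation code of this repository. The function name `pck`, the choice of reference size, and the thresholds 0.05, 0.1, and 0.15 in the usage are assumptions for illustration; this README does not state which thresholds produce the reported numbers.

```python
import math


def pck(pred_kps, gt_kps, ref_size, alpha):
    """Percentage of predicted keypoints within alpha * ref_size of ground truth.

    pred_kps, gt_kps: equal-length lists of (x, y) pairs.
    ref_size: reference length, e.g. the longer image or bounding-box side
              (which one applies depends on the benchmark; assumed here).
    """
    if len(pred_kps) != len(gt_kps) or not gt_kps:
        raise ValueError("keypoint lists must be non-empty and of equal length")
    threshold = alpha * ref_size
    correct = sum(
        math.hypot(px - gx, py - gy) <= threshold
        for (px, py), (gx, gy) in zip(pred_kps, gt_kps)
    )
    return 100.0 * correct / len(gt_kps)


# Toy data: four keypoints in a 100-pixel reference frame.
gt = [(10, 10), (50, 50), (90, 90), (30, 70)]
pred = [(12, 11), (50, 58), (70, 90), (30, 70)]

for alpha in (0.05, 0.1, 0.15):  # illustrative thresholds
    print(f"PCK @ alpha={alpha}: {pck(pred, gt, 100, alpha):.1f}%")
```

A larger alpha tolerates larger localisation errors, which is why scores reported at several thresholds increase from left to right.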


If you find this research useful, please consider citing:

      title={Semantic Correspondence with Transformers}, 
      author={Seokju Cho and Sunghwan Hong and Sangryul Jeon and Yunsung Lee and Kwanghoon Sohn and Seungryong Kim},
