This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

Overview

TransFG: A Transformer Architecture for Fine-grained Recognition



Implementation based on DeiT pretrained on ImageNet-1K with distillation fine-tuning will be released soon.

Framework

Dependencies:

  • Python 3.7.3
  • PyTorch 1.5.1
  • torchvision 0.6.1
  • ml_collections

Usage

1. Download Google pre-trained ViT models

wget https://storage.googleapis.com/vit_models/imagenet21k/{MODEL_NAME}.npz
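For example, for the ViT-B_16 backbone used in the paper:

wget https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_16.npz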

2. Prepare data

In the paper, we use data from 5 publicly available datasets:

  • CUB-200-2011
  • Stanford Cars
  • Stanford Dogs
  • NABirds
  • iNaturalist 2017

Please download them from the official websites and put them in the corresponding folders.
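One possible layout (illustrative only; the folder names here are assumptions and must match whatever paths the data-loading code in train.py expects):

data/
├── CUB_200_2011/      # images/, images.txt, train_test_split.txt, ...
├── stanford_cars/
├── stanford_dogs/
├── nabirds/
└── inat2017/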

3. Install required packages

Install dependencies with the following command:

pip3 install -r requirements.txt
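For reference, a requirements.txt consistent with the dependency list above would pin roughly the following (a sketch; the repository's actual file may differ, and newer Python versions need newer torch/torchvision builds):

torch==1.5.1
torchvision==0.6.1
ml_collections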

4. Train

To train TransFG on the CUB-200-2011 dataset with 4 GPUs in FP16 mode for 10,000 steps, run:

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py --dataset CUB_200_2011 --split overlap --num_steps 10000 --fp16 --name sample_run
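The same command scales down to fewer GPUs by adjusting the device list and process count; for example, a single-GPU variant (note that the effective batch size, and therefore the results, will differ from the 4-GPU setup):

CUDA_VISIBLE_DEVICES=0 python3 -m torch.distributed.launch --nproc_per_node=1 train.py --dataset CUB_200_2011 --split overlap --num_steps 10000 --fp16 --name sample_run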

Citation

If you find our work helpful in your research, please cite it as:

@article{he2021transfg,
  title={TransFG: A Transformer Architecture for Fine-grained Recognition},
  author={He, Ju and Chen, Jieneng and Liu, Shuai and Kortylewski, Adam and Yang, Cheng and Bai, Yutong and Wang, Changhu and Yuille, Alan},
  journal={arXiv preprint arXiv:2103.07976},
  year={2021}
}

Acknowledgement

Many thanks to ViT-pytorch for the PyTorch reimplementation of "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".

Comments
  • visualization code

    Thanks for your wonderful work! I ran into some problems when I tried to visualize the part attention patches as shown in your paper. Could you provide the visualization code? Thanks so much!

    opened by lao-ling-jie 3
  • Pip won't find requirements

    I'm trying to set up the environment but pip won't find the requirements. In a virtual environment with Python 3.7:

    $ pip install -r requirements.txt
    Defaulting to user installation because normal site-packages is not writeable
    ERROR: Could not find a version that satisfies the requirement torch==1.5.1 (from versions: 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0)
    ERROR: No matching distribution found for torch==1.5.1

    opened by DRM-Free 1
  • patch embeddings always 0?

    I was reading the paper and checking the code, and I can't see where you assign a value to the patch embeddings. When debugging, I only see that a zero tensor is created and then added during the forward pass. At what point are the position embeddings given a value? (A short sketch after this comment illustrates the mechanism.)

    Line 157 (https://github.com/TACJu/TransFG/blob/master/models/modeling.py#L157): self.position_embeddings = nn.Parameter(torch.zeros(1, n_patches+1, config.hidden_size))

    Line 173: embeddings = x + self.position_embeddings

    opened by dcastf01 1
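    For context, a minimal PyTorch sketch (not from this repo) of the mechanism in question: a zero-initialized nn.Parameter is registered as a learnable weight, so it receives gradients and moves away from zero as training updates it.

    import torch
    import torch.nn as nn

    # Zero-initialized position embeddings, as in modeling.py line 157.
    # nn.Parameter registers the tensor as a trainable weight.
    pos_emb = nn.Parameter(torch.zeros(1, 5, 8))

    x = torch.randn(1, 5, 8)                 # stand-in for the patch embeddings
    opt = torch.optim.SGD([pos_emb], lr=0.1)

    loss = ((x + pos_emb) ** 2).mean()       # any loss that touches pos_emb
    loss.backward()
    opt.step()

    print(pos_emb.abs().sum())               # non-zero after one update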
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this project's lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • About the training details

    First of all, thank you for your work, which has benefited me a lot.

    After several attempts, I can only obtain 91% accuracy on CUB. Could you provide the model parameters and training details for the reported 91.7% accuracy? Thank you very much if you reply.

    opened by shiyan-cui 0
  • Would you like to open source the implementation based on [DeiT] pretrained on ImageNet-1K with distillation fine-tuning?

    There is a sentence on the project page that says, "Implementation based on DeiT pretrained on ImageNet-1K with distillation fine-tuning will be released soon". It would be great if you still plan to open source the implementation based on [DeiT] pretrained on ImageNet-1K. Thank you! I am looking forward to your reply.

    opened by Anyway2022 0
  • About Stanford Dogs accuracy

    Hi, could you release your training settings for the Stanford Dogs dataset? I set the lr to 3e-3 and did not change other settings; however, the model is underfitting. I only get 1.7% accuracy after 200k steps.

    opened by EdwinKuo1337 2
Owner
Ju He
I'm a first-year PhD student at Johns Hopkins University, where my advisor is Bloomberg Distinguished Professor Alan L. Yuille.
[Preprint] "Chasing Sparsity in Vision Transformers: An End-to-End Exploration" by Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang

Chasing Sparsity in Vision Transformers: An End-to-End Exploration Codes for [Preprint] Chasing Sparsity in Vision Transformers: An End-to-End Explora

VITA 64 Dec 8, 2022
[CVPR 2022] "The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy" by Tianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, Zhangyang Wang

The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy Codes for this paper: [CVPR 2022] The Pr

VITA 16 Nov 26, 2022
[CVPRW 21] "BNN - BN = ? Training Binary Neural Networks without Batch Normalization", Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen, Zhangyang Wang

BNN - BN = ? Training Binary Neural Networks without Batch Normalization Codes for this paper BNN - BN = ? Training Binary Neural Networks without Bat

VITA 40 Dec 30, 2022
[Preprint] "Bag of Tricks for Training Deeper Graph Neural Networks A Comprehensive Benchmark Study" by Tianlong Chen*, Kaixiong Zhou*, Keyu Duan, Wenqing Zheng, Peihao Wang, Xia Hu, Zhangyang Wang

Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study Codes for [Preprint] Bag of Tricks for Training Deeper Graph

VITA 101 Dec 29, 2022
[ICCV'2021] "SSH: A Self-Supervised Framework for Image Harmonization", Yifan Jiang, He Zhang, Jianming Zhang, Yilin Wang, Zhe Lin, Kalyan Sunkavalli, Simon Chen, Sohrab Amirghodsi, Sarah Kong, Zhangyang Wang

SSH: A Self-Supervised Framework for Image Harmonization (ICCV 2021) code for SSH Representative Examples Main Pipeline RealHM DataSet Google Drive Pr

VITA 86 Dec 2, 2022
[ICML 2021] "Graph Contrastive Learning Automated" by Yuning You, Tianlong Chen, Yang Shen, Zhangyang Wang

Graph Contrastive Learning Automated PyTorch implementation for Graph Contrastive Learning Automated [talk] [poster] [appendix] Yuning You, Tianlong C

Shen Lab at Texas A&M University 80 Nov 23, 2022
This is the code for the paper "Jinkai Zheng, Xinchen Liu, Wu Liu, Lingxiao He, Chenggang Yan, Tao Mei: Gait Recognition in the Wild with Dense 3D Representations and A Benchmark. (CVPR 2022)"

Gait3D-Benchmark This is the code for the paper "Jinkai Zheng, Xinchen Liu, Wu Liu, Lingxiao He, Chenggang Yan, Tao Mei: Gait Recognition in the Wild

null 82 Jan 4, 2023
[ICLR 2021] "Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective" by Wuyang Chen, Xinyu Gong, Zhangyang Wang

Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective [PDF] Wuyang Chen, Xinyu Gong, Zhangyang Wang In ICLR 2

VITA 156 Nov 28, 2022
[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Undistillable: Making A Nasty Teacher That CANNOT teach students "Undistillable: Making A Nasty Teacher That CANNOT teach students" Haoyu Ma, Tianlong

VITA 71 Dec 28, 2022
[ICML 2021] “ Self-Damaging Contrastive Learning”, Ziyu Jiang, Tianlong Chen, Bobak Mortazavi, Zhangyang Wang

Self-Damaging Contrastive Learning Introduction The recent breakthrough achieved by contrastive learning accelerates the pace for deploying unsupervis

VITA 51 Dec 29, 2022
"SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image", Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Humphrey Shi, Zhangyang Wang

SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image [Paper] [Website] Pipeline Code Environment pip install -r requirements

VITA 250 Jan 5, 2023
PyTorch implementation of our Adam-NSCL algorithm from our CVPR2021 (oral) paper "Training Networks in Null Space for Continual Learning"

Adam-NSCL This is a PyTorch implementation of Adam-NSCL algorithm for continual learning from our CVPR2021 (oral) paper: Title: Training Networks in N

Shipeng Wang 34 Dec 21, 2022
How Do Adam and Training Strategies Help BNNs Optimization? In ICML 2021.

AdamBNN This is the pytorch implementation of our paper "How Do Adam and Training Strategies Help BNNs Optimization?", published in ICML 2021. In this

Zechun Liu 47 Sep 20, 2022
Ranger - a synergistic optimizer using RAdam (Rectified Adam), Gradient Centralization and LookAhead in one codebase

Ranger-Deep-Learning-Optimizer Ranger - a synergistic optimizer combining RAdam (Rectified Adam) and LookAhead, and now GC (gradient centralization) i

Less Wright 1.1k Dec 21, 2022
Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation", Haoxiang Wang, Han Zhao, Bo Li.

Bridging Multi-Task Learning and Meta-Learning Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Trainin

AI Secure 57 Dec 15, 2022
Y. Zhang, Q. Yao, W. Dai, L. Chen. AutoSF: Searching Scoring Functions for Knowledge Graph Embedding. IEEE International Conference on Data Engineering (ICDE). 2020

AutoSF The code for our paper "AutoSF: Searching Scoring Functions for Knowledge Graph Embedding" and this paper has been accepted by ICDE2020. News:

AutoML Research 64 Dec 17, 2022
SuMa++: Efficient LiDAR-based Semantic SLAM (Chen et al IROS 2019)

SuMa++: Efficient LiDAR-based Semantic SLAM This repository contains the implementation of SuMa++, which generates semantic maps only using three-dime

Photogrammetry & Robotics Bonn 701 Dec 30, 2022
[ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin

CPT: Efficient Deep Neural Network Training via Cyclic Precision Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin Accep

null 26 Oct 25, 2022
Official PyTorch implementation for paper Context Matters: Graph-based Self-supervised Representation Learning for Medical Images

Context Matters: Graph-based Self-supervised Representation Learning for Medical Images Official PyTorch implementation for paper Context Matters: Gra

null 49 Nov 23, 2022