A Novel Plug-in Module for Fine-grained Visual Classification

ChouPoYung

Last update: Dec 20, 2022

Related tags

Deep Learning resnet fine-grained-visual-categorization efficientnet fgvc vision-transformer swin-transformer

Overview

A Novel Plug-in Module for Fine-grained Visual Classification

paper url: https://arxiv.org/abs/2202.03822

We propose a novel plug-in module that can be integrated into many common backbones, including CNN-based and Transformer-based networks, to provide strongly discriminative regions. The plug-in module outputs pixel-level feature maps and fuses filtered features to enhance fine-grained visual classification. Experimental results show that the proposed plug-in module outperforms state-of-the-art approaches and improves accuracy to 92.77% on CUB200-2011 and 92.83% on NABirds.

1. Environment setting

Install the requirements.
Replace the installed timm/ folder with our timm/ folder (for ViT or Swin-T).

Prepare dataset

In this paper, we use two large bird datasets: CUB200-2011 and NABirds.

Our pretrained model

Download the pretrained model from this url: https://drive.google.com/drive/folders/1ivMJl4_EgE-EVU_5T8giQTwcNQ6RPtAo?usp=sharing

backup/ is the path to our pretrained models.
resnet50_miil_21k.pth and vit_base_patch16_224_miil_21k.pth are ImageNet-21K pretrained models (place these files under models/). Thanks to https://github.com/Alibaba-MIIL/ImageNet21K/blob/main/MODEL_ZOO.md
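
A quick check before training can catch a misplaced weights file. The sketch below only assumes that the two ImageNet-21K files named above are expected under models/; the helper name missing_weights is hypothetical and not part of this repository.

```python
from pathlib import Path

# File names are taken from the text above; the helper itself is illustrative.
EXPECTED_WEIGHTS = (
    "resnet50_miil_21k.pth",
    "vit_base_patch16_224_miil_21k.pth",
)


def missing_weights(models_dir="models", expected=EXPECTED_WEIGHTS):
    """Return the expected weight files that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing under models/:", ", ".join(missing))
    else:
        print("All ImageNet-21K weight files found.")
```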

OS

Windows10
Ubuntu20.04
macOS

2. Train

configuration file: config.py

python train.py --train_root "./CUB200-2011/train/" --val_root "./CUB200-2011/test/"

3. Evaluation

configuration file: config_eval.py

python eval.py --pretrained_path "./backup/CUB200/best.pth" --val_root "./CUB200-2011/test/"

4. Visualization

configuration file: config_plot.py

python plot_heat.py --pretrained_path "./backup/CUB200/best.pth" --img_path "./img/001.png"

Acknowledgment

Thanks to timm for the PyTorch implementation.
This work was financially supported by National Taiwan Normal University (NTNU) within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan, sponsored by the Ministry of Science and Technology, Taiwan, R.O.C. under Grant nos. MOST 110-2221-E-003-026, 110-2634-F-003-007, and 110-2634-F-003-006. In addition, we thank the National Center for High-performance Computing (NCHC) for providing computational and storage resources.

Comments

How to train on CUB

Thanks for your great job! I have browsed your code and I found that the data is read in through the ImageDataset class, but it does not seem to fit the original format of the CUB dataset. Did you change the format of the original CUB dataset when you performed your experiments, and if so, can you please tell me how you did it?
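
For readers with the same question, one possible conversion is sketched here. It relies on the index files that ship with the official CUB_200_2011 archive (images.txt and train_test_split.txt) and assumes, without confirmation from the authors, that the training script expects one sub-folder per class under train/ and test/. The function names split_cub and read_pairs are hypothetical.

```python
import shutil
from pathlib import Path


def read_pairs(path):
    """Parse a CUB index file whose lines look like '<image_id> <value>'."""
    pairs = {}
    for line in Path(path).read_text().splitlines():
        if line.strip():
            key, value = line.split(maxsplit=1)
            pairs[key] = value.strip()
    return pairs


def split_cub(cub_root, out_root):
    """Copy images into out_root/train/<class>/ and out_root/test/<class>/."""
    cub_root, out_root = Path(cub_root), Path(out_root)
    rel_paths = read_pairs(cub_root / "images.txt")        # id -> class_dir/file
    is_train = read_pairs(cub_root / "train_test_split.txt")  # id -> "1" or "0"
    counts = {"train": 0, "test": 0}
    for image_id, rel_path in rel_paths.items():
        subset = "train" if is_train[image_id] == "1" else "test"
        target = out_root / subset / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(cub_root / "images" / rel_path, target)
        counts[subset] += 1
    return counts


# Example call (paths are placeholders):
# split_cub("./CUB_200_2011", "./CUB200-2011")
```

The resulting ./CUB200-2011/train/ and ./CUB200-2011/test/ folders match the paths used in the training command above.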

opened by JingjunYi 4

Questions regarding inference

Hi, I am running a test on your repo with the Stanford Dogs dataset, which has 120 breeds. The model trained really well, but I am a little confused by your inference pipeline. I just want to run inference on a single image, so I am referring to your eval.py and plot_heat.py at the moment.

Your eval.py seems to be calling SwinVit12, but plot_heat.py seems to be calling SwinVit12_demo. Is there a difference between the two?

Just tried running eval.py and I am getting:

RuntimeError: Error(s) in loading state_dict for SwinVit12:
        size mismatch for gcn.adj1: copying a param with shape torch.Size([85, 85]) from checkpoint, the shape in current model is torch.Size([15, 15]).
        size mismatch for gcn.pool1.weight: copying a param with shape torch.Size([85, 2720]) from checkpoint, the shape in current model is torch.Size([15, 480]).
        size mismatch for gcn.pool1.bias: copying a param with shape torch.Size([85]) from checkpoint, the shape in current model is torch.Size([15]).
        size mismatch for gcn.pool4.weight: copying a param with shape torch.Size([1, 85]) from checkpoint, the shape in current model is torch.Size([1, 15]).

Seems like something is not configured properly on my end.
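
When debugging an error like this, it can help to list every parameter whose shape differs between the checkpoint and the freshly built model. The sketch below works on plain name-to-shape dictionaries, filled here with the shapes quoted in the error message; the helper name shape_mismatches is hypothetical. Note that 2720 / 85 and 480 / 15 both equal 32, which suggests (as an assumption, not a confirmed diagnosis) that the checkpoint and the current model were built with different numbers of selected features.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} where the shapes differ."""
    return {
        name: (tuple(saved), tuple(model_shapes[name]))
        for name, saved in checkpoint_shapes.items()
        if name in model_shapes and tuple(saved) != tuple(model_shapes[name])
    }


# Shapes copied from the error message above.
checkpoint = {
    "gcn.adj1": (85, 85),
    "gcn.pool1.weight": (85, 2720),
    "gcn.pool1.bias": (85,),
    "gcn.pool4.weight": (1, 85),
}
current = {
    "gcn.adj1": (15, 15),
    "gcn.pool1.weight": (15, 480),
    "gcn.pool1.bias": (15,),
    "gcn.pool4.weight": (1, 15),
}

for name, (saved, built) in shape_mismatches(checkpoint, current).items():
    print(f"{name}: checkpoint {saved} vs model {built}")

# With PyTorch, such dictionaries can be built from real objects, e.g.
#   {k: tuple(v.shape) for k, v in model.state_dict().items()}
```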

opened by chophilip21 4

Swin-T and Resolution

Hi, thanks for your excellent work. I have a question: I can only find the Swin-T model pretrained on ImageNet-1K at resolution 224. Can you provide a download link for the pretrained Swin-T model used in the paper?

opened by xiang-jian-wen 3

Why is the output a dict and not a tensor?

Why is the output of line 331 of models/pim_module/pim_module.py a dict and not a tensor? When I test it, I get an error. I use Swin-T as the backbone. Can you help me?

opened by bf0724 1

How to train the model on my own dataset?

Thanks for your code, it is really a nice work! However, I found multiple troubles when adapting the code to my own data. I have already: 1) changed the class number in the config.py and set the args in CMD; 2) the inputted image size has been changed following former issues in this Github page.

However, after changing these two aspects, the results are still misleading and confusing. So, may I ask, if using the model on one's own dataset, how many issues do I need to change?
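
One thing worth verifying in this situation is that the class count in the config agrees with the data on disk. The sketch below assumes a one-sub-folder-per-class layout under the train and validation roots, which is an assumption about this repository rather than something stated on this page; class_names and check_dataset are hypothetical helper names.

```python
from pathlib import Path


def class_names(root):
    """Return the sorted names of the immediate sub-folders of root."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())


def check_dataset(train_root, val_root):
    """Summarise class folders so num_classes can be set consistently."""
    train = set(class_names(train_root))
    val = set(class_names(val_root))
    return {
        "num_classes": len(train),
        "only_in_train": sorted(train - val),
        "only_in_val": sorted(val - train),
    }


# Example call (paths are placeholders):
# print(check_dataset("./my_data/train/", "./my_data/val/"))
```

If only_in_train or only_in_val is non-empty, the two splits disagree on the label set, which would make validation accuracy hard to interpret.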

opened by BiQiWHU 1

How to use this code for inference?

I checked the code carefully. The loss calculation is written inside the model, so after initialization the model must be given the labels. There is no code path designed for inference, and after end-to-end training a series of post-processing steps is applied, including feature concatenation and feature fusion, each of which is then classified separately. If I do not know the ground truth, how do I choose the best result? Is this code written only to chase benchmark rankings? Where is the logic for practical use?
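
One way to predict without ground truth is to commit in advance to a single output head and take its arg-max. The sketch below assumes the model returns a dictionary mapping head names to per-class logits and that a fused head is stored under the name "comb_outs"; that name is borrowed from a variable visible in a traceback elsewhere on this page, and whether it is the actual dictionary key is an assumption. The helper predict is hypothetical.

```python
def predict(outputs, head="comb_outs"):
    """Return the index of the largest logit in the chosen output head."""
    logits = outputs[head]
    return max(range(len(logits)), key=logits.__getitem__)


# Toy logits for a 3-class problem; real outputs would be tensors.
outputs = {
    "layer1": [0.2, 0.1, 0.4],
    "comb_outs": [0.1, 2.3, -0.5],
}
print(predict(outputs))            # uses the fused head
print(predict(outputs, "layer1"))  # uses one intermediate head instead
```

Choosing the head once, on a validation set, keeps the test-time decision independent of the labels.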

opened by Bin-ze 1

How to solve the maximum recursion depth error?

This is a great job, but I ran on my own dataset and encountered the following errors.

Start Training 3 Epoch..0%..10%..20%
Start Evaluating 1 Epoch
Start Training 2 Epoch
Start Training 3 Epoch
Traceback (most recent call last):
  File "main.py", line 297, in <module>
    main(args, tlogger)
  File "main.py", line 249, in main
    train(args, epoch, model, scaler, amp_context, optimizer, schedule, train_loader)
  File "main.py", line 136, in train
    outs = model(datas)
  File "/DATA/sgwei/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/DATA/sgwei/code/Fine_grained_PIM/models/pim_module/pim_module.py", line 404, in forward
    x = self.forward_backbone(x)
  File "/DATA/sgwei/code/Fine_grained_PIM/models/pim_module/pim_module.py", line 383, in forward_backbone
    return self.backbone(x)
  File "/DATA/sgwei/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/fx/graph_module.py", line 616, in wrapped_call
    raise e.with_traceback(None)
RecursionError: maximum recursion depth exceeded while calling a Python object

And this is my config:

batch_size 16
c "./configs/classification_vit.yaml"
data_size 448
device "cuda:0"
eval_freq 10
exp_name "T3000"
fpn_size 512
lambda_b 0.5
lambda_c 1
lambda_n 5
lambda_s 0
log_freq 100
max_epochs 20
max_lr 0.0005
model_name "vit"
num_classes 5
num_selects
    layer1 512
    layer2 256
    layer3 128
    layer4 64
num_workers 8
optimizer "SGD"
pretrained desc null value null
project_name "pim_classificationpip"
save_dir "./records/pim_classificationpip/T3000/"
train_root "/DATA/sgwei/Datasets/DataBase_v2/train/"
update_freq 2
use_amp true
use_combiner true
use_fpn true
use_selection true
use_wandb true
val_root "/DATA/sgwei/Datasets/MultiLesionClassify_DataBase_v2/val/"
wandb_entity "sgwei"
warmup_batchs 800
wdecay 0.0005

opened by sunlight002 0

ViT input size: transform 224 to another

Dear author, I saw a comment in your code saying "Vit model input can transform 224 to another, we use linear", but I do not know how to use it. I tried using 384*384 as my input size directly, but I got a tensor shape mismatch error, so I had to resize my input data. The same issue occurs when I run on my test data. How do I use the 224-to-another-size transform? Or do I just need to add a linear layer before feeding in my dataset?

Thanks
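
As background for this question: a ViT splits the image into fixed-size patches, so the number of tokens (and therefore the number of position embeddings required) changes with the input size. The sketch below shows the arithmetic, assuming a patch size of 16 as suggested by the model name vit_base_patch16_224; it ignores the class token, and num_patch_tokens is a hypothetical helper, not a function from this repository.

```python
def num_patch_tokens(image_size, patch_size=16):
    """Number of patch tokens for a square image split into square patches."""
    if image_size % patch_size:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size
    return side * side


for size in (224, 384):
    print(size, num_patch_tokens(size))
```

A model whose position embeddings were learned for a 14 x 14 grid therefore cannot accept a 24 x 24 grid unless those embeddings are resized, for example by interpolation.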

opened by Chaoran-F 0

Fixed the error reporting problem when the code performs result verification

When running

python main.py --c ./configs/eval.yaml

the code fails.

I solved this problem and added eval.yaml. To verify the result, set the following in the YAML:

pretrained: your model weight path

Then:

python main.py --c ./configs/eval.yaml

gives the correct result.

opened by Bin-ze 0

How to run HeatMap with your best pretrained NABirds model?

Thank you so much for the beautiful code. I'm trying to use your pretrained model best.pt on NABirds dataset.

First, I set the PATH to the pretrained model in NABirds_SwinT.yaml then I run:

python heat.py --c ./configs/NABirds_SwinT.yaml --img ./vis/001.jpg --save_img ./vis/001/

But I get errors:

Building...
Traceback (most recent call last):
  File "C:\Users\xxxxxx\Desktopxxxxxx\heat.py", line 100, in <module>
    model.load_state_dict(checkpoint['model'])
  File "C:\Users\xxxxxx\anaconda3\envs\xxxxxxx\lib\site-packages\torch\nn\modules\module.py", line 1667, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PluginMoodel:
        Unexpected key(s) in state_dict: "combiner.conv_qk1.weight", "combiner.conv_qk1.bias".
        size mismatch for combiner.adj1: copying a param with shape torch.Size([85, 85]) from checkpoint, the shape in current model is torch.Size([15, 15]).
        size mismatch for combiner.param_pool0.weight: copying a param with shape torch.Size([85, 2720]) from checkpoint, the shape in current model is torch.Size([15, 480]).
        size mismatch for combiner.param_pool0.bias: copying a param with shape torch.Size([85]) from checkpoint, the shape in current model is torch.Size([15]).
        size mismatch for combiner.param_pool1.weight: copying a param with shape torch.Size([1, 85]) from checkpoint, the shape in current model is torch.Size([1, 15]).

It seems that your best.pt model does not use the default SwinT configuration: num_selects does not match, and there are unexpected keys in your best.pt model: "combiner.conv_qk1.weight", "combiner.conv_qk1.bias".

I wish to know what modification I should make to load your pretrained best.pt.

opened by LanceBao0313 1

RuntimeError: mat1 and mat2 shapes cannot be multiplied (6144x1456 and 2720x85)

When I select EfficientNet for training I get the following error; only Swin Transformer does not report it. Can you help me?

Start Training 1 Epoch
Traceback (most recent call last):
  File "D:/hxy/FGVC-PIM-master/main.py", line 301, in <module>
    main(args, tlogger)
  File "D:/hxy/FGVC-PIM-master/main.py", line 253, in main
    train(args, epoch, model, scaler, amp_context, optimizer, schedule, train_loader)
  File "D:/hxy/FGVC-PIM-master/main.py", line 140, in train
    outs = model(datas)
  File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\_utils.py", line 457, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\hxy\FGVC-PIM-master\models\pim_module\pim_module.py", line 414, in forward
    comb_outs = self.combiner(selects)
  File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\hxy\FGVC-PIM-master\models\pim_module\pim_module.py", line 81, in forward
    hs = self.param_pool0(hs)
  File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (6144x1456 and 2720x85)
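
A note on reading this message: a linear layer stores its weight as (out_features, in_features) and multiplies the input by the transpose, so "2720x85" describes a layer that expects 2720 input features and produces 85, while the tensor that reached it has 1456 features in its last dimension. The sketch below restates that check in plain Python; linear_shape_report is a hypothetical helper, and it does not identify why the EfficientNet path yields 1456 features.

```python
def linear_shape_report(input_shape, weight_shape):
    """Say whether an input fits a linear layer with weight (out, in)."""
    out_features, in_features = weight_shape
    got = input_shape[-1]
    if got == in_features:
        return f"ok: {got} input features -> {out_features} outputs"
    return f"mismatch: layer expects {in_features} input features, got {got}"


# mat1 is the input (6144 x 1456); mat2 (2720 x 85) is the transposed
# weight, so the stored weight shape is (85, 2720).
print(linear_shape_report((6144, 1456), (85, 2720)))
```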

opened by smallzhu 4

Owner

ChouPoYung

NTNUEE AIoT Lab.

GitHub

Code release for The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification (TIP 2020)

The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification Code release for The Devil is in the Channels: Mutual-Channel

230 Dec 31, 2022

PyTorch implementation of Weak-shot Fine-grained Classification via Similarity Transfer

SimTrans-Weak-Shot-Classification This repository contains the official PyTorch implementation of the following paper: Weak-shot Fine-grained Classifi

60 Dec 2, 2022

Weakly Supervised Posture Mining with Reverse Cross-entropy for Fine-grained Classification

Fine-grainedImageClassification Weakly Supervised Posture Mining with Reverse Cross-entropy for Fine-grained Classification We trained model here: lin

14 Oct 21, 2022

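Attention modules and other plug-and-play modules used in computer vision: PyTorch Implementation Collection of Attention Module and Plug&Play Module

PyTorch implementations of various attention mechanisms used in network design for computer vision, together with a collection of plug-and-play modules. Because of limited ability and time, many modules may not be included; suggestions or improvements are welcome via an issue or a PR.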

599 Dec 23, 2022

This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

TransFG: A Transformer Architecture for Fine-grained Recognition Official PyTorch code for the paper: TransFG: A Transformer Architecture for Fine-gra

307 Jan 3, 2023

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose Yijun Zhou and James Gregson - BMVC2020 Abstract: We present an end-to-end head-pos

368 Dec 26, 2022

Code and data of the Fine-Grained R2R Dataset proposed in paper Sub-Instruction Aware Vision-and-Language Navigation

Fine-Grained R2R Code and data of the Fine-Grained R2R Dataset proposed in the EMNLP2020 paper Sub-Instruction Aware Vision-and-Language Navigation. C

34 Nov 15, 2022

The code and data for "Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach" (ACL '21)

We propose a hierarchical core-fringe learning framework to measure fine-grained domain relevance of terms – the degree that a term is relevant to a broad (e.g., computer science) or narrow (e.g., deep learning) domain.

14 Oct 21, 2022

The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding, by Chuhan Zhang, Ankush Gupta and Andrew Zisserman.

Temporal Query Networks for Fine-grained Video Understanding This repository contains the implementation of CVPR2021 paper Temporal_Query_Networks

55 Dec 21, 2022

PyTorch implementation for Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition.

Stochastic CSLR This is the PyTorch implementation for the ECCV 2020 paper: Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuou

28 Dec 19, 2022

Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

Talk-to-Edit (ICCV2021) This repository contains the implementation of the following paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog Yumin

221 Jan 7, 2023

official Pytorch implementation of ICCV 2021 paper FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting By Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu

77 Dec 27, 2022

Official PyTorch implementation of N-ImageNet: Towards Robust, Fine-Grained Object Recognition with Event Cameras (ICCV 2021)

N-ImageNet: Towards Robust, Fine-Grained Object Recognition with Event Cameras Official PyTorch implementation of N-ImageNet: Towards Robust, Fine-Gra

32 Dec 26, 2022

SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data (AAAI 2021)

SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data (AAAI 2021) PyTorch implementation of SnapMix | paper Method Overview Cite

126 Dec 30, 2022

Official pytorch code for SSC-GAN: Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation(ICCV 2021)

SSC-GAN_repo Pytorch implementation for 'Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation'.PDF SSC-GAN:Sem

4 Aug 28, 2022

Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation

Towards Fine-Grained Reasoning for Fake News Detection

FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation