A Novel Plug-in Module for Fine-grained Visual Classification

Overview

paper url: https://arxiv.org/abs/2202.03822

We propose a novel plug-in module that can be integrated into many common backbones, including CNN-based and Transformer-based networks, to provide strongly discriminative regions. The plug-in module can output pixel-level feature maps and fuse filtered features to enhance fine-grained visual classification. Experimental results show that the proposed plug-in module outperforms state-of-the-art approaches and significantly improves the accuracy to 92.77% and 92.83% on CUB200-2011 and NABirds, respectively.

1. Environment setting

  • install requirements
  • replace the installed timm/ folder with our timm/ folder (for ViT or Swin-T)

Prepare dataset

In this paper, we use two large bird datasets: CUB200-2011 and NABirds.
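
The training and evaluation commands in this README point at ./CUB200-2011/train/ and ./CUB200-2011/test/, whereas the official CUB-200-2011 archive ships a single images/ folder together with images.txt and train_test_split.txt. The sketch below shows one possible way to split the official layout into those two folders. It assumes, without confirmation from this repository, that the data loader expects one subfolder per class; split_cub and its parameters are hypothetical names used only for illustration.

```python
import shutil
from pathlib import Path


def split_cub(cub_root, out_root):
    """Copy images from the official CUB-200-2011 layout into train/ and test/.

    Assumed (hypothetical) output layout: out_root/{train,test}/<class_folder>/<image>.
    Returns the number of images copied into each split.
    """
    cub_root, out_root = Path(cub_root), Path(out_root)

    # images.txt: "<image_id> <class_folder>/<file_name>"
    id_to_path = {}
    for line in (cub_root / "images.txt").read_text().splitlines():
        if line.strip():
            image_id, rel_path = line.split(maxsplit=1)
            id_to_path[image_id] = rel_path.strip()

    counts = {"train": 0, "test": 0}
    # train_test_split.txt: "<image_id> <is_training_image>" (1 = train, 0 = test)
    for line in (cub_root / "train_test_split.txt").read_text().splitlines():
        if not line.strip():
            continue
        image_id, is_train = line.split()
        split = "train" if is_train == "1" else "test"
        src = cub_root / "images" / id_to_path[image_id]
        dst = out_root / split / id_to_path[image_id]
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        counts[split] += 1
    return counts
```

A call such as split_cub("./CUB_200_2011", "./CUB200-2011") would then produce the train/ and test/ folders used by the commands in the Train and Evaluation sections, provided the folder-per-class assumption holds.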

Our pretrained model

Download the pretrained model from this url: https://drive.google.com/drive/folders/1ivMJl4_EgE-EVU_5T8giQTwcNQ6RPtAo?usp=sharing

OS

  • Windows10
  • Ubuntu20.04
  • macOS

2. Train

configuration file: config.py

python train.py --train_root "./CUB200-2011/train/" --val_root "./CUB200-2011/test/"

3. Evaluation

configuration file: config_eval.py

python eval.py --pretrained_path "./backup/CUB200/best.pth" --val_root "./CUB200-2011/test/"
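
If the checkpoint was trained with a different configuration than the model being built, loading it fails with size-mismatch errors like the ones quoted in the comments further down. The sketch below uses only the standard library to parse such a message and list, for each parameter, the shape stored in the checkpoint and the shape the current model expects. parse_size_mismatches is a hypothetical helper, not part of this repository. In the reported errors 2720 / 85 = 32 and 480 / 15 = 32, which is consistent with (but does not prove) a difference in the total number of selected features, i.e. the num_selects configuration.

```python
import re

# Matches one "size mismatch" entry of a PyTorch load_state_dict error message.
_PATTERN = re.compile(
    r"size mismatch for (?P<name>[\w.]+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]*)\]\) from checkpoint, the shape in "
    r"current model is torch\.Size\(\[(?P<model>[\d, ]*)\]\)"
)


def _to_shape(text):
    """Turn "85, 2720" into the tuple (85, 2720)."""
    return tuple(int(v) for v in text.split(",") if v.strip())


def parse_size_mismatches(message):
    """Return {param_name: (checkpoint_shape, model_shape)} from an error message."""
    return {
        m.group("name"): (_to_shape(m.group("ckpt")), _to_shape(m.group("model")))
        for m in _PATTERN.finditer(message)
    }


if __name__ == "__main__":
    error = (
        "size mismatch for gcn.adj1: copying a param with shape "
        "torch.Size([85, 85]) from checkpoint, the shape in current model is "
        "torch.Size([15, 15]).\n"
        "size mismatch for gcn.pool1.weight: copying a param with shape "
        "torch.Size([85, 2720]) from checkpoint, the shape in current model is "
        "torch.Size([15, 480])."
    )
    for name, (ckpt, model) in parse_size_mismatches(error).items():
        print(f"{name}: checkpoint {ckpt} vs current model {model}")
```

Comparing the two columns shows which configuration values to align with the checkpoint before calling eval.py again.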

4. Visualization

configuration file: config_plot.py

python plot_heat.py --pretrained_path "./backup/CUB200/best.pth" --img_path "./img/001.png"

Acknowledgment

  • Thanks to timm for the PyTorch implementation.

  • This work was financially supported by the National Taiwan Normal University (NTNU) within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan, sponsored by the Ministry of Science and Technology, Taiwan, R.O.C. under Grant nos. MOST 110-2221-E-003-026, 110-2634-F-003-007, and 110-2634-F-003-006. In addition, we thank the National Center for High-performance Computing (NCHC) for providing computational and storage resources.

Comments
  • How to train on CUB

    Thanks for your great job! I have browsed your code and I found that the data is read in through the ImageDataset class, but it does not seem to fit the original format of the CUB dataset. Did you change the format of the original CUB dataset when you performed your experiments, and if so, can you please tell me how you did it?

    opened by JingjunYi 4
  • Questions regarding inference

    Hi, I am running a test on your repo with the Stanford Dogs dataset, which has 120 breeds. The model trained really well, but I am a little confused by your inference pipeline. I just want to run inference on a single image, so I am referring to your eval.py and plot_heat.py at the moment.

    Your eval.py seems to be calling SwinVit12, but plot_heat.py seems to be calling SwinVit12_demo. Is there a difference between the two?

    Just tried running eval.py and I am getting:

    RuntimeError: Error(s) in loading state_dict for SwinVit12:
            size mismatch for gcn.adj1: copying a param with shape torch.Size([85, 85]) from checkpoint, the shape in current model is torch.Size([15, 15]).
            size mismatch for gcn.pool1.weight: copying a param with shape torch.Size([85, 2720]) from checkpoint, the shape in current model is torch.Size([15, 480]).
            size mismatch for gcn.pool1.bias: copying a param with shape torch.Size([85]) from checkpoint, the shape in current model is torch.Size([15]).
            size mismatch for gcn.pool4.weight: copying a param with shape torch.Size([1, 85]) from checkpoint, the shape in current model is torch.Size([1, 15]).
    

    Seems like something is not configured properly on my end.

    opened by chophilip21 4
  • Swin-T and Resolution

    Hi, thanks for your excellent work. I have a question: I can only find the Swin-T model pretrained on ImageNet-1k at resolution 224. Can you provide a download link for the pretrained Swin-T model used in the paper?

    opened by xiang-jian-wen 3
  • Why the output is a dict and not a tensor?

    Why is the output of line 331 of models/pim_module/pim_module.py a dict and not a tensor? That is [image]. When I test it, the error is [image]. I use Swin-T as the backbone. Can you help me?

    opened by bf0724 1
  • How to train the model on my own dataset?

    Thanks for your code, it is really nice work! However, I ran into several problems when adapting the code to my own data. I have already: 1) changed the class number in config.py and set the args on the command line; 2) changed the input image size following earlier issues on this GitHub page.

    However, after changing these two aspects, the results are still misleading and confusing. So, may I ask: when using the model on one's own dataset, what else needs to be changed?

    opened by BiQiWHU 1
  • How to use this code when infer?

    I checked the code carefully. The loss calculation is written inside the model, so after the model is initialized, labels have to be passed to it. There is no code path designed for inference. After end-to-end training, a series of post-processing steps is used, including concatenating features and fusing features, and each output is then classified separately. If I don't know the ground truth, how do I choose the best result? Was this code written only to climb the leaderboard? Where is the logic for practical use?

    opened by Bin-ze 1
  • How to solve the maximum recursion depth error?

    This is a great job, but I ran on my own dataset and encountered the following errors.

    Start Training 3 Epoch..0%..10%..20%
    Start Evaluating 1 Epoch
    Start Training 2 Epoch
    Start Training 3 Epoch
    Traceback (most recent call last):
      File "main.py", line 297, in <module>
        main(args, tlogger)
      File "main.py", line 249, in main
        train(args, epoch, model, scaler, amp_context, optimizer, schedule, train_loader)
      File "main.py", line 136, in train
        outs = model(datas)
      File "/DATA/sgwei/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/DATA/sgwei/code/Fine_grained_PIM/models/pim_module/pim_module.py", line 404, in forward
        x = self.forward_backbone(x)
      File "/DATA/sgwei/code/Fine_grained_PIM/models/pim_module/pim_module.py", line 383, in forward_backbone
        return self.backbone(x)
      File "/DATA/sgwei/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/fx/graph_module.py", line 616, in wrapped_call
        raise e.with_traceback(None)
    RecursionError: maximum recursion depth exceeded while calling a Python object

    and this is my configs

    batch_size: 16
    c: "./configs/classification_vit.yaml"
    data_size: 448
    device: "cuda:0"
    eval_freq: 10
    exp_name: "T3000"
    fpn_size: 512
    lambda_b: 0.5
    lambda_c: 1
    lambda_n: 5
    lambda_s: 0
    log_freq: 100
    max_epochs: 20
    max_lr: 0.0005
    model_name: "vit"
    num_classes: 5
    num_selects:
      layer1: 512
      layer2: 256
      layer3: 128
      layer4: 64
    num_workers: 8
    optimizer: "SGD"
    pretrained:
      desc: null
      value: null
    project_name: "pim_classificationpip"
    save_dir: "./records/pim_classificationpip/T3000/"
    train_root: "/DATA/sgwei/Datasets/DataBase_v2/train/"
    update_freq: 2
    use_amp: true
    use_combiner: true
    use_fpn: true
    use_selection: true
    use_wandb: true
    val_root: "/DATA/sgwei/Datasets/MultiLesionClassify_DataBase_v2/val/"
    wandb_entity: "sgwei"
    warmup_batchs: 800
    wdecay: 0.0005

    opened by sunlight002 0
  • VIT input size transform 224 to another

    Dear author, I saw the comment in your code that says "Vit model input can transform 224 to another, we use linear", but I do not know how to use it. I tried to use 384*384 as my input size directly, but I got a tensor size mismatch error, so I had to resize my input data. The same issue occurs when I run on my test data. How do I use the 224-to-other-size transform? Or do I just need to add a linear layer before feeding in my data?

    Thanks

    opened by Chaoran-F 0
  • Fixed the error reporting problem when the code performs result verification

    When running

    python main.py --c ./configs/eval.yaml

    the code raises an error.

    I fixed this problem and added eval.yaml. To verify the results, set the following in the yaml:

    pretrained: your model weight path

    Then

    python main.py --c ./configs/eval.yaml

    gives the correct result.

    opened by Bin-ze 0
  • How to run HeatMap with your best pretrained NABirds model?

    Thank you so much for the beautiful code. I'm trying to use your pretrained model best.pt on NABirds dataset.

    First, I set the PATH to the pretrained model in NABirds_SwinT.yaml then I run:

    python heat.py --c ./configs/NABirds_SwinT.yaml --img ./vis/001.jpg --save_img ./vis/001/

    But I get errors:


    Building...
    Traceback (most recent call last):
      File "C:\Users\xxxxxx\Desktopxxxxxx\heat.py", line 100, in <module>
        model.load_state_dict(checkpoint['model'])
      File "C:\Users\xxxxxx\anaconda3\envs\xxxxxxx\lib\site-packages\torch\nn\modules\module.py", line 1667, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for PluginMoodel:
            Unexpected key(s) in state_dict: "combiner.conv_qk1.weight", "combiner.conv_qk1.bias".
            size mismatch for combiner.adj1: copying a param with shape torch.Size([85, 85]) from checkpoint, the shape in current model is torch.Size([15, 15]).
            size mismatch for combiner.param_pool0.weight: copying a param with shape torch.Size([85, 2720]) from checkpoint, the shape in current model is torch.Size([15, 480]).
            size mismatch for combiner.param_pool0.bias: copying a param with shape torch.Size([85]) from checkpoint, the shape in current model is torch.Size([15]).
            size mismatch for combiner.param_pool1.weight: copying a param with shape torch.Size([1, 85]) from checkpoint, the shape in current model is torch.Size([1, 15]).


    It seems that your best.pt model does not use the default SwinT configuration: num_selects does not match, and there are unexpected keys in your best.pt model: "combiner.conv_qk1.weight", "combiner.conv_qk1.bias".

    I wish to know what modification I should make to load your pretrained best.pt.

    opened by LanceBao0313 1
  • RuntimeError: mat1 and mat2 shapes cannot be multiplied (6144x1456 and 2720x85)

    When I select EfficientNet for training I get the following error; only Swin Transformer does not report it. Can you help me?

    Start Training 1 Epoch
    Traceback (most recent call last):
      File "D:/hxy/FGVC-PIM-master/main.py", line 301, in <module>
        main(args, tlogger)
      File "D:/hxy/FGVC-PIM-master/main.py", line 253, in main
        train(args, epoch, model, scaler, amp_context, optimizer, schedule, train_loader)
      File "D:/hxy/FGVC-PIM-master/main.py", line 140, in train
        outs = model(datas)
      File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 168, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\data_parallel.py", line 178, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 86, in parallel_apply
        output.reraise()
      File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\_utils.py", line 457, in reraise
        raise exception
    RuntimeError: Caught RuntimeError in replica 0 on device 0.
    Original Traceback (most recent call last):
      File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 61, in _worker
        output = module(*input, **kwargs)
      File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "D:\hxy\FGVC-PIM-master\models\pim_module\pim_module.py", line 414, in forward
        comb_outs = self.combiner(selects)
      File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "D:\hxy\FGVC-PIM-master\models\pim_module\pim_module.py", line 81, in forward
        hs = self.param_pool0(hs)
      File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
        return forward_call(*input, **kwargs)
      File "C:\Users\mj\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\linear.py", line 103, in forward
        return F.linear(input, self.weight, self.bias)
    RuntimeError: mat1 and mat2 shapes cannot be multiplied (6144x1456 and 2720x85)

    opened by smallzhu 4
Owner
ChouPoYung
NTNUEE AIoT Lab.
Code release for The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification (TIP 2020)

The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification Code release for The Devil is in the Channels: Mutual-Channel

PRIS-CV: Computer Vision Group 230 Dec 31, 2022
PyTorch implementation of Weak-shot Fine-grained Classification via Similarity Transfer

SimTrans-Weak-Shot-Classification This repository contains the official PyTorch implementation of the following paper: Weak-shot Fine-grained Classifi

BCMI 60 Dec 2, 2022
Weakly Supervised Posture Mining with Reverse Cross-entropy for Fine-grained Classification

Fine-grainedImageClassification Weakly Supervised Posture Mining with Reverse Cross-entropy for Fine-grained Classification We trained model here: lin

ZhenchaoTang 14 Oct 21, 2022
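PyTorch implementation collection of attention modules and other plug-and-play modules used in computer vision

PyTorch implementations of various attention mechanisms used in network design for computer vision, along with a collection of plug-and-play modules. Because of limited ability and time, many modules may not be included; if you have any suggestions or improvements, please open an issue or submit a PR.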

PJDong 599 Dec 23, 2022
This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

TransFG: A Transformer Architecture for Fine-grained Recognition Official PyTorch code for the paper: TransFG: A Transformer Architecture for Fine-gra

Ju He 307 Jan 3, 2023
WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose Yijun Zhou and James Gregson - BMVC2020 Abstract: We present an end-to-end head-pos

null 368 Dec 26, 2022
Code and data of the Fine-Grained R2R Dataset proposed in paper Sub-Instruction Aware Vision-and-Language Navigation

Fine-Grained R2R Code and data of the Fine-Grained R2R Dataset proposed in the EMNLP2020 paper Sub-Instruction Aware Vision-and-Language Navigation. C

YicongHong 34 Nov 15, 2022
The code and data for "Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach" (ACL '21)

We propose a hierarchical core-fringe learning framework to measure fine-grained domain relevance of terms – the degree that a term is relevant to a broad (e.g., computer science) or narrow (e.g., deep learning) domain.

Jie Huang 14 Oct 21, 2022
The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding, by Chuhan Zhang, Ankush Gupta and Andrew Zisserman.

Temporal Query Networks for Fine-grained Video Understanding This repository contains the implementation of CVPR2021 paper Temporal_Query_Networks

null 55 Dec 21, 2022
PyTorch implementation for Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition.

Stochastic CSLR This is the PyTorch implementation for the ECCV 2020 paper: Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuou

Zhe Niu 28 Dec 19, 2022
Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

Talk-to-Edit (ICCV2021) This repository contains the implementation of the following paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog Yumin

Yuming Jiang 221 Jan 7, 2023
official Pytorch implementation of ICCV 2021 paper FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting By Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu

null 77 Dec 27, 2022
Official PyTorch implementation of N-ImageNet: Towards Robust, Fine-Grained Object Recognition with Event Cameras (ICCV 2021)

N-ImageNet: Towards Robust, Fine-Grained Object Recognition with Event Cameras Official PyTorch implementation of N-ImageNet: Towards Robust, Fine-Gra

null 32 Dec 26, 2022
SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data (AAAI 2021)

SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data (AAAI 2021) PyTorch implementation of SnapMix | paper Method Overview Cite

DavidHuang 126 Dec 30, 2022
Official pytorch code for SSC-GAN: Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation(ICCV 2021)

SSC-GAN_repo Pytorch implementation for 'Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation'.PDF SSC-GAN:Sem

tyty 4 Aug 28, 2022
Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

Faster R-CNN pretrained on VisualGenome This repository modifies maskrcnn-benchmark for object detection and attribute prediction on VisualGenome data

Shizhe Chen 7 Apr 20, 2021
TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation

TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation Zhaoyun Yin, Pichao Wang, Fan Wang, Xianzhe Xu, Hanling Zhang, Hao Li

DamoCV 25 Dec 16, 2022
Towards Fine-Grained Reasoning for Fake News Detection

FinerFact This is the PyTorch implementation for the FinerFact model in the AAAI 2022 paper Towards Fine-Grained Reasoning for Fake News Detection (Ar

Ahren_Jin 15 Dec 15, 2022
FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation

FIRA is a learning-based commit message generation approach, which first represents code changes via fine-grained graphs and then learns to generate commit messages automatically.

Van 21 Dec 30, 2022