Conformer: Local Features Coupling Global Representations for Visual Recognition

Overview

Conformer: Local Features Coupling Global Representations for Visual Recognition (arXiv:2105.03889)

This repository is built upon the DeiT and timm codebases.

Usage

First, install PyTorch 1.7.0+, torchvision 0.8.1+, and pytorch-image-models (timm) 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
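
A quick way to confirm the environment before training, a minimal sketch (the exact versions only need to satisfy the constraints above):

python -c "import torch, torchvision, timm; print(torch.__version__, torchvision.__version__, timm.__version__)"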

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure is the standard layout expected by torchvision's datasets.ImageFolder: the training and validation data should be in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
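
If you want to sanity-check the layout before training, torchvision's ImageFolder can be pointed at the train/ folder directly; a minimal sketch, assuming the /path/to/imagenet/ root shown above:

from torchvision import datasets

# ImageFolder infers one label per sub-folder (class1, class2, ...)
train_set = datasets.ImageFolder('/path/to/imagenet/train')
print(len(train_set.classes), 'classes,', len(train_set), 'images')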

Training

To train Conformer-S on ImageNet on a single node with 8 GPUs for 300 epochs, run:

Conformer-S

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
OUTPUT='./output/Conformer_small_patch16_batch_1024_lr1e-3_300epochs'

python -m torch.distributed.launch --master_port 50130 --nproc_per_node=8 --use_env main.py \
                                   --model Conformer_small_patch16 \
                                   --data-set IMNET \
                                   --batch-size 128 \
                                   --lr 0.001 \
                                   --num_workers 4 \
                                   --data-path /data/user/Dataset/ImageNet_ILSVRC2012/ \
                                   --output_dir ${OUTPUT} \
                                   --epochs 300
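
The per-GPU batch size of 128 across 8 GPUs gives an effective batch size of 1024, matching the output directory name. For a quick single-process sanity run on one GPU, a sketch (training this way is much slower and changes the effective batch size, so the 1e-3 learning rate may need rescaling):

python main.py --model Conformer_small_patch16 \
               --data-set IMNET \
               --batch-size 128 \
               --lr 0.001 \
               --num_workers 4 \
               --data-path /path/to/imagenet/ \
               --output_dir ./output/debug \
               --epochs 300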

Model Zoo

Model        | Parameters | MACs   | Top-1 Acc | Link
Conformer-Ti | 23.5 M     | 5.2 G  | 81.3 %    | baidu (code: hzhm) / google
Conformer-S  | 37.7 M     | 10.6 G | 83.4 %    | baidu (code: qvu8) / google
Conformer-B  | 83.3 M     | 23.3 G | 84.1 %    | baidu (code: b4z9) / google
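
To evaluate a downloaded checkpoint, a hedged sketch, assuming main.py keeps DeiT's --eval and --resume behavior (both flags appear in the argument dump quoted in the comments below) and that the Conformer-S weights were saved as ./Conformer_small_patch16.pth (hypothetical path):

python main.py --model Conformer_small_patch16 \
               --data-set IMNET \
               --batch-size 128 \
               --data-path /path/to/imagenet/ \
               --resume ./Conformer_small_patch16.pth \
               --eval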

Citation

@article{peng2021conformer,
      title={Conformer: Local Features Coupling Global Representations for Visual Recognition}, 
      author={Zhiliang Peng and Wei Huang and Shanzhi Gu and Lingxi Xie and Yaowei Wang and Jianbin Jiao and Qixiang Ye},
      journal={arXiv preprint arXiv:2105.03889},
      year={2021},
}
Comments
  • Does the transformer branch need strong supervision?

    Hello, is there an ablation experiment in which the transformer branch is not supervised by a loss? From the model structure, the conv-branch features already fuse the transformer features, so what would the result be if only the conv branch were supervised? I used the Conformer structure as the backbone of DBNet for detection but only supervised the conv branch, and the result was not better than the original, though other variables may also be involved.

    opened by 13354236170 10
  • Why doesn't the model load the state dict when doing inference?

    When just doing inference with the model, I found that the model does not call load_state_dict after torch.load. The related code is from line 301 to line 322 in './Conformer-main/main.py'. (See the manual-loading sketch after this comment list.)

    opened by HIT-LiuChen 5
  • Training question

    Thanks for your nice work! I encountered an error when training from scratch on custom data; the error message is the following:

    D:\dl\Conformer-main>python main.py --model Conformer_small_patch16 --data-set IMNET --batch-size 4 --lr 0.001 --num_workers 0 --data-path ./datasets/test/ --output_dir ./output/test/ --epochs 10
    Not using distributed mode
    Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=4, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='./datasets/test/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_url='env://', distributed=False, drop=0.0, drop_block=None, drop_path=0.1, epochs=10, eval=False, evaluate_freq=1, finetune='', inat_category='name', input_size=224, lr=0.001, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='Conformer_small_patch16', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=0, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='./output/test/', patience_epochs=10, pin_mem=True, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=1)
    Creating model: Conformer_small_patch16
    number of params: 37673424
    Start training
    Traceback (most recent call last):
      File "main.py", line 375, in <module>
        main(args)
      File "main.py", line 335, in main
        set_training_mode=args.finetune == ''  # keep in eval mode during finetuning
      File "D:\dl\Conformer-main\engine.py", line 30, in train_one_epoch
        for samples, targets in metric_logger.log_every(data_loader, print_freq, header):
      File "D:\dl\Conformer-main\utils.py", line 157, in log_every
        header, total_time_str, total_time / len(iterable)))
    ZeroDivisionError: float division by zero

    Kindly asking for help, thanks!

    opened by JhihJhe 5
  • QA: Norm acts on the up- and down-samplers respectively

    Hi. I do not understand why LayerNorm is used in the down-sampling FCU and BatchNorm in the up-sampling FCU to normalize the features. Is there any special meaning?

    opened by xesdiny 4
  • conformer_small_patch32.pth

    Nice work! I have some questions. The "small" pretrained model posted on the home page is conformer_small_patch16.pth, and I haven't found the patch32 one, which is the pretrained model used in your Faster R-CNN config. May I ask for this checkpoint file, please? And can you provide the configuration files for the Base and Large models? Thank you very much!

    opened by JarvisKevin 2
  • First 5 epochs, test_acc1 = 0


    {"train_lr": 0.0006003999999998758, "train_loss_0": 5.2468776138196835, "train_loss_1": 5.2314488993825545, "test_loss": 12.697948946271623, "test_loss_0": 6.487931878226144, "test_loss_1": 6.210017108917237, "test_acc1": 0.0, "test_acc1_head1": 0.0, "test_acc1_head2": 0.0, "epoch": 4, "n_parameters": 77557992}

    opened by eeric 2
  • Feature analysis

    Dear author, I saw the feature analysis figure in your paper (Figure 4). The visualization is very good. Could you provide the code for it? Thank you.

    opened by Yang-YuLin 1
  • Questions about the code

    1. Could you explain the last_fusion in ConvTransBlock? I did not find a specific explanation in your paper.
    2. In ConvBlock, could you explain what x_t is in the forward function?
    opened by DrChenziyan 1
  • About the Conformer tiny patch16 model link

    Hi, thanks for sharing your nice work with us.

    When I open the Google Drive link for Conformer tiny patch16, I can only get the base model.

    Could you update the link?

    Thank you!

    opened by Ree1s 1
  • Top-1 accuracy on the CIFAR-100 dataset

    Happy Chinese New Year. I tried the command "python main.py --model Conformer_tiny_patch16 --data-set CIFAR --batch-size 512 --lr 0.001 --num_workers 4 --data-path ../data --output_dir ./output/Conformer_tiny_patch16_batch512_lr1e3_epoch500_cifar100 --epochs 500 --input-size 32" to train Conformer on CIFAR-100 with one Tesla V100 GPU, but only got 45.89% accuracy (the validation accuracy in the last epoch). Would you mind providing the training parameters and the top-1 accuracy on CIFAR-100?

    opened by HIT-LiuChen 0
  • Why is the lr so small? Why does the lr go from small to big?

    batch size = 128, initial lr = 0.001
    Epoch: [0]  [ 1070/14862]  eta: 1:11:01  lr: 0.000001  loss_0: 5.6751 (5.6817)  loss_1: 5.6846 (5.7059)  time: 0.3053  data: 0.0002  max mem: 4992

    opened by eeric 0
  • Training time for Conformer-S?

    When I use four 3090 Ti GPUs to train the Conformer-S model with batch size 128 and 300 epochs, as in the shell script you provided:
    1. It may take around 9 days; is this the normal speed?
    2. What can I do to shorten the training time, e.g., use a larger batch size?
    3. Would that have any bad influence?

    opened by YuYue26 0
  • About generalization capability

    Hi! Thank you for sharing your work. There is one place in the paper that I don't quite understand. You mention rotation invariance and scale invariance, but I don't see a detailed proof of them. Is this conclusion based only on experimental comparisons, or is there a theoretical basis I haven't noticed? Thank you.

    opened by salt0107fish 2
  • Object detection code

    I saw the object detection results in the paper, but the input image size is (800, 1333), which I think may make the model take up more memory. Is there code available for the object detection experiments?

    opened by qyu21490 0
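
Regarding the state-dict question above, a minimal manual-loading sketch; this is an assumption about the checkpoint layout and model registration, not the repository's own code:

import torch
from timm.models import create_model

import models  # hypothetical: assumes the repo's model definitions register Conformer_* with timm, as DeiT-style repos do

model = create_model('Conformer_small_patch16', pretrained=False)
checkpoint = torch.load('Conformer_small_patch16.pth', map_location='cpu')
# checkpoints saved by DeiT-style training loops usually nest the weights under 'model'
state_dict = checkpoint['model'] if isinstance(checkpoint, dict) and 'model' in checkpoint else checkpoint
model.load_state_dict(state_dict)
model.eval()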