CMT: Convolutional Neural Networks Meet Vision Transformers

[arxiv]

1. Introduction

This repo is a PyTorch implementation of the CMT model. No reference source code was available, so this is an unofficial version.

2. Environment

  • python 3.7+
  • pytorch 1.7.1
  • pillow
  • apex
  • opencv-python

See this repo for instructions on installing apex.

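3. Dataset

The training and testing list files contain one sample per line, in the form "image path, class index".

  • Training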
    /data/home/imagenet/train/xxx.jpeg, 0
    /data/home/imagenet/train/xxx.jpeg, 1
    ...
    /data/home/imagenet/train/xxx.jpeg, 999
    
  • Testing
    /data/home/imagenet/test/xxx.jpeg, 0
    /data/home/imagenet/test/xxx.jpeg, 1
    ...
    /data/home/imagenet/test/xxx.jpeg, 999
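
The list format above can be read with a few lines of Python. This is a minimal sketch rather than the loader this repo actually uses: the function name `parse_list_file` is hypothetical, and it assumes each line holds exactly one path and one integer label separated by the last comma.

```python
from typing import List, Tuple


def parse_list_file(lines: List[str]) -> List[Tuple[str, int]]:
    """Parse 'path, label' lines into (path, label) pairs (hypothetical helper)."""
    samples = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last comma so a path that contains commas still parses.
        path, label = line.rsplit(",", 1)
        samples.append((path.strip(), int(label)))
    return samples


example = [
    "/data/home/imagenet/train/xxx.jpeg, 0",
    "/data/home/imagenet/train/xxx.jpeg, 999",
]
samples = parse_list_file(example)
print(samples)
```

For a real file, the lines could come from `open(path).read().splitlines()`.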
    

4. Training & Inference

  1. Training

    CMT-Tiny

    #!/bin/bash
    OMP_NUM_THREADS=1
    MKL_NUM_THREADS=1
    export OMP_NUM_THREADS
    export MKL_NUM_THREADS
    cd CMT-pytorch;
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -W ignore -m torch.distributed.launch --nproc_per_node 8 train.py --batch_size 512 --num_workers 48 --lr 6e-3 --optimizer_name "adamw" --tf_optimizer 1 --cosine 1 --model_name cmtti --max_epochs 300 \
    --warmup_epochs 5 --num-classes 1000 --input_size 184 --crop_size 160 --weight_decay 1e-1 --grad_clip 0 --repeated-aug 0 --max_grad_norm 5.0 \
    --drop_path_rate 0.1 --FP16 0 --qkv_bias 1 \
    --ape 0 --rpe 1 --pe_nd 0 --mode O2 --amp 1 --apex 0 \
    --train_file $file_folder$/train.txt \
    --val_file $file_folder$/val.txt \
    --log-dir $save_folder$/log_dir \
    --checkpoints-path $save_folder$/checkpoints
    

    Note: A batch size of 128 * 8 may give higher accuracy; choose the batch size to balance accuracy and training speed.

  2. Inference

    #!/bin/bash
    cd CMT-pytorch;
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -W ignore test.py \
    --dist-url 'tcp://127.0.0.1:9966' --dist-backend 'nccl' --multiprocessing-distributed=1 --world-size=1 --rank=0 \
    --batch-size 128 --num-workers 48 --num-classes 1000 --input_size 184 --crop_size 160 \
    --ape 0 --rpe 1 --pe_nd 0 --qkv_bias 1 --swin 0 --model_name cmtti --dropout 0.1 --emb_dropout 0.1 \
    --test_file $file_folder$/val.txt \
    --checkpoints-path $save_folder$/checkpoints/xxx.pth.tar \
    --save_folder $save_folder$/acc_logits/

  3. Calculate accuracy

    python utils/calculate_acc.py --logits_file $save_folder$/acc_logits/
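
The internals of utils/calculate_acc.py are not shown here, so the following is only a sketch of the usual top-1 computation over saved logits. It assumes the logits are available as one list of per-class scores per image, with a matching list of integer labels; the on-disk format of the logits folder is not specified above, and `top1_accuracy` is a hypothetical name.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        return 0.0
    correct = 0
    for row, label in zip(logits, labels):
        # argmax over the class scores of one sample
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy data: 3 samples, 3 classes (illustrative values only).
logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9]]
labels = [1, 0, 1]
acc = top1_accuracy(logits, labels)
print(f"top-1 accuracy: {acc:.3f}")
```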

5. Imagenet Result

model-name  input_size  FLOPs  Params  acc@one_crop (ours)  acc (paper)  weights
CMT-T       160x160     516M   11.3M   75.124%              79.2%        weights
CMT-T       224x224     1.01G  11.3M   78.4%                -            weights
CMT-XS      192x192     -      -       -                    81.8%        -
CMT-S       224x224     -      -       -                    83.5%        -
CMT-L       256x256     -      -       -                    84.5%        -
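
As a rough sanity check on the two CMT-T rows, the FLOPs figures are consistent with cost growing in proportion to the number of input pixels. The sketch below assumes that simple quadratic-in-side-length scaling; it is an approximation, since attention terms can grow faster than the pixel count, and `scale_flops` is a hypothetical helper.

```python
def scale_flops(flops: float, side_from: int, side_to: int) -> float:
    """Estimate FLOPs at a new square input size, assuming cost scales with pixel count."""
    return flops * (side_to / side_from) ** 2


# 516M FLOPs at 160x160, scaled to 224x224.
estimate = scale_flops(516e6, 160, 224)
print(f"{estimate / 1e9:.2f}G")
```

The estimate lands close to the 1.01G listed for the 224x224 model.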

6. TODO

  • Other results may come soon if anyone needs them.
  • Release the CMT-XS result on ImageNet.
  • Check the differences from the paper; the author gave the hyperparameters in an issue.
  • Tune the hyperparameters for CMT or transformers.

Supplementary

If you want to know more, I have written up an explanation of CMT, along with the tuning and training process, here.

Comments
  • Results on ImageNet

    Hi, some notes on the training recipes behind the results. All models are trained for 300 epochs except CMT-T (600e-1000e) and CMT-XS (350e-400e), which need more. At the time, CMT-T was trained to match EfficientNet's training schedule: 300 epochs does not reach 79, and it takes 600 epochs or more to get above 78. The approximate parameters are listed below; everything else should be close to the paper. For example, to match the FLOPs exactly, R=3.8 is actually R=3.77, and so on; such details seem unimportant, so I will not post them in the issue. The submission experience was very poor, and although I had intended to release the code, that has been delayed as well. I hope these parameters help.

    CMT-Tiny (600e-1000e is better), Top-1: 79.2

        python -m torch.distributed.launch --nproc_per_node=8 train_deit.py --model cmt_tiny --batch-size 256 --apex-amp --input-size 160 --weight-decay 0.04 --drop-path 0.1 --epochs 1000 --warmup-lr 1e-7 --warmup-epochs 20 --lr 8e-4 --min-lr 2e-5 --no-model-ema

    CMT-XS (350e-400e is better), Top-1: 81.8

        python -m torch.distributed.launch --nproc_per_node=8 train_deit.py --model cmt_extra_small --batch-size 128 --apex-amp --input-size 192 --weight-decay 0.08 --drop-path 0.1 --epochs 400 --warmup-epochs 20 --lr 7e-4 --min-lr 2e-5 --model-ema-decay 0.9998

    CMT-Small (300e), Top-1: 83.5

        python -m torch.distributed.launch --nproc_per_node=8 train_deit.py --model cmt_small --batch-size 128 --apex-amp --input-size 224 --weight-decay 0.05 --drop-path 0.1 --epochs 300 --model-ema-decay 0.99996

    CMT-Base (300e, FC Drop=0.3), Top-1: 84.5

        python -m torch.distributed.launch --nproc_per_node=8 train_deit.py --model cmt_base --batch-size 128 --apex-amp --input-size 256 --weight-decay 0.05 --drop-path 0.25 --epochs 300 --min-lr 2e-5 --model-ema-decay 0.99996

    CMT-Large (300e, FC Drop=0.3), Top-1: 84.8

        python -m torch.distributed.launch --nproc_per_node=8 train_deit.py --model cmt_large --batch-size 128 --apex-amp --input-size 288 --weight-decay 0.05 --drop-path 0.4 --epochs 300 --model-ema-decay 0.99996

    opened by ggjy
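
To make the --warmup-lr, --warmup-epochs, --lr and --min-lr flags in the CMT-Tiny command concrete, here is a minimal sketch of a linear-warmup plus cosine-decay schedule evaluated once per epoch. It is an illustration under stated assumptions, not the scheduler that train_deit.py uses: the real script may update per iteration, treat the warmup period differently, or rescale the learning rate by batch size. The function name `lr_at_epoch` is hypothetical.

```python
import math


def lr_at_epoch(epoch, total_epochs, base_lr, min_lr, warmup_epochs, warmup_lr):
    """Learning rate at a given epoch: linear warmup, then cosine decay to min_lr."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr up to base_lr.
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Values taken from the CMT-Tiny command above.
cfg = dict(total_epochs=1000, base_lr=8e-4, min_lr=2e-5, warmup_epochs=20, warmup_lr=1e-7)
for epoch in (0, 10, 20, 510, 1000):
    print(epoch, f"{lr_at_epoch(epoch, **cfg):.2e}")
```

Under these assumptions the rate starts at the warmup value, peaks at --lr when warmup ends, and settles at --min-lr in the final epoch.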
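  • Issues on current architecture

    • Differences between the current implementation and the paper's description and figures:

      • In LightMultiHeadSelfAttention, self.sr = nn.Conv2d(...) should be a depthwise (DW) conv. With DW here, the total parameter count should come close to what the paper reports.
      • The DW part of InvertedResidualFeedForward should look like F(X) = Norm(GELU(DWConv(X) + X)); the current implementation looks like F(X) = Norm(GELU(DWConv(X))) + X.
      • According to the paper, the downsampling Conv2D of each stage is followed by a layer_norm.
    • Other points I am unsure about:

      • Many of the models I have seen do not use a bias in conv2d; I do not know what the authors used here. @ggjy
    • I have roughly written a TensorFlow implementation, Keras CMT; the relative positional part is not written yet. I will try training it when I have time. The models I have written recently, such as HaloNet / CoAtNet, all use a lot of GPU memory during training and are hard to run.

    opened by leondgarse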
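
The parameter-count point about self.sr can be illustrated with plain arithmetic. The sketch below counts the parameters of a 2D convolution the way PyTorch's nn.Conv2d lays out its weight, (out_channels, in_channels / groups, k, k), and compares a standard conv against a depthwise one (groups equal to channels). The channel count and kernel size are hypothetical values chosen for illustration, not the ones used by this repo or the paper.

```python
def conv2d_params(in_ch: int, out_ch: int, kernel: int, groups: int = 1, bias: bool = True) -> int:
    """Number of learnable parameters in a square-kernel 2D convolution."""
    if in_ch % groups or out_ch % groups:
        raise ValueError("in_ch and out_ch must both be divisible by groups")
    weights = out_ch * (in_ch // groups) * kernel * kernel
    return weights + (out_ch if bias else 0)


channels, kernel = 64, 8  # hypothetical values, for illustration only
standard = conv2d_params(channels, channels, kernel)
depthwise = conv2d_params(channels, channels, kernel, groups=channels)
print(f"standard: {standard}, depthwise: {depthwise}")
```

With these illustrative values the depthwise layer has roughly 1/63 of the parameters of the standard one, which is the kind of saving the issue refers to.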
Owner
FlyEgle
JOYY AI GROUP - Machine Learning Engineer(Computer Vision)