DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

CASIA-IVA-Lab

Last update: Dec 21, 2022


Overview

DPT

This repo is the official implementation of DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021). We provide code and models for the following tasks:

Image Classification: For detailed instructions and information, see classification/README.md.

Object Detection: For detailed instructions and information, see detection/README.md.

The paper has been released on [arXiv].

Introduction

Deformable Patch (DePatch) is a plug-and-play module. It learns to adaptively split images into patches with different positions and scales in a data-driven way, rather than using predefined fixed patches. In this way, our method better preserves the semantics within each patch.
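To make the idea of a patch with adjustable position and scale concrete, here is a minimal pure-Python sketch. It is not the code from this repository, which relies on a CUDA operator. The function names bilinear_sample and deformable_patch, the k x k sampling grid, and the use of bilinear interpolation are assumptions made for illustration, and the offset and size are passed in by hand, whereas a learned module would predict them from the input.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a 2D grid (list of rows) at fractional (x, y), clamping to the border."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def deformable_patch(image, center, offset, size, k=2):
    """Sample a k x k grid of points from a patch with adjustable position and scale.

    center: (cx, cy) of the original fixed patch.
    offset: (dx, dy) shift applied to the center (hypothetical, set by hand here).
    size:   (pw, ph) extent of the patch (hypothetical, set by hand here).
    """
    cx, cy = center[0] + offset[0], center[1] + offset[1]
    pw, ph = size
    left, top = cx - pw / 2.0, cy - ph / 2.0
    samples = []
    for i in range(k):
        row = []
        for j in range(k):
            # Sampling points sit at the cell centers of a k x k subdivision of the patch.
            x = left + (j + 0.5) * pw / k
            y = top + (i + 0.5) * ph / k
            row.append(bilinear_sample(image, x, y))
        samples.append(row)
    return samples


# Toy 8 x 8 "image" whose value at (x, y) is x + 10 * y.
image = [[x + 10 * y for x in range(8)] for y in range(8)]
fixed = deformable_patch(image, center=(3.5, 3.5), offset=(0.0, 0.0), size=(2.0, 2.0))
moved = deformable_patch(image, center=(3.5, 3.5), offset=(1.0, 0.0), size=(4.0, 4.0))
print(fixed)  # [[33.0, 34.0], [43.0, 44.0]]
print(moved)  # [[28.5, 30.5], [48.5, 50.5]]
```

The first call reproduces an ordinary fixed patch. The second shifts the patch one pixel to the right and doubles its extent while keeping the number of sampled points constant, which is how a patch could cover a differently placed or differently sized region without changing the size of the resulting token.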

This repository provides code and models for a Deformable Patch-based Transformer (DPT). As this field is developing rapidly, we hope to see DePatch applied to other recent architectures and to promote further research.

Main Results

Image Classification

Training commands and pretrained models are provided >>> here <<<.

Method	#Params (M)	FLOPs (G)	Acc@1 (%)
DPT-Tiny	15.2	2.1	77.4
DPT-Small	26.4	4.0	81.0
DPT-Medium	46.1	6.9	81.9

Object Detection

Coming soon.

Citation

@inproceedings{chenDPT21,
  title = {DPT: Deformable Patch-based Transformer for Visual Recognition},
  author = {Zhiyang Chen and Yousong Zhu and Chaoyang Zhao and Guosheng Hu and Wei Zeng and Jinqiao Wang and Ming Tang},
  booktitle = {Proceedings of the ACM International Conference on Multimedia},
  year = {2021}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

Our implementation is mainly based on PVT. The CUDA operator is borrowed from Deformable-DETR. You may refer to these repositories for further information.


Comments

extract the patch by calling MSDeformAttnFunction function

Thanks for sharing your work. I found that depatch_embed.py (L112) makes the following call:

output = MSDeformAttnFunction.apply(x, value_spatial_shapes, self.value_level_start_index, sampling_locations, attention_weights, 1)

I assume this call performs an extra deformable attention computation. If so, it adds computation; is the comparison with PVT still fair?

opened by RebornForPower 2

getting error in MultiScaleDeformableAttention

Hi, I am getting the error below.

ImportError: /home/shubham/anaconda3/envs/dpt1/lib/python3.8/site-packages/MultiScaleDeformableAttention-1.0-py3.8-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7optionsEv

torch.version = 1.10.0.dev20210812+cu111
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

opened by shubham303 4

error: import MultiScaleDeformableAttention as MSDA ----solved

1. First, build and install the extension with sh ./make.sh. A successful build ends with:
......
Processing dependencies for MultiScaleDeformableAttention==1.0
Finished processing dependencies for MultiScaleDeformableAttention==1.0

2. The import then fails with the following error:
import MultiScaleDeformableAttention as MSDA
ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory

3. Fix: run sudo ldconfig /usr/local/cuda-10.0/lib64 (for CUDA 10.0).

4. The import now succeeds.

opened by eeric 0
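
The two import failures reported above have different likely causes. The sketch below is a small, hypothetical helper (explain_import_error and check_extension are not part of this repository) that tries to import the compiled extension and maps the error text to a probable cause. The explanation given for the "undefined symbol" case is a common one for compiled PyTorch extensions, not something confirmed in the thread, so treat the hints as starting points rather than a diagnosis.

```python
import importlib


def explain_import_error(message):
    """Map an ImportError message to a likely cause (heuristic, not exhaustive)."""
    if "undefined symbol" in message:
        return ("The compiled extension was probably built against a different "
                "PyTorch version than the one now installed; rebuilding it may help.")
    if "libcudart" in message and "cannot open shared object file" in message:
        return ("The CUDA runtime library is not on the dynamic loader path; "
                "registering the CUDA lib64 directory (e.g. with ldconfig) may help.")
    return "Unrecognized import error; inspect the full message."


def check_extension(module_name="MultiScaleDeformableAttention"):
    """Try to import the compiled extension and return (ok, hint)."""
    try:
        importlib.import_module(module_name)
        return True, "import succeeded"
    except ImportError as exc:
        return False, explain_import_error(str(exc))


if __name__ == "__main__":
    ok, hint = check_extension()
    print(ok, hint)
```

Running the script in the environment where make.sh was executed shows whether the extension loads, and if it does not, which of the two failure modes the error message resembles.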

Owner

CASIA-IVA-Lab

Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences

