DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

Related tags

Deep Learning DPT
Overview

DPT


This repo is the official implementation of DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021). We provide code and models for the following tasks:

Image Classification: Detailed instruction and information see classification/README.md.

Object Detection: Detailed instruction and information see detection/README.md.

The papar has been relased on [Arxiv].

Introduction

Deformable Patch (DePatch) is a plug-and-play module. It learns to adaptively split the images input patches with different positions and scales in a data-driven way, rather than using predefined fixed patches. In this way, our method can well preserve the semantics in patches.

In this repository, code and models for a Deformable Patch-based Transformer (DPT) are provided. As this field is developing rapidly, we are willing to see our DePatch applied to some other latest architectures and promote further research.

Main Results

Image Classification

Training commands and pretrained models are provided >>> here <<<.

Method #Params (M) FLOPs(G) Acc@1
DPT-Tiny 15.2 2.1 77.4
DPT-Small 26.4 4.0 81.0
DPT-Medium 46.1 6.9 81.9

Object Detection

Coming soon.

Citation

@inproceedings{chenDPT21,
  title = {DPT: Deformable Patch-based Transformer for Visual Recognition},
  author = {Zhiyang Chen and Yousong Zhu and Chaoyang Zhao and Guosheng Hu and Wei Zeng and Jinqiao Wang and Ming Tang},
  booktitle={Proceedings of the ACM International Conference on Multimedia},
  year={2021}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Acknowledgement

Our implementation is mainly based on PVT. The CUDA operator is borrowed from Deformable-DETR. You may refer these repositories for further information.

You might also like...
The pytorch implementation of  DG-Font: Deformable Generative Networks for Unsupervised Font Generation
The pytorch implementation of DG-Font: Deformable Generative Networks for Unsupervised Font Generation

DG-Font: Deformable Generative Networks for Unsupervised Font Generation The source code for 'DG-Font: Deformable Generative Networks for Unsupervised

[CVPRW 2021] Code for Region-Adaptive Deformable Network for Image Quality Assessment
[CVPRW 2021] Code for Region-Adaptive Deformable Network for Image Quality Assessment

RADN [CVPRW 2021] Code for Region-Adaptive Deformable Network for Image Quality Assessment [Paper on arXiv] Overview Update [2021/5/7] add codes for W

Deformable DETR is an efficient and fast-converging end-to-end object detector.
Deformable DETR is an efficient and fast-converging end-to-end object detector.

Deformable DETR: Deformable Transformers for End-to-End Object Detection.

Official implementation of NPMs: Neural Parametric Models for 3D Deformable Shapes - ICCV 2021
Official implementation of NPMs: Neural Parametric Models for 3D Deformable Shapes - ICCV 2021

NPMs: Neural Parametric Models Project Page | Paper | ArXiv | Video NPMs: Neural Parametric Models for 3D Deformable Shapes Pablo Palafox, Aljaz Bozic

PyTorch implementation of Deformable Convolution

Deformable Convolutional Networks in PyTorch This repo is an implementation of Deformable Convolution. Ported from author's MXNet implementation. Buil

Code for ICCV 2021 paper: ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators..
Code for ICCV 2021 paper: ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators..

ARAPReg Code for ICCV 2021 paper: ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators.. Installation The cod

PyTorch implementation of Deformable Convolution
PyTorch implementation of Deformable Convolution

PyTorch implementation of Deformable Convolution !!!Warning: There is some issues in this implementation and this repo is not maintained any more, ple

A multi-scale unsupervised learning for deformable image registration

A multi-scale unsupervised learning for deformable image registration Shuwei Shao, Zhongcai Pei, Weihai Chen, Wentao Zhu, Xingming Wu and Baochang Zha

Some code of the implements of Geological Modeling Using 3D Pixel-Adaptive and Deformable Convolutional Neural Network

3D-GMPDCNN Geological Modeling Using 3D Pixel-Adaptive and Deformable Convolutional Neural Network PyTorch implementation of "Geological Modeling Usin

Comments
  • extract the patch by calling MSDeformAttnFunction function

    extract the patch by calling MSDeformAttnFunction function

    Thanks for sharing your work~ I find in file depatch_embed.py (L112) will call this line:

    output = MSDeformAttnFunction.apply(x, value_spatial_shapes, self.value_level_start_index, sampling_locations, attention_weights, 1) I assume that this code may produce extra deformable attention calculation. If it is true, this will produce extra computations and it is fair to compare to the PVT ?

    opened by RebornForPower 2
  • getting error in MultiScaleDeformableAttention

    getting error in MultiScaleDeformableAttention

    Hi , I am getting below error.

    ImportError: /home/shubham/anaconda3/envs/dpt1/lib/python3.8/site-packages/MultiScaleDeformableAttention-1.0-py3.8-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7optionsEv

    torch.version = 1.10.0.dev20210812+cu111 GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

    opened by shubham303 4
  • error: import MultiScaleDeformableAttention as MSDA ----solved

    error: import MultiScaleDeformableAttention as MSDA ----solved

    1.at first, need to install successfully -- sh ./make.sh ...... Processing dependencies for MultiScaleDeformableAttention==1.0 Finished processing dependencies for MultiScaleDeformableAttention==1.0

    2.meet error as following: import MultiScaleDeformableAttention as MSDA ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory

    3.method: sudo ldconfig /usr/local/cuda-10.0/lib64 (version: cuda-10.0)

    4.successfully

    opened by eeric 0
Owner
CASIA-IVA-Lab
Image & Video Analysis Group, Institute of Automation, Chinese Academy of Sciences
CASIA-IVA-Lab
Code for ACM MM2021 paper "Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection"

CTDNet The PyTorch code for ACM MM2021 paper "Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection" Requirements Python 3.6

CVTEAM 28 Oct 20, 2022
Implementation of Deformable Attention in Pytorch from the paper "Vision Transformer with Deformable Attention"

Deformable Attention Implementation of Deformable Attention from this paper in Pytorch, which appears to be an improvement to what was proposed in DET

Phil Wang 128 Dec 24, 2022
The code for MM2021 paper "Multi-Level Counterfactual Contrast for Visual Commonsense Reasoning"

The Code for MM2021 paper "Multi-Level Counterfactual Contrast for Visual Commonsense Reasoning" Setting up and using the repo Get the dataset. Follow

null 4 Apr 20, 2022
Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

Phil Wang 272 Dec 23, 2022
Alex Pashevich 62 Dec 24, 2022
Code for the CVPR2021 paper "Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition"

Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition This repository contains code for the CVPR2021 paper "Patch-NetV

QVPR 368 Jan 6, 2023
[ACM MM 2021] Yes, "Attention is All You Need", for Exemplar based Colorization

Transformer for Image Colorization This is an implemention for Yes, "Attention Is All You Need", for Exemplar based Colorization, and the current soft

Wang Yin 30 Dec 7, 2022
This is an official implementation for "ResT: An Efficient Transformer for Visual Recognition".

ResT By Qing-Long Zhang and Yu-Bin Yang [State Key Laboratory for Novel Software Technology at Nanjing University] This repo is the official implement

zhql 222 Dec 13, 2022
Code for Learning Manifold Patch-Based Representations of Man-Made Shapes, in ICLR 2021.

LearningPatches | Webpage | Paper | Video Learning Manifold Patch-Based Representations of Man-Made Shapes Dmitriy Smirnov, Mikhail Bessmeltsev, Justi

Dima Smirnov 22 Nov 14, 2022
This is the code for Deformable Neural Radiance Fields, a.k.a. Nerfies.

Deformable Neural Radiance Fields This is the code for Deformable Neural Radiance Fields, a.k.a. Nerfies. Project Page Paper Video This codebase conta

Google 1k Jan 9, 2023