This repo is built for the paper "Attention Mechanisms in Computer Vision: A Survey" (paper).

Vision-Attention-Papers

🔥 marks papers with more than 200 citations.

TODO : Code about different attention mechanisms will come soon.
TODO : Code link will come soon.
TODO : collect more related papers. Contributions are welcome.

Channel attention

Squeeze-and-Excitation Networks(CVPR2018) pdf, (PAMI2019 version) pdf 🔥
Image super-resolution using very deep residual channel attention networks(ECCV2018) pdf 🔥
Context encoding for semantic segmentation(CVPR2018) pdf 🔥
Spatio-temporal channel correlation networks for action classification(ECCV2018) pdf
Global second-order pooling convolutional networks(CVPR2019) pdf
SRM: A style-based recalibration module for convolutional neural networks(ICCV2019) pdf
You look twice: Gaternet for dynamic filter selection in cnns(CVPR2019) pdf
Second-order attention network for single image super-resolution(CVPR2019) pdf 🔥
Spsequencenet: Semantic segmentation network on 4d point clouds(CVPR2020) pdf
Ecanet: Efficient channel attention for deep convolutional neural networks (CVPR2020) pdf 🔥
Gated channel transformation for visual recognition(CVPR2020) pdf
Fcanet: Frequency channel attention networks(ICCV2021) pdf

Spatial attention

Recurrent models of visual attention(NeurIPS2014), pdf 🔥
Show, attend and tell: Neural image caption generation with visual attention(PMLR2015) pdf 🔥
Draw: A recurrent neural network for image generation(ICML2015) pdf 🔥
Spatial transformer networks(NeurIPS2015) pdf 🔥
Multiple object recognition with visual attention(ICLR2015) pdf 🔥
Action recognition using visual attention(arXiv2015) pdf 🔥
Videolstm convolves, attends and flows for action recognition(arXiv2016) pdf 🔥
Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition(CVPR2017) pdf 🔥
Learning multi-attention convolutional neural network for fine-grained image recognition(ICCV2017) pdf 🔥
Diversified visual attention networks for fine-grained object classification(TMM2017) pdf 🔥
Attentional pooling for action recognition(NeurIPS2017) pdf 🔥
Non-local neural networks(CVPR2018) pdf 🔥
Attentional shapecontextnet for point cloud recognition(CVPR2018) pdf
Relation networks for object detection(CVPR2018) pdf 🔥
a2-nets: Double attention networks(NeurIPS2018) pdf 🔥
Attention-aware compositional network for person re-identification(CVPR2018) pdf 🔥
Tell me where to look: Guided attention inference network(CVPR2018) pdf 🔥
Pedestrian alignment network for large-scale person re-identification(TCSVT2018) pdf 🔥
Learn to pay attention(ICLR2018) pdf 🔥
Attention U-Net: Learning Where to Look for the Pancreas(MIDL2018) pdf 🔥
Psanet: Point-wise spatial attention network for scene parsing(ECCV2018) pdf 🔥
Self attention generative adversarial networks(ICML2019) pdf 🔥
Attentional pointnet for 3d-object detection in point clouds(CVPRW2019) pdf
Co-occurrent features in semantic segmentation(CVPR2019) pdf
Attention augmented convolutional networks(ICCV2019) pdf 🔥
Local relation networks for image recognition(ICCV2019) pdf
Latentgnn: Learning efficient nonlocal relations for visual recognition(ICML2019) pdf
Graph-based global reasoning networks(CVPR2019) pdf 🔥
Gcnet: Non-local networks meet squeeze-excitation networks and beyond(ICCVW2019) pdf 🔥
Asymmetric non-local neural networks for semantic segmentation(ICCV2019) pdf 🔥
Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition(CVPR2019) pdf
Second-order non-local attention networks for person re-identification(ICCV2019) pdf 🔥
End-to-end comparative attention networks for person re-identification(ICCV2019) pdf 🔥
Modeling point clouds with self-attention and gumbel subset sampling(CVPR2019) pdf
Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification(arXiv 2019) pdf
L2g autoencoder: Understanding point clouds by local-to-global reconstruction with hierarchical self-attention(arXiv 2019) pdf
Generative pretraining from pixels(PMLR2020) pdf
Exploring self-attention for image recognition(CVPR2020) pdf
Cf-sis: Semantic-instance segmentation of 3d point clouds by context fusion with self attention(MM20) pdf
Disentangled non-local neural networks(ECCV2020) pdf
Relation-aware global attention for person re-identification(CVPR2020) pdf
Segmentation transformer: Object-contextual representations for semantic segmentation(ECCV2020) pdf 🔥
Spatial pyramid based graph reasoning for semantic segmentation(CVPR2020) pdf
Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation(CVPR2020) pdf
End-to-end object detection with transformers(ECCV2020) pdf 🔥
Pointasnl: Robust point clouds processing using nonlocal neural networks with adaptive sampling(CVPR2020) pdf
Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers(CVPR2021) pdf
An image is worth 16x16 words: Transformers for image recognition at scale(ICLR2021) pdf 🔥
Ocnet: Object context network for scene parsing(IJCV 2021) pdf 🔥
Point transformer(ICCV 2021) pdf
PCT: Point Cloud Transformer (CVMJ 2021) pdf
Pre-trained image processing transformer(CVPR 2021) pdf
An empirical study of training self-supervised vision transformers(ICCV 2021) pdf
Segformer: Simple and efficient design for semantic segmentation with transformers(arxiv 2021) pdf
Beit: Bert pre-training of image transformers(arxiv 2021) pdf
Beyond self-attention: External attention using two linear layers for visual tasks(arxiv 2021) pdf
Query2label: A simple transformer way to multi-label classification(arxiv 2021) pdf
Transformer in transformer(arxiv 2021) pdf

Temporal attention

Jointly attentive spatial-temporal pooling networks for video-based person re-identification (ICCV 2017) pdf 🔥
Video person reidentification with competitive snippet-similarity aggregation and co-attentive snippet embedding(CVPR 2018) pdf
Scan: Self-and-collaborative attention network for video person re-identification (TIP 2019) pdf

Branch attention

Training very deep networks, (NeurIPS 2015) pdf 🔥
Selective kernel networks,(CVPR 2019) pdf 🔥
CondConv: Conditionally Parameterized Convolutions for Efficient Inference (NeurIPS 2019) pdf
Dynamic convolution: Attention over convolution kernels (CVPR 2020) pdf
ResNest: Split-attention networks (arXiv 2020) pdf 🔥

Channel & Spatial attention

Residual attention network for image classification (CVPR 2017) pdf 🔥
SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning,(CVPR 2017) pdf 🔥
CBAM: convolutional block attention module, (ECCV 2018) pdf 🔥
Harmonious attention network for person re-identification (CVPR 2018) pdf 🔥
Recalibrating fully convolutional networks with spatial and channel “squeeze and excitation” blocks (TMI 2018) pdf
Mancs: A multi-task attentional network with curriculum sampling for person re-identification (ECCV 2018) pdf 🔥
Bam: Bottleneck attention module(BMVC 2018) pdf 🔥
Pvnet: A joint convolutional network of point cloud and multi-view for 3d shape recognition (ACM MM 2018) pdf
Learning what and where to attend,(ICLR 2019) pdf
Dual attention network for scene segmentation (CVPR 2019) pdf 🔥
Abd-net: Attentive but diverse person re-identification (ICCV 2019) pdf
Mixed high-order attention network for person re-identification (ICCV 2019) pdf
Mlcvnet: Multi-level context votenet for 3d object detection (CVPR 2020) pdf
Improving convolutional networks with self-calibrated convolutions (CVPR 2020) pdf
Relation-aware global attention for person re-identification (CVPR 2020) pdf
Strip Pooling: Rethinking spatial pooling for scene parsing (CVPR 2020) pdf
Rotate to attend: Convolutional triplet attention module, (WACV 2021) pdf
Coordinate attention for efficient mobile network design (CVPR 2021) pdf
Simam: A simple, parameter-free attention module for convolutional neural networks (ICML 2021) pdf

Spatial & Temporal attention

An end-to-end spatio-temporal attention model for human action recognition from skeleton data(AAAI 2017) pdf 🔥
Diversity regularized spatiotemporal attention for video-based person re-identification (ArXiv 2018) 🔥
Interpretable spatio-temporal attention for video action recognition (ICCVW 2019) pdf
Hierarchical lstms with adaptive attention for visual captioning, (TPAMI 2020) pdf
Stat: Spatial-temporal attention mechanism for video captioning, (TMM 2020) pdf
Gta: Global temporal attention for video action understanding (ArXiv 2020) pdf
Multi-granularity reference-aided attentive feature aggregation for video-based person re-identification (CVPR 2020) pdf
Read: Reciprocal attention discriminator for image-to-video re-identification, (ECCV 2020) pdf
Decoupled spatial-temporal transformer for video inpainting (ArXiv 2021) pdf

Implementation

Congratulations on publishing such a comprehensive paper.

Would be nice to implement some of the methods mentioned therein and package that as a library with a focus on readability.

opened by sayakpaul 3
Consideration of "Local Importance-based Pooling" (LIP) as an attention method

Hi Menghao,

Thanks for your great and helpful work. I regard our proposed LIP [1] also as a local attention method designed for the spatial downsampling procedure, whose weights are produced by a lightweight sub-network. Though being designed for spatial downsizing, it still falls in the regime of attention approaches. Could you please consider LIP also as an attention method especially for the pooling procedure?

By the way, the work TAM [2] done by our group is accepted by ICCV 2021. Please consider updating the venue of this paper.

[1] LIP: Local Importance-Based Pooling. ICCV 2019. [2] TAM: Temporal Adaptive Module for Video Recognition. ICCV 2021.
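
Editorial sketch: the idea described in this comment, downsampling where each position in a pooling window is weighted by a learned importance, can be illustrated roughly as below. This is a minimal pure-Python sketch under stated assumptions, not the LIP authors' implementation: the function name `local_importance_pool` and the `logits` argument are hypothetical, the lightweight sub-network is not modeled (its output is passed in directly as `logits`), and the exponential weighting is an assumed form for the importance.

```python
import math


def local_importance_pool(features, logits, kernel=2, stride=2):
    """Downsample a 2D feature map with importance-weighted averaging.

    features: 2D list of floats (H x W).
    logits:   2D list of floats with the same shape. Hypothetical stand-in
              for the output of a lightweight sub-network (not modeled here).
    Each output value is sum(w * x) / sum(w) over its window, with
    w = exp(logit). The exponential form is an assumption of this sketch.
    """
    height, width = len(features), len(features[0])
    pooled = []
    for top in range(0, height - kernel + 1, stride):
        row = []
        for left in range(0, width - kernel + 1, stride):
            weighted_sum = 0.0
            weight_total = 0.0
            for i in range(top, top + kernel):
                for j in range(left, left + kernel):
                    weight = math.exp(logits[i][j])
                    weighted_sum += weight * features[i][j]
                    weight_total += weight
            row.append(weighted_sum / weight_total)
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]
    uniform = [[0.0, 0.0], [0.0, 0.0]]
    # Uniform logits reduce to plain average pooling.
    print(local_importance_pool(feats, uniform))  # [[2.5]]
```

With uniform logits the sketch reduces to average pooling, and as one logit in a window grows much larger than the others the output approaches that single position's value, which is what makes the pooling behave like a local attention over each window.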

opened by sebgao 1
Source code

Hello, author, I'm a student. I want to use your code for research, but the code in the code folder I downloaded is empty. Can you share with me? My email: [email protected]

opened by Burgess-00 1
The top date is wrong I guess?

It seems that the top date comes from a pre-filled template and is set to:

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1

I guess it is wrong given how recent the paper is.

opened by yassineAlouini 1
Reference needed for ocr.py

@MenghaoGuo @idansc @Gsunshine @PengtaoJiang @uyzhang Can you please provide the paper reference for the code in /spatial_attentions/ocr.py?

I guess it is for text recognition with attention, but I could not find any relevant paper in the provided paper list.

I want to train the network and check results

opened by AniketGurav 2

Summary of related papers on visual attention


Owner

MenghaoGuo
