External Attention Network

Overview

Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks

paper : https://arxiv.org/abs/2105.02358

The Jittor code will be released soon.

Pascal VOC test result link

Other implementations:

PyTorch: https://github.com/xmu-xiaoma666/External-Attention-pytorch

TODO

  • release jittor semantic segmentation code and checkpoint.
  • release torch semantic segmentation code and checkpoint.
  • release point cloud related code and checkpoint.
  • merge segmentation module into mmsegmentation to reproduce the ADE20K and Cityscapes dataset results.
  • merge PyTorch-StudioGAN to reproduce the GAN results.

Acknowledgments

We would like to sincerely thank HamNet_seg, EMANet_seg, openseg, T2T-ViT, mmsegmentation and PyTorch-StudioGAN for their awesome open-sourced code.

Abstract

Attention mechanisms, especially self-attention, play an increasingly important role in deep feature representation in visual tasks. Self-attention updates the feature at each position by computing a weighted sum of features using pair-wise affinities across all positions to capture long-range dependency within a single sample. However, self-attention has a quadratic complexity and ignores potential correlation between different samples. This paper proposes a novel attention mechanism which we call external attention, based on two external, small, learnable, and shared memories, which can be implemented easily by simply using two cascaded linear layers and two normalization layers; it conveniently replaces self-attention in existing popular architectures. External attention has linear complexity and implicitly considers the correlations between all samples. Extensive experiments on image classification, semantic segmentation, image generation, point cloud classification and point cloud segmentation tasks reveal that our method provides comparable or superior performance to the self-attention mechanism and some of its variants, with much lower computational and memory costs.
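
The operation described above can be sketched in a few lines. Below is a minimal, hedged PyTorch sketch of single-head external attention with the double normalization (softmax over the pixel dimension N, then L1 normalization over the memory dimension S); the module and variable names are illustrative assumptions, not the official implementation linked above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ExternalAttention(nn.Module):
    def __init__(self, d_model: int, S: int = 64):
        super().__init__()
        # Two external, small, learnable, shared memories implemented as linear layers.
        self.mk = nn.Linear(d_model, S, bias=False)   # M_k
        self.mv = nn.Linear(S, d_model, bias=False)   # M_v

    def forward(self, x):
        # x: (batch, N, d_model), where N is the number of pixels/points.
        attn = self.mk(x)                                      # (batch, N, S)
        attn = F.softmax(attn, dim=1)                          # softmax over the pixel dimension N
        attn = attn / (attn.sum(dim=2, keepdim=True) + 1e-9)   # L1 normalization over S
        return self.mv(attn)                                   # (batch, N, d_model)

# Drop-in usage in place of a self-attention block:
x = torch.randn(2, 1024, 256)
print(ExternalAttention(256, S=64)(x).shape)                   # torch.Size([2, 1024, 256])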

Jittor

Jittor is a high-performance deep learning framework that is easy to learn and use. It provides PyTorch-like interfaces, as the brief example below shows.
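
As a small, hedged illustration of that PyTorch-like interface (a sketch assuming the standard jittor.nn module, jt.randn, and the execute() method from the public Jittor documentation linked below):

import jittor as jt
from jittor import nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    def execute(self, x):
        # Jittor modules define execute() where PyTorch modules define forward().
        return self.fc(x)

x = jt.randn(2, 16)
print(TinyNet()(x).shape)   # [2, 4]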

You can learn how to use Jittor from the following links:

Jittor homepage: https://cg.cs.tsinghua.edu.cn/jittor/

Jittor github: https://github.com/Jittor/jittor

If you have any questions about Jittor, you can ask in the Jittor developer QQ group: 761222083

Citation

If it is helpful for your work, please cite this paper:

@misc{guo2021attention,
      title={Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks}, 
      author={Meng-Hao Guo and Zheng-Ning Liu and Tai-Jiang Mu and Shi-Min Hu},
      year={2021},
      eprint={2105.02358},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Comments
  • the code of multi-head external attention can't work

    It's obvious that the code is wrong. You should check the reshape and transpose part in your code. And what is coef? It has not been mentioned in the paper. Also, you didn't import torch, and qkv_bias and qk_scale are not used here. Did you really run the multi-head version in your paper?

    opened by DRJYYDS 3
  • Two questions about the EA implementation: linear layers and normalization

    What is the difference between reshape (view) followed by Conv1d versus a direct 1×1 Conv2d? Why not use a 1×1 Conv2d directly?

    Also, viewed this way, EA is very similar to a ResNet bottleneck: both are hourglass-shaped 1×1 Conv2d layers with a module sandwiched in between, except that ResNet sandwiches BN-ReLU-3×3 Conv2d-BN-ReLU while EA sandwiches the double normalization.

    In addition, I changed the normalization from softmax over the N = H*W dimension followed by L1 norm over the k = S dimension to just softmax over the k = S dimension, and saw no significant performance difference on simple tasks. Did the authors run a related ablation? (It is not mentioned in the paper.)

    opened by YouJiacheng 3
  • Question about multi-head EA

    Hi, thanks for releasing the code! Do you think the representation capability of MEA is lower than that of EA, since the external memories are shared across different heads?

    opened by easonyang1996 2
  • External_attention in the code is strange: it only attends over channels, not over pixels

    Hello Menghao, I have a question about the difference between the code and the paper. In the paper, "Equation (5) is the similarity between the i-th pixel and the j-th row of M", so external attention attends between pixels; however, in the code I think Conv1d can only attend over a single pixel's channels, and pixels in F do not attend to rows of M. Also, the paper says "In fact, we find that a small S, e.g. 64, works well in experiments", but in the code d is set to 64 instead of S.

    opened by zanonShao 2
  • The problem of the program running on Cifar10/100

    Hi, Menghao. I ran the command CUDA_VISIBLE_DEVICES=1,3 python transfer_learning.py to verify the performance of EANet on Cifar10/100, but after two rounds of iteration the loss remains unchanged and the accuracy is always 10.000% (see the attached screenshot). Could you please help me solve this problem?

    opened by laohanlin 1
  • Reproducing semantic segmentation result on PASCAL VOC

    I am not able to reproduce the semantic segmentation result on PASCAL VOC val with MMSegmentation. I used the code from https://github.com/MenghaoGuo/-EANet/blob/main/model_torch.py and modified nothing except replacing the backbone with the one in MMSegmentation. After training EANet and PSPNet with several sets of configs, the mIoU of EANet is always a little below that of PSPNet, e.g. 73 vs 75. Any suggestions?

    opened by npurson 1
  • Visualization Code of Attention Maps

    Thanks for sharing this great work. I was reading the paper and trying to understand the mechanism. The results in Fig. 4 caught my attention, and I wanted to see how such impressive attention maps were generated. Has the visualization code already been released, or will it be?

    opened by innat 0
  • multi_head_attention_torch.py

    Hello, in the multi-head attention code, what is the purpose of self.coef=4? The input and output of self.trans_dims = nn.Linear(dim, dim * self.coef) have different dimensions, whereas in the original self-attention Q has the same dimension before and after the linear transformation. Why is that? (A hedged sketch of this structure is given after this list.)

    opened by WUHE-art 0
  • A question about EA implementation details

    Hello, I recently read your paper with great interest and went through the paper and code carefully, and I have one question. In multi_head_attention_torch.py, the external attention implementation passes through four linear layers: the first expands Q from dim to dim×4, the middle two are the memory units of dimension 64, and the last maps dim×4 back to dim. Doing so increases the computation, making it larger than that of self-attention, even though EA's complexity is linear. That is my question; I look forward to your reply, thanks.

    opened by WUHE-art 0
  • External Attention vs. Convolutional Kernel

    Intuitively, the memory units serve as prototypes for different patterns and play almost the same role as a convolution kernel (especially a 1×1 conv kernel). From the perspective of the mathematical operation, in both cases the output is the dot product between the feature vector and the memory unit / convolution kernel.

    Hence the question: what are the differences between a memory unit and a convolution kernel?

    opened by rayleizhu 0
  • No residual connection?

    Hi,

    I see that the new multi-head version doesn't use a residual connection, which differs from the former version, nor does it use normalization. What do you think about this?

    Thank you.

    opened by shuuchen 0
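
Several of the issues above ask about self.coef and self.trans_dims in multi_head_attention_torch.py. The sketch below is a hedged reconstruction of that multi-head external attention structure, pieced together only from the details quoted in the issues (a coef = 4 dimension expansion, memories of size S = 64 shared across heads, and a final projection back to dim); it is an illustration under those assumptions, not the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadExternalAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, coef: int = 4, S: int = 64):
        super().__init__()
        self.num_heads = num_heads * coef                 # head count is also expanded by coef
        self.trans_dims = nn.Linear(dim, dim * coef)      # expand dim -> dim * coef
        head_dim = dim * coef // self.num_heads           # = dim // num_heads
        self.linear_0 = nn.Linear(head_dim, S)            # memory M_k, shared across heads
        self.linear_1 = nn.Linear(S, head_dim)            # memory M_v, shared across heads
        self.proj = nn.Linear(dim * coef, dim)            # project back to dim

    def forward(self, x):
        B, N, _ = x.shape
        x = self.trans_dims(x)                                      # (B, N, dim * coef)
        x = x.view(B, N, self.num_heads, -1).permute(0, 2, 1, 3)    # (B, heads, N, head_dim)
        attn = self.linear_0(x)                                     # (B, heads, N, S)
        attn = F.softmax(attn, dim=2)                               # softmax over N
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)       # L1 normalization over S
        x = self.linear_1(attn)                                     # (B, heads, N, head_dim)
        x = x.permute(0, 2, 1, 3).reshape(B, N, -1)                 # (B, N, dim * coef)
        return self.proj(x)                                         # (B, N, dim)

With dim = 256 and num_heads = 8, this matches the dimensions discussed above: trans_dims maps 256 to 1024, each of the 32 expanded heads attends through a 64-unit memory, and proj maps 1024 back to 256.
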
Owner

MenghaoGuo, first-year Ph.D. candidate at the G2 group, Tsinghua University.