# LESA

## Introduction
This repository contains the official implementation of *Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms*. The code for image classification and object detection is based on axial-deeplab and mmdetection.
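As the paper title suggests, LESA rethinks the self-attention output as the sum of a local term and a context term. The snippet below is only a toy sketch of that general idea, not the paper's actual formulation: the module name `LocalPlusContextAttention`, the depthwise-convolution local operator, and all hyperparameters are assumptions made for illustration; please refer to LESA_classification for the real implementation.

```python
# Toy sketch (assumption, not the paper's exact method): attention output
# written as local term + context term.
import torch
import torch.nn as nn

class LocalPlusContextAttention(nn.Module):
    """Illustrative decomposition: a depthwise-conv local term plus a
    standard softmax self-attention context term. Input/output: (B, C, H, W)."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=False)
        # Local term: depthwise 3x3 convolution on the value map
        # (a stand-in for whatever local operator the paper defines).
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # Context term: global multi-head self-attention over all positions.
        q = q.reshape(b, self.heads, c // self.heads, h * w)
        k = k.reshape(b, self.heads, c // self.heads, h * w)
        v_flat = v.reshape(b, self.heads, c // self.heads, h * w)
        attn = torch.softmax((q.transpose(-2, -1) @ k) * self.scale, dim=-1)
        context = (v_flat @ attn.transpose(-2, -1)).reshape(b, c, h, w)
        # Local term computed directly on the value map, then the two terms
        # are summed and projected.
        local = self.local(v)
        return self.proj(local + context)

x = torch.randn(2, 64, 14, 14)
print(LocalPlusContextAttention(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```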
## Citing LESA
If you find LESA helpful in your project, please consider citing our paper.
```bibtex
@article{yang2021locally,
  title={Locally Enhanced Self-Attention: Rethinking Self-Attention as Local and Context Terms},
  author={Yang, Chenglin and Qiao, Siyuan and Kortylewski, Adam and Yuille, Alan},
  journal={arXiv preprint arXiv:2107.05637},
  year={2021}
}
```
## Main Results on ImageNet
Please refer to LESA_classification for details.
| Method | Model | Top-1 Acc. (%) | Top-5 Acc. (%) |
| --- | --- | --- | --- |
| LESA_ResNet50 | Download | 79.55 | 94.79 |
| LESA_WRN50 | Download | 80.18 | 95.07 |
## Main Results on COCO test-dev
Please refer to LESA_detection for details.
| Method | Backbone | Pretrained | Model | Box AP | Mask AP |
| --- | --- | --- | --- | --- | --- |
| Mask-RCNN | LESA_ResNet50 | Download | Download | 44.2 | 39.6 |
| HTC | LESA_WRN50 | Download | Download | 50.5 | 44.4 |
## Credits
This project is based on axial-deeplab and mmdetection.
Relative position embedding is based on bottleneck-transformer-pytorch.
ResNet is based on pytorch/vision. Classification helper functions are based on pytorch-classification.