MlTr: Multi-label Classification with Transformer

程星

Last update: Nov 8, 2022

Overview

This is the official implementation of "MlTr: Multi-label Classification with Transformer".

Abstract

The task of multi-label image classification is to recognize all the object labels present in an image. Despite years of progress, small objects, similar objects, and objects with high conditional probability remain the main bottlenecks of previous convolutional neural network (CNN) based models, which are limited by the representational capacity of convolutional kernels. Recent vision transformer networks use the self-attention mechanism to extract features at pixel granularity, which express richer local semantic information but are insufficient for mining global spatial dependence. In this paper, we point out three crucial problems that CNN-based methods encounter and explore the possibility of designing specific transformer modules to resolve them. We put forward a Multi-label Transformer architecture (MlTr) constructed with window partitioning, in-window pixel attention, and cross-window attention, which particularly improves the performance of multi-label image classification tasks. The proposed MlTr shows state-of-the-art results on various prevalent multi-label datasets such as MS-COCO, Pascal-VOC, and NUS-WIDE, with 88.5%, 95.8%, and 65.5% respectively.
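
To make the three components named in the abstract more concrete, here is a toy NumPy sketch of window partitioning, attention inside each window, and attention across windows. It is not the MlTr code: the function names, the identity query/key/value projections, and the use of a mean-pooled token as each window's summary are all assumptions made for illustration only.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W, C) feature map into non-overlapping window x window tiles.

    Returns an array of shape (num_windows, window * window, C).
    """
    H, W, C = x.shape
    assert H % window == 0 and W % window == 0, "H and W must be divisible by window"
    x = x.reshape(H // window, window, W // window, window, C)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two window-grid axes together
    return x.reshape(-1, window * window, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over the second-to-last axis.

    Assumption: identity projections (no learned weights), to keep the sketch short.
    """
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores) @ tokens

def windowed_attention(x, window):
    """Hypothetical composition: in-window attention, then cross-window attention."""
    wins = window_partition(x, window)   # (num_windows, window*window, C)
    local = self_attention(wins)         # pixels attend only within their own window
    summary = local.mean(axis=1)         # assumed: one mean-pooled token per window
    mixed = self_attention(summary)      # window summaries attend to each other
    return local + mixed[:, None, :]     # broadcast cross-window context back to pixels

if __name__ == "__main__":
    feature_map = np.random.default_rng(0).normal(size=(8, 8, 16))
    out = windowed_attention(feature_map, window=4)
    print(out.shape)  # (num_windows, pixels_per_window, channels)
```

Restricting pixel-level attention to windows keeps the attention matrix small, and the second attention pass over per-window tokens is one simple way to let information travel between windows. How MlTr actually implements these modules should be checked against the paper and the repository code.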

Pretrained model (Results on MS-COCO2014)

name	resolution	mAP (%)	params (M)	model	log
mltr-s	224x224	81.9	33	coming soon	coming soon
mltr-m	384x384	86.8	62	coming soon	coming soon
mltr-l	384x384	88.5	108	coming soon	coming soon
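
For readers unfamiliar with the metric in the table, the following is a minimal pure-Python sketch of mean average precision (mAP) for multi-label predictions. It assumes the common non-interpolated definition (precision averaged over the rank of each positive, then averaged over classes); the exact evaluation protocol behind the numbers above is not stated here and may differ. The function names are illustrative, not taken from the repository.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class.

    scores: predicted confidence per image; labels: 1 if the class is present, else 0.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive's rank
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(score_matrix, label_matrix):
    """Mean of per-class AP. Rows are images, columns are classes.

    Classes with no positive example are skipped (an assumption of this sketch).
    """
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_matrix]
        class_labels = [row[c] for row in label_matrix]
        if any(class_labels):
            aps.append(average_precision(class_scores, class_labels))
    return sum(aps) / len(aps) if aps else 0.0

if __name__ == "__main__":
    scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]  # 3 images, 2 classes
    labels = [[1, 0], [0, 1], [1, 1]]
    print(round(100 * mean_average_precision(scores, labels), 1))
```

Because each class is scored independently from a ranked list, mAP does not depend on choosing a decision threshold, which is why it is a common headline number for multi-label benchmarks.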

Citing this article

Please cite this article as:

@misc{cheng2021mltr,
      title={MlTr: Multi-label Classification with Transformer}, 
      author={Xing Cheng and Hezheng Lin and Xiangyu Wu and Fan Yang and Dong Shen and Zhongyuan Wang and Nian Shi and Honglin Liu},
      year={2021},
      eprint={2106.06195},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Getting started

Please refer to get_started.
