An official implementation of Anchor DETR.

Overview

Anchor DETR: Query Design for Transformer-Based Detector

Introduction

This repository is an official implementation of Anchor DETR. We encode anchor points as the object queries in DETR. Multiple patterns are attached to each anchor point to address the difficulty of "one region, multiple objects". We also propose an attention variant, Row-Column Decoupled Attention (RCDA), to reduce the memory cost for high-resolution features.
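
To make the query design concrete, here is a minimal sketch of how anchor-point queries with multiple patterns can be formed. It is illustrative only: the class and parameter names are ours, and the actual implementation encodes the anchors with a sinusoidal positional encoding rather than a learned projection.

    import torch
    import torch.nn as nn

    class AnchorQueries(nn.Module):
        def __init__(self, num_anchors=300, num_patterns=3, d_model=256):
            super().__init__()
            self.anchors = nn.Parameter(torch.rand(num_anchors, 2))  # (x, y) in [0, 1]
            self.patterns = nn.Embedding(num_patterns, d_model)      # shared pattern embeddings
            self.pos_proj = nn.Linear(2, d_model)                    # stand-in positional encoding

        def forward(self):
            pos = self.pos_proj(self.anchors)                 # (A, d)
            q = self.patterns.weight[:, None, :] + pos[None]  # (P, A, d)
            return q.flatten(0, 1)                            # (P*A, d) object queries

With 3 patterns and 300 anchor points this yields 900 object queries, the number that comes up in the issues below.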

Main Results

name              feature      epochs  AP    GFLOPs  Infer Speed (FPS)
DETR              DC5          500     43.3  187     10 (12)
SMCA              multi-level  50      43.7  152     10
Deformable DETR   multi-level  50      43.8  173     15
Conditional DETR  DC5          50      43.8  195     10
Anchor DETR       DC5          50      44.3  151     16 (19)

Note:

  1. The results are based on ResNet-50 backbone.
  2. Inference speeds are measured on NVIDIA Tesla V100 GPU.
  3. The values in parentheses in the Infer Speed column are the speeds with TorchScript optimization.

Model

name            backbone  AP    URL
AnchorDETR-C5   R50       42.1  model / log
AnchorDETR-DC5  R50       44.3  model / log
AnchorDETR-C5   R101      43.5  model / log
AnchorDETR-DC5  R101      45.1  model / log

Note: the models and logs are also available at Baidu Netdisk with the extraction code hh13.

Usage

Installation

First, clone the repository locally:

git clone https://github.com/megvii-research/AnchorDETR.git

Then, install dependencies:

pip install -r requirements.txt

Training

To train AnchorDETR on a single node with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco
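
If the script follows DETR's argument conventions (an assumption; check main.py for the exact flags), you can pass an output directory to keep checkpoints and logs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --output_dir /path/to/output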

Evaluation

To evaluate AnchorDETR on a single node with 8 GPUs:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --eval --coco_path /path/to/coco --resume /path/to/checkpoint.pth 

To evaluate AnchorDETR with a single GPU:

python main.py --eval --coco_path /path/to/coco --resume /path/to/checkpoint.pth

Citation

If you find this project useful for your research, please consider citing the paper.

@misc{wang2021anchor,
      title={Anchor DETR: Query Design for Transformer-Based Detector},
      author={Yingming Wang and Xiangyu Zhang and Tong Yang and Jian Sun},
      year={2021},
      eprint={2109.07107},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

If you have any questions, feel free to open an issue or contact us at [email protected].

Comments
  • num_feature_levels > 1

    Hello,

    When I try num_feature_levels > 1, the code doesn't work. The line srcs = torch.cat(srcs, dim=1) in anchor_detr.py raises a tensor size mismatch error. Any ideas on how to fix it?
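
    A hypothetical reproduction of the shape problem, in case it helps others: feature maps from different levels have different spatial sizes, so they cannot be concatenated directly, and a common DETR-style workaround (an assumption here, not necessarily the multi-level path this repository intends) is to flatten each level's spatial dimensions first:

        import torch

        srcs = [torch.randn(2, 256, 100, 134),  # e.g. stride-8 feature map
                torch.randn(2, 256, 50, 67)]    # e.g. stride-16 feature map
        # torch.cat(srcs, dim=1) fails: spatial dims (2, 3) differ across levels
        flat = torch.cat([s.flatten(2).transpose(1, 2) for s in srcs], dim=1)
        print(flat.shape)  # torch.Size([2, 16750, 256]) = (B, sum(H*W), C)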

    question inactive 
    opened by yformer 43
  • Question about extending to panoptic segmentation

    Hi, AnchorDETR largely fixes DETR's slow convergence and also boosts AP on small objects.

    However, AnchorDETR uses 900 output candidates (num queries). This may not be a problem for detection, but once extended to panoptic segmentation, 900 becomes a problem. The main reasons are (see the estimate after this list):

    1. The panoptic head in DETR treats stuff and things alike, which means it outputs a giant tensor, e.g. 900x400x400, covering every possible instance and semantic segment in the image; that is a very large tensor unless the input resolution is small.
    2. 900 queries might accelerate training for detection, but not for instance or semantic segmentation.

    Are there any experiments on this, or ideas on how to make panoptic segmentation possible the AnchorDETR way? Any suggestion would be much appreciated.
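
    For scale, a quick back-of-the-envelope estimate of the mask tensor described in point 1 (assuming float32 logits):

        # 900 queries x 400 x 400 mask logits, 4 bytes each
        num_queries, h, w = 900, 400, 400
        print(num_queries * h * w * 4 / 2**20)  # ~549 MiB per image, before gradients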

    question inactive 
    opened by luohao123 21
  • Overfitting when loading the pretrained model to train on my own dataset

    Hi, thanks for your great work. When I load anchor_detr_r50_dc5.pth into the model and train on another dataset, the training loss decreases, but the test loss stays around the same value for 30 epochs. Do you have any experience with this? I reduced the number of reference points and the number of patterns, but the problem remains.

    question 
    opened by zknus 15
  • NAN error: assert (boxes1[:, 2:] >= boxes1[:, :2]).all()

    When training AnchorDETR with 8 images per batch and a learning rate of 0.0001, it keeps hitting NaN errors, such as:

        generalized_box_iou
        assert (boxes1[:, 2:] >= boxes1[:, :2]).all(), f"incorrect boxes, boxes1 {boxes1}"
        AssertionError: incorrect boxes, boxes1 tensor([[nan, nan, nan, nan],
                [nan, nan, nan, nan],
                ...,
                [nan, nan, nan, nan]], device='cuda:4')

    Any idea how to fix this issue?
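
    A generic mitigation sketch, not a confirmed fix: clip gradients before each optimizer step (0.1 matches the default clip_max_norm in DETR-style training scripts), and consider lowering the learning rate. The tiny model below only stands in for the detector:

        import torch

        model = torch.nn.Linear(4, 4)                  # stand-in for the detector
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
        loss = model(torch.randn(8, 4)).pow(2).mean()  # stand-in for the DETR losses
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
        optimizer.step()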

    inactive 
    opened by yformer 10
  • ONNX model cannot be simplified or pass onnx.checker, and has weird output

    The exported ONNX model has very weird dimensions, which cause it to fail both simplification and onnx.checker.check_model.

    This is the verbose output of exporting DETR:

    [image]

    This is the verbose output of AnchorDETR:

    [image]

    Both show the last several layers. As you can see, for DETR the strides seem very small, but AnchorDETR has something like Float(1, 1, 900, 91, strides=[81900, 81900, 91, 1], requires_grad=1, device=cuda:0), with giant values.

    And when trying to check or simplify this model, the following error occurs:

        ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 3714028571

    Any idea?
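
    One hedged workaround sketch for the 2 GB protobuf limit, assuming a PyTorch release (roughly 1.6 to 1.12) whose torch.onnx.export still accepts use_external_data_format; whether this also fixes the odd output dimensions is untested:

        import torch

        model = torch.nn.Linear(4, 4)    # stand-in for the loaded AnchorDETR model
        dummy_input = torch.randn(1, 4)
        # weights are stored outside the .onnx protobuf, keeping it under 2 GB
        torch.onnx.export(model, dummy_input, "model.onnx",
                          opset_version=12,
                          use_external_data_format=True)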

    bug 
    opened by jinfagang 10
  • Trying to understand my results on a custom dataset

    Hi,

    I am studying your shared code using a custom dataset.

    Here are key points:

    • I split my dataset into train, valid, and test.
    • I trained for 100 epochs (I saw the recommendation to run for 50 epochs) using AnchorDETR-DC5 ResNet-101 weights for transfer learning.

    I would like to understand why, after 50 epochs, performance starts to get worse (overfitting?).

    Here is the graph: [image]

    Running the validation code on my test dataset, I got the following results using the saved weights:

    Saved weights of epoch: 0049
    IoU metric: bbox
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.599
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.847
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.637
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.516
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.624
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.632
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.677
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.703
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.616
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.727
    
    Saved weights of epoch: 0099
    IoU metric: bbox
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.558
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.806
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.607
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.426
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.594
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.620
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.653
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.663
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.504
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.702
    

    Why are the results at epoch 49 better than those at epoch 99? Is there an option to save only the best weights? (A generic pattern is sketched below.)
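
    There is, as far as I can tell, no built-in "save best only" option; a generic pattern is to track the best validation AP yourself. In this sketch, train_one_epoch and evaluate are hypothetical stubs for the real functions in main.py, and "coco_eval_bbox"[0] is the AP at IoU=0.50:0.95 in DETR-style eval stats:

        import torch

        model = torch.nn.Linear(4, 4)     # stand-in for the detector

        def train_one_epoch(model):       # hypothetical stub
            pass

        def evaluate(model):              # hypothetical stub
            return {"coco_eval_bbox": [torch.rand(1).item()]}

        best_ap = 0.0
        for epoch in range(100):
            train_one_epoch(model)
            ap = evaluate(model)["coco_eval_bbox"][0]
            if ap > best_ap:
                best_ap = ap
                torch.save({"model": model.state_dict(), "epoch": epoch},
                           "checkpoint_best.pth")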

    inactive 
    opened by jackonealll 9
  • Model cannot achieve reasonable performance

    Using bs=80, the AP stalls at 30 and will not go up any more:

    [image]

    What could be the reason?

    The model and the criterion are all the same as AnchorDETR, and all loss weights are the same...

    but the performance is so bad... even though it does converge fast in the first several iterations...

    inactive 
    opened by luohao123 8
  • Some questions about your code

    Hi, I'm very interested in your work on the new decoder and self-attention design for DETR, but I have some questions about the code. 1. While debugging, I found a detail not mentioned in the paper: when RCDA attention is used, the computation is split into two cases, like this:

        if efficient_compute:
            if src_len_col < src_len_row:
                b_ein, q_ein, w_ein = attn_output_weights_row.shape
                b_ein, h_ein, w_ein, c_ein = v.shape
                # Ax * V gives Z from the paper: [bs*8,HW,H*32] -> [bs*8, H, HW, 32]
                attn_output_row = torch.matmul(attn_output_weights_row, v.permute(0, 2, 1, 3).reshape(b_ein, w_ein, h_ein * c_ein)).reshape(b_ein, q_ein, h_ein, c_ein).permute(0, 2, 1, 3)
                # Ay * Z gives the attention outputs from the paper:
                # [HW,8,bs,H]*[HW,bs*8,H,32] -> [HW,8,bs,32] -> [HW,8*bs,32] -> [HW, bs, 256]
                attn_output = torch.matmul(attn_output_weights_col.permute(1, 0, 2)[:, :, None, :], attn_output_row.permute(2, 0, 1, 3)).squeeze(-2).reshape(tgt_len, bsz, embed_dim)
                ### the following einsum-based code gets the same results
                # attn_output_row = torch.einsum("bqw,bhwc->bhqc", attn_output_weights_row, v)
                # attn_output = torch.einsum("bqh,bhqc->qbc", attn_output_weights_col, attn_output_row).reshape(tgt_len, bsz, embed_dim)
            else:
                b_ein, q_ein, h_ein = attn_output_weights_col.shape
                b_ein, h_ein, w_ein, c_ein = v.shape
                attn_output_col = torch.matmul(attn_output_weights_col, v.reshape(b_ein, h_ein, w_ein * c_ein)).reshape(b_ein, q_ein, w_ein, c_ein)
                attn_output = torch.matmul(attn_output_weights_row[:, :, None, :], attn_output_col).squeeze(-2).permute(1, 0, 2).reshape(tgt_len, bsz, embed_dim)

    Why are there two cases for computing the RCDA attention output? In the code, if col < row, Z = Ax * V and then outputs = Ay * Z; if col > row, Z = Ay * V and then outputs = Ax * Z. In my opinion, when col < row the attention output = Ay * Ax * V, otherwise the output = Ax * Ay * V. What is the reason for this design? (A numerical check of the equivalence is sketched after this list.)

    2. The RCDA attention module is only used in the self-attention of the encoder and the cross-attention of the decoder. Why not use it in the self-attention of the decoder? I sincerely hope I can get your answers. Thanks!
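
    A small self-contained check of the equivalence asked about in question 1 (a sketch with toy shapes; Ax and Ay mirror the row and column attention weights): applying the row attention first or the column attention first gives the same output, and only the size of the intermediate tensor differs, which is presumably why the code picks the cheaper order based on H versus W.

        import torch

        B, Q, H, W, C = 2, 5, 8, 13, 4
        Ax = torch.softmax(torch.randn(B, Q, W), -1)  # row attention (over width)
        Ay = torch.softmax(torch.randn(B, Q, H), -1)  # column attention (over height)
        V = torch.randn(B, H, W, C)

        # order 1: Z = Ax V (intermediate B x H x Q x C), then out = Ay Z
        out1 = torch.einsum("bqh,bhqc->bqc", Ay,
                            torch.einsum("bqw,bhwc->bhqc", Ax, V))
        # order 2: Z = Ay V (intermediate B x Q x W x C), then out = Ax Z
        out2 = torch.einsum("bqw,bqwc->bqc", Ax,
                            torch.einsum("bqh,bhwc->bqwc", Ay, V))
        print(torch.allclose(out1, out2, atol=1e-5))  # True: same result, different cost
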
    question 
    opened by Huzhen757 7
  • Some questions about reference points and object queries

    Hi, thank you for your good work!

    Recently I have been studying DETR-like detectors, and I have the following questions:

    1. There is no implementation of iterative reference-point refinement, but many other works do that and it seems to work well. Have you tested it? I also notice that your work (Anchor DETR), as mentioned in DAB-DETR, is implemented there with refinement.

    2. I am confused by the term "object query"; it seems to mean different things in different works. In your opinion, what does "object query" refer to: the content part (tgt in the code), the positional part, or tgt plus the positional part?

    3. Which part do you think is more important, the content part or the positional part?

    Thank you again.

    opened by rOtking 6
  • Evaluation issue

    Hello, I have recently been reproducing your model. Following your README, I downloaded the model weights and ran the eval command: python main.py --eval --coco_path /data/coco --resume /weights/AnchorDETR_r50_c5.pth. However, I found that in eval mode the code does not use the COCO test dataset; it evaluates on the COCO training set instead.

    inactive 
    opened by jiangyichen19 6
  • How to extract precision, recall, and F1-score metrics

    Hello, thank you for sharing the code.

    I would like to know how to extract the precision, recall, and F1-score metrics. I already have the AP and AR metrics.

    I am trying to use the following code, but it gives me a NumPy matrix:

    precision = coco_eval.eval['precision']
    recall = coco_eval.eval['recall']
    

    Can you help me?
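
    A sketch of one way to reduce those arrays, assuming pycocotools' COCOeval layout (precision is indexed [iou, recall, class, area, maxDets] and recall is [iou, class, area, maxDets], with -1 marking empty cells) and that coco_eval is the accumulated COCOeval object:

        # area="all" (index 0) and the largest maxDets setting (last index)
        p = coco_eval.eval['precision'][:, :, :, 0, -1]
        r = coco_eval.eval['recall'][:, :, 0, -1]
        precision = float(p[p > -1].mean())
        recall = float(r[r > -1].mean())
        f1 = 2 * precision * recall / (precision + recall)
        print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")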

    opened by jackonealll 5