[ICLR 2022] DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

Overview

DAB-DETR

This is the official PyTorch implementation of our ICLR 2022 paper DAB-DETR.

Authors: Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang

News

[2022/4/14] We release the .pptx file of our DETR-like models comparison figure for those who want to draw model architecture figures in their papers.
[2022/4/12] We fix a bug in the file datasets/coco_eval.py. The parameter useCats of CocoEvaluator should be True by default.
[2022/4/9] Our code is available!
[2022/3/9] We build a repo Awesome Detection Transformer to collect papers on transformers for detection and segmentation. We welcome your attention!
[2022/3/8] Our new work DINO sets a new record of 63.3 AP on the MS-COCO leaderboard.
[2022/3/8] Our new work DN-DETR has been accepted by CVPR 2022!
[2022/1/21] Our work has been accepted to ICLR 2022.

Abstract

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR. This new formulation directly uses box coordinates as queries in Transformer decoders and dynamically updates them layer by layer. Using box coordinates not only helps leverage explicit positional priors to improve query-to-feature similarity and eliminate the slow training convergence issue in DETR, but also allows us to modulate the positional attention map using the box width and height information. Such a design makes it clear that queries in DETR can be implemented as performing soft ROI pooling layer by layer in a cascade manner. As a result, it leads to the best performance on the MS-COCO benchmark among DETR-like detection models under the same setting, e.g., AP 45.7% using ResNet50-DC5 as backbone trained in 50 epochs. We also conducted extensive experiments to confirm our analysis and verify the effectiveness of our methods.
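To make the formulation concrete, the layer-by-layer anchor update can be sketched as below. This is a minimal, hypothetical illustration; the names (`decoder_layers`, `bbox_head`, `anchor_pos_embed`) are placeholders, not the repository's actual API:

```python
import torch

def inverse_sigmoid(x, eps=1e-5):
    x = x.clamp(min=eps, max=1 - eps)
    return torch.log(x / (1 - x))

def decode_with_dynamic_anchors(anchors, memory, decoder_layers, bbox_head, anchor_pos_embed):
    """Sketch: anchors is (nq, 4) of normalized (x, y, w, h) boxes used directly as queries."""
    content = torch.zeros(anchors.size(0), memory.size(-1))  # content queries start from zeros
    for layer in decoder_layers:
        query_pos = anchor_pos_embed(anchors)         # positional queries from box coordinates
        content = layer(content, memory, query_pos)   # soft ROI pooling via cross-attention
        delta = bbox_head(content)                    # per-layer relative offsets (dx, dy, dw, dh)
        anchors = (inverse_sigmoid(anchors) + delta).sigmoid()  # dynamic anchor update
    return anchors, content
```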

Model

[figure: DAB-DETR architecture]

Model Zoo

We provide our models with the R50 backbone, including both DAB-DETR and DAB-Deformable-DETR (see Appendix C of our paper for more details).

| # | name | backbone | box AP | Log/Config/Checkpoint | Where in Our Paper |
|---|------|----------|--------|-----------------------|--------------------|
| 0 | DAB-DETR-R50 | R50 | 42.2 | Google Drive \| Tsinghua Cloud | Table 2 |
| 1 | DAB-DETR-R50 (3 pat)¹ | R50 | 42.6 | Google Drive \| Tsinghua Cloud | Table 2 |
| 2 | DAB-DETR-R50-DC5 | R50 | 44.5 | Google Drive \| Tsinghua Cloud | Table 2 |
| 3 | DAB-DETR-R50-DC5-fixxy² | R50 | 44.7 | Google Drive \| Tsinghua Cloud | Table 8, Appendix H |
| 4 | DAB-DETR-R50-DC5 (3 pat) | R50 | 45.7 | Google Drive \| Tsinghua Cloud | Table 2 |
| 5 | DAB-Deformable-DETR (Deformable Encoder Only)³ | R50 | 46.9 | | Baseline for DN-DETR |
| 6 | DAB-Deformable-DETR-R50⁴ | R50 | 48.1 | Google Drive \| Tsinghua Cloud | Extended results for Table 5, Appendix C |

Notes:

  • ¹: Models marked (3 pat) are trained with multiple pattern embeddings (refer to Anchor DETR or our paper for more details).
  • ²: "fixxy" means we initialize the anchors randomly and do not update their parameters during training (see Appendix H of our paper for more details); a rough sketch follows below.
  • ³: DAB-Deformable-DETR (Deformable Encoder Only) is a multiscale version of our DAB-DETR. See DN-DETR for more details.
  • ⁴: The result here is better than the number reported in our paper, as we use different loss coefficients during training. Refer to our config file for more details.
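As a rough illustration of note ², fixed random anchor centers could be set up as follows. This is a hedged sketch, not the repository's exact code; see the `random_refpoints_xy` option under `models/DAB_DETR` for the real logic:

```python
import torch
import torch.nn as nn

def inverse_sigmoid(x, eps=1e-5):
    x = x.clamp(min=eps, max=1 - eps)
    return torch.log(x / (1 - x))

num_queries = 300
refpoint_embed = nn.Embedding(num_queries, 4)  # learnable anchor boxes (x, y, w, h)

# randomly initialize the x, y centers in [0, 1], stored in inverse-sigmoid space
with torch.no_grad():
    refpoint_embed.weight[:, :2].uniform_(0, 1)
    refpoint_embed.weight[:, :2] = inverse_sigmoid(refpoint_embed.weight[:, :2])

# keep x, y fixed during training by zeroing their gradients; w, h remain learnable
refpoint_embed.weight.register_hook(lambda g: g * torch.tensor([0.0, 0.0, 1.0, 1.0]))
```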

Usage

Installation

We use the great DETR project as our codebase, hence no extra dependencies are needed for DAB-DETR. For DAB-Deformable-DETR, you need to compile the deformable attention operator manually.

We test our models under python=3.7.3, pytorch=1.9.0, cuda=11.1. Other versions may work as well.

  1. Clone this repo

```bash
git clone https://github.com/IDEA-opensource/DAB-DETR.git
cd DAB-DETR
```

  2. Install PyTorch and torchvision

Follow the instructions on https://pytorch.org/get-started/locally/.

```bash
# an example:
conda install -c pytorch pytorch torchvision
```

  3. Install other needed packages

```bash
pip install -r requirements.txt
```

  4. Compile the CUDA operators

```bash
cd models/dab_deformable_detr/ops
python setup.py build install
# unit test (should see all checks being True)
python test.py
cd ../../..
```

Data

Please download the COCO 2017 dataset and organize it as follows:

```
COCODIR/
  ├── train2017/
  ├── val2017/
  └── annotations/
  	├── instances_train2017.json
  	└── instances_val2017.json
```
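To sanity-check the layout before training, a small helper like the one below can verify the expected paths. This script is illustrative and not part of the repository:

```python
from pathlib import Path

def check_coco_dir(coco_dir: str) -> bool:
    """Verify the COCO 2017 layout expected by this codebase."""
    root = Path(coco_dir)
    expected = [
        root / "train2017",
        root / "val2017",
        root / "annotations" / "instances_train2017.json",
        root / "annotations" / "instances_val2017.json",
    ]
    ok = True
    for path in expected:
        if not path.exists():
            print(f"MISSING: {path}")
            ok = False
    return ok

# example: check_coco_dir("/path/to/your/COCODIR")
```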

Run

We use the standard DAB-DETR-R50 and DAB-Deformable-DETR-R50 as examples for training and evaluation.

Eval our pretrained models

Download our DAB-DETR-R50 model checkpoint from this link and run the command below. You can expect a final AP of about 42.2.

For our DAB-Deformable-DETR (download here), the expected final AP is 48.1.

```bash
# for dab_detr: 42.2 AP
# replace --coco_path and --resume with your COCO path and checkpoint path
python main.py -m dab_detr \
  --output_dir logs/DABDETR/R50 \
  --batch_size 1 \
  --coco_path /path/to/your/COCODIR \
  --resume /path/to/our/checkpoint \
  --eval
```

```bash
# for dab_deformable_detr: 48.1 AP
# replace --coco_path and --resume with your COCO path and checkpoint path
python main.py -m dab_deformable_detr \
  --output_dir logs/dab_deformable_detr/R50 \
  --batch_size 2 \
  --coco_path /path/to/your/COCODIR \
  --resume /path/to/our/checkpoint \
  --transformer_activation relu \
  --eval
```

Training your own models

Similarly, you can also train our models in a single process:

```bash
# for dab_detr
# replace --coco_path with your COCO path
python main.py -m dab_detr \
  --output_dir logs/DABDETR/R50 \
  --batch_size 1 \
  --epochs 50 \
  --lr_drop 40 \
  --coco_path /path/to/your/COCODIR
```

Distributed Run

However, as the training is time-consuming, we suggest training the model on multiple devices.

If you plan to train the models on a cluster with Slurm, here is an example command for training:

```bash
# for dab_detr: 42.2 AP
python run_with_submitit.py \
  --timeout 3000 \
  --job_name DABDETR \
  --coco_path /path/to/your/COCODIR \
  -m dab_detr \
  --job_dir logs/DABDETR/R50_%j \
  --batch_size 2 \
  --ngpus 8 \
  --nodes 1 \
  --epochs 50 \
  --lr_drop 40
```

```bash
# for dab_deformable_detr: 48.1 AP
python run_with_submitit.py \
  --timeout 3000 \
  --job_name dab_deformable_detr \
  --coco_path /path/to/your/COCODIR \
  -m dab_deformable_detr \
  --transformer_activation relu \
  --job_dir logs/dab_deformable_detr/R50_%j \
  --batch_size 2 \
  --ngpus 8 \
  --nodes 1 \
  --epochs 50 \
  --lr_drop 40
```

The final AP should be similar to ours (42.2 for DAB-DETR and 48.1 for DAB-Deformable-DETR). Our configs and logs (see the Model Zoo) can be used as references as well.

Notes:

  • The results are sensitive to the batch size. We use 16 (2 images per GPU × 8 GPUs) by default.

Or run with multi-processes on a single node:

```bash
# for dab_detr: 42.2 AP
python -m torch.distributed.launch --nproc_per_node=8 \
  main.py -m dab_detr \
  --output_dir logs/DABDETR/R50 \
  --batch_size 2 \
  --epochs 50 \
  --lr_drop 40 \
  --coco_path /path/to/your/COCODIR
```

```bash
# for dab_deformable_detr: 48.1 AP
python -m torch.distributed.launch --nproc_per_node=8 \
  main.py -m dab_deformable_detr \
  --output_dir logs/dab_deformable_detr/R50 \
  --batch_size 2 \
  --epochs 50 \
  --lr_drop 40 \
  --transformer_activation relu \
  --coco_path /path/to/your/COCODIR
```

Detailed Model

[figure: detailed model architecture]

Comparison of DETR-like Models

The source file can be found here.

[figure: comparison of DETR-like models]

Links

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum
arXiv 2022.
[paper] [code]

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
Feng Li*, Hao Zhang*, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022.
[paper] [code]

License

DAB-DETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Copyright (c) IDEA. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Citation

```bibtex
@inproceedings{
  liu2022dabdetr,
  title={{DAB}-{DETR}: Dynamic Anchor Boxes are Better Queries for {DETR}},
  author={Shilong Liu and Feng Li and Hao Zhang and Xiao Yang and Xianbiao Qi and Hang Su and Jun Zhu and Lei Zhang},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=oMI9PjOb9Jl}
}
```
Comments
  • Training problems of AP and AR.

    Sorry to bother you! During training, I had a problem.

    Training parameters:

    batch_size = 1,
    epochs = 50,
    lr_drop = 40,
    modelname = 'dab_detr',
    num_workers = 6,
    

    Dataset (tiny-coco):

    10% of the entire coco2017
    

    Problem:

    So far, I've trained 20 epochs. But the results of AP and AR in every epoch are the same, and the results are as follows:

    IoU metric: bbox
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.003
    

    And the average statistics for the 20th epoch are:

    class_error: 88.89  loss: 16.2750 (16.8940)  loss_ce: 0.8517 (0.9079)  loss_bbox: 0.7610 (0.8277)  loss_giou: 1.1219 (1.0805)  loss_ce_0: 0.8508 (0.9079)  loss_bbox_0: 0.7550 (0.8284)  loss_giou_0: 1.1278 (1.0790)  loss_ce_1: 0.8519 (0.9078)  loss_bbox_1: 0.7502 (0.8281)  loss_giou_1: 1.1264 (1.0795)  loss_ce_2: 0.8520 (0.9078)  loss_bbox_2: 0.7614 (0.8283)  loss_giou_2: 1.1250 (1.0794)  loss_ce_3: 0.8514 (0.9080)  loss_bbox_3: 0.7613 (0.8279)  loss_giou_3: 1.1236 (1.0799)  loss_ce_4: 0.8517 (0.9079)  loss_bbox_4: 0.7611 (0.8278)  loss_giou_4: 1.1225 (1.0802)  loss_ce_unscaled: 0.8517 (0.9079)  class_error_unscaled: 80.0000 (77.4109)  loss_bbox_unscaled: 0.1522 (0.1655)  loss_giou_unscaled: 0.5610 (0.5403)  loss_xy_unscaled: 0.0560 (0.0604)  loss_hw_unscaled: 0.0998 (0.1051)  cardinality_error_unscaled: 293.0000 (293.0660)  loss_ce_0_unscaled: 0.8508 (0.9079)  loss_bbox_0_unscaled: 0.1510 (0.1657)  loss_giou_0_unscaled: 0.5639 (0.5395)  loss_xy_0_unscaled: 0.0561 (0.0605)  loss_hw_0_unscaled: 0.1034 (0.1052)  cardinality_error_0_unscaled: 293.0000 (293.0660)  loss_ce_1_unscaled: 0.8519 (0.9078)  loss_bbox_1_unscaled: 0.1500 (0.1656)  loss_giou_1_unscaled: 0.5632 (0.5397)  loss_xy_1_unscaled: 0.0560 (0.0605)  loss_hw_1_unscaled: 0.1029 (0.1051)  cardinality_error_1_unscaled: 293.0000 (293.0660)  loss_ce_2_unscaled: 0.8520 (0.9078)  loss_bbox_2_unscaled: 0.1523 (0.1657)  loss_giou_2_unscaled: 0.5625 (0.5397)  loss_xy_2_unscaled: 0.0560 (0.0605)  loss_hw_2_unscaled: 0.1022 (0.1052)  cardinality_error_2_unscaled: 293.0000 (293.0660)  loss_ce_3_unscaled: 0.8514 (0.9080)  loss_bbox_3_unscaled: 0.1523 (0.1656)  loss_giou_3_unscaled: 0.5618 (0.5400)  loss_xy_3_unscaled: 0.0560 (0.0604)  loss_hw_3_unscaled: 0.1014 (0.1052)  cardinality_error_3_unscaled: 293.0000 (293.0660)  loss_ce_4_unscaled: 0.8517 (0.9079)  loss_bbox_4_unscaled: 0.1522 (0.1656)  loss_giou_4_unscaled: 0.5612 (0.5401)  loss_xy_4_unscaled: 0.0560 (0.0604)  loss_hw_4_unscaled: 0.1006 (0.1052)  cardinality_error_4_unscaled: 293.0000 (293.0660)
    

    So what could be the problem? Dataset? Training parameters? Or something else? Thank you!

    opened by yuan738 14
  • Some questions about reproducing DAB-DETR

    Hi, I'd like to reproduce DAB-DETR, and I have two questions about some technique details of DAB-DETR.

    i. How do you initialize the learnable anchor boxes (results in Table 2)? Why not use the setting in Table 8 (random initialization, fixed in the first decoder layer) as the default?

    ii. I am confused about modulated positional attention in Section 4.4. Is it an improvement on "conditional cross attention" in Conditional DETR (split cross attention into two parts, content and spatial dot-products)? Does the proposed modulated positional attention add referenced w into spatial dot-products?

    opened by Zx55 8
  • Some questions about reference points!

    Thank you for your excellent work! After reading the code of the DAB-DETR decoder, I have some questions about the reference points. In the decoder code, `reference_points = new_reference_points.detach()`. I am confused about why the detach operator is used. In my understanding, with detach, the gradient won't be backpropagated to the reference embedding; the gradient is cut off. Looking forward to your reply! Thank you!
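    For context, the decoder loop around that line can be paraphrased as the sketch below (names are approximate, not the repository's exact code). One reading is that the detach() keeps gradients from flowing across layers through the box coordinates, so each layer's offset head refines a fixed input anchor while box gradients still reach that layer's own head:

    ```python
    def refine_anchors(output, memory, reference_points, decoder_layers, bbox_embed,
                       gen_sine_embed, inverse_sigmoid):
        # paraphrase of the decoder's anchor-update loop, with the detach in question
        for layer in decoder_layers:
            query_pos = gen_sine_embed(reference_points)   # positional queries from current boxes
            output = layer(output, memory, query_pos)
            delta = bbox_embed(output)                     # per-layer offset prediction
            new_reference_points = (delta + inverse_sigmoid(reference_points)).sigmoid()
            reference_points = new_reference_points.detach()  # cut the cross-layer gradient
        return reference_points, output
    ```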

    opened by guwen007 4
  • 'self.query_scale' before each transformer encoder layer

    Thanks for your great work. I notice a difference between DAB-DETR and Conditional DETR: there is an MLP defined as 'self.query_scale' before each transformer encoder layer. Is this operation described in the paper, or in another reference paper explaining its effect?

    opened by alexzeng1206 4
  • Why does the model predict bbox offset twice?

    opened by LiewFeng 4
  • Deep copy MLP

    I admire your open-source code very much and learned a lot from it. In PyTorch, `nn.ModuleList([module for i in range(N)])` is essentially a shallow copy of `module`. In order to use different modules at different layers, `copy.deepcopy` is required. Therefore, I modified it with `copy`. Maybe I misunderstood something. Welcome your comments.
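    For reference, the usual pattern for stacking independent layers in DETR-family codebases is a deep-copy helper like the one below, mirroring the `_get_clones` helper those codebases use:

    ```python
    import copy
    import torch.nn as nn

    def get_clones(module: nn.Module, N: int) -> nn.ModuleList:
        # deepcopy gives each layer its own parameters;
        # repeating the same instance N times would share weights across layers
        return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])
    ```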

    opened by wulele2 4
  • What is the purpose of subtracting the max attention weight?

    Hi, thanks for your nice work!

    But I am confused about the code. What is the purpose of subtracting the max attention weight?

    https://github.com/IDEA-opensource/DAB-DETR/blob/9b637396d2d8eea16b39940cde8e7d34262cb2e2/models/DAB_DETR/attention.py#L381-L382

    Looking for your reply!
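    The subtraction is most likely the standard numerically stable softmax trick: shifting logits by their row-wise maximum leaves the softmax output mathematically unchanged while preventing overflow in the exponential. A quick self-contained check of that identity:

    ```python
    import torch

    attn = torch.randn(2, 8, 100, 100) * 50  # large logits that could overflow exp()

    stable = (attn - attn.max(dim=-1, keepdim=True)[0]).softmax(dim=-1)
    plain = attn.softmax(dim=-1)

    print(torch.allclose(stable, plain))  # True: subtracting the max is mathematically a no-op
    ```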

    opened by JosonChan1998 4
  • Run test.py error

    I use 8-node V100 and the environment is below:

    ```
    Python 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import torch
    >>> torch.__version__
    '1.9.0+cu102'
    ```

    Error info: CUDA out of memory

    Output of test.py:

    ```
    * True check_forward_equal_with_pytorch_double: max_abs_err 8.67e-19 max_rel_err 2.35e-16
    * True check_forward_equal_with_pytorch_float: max_abs_err 4.66e-10 max_rel_err 1.13e-07
    * True check_gradient_numerical(D=30)
    * True check_gradient_numerical(D=32)
    * True check_gradient_numerical(D=64)
    * True check_gradient_numerical(D=71)
    * True check_gradient_numerical(D=1025)
    Traceback (most recent call last):
      File "test.py", line 86, in <module>
        check_gradient_numerical(channels, True, True, True)
      File "test.py", line 76, in check_gradient_numerical
        gradok = gradcheck(func, (value.double(), shapes, level_start_index, sampling_locations.double(), attention_weights.double(), im2col_step))
      File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 1245, in gradcheck
        return _gradcheck_helper(**args)
      File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 1258, in _gradcheck_helper
        _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
      File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 930, in _gradcheck_real_imag
        gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
      File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 974, in _slow_gradcheck
        analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
      File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 520, in _check_analytical_jacobian_attributes
        jacobians1, types_ok, sizes_ok = _stack_and_check_tensors(vjps1, inputs, output_numel)
      File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 461, in _stack_and_check_tensors
        out_jacobians = _allocate_jacobians_with_inputs(inputs, numel_outputs)
      File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 31, in _allocate_jacobians_with_inputs
        out.append(t.new_zeros((t.numel(), numel_output), layout=torch.strided))
    RuntimeError: CUDA out of memory. Tried to allocate 7.50 GiB (GPU 0; 31.75 GiB total capacity; 22.52 GiB already allocated; 6.90 GiB free; 23.52 GiB reserved in total by PyTorch)
    ```
    opened by tobymu 3
  • Question about two_stage

    Thank you for your excellent work. What I want to know is: have you used the two-stage strategy when training DAB-Deformable-DETR? Does it give a performance boost?

    opened by leayz-888 3
  • Understanding the role of refpoint_embed

    I am having some trouble understanding the role of refpoint_embed in the DABDeformableDETR module, particularly what the 4 values represent. Do they represent x, y, w, h, or do they correspond to the 4 levels of the input feature maps? From line 391 in models/dab_deformable_detr/deformable_transformer.py (link), the four values seem to be multiplied with valid ratios from four levels and broadcast along the xywh dimension. Also, when they are input into deformable attention, the order of dimensions indicates that the 4 values correspond to 4 levels. On the other hand, when random_refpoints_xy is used, the first two values seem to represent xy instead? It's a bit confusing.
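    If it helps untangle the shapes, Deformable-DETR-style code usually treats the 4 values as one normalized (x, y, w, h) box and produces per-level copies by broadcasting against valid_ratios. The sketch below follows that convention; the shapes are assumptions to check against the actual repository code:

    ```python
    import torch

    def expand_refpoints_per_level(refpoint_embed_weight, valid_ratios):
        """Assumed shapes (Deformable-DETR convention):
           refpoint_embed_weight: (num_queries, 4), normalized (x, y, w, h) after sigmoid
           valid_ratios:          (bs, n_levels, 2), per-level (w_ratio, h_ratio)"""
        reference_points = refpoint_embed_weight.sigmoid().unsqueeze(0)   # (1, nq, 4)
        scale = torch.cat([valid_ratios, valid_ratios], -1)               # (bs, n_levels, 4)
        # the new axis is the feature-level axis; the last dim stays (x, y, w, h)
        return reference_points[:, :, None] * scale[:, None]              # (bs, nq, n_levels, 4)
    ```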

    opened by YellowPig-zp 3
  • I want to train my own dataset from scratch, but have some doubts

    First of all thank you for your team's work!

    My own dataset has 3k training samples and I wish to train from scratch using res50, as this facilitates subsequent changes to different backbone networks. I have some questions that I hope will be answered:

    1. If I don't use your publicly available pretrained weights, will the network use res50 weights pretrained on ImageNet by default?

    2. At present, I have performed preliminary training. When using the default hyperparameters, I use 4 img/GPU × 2 GPUs to train 100 epochs, and the result is still basically 0. Do I need more training epochs? How many?

    thanks for your reply

    opened by LKssssZz 2
  • Convert DAB-deformable-DETR to ONNX

    I am trying to convert the model that I trained, and also your pretrained model, to ONNX, but unfortunately I get the following error message:

    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

    By the way, I have tried both static and dynamic input, and I have used the following code:

    ```python
    import torch.onnx
    import os, sys
    import torch
    import numpy as np

    from models import build_dab_deformable_detr
    from util.slconfig import SLConfig
    import torchvision
    import torchvision.transforms.functional as TF

    from PIL import Image
    import transforms as T

    import cv2
    import argparse

    device = torch.device('cuda:0')

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument('--model_checkpoint_path', help="change the path of the model checkpoint.",
                            default="./Checkpoints/checkpoint.pt")
        parser.add_argument('--model_config_path', help="change the path of the model config file",
                            default="./Checkpoints/config.json")
        args = parser.parse_args()

        model_config_path = args.model_config_path
        model_checkpoint_path = args.model_checkpoint_path
        args_config = SLConfig.fromfile(model_config_path)
        model, criterion, post_processors = build_dab_deformable_detr(args_config)
        checkpoint = torch.load(model_checkpoint_path, map_location=device)
        model.load_state_dict(checkpoint['model'])
        model = model.to(device)

        img_size = [1080, 1920]
        input = torch.zeros(1, 3, *img_size).to(device)
        model.eval()
        results = model(input)

        torch.onnx.export(
            model,
            input,
            "test.onnx",
            input_names=["input"],
            output_names=["output"],
            export_params=True,
            opset_version=11,  # also tried versions 12, 13, 14, 15
            # dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # shape (1, 3, 640, 640)
            #               'output': {0: 'batch', 1: 'anchors'}}             # shape (1, 25200, 85)
            dynamic_axes=None,
        )
    ```

    opened by sazani 0
  • ONNX model generation

    Can you please convert your model to ONNX? I want to test it on TensorRT for inference. I am trying to convert it to ONNX but am getting the following error:

    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

    opened by ayazhassan 2
  • Question about some configuration in DAB-Deformable DETR.

    Thanks for your great work. I have some questions about the implementation of DAB-Deformable DETR.

    1. In DAB-DETR the position embedding is sinehw, while in DAB-Deformable-DETR it uses the original sine. Is there any reason for this difference?
    2. I found the configuration uses a larger dim_feedforward=2048. How does it perform with 1024?
    3. Have you experimented with the two-stage setting in Deformable-DETR? Could you share the results?
    opened by volgachen 1
  • How to calculate FLOPs

    Hi! Thanks for your excellent work. I'm wondering how to evaluate the FLOPs of the DAB-DETR model. I can't directly use the DETR script, which gets an AssertionError in jit_handles.py. https://github.com/facebookresearch/detr/issues/110 Could you please share your Python script?

    opened by stnjumu 0
  • Why does modulating attention by w&h work?

    I have some doubts about the line https://github.com/IDEA-opensource/DAB-DETR/blob/main/models/DAB_DETR/transformer.py#L242 .

    ```python
    refHW_cond = self.ref_anchor_head(output).sigmoid()  # nq, bs, 2
    ```

    This line asks the model to learn the absolute values of w, h from output, but NO supervision is applied. Besides, the 'output' tensor is used to learn the OFFSET of the bbox (x, y, w, h).

    So, I am wondering whether the model can learn width and height as expected?
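    For reference, the modulated positional attention in Section 4.4 rescales the positional query by the ratio of the learned reference size to the current anchor size. A rough sketch of the idea (names approximate, not the repository's exact code):

    ```python
    def modulate_positional_attention(query_sine_embed, output, obj_center, ref_anchor_head, d_model):
        # rescale the x/y halves of the sinusoidal positional query by
        # (learned reference size / current anchor size)
        refHW_cond = ref_anchor_head(output).sigmoid()  # (nq, bs, 2)
        query_sine_embed[..., d_model // 2:] *= (refHW_cond[..., 0] / obj_center[..., 2]).unsqueeze(-1)
        query_sine_embed[..., :d_model // 2] *= (refHW_cond[..., 1] / obj_center[..., 3]).unsqueeze(-1)
        # a wider anchor flattens attention along x; a taller one along y
        return query_sine_embed
    ```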

    opened by SupetZYK 1