[ICLR 2022] DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

Last update: Dec 25, 2022

Overview

DAB-DETR

This is the official PyTorch implementation of our ICLR 2022 paper DAB-DETR.

Authors: Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang

News

[2022/4/14] We released the .pptx file of our DETR-like model comparison figure for those who want to draw model architecture figures in their papers.
[2022/4/12] We fixed a bug in the file datasets/coco_eval.py. The parameter useCats of CocoEvaluator should be True by default.
[2022/4/9] Our code is available!
[2022/3/9] We built a repo, Awesome Detection Transformer, to present papers about transformers for detection and segmentation. You are welcome to take a look!
[2022/3/8] Our new work DINO set a new record of 63.3 AP on the MS-COCO leaderboard.
[2022/3/8] Our new work DN-DETR has been accepted by CVPR 2022!
[2022/1/21] Our work has been accepted to ICLR 2022.

Abstract

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR. This new formulation directly uses box coordinates as queries in Transformer decoders and dynamically updates them layer-by-layer. Using box coordinates not only helps using explicit positional priors to improve the query-to-feature similarity and eliminate the slow training convergence issue in DETR, but also allows us to modulate the positional attention map using the box width and height information. Such a design makes it clear that queries in DETR can be implemented as performing soft ROI pooling layer-by-layer in a cascade manner. As a result, it leads to the best performance on MS-COCO benchmark among the DETR-like detection models under the same setting, e.g., AP 45.7% using ResNet50-DC5 as backbone trained in 50 epochs. We also conducted extensive experiments to confirm our analysis and verify the effectiveness of our methods.
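To make the layer-by-layer update concrete, here is a minimal pure-Python sketch of refining one anchor box (x, y, w, h) across decoder layers. It assumes normalized coordinates in (0, 1) and that each layer predicts an offset that is added in logit (inverse-sigmoid) space. The helper names (inverse_sigmoid, refine_anchor, run_decoder) and the offset values are hypothetical illustrations, not the repository's API.

```python
import math


def inverse_sigmoid(p, eps=1e-5):
    """Map a normalized coordinate in (0, 1) to logit space (clamped for safety)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def refine_anchor(anchor, delta):
    """Add a predicted offset in logit space, then squash back to (0, 1)."""
    return tuple(sigmoid(inverse_sigmoid(a) + d) for a, d in zip(anchor, delta))


def run_decoder(anchor, per_layer_deltas):
    """Apply one refinement per decoder layer and keep every intermediate box."""
    history = [anchor]
    for delta in per_layer_deltas:
        anchor = refine_anchor(anchor, delta)
        history.append(anchor)
    return history


# Hypothetical initial anchor (x, y, w, h) and made-up per-layer offsets.
anchor0 = (0.5, 0.5, 0.2, 0.2)
deltas = [(0.4, -0.2, 0.3, 0.1), (0.1, 0.0, 0.1, 0.0), (0.0, 0.0, 0.0, 0.0)]
history = run_decoder(anchor0, deltas)

for layer, box in enumerate(history):
    print(layer, tuple(round(c, 4) for c in box))
```

Working in logit space keeps every refined coordinate inside (0, 1), and a zero offset leaves the box unchanged, so later layers can make progressively smaller corrections.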

Model

Model Zoo

We provide our models with R50 backbone, including both DAB-DETR and DAB-Deformable-DETR (See Appendix C of our paper for more details).

#	name	backbone	box AP	Log/Config/Checkpoint	Where in Our Paper
0	DAB-DETR-R50	R50	42.2	Google Drive | Tsinghua Cloud	Table 2
1	DAB-DETR-R50(3 pat)¹	R50	42.6	Google Drive | Tsinghua Cloud	Table 2
2	DAB-DETR-R50-DC5	R50	44.5	Google Drive | Tsinghua Cloud	Table 2
3	DAB-DETR-R50-DC5-fixxy²	R50	44.7	Google Drive | Tsinghua Cloud	Table 8. Appendix H.
4	DAB-DETR-R50-DC5(3 pat)	R50	45.7	Google Drive | Tsinghua Cloud	Table 2
5	DAB-Deformable-DETR (Deformable Encoder Only)³	R50	46.9		Baseline for DN-DETR
6	DAB-Deformable-DETR-R50⁴	R50	48.1	Google Drive | Tsinghua Cloud	Extended results for Table 5, Appendix C.

Notes:

¹: The models marked (3 pat) are trained with multiple pattern embeddings (refer to Anchor DETR or our paper for more details).
²: The term "fixxy" means we randomly initialize the anchors and do not update their parameters during training (see Appendix H of our paper for more details).
³: DAB-Deformable-DETR (Deformable Encoder Only) is a multi-scale version of our DAB-DETR. See DN-DETR for more details.
⁴: The result here is better than the number in our paper because we use different loss coefficients during training. Refer to our config file for more details.

Usage

Installation

We use the great DETR project as our codebase, hence no extra dependency is needed for our DAB-DETR. For the DAB-Deformable-DETR, you need to compile the deformable attention operator manually.

We test our models under python=3.7.3, pytorch=1.9.0, cuda=11.1. Other versions may work as well.

Clone this repo

git clone https://github.com/IDEA-opensource/DAB-DETR.git
cd DAB-DETR

Install PyTorch and torchvision

Follow the instructions on https://pytorch.org/get-started/locally/.

# an example:
conda install -c pytorch pytorch torchvision

Install other needed packages

pip install -r requirements.txt

Compiling CUDA operators

cd models/dab_deformable_detr/ops
python setup.py build install
# unit test (should see all checking is True)
python test.py
cd ../../..

Data

Please download the COCO 2017 dataset and organize it as follows:

COCODIR/
  ├── train2017/
  ├── val2017/
  └── annotations/
  	├── instances_train2017.json
  	└── instances_val2017.json
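
Before launching a long run, it can help to confirm that the directory matches the tree above. The following is a small sketch using only the Python standard library; the function name missing_coco_entries is hypothetical and not part of this repository, and the demo runs on a temporary directory so it works without the real dataset.

```python
import tempfile
from pathlib import Path

# Entries taken from the directory tree above, relative to COCODIR.
EXPECTED_ENTRIES = (
    "train2017",
    "val2017",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
)


def missing_coco_entries(coco_dir):
    """Return the expected entries that do not exist under coco_dir."""
    root = Path(coco_dir)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]


# Demo on a throwaway directory that is filled in step by step.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = missing_coco_entries(root)

    (root / "train2017").mkdir()
    (root / "val2017").mkdir()
    (root / "annotations").mkdir()
    (root / "annotations" / "instances_train2017.json").write_text("{}")
    after_partial = missing_coco_entries(root)

    (root / "annotations" / "instances_val2017.json").write_text("{}")
    after_full = missing_coco_entries(root)

print("missing at start:", before)
print("missing after partial setup:", after_partial)
print("missing after full setup:", after_full)
```

In practice you would call missing_coco_entries with the same path you pass to --coco_path and stop if the returned list is not empty.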

Run

We use the standard DAB-DETR-R50 and DAB-Deformable-DETR-R50 as examples for training and evaluation.

Eval our pretrained models

Download our DAB-DETR-R50 model checkpoint from this link and run the command below. You can expect a final AP of about 42.2.

For our DAB-Deformable-DETR (download here), the expected final AP is 48.1.

# for dab_detr: 42.2 AP
# replace --coco_path with your COCO path and --resume with your checkpoint path
python main.py -m dab_detr \
  --output_dir logs/DABDETR/R50 \
  --batch_size 1 \
  --coco_path /path/to/your/COCODIR \
  --resume /path/to/our/checkpoint \
  --eval

# for dab_deformable_detr: 48.1 AP
# replace --coco_path with your COCO path and --resume with your checkpoint path
python main.py -m dab_deformable_detr \
  --output_dir logs/dab_deformable_detr/R50 \
  --batch_size 2 \
  --coco_path /path/to/your/COCODIR \
  --resume /path/to/our/checkpoint \
  --transformer_activation relu \
  --eval

Training your own models

Similarly, you can also train our model on a single process:

# for dab_detr
python main.py -m dab_detr \
  --output_dir logs/DABDETR/R50 \
  --batch_size 1 \
  --epochs 50 \
  --lr_drop 40 \
  --coco_path /path/to/your/COCODIR  # replace the args to your COCO path

Distributed Run

However, as training is time-consuming, we suggest training the model on multiple devices.

If you plan to train the models on a cluster with Slurm, here is an example command for training:

# for dab_detr: 42.2 AP
python run_with_submitit.py \
  --timeout 3000 \
  --job_name DABDETR \
  --coco_path /path/to/your/COCODIR \
  -m dab_detr \
  --job_dir logs/DABDETR/R50_%j \
  --batch_size 2 \
  --ngpus 8 \
  --nodes 1 \
  --epochs 50 \
  --lr_drop 40 

# for dab_deformable_detr: 48.1 AP
python run_with_submitit.py \
  --timeout 3000 \
  --job_name dab_deformable_detr \
  --coco_path /path/to/your/COCODIR \
  -m dab_deformable_detr \
  --transformer_activation relu \
  --job_dir logs/dab_deformable_detr/R50_%j \
  --batch_size 2 \
  --ngpus 8 \
  --nodes 1 \
  --epochs 50 \
  --lr_drop 40

The final AP should be similar to ours (42.2 for DAB-DETR and 48.1 for DAB-Deformable-DETR). Our configs and logs (see the Model Zoo) can be used as references as well.

Notes:

The results are sensitive to the batch size. We use 16 (2 images per GPU x 8 GPUs) by default.
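
The arithmetic behind that default can be written down as a short sketch. The function names below are hypothetical helpers, not flags or utilities of this repository. Note that this is only bookkeeping: nothing here claims that matching the total batch size on a different GPU count reproduces the reported AP.

```python
def effective_batch_size(images_per_gpu, gpus_per_node, nodes=1):
    """Total number of images per optimizer step across all processes."""
    return images_per_gpu * gpus_per_node * nodes


def images_per_gpu_for(total_batch, gpus_per_node, nodes=1):
    """Per-GPU batch size needed to reach total_batch, if it divides evenly."""
    world_size = gpus_per_node * nodes
    if total_batch % world_size != 0:
        raise ValueError(
            f"total batch {total_batch} is not divisible by {world_size} GPUs"
        )
    return total_batch // world_size


default_total = effective_batch_size(2, 8)       # 2 images per GPU x 8 GPUs
single_process_total = effective_batch_size(1, 1)  # the single-process example
four_gpu_setting = images_per_gpu_for(16, 4)

print(default_total)         # 16
print(single_process_total)  # 1
print(four_gpu_setting)      # 4
```

The single-process command shown earlier uses --batch_size 1, so its total batch size is 1 rather than 16, which is worth keeping in mind when comparing results.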

Alternatively, run with multiple processes on a single node:

# for dab_detr: 42.2 AP
python -m torch.distributed.launch --nproc_per_node=8 \
  main.py -m dab_detr \
  --output_dir logs/DABDETR/R50 \
  --batch_size 2 \
  --epochs 50 \
  --lr_drop 40 \
  --coco_path /path/to/your/COCODIR

# for dab_deformable_detr: 48.1 AP
python -m torch.distributed.launch --nproc_per_node=8 \
  main.py -m dab_deformable_detr \
  --output_dir logs/dab_deformable_detr/R50 \
  --batch_size 2 \
  --epochs 50 \
  --lr_drop 40 \
  --transformer_activation relu \
  --coco_path /path/to/your/COCODIR

Detailed Model

Comparison of DETR-like Models

The source file can be found here.

Links

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection.
Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum
arxiv 2022.
[paper] [code]

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising.
Feng Li*, Hao Zhang*, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022.
[paper] [code]

License

DAB-DETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Citation

@inproceedings{
  liu2022dabdetr,
  title={{DAB}-{DETR}: Dynamic Anchor Boxes are Better Queries for {DETR}},
  author={Shilong Liu and Feng Li and Hao Zhang and Xiao Yang and Xianbiao Qi and Hang Su and Jun Zhu and Lei Zhang},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=oMI9PjOb9Jl}
}

Comments

Training problems of AP and AR.

Sorry to bother you! During training, I had a problem.

Training parameters:

batch_size = 1,
epochs = 50,
lr_drop = 40,
modelname = 'dab_detr',
num_workers = 6,

Dataset(tiny-coco):

10% of the entire coco2017

Problem:

So far, I've trained 20 epochs. But the results of AP and AR in every epoch are the same, and the results are as follows:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.003

And the average statistics for the 20th epoch are:

class_error: 88.89  loss: 16.2750 (16.8940)  loss_ce: 0.8517 (0.9079)  loss_bbox: 0.7610 (0.8277)  loss_giou: 1.1219 (1.0805)  loss_ce_0: 0.8508 (0.9079)  loss_bbox_0: 0.7550 (0.8284)  loss_giou_0: 1.1278 (1.0790)  loss_ce_1: 0.8519 (0.9078)  loss_bbox_1: 0.7502 (0.8281)  loss_giou_1: 1.1264 (1.0795)  loss_ce_2: 0.8520 (0.9078)  loss_bbox_2: 0.7614 (0.8283)  loss_giou_2: 1.1250 (1.0794)  loss_ce_3: 0.8514 (0.9080)  loss_bbox_3: 0.7613 (0.8279)  loss_giou_3: 1.1236 (1.0799)  loss_ce_4: 0.8517 (0.9079)  loss_bbox_4: 0.7611 (0.8278)  loss_giou_4: 1.1225 (1.0802)  loss_ce_unscaled: 0.8517 (0.9079)  class_error_unscaled: 80.0000 (77.4109)  loss_bbox_unscaled: 0.1522 (0.1655)  loss_giou_unscaled: 0.5610 (0.5403)  loss_xy_unscaled: 0.0560 (0.0604)  loss_hw_unscaled: 0.0998 (0.1051)  cardinality_error_unscaled: 293.0000 (293.0660)  loss_ce_0_unscaled: 0.8508 (0.9079)  loss_bbox_0_unscaled: 0.1510 (0.1657)  loss_giou_0_unscaled: 0.5639 (0.5395)  loss_xy_0_unscaled: 0.0561 (0.0605)  loss_hw_0_unscaled: 0.1034 (0.1052)  cardinality_error_0_unscaled: 293.0000 (293.0660)  loss_ce_1_unscaled: 0.8519 (0.9078)  loss_bbox_1_unscaled: 0.1500 (0.1656)  loss_giou_1_unscaled: 0.5632 (0.5397)  loss_xy_1_unscaled: 0.0560 (0.0605)  loss_hw_1_unscaled: 0.1029 (0.1051)  cardinality_error_1_unscaled: 293.0000 (293.0660)  loss_ce_2_unscaled: 0.8520 (0.9078)  loss_bbox_2_unscaled: 0.1523 (0.1657)  loss_giou_2_unscaled: 0.5625 (0.5397)  loss_xy_2_unscaled: 0.0560 (0.0605)  loss_hw_2_unscaled: 0.1022 (0.1052)  cardinality_error_2_unscaled: 293.0000 (293.0660)  loss_ce_3_unscaled: 0.8514 (0.9080)  loss_bbox_3_unscaled: 0.1523 (0.1656)  loss_giou_3_unscaled: 0.5618 (0.5400)  loss_xy_3_unscaled: 0.0560 (0.0604)  loss_hw_3_unscaled: 0.1014 (0.1052)  cardinality_error_3_unscaled: 293.0000 (293.0660)  loss_ce_4_unscaled: 0.8517 (0.9079)  loss_bbox_4_unscaled: 0.1522 (0.1656)  loss_giou_4_unscaled: 0.5612 (0.5401)  loss_xy_4_unscaled: 0.0560 (0.0604)  loss_hw_4_unscaled: 0.1006 (0.1052)  
cardinality_error_4_unscaled: 293.0000 (293.0660)

So what could be the problem? Dataset? Training parameters? Or something else? Thank you!

opened by yuan738 14

Some questions about reproducing DAB-DETR

Hi, I'd like to reproduce DAB-DETR, and I have two questions about some technical details of DAB-DETR.

i. How do you initialize the learnable anchor boxes (results in Table 2)? Why not use the setting from Table 8 (random initialization, fixed in the first decoder layer) as the default?

ii. I am confused about the modulated positional attention in Section 4.4. Is it an improvement on the "conditional cross attention" in Conditional DETR (which splits cross attention into two parts, content and spatial dot products)? Does the proposed modulated positional attention add the reference w into the spatial dot products?

opened by Zx55 8

Some question about reference points!

Thank you for your excellent work! After reading the DAB-DETR decoder code, I have a question about the reference points. The decoder contains "reference_points = new_reference_points.detach()". I am confused about why the detach operator is used. In my understanding, with detach the gradient is cut off and will not be backpropagated to the reference embedding. Looking forward to your reply! Thank you!

opened by guwen007 4

'self.query_scale' before each transformer encoder layer

Thanks for your great work. I noticed a difference between DAB-DETR and Conditional DETR: there is an MLP defined as 'self.query_scale' before each transformer encoder layer. Is this operation described in the paper or in another reference that explains its effect?

opened by alexzeng1206 4

Why does the model predict bbox offset twice?

The model predicts bbox offsets twice: in DAB-DETR/blob/main/models/DAB_DETR/transformer.py, lines 255-265, and in DAB-DETR/blob/main/models/DAB_DETR/DABDETR.py, lines 171-184. The difference is that the input of the second prediction is normalized. The second prediction seems unnecessary. Am I right?

opened by LiewFeng 4

Deep copy MLP

I admire your open-source code very much and have learned a lot from it. In PyTorch, nn.ModuleList([module for i in range(N)]) is essentially a shallow copy of module. To use different modules at different layers, the copy module is needed to make deep copies. Therefore, I modified the code to use copy. Maybe I have misunderstood something. Your comments are welcome.
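
The Python behaviour behind this question can be shown without PyTorch. The sketch below uses a plain class as a stand-in for a module (an assumption made so the example stays self-contained); whether the repository intends its layers to share one module is not stated in this thread.

```python
import copy


class Layer:
    """Plain-Python stand-in for a module holding one mutable parameter."""

    def __init__(self):
        self.weight = [0.0]


def clone_shared(module, n):
    """Build a list of n references to the same object (no copying at all)."""
    return [module for _ in range(n)]


def clone_independent(module, n):
    """Build a list of n deep copies, each with its own parameters."""
    return [copy.deepcopy(module) for _ in range(n)]


shared = clone_shared(Layer(), 3)
shared[0].weight[0] = 1.0  # visible through every entry of the list

independent = clone_independent(Layer(), 3)
independent[0].weight[0] = 1.0  # only the first copy changes

print(shared[2].weight[0])          # 1.0
print(independent[2].weight[0])     # 0.0
print(shared[0] is shared[2])       # True
print(independent[0] is independent[2])  # False
```

With the shared list, updating one layer updates all of them, which amounts to parameter sharing across layers. With deep copies, each layer is independent.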

opened by wulele2 4

What is the purpose of the minus of max attention weight

Hi, thanks for your nice work!

But I am confused about the code. What is the purpose of subtracting the max attention weight?

https://github.com/IDEA-opensource/DAB-DETR/blob/9b637396d2d8eea16b39940cde8e7d34262cb2e2/models/DAB_DETR/attention.py#L381-L382

Looking for your reply!
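
If the linked lines subtract the per-row maximum from the attention scores before a softmax (an assumption here, since the linked code is not reproduced in this thread), a common reason is numerical stability: the softmax result is unchanged by a constant shift, but the exponentials can no longer overflow. A minimal pure-Python sketch with hypothetical function names:

```python
import math


def softmax_naive(scores):
    """Textbook softmax; math.exp overflows for large scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(scores):
    """Subtract the maximum first; the largest exponent becomes exp(0) = 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]

# For moderate scores both versions agree up to rounding.
max_diff_small = max(
    abs(a - b) for a, b in zip(softmax_naive(small), softmax_stable(small))
)

# For large scores the naive version fails, while the stable one still works.
try:
    softmax_naive(large)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

stable_large = softmax_stable(large)

print(max_diff_small)
print(naive_overflowed)
print(stable_large)
```

Because softmax only depends on differences between scores, the stable version returns the same weights for `large` as for `small`.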

opened by JosonChan1998 4

Run test.py error

I use 8-node V100, and the environment is as follows:

Python 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.9.0+cu102'

Error info: cuda out of memory

test.py

True check_forward_equal_with_pytorch_double: max_abs_err 8.67e-19 max_rel_err 2.35e-16
True check_forward_equal_with_pytorch_float: max_abs_err 4.66e-10 max_rel_err 1.13e-07
True check_gradient_numerical(D=30)
True check_gradient_numerical(D=32)
True check_gradient_numerical(D=64)
True check_gradient_numerical(D=71)
True check_gradient_numerical(D=1025)
Traceback (most recent call last):
  File "test.py", line 86, in <module>
    check_gradient_numerical(channels, True, True, True)
  File "test.py", line 76, in check_gradient_numerical
    gradok = gradcheck(func, (value.double(), shapes, level_start_index, sampling_locations.double(), attention_weights.double(), im2col_step))
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 1245, in gradcheck
    return _gradcheck_helper(**args)
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 1258, in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 930, in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 974, in _slow_gradcheck
    analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 520, in _check_analytical_jacobian_attributes
    jacobians1, types_ok, sizes_ok = _stack_and_check_tensors(vjps1, inputs, output_numel)
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 461, in _stack_and_check_tensors
    out_jacobians = _allocate_jacobians_with_inputs(inputs, numel_outputs)
  File "/root/anaconda3/envs/www_update/lib/python3.8/site-packages/torch/autograd/gradcheck.py", line 31, in _allocate_jacobians_with_inputs
    out.append(t.new_zeros((t.numel(), numel_output), layout=torch.strided))
RuntimeError: CUDA out of memory. Tried to allocate 7.50 GiB (GPU 0; 31.75 GiB total capacity; 22.52 GiB already allocated; 6.90 GiB free; 23.52 GiB reserved in total by PyTorch)

opened by tobymu 3

Question about two_stage:

Thank you for your excellent work. Did you use the two-stage strategy when training DAB-Deformable-DETR? Does it give a performance boost for DAB-Deformable-DETR?

opened by leayz-888 3

Understanding the role of refpoint_embed

I am having some trouble understanding the role of refpoint_embed in the DABDeformableDETR module, in particular what the 4 values represent. Do they represent x, y, w, h, or do they correspond to the 4 levels of the input feature maps? From line 391 in models/dab_deformable_detr/deformable_transformer.py (link), the four values seem to be multiplied with valid ratios from four levels and broadcast along the xywh dimension. Also, when they are fed into deformable attention, the order of dimensions indicates that the 4 values correspond to 4 levels. On the other hand, when random_refpoints_xy is used, the first two values seem to represent xy instead. It's a bit confusing.

opened by YellowPig-zp 3

I want to train my own dataset from scratch, but have some doubts

First of all, thank you for your team's work!

My own dataset has 3k training samples and I wish to train from scratch using res50, which makes it easier to switch to a different backbone network later. I have some questions that I hope can be answered:

If I don't use your publicly available pretrained weights, will the network use the res50 weights pretrained on ImageNet by default?

So far I have done some preliminary training. With the default hyperparameters, I use 4 img/GPU 2 to train 100 epochs, and the result is still basically 0. Do I need more training epochs? How many?

Thanks for your reply.

opened by LKssssZz 2

Convert DAB-deformable-DETR to ONNX

I am trying to convert both the model that I trained and your pretrained model to ONNX, but unfortunately I get the following error message:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0, and cpu! (when checking argument for argument index in method wrapper__index_select)

By the way, I have tried both static and dynamic input, and I used the following code:

import torch.onnx
import os, sys
import torch
import numpy as np

from models import build_dab_deformable_detr
from util.slconfig import SLConfig
import torchvision
import torchvision.transforms.functional as TF

from PIL import Image
import transforms as T

import cv2
import argparse

device = torch.device('cuda:0')

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_checkpoint_path', help="change the path of the model checkpoint.", default="./Checkpoints/checkpoint.pt")
    parser.add_argument('--model_config_path', help="change the path of the model config file", default="./Checkpoints/config.json")
    args = parser.parse_args()
    model_config_path = args.model_config_path
    model_checkpoint_path = args.model_checkpoint_path
    args_config = SLConfig.fromfile(model_config_path)
    model, criterion, post_processors = build_dab_deformable_detr(args_config)
    checkpoint = torch.load(model_checkpoint_path, map_location=device)
    model.load_state_dict(checkpoint['model'])
    model = model.to(device)
    img_size = [1080, 1920]
    input = torch.zeros(1, 3, *img_size)
    input = input.to(device)
    model.eval()
    results = model(input)
    torch.onnx.export(
        model,
        input,
        "test.onnx",
        input_names=["input"],
        output_names=["output"],
        export_params=True,
        opset_version=11,  # I have also tried version 12,13,14,15
        # dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # shape(1,3,640,640)
        #               'output': {0: 'batch', 1: 'anchors'}  # shape(1,25200,85)
        #               },  # if dynamic else None
        dynamic_axes=None,
    )

opened by sazani 0

ONNX model generation

Could you please convert your model to ONNX? I want to test it on TensorRT for inference. I am trying to convert it to ONNX myself but get the following error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0, and cpu! (when checking argument for argument index in method wrapper__index_select)

opened by ayazhassan 2

Question about some configuration in DAB-Deformable DETR.

Thanks for your great work. I have some questions about the implementation of DAB-Deformable DETR.

In DAB-DETR the position embedding is sinehw, while DAB-Deformable-DETR uses the original sine. Is there any reason for this difference?

I found the configuration uses a larger dim_feedforward=2048. How does it perform with 1024?

Have you experimented with the two-stage setting in Deformable-DETR? Could you share the results?

opened by volgachen 1

How to calculate flops

Hi! Thanks for your excellent work. I'm wondering how to evaluate the FLOPs of the DAB-DETR model. I can't directly use the DETR script, which raises an AssertionError in jit_handles.py. https://github.com/facebookresearch/detr/issues/110 Could you please share your Python script?

opened by stnjumu 0

Why modulating attention by w&h works?

I have some doubts about the line https://github.com/IDEA-opensource/DAB-DETR/blob/main/models/DAB_DETR/transformer.py#L242 .

refHW_cond = self.ref_anchor_head(output).sigmoid() # nq, bs, 2

This line asks the model to learn the absolute values of w and h from output, but no supervision is applied. Besides, the 'output' tensor is also used to learn the offset of the bbox (x, y, w, h).

So I am wondering whether the model can learn width and height as expected.

opened by SupetZYK 1
