Instructions To Reproduce the 🐛 Bug:
- what changes you made (git diff) or what code you wrote:
No changes were made to the code.
- what exact command you ran:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path ../data/COCO2017 --output_dir output/conddetr_r50_epoch50
- what you observed (including full logs; see the note on the failing assertion after this list):
| distributed init (rank 2): env://
| distributed init (rank 0): env://
| distributed init (rank 4): env://
| distributed init (rank 3): env://
| distributed init (rank 5): env://
| distributed init (rank 1): env://
| distributed init (rank 7): env://
| distributed init (rank 6): env://
fatal: Not a git repository (or any parent up to mount point /research/d4)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: Not a git repository (or any parent up to mount point /research/d4)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: Not a git repository (or any parent up to mount point /research/d4)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: Not a git repository (or any parent up to mount point /research/d4)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: Not a git repository (or any parent up to mount point /research/d4)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: Not a git repository (or any parent up to mount point /research/d4)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: Not a git repository (or any parent up to mount point /research/d4)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
git:
sha: N/A, status: clean, branch: N/A
fatal: Not a git repository (or any parent up to mount point /research/d4)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Namespace(aux_loss=True, backbone='resnet50', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, cls_loss_coef=2, coco_panoptic_path=None, coco_path='../data/COCO2017', dataset_file='coco', dec_layers=6, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=2048, dist_backend='nccl', dist_url='env://', distributed=True, dropout=0.1, enc_layers=6, epochs=50, eval=False, focal_alpha=0.25, frozen_weights=None, giou_loss_coef=2, gpu=0, hidden_dim=256, lr=0.0001, lr_backbone=1e-05, lr_drop=40, mask_loss_coef=1, masks=False, nheads=8, num_queries=300, num_workers=2, output_dir='output/conddetr_r50_epoch50', position_embedding='sine', pre_norm=False, rank=0, remove_difficult=False, resume='', seed=42, set_cost_bbox=5, set_cost_class=2, set_cost_giou=2, start_epoch=0, weight_decay=0.0001, world_size=8)
number of params: 43196001
loading annotations into memory...
Done (t=20.78s)
creating index...
index created!
loading annotations into memory...
Done (t=0.56s)
creating index...
index created!
Start training
Epoch: [0] [ 0/7393] eta: 7:05:21 lr: 0.000100 class_error: 85.57 loss: 45.1821 (45.1821) loss_bbox: 3.7751 (3.7751) loss_bbox_0: 3.7823 (3.7823) loss_bbox_1: 3.7808 (3.7808) loss_bbox_2: 3.7756 (3.7756) loss_bbox_3: 3.7911 (3.7911) loss_bbox_4: 3.7856 (3.7856) loss_ce: 1.9574 (1.9574) loss_ce_0: 2.0151 (2.0151) loss_ce_1: 2.0196 (2.0196) loss_ce_2: 2.1484 (2.1484) loss_ce_3: 2.0683 (2.0683) loss_ce_4: 2.0683 (2.0683) loss_giou: 1.7011 (1.7011) loss_giou_0: 1.7000 (1.7000) loss_giou_1: 1.7040 (1.7040) loss_giou_2: 1.7059 (1.7059) loss_giou_3: 1.7022 (1.7022) loss_giou_4: 1.7012 (1.7012) cardinality_error_unscaled: 293.1250 (293.1250) cardinality_error_0_unscaled: 293.1250 (293.1250) cardinality_error_1_unscaled: 293.1250 (293.1250) cardinality_error_2_unscaled: 281.9375 (281.9375) cardinality_error_3_unscaled: 293.1250 (293.1250) cardinality_error_4_unscaled: 293.1250 (293.1250) class_error_unscaled: 85.5712 (85.5712) loss_bbox_unscaled: 0.7550 (0.7550) loss_bbox_0_unscaled: 0.7565 (0.7565) loss_bbox_1_unscaled: 0.7562 (0.7562) loss_bbox_2_unscaled: 0.7551 (0.7551) loss_bbox_3_unscaled: 0.7582 (0.7582) loss_bbox_4_unscaled: 0.7571 (0.7571) loss_ce_unscaled: 0.9787 (0.9787) loss_ce_0_unscaled: 1.0076 (1.0076) loss_ce_1_unscaled: 1.0098 (1.0098) loss_ce_2_unscaled: 1.0742 (1.0742) loss_ce_3_unscaled: 1.0341 (1.0341) loss_ce_4_unscaled: 1.0342 (1.0342) loss_giou_unscaled: 0.8506 (0.8506) loss_giou_0_unscaled: 0.8500 (0.8500) loss_giou_1_unscaled: 0.8520 (0.8520) loss_giou_2_unscaled: 0.8530 (0.8530) loss_giou_3_unscaled: 0.8511 (0.8511) loss_giou_4_unscaled: 0.8506 (0.8506) time: 3.4521 data: 0.4687 max mem: 2932
Epoch: [0] [ 100/7393] eta: 1:17:39 lr: 0.000100 class_error: 85.74 loss: 28.2629 (33.7855) loss_bbox: 1.5517 (2.3437) loss_bbox_0: 1.5566 (2.3695) loss_bbox_1: 1.5482 (2.3519) loss_bbox_2: 1.5535 (2.3396) loss_bbox_3: 1.5641 (2.3476) loss_bbox_4: 1.5637 (2.3431) loss_ce: 1.5467 (1.6584) loss_ce_0: 1.5650 (1.6414) loss_ce_1: 1.5443 (1.6461) loss_ce_2: 1.5557 (1.6477) loss_ce_3: 1.5392 (1.6545) loss_ce_4: 1.5541 (1.6667) loss_giou: 1.5534 (1.6289) loss_giou_0: 1.5514 (1.6296) loss_giou_1: 1.5541 (1.6292) loss_giou_2: 1.5695 (1.6291) loss_giou_3: 1.5526 (1.6289) loss_giou_4: 1.5519 (1.6296) cardinality_error_unscaled: 293.1875 (293.2420) cardinality_error_0_unscaled: 293.1875 (293.2420) cardinality_error_1_unscaled: 293.1875 (293.2420) cardinality_error_2_unscaled: 293.1875 (293.1312) cardinality_error_3_unscaled: 293.1875 (293.2420) cardinality_error_4_unscaled: 293.1875 (293.1658) class_error_unscaled: 75.6680 (75.4478) loss_bbox_unscaled: 0.3103 (0.4687) loss_bbox_0_unscaled: 0.3113 (0.4739) loss_bbox_1_unscaled: 0.3096 (0.4704) loss_bbox_2_unscaled: 0.3107 (0.4679) loss_bbox_3_unscaled: 0.3128 (0.4695) loss_bbox_4_unscaled: 0.3127 (0.4686) loss_ce_unscaled: 0.7733 (0.8292) loss_ce_0_unscaled: 0.7825 (0.8207) loss_ce_1_unscaled: 0.7722 (0.8231) loss_ce_2_unscaled: 0.7779 (0.8239) loss_ce_3_unscaled: 0.7696 (0.8272) loss_ce_4_unscaled: 0.7770 (0.8334) loss_giou_unscaled: 0.7767 (0.8145) loss_giou_0_unscaled: 0.7757 (0.8148) loss_giou_1_unscaled: 0.7771 (0.8146) loss_giou_2_unscaled: 0.7847 (0.8146) loss_giou_3_unscaled: 0.7763 (0.8144) loss_giou_4_unscaled: 0.7760 (0.8148) time: 0.6098 data: 0.0105 max mem: 4353
Traceback (most recent call last):
File "main.py", line 258, in <module>
main(args)
File "main.py", line 206, in main
train_stats = train_one_epoch(
File "/research/d4/gds/zwang21/ConditionalDETR/engine.py", line 41, in train_one_epoch
loss_dict = criterion(outputs, targets)
File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/research/d4/gds/zwang21/ConditionalDETR/models/conditional_detr.py", line 254, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/research/d4/gds/zwang21/ConditionalDETR/models/matcher.py", line 79, in forward
cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
File "/research/d4/gds/zwang21/ConditionalDETR/util/box_ops.py", line 59, in generalized_box_iou
assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
AssertionError
Traceback (most recent call last):
File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
main()
File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/research/d4/gds/zwang21/anaconda3/bin/python', '-u', 'main.py', '--coco_path', '../data/COCO2017', '--output_dir', 'output/conddetr_r50_epoch50']' returned non-zero exit status 1.
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Killing subprocess 29668
Killing subprocess 29669
Killing subprocess 29670
Killing subprocess 29671
Killing subprocess 29672
Killing subprocess 29673
Killing subprocess 29674
Killing subprocess 29675
- please simplify the steps as much as possible so they do not require additional resources to run, such as a private dataset.
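Note on the failing assertion: the check in util/box_ops.py (line 59 in the traceback) requires every box in [x0, y0, x1, y1] format to satisfy x1 >= x0 and y1 >= y0, and it also fails when any coordinate is NaN or Inf, since comparisons against NaN evaluate to False. The sketch below is a hypothetical, standalone diagnostic (report_bad_boxes is not part of this repo; box_cxcywh_to_xyxy only mirrors the conversion in util/box_ops.py) that distinguishes non-finite predictions, e.g. from a diverging loss, from genuinely degenerate boxes:

```python
import torch

def box_cxcywh_to_xyxy(x):
    # Same conversion as in util/box_ops.py: (cx, cy, w, h) -> (x0, y0, x1, y1)
    x_c, y_c, w, h = x.unbind(-1)
    return torch.stack([x_c - 0.5 * w, y_c - 0.5 * h,
                        x_c + 0.5 * w, y_c + 0.5 * h], dim=-1)

def report_bad_boxes(name, boxes_cxcywh):
    # Hypothetical helper: flags the two conditions that make
    # (boxes[:, 2:] >= boxes[:, :2]).all() in generalized_box_iou fail.
    if not torch.isfinite(boxes_cxcywh).all():
        print(f"{name}: non-finite (NaN/Inf) coordinates found")
    xyxy = box_cxcywh_to_xyxy(boxes_cxcywh)
    degenerate = (xyxy[..., 2:] < xyxy[..., :2]).any(dim=-1)
    if degenerate.any():
        print(f"{name}: {int(degenerate.sum())} boxes with negative width/height")
        print(boxes_cxcywh[degenerate][:3])

# Example inputs: one NaN prediction and one box with negative width
preds = torch.tensor([[0.5, 0.5, 0.2, 0.1],
                      [float("nan"), 0.5, 0.1, 0.1],
                      [0.5, 0.5, -0.2, 0.1]])
report_bad_boxes("pred boxes", preds)
```

Either condition would explain the traceback above; I have not yet determined which one applies in this run.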
Expected behavior:
Training should run through the full 50 epochs without crashing; instead it fails during the first epoch with the AssertionError in generalized_box_iou shown above.
Environment:
Provide your environment information using the following command:
python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.8.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: CentOS Linux release 7.9.2009 (Core) (x86_64)
GCC version: (GCC) 11.2.0
Clang version: Could not collect
CMake version: version 2.8.12.2
Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.22.2
[pip3] numpydoc==1.1.0
[pip3] pytorch-ignite==0.2.0
[pip3] pytorch-metric-learning==0.9.99
[pip3] torch==1.8.0
[pip3] torchaudio==0.8.0a0+a751e1d
[pip3] torchfile==0.1.0
[pip3] torchsampler==0.1.1
[pip3] torchsummary==1.5.1
[pip3] torchvision==0.9.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.2.0 h06a4308_296
[conda] mkl-service 2.3.0 py38h27cfd23_1
[conda] mkl_fft 1.3.0 py38h42c9631_2
[conda] mkl_random 1.2.1 py38ha9443f7_2
[conda] numpy 1.22.2 pypi_0 pypi
[conda] numpydoc 1.1.0 pyhd3eb1b0_1
[conda] pytorch 1.8.0 py3.8_cuda10.2_cudnn7.6.5_0 pytorch
[conda] pytorch-ignite 0.2.0 pypi_0 pypi
[conda] pytorch-metric-learning 0.9.99 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 1.10.0 pypi_0 pypi
[conda] torchaudio 0.8.0 py38 pytorch
[conda] torchfile 0.1.0 pypi_0 pypi
[conda] torchsampler 0.1.1 pypi_0 pypi
[conda] torchsummary 1.5.1 pypi_0 pypi
[conda] torchvision 0.9.0 py38_cu102 pytorch