Overview

Conditional DETR

This repository is an official implementation of the ICCV 2021 paper "Conditional DETR for Fast Training Convergence".

Introduction

The DETR approach applies the transformer encoder-decoder architecture to object detection and achieves promising performance. In this paper, we address the critical issue of slow training convergence and present a conditional cross-attention mechanism for fast DETR training. Our approach is motivated by the observation that the cross-attention in DETR relies highly on the content embeddings while the spatial embeddings make only minor contributions, which increases the need for high-quality content embeddings and thus the training difficulty.

Our conditional DETR learns a conditional spatial query from the decoder embedding for decoder multi-head cross-attention. The benefit is that through the conditional spatial query, each cross-attention head is able to attend to a band containing a distinct region, e.g., one object extremity or a region inside the object box (Figure 1). This narrows down the spatial range for localizing the distinct regions used for object classification and box regression, thus relaxing the dependence on the content embeddings and easing training. Empirical results show that conditional DETR converges 6.7x faster for the R50 and R101 backbones and 10x faster for the stronger DC5-R50 and DC5-R101 backbones.
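To make the mechanism concrete, below is a minimal PyTorch sketch of forming a conditional spatial query p_q = T * PE(s): a transform T is predicted from the decoder embedding f and applied to a sinusoidal embedding of the 2-D reference point s. This is a hedged illustration of the diagonal variant described in Section 3.3 of the paper; the function names, shapes, and FFN layout here are ours for illustration, not the repository's exact API.

import math
import torch
import torch.nn as nn

def pos_embed_2d(ref_points, d_model=256):
    # Sinusoidal embedding PE(s) of normalized 2-D reference points s.
    # ref_points: (N, 2) in [0, 1]; returns (N, d_model), half the dims per coordinate.
    half = d_model // 2
    dim_t = torch.arange(half, dtype=torch.float32)
    dim_t = 10000 ** (2 * (dim_t // 2) / half)
    pos = ref_points * 2 * math.pi                 # scale (x, y) to [0, 2*pi]
    pos = pos[:, :, None] / dim_t                  # (N, 2, half)
    pos = torch.stack((pos[..., 0::2].sin(), pos[..., 1::2].cos()), dim=-1)
    return pos.flatten(1)                          # (N, d_model)

class ConditionalSpatialQuery(nn.Module):
    # Sketch: p_q = T(f) * PE(s); the element-wise product realizes a diagonal T.
    def __init__(self, d_model=256):
        super().__init__()
        # FFN predicting the transform T from the decoder embedding f
        self.to_T = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                  nn.Linear(d_model, d_model))

    def forward(self, f, s):
        # f: (N, d_model) decoder embeddings; s: (N, 2) reference points in [0, 1]
        return self.to_T(f) * pos_embed_2d(s, f.shape[-1])

p_q = ConditionalSpatialQuery()(torch.randn(300, 256), torch.rand(300, 2))  # (300, 256)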

Model Zoo

We provide conditional DETR and conditional DETR-DC5 models. AP is computed on COCO 2017 val.

Method                     Epochs  Params (M)  FLOPs (G)  AP    AP_S  AP_M  AP_L  URL
DETR-R50                   500     41          86         42.0  20.5  45.8  61.1  model / log
DETR-R50                   50      41          86         34.8  13.9  37.3  54.4  model / log
DETR-DC5-R50               500     41          187        43.3  22.5  47.3  61.1  model / log
DETR-R101                  500     60          152        43.5  21.0  48.0  61.8  model / log
DETR-R101                  50      60          152        36.9  15.5  40.6  55.6  model / log
DETR-DC5-R101              500     60          253        44.9  23.7  49.5  62.3  model / log
Conditional DETR-R50       50      44          90         41.0  20.6  44.3  59.3  model / log
Conditional DETR-DC5-R50   50      44          195        43.7  23.9  47.6  60.1  model / log
Conditional DETR-R101      50      63          156        42.8  21.7  46.6  60.9  model / log
Conditional DETR-DC5-R101  50      63          262        45.0  26.1  48.9  62.8  model / log

Note:

  1. The numbers in the table are slightly different from the numbers in the paper; we re-ran some experiments when releasing the code.
  2. "DC5" means removing the stride in the C5 stage of ResNet and adding a dilation of 2 instead (see the sketch after this list).

Installation

Requirements

  • Python >= 3.7, CUDA >= 10.1
  • PyTorch >= 1.7.0, torchvision >= 0.6.1
  • Cython, COCOAPI, scipy, termcolor

The code is developed using Python 3.8 with PyTorch 1.7.0. First, clone the repository locally:

git clone https://github.com/Atten4Vis/ConditionalDETR.git

Then, install PyTorch and torchvision:

conda install pytorch=1.7.0 torchvision=0.6.1 cudatoolkit=10.1 -c pytorch

Install other requirements:

cd ConditionalDETR
pip install -r requirements.txt

Usage

Data preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
├── annotations/  # annotation json files
└── images/
    ├── train2017/    # train images
    ├── val2017/      # val images
    └── test2017/     # test images
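
As a quick sanity check of that layout, the COCO API (from the COCOAPI requirement above) can load the annotation files directly; the path below is a placeholder for your own:

from pycocotools.coco import COCO

# Verify the annotations are readable; replace the path with your own.
coco = COCO('path/to/coco/annotations/instances_val2017.json')
print(len(coco.getImgIds()))  # COCO 2017 val contains 5,000 images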

Training

To train conditional DETR-R50 on a single node with 8 GPUs for 50 epochs, run:

bash scripts/conddetr_r50_epoch50.sh

or

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env \
    main.py \
    --resume auto \
    --coco_path /path/to/coco \
    --output_dir output/conddetr_r50_epoch50

The training process takes around 30 hours on a single machine with 8 V100 cards.

Following the DETR training setup, we train conditional DETR with AdamW, setting the learning rate to 1e-4 in the transformer and 1e-5 in the backbone. Horizontal flips, scales, and crops are used for augmentation. Images are rescaled to have a minimum size of 800 and a maximum size of 1333. The transformer is trained with a dropout of 0.1, and the whole model is trained with gradient clipping at 0.1.
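For concreteness, here is a hedged sketch of that optimizer setup in the style of DETR's main.py. It assumes backbone parameter names contain "backbone", as in this codebase; a toy model stands in for the real conditional DETR so the snippet runs as-is (weight decay 1e-4 matches the training args in the logs):

import torch
import torch.nn as nn

# Stand-in for the constructed conditional DETR: any module whose backbone
# parameters have "backbone" in their names.
model = nn.ModuleDict({"backbone": nn.Linear(8, 8), "transformer": nn.Linear(8, 8)})

# Two parameter groups: transformer/head parameters at lr 1e-4,
# backbone parameters at lr 1e-5.
param_dicts = [
    {"params": [p for n, p in model.named_parameters()
                if "backbone" not in n and p.requires_grad]},
    {"params": [p for n, p in model.named_parameters()
                if "backbone" in n and p.requires_grad],
     "lr": 1e-5},
]
optimizer = torch.optim.AdamW(param_dicts, lr=1e-4, weight_decay=1e-4)

# Each step, after loss.backward(), gradients are clipped to the 0.1 max norm:
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)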

Evaluation

To evaluate conditional DETR-R50 on COCO val with 8 GPUs run:

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env \
    main.py \
    --batch_size 2 \
    --eval \
    --resume <checkpoint.pth> \
    --coco_path /path/to/coco \
    --output_dir output/<output_path>

Note that the numbers vary depending on the per-GPU batch size (number of images). Non-DC5 models were trained with batch size 2 and DC5 models with batch size 1, so DC5 models show a significant drop in AP if evaluated with more than one image per GPU.

License

Conditional DETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Citation

@inproceedings{meng2021-CondDETR,
  title       = {Conditional DETR for Fast Training Convergence},
  author      = {Meng, Depu and Chen, Xiaokang and Fan, Zejia and Zeng, Gang and Li, Houqiang and Yuan, Yuhui and Sun, Lei and Wang, Jingdong},
  booktitle   = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year        = {2021}
}
Comments
  • Can you provide the checkpoints of ConditionalDETR-R50 and -R101 trained with 108 epochs ?

    I can't replicate the results of ConditionalDETR-R50 and -R101 trained with 108 epochs. (Screenshots of the replicated 108-epoch results for both models were attached.)

    opened by truetone2022 6
  • How to use resume correctly?

    Hello, my PC can't train continuously, so I run three epochs a day, about 12 hours each. I just set the parameter "resume" to checkpoint.pth and kept the other parameters unchanged, but the results don't seem very good: after running more than a dozen epochs, they are almost the same as after the first few. So I want to ask how to use resume correctly. When I pause after an epoch and continue the next time, do I need to adjust the learning rate? Should "--start_epoch" be set to the epoch of the last checkpoint?

    opened by xziyh 6
  • Visualization code of Figure 1 in paper.

    Hi Author,

    First, thanks for your great work improving the convergence speed of DETR by such a large margin. While reading the paper, I got a little confused about how exactly you draw the attention maps in Figure 1.

    Given an object query q (1 x d) and memory features m (d x (hw)), I use the following equation to draw the attention maps:

    Similarity(q, m) = Softmax(proj(q) · proj(m)), of shape 1 x (hw), where proj is the trained linear projection in the cross-attention module.

    The attention maps I get are quite similar to the ones shown in the DETR paper:

    (Attached: attention-map screenshots for a random object query, and for the same query on heads A, B, and C.)

    Could you please give some information on how to generate attention in Figure 1? Thanks!

    opened by MaureenZOU 6
  • Some question about your code and Cross-attention module in your paper

    Hi, thank you for this excellent work on transformers for object detection; I'm extremely interested in it. But I have some questions from reading the paper and code, and I hope you can give me some answers.

    1. In your paper, does 'conditional' merely mean that the query matrix (Q) of the cross-attention module is formed from the output embedding of the self-attention module (same as DETR) and p_q (proposed in Section 3.3), while the key matrix (K) and value matrix (V) are formed the same way as in DETR, the difference being that your work concatenates where DETR adds? Here I would like to ask: is the reference point s generated from the object queries? What is the conditional spatial query from the embedding f? In the first decoder layer, are the decoder embeddings also initialized with nn.Embedding(), and in the later decoder layers, are the decoder embeddings the outputs of the previous decoder layer?

    2. Is p_q formed from the reference point s and the decoder embeddings f only in the first decoder layer, while in the later decoder layers (layers 2-6) p_q is generated from the object queries (same as DETR)? I ask because the initialization of the decoder module includes: self.layers[layer_id + 1].ca_qpos_proj = None (layer_id runs from 0 to 4, i.e., the 2nd-6th decoder layers). However, in the __init__ of TransformerDecoderLayer, ca_qpos_proj is defined as a linear layer: self.ca_qpos_proj = nn.Linear(d_model, d_model).

    3. When I debug the code with the ConditionalDETR-R50-DC5 model, the forward pass receives samples containing input image tensors (batch, 3, 800, 1096) and a boolean mask (batch, 800, 1096). Where does this mask come from? I don't see any relevant definitions in the __init__ function. I know its role: it is used by the PositionEmbeddingSine function to generate positional encodings for the encoder and decoder.

    4. The input images have shape (batch, 3, 800, 1096), which becomes (batch, 3, 50, 69) through the backbone, so the downsampling rate is 16, not 32. I guess the convolution stride in the last bottleneck is changed to 1, but I can't find that change in your code. And where are the deformable convolutions initialized and used in the forward pass?

    Those are all my questions; I sincerely hope you can help. Thanks!

    opened by Huzhen757 5
  • How to add Group DETR in DINO-Deformable-DETR

    Hi, thank you for your wonderful work. If I want to add Group DETR to DINO-Deformable-DETR, how should Mixed Query Selection generate boxes for the different groups?

    opened by xiaoruiai 4
  • questions about provided conditional detr model

    Thanks for your excellent work! I have questions about the provided model. In the provided conditional DETR model "conditional detr resnet50", transformer.decoder.layer.cross_attn.out_proj.weight/bias have dimensions 256x256 and 256 respectively, but since the input to this cross-attention is the concatenation of two 256-d queries, it seems they should be 512x512 and 512. It really confuses me. Looking forward to your help, thanks!

    opened by xz-123-new 3
  • Add ConditionalDETR to HuggingFace Transformers

    Hi!

    As ConditionalDETR seems like some (relatively) minor modifications to the original DETR, it might make sense to add ConditionalDETR to HuggingFace Transformers to increase visibility and adoption. We do have the original DETR in the library, found here: https://huggingface.co/docs/transformers/model_doc/detr. This also comes with nice inference widgets on the hub, check out this one for instance (on the right -> you can directly try out DETR in the browser!): https://huggingface.co/facebook/detr-resnet-50.

    The Python implementation is made in a single python script, found here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/detr/modeling_detr.py.

    So, if we want to add ConditionalDETR, it would have to be implemented in modeling_conditional_detr.py, which includes the modifications compared to modeling_detr.py.

    Are you interested in adding this model to the library?

    Kind regards,

    Niels, ML Engineer @ HuggingFace

    opened by NielsRogge 3
  • The diagonal matrix meaning?

    Hi, thank you for your nice work on transformers for object detection. But I have some questions from reading the paper and code; I hope you can give me some answers.

    1. What's the insight behind the pos_transformation T in Section 3.3?

    2. What's the meaning of the diagonal vector \lambda_q described in Section 3.3? I can't find the code for the diagonal operator in this repo; I only find pos_transformation generated from learnable weights: https://github.com/Atten4Vis/ConditionalDETR/blob/0b04a859c7fac33a866fcdea06f338610ba6e9d8/models/transformer.py#L151

    3. I can't figure out the difference between "Block", "Full" and "Diagonal" in Figure 5.

    Those are all my questions; I sincerely hope you can help. Thanks!

    opened by JosonChan1998 3
  • About the multi-head attention

    I noticed that you re-implemented multi-head attention in models/attention.py. Are there any differences from the original implementation? Since the code is very long, it's hard for me to find them. Could you kindly point them out? Thanks!

    opened by LiewFeng 2
  • the parameters are only 43196001, instead of 43524961

    I ran the default conditional DETR-R50, but the number of parameters differs from that in the provided log.

    Also, after training for 1 epoch, the eval results are [0.04369693586567375, 0.12083834673558262, 0.023675111814434113, 0.01864211602467282, 0.052261665895792626, 0.07171156446634068, 0.09023536974930606, 0.18654859799415718, 0.22196121793196433, 0.04610799601904764, 0.21023391350986004, 0.3797766209046455],

    which is weaker (by about 0.7 AP) than the provided log's [0.0509964214370242, 0.13292741190993088, 0.030383986414032393, 0.015355903493298791, 0.05914294278060285, 0.08176101640052409, 0.10028554935230335, 0.2012481198582593, 0.23517722389597043, 0.04296950016312112, 0.23670937055006003, 0.40016568706711353].

    opened by Cohesion97 2
  • What's the difference between the Group DETR and the DETRs with Hybrid Matching?

    Hi, I found that Group DETR looks the same as DETRs with Hybrid Matching: both are group-wise one-to-many assignments. Could you tell me the differences between them?

    opened by rockywind 1
  • Issues about Positional Embedding and Reference Point

    Hi, thanks for sharing your wonderful work.

    I have a question about this line, https://github.com/Atten4Vis/ConditionalDETR/blob/ead865cbcf88be10175b79165df0836c5fcfc7e3/models/transformer.py#L33, which embeds positional information in the query_pos.

    However, I don't understand why 2*(dim_t//2) has to be divided by 128 instead of the actual dimension of pos_tensor (e.g., 256 by default): https://github.com/Atten4Vis/ConditionalDETR/blob/ead865cbcf88be10175b79165df0836c5fcfc7e3/models/transformer.py#L38 Does it work correctly even though dim_t is divided by 128?

    I would appreciate being corrected!

    Another question: in the computation of equation (1) of the paper, https://github.com/Atten4Vis/ConditionalDETR/blob/ead865cbcf88be10175b79165df0836c5fcfc7e3/models/conditional_detr.py#L89, can I understand this as the model learning "offsets" from the corresponding reference points? What is the precise role of the reference points?

    Thank you!

    opened by tae-mo 0
  • Training Error assert (boxes1[:, 2:] >= boxes1[:, :2]).all()

    Instructions To Reproduce the 🐛 Bug:

    1. what changes you made (git diff) or what code you wrote:
       no changes

    2. what exact command you ran: python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path ../data/COCO2017 --output_dir output/conddetr_r50_epoch50
    3. what you observed (including full logs):
    | distributed init (rank 2): env://
    | distributed init (rank 0): env://
    | distributed init (rank 4): env://
    | distributed init (rank 3): env://
    | distributed init (rank 5): env://
    | distributed init (rank 1): env://
    | distributed init (rank 7): env://
    | distributed init (rank 6): env://
    fatal: Not a git repository (or any parent up to mount point /research/d4)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    fatal: Not a git repository (or any parent up to mount point /research/d4)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    fatal: Not a git repository (or any parent up to mount point /research/d4)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    fatal: Not a git repository (or any parent up to mount point /research/d4)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    fatal: Not a git repository (or any parent up to mount point /research/d4)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    fatal: Not a git repository (or any parent up to mount point /research/d4)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    fatal: Not a git repository (or any parent up to mount point /research/d4)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    git:
      sha: N/A, status: clean, branch: N/A
    
    fatal: Not a git repository (or any parent up to mount point /research/d4)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    Namespace(aux_loss=True, backbone='resnet50', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, cls_loss_coef=2, coco_panoptic_path=None, coco_path='../data/COCO2017', dataset_file='coco', dec_layers=6, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=2048, dist_backend='nccl', dist_url='env://', distributed=True, dropout=0.1, enc_layers=6, epochs=50, eval=False, focal_alpha=0.25, frozen_weights=None, giou_loss_coef=2, gpu=0, hidden_dim=256, lr=0.0001, lr_backbone=1e-05, lr_drop=40, mask_loss_coef=1, masks=False, nheads=8, num_queries=300, num_workers=2, output_dir='output/conddetr_r50_epoch50', position_embedding='sine', pre_norm=False, rank=0, remove_difficult=False, resume='', seed=42, set_cost_bbox=5, set_cost_class=2, set_cost_giou=2, start_epoch=0, weight_decay=0.0001, world_size=8)
    number of params: 43196001
    loading annotations into memory...
    Done (t=20.78s)
    creating index...
    index created!
    loading annotations into memory...
    Done (t=0.56s)
    creating index...
    index created!
    Start training
    Epoch: [0]  [   0/7393]  eta: 7:05:21  lr: 0.000100  class_error: 85.57  loss: 45.1821 (45.1821)  loss_bbox: 3.7751 (3.7751)  loss_bbox_0: 3.7823 (3.7823)  loss_bbox_1: 3.7808 (3.7808)  loss_bbox_2: 3.7756 (3.7756)  loss_bbox_3: 3.7911 (3.7911)  loss_bbox_4: 3.7856 (3.7856)  loss_ce: 1.9574 (1.9574)  loss_ce_0: 2.0151 (2.0151)  loss_ce_1: 2.0196 (2.0196)  loss_ce_2: 2.1484 (2.1484)  loss_ce_3: 2.0683 (2.0683)  loss_ce_4: 2.0683 (2.0683)  loss_giou: 1.7011 (1.7011)  loss_giou_0: 1.7000 (1.7000)  loss_giou_1: 1.7040 (1.7040)  loss_giou_2: 1.7059 (1.7059)  loss_giou_3: 1.7022 (1.7022)  loss_giou_4: 1.7012 (1.7012)  cardinality_error_unscaled: 293.1250 (293.1250)  cardinality_error_0_unscaled: 293.1250 (293.1250)  cardinality_error_1_unscaled: 293.1250 (293.1250)  cardinality_error_2_unscaled: 281.9375 (281.9375)  cardinality_error_3_unscaled: 293.1250 (293.1250)  cardinality_error_4_unscaled: 293.1250 (293.1250)  class_error_unscaled: 85.5712 (85.5712)  loss_bbox_unscaled: 0.7550 (0.7550)  loss_bbox_0_unscaled: 0.7565 (0.7565)  loss_bbox_1_unscaled: 0.7562 (0.7562)  loss_bbox_2_unscaled: 0.7551 (0.7551)  loss_bbox_3_unscaled: 0.7582 (0.7582)  loss_bbox_4_unscaled: 0.7571 (0.7571)  loss_ce_unscaled: 0.9787 (0.9787)  loss_ce_0_unscaled: 1.0076 (1.0076)  loss_ce_1_unscaled: 1.0098 (1.0098)  loss_ce_2_unscaled: 1.0742 (1.0742)  loss_ce_3_unscaled: 1.0341 (1.0341)  loss_ce_4_unscaled: 1.0342 (1.0342)  loss_giou_unscaled: 0.8506 (0.8506)  loss_giou_0_unscaled: 0.8500 (0.8500)  loss_giou_1_unscaled: 0.8520 (0.8520)  loss_giou_2_unscaled: 0.8530 (0.8530)  loss_giou_3_unscaled: 0.8511 (0.8511)  loss_giou_4_unscaled: 0.8506 (0.8506)  time: 3.4521  data: 0.4687  max mem: 2932
    Epoch: [0]  [ 100/7393]  eta: 1:17:39  lr: 0.000100  class_error: 85.74  loss: 28.2629 (33.7855)  loss_bbox: 1.5517 (2.3437)  loss_bbox_0: 1.5566 (2.3695)  loss_bbox_1: 1.5482 (2.3519)  loss_bbox_2: 1.5535 (2.3396)  loss_bbox_3: 1.5641 (2.3476)  loss_bbox_4: 1.5637 (2.3431)  loss_ce: 1.5467 (1.6584)  loss_ce_0: 1.5650 (1.6414)  loss_ce_1: 1.5443 (1.6461)  loss_ce_2: 1.5557 (1.6477)  loss_ce_3: 1.5392 (1.6545)  loss_ce_4: 1.5541 (1.6667)  loss_giou: 1.5534 (1.6289)  loss_giou_0: 1.5514 (1.6296)  loss_giou_1: 1.5541 (1.6292)  loss_giou_2: 1.5695 (1.6291)  loss_giou_3: 1.5526 (1.6289)  loss_giou_4: 1.5519 (1.6296)  cardinality_error_unscaled: 293.1875 (293.2420)  cardinality_error_0_unscaled: 293.1875 (293.2420)  cardinality_error_1_unscaled: 293.1875 (293.2420)  cardinality_error_2_unscaled: 293.1875 (293.1312)  cardinality_error_3_unscaled: 293.1875 (293.2420)  cardinality_error_4_unscaled: 293.1875 (293.1658)  class_error_unscaled: 75.6680 (75.4478)  loss_bbox_unscaled: 0.3103 (0.4687)  loss_bbox_0_unscaled: 0.3113 (0.4739)  loss_bbox_1_unscaled: 0.3096 (0.4704)  loss_bbox_2_unscaled: 0.3107 (0.4679)  loss_bbox_3_unscaled: 0.3128 (0.4695)  loss_bbox_4_unscaled: 0.3127 (0.4686)  loss_ce_unscaled: 0.7733 (0.8292)  loss_ce_0_unscaled: 0.7825 (0.8207)  loss_ce_1_unscaled: 0.7722 (0.8231)  loss_ce_2_unscaled: 0.7779 (0.8239)  loss_ce_3_unscaled: 0.7696 (0.8272)  loss_ce_4_unscaled: 0.7770 (0.8334)  loss_giou_unscaled: 0.7767 (0.8145)  loss_giou_0_unscaled: 0.7757 (0.8148)  loss_giou_1_unscaled: 0.7771 (0.8146)  loss_giou_2_unscaled: 0.7847 (0.8146)  loss_giou_3_unscaled: 0.7763 (0.8144)  loss_giou_4_unscaled: 0.7760 (0.8148)  time: 0.6098  data: 0.0105  max mem: 4353
    Traceback (most recent call last):
      File "main.py", line 258, in <module>
        main(args)
      File "main.py", line 206, in main
        train_stats = train_one_epoch(
      File "/research/d4/gds/zwang21/ConditionalDETR/engine.py", line 41, in train_one_epoch
        loss_dict = criterion(outputs, targets)
      File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/research/d4/gds/zwang21/ConditionalDETR/models/conditional_detr.py", line 254, in forward
        indices = self.matcher(outputs_without_aux, targets)
      File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/research/d4/gds/zwang21/ConditionalDETR/models/matcher.py", line 79, in forward
        cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
      File "/research/d4/gds/zwang21/ConditionalDETR/util/box_ops.py", line 59, in generalized_box_iou
        assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
    AssertionError
    Traceback (most recent call last):
      File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
        main()
      File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
        sigkill_handler(signal.SIGTERM, None)  # not coming back
      File "/research/d4/gds/zwang21/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
        raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
    subprocess.CalledProcessError: Command '['/research/d4/gds/zwang21/anaconda3/bin/python', '-u', 'main.py', '--coco_path', '../data/COCO2017', '--output_dir', 'output/conddetr_r50_epoch50']' returned non-zero exit status 1.
    *****************************************
    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
    *****************************************
    Killing subprocess 29668
    Killing subprocess 29669
    Killing subprocess 29670
    Killing subprocess 29671
    Killing subprocess 29672
    Killing subprocess 29673
    Killing subprocess 29674
    Killing subprocess 29675
    
    4. please simplify the steps as much as possible so they do not require additional resources to run, such as a private dataset.

    Expected behavior:

    If there are no obvious error in "what you observed" provided above, please tell us the expected behavior.

    Environment:

    Provide your environment information using the following command:

    Collecting environment information...
    PyTorch version: 1.8.0
    Is debug build: False
    CUDA used to build PyTorch: 10.2
    ROCM used to build PyTorch: N/A
    
    OS: CentOS Linux release 7.9.2009 (Core) (x86_64)
    GCC version: (GCC) 11.2.0
    Clang version: Could not collect
    CMake version: version 2.8.12.2
    
    Python version: 3.8 (64-bit runtime)
    Is CUDA available: False
    CUDA runtime version: No CUDA
    GPU models and configuration: No CUDA
    Nvidia driver version: No CUDA
    cuDNN version: No CUDA
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    
    Versions of relevant libraries:
    [pip3] numpy==1.22.2
    [pip3] numpydoc==1.1.0
    [pip3] pytorch-ignite==0.2.0
    [pip3] pytorch-metric-learning==0.9.99
    [pip3] torch==1.8.0
    [pip3] torchaudio==0.8.0a0+a751e1d
    [pip3] torchfile==0.1.0
    [pip3] torchsampler==0.1.1
    [pip3] torchsummary==1.5.1
    [pip3] torchvision==0.9.0
    [conda] blas                      1.0                         mkl  
    [conda] cudatoolkit               10.2.89              hfd86e86_1  
    [conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
    [conda] mkl                       2021.2.0           h06a4308_296  
    [conda] mkl-service               2.3.0            py38h27cfd23_1  
    [conda] mkl_fft                   1.3.0            py38h42c9631_2  
    [conda] mkl_random                1.2.1            py38ha9443f7_2  
    [conda] numpy                     1.22.2                   pypi_0    pypi
    [conda] numpydoc                  1.1.0              pyhd3eb1b0_1  
    [conda] pytorch                   1.8.0           py3.8_cuda10.2_cudnn7.6.5_0    pytorch
    [conda] pytorch-ignite            0.2.0                    pypi_0    pypi
    [conda] pytorch-metric-learning   0.9.99                   pypi_0    pypi
    [conda] pytorch-mutex             1.0                        cuda    pytorch
    [conda] torch                     1.10.0                   pypi_0    pypi
    [conda] torchaudio                0.8.0                      py38    pytorch
    [conda] torchfile                 0.1.0                    pypi_0    pypi
    [conda] torchsampler              0.1.1                    pypi_0    pypi
    [conda] torchsummary              1.5.1                    pypi_0    pypi
    [conda] torchvision               0.9.0                py38_cu102    pytorch
    
    opened by Kyfafyd 1
  • out of memory

    Hello author, I'm using a 3080 Ti to train Conditional DETR on the entire COCO 2017 dataset, but the program reports CUDA out of memory even though the 3080 Ti has 12 GB of memory. I use MSI Afterburner to monitor memory usage, and the largest usage it shows is only 2520 MB. I set the batch size to 1. (A screenshot was attached.)

    opened by xziyh 3