CDN

Code for our NeurIPS 2021 paper "Mining the Benefits of Two-stage and One-stage HOI Detection".

Contributed by Aixi Zhang*, Yue Liao*, Si Liu, Miao Lu, Yongliang Wang, Chen Gao and Xiaobo Li.

Installation

Install the dependencies.

pip install -r requirements.txt

Data preparation

HICO-DET

The HICO-DET dataset can be downloaded here. After downloading, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.

Instead of using the original annotation files, we use the annotation files provided by the PPDM authors. They can be downloaded from here and must be placed as follows.

data
 └─ hico_20160224_det
     |─ annotations
     |   |─ trainval_hico.json
     |   |─ test_hico.json
     |   └─ corre_hico.npy
     :
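
Before moving on, you can sanity-check the layout with a few lines of Python. This is a minimal sketch: it only assumes the paths shown above, plus the fact that the PPDM annotation files are JSON lists of per-image records.

import json
import os

root = 'data/hico_20160224_det/annotations'
for name in ('trainval_hico.json', 'test_hico.json', 'corre_hico.npy'):
    path = os.path.join(root, name)
    print(path, '->', 'found' if os.path.isfile(path) else 'MISSING')

# Assumes the PPDM files are JSON lists of per-image records.
with open(os.path.join(root, 'trainval_hico.json')) as f:
    print(len(json.load(f)), 'annotated trainval images')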

V-COCO

First clone the V-COCO repository from here, and then follow its instructions to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Place the files and create the directories as follows.

qpic
 |─ data
 │   └─ v-coco
 |       |─ data
 |       |   |─ instances_vcoco_all_2014.json
 |       |   :
 |       |─ prior.pickle
 |       |─ images
 |       |   |─ train2014
 |       |   |   |─ COCO_train2014_000000000009.jpg
 |       |   |   :
 |       |   └─ val2014
 |       |       |─ COCO_val2014_000000000042.jpg
 |       |       :
 |       |─ annotations
 :       :

For our implementation, the annotation files have to be converted to the HOIA format. The conversion can be done as follows.

PYTHONPATH=data/v-coco \
        python convert_vcoco_annotations.py \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations

Note that only Python 2 can be used for this conversion, because vsrl_utils.py in the v-coco repository raises an error under Python 3.

V-COCO annotations in the HOIA format (corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json) will be generated in the annotations directory.
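
To confirm the conversion succeeded, the generated files can be inspected from Python 3 (a minimal sketch; it assumes the HOIA JSON files are lists of per-image records and only reads the outputs):

import json
import numpy as np

root = 'data/v-coco/annotations'
for name in ('trainval_vcoco.json', 'test_vcoco.json'):
    with open(f'{root}/{name}') as f:
        print(name, '->', len(json.load(f)), 'entries')

# Correspondence matrix between verb and object classes.
print('corre_vcoco.npy shape:', np.load(f'{root}/corre_vcoco.npy').shape)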

Pre-trained model

Download the pretrained DETR detector with the ResNet-50 backbone, put it into the params directory, and then convert the parameters as follows.

python convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2stage-q64.pth \
        --num_queries 64

python convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2stage.pth \
        --dataset vcoco
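
For intuition, the first conversion adapts the 100-query DETR checkpoint to a 64-query model. The snippet below is only a simplified illustration of the query-embedding step, not the actual convert_parameters.py logic, which also remaps decoder weights:

import torch

# The official DETR checkpoint stores its weights under the 'model' key.
ckpt = torch.load('params/detr-r50-e632da11.pth', map_location='cpu')
state = ckpt['model']

# DETR was trained with 100 object queries; keeping only the first 64
# makes the embedding match a --num_queries 64 model. (Illustration only.)
state['query_embed.weight'] = state['query_embed.weight'][:64]

torch.save(ckpt, 'params/detr-r50-pre-2stage-q64-sketch.pth')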

Training

After the preparation, you can start training with the following commands. The whole training is split into two steps: CDN base-model training and dynamic re-weighting training. The commands for training CDN-S on HICO-DET and V-COCO are shown below.

HICO-DET

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained params/detr-r50-pre-2stage-q64.pth \
        --output_dir logs \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 90 \
        --lr_drop 60 \
        --use_nms_filter

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained logs/checkpoint_last.pth \
        --output_dir logs/ \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 10 \
        --freeze_mode 1 \
        --obj_reweight \
        --verb_reweight \
        --use_nms_filter

V-COCO

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained params/detr-r50-pre-2stage.pth \
        --output_dir logs \
        --dataset_file vcoco \
        --hoi_path data/v-coco \
        --num_obj_classes 81 \
        --num_verb_classes 29 \
        --backbone resnet50 \
        --num_queries 100 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 90 \
        --lr_drop 60 \
        --use_nms_filter

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained logs/checkpoint_last.pth \
        --output_dir logs/ \
        --dataset_file vcoco \
        --hoi_path data/v-coco \
        --num_obj_classes 81 \
        --num_verb_classes 29 \
        --backbone resnet50 \
        --num_queries 100 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --epochs 10 \
        --freeze_mode 1 \
        --verb_reweight \
        --use_nms_filter
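
For intuition about the re-weighting step: class-frequency re-weighting scales each class's loss contribution so that rare verbs and objects are not drowned out by frequent ones. The sketch below shows generic static inverse-frequency re-weighting; the paper's dynamic re-weighting updates the weights during training and differs in its details:

import torch
import torch.nn.functional as F

def inverse_frequency_weights(class_counts, p=0.5):
    # weight_c is proportional to (1 / count_c)^p, normalized to mean 1.
    counts = torch.as_tensor(class_counts, dtype=torch.float)
    weights = counts.clamp(min=1).pow(-p)
    return weights * len(weights) / weights.sum()

# Hypothetical verb counts; real counts would come from the annotations.
weights = inverse_frequency_weights([5000, 800, 40, 3])
logits = torch.randn(16, 4)            # batch of 16, 4 verb classes
targets = torch.randint(0, 4, (16,))
loss = F.cross_entropy(logits, targets, weight=weights)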

Evaluation

HICO-DET

You can conduct the evaluation with trained parameters for HICO-DET as follows.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained pretrained/hico_cdn_s.pth \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --eval \
        --use_nms_filter
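
The --use_nms_filter flag suppresses near-duplicate human-object pair predictions. The following is a generic sketch of pairwise NMS over (human box, object box, score) triplets, with boxes as [x1, y1, x2, y2]; the repo's actual filter may differ in thresholds and details:

def iou(a, b):
    # Intersection over union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def pair_nms(pairs, thresh=0.7):
    # Keep the highest-scoring pair among those whose human AND object
    # boxes both overlap an already-kept pair above `thresh`.
    kept = []
    for h, o, s in sorted(pairs, key=lambda p: -p[2]):
        if all(min(iou(h, kh), iou(o, ko)) < thresh for kh, ko, _ in kept):
            kept.append((h, o, s))
    return kept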

V-COCO

For the official evaluation of V-COCO, a pickle file of detection results has to be generated. You can generate the file and then evaluate it as follows.

python generate_vcoco_official.py \
        --param_path pretrained/vcoco_cdn_s.pth \
        --save_path vcoco.pickle \
        --hoi_path data/v-coco \
        --dec_layers_hopd 3 \
        --dec_layers_interaction 3 \
        --use_nms_filter

cd data/v-coco
python vsrl_eval.py vcoco.pickle
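
If you want to inspect the detections before running the official evaluation, the pickle can be loaded directly (a minimal sketch; the exact structure of each entry is defined by generate_vcoco_official.py and is assumed here to be a list of per-image records):

import pickle

with open('vcoco.pickle', 'rb') as f:
    detections = pickle.load(f)

print(type(detections), len(detections))
print(detections[0])  # first per-image detection record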

Results

HICO-DET

Model          Full (D)  Rare (D)  Non-rare (D)  Full (KO)  Rare (KO)  Non-rare (KO)  Download
CDN-S (R50)    31.44     27.39     32.64         34.09      29.63      35.42          model
CDN-B (R50)    31.78     27.55     33.05         34.53      29.73      35.96          model
CDN-L (R101)   32.07     27.19     33.53         34.79      29.48      36.38          model

D: Default, KO: Known object

V-COCO

Model          Scenario 1  Scenario 2  Download
CDN-S (R50)    61.68       63.77       model
CDN-B (R50)    62.29       64.42       model
CDN-L (R101)   63.91       65.89       model

Citation

Please consider citing our paper if it helps your research.

@article{zhang2021mining,
  title={Mining the Benefits of Two-stage and One-stage HOI Detection},
  author={Zhang, Aixi and Liao, Yue and Liu, Si and Lu, Miao and Wang, Yongliang and Gao, Chen and Li, Xiaobo},
  journal={arXiv preprint arXiv:2108.05077},
  year={2021}
}

License

CDN is released under the MIT license. See LICENSE for additional details.

Acknowledgments

Parts of the code are built upon PPDM, DETR and QPIC. Thanks for their great work!

Comments
  • dynamic reweighting causes performance degradation in reproducing

    Hi, thanks for sharing the code! Great work!

    I have a small question about reproducing your result. I ran the CDN-S model (res50, 3+3) twice. It gave results of about 31.5 and 31.2 after the first training stage (training the whole model with the regular loss). But after the second training stage (decoupled training) finished, the performance dropped to 31.0 and 30.4 for these two runs. For full mAP, rare mAP, and non-rare mAP, this trick does not seem to be helpful.

    So I wonder what could have gone wrong during my reproduction, or what the reason might be. I will paste the commands and log below. Thanks. Nice day :3

    opened by Charles-Xie 13
  • HICO-DET finetuned Detr

    Hi, thank you for sharing such nice work!

    I currently use HOI models on custom datasets, and I wonder if you have plans to share a pretrained DETR fine-tuned on HICO-DET.

    Thanks

    opened by jihwanp 7
  • Confusion about parameter conversion

    Hello, thanks a lot for your work. As I can see in the line linked below, you rename the DETR decoder layers to stage2_decoder. However, in your model there is no stage2_decoder. Does that mean you are not initializing the decoders from DETR?

    https://github.com/YueLiao/CDN/blob/71c570296ecc7194c7abf99f5207344aa9c8dabe/convert_parameters.py#L50

    opened by ASMIftekhar 6
  • Custom Dataset

    Hello,

    Thanks for this great project! I have a quick question; I'm adapting the CDN code to a custom dataset. However, I can't seem to understand the format. I have converted my dataset to the VCOCO format and I was able to extract the dict keys: (['image_id', 'ann_id', 'role_object_id', 'label', 'action_name', 'role_name', 'include']) but I don't understand the pattern in the trainval.json, train.json, val.json, and test.json.

    opened by johnolap 5
  • Two questions about the detail

    Hello, thanks for your research! I have two questions:

    1. CDN gives 100 predicted HOI triplets for every image of V-COCO, and then applies PNMS to the top-100 results. After checking your code, I find that you don't set a threshold to filter the final results. So the final number of predicted HOI triplets may be at least 40 or 50, but the ground truth for one image has no more than 10 HOI triplets. How do you deal with this?

    2. In your paper, you do some useful research on human-object pair generation (using the HO-pair decoder to replace the Faster R-CNN in the iCAN baseline). Recently I have also wanted to do the same experiment, but I'm confused about the details. In CDN, we have 100 predictions for every image, so we can apply the Hungarian algorithm to match predictions and labels for a batch. But if we use the HO-pair decoder to replace Faster R-CNN in iCAN and then set a threshold to generate human-object pairs, one image may generate k1 HO pairs while another generates k2 HO pairs (k1 ≠ k2). Under these circumstances, should we apply the Hungarian algorithm to match predictions and labels for every image individually?

    Sorry for long and complex questions.

    opened by Devotedyang 5
  • The model providing is not the best.

    Hi~ the CDN-L (R101) model you provide shows results of 63.01 (Scenario 1) and 64.45 (Scenario 2), which do not match the results of 63.91 and 65.89 in the paper. Could you provide the model with the best performance? Thanks a lot.

    opened by Devotedyang 5
  • How to draw a heatmap for decoder?

    Sorry to interrupt, but I'm very interested in how to get a heatmap for the decoder like Fig. 3 in the paper, because the input and output of the decoder are both query tensors of shape (100, N, 256), and the multihead_attn function is wrapped by torch. I would appreciate any advice.

    opened by ddwhzh 3
  • Code of experiment

    Hi!

    Thanks for your impressive work! I noticed that there is an experiment which uses pairs generated by HO-PD while the interaction classifier is iCAN. The performance gain (+9 mAP) is really impressive! I really want to try it myself. Could you please release the code for this experiment?

    Thanks a lot!

    Touger

    opened by Touger 2
  • Results.json file has too much prediction.

    Hi, I ran the test code for HICO with hico_cdn_s.pth, which gives an mAP of 31.3616%. Nice work!

    At the same time, it generates a results.json file. However, I reviewed the file and found that it contains a very large number of predictions. I don't know why this is (I'm new to HOI).

    Could you provide your results.json or offer some advice?

    Thank you!

    (PS: I will upload my results.json file for you to check. Thanks again.)

    opened by JackKoLing 2
  • Ask for visual.py

    Hello,

    Thanks for your great work. I ran your code and got the result.json file. Now I want to visualize the images and predicted annotations to analyze the error samples.

    Could you provide a visualization script? Thank you!

    opened by JackKoLing 2
  • Evaluation of pretrained model on HICO-Det is lower than reported.

    Hi, I downloaded your CDN-S/B/L pretrained models and evaluated them locally with your script. The results are all lower than the reported ones. Specifically:

    Model S: mAP 0.3150153102455715, mAP rare 0.27220402283111156, mAP non-rare 0.32780309739534524, mean max recall 0.6390865334740454
    Model B: mAP 0.31671305044814135, mAP rare 0.27124143008976015, mAP non-rare 0.3302954825032422, mean max recall 0.6436087461863998
    Model L: mAP 0.3196075598370539, mAP rare 0.2724063291553318, mAP non-rare 0.3337066287419839, mean max recall 0.6492344626931886

    I'm wondering what caused the inconsistency. What's the version of opencv-python in your environment? I'm using OpenCV 4.5.1.

    opened by boringwar 2
  • questions about `args.use_matching`

    Thank you for the nice HOI work. I'm currently following your work.

    I have a question about args.use_matching. It seems that you did not use matching_embed when training the model. Is it true that all the reported models were trained with args.use_matching=False?

    opened by Dawn-LX 5