Code for our CVPR 2022 Paper "GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection"

Yue Liao

Last update: Dec 4, 2022

Overview

GEN-VLKT

Contributed by Yue Liao*, Aixi Zhang*, Miao Lu, Yongliang Wang, Xiaobo Li and Si Liu.

Installation

Install the dependencies.

pip install -r requirements.txt

Clone and build CLIP.

git clone https://github.com/openai/CLIP.git && cd CLIP && python setup.py develop && cd ..

Data preparation

HICO-DET

The HICO-DET dataset can be downloaded here. After downloading, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.

Instead of the original annotation files, we use the annotation files provided by the PPDM authors, which can be downloaded from here. Place the downloaded annotation files as follows.

data
 └─ hico_20160224_det
     |─ annotations
     |   |─ trainval_hico.json
     |   |─ test_hico.json
     |   └─ corre_hico.npy
     :
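You can confirm the layout before training by checking that each annotation file exists. The snippet below uses a hypothetical helper, `missing_files`, which is not part of this repository. It checks only the three annotation paths in the tree above, relative to data/hico_20160224_det.

```python
from pathlib import Path
from typing import Iterable, List

# Annotation paths from the tree above, relative to data/hico_20160224_det.
HICO_ANNOTATIONS = [
    "annotations/trainval_hico.json",
    "annotations/test_hico.json",
    "annotations/corre_hico.npy",
]


def missing_files(root: str, rel_paths: Iterable[str]) -> List[str]:
    """Return the relative paths under `root` that do not exist as regular files."""
    base = Path(root)
    return [p for p in rel_paths if not (base / p).is_file()]


if __name__ == "__main__":
    missing = missing_files("data/hico_20160224_det", HICO_ANNOTATIONS)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All HICO-DET annotation files found.")
```

Run the snippet from the GEN-VLKT root so that the relative data path resolves.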

V-COCO

First, clone the V-COCO repository from here and follow its instructions to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Place the files and create the directories as follows.

GEN-VLKT
 |─ data
 │   └─ v-coco
 |       |─ data
 |       |   |─ instances_vcoco_all_2014.json
 |       |   :
 |       |─ prior.pickle
 |       |─ images
 |       |   |─ train2014
 |       |   |   |─ COCO_train2014_000000000009.jpg
 |       |   |   :
 |       |   └─ val2014
 |       |       |─ COCO_val2014_000000000042.jpg
 |       |       :
 |       |─ annotations
 :       :

For our implementation, the annotation files have to be converted to the HOIA format. The conversion can be performed as follows.

PYTHONPATH=data/v-coco \
        python convert_vcoco_annotations.py \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations

Note that only Python 2 can be used for this conversion, because vsrl_utils.py in the v-coco repository raises an error under Python 3.

The V-COCO annotations in HOIA format (corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json) will be generated in the annotations directory.
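A quick structural check of the converted files can catch a failed conversion early. The key names below are an assumption based on the HOIA-style layout used by QPIC/CDN-derived loaders, so verify them against the dataset code before relying on this sketch. Each entry is assumed to hold:

- a `file_name`;
- a list of `annotations` (boxes with a `category_id`);
- a list of `hoi_annotation` triplets whose `subject_id` and `object_id` index into `annotations`.

An `object_id` of -1 is assumed to mean "no object". `check_hoia` and `check_hoia_file` are hypothetical helpers.

```python
import json
from typing import Any, Dict, List


def check_hoia(entries: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems found in HOIA-style entries.

    Assumed (unconfirmed) layout per entry:
      {"file_name": str,
       "annotations": [{"bbox": [x1, y1, x2, y2], "category_id": int}, ...],
       "hoi_annotation": [{"subject_id": int, "object_id": int,
                           "category_id": int}, ...]}
    object_id == -1 is accepted as "no object".
    """
    problems = []
    for i, entry in enumerate(entries):
        for key in ("file_name", "annotations", "hoi_annotation"):
            if key not in entry:
                problems.append(f"entry {i}: missing key '{key}'")
        n_boxes = len(entry.get("annotations", []))
        for j, hoi in enumerate(entry.get("hoi_annotation", [])):
            # Triplet indices must point at an existing box.
            if not 0 <= hoi["subject_id"] < n_boxes:
                problems.append(f"entry {i}, hoi {j}: bad subject_id {hoi['subject_id']}")
            if hoi["object_id"] != -1 and not 0 <= hoi["object_id"] < n_boxes:
                problems.append(f"entry {i}, hoi {j}: bad object_id {hoi['object_id']}")
    return problems


def check_hoia_file(path: str) -> List[str]:
    """Load a converted JSON file (e.g. trainval_vcoco.json) and check it."""
    with open(path) as f:
        return check_hoia(json.load(f))
```

For example, `check_hoia_file("data/v-coco/annotations/trainval_vcoco.json")` should return an empty list if the assumed layout holds.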

Pre-trained model

Download the pretrained DETR detector model for ResNet-50 and put it in the params directory. Then convert the parameters for HICO-DET and V-COCO with the following commands.

python ./tools/convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2branch-hico.pth \
        --num_queries 64

python ./tools/convert_parameters.py \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2branch-vcoco.pth \
        --dataset vcoco \
        --num_queries 64

Training

After the preparation, you can start training with the following commands. Training is split into two steps: GEN-VLKT base model training and dynamic re-weighting training. The training commands for GEN-VLKT-S on HICO-DET and V-COCO are as follows.

HICO-DET

sh ./configs/hico_s.sh

V-COCO

sh ./configs/vcoco_s.sh

Zero-shot

sh ./configs/hico_s_zs_nf_uc.sh

Evaluation

HICO-DET

You can conduct the evaluation with trained parameters for HICO-DET as follows.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained pretrained/hico_gen_vlkt_s.pth \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers 3 \
        --eval \
        --with_clip_label \
        --with_obj_clip_label \
        --use_nms_filter

For the official evaluation (as reported in the paper), convert the prediction file to the official prediction format following this file, and then follow the PPDM evaluation steps.

V-COCO

First, add the following main function to vsrl_eval.py in data/v-coco.

if __name__ == '__main__':
  import sys

  vsrl_annot_file = 'data/vcoco/vcoco_test.json'
  coco_file = 'data/instances_vcoco_all_2014.json'
  split_file = 'data/splits/vcoco_test.ids'

  vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)

  det_file = sys.argv[1]
  vcocoeval._do_eval(det_file, ovr_thresh=0.5)

Next, for the official evaluation of V-COCO, a pickle file of detection results has to be generated. Generate the file with the following command, and then evaluate it as follows.

python generate_vcoco_official.py \
        --param_path pretrained/VCOCO_GEN_VLKT_S.pth \
        --save_path vcoco.pickle \
        --hoi_path data/v-coco \
        --num_queries 64 \
        --dec_layers 3 \
        --use_nms_filter \
        --with_clip_label \
        --with_obj_clip_label

cd data/v-coco
python vsrl_eval.py ../../vcoco.pickle
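Before running the official script, you can sanity-check the generated pickle from the repository root. This sketch assumes the list-of-dicts layout that vsrl_eval.py reads. Each dict holds an `image_id`, a 4-value `person_box`, and per-action keys such as `<action>_agent` or `<action>_<role>`. `summarize_vcoco_dets` is a hypothetical helper. If the boxes are NumPy arrays, NumPy must be installed to unpickle them.

```python
import pickle
from typing import Any, Dict, List


def summarize_vcoco_dets(path: str) -> Dict[str, Any]:
    """Load a V-COCO detection pickle and report basic structural facts.

    Assumes a list of dicts, each with 'image_id' and a 4-value 'person_box'.
    """
    with open(path, "rb") as f:
        dets: List[Any] = pickle.load(f)
    if not isinstance(dets, list):
        raise TypeError(f"expected a list of detections, got {type(dets).__name__}")
    # Indices of entries that do not match the assumed layout.
    malformed = [
        i for i, d in enumerate(dets)
        if not isinstance(d, dict) or "image_id" not in d
        or len(d.get("person_box", ())) != 4
    ]
    image_ids = {d["image_id"] for d in dets if isinstance(d, dict) and "image_id" in d}
    return {"num_dets": len(dets), "num_images": len(image_ids),
            "malformed_indices": malformed}
```

For example, `summarize_vcoco_dets("vcoco.pickle")` should report no malformed indices before you move on to vsrl_eval.py.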

Zero-shot

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \
        main.py \
        --pretrained pretrained/hico_gen_vlkt_s.pth \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers 3 \
        --eval \
        --with_clip_label \
        --with_obj_clip_label \
        --use_nms_filter \
        --zero_shot_type rare_first \
        --del_unseen

Regular HOI Detection Results

HICO-DET

	Full (D)	Rare (D)	Non-rare (D)	Full (KO)	Rare (KO)	Non-rare (KO)	Download	Config
GEN-VLKT-S (R50)	33.75	29.25	35.10	36.78	32.75	37.99	model	config
GEN-VLKT-M* (R101)	34.63	30.04	36.01	37.97	33.72	39.24	model	config
GEN-VLKT-L (R101)	34.95	31.18	36.08	38.22	34.36	39.37	model	config

D: Default, KO: Known Object. *: The original model was lost, so the performance of the provided checkpoint differs slightly from that reported in the paper.
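The Full, Rare, and Non-rare columns are means of per-category AP over different subsets of the 600 HICO-DET HOI categories. Under the standard HICO-DET protocol, a category is rare if it has fewer than 10 training instances. The Default and Known Object settings differ in which test images each category is scored on, not in this averaging step.

The sketch below illustrates only the averaging. `split_map` and `rare_categories` are hypothetical helpers, not functions from this repository, and the toy AP values are made up.

```python
from typing import Dict, Iterable, Set, Tuple


def _mean(values: Iterable[float]) -> float:
    values = list(values)
    return sum(values) / len(values) if values else float("nan")


def split_map(ap: Dict[int, float], subset: Iterable[int]) -> Tuple[float, float, float]:
    """Return (full, subset, complement) mAP from per-category AP values."""
    chosen = set(subset)
    full = _mean(ap.values())
    inside = _mean(v for c, v in ap.items() if c in chosen)
    outside = _mean(v for c, v in ap.items() if c not in chosen)
    return full, inside, outside


def rare_categories(train_counts: Dict[int, int], threshold: int = 10) -> Set[int]:
    """HICO-DET convention: a category is rare if it has < 10 training instances."""
    return {c for c, n in train_counts.items() if n < threshold}


if __name__ == "__main__":
    ap = {0: 0.2, 1: 0.4, 2: 0.6}          # toy per-category AP values
    counts = {0: 3, 1: 50, 2: 120}          # toy training-instance counts
    full, rare, non_rare = split_map(ap, rare_categories(counts))
    print(f"full {full:.3f}, rare {rare:.3f}, non-rare {non_rare:.3f}")
    # → full 0.400, rare 0.200, non-rare 0.500
```

If you pass the set of unseen categories as `subset`, the same function gives the Unseen and Seen means used in zero-shot reporting. The repository's evaluator may treat categories without test annotations differently, so report its numbers rather than this sketch's.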

V-COCO

	Scenario 1	Scenario 2	Download	Config
GEN-VLKT-S (R50)	62.41	64.46	model	config
GEN-VLKT-M (R101)	63.28	65.58	model	config
GEN-VLKT-L (R101)	63.58	65.93	model	config

Zero-shot HOI Detection Results

	Type	Unseen	Seen	Full	Download	Config
GEN-VLKT-S	RF-UC	21.36	32.91	30.56	model	config
GEN-VLKT-S	NF-UC	25.05	23.38	23.71	model	config
GEN-VLKT-S	UO	10.51	28.92	25.63	model	config
GEN-VLKT-S	UV	20.96	30.23	28.74	model	config

Citation

Please consider citing our paper if it helps your research.

@inproceedings{liao2022genvlkt,
  title={GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection},
  author={Liao, Yue and Zhang, Aixi and Lu, Miao and Wang, Yongliang and Li, Xiaobo and Liu, Si},
  booktitle={CVPR},
  year={2022}
}

License

GEN-VLKT is released under the MIT license. See LICENSE for additional details.

Acknowledgement

Parts of the code are built upon PPDM, DETR, QPIC, and CDN. We thank the authors for their great work!

Comments

Reproduce the results of the paper

Hello, thank you for your code. However, I can't reproduce the results of GEN-VLKT-S. I ran the script https://github.com/YueLiao/gen-vlkt/blob/master/configs/hico_s.sh and got the following best result on HICO-DET: "mAP full: 0.30144832960135726 mAP rare: 0.2543793789379969 mAP non-rare: 0.31550788629301035 mean max recall: 0.6205516668727018". P.S.: 1 RTX 3090 GPU. Paper result: mAP full: 0.3375

opened by AssiduousMan 6
Batch size and learning rate

Hi, thank you for sharing the great work.

I wanted to reproduce the zero-shot setting and I got a question about batch size and learning rate. The default setting of hico_s_zs_rf_uc.sh is a batch size of 2 and a learning rate of 1e-4 on 8 gpus. Should I change the batch size or learning rate if I want to perform almost the same as the paper with 4 gpus instead of 8 gpus? If so, I would appreciate it if you could specify the exact number.

Thanks, Janet

opened by parang99 3
Reproduce the results of the paper

Hello, thank you for your code. However, I can't reproduce the results of GEN-VLKT-S. I ran the script https://github.com/YueLiao/gen-vlkt/blob/master/configs/hico_s.sh and got the following best result on HICO-DET: "mAP full: 0.33093688795372056 mAP rare: 0.2844252297058811 mAP non-rare: 0.34482998067710124 mean max recall: 0.6672380371879575". P.S.: 8 Tesla V100 GPUs. Paper result: mAP full: 0.3375

opened by biubug6 2

Evaluation error at training stage using zero-shot configs

sh ./configs/hico_s_zs_nf_uc.sh will throw an error at evaluation after training one epoch:

 File "./engine.py", line 114, in evaluate_hoi
    evaluator = HICOEvaluator(preds, gts, data_loader.dataset.rare_triplets,
  File "./datasets/hico_eval_triplet.py", line 49, in __init__
    hoi_scores = img_preds['hoi_scores'] + obj_scores[:, self.hoi_obj_list]
ValueError: operands could not be broadcast together with shapes (64,480) (64,600)

But the non-zero-shot scripts run well. Please check out this issue.

opened by jxgu1016 2

how to set OMP_NUM_THREADS

When I run GEN-VLKT, it prints "Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed." Although the program runs normally, I am afraid this will affect the running speed. What should OMP_NUM_THREADS be set to with 4 or 8 GPUs? Looking forward to your reply!

opened by AssiduousMan 1
how to initialize the instance decoder query

Does the paper initialize the human queries and object queries to different values and then apply self-attention to them jointly? I think the instance decoder focuses on part features, so the human queries and object queries should go through self-attention separately, as in [Reference Paper 5]. I cannot find a description of this in the paper. Waiting for your reply, thank you!

opened by jojolee123 1
How much GPU memory is needed for evaluation?

I want to evaluate the GEN-VLKT-S (R50) model on the HICO-DET dataset using a single RTX 3090 Ti. However, it throws an error: CUDA error: out of memory.

I wonder how much GPU memory is needed for evaluation? Thanks!

opened by yuchen2199 1
How to obtain the Known Object results on the HICO-DET dataset

Hello, I see that the code reports the Default results (Full, Rare, and Non-rare) on HICO-DET, but I want to get the Known Object results (Full, Rare, and Non-rare). How should I do this? Thank you very much!

opened by xiaowuzuida 1
Show an image of the experimental results

Hello, this work is very interesting. I have a question: I couldn't find a .py file in this project for visualizing results. For example, given an input image, the network would output a result image containing the detection boxes of the people and objects and the categories of their interactions. Could you please provide the code for this function? Thank you!

opened by VictorAidan 6
zero-shot settings

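Hello, I saw in your paper that you randomly selected 20 verbs, but from the UA_list you provided I found 36 distinct verbs in total, including no_interaction. Is this because the images of the verbs you selected inevitably contain HOIs with other verbs, so you added those to the unseen set as well? Also, you selected many HOI combinations formed with hold, but I found that many hold+object combinations were not all included. What is the reason for this? Even though hold itself is semantically ambiguous, hold dog and hold sheep should both mean leading the animal, yet hold sheep does not appear in your UA. Could you provide the IDs and categories of unseen_object and unseen_verb respectively, rather than the final combined categories?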

opened by yujialele 5
Question regarding zero-shot inference

Hello, Thanks for your nice work.

I have a question regarding zero-shot inference. It seems that during inference you directly utilize frozen clip embeddings. However, during training you trained a 'dummy' interaction classifier on seen classes, this classifier does not have any effect in the inference phase. Do I understand this correctly?

opened by ASMIftekhar 1
