Code for our CVPR 2022 Paper "GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection"




Contributed by Yue Liao*, Aixi Zhang*, Miao Lu, Yongliang Wang, Xiaobo Li and Si Liu.


Install the dependencies.

pip install -r requirements.txt

Clone and build CLIP.

git clone && cd CLIP && python develop && cd ..

Data preparation


The HICO-DET dataset can be downloaded here. After downloading, unpack the tarball (hico_20160224_det.tar.gz) into the data directory.

Instead of using the original annotation files, we use the annotation files provided by the PPDM authors. The annotation files can be downloaded from here. Place the downloaded annotation files as follows.

 └─ hico_20160224_det
     |─ annotations
     |   |─ trainval_hico.json
     |   |─ test_hico.json
     |   └─ corre_hico.npy
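Before training, it can help to confirm that the annotation files landed where the layout above expects them. The following is a minimal sketch, not part of the repository: the function name `missing_annotation_files` is hypothetical, and the root `data/hico_20160224_det` is assumed from the `--hoi_path` value used in the evaluation command of this README.

```python
from pathlib import Path

# Files named in the layout above, relative to the dataset root.
EXPECTED_FILES = (
    "annotations/trainval_hico.json",
    "annotations/test_hico.json",
    "annotations/corre_hico.npy",
)


def missing_annotation_files(root):
    """Return the expected annotation files that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumed dataset root; adjust it if your data directory differs.
    missing = missing_annotation_files("data/hico_20160224_det")
    print("missing:", missing if missing else "none")
```

An empty list means all three files are in place; any entry in the list is a path that still needs to be downloaded or moved.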


First clone the V-COCO repository from here, and then follow the instructions to generate the file instances_vcoco_all_2014.json. Next, download the prior file prior.pickle from here. Place the files and create the directories as follows.

 |─ data
 │   └─ v-coco
 |       |─ data
 |       |   |─ instances_vcoco_all_2014.json
 |       |   :
 |       |─ prior.pickle
 |       |─ images
 |       |   |─ train2014
 |       |   |   |─ COCO_train2014_000000000009.jpg
 |       |   |   :
 |       |   └─ val2014
 |       |       |─ COCO_val2014_000000000042.jpg
 |       |       :
 |       |─ annotations
 :       :

For our implementation, the annotation files have to be converted to the HOIA format. The conversion can be run as follows.

PYTHONPATH=data/v-coco \
        python \
        --load_path data/v-coco/data \
        --prior_path data/v-coco/prior.pickle \
        --save_path data/v-coco/annotations

Note that only Python 2 can be used for this conversion, because code in the v-coco repository raises an error under Python 3.

The V-COCO annotations in the HOIA format (corre_vcoco.npy, test_vcoco.json, and trainval_vcoco.json) will be generated in the annotations directory.
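As a quick sanity check after the conversion, the generated JSON files can be loaded and counted. This is a sketch under stated assumptions: the helper name `count_entries` is hypothetical, the directory matches the `--save_path` value above, and nothing is assumed about the JSON contents beyond being valid JSON with a sized top-level value (a list or a dict).

```python
import json
from pathlib import Path

# JSON files the conversion step is stated to produce.
GENERATED_JSON = ("trainval_vcoco.json", "test_vcoco.json")


def count_entries(annotation_dir):
    """Map each generated JSON file to its number of top-level entries.

    A value of None means the file was not found.
    """
    annotation_dir = Path(annotation_dir)
    counts = {}
    for name in GENERATED_JSON:
        path = annotation_dir / name
        if path.is_file():
            with path.open() as f:
                counts[name] = len(json.load(f))
        else:
            counts[name] = None
    return counts


if __name__ == "__main__":
    print(count_entries("data/v-coco/annotations"))
```

A `None` value or a count of zero suggests the conversion did not complete and is worth rerunning.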

Pre-trained model

Download the pretrained DETR detector model for ResNet50 and put it in the params directory.

python ./tools/ \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2branch-hico.pth \
        --num_queries 64

python ./tools/ \
        --load_path params/detr-r50-e632da11.pth \
        --save_path params/detr-r50-pre-2branch-vcoco.pth \
        --dataset vcoco \
        --num_queries 64


After the preparation, you can start training with the following commands. The whole training is split into two steps: GEN-VLKT base model training and dynamic re-weighting training. Training of GEN-VLKT-S on HICO-DET and V-COCO is shown below.


sh ./configs/


sh ./configs/


sh ./configs/



You can conduct the evaluation with trained parameters for HICO-DET as follows.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \ \
        --pretrained pretrained/hico_gen_vlkt_s.pth \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers 3 \
        --eval \
        --with_clip_label \
        --with_obj_clip_label

For the official evaluation (reported in the paper), you need to convert the prediction file to the official prediction format following this file, and then follow the PPDM evaluation steps.


First, add the following main function to the evaluation script in data/v-coco.

if __name__ == '__main__':
  import sys

  vsrl_annot_file = 'data/vcoco/vcoco_test.json'
  coco_file = 'data/instances_vcoco_all_2014.json'
  split_file = 'data/splits/vcoco_test.ids'

  vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)

  det_file = sys.argv[1]
  vcocoeval._do_eval(det_file, ovr_thresh=0.5)

Next, for the official evaluation of V-COCO, a pickle file of detection results has to be generated. You can generate the file with the following command and then evaluate it as follows.

python \
        --param_path pretrained/VCOCO_GEN_VLKT_S.pth \
        --save_path vcoco.pickle \
        --hoi_path data/v-coco \
        --num_queries 64 \
        --dec_layers 3 \
        --use_nms_filter \
        --with_clip_label

cd data/v-coco
python vcoco.pickle


python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env \ \
        --pretrained pretrained/hico_gen_vlkt_s.pth \
        --dataset_file hico \
        --hoi_path data/hico_20160224_det \
        --num_obj_classes 80 \
        --num_verb_classes 117 \
        --backbone resnet50 \
        --num_queries 64 \
        --dec_layers 3 \
        --eval \
        --with_clip_label \
        --with_obj_clip_label \
        --use_nms_filter \
        --zero_shot_type rare_first

Regular HOI Detection Results


Model                Full (D)  Rare (D)  Non-rare (D)  Full (KO)  Rare (KO)  Non-rare (KO)  Download  Config
GEN-VLKT-S (R50)     33.75     29.25     35.10         36.78      32.75      37.99          model     config
GEN-VLKT-M* (R101)   34.63     30.04     36.01         37.97      33.72      39.24          model     config
GEN-VLKT-L (R101)    34.95     31.18     36.08         38.22      34.36      39.37          model     config

D: Default, KO: Known object, *: The original model was lost, and the performance of the provided checkpoint differs slightly from that reported in the paper.


Model               Scenario 1  Scenario 2  Download  Config
GEN-VLKT-S (R50)    62.41       64.46       model     config
GEN-VLKT-M (R101)   63.28       65.58       model     config
GEN-VLKT-L (R101)   63.58       65.93       model     config

Zero-shot HOI Detection Results

Model       Type   Unseen  Seen   Full   Download  Config
GEN-VLKT-S  RF-UC  21.36   32.91  30.56  model     config
GEN-VLKT-S  NF-UC  25.05   23.38  23.71  model     config
GEN-VLKT-S  UO     10.51   28.92  25.63  model     config
GEN-VLKT-S  UV     20.96   30.23  28.74  model     config


Please consider citing our paper if it helps your research.

  title={GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection},
  author={Yue Liao, Aixi Zhang, Miao Lu, Yongliang Wang, Xiaobo Li, Si Liu},


GEN-VLKT is released under the MIT license. See LICENSE for additional details.


Some of the code is built upon PPDM, DETR, QPIC and CDN. We thank the authors for their great work!

  • Reproduce the results of the paper

    Hello, thank you for your code. However, I can't reproduce the results of GEN-VLKT-S. I ran the script and got the following best result on HICO-DET: "mAP full: 0.30144832960135726 mAP rare: 0.2543793789379969 mAP non-rare: 0.31550788629301035 mean max recall: 0.6205516668727018". P.S. I used one 3090 GPU. Paper result: mAP full: 0.3375.

    opened by AssiduousMan 6
  • Batch size and learning rate

    Hi, thank you for sharing the great work.

    I want to reproduce the zero-shot setting and have a question about batch size and learning rate. The default setting is a batch size of 2 and a learning rate of 1e-4 on 8 GPUs. Should I change the batch size or learning rate to get results close to the paper with 4 GPUs instead of 8? If so, I would appreciate it if you could specify the exact numbers.

    Thanks, Janet

    opened by parang99 3
  • Reproduce the results of the paper

    Hello, thank you for your code. However, I can't reproduce the results of GEN-VLKT-S. I ran the script and got the following best result on HICO-DET: "mAP full: 0.33093688795372056 mAP rare: 0.2844252297058811 mAP non-rare: 0.34482998067710124 mean max recall: 0.6672380371879575". P.S. I used 8 Tesla V100 GPUs. Paper result: mAP full: 0.3375.

    opened by biubug6 2
  • Evaluation error at training stage using zero-shot configs

    sh ./configs/ will throw an error at evaluation after training one epoch:

     File "./", line 114, in evaluate_hoi
        evaluator = HICOEvaluator(preds, gts, data_loader.dataset.rare_triplets,
      File "./datasets/", line 49, in __init__
        hoi_scores = img_preds['hoi_scores'] + obj_scores[:, self.hoi_obj_list]
    ValueError: operands could not be broadcast together with shapes (64,480) (64,600)

    But the non-zero-shot scripts run well. Please check out this issue.

    opened by jxgu1016 2
  • how to set OMP_NUM_THREADS

    When I run GEN-VLKT, it prints: "Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed." Although the program runs normally, I am afraid this will affect the running speed. What should OMP_NUM_THREADS be set to with 4 GPUs or 8 GPUs? Looking forward to your reply!

    opened by AssiduousMan 1
  • how to initialize the instance decoder query

    Does the paper initialize the human query and the object query to different values and then apply self-attention to them together? I think the instance decoder focuses on part features, so the human query and the object query should go through self-attention separately, as in [Reference Paper 5]. I cannot find a description of this in the paper. Waiting for your reply, thank you!

    opened by jojolee123 1
  • How much GPU memory is needed for evaluation?

    I want to evaluate the GEN-VLKT-S (R50) model on the HICO-DET dataset using a single RTX 3090 Ti. However, it throws an error: CUDA error: out of memory.

    I wonder how much GPU memory is needed for evaluation? Thanks!

    opened by yuchen2199 1
  • How to obtain the Known Object result on the HICO dataset

    Hello, I see that the code reports results for the Default setting (full, rare, and non-rare) on the HICO dataset, but I want to get the Known Object results (full, rare, and non-rare) on the HICO dataset. Could you tell me how to do this? Thank you very much!

    opened by xiaowuzuida 1
  • Show an image of the experimental results

    Hello, this work is very interesting. I have a question. In this project, I didn't find a Python file for displaying image results. For example, given an input image, the network would output a result image showing the detection boxes of people and objects together with the categories of their interactions. Could you please provide the code for this function? Thank you!

    opened by VictorAidan 6
  • zero-shot settings

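    Hello, I saw that your paper says you randomly selected 20 verbs, but from the UA_list you provided I found 36 distinct verbs in total, including no_interaction. Is this because the images for your selected verbs unavoidably contain HOIs with other verbs, so those were also added to the unseen set? Also, you selected many HOI combinations formed with hold, but I found that many hold+object combinations are missing. What is the reason? Even though hold itself is semantically ambiguous, hold dog and hold sheep should both mean leading the animal, yet hold sheep does not appear in your UA. Could you provide the ids and categories of unseen_object and unseen_verb separately, rather than the final combined categories?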

    opened by yujialele 5
  • Question regarding zero-shot inference

    Hello, Thanks for your nice work.

    I have a question regarding zero-shot inference. It seems that during inference you directly utilize frozen clip embeddings. However, during training you trained a 'dummy' interaction classifier on seen classes, this classifier does not have any effect in the inference phase. Do I understand this correctly?

    opened by ASMIftekhar 1
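The ValueError quoted in the zero-shot evaluation issue above is an ordinary NumPy broadcasting failure, and it can be reproduced in isolation. The sketch below is not taken from the repository: only the two shapes are copied from the traceback, while the array contents and the helper name `try_add` are placeholders chosen for illustration.

```python
import numpy as np

# Shapes copied from the traceback; the values are placeholders.
hoi_scores = np.zeros((64, 480))
obj_term = np.zeros((64, 600))


def try_add(a, b):
    """Return a + b, or the error message if the shapes cannot be broadcast."""
    try:
        return a + b
    except ValueError as err:
        return str(err)


result = try_add(hoi_scores, obj_term)
print(result)
```

Two arrays broadcast only when each pair of trailing dimensions is equal or one of them is 1, so 480 and 600 columns cannot be combined. Any fix would need both operands to cover the same set of HOI categories before the addition.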