# QAHOI

QAHOI: Query-Based Anchors for Human-Object Interaction Detection ([paper](https://arxiv.org/abs/2112.08647))
## Requirements
- PyTorch >= 1.5.1
- torchvision >= 0.6.1

```bash
pip install -r requirements.txt
```
- Compiling CUDA operators

```bash
cd ./models/ops
sh ./make.sh
# test
python test.py
```
## Dataset Preparation
Please follow the HICO-DET dataset preparation of GGNet. After preparation, the `data` folder should be structured as follows:
```
data
├── hico_20160224_det
|   ├── images
|   |   ├── test2015
|   |   └── train2015
|   └── annotations
|       ├── anno_list.json
|       ├── corre_hico.npy
|       ├── file_name_to_obj_cat.json
|       ├── hoi_id_to_num.json
|       ├── hoi_list_new.json
|       ├── test_hico.json
|       └── trainval_hico.json
```
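If you want to double-check the layout before training, a minimal sketch like the following can be used. The directory and file names are taken from the tree above; the check script itself is not part of the repository.

```python
from pathlib import Path

# Expected HICO-DET layout (taken from the tree above).
ROOT = Path("data/hico_20160224_det")
EXPECTED = [
    "images/test2015",
    "images/train2015",
    "annotations/anno_list.json",
    "annotations/corre_hico.npy",
    "annotations/file_name_to_obj_cat.json",
    "annotations/hoi_id_to_num.json",
    "annotations/hoi_list_new.json",
    "annotations/test_hico.json",
    "annotations/trainval_hico.json",
]

missing = [p for p in EXPECTED if not (ROOT / p).exists()]
print("dataset layout OK" if not missing else f"missing: {missing}")
```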
## Evaluation
Download the models below to the `params` folder.
- We tested the models with an NVIDIA A6000 GPU, PyTorch 1.9.0, Python 3.8, and CUDA 11.2.
All results are mAP on HICO-DET under the Default and Known Object (KO) settings.

| Model | Full (Default) | Rare (Default) | Non-Rare (Default) | Full (KO) | Rare (KO) | Non-Rare (KO) | Download |
|---|---|---|---|---|---|---|---|
| Swin-Tiny | 28.47 | 22.44 | 30.27 | 30.99 | 24.83 | 32.84 | model |
| Swin-Base*+ | 33.58 | 25.86 | 35.88 | 35.34 | 27.24 | 37.76 | model |
| Swin-Large*+ | 35.78 | 29.80 | 37.56 | 37.59 | 31.36 | 39.36 | model |
Evaluate the models by running the following commands. Add `--eval_extra` to evaluate the spatial contribution. `mAP_default.json` and `mAP_ko.json` will be saved in the current folder.
- Swin-Tiny

```bash
python main.py --resume params/QAHOI_swin_tiny_mul3.pth --backbone swin_tiny --num_feature_levels 3 --use_nms --eval
```

- Swin-Base*+

```bash
python main.py --resume params/QAHOI_swin_base_384_22k_mul3.pth --backbone swin_base_384 --num_feature_levels 3 --use_nms --eval
```

- Swin-Large*+

```bash
python main.py --resume params/QAHOI_swin_large_384_22k_mul3.pth --backbone swin_large_384 --num_feature_levels 3 --use_nms --eval
```
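After evaluation, the two JSON result files can be inspected with a short sketch like the one below. The internal schema of the files depends on QAHOI's evaluation code, so the snippet only loads and previews whatever was written.

```python
import json
from pathlib import Path

# Assumes evaluation has already produced mAP_default.json and mAP_ko.json
# in the current folder, as described above.
for name in ("mAP_default.json", "mAP_ko.json"):
    path = Path(name)
    if not path.exists():
        print(f"{name}: not found (run the evaluation command first)")
        continue
    with path.open() as f:
        results = json.load(f)
    print(f"--- {name} ---")
    print(json.dumps(results, indent=2)[:500])  # preview only
```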
## Training
Download the pre-trained Swin Transformer backbone weights (e.g. the Swin-Tiny checkpoint used below) from Swin-Transformer to the `params` folder.
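To confirm a downloaded backbone checkpoint is readable before launching training, a quick sketch follows. The `"model"` key is an assumption about how the official Swin checkpoints are stored; adjust if your file differs.

```python
import torch

# Load on CPU just to inspect; the path matches the Swin-Tiny example below.
ckpt = torch.load("params/swin_tiny_patch4_window7_224.pth", map_location="cpu")

# Official Swin checkpoints typically keep the weights under a 'model' key
# (an assumption here; fall back to the raw dict otherwise).
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state_dict)} tensors, e.g. {list(state_dict)[:3]}")
```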
Training QAHOI with Swin-Tiny from scratch:

```bash
python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --backbone swin_tiny \
    --pretrained params/swin_tiny_patch4_window7_224.pth \
    --output_dir logs/swin_tiny_mul3 \
    --epochs 150 \
    --lr_drop 120 \
    --num_feature_levels 3 \
    --num_queries 300 \
    --use_nms
```
Training QAHOI with Swin-Base*+ from scratch:

```bash
python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --backbone swin_base_384 \
    --pretrained params/swin_base_patch4_window7_224_22k.pth \
    --output_dir logs/swin_base_384_22k_mul3 \
    --epochs 150 \
    --lr_drop 120 \
    --num_feature_levels 3 \
    --num_queries 300 \
    --use_nms
```
Training QAHOI with Swin-Large*+ from scratch:

```bash
python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --backbone swin_large_384 \
    --pretrained params/swin_large_patch4_window12_384_22k.pth \
    --output_dir logs/swin_large_384_22k_mul3 \
    --epochs 150 \
    --lr_drop 120 \
    --num_feature_levels 3 \
    --num_queries 300 \
    --use_nms
```
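To keep an eye on a running job, Deformable-DETR-style training loops usually append one JSON object per epoch to a `log.txt` file inside `--output_dir`; whether QAHOI writes the same format (and uses these exact keys) is an assumption, so treat the following as a sketch only.

```python
import json
from pathlib import Path

# Assumption: the training script writes a JSON-lines log.txt into --output_dir,
# as Deformable-DETR-based codebases commonly do. Adjust path/keys if it does not.
log_file = Path("logs/swin_tiny_mul3/log.txt")

for line in log_file.read_text().splitlines():
    stats = json.loads(line)
    print(f"epoch {stats.get('epoch')}: train_loss={stats.get('train_loss')}")
```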
## Citation

```bibtex
@article{cjw,
  title={QAHOI: Query-Based Anchors for Human-Object Interaction Detection},
  author={Junwen Chen and Keiji Yanai},
  journal={arXiv preprint arXiv:2112.08647},
  year={2021}
}
```