QAHOI: Query-Based Anchors for Human-Object Interaction Detection (paper)

QAHOI

Requirements

  • PyTorch >= 1.5.1
  • torchvision >= 0.6.1
pip install -r requirements.txt
  • Compiling CUDA operators
cd ./models/ops
sh ./make.sh
# test
python test.py
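
Before compiling the CUDA operators, it can help to confirm that the installed packages meet the minimums listed above. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the only assumption about the packages is that they expose a `__version__` string.

```python
import importlib
import re

# Minimum versions copied from the requirements list above.
MINIMUMS = {"torch": "1.5.1", "torchvision": "0.6.1"}


def parse_version(text):
    """Turn a string such as '1.9.0+cu111' into (1, 9, 0).

    Any local build suffix after '+' is ignored, and parsing stops at the
    first component that does not start with a digit.
    """
    core = text.split("+", 1)[0]
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def check_environment():
    """Import each package and report whether it satisfies its minimum."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        if meets_minimum(module.__version__, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({module.__version__} < {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

Comparing integer tuples rather than raw strings avoids ranking a version such as 1.10.0 below 1.5.1.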

Dataset Preparation

Please follow the HICO-DET dataset preparation of GGNet.

After preparation, the data folder should look as follows:

data
├── hico_20160224_det
|   ├── images
|   |   ├── test2015
|   |   └── train2015
|   └── annotations
|       ├── anno_list.json
|       ├── corre_hico.npy
|       ├── file_name_to_obj_cat.json
|       ├── hoi_id_to_num.json
|       ├── hoi_list_new.json
|       ├── test_hico.json
|       └── trainval_hico.json
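
A quick way to confirm the layout before evaluation or training is to check each expected entry in turn. This is a small sketch rather than a script shipped with the repository: the function name `missing_paths` is hypothetical, and the default root `data` assumes the script runs from the repository root.

```python
from pathlib import Path

# Entries taken from the tree above, relative to the data folder.
EXPECTED = [
    "hico_20160224_det/images/test2015",
    "hico_20160224_det/images/train2015",
    "hico_20160224_det/annotations/anno_list.json",
    "hico_20160224_det/annotations/corre_hico.npy",
    "hico_20160224_det/annotations/file_name_to_obj_cat.json",
    "hico_20160224_det/annotations/hoi_id_to_num.json",
    "hico_20160224_det/annotations/hoi_list_new.json",
    "hico_20160224_det/annotations/test_hico.json",
    "hico_20160224_det/annotations/trainval_hico.json",
]


def missing_paths(data_root="data"):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Data folder layout looks complete.")
```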

Evaluation

Download the model to the params folder.

  • We tested the models on an NVIDIA A6000 GPU with PyTorch 1.9.0, Python 3.8, and CUDA 11.2.
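| Model        | Full (def) | Rare (def) | Non-Rare (def) | Full (ko) | Rare (ko) | Non-Rare (ko) | Download |
|--------------|------------|------------|----------------|-----------|-----------|---------------|----------|
| Swin-Tiny    | 28.47      | 22.44      | 30.27          | 30.99     | 24.83     | 32.84         | model    |
| Swin-Base*+  | 33.58      | 25.86      | 35.88          | 35.34     | 27.24     | 37.76         | model    |
| Swin-Large*+ | 35.78      | 29.80      | 37.56          | 37.59     | 31.36     | 39.36         | model    |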

Evaluate a model by running the corresponding command below.

Add --eval_extra to also evaluate the spatial contribution.

mAP_default.json and mAP_ko.json will be saved in the current folder.

  • Swin-Tiny
python main.py --resume params/QAHOI_swin_tiny_mul3.pth --backbone swin_tiny --num_feature_levels 3 --use_nms --eval
  • Swin-Base*+
python main.py --resume params/QAHOI_swin_base_384_22k_mul3.pth --backbone swin_base_384 --num_feature_levels 3 --use_nms --eval
  • Swin-Large*+
python main.py --resume params/QAHOI_swin_large_384_22k_mul3.pth --backbone swin_large_384 --num_feature_levels 3 --use_nms --eval
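
The three evaluation commands differ only in the checkpoint path and the `--backbone` value, so they can be generated from a small table. The sketch below is illustrative and not part of the repository: the dictionary keys and the function name `build_eval_command` are hypothetical, while every flag and path is copied from the commands above.

```python
import shlex

# Checkpoint path and --backbone value for each released model,
# copied from the evaluation commands above. The keys are made up
# for this example.
MODELS = {
    "swin_tiny": ("params/QAHOI_swin_tiny_mul3.pth", "swin_tiny"),
    "swin_base": ("params/QAHOI_swin_base_384_22k_mul3.pth", "swin_base_384"),
    "swin_large": ("params/QAHOI_swin_large_384_22k_mul3.pth", "swin_large_384"),
}


def build_eval_command(model, eval_extra=False):
    """Return the evaluation command for one model as an argument list."""
    checkpoint, backbone = MODELS[model]
    command = [
        "python", "main.py",
        "--resume", checkpoint,
        "--backbone", backbone,
        "--num_feature_levels", "3",
        "--use_nms",
        "--eval",
    ]
    if eval_extra:
        # Optional flag described in the evaluation notes above.
        command.append("--eval_extra")
    return command


if __name__ == "__main__":
    for name in MODELS:
        print(shlex.join(build_eval_command(name)))
```

Returning a list rather than a string means the result can be passed directly to `subprocess.run` without shell quoting issues.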

Training

Download the pre-trained swin-tiny model from Swin-Transformer to the params folder.

Training QAHOI with Swin-Tiny from scratch.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_tiny \
        --pretrained params/swin_tiny_patch4_window7_224.pth \
        --output_dir logs/swin_tiny_mul3 \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --use_nms

Training QAHOI with Swin-Base*+ from scratch.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_base_384 \
        --pretrained params/swin_base_patch4_window7_224_22k.pth \
        --output_dir logs/swin_base_384_22k_mul3 \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --use_nms

Training QAHOI with Swin-Large*+ from scratch.

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_large_384 \
        --pretrained params/swin_large_patch4_window12_384_22k.pth \
        --output_dir logs/swin_large_384_22k_mul3 \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --use_nms
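
All three training commands pass `--epochs 150` and `--lr_drop 120`. The sketch below shows what those values would mean under one assumption that the text above does not confirm: that the training script uses the step schedule common in DETR-style code bases, where the learning rate is multiplied by 0.1 every `lr_drop` epochs. The function name `lr_at_epoch` and the base learning rate in the usage lines are hypothetical.

```python
def lr_at_epoch(epoch, base_lr, lr_drop=120, gamma=0.1):
    """Step decay: scale base_lr by gamma once per completed lr_drop epochs.

    Assumption: this mirrors a StepLR-style schedule; it is not taken
    from the repository's code.
    """
    return base_lr * gamma ** (epoch // lr_drop)


if __name__ == "__main__":
    base_lr = 2e-4  # hypothetical value, for illustration only
    for epoch in (0, 119, 120, 149):
        print(epoch, lr_at_epoch(epoch, base_lr))
```

Under that assumption, epochs 0 to 119 run at the base rate and the final 30 epochs (120 to 149) run at one tenth of it.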

Citation

@article{cjw,
  title={QAHOI: Query-Based Anchors for Human-Object Interaction Detection},
  author={Junwen Chen and Keiji Yanai},
  journal={arXiv preprint arXiv:2112.08647},
  year={2021}
}
Comments
  • Test QAHOI on custom images

    I would like to test the model on images other than the HICO-DET and V-COCO datasets. I do not have ground truth detection. I only found the eval code but it requires ground truth detections for evaluation.

    opened by mjantoun 3
  • question about the swin-tiny model

    I can replicate the result of the swin-large* model, but I can't replicate the result of the swin-tiny model with the same training command:

    python -m torch.distributed.launch \
            --nproc_per_node=8 \
            --use_env main.py \
            --backbone swin_tiny \
            --pretrained params/swin_tiny_patch4_window7_224.pth \
            --output_dir logs/swin_tiny_mul3 \
            --epochs 150 \
            --lr_drop 120 \
            --num_feature_levels 3 \
            --num_queries 300 \
            --use_nms

    opened by truetone2022 3
  • Question about iterative box refinement

    Have you tried using the iterative box refinement module from Deformable DETR? Is the iterative box refinement module helpful for HOI detection? Thanks for any helpful reply!

    opened by truetone2022 2
  • Training on vcoco dataset

    Hello @cjw2021! Thanks for your great work! I tried to train this model on the V-COCO dataset to save some time, but after building the V-COCO dataset I hit this error:

    File "/root/autodl-tmp/QAHOI-main/models/matcher.py", line 53, in forward
      cost_verb_class = -(out_verb_prob.matmul(tgt_verb_labels_permute) / \
    RuntimeError: mat1 dim 1 must match mat2 dim 0

    Do you plan to release training code for V-COCO in the future? For my V-COCO setup I followed https://github.com/hitachi-rd-cv/qpic, adding vcoco.py and vcocoeval.py to the dataset code and adapting them. I would appreciate any help. Thank you.

    opened by OBVIOUSDAWN 2
  • Is it possible for you to share a training log of Swin-T model?

    Hello author, Thx for your great work! As I am using your repo, I find this model hard to converge compared to normal object detection. Thus, is it possible for you to share a training log of Swin-T model (or any other model)? Many thanks for that!

    opened by JacobYuan7 2
  • How to train QPIC with Swin-Tiny

    Hello, can you provide the training parameter details for QPIC with a Swin-T backbone? With the same settings, I found during training that the mAP of every interaction is 0. How should QPIC be trained with Swin-Tiny? The mAP over the first 20 epochs we trained is almost always 0. Thank you very much!

    opened by yaoyaosanqi 1
  • Question about the performance on V-COCO dataset

    Thanks for your great work! :) Have you ever tried to train QAHOI on the V-COCO dataset? I could not find V-COCO results in the paper. I would appreciate any help. Thank you.

    opened by bingnanG 1
  • Can you share your visualization script?

    Hello @cjw2021! Thanks for your great work!

    Can you share your visualization script? Also, if you have already released a demo script and I am missing it, please let me know.

    Regards, Sungguk Cha

    opened by sunggukcha 1
  • parameter --use_nms is not used

    Hi,

    Thank you very much for sharing the codes. It's great! For the --use_nms parameter defined here https://github.com/cjw2021/QAHOI/blob/main/main.py#L138, it looks like it is not used.

    opened by ilovecv 1
  • Converge trend of the model

    Hi authors,

    Thanks for your open-source implementation. I read your instructions and tried to reproduce the final detection performance. However, I found that the model converges very slowly: it takes almost 2 days to reach 150 epochs on two nodes with 8 GPUs each. Have you tried any way to accelerate training? Would scaling up the learning rate at the start of training help?

    opened by hwfan 1