This is the code for HOI Transformer

Overview

HOI Transformer

Code for our CVPR 2021 paper "End-to-End Human Object Interaction Detection with HOI Transformer".

Reproduction

We recommend setting up in the following steps:

1. Clone the repo.

git clone https://github.com/bbepoch/HoiTransformer.git

2. Download the MS-COCO pretrained DETR model.

cd data/detr_coco && bash download_model.sh

3. Make a soft link named 'images' in 'data/hico/' that points to your HICO-DET images directory; otherwise, you will have to modify the data path manually in hico.py.

ln -s /path-to-your-hico-det-dataset/hico_20160224_det/images images

4. Train a model.

python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --epochs=250 --lr_drop=200 --dataset_file=hico --batch_size=2 --backbone=resnet50

5. Test a model.

python3 test.py --dataset_file=hico --batch_size=1 --log_dir=./ --model_path=your_model_path
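The training command in step 4 launches 8 processes with --batch_size=2 and drops the learning rate at epoch 200 of 250. The sketch below makes that arithmetic explicit. It assumes, as in DETR's main.py (which this code builds on), that --batch_size is per process and that --lr_drop feeds a StepLR-style schedule; the base learning rate of 1e-4 and decay factor of 0.1 are DETR's defaults and are assumptions for this repo. The helper names are hypothetical.

```python
# Hypothetical helpers illustrating the step-4 training settings.
# Assumptions: --batch_size is per process (DETR convention), base lr 1e-4,
# and the learning rate is multiplied by 0.1 every --lr_drop epochs.


def effective_batch_size(nproc_per_node: int, batch_size: int, nodes: int = 1) -> int:
    """Total number of images per optimizer step across all processes."""
    return nodes * nproc_per_node * batch_size


def step_lr(epoch: int, base_lr: float = 1e-4, lr_drop: int = 200, gamma: float = 0.1) -> float:
    """Learning rate used during a 0-indexed epoch under a StepLR-style schedule."""
    return base_lr * gamma ** (epoch // lr_drop)
```

For example, effective_batch_size(8, 2) gives 16, step_lr(199) is still about 1e-4, and step_lr(200) through step_lr(249) are about 1e-5. If you train on fewer GPUs, the total batch size shrinks accordingly unless you raise --batch_size.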

Citation

@inproceedings{zou2021_hoitrans,
author = {Zou, Cheng and Wang, Bohan and Hu, Yue and Liu, Junqi and Wu, Qian and Zhao, Yu and Li, Boxun and Zhang, Chenguang and Zhang, Chi and Wei, Yichen and Sun, Jian},
title = {End-to-End Human Object Interaction Detection with HOI Transformer},
booktitle={CVPR},
year = {2021},
}

Acknowledgement

We sincerely thank the authors of all previous works, especially DETR, PPDM and iCAN, as parts of our code are built upon them.

Comments
  • How to evaluate the model with custom images?

    First of all, thank you so much for this amazing work. I have two things to clarify:

    1. I downloaded some images from Google and put them in the 'images' folder, replacing the existing test folder with the new one, but when I run the program it still fetches the images from the original test data rather than the new ones. Even if I provide only a subset of images, it fetches the whole set. Can we test the model on a new set of images that is not in the dataset? If so, where should I make the changes?
    2. I tried to visualize or save the images with the bounding boxes drawn by setting save_image=True in the test.py script, but it throws a "NotImplementedError". Is it possible to visualize the test images with the detected bounding boxes and labels?
    opened by Jeba-create 11
  • A question about odgt annotations.

    Hello, thank you for your great work! Recently, I tried to use HoiTransformer to train on my own dataset, and I have some questions about the odgt annotations. Taking HICO_train2015_00000001.jpg as an example, my questions are as follows:

    1. In the original HICO annotation of this picture, there is only one pair of boxes for the person and the motorcycle. However, in the ODGT annotation, this picture has more than one pair of person and motorcycle boxes (even though the picture contains only one person and one motorcycle), and the box coordinates differ slightly from pair to pair. I do not understand why the boxes of the same person or motorcycle have different coordinates.

    2. The person box in this picture is [207,32,426,299] in the HICO annotations, but [207,32,220,268] in the ODGT annotations. My understanding is that the HICO annotations give the upper-left and lower-right corners of the box, whereas your ODGT annotations give the upper-left corner plus the width and height. Although this explanation seems reasonable, under this interpretation the width and height of the person box should be [426-207=219, 299-32=267], but the ODGT annotation has [220,268]. Could you explain why?

    Thanks again for your excellent work! Your answer will be of great help to me. Looking forward to your reply!

    opened by Zhanyi0923 5
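The numbers in question 2 are consistent with an inclusive-pixel convention, in which width = x2 - x1 + 1 and height = y2 - y1 + 1: 426 - 207 + 1 = 220 and 299 - 32 + 1 = 268. This is an inference from the example, not a convention the authors have confirmed. The sketch below converts between the two layouts under that assumption and adds an IoU helper that could be used to check whether two slightly different boxes (question 1) overlap enough to describe the same person or object. All function names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]


def xyxy_to_xywh_inclusive(box: Sequence[float]) -> Box:
    """[x1, y1, x2, y2] with inclusive corners -> (x, y, w, h), the assumed odgt layout."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1 + 1, y2 - y1 + 1)


def xywh_inclusive_to_xyxy(box: Sequence[float]) -> Box:
    """Inverse of xyxy_to_xywh_inclusive."""
    x, y, w, h = box
    return (x, y, x + w - 1, y + h - 1)


def iou_xyxy(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two boxes given as inclusive corners."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    iw, ih = max(0, ix2 - ix1 + 1), max(0, iy2 - iy1 + 1)
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)
```

With this convention, xyxy_to_xywh_inclusive([207, 32, 426, 299]) returns (207, 32, 220, 268), matching the ODGT values quoted above. A high IoU between two person boxes suggests, but does not prove, that they annotate the same person.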
  • error when training

    I am trying to train on V-COCO. When I run python main.py --epochs=250 --lr_drop=110 --dataset_file=vcoco --batch_size=16 --backbone=resnet50

    I get the following error:

      File "E:\project\HoiTransformer-master\models\hoi_matcher.py", line 80, in forward
        human_cost_class = -human_out_prob[:, human_tgt_ids]
    IndexError: tensors used as indices must be long, byte or bool tensors
    

    How to solve this error?

    opened by leijue222 3
  • viz_hoi_result draws the object box rectangle at an incorrect position

    Hi there,

    I trained the model on a small dataset (person raising hand) without any problem. But when I run test_on_images to check the predictions, it draws the object box at an incorrect position (attached image). Could you help me figure out what I've done wrong?

    Best regards, MT.

    opened by mac-tieu 3
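One frequent cause of misplaced boxes in DETR-based models, offered here as a guess rather than a diagnosis of this issue, is drawing the raw predictions directly. DETR predicts boxes as normalized (cx, cy, w, h) values in [0, 1], which must be converted to corner coordinates and scaled by the image width and height before drawing. The sketch below shows that conversion in plain Python; the function name is hypothetical, and whether viz_hoi_result expects this format is an assumption to check against the repo's code.

```python
from typing import Sequence, Tuple


def cxcywh_norm_to_xyxy_abs(
    box: Sequence[float], img_w: int, img_h: int
) -> Tuple[float, float, float, float]:
    """Convert a normalized (cx, cy, w, h) box to absolute (x1, y1, x2, y2) pixels."""
    cx, cy, w, h = box
    # Scale to pixels first, then move from center/size to corners.
    cx, w = cx * img_w, w * img_w
    cy, h = cy * img_h, h * img_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

For a 100 x 200 image, the normalized box (0.5, 0.5, 0.2, 0.4) maps to roughly (40, 60, 60, 140). Passing the resized network-input size instead of the original image size as img_w and img_h produces a similar kind of offset.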
  • Person activity without object detection

    @bbepoch Hi, thanks for sharing the code base, great work! I have one question: when I tested the model on some scenes, such as a single person running on a beach with no other objects present, there were no detections or activities in the output. Is there any way to get results such as people walking, fighting or waving without depending on an object being present in the scene?

    Thanks in advance

    opened by abhigoku10 3
  • Question about the training and evaluation code on VCOCO dataset?

    Thanks for your nice work. Could you also provide the source code for training and evaluating on the V-COCO dataset, since training on the HICO-DET dataset takes too long? Thanks a lot.

    opened by GWwangshuo 3
  • A question about your V-COCO dataset.

    First of all, thanks for sharing the source code of the paper (End-to-End Human Object Interaction Detection with HOI Transformer), which was accepted at CVPR. I have a question about your V-COCO dataset. The original V-COCO dataset consists of 5,400 images in the trainval set and 4,946 images in the test set, but the retagged dataset I downloaded from your Google Drive contains 4,971 images in the trainval set and 4,539 images in the test set.

    May I ask whether I downloaded the wrong data, or whether the data has been specially processed?

    I look forward to receiving your reply.

    opened by Vancause 2
  • some questions about the result

    Hi, thanks for your great work, and sorry to bother you. I noticed some differences between the repo and the paper.

    In repo: image

    In paper: image

    Maybe I overlooked something, but I really cannot understand the reason. Could you please explain this?

    opened by fengliqiu 2
  • about number of object categories

    Excuse me!

    This is great work; thanks for your contribution.

    The paper says that HICO-DET has 80 object categories, but num_classes is 91 in /models/hoitr.py, line 345. Why is it set like this?

    And could you tell me how to generate 'hico_train_retag_hoitr.odgt' from the HICO-DET dataset files?

    Thank you very much!

    opened by JackWhite-rwx 2
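A likely explanation, based on DETR's convention rather than on an answer from the authors: COCO's 80 object categories use non-contiguous IDs between 1 and 90, and DETR sets num_classes to the largest category ID plus one (91) so that raw category IDs can be used directly as class indices, with the unused indices simply never appearing as targets. Whether hoitr.py follows exactly this reasoning is an assumption. The sketch below reproduces the arithmetic using the standard set of unused COCO IDs; the variable names are hypothetical.

```python
# Standard COCO category IDs run from 1 to 90 with ten IDs left unused.
UNUSED_COCO_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}
COCO_IDS = [i for i in range(1, 91) if i not in UNUSED_COCO_IDS]

num_categories = len(COCO_IDS)   # real object categories
num_classes = max(COCO_IDS) + 1  # DETR-style: max_obj_id + 1

# Alternative design: remap to contiguous indices 0..79 and use 80 classes.
CONTIGUOUS_INDEX = {cid: idx for idx, cid in enumerate(COCO_IDS)}
```

Here num_categories is 80 and num_classes is 91. Remapping to contiguous indices would let the classifier use exactly 80 outputs, at the cost of translating IDs in both the data loader and the evaluation code.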
  • a little question about paper

    This is great work; thanks for your contribution. I have a question about the paper. In the paper, you said "All one-layer MLP branches for predicting confidence use a softmax function." But as far as I know, the verbs in the V-COCO dataset are multi-label, so I wonder how softmax can be used to predict multiple labels. I hope you can reply. Thanks!

    opened by feather820 2
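To illustrate the distinction the question draws (this is not the authors' answer): a softmax over verb classes yields a single distribution that sums to 1, which suits one label per prediction, whereas independent sigmoids, typically trained with binary cross-entropy, let several verbs be active for the same human-object pair. The logits and function names below are made up for illustration.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sigmoid(logits: List[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def binary_cross_entropy(probs: List[float], targets: List[int]) -> float:
    """Mean BCE over classes; targets is a multi-hot vector."""
    eps = 1e-12
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for p, t in zip(probs, targets)
    ) / len(probs)


# Hypothetical logits for three verbs of one pair, where the first two are both true.
logits = [2.0, 2.0, -3.0]
print(softmax(logits))  # the two active verbs split the mass: ~0.498 each
print(sigmoid(logits))  # each verb is scored independently: ~0.88 for both
```

With the multi-hot target [1, 1, 0], the sigmoid outputs give a lower binary cross-entropy than the softmax outputs, because softmax cannot assign high probability to both true verbs at once.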
  • training on swin-b backbone

    I found some Swin-B code in other repositories and tried to train this model with a Swin-B backbone, but the training loss plateaus at 70 and does not drop any further, no matter how long I train. Could you share how to train this model with Swin-B, or tell me whether any code in this repository still needs to be completed so that I can try to complete it myself? I would appreciate your response. Thank you so much.

    opened by OBVIOUSDAWN 1
  • Questions about visualization of attention map

    Thanks for your great work!! :) I found the attention map visualizations in the paper; could you provide the visualization code? Thank you for your help.

    opened by bingnanG 0
  • Vcoco 150 epoch or 250 epoch

    Thanks for sharing this great work.

    I have one question: when training on V-COCO with ResNet-50, should I train for 150 or 250 epochs to reproduce the 51 AP result?

    • Is --prior used during testing with test.py?
    opened by lapal0413 0
  • Datasets Split

    Hi,

    Thank you so much for the great paper!

    I'm confused about the data split on V-COCO after going through the code. Did you use train2014 and val2014 for training and testing, respectively? That is, are only training and test sets used without a validation set, or is val2014 used for both validation and testing? Thank you in advance.

    opened by zhumanli 0
  • Replace DETR with YOLOR

    @bbepoch Hi, thanks for open-sourcing the code base. I have one question regarding the architecture: the current HOI Transformer uses DETR as its base. Can we change it to YOLOR? If so, what is the process for switching the code base to YOLOR? Could you share your thoughts on this? Thanks in advance.

    opened by abhigoku10 0
  • The script to generate ODGT annotation files

    The ODGT annotations are indeed much easier to understand for HOI detection. I was wondering if the script to convert V-COCO's raw annotations to the ODGT format could be shared. Thank you.

    opened by playerkk 1
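For readers unfamiliar with the format: an .odgt file is commonly a JSON-lines file, with one JSON record per image on each line. The sketch below writes and reads such a file with the standard library. The record fields shown (file_name, width, height, gtboxes, hoi) and all values are hypothetical placeholders, not the schema this repo uses, so inspect a line of the released .odgt files before adapting it.

```python
import json
from typing import Dict, Iterable, Iterator


def write_odgt(path: str, records: Iterable[Dict]) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def read_odgt(path: str) -> Iterator[Dict]:
    """Yield one record per non-empty line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Hypothetical record layout and values, for illustration only.
example = {
    "file_name": "example.jpg",
    "width": 640,
    "height": 480,
    "gtboxes": [
        {"tag": "person", "box": [10, 20, 100, 200]},
        {"tag": "motorcycle", "box": [50, 120, 150, 100]},
    ],
    "hoi": [{"subject_id": 0, "object_id": 1, "interaction": "ride"}],
}
```

A conversion script would build one such dictionary per image from the raw V-COCO or HICO-DET annotations and pass the list to write_odgt; reading the file back with read_odgt is a quick sanity check that every line parses.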
Owner
BigBangEpoch
make it easier and better