This is the code for HOI Transformer

BigBangEpoch

Last update: Dec 29, 2022

Related tags

Deep Learning HoiTransformer

Overview

HOI Transformer

Code for CVPR 2021 accepted paper End-to-End Human Object Interaction Detection with HOI Transformer.

Reproduction

We recommend setting up with the following steps:

1. Clone the repo.

git clone https://github.com/bbepoch/HoiTransformer.git

2. Download the MS-COCO pretrained DETR model.

cd data/detr_coco && bash download_model.sh

3. Create a soft link named 'images' in 'data/hico/' that points to your HICO-DET image directory; otherwise you will have to modify the data path manually in hico.py.

ln -s /path-to-your-hico-det-dataset/hico_20160224_det/images images
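The same link can also be created from Python, which makes it easy to fail early when the dataset path is wrong. This is a minimal sketch and not part of the repository: the helper name `link_hico_images` and its `repo_root` parameter are hypothetical, and it assumes a platform that permits symbolic links.

```python
from pathlib import Path


def link_hico_images(hico_images_dir, repo_root="."):
    """Create data/hico/images as a symlink to the HICO-DET image directory.

    Hypothetical helper: mirrors the `ln -s` step, with extra sanity checks.
    """
    src = Path(hico_images_dir).expanduser().resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"HICO-DET image directory not found: {src}")

    link = Path(repo_root) / "data" / "hico" / "images"
    link.parent.mkdir(parents=True, exist_ok=True)

    if link.is_symlink():
        # Replace a stale link instead of failing.
        link.unlink()
    elif link.exists():
        # Refuse to touch a real file or directory.
        raise FileExistsError(f"{link} exists and is not a symlink")

    link.symlink_to(src, target_is_directory=True)
    return link


# Example call (path is a placeholder):
# link_hico_images("/path-to-your-hico-det-dataset/hico_20160224_det/images")
```

Calling the helper twice is harmless, since an existing symlink is replaced rather than duplicated.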

4. Train a model.

python3 -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --epochs=250 --lr_drop=200 --dataset_file=hico --batch_size=2 --backbone=resnet50
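If fewer than 8 GPUs are available, the command above can be adapted. The sketch below rebuilds the launch command for a given GPU count while keeping the global batch size at 8 x 2 = 16. It assumes that `--batch_size` is counted per process, as in DETR, which this repository builds on; the text above does not confirm this. The function name `build_train_command` is hypothetical.

```python
import shlex


def build_train_command(num_gpus, global_batch=16):
    """Return the training command for `num_gpus` processes.

    Assumption: --batch_size is per process, so global = num_gpus * batch_size.
    """
    if num_gpus < 1 or global_batch % num_gpus != 0:
        raise ValueError("global_batch must be divisible by num_gpus")
    per_gpu_batch = global_batch // num_gpus
    args = [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}", "--use_env", "main.py",
        "--epochs=250", "--lr_drop=200", "--dataset_file=hico",
        f"--batch_size={per_gpu_batch}", "--backbone=resnet50",
    ]
    return shlex.join(args)


print(build_train_command(8))  # reproduces the command shown above
print(build_train_command(4))  # same global batch, 4 images per process
```

Keeping the global batch size fixed is only a starting point; per-process memory use grows with the per-process batch, and results may still differ from the 8-GPU setting.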

5. Test a model.

python3 test.py --dataset_file=hico --batch_size=1 --log_dir=./ --model_path=your_model_path

Citation

@inproceedings{zou2021_hoitrans,
author = {Zou, Cheng and Wang, Bohan and Hu, Yue and Liu, Junqi and Wu, Qian and Zhao, Yu and Li, Boxun and Zhang, Chenguang and Zhang, Chi and Wei, Yichen and Sun, Jian},
title = {End-to-End Human Object Interaction Detection with HOI Transformer},
booktitle = {CVPR},
year = {2021},
}

Acknowledgement

We sincerely thank all previous works, especially DETR, PPDM, and iCAN, as some of the code is built upon them.

Comments

How to evaluate the model with custom images?
First of all, thank you so much for such amazing work. I have two things to clarify:

1. I downloaded some images from Google and put them in the 'images' folder, replacing the existing test folder, but the program still fetches images from the original test data rather than the new ones. Even if I provide only a subset of the images, it fetches the whole set. Can the model be tested on a new set of images that is not in the dataset, and if so, where should I make the changes?

2. I tried to visualize or save the images with bounding boxes drawn by setting save_image=True in the test.py script, but it throws a "NotImplementedError". Is there a way to visualize the test images with the detected bounding boxes and labels?
opened by Jeba-create 11
A question about odgt annotations.
Hello Author, thank you for your great work! Recently, I tried to use HoiTransformer to train my own dataset. I have some questions about the odgt annotations. Taking HICO_train2015_00000001.jpg for example, my questions are as follows:

1. In the original HICO annotation of this picture, there is only one pair of boxes for the motorcycle and the person. However, in the ODGT annotation, this picture has more than one pair of boxes for the motorcycle and the person, even though there is only one person and one motorcycle in the picture. The coordinates of each box are not identical, although the differences are small. Why do the boxes of the same person or motorcycle have different coordinates?

2. The coordinates of the person box in the HICO annotations are [207,32,426,299], but the coordinates in the ODGT annotations are [207,32,220,268]. My understanding is that the HICO annotations store the upper-left and lower-right corners of the box, whereas your ODGT annotations store the upper-left corner followed by the width and height of the box. This seems reasonable, but under this reading the width and height of the person box should be [426-207=219, 299-32=267], whereas the ODGT annotations have [220,268]. Please tell me why.

Thanks again for your excellent work! Your answer will be of great help to me. Looking forward to your reply!
opened by Zhanyi0923 5
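The numbers quoted in the second question are consistent with an inclusive pixel convention, in which a box spanning columns 207 through 426 is 426 - 207 + 1 = 220 pixels wide. Whether the annotation conversion actually uses this convention is an assumption; the sketch below only shows that it reproduces the quoted values. The function names are hypothetical.

```python
def xyxy_to_xywh_inclusive(box):
    """Convert [x1, y1, x2, y2] to [x, y, w, h], counting both end pixels."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1 + 1, y2 - y1 + 1]


def xywh_to_xyxy_inclusive(box):
    """Inverse of xyxy_to_xywh_inclusive."""
    x, y, w, h = box
    return [x, y, x + w - 1, y + h - 1]


print(xyxy_to_xywh_inclusive([207, 32, 426, 299]))  # [207, 32, 220, 268]
print(xywh_to_xyxy_inclusive([207, 32, 220, 268]))  # [207, 32, 426, 299]
```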
error when training
I am trying to train on V-COCO, but when I run

python main.py --epochs=250 --lr_drop=110 --dataset_file=vcoco --batch_size=16 --backbone=resnet50

I get this error:

File "E:\project\HoiTransformer-master\models\hoi_matcher.py", line 80, in forward
human_cost_class = -human_out_prob[:, human_tgt_ids]
IndexError: tensors used as indices must be long, byte or bool tensors

How to solve this error?
opened by leijue222 3
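This error message means the index tensor has a dtype other than int64, uint8, or bool. A likely cause here is that the target labels were created as 32-bit integers, which older NumPy versions produce by default on Windows. The sketch below shows the usual workaround of casting the index to `long`. The variable names come from the traceback; the shapes and values are made up for illustration, and this is not a confirmed patch for the repository.

```python
import torch

# Hypothetical stand-ins for the tensors in hoi_matcher.py:
# class probabilities for 4 queries over 3 classes.
human_out_prob = torch.softmax(torch.randn(4, 3), dim=-1)

# Target labels that arrived as 32-bit integers.
human_tgt_ids = torch.tensor([0, 2], dtype=torch.int32)

# Cast the index tensor to int64 before using it for indexing.
human_cost_class = -human_out_prob[:, human_tgt_ids.long()]

print(human_cost_class.shape)  # torch.Size([4, 2])
```

A cleaner alternative is to create the labels as int64 where the dataset builds its targets, so that every later indexing operation receives the expected dtype.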
viz_hoi_result draws the object box rectangle at an incorrect position

Hi there,

I trained a model on a small dataset (person raising hand) without any problem. But when I run test_on_images to test the predictions, it draws the object box at an incorrect position (see attached image). Could you help me figure out what I have done wrong?

Best regards, MT.

opened by mac-tieu 3
Person activity without object detection

@bbepoch Hi, thanks for sharing the code base, great work. I have one query: when I test the model on scenes such as a single person running on a beach with no other object present, there are no detections or activities in the output. Is there any way to get results such as people walking, fighting, or waving without depending on an object being present in the scene?

Thanks in advance

opened by abhigoku10 3
Question about the training and evaluation code on VCOCO dataset?

Thanks for your nice work. Could you also provide the source code for training and evaluating on VCOCO dataset since it takes too long for the training on the HICO-Det dataset? Thanks a lot.

opened by GWwangshuo 3
A question about your V-COCO dataset.

First of all, thanks for sharing the source code of the paper (End-to-End Human Object Interaction Detection with HOI Transformer), which was accepted at CVPR. I have a question about your V-COCO dataset. The original V-COCO dataset consists of 5,400 images in the trainval set and 4,946 images in the test set. But the retagged dataset I downloaded from Google Drive consists of 4,971 images in the trainval set and 4,539 images in the test set.

May I ask whether I downloaded the wrong data, or whether the data has been specially processed?

I look forward to receiving your reply.

opened by Vancause 2
some questions about the result

Hi, thanks for your great work. Sorry to bother you. I noticed some differences between the repo and the paper.

In repo:

In paper:

Maybe I overlooked something, but I really cannot understand the reasons. Could you please explain this?

opened by fengliqiu 2
about number of object categories

Excuse me!

It's a great job, thanks for your contribution.

The paper says that HICO-DET has 80 object categories, but num_classes is 91 at line 345 of /models/hoitr.py. Why is it set like this?

Also, could you tell me how to generate 'hico_train_retag_hoitr.odgt' from the HICO-DET dataset files?

thank you very much!

opened by JackWhite-rwx 2
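One likely explanation for the 80 versus 91 mismatch, assuming hoitr.py follows DETR's convention of indexing class logits by raw COCO category id: COCO has 80 object categories, but their ids are not contiguous and run up to 90, so a classifier indexed by raw id needs max id + 1 = 91 slots. HICO-DET uses the same 80 COCO object categories. The sketch below only checks this arithmetic; the variable names are illustrative.

```python
# Ids in the range 1..90 that COCO detection annotations do not use.
UNUSED_COCO_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}

coco_ids = [i for i in range(1, 91) if i not in UNUSED_COCO_IDS]

num_object_categories = len(coco_ids)  # categories that actually occur
num_classes = max(coco_ids) + 1        # slots needed when indexing by raw id

print(num_object_categories, num_classes)  # 80 91
```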
a little question about paper

It's a great job, thanks for your contribution. I have a question about the paper. In the paper, you said "All one-layer MLP branches for predicting confidence use a softmax function." But as far as I know, the verbs in the V-COCO dataset are multi-label, so I wonder how softmax can be used to predict multiple labels. I hope you can reply. Thanks.

opened by feather820 2
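One common way to reconcile a softmax head with multi-label verbs is to treat every (human, object, verb) combination as its own target, so that each prediction slot carries exactly one verb and a pair with several verbs simply appears several times. Whether HOI Transformer prepares its targets exactly this way is not confirmed by the text above. The sketch below is illustrative only: the function name, field names, and box values are all made up.

```python
def expand_to_single_verb_targets(human_box, object_box, object_name, verbs):
    """Turn one multi-label pair into one single-verb target per verb."""
    return [
        {
            "human_box": human_box,
            "object_box": object_box,
            "object": object_name,
            "verb": verb,
        }
        for verb in verbs
    ]


# A made-up pair annotated with two verbs becomes two targets.
targets = expand_to_single_verb_targets(
    human_box=[10, 20, 110, 220],
    object_box=[30, 120, 230, 260],
    object_name="motorcycle",
    verbs=["ride", "sit_on"],
)
for target in targets:
    print(target["verb"], target["human_box"], target["object_box"])
```

With targets in this form, a per-slot softmax over verbs is well defined, because each target has a single correct verb class.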
training on swin-b backbone

I found some Swin-B code in the repository and tried to train this model with a Swin-B backbone. However, the training loss plateaus at around 70 and does not drop with further training. Could you share how to train this model with Swin-B, or tell me which parts of the code still need to be completed so that I can try to complete them? I would appreciate your response. Thank you so much.

opened by OBVIOUSDAWN 1
Questions about visualization of attention map

Thanks for your great work! :) I found the visualization of the attention map in the paper. Could you provide the visualization code? Thank you for your help.

opened by bingnanG 0
Vcoco 150 epoch or 250 epoch
Thanks for sharing this great work.

I have one question: when training on V-COCO with ResNet-50, should I train for 150 or 250 epochs to reproduce the result of 51 AP?

Also, is --prior used when running test.py?
opened by lapal0413 0
Datasets Split

Hi,

Thank you so much for the great paper!

After going through the code, I'm confused about the data split on V-COCO. Did you use train2014 and val2014 for training and testing respectively? That is, are only training and testing sets used, without a validation set, or should I think of val2014 as being used for both validation and testing? Thank you in advance.

opened by zhumanli 0
Replace Detr to yolor

@bbepoch Hi, thanks for open-sourcing the code base. I have one query regarding the architecture: the current HoiTransformer uses DETR as its base. Can we change it to YOLOR? If so, what is the process for changing the code base to YOLOR? Could you share your thoughts on this? Thanks in advance.

opened by abhigoku10 0
The script to generate ODGT annotation files

The ODGT annotations are indeed much easier to understand for HOI detection. I was wondering if the script to convert V-COCO's raw annotations to the ODGT format could be shared. Thank you.

opened by playerkk 1
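For anyone inspecting the annotation files before writing such a converter, the sketch below reads an .odgt file into a list of dictionaries. It assumes the file stores one JSON object per line, which is how the .odgt extension is commonly used; this is an assumption about the repository's files, and no field names are assumed. The function name `load_odgt` is hypothetical.

```python
import json


def load_odgt(path):
    """Read a file containing one JSON object per line into a list of dicts."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Example usage (file name taken from the issue above; path is a placeholder):
# records = load_odgt("hico_train_retag_hoitr.odgt")
# print(len(records), sorted(records[0].keys()))
```

Printing the keys of the first record is a quick way to learn the schema before writing a converter for another dataset.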

