Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

Yuren Cong

Last update: Jan 1, 2023

Overview

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

PyTorch implementation of our paper Spatial-Temporal Transformer for Dynamic Scene Graph Generation, accepted at ICCV 2021. We propose STTran, a Transformer-based model that generates dynamic scene graphs for a given video. STTran detects the visual relationships in each frame.

The introduction video is available now: https://youtu.be/gKpnRU8btLg

About the code

We run the code on a single RTX 2080 Ti for both training and testing. We borrowed some code from Yang's repository and Zellers' repository.

Usage

We use python=3.6, pytorch=1.1 and torchvision=0.3 in our code. First, clone the repository:

git clone https://github.com/yrcong/STTran.git
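Some of the issues reported further down involve newer Python or PyTorch versions than the ones stated above, so it can help to compare the installed versions against those pins before compiling anything. The following is a minimal sketch and not part of the repository: the helper names `matches_pin`, `installed_versions` and `check_pins` are hypothetical, and it assumes that `torch` and `torchvision` expose a `__version__` attribute.

```python
import importlib
import sys

# Version pins as stated in this README (python=3.6, pytorch=1.1, torchvision=0.3).
PINNED = {"python": "3.6", "torch": "1.1", "torchvision": "0.3"}


def matches_pin(installed, pinned):
    """Return True if the leading version components equal the pinned ones."""
    # Drop local build suffixes such as "+cu113" before comparing.
    installed_parts = installed.split("+")[0].split(".")
    pinned_parts = pinned.split(".")
    return installed_parts[:len(pinned_parts)] == pinned_parts


def installed_versions():
    """Collect version strings; a missing package is reported as an empty string."""
    versions = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in ("torch", "torchvision"):
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "")
        except ImportError:
            versions[name] = ""
    return versions


def check_pins(versions):
    """Map each pinned package to True or False depending on whether it matches."""
    return {name: matches_pin(versions.get(name, ""), pin)
            for name, pin in PINNED.items()}
```

Calling `check_pins(installed_versions())` returns one boolean per pinned package. A build string such as `1.12.1+cu113` does not match the `1.1` pin, because the comparison is done component by component rather than by string prefix.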

Some bounding-box operations rely on borrowed code that has to be compiled first:

cd lib/draw_rectangles
python setup.py build_ext --inplace
cd ..
cd fpn/box_intersections_cpu
python setup.py build_ext --inplace

For the object detector, please follow the compilation instructions at https://github.com/jwyang/faster-rcnn.pytorch. We provide a pretrained Faster R-CNN model for Action Genome. Please download it and put it in:

fasterRCNN/models/faster_rcnn_ag.pth

Dataset

We use the dataset Action Genome to train/evaluate our method. Please process the downloaded dataset with the Toolkit. The directories of the dataset should look like:

|-- action_genome
    |-- annotations   #gt annotations
    |-- frames        #sampled frames
    |-- videos        #original videos
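The layout above can be verified with a few lines of Python before training. This is only a sketch: the function name `missing_subdirs` is hypothetical, and the check looks only for the three directories listed above, not for their contents.

```python
from pathlib import Path

# Sub-directories expected under the dataset root, as listed in this README.
EXPECTED_SUBDIRS = ("annotations", "frames", "videos")


def missing_subdirs(data_path):
    """Return the expected sub-directories that do not exist under data_path."""
    root = Path(data_path)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

An empty list means that all three directories are present, so `missing_subdirs(DATAPATH)` can be called with the same path that is later passed to `-data_path`.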

In the SGCLS/SGDET experiments, we keep only bounding boxes whose short edge is larger than 16 pixels. Please download the file object_bbox_and_relationship_filtersmall.pkl and put it in the dataloader directory.
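The short-edge rule can be illustrated with a small sketch. It is not the code that produced the provided .pkl file: the `(x1, y1, x2, y2)` corner format and the function names are assumptions made for illustration.

```python
# Threshold stated in this README: keep boxes whose short edge exceeds 16 pixels.
MIN_SHORT_EDGE = 16


def short_edge(box):
    """Length of the shorter side of a box given as (x1, y1, x2, y2) (assumed format)."""
    x1, y1, x2, y2 = box
    return min(x2 - x1, y2 - y1)


def keep_large_boxes(boxes, min_short_edge=MIN_SHORT_EDGE):
    """Keep only boxes whose short edge is strictly larger than the threshold."""
    return [box for box in boxes if short_edge(box) > min_short_edge]
```

With this rule, a box of 100 x 16 pixels is dropped because its short edge is not larger than 16, while a box of 17 x 40 pixels is kept.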

Train

You can train STTran with train.py. We trained the model on an RTX 2080 Ti:

For PredCLS:

python train.py -mode predcls -datasize large -data_path $DATAPATH

For SGCLS:

python train.py -mode sgcls -datasize large -data_path $DATAPATH

For SGDET:

python train.py -mode sgdet -datasize large -data_path $DATAPATH
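To run the three training modes from a script, the commands above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the repository. It uses only the flags shown above (`-mode`, `-datasize`, `-data_path`) and assumes that it is launched from the repository root.

```python
# The three task modes listed in this README.
MODES = ("predcls", "sgcls", "sgdet")


def train_command(mode, data_path, datasize="large"):
    """Build the argument list for one training run, mirroring the commands above."""
    if mode not in MODES:
        raise ValueError("unknown mode: {!r}".format(mode))
    return ["python", "train.py", "-mode", mode,
            "-datasize", datasize, "-data_path", data_path]
```

The returned list can be passed to `subprocess.run(train_command("predcls", data_path), check=True)`. Passing a list instead of a single string avoids shell-quoting problems when the dataset path contains spaces.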

Evaluation

You can evaluate the STTran with test.py.

For PredCLS (trained Model):

python test.py -mode predcls -datasize large -data_path $DATAPATH -model_path $MODELPATH

For SGCLS (trained Model):

python test.py -mode sgcls -datasize large -data_path $DATAPATH -model_path $MODELPATH

For SGDET (trained Model):

python test.py -mode sgdet -datasize large -data_path $DATAPATH -model_path $MODELPATH

Citation

If our work is helpful for your research, please cite our publication:

@inproceedings{cong2021spatial,
  title={Spatial-Temporal Transformer for Dynamic Scene Graph Generation},
  author={Cong, Yuren and Liao, Wentong and Ackermann, Hanno and Rosenhahn, Bodo and Yang, Michael Ying},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year={2021},
  url={https://arxiv.org/abs/2107.12309}
}

Help

If you have any questions or ideas about the code or the paper, please comment on GitHub or send us an email. We will reply as soon as possible.

Comments

How to use faster rcnn?

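Hello, sorry to bother you! To compile jwyang/faster-rcnn.pytorch at the moment, do I need to replace the files in the fasterRCNN directory of your repository with the files from https://github.com/jwyang/faster-rcnn.pytorch, modify the code in https://github.com/yrcong/STTran/blob/main/fasterRCNN/lib/model/faster_rcnn/faster_rcnn.py, and then compile according to the README of jwyang/faster-rcnn.pytorch? Also, the environment requirements of the Faster R-CNN in jwyang's repository seem to differ from the environment in your README. Will that cause problems?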

opened by Yassin-fan 5

ModuleNotFoundError: No module named 'lib.fpn.box_intersections_cpu.bbox'

Thanks for your great work. When I followed the README and tried to run the train command:

python train.py -mode predcls -datasize large -data_path $DATAPATH

I had some problems. First, dataloader/action_genome.py contains from scipy.misc import imread. The latest version of scipy no longer seems to support the imread function, so I changed it to from imageio import imread.

Second, when I continued trying to run the project, a new problem occurred: ModuleNotFoundError: No module named 'lib.fpn.box_intersections_cpu.bbox'. It means I had not generated the file in this directory, so I went back to

cd fpn/box_intersections_cpu
python setup.py install

(I found that the command does not work unless I add 'install'.)

and then

~/project/STTran/lib/fpn/box_intersections_cpu$ python setup.py install
running install
running build
running build_ext
running install_lib
running install_egg_info
Removing /home/liujingwei/anaconda3/envs/scene_graph_benchmark/lib/python3.7/site-packages/bbox_cython-0.0.0-py3.7.egg-info
Writing /home/liujingwei/anaconda3/envs/scene_graph_benchmark/lib/python3.7/site-packages/bbox_cython-0.0.0-py3.7.egg-info

So it did not generate the file (lib.fpn.box_intersections_cpu.bbox), and I don't know how to fix it, so I may need your help. ^_^
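A likely explanation for this report is the difference between the two commands: `build_ext --inplace` places the compiled module next to its sources, whereas `install` copies it into site-packages, so the dotted path `lib.fpn.box_intersections_cpu.bbox` still cannot be resolved. The sketch below checks whether a module can be located without running the training script. The function name `missing_modules` is hypothetical, and the check assumes that it is run from the repository root so that `lib` is on the import path.

```python
import importlib.util


def missing_modules(names):
    """Return the dotted module names that cannot be located on the import path."""
    missing = []
    for name in names:
        try:
            # find_spec imports parent packages, which raises if one of them is absent.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

For the module named in the error message, `missing_modules(["lib.fpn.box_intersections_cpu.bbox"])` returns an empty list once the extension has been built in place.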

opened by PrimoWW 5

use glove failed

Thanks for your code, it's helpful! But an error happened when I ran the train command. The output looks like this:

The CKPT saved here: data/
spatial encoder layer num: 1 / temporal decoder layer num: 3
mode : predcls
save_path : data/
model_path : None
data_path : /home/abc/NewDisk/origin_datasets/ActionGenome/dataset/ag/
datasize : large
ckpt : None
optimizer : adamw
lr : 1e-05
nepoch : 10
enc_layer : 1
dec_layer : 3
bce_loss : False
-------loading annotations---------slowly-----------
--------------------finish!-------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are 7584 videos and 177330 valid frames
144 videos are invalid (no person), remove them
49 videos are invalid (only one frame), remove them
21643 frames have no human bbox in GT, remove them!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
-------loading annotations---------slowly-----------
--------------------finish!-------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are 1750 videos and 56923 valid frames
41 videos are invalid (no person), remove them
19 videos are invalid (only one frame), remove them
8636 frames have no human bbox in GT, remove them!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

loading word vectors from data/glove.6B.200d.pt
loading word vectors from /home/abc/NewDisk/model_zoo/STTran/glove.6B.200d.pt
__background__ -> __background__ 
fail on __background__

The program is stuck at this point, and I wonder why. Thanks!

opened by ghostlyFeng 4

fasterRCNN compilation error with python 3.8 and pytorch 1.12 cuda 11.3

I'm trying to set up STTran from scratch and I'm getting errors while setting up the fasterRCNN package. The command I run is: STTran/fasterRCNN/lib: python setup.py install

fatal error: THC/THC.h: No such file or directory
    5 | #include <THC/THC.h>
      |          ^~~~~~~~~~~
compilation terminated.
opened by zee-fee 3

the Action Genome Dump frames of this paper

In your README, it says to follow JingweiJ/ActionGenome to get the dataset. Could you please tell me which version of ffmpeg you used? I followed JingweiJ/ActionGenome and got empty folders for all the videos. By the way, I'd like to know whether you used 'python tools/dump_frames.py --all_frames' or just 'python tools/dump_frames.py' to get the dataset. Thanks for your work!

opened by Yassin-fan 3

RuntimeError: cuda runtime error (78) : a PTX JIT compilation failed at

Wonderful code! However, I encountered the following error. Did I miss something that caused it?

mode : predcls
save_path : /media/wow/disk2/AG/save
model_path : /media/wow/disk2/AG/predcls.tar
data_path : /media/wow/disk2/AG/dataset
datasize : large
ckpt : None
optimizer : adamw
lr : 1e-05
nepoch : 10
enc_layer : 1
dec_layer : 3
bce_loss : False
-------loading annotations---------slowly-----------
--------------------finish!-------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
There are 1750 videos and 56923 valid frames
41 videos are invalid (no person), remove them
19 videos are invalid (only one frame), remove them
8636 frames have no human bbox in GT, remove them!
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
loading word vectors from data/glove.6B.200d.pt
loading word vectors from /media/wow/disk2/AG/glove.6B.200d.pt
__background__ -> __background__
fail on __background__

CKPT /media/wow/disk2/AG/predcls.tar is loaded
THCudaCheck FAIL file=/home/cong/Dokumente/faster-rcnn.pytorch/lib/model/csrc/cuda/ROIAlign_cuda.cu line=297 error=78 : a PTX JIT compilation failed
Traceback (most recent call last):
  File "test.py", line 80, in <module>
    entry = object_detector(im_data, im_info, gt_boxes, num_boxes, gt_annotation, im_all=None)
  File "/home/wow/anaconda2/envs/STTran/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/wow/disk2/STT/STTran-main/lib/object_detector.py", line 306, in forward
    FINAL_FEATURES = self.fasterRCNN.RCNN_roi_align(FINAL_BASE_FEATURES, FINAL_BBOXES)
  File "/home/wow/anaconda2/envs/STTran/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/wow/disk2/STT/STTran-main/fasterRCNN/lib/model/roi_layers/roi_align.py", line 58, in forward
    input, rois, self.output_size, self.spatial_scale, self.sampling_ratio
  File "/media/wow/disk2/STT/STTran-main/fasterRCNN/lib/model/roi_layers/roi_align.py", line 20, in forward
    output = _C.roi_align_forward(input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio)
RuntimeError: cuda runtime error (78) : a PTX JIT compilation failed at /home/cong/Dokumente/faster-rcnn.pytorch/lib/model/csrc/cuda/ROIAlign_cuda.cu:297

opened by Tudouu 3

Problem with torch.utils.ffi

Hello,

I am trying to run your code but I'm coming across some issues regarding torch.utils.ffi. I compiled draw_rectangles and fpn, then compiled Faster R-CNN as instructed and copied the files over the fasterRCNN folder inside STTran. But when I run the code, it crashes while resolving the dependencies from line 10 of test.py (from lib.object_detector import detector) with the following error:

File "/media/hdd5/raphael/HOI/STTran/test.py", line 10, in <module>
    from lib.object_detector import detector
File "/media/hdd5/raphael/HOI/STTran/lib/object_detector.py", line 10, in <module>
    from fasterRCNN.lib.model.faster_rcnn.resnet import resnet
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/faster_rcnn/resnet.py", line 6, in <module>
    from fasterRCNN.lib.model.faster_rcnn.faster_rcnn import _fasterRCNN
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/faster_rcnn/faster_rcnn.py", line 10, in <module>
    from model.rpn.rpn import _RPN
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/rpn/rpn.py", line 8, in <module>
    from .proposal_layer import _ProposalLayer
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/rpn/proposal_layer.py", line 20, in <module>
    from model.nms.nms_wrapper import nms
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/nms/nms_wrapper.py", line 10, in <module>
    from model.nms.nms_gpu import nms_gpu
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/nms/nms_gpu.py", line 4, in <module>
    from ._ext import nms
File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/nms/_ext/nms/__init__.py", line 2, in <module>
    from torch.utils.ffi import _wrap_function
File "/media/hdd5/raphael/HOI/HOI_env/lib/python3.8/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
    raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

This seems to be deprecated for pytorch versions > 0.4 but on the repo, it is stated to use 1.1. Strangely I tried downgrading to torch==0.4.1 but the same error persists. Any ideas on how to work around that would be appreciated!

opened by RRuschel 2

clean_class in ObjectClassifier

Thanks for your wonderful work. It seems that, in sgdet mode, clean_class replaces the category of every object originally classified as class_idx with its second-highest-confidence category. Could you explain why this operation is performed here? Thanks again. Waiting for your reply : )
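The behaviour described in this question can be sketched in plain Python. This is an illustration of the description only, not the repository's clean_class implementation. The function name `replace_with_runner_up` and the list-of-lists score format are assumptions.

```python
def replace_with_runner_up(scores, class_idx):
    """Pick a label per row; rows whose top class is class_idx get their runner-up.

    scores is a list of per-object class score lists (assumed format).
    """
    labels = []
    for row in scores:
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        top = ranked[0]
        if top == class_idx and len(ranked) > 1:
            labels.append(ranked[1])
        else:
            labels.append(top)
    return labels
```

For scores `[[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]` and `class_idx=1`, the first object is relabelled from class 1 to class 2, while the second keeps class 0.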

opened by jkli-aa 2

About SGCLS

Hi! I tried your code but I have found that the evaluation for sgcls is extremely low (near to 0) and the predcls and sgdet were the same with your paper. Could you please give me some hint about this problem or could you please check your code for sgcls again? Thank you so much!

opened by qiuyue1993 2

view size is not compatible with input tensor for sgdet

Hi, thanks for your code and paper in advance. However, I have a small question. When I run the training code in predcls or sgcls mode, everything is fine but when I run the training code in sgdet mode, the error below shows:

File "/home/quhaoxuan/STTran/fasterRCNN/lib/model/rpn/rpn.py", line 50, in reshape
    x = x.view(
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

I understand that this function seems to be only triggered in sgdet setting. But can I ask is there any suggestions on any possible solutions to this error? Many thanks in advance

opened by Harryqu123 2

Decoder_lin in Object Classifier

Hi, I want to ask about the "sgdet" task: why is decoder_lin not used during testing?

Does it mean that the object labels predicted by the Faster R-CNN are used directly in "sgdet"? If so, why is decoder_lin trained?

opened by Shengyu-Feng 2

Test set for performance metrics

@yrcong What train-val-test split is used with Action Genome dataset? From the codebase, it seems like the same test set was used as validation set during training (in train.py) and also as test set in (test.py). What dataset split is used for reported metrics in paper?

opened by zee-fee 1

video action recognition visualization

In the introduction video on youtube, we see the author visualize the recognition of human actions in video frames. What I want to know is, did the author include the code for video action recognition visualization in the code? I hope the author can provide some help, thank you very much

opened by lili124 1

Owner

Yuren Cong

Email: [email protected]

GitHub

Cascaded Deep Video Deblurring Using Temporal Sharpness Prior and Non-local Spatial-Temporal Similarity

This repository is the official PyTorch implementation of Cascaded Deep Video Deblurring Using Temporal Sharpness Prior and Non-local Spatial-Temporal Similarity

4 Dec 11, 2022

Seeing Dynamic Scene in the Dark: High-Quality Video Dataset with Mechatronic Alignment (ICCV2021)

Seeing Dynamic Scene in the Dark: High-Quality Video Dataset with Mechatronic Alignment This is a pytorch project for the paper Seeing Dynamic Scene i

21 Nov 28, 2022

[TIP 2020] Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion

Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion Code for Multi-Temporal Scene Classification and Scene Ch

33 Dec 12, 2022

Neural Scene Graphs for Dynamic Scene (CVPR 2021)

Implementation of Neural Scene Graphs, that optimizes multiple radiance fields to represent different objects and a static scene background. Learned representations can be rendered with novel object compositions and views.

151 Dec 26, 2022

Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting.

Non-AR Spatial-Temporal Transformer Introduction Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series For

66 Nov 28, 2022

Group Activity Recognition with Clustered Spatial Temporal Transformer

GroupFormer Group Activity Recognition with Clustered Spatial-TemporalTransformer Backbone Style Action Acc Activity Acc Config Download Inv3+flow+pos

28 Dec 12, 2022

This is the official Pytorch implementation of the paper "Diverse Motion Stylization for Multiple Style Domains via Spatial-Temporal Graph-Based Generative Model"

Diverse Motion Stylization (Official) This is the official Pytorch implementation of this paper. Diverse Motion Stylization for Multiple Style Domains

28 Dec 16, 2022

Graph Self-Attention Network for Learning Spatial-Temporal Interaction Representation in Autonomous Driving

GSAN Introduction Code for paper GSAN: Graph Self-Attention Network for Learning Spatial-Temporal Interaction Representation in Autonomous Driving, wh

6 Oct 27, 2022

Spatial Temporal Graph Convolutional Networks (ST-GCN) for Skeleton-Based Action Recognition in PyTorch

Reminder ST-GCN has transferred to MMSkeleton, and keep on developing as an flexible open source toolbox for skeleton-based human understanding. You a

1.1k Dec 25, 2022

Dynamic Attentive Graph Learning for Image Restoration, ICCV2021 [PyTorch Code]

Dynamic Attentive Graph Learning for Image Restoration This repository is for GATIR introduced in the following paper: Chong Mou, Jian Zhang, Zhuoyuan

84 Dec 9, 2022

Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Make-A-Scene - PyTorch Pytorch implementation (unofficial) of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/

259 Dec 28, 2022

Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).

DeepPanoContext (DPC) [Project Page (with interactive results)][Paper] DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context G

66 Nov 16, 2022

The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer"

Shuffle Transformer The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer" Introduction Very recently, window-

87 Nov 29, 2022

Code and models for ICCV2021 paper "Robust Object Detection via Instance-Level Temporal Cycle Confusion".

Robust Object Detection via Instance-Level Temporal Cycle Confusion This repo contains the implementation of the ICCV 2021 paper, Robust Object Detect

69 Oct 13, 2022

data/code repository of "C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal Consistent Motion Transfer"

C2F-FWN data/code repository of "C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal Consistent Motion Transfer" (https://arxiv.org/abs/

46 Dec 14, 2022

Unofficial implementation of "TTNet: Real-time temporal and spatial video analysis of table tennis" (CVPR 2020)

TTNet-Pytorch The implementation for the paper "TTNet: Real-time temporal and spatial video analysis of table tennis" An introduction of the project c

438 Dec 29, 2022

The project is an official implementation of our paper "3D Human Pose Estimation with Spatial and Temporal Transformers".

3D Human Pose Estimation with Spatial and Temporal Transformers This repo is the official implementation for 3D Human Pose Estimation with Spatial and

363 Dec 28, 2022

PyMove is a Python library to simplify queries and visualization of trajectories and other spatial-temporal data

Use PyMove and go much further Information Package Status License Python Version Platforms Build Status PyPi version PyPi Downloads Conda version Cond

64 Nov 15, 2022

A PyTorch implementation of "From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network" (ICCV2021)

From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network The official code of VisionLAN (ICCV2021). VisionLAN successfully a

81 Dec 12, 2022