Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

Overview

PyTorch implementation of our paper Spatial-Temporal Transformer for Dynamic Scene Graph Generation, accepted by ICCV 2021. We propose STTran, a Transformer-based model that generates dynamic scene graphs for a given video. STTran detects the visual relationships in each frame.

The introduction video is available now: https://youtu.be/gKpnRU8btLg

About the code: We run the code on a single RTX 2080 Ti for both training and testing. We borrowed some code from Yang's repository and Zellers' repository.

Usage

We use python=3.6, pytorch=1.1 and torchvision=0.3 in our code. First, clone the repository:

git clone https://github.com/yrcong/STTran.git

We borrow some compiled code for bbox operations.

cd lib/draw_rectangles
python setup.py build_ext --inplace
cd ..
cd fpn/box_intersections_cpu
python setup.py build_ext --inplace
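
After the two build steps, one way to confirm that a compiled extension ended up where the code imports it from is to ask Python's import system. This is a minimal sketch: extension_available is a hypothetical helper, not part of the repository. The module path lib.fpn.box_intersections_cpu.bbox is the one quoted in the ModuleNotFoundError report in the comments below. Run the sketch from the repository root. Note that build_ext --inplace writes the compiled module next to its setup.py, which is the location this dotted import path expects.

```python
import importlib.util


def extension_available(module_name: str) -> bool:
    """Return True if `module_name` can be located by Python's import system.

    Hypothetical helper: it only locates the module (parent packages are
    imported as a side effect), it does not import the module itself.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # ImportError (incl. ModuleNotFoundError): a parent package is missing.
        # ValueError: the module is in sys.modules but has no __spec__.
        return False


if __name__ == "__main__":
    # Module path quoted in the ModuleNotFoundError report; run from the repo root.
    name = "lib.fpn.box_intersections_cpu.bbox"
    print(name, "found" if extension_available(name) else "MISSING")
```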

For the object detector part, please follow the compilation instructions from https://github.com/jwyang/faster-rcnn.pytorch. We provide a pretrained Faster R-CNN model for Action Genome. Please download it here and put it in

fasterRCNN/models/faster_rcnn_ag.pth

Dataset

We use the Action Genome dataset to train and evaluate our method. Please process the downloaded dataset with the Toolkit. The dataset directories should look like this:

|-- action_genome
    |-- annotations   #gt annotations
    |-- frames        #sampled frames
    |-- videos        #original videos
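
Before training, you can check that the processed dataset matches this layout. The following is a minimal sketch that assumes the directory you pass as $DATAPATH is the action_genome root shown above. The names missing_subdirs and check_layout.py are hypothetical.

```python
from pathlib import Path
from typing import List

# Sub-directories listed in the expected layout above.
EXPECTED_SUBDIRS = ("annotations", "frames", "videos")


def missing_subdirs(root: str) -> List[str]:
    """Return the expected sub-directories that are absent under `root`."""
    base = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_layout.py /path/to/action_genome
    root = sys.argv[1] if len(sys.argv) > 1 else "action_genome"
    missing = missing_subdirs(root)
    print("layout OK" if not missing else "missing: " + ", ".join(missing))
```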

In the experiments for SGCLS/SGDET, we only keep bounding boxes whose short edge is larger than 16 pixels. Please download the file object_bbox_and_relationship_filtersmall.pkl and put it in the dataloader.
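
The short-edge rule can be illustrated in a few lines. This is a minimal sketch, not the code used to produce object_bbox_and_relationship_filtersmall.pkl. It assumes boxes are (x1, y1, x2, y2) in pixels and reads "larger than 16 pixels" as strictly greater. The name filter_small_boxes is hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: boxes are (x1, y1, x2, y2) in pixels. Convert first if your
# annotations store (x, y, w, h).
Box = Tuple[float, float, float, float]


def filter_small_boxes(boxes: Sequence[Box], min_short_edge: float = 16.0) -> List[Box]:
    """Keep only boxes whose shorter edge is strictly larger than `min_short_edge`."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        short_edge = min(x2 - x1, y2 - y1)
        if short_edge > min_short_edge:
            kept.append((x1, y1, x2, y2))
    return kept


boxes = [(0, 0, 20, 40), (0, 0, 10, 100), (5, 5, 30, 30)]
# Short edges are 20, 10 and 25, so the first and third boxes are kept.
print(filter_small_boxes(boxes))
```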

Train

You can train STTran with train.py. We trained the model on an RTX 2080 Ti:

  • For PredCLS:
python train.py -mode predcls -datasize large -data_path $DATAPATH 
  • For SGCLS:
python train.py -mode sgcls -datasize large -data_path $DATAPATH 
  • For SGDET:
python train.py -mode sgdet -datasize large -data_path $DATAPATH 
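
To run the three trainings back to back, a small Python driver can build the commands listed above. This is a sketch under stated assumptions: train_command, train_all and the dry_run switch are hypothetical helpers, not part of the repository. The script assumes it is run from the repository root. It uses only the flags shown in the commands above.

```python
import subprocess
import sys
from typing import List

# The three settings listed above.
MODES = ("predcls", "sgcls", "sgdet")


def train_command(mode: str, data_path: str, datasize: str = "large") -> List[str]:
    """Build the train.py invocation for one mode, using the flags shown above."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return [sys.executable, "train.py", "-mode", mode,
            "-datasize", datasize, "-data_path", data_path]


def train_all(data_path: str, dry_run: bool = True) -> List[List[str]]:
    """Run (or, by default, only print) the three trainings one after another."""
    commands = [train_command(mode, data_path) for mode in MODES]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands


if __name__ == "__main__":
    train_all("/path/to/action_genome")  # hypothetical path; dry run by default
```

Running the trainings sequentially matches the single-GPU setup described above. Pass dry_run=False only after the printed commands look right.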

Evaluation

You can evaluate STTran with test.py.

python test.py -m predcls -datasize large -data_path $DATAPATH -model_path $MODELPATH
python test.py -m sgcls -datasize large -data_path $DATAPATH -model_path $MODELPATH
python test.py -m sgdet -datasize large -data_path $DATAPATH -model_path $MODELPATH

Citation

If our work is helpful for your research, please cite our publication:

@inproceedings{cong2021spatial,
  title={Spatial-Temporal Transformer for Dynamic Scene Graph Generation},
  author={Cong, Yuren and Liao, Wentong and Ackermann, Hanno and Rosenhahn, Bodo and Yang, Michael Ying},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year={2021},
  url={https://arxiv.org/abs/2107.12309}
}

Help

If you have any questions or ideas about the code or the paper, please comment on GitHub or send us an email. We will reply as soon as possible.

Comments
  • How to use faster rcnn?

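    Hello, sorry to bother you! To compile jwyang/faster-rcnn.pytorch at present, do I need to replace the files in the fasterRCNN directory of your repository with the files from https://github.com/jwyang/faster-rcnn.pytorch, modify the code in https://github.com/yrcong/STTran/blob/main/fasterRCNN/lib/model/faster_rcnn/faster_rcnn.py, and then compile following the jwyang/faster-rcnn.pytorch README? Also, the environment requirements of jwyang's Faster R-CNN seem to differ from those in your README. Will that cause any problems?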

    opened by Yassin-fan 5
  • ModuleNotFoundError: No module named 'lib.fpn.box_intersections_cpu.bbox'

    Thanks for your great work. I followed the README and tried to run the training command:

    python train.py -mode predcls -datasize large -data_path $DATAPATH

    I ran into some problems. First, dataloader/action_genome.py uses from scipy.misc import imread, but recent versions of SciPy no longer provide imread, so I changed it to from imageio import imread.

    Second, when I tried to run the project again, a new problem occurred: ModuleNotFoundError: No module named 'lib.fpn.box_intersections_cpu.bbox'. This means the file was not generated in that directory, so I went back and ran:

    cd fpn/box_intersections_cpu
    python setup.py install

    (I found that the command doesn't work unless I add 'install'.) The output was:

    ~/project/STTran/lib/fpn/box_intersections_cpu$ python setup.py install
    running install
    running build
    running build_ext
    running install_lib
    running install_egg_info
    Removing /home/liujingwei/anaconda3/envs/scene_graph_benchmark/lib/python3.7/site-packages/bbox_cython-0.0.0-py3.7.egg-info
    Writing /home/liujingwei/anaconda3/envs/scene_graph_benchmark/lib/python3.7/site-packages/bbox_cython-0.0.0-py3.7.egg-info

    So it didn't generate the file (lib.fpn.box_intersections_cpu.bbox), and I don't know how to fix it. I may need your help. ^_^

    opened by PrimoWW 5
  • use glove failed

    Thanks for your code, it's helpful! But an error occurred when I ran the training command. The output looks like this:

    The CKPT saved here: data/
    spatial encoder layer num: 1 / temporal decoder layer num: 3
    mode : predcls
    save_path : data/
    model_path : None
    data_path : /home/abc/NewDisk/origin_datasets/ActionGenome/dataset/ag/
    datasize : large
    ckpt : None
    optimizer : adamw
    lr : 1e-05
    nepoch : 10
    enc_layer : 1
    dec_layer : 3
    bce_loss : False
    -------loading annotations---------slowly-----------
    --------------------finish!-------------------------
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    There are 7584 videos and 177330 valid frames
    144 videos are invalid (no person), remove them
    49 videos are invalid (only one frame), remove them
    21643 frames have no human bbox in GT, remove them!
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    -------loading annotations---------slowly-----------
    --------------------finish!-------------------------
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    There are 1750 videos and 56923 valid frames
    41 videos are invalid (no person), remove them
    19 videos are invalid (only one frame), remove them
    8636 frames have no human bbox in GT, remove them!
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    
    loading word vectors from data/glove.6B.200d.pt
    loading word vectors from /home/abc/NewDisk/model_zoo/STTran/glove.6B.200d.pt
    __background__ -> __background__ 
    fail on __background__
    

    The program gets stuck at this point. Do you know why? Thanks!

    opened by ghostlyFeng 4
  • fasterRCNN compilation error with python 3.8 and pytorch 1.12 cuda 11.3

    I'm trying to set up STTran from scratch, and I'm getting errors while setting up the fasterRCNN package. The command I run (in STTran/fasterRCNN/lib) is: python setup.py install

    fatal error: THC/THC.h: No such file or directory
        5 | #include <THC/THC.h>
          |          ^~~~~~~~~~~
    compilation terminated.
    
    opened by zee-fee 3
  • the Action Genome Dump frames of this paper

    Your README says to follow JingweiJ/ActionGenome to get the dataset. Could you please tell me which version of ffmpeg you used? I followed JingweiJ/ActionGenome but got empty folders for all the videos. Also, did you use 'python tools/dump_frames.py --all_frames' or just 'python tools/dump_frames.py' to prepare the dataset? Thanks for your work!

    opened by Yassin-fan 3
  • RuntimeError: cuda runtime error (78) : a PTX JIT compilation failed at

    Wonderful code! However, I encountered the following error. Did I miss something that caused it?

    mode : predcls
    save_path : /media/wow/disk2/AG/save
    model_path : /media/wow/disk2/AG/predcls.tar
    data_path : /media/wow/disk2/AG/dataset
    datasize : large
    ckpt : None
    optimizer : adamw
    lr : 1e-05
    nepoch : 10
    enc_layer : 1
    dec_layer : 3
    bce_loss : False
    -------loading annotations---------slowly-----------
    --------------------finish!-------------------------
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    There are 1750 videos and 56923 valid frames
    41 videos are invalid (no person), remove them
    19 videos are invalid (only one frame), remove them
    8636 frames have no human bbox in GT, remove them!
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    loading word vectors from data/glove.6B.200d.pt
    loading word vectors from /media/wow/disk2/AG/glove.6B.200d.pt
    __background__ -> __background__
    fail on __background__

    CKPT /media/wow/disk2/AG/predcls.tar is loaded
    THCudaCheck FAIL file=/home/cong/Dokumente/faster-rcnn.pytorch/lib/model/csrc/cuda/ROIAlign_cuda.cu line=297 error=78 : a PTX JIT compilation failed
    Traceback (most recent call last):
      File "test.py", line 80, in <module>
        entry = object_detector(im_data, im_info, gt_boxes, num_boxes, gt_annotation, im_all=None)
      File "/home/wow/anaconda2/envs/STTran/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "/media/wow/disk2/STT/STTran-main/lib/object_detector.py", line 306, in forward
        FINAL_FEATURES = self.fasterRCNN.RCNN_roi_align(FINAL_BASE_FEATURES, FINAL_BBOXES)
      File "/home/wow/anaconda2/envs/STTran/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
        result = self.forward(*input, **kwargs)
      File "/media/wow/disk2/STT/STTran-main/fasterRCNN/lib/model/roi_layers/roi_align.py", line 58, in forward
        input, rois, self.output_size, self.spatial_scale, self.sampling_ratio
      File "/media/wow/disk2/STT/STTran-main/fasterRCNN/lib/model/roi_layers/roi_align.py", line 20, in forward
        output = _C.roi_align_forward(input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio)
    RuntimeError: cuda runtime error (78) : a PTX JIT compilation failed at /home/cong/Dokumente/faster-rcnn.pytorch/lib/model/csrc/cuda/ROIAlign_cuda.cu:297

    opened by Tudouu 3
  • Problem with torch.utils.ffi

    Hello,

    I am trying to run your code, but I'm running into issues with torch.utils.ffi. I compiled draw_rectangles and fpn, then compiled Faster R-CNN as instructed and copied the files over the fasterRCNN folder inside STTran. However, when I run the code, it crashes while resolving the import on line 10 of test.py (from lib.object_detector import detector) with the following error:

    File "/media/hdd5/raphael/HOI/STTran/test.py", line 10, in <module>
      from lib.object_detector import detector
    File "/media/hdd5/raphael/HOI/STTran/lib/object_detector.py", line 10, in <module>
      from fasterRCNN.lib.model.faster_rcnn.resnet import resnet
    File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/faster_rcnn/resnet.py", line 6, in <module>
      from fasterRCNN.lib.model.faster_rcnn.faster_rcnn import _fasterRCNN
    File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/faster_rcnn/faster_rcnn.py", line 10, in <module>
      from model.rpn.rpn import _RPN
    File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/rpn/rpn.py", line 8, in <module>
      from .proposal_layer import _ProposalLayer
    File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/rpn/proposal_layer.py", line 20, in <module>
      from model.nms.nms_wrapper import nms
    File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/nms/nms_wrapper.py", line 10, in <module>
      from model.nms.nms_gpu import nms_gpu
    File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/nms/nms_gpu.py", line 4, in <module>
      from ._ext import nms
    File "/media/hdd5/raphael/HOI/STTran/fasterRCNN/lib/model/nms/_ext/nms/__init__.py", line 2, in <module>
      from torch.utils.ffi import _wrap_function
    File "/media/hdd5/raphael/HOI/HOI_env/lib/python3.8/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
      raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
    ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

    This seems to be deprecated for PyTorch versions > 0.4, but the repo states to use 1.1. Strangely, I tried downgrading to torch==0.4.1 and the same error persists. Any ideas on how to work around this would be appreciated!

    opened by RRuschel 2
  • clean_class in ObjectClassifier

    Thanks for your wonderful work. It seems that, in sgdet mode, clean_class replaces the category of detections originally classified as class_idx with their second-highest-confidence category. Could you explain why this operation is performed here? Thanks again, looking forward to your reply :)

    opened by jkli-aa 2
  • About SGCLS

    Hi! I tried your code and found that the sgcls evaluation results are extremely low (near 0), while predcls and sgdet match your paper. Could you please give me a hint about this problem, or check your sgcls code again? Thank you so much!

    opened by qiuyue1993 2
  • view size is not compatible with input tensor for sgdet

    Hi, thanks for your code and paper. I have a small question. When I run the training code in predcls or sgcls mode, everything is fine, but in sgdet mode I get the error below:

    File "/home/quhaoxuan/STTran/fasterRCNN/lib/model/rpn/rpn.py", line 50, in reshape
      x = x.view(
    RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

    I understand that this function seems to be triggered only in the sgdet setting. Do you have any suggestions for solving this error? Many thanks in advance.

    opened by Harryqu123 2
  • Decoder_lin in Object Classifier

    Hi, I want to confirm: for the "sgdet" task, why is decoder_lin not used during testing?

    Does this mean that the object labels predicted by fasterRCNN are used directly in "sgdet"? If so, why is decoder_lin trained?

    opened by Shengyu-Feng 2
  • Test set for performance metrics

    @yrcong Which train-val-test split is used with the Action Genome dataset? From the codebase, it seems that the same test set was used as the validation set during training (in train.py) and as the test set in test.py. Which dataset split is used for the metrics reported in the paper?

    opened by zee-fee 1
  • video action recognition visualization

    In the introduction video on YouTube, the authors visualize the recognition of human actions in video frames. Is the code for this video action recognition visualization included in the repository? I hope the authors can provide some help. Thank you very much!

    opened by lili124 1
Owner
Yuren Cong
Cascaded Deep Video Deblurring Using Temporal Sharpness Prior and Non-local Spatial-Temporal Similarity

This repository is the official PyTorch implementation of Cascaded Deep Video Deblurring Using Temporal Sharpness Prior and Non-local Spatial-Temporal Similarity

hippopmonkey 4 Dec 11, 2022
Seeing Dynamic Scene in the Dark: High-Quality Video Dataset with Mechatronic Alignment (ICCV2021)

Seeing Dynamic Scene in the Dark: High-Quality Video Dataset with Mechatronic Alignment This is a pytorch project for the paper Seeing Dynamic Scene i

DV Lab 21 Nov 28, 2022
[TIP 2020] Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion

Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion Code for Multi-Temporal Scene Classification and Scene Ch

Lixiang Ru 33 Dec 12, 2022
Neural Scene Graphs for Dynamic Scene (CVPR 2021)

Implementation of Neural Scene Graphs, that optimizes multiple radiance fields to represent different objects and a static scene background. Learned representations can be rendered with novel object compositions and views.

null 151 Dec 26, 2022
Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting.

Non-AR Spatial-Temporal Transformer Introduction Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series For

Chen Kai 66 Nov 28, 2022
Group Activity Recognition with Clustered Spatial Temporal Transformer

GroupFormer Group Activity Recognition with Clustered Spatial-TemporalTransformer Backbone Style Action Acc Activity Acc Config Download Inv3+flow+pos

null 28 Dec 12, 2022
This is the official Pytorch implementation of the paper "Diverse Motion Stylization for Multiple Style Domains via Spatial-Temporal Graph-Based Generative Model"

Diverse Motion Stylization (Official) This is the official Pytorch implementation of this paper. Diverse Motion Stylization for Multiple Style Domains

Soomin Park 28 Dec 16, 2022
Graph Self-Attention Network for Learning Spatial-Temporal Interaction Representation in Autonomous Driving

GSAN Introduction Code for paper GSAN: Graph Self-Attention Network for Learning Spatial-Temporal Interaction Representation in Autonomous Driving, wh

YE Luyao 6 Oct 27, 2022
Spatial Temporal Graph Convolutional Networks (ST-GCN) for Skeleton-Based Action Recognition in PyTorch

Reminder ST-GCN has transferred to MMSkeleton, and keep on developing as an flexible open source toolbox for skeleton-based human understanding. You a

sijie yan 1.1k Dec 25, 2022
Dynamic Attentive Graph Learning for Image Restoration, ICCV2021 [PyTorch Code]

Dynamic Attentive Graph Learning for Image Restoration This repository is for GATIR introduced in the following paper: Chong Mou, Jian Zhang, Zhuoyuan

Jian Zhang 84 Dec 9, 2022
Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Make-A-Scene - PyTorch Pytorch implementation (inofficial) of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/

Casual GAN Papers 259 Dec 28, 2022
Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).

DeepPanoContext (DPC) [Project Page (with interactive results)][Paper] DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context G

Cheng Zhang 66 Nov 16, 2022
The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer"

Shuffle Transformer The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer" Introduction Very recently, window-

null 87 Nov 29, 2022
Code and models for ICCV2021 paper "Robust Object Detection via Instance-Level Temporal Cycle Confusion".

Robust Object Detection via Instance-Level Temporal Cycle Confusion This repo contains the implementation of the ICCV 2021 paper, Robust Object Detect

Xin Wang 69 Oct 13, 2022
data/code repository of "C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal Consistent Motion Transfer"

C2F-FWN data/code repository of "C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal Consistent Motion Transfer" (https://arxiv.org/abs/

EKILI 46 Dec 14, 2022
Unofficial implementation of "TTNet: Real-time temporal and spatial video analysis of table tennis" (CVPR 2020)

TTNet-Pytorch The implementation for the paper "TTNet: Real-time temporal and spatial video analysis of table tennis" An introduction of the project c

Nguyen Mau Dung 438 Dec 29, 2022
The project is an official implementation of our paper "3D Human Pose Estimation with Spatial and Temporal Transformers".

3D Human Pose Estimation with Spatial and Temporal Transformers This repo is the official implementation for 3D Human Pose Estimation with Spatial and

Ce Zheng 363 Dec 28, 2022
PyMove is a Python library to simplify queries and visualization of trajectories and other spatial-temporal data

Use PyMove and go much further Information Package Status License Python Version Platforms Build Status PyPi version PyPi Downloads Conda version Cond

Insight Data Science Lab 64 Nov 15, 2022
A PyTorch implementation of "From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network" (ICCV2021)

From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network The official code of VisionLAN (ICCV2021). VisionLAN successfully a

null 81 Dec 12, 2022