Code for the ICCV'21 paper "Context-aware Scene Graph Generation with Seq2Seq Transformers"

Overview

ICCV'21 Context-aware Scene Graph Generation with Seq2Seq Transformers

Authors: Yichao Lu*, Himanshu Rai*, Cheng Chang*, Boris Knyazev†, Guangwei Yu, Shashank Shekhar†, Graham W. Taylor†, Maksims Volkovs

  • * Denotes equal contribution
  • † University of Guelph / Vector Institute

Prerequisites and Environment

  • pytorch-gpu 1.13.1
  • numpy 1.16.0
  • tqdm

All experiments were conducted on a 20-core Intel(R) Xeon(R) CPU E5-2630 v4 @2.20GHz and 4 NVIDIA V100 GPUs with 32GB GPU memory.

Dataset

Visual Genome

Download it here. Unzip it under the data folder. You should see a vg folder unzipped there. It contains .json annotations that suit the dataloader used in this repo.

Visual Relation Detection

See Images:VRD

Images

Visual Genome

Create a folder for all images:

# ROOT=path/to/cloned/repository
cd $ROOT/data/vg
mkdir VG_100K

Download Visual Genome images from the official page. Unzip all images (part 1 and part 2) into VG_100K/. There should be a total of 108249 files.

Visual Relation Detection

Create the vrd folder under data:

# ROOT=path/to/cloned/repository
cd $ROOT/data/vrd

Download the original annotation json files from here and unzip json_dataset.zip here. The images can be downloaded from here. Unzip sg_dataset.zip to create an sg_dataset folder in data/vrd. Next run the preprocessing scripts:

cd $ROOT
python tools/rename_vrd_with_numbers.py
python tools/convert_vrd_anno_to_coco_format.py

rename_vrd_with_numbers.py converts all non-jpg images (some images are in png or gif) to jpg, and renames them in the {:012d}.jpg format (e.g., "000000000001.jpg"). It also creates new relationship annotations other than the original ones. This is mostly to make things easier for the dataloader. The filename mapping from the original is stored in data/vrd/*_fname_mapping.json where "*" is either "train" or "val".

convert_vrd_anno_to_coco_format.py creates object detection annotations from the new annotations generated above, which are required by the dataloader during training.

Pre-trained Object Detection Models

Download pre-trained object detection models here. Unzip it under the root directory. Note: We do not include code for training object detectors. Please refer to the "(Optional) Training Object Detection Models" section in Large-Scale-VRD.pytorch for this.

Directory Structure

The final directories should look like:

|-- data
|   |-- detections_train.json
|   |-- detections_val.json
|   |-- new_annotations_train.json
|   |-- new_annotations_val.json
|   |-- objects.json
|   |-- predicates.json
|-- evaluation
|-- output
|   |-- pair_predicate_dict.dat
|   |-- train_data.dat
|   |-- valid_data.dat
|-- config.py
|-- core.py
|-- data_utils.py
|-- evaluation_utils.py
|-- feature_utils.py
|-- file_utils.py
|-- preprocess.py
|-- trainer.py
|-- transformer.py

Evaluating Pre-trained Relationship Detection models

DO NOT CHANGE anything in the provided config files(configs/xx/xxxx.yaml) even if you want to test with less or more than 8 GPUs. Use the environment variable CUDA_VISIBLE_DEVICES to control how many and which GPUs to use. Remove the --multi-gpu-test for single-gpu inference.

Visual Genome

NOTE: May require at least 64GB RAM to evaluate on the Visual Genome test set

We use three evaluation metrics for Visual Genome:

  1. SGDET: predict all the three labels and two boxes
  2. SGCLS: predict subject, object and predicate labels given ground truth subject and object boxes
  3. PRDCLS: predict predicate labels given ground truth subject and object boxes and labels

Training Scene Graph Generation Models

With the following command lines, the training results (models and logs) should be in $ROOT/Outputs/xxx/ where xxx is the .yaml file name used in the command without the ".yaml" extension. If you want to test with your trained models, simply run the test commands described above by setting --load_ckpt as the path of your trained models.

Visual Relation Detection

To train our scene graph generation model on the VRD dataset, run

python preprocess.py

python trainer.py --num-encoder-layers 4 --num-decoder-layers 2 --nhead 4 --num-epochs 500 --learning-rate 1e-3

python preprocess_evaluation.py

python write_prediction.py

mv prediction.txt evaluation/vrd/

cd evaluation/vrd

python run_all_for_vrd.py prediction.txt

Visual Genome

To train our scene graph generation model on the VG dataset, download the json files from https://visualgenome.org/api/v0/api_home.html, put the extracted files under data and then run

python preprocess.py

python trainer.py --num-encoder-layers 4 --num-decoder-layers 2 --nhead 4 --num-epochs 2000 --learning-rate 1e-3

python preprocess_evaluation.py

python write_prediction.py

mv prediction.txt evaluation/vg/

cd evaluation/vg

python run_all.py prediction.txt

Acknowledgements

This repository uses code based on the ContrastiveLosses4VRD Ji Zhang, Neural-Motifs source code from Rowan Zellers.

Citation

If you find this code useful in your research, please cite the following paper:

@inproceedings{lu2021seq2seq,
  title={Context-aware Scene Graph Generation with Seq2Seq Transformers},
  author={Yichao Lu, Himanshu Rai, Jason Chang, Boris Knyazev, Guangwei Yu, Shashank Shekhar, Graham W. Taylor, Maksims Volkovs},
  booktitle={ICCV},
  year={2021}
}
You might also like...
Official TensorFlow code for the forthcoming paper
Official TensorFlow code for the forthcoming paper

~ Efficient-CapsNet ~ Are you tired of over inflated and overused convolutional neural networks? You're right! It's time for CAPSULES :)

This is the code for the paper
This is the code for the paper "Contrastive Clustering" (AAAI 2021)

Contrastive Clustering (CC) This is the code for the paper "Contrastive Clustering" (AAAI 2021) Dependency python=3.7 pytorch=1.6.0 torchvision=0.8

Code for the paper Learning the Predictability of the Future

Learning the Predictability of the Future Code from the paper Learning the Predictability of the Future. Website of the project in hyperfuture.cs.colu

PyTorch code for the paper: FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning
PyTorch code for the paper: FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning This is the PyTorch implementation of our paper: FeatMatch: Feature-Based Augmentat

Code for the paper A Theoretical Analysis of the Repetition Problem in Text Generation
Code for the paper A Theoretical Analysis of the Repetition Problem in Text Generation

A Theoretical Analysis of the Repetition Problem in Text Generation This repository share the code for the paper "A Theoretical Analysis of the Repeti

Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks
Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks

SA-Net: Shuffle Attention for Deep Convolutional Neural Networks (paper) By Qing-Long Zhang and Yu-Bin Yang [State Key Laboratory for Novel Software T

Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.
Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Non-Rigid Neural Radiance Fields This is the official repository for the project "Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synt

Code for the Shortformer model, from the paper by Ofir Press, Noah A. Smith and Mike Lewis.

Shortformer This repository contains the code and the final checkpoint of the Shortformer model. This file explains how to run our experiments on the

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection
PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection

Unbiased Teacher for Semi-Supervised Object Detection This is the PyTorch implementation of our paper: Unbiased Teacher for Semi-Supervised Object Detection

Comments
  • Is this repo runnable?

    Is this repo runnable?

    There seems some bugs in this repo. For instance, the parameter of nn.TransformerEncoderLayer d_model in line 30 of transformer.py is the number of expected features in the input which must be divisible by num_heads. But in this repo, the d_model is set as NUM_BOX_FEATURES 109, and num_heads is 4. (109 % 4 eq 1)

    opened by korcys 2
  • something wrong in the code

    something wrong in the code

    tgt_key_padding_mask cannt match with tgt_mask.

    Epoch 1 Recall@5 0.40540524032301645 Recall@10 0.5550042311638902 Recall@20 0.691875286804221 Recall@50 0.7947694177616892 Recall@100 0.82678933109488...

    Traceback (most recent call last): File "D:/xuexi/graduate/project/SGG-Seq2Seq-main/trainer.py", line 406, in main() File "D:/xuexi/graduate/project/SGG-Seq2Seq-main/trainer.py", line 322, in main predictions = model( File "D:\software\anaconda\envs\pytorch1.8\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "D:\xuexi\graduate\project\SGG-Seq2Seq-main\transformer.py", line 92, in forward decoder_states = self.decoder( File "D:\software\anaconda\envs\pytorch1.8\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "D:\software\anaconda\envs\pytorch1.8\lib\site-packages\torch\nn\modules\transformer.py", line 291, in forward output = mod(output, memory, tgt_mask=tgt_mask, File "D:\software\anaconda\envs\pytorch1.8\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "D:\software\anaconda\envs\pytorch1.8\lib\site-packages\torch\nn\modules\transformer.py", line 576, in forward x = self.norm1(x + self._sa_block(x, tgt_mask, tgt_key_padding_mask)) File "D:\software\anaconda\envs\pytorch1.8\lib\site-packages\torch\nn\modules\transformer.py", line 585, in _sa_block x = self.self_attn(x, x, x, File "D:\software\anaconda\envs\pytorch1.8\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "D:\software\anaconda\envs\pytorch1.8\lib\site-packages\torch\nn\modules\activation.py", line 1153, in forward attn_output, attn_output_weights = F.multi_head_attention_forward( File "D:\software\anaconda\envs\pytorch1.8\lib\site-packages\torch\nn\functional.py", line 5155, in multi_head_attention_forward assert key_padding_mask.shape == (bsz, src_len),
    AssertionError: expecting key_padding_mask shape of (2, 100), but got torch.Size([2, 78])

    Process finished with exit code 1

    opened by xywhat 0
  • Some Issues

    Some Issues

    Thank you for your code. when I run this repo, some issues confused me.

    1. for code https://github.com/layer6ai-labs/SGG-Seq2Seq/blob/main/transformer.py#L238 "relation_ids" contain relation label such as tensor([[ 0, 48, 31, 48, 31, 31, 22, 31, 48, 48, 22], device='cuda:0'), "relation_ids" will embedded using nn.Embedding and fed into TripletEncoder. But, I think "relation_ids" are groundtruth, may be I'm wrong.

    2. I run the repo, python trainer.py --dataset vg --batch_size 512 --nhead 4 and get: image But, I find this repo doesn't use the visual feature of object regions?

    3. For union features in this repo called pairwise_features just contain box feature in https://github.com/layer6ai-labs/SGG-Seq2Seq/blob/main/feature_utils.py#L65. Have you considered using visual union features like motif UnionBoxesAndFeats. Thanks.

    opened by JingyuLi-code 0
Owner
Layer6 Labs
Research repositories from Layer 6 AI.
Layer6 Labs
[ICCV21] Self-Calibrating Neural Radiance Fields

Self-Calibrating Neural Radiance Fields, ICCV, 2021 Project Page | Paper | Video Author Information Yoonwoo Jeong [Google Scholar] Seokjun Ahn [Google

null 381 Dec 30, 2022
The LaTeX and Python code for generating the paper, experiments' results and visualizations reported in each paper is available (whenever possible) in the paper's directory

This repository contains the software implementation of most algorithms used or developed in my research. The LaTeX and Python code for generating the

João Fonseca 3 Jan 3, 2023
Inference code for "StylePeople: A Generative Model of Fullbody Human Avatars" paper. This code is for the part of the paper describing video-based avatars.

NeuralTextures This is repository with inference code for paper "StylePeople: A Generative Model of Fullbody Human Avatars" (CVPR21). This code is for

Visual Understanding Lab @ Samsung AI Center Moscow 18 Oct 6, 2022
This is the official source code for SLATE. We provide the code for the model, the training code, and a dataset loader for the 3D Shapes dataset. This code is implemented in Pytorch.

SLATE This is the official source code for SLATE. We provide the code for the model, the training code and a dataset loader for the 3D Shapes dataset.

Gautam Singh 66 Dec 26, 2022
Code for paper ECCV 2020 paper: Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop.

Who Left the Dogs Out? Evaluation and demo code for our ECCV 2020 paper: Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization

Benjamin Biggs 29 Dec 28, 2022
TensorFlow code for the neural network presented in the paper: "Structural Language Models of Code" (ICML'2020)

SLM: Structural Language Models of Code This is an official implementation of the model described in: "Structural Language Models of Code" [PDF] To ap

null 73 Nov 6, 2022
Code for the prototype tool in our paper "CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning".

CoProtector Code for the prototype tool in our paper "CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning".

Zhensu Sun 1 Oct 26, 2021
Code to use Augmented Shapiro Wilks Stopping, as well as code for the paper "Statistically Signifigant Stopping of Neural Network Training"

This codebase is being actively maintained, please create and issue if you have issues using it Basics All data files are included under losses and ea

J K Terry 32 Nov 9, 2021
Code for our method RePRI for Few-Shot Segmentation. Paper at http://arxiv.org/abs/2012.06166

Region Proportion Regularized Inference (RePRI) for Few-Shot Segmentation In this repo, we provide the code for our paper : "Few-Shot Segmentation Wit

Malik Boudiaf 138 Dec 12, 2022
Code for ACM MM 2020 paper "NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination"

NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination The offical implementation for the "NOH-NMS: Improving Pedestrian Detection by

Tencent YouTu Research 64 Nov 11, 2022