ICCV'21 Context-aware Scene Graph Generation with Seq2Seq Transformers
Authors: Yichao Lu*, Himanshu Rai*, Cheng Chang*, Boris Knyazev†, Guangwei Yu, Shashank Shekhar†, Graham W. Taylor†, Maksims Volkovs
- * Denotes equal contribution
- † University of Guelph / Vector Institute
Prerequisites and Environment
- pytorch-gpu 1.13.1
- numpy 1.16.0
- tqdm
All experiments were conducted on a machine with a 20-core Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz and 4 NVIDIA V100 GPUs with 32GB of GPU memory.
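As an optional sanity check, the environment can be verified from Python (a minimal sketch; it assumes the pytorch-gpu package is imported as the usual torch module):

```python
# Optional sanity check for the environment listed above.
# Assumes the "pytorch-gpu" package exposes the standard "torch" module.
import numpy as np
import torch
import tqdm

print("PyTorch:", torch.__version__)            # expected 1.13.1
print("NumPy:", np.__version__)                 # expected 1.16.0
print("tqdm:", tqdm.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPUs visible:", torch.cuda.device_count())
```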
Dataset
Visual Genome
Download it here. Unzip it under the `data` folder. You should see a `vg` folder there; it contains the .json annotations expected by the dataloader used in this repo.
Visual Relation Detection
See the Images: Visual Relation Detection section below.
Images
Visual Genome
Create a folder for all images:
# ROOT=path/to/cloned/repository
cd $ROOT/data/vg
mkdir VG_100K
Download Visual Genome images from the official page. Unzip all images (part 1 and part 2) into `VG_100K/`. There should be a total of 108,249 files.
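As an optional check (a small sketch assuming the layout above, i.e. images under data/vg/VG_100K), you can count the extracted files:

```python
# Optional check that all Visual Genome images were extracted.
# Sketch only; the path follows the steps above.
from pathlib import Path

img_dir = Path("data/vg/VG_100K")
num_images = sum(1 for p in img_dir.iterdir() if p.is_file())
print(f"{num_images} files in {img_dir}")  # expected: 108249
assert num_images == 108249, "Some images from part 1 / part 2 appear to be missing"
```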
Visual Relation Detection
Create the `vrd` folder under `data`:
# ROOT=path/to/cloned/repository
mkdir -p $ROOT/data/vrd
cd $ROOT/data/vrd
Download the original annotation json files from here and unzip json_dataset.zip here. The images can be downloaded from here. Unzip sg_dataset.zip to create an `sg_dataset` folder in `data/vrd`. Next, run the preprocessing scripts:
cd $ROOT
python tools/rename_vrd_with_numbers.py
python tools/convert_vrd_anno_to_coco_format.py
`rename_vrd_with_numbers.py` converts all non-jpg images (some are png or gif) to jpg and renames them in the {:012d}.jpg format (e.g., "000000000001.jpg"). It also writes new relationship annotation files, separate from the original ones; this is mostly to make things easier for the dataloader. The mapping from the original filenames is stored in `data/vrd/*_fname_mapping.json`, where "*" is either "train" or "val".
`convert_vrd_anno_to_coco_format.py` creates object detection annotations from the new annotations generated above; these are required by the dataloader during training.
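The renaming convention boils down to the following (an illustrative sketch, not the actual script: the source and destination paths are assumptions, and the real script additionally converts png/gif images and rewrites the relationship annotations):

```python
# Illustrative sketch of the renaming convention used by rename_vrd_with_numbers.py.
# Not the actual script: paths below are assumptions, and png/gif conversion plus
# annotation rewriting are omitted.
import json
import os

src_dir = "data/vrd/sg_dataset/sg_train_images"   # assumed location after unzipping
dst_dir = "data/vrd/train_images"                 # hypothetical output folder
os.makedirs(dst_dir, exist_ok=True)

fname_mapping = {}
for idx, fname in enumerate(sorted(os.listdir(src_dir)), start=1):
    new_name = "{:012d}.jpg".format(idx)          # e.g. "000000000001.jpg"
    fname_mapping[fname] = new_name
    os.rename(os.path.join(src_dir, fname), os.path.join(dst_dir, new_name))

# The real script stores this mapping in data/vrd/*_fname_mapping.json.
with open("data/vrd/train_fname_mapping.json", "w") as f:
    json.dump(fname_mapping, f)
```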
Pre-trained Object Detection Models
Download the pre-trained object detection models here. Unzip the archive under the root directory. Note: we do not include code for training object detectors; please refer to the "(Optional) Training Object Detection Models" section in Large-Scale-VRD.pytorch for this.
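To confirm the download, a checkpoint can be opened directly with PyTorch. The filename below is only a placeholder, not the actual name of the file shipped in the archive:

```python
# Inspect a downloaded detector checkpoint (sketch; the path below is a placeholder,
# not the actual filename contained in the archive).
import torch

ckpt = torch.load("path/to/pretrained_detector.pth", map_location="cpu")
# Checkpoints are typically dicts of tensors or nested dicts (e.g. {"model": state_dict, ...}).
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])
```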
Directory Structure
The final directories should look like:
|-- data
| |-- detections_train.json
| |-- detections_val.json
| |-- new_annotations_train.json
| |-- new_annotations_val.json
| |-- objects.json
| |-- predicates.json
|-- evaluation
|-- output
| |-- pair_predicate_dict.dat
| |-- train_data.dat
| |-- valid_data.dat
|-- config.py
|-- core.py
|-- data_utils.py
|-- evaluation_utils.py
|-- feature_utils.py
|-- file_utils.py
|-- preprocess.py
|-- trainer.py
|-- transformer.py
Evaluating Pre-trained Relationship Detection models
DO NOT CHANGE anything in the provided config files (`configs/xx/xxxx.yaml`), even if you want to test with fewer or more than 8 GPUs. Use the environment variable `CUDA_VISIBLE_DEVICES` to control how many and which GPUs to use. Remove `--multi-gpu-test` for single-gpu inference.
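For reference, the same restriction can be applied from Python, as long as the variable is set before PyTorch initializes CUDA:

```python
# Restrict the run to two specific GPUs; must happen before torch initializes CUDA.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch
print(torch.cuda.device_count())  # 2 -- the visible GPUs are re-indexed as cuda:0 and cuda:1
```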
Visual Genome
NOTE: Evaluating on the Visual Genome test set may require at least 64GB of RAM.
We use three evaluation metrics for Visual Genome (a schematic Recall@K sketch follows the list):
- SGDET: predict all three labels (subject, predicate, object) and both boxes
- SGCLS: predict subject, object, and predicate labels given ground-truth subject and object boxes
- PRDCLS: predict predicate labels given ground-truth subject and object boxes and labels
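All three settings are scored with Recall@K over subject-predicate-object triplets. The snippet below is only a schematic illustration of that idea using label matching (roughly the PRDCLS setting); the scripts under evaluation/ additionally handle box matching and are the authoritative implementation:

```python
# Schematic Recall@K over (subject, predicate, object) triplets -- an illustration only,
# not the evaluation code shipped in evaluation/ (which also handles box matching).
def recall_at_k(pred_triplets, pred_scores, gt_triplets, k=50):
    """pred_triplets: (subj, pred, obj) label tuples; pred_scores: their confidences;
    gt_triplets: ground-truth label tuples for one image."""
    order = sorted(range(len(pred_triplets)), key=lambda i: -pred_scores[i])
    top_k = {pred_triplets[i] for i in order[:k]}
    hits = sum(1 for t in gt_triplets if t in top_k)
    return hits / max(len(gt_triplets), 1)

# Toy example: 2 of the 3 ground-truth triplets appear among the top-3 predictions.
preds = [("man", "riding", "horse"), ("man", "wearing", "hat"), ("horse", "on", "grass")]
scores = [0.9, 0.8, 0.7]
gt = [("man", "riding", "horse"), ("horse", "on", "grass"), ("man", "holding", "rope")]
print(recall_at_k(preds, scores, gt, k=3))  # 0.666...
```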
Training Scene Graph Generation Models
With the following command lines, the training results (models and logs) should appear in `$ROOT/Outputs/xxx/`, where `xxx` is the name of the .yaml file used in the command, without the ".yaml" extension. If you want to test with your trained models, simply run the test commands described above with `--load_ckpt` set to the path of your trained model.
Visual Relation Detection
To train our scene graph generation model on the VRD dataset, run:
python preprocess.py
python trainer.py --num-encoder-layers 4 --num-decoder-layers 2 --nhead 4 --num-epochs 500 --learning-rate 1e-3
python preprocess_evaluation.py
python write_prediction.py
mv prediction.txt evaluation/vrd/
cd evaluation/vrd
python run_all_for_vrd.py prediction.txt
Visual Genome
To train our scene graph generation model on the VG dataset, download the json files from https://visualgenome.org/api/v0/api_home.html, put the extracted files under `data`, and then run:
python preprocess.py
python trainer.py --num-encoder-layers 4 --num-decoder-layers 2 --nhead 4 --num-epochs 2000 --learning-rate 1e-3
python preprocess_evaluation.py
python write_prediction.py
mv prediction.txt evaluation/vg/
cd evaluation/vg
python run_all.py prediction.txt
Acknowledgements
This repository builds on the ContrastiveLosses4VRD source code from Ji Zhang and the Neural-Motifs source code from Rowan Zellers.
Citation
If you find this code useful in your research, please cite the following paper:
@inproceedings{lu2021seq2seq,
  title={Context-aware Scene Graph Generation with Seq2Seq Transformers},
  author={Yichao Lu and Himanshu Rai and Jason Chang and Boris Knyazev and Guangwei Yu and Shashank Shekhar and Graham W. Taylor and Maksims Volkovs},
  booktitle={ICCV},
  year={2021}
}