A brand new hub for Scene Graph Generation methods based on MMdetection (2021). The pipeline from detection and scene graph generation to downstream tasks (e.g., image captioning) is supported. PyTorch implementations of HetH (ECCV 2020) and TopicSG (ICCV 2021) are included.

Overview

MMSceneGraph


Introduction

MMSceneGraph is an open-source code hub for scene graph generation, as well as downstream tasks built on scene graphs, implemented in PyTorch. The frontend object detector is supported by open-mmlab/mmdetection.

demo image

Major features

  • Modular design

    We decompose the framework into different components, so one can easily construct a customized scene graph generation framework by combining different modules (a hypothetical config sketch follows this feature list).

  • Support of multiple frameworks out of the box

    The toolbox directly supports popular and contemporary detection frameworks such as Faster R-CNN and Mask R-CNN.

  • Visualization support

    The visualization of the groundtruth/predicted scene graph is integrated into the toolbox.
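
    As a rough illustration only (this is not the toolbox's built-in visualizer, and the triplets are hand-written), a predicted scene graph can be drawn generically with networkx and matplotlib:

    # Generic sketch: draw (subject, predicate, object) triplets as a directed graph.
    # Illustrative only; not the toolbox's integrated visualization code.
    import matplotlib.pyplot as plt
    import networkx as nx

    triplets = [("man", "riding", "horse"), ("man", "wearing", "hat"), ("horse", "on", "grass")]

    G = nx.DiGraph()
    for subj, pred, obj in triplets:
        G.add_edge(subj, obj, label=pred)          # nodes are objects, edges carry predicates

    pos = nx.spring_layout(G, seed=0)              # deterministic layout
    nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=1500)
    nx.draw_networkx_edge_labels(G, pos, edge_labels=nx.get_edge_attributes(G, "label"))
    plt.show()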

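As a rough illustration of the modular design, an mmdetection-style Python config composes a detector frontend with a pluggable relation head. The keys and module names below (HetHead, VisualGenomeDataset, the category counts, etc.) are hypothetical placeholders rather than the repo's actual registry entries; real configs live under configs/scene_graph/.

# Hypothetical mmdetection-1.x-style config sketch (illustrative names only).
model = dict(
    type='MaskRCNN',                              # detector frontend, e.g. Faster/Mask R-CNN
    backbone=dict(type='ResNeXt', depth=101, groups=64, base_width=4),
    neck=dict(type='FPN', in_channels=[256, 512, 1024, 2048], out_channels=256, num_outs=5),
    relation_head=dict(                           # swappable SGG module, e.g. HetH, Motifs, VCTree
        type='HetHead',
        num_classes=151,                          # VG150: 150 object categories + background
        num_predicates=51,                        # VG150: 50 predicates + background
    ),
)
dataset_type = 'VisualGenomeDataset'              # hypothetical dataset registry name
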
License

This project is released under the MIT license.

Changelog

Please refer to CHANGELOG.md for details.

Benchmark and model zoo

The original object detection results and models provided by mmdetection are available in the model zoo. The models for scene graph generation are not available yet.

Supported methods and datasets

Supported SGG (VRD) methods:

  • Neural Motifs (CVPR'2018)
  • VCTree (CVPR'2019)
  • TDE (CVPR'2020)
  • VTransE (CVPR'2017)
  • IMP (CVPR'2017)
  • KERN (CVPR'2019)
  • GPSNet (CVPR'2020)
  • HetH (ECCV'2020, ours)
  • TopicSG (ICCV'2021, ours)

Supported salient object detection methods:

  • R3Net (IJCAI'2018)
  • SCRN (ICCV'2019)

Supported image captioning methods:

  • bottom-up (CVPR'2018)
  • XLAN (CVPR'2020)

Supported datasets:

  • Visual Genome: VG150 (CVPR'2017)
  • VRD (ECCV'2016)
  • Visual Genome: VG200/VG-KR (ours)
  • MSCOCO (for object detection, image caption)
  • RelCap (from VG and COCO, ours)

Installation

Our project is built on mmdetection 1.x (which differs somewhat from the current master version 2.x), so please refer to INSTALL.md for installation. If you want to use mmdetection 2.x, please refer to mmdetection/get_start.md.
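
A quick way to confirm that the 1.x stack is the one actually being imported (the environment report in the comments further below lists mmcv 0.4.3 and mmdetection 1.1.0) is a short version check; this assumes the packages from INSTALL.md are already installed:

# Sanity check of the installed stack; prints versions only.
import torch
import mmcv
import mmdet

print("PyTorch:", torch.__version__)
print("MMCV:", mmcv.__version__)          # the issue report below was produced with mmcv 0.4.3
print("MMDetection:", mmdet.__version__)  # should report a 1.x version for this project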

Getting Started

Please refer to GETTING_STARTED.md for the basic usage of the project. It will be updated continuously.

Acknowledgement

We appreciate the contributors of the mmdetection project and Scene-Graph-Benchmark.pytorch, which inspired our design.

Citation

If you find this code hub or our work useful in your research, please consider citing:

@inproceedings{wang2021topic,
  title={Topic Scene Graph Generation by Attention Distillation from Caption},
  author={Wang, Wenbin and Wang, Ruiping and Chen, Xilin},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  pages={15900--15910},
  month = {October},
  year={2021}
}


@inproceedings{wang2020sketching,
  title={Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation},
  author={Wang, Wenbin and Wang, Ruiping and Shan, Shiguang and Chen, Xilin},
  booktitle={Proceedings of European Conference on Computer Vision (ECCV)},
  pages={222--239},
  year={2020},
  volume={12358},
  doi={10.1007/978-3-030-58601-0_14},
  publisher={Springer}
}

@inproceedings{Wang_2019_CVPR,
  author={Wang, Wenbin and Wang, Ruiping and Shan, Shiguang and Chen, Xilin},
  title={Exploring Context and Visual Pattern of Relationship for Scene Graph Generation},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={8188--8197},
  month={June},
  address={Long Beach, California, USA},
  doi={10.1109/CVPR.2019.00838},
  year={2019}
}
Comments
  • TypeError: unsupported operand type(s) for +: 'Tensor' and 'tuple'

    When I run "python tools/train.py configs/scene_graph/VG_SgDet_heth_area_mask_X_rcnn_x101_64x4d_fpn_1x.py", I get the following error:

    File "/home/XXX/MMSceneGraph/mmdet/models/relation_heads/het_head.py", line 117, in forward
      roi_feats, union_feats, det_result = self.frontend_features(img, img_meta, det_result, gt_result)
    File "/home/XXX/MMSceneGraph/mmdet/models/relation_heads/relation_head.py", line 229, in frontend_features
      return roi_feats + union_feats + (det_result,)
    TypeError: unsupported operand type(s) for +: 'Tensor' and 'tuple'

    However, running "python tools/train.py configs/scene_graph/VG_SgCls_heth_area_mask_X_rcnn_x101_64x4d_fpn_1x.py" works fine. How can I solve this? Thanks.

    opened by qianqianderizi 5
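
    For reference, the failing line return roi_feats + union_feats + (det_result,) only behaves as intended when roi_feats and union_feats are tuples (so every "+" is tuple concatenation); with plain Tensors, the first "+" becomes elementwise addition and the second becomes an unsupported Tensor-plus-tuple operation. A possible guard is sketched below; the helper names are hypothetical and the change is unverified against the repo's intended behavior.

    # Hypothetical, unverified guard: normalize bare Tensors to 1-tuples so that "+"
    # always means tuple concatenation and the caller's 3-way unpacking still works.
    import torch

    def _as_tuple(x):
        return x if isinstance(x, tuple) else (x,)

    def pack_frontend_outputs(roi_feats, union_feats, det_result):
        return _as_tuple(roi_feats) + _as_tuple(union_feats) + (det_result,)

    # Placeholder shapes; in SgDet-like runs the features arrive as plain Tensors.
    roi, union, det = pack_frontend_outputs(torch.zeros(4, 256), torch.zeros(4, 256), {"bboxes": None})
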
  • DataContainer error at training stage

    Describe the issue: training Faster R-CNN on the VG dataset fails with the error TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not DataContainer

    Reproduction

    1. What command, code, or script did you run?
       CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=12340 ./tools/train.py ./configs/visualgenome/faster_rcnn_x101_64x4d_fpn_1x.py --launcher pytorch

    2. Did you make any modifications to the code? Did you understand what you have modified? No, I have not.

    Environment

    Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
    CUDA available: True
    CUDA_HOME: /usr/local/cuda-11.3
    NVCC: Build cuda_11.3.r11.3/compiler.29920130_0
    GPU 0: NVIDIA A40
    GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    PyTorch: 1.8.1+cu111
    PyTorch compiling details: PyTorch built with:
      - GCC 7.3
      - C++ Version: 201402
      - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 11.1
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
      - CuDNN 8.0.5
      - Magma 2.5.2
      - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 
    
    TorchVision: 0.9.1+cu111
    OpenCV: 4.6.0
    MMCV: 0.4.3
    MMDetection: 1.1.0+126af87
    MMDetection Compiler: GCC 9.4
    MMDetection CUDA Compiler: 11.3
    

    Error traceback

    Traceback (most recent call last):
      File "/home/stud/zhangya/.pycharm_helpers/pydev/pydevd.py", line 1491, in _exec
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "/home/stud/zhangya/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "/home/stud/zhangya/repo/MMSceneGraph-master/tools/train.py", line 165, in <module>
        main()
      File "/home/stud/zhangya/repo/MMSceneGraph-master/tools/train.py", line 154, in main
        train_detector(
      File "/home/stud/zhangya/repo/MMSceneGraph-master/mmdet/apis/train.py", line 190, in train_detector
        runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
      File "/home/stud/zhangya/repo/MMSceneGraph-master/mmcv-0.4.3/mmcv/runner/runner.py", line 359, in run
        epoch_runner(data_loaders[i], **kwargs)
      File "/home/stud/zhangya/repo/MMSceneGraph-master/mmcv-0.4.3/mmcv/runner/runner.py", line 262, in train
        outputs = self.batch_processor(
      File "/home/stud/zhangya/repo/MMSceneGraph-master/mmdet/apis/train.py", line 77, in batch_processor
        losses = model(**data)
      File "/home/stud/zhangya/miniconda3/envs/mmsg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/stud/zhangya/miniconda3/envs/mmsg/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
        output = self.module(*inputs[0], **kwargs[0])
      File "/home/stud/zhangya/miniconda3/envs/mmsg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/stud/zhangya/repo/MMSceneGraph-master/mmdet/core/fp16/decorators.py", line 49, in new_func
        return old_func(*args, **kwargs)
      File "/home/stud/zhangya/repo/MMSceneGraph-master/mmdet/models/detectors/base.py", line 192, in forward
        return self.forward_train(img, img_meta, **kwargs)
      File "/home/stud/zhangya/repo/MMSceneGraph-master/mmdet/models/detectors/two_stage.py", line 227, in forward_train
        x = self.extract_feat(img)
      File "/home/stud/zhangya/repo/MMSceneGraph-master/mmdet/models/detectors/two_stage.py", line 129, in extract_feat
        x = self.backbone(img)
      File "/home/stud/zhangya/miniconda3/envs/mmsg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/stud/zhangya/repo/MMSceneGraph-master/mmdet/models/backbones/resnet.py", line 496, in forward
        x = self.conv1(x)
      File "/home/stud/zhangya/miniconda3/envs/mmsg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/stud/zhangya/miniconda3/envs/mmsg/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 399, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/home/stud/zhangya/miniconda3/envs/mmsg/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 395, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not DataContainer
    

    Bug fix: I checked the solution provided in https://github.com/open-mmlab/mmdetection/issues/2782, but the model is indeed wrapped by MMDistributedDataParallel.

    Any solutions? Thank you in advance!

    opened by ZYao720 0
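
    For context on the error type itself: mmcv wraps batched inputs in DataContainer objects, which are normally unwrapped back into Tensors by MMDataParallel / MMDistributedDataParallel's scatter step before they reach the model. The snippet below is a minimal, standalone illustration with placeholder shapes, not a diagnosis of this particular setup.

    # Minimal illustration: a DataContainer is only a wrapper; a layer needs the
    # underlying Tensor (recovered here via .data for brevity; during training it is
    # normally recovered by the parallel wrapper's scatter step).
    import torch
    from mmcv.parallel import DataContainer

    conv = torch.nn.Conv2d(3, 16, kernel_size=3)
    img = DataContainer(torch.zeros(2, 3, 32, 32), stack=True)

    x = img.data      # the wrapped Tensor
    y = conv(x)       # works; conv(img) would fail with the conv2d()/DataContainer TypeError
    print(y.shape)
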
  • About your pre-trained model.

    Hello, this is an interesting project, but I could not find your pre-trained model when reproducing your work. Can you provide us with the model "./experiments/VG_COCOremap_MASKTRANS_mask_rcnn_x101_64x4d_fpn_1x/latest.pth"?

    opened by Yun-960 1
Owner

Kenneth-Wong (http://www.kennethwong.tech/)