Image Scene Graph Generation Benchmark

Overview

Scene Graph Benchmark in PyTorch 1.7

This project is based on maskrcnn-benchmark


Highlights

  • Upgrade to PyTorch 1.7
  • Multi-GPU training and inference
  • Batched inference: can perform inference using multiple images per batch per GPU.
  • Fast and flexible tsv dataset format
  • Removes the Faster R-CNN detector dependency: during relation head training, bounding boxes from any detector can be plugged in.
  • Provides pre-trained models for different scene graph detection algorithms (IMP, MSDN, GRCNN, Neural Motif, RelDN).
  • Provides bounding box level and relation level feature extraction functionalities
  • Provides large detector backbones (ResNeXt152)

Installation

Check INSTALL.md for installation instructions.

Model Zoo and Baselines

Pre-trained models can be found in SCENE_GRAPH_MODEL_ZOO.md

Visualization and Demo

We provide a helper class to simplify writing inference pipelines with pre-trained models (currently only objects and attributes are supported). Run the following commands:

# visualize VinVL object detection
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
python tools/demo/demo_image.py --config_file sgg_configs/vgattr/vinvl_x152c4.yaml --img_file demo/woman_fish.jpg --save_file output/woman_fish_x152c4.obj.jpg MODEL.WEIGHT pretrained_model/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 TEST.IGNORE_BOX_REGRESSION False

# visualize VinVL object-attribute detection
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
python tools/demo/demo_image.py --config_file sgg_configs/vgattr/vinvl_x152c4.yaml --img_file demo/woman_fish.jpg --save_file output/woman_fish_x152c4.attr.jpg --visualize_attr MODEL.WEIGHT pretrained_model/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 TEST.IGNORE_BOX_REGRESSION False

# visualize OpenImage scene graph generation by RelDN
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/sgg_model_zoo/sgg_oi_vrd_model_zoo/RX152FPN_reldn_oi_best.pth
python tools/demo/demo_image.py --config_file sgg_configs/vrd/R152FPN_vrd_reldn.yaml --img_file demo/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa.jpg --save_file output/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa.reldn_relation.jpg --visualize_relation MODEL.ROI_RELATION_HEAD.DETECTOR_PRE_CALCULATED False

# visualize Visual Genome scene graph generation by neural motif
python tools/demo/demo_image.py --config_file sgg_configs/vg_vrd/rel_danfeiX_FPN50_nm.yaml --img_file demo/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa.jpg --save_file demo/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa_vgnm.jpg --visualize_relation MODEL.ROI_RELATION_HEAD.DETECTOR_PRE_CALCULATED False DATASETS.LABELMAP_FILE "visualgenome/VG-SGG-dicts-danfeiX-clipped.json" DATA_DIR /home/penzhan/GitHub/maskrcnn-benchmark-1/datasets1 MODEL.ROI_RELATION_HEAD.USE_BIAS True MODEL.ROI_RELATION_HEAD.FILTER_NON_OVERLAP True MODEL.ROI_HEADS.DETECTIONS_PER_IMG 64 MODEL.ROI_RELATION_HEAD.SHARE_BOX_FEATURE_EXTRACTOR False MODEL.ROI_RELATION_HEAD.NEURAL_MOTIF.OBJ_LSTM_NUM_LAYERS 0 MODEL.ROI_RELATION_HEAD.NEURAL_MOTIF.EDGE_LSTM_NUM_LAYERS 2 TEST.IMS_PER_BATCH 2

Perform training

For the following examples to work, you need to first install this repo.

You will also need to download the dataset. Datasets can be downloaded with azcopy using the following command:

path/to/azcopy copy 'https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/datasets/TASK_NAME' <target folder> --recursive

TASK_NAME can be visualgenome or openimages_v5c.

We recommend symlinking the dataset path to datasets/ as follows:

# symlink the dataset
cd ~/github/maskrcnn-benchmark
mkdir -p datasets/openimages_v5c/
ln -s /vrd datasets/openimages_v5c/vrd

You can also prepare your own datasets.

Follow the TSV dataset creation instructions in tools/mini_tsv/README.md; a minimal sketch of the format is shown below.
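
As a rough illustration of the TSV layout only (file names, image keys, and labels below are made up; tools/mini_tsv/tsv_demo.py is the authoritative reference and also generates the auxiliary files and yaml the data loader expects):

import base64
import json

# made-up inputs for illustration only
images = {"img_0001": "demo/woman_fish.jpg"}
labels = {"img_0001": [{"rect": [10, 20, 200, 300], "class": "woman"}]}

with open("train.img.tsv", "w") as img_tsv, open("train.label.tsv", "w") as label_tsv:
    for key, path in images.items():
        with open(path, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("utf-8")
        # image TSV row: image key, base64-encoded image bytes
        img_tsv.write(f"{key}\t{encoded}\n")
        # label TSV row: image key, JSON list of boxes with "rect" and "class"
        label_tsv.write(f"{key}\t{json.dumps(labels[key])}\n")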

Single GPU training

python tools/train_sg_net.py --config-file "/path/to/config/file.yaml"

This should work out of the box and is very similar to multi-GPU training. The drawback is that it uses much more GPU memory: the configuration files specify a global batch size that is divided over the number of GPUs, so with a single GPU the batch size on that GPU is 4x larger, which might lead to out-of-memory errors. In that case you can lower the batch size, as shown below.
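
If you hit out-of-memory errors, one option is to override the global batch size (and scale the learning rate accordingly) on the command line. The values below are only illustrative; the SOLVER keys follow the usual maskrcnn-benchmark configuration:

python tools/train_sg_net.py --config-file "/path/to/config/file.yaml" SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025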

Multi-GPU training

Internally, we use torch.distributed.launch to launch multi-GPU training. This PyTorch utility spawns as many Python processes as the number of GPUs we want to use, and each process uses a single GPU.

export NGPUS=4
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_sg_net.py --config-file "path/to/config/file.yaml" 

Evaluation

You can test your model directly on a single GPU or on multiple GPUs. To evaluate relations, "relation_scores_all" needs to be included in TEST.TSV_SAVE_SUBSET. Here are a few example command lines for evaluating on 4 GPUs:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH 

# vg IMP evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_imp.yaml

# vg MSDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_msdn.yaml

# vg neural motif evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_nm.yaml

# vg GRCNN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_grcnn.yaml

# vg RelDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_reldn.yaml

# oi IMP evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_imp_bias_oi.yaml

# oi MSDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_msdn_bias_oi.yaml

# oi neural motif evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_motif_oi.yaml

# oi GRCNN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_grcnn_oi.yaml

# oi RelDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vrd/R152FPN_vrd_reldn.yaml

To evaluate in sgcls mode:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH MODEL.ROI_BOX_HEAD.FORCE_BOXES True MODEL.ROI_RELATION_HEAD.MODE "sgcls"

To evaluate in predcls mode:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH MODEL.ROI_RELATION_HEAD.MODE "predcls"

To evaluate with ground truth bbox and ground truth pairs:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH MODEL.ROI_RELATION_HEAD.FORCE_RELATIONS True

Abstractions

For more information on some of the main abstractions in our implementation, see ABSTRACTIONS.md.

Adding your own dataset

This implementation adds support for TSV-style datasets, and adding support for training on a new dataset can be done as follows:

from maskrcnn_benchmark.data.datasets.relation_tsv import RelationTSVDataset

class MyDataset(RelationTSVDataset):
    def __init__(self, yaml_file, extra_fields=(), transforms=None,
                 is_load_label=True, **kwargs):
        # reuse the TSV loading logic from RelationTSVDataset
        super(MyDataset, self).__init__(yaml_file, extra_fields, transforms, is_load_label, **kwargs)

    def your_own_function(self, idx, call=False):
        # override inherited methods or add your own functions this way
        pass

That's it. You can also add extra fields to the BoxList, such as segmentation masks (using structures.segmentation_mask.SegmentationMask), or even your own instance type, as sketched below.
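
For example, a minimal sketch of attaching extra per-box fields to a BoxList (the field names and values here are made up; the BoxList API comes from maskrcnn-benchmark):

import torch
from maskrcnn_benchmark.structures.bounding_box import BoxList

# one box in (x1, y1, x2, y2) pixel coordinates for an 800x600 image
boxes = torch.tensor([[10.0, 20.0, 200.0, 300.0]])
target = BoxList(boxes, (800, 600), mode="xyxy")
target.add_field("labels", torch.tensor([5]))               # per-box class index
target.add_field("attributes", torch.tensor([[2, 7, 0]]))   # any extra per-box annotation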

For a full example of how the VGTSVDataset is implemented, check maskrcnn_benchmark/data/datasets/vg_tsv.py.

Once you have created your dataset, it needs to be added in a couple of places:

Adding your own evaluation

To enable your dataset for testing, add a corresponding if statement in maskrcnn_benchmark/data/datasets/evaluation/__init__.py:

if isinstance(dataset, datasets.MyDataset):
    return your_evaluation(**args)
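
A hedged sketch of what your_evaluation might look like (the argument names mirror how existing evaluations in that file are invoked and should be treated as an assumption):

def your_evaluation(dataset, predictions, output_folder, **kwargs):
    # compare `predictions` against `dataset`'s ground truth,
    # write any result files into `output_folder`, and return a metrics dict
    results = {}
    return results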

VinVL Feature extraction

The output features are encoded as base64.

# extract vision features with VinVL object-attribute detection model
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
python tools/test_sg_net.py --config-file sgg_configs/vgattr/vinvl_x152c4.yaml TEST.IMS_PER_BATCH 2 MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 DATA_DIR "../maskrcnn-benchmark-1/datasets1" TEST.IGNORE_BOX_REGRESSION True MODEL.ATTRIBUTE_ON True

To extract relation features (the union bounding box's feature), set TEST.OUTPUT_RELATION_FEATURE to True in the yaml file and add relation_feature to TEST.TSV_SAVE_SUBSET.

To extract bounding box features, set TEST.OUTPUT_FEATURE to True in the yaml file and add feature to TEST.TSV_SAVE_SUBSET. A sketch of how to decode the saved features is shown below.
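
To read the saved features back, decode the base64 strings into float32 arrays. This is only a minimal sketch: it assumes each output TSV row is an image key followed by a JSON list of detections, each carrying a base64-encoded "feature"; adjust the file name and keys to match your TEST.TSV_SAVE_SUBSET.

import base64
import json

import numpy as np

with open("predictions.tsv") as f:
    for line in f:
        image_key, detections = line.rstrip("\n").split("\t")
        for det in json.loads(detections):
            # base64 string -> float32 feature vector for this box
            feat = np.frombuffer(base64.b64decode(det["feature"]), dtype=np.float32)
            print(image_key, det.get("class"), feat.shape)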

Troubleshooting

If you have issues running or compiling this code, we have compiled a list of common issues in TROUBLESHOOTING.md. If your issue is not present there, please feel free to open a new issue.

Citations

Please consider citing this project in your publications if it helps your research. The following is a BibTeX reference. The BibTeX entry requires the url LaTeX package.

@misc{han2021image,
      title={Image Scene Graph Generation (SGG) Benchmark}, 
      author={Xiaotian Han and Jianwei Yang and Houdong Hu and Lei Zhang and Jianfeng Gao and Pengchuan Zhang},
      year={2021},
      eprint={2107.12604},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This project is released under the MIT license. See LICENSE for additional details.

Acknowledgement

Comments
  • Slow feature extraction compared to bottom-up-attention

    Hi, thanks for the great work and open-sourcing this project.

    I'm excited to try VinVL since, as written in the paper, it promises faster feature extraction than bottom-up-attention.

    I have created my own TSV file using tsv_demo.py and ran tools/test_sg_net.py to do feature extraction. The sad thing is that the feature extraction runs quite slowly. Right now I'm using PyTorch 1.7 on Debian 10 with 1 Nvidia T4. The feature extraction process took 9 seconds / 4 images.

    I used bottom-up-attention from https://github.com/airsplay/py-bottom-up-attention and https://github.com/peteanderson80/bottom-up-attention while using OSCAR on the same dataset. These repos give much faster feature extraction times (the first needs 2.7 seconds / 8 images, while the original Caffe bottom-up-attention took less than 1 second per image) on a similar machine. This contradicts what is written in your paper.

    Here are some key configs that I'm using while running tools/test_sg_net.py:

    TEST:
        IMS_PER_BATCH: 4
        IGNORE_BOX_REGRESSION: True
        SKIP_PERFORMANCE_EVAL: True
        SAVE_PREDICTIONS: True
        SAVE_RESULTS_TO_TSV: True
        TSV_SAVE_SUBSET: ['rect', 'class', 'conf', 'feature']
        GATHER_ON_CPU: True
        OUTPUT_FEATURE : True
    

    I checked nvidia-smi and it shows my GPU is working.

    Does anyone else have this issue?

    opened by vinson2233 7
  • Conflicting version pytorch and torchvision

    I have followed the instructions in INSTALL.md, but it shows a compatibility error between the pytorch and torchvision versions.

    Am I missing something here?

    opened by rafaelpadilla 5
  • Couldn't load custom C++ ops.

    I installed the environment as required (pytorch==1.7.1, torchvision==0.8.2, torchaudio==0.7.2, cudatoolkit=10.1), but when I run test_sg_net.py, an error occurred:
    Runtime error occurred in Image Ids: 0,1 Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.
    Can anyone help me?
    Thanks

    opened by QC-LY 5
  • VinVL can model the relation prediction?

    I found that VinVL's object and attribute label sets are much bigger. So how can VinVL be used for predicate classification? Right now the project only provides 150 object classes, but Visual Genome + Faster R-CNN can detect 1370 object classes. That is a big difference.

    opened by alice-cool 5
  • Could you please show me an easy way to generate features for single images?

    The instructions seem complicated; I followed the guide, but only bboxes are created, without features. Could you give us a reproducible example of using the demo to extract features?

    opened by Edwardmark 4
  • Download vinvl pre-training model

    I failed to use the azcopy command to download the VinVL pre-training model. Could you show me how to download the pretrained models? Thank you!

    ./azcopy cp https://penzhanwu2.blob.core.windows.net/results/vinvl/od_models/vinvl_vg_x152c4.pth .

    INFO: Scanning...

    failed to perform copy command due to error: Login Credentials missing. No SAS token or OAuth token is present and the resource is not public.

    opened by yanan1989 3
  • RuntimeError: CUDA error: invalid device function

    When I try to run

    python tools/test_sg_net.py --config-file sgg_configs/vgattr/vinvl_x152c4.yaml TEST.IMS_PER_BATCH 1 MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 \
    MODEL.ROI_HEADS.SCORE_THRESH 0.2 DATA_DIR /my_path_to_prepard_tsv/dataset/tsv TEST.IGNORE_BOX_REGRESSION True MODEL.ATTRIBUTE_ON True TEST.OUTPUT_FEATURE True
    

    with environment

    PyTorch version: 1.4.0
    Is debug build: No
    CUDA used to build PyTorch: 10.1
    
    OS: Ubuntu 16.04.7 LTS
    GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
    CMake version: version 3.5.1
    
    Python version: 3.7
    Is CUDA available: Yes
    CUDA runtime version: 10.1.243
    GPU models and configuration: 
    GPU 0: Tesla V100-SXM2-32GB
    GPU 1: Tesla V100-SXM2-32GB
    GPU 2: Tesla V100-SXM2-32GB
    GPU 3: Tesla V100-SXM2-32GB
    GPU 4: Tesla V100-SXM2-32GB
    GPU 5: Tesla V100-SXM2-32GB
    GPU 6: Tesla V100-SXM2-32GB
    GPU 7: Tesla V100-SXM2-32GB
    
    Nvidia driver version: 418.67
    cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
    
    Versions of relevant libraries:
    [pip3] numpy==1.19.2
    [pip3] torch==1.4.0
    [pip3] torchvision==0.5.0
    [conda] blas                      1.0                         mkl  
    [conda] mkl                       2020.2                      256  
    [conda] mkl-service               2.3.0            py37he8ac12f_0  
    [conda] mkl_fft                   1.3.0            py37h54f3939_0  
    [conda] mkl_random                1.1.1            py37h0573a6f_0  
    [conda] pytorch                   1.4.0           py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch
    [conda] torchvision               0.5.0                py37_cu101    pytorch
            Pillow (8.2.0)
    
    

    I encountered the following error:

    RuntimeError: CUDA error: invalid device function (launch_kernel at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/ATen/native/cuda/Loops.cuh:103)
    frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f3f8ee64627 in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libc10.so)
    frame #1: void at::native::gpu_index_kernel<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>), &(void at::native::index_kernel_impl<at::native::OpaqueType<8> >(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>)), 1u>> >(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>, __nv_dl_wrapper_t<__nv_dl_tag<void (*)(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>), &(void at::native::index_kernel_impl<at::native::OpaqueType<8> >(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>)), 1u>> const&) + 0x78d (0x7f3f9670368d in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #2: <unknown function> + 0x571bf32 (0x7f3f966fcf32 in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #3: <unknown function> + 0x571c298 (0x7f3f966fd298 in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #4: <unknown function> + 0x16957eb (0x7f3f926767eb in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #5: at::native::index(at::Tensor const&, c10::ArrayRef<at::Tensor>) + 0x47e (0x7f3f926725ae in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #6: <unknown function> + 0x1c0155a (0x7f3f92be255a in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #7: <unknown function> + 0x1c06023 (0x7f3f92be7023 in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #8: <unknown function> + 0x3820d1a (0x7f3f94801d1a in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #9: <unknown function> + 0x1c06023 (0x7f3f92be7023 in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #10: at::Tensor::index(c10::ArrayRef<at::Tensor>) const + 0x191 (0x7f3fc1465931 in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    frame #11: nms_cuda(at::Tensor, float) + 0x7e8 (0x7f3f6982407b in ./maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
    frame #12: nms(at::Tensor const&, at::Tensor const&, float) + 0x790 (0x7f3f697eabb0 in ./maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
    frame #13: <unknown function> + 0x53b97 (0x7f3f697fbb97 in ./maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
    frame #14: <unknown function> + 0x5004d (0x7f3f697f804d in ./maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
    <omitting python frames>
    

    I tried setting up my env both with option 1 and with the docker image; both environments give me the same error. If anyone else has the same issue, please guide me.

    opened by mohaoran93 3
  • demo_image.py not working

    Hi,

    I am running the following code:

    python tools/demo/demo_image.py --config_file sgg_configs/vgattr/vinvl_x152c4.yaml --img_file women_fish.jpg --save_file output/woman_fish_x152c4.obj.jpg MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 DATA_DIR "." TEST.IGNORE_BOX_REGRESSION False
    

    Here is the error:

        rel_subj_centers = [r['subj_center'] for r in rel_dets]
    UnboundLocalError: local variable 'rel_dets' referenced before assignment
    

    I believe the bug is in line https://github.com/microsoft/scene_graph_benchmark/blob/f91725d8b831ba7ee52a583eab3317fbbeffbfe6/tools/demo/demo_image.py#L118

    opened by JXT218 3
  • ModelZoo contains broken links

    Hi there,

    It looks like the ModelZoo contains broken links for all the OpenImages models such as:

    1. https://penzhanwu2.blob.core.windows.net/phillytools/data/maskrcnn/pretrained_model/sgg_model_zoo/oi_R152_nm.pth
    2. https://penzhanwu2.blob.core.windows.net/phillytools/data/maskrcnn/pretrained_model/sgg_model_zoo/oi_R152_grcnn.pth
    3. https://penzhanwu2.blob.core.windows.net/phillytools/data/maskrcnn/pretrained_model/sgg_model_zoo/oi_R152_reldn.pth

    Would it be possible to fix them!?

    Thanks, Alessandro

    opened by aleSuglia 2
  • Several issues in extracting VinVL Feature extraction

    I installed your environment step by step using option 1 and then ran the command below:

    # extract vision features with VinVL object-attribute detection model
    # pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
    # the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
    python tools/test_sg_net.py --config-file sgg_configs/vgattr/vinvl_x152c4.yaml TEST.IMS_PER_BATCH 2 MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 DATA_DIR "../maskrcnn-benchmark-1/datasets1" TEST.IGNORE_BOX_REGRESSION True MODEL.ATTRIBUTE_ON True
    

    There are several issues:

    1. I cannot find the code that loads the pretrained model parameters for "AttrRCNN". Although the command has a pretrained model path, I cannot find a concrete torch.load() call while debugging. So I wonder whether I need to add torch.load() myself when I run the above command.

    2. The "self.training" in "AttrRCNN" is extended from "torch.nn.modules.module.py", which is set as True by default. But in run the command to extract VinVL features by the beginning command, it seems that it should be False and I have to overwrite each init functions of AttrRCNN, its "self.rpn", and "self.roi_heads" as below,

     proposals, proposal_losses = self.rpn(images, features, targets, is_training = self.training)
      x, predictions, detector_losses = self.roi_heads(features,  proposals, targets, is_training = self.training) 
    
    3. Instead of PyTorch 1.4, I use PyTorch 1.7, but it always gives runtime errors for several in-place operations, such as the code below in "bounding_box.py":

    def clip_to_image(self, remove_empty=True):
        TO_REMOVE = 1
        self.bbox[:, 0].clamp_(min=0, max=self.size[0] - TO_REMOVE)
        self.bbox[:, 1].clamp_(min=0, max=self.size[1] - TO_REMOVE)
        self.bbox[:, 2].clamp_(min=0, max=self.size[0] - TO_REMOVE)
        self.bbox[:, 3].clamp_(min=0, max=self.size[1] - TO_REMOVE)
    

    The error is as follows:

      File "/home/jfhe/Documents/MountHe/jfhe/mm_dialogue/MM_Dialogue/scene_graph_benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 188, in _forward_test
        boxes = self.box_selector_test(anchors, objectness, rpn_box_regression)
      File "/home/jfhe/anaconda3/envs/JD2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/jfhe/Documents/MountHe/jfhe/mm_dialogue/MM_Dialogue/scene_graph_benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 140, in forward
        sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
      File "/home/jfhe/Documents/MountHe/jfhe/mm_dialogue/MM_Dialogue/scene_graph_benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 114, in forward_for_single_feature_map
        boxlist = boxlist.clip_to_image(remove_empty=False)
      File "/home/jfhe/Documents/MountHe/jfhe/mm_dialogue/MM_Dialogue/scene_graph_benchmark/maskrcnn_benchmark/structures/bounding_box.py", line 217, in clip_to_image
        self.bbox[:, 1].clamp_(min=0, max=self.size[1] - TO_REMOVE)
    RuntimeError: Output 0 of UnbindBackward is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.
    

    I worked around them with "with torch.no_grad()", but it feels strange: if I need to fine-tune the model, these bugs will come back once "with torch.no_grad()" is removed.

    4. Also, I agree with the question in https://github.com/microsoft/scene_graph_benchmark/issues/25. Could you please provide a simpler way to extract the VinVL features directly? It would help the community a lot, and we will definitely cite your work.
    opened by he159ok 2
  • Broken links to VinVL model and associated labelmaps

    Hi! These links are broken (resource not found error): https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json

    opened by stopmosk 2
  • Emergency, the pretrained model download links are lost

    It seems that links starting with "https://penzhanwu2.blob.core.windows.net/" cannot be accessed. Can anyone fix this problem? Thanks very much!

    opened by yxding95 1
  • [VinVL] Support for ONNX

    Hello there,

    Thank you so much for this great repository. I've been using VinVL for a while now and I'm really pleased with its accuracy. However, considering its size, I was wondering whether you have any plans to support ONNX to speed up inference. I tried to enable it myself but got some very strange errors during some operations.

    >>> extractor.to_onnx("storage/model/vinvl_vg_x152c4_simbot.onnx", export_params=True, input_sample=input_sample)
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
      return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/structures/bounding_box.py:21: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
      bbox = torch.as_tensor(bbox, dtype=torch.float32, device=device)
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/structures/bounding_box.py:26: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      if bbox.size(-1) != 4:
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/rpn/inference.py:94: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      pre_nms_top_n = min(self.pre_nms_top_n, num_anchors)
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/rpn/inference.py:111: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
      for proposal, score, im_shape in zip(proposals, objectness, image_shapes):
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py:169: TracerWarning: torch.Tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
      boxlist_empty.add_field("scores", torch.Tensor([]).to(device))
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py:206: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
      if len(inds)>0:
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py:131: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      while new_boxlist.bbox.shape[0] < \
    /home/ubuntu/emma/perception/src/emma_perception/models/vinvl_extractor.py:55: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
      out = zip(batch["ids"], batch["width"], batch["height"], predictions, cnn_features)
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/structures/bounding_box.py:99: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      ratios = tuple(float(s) / float(s_orig) for s, s_orig in zip(size, self.size))
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py:2815: UserWarning: Exporting aten::index operator of advanced indexing in opset 9 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
      warnings.warn("Exporting aten::index operator of advanced indexing in opset " +
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
        return func(*args, **kwargs)
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1899, in to_onnx
        torch.onnx.export(self, input_sample, file_path, **kwargs)
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/__init__.py", line 316, in export
        return utils.export(model, args, f, export_params, verbose, training,
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 107, in export
        _export(model, args, f, export_params, verbose, training, input_names, output_names,
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 724, in _export
        _model_to_graph(model, args, verbose, input_names,
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 497, in _model_to_graph
        graph = _optimize_graph(graph, operator_export_type,
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 216, in _optimize_graph
        graph = torch._C._jit_pass_onnx(graph, operator_export_type)
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/__init__.py", line 373, in _run_symbolic_function
        return utils._run_symbolic_function(*args, **kwargs)
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 1032, in _run_symbolic_function
        return symbolic_fn(g, *inputs, **attrs)
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py", line 1866, in slice
        raise RuntimeError("step!=1 is currently not supported")
    RuntimeError: step!=1 is currently not supported
    

    Do you have any advice?

    opened by aleSuglia 0
  • I got nan losses on results

    INFO:maskrcnn_benchmark.trainer:eta: 10:55:48 iter: 5100 loss: nan (nan) loss_obj_classifier: 0.0000 (0.0000) loss_pred_classifier: nan (nan) time: 1.1246 (1.1275) data: 0.0462 (0.0644) lr: 0.000467 max mem: 9431

    SOLVER:
        BASE_LR: 0.001
        WEIGHT_DECAY: 0.0001
        MAX_ITER: 40000
        STEPS: (50000,)
        IMS_PER_BATCH: 16
        CHECKPOINT_PERIOD: 10000

    opened by tiantao911 1
  • Is the pre-trained Resnet-50 object detection model available?

    Hi, thank you for the nice repository! I had a question regarding the training of the VisualGenome models. The following is written in the paper:

    "The object detection model is a ResNet50-FPN detector trained on Visucal Genome"

    I have two questions:

    1. I cannot figure out where to find the pretrained weights of this model. Or are we expected to train it ourselves?
    2. Why is the backbone ResNet different for Visual Genome and OpenImages?

    Thanks for the help!

    opened by VSJMilewski 0
  • Add `$schema` to `cgmanifest.json`

    This pull request adds the JSON schema for cgmanifest.json.

    FAQ

    Why?

    A JSON schema helps you to ensure that your cgmanifest.json file is valid. JSON schema validation is a built-in feature in most modern IDEs like Visual Studio and Visual Studio Code. Most modern IDEs also provide code completion for JSON schemas.

    How can I validate my cgmanifest.json file?

    Most modern IDEs like Visual Studio and Visual Studio Code have a built-in feature to validate JSON files. You can also use this small script to validate your cgmanifest.json file.

    Why does it suggest camel case for the properties?

    Component Detection is able to read camel case and pascal case properties. However, the JSON schema doesn't have a case-insensitive mode. We therefore suggest camel case as it's the most common format for JSON.

    Why is the diff so large?

    To deserialize the cgmanifest.json file, we use JSON.parse(). However, to serialize the JSON again we use prettier. We found that, in general, it gave smaller diffs than the default JSON.stringify() function.

    opened by JamieMagee 0