Image Scene Graph Generation Benchmark

Overview

Scene Graph Benchmark in PyTorch 1.7

This project is based on maskrcnn-benchmark


Highlights

  • Upgrade to PyTorch 1.7
  • Multi-GPU training and inference
  • Batched inference: can perform inference using multiple images per batch per GPU.
  • Fast and flexible tsv dataset format
  • Removes the Faster R-CNN detector dependency: during relation head training, bounding boxes from any detector can be plugged in.
  • Provides pre-trained models for different scene graph detection algorithms (IMP, MSDN, GRCNN, Neural Motif, RelDN).
  • Provides bounding box level and relation level feature extraction functionalities
  • Provides large detector backbones (ResNeXt152)

Installation

Check INSTALL.md for installation instructions.

Model Zoo and Baselines

Pre-trained models can be found in SCENE_GRAPH_MODEL_ZOO.md

Visualization and Demo

We provide a helper class to simplify writing inference pipelines with pre-trained models (currently only objects and attributes are supported). Run the following commands:

# visualize VinVL object detection
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
python tools/demo/demo_image.py --config_file sgg_configs/vgattr/vinvl_x152c4.yaml --img_file demo/woman_fish.jpg --save_file output/woman_fish_x152c4.obj.jpg MODEL.WEIGHT pretrained_model/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 TEST.IGNORE_BOX_REGRESSION False

# visualize VinVL object-attribute detection
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
python tools/demo/demo_image.py --config_file sgg_configs/vgattr/vinvl_x152c4.yaml --img_file demo/woman_fish.jpg --save_file output/woman_fish_x152c4.attr.jpg --visualize_attr MODEL.WEIGHT pretrained_model/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 TEST.IGNORE_BOX_REGRESSION False

# visualize OpenImage scene graph generation by RelDN
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/sgg_model_zoo/sgg_oi_vrd_model_zoo/RX152FPN_reldn_oi_best.pth
python tools/demo/demo_image.py --config_file sgg_configs/vrd/R152FPN_vrd_reldn.yaml --img_file demo/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa.jpg --save_file output/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa.reldn_relation.jpg --visualize_relation MODEL.ROI_RELATION_HEAD.DETECTOR_PRE_CALCULATED False

# visualize Visual Genome scene graph generation by neural motif
python tools/demo/demo_image.py --config_file sgg_configs/vg_vrd/rel_danfeiX_FPN50_nm.yaml --img_file demo/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa.jpg --save_file demo/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa_vgnm.jpg --visualize_relation MODEL.ROI_RELATION_HEAD.DETECTOR_PRE_CALCULATED False DATASETS.LABELMAP_FILE "visualgenome/VG-SGG-dicts-danfeiX-clipped.json" DATA_DIR /home/penzhan/GitHub/maskrcnn-benchmark-1/datasets1 MODEL.ROI_RELATION_HEAD.USE_BIAS True MODEL.ROI_RELATION_HEAD.FILTER_NON_OVERLAP True MODEL.ROI_HEADS.DETECTIONS_PER_IMG 64 MODEL.ROI_RELATION_HEAD.SHARE_BOX_FEATURE_EXTRACTOR False MODEL.ROI_RELATION_HEAD.NEURAL_MOTIF.OBJ_LSTM_NUM_LAYERS 0 MODEL.ROI_RELATION_HEAD.NEURAL_MOTIF.EDGE_LSTM_NUM_LAYERS 2 TEST.IMS_PER_BATCH 2

Perform training

For the following examples to work, you need to first install this repo.

You will also need to download the dataset. Datasets can be downloaded with azcopy using the following command:

path/to/azcopy copy 'https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/datasets/TASK_NAME' <target folder> --recursive

TASK_NAME can be visualgenome or openimages_v5c.

We recommend symlinking the dataset path to datasets/ as follows:

# symlink the dataset
cd ~/github/maskrcnn-benchmark
mkdir -p datasets/openimages_v5c/
ln -s /vrd datasets/openimages_v5c/vrd

You can also prepare your own datasets.

Follow the TSV dataset creation instructions in tools/mini_tsv/README.md; a minimal sketch of the format is shown below.
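
As a rough illustration of the TSV layout only (file names, image keys, and labels below are made up; tools/mini_tsv/tsv_demo.py is the authoritative reference and also generates the auxiliary files and yaml the data loader expects):

import base64
import json

# made-up inputs for illustration only
images = {"img_0001": "demo/woman_fish.jpg"}
labels = {"img_0001": [{"rect": [10, 20, 200, 300], "class": "woman"}]}

with open("train.img.tsv", "w") as img_tsv, open("train.label.tsv", "w") as label_tsv:
    for key, path in images.items():
        with open(path, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("utf-8")
        # image TSV row: image key, base64-encoded image bytes
        img_tsv.write(f"{key}\t{encoded}\n")
        # label TSV row: image key, JSON list of boxes with "rect" and "class"
        label_tsv.write(f"{key}\t{json.dumps(labels[key])}\n")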

Single GPU training

python tools/train_sg_net.py --config-file "/path/to/config/file.yaml"

This should work out of the box and is very similar to multi-GPU training. The drawback is that it uses much more GPU memory: the configuration files specify a global batch size that is divided over the number of GPUs, so with a single GPU the batch size on that GPU is 4x larger, which might lead to out-of-memory errors. In that case you can lower the batch size, as shown below.
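
If you hit out-of-memory errors, one option is to override the global batch size (and scale the learning rate accordingly) on the command line. The values below are only illustrative; the SOLVER keys follow the usual maskrcnn-benchmark configuration:

python tools/train_sg_net.py --config-file "/path/to/config/file.yaml" SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025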

Multi-GPU training

Internally, we use torch.distributed.launch to launch multi-GPU training. This PyTorch utility spawns as many Python processes as the number of GPUs we want to use, and each process uses a single GPU.

export NGPUS=4
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_sg_net.py --config-file "path/to/config/file.yaml" 

Evaluation

You can test your model directly on a single GPU or on multiple GPUs. To evaluate relations, "relation_scores_all" needs to be included in TEST.TSV_SAVE_SUBSET. Here are a few example command lines for evaluating on 4 GPUs:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH 

# vg IMP evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_imp.yaml

# vg MSDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_msdn.yaml

# vg neural motif evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_nm.yaml

# vg GRCNN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_grcnn.yaml

# vg RelDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_reldn.yaml

# oi IMP evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_imp_bias_oi.yaml

# oi MSDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_msdn_bias_oi.yaml

# oi neural motif evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_motif_oi.yaml

# oi GRCNN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_grcnn_oi.yaml

# oi RelDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vrd/R152FPN_vrd_reldn.yaml

To evaluate in sgcls mode:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH MODEL.ROI_BOX_HEAD.FORCE_BOXES True MODEL.ROI_RELATION_HEAD.MODE "sgcls"

To evaluate in predcls mode:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH MODEL.ROI_RELATION_HEAD.MODE "predcls"

To evaluate with ground truth bbox and ground truth pairs:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH MODEL.ROI_RELATION_HEAD.FORCE_RELATIONS True

Abstractions

For more information on some of the main abstractions in our implementation, see ABSTRACTIONS.md.

Adding your own dataset

This implementation adds support for TSV-style datasets, and adding support for training on a new dataset can be done as follows:

from maskrcnn_benchmark.data.datasets.relation_tsv import RelationTSVDataset

class MyDataset(RelationTSVDataset):
    def __init__(self, yaml_file, extra_fields=(), transforms=None,
                 is_load_label=True, **kwargs):
        # reuse the TSV loading logic from RelationTSVDataset
        super(MyDataset, self).__init__(yaml_file, extra_fields, transforms, is_load_label, **kwargs)

    def your_own_function(self, idx, call=False):
        # override inherited methods or add your own functions this way
        pass

That's it. You can also add extra fields to the BoxList, such as segmentation masks (using structures.segmentation_mask.SegmentationMask), or even your own instance type, as sketched below.
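
For example, a minimal sketch of attaching extra per-box fields to a BoxList (the field names and values here are made up; the BoxList API comes from maskrcnn-benchmark):

import torch
from maskrcnn_benchmark.structures.bounding_box import BoxList

# one box in (x1, y1, x2, y2) pixel coordinates for an 800x600 image
boxes = torch.tensor([[10.0, 20.0, 200.0, 300.0]])
target = BoxList(boxes, (800, 600), mode="xyxy")
target.add_field("labels", torch.tensor([5]))               # per-box class index
target.add_field("attributes", torch.tensor([[2, 7, 0]]))   # any extra per-box annotation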

For a full example of how the VGTSVDataset is implemented, check maskrcnn_benchmark/data/datasets/vg_tsv.py.

Once you have created your dataset, it needs to be added in a couple of places:

Adding your own evaluation

To enable your dataset for testing, add a corresponding if statement in maskrcnn_benchmark/data/datasets/evaluation/__init__.py:

if isinstance(dataset, datasets.MyDataset):
    return your_evaluation(**args)
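
A hedged sketch of what your_evaluation might look like (the argument names mirror how existing evaluations in that file are invoked and should be treated as an assumption):

def your_evaluation(dataset, predictions, output_folder, **kwargs):
    # compare `predictions` against `dataset`'s ground truth,
    # write any result files into `output_folder`, and return a metrics dict
    results = {}
    return results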

VinVL Feature extraction

The output features are encoded as base64.

# extract vision features with VinVL object-attribute detection model
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
python tools/test_sg_net.py --config-file sgg_configs/vgattr/vinvl_x152c4.yaml TEST.IMS_PER_BATCH 2 MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 DATA_DIR "../maskrcnn-benchmark-1/datasets1" TEST.IGNORE_BOX_REGRESSION True MODEL.ATTRIBUTE_ON True

To extract relation features (the union bounding box's feature), set TEST.OUTPUT_RELATION_FEATURE to True in the yaml file and add relation_feature to TEST.TSV_SAVE_SUBSET.

To extract bounding box features, set TEST.OUTPUT_FEATURE to True in the yaml file and add feature to TEST.TSV_SAVE_SUBSET. A sketch of how to decode the saved features is shown below.
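
To read the saved features back, decode the base64 strings into float32 arrays. This is only a minimal sketch: it assumes each output TSV row is an image key followed by a JSON list of detections, each carrying a base64-encoded "feature"; adjust the file name and keys to match your TEST.TSV_SAVE_SUBSET.

import base64
import json

import numpy as np

with open("predictions.tsv") as f:
    for line in f:
        image_key, detections = line.rstrip("\n").split("\t")
        for det in json.loads(detections):
            # base64 string -> float32 feature vector for this box
            feat = np.frombuffer(base64.b64decode(det["feature"]), dtype=np.float32)
            print(image_key, det.get("class"), feat.shape)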

Troubleshooting

If you have issues running or compiling this code, we have compiled a list of common issues in TROUBLESHOOTING.md. If your issue is not present there, please feel free to open a new issue.

Citations

Please consider citing this project in your publications if it helps your research. The following is a BibTeX reference. The BibTeX entry requires the url LaTeX package.

@misc{han2021image,
      title={Image Scene Graph Generation (SGG) Benchmark}, 
      author={Xiaotian Han and Jianwei Yang and Houdong Hu and Lei Zhang and Jianfeng Gao and Pengchuan Zhang},
      year={2021},
      eprint={2107.12604},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This project is released under the MIT license. See LICENSE for additional details.

Acknowledgement

Comments
  • Slow feature extraction compared to bottom-up-attention

    Hi, thanks for the great work and open-sourcing this project.

    I'm excited to try VinVL since, as written in the paper, it promises faster feature extraction than bottom-up-attention.

    I have created my own TSV file using tsv_demo.py and ran tools/test_sg_net.py to do feature extraction. The sad thing is that the feature extraction runs quite slowly. Right now I'm using PyTorch 1.7 on Debian 10 with 1 Nvidia T4. The feature extraction process took 9 seconds / 4 images.

    I used bottom-up-attention from https://github.com/airsplay/py-bottom-up-attention and https://github.com/peteanderson80/bottom-up-attention while using OSCAR on the same dataset. These repos give much faster feature extraction times (the first needs 2.7 seconds / 8 images, while the original Caffe bottom-up-attention took less than 1 second per image) on a similar machine. This contradicts what is written in your paper.

    Here are some key configs that I'm using while running tools/test_sg_net.py:

    TEST:
        IMS_PER_BATCH: 4
        IGNORE_BOX_REGRESSION: True
        SKIP_PERFORMANCE_EVAL: True
        SAVE_PREDICTIONS: True
        SAVE_RESULTS_TO_TSV: True
        TSV_SAVE_SUBSET: ['rect', 'class', 'conf', 'feature']
        GATHER_ON_CPU: True
        OUTPUT_FEATURE : True
    

    I checked nvidia-smi and it shows my GPU is working.

    Does anyone else have this issue?

    opened by vinson2233 7
  • Conflicting version pytorch and torchvision

    I have followed the instructions in INSTALL.md, but it shows a compatibility error between the pytorch and torchvision versions.

    Am I missing something here?

    opened by rafaelpadilla 5
  • Couldn't load custom C++ ops.

    I installed the environment as required (pytorch==1.7.1, torchvision==0.8.2, torchaudio==0.7.2, cudatoolkit=10.1), but when I run test_sg_net.py, an error occurred:
    Runtime error occurred in Image Ids: 0,1 Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.
    Can anyone help me?
    Thanks

    opened by QC-LY 5
  • VinVL can model the relation prediction?

    I found that VinVL's object and attribute label sets are much bigger. So how can VinVL be used for predicate classification? Right now the project only provides 150 object classes, but Visual Genome + Faster R-CNN can detect 1370 object classes. That is a big difference.

    opened by alice-cool 5
  • Could you please show me an easy way to generate features for single images?

    The instructions seem complicated; I followed the guide, but only bboxes are created, without features. Could you give us a reproducible example of using the demo to extract features?

    opened by Edwardmark 4
  • Download vinvl pre-training model

    I failed to use the azcopy command to download the VinVL pre-training model. Could you show me how to download the pretrained models? Thank you!

    ./azcopy cp https://penzhanwu2.blob.core.windows.net/results/vinvl/od_models/vinvl_vg_x152c4.pth .

    INFO: Scanning...

    failed to perform copy command due to error: Login Credentials missing. No SAS token or OAuth token is present and the resource is not public.

    opened by yanan1989 3
  • RuntimeError: CUDA error: invalid device function

    When I try to run

    python tools/test_sg_net.py --config-file sgg_configs/vgattr/vinvl_x152c4.yaml TEST.IMS_PER_BATCH 1 MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 \
    MODEL.ROI_HEADS.SCORE_THRESH 0.2 DATA_DIR /my_path_to_prepard_tsv/dataset/tsv TEST.IGNORE_BOX_REGRESSION True MODEL.ATTRIBUTE_ON True TEST.OUTPUT_FEATURE True
    

    with environment

    PyTorch version: 1.4.0
    Is debug build: No
    CUDA used to build PyTorch: 10.1
    
    OS: Ubuntu 16.04.7 LTS
    GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
    CMake version: version 3.5.1
    
    Python version: 3.7
    Is CUDA available: Yes
    CUDA runtime version: 10.1.243
    GPU models and configuration: 
    GPU 0: Tesla V100-SXM2-32GB
    GPU 1: Tesla V100-SXM2-32GB
    GPU 2: Tesla V100-SXM2-32GB
    GPU 3: Tesla V100-SXM2-32GB
    GPU 4: Tesla V100-SXM2-32GB
    GPU 5: Tesla V100-SXM2-32GB
    GPU 6: Tesla V100-SXM2-32GB
    GPU 7: Tesla V100-SXM2-32GB
    
    Nvidia driver version: 418.67
    cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
    
    Versions of relevant libraries:
    [pip3] numpy==1.19.2
    [pip3] torch==1.4.0
    [pip3] torchvision==0.5.0
    [conda] blas                      1.0                         mkl  
    [conda] mkl                       2020.2                      256  
    [conda] mkl-service               2.3.0            py37he8ac12f_0  
    [conda] mkl_fft                   1.3.0            py37h54f3939_0  
    [conda] mkl_random                1.1.1            py37h0573a6f_0  
    [conda] pytorch                   1.4.0           py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch
    [conda] torchvision               0.5.0                py37_cu101    pytorch
            Pillow (8.2.0)
    
    

    I encountered the following error:

    RuntimeError: CUDA error: invalid device function (launch_kernel at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/ATen/native/cuda/Loops.cuh:103)
    frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f3f8ee64627 in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libc10.so)
    frame #1: void at::native::gpu_index_kernel<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>), &(void at::native::index_kernel_impl<at::native::OpaqueType<8> >(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>)), 1u>> >(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>, __nv_dl_wrapper_t<__nv_dl_tag<void (*)(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>), &(void at::native::index_kernel_impl<at::native::OpaqueType<8> >(at::TensorIterator&, c10::ArrayRef<long>, c10::ArrayRef<long>)), 1u>> const&) + 0x78d (0x7f3f9670368d in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #2: <unknown function> + 0x571bf32 (0x7f3f966fcf32 in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #3: <unknown function> + 0x571c298 (0x7f3f966fd298 in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #4: <unknown function> + 0x16957eb (0x7f3f926767eb in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #5: at::native::index(at::Tensor const&, c10::ArrayRef<at::Tensor>) + 0x47e (0x7f3f926725ae in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #6: <unknown function> + 0x1c0155a (0x7f3f92be255a in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #7: <unknown function> + 0x1c06023 (0x7f3f92be7023 in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #8: <unknown function> + 0x3820d1a (0x7f3f94801d1a in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #9: <unknown function> + 0x1c06023 (0x7f3f92be7023 in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch.so)
    frame #10: at::Tensor::index(c10::ArrayRef<at::Tensor>) const + 0x191 (0x7f3fc1465931 in /miniconda/envs/py37/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    frame #11: nms_cuda(at::Tensor, float) + 0x7e8 (0x7f3f6982407b in ./maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
    frame #12: nms(at::Tensor const&, at::Tensor const&, float) + 0x790 (0x7f3f697eabb0 in ./maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
    frame #13: <unknown function> + 0x53b97 (0x7f3f697fbb97 in ./maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
    frame #14: <unknown function> + 0x5004d (0x7f3f697f804d in ./maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so)
    <omitting python frames>
    

    I tried setting up my env both with option 1 and with the docker image; both environments give me the same error. If anyone else has the same issue, please guide me.

    opened by mohaoran93 3
  • demo_image.py not working

    Hi,

    I am running the following code:

    python tools/demo/demo_image.py --config_file sgg_configs/vgattr/vinvl_x152c4.yaml --img_file women_fish.jpg --save_file output/woman_fish_x152c4.obj.jpg MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 DATA_DIR "." TEST.IGNORE_BOX_REGRESSION False
    

    Here is the error:

        rel_subj_centers = [r['subj_center'] for r in rel_dets]
    UnboundLocalError: local variable 'rel_dets' referenced before assignment
    

    I believe the bug is in line https://github.com/microsoft/scene_graph_benchmark/blob/f91725d8b831ba7ee52a583eab3317fbbeffbfe6/tools/demo/demo_image.py#L118

    opened by JXT218 3
  • ModelZoo contains broken links

    Hi there,

    It looks like the ModelZoo contains broken links for all the OpenImages models such as:

    1. https://penzhanwu2.blob.core.windows.net/phillytools/data/maskrcnn/pretrained_model/sgg_model_zoo/oi_R152_nm.pth
    2. https://penzhanwu2.blob.core.windows.net/phillytools/data/maskrcnn/pretrained_model/sgg_model_zoo/oi_R152_grcnn.pth
    3. https://penzhanwu2.blob.core.windows.net/phillytools/data/maskrcnn/pretrained_model/sgg_model_zoo/oi_R152_reldn.pth

    Would it be possible to fix them!?

    Thanks, Alessandro

    opened by aleSuglia 2
  • Several issues in extracting VinVL Feature extraction

    I installed your environment step by step using option 1 and then ran the command below:

    # extract vision features with VinVL object-attribute detection model
    # pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
    # the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
    python tools/test_sg_net.py --config-file sgg_configs/vgattr/vinvl_x152c4.yaml TEST.IMS_PER_BATCH 2 MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 DATA_DIR "../maskrcnn-benchmark-1/datasets1" TEST.IGNORE_BOX_REGRESSION True MODEL.ATTRIBUTE_ON True
    

    There are several issues:

    1. I cannot find the code that loads the pretrained model parameters for "AttrRCNN". Although the command has a pretrained model path, I cannot find a concrete torch.load() call while debugging. So I wonder whether I need to add torch.load() myself when I run the above command.

    2. The "self.training" in "AttrRCNN" is extended from "torch.nn.modules.module.py", which is set as True by default. But in run the command to extract VinVL features by the beginning command, it seems that it should be False and I have to overwrite each init functions of AttrRCNN, its "self.rpn", and "self.roi_heads" as below,

     proposals, proposal_losses = self.rpn(images, features, targets, is_training = self.training)
      x, predictions, detector_losses = self.roi_heads(features,  proposals, targets, is_training = self.training) 
    
    3. Instead of PyTorch 1.4, I use PyTorch 1.7, but it always gives runtime errors for several in-place operations, such as the code below in "bounding_box.py":

    def clip_to_image(self, remove_empty=True):
        TO_REMOVE = 1
        self.bbox[:, 0].clamp_(min=0, max=self.size[0] - TO_REMOVE)
        self.bbox[:, 1].clamp_(min=0, max=self.size[1] - TO_REMOVE)
        self.bbox[:, 2].clamp_(min=0, max=self.size[0] - TO_REMOVE)
        self.bbox[:, 3].clamp_(min=0, max=self.size[1] - TO_REMOVE)
    

    The error is as follows:

      File "/home/jfhe/Documents/MountHe/jfhe/mm_dialogue/MM_Dialogue/scene_graph_benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py", line 188, in _forward_test
        boxes = self.box_selector_test(anchors, objectness, rpn_box_regression)
      File "/home/jfhe/anaconda3/envs/JD2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/jfhe/Documents/MountHe/jfhe/mm_dialogue/MM_Dialogue/scene_graph_benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 140, in forward
        sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
      File "/home/jfhe/Documents/MountHe/jfhe/mm_dialogue/MM_Dialogue/scene_graph_benchmark/maskrcnn_benchmark/modeling/rpn/inference.py", line 114, in forward_for_single_feature_map
        boxlist = boxlist.clip_to_image(remove_empty=False)
      File "/home/jfhe/Documents/MountHe/jfhe/mm_dialogue/MM_Dialogue/scene_graph_benchmark/maskrcnn_benchmark/structures/bounding_box.py", line 217, in clip_to_image
        self.bbox[:, 1].clamp_(min=0, max=self.size[1] - TO_REMOVE)
    RuntimeError: Output 0 of UnbindBackward is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.
    

    I worked around them with "with torch.no_grad()", but it feels strange: if I need to fine-tune the model, these bugs will come back once "with torch.no_grad()" is removed.

    4. Also, I agree with the question in https://github.com/microsoft/scene_graph_benchmark/issues/25. Could you please provide a simpler way to extract the VinVL features directly? It would help the community a lot, and we will definitely cite your work.
    opened by he159ok 2
  • Broken links to VinVL model and associated labelmaps

    Hi! These links are broken (resource not found error): https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json

    opened by stopmosk 2
  • Emergency, the pretrained model download links are lost

    It seems that links starting with "https://penzhanwu2.blob.core.windows.net/" cannot be accessed. Can anyone fix this problem? Thanks very much!

    opened by yxding95 1
  • [VinVL] Support for ONNX

    Hello there,

    Thank you so much for this great repository. I've been using VinVL for a while now and I'm really pleased with its accuracy. However, considering its size, I was wondering whether you have any plans to support ONNX to speed up inference. I tried to enable it myself but got some very strange errors during some operations.

    >>> extractor.to_onnx("storage/model/vinvl_vg_x152c4_simbot.onnx", export_params=True, input_sample=input_sample)
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
      return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/structures/bounding_box.py:21: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
      bbox = torch.as_tensor(bbox, dtype=torch.float32, device=device)
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/structures/bounding_box.py:26: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      if bbox.size(-1) != 4:
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/rpn/inference.py:94: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      pre_nms_top_n = min(self.pre_nms_top_n, num_anchors)
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/rpn/inference.py:111: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
      for proposal, score, im_shape in zip(proposals, objectness, image_shapes):
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py:169: TracerWarning: torch.Tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
      boxlist_empty.add_field("scores", torch.Tensor([]).to(device))
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py:206: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
      if len(inds)>0:
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py:131: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      while new_boxlist.bbox.shape[0] < \
    /home/ubuntu/emma/perception/src/emma_perception/models/vinvl_extractor.py:55: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
      out = zip(batch["ids"], batch["width"], batch["height"], predictions, cnn_features)
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/maskrcnn_benchmark/structures/bounding_box.py:99: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      ratios = tuple(float(s) / float(s_orig) for s, s_orig in zip(size, self.size))
    /home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py:2815: UserWarning: Exporting aten::index operator of advanced indexing in opset 9 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
      warnings.warn("Exporting aten::index operator of advanced indexing in opset " +
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
        return func(*args, **kwargs)
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1899, in to_onnx
        torch.onnx.export(self, input_sample, file_path, **kwargs)
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/__init__.py", line 316, in export
        return utils.export(model, args, f, export_params, verbose, training,
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 107, in export
        _export(model, args, f, export_params, verbose, training, input_names, output_names,
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 724, in _export
        _model_to_graph(model, args, verbose, input_names,
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 497, in _model_to_graph
        graph = _optimize_graph(graph, operator_export_type,
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 216, in _optimize_graph
        graph = torch._C._jit_pass_onnx(graph, operator_export_type)
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/__init__.py", line 373, in _run_symbolic_function
        return utils._run_symbolic_function(*args, **kwargs)
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/utils.py", line 1032, in _run_symbolic_function
        return symbolic_fn(g, *inputs, **attrs)
      File "/home/ubuntu/emma/perception/.venv/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py", line 1866, in slice
        raise RuntimeError("step!=1 is currently not supported")
    RuntimeError: step!=1 is currently not supported
    

    Do you have any advice?

    opened by aleSuglia 0
  • I got nan losses on results

    INFO:maskrcnn_benchmark.trainer:eta: 10:55:48 iter: 5100 loss: nan (nan) loss_obj_classifier: 0.0000 (0.0000) loss_pred_classifier: nan (nan) time: 1.1246 (1.1275) data: 0.0462 (0.0644) lr: 0.000467 max mem: 9431

    SOLVER:
        BASE_LR: 0.001
        WEIGHT_DECAY: 0.0001
        MAX_ITER: 40000
        STEPS: (50000,)
        IMS_PER_BATCH: 16
        CHECKPOINT_PERIOD: 10000

    opened by tiantao911 1
  • Is the pre-trained Resnet-50 object detection model available?

    Hi, thank you for the nice repository! I had a question regarding the training of the VisualGenome models. The following is written in the paper:

    "The object detection model is a ResNet50-FPN detector trained on Visucal Genome"

    I have two questions:

    1. I cannot figure out where to find the pretrained weights of this model. Or are we expected to train it ourselves?
    2. Why is the backbone ResNet different for Visual Genome and OpenImages?

    Thanks for the help!

    opened by VSJMilewski 0
  • Add `$schema` to `cgmanifest.json`

    This pull request adds the JSON schema for cgmanifest.json.

    FAQ

    Why?

    A JSON schema helps you to ensure that your cgmanifest.json file is valid. JSON schema validation is a built-in feature in most modern IDEs like Visual Studio and Visual Studio Code. Most modern IDEs also provide code completion for JSON schemas.

    How can I validate my cgmanifest.json file?

    Most modern IDEs like Visual Studio and Visual Studio Code have a built-in feature to validate JSON files. You can also use this small script to validate your cgmanifest.json file.

    Why does it suggest camel case for the properties?

    Component Detection is able to read camel case and pascal case properties. However, the JSON schema doesn't have a case-insensitive mode. We therefore suggest camel case as it's the most common format for JSON.

    Why is the diff so large?

    To deserialize the cgmanifest.json file, we use JSON.parse(). However, to serialize the JSON again we use prettier. We found that, in general, it gave smaller diffs than the default JSON.stringify() function.

    opened by JamieMagee 0