CLEVR Dataset Generation

This is the code used to generate the CLEVR dataset as described in the paper:

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Fei-Fei Li, Larry Zitnick, Ross Girshick
Presented at CVPR 2017

Code and pretrained models for the baselines used in the paper can be found here.

You can use this code to render synthetic images and compositional questions for those images, like this:

Q: How many small spheres are there?
A: 2

Q: What number of cubes are small things or red metal objects?
A: 2

Q: Does the metal sphere have the same color as the metal cylinder?
A: Yes

Q: Are there more small cylinders than metal things?
A: No

Q: There is a cylinder that is on the right side of the large yellow object behind the blue ball; is there a shiny cube in front of it?
A: Yes

If you find this code useful in your research then please cite

@inproceedings{johnson2017clevr,
  title={CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning},
  author={Johnson, Justin and Hariharan, Bharath and van der Maaten, Laurens
          and Fei-Fei, Li and Zitnick, C Lawrence and Girshick, Ross},
  booktitle={CVPR},
  year={2017}
}

All code was developed and tested on OSX and Ubuntu 16.04.

Step 1: Generating Images

First we render synthetic images using Blender, outputting both the rendered images and a JSON file containing ground-truth scene information for each image.

Blender ships with its own installation of Python which is used to execute scripts that interact with Blender; you'll need to add the image_generation directory to the Python path of Blender's bundled Python. The easiest way to do this is by adding a .pth file to the site-packages directory of Blender's Python, like this:

echo $PWD/image_generation >> $BLENDER/$VERSION/python/lib/python3.5/site-packages/clevr.pth

where $BLENDER is the directory where Blender is installed and $VERSION is your Blender version; for example on OSX you might run:

echo $PWD/image_generation >> /Applications/blender/blender.app/Contents/Resources/2.78/python/lib/python3.5/site-packages/clevr.pth
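
To check that the path was picked up, you can ask Blender's bundled Python to import one of the modules that ships in image_generation (for example utils.py); this is an optional sanity check rather than part of the official setup:

blender --background --python-expr "import utils; print('image_generation is on the path')"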

You can then render some images like this:

cd image_generation
blender --background --python render_images.py -- --num_images 10

On OSX the blender binary is located inside the blender.app directory; for convenience you may want to add the following alias to your ~/.bash_profile file:

alias blender='/Applications/blender/blender.app/Contents/MacOS/blender'

If you have an NVIDIA GPU with CUDA installed then you can use the GPU to accelerate rendering like this:

blender --background --python render_images.py -- --num_images 10 --use_gpu 1

After this command terminates you should have ten freshly rendered images stored in output/images.

The file output/CLEVR_scenes.json will contain ground-truth scene information for all newly rendered images.
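
For a quick look at the scene annotations from Python, a minimal sketch like the following works; the keys used here (scenes, image_filename, objects) follow the CLEVR scene format, but treat the exact schema as something to verify against your own output:

import json

with open('output/CLEVR_scenes.json') as f:
    data = json.load(f)

# Each entry in data['scenes'] describes one rendered image.
for scene in data['scenes']:
    print(scene['image_filename'], '-', len(scene['objects']), 'objects')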

You can find more details about image rendering here.

Step 2: Generating Questions

Next we generate questions, functional programs, and answers for the rendered images from the previous step. This step takes as input the single JSON file containing all ground-truth scene information, and outputs a single JSON file containing the questions, their answers, and their functional programs.

You can generate questions like this:

cd question_generation
python generate_questions.py

The file output/CLEVR_questions.json will then contain questions for the generated images.
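
As with the scenes file, you can inspect the output from Python; this sketch assumes the CLEVR question format (a top-level questions list whose entries carry question, answer, and program fields), which is worth confirming against your generated file:

import json

with open('output/CLEVR_questions.json') as f:
    data = json.load(f)

# Each entry pairs a question with its answer and its functional program.
for q in data['questions'][:5]:
    print(q['question'], '->', q['answer'])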

You can find more details about question generation here.

Comments
  • What is the leaderboard for the CLEVR dataset in Machine Learning?

    Where is a list of the papers with all the benchmarks and models evaluated on this dataset?

    https://www.quora.com/unanswered/What-is-the-leadership-board-the-the-CLEVER-dataset-in-Machine-Learning

    opened by brando90 3
  • Blender compatible version

    • Is this repo compatible with Blender 2.8? If not, which version should I install? I recommend adding this to the README, since this info is not currently included.

    • Another question: in the JSON file of a generated image, the "pixel_coords" key holds a list of three values. What does the last value represent?

    opened by AvivSham 1
  • Inconsistency in 3d coordinates

    Hi,

    I am noticing an inconsistency between the placement of the 3d objects in the scene and their corresponding 3d coordinates. For instance:

    (1): (it looks like the z-axis in this plot corresponds to how close/far you are from the camera; e.g. 'small-cylinder-brown' and 'small-sphere-gray' are at the top of the z-axis, and they are closest to the camera)

    (2): (this contradicts (1), because the small red cylinder lies around the midpoint of the z-axis yet it is actually the furthest from the camera)

    (3): (again, this contradicts the previous images, because the gold sphere is closest to the camera but lies about the midpoint of the z-axis)

    I have not generated the dataset myself from source, so if there is a discrepancy between the code and the dataset that is available online, that may explain it. Alternatively, the 3d coordinates may not be relative to the camera (e.g. the camera was randomly displaced prior to rendering). I looked at the code but didn't find anything that appeared odd to me, though I am really not sure at this stage.

    Thanks!

    opened by christopher-beckham 1
  • No key 'function' in the list of questions

    I am referring to the repository CLEVR IEP for training a model. I followed the steps in the TRAINING.md file, but unfortunately the script breaks. In the file programs.py, the functions function_to_str, list_to_tree, tree_to_prefix, and tree_to_postfix all try to access the key cur['function'], which does not exist in any of the generated questions.

    Does someone have updated question-generation code for clevr-dataset-gen, and if not, how could this be tackled?

    opened by RishikMani 1
  • Adding Code of Conduct file

    This pull request was created automatically because we noticed your project was missing a Code of Conduct file.

    Code of Conduct files facilitate respectful and constructive communities by establishing expected behaviors for project contributors.

    This PR was crafted with love by Facebook's Open Source Team.

    CLA Signed 
    opened by facebook-github-bot 0
  • Rendering scenes with fixed object locations

    For my project I need to render images with a fixed number of objects (max_objects = min_objects = 2 for now) and fixed object locations across all rendered scenes. Any thoughts on how to fix the location parameter?

    opened by hudaAlamri 0
  • Zeroeth template in compare_integer.json can never return true

    First and foremost, it goes without saying, but great repo and dataset!

    Moving on to the issue: I believe there's a slight error in the zeroeth template of compare_integer.json.

    Specifically, the zeroeth template (with indenting for legibility) is as follows:

        "constraints": [
          {
            "params": [
              1,
              3
            ],
            "type": "OUT_NEQ"
          }
        ],
        "nodes": [
          {
            "inputs": [],
            "type": "scene"
          },
          {
            "inputs": [
              0
            ],
            "side_inputs": [
              "<Z>",
              "<C>",
              "<M>",
              "<S>"
            ],
            "type": "filter_count"
          },
          {
            "inputs": [],
            "type": "scene"
          },
          {
            "inputs": [
              2
            ],
            "side_inputs": [
              "<Z2>",
              "<C2>",
              "<M2>",
              "<S2>"
            ],
            "type": "filter_count"
          },
          {
            "inputs": [
              1,
              3
            ],
            "type": "equal_integer"
          }
        ],
        "params": [
          {
            "name": "<Z>",
            "type": "Size"
          },
          {
            "name": "<C>",
            "type": "Color"
          },
          {
            "name": "<M>",
            "type": "Material"
          },
          {
            "name": "<S>",
            "type": "Shape"
          },
          {
            "name": "<Z2>",
            "type": "Size"
          },
          {
            "name": "<C2>",
            "type": "Color"
          },
          {
            "name": "<M2>",
            "type": "Material"
          },
          {
            "name": "<S2>",
            "type": "Shape"
          }
        ],
        "text": [
          "Are there an equal number of <Z> <C> <M> <S>s and <Z2> <C2> <M2> <S2>s?",
          "Are there the same number of <Z> <C> <M> <S>s and <Z2> <C2> <M2> <S2>s?",
          "Is the number of <Z> <C> <M> <S>s the same as the number of <Z2> <C2> <M2> <S2>s?"
        ]
      },
    

    Note the constraint is OUT_NEQ between nodes 1 and 3. However, nodes 1 and 3 are filter_count nodes rather than filter nodes, so the constraint is that the counts (rather than the filtered object sets) must differ. Consequently, the answer is always false: there cannot be an equal number of objects, because the constraint forces the two counts to differ. I threw together a quick script to check the answers in the CLEVR training set, and I believe it supports this conclusion.

    It's not a serious issue, but it confused me and took me a while to figure out what was going on -- I hope this spares someone else some debugging time. Cheers!
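
    A minimal sketch of the commenter's argument (the counts below are illustrative stand-ins, not values from the repo's template engine): the OUT_NEQ constraint only admits instantiations where the two filter_count outputs differ, so the final equal_integer node can never be true:

    count_node_1 = 3  # output of node 1 (filter_count)
    count_node_3 = 2  # output of node 3 (filter_count)

    # The OUT_NEQ constraint on nodes 1 and 3 rejects any instantiation
    # where the two counts are equal...
    assert count_node_1 != count_node_3

    # ...so equal_integer over those counts is False for every valid instantiation.
    print(count_node_1 == count_node_3)  # always False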

    opened by ikb-a 0
  • Object rotation angle problem

    https://github.com/facebookresearch/clevr-dataset-gen/blob/f0ce2c81750bfae09b5bf94d009f42e055f2cb3a/image_generation/render_images.py#L411

    https://github.com/facebookresearch/clevr-dataset-gen/blob/f0ce2c81750bfae09b5bf94d009f42e055f2cb3a/image_generation/utils.py#L104

    I think there is a mistake here. When I tried to use the object rotation, I found that the object did not rotate much. After debugging, I found that the angle here is incorrect: it should be in radians, not degrees.

    I hope this saves others the debugging trouble it caused me.
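
    A sketch of the fix the commenter is suggesting, assuming the sampled rotation really is in degrees as they report (Blender's rotation_euler expects radians):

    import math

    theta_deg = 137.0                    # rotation value sampled in degrees
    theta_rad = math.radians(theta_deg)  # convert before assigning to rotation_euler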

    opened by SongQiPing 0
  • CLEVR with other background

    Have you thought about adding backgrounds from HDRI data? I have tried to do this, but something seems wrong, maybe because the camera is too close to the HDRI world scene. Is there an example of replacing the plain background with other images, such as ones from COCO?

    opened by shen453011331 1
  • Add different shapes

    I tried adding different shapes that I created to the dataset in order to render some new images, but I get an error: bpy_prop_collection[key]: key "Polygon" not found. I was able to add a torus and a cone with no issues, but not anything else (like a cube that isn't smooth). Could you explain to me why this is an issue?

    opened by rachelsot123 0