VRDP (NeurIPS 2021)

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Joshua B. Tenenbaum, and Chuang Gan


More details can be found at the Project Page.

If you find our work useful in your research, please consider citing our paper:

@inproceedings{ding2021dynamic,
  author = {Ding, Mingyu and Chen, Zhenfang and Du, Tao and Luo, Ping and Tenenbaum, Joshua B and Gan, Chuang},
  title = {Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language},
  booktitle = {Advances In Neural Information Processing Systems},
  year = {2021}
}

Prerequisites

  • Python 3
  • PyTorch 1.3 or higher
  • All required packages are covered by Miniconda
  • Both CPUs and GPUs are supported

Dataset preparation

  • Download the videos, video annotations, questions and answers, and object proposals from the official website

  • Transform the videos into ".png" frames with ffmpeg (see the sketch after this list).

  • Organize the data as shown below.

    clevrer
    ├── annotation_00000-01000
    │   ├── annotation_00000.json
    │   ├── annotation_00001.json
    │   └── ...
    ├── ...
    ├── image_00000-01000
    │   ├── video_00000
    │   │   ├── 1.png
    │   │   ├── 2.png
    │   │   └── ...
    │   └── ...
    ├── ...
    ├── questions
    │   ├── train.json
    │   ├── validation.json
    │   └── test.json
    └── proposals
        ├── proposal_00000.json
        ├── proposal_00001.json
        └── ...
    
  • We also provide data for physics learning and program execution on Google Drive. You can optionally download it and put it in the ./data/ folder.

  • Download the processed data executor_data.zip for the executor and unzip it to ./executor/data/.
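A minimal sketch of the frame-extraction step above, assuming the downloaded videos are .mp4 files under clevrer/video_00000-01000 and that each video's frames go into their own folder as in the tree above (the paths and folder names are assumptions, not a script shipped with this repo):

import subprocess
from pathlib import Path

# Hedged sketch: dump each video's frames as 1.png, 2.png, ... with ffmpeg.
# The paths below are assumptions matching the directory tree above.
video_root = Path("clevrer/video_00000-01000")
image_root = Path("clevrer/image_00000-01000")

for video in sorted(video_root.glob("*.mp4")):
    out_dir = image_root / video.stem  # e.g. .../video_00000
    out_dir.mkdir(parents=True, exist_ok=True)
    # ffmpeg's "%d.png" pattern numbers frames starting from 1
    subprocess.run(["ffmpeg", "-i", str(video), str(out_dir / "%d.png")], check=True)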

Get Object Dictionaries (Concepts and Trajectories)

Download the object proposals from the region proposal network and follow the Step-by-step Training in DCL to get object concepts and trajectories.

The above process includes:

  • trajectory extraction
  • concept learning
  • trajectory refinement

Or you can download our extracted object dictionaries object_dicts.zip directly from Google Drive.
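To sanity-check what you downloaded, you can peek at one dictionary. The file name below is hypothetical (adjust it to whatever object_dicts.zip actually contains), and the printed keys are not a documented schema:

import json

# Hedged sketch: inspect one extracted object dictionary.
# "objects_10000.json" is a hypothetical file name, not guaranteed by the repo.
with open("data/object_dicts/objects_10000.json") as f:
    obj_dict = json.load(f)

print(type(obj_dict))  # list or dict depending on the export format
if isinstance(obj_dict, dict):
    print(list(obj_dict.keys()))  # expected to cover concepts and trajectories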

Learning

1. Differentiable Physics Learning

After we get the above object dictionaries, we learn physical parameters from object properties and trajectories.

cd dynamics/
python3 learn_dynamics.py 10000 15000
# Here argv[1] and argv[2] represent the start and end processing indices, respectively.

The output object physical parameters object_dicts_with_physics.zip can be downloaded from Google Drive.
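The idea behind this step is to treat the simulator as differentiable, so physical parameters can be fit by gradient descent against the observed trajectories. The toy sketch below illustrates the pattern only; the damping model, shapes, and names are illustrative assumptions, not the repo's simulator:

import torch

# Hedged toy sketch of differentiable parameter fitting (not the repo's code):
# simulate, compare against the observed trajectory, backpropagate into params.
observed = torch.randn(128, 2)  # stand-in for an observed (x, y) trajectory

friction = torch.tensor(0.05, requires_grad=True)
velocity0 = torch.tensor([1.0, 0.0], requires_grad=True)

def simulate(friction, velocity0, steps=128, dt=1.0 / 25):
    pos, vel, traj = torch.zeros(2), velocity0, []
    for _ in range(steps):
        vel = vel - friction * vel * dt  # simple velocity damping
        pos = pos + vel * dt
        traj.append(pos)
    return torch.stack(traj)

optimizer = torch.optim.Adam([friction, velocity0], lr=1e-2)
for _ in range(300):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(simulate(friction, velocity0), observed)
    loss.backward()
    optimizer.step()
print(f"fitted friction: {friction.item():.4f}")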

2. Physics Simulation (counterfactual)

Physics simulation using the learned physical parameters.

cd dynamics/
python3 physics_simulation.py 10000 15000
# Here argv[1] and argv[2] represent the start and end processing indices, respectively.

The output simulated trajectories/events object_simulated.zip can be downloaded from Google Drive.
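Conceptually, a counterfactual rollout re-runs the learned simulator on an edited scene, e.g. with one object removed or its mass changed. A minimal sketch of the scene edit (the keys and values are made-up assumptions, not the repo's data format):

import copy
import json

# Hedged sketch: build a counterfactual scene by removing one object before
# re-simulation. The dictionary layout is illustrative, not the repo's format.
scene = {"objects": [
    {"id": 0, "shape": "cube", "mass": 1.0},
    {"id": 1, "shape": "sphere", "mass": 1.0},
]}

counterfactual = copy.deepcopy(scene)
counterfactual["objects"] = [o for o in counterfactual["objects"] if o["id"] != 1]
# The edited scene would then be re-simulated with the learned parameters to
# answer "what would happen if the sphere were removed?"
print(json.dumps(counterfactual, indent=2))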

3. Physics Simulation (predictive)

Correct long-range predictions according to video observations.

cd dynamics/
python3 refine_prediction.py 10000 15000
# Here argv[1] and argv[2] represent the start and end processing indices, respectively.

The output refined trajectories/events object_updated_results.zip can be downloaded from Google Drive.
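One way to picture this correction: wherever frames are actually observed, pull the simulated trajectory back toward the observations, and only extrapolate freely past the last observed frame. The sketch below is an illustrative blend, not the repo's refinement rule; the shapes and blending weight are assumptions:

import numpy as np

# Hedged sketch: blend simulated positions with observations where available.
# 128 observed frames, 150 simulated frames; the tail stays pure prediction.
simulated = np.random.rand(150, 2)                           # (x, y) per frame
observed = simulated[:128] + 0.01 * np.random.randn(128, 2)  # noisy observations

alpha = 0.9  # trust placed in observations (illustrative)
refined = simulated.copy()
refined[:128] = alpha * observed + (1 - alpha) * simulated[:128]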

Evaluation

After we get the final trajectories/events, we perform neuro-symbolic execution and evaluate performance on the validation set.

cd executor/
python3 evaluation.py

The test JSON file for evaluation on EvalAI can be generated by:

cd executor/
python3 get_results.py
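For reference when reading results such as those in the comments below: multi-choice questions are scored both per option and per question, where per-question credit requires every option of that question to be answered correctly. A minimal sketch of the distinction (made-up data; not the repo's evaluation code):

# Hedged sketch of the two metrics: per-option scores every choice
# independently; per-question requires all choices of a question correct.
questions = [
    {"correct": [True, True, True]},   # all options right -> counts for both
    {"correct": [True, False, True]},  # one wrong option sinks per-question
]

options = [c for q in questions for c in q["correct"]]
per_option = sum(options) / len(options)
per_question = sum(all(q["correct"]) for q in questions) / len(questions)
print(f"per option: {per_option:.1%}, per question: {per_question:.1%}")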

The Generalized Clevrer Dataset (counterfactual_mass)

Examples

  • Predictive question
  • Counterfactual question

Acknowledgements

For questions regarding VRDP, feel free to post here or directly contact the author ([email protected]).

Comments
  • Inquire about the results of explanatory queries

    Hello. Thank you for sharing the code!

    I am reproducing the experiment, and I got similar results on descriptive, predictive, and counterfactual queries, but my result on explanatory queries is quite low. (In the paper, the reported numbers are 96.3% per option and 91.9% per question.)

    Can you give me some advice?

    I used object_updated_results.zip as input and followed the Evaluation steps.

    Here are my results on the validation set.

    ============ results ============
    overall accuracy per option: 93.764898 %
    overall accuracy per question: 91.686308 %
    descriptive accuracy per question: 93.376978 %
    explanatory accuracy per option: 92.976512 %
    explanatory accuracy per question: 88.972667 %
    predictive accuracy per option: 95.881361 %
    predictive accuracy per question: 91.903289 %
    counterfactual accuracy per option: 94.686999 %
    counterfactual accuracy per question: 84.110147 %
    ============ results ============
    

    Thanks!

    opened by kig1929
  • Hi, thanks very much for sharing this wonderful work. Here is a question about forming the Clevrer dataset.

    According to the second step of the dataset preparation, all downloaded videos should be transformed into images. However, since there are 1000 videos in the original video_00000-01000 folder, it seems that all images from these videos are stored in the image_00000-01000 folder. Should these images be put under separate folders such as video_00000? If not, how should the frames from different videos be ordered?

    opened by ddghjikle