DCL-PyTorch
PyTorch implementation of the Dynamic Concept Learner (DCL). More details can be found on the project page.
[Figure: DCL framework overview]
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee K. Wong, Joshua B. Tenenbaum, and Chuang Gan
Prerequisites
- Python 3
- PyTorch 1.0 or higher, with NVIDIA CUDA Support
- Other required Python packages specified by `requirements.txt`. See Installation below.
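To confirm the PyTorch + CUDA setup before proceeding, a quick optional check:

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```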
Installation
Install Jacinle: clone the package, and add the bin path to your global `PATH` environment variable:

```bash
git clone https://github.com/vacancy/Jacinle --recursive
export PATH=<path_to_jacinle>/bin:$PATH
```
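If the export worked, Jacinle's launcher should now resolve (it is used for evaluation later in this README):

```bash
which jac-crun
```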
Clone this repository:

```bash
git clone https://github.com/zfchenUnique/DCL-Release.git --recursive
```
Create a conda environment for DCL, and install the requirements. This includes the required Python packages from both Jacinle and NS-CL; most of them are already included in the built-in `anaconda` package.
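The exact commands are not pinned in this README; a minimal sketch (the environment name `dcl` and the Python version are assumptions, adjust as needed):

```bash
conda create -n dcl anaconda python=3.6   # environment name and Python version are assumptions
conda activate dcl
pip install -r requirements.txt           # run from the root of this repository
```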
Dataset preparation
- Download the videos, video annotations, questions and answers, and object proposals from the official website.
- Extract ".png" frames from the videos with ffmpeg (a sketch follows the directory layout below).
- Organize the data as shown below.
```
clevrer
├── annotation_00000-01000
│   ├── annotation_00000.json
│   ├── annotation_00001.json
│   └── ...
├── ...
├── image_00000-01000
│   ├── video_00000
│   │   ├── 1.png
│   │   ├── 2.png
│   │   └── ...
│   └── ...
├── ...
├── questions
│   ├── train.json
│   ├── validation.json
│   └── test.json
├── proposals
│   ├── proposal_00000.json
│   ├── proposal_00001.json
│   └── ...
```
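A sketch of the ffmpeg step producing the layout above (the input path `videos/` and the per-thousand grouping directory are assumptions about your download location):

```bash
# Hypothetical location of the downloaded CLEVRER videos (video_00000.mp4, ...).
for f in videos/video_0*.mp4; do
  name=$(basename "$f" .mp4)
  out="clevrer/image_00000-01000/${name}"   # match the grouping shown above
  mkdir -p "$out"
  ffmpeg -i "$f" "$out/%d.png"              # writes 1.png, 2.png, ...
done
```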
Fast Evaluation
- Download the extracted object trajectories from Google Drive.
- Clone the dynamics model repository, download the image proposals and the pretrained PropNet models, and make dynamic predictions:

```bash
git clone https://github.com/zfchenUnique/clevrer_dynamic_propnet.git
cd clevrer_dynamic_propnet
sh ./scripts/eval_fast_release_v2.sh 0
```
- Download the pretrained DCL model and parsed programs, then run the test script (the trailing `0` is the GPU id):

```bash
sh scripts/script_test_prp_clevrer_qa.sh 0
```

- Submit the results to EvalAI to get the accuracy.
Step-by-step Training
- Step 1: download the proposals from the region proposal network and extract object trajectories for the train and val sets:

```bash
sh scripts/script_gen_tubes.sh
```
- Step 2: train a concept learner on descriptive and explanatory questions for static concepts (i.e., color, shape, and material):

```bash
sh scripts/script_train_dcl_stage1.sh 0
```
- Step 3: extract static attributes and refine object trajectories.

Extract static attributes:

```bash
sh scripts/script_extract_attribute.sh
```

Refine object trajectories:

```bash
sh scripts/script_gen_tubes_refine.sh
```
- Step 4: extract predictive and counterfactual scenes:

```bash
cd clevrer_dynamic_propnet
sh ./scripts/train_tube_box_only.sh      # train (boxes only)
sh ./scripts/train_tube.sh               # train (full model)
sh ./scripts/eval_fast_release_v2.sh 0   # evaluate on val
```
- Step 5: train DCL with all questions and the refined trajectories:

```bash
sh scripts/script_train_dcl_stage2.sh 0
```
Generalization to CLEVRER-Grounding
- Step 1: download the expression annotations and parsed programs from Google Drive.
- Step 2: evaluate the performance on CLEVRER-Grounding:

```bash
sh ./scripts/script_grounding.sh 0
jac-crun 0 scripts/script_evaluate_grounding.py
```
Generalization to CLEVRER-Retrieval
- Step 1: download the expression annotations and parsed programs from Google Drive.
- Step 2: evaluate the performance on CLEVRER-Retrieval:

```bash
sh ./scripts/script_retrieval.sh 0
jac-crun 0 scripts/script_evaluate_retrieval.py
```
Extension to Tower Blocks
- Step 1: download the question annotations and videos from Google Drive.
- Step 2: train on Tower Blocks QA:

```bash
sh ./scripts/script_train_blocks.sh 0
```

- Step 3: download the pretrained model from Google Drive and evaluate on Tower Blocks QA:

```bash
sh ./scripts/script_eval_blocks.sh 0
```
Others
- Qualitative results
- CLEVRER-Grounding training set annotations
- CLEVRER-Retrieval training set annotations
- Project page
Citation
If you find this repo useful in your research, please consider citing:

```bibtex
@inproceedings{zfchen2021iclr,
  title={Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning},
  author={Chen, Zhenfang and Mao, Jiayuan and Wu, Jiajun and Wong, Kwan-Yee K. and Tenenbaum, Joshua B. and Gan, Chuang},
  booktitle={International Conference on Learning Representations},
  year={2021}
}
```