PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [ACL 2021]

Overview

This repo contains code and data for PIGLeT. If you like this paper, please cite us:

@inproceedings{zellers2021piglet,
    title={PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World},
    author={Zellers, Rowan and Holtzman, Ari and Peters, Matthew and Mottaghi, Roozbeh and Kembhavi, Aniruddha and Farhadi, Ali and Choi, Yejin},
    booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics},
    year={2021}
}

See more at https://rowanzellers.com/piglet

What this repo contains

Physical dynamics model

  • You can get the data yourself by sampling trajectories in sampler/ and then converting them to tfrecord (the format I used) in tfrecord/. I also have the exact tfrecords I used at gs://piglet-data/physical-interaction-tfrecords/ -- they're big files, so I turned on 'requester pays' for them (download with gsutil -u <your-billing-project> cp). See the sketch after this list for a quick peek at the records.
  • You can pretrain the model with model/interact/train.py and evaluate it with model/interact/intrinsic_eval.py
  • Alternatively, feel free to use my checkpoint: gs://piglet/checkpoints/physical_dynamics_model/model.ckpt-5420

Language model

  • You can process the data (also in tfrecord format) using data/zeroshot_lm_setup/prepare_zslm_tfrecord.py, or download it from gs://piglet-data/text-data/. I have both 'zero-shot' tfrecord data -- a version of BookCorpus and Wikipedia with certain concepts filtered out, used to evaluate generalization to new concepts -- and regularly processed (non-zero-shot) data.
  • Train the model using model/lm/train.py
  • Alternatively, feel free to just use my checkpoint: gs://piglet/checkpoints/language_model/model.ckpt-20000

Tying it all together

  • Everything you need for this is in model/predict_statechange/, building on both the pretrained physical dynamics model and the pretrained language model.
  • I have annotations in data/annotations.jsonl for training and evaluating both tasks -- PIGPeN-NLU and PIGPeN-NLG (see the loading sketch after this list).
  • Alternatively, you can download my checkpoints at gs://piglet/checkpoints/pigpen-nlu-model/ for NLU (predicting the state change given English text) or gs://piglet/checkpoints/pigpen-nlg-model/ for NLG.

That's it!

Getting the environment set up

I used TensorFlow 1.15.5 and TPUs for this project, so TPUs are the only hardware I support right now, sorry! My recommendation is to use ctpu to start up a VM with access to a v3-8 TPU. Then, use the following commands to install dependencies:

curl -o ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh && \
     chmod +x ~/miniconda.sh && \
     ~/miniconda.sh -b -p ~/conda && \
     rm ~/miniconda.sh && \
     ~/conda/bin/conda install -y python=3.7 tqdm numpy pyyaml scipy ipython mkl mkl-include cython typing h5py pandas && ~/conda/bin/conda clean -ya
     
echo 'export PATH=~/conda/bin:$PATH' >>~/.bashrc
source ~/.bashrc
pip install "tensorflow==1.15.5"
pip install --upgrade google-api-python-client oauth2client
pip install -r requirements.txt

Comments
  • Missing lm checkpoint

    Hey! I am trying to download the checkpoint for the trained lm (in the description you say it is under gs://piglet/checkpoints/language_model/model.ckpt-20000). However, this is not a valid path; I downloaded the contents under gs://piglet/checkpoints/language_model and could only find the *.index, *.data-00000-of-00001 and *.meta files.

    Please lmk if I am missing something or if the files are provided somewhere else. Essentially I want to experiment with the predict_statechange part of the project and different ways to fine-tune on the NLU task, so I would prefer to avoid re-training the entire LM. I am also trying to run the finetune_sc.py file; would it be possible to point me to the yaml config file? I found two versions (flagship_configs and symbol2text_configs), but I am not sure about the paths or what some of the params mean ("train_time_symbolic_actions_prob", "fuse_action"). Thank you!

    opened by spilioeve 4
  • Detailed README on the data format for training the dynamics model

    Hi Rowan, thanks for the great work! Just wondering if you could provide a more detailed readme on the .tfrecord data used for training the dynamics model, i.e., what the meaning of each field is -- for example, where can I find the mapping from actions/action_id to action names, and what do the ints in meta mean? Below is one example instance from the .tfrecord file:

    {   'actions/action_args': array([1, 0]),
        'actions/action_id': array([13]),
        'actions/action_success': array([1]),
        'agent_state': array([  1.        ,   0.        ,   0.90099925,  -0.75      ,
           180.        ,  30.000004  ], dtype=float32),
        'comparison_labels': array([1, 2]),
        'frames/encoded': array([255, 216, 255, ..., 143, 255, 217], dtype=uint8),
        'frames/format': array([106, 112, 101, 103], dtype=uint8),
        'frames/height': array([384]),
        'frames/num_frames': array([2]),
        'frames/width': array([640]),
        'meta': array([123,  34, 115,  99, 101, 110, 101,  95, 110,  97, 109, 101,  34,
            58,  32,  34,  70, 108, 111, 111, 114,  80, 108,  97, 110,  50,
            95, 112, 104, 121, 115, 105,  99, 115,  34,  44,  32,  34, 116,
           101, 120, 116,  34,  58,  32,  34,  70, 105, 108, 108,  32,  36,
            49,  32, 119, 105, 116, 104,  32, 108, 105, 113, 117, 105, 100,
            32, 117, 115, 105, 110, 103,  32,  36,  50,  44,  32, 116, 104,
           101, 110,  32, 101, 109, 112, 116, 121,  32, 105, 116,  46,  34,
            44,  32,  34, 109,  97, 105, 110,  95, 111,  98, 106, 101,  99,
           116,  95, 105, 100, 115,  34,  58,  32,  91,  34,  80, 111, 116,
           124,  43,  48,  48,  46,  56,  57, 124,  43,  48,  48,  46,  57,
            48, 124,  45,  48,  49,  46,  52,  49,  34,  44,  32,  34,  70,
            97, 117,  99, 101, 116, 124,  45,  48,  48,  46,  48,  50, 124,
            43,  48,  49,  46,  49,  52, 124,  45,  48,  49,  46,  54,  49,
            34,  44,  32,  34,  83, 105, 110, 107, 124,  43,  48,  48,  46,
            48,  48, 124,  43,  48,  48,  46,  56,  57, 124,  45,  48,  49,
            46,  52,  52, 124,  83, 105, 110, 107,  66,  97, 115, 105, 110,
            34,  93,  44,  32,  34, 119, 105, 100, 116, 104,  34,  58,  32,
            54,  52,  48,  44,  32,  34, 104, 101, 105, 103, 104, 116,  34,
            58,  32,  51,  56,  52,  44,  32,  34, 116,  97, 115, 107,  95,
           110,  97, 109, 101,  34,  58,  32,  34,  95, 102, 105, 108, 108,
            95, 111,  98, 106, 101,  99, 116, 120,  95, 119, 105, 116, 104,
            95, 108, 105, 113, 117, 105, 100,  34,  44,  32,  34, 102, 110,
            34,  58,  32,  34,  47, 104, 111, 109, 101,  47, 114, 111, 119,
            97, 110,  47, 100,  97, 116,  97, 115, 101, 116, 115,  51,  47,
           105, 112, 107,  45, 118,  49,  47, 102, 105, 108, 108,  95, 111,
            98, 106, 101,  99, 116, 120,  95, 119, 105, 116, 104,  95, 108,
           105, 113, 117, 105, 100,  47, 100,  56, 111,  57, 112, 104,  88,
            48,  53,  68, 117, 100,  46, 104,  53,  34, 125], dtype=uint8),
        'objects/ObjectTemperature': array([1, 1, 1, 1]),
        'objects/breakable': array([0, 0, 0, 0]),
        'objects/canBeUsedUp': array([0, 0, 0, 0]),
        'objects/canFillWithLiquid': array([1, 1, 1, 1]),
        'objects/cookable': array([0, 0, 0, 0]),
        'objects/dirtyable': array([1, 1, 0, 0]),
        'objects/distance': array([2, 1, 2, 2]),
        'objects/isBroken': array([0, 0, 0, 0]),
        'objects/isCooked': array([0, 0, 0, 0]),
        'objects/isDirty': array([0, 0, 0, 0]),
        'objects/isFilledWithLiquid': array([1, 1, 0, 0]),
        'objects/isOpen': array([0, 0, 0, 0]),
        'objects/isPickedUp': array([0, 1, 0, 0]),
        'objects/isSliced': array([0, 0, 0, 0]),
        'objects/isToggled': array([0, 0, 0, 0]),
        'objects/isUsedUp': array([0, 0, 0, 0]),
        'objects/mass': array([4, 4, 0, 0]),
        'objects/moveable': array([0, 0, 0, 0]),
        'objects/object_types': array([79, 79, 94, 94]),
        'objects/openable': array([0, 0, 0, 0]),
        'objects/parentReceptacles': array([94,  0,  0,  0]),
        'objects/pickupable': array([1, 1, 0, 0]),
        'objects/receptacle': array([1, 1, 1, 1]),
        'objects/receptacleObjectIds': array([ 0,  0, 79,  0]),
        'objects/salientMaterials_Ceramic': array([0, 0, 0, 0]),
        'objects/salientMaterials_Fabric': array([0, 0, 0, 0]),
        'objects/salientMaterials_Food': array([0, 0, 0, 0]),
        'objects/salientMaterials_Glass': array([0, 0, 0, 0]),
        'objects/salientMaterials_Leather': array([0, 0, 0, 0]),
        'objects/salientMaterials_Metal': array([1, 1, 0, 0]),
        'objects/salientMaterials_Organic': array([0, 0, 0, 0]),
        'objects/salientMaterials_Paper': array([0, 0, 0, 0]),
        'objects/salientMaterials_Plastic': array([0, 0, 0, 0]),
        'objects/salientMaterials_Rubber': array([0, 0, 0, 0]),
        'objects/salientMaterials_Soap': array([0, 0, 0, 0]),
        'objects/salientMaterials_Sponge': array([0, 0, 0, 0]),
        'objects/salientMaterials_Stone': array([0, 0, 0, 0]),
        'objects/salientMaterials_Wax': array([0, 0, 0, 0]),
        'objects/salientMaterials_Wood': array([0, 0, 0, 0]),
        'objects/size': array([4, 4, 5, 5]),
        'objects/sliceable': array([0, 0, 0, 0]),
        'objects/toggleable': array([0, 0, 0, 0])}
    

    Thanks!

    opened by MikeWangWZHL 2
  • downloading tfrecords: non-free?

    Hello, thank you for the amazing work!

    So, if I want to download the tfrecords, do I have to pay some money for them? Or are they free, and the reason 'requester pays' is on is that Google Cloud would not otherwise allow uploading these files into their system?

    opened by nilinykh 2
  • Mapping between the example in `annotations.jsonl` and `physical-interaction-tfrecords`

    Hello, thank you for the great work!

    I have downloaded annotations.jsonl and physical-interaction-tfrecords. However, it is not clear to me how I can find the mapping between the examples in annotations.jsonl and the tfrecords in physical-interaction-tfrecords.

    Specifically, what I need is to find the corresponding image frames (pre-condition and post-condition) for each example in the annotation file. Could you help me with this?

    Thank you!

    opened by yingShen-ys 2