Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

Overview

Open in Colab

[Project Page] [Paper] [Video]

Wenlong Huang1, Pieter Abbeel1, Deepak Pathak*2, Igor Mordatch*3 (*equal advising)

1University of California, Berkeley, 2Carnegie Mellon University, 3Google Brain

This is the official demo code for our paper Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. The code demonstrates how large language models, such as GPT-3 and Codex, can generate action plans for complex human activities (e.g., "make breakfast") without any further training. It can be used with any language model available through the OpenAI API or Hugging Face Transformers via a common interface.
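Since both backends sit behind a common interface, the same planning code can query either one. Below is a rough sketch of what such a wrapper might look like; the function name, default models, and parameters are illustrative assumptions, not the repo's actual API (see demo.ipynb for the real implementation):

import openai
from transformers import pipeline

def generate(prompt, backend="openai", max_tokens=30, temperature=0.3, n=1):
    # Illustrative sketch: sample n completions from either the OpenAI API
    # or a local Hugging Face model behind one interface.
    if backend == "openai":
        # assumes openai.api_key has already been set
        response = openai.Completion.create(
            engine="davinci", prompt=prompt, max_tokens=max_tokens,
            temperature=temperature, n=n, stop="\n")
        return [choice["text"] for choice in response["choices"]]
    else:
        # for repeated calls, construct the pipeline once and reuse it
        generator = pipeline("text-generation", model="gpt2-large")
        outputs = generator(prompt, max_new_tokens=max_tokens, do_sample=True,
                            temperature=temperature, num_return_sequences=n,
                            return_full_text=False)
        return [out["generated_text"] for out in outputs]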

If you find this work useful in your research, please cite using the following BibTeX:

@article{huang2022language,
  title={Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents},
  author={Huang, Wenlong and Abbeel, Pieter and Pathak, Deepak and Mordatch, Igor},
  journal={arXiv preprint arXiv:2201.07207},
  year={2022}
}

Local Setup or Open in Colab

Requirements

  • Python=3.6.13
  • CUDA=11.3

Setup Instructions

git clone https://github.com/huangwl18/language-planner.git
cd language-planner/
conda create --name language-planner-env python=3.6.13
conda activate language-planner-env
pip install --upgrade pip
pip install -r requirements.txt

Running Code

See demo.ipynb (or Open in Colab) for a complete walk-through of our method. Feel free to experiment with any household tasks you come up with (or tasks beyond the household domain, if you provide the necessary actions in available_actions.json)!
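To give a sense of what the model sees, the prompt interleaves a solved example with the query task in the "Task: ... / Step 1: ..." natural-language format used throughout the demo. A minimal sketch of the assembly (the helper name is ours; the actual logic lives in demo.ipynb):

def make_prompt(example_task, example_steps, query_task):
    # Build a planning prompt from one solved example plus the query task,
    # in the "Task: ... / Step N: ..." format used by the demo (sketch).
    lines = [f"Task: {example_task}"]
    lines += [f"Step {i}: {step}" for i, step in enumerate(example_steps, 1)]
    lines += ["", f"Task: {query_task}", "Step 1:"]
    return "\n".join(lines)

prompt = make_prompt(
    "Make toast",
    ["Walk to dining room", "Walk to freezer", "Find freezer"],
    "Make breakfast",
)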

Note:

  • We observe that the best results are obtained with larger language models. If you cannot run Hugging Face Transformers models locally or on Google Colab due to memory constraints, we recommend registering for an OpenAI API account and using GPT-3 or Codex (as of 01/2022, new accounts receive $18 in free credits, and the Codex series is free after admission from the waitlist).
  • Because language models are highly sensitive to sampling hyperparameters, you may need to tune the sampling hyperparameters per model to obtain the best results.
  • The code uses the list of available actions supported by VirtualHome 1.0's Evolving Graph Simulator, stored in available_actions.json. These actions should support a wide variety of household tasks. However, you may modify or replace this file if you're interested in a different set of actions or a different task domain (beyond the household domain); a sketch of how generated steps are matched to these actions appears after this list.
  • A subset of the manually annotated examples originally collected by the VirtualHome paper is used as the available examples in the prompt. They are transformed into natural-language format and stored in available_examples.json. Feel free to change this file for a different set of available examples.
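For reference, the method matches each free-form step produced by the language model to the closest admissible action via cosine similarity in a sentence-embedding space. Below is a minimal sketch of that matching step; the embedding model name and the assumption that available_actions.json is a flat JSON list of action phrases are ours, so consult demo.ipynb for the actual implementation.

import json
from sentence_transformers import SentenceTransformer, util

# Load the admissible action phrases (assumed here to be a flat JSON list).
with open("available_actions.json") as f:
    actions = json.load(f)

embedder = SentenceTransformer("stsb-roberta-large")  # model choice is an assumption
action_embs = embedder.encode(actions, convert_to_tensor=True)

def translate(step):
    # Map a free-form generated step to the most similar admissible action.
    step_emb = embedder.encode(step, convert_to_tensor=True)
    scores = util.cos_sim(step_emb, action_embs)[0]
    best = int(scores.argmax())
    return actions[best], float(scores[best])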

Comments
  • The output of the demo

    Hello @huangwl18, first I would like to thank you for sharing your awesome code!

    I ran your demo without any changes, and below is what I got from the last cell (i.e., the autoregressive plan generation part). But it is hard for me to see how the output plan represents a possible solution (e.g., it just opens multiple objects but does not make any food). Is this output expected (i.e., did you get the same output)? If not, what could be the reason?

    ---------- GIVEN EXAMPLE ----------
    Task: Make toast
    Step 1: Walk to dining room
    Step 2: Walk to freezer
    Step 3: Find freezer
    Step 4: Open freezer
    Step 5: Find food bread
    Step 6: Grab food bread
    Step 7: Close freezer
    Step 8: Find toaster
    Step 9: Plug in toaster
    Step 10: Put food bread on toaster
    Step 11: Switch on toaster
    ---------- EXAMPLE END ----------
    
    Task: Make breakfast
    Step 1: Walk to kitchen
    Step 2: Turn to fridge
    Step 3: Open fridge
    Step 4: Open microwave
    Step 5: Open oven
    Step 6: Open stove
    Step 7: Open cupboard
    
    [Terminating early because best overall score is lower than CUTOFF_THRESHOLD (0.7268587350845337 < 0.8)]
    

    Thank you!

    opened by qwerty863 2
  • Calculation of mean log probability (GPT-3)

    Hello Wenlong,

    I think there might be an error in calculating the mean log probability when using GPT-3. The main issue is that GPT-3 does not only return the generated text in the response; it returns more than that (including token_logprobs under logprobs). Therefore, to calculate the mean log probability, we cannot simply use

    # calculate mean log prob across tokens
    mean_log_probs = [np.mean(response['choices'][i]['logprobs']['token_logprobs']) for i in range(sampling_params['n'])]
    

    Instead, we should stop counting when a stop token is met.

    For example, here is a response with a stop sequence of "\n". The generated text is "Walk to kitchen"; however, GPT-3 returns log-probs for more tokens than that:

    response: {
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "logprobs": {
            "text_offset": [
              317,
              322,
              325,
              333,
              333,
              333,
              333,
              333
            ],
            "token_logprobs": [
              -0.2976162,
              -0.00012346054,
              -0.5069456,
              -0.0011470452,
              -0.0060894582,
              -0.00028055036,
              -6.838237e-05,
              -0.054386232
            ],
            "tokens": [
              " Walk",
              " to",
              " kitchen",
              "\n",
              "Step",
              " 2",
              ":",
              " Walk"
            ],
            "top_logprobs": [
              {
                " Get": -3.9821253,
                " Go": -3.5860093,
                " Make": -3.1428235,
                " Wake": -2.513738,
                " Walk": -0.2976162
              },
              {
                " To": -12.335158,
                " in": -11.411637,
                " into": -9.384543,
                " to": -0.00012346054,
                " upstairs": -12.2138815
              },
              {
                " bedroom": -5.3587174,
                " dining": -1.0860167,
                " kitchen": -0.5069456,
                " living": -4.34434,
                " the": -3.2986841
              },
              {
                "\n": -0.0011470452,
                " ": -7.6692185,
                " table": -9.372099,
                ".": -8.122213,
                "ette": -9.167303
              },
              {
                "\n": -5.1904135,
                " Step": -7.8304586,
                "Step": -0.0060894582,
                "Task": -9.905375,
                "step": -10.6300955
              },
              {
                " 1": -10.295448,
                " 2": -0.00028055036,
                " 3": -11.589857,
                " 4": -12.77457,
                "2": -8.387781
              },
              {
                "\n": -11.062581,
                " :": -11.94543,
                ",": -12.268325,
                ".": -10.367215,
                ":": -6.838237e-05
              },
              {
                " Find": -3.783928,
                " Open": -4.0909195,
                " Turn": -5.903181,
                " Walk": -0.054386232,
                "Walk": -5.14835
              }
            ]
          },
          "text": " Walk to kitchen"
        }
      ],
      "model": "text-davinci-001",
      "object": "text_completion",
      "usage": {
        "completion_tokens": 3,
        "prompt_tokens": 94,
        "total_tokens": 97
      }
    }
    

    The current way of calculating the mean log prob gives -0.10833211608375, whereas it should be mean(-0.2976162, -0.00012346054, -0.5069456) = -0.26822842018.
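    A minimal fix could truncate the per-token log-probs at the first stop token before averaging (assuming the stop sequence appears verbatim as a single entry in logprobs['tokens'], as in the response above):

    import numpy as np

    def mean_log_prob(choice, stop="\n"):
        # Average token log-probs up to (but excluding) the first stop token.
        tokens = choice["logprobs"]["tokens"]
        logprobs = choice["logprobs"]["token_logprobs"]
        if stop in tokens:
            logprobs = logprobs[: tokens.index(stop)]
        return np.mean(logprobs)

    # On the response above: mean(-0.2976162, -0.00012346054, -0.5069456) ≈ -0.2682
    mean_log_probs = [mean_log_prob(c) for c in response["choices"]]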

    Please let me know what you think. Great work!

    Cheers, Kaixian

    opened by kaixqu 0