Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

Overview

Open in Colab

[Project Page] [Paper] [Video]

Wenlong Huang1, Pieter Abbeel1, Deepak Pathak*2, Igor Mordatch*3 (*equal advising)

1University of California, Berkeley, 2Carnegie Mellon University, 3Google Brain

This is the official demo code for our paper Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. The code demonstrates how large language models, such as GPT-3 and Codex, can generate action plans for complex human activities (e.g., "make breakfast") without any further training. It can be used with any language model available through the OpenAI API or Hugging Face Transformers via a common interface.
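Since both backends sit behind a common interface, the same planning code can query either one. Below is a rough sketch of what such a wrapper might look like; the function name, default models, and parameters are illustrative assumptions, not the repo's actual API (see demo.ipynb for the real implementation):

import openai
from transformers import pipeline

def generate(prompt, backend="openai", max_tokens=30, temperature=0.3, n=1):
    # Illustrative sketch: sample n completions from either the OpenAI API
    # or a local Hugging Face model behind one interface.
    if backend == "openai":
        # assumes openai.api_key has already been set
        response = openai.Completion.create(
            engine="davinci", prompt=prompt, max_tokens=max_tokens,
            temperature=temperature, n=n, stop="\n")
        return [choice["text"] for choice in response["choices"]]
    else:
        # for repeated calls, construct the pipeline once and reuse it
        generator = pipeline("text-generation", model="gpt2-large")
        outputs = generator(prompt, max_new_tokens=max_tokens, do_sample=True,
                            temperature=temperature, num_return_sequences=n,
                            return_full_text=False)
        return [out["generated_text"] for out in outputs]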

If you find this work useful in your research, please cite using the following BibTeX:

@article{huang2022language,
  title={Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents},
  author={Huang, Wenlong and Abbeel, Pieter and Pathak, Deepak and Mordatch, Igor},
  journal={arXiv preprint arXiv:2201.07207},
  year={2022}
}

Local Setup or Open in Colab

Requirements

  • Python=3.6.13
  • CUDA=11.3

Setup Instructions

git clone https://github.com/huangwl18/language-planner.git
cd language-planner/
conda create --name language-planner-env python=3.6.13
conda activate language-planner-env
pip install --upgrade pip
pip install -r requirements.txt

Running Code

See demo.ipynb (or Open in Colab) for a complete walk-through of our method. Feel free to experiment with any household tasks you come up with (or tasks beyond the household domain, if you provide the necessary actions in available_actions.json)!
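To give a sense of what the model sees, the prompt interleaves a solved example with the query task in the "Task: ... / Step 1: ..." natural-language format used throughout the demo. A minimal sketch of the assembly (the helper name is ours; the actual logic lives in demo.ipynb):

def make_prompt(example_task, example_steps, query_task):
    # Build a planning prompt from one solved example plus the query task,
    # in the "Task: ... / Step N: ..." format used by the demo (sketch).
    lines = [f"Task: {example_task}"]
    lines += [f"Step {i}: {step}" for i, step in enumerate(example_steps, 1)]
    lines += ["", f"Task: {query_task}", "Step 1:"]
    return "\n".join(lines)

prompt = make_prompt(
    "Make toast",
    ["Walk to dining room", "Walk to freezer", "Find freezer"],
    "Make breakfast",
)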

Note:

  • We observe that the best results are obtained with larger language models. If you cannot run Hugging Face Transformers models locally or on Google Colab due to memory constraints, we recommend registering for an OpenAI API account and using GPT-3 or Codex (as of 01/2022, new accounts receive $18 in free credits, and the Codex series is free after admission from the waitlist).
  • Because language models are highly sensitive to sampling hyperparameters, you may need to tune the sampling hyperparameters per model to obtain the best results.
  • The code uses the list of available actions supported by VirtualHome 1.0's Evolving Graph Simulator, stored in available_actions.json. These actions should support a wide variety of household tasks. However, you may modify or replace this file if you're interested in a different set of actions or a different task domain (beyond the household domain); a sketch of how generated steps are matched to these actions appears after this list.
  • A subset of the manually annotated examples originally collected by the VirtualHome paper is used as the available examples in the prompt. They are transformed into natural-language format and stored in available_examples.json. Feel free to change this file for a different set of available examples.
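For reference, the method matches each free-form step produced by the language model to the closest admissible action via cosine similarity in a sentence-embedding space. Below is a minimal sketch of that matching step; the embedding model name and the assumption that available_actions.json is a flat JSON list of action phrases are ours, so consult demo.ipynb for the actual implementation.

import json
from sentence_transformers import SentenceTransformer, util

# Load the admissible action phrases (assumed here to be a flat JSON list).
with open("available_actions.json") as f:
    actions = json.load(f)

embedder = SentenceTransformer("stsb-roberta-large")  # model choice is an assumption
action_embs = embedder.encode(actions, convert_to_tensor=True)

def translate(step):
    # Map a free-form generated step to the most similar admissible action.
    step_emb = embedder.encode(step, convert_to_tensor=True)
    scores = util.cos_sim(step_emb, action_embs)[0]
    best = int(scores.argmax())
    return actions[best], float(scores[best])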

Comments
  • The output of the demo

    Hello @huangwl18, first I would like to thank you for sharing your awesome code!

    I ran your demo without any changes, and below is what I got from the last cell (i.e., the autoregressive plan generation part). But it is hard for me to see how the output plan represents a possible solution (e.g., it just opens multiple objects but does not make any food). Is this output expected (i.e., did you get the same output)? If not, what could be the reason?

    ---------- GIVEN EXAMPLE ----------
    Task: Make toast
    Step 1: Walk to dining room
    Step 2: Walk to freezer
    Step 3: Find freezer
    Step 4: Open freezer
    Step 5: Find food bread
    Step 6: Grab food bread
    Step 7: Close freezer
    Step 8: Find toaster
    Step 9: Plug in toaster
    Step 10: Put food bread on toaster
    Step 11: Switch on toaster
    ---------- EXAMPLE END ----------
    
    Task: Make breakfast
    Step 1: Walk to kitchen
    Step 2: Turn to fridge
    Step 3: Open fridge
    Step 4: Open microwave
    Step 5: Open oven
    Step 6: Open stove
    Step 7: Open cupboard
    
    [Terminating early because best overall score is lower than CUTOFF_THRESHOLD (0.7268587350845337 < 0.8)]
    

    Thank you!

    opened by qwerty863 2
  • Calculation of mean log probability (GPT-3)

    Hello Wenlong,

    I think there might be an error in calculating the mean log probability when using GPT-3. The main issue is that GPT-3 does not only return the generated text in the response; it returns more than that (including token_logprobs under logprobs). Therefore, to calculate the mean log probability, we cannot simply use

    # calculate mean log prob across tokens
    mean_log_probs = [np.mean(response['choices'][i]['logprobs']['token_logprobs']) for i in range(sampling_params['n'])]
    

    Instead, we should stop counting when a stop token is met.

    For example, here is a response with a stop sequence of "\n". The generated text is "Walk to kitchen"; however, GPT-3 returns log-probs for more tokens than that:

    response: {
      "choices": [
        {
          "finish_reason": "stop",
          "index": 0,
          "logprobs": {
            "text_offset": [
              317,
              322,
              325,
              333,
              333,
              333,
              333,
              333
            ],
            "token_logprobs": [
              -0.2976162,
              -0.00012346054,
              -0.5069456,
              -0.0011470452,
              -0.0060894582,
              -0.00028055036,
              -6.838237e-05,
              -0.054386232
            ],
            "tokens": [
              " Walk",
              " to",
              " kitchen",
              "\n",
              "Step",
              " 2",
              ":",
              " Walk"
            ],
            "top_logprobs": [
              {
                " Get": -3.9821253,
                " Go": -3.5860093,
                " Make": -3.1428235,
                " Wake": -2.513738,
                " Walk": -0.2976162
              },
              {
                " To": -12.335158,
                " in": -11.411637,
                " into": -9.384543,
                " to": -0.00012346054,
                " upstairs": -12.2138815
              },
              {
                " bedroom": -5.3587174,
                " dining": -1.0860167,
                " kitchen": -0.5069456,
                " living": -4.34434,
                " the": -3.2986841
              },
              {
                "\n": -0.0011470452,
                " ": -7.6692185,
                " table": -9.372099,
                ".": -8.122213,
                "ette": -9.167303
              },
              {
                "\n": -5.1904135,
                " Step": -7.8304586,
                "Step": -0.0060894582,
                "Task": -9.905375,
                "step": -10.6300955
              },
              {
                " 1": -10.295448,
                " 2": -0.00028055036,
                " 3": -11.589857,
                " 4": -12.77457,
                "2": -8.387781
              },
              {
                "\n": -11.062581,
                " :": -11.94543,
                ",": -12.268325,
                ".": -10.367215,
                ":": -6.838237e-05
              },
              {
                " Find": -3.783928,
                " Open": -4.0909195,
                " Turn": -5.903181,
                " Walk": -0.054386232,
                "Walk": -5.14835
              }
            ]
          },
          "text": " Walk to kitchen"
        }
      ],
      "model": "text-davinci-001",
      "object": "text_completion",
      "usage": {
        "completion_tokens": 3,
        "prompt_tokens": 94,
        "total_tokens": 97
      }
    }
    

    The current way of calculating the mean log prob gives -0.10833211608375, whereas it should be mean(-0.2976162, -0.00012346054, -0.5069456) = -0.26822842018.
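    A minimal fix could truncate the per-token log-probs at the first stop token before averaging (assuming the stop sequence appears verbatim as a single entry in logprobs['tokens'], as in the response above):

    import numpy as np

    def mean_log_prob(choice, stop="\n"):
        # Average token log-probs up to (but excluding) the first stop token.
        tokens = choice["logprobs"]["tokens"]
        logprobs = choice["logprobs"]["token_logprobs"]
        if stop in tokens:
            logprobs = logprobs[: tokens.index(stop)]
        return np.mean(logprobs)

    # On the response above: mean(-0.2976162, -0.00012346054, -0.5069456) ≈ -0.2682
    mean_log_probs = [mean_log_prob(c) for c in response["choices"]]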

    Please let me know what you think. Great work!

    Cheers, Kaixian

    opened by kaixqu 0