Code for "Temporal Difference Learning for Model Predictive Control"

Overview

Temporal Difference Learning for Model Predictive Control

Original PyTorch implementation of TD-MPC from Temporal Difference Learning for Model Predictive Control by Nicklas Hansen, Xiaolong Wang*, Hao Su*.



[Paper][Website]

Method

TD-MPC is a framework for model predictive control (MPC) using a Task-Oriented Latent Dynamics (TOLD) model and a terminal value function learned jointly by temporal difference (TD) learning. TD-MPC plans actions entirely in latent space using the TOLD model, which learns compact task-centric representations from either state or image inputs. TD-MPC solves challenging Humanoid and Dog locomotion tasks in 1M environment steps.
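
Below is a minimal conceptual sketch of this planning loop in PyTorch, not the authors' implementation: the network sizes, CEM settings, and the reuse of the last sampled action for the terminal value are illustrative assumptions (see the src directory for the actual code). It encodes an observation into latent space, rolls out sampled action sequences through the learned dynamics while accumulating predicted rewards, bootstraps with the terminal value function, and refits the sampling distribution on the elite sequences.

import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim, horizon = 24, 6, 50, 5

# TOLD components: encoder h, latent dynamics d, reward head R, value head Q.
h = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ELU())
d = nn.Sequential(nn.Linear(latent_dim + act_dim, latent_dim), nn.ELU())
R = nn.Linear(latent_dim + act_dim, 1)
Q = nn.Linear(latent_dim + act_dim, 1)

def estimate_return(z, actions, gamma=0.99):
    # Roll out the latent dynamics, accumulating predicted rewards, then
    # bootstrap with the learned terminal value function.
    G, discount = 0.0, 1.0
    for t in range(actions.shape[0]):
        za = torch.cat([z, actions[t]], dim=-1)
        G = G + discount * R(za).squeeze(-1)
        z = d(za)
        discount *= gamma
    # Simplification: reuse the last sampled action for the terminal Q-value.
    za = torch.cat([z, actions[-1]], dim=-1)
    return G + discount * Q(za).squeeze(-1)

@torch.no_grad()
def plan(obs, num_samples=512, num_elites=64, iterations=6):
    # CEM-style planning: iteratively refit a Gaussian over action sequences,
    # then execute only the first action of the final mean (receding horizon).
    z0 = h(obs)
    mean = torch.zeros(horizon, act_dim)
    std = torch.ones(horizon, act_dim)
    for _ in range(iterations):
        actions = (mean.unsqueeze(1) + std.unsqueeze(1)
                   * torch.randn(horizon, num_samples, act_dim)).clamp(-1, 1)
        returns = estimate_return(z0.expand(num_samples, -1), actions)
        elites = actions[:, returns.topk(num_elites).indices]
        mean, std = elites.mean(dim=1), elites.std(dim=1) + 1e-6
    return mean[0]  # a single continuous action of shape (act_dim,)

action = plan(torch.randn(obs_dim))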

Citation

If you use our method or code in your research, please consider citing the paper as follows:

@article{Hansen2022tdmpc,
	title={Temporal Difference Learning for Model Predictive Control},
	author={Nicklas Hansen and Xiaolong Wang and Hao Su},
	eprint={2203.04955},
	archivePrefix={arXiv},
	primaryClass={cs.LG},
	year={2022}
}

Instructions

Assuming that you already have MuJoCo installed, install dependencies using conda:

conda env create -f environment.yaml
conda activate tdmpc

After installing dependencies, you can train an agent by calling

python src/train.py task=dog-run

Evaluation videos and model weights can be saved with arguments save_video=True and save_model=True. Refer to the cfgs directory for a full list of options and default hyperparameters, and see tasks.txt for a list of supported tasks. We also provide results for all 23 state-based DMControl tasks in the results directory.
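
For example, a single run that saves both evaluation videos and model weights could look like the following sketch (the seed argument name is an assumption; check the cfgs directory for the exact option names):

python src/train.py task=dog-run save_video=True save_model=True seed=1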

The training script supports both local logging as well as cloud-based logging with Weights & Biases. To use W&B, provide a key by setting the environment variable WANDB_API_KEY=<YOUR_KEY> and add your W&B project and entity details to cfgs/default.yaml.
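
For example, assuming a bash-like shell, export the key before launching training and edit cfgs/default.yaml to point the logger at your W&B project and entity:

export WANDB_API_KEY=<YOUR_KEY>
python src/train.py task=dog-run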

Changelog

  • [03-27-2022] Reduced memory usage in pixel experiments by 6x. Code improvements and refactoring. Updated default pixel hyperparameters.
  • [03-10-2022] Initial code release.

License & Acknowledgements

TD-MPC is licensed under the MIT license. MuJoCo and DeepMind Control Suite are licensed under the Apache 2.0 license. We thank the DrQv2 authors for their implementation of DMControl wrappers.

Comments
  • A Question about mpc and td-mpc

    Thank you for your reply a few days ago. I'm working on controlling medicine doses with RL. I wonder whether TD-MPC can compete with MPC based on a biological model. If so, what would be the advantage of TD-MPC compared with MPC in this field? It's just my own question.

    opened by Wenzhou-Lyu 4
  • Implementation in OpenAI Atari Gym

    Firstly, thank you for the awesome paper!

    I was trying to implement this using OpenAI's Atari gym instead of the physics sims and I was having some trouble getting it set up. Could you give me some tips on how I should modify the functions to make it work? Thanks!

    opened by zrbak 1
  • Multimodal data as input to the model

    Hi, congratulations on the amazing work!

    I wanted to ask a question: the paper mentions that multimodal data [RGB + proprioception] can be used as input to the model.

    In the code, the observations are sent to an encoder that processes them differently depending on whether the modality is pixels or state; however, I'm not sure that either of those options applies to multimodal data containing both pixels and state information. Given the experiments in the paper, how would you recommend processing such data in the encoder?

    opened by SergioArnaud 1
  • Why aren't there Meta-World tasks in tasks.txt?

    I saw that Meta-World is used in the paper, but it doesn't seem to be in the code. How do I change the code if I want to use a Meta-World environment?

    opened by 945716994 1
  • the setting of random seed

    My result on Reacher-easy is only 318±172 (random seeds 1 to 5), while the paper reports 628±105. All settings are the same as in the paper. Maybe it is because of the random seed setting. Could you share the exact seed values you used? Thank you very much!

    opened by Arya87 1
  • Why can't I save the video?

    I ran the command python src/train.py task=dog-run save_video=True save_model=True. When training finished, I only got model.pt; no video was saved.

    My platform is ubuntu20.04

    opened by Bailey-24 1
  • Question about base.device

    Hi! thanks for this excellent paper and it inspired me a lot!

    In my experiment, I'm confused about the device setting in cfg.py:

    https://github.com/nicklashansen/tdmpc/blob/96cb7036ecf06f75d5ffd64a0454bbab7d0d3e17/src/cfg.py#L45

    I don't understand why the pixel environments (modality=='pixels') are placed on the CPU while the state environments (modality=='state') are placed on the GPU (cuda). Is there any special purpose for this setting?

    opened by pickxiguapi 1
  • Intuition behind using LayerNorm in Q function?

    Hi, while I was reading through the code implementation, I found that LayerNorm is only used after the input layer of the Q-network. What is the reason for this?

    I guess it stabilizes Q-value estimation, which is susceptible to erroneous value estimates.

    Thanks in advance!

    opened by mch5048 1
  • Typo in the arXiv paper and some questions on the notation.

    Hi, thanks for bringing a new perspective to this field with this paper. I really enjoyed reading it.

    While reading through the paper, I found a typo in the inline equation that describes the MPC.

    In the Model predictive control subsection under Preliminaries, it is stated that the globally optimal policy \Pi_{\theta} is proportional to the expectation of the negated Q-values. I think the negation should be removed, intuitively.

    Another question regarding the description of MPPI in Sec. 3 is about Eq. (4), which describes the CEM.

    Here, the mean/var of the j-th policy is computed from the weighted/shifted \Gamma, where \Gamma is denoted as a sampled trajectory in the paragraph. I guess the authors meant \Gamma to be the state-action sequence, so \Gamma^{\star}_i in Eq. (4) should be replaced with the action. Maybe the code snippet here corresponds to this equation?

    I wonder if I understood it correctly. Thanks in advance!

    opened by mch5048 1
  • How to adapt to a discrete action space

    Hi! First of all, thanks for such an interesting and well written paper!

    For my experiments, I would really like to try this approach on Crafter to compare it with DreamerV2. The problem is that Crafter only supports a discrete action space.

    I'm mostly familiar with model-free algorithms, so it's not quite obvious to me how this algorithm can be adapted. As far as I know, MCTS is used for discrete action spaces, but writing it efficiently in Python is a challenge. Is it reasonable to just change the distribution in CEM from normal to categorical and sample from it? Could that work?

    opened by Howuhh 1
  • Intuition behind using zero initialization in critic and reward model last layer

    Hi, while I was reading through the code implementation, I found that you use zero initialization in the last layer of the critic and the reward model. What is the reason for this? I guess it can help avoid overestimation, is that right? Another question: why do you use orthogonal_init for all layers except the last? Is there any theory behind this? Thanks in advance!

    opened by hdadong 0
Owner
Nicklas Hansen
PhD student @ UC San Diego. Previously: UC Berkeley, DTU, NTUsg. Working on machine learning, robotics, and computer vision.