Code for "Temporal Difference Learning for Model Predictive Control"

Overview

Temporal Difference Learning for Model Predictive Control

Original PyTorch implementation of TD-MPC from Temporal Difference Learning for Model Predictive Control by Nicklas Hansen, Xiaolong Wang*, Hao Su*.



[Paper][Website]

Method

TD-MPC is a framework for model predictive control (MPC) using a Task-Oriented Latent Dynamics (TOLD) model and a terminal value function learned jointly by temporal difference (TD) learning. TD-MPC plans actions entirely in latent space using the TOLD model, which learns compact task-centric representations from either state or image inputs. TD-MPC solves challenging Humanoid and Dog locomotion tasks in 1M environment steps.
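
Below is a minimal conceptual sketch of this planning loop in PyTorch, not the authors' implementation: the network sizes, CEM settings, and the reuse of the last sampled action for the terminal value are illustrative assumptions (see the src directory for the actual code). It encodes an observation into latent space, rolls out sampled action sequences through the learned dynamics while accumulating predicted rewards, bootstraps with the terminal value function, and refits the sampling distribution on the elite sequences.

import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim, horizon = 24, 6, 50, 5

# TOLD components: encoder h, latent dynamics d, reward head R, value head Q.
h = nn.Sequential(nn.Linear(obs_dim, latent_dim), nn.ELU())
d = nn.Sequential(nn.Linear(latent_dim + act_dim, latent_dim), nn.ELU())
R = nn.Linear(latent_dim + act_dim, 1)
Q = nn.Linear(latent_dim + act_dim, 1)

def estimate_return(z, actions, gamma=0.99):
    # Roll out the latent dynamics, accumulating predicted rewards, then
    # bootstrap with the learned terminal value function.
    G, discount = 0.0, 1.0
    for t in range(actions.shape[0]):
        za = torch.cat([z, actions[t]], dim=-1)
        G = G + discount * R(za).squeeze(-1)
        z = d(za)
        discount *= gamma
    # Simplification: reuse the last sampled action for the terminal Q-value.
    za = torch.cat([z, actions[-1]], dim=-1)
    return G + discount * Q(za).squeeze(-1)

@torch.no_grad()
def plan(obs, num_samples=512, num_elites=64, iterations=6):
    # CEM-style planning: iteratively refit a Gaussian over action sequences,
    # then execute only the first action of the final mean (receding horizon).
    z0 = h(obs)
    mean = torch.zeros(horizon, act_dim)
    std = torch.ones(horizon, act_dim)
    for _ in range(iterations):
        actions = (mean.unsqueeze(1) + std.unsqueeze(1)
                   * torch.randn(horizon, num_samples, act_dim)).clamp(-1, 1)
        returns = estimate_return(z0.expand(num_samples, -1), actions)
        elites = actions[:, returns.topk(num_elites).indices]
        mean, std = elites.mean(dim=1), elites.std(dim=1) + 1e-6
    return mean[0]  # a single continuous action of shape (act_dim,)

action = plan(torch.randn(obs_dim))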

Citation

If you use our method or code in your research, please consider citing the paper as follows:

@article{Hansen2022tdmpc,
	title={Temporal Difference Learning for Model Predictive Control},
	author={Nicklas Hansen and Xiaolong Wang and Hao Su},
	eprint={2203.04955},
	archivePrefix={arXiv},
	primaryClass={cs.LG},
	year={2022}
}

Instructions

Assuming that you already have MuJoCo installed, install dependencies using conda:

conda env create -f environment.yaml
conda activate tdmpc

After installing dependencies, you can train an agent by calling

python src/train.py task=dog-run

Evaluation videos and model weights can be saved with arguments save_video=True and save_model=True. Refer to the cfgs directory for a full list of options and default hyperparameters, and see tasks.txt for a list of supported tasks. We also provide results for all 23 state-based DMControl tasks in the results directory.
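
For example, a single run that saves both evaluation videos and model weights could look like the following sketch (the seed argument name is an assumption; check the cfgs directory for the exact option names):

python src/train.py task=dog-run save_video=True save_model=True seed=1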

The training script supports both local logging as well as cloud-based logging with Weights & Biases. To use W&B, provide a key by setting the environment variable WANDB_API_KEY=<YOUR_KEY> and add your W&B project and entity details to cfgs/default.yaml.
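
For example, assuming a bash-like shell, export the key before launching training and edit cfgs/default.yaml to point the logger at your W&B project and entity:

export WANDB_API_KEY=<YOUR_KEY>
python src/train.py task=dog-run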

Changelog

  • [03-27-2022] Reduced memory usage in pixel experiments by 6x. Code improvements and refactoring. Updated default pixel hyperparameters.
  • [03-10-2022] Initial code release.

License & Acknowledgements

TD-MPC is licensed under the MIT license. MuJoCo and DeepMind Control Suite are licensed under the Apache 2.0 license. We thank the DrQv2 authors for their implementation of DMControl wrappers.

Comments
  • A Question about mpc and td-mpc

    Thank you for your reply a few days ago. I'm working on controlling medicine doses with RL. I wonder whether TD-MPC can compete with MPC based on a biological model. If so, what would be the advantage of TD-MPC compared with MPC in this field? It's just my own question.

    opened by Wenzhou-Lyu 4
  • Implementation in OpenAI Atari Gym

    Firstly, thank you for the awesome paper!

    I was trying to implement this using OpenAI's Atari gym instead of the physics sims and I was having some trouble getting it set up. Could you give me some tips on how I should modify the functions to make it work? Thanks!

    opened by zrbak 1
  • Multimodal data as input to the model

    Hi, congratulations on the amazing work!

    I wanted to ask a question: the paper mentions that multimodal data [RGB + proprioception] can be used as input to the model.

    In the code, the observations are sent to an encoder that processes them differently depending on whether the modality is pixels or state; however, I'm not sure that either of those options applies to multimodal data containing both pixels and state information. Given the experiments in the paper, how would you recommend processing such data in the encoder?

    opened by SergioArnaud 1
  • Why aren't there Meta-World tasks in tasks.txt?

    I saw that Meta-World is used in the paper, but it doesn't seem to be in the code. How do I change the code if I want to use a Meta-World environment?

    opened by 945716994 1
  • the setting of random seed

    My result on Reacher-easy is only 318±172 (random seeds 1 to 5), while the paper reports 628±105. All settings are the same as in the paper. Maybe it is because of the random seed setting. Could you share the exact seed values you used? Thank you very much!

    opened by Arya87 1
  • Why can't I save the video?

    I ran the command python src/train.py task=dog-run save_video=True save_model=True. When training finished, I only got model.pt; no video was saved.

    My platform is ubuntu20.04

    opened by Bailey-24 1
  • Question about base.device

    Hi! thanks for this excellent paper and it inspired me a lot!

    In my experiment, I'm confused about the device setting in cfg.py:

    https://github.com/nicklashansen/tdmpc/blob/96cb7036ecf06f75d5ffd64a0454bbab7d0d3e17/src/cfg.py#L45

    I don't understand why the pixel environments (modality=='pixels') are placed on the CPU while the state environments (modality=='state') are placed on the GPU (cuda). Is there any special purpose for this setting?

    opened by pickxiguapi 1
  • Intuition behind using LayerNorm in Q function?

    Hi, while I was reading through the code implementation, I found that LayerNorm is only used after the input layer of the Q-network. What is the reason for this?

    I guess it stabilizes Q-value estimation, which is susceptible to erroneous value estimates.

    Thanks in advance!

    opened by mch5048 1
  • Typo in the arXiv paper and some questions on the notation.

    Hi, thanks for bringing a new perspective to this field with this paper. I really enjoyed reading it.

    While reading through the paper, I found a typo in the inline equation that describes the MPC.

    In the Model predictive control subsection under Preliminaries, it is stated that the globally optimal policy \Pi_{\theta} is proportional to the expectation of the negated Q-values. I think the negation should be removed, intuitively.

    Another question regarding the description of MPPI in Sec. 3 is about Eq. (4), which describes the CEM.

    Here, the mean/var of the j-th policy is computed from the weighted/shifted \Gamma, where \Gamma is denoted as a sampled trajectory in the paragraph. I guess the authors meant \Gamma to be the state-action sequence, so \Gamma^{\star}_i in Eq. (4) should be replaced with the action. Maybe the code snippet here corresponds to this equation?

    I wonder if I understood it correctly. Thanks in advance!

    opened by mch5048 1
  • How to adapt to a discrete action space

    Hi! First of all, thanks for such an interesting and well written paper!

    For my experiments, I would really like to try this approach on Crafter to compare it with DreamerV2. The problem is that Crafter only supports a discrete action space.

    I'm mostly familiar with model-free algorithms, so it's not quite obvious to me how this algorithm can be adapted. As far as I know, MCTS is used for discrete action spaces, but writing it efficiently in Python is a challenge. Is it reasonable to just change the distribution in CEM from normal to categorical and sample from it? Could that work?

    opened by Howuhh 1
  • Intuition behind using zero initialization in critic and reward model last layer

    Hi, while I was reading through the code implementation, I found that you use zero initialization in the last layer of the critic and the reward model. What is the reason for this? I guess it can help avoid overestimation, is that right? Another question: why do you use orthogonal_init for all layers except the last? Is there any theory behind this? Thanks in advance!

    opened by hdadong 0
Owner
Nicklas Hansen
PhD student @ UC San Diego. Previously: UC Berkeley, DTU, NTUsg. Working on machine learning, robotics, and computer vision.