Neural Dynamic Policies for End-to-End Sensorimotor Learning

In NeurIPS 2020 (Spotlight) [Project Website] [Project Video]

Shikhar Bahl, Mustafa Mukadam, Abhinav Gupta, Deepak Pathak
Carnegie Mellon University & Facebook AI Research

This is a PyTorch-based implementation of our NeurIPS 2020 paper on Neural Dynamic Policies for end-to-end sensorimotor learning. In this work, we embed dynamics structure into deep neural network-based policies by reparameterizing action spaces with differential equations. We propose Neural Dynamic Policies (NDPs), which make predictions in trajectory distribution space, in contrast to prior policy learning methods in which the action represents the raw control space. The embedded structure allows us to perform end-to-end policy learning under both reinforcement and imitation learning setups.

If you find this work useful in your research, please cite:

  @inproceedings{bahl2020neural,
    Author = { Bahl, Shikhar and Mukadam, Mustafa and
    Gupta, Abhinav and Pathak, Deepak},
    Title = {Neural Dynamic Policies for End-to-End Sensorimotor Learning},
    Booktitle = {NeurIPS},
    Year = {2020}
  }
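To make "reparameterizing action spaces with differential equations" more concrete, here is a minimal sketch of a one-dimensional dynamic movement primitive (DMP) rollout. It assumes the standard textbook DMP formulation (a critically damped spring pulled toward a goal, plus a learned forcing term); the function name `dmp_rollout`, the constants, and the basis-function layout are illustrative assumptions and are not taken from this repository. In this picture, a policy would output `weights` and `goal`, and the integrated trajectory would supply the low-level actions.

```python
import math


def dmp_rollout(weights, goal, y0=0.0, steps=300, dt=0.01,
                alpha=25.0, a_x=1.0, tau=1.0):
    """Integrate a 1-D DMP with explicit Euler and return the positions.

    All constants are illustrative assumptions, not the repository's values.
    """
    beta = alpha / 4.0  # critical damping of the spring-damper system
    n = len(weights)
    # Basis-function centers spaced along the decaying phase variable x.
    centers = [math.exp(-a_x * i / max(n - 1, 1)) for i in range(n)]
    widths = [n ** 1.5 / c / a_x for c in centers]

    y, z, x = y0, 0.0, 1.0  # position, scaled velocity, phase
    trajectory = [y]
    for _ in range(steps):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centers)]
        # Forcing term: normalized weighted sum of bases, gated by the phase.
        forcing = sum(p * w for p, w in zip(psi, weights)) / (sum(psi) + 1e-10)
        forcing *= x * (goal - y0)
        dz = (alpha * (beta * (goal - y) - z) + forcing) / tau
        dy = z / tau
        dx = -a_x * x / tau
        z += dz * dt
        y += dy * dt
        x += dx * dt
        trajectory.append(y)
    return trajectory


# Zero weights give a plain point attractor; nonzero weights shape the path.
straight = dmp_rollout([0.0] * 5, goal=1.0)
shaped = dmp_rollout([200.0, -100.0, 50.0, 0.0, 0.0], goal=1.0)
```

With all weights set to zero the forcing term vanishes and the trajectory simply converges to the goal; the weights only change how it gets there.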

1) Installation and Usage

  1. This code is based on PyTorch and needs MuJoCo 1.5 to run. To install and set up the code, run the following commands:

    # create a directory for data and add dependencies
    cd neural-dynamic-polices; mkdir data/
    git clone https://github.com/rll/rllab.git
    git clone https://github.com/openai/baselines.git

    # create a virtual env
    conda create --name ndp python=3.5
    source activate ndp

    # install requirements
    pip install -r requirements.txt
    # OR try
    conda env create -f ndp.yaml

  2. Training imitation learning:

    cd neural-dynamic-polices
    # NAME is the name of the experiment
    python main_il.py --name NAME

  3. Training RL: run the script run_rl.sh. ENV_NAME is the environment (one of throw, pick, push, soccer, faucet). ALGO-TYPE is the algorithm (dmp for NDPs, ppo for PPO [Schulman et al., 2017], and ppo-multi for the multistep actor-critic architecture we present in our paper).

    sh run_rl.sh ENV_NAME ALGO-TYPE EXP_ID SEED

  4. To visualize trained models/policies, use exactly the same arguments as for training, but call vis_policy.sh:

    sh vis_policy.sh ENV_NAME ALGO-TYPE EXP_ID SEED
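
Because training and visualization take the same four positional arguments, it can help to build both command lines from one place. The following is a small sketch, not part of the repository: `build_command` and `launch` are hypothetical helpers, and the allowed values are simply the environment and algorithm names listed above.

```python
import subprocess

# Values listed in this README for ENV_NAME and ALGO-TYPE.
ENVS = ("throw", "pick", "push", "soccer", "faucet")
ALGOS = ("dmp", "ppo", "ppo-multi")


def build_command(script, env_name, algo, exp_id, seed):
    """Return the argument list for run_rl.sh or vis_policy.sh (hypothetical helper)."""
    if env_name not in ENVS:
        raise ValueError("unknown ENV_NAME: %r" % (env_name,))
    if algo not in ALGOS:
        raise ValueError("unknown ALGO-TYPE: %r" % (algo,))
    return ["sh", script, env_name, algo, str(exp_id), str(seed)]


def launch(command):
    """Run a command from the repository root; raises if the script fails."""
    subprocess.run(command, check=True)


# Same arguments for training and for visualizing the trained policy.
train_cmd = build_command("run_rl.sh", "faucet", "dmp", 2, 1)
vis_cmd = build_command("vis_policy.sh", "faucet", "dmp", 2, 1)
```

Calling `launch(train_cmd)` from inside the repository would then start training, and `launch(vis_cmd)` would visualize the result of that same run.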

2) Other helpful pointers

3) Acknowledgements

We use the PPO infrastructure from https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail. We use environment code from https://github.com/dibyaghosh/dnc/tree/master/dnc/envs, https://github.com/rlworkgroup/metaworld, and https://github.com/vitchyr/multiworld. We use PyTorch and RL utility functions from https://github.com/vitchyr/rlkit. We use the DMP skeleton code from https://github.com/abr-ijs/imednet and https://github.com/abr-ijs/digit_generator. We also use https://github.com/rll/rllab.git and https://github.com/openai/baselines.git.

Comments
  • Cannot reproduce the good results from the paper

    I ran the RL code, but I think there may be an error, and I cannot reproduce the good results reported in the paper. Could you help me? My success rate is always 0.0, and I always get this error: mujoco_py.builder.MujocoException: Got MuJoCo Warning: Nan, Inf or huge value in QPOS at DOF 0. The simulation is unstable. Time = 2321.0500.

    opened by csufangyu 7
  • How can I verify your algorithm performance?

    Hello, I have some questions about running your NDP algorithm.

    I finished setting up the environment and ran the code for about 8 hours with $ sh run_rl.sh faucet dmp 2 1

    However, it does not seem to be working well. [Screenshot, 2021-05-24 04-03-02]

    My questions are:

    1. How long does it take to train the PPO and NDP algorithms on an example environment?
    2. How can I view or plot the results?
    3. I am wondering about lines 45 to 72 of dmp_train.py:
    for j in range(num_updates):
        if args.use_linear_lr_decay:
            utils.update_linear_schedule(
                agent.optimizer, j, num_updates,
                agent.optimizer.lr if args.algo == "acktr" else args.lr)
        envs.reset()
        for step in range(args.num_steps):
            if step % args.T == 0:
                with torch.no_grad():
                    values, actions, action_log_probs_list, recurrent_hidden_states_lst = actor_critic.act(
                        rollouts.obs[step], rollouts.recurrent_hidden_states[step],
                        rollouts.masks[step])

                action = actions[step % args.T]
                action_log_probs = action_log_probs_list[step % args.T]
                recurrent_hidden_states = recurrent_hidden_states_lst[0]
                value = values[:, step % args.T].view(-1, 1)

            obs, reward, done, infos = envs.step(action)

            episode_rewards.append(reward[0].item())
            masks = torch.FloatTensor(
                [[0.0] if done_ else [1.0] for done_ in done])
            bad_masks = torch.FloatTensor(
                [[0.0] if 'bad_transition' in info.keys() else [1.0]
                 for info in infos])
            rollouts.insert(obs, recurrent_hidden_states, action,
                            action_log_probs, value, reward, masks, bad_masks)

    In this code, the same action is applied for steps T*s through T*(s+1)-1 (T = args.T). However, I think the action should be set to actions[step % args.N] at every step, since the DMP actor outputs actions for N steps (N = args.N). Could you explain this part in more detail?

    Thank you!

    opened by OhJeongwoo 2
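
The indexing question in this issue can be illustrated with a small, self-contained sketch. It does not use the repository's code: `rollout_indices` and its `reindex_every_step` flag are hypothetical, and the sketch assumes the indentation of the quoted snippet, where the action is only selected inside the `if step % args.T == 0` branch. It records which entry of a T-step plan would be executed at each environment step under the two readings.

```python
def rollout_indices(num_steps, T, reindex_every_step):
    """Return the plan index executed at each step (hypothetical illustration)."""
    executed = []
    idx = None
    for step in range(num_steps):
        if step % T == 0:
            # A new plan of T actions would be sampled here; the index is
            # chosen inside this branch, so it is always 0.
            idx = step % T
        if reindex_every_step:
            # Alternative reading: pick a fresh entry of the plan every step.
            idx = step % T
        executed.append(idx)
    return executed


held = rollout_indices(6, 3, reindex_every_step=False)
stepped = rollout_indices(6, 3, reindex_every_step=True)
print(held)     # [0, 0, 0, 0, 0, 0]
print(stepped)  # [0, 1, 2, 0, 1, 2]
```

Under the first reading the first planned action is held for T steps; under the second, each planned action is executed once. Which behavior the repository intends is for the authors to confirm.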
  • Cannot load the file './dmp/data/40x40-smnist.mat'

    Hi, I am working on this project for my master's final project, but when I run the commands from your guide, the code fails because the file '40x40-smnist.mat' cannot be found. Could you help me solve this problem?

    opened by linksdl 2
  • About the './dmp/data/pretrained_weights.pt' file

    Hi @shikharbahl, I am struggling with a problem in 'main_il.py'. Is the parameter pt='./dmp/data/pretrained_weights.pt' your trained model (ndp_cnn)? If so, could you please upload this file?

    opened by linksdl 1
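
Both of the preceding issues come down to data files that are not present locally. A quick pre-flight check before launching main_il.py can make that failure explicit. This is only a sketch: `missing_files` is a hypothetical helper, and the two paths are the ones named in the issues above, assumed here to be what the imitation learning script expects. The demonstration runs against a temporary directory so it does not depend on the repository layout.

```python
import tempfile
from pathlib import Path

# Paths named in the issues above (assumed to be required by main_il.py).
REQUIRED = ("dmp/data/40x40-smnist.mat", "dmp/data/pretrained_weights.pt")


def missing_files(root=".", required=REQUIRED):
    """Return the required relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_file()]


# Demonstration in a throwaway directory standing in for the repository root.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_files(tmp)            # nothing exists yet
    target = Path(tmp) / REQUIRED[0]
    target.parent.mkdir(parents=True)
    target.touch()                         # pretend the dataset was downloaded
    after = missing_files(tmp)             # only the weights are still missing
```

Run from the repository root, `missing_files()` would list whichever of the two files still needs to be obtained.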
  • Is there an error in the 'smnist_loader.py' file?

    [Image: WX20210531-172638@2x]

    @shikharbahl Hi, could you please confirm the following about 'smnist_loader.py'? 1. Is the data a '.mat' file? 2. Is 'for image in data[image_key][0,0][0]' correct? 3. Is 'DMP_data = data[dmp_params_key][0, 0][0]' correct?

    opened by linksdl 1
  • Running "main_il.py" takes a long time; is this expected?

    Hello, I am studying your work and want to run your code. When I run "main_il.py", each epoch takes a long time, and so far I have only obtained the result for epoch 0: [Image: 1649686079] I would like to know how long this normally takes and whether I have made a mistake. I'd appreciate your help, thank you.

    opened by Tiantiansayhi 0
  • Environment dependency outdated

    Hi,

    I am a student at Karlsruhe Institute of Technology, and I would like to take a deeper look at your work and experiments.

    I followed the README and created a conda env, but I got stuck installing the pip packages. It looks like a few pip dependencies are no longer available. Could you please update those dependencies?

    I would appreciate any suggestions. Thanks!

    Best, Ge Li

    opened by BruceGeLi 0
Owner
Shikhar Bahl
AI Researcher at CMU (PhD, Robotics Institute) interested in deep RL, machine learning, robotics and optimization