Neural Dynamic Policies for End-to-End Sensorimotor Learning
[Project Website] [Project Video]
In NeurIPS 2020 (Spotlight)
Shikhar Bahl, Mustafa Mukadam, Abhinav Gupta, Deepak Pathak
Carnegie Mellon University & Facebook AI Research
This is a PyTorch-based implementation of our NeurIPS 2020 paper on Neural Dynamic Policies for end-to-end sensorimotor learning. In this work, we embed dynamical-systems structure into deep neural network-based policies by reparameterizing action spaces with differential equations. We propose Neural Dynamic Policies (NDPs), which make predictions in trajectory distribution space, in contrast to prior policy learning methods where actions represent the raw control space. The embedded structure allows us to perform end-to-end policy learning under both reinforcement and imitation learning setups; a minimal code sketch of the idea follows the citation below. If you find this work useful in your research, please cite:
@inproceedings{bahl2020neural,
  Author = {Bahl, Shikhar and Mukadam, Mustafa and
            Gupta, Abhinav and Pathak, Deepak},
  Title = {Neural Dynamic Policies for End-to-End Sensorimotor Learning},
  Booktitle = {NeurIPS},
  Year = {2020}
}
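To make the reparameterization concrete, here is a minimal, self-contained sketch of the core idea, not the repository's actual implementation. Assuming the dynamical system is a second-order dynamic movement primitive (DMP), a network maps the observation to the DMP's basis-function weights and goal, and the action trajectory comes from integrating the resulting differential equation. All class names, layer sizes, and hyperparameters below are illustrative.

```python
import torch
import torch.nn as nn

class NDPSketch(nn.Module):
    """Illustrative NDP: a network predicts DMP parameters, and the
    trajectory is obtained by integrating the DMP (hypothetical names)."""
    def __init__(self, obs_dim, act_dim, n_basis=6, T=30, alpha=25.0):
        super().__init__()
        self.act_dim, self.n_basis, self.T = act_dim, n_basis, T
        self.alpha, self.beta = alpha, alpha / 4.0  # critically damped spring
        # Map an observation to basis weights w and a goal g per action dim.
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim * (n_basis + 1)))
        # Fixed radial basis functions over the canonical phase x in [0, 1].
        self.centers = torch.linspace(0.0, 1.0, n_basis)
        self.widths = torch.full((n_basis,), float(n_basis) ** 1.5)

    def forward(self, obs, y0):
        # Predict DMP parameters (weights w, goal g) from the observation.
        params = self.net(obs).view(-1, self.act_dim, self.n_basis + 1)
        w, g = params[..., :-1], params[..., -1]
        y, v, dt = y0, torch.zeros_like(y0), 1.0 / self.T
        x, traj = 1.0, []
        for _ in range(self.T):  # Euler integration of the DMP
            psi = torch.exp(-self.widths * (x - self.centers) ** 2)
            f = (w * psi).sum(-1) / (psi.sum() + 1e-8) * x * (g - y0)
            dv = self.alpha * (self.beta * (g - y) - v) + f
            v = v + dv * dt
            y = y + v * dt
            x = x - (self.alpha / 3.0) * x * dt  # canonical phase decay
            traj.append(y)
        return torch.stack(traj, dim=1)  # (batch, T, act_dim)
```

A policy built this way outputs an entire trajectory per prediction rather than a single raw action, and the integration is differentiable, which is what allows end-to-end training with either imitation or RL losses.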
1) Installation and Usage
- This code is based on PyTorch and requires MuJoCo 1.5 to run. To install and set up the code, run the following commands:
# create a directory for data and clone the dependencies
cd neural-dynamic-policies; mkdir data/
git clone https://github.com/rll/rllab.git
git clone https://github.com/openai/baselines.git
# create a virtual environment
conda create --name ndp python=3.5
source activate ndp
# install requirements
pip install -r requirements.txt
# or try
conda env create -f ndp.yaml
- Training imitation learning:
cd neural-dynamic-policies
# NAME is the name of the experiment
python main_il.py --name NAME
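For example (the experiment name below is an arbitrary, hypothetical label):
python main_il.py --name ndp_il_run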
- Training RL: run the script run_rl.sh. `ENV_NAME` is the environment (one of `throw`, `pick`, `push`, `soccer`, `faucet`). `ALGO-TYPE` is the algorithm: `dmp` for NDPs, `ppo` for PPO [Schulman et al., 2017], and `ppo-multi` for the multistep actor-critic architecture we present in our paper.
sh run_rl.sh ENV_NAME ALGO-TYPE EXP_ID SEED
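For example, to train NDPs on the throwing environment (the experiment ID and seed below are arbitrary placeholders):
sh run_rl.sh throw dmp 100 0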
- To visualize trained models/policies, call vis_policy.sh with the exact same arguments used for training:
sh vis_policy.sh ENV_NAME ALGO-TYPE EXP_ID SEED
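For example, to visualize the policy from the training command above:
sh vis_policy.sh throw dmp 100 0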
2) Other helpful pointers
3) Acknowledgements
We use:
- PPO infrastructure from https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail
- environment code from https://github.com/dibyaghosh/dnc/tree/master/dnc/envs, https://github.com/rlworkgroup/metaworld, and https://github.com/vitchyr/multiworld
- PyTorch and RL utility functions from https://github.com/vitchyr/rlkit
- DMP skeleton code from https://github.com/abr-ijs/imednet and https://github.com/abr-ijs/digit_generator
- https://github.com/rll/rllab.git and https://github.com/openai/baselines.git