# Legged Robots that Keep on Learning
Official codebase for *Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World*. It contains code for training a simulated or real A1 quadrupedal robot to imitate various reference motions, pre-trained policies, and example training code for learning the policies.
Project page: https://sites.google.com/berkeley.edu/fine-tuning-locomotion
## Getting Started
- Install the MPC extension (optional): `python3 setup.py install --user`
Install dependencies:
- Install MPI: `sudo apt install libopenmpi-dev`
- Install requirements: `pip3 install -r requirements.txt`
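If you want a quick sanity check of the dependencies (optional; the `mpi4py` import is an assumption, namely that it is pulled in by `requirements.txt`):

```bash
mpirun --version             # should report the Open MPI version
python3 -c "import mpi4py"   # assumption: mpi4py is among the pinned requirements
```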
## Training Policies in Simulation
To train a policy, run the following command:
```bash
python3 motion_imitation/run_sac.py \
  --mode train \
  --motion_file [path to reference motion, e.g., motion_imitation/data/motions/pace.txt] \
  --int_save_freq 1000 \
  --visualize
```
- `--mode` can be either `train` or `test`.
- `--motion_file` specifies the reference motion that the robot is to imitate (not needed for training a reset policy). `motion_imitation/data/motions/` contains different reference motion clips.
- `--int_save_freq` specifies the frequency for saving intermediate policies, i.e., every n policy steps.
- `--visualize` enables visualization; rendering can be disabled by removing the flag.
- `--train_reset` trains a reset policy (see the example below this list); otherwise, imitation policies will be trained according to the reference motions passed in.
- Adding `--use_redq` uses REDQ; otherwise, vanilla SAC will be used.
- The trained model, videos, and logs will be written to `output/`.
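For example, a reset policy can be trained from scratch as follows (a sketch combining the flags documented above; note that no `--motion_file` is needed in this case):

```bash
python3 motion_imitation/run_sac.py \
  --mode train \
  --train_reset \
  --int_save_freq 1000
```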
## Evaluating and/or Fine-Tuning Trained Policies
We provide checkpoints for the pre-trained models used in our experiments in `motion_imitation/data/policies/`.
### Evaluating a Policy in Simulation
To evaluate individual policies, run the following command:
```bash
python3 motion_imitation/run_sac.py \
  --mode test \
  --motion_file [path to reference motion, e.g., motion_imitation/data/motions/pace.txt] \
  --model_file [path to imitation model checkpoint, e.g., motion_imitation/data/policies/pace.ckpt] \
  --num_test_episodes [# episodes to test] \
  --use_redq \
  --visualize
```
- `--motion_file` specifies the reference motion that the robot is to imitate. `motion_imitation/data/motions/` contains different reference motion clips.
- `--model_file` specifies the `.ckpt` file that contains the trained model. `motion_imitation/data/policies/` contains different pre-trained models.
- `--num_test_episodes` specifies the number of episodes to run evaluation for.
- `--visualize` enables visualization; rendering can be disabled by removing the flag.
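For instance, to evaluate the provided pace policy for 10 episodes with rendering enabled (filling in the example paths from above):

```bash
python3 motion_imitation/run_sac.py \
  --mode test \
  --motion_file motion_imitation/data/motions/pace.txt \
  --model_file motion_imitation/data/policies/pace.ckpt \
  --num_test_episodes 10 \
  --use_redq \
  --visualize
```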
### Autonomous Training using a Pre-Trained Reset Controller
To fine-tune policies autonomously, add a path to a trained reset policy (e.g., `motion_imitation/data/policies/reset.ckpt`) and a (pre-trained) imitation policy.
```bash
python3 motion_imitation/run_sac.py \
  --mode train \
  --motion_file [path to reference motion] \
  --model_file [path to imitation model checkpoint] \
  --getup_model_file [path to reset model checkpoint] \
  --use_redq \
  --int_save_freq 100 \
  --num_test_episodes 20 \
  --finetune \
  --real_robot
```
- Adding `--finetune` performs fine-tuning; otherwise, hyperparameters for pre-training will be used.
- Adding `--real_robot` will run training on the real A1 (see below for installing the packages needed to run the real A1). If this flag is omitted, training will run in simulation; a simulation example follows this list.
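For example, to fine-tune the provided pace policy autonomously in simulation using the provided reset policy (a sketch assembled from the checkpoints and flags documented above):

```bash
python3 motion_imitation/run_sac.py \
  --mode train \
  --motion_file motion_imitation/data/motions/pace.txt \
  --model_file motion_imitation/data/policies/pace.ckpt \
  --getup_model_file motion_imitation/data/policies/reset.ckpt \
  --use_redq \
  --int_save_freq 100 \
  --num_test_episodes 20 \
  --finetune
```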
To run two SAC trainers, one learning to walk forward and one backward, add a reference motion and checkpoint for another policy and use the `--multitask` flag:
```bash
python3 motion_imitation/run_sac.py \
  --mode train \
  --motion_file motion_imitation/data/motions/pace.txt \
  --backward_motion_file motion_imitation/data/motions/pace_backward.txt \
  --model_file [path to forward imitation model checkpoint] \
  --backward_model_file [path to backward imitation model checkpoint] \
  --getup_model_file [path to reset model checkpoint] \
  --use_redq \
  --int_save_freq 100 \
  --num_test_episodes 20 \
  --real_robot \
  --finetune \
  --multitask
```
## Running MPC on the real A1 robot
Since the SDK from Unitree is implemented in C++, we find that the most practical way to interface with the robot is through a C++-to-Python binding built with pybind11.
### Step 1: Build and Test the robot interface
To start, build the python interface by running the following:

```bash
cd third_party/unitree_legged_sdk
mkdir build
cd build
cmake ..
make
```
Then copy the built `robot_interface.XXX.so` file to the main directory (where you can see this README.md file).
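As a quick check that Python can load the copied extension (this only tests the import; it does not communicate with the robot):

```bash
# Run from the repo root, next to the copied robot_interface.XXX.so file.
python3 -c "import robot_interface; print(robot_interface.__file__)"
```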
### Step 2: Setup correct permissions for non-sudo user
Since the Unitree SDK requires memory locking and high process priority, which are not usually granted without sudo, add the following lines to `/etc/security/limits.conf` (replace `<username>` with your user name):
```
<username> soft memlock unlimited
<username> hard memlock unlimited
<username> soft nice eip
<username> hard nice eip
```
You may need to reboot the computer for the above changes to take effect.
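Once the limits are in effect, you can verify them from a regular (non-sudo) shell:

```bash
ulimit -l   # maximum locked memory; should print "unlimited"
```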
### Step 3: Test robot interface
Test the python interfacing by running: `sudo python3 -m motion_imitation.examples.test_robot_interface`
If the previous steps were completed correctly, the script should finish without throwing any errors.
Note that this code does not do anything on the actual robot.
## Running the Whole-body MPC controller
To see the whole-body MPC controller in sim, run:

```bash
python3 -m motion_imitation.examples.whole_body_controller_example
```

To see the whole-body MPC controller on the real robot, run:

```bash
sudo python3 -m motion_imitation.examples.whole_body_controller_robot_example
```