CLIPort: What and Where Pathways for Robotic Manipulation

Overview

CLIPort

CLIPort: What and Where Pathways for Robotic Manipulation
Mohit Shridhar, Lucas Manuelli, Dieter Fox
CoRL 2021

CLIPort is an end-to-end imitation-learning agent that can learn a single language-conditioned policy for various tabletop tasks. The framework combines the broad semantic understanding (what) of CLIP with the spatial precision (where) of TransporterNets to learn generalizable skills from limited training demonstrations.

For the latest updates, see: cliport.github.io

Guides

Installation

Clone Repo:

git clone https://github.com/cliport/cliport.git

Setup virtualenv and install requirements:

# setup virtualenv with whichever package manager you prefer
virtualenv -p $(which python3.8) --system-site-packages cliport_env  
source cliport_env/bin/activate
pip install --upgrade pip

cd cliport
pip install -r requirements.txt

export CLIPORT_ROOT=$(pwd)
python setup.py develop

Note: You might need versions of torch==1.7.1 and torchvision==0.8.2 that are compatible with your CUDA and hardware.
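
For example, assuming CUDA 11.0, a matching install might look like the following (an illustrative command, not part of the official setup; swap the +cu110 suffix for your CUDA version):

# example only: pick the wheels that match your CUDA version (cu110 assumed here)
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 \
    -f https://download.pytorch.org/whl/torch_stable.html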

Quickstart

A quick tutorial on evaluating a pre-trained multi-task model.

Download a pre-trained multi-language-conditioned checkpoint trained with 1000 demos:

python scripts/quickstart_download.py

Generate a small test set of 10 instances for stack-block-pyramid-seq-seen-colors inside $CLIPORT_ROOT/data:

python cliport/demos.py n=10 \
                        task=stack-block-pyramid-seq-seen-colors \
                        mode=test 

This will take a few minutes to finish.

Evaluate the best validation checkpoint for stack-block-pyramid-seq-seen-colors on the test set:

python cliport/eval.py model_task=multi-language-conditioned \
                       eval_task=stack-block-pyramid-seq-seen-colors \
                       agent=cliport \
                       mode=test \
                       n_demos=10 \
                       train_demos=1000 \
                       exp_folder=cliport_quickstart \
                       checkpoint_type=test_best \
                       update_results=True \
                       disp=True

If you are on a headless machine, turn off the visualization with disp=False.

You can evaluate the same multi-language-conditioned model on other tasks. First generate a val set for the task and then specify eval_task=<task_name> with mode=val and checkpoint_type=val_missing (the quickstart doesn't include validation results for all tasks; download all task results from here).
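
For example, to evaluate the same checkpoint on towers-of-hanoi-seq-seen-colors, you might first generate a small val set and then run validation along these lines (a sketch mirroring the quickstart settings above):

python cliport/demos.py n=10 \
                        task=towers-of-hanoi-seq-seen-colors \
                        mode=val

python cliport/eval.py model_task=multi-language-conditioned \
                       eval_task=towers-of-hanoi-seq-seen-colors \
                       agent=cliport \
                       mode=val \
                       n_demos=10 \
                       train_demos=1000 \
                       exp_folder=cliport_quickstart \
                       checkpoint_type=val_missing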

Download

Google Scanned Objects

Download center-of-mass (COM) corrected Google Scanned Objects:

python scripts/google_objects_download.py

Credit: Google.

Pre-trained Checkpoints and Result JSONs

This Google Drive Folder contains pre-trained multi-language-conditioned checkpoints for n=1,10,100,1000 and validation/test result JSONs for all tasks. The *val-results.json files contain the name of the best checkpoint (from validation) to be evaluated on the test set.

Note: Google Drive might complain about bandwidth restrictions. I recommend using rclone with API access enabled.

Evaluate the best validation checkpoint on the test set:

python cliport/eval.py model_task=multi-language-conditioned \
                       eval_task=stack-block-pyramid-seq-seen-colors \
                       agent=cliport \
                       mode=test \
                       n_demos=10 \
                       train_demos=100 \
                       exp_folder=cliport_exps \
                       checkpoint_type=test_best \
                       update_results=True \
                       disp=True

Training and Evaluation

The following is a guide for training everything from scratch. All tasks follow a 4-phase workflow:

  1. Generate train, val, test datasets with demos.py
  2. Train agents with train.py
  3. Run validation with eval.py to find the best checkpoint on val tasks and save *val-results.json
  4. Evaluate the best checkpoint in *val-results.json on test tasks with eval.py

Dataset Generation

Single Task

Generate a train set of 1000 demonstrations for stack-block-pyramid-seq-seen-colors inside $CLIPORT_ROOT/data:

python cliport/demos.py n=1000 \
                        task=stack-block-pyramid-seq-seen-colors \
                        mode=train 

You can also do a sequential sweep with Hydra's -m flag and comma-separated params, e.g. task=towers-of-hanoi-seq-seen-colors,stack-block-pyramid-seq-seen-colors. Use disp=True to visualize the data generation.
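
For example, the following generates train sets for both tasks one after the other (using the same flags as above):

python cliport/demos.py -m n=1000 \
                        task=towers-of-hanoi-seq-seen-colors,stack-block-pyramid-seq-seen-colors \
                        mode=train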

Full Dataset

Run generate_dataset.sh to generate the full dataset and save it to $CLIPORT_ROOT/data:

sh scripts/generate_dataset.sh data

Note: This script is not parallelized and will take a long time (maybe days) to finish. The full dataset requires ~1.6TB of storage, which includes both language-conditioned and demo-conditioned (original TransporterNets) tasks. It's recommended that you start with single-task training if you don't have enough storage space.

Single-Task Training & Evaluation

Make sure you have a train (n demos) and val (100 demos) set for the task you want to train on.

Training

Train a cliport agent with 1000 demonstrations on the stack-block-pyramid-seq-seen-colors task for 200K iterations:

python cliport/train.py train.task=stack-block-pyramid-seq-seen-colors \
                        train.agent=cliport \
                        train.attn_stream_fusion_type=add \
                        train.trans_stream_fusion_type=conv \
                        train.lang_fusion_type=mult \
                        train.n_demos=1000 \
                        train.n_steps=201000 \
                        train.exp_folder=exps \
                        dataset.cache=False 

Validation

Iteratively evaluate all the checkpoints on val and save the results in exps/<task>-train/checkpoints/<task>-val-results.json:

python cliport/eval.py eval_task=stack-block-pyramid-seq-seen-colors \
                       agent=cliport \
                       mode=val \
                       n_demos=100 \
                       train_demos=1000 \
                       checkpoint_type=val_missing \
                       exp_folder=exps 

Test

Choose the best checkpoint from validation to run on the test set and save the results in exps/<task>-train/checkpoints/<task>-test-results.json:

python cliport/eval.py eval_task=stack-block-pyramid-seq-seen-colors \
                       agent=cliport \
                       mode=test \
                       n_demos=100 \
                       train_demos=1000 \
                       checkpoint_type=test_best \
                       exp_folder=exps 

Multi-Task Training & Evaluation

Training

Train multi-task models by specifying task=multi-language-conditioned, task=multi-loo-packing-box-pairs-unseen-colors ('loo' stands for leave-one-out, i.e. multi-attr tasks), etc.:

python cliport/train.py train.task=multi-language-conditioned \
                        train.agent=cliport \
                        train.attn_stream_fusion_type=add \
                        train.trans_stream_fusion_type=conv \
                        train.lang_fusion_type=mult \
                        train.n_demos=1000 \
                        train.n_steps=601000 \
                        dataset.cache=False \
                        train.exp_folder=exps \
                        dataset.type=multi 

Important: You need to generate the full dataset of tasks specified in dataset.py before multi-task training or modify the list of tasks here.

Validation

Run validation with a trained multi-language-conditioned multi-task model on stack-block-pyramid-seq-seen-colors:

python cliport/eval.py model_task=multi-language-conditioned \
                       eval_task=stack-block-pyramid-seq-seen-colors \
                       agent=cliport \
                       mode=val \
                       n_demos=100 \
                       train_demos=1000 \
                       checkpoint_type=val_missing \
                       type=single \
                       exp_folder=exps 

Test

Evaluate the best checkpoint on the test set:

python cliport/eval.py model_task=multi-language-conditioned \
                       eval_task=stack-block-pyramid-seq-seen-colors \
                       agent=cliport \
                       mode=test \
                       n_demos=100 \
                       train_demos=1000 \
                       checkpoint_type=test_best \
                       type=single \
                       exp_folder=exps 

Disclaimers

  • Code Quality Level: Tired grad student.
  • Scaling: The code only works for batch size 1. See #issue1 for reference. In theory, there is nothing preventing larger batch sizes other than GPU memory constraints.
  • Memory and Storage: There are lots of places where memory usage can be reduced. You don't need 3 copies of the same CLIP ResNet50, and you don't need to save its weights in checkpoints since it's frozen anyway (see the sketch after this list). Dataset sizes could be dramatically reduced with better storage formats and compression.
  • Frameworks: There are lots of leftover NumPy bits from when I was trying to reproduce the TransporterNets results. I'll try to clean this up when I get some time.
  • Rotation Augmentation: All tasks use the same distribution for sampling SE(2) rotation perturbations. This obviously leads to issues with tasks that involve spatial relationships like 'left' or 'forward'.
  • Evaluation Runs: In an ideal setting, the evaluation metrics should be averaged over 3 or more repetitions with different seeds. This might be feasible if you are working just with multi-task models.
  • Duplicate Training Sets: The train sets of some *seen and *unseen tasks are identical, and only the val and test sets differ for purposes of evaluating generalization performance. So you might not need two duplicate train sets or train two separate models.
  • Other Limitations: Check out Appendix I in the paper.
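
To illustrate the checkpoint-size point in the Memory and Storage item above, here is a minimal sketch of dropping a frozen submodule's weights before saving; the 'clip' name prefix is an assumption for illustration, not taken from the codebase:

import torch

def strip_frozen(state_dict, frozen_prefixes=("clip",)):
    # drop parameters that belong to frozen submodules (identified by name prefix)
    return {k: v for k, v in state_dict.items() if not k.startswith(frozen_prefixes)}

# usage sketch: save the slimmed checkpoint, then restore the frozen CLIP weights
# from the pre-trained CLIP model at load time and call load_state_dict(..., strict=False)
# torch.save(strip_frozen(model.state_dict()), 'agent_without_clip.pt')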

Notebooks

Check out Kevin Zakka's Colab for zero-shot detection with CLIP. This notebook might be a good way of gauging what sort of visual attributes CLIP can ground with language. But note that CLIPort does NOT do "object detection"; it directly "detects actions".

Other Todos

  • Dataset Visualizer
  • Affordance Heatmap Visualizer
  • Evaluation Results Plot

Docker Guide

Install Docker and NVIDIA Docker.

Modify docker_build.py and docker_run.py to your needs.

Build

Build the image:

python scripts/docker_build.py 

Run

Start container:

python scripts/docker_run.py --nvidia_docker

cd ~/cliport

Use scripts/docker_run.py --headless if you are on a headless machine like a remote server or cloud instance.

Real-Robot Training FAQ

How much training data do I need?

It depends on the complexity of the task. With 5-10 demonstrations the agent should start to do something useful, but it will often make mistakes by picking the wrong object. For robustness you probably need 50-100 demonstrations. A good way to gauge how much data you might need is to set up a simulated version of the problem and evaluate agents trained with 1, 10, 100, and 1000 demonstrations.
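
For instance, assuming Hydra's multirun sweep works for train.py the same way the README uses it for demos.py, you could sweep the demo count and compare the resulting agents (a sketch based on the single-task training command above):

python cliport/train.py -m train.task=stack-block-pyramid-seq-seen-colors \
                        train.agent=cliport \
                        train.n_demos=1,10,100,1000 \
                        train.n_steps=201000 \
                        train.exp_folder=exps \
                        dataset.cache=False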

Why doesn't the agent follow my language instruction?

This means either there is some sort of bias in the dataset that the agent is exploiting, or you don't have enough training data. Also make sure that the task is doable - if a referred attribute is barely legible in the input, then it's going to be hard for the agent to figure out what you mean.

Does CLIPort predict height (z-values) of the end-effector?

CLIPort does not predict height values. You can either: (1) come up with a heuristic based on the heightmap to determine the height position, or (2) train a simple MLP like in TransportNets-6DOF to predict z-values.
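
A minimal sketch of option (1), assuming you have the reconstructed top-down heightmap and the predicted pick pixel; the function and argument names here are illustrative, not from the repo:

import numpy as np

def pick_height(heightmap, pick_pixel, window=4, clearance=0.02):
    # sample the heightmap in a small window around the predicted pixel and
    # add a small clearance so the gripper closes just above the surface
    u, v = pick_pixel
    patch = heightmap[max(0, u - window):u + window + 1,
                      max(0, v - window):v + window + 1]
    return float(np.max(patch)) + clearance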

Shouldn't CLIP help in zero-shot detection of things? Why do I need to collect more data?

Note that CLIPort is not doing "object detection". CLIPort fine-tunes CLIP's representations to "detect actions" in SE(2). CLIP by itself has no understanding of actions or affordances; recognizing and localizing objects (e.g. detecting a hammer) does not tell you anything about how to manipulate them (e.g. grasping the hammer by the handle).

What are the best hyperparams for real-robot training?

The default settings should work well. That said, I have recently been playing around with using FiLM (Perez et al., 2017) to fuse language features, inspired by BC-Z (Jang et al., 2021). Qualitatively, FiLM seems better for reading text etc., but I haven't conducted a full quantitative analysis. Try it out yourself with train.agent=two_stream_clip_film_lingunet_lat_transporter (non-residual FiLM).
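
For example (the other flags simply mirror the single-task training command earlier in this README):

python cliport/train.py train.task=stack-block-pyramid-seq-seen-colors \
                        train.agent=two_stream_clip_film_lingunet_lat_transporter \
                        train.n_demos=1000 \
                        train.n_steps=201000 \
                        train.exp_folder=exps \
                        dataset.cache=False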

How to pick the best checkpoint for real-robot tasks?

Ideally, you should create a validation set with heldout instances and then choose the checkpoint with the lowest translation and rotation errors. You can also reuse the training instances but swap the language instructions with unseen goals.
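
As a rough sketch, per-example errors on such a validation set could be computed like this; the SE(2) pose format and the names are assumptions, not the repo's API:

import numpy as np

def se2_pose_errors(pred_xy, pred_theta_deg, gt_xy, gt_theta_deg):
    # translation error (same units as the positions) and rotation error in degrees
    trans_err = float(np.linalg.norm(np.asarray(pred_xy) - np.asarray(gt_xy)))
    rot_err = abs((pred_theta_deg - gt_theta_deg + 180.0) % 360.0 - 180.0)  # wrap to [-180, 180)
    return trans_err, rot_err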

Why is the agent confusing directions like 'forward' and 'left'?

By default, training samples are augmented with SE(2) rotations sampled from N(0, 60 deg). For tasks with rotational symmetries (like moving pieces on a chessboard) you need to be careful with this rotation augmentation parameter.

Acknowledgements

This work uses code from the following open-source projects and datasets:

Google Ravens (TransporterNets)

Original: https://github.com/google-research/ravens
License: Apache 2.0
Changes: All PyBullet tasks are directly adapted from the Ravens codebase. The original TransporterNets models were reimplemented in PyTorch.

OpenAI CLIP

Original: https://github.com/openai/CLIP
License: MIT
Changes: Minor modifications to CLIP-ResNet50 to save intermediate features for skip connections.

Google Scanned Objects

Original: Dataset
License: Creative Commons BY 4.0
Changes: Fixed the center-of-mass (COM) to be the geometric center for selected objects.

U-Net

Original: https://github.com/milesial/Pytorch-UNet/
License: GPL 3.0
Changes: Used as is in unet.py. Note: This part of the code is GPL 3.0.

Citations

CLIPort

@inproceedings{shridhar2021cliport,
  title     = {CLIPort: What and Where Pathways for Robotic Manipulation},
  author    = {Shridhar, Mohit and Manuelli, Lucas and Fox, Dieter},
  booktitle = {Proceedings of the 5th Conference on Robot Learning (CoRL)},
  year      = {2021},
}

CLIP

@article{radford2021learning,
  title={Learning transferable visual models from natural language supervision},
  author={Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},
  journal={arXiv preprint arXiv:2103.00020},
  year={2021}
}

TransporterNets

@inproceedings{zeng2020transporter,
  title={Transporter networks: Rearranging the visual world for robotic manipulation},
  author={Zeng, Andy and Florence, Pete and Tompson, Jonathan and Welker, Stefan and Chien, Jonathan and Attarian, Maria and Armstrong, Travis and Krasin, Ivan and Duong, Dan and Sindhwani, Vikas and others},
  booktitle={Proceedings of the 4th Conference on Robot Learning (CoRL)},
  year= {2020},
}

Questions or Issues?

Please file an issue with the issue tracker.

Comments
  • Why would you include overlapping colors in terms of the seen and unseen color split?


    Dear author,

    It seems that you put three colors - red, blue, and green - in both the seen and unseen splits. Could you explain your intention for doing that? I wonder if the inclusion of three shared colors makes the evaluation on the unseen split mixed?

    Thanks in advance!

    question 
    opened by DeeDive 6
  • About the visualization of pick and place affordance


    Dear author,

    Thanks for your code for heatmap visualization! Following https://github.com/cliport/cliport/blob/master/notebooks/affordances.ipynb, I can successfully visualize the pick and place locations in my own dataset. However, what I see in the heatmap are just some tiny bright points (maybe a few pixels in scale) indicating the target location, which are quite different from your figures (where the highlighted areas exactly match the shape of the target block). I'm wondering about the reason. From my understanding, since the target position is a single pixel, the output logits after successful training should be concentrated on the target pixel, so the highlighted area should be focused on certain point(s). Do you have any idea about this?

    Thanks!

    opened by qianLyu 4
  • Question about color channel order (gray background) in video demos


    Hi, thanks for the great work! I am new to OpenCV and I feel a little bit confused about the channel order for your video. The visualization I replicated myself has a yellow background instead of the gray background in your demos. Is there anything wrong with my color channels? Or did you manually edit those rendered videos? Millions of thanks in advance.

    question 
    opened by DeeDive 4
  • Customizing image size


    Hello, great work and thanks for the generous release! I wonder if I can modify the image size for training?

    I already figured out how to change the input image resolution in the preprocess module in cliport/utils/utils.py, but I am not sure if it works. Could you please offer some guidance on how to change the resolution of the ground-truth labels? In addition, I am not sure whether modifying these two places is enough for training at a custom image size.

    opened by huangjy-pku 3
  • Question about the oracle camera config


    Hi authors, thanks for the great work! Can I ask a question about the camera configs of the oracle?

    class Oracle(object):
        """Top-down noiseless image used only by the oracle demonstrator."""
    
        # Near-orthographic projection.
        image_size = (480, 640)
        intrinsics = (63e4, 0, 320.0, 0, 63e4, 240.0, 0, 0, 1)
        position = (0.5, 0, 1000.0)
        rotation = p.getQuaternionFromEuler((0, np.pi, -np.pi / 2))
    
        # Camera config.
        CONFIG = [
            {
                "image_size": image_size,
                "intrinsics": intrinsics,
                "position": position,
                "rotation": rotation,
                # "zrange": (999.7, 1001.0),
                "zrange": (999.7, 1001.0),
                "noise": False,
            }
        ]
    

    Could you explain the meaning of the different parameters in the config? I find that your camera has perfect captures of the workspace without, e.g., margins. How did you achieve that? And more specifically, what does "zrange": (999.7, 1001.0), do? From my understanding, it sets the camera at a height of 1000cm, right? I am a little bit confused about that because the height of the whole workspace is only 0.3.

    Millions of thanks in advance!

    question 
    opened by DeeDive 3
  • Affordance heatmap visualizer


    Hi, thanks for this great work and code release! I was wondering what the status is of the affordance heatmap visualizer (or the dataset visualizer too) - I see it's in the TODOs in the README. It'd make it much easier to build off of this work with that in hand to explore! Thanks and looking forward to hearing from you.

    opened by sachit-menon 3
  • Non-Transporter Baselines


    Hey Mohit,

    I'm looking to test some models that do language-conditioned imitation without the underlying Transporter network inductive bias (state, language -> action).

    Is this feasible from, say, ~100 demos? What's the best place in the codebase to start lifting out the task/evaluation functionality (leaving out all the Transporter-specific things)?

    Thanks so much!

    opened by siddk 2
  • adding new objects, center of mass


    I've been able to run the quickstart example and it seems to work ok!

    I'd like to create new test tasks with new objects from the Google Scanned Objects dataset:

    https://app.ignitionrobotics.org/GoogleResearch/fuel/models/DOLL_FAMILY https://app.ignitionrobotics.org/GoogleResearch/fuel/models/MODERN_DOLL_FAMILY

    I have a few questions:

    1. How do I run the "pack all the blue and black sneaker objects in the brown box" task?
    2. Could you add the process you used to the code/docs for prepping a new model and doing center of mass correction?
    3. Will the presence of multiple objects (the various family members) in these two models be a problem, do you have suggestions to separate them into multiple models?
    4. Do you have suggestions of where/how to add a new test task to the code? I see packing_google_objects.py plus strings/classes I should add to __init__.py in the same folder; are there other spots I should know of?

    Thanks for giving this a look!

    question 
    opened by ahundt 2
  • ValueError: 'a' cannot be empty unless no samples are taken


    Hi, thanks for your great work.

    I tried to train and got the following error:

    Traceback (most recent call last):
      File "/home/adminis/qw/My_Projects/cliport_env/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
        return func()
      File "/home/adminis/qw/My_Projects/cliport_env/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in <lambda>
        lambda: hydra.run(
      File "/home/adminis/qw/My_Projects/cliport_env/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
        return run_job(
      File "/home/adminis/qw/My_Projects/cliport_env/lib/python3.8/site-packages/hydra/core/utils.py", line 128, in run_job
        ret.return_value = task_function(task_cfg)
      File "cliport/train.py", line 69, in main
        val_ds = RavensDataset(os.path.join(data_dir, '{}-val'.format(task)), cfg, n_demos=n_val, augment=False)
      File "/home/adminis/qw/My_Projects/cliport/cliport/dataset.py", line 60, in __init__
        episodes = np.random.choice(range(self.n_episodes), self.n_demos, False)
      File "mtrand.pyx", line 773, in numpy.random.mtrand.RandomState.choice
    ValueError: 'a' cannot be empty unless no samples are taken

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "cliport/train.py", line 78, in <module>
        main()
      File "/home/adminis/qw/My_Projects/cliport_env/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main
        _run_hydra(
      File "/home/adminis/qw/My_Projects/cliport_env/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
        run_and_report(
      File "/home/adminis/qw/My_Projects/cliport_env/lib/python3.8/site-packages/hydra/_internal/utils.py", line 237, in run_and_report
        assert mdl is not None
    AssertionError

    [2021-11-26 18:16:01,603][wandb.sdk.internal.internal][INFO] - Internal process exited

    opened by qingweihk 2
  • missing of query stream model


    Hello, I ran into another point of confusion: in the one-stream language-goal transport module, both the logits and the kernel are derived by passing through key_stream_one, leaving the other stream model (query_stream_one) unused. I don't know if it is a typo, but in the two-stream transport module the kernel is derived by passing through query_stream, so I suspect this needs fixing.

    The code is here: https://github.com/cliport/cliport/blob/152610b6913a84b5d67a356e6b80c591d80d7d7a/cliport/models/streams/one_stream_transport_lang_fusion.py#L23

    opened by huangjy-pku 1
  • Training iterations with different numbers of demonstrations


    Thanks for your excellent work and perfect github repo.

    I have a question about the number of training iterations. Are 200K iterations used for training a single-task model regardless of the number of demonstrations (1, 10, 100, or 1000)? For example, does training on 1 demonstration also take 200K iterations? Since max_epoch = cfg['train']['n_steps'] // cfg['train']['n_demos'], does that mean training on fewer demonstrations takes more epochs?

    Looking forward to your reply.

    Best regards, Yuying

    opened by geyuying 1
  • Try to run 'quickstart' and meet the bug: Invalid version: '0.10.1,<0.11'


    I followed the commands in 'Installation' and 'Quickstart' step by step and ran into this problem: packaging.version.InvalidVersion: Invalid version: '0.10.1,<0.11'.

    I wonder if there are some recent updates to the virtualenv setup I should consider?

    opened by ZhouYFeng 3
  • Fix typos and an undefined name


    Test results: https://github.com/cclauss/cliport/actions

    $ flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics

    ./cliport/cliport/models/core/clip.py:27:1: F822 undefined name 'load' in __all__
    __all__ = ["available_models", "load", "tokenize"]
    ^
    1     F822 undefined name 'load' in __all__
    1
    
    opened by cclauss 0