Efficient 6-DoF Grasp Generation in Cluttered Scenes

NVIDIA Research Projects

Last update: Dec 28, 2022

Overview

Contact-GraspNet

Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes

Martin Sundermeyer, Arsalan Mousavian, Rudolph Triebel, Dieter Fox
ICRA 2021

paper, project page, video

Installation

This code has been tested with Python 3.7, TensorFlow 2.2, CUDA 10.1, and cuDNN 7.6.0.

Create the conda env

conda env create -f contact_graspnet_env.yml

Troubleshooting

Recompile pointnet2 tf_ops, see here

Hardware

Training: 1x NVIDIA GPU with >= 24GB VRAM, >= 64GB RAM
Inference: 1x NVIDIA GPU with >= 8GB VRAM (might work with less)

Download Models and Data

Model

Download trained models from here and copy them into the checkpoints/ folder.

Test data

Download the test data from here and copy it into the test_data/ folder.

Inference

Contact-GraspNet can directly predict a 6-DoF grasp distribution from a raw scene point cloud. However, to obtain object-wise grasps, remove background grasps, and achieve denser proposals, it is highly recommended to run (unknown) object segmentation [e.g. 1, 2] as preprocessing and then use the resulting segmentation map to crop local regions and filter grasp contacts.

Given a .npy/.npz file with a depth map (in meters), camera matrix K and (optionally) a 2D segmentation map, execute:

python contact_graspnet/inference.py \
       --np_path=test_data/*.npy \
       --local_regions --filter_grasps

--> close the window to go to the next scene
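
As a rough illustration of the input described above, the following sketch writes a synthetic scene to an .npz archive using the key names 'depth', 'K' and 'segmap' listed in this README. The helper name save_scene, the image size and the intrinsics are made up for the example; real data would come from your camera and segmentation method.

```python
import numpy as np

def save_scene(path, depth, K, segmap=None):
    """Write a scene archive using the key names listed in the README.

    save_scene is a hypothetical helper, not part of the repository.
    """
    depth = np.asarray(depth, dtype=np.float32)  # depth map in meters
    K = np.asarray(K, dtype=np.float64)          # camera matrix
    if K.shape != (3, 3):
        raise ValueError("K must be a 3x3 camera matrix")
    arrays = {"depth": depth, "K": K}
    if segmap is not None:
        segmap = np.asarray(segmap)
        if segmap.shape != depth.shape:
            raise ValueError("segmap must match the depth map's height and width")
        arrays["segmap"] = segmap                # optional 2D segmentation map
    np.savez(path, **arrays)

# Synthetic 480x640 scene: a flat surface 0.8 m away with one labelled segment.
# All values below are placeholders chosen for illustration.
depth = np.full((480, 640), 0.8, dtype=np.float32)
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
segmap = np.zeros((480, 640), dtype=np.uint8)
segmap[200:280, 280:360] = 1
save_scene("example_scene.npz", depth, K, segmap)
```

The resulting file could then be passed via --np_path=example_scene.npz together with --local_regions --filter_grasps.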

Given a .npy/.npz file with just a 3D point cloud (in meters), execute for example:

python contact_graspnet/inference.py --np_path=/path/to/your/pc.npy \
                                     --forward_passes=5 \
                                     --z_range=[0.2,1.1]

--np_path: input .npz/.npy file(s) with 'depth', 'K' and optionally 'segmap', 'rgb' keys. To process an Nx3 point cloud instead, use 'xyz' and optionally 'xyz_color' as keys.
--ckpt_dir: relative path to the checkpoint directory. By default checkpoints/scene_test_2048_bs3_hor_sigma_001 is used. For very clean / noisy depth data consider scene_2048_bs3_rad2_32 / scene_test_2048_bs3_hor_sigma_0025, trained with no / strong noise.
--local_regions: crop 3D local regions around object segments for inference (only works with segmap).
--filter_grasps: filter grasp contacts such that they only lie on the surface of object segments (only works with segmap).
--skip_border_objects: ignore segments touching the depth map boundary.
--forward_passes: number of (batched) forward passes. Increase to sample more potential grasp contacts.
--z_range: [min, max] z values in meters used to crop the input point cloud, e.g. to avoid grasps in the foreground/background (as above).
--arg_configs TEST.second_thres:0.19 TEST.first_thres:0.23: overwrite the config confidence thresholds for successful grasp contacts to get more/fewer grasp proposals.
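
To make the relation between the two input types concrete, here is a minimal sketch of a standard pinhole back-projection that turns a depth map and camera matrix K into an Nx3 point cloud and crops it by z. It is an independent illustration, not the repository's own conversion code; the function name depth_to_point_cloud is hypothetical, and saving under the key 'xyz' assumes that is the spelling the loader expects.

```python
import numpy as np

def depth_to_point_cloud(depth, K, z_range=(0.2, 1.1)):
    """Back-project a depth map (meters) to an Nx3 point cloud in the camera frame.

    Hypothetical helper using the standard pinhole model:
    x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    depth = np.asarray(depth, dtype=np.float32)
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    v, u = np.indices(depth.shape)  # v: pixel row, u: pixel column
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Keep only points inside the z crop; this also drops zero-depth pixels.
    keep = (points[:, 2] > z_range[0]) & (points[:, 2] < z_range[1])
    return points[keep]

# Tiny synthetic example with placeholder intrinsics.
K_example = np.array([[2.0, 0.0, 1.5],
                      [0.0, 2.0, 1.5],
                      [0.0, 0.0, 1.0]])
depth_example = np.full((4, 4), 0.8, dtype=np.float32)
cloud = depth_to_point_cloud(depth_example, K_example)
np.savez("example_pc.npz", xyz=cloud)  # key name 'xyz' is an assumption
```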

Training

Download Data

Download the Acronym dataset, ShapeNet meshes and make them watertight, following these steps.

Download the training data consisting of 10000 table top training scenes with contact grasp information from here and extract it to the same folder:

acronym
├── grasps
├── meshes
├── scene_contacts
└── splits
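
Before starting a long training run, it can be worth checking that the extracted data matches the tree above. The following sketch only tests for the four subfolder names shown; the helper name missing_subfolders is hypothetical and it does not validate the files inside them.

```python
from pathlib import Path

# Subfolder names taken from the directory tree shown in this README.
EXPECTED_SUBFOLDERS = ("grasps", "meshes", "scene_contacts", "splits")

def missing_subfolders(acronym_root):
    """Return the expected subfolders that are absent under acronym_root."""
    root = Path(acronym_root)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]

# Example usage (the path is a placeholder):
# missing = missing_subfolders("/path/to/acronym")
# if missing:
#     print("Missing folders:", ", ".join(missing))
```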

Train Contact-GraspNet

When training on a headless server set the environment variable

export PYOPENGL_PLATFORM='egl'

Start training with config contact_graspnet/config.yaml

python contact_graspnet/train.py --ckpt_dir checkpoints/your_model_name \
                                 --data_path /path/to/acronym/data

Generate Contact Grasps and Scenes yourself (optional)

The scene_contacts downloaded above are generated from the Acronym dataset. To generate/visualize table-top scenes yourself, also pip install the acronym_tools package in your conda environment as described in the acronym repository.

In the first step, object-wise 6-DoF grasps are mapped to their contact points saved in mesh_contacts

python tools/create_contact_infos.py /path/to/acronym

From the generated mesh_contacts you can create table-top scenes which are saved in scene_contacts with

python tools/create_table_top_scenes.py /path/to/acronym

This takes about 3 days in a single thread. To use multiple cores, run the command several times in parallel.
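
One possible way to start several copies of the command at once is a small launcher like the sketch below. It only spawns identical processes and waits for them; it relies on the statement above that the script can be run several times in parallel and adds no coordination of its own. The helper name run_in_parallel and the worker count are illustrative.

```python
import subprocess
import sys

def run_in_parallel(command, num_workers):
    """Start num_workers copies of command and return their exit codes."""
    processes = [subprocess.Popen(command) for _ in range(num_workers)]
    return [process.wait() for process in processes]

# Example usage (path and worker count are placeholders):
# exit_codes = run_in_parallel(
#     [sys.executable, "tools/create_table_top_scenes.py", "/path/to/acronym"],
#     num_workers=8,
# )
```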

You can also visualize existing table-top scenes and grasps

python tools/create_table_top_scenes.py /path/to/acronym \
       --load_existing scene_contacts/000000.npz -vis

Citation

@inproceedings{sundermeyer2021contact,
  title={Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes},
  author={Sundermeyer, Martin and Mousavian, Arsalan and Triebel, Rudolph and Fox, Dieter},
  booktitle={2021 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2021}
}

Comments

conflict environment

Hi, my GPU is an RTX 3080 Ti and I am trying to use contact_graspnet (inference.py), but it is pretty slow. I have seen the other issues #16 and #9. I tried to use CUDA 11.2 and cuDNN 8.1 with tensorflow-gpu 2.5, but this causes many package conflicts and Python package bugs. Would it be possible for you to provide a new version of the yml file for tensorflow-gpu 2.5?

Best regards, xiaolin

opened by xlim1996 8
Gripper control points

Hello. I am an undergraduate student currently using contact-graspnet on a project.

I am using contact-graspnet on a different robot (TIAGo from pal robotics) and the gripper differs from the panda robot. In order to be safe, I move the generated grasps backwards by an offset and then perform a forward motion to get the object between the robot's fingers. You have trained the network with the panda's gripper configuration by using its STL model and also points on the gripper (contact_graspnet/gripper_control_points/panda_gripper_coords.yml). I wonder if retraining the network using gripper coordinates for TIAGo's gripper would improve the generated grasps. I think that one of the things it would improve would be the grasps generated based on the gripper width constraints as the panda gripper is wider than TIAGO's gripper.

Thanks in advance!

opened by jucamohedano 8
First call to sess.run() at inference time is slow

Hi, have you encountered an issue where the first call to sess.run() in contact_grasp_estimator.py is slow? I am running the inference example in the readme, and when I time sess.run() the first call takes much longer than subsequent calls:

Run inference 1162.3998165130615
Preprocess pc for inference 0.0007269382476806641
Run inference 0.2754530906677246
Preprocess pc for inference 0.0006759166717529297

I found this thread on what seems to be a similar issue, but the simple resolutions have not worked, and I have not tried compiling TensorFlow from source yet. I am running on an RTX 3090 with CUDA 11.1 and tensorflow-gpu==2.2. Have you encountered this issue before? Thanks for your help.
opened by thomasweng15 5
Gripper Width

Hello, thank you again for your code. I tried to change the gripper width in the config file, but I didn't see any difference at inference. Do I need to train again to change this? I also saw the filter on the gripper openings, but I couldn't find a consistent relation. Could you explain it, please?

opened by BryanBetancur 2
Fingers length

Hello, Thank you for your awesome code. I'm trying to implement your code with a different gripper, the fingers are longer but I haven't been able to find where to adjust this. Could you help me, please?

opened by BryanBetancur 2

how to convert output grasp from inference.py to real-world coordinate system

Hi @MartinSmeyer, I have a question: how do I convert the output grasps from inference.py to the real-world coordinate system? I use the following code to calculate the position from pred_grasps_cam, but the result seems incorrect. So I'm wondering whether the matrices in pred_grasps_cam contain position and rotation relative to the object to be grasped, coordinates in the image, or coordinates in the real world.

import math

def rotationMatrixToEulerAngles(R):
    # Near-singular case: R[1, 0] close to +1, so y is fixed at +90 degrees
    if R[1, 0] > 0.998:
        x = 0
        y = math.pi / 2
        z = math.atan2(R[0, 2], R[2, 2])
    # Near-singular case: R[1, 0] close to -1, so y is fixed at -90 degrees
    elif R[1, 0] < -0.998:
        x = 0
        y = -math.pi / 2
        z = math.atan2(R[0, 2], R[2, 2])
    else:
        x = math.atan2(-R[1, 2], R[1, 1])
        y = math.asin(R[1, 0])
        z = math.atan2(-R[2, 0], R[0, 0])

    return x, y, z

def getRotationAndPosition(transformation):
    # Split a 4x4 homogeneous transform into its translation and Euler angles
    assert transformation.shape[0] == 4 and transformation.shape[1] == 4, "shape error"
    x = transformation[0, 3]
    y = transformation[1, 3]
    z = transformation[2, 3]
    x_r, y_r, z_r = rotationMatrixToEulerAngles(transformation[0:3, 0:3])
    return x, y, z, x_r, y_r, z_r

opened by xlim1996 2
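
The variable name pred_grasps_cam suggests that the predicted poses are expressed in the camera frame, although this thread does not confirm it. Under that assumption, a grasp can be moved into a robot base (or world) frame by left-multiplying with the camera pose in that frame. In the sketch below, T_base_cam is a hypothetical 4x4 extrinsic calibration matrix that you would need to obtain yourself, and grasps_to_base is an illustrative helper, not part of the repository.

```python
import numpy as np

def grasps_to_base(grasps_cam, T_base_cam):
    """Transform 4x4 grasp poses from the camera frame to the base frame.

    grasps_cam: array of shape (N, 4, 4) or (4, 4), assumed to be in the camera frame.
    T_base_cam: 4x4 pose of the camera in the base frame (hypothetical calibration).
    """
    grasps_cam = np.asarray(grasps_cam, dtype=np.float64).reshape(-1, 4, 4)
    T_base_cam = np.asarray(T_base_cam, dtype=np.float64)
    # Matrix product broadcast over N: T_base_grasp = T_base_cam @ T_cam_grasp
    return T_base_cam @ grasps_cam

# Example with placeholder values: camera mounted 1 m above the base origin,
# axes aligned with the base frame.
T_base_cam = np.eye(4)
T_base_cam[:3, 3] = [0.0, 0.0, 1.0]
grasp_cam = np.eye(4)
grasp_cam[:3, 3] = [0.1, 0.0, 0.5]
grasp_base = grasps_to_base(grasp_cam, T_base_cam)[0]
```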

Output grasp pose

Hi,

I am running into some problems when transforming the output grasp pose to the pose that the robot needs. May I ask if the output grasp pose is in the actual camera coordinate or some other coordinate system? Thank you so much! @MartinSmeyer

opened by y556zhao 2
Result different during inference

Hi,

I was using default parameters to test the model on 7.npy, and the resulting grasps are sparse as shown in the image below. May I ask if I was doing anything wrong?

Here is the command I used: python contact_graspnet/inference.py --np_path=test_data/7.npy --local_regions --filter_grasps

Checkpoint: checkpoints/scene_test_2048_bs3_hor_sigma_001

Thank you! @MartinSmeyer

opened by y556zhao 2
Visualizing step - Stuck for hours

Hi,

While trying to generate grasps using the following command:

python contact_graspnet/inference.py \
       --np_path=test_data/*.npy \
       --local_regions --filter_grasps

the code does not proceed after the following print statement:

Generated 1 grasps for object 1.0
Generated 11 grasps for object 2.0
Generated 33 grasps for object 3.0
Generated 134 grasps for object 4.0
Generated 56 grasps for object 5.0
Generated 92 grasps for object 6.0
Generated 66 grasps for object 7.0
Generated 40 grasps for object 8.0
Generated 48 grasps for object 9.0
Generated 5 grasps for object 10.0
/home/kaykay/contact_graspnet/contact_graspnet/visualization_utils.py:63: MatplotlibDeprecationWarning: You are modifying the state of a globally registered colormap. In future versions, you will not be able to modify a registered colormap in-place. To remove this warning, you can make a copy of the colormap first. cmap = copy.copy(mpl.cm.get_cmap("rainbow"))
  cmap.set_under(alpha=0.0)
Visualizing...takes time

It has been around 2 hours. What is the average expected time for the visualization step? And if there is a CUDA memory limit (which I suspect), do we get a warning or an error?

opened by abhinavkk 2
predict_scene_grasps_from_depth_K_and_2d_seg missing some arguments and passing some incorrect arguments

predict_scene_grasps_from_depth_K_and_2d_seg() is missing:

rgb: taken by extract_point_clouds()
forward_passes: taken by predict_scene_grasps()

Also, extract_point_clouds() does not take local_regions and filter_grasps.

opened by abhishek47kashyap 2
Normalizing depth image for inference

Hi, @MartinSmeyer

I wanted to test inference.py on real-time data from a RealSense depth camera, and I realized that the depth image in the provided test data is normalized, while the depth image from the RealSense camera is in uint16 format. I tried to normalize it using the following formula, but I am not certain about the range: depth_normalized = (depth - min_depth) / (max_depth - min_depth). Could you provide the range used for normalization, or any other way to preprocess the depth image?

Thanks, Amanuel

opened by AmanuelErgogo 1
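
Since the Inference section asks for a depth map in meters, one plausible preprocessing step is a unit conversion rather than a min-max normalization. The sketch below assumes the raw uint16 values are in millimetres (a depth scale of 0.001), which is a common default but should be read from your own device; raw_depth_to_meters is a hypothetical helper.

```python
import numpy as np

def raw_depth_to_meters(depth_raw, depth_scale=0.001):
    """Convert raw integer depth values to float32 meters.

    depth_scale is the size of one raw unit in meters; 0.001 (millimetres)
    is an assumption and should be replaced by the value your sensor reports.
    """
    return np.asarray(depth_raw, dtype=np.float32) * np.float32(depth_scale)

# Placeholder frame: 0 marks missing depth and stays 0 after conversion.
raw_frame = np.array([[0, 1000, 1500]], dtype=np.uint16)
depth_m = raw_depth_to_meters(raw_frame)
```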

Issues not related to this project: RGB-D Fusion

I found the issue you opened asking whether mmdet can use a 4-channel input. I have achieved this in YOLOv5 and am also interested in the question you asked. Have you tried it in mmdet?

opened by JCRONG96 0
Problems encountered in the training process

When I changed use_farthest_point in the configuration file to true, training became very slow. It took about 3 seconds to train on a scene before the change and about 180 seconds afterwards. I also monitored GPU usage and found that the GPU was barely utilized.

opened by Ruangq 1
Scores model output

I have noticed that the model produces scores when segmentation mask is set to None. Do these scores have any significance and if so what is that?

P.S. I added your model as a service call in a ROSBRIDGE, and currently I am naively selecting the top 5 poses with the highest score values without a segmentation mask. I also added the appropriate transforms to get it working on the Interbotix wx250s. It's surprisingly good even before training, just by filtering out invalid gripper widths. I am a senior undergraduate, by the way, so my journey has been similar to: https://github.com/NVlabs/contact_graspnet/issues/8

opened by danialdunson 1
No grab available

I have a problem. When I use the model I trained to predict, the success rate of grabbing is very low, about 0.003. I hope you can give me some help to solve this problem.

opened by Ruangq 6
About the format of the trained model

Thank you for making the code public. I'm trying this method. But I found that the provided model is in model.ckpt format, and some .meta files are missing, which makes my conversion to '.pb'/'.trt' format always fail, and there are certain problems. Excuse me, can you provide the trained model in '.pb' format, or methods in other formats. I want to try an experimental deployment.

opened by 123zhen123 1
change_object not implemented error

Thank you for the codebase!

Should this line

pcreader.change_object(cad_path, cad_scale)

be changed to

pcreader._renderer._load_object(cad_path, cad_scale)

because SceneRenderer does not implement change_object function.

Thanks!

opened by quanvuong 1

