Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes

Martin Sundermeyer, Arsalan Mousavian, Rudolph Triebel, Dieter Fox
ICRA 2021

paper, project page, video

Installation

This code has been tested with Python 3.7, TensorFlow 2.2, CUDA 10.1, and cuDNN 7.6.0.

Create the conda env

conda env create -f contact_graspnet_env.yml

Troubleshooting

  • Recompile pointnet2 tf_ops, see here

Hardware

Training: 1x Nvidia GPU >= 24GB VRAM, >=64GB RAM
Inference: 1x Nvidia GPU >= 8GB VRAM (might work with less)

Download Models and Data

Model

Download trained models from here and copy them into the checkpoints/ folder.

Test data

Download the test data from here and copy it into the test_data/ folder.

Inference

Contact-GraspNet can directly predict a 6-DoF grasp distribution from a raw scene point cloud. However, to obtain object-wise grasps, remove background grasps, and achieve denser proposals, it is highly recommended to use (unknown) object segmentation [e.g. 1, 2] as a preprocessing step and then use the resulting segmentation map to crop local regions and filter grasp contacts.

Given a .npy/.npz file with a depth map (in meters), camera matrix K and (optionally) a 2D segmentation map, execute:

python contact_graspnet/inference.py \
       --np_path=test_data/*.npy \
       --local_regions --filter_grasps

--> close the window to go to the next scene
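
If you want to run on your own RGB-D data, the minimal sketch below shows one way to package it into an input file for --np_path. The key names ('depth', 'K', 'segmap', 'rgb') follow the flag descriptions further down; the array shapes and the example intrinsics are illustrative assumptions, not values from this repository.

import numpy as np

# Hypothetical input data: depth in meters (HxW), 3x3 pinhole intrinsics,
# and an optional instance segmentation map (HxW integer labels).
depth = np.load("my_depth.npy")
segmap = np.load("my_segmap.npy")
K = np.array([[616.0,   0.0, 320.0],
              [  0.0, 616.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Saved as .npz; pass it via --np_path=test_data/my_scene.npz
np.savez("test_data/my_scene.npz", depth=depth, K=K, segmap=segmap)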

Given a .npy/.npz file with just a 3D point cloud (in meters), execute for example:

python contact_graspnet/inference.py --np_path=/path/to/your/pc.npy \
                                     --forward_passes=5 \
                                     --z_range=[0.2,1.1]

--np_path: input .npz/.npy file(s) with 'depth', 'K' and optionally 'segmap', 'rgb' keys. For processing an Nx3 point cloud instead, use 'xyz' and optionally 'xyz_color' as keys.
--ckpt_dir: relative path to the checkpoint directory. By default checkpoints/scene_test_2048_bs3_hor_sigma_001 is used. For very clean / noisy depth data, consider scene_2048_bs3_rad2_32 / scene_test_2048_bs3_hor_sigma_0025, trained with no / strong noise respectively.
--local_regions: Crop 3D local regions around object segments for inference. (only works with segmap)
--filter_grasps: Filter grasp contacts such that they only lie on the surface of object segments. (only works with segmap)
--skip_border_objects: Ignore segments touching the depth map boundary.
--forward_passes: Number of (batched) forward passes. Increase to sample more potential grasp contacts.
--z_range: [min, max] z values in meters used to crop the input point cloud, e.g. to avoid grasps in the foreground/background (as above).
--arg_configs TEST.second_thres:0.19 TEST.first_thres:0.23: Overwrite config confidence thresholds for successful grasp contacts to get more/fewer grasp proposals.
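
For intuition on how --z_range interacts with the depth input: the depth map and camera matrix K define a camera-frame point cloud, and points outside the z interval are discarded. The sketch below shows the standard pinhole back-projection under that assumption; it is illustrative only and not the repository's exact preprocessing code.

import numpy as np

def depth_to_point_cloud(depth, K, z_range=(0.2, 1.1)):
    # depth: HxW array in meters, K: 3x3 camera intrinsics.
    # Returns an Nx3 point cloud in the camera frame, cropped to z_range.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pc = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    mask = (pc[:, 2] > z_range[0]) & (pc[:, 2] < z_range[1])
    return pc[mask]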

Training

Download Data

Download the Acronym dataset and the ShapeNet meshes, and make the meshes watertight, following these steps.

Download the training data, consisting of 10,000 table-top training scenes with contact grasp information, from here and extract it to the same folder:

acronym
├── grasps
├── meshes
├── scene_contacts
└── splits
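
A quick way to sanity-check that the extracted layout matches the tree above (a hypothetical helper, not part of the repository; adjust the path to your setup):

import os

ACRONYM_ROOT = "/path/to/acronym"   # placeholder path
for sub in ("grasps", "meshes", "scene_contacts", "splits"):
    path = os.path.join(ACRONYM_ROOT, sub)
    print(path, "found" if os.path.isdir(path) else "MISSING")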

Train Contact-GraspNet

When training on a headless server, set the environment variable:

export PYOPENGL_PLATFORM='egl'
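
If you launch training from a Python script instead of the shell, the same variable can be set in-process; it generally must be set before any OpenGL-backed module (such as pyrender) is imported. A minimal sketch under that assumption:

import os
os.environ["PYOPENGL_PLATFORM"] = "egl"  # set before importing OpenGL-backed modules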

Start training with config contact_graspnet/config.yaml

python contact_graspnet/train.py --ckpt_dir checkpoints/your_model_name \
                                 --data_path /path/to/acronym/data

Generate Contact Grasps and Scenes yourself (optional)

The scene_contacts downloaded above are generated from the Acronym dataset. To generate/visualize table-top scenes yourself, also pip install the acronym_tools package in your conda environment as described in the acronym repository.

In the first step, object-wise 6-DoF grasps are mapped to their contact points, which are saved in mesh_contacts:

python tools/create_contact_infos.py /path/to/acronym

From the generated mesh_contacts you can create table-top scenes, which are saved in scene_contacts, with:

python tools/create_table_top_scenes.py /path/to/acronym

This takes roughly 3 days on a single thread. Run the command several times in parallel to make use of multiple cores; see the sketch below.
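
A minimal sketch for launching several workers from Python. It assumes the script can safely run concurrently, as implied by the instruction above; the worker count and data path are placeholders.

import subprocess
import sys

ACRONYM_ROOT = "/path/to/acronym"   # placeholder, adjust to your data location
NUM_WORKERS = 8                     # placeholder, one process per core you want to use

procs = [subprocess.Popen([sys.executable, "tools/create_table_top_scenes.py", ACRONYM_ROOT])
         for _ in range(NUM_WORKERS)]
for p in procs:
    p.wait()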

You can also visualize existing table-top scenes and grasps:

python tools/create_table_top_scenes.py /path/to/acronym \
       --load_existing scene_contacts/000000.npz -vis

Citation

@inproceedings{sundermeyer2021contact,
  title={Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes},
  author={Sundermeyer, Martin and Mousavian, Arsalan and Triebel, Rudolph and Fox, Dieter},
  booktitle={2021 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2021}
}
Comments
  • conflict environment

    Hi, my GPU is a GTX 3080 Ti and contact_graspnet (inference.py) is pretty slow when I try to use it. I have seen the other issues #16 and #9. I tried CUDA 11.2 and cuDNN 8.1 with tensorflow-gpu 2.5, but that produces a lot of package conflicts and Python package bugs. So, would it be possible for you to provide a new version of the yml file for tensorflow-gpu 2.5?

    Best regards, xiaolin

    opened by xlim1996 8
  • Gripper control points

    Hello. I am an undergraduate student currently using contact-graspnet on a project.

    I am using contact-graspnet on a different robot (TIAGo from pal robotics) and the gripper differs from the panda robot. In order to be safe, I move the generated grasps backwards by an offset and then perform a forward motion to get the object between the robot's fingers. You have trained the network with the panda's gripper configuration by using its STL model and also points on the gripper (contact_graspnet/gripper_control_points/panda_gripper_coords.yml). I wonder if retraining the network using gripper coordinates for TIAGo's gripper would improve the generated grasps. I think that one of the things it would improve would be the grasps generated based on the gripper width constraints as the panda gripper is wider than TIAGO's gripper.

    Thanks in advance!

    opened by jucamohedano 8
  • First call to sess.run() at inference time is slow

    Hi, have you encountered an issue where the first call to sess.run() in contact_grasp_estimator.py is slow? I am running the inference example in the readme, and when I time sess.run() the first call takes much longer than subsequent calls:

    Run inference 1162.3998165130615
    Preprocess pc for inference 0.0007269382476806641
    Run inference 0.2754530906677246
    Preprocess pc for inference 0.0006759166717529297
    

    I found this thread on what seems to be a similar issue but the simple resolutions have not worked, and I have not tried compiling tensorflow from source yet. I am running on a GTX 3090 with CUDA 11.1, tensorflow-gpu==2.2. Have you encountered this issue before? Thanks for your help.

    opened by thomasweng15 5
  • Gripper Width

    Hello, thank you again for your code. I tried to change the gripper width in the config file, but I didn't see any difference at inference. Do I need to train again to change this? Also, I saw the filtering by gripper openings, but I couldn't find a consistent relation; could you explain it, please?

    opened by BryanBetancur 2
  • Fingers length

    Hello, thank you for your awesome code. I'm trying to use your code with a different gripper whose fingers are longer, but I haven't been able to find where to adjust this. Could you help me, please?

    opened by BryanBetancur 2
  • how to convert output grasp from inference.py to real-world coordinate system

    Hi @MartinSmeyer, I have a question: how do I convert the output grasps from inference.py to a real-world coordinate system? I use the following code to calculate the position from pred_grasps_cam, but the result seems incorrect. So I'm wondering whether the matrices in pred_grasps_cam contain the position and rotation relative to the object to be grasped, coordinates in the image, or coordinates in the real world.

    import math

    def rotationMatrixToEulerAngles(R):
        # R: 3x3 rotation matrix (e.g. a numpy array); returns XYZ Euler angles.
        if R[1, 0] > 0.998:
            x = 0
            y = math.pi / 2
            z = math.atan2(R[0, 2], R[2, 2])
        elif R[1, 0] < -0.998:
            x = 0
            y = -math.pi / 2
            z = math.atan2(R[0, 2], R[2, 2])
        else:
            x = math.atan2(-R[1, 2], R[1, 1])
            y = math.asin(R[1, 0])
            z = math.atan2(-R[2, 0], R[0, 0])
        return x, y, z

    def getRotationAndPosition(transformation):
        # transformation: 4x4 homogeneous grasp pose from pred_grasps_cam.
        assert transformation.shape[0] == 4 and transformation.shape[1] == 4, "shape error"
        x = transformation[0, 3]
        y = transformation[1, 3]
        z = transformation[2, 3]
        x_r, y_r, z_r = rotationMatrixToEulerAngles(transformation[0:3, 0:3])
        return x, y, z, x_r, y_r, z_r
    
    opened by xlim1996 2
  • Output grasp pose

    Hi,

    I am running into some problems when transforming the output grasp pose into the pose that the robot needs. May I ask if the output grasp pose is in the actual camera coordinate frame or in some other coordinate system? Thank you so much! @MartinSmeyer

    opened by y556zhao 2
  • Result different during inference

    Hi,

    I was using default parameters to test the model on 7.npy, and the resulting grasps are sparse as shown in the image below. May I ask if I was doing anything wrong?

    Here is the command I used: python contact_graspnet/inference.py --np_path=test_data/7.npy --local_regions --filter_grasps

    Checkpoint: checkpoints/scene_test_2048_bs3_hor_sigma_001

    Thank you! @MartinSmeyer

    opened by y556zhao 2
  • Visualizing step - Stuck for hours

    Hi,

    While trying to generate grasps using the following command:

    python contact_graspnet/inference.py --np_path=test_data/*.npy --local_regions --filter_grasps

    the code does not proceed after the following print statement:

    Generated 1 grasps for object 1.0
    Generated 11 grasps for object 2.0
    Generated 33 grasps for object 3.0
    Generated 134 grasps for object 4.0
    Generated 56 grasps for object 5.0
    Generated 92 grasps for object 6.0
    Generated 66 grasps for object 7.0
    Generated 40 grasps for object 8.0
    Generated 48 grasps for object 9.0
    Generated 5 grasps for object 10.0
    /home/kaykay/contact_graspnet/contact_graspnet/visualization_utils.py:63: MatplotlibDeprecationWarning: You are modifying the state of a globally registered colormap. In future versions, you will not be able to modify a registered colormap in-place. To remove this warning, you can make a copy of the colormap first.
      cmap = copy.copy(mpl.cm.get_cmap("rainbow"))
    cmap.set_under(alpha=0.0)
    Visualizing...takes time

    It has been around 2 hours now. What is the average expected time for the visualization step, and if there is a CUDA memory limit (which I suspect), do we get a warning or an error?

    opened by abhinavkk 2
  • predict_scene_grasps_from_depth_K_and_2d_seg missing some arguments and passing some incorrect arguments

    predict_scene_grasps_from_depth_K_and_2d_seg() is missing:

    • rgb: taken by extract_point_clouds()
    • forward_passes: taken by predict_scene_grasps()

    Also, extract_point_clouds() does not take local_regions and filter_grasps.

    opened by abhishek47kashyap 2
  • Normalizing depth image for inference

    Hi, @MartinSmeyer

    I wanted to test inference.py on real-time data from a RealSense depth camera, and I realized the depth image from the provided test data is normalized, while the depth image from the RealSense camera is in uint16 format. I tried to normalize it using depth_normalized = (depth - min_depth) / (max_depth - min_depth), but I am not certain about the range. Could you provide the range used for normalization, or another way to preprocess the depth image?

    Thanks, Amanuel

    opened by AmanuelErgogo 1
  • Issues not related to this project: RGB-D Fusion

    I saw the issue you opened about whether mmdet can take 4-channel input. I have achieved this in YOLOv5, and I am also interested in the question you asked. Have you tried it in mmdet?

    opened by JCRONG96 0
  • Problems encountered in the training process

    When I changed use_farthest_point in the configuration file to true, the training time became very long. It took about 3 seconds to train on a scene before the change and about 180 seconds per scene afterwards. I also monitored GPU usage and found that the GPU was barely being used.

    opened by Ruangq 1
  • Scores model output

    I have noticed that the model produces scores when the segmentation mask is set to None. Do these scores have any significance, and if so, what is it?

    P.S. I added your model as a service call over ROSBRIDGE, and currently I am naively selecting the top 5 poses with the highest score values without a segmentation mask. Also, I added the appropriate transforms to get it working on the Interbotix wx250s. It's surprisingly good even without retraining, just by filtering out invalid gripper widths. I am a senior in undergrad, btw, so my journey has been similar to: https://github.com/NVlabs/contact_graspnet/issues/8

    opened by danialdunson 1
  • No grab available

    I have a problem. When I use the model I trained for prediction, the grasp success rate is very low, about 0.003. I hope you can give me some help to solve this problem.

    opened by Ruangq 6
  • About the format of the trained model

    Thank you for making the code public. I'm trying this method, but I found that the provided model is in model.ckpt format and some .meta files are missing, which makes my conversion to the '.pb'/'.trt' formats always fail. Could you provide the trained model in '.pb' format, or a method to export it to other formats? I want to try an experimental deployment.

    opened by 123zhen123 1
  • change_object not implemented error

    Thank you for the codebase!

    Should this line

    pcreader.change_object(cad_path, cad_scale)

    be changed to

    pcreader._renderer._load_object(cad_path, cad_scale)

    because SceneRenderer does not implement a change_object function?

    Thanks!

    opened by quanvuong 1