Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation

Overview

Unseen Object Clustering: Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation

Introduction

In this work, we propose a new method for unseen object instance segmentation by learning RGB-D feature embeddings from synthetic data. A metric learning loss functionis utilized to learn to produce pixel-wise feature embeddings such that pixels from the same object are close to each other and pixels from different objects are separated in the embedding space. With the learned feature embeddings, a mean shift clustering algorithm can be applied to discover and segment unseen objects. We further improve the segmentation accuracy with a new two-stage clustering algorithm. Our method demonstrates that non-photorealistic synthetic RGB and depth images can be used to learn feature embeddings that transfer well to real-world images for unseen object instance segmentation. arXiv, Talk video

License

Unseen Object Clustering is released under the NVIDIA Source Code License (refer to the LICENSE file for details).

Citation

If you find Unseen Object Clustering useful in your research, please consider citing:

@inproceedings{xiang2020learning,
    Author = {Yu Xiang and Christopher Xie and Arsalan Mousavian and Dieter Fox},
    Title = {Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation},
    booktitle = {Conference on Robot Learning (CoRL)},
    Year = {2020}
}

Required environment

  • Ubuntu 16.04 or above
  • PyTorch 0.4.1 or above
  • CUDA 9.1 or above

Installation

  1. Install PyTorch.

  2. Install python packages

    pip install -r requirement.txt

Download

  • Download our trained checkpoints from here, save to $ROOT/data.

Running the demo

  1. Download our trained checkpoints first.

  2. Run the following script for testing on images under $ROOT/data/demo.

    ./experiments/scripts/demo_rgbd_add.sh

Training and testing on the Tabletop Object Dataset (TOD)

  1. Download the Tabletop Object Dataset (TOD) from here (34G).

  2. Create a symlink for the TOD dataset

    cd $ROOT/data
    ln -s $TOD_DATA tabletop
  3. Training and testing on the TOD dataset

    cd $ROOT
    
    # multi-gpu training, we used 4 GPUs
    ./experiments/scripts/seg_resnet34_8s_embedding_cosine_rgbd_add_train_tabletop.sh
    
    # testing, $GPU_ID can be 0, 1, etc.
    ./experiments/scripts/seg_resnet34_8s_embedding_cosine_rgbd_add_test_tabletop.sh $GPU_ID $EPOCH
    

Testing on the OCID dataset and the OSD dataset

  1. Download the OCID dataset from here, and create a symbol link:

    cd $ROOT/data
    ln -s $OCID_dataset OCID
  2. Download the OSD dataset from here, and create a symbol link:

    cd $ROOT/data
    ln -s $OSD_dataset OSD
  3. Check scripts in experiments/scripts with name test_ocid or test_ocd. Make sure the path of the trained checkpoints exist.

    experiments/scripts/seg_resnet34_8s_embedding_cosine_rgbd_add_test_ocid.sh
    experiments/scripts/seg_resnet34_8s_embedding_cosine_rgbd_add_test_osd.sh
    

Running with ROS on a Realsense camera for real-world unseen object instance segmentation

  • Python2 is needed for ROS.

  • Make sure our pretrained checkpoints are downloaded.

    # start realsense
    roslaunch realsense2_camera rs_aligned_depth.launch tf_prefix:=measured/camera
    
    # start rviz
    rosrun rviz rviz -d ./ros/segmentation.rviz
    
    # run segmentation, $GPU_ID can be 0, 1, etc.
    ./experiments/scripts/ros_seg_rgbd_add_test_segmentation_realsense.sh $GPU_ID

Our example:

Comments
  • Run with ROS

    Run with ROS

    Hi, @bearpaw thanks for this great contribution.

    I have created a conda environment with python 2.7 and successfully installed all the requirements and I already have ros noetic. When I try to run the ros node using ./experiments/scripts/ros_seg_rgbd_add_test_segmentation_realsense.sh $GPU_ID

    I get this error:-

    Traceback (most recent call last): File "./ros/test_images_segmentation.py", line 13, in <module> import tf File "/opt/ros/noetic/lib/python3/dist-packages/tf/__init__.py", line 30, in <module> from tf2_ros import TransformException as Exception, ConnectivityException, LookupException, ExtrapolationException File "/opt/ros/noetic/lib/python3/dist-packages/tf2_ros/__init__.py", line 38, in <module> from tf2_py import * File "/opt/ros/noetic/lib/python3/dist-packages/tf2_py/__init__.py", line 38, in <module> from ._tf2 import * ImportError: dynamic module does not define init function (init_tf2)

    I guess this occurs because of the python version that ROS noetic uses(python3). So, how do I run on Ubuntu 20.04 with ROS noetic?

    opened by AmanuelErgogo 1
  • No depth image

    No depth image

    Hi, thanks for your great project!

    When I o run the code, there are some problems I am sure.

    1. For ros example, when I run roslaunch realsense2_camera rs_aligned_depth.launch tf_prefix:=measured/camera, there is no message in topic /camera/aligned_depth_to_color/image_raw. When I change the launch file to roslaunch realsense2_camera rs_camera.launch align_depth:=true tf_prefix:=measured/camera, the topic works fine. (P.S.: I use a intel realsense D435i camera)
    2. Due to the ROS python2 version, there maybe some problems. For the tf import problem, this issue maybe helps. When I run ./experiments/scripts/ros_seg_rgbd_add_test_segmentation_realsense.sh $GPU_ID, I got error like this:
    [ERROR] [1610543855.033907]: bad callback: <bound method Subscriber.callback of <message_filters.Subscriber object at 0x7f8338747dd8>>
    Traceback (most recent call last):
      File "/opt/ros/kinetic/lib/python2.7/dist-packages/rospy/topics.py", line 750, in _invoke_callback
        cb(msg)
      File "/opt/ros/kinetic/lib/python2.7/dist-packages/message_filters/__init__.py", line 75, in callback
        self.signalMessage(msg)
      File "/opt/ros/kinetic/lib/python2.7/dist-packages/message_filters/__init__.py", line 57, in signalMessage
        cb(*(msg + args))
      File "/opt/ros/kinetic/lib/python2.7/dist-packages/message_filters/__init__.py", line 282, in add
        for vv in itertools.product(*[zip(*s)[0] for s in stamps]):
      File "/opt/ros/kinetic/lib/python2.7/dist-packages/message_filters/__init__.py", line 282, in <listcomp>
        for vv in itertools.product(*[zip(*s)[0] for s in stamps]):
    TypeError: 'zip' object is not subscriptable
    

    For now, I only rebuild tf package with python3 env. So could you point out which part runs based on ros python2 or rebuilds with python3? Thanks a lot! 3. I am not sure the camera name here. D415 or D435?

    Thanks a lot!

    opened by wangcongrobot 1
  • Tuning params for different dataset

    Tuning params for different dataset

    Dear all,

    thank you very much for open sourcing this interesting work! I just pulled the code, and while it works perfectly on the provided images, I am having a hard time making it work properly on a different dataset, see the example image below. Is there any parameter I can tune in order to get good results on this kind of images? Thank you very much in advance!

    Screenshot from 2022-05-11 10-25-27

    opened by chisarie 0
  • pyyaml version should be capped at 5.4.1

    pyyaml version should be capped at 5.4.1

    Thought this might be helpful in case anyone comes across this issue. Since version 6.0 of pyyaml, yaml.load requires a Loader argument. This causes an error to be raised in the code. PR linked

    opened by alexander-soare 0
  • Retraining of the Segmentation

    Retraining of the Segmentation

    Is there a possibility to retrain the network on my own generated synthetic dataset without losing the current performance of the network? I would like to just retrain some certain layers to get even better performance on my own dataset, or is the architecture too special for that? I would like to avoid to retrained the network from scratch? For generating the synthetic dataset I used SceneNet RGB-D with custom rooms and objects.

    opened by datboi223 0
  • Get table segmentation in detections

    Get table segmentation in detections

    Thanks for this impressive work. I'm working on a project where I need to segment tables from point clouds as well as objects on the table. Is it possible to use your work to segment the table in addition to objects? Have you tried it?

    Best regards, Mikkel

    opened by mikkeljakobsen 1
Owner
NVIDIA Research Projects
NVIDIA Research Projects
3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans.

3DMV 3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans. This work is based on our ECCV'18 p

Владислав Молодцов 0 Feb 6, 2022
Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera.

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera. This project prepares training and testing data for various deep learning projects such as 6D object pose estimation projects singleshotpose, as well as object detection and instance segmentation projects.

null 305 Dec 16, 2022
[MedIA2021]MIDeepSeg: Minimally Interactive Segmentation of Unseen Objects from Medical Images Using Deep Learning

MIDeepSeg: Minimally Interactive Segmentation of Unseen Objects from Medical Images Using Deep Learning [MedIA or Arxiv] and [Demo] This repository pr

Healthcare Intelligence Laboratory 92 Dec 8, 2022
DSAC* for Visual Camera Re-Localization (RGB or RGB-D)

DSAC* for Visual Camera Re-Localization (RGB or RGB-D) Introduction Installation Data Structure Supported Datasets 7Scenes 12Scenes Cambridge Landmark

Visual Learning Lab 143 Dec 22, 2022
TorchDistiller - a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

yifan liu 147 Dec 3, 2022
The implementation of the algorithm in the paper "Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data" published in ICML 2020.

DS3L This is the code for paper "Safe Deep Semi-Supervised Learning for Unseen-Class Unlabeled Data" published in ICML 2020. Setups The code is implem

Guolz 36 Oct 19, 2022
Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation

Leveraging Instance-, Image- and Dataset-Level Information for Weakly Supervised Instance Segmentation This paper has been accepted and early accessed

Yun Liu 39 Sep 20, 2022
ImageNet-CoG is a benchmark for concept generalization. It provides a full evaluation framework for pre-trained visual representations which measure how well they generalize to unseen concepts.

The ImageNet-CoG Benchmark Project Website Paper (arXiv) Code repository for the ImageNet-CoG Benchmark introduced in the paper "Concept Generalizatio

NAVER 23 Oct 9, 2022
Code for Parameter Prediction for Unseen Deep Architectures (NeurIPS 2021)

Parameter Prediction for Unseen Deep Architectures (NeurIPS 2021) authors: Boris Knyazev, Michal Drozdzal, Graham Taylor, Adriana Romero-Soriano Overv

Facebook Research 462 Jan 3, 2023
pytorch implementation of "Contrastive Multiview Coding", "Momentum Contrast for Unsupervised Visual Representation Learning", and "Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination"

Unofficial implementation: MoCo: Momentum Contrast for Unsupervised Visual Representation Learning (Paper) InsDis: Unsupervised Feature Learning via N

Zhiqiang Shen 16 Nov 4, 2020
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

Swin Transformer 1.4k Dec 30, 2022
Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.

Faster R-CNN and Mask R-CNN in PyTorch 1.0 maskrcnn-benchmark has been deprecated. Please see detectron2, which includes implementations for all model

Facebook Research 9k Jan 4, 2023
Object detection and instance segmentation toolkit based on PaddlePaddle.

Object detection and instance segmentation toolkit based on PaddlePaddle.

null 9.3k Jan 2, 2023
Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [CVPR 2021]

Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [CVPR 2021] Abstract Analyzing complex scenes with DNN is a challenging ta

Irene Yuan 24 Jun 27, 2022
This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

[CVPRW 2021] - Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation

Anirudh S Chakravarthy 6 May 3, 2022
Complete-IoU (CIoU) Loss and Cluster-NMS for Object Detection and Instance Segmentation (YOLACT)

Complete-IoU Loss and Cluster-NMS for Improving Object Detection and Instance Segmentation. Our paper is accepted by IEEE Transactions on Cybernetics

null 290 Dec 25, 2022
Res2Net for Instance segmentation and Object detection using MaskRCNN

Res2Net for Instance segmentation and Object detection using MaskRCNN Since the MaskRCNN-benchmark of facebook is deprecated, we suggest to use our mm

Res2Net Applications 55 Oct 30, 2022
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Mask R-CNN for Object Detection and Segmentation This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bound

Matterport, Inc 22.5k Jan 4, 2023
This is the official pytorch implementation for the paper: Instance Similarity Learning for Unsupervised Feature Representation.

ISL This is the official pytorch implementation for the paper: Instance Similarity Learning for Unsupervised Feature Representation, which is accepted

null 19 May 4, 2022