Tools to create pixel-wise object masks, bounding box labels (2D and 3D), and 3D object models (PLY triangle meshes) for object sequences filmed with an RGB-D camera.

Object Dataset Tools

Introduction

This repository contains pure Python scripts to create object masks, bounding box labels, and 3D reconstructed object meshes (.ply) for object sequences filmed with an RGB-D camera. It can prepare training and testing data for various deep learning projects, such as the 6D object pose estimation project singleshotpose, and many object detection (e.g., Faster R-CNN) and instance segmentation (e.g., Mask R-CNN) projects. Ideally, if you have a RealSense camera and some experience with MeshLab or Blender, creating your customized dataset should be as easy as executing a few command-line scripts.

The code in this repository implements a raw 3D model acquisition pipeline based on ArUco markers and ICP registration. The raw 3D model obtained needs to be processed and denoised in a mesh processing software. After this step, functions are provided to generate the required labels automatically.

The scripts are currently written for a single object of interest per frame. They can be modified to create datasets with several items per frame.

Installation

Installation of this repository has been tested on a fresh install of Ubuntu 16.04 with Python 2.7, but it should be compatible with Python 3 as well. Installation instructions for a wide range of Intel RealSense drivers and their Python wrappers are included.

Create a dataset of customized items

1. Preparation

Color print the PDF with the correctly sized ArUco markers (IDs 1-13) in the arucomarkers folder. Affix the markers around the object of interest, as shown in the picture, and make sure that no markers have duplicate IDs.

[image: ArUco markers placed around the object]

2. Record an object sequence

Option 1: Record with a RealSense camera (SR300 preferred)

Scripts are provided to record an object video sequence using a compatible RealSense camera. Use record.py for legacy camera models and record2.py for models using librealsense SDK 2.0:

python record.py LINEMOD/OBJECTNAME

e.g.,

python record.py LINEMOD/sugar

to record a sequence of a sugar box. By default, the script records for 40 seconds after a countdown of 5. You can change the recording interval in the script, or exit the recording early by pressing "q". Move the camera steadily to capture different views of the object while keeping 2-3 markers within the camera's field of view at all times.

Note that the project assumes all sequences are saved under a folder named "LINEMOD"; using other folder names will cause an error.

If you use record.py to create your sequence, color images, depth images aligned to the color images, and camera parameters will be saved automatically under the sequence directory.

Option 2: Use an existing sequence or record with other cameras

If you are using other cameras, put the color images (.jpg) in a folder named "JPEGImages" and the aligned depth images (uint16 .png, interpolated over an 8 m range) in a folder named "depth". Please note that the algorithm assumes the depth images are aligned to the color images. Name your color images sequentially (0.jpg, 1.jpg ... 600.jpg) and the corresponding depth images accordingly (0.png ... 600.png). You should also create a file named intrinsics.json under the sequence directory and manually enter the camera parameters in the format shown below:

{"fx": 614.4744262695312, "fy": 614.4745483398438, "height": 480, "width": 640, "ppy": 233.29214477539062, "ppx": 308.8282470703125, "ID": "620201000292"}

If you don't know your camera's intrinsics, you can enter a rough estimate. The required parameters are fx, fy, cx, and cy (stored as ppx and ppy in intrinsics.json). Commonly, fx = fy and roughly equals the image width, and (cx, cy) is the image center. For example, for a 640 x 480 image, fx = fy = 640, cx = 320, cy = 240.
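
A minimal sketch for writing such a file with a rough estimate; the sequence path LINEMOD/sugar is an example, and the "ID" field from the sample above (which appears to be a camera serial) is omitted here:

import json

# Rough-estimate intrinsics for a 640 x 480 camera, following the rule
# of thumb above: fx = fy = image width, principal point at the center.
# Note that intrinsics.json stores cx and cy under the keys ppx and ppy.
intrinsics = {
    "fx": 640.0,
    "fy": 640.0,
    "ppx": 320.0,  # cx
    "ppy": 240.0,  # cy
    "width": 640,
    "height": 480,
}

with open("LINEMOD/sugar/intrinsics.json", "w") as f:
    json.dump(intrinsics, f)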

An example sequence can be downloaded HERE. Create a directory named "LINEMOD", unzip the example sequence, and put the extracted folder (timer) under LINEMOD.
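
If your own recordings are not already named sequentially as described above, a small hypothetical helper like the following can renumber them (it assumes the existing filenames sort in capture order and renames via temporary names to avoid collisions):

import os

seq = "LINEMOD/sugar"  # example sequence directory
for folder, ext in (("JPEGImages", ".jpg"), ("depth", ".png")):
    d = os.path.join(seq, folder)
    files = sorted(f for f in os.listdir(d) if f.endswith(ext))
    # First pass: move everything to temporary names.
    for i, name in enumerate(files):
        os.rename(os.path.join(d, name), os.path.join(d, "tmp_%d%s" % (i, ext)))
    # Second pass: assign the final sequential names 0, 1, 2, ...
    for i in range(len(files)):
        os.rename(os.path.join(d, "tmp_%d%s" % (i, ext)),
                  os.path.join(d, "%d%s" % (i, ext)))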

3. Obtain frame transforms

Compute transforms for frames at the specified interval (the interval can be changed in config/registrationParameters) against the first frame, and save the transforms (4 x 4 homogeneous matrices) as a numpy array (.npy):

python compute_gt_poses.py LINEMOD/sugar
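
To sanity-check the output, you can load the saved array; this is a sketch, and the exact .npy filename under the sequence folder is an assumption:

import numpy as np

transforms = np.load("LINEMOD/sugar/transforms.npy")  # assumed filename
print(transforms.shape)  # expected: (number_of_frames, 4, 4)
print(transforms[0])     # the first frame should be close to the identity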

4. Register all frames and create a mesh for the registered scene.

python register_scene.py LINEMOD/sugar

A raw registeredScene.ply will be saved under the specified directory (e.g., LINEMOD/sugar). registeredScene.ply is a registered pointcloud of the scene that includes the table top, markers, and any other objects exposed during the scan, with some level of noise removal. The generated mesh looks something like this and requires the manual processing of step 5:

[image: raw registered scene]
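
Besides MeshLab, you can quickly inspect the registered pointcloud with open3d; a minimal sketch (these read/visualize calls exist in both older and newer open3d releases):

import open3d as o3d

pcd = o3d.io.read_point_cloud("LINEMOD/sugar/registeredScene.ply")
print(pcd)  # prints the number of points
o3d.visualization.draw_geometries([pcd])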

Alternatively, you can try to skip all manual effort by running register_segmented instead of register_scene:

python register_segmented.py LINEMOD/sugar

By default, register_segmented attempts to remove all unwanted background and performs surface reconstruction that converts the registered pointcloud into a triangular mesh. If MESHING is set to false, the script will only attempt to remove the background and auto-complete the unseen bottom with a flat surface (if FILLBOTTOM is set to true), and you will need to do step 5.

However, register_segmented may fail, as it uses some ad hoc methods to segment the background; you may therefore need to tune some parameters for it to work with your object. The most important knob is "MAX_RADIUS", which cuts off any depth reading whose Euclidean distance to the center of the observed ArUco markers exceeds the specified value. This value is currently set to 0.2 m; if you have a larger object, you may need to increase it so that parts of your object are not cut off. The result of running register_segmented looks something like this:

[image: result of register_segmented]
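
For intuition, the MAX_RADIUS cutoff amounts to a radius filter like the following sketch (an illustration, not the project's exact code):

import numpy as np

MAX_RADIUS = 0.2  # meters; increase for larger objects

def filter_by_radius(points, center, max_radius=MAX_RADIUS):
    # points: (N, 3) array; center: (3,) marker-board center, same frame.
    dist = np.linalg.norm(points - center, axis=1)
    return points[dist <= max_radius]  # keep only nearby readings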

5. Process the registered pointcloud manually (Optional)

(03/03/2019) You can skip step 5 if you are satisfied with the result from running register_segmented.

The registered pointcloud needs to be processed to

  1. Remove background that is not of interest,
  2. Perform surface reconstruction and complete the missing side or vice versa,
  3. Process the reconstructed mesh (you may need to cut parts off and recomplete the missing side),
  4. Make sure that the processed mesh is free of ANY isolated noise.

The end product is a triangular mesh instead of the registered pointcloud generated by the algorithm.

You may find these YouTube tutorials useful: Point cloud to mesh conversion, Point Cloud to Mesh Reconstruction (MeshLab), and this very basic one I recorded.

If you are creating the mesh as a by-product to obtain image masks, or to use it for projects like singleshotpose, only the exact mesh geometry is needed and the appearance is not important. It is therefore acceptable to "close holes" as shown in the video for planar areas. Also, for symmetric objects, you can complete the shape manually using symmetry. If you need the exact texture information for the missing side, you will need to film another sequence exposing the missing side and manually align the two pointclouds.

6. Create image masks and label files

When you have completed steps 1-4 for all customized objects, run

python create_label_files.py all

or

python create_label_files.py LINEMOD/sugar

This step creates a new mesh named foldername.ply (e.g., sugar.ply) whose axis-aligned bounding box (AABB) is centered at the origin and has the same dimensions as the oriented bounding box (OBB). It also produces image masks (saved under mask), 4 x 4 homogeneous transforms with respect to the new mesh (saved under transforms), and label files (saved under labels), which are projections of the object's 3D bounding box onto the 2D images. The mask files can be used for training and testing in deep learning projects (e.g., Mask R-CNN).
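
Conceptually, each label file comes from a projection like the following sketch; the function and variable names here are illustrative, not the repository's exact code:

import numpy as np

def project_corners(corners_3d, transform, K):
    # corners_3d: (8, 3) box corners in the model frame;
    # transform: 4 x 4 object-to-camera pose; K: 3 x 3 intrinsics.
    homog = np.hstack([corners_3d, np.ones((8, 1))])  # (8, 4) homogeneous
    cam = np.dot(transform, homog.T)[:3]              # (3, 8) camera frame
    uv = np.dot(K, cam)
    uv = uv[:2] / uv[2]                               # perspective divide
    return uv.T                                       # (8, 2) pixel coordinates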

Inspect the correctness of the created 3D bounding boxes and masks visually by running:

python inspectMasks.py LINEMOD/sugar

(Optional) Create additional files required by singleshotpose

If you create the mesh files for singleshotpose, you need to open the new mesh files in MeshLab and save them again with the binary format option unchecked. These meshes are used by singleshotpose for evaluation and pose estimation, and singleshotpose cannot read binary-encoded meshes.

Masks and labels created in step 6 are compatible with singleshotpose. Currently, class labels are assigned in a hacky way (by the order in which a folder is grabbed among all sequence folders); if you call create_label_files.py on each folder separately, they will all be assigned the same label, so please read the printout and change the class label manually in create_label_files.py.

In addition, you need to create the train and test image lists:

python makeTrainTestfiles.py

and create the other required path files.

For each customized object, create an objectname.data file in the cfg folder.
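
For reference, a hypothetical sugar.data might look like the following; all values are examples only (the intrinsics reuse the sample intrinsics.json above), and singleshotpose's own cfg folder is the authoritative reference for the fields:

train = LINEMOD/sugar/train.txt
valid = LINEMOD/sugar/test.txt
backup = backup/sugar
mesh = LINEMOD/sugar/sugar.ply
tr_range = LINEMOD/sugar/training_range.txt
name = sugar
diam = 0.18
width = 640
height = 480
fx = 614.47
fy = 614.47
u0 = 308.83
v0 = 233.29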

To get the object scale (maximum vertex distance), you can run:

python getmeshscale.py
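
For intuition, the scale is the largest distance between any two mesh vertices; an illustrative computation (not getmeshscale.py itself) that uses the convex hull to keep the pairwise math small:

import numpy as np
import trimesh

mesh = trimesh.load("LINEMOD/sugar/sugar.ply")  # example mesh path
hull = mesh.convex_hull.vertices                # extreme points suffice
diffs = hull[:, None, :] - hull[None, :, :]     # all pairwise differences
print("max vertex distance:", np.sqrt((diffs ** 2).sum(-1)).max())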

This should be everything you need to create a customized dataset for singleshotpose. Please don't forget to update the camera calibration parameters in singleshotpose as well.

(Optional) Create bounding box labels for object detection projects

After you have completed step 6 (image masks generated), run:

python get_BBs.py

This creates annotations.csv, which contains class labels and bounding box information for all images under the LINEMOD folder.
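
If you need a box for a single image, it can be read straight off the corresponding mask; a sketch (the mask path and filename follow the step 6 layout but are assumptions):

import cv2
import numpy as np

mask = cv2.imread("LINEMOD/sugar/mask/0.png", cv2.IMREAD_GRAYSCALE)
ys, xs = np.nonzero(mask)  # pixel coordinates belonging to the object
print("bbox (x_min, y_min, x_max, y_max):", xs.min(), ys.min(), xs.max(), ys.max())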

If you encounter any problems with the code, want to report bugs, etc. please contact me at faninedinburgh[at]gmail[dot]com.

Comments
  • make my own datasets for training other projects

    Have you ever looked at DenseFusion and Deep_Object_Pose?

    These two projects also perform 6D pose estimation. If I want to use them to train on my own data, can I use your tools to create the datasets?

    enhancement 
    opened by hlcool 25
  • Problems when generating mesh

    We ran into a problem after running compute_gt_poses.py. 'Mesh saved' is printed, but when we open the .ply in MeshLab we get this result [screenshot]. The environment looks like this [photo]; an example color picture from the JPEGImages folder is attached. Our camera is a RealSense D435, and its depth intrinsics are: {'fx': 640.292, 'fy': 640.292, 'height': 720, 'width': 1280, 'coeffs': [0, 0, 0, 0, 0], 'ppy': 357.747, 'ppx': 647.852, 'model': 2}. The resolution of both color and depth is 1280 x 720.

    We think the reason is that the markers are not identified, but we don't know the IDs of the markers. Can you provide them? Thanks.

    opened by JennahF 19
  • About the marker

    Hi, you said "Color print the pdf with the correctly sized aruco markers in the arucomarkers folder." Does that mean printing the PDF on A4 paper? I printed the markers on A4 and cut them out to place around the object [image], but the object I reconstructed is not at the right scale.

    opened by ghoshaw 9
  • Successful method for DenseFusion

    I successfully added an extension to this repo for DenseFusion, converting the singleshotpose ground-truth labels into DenseFusion camera pose gt.yml files. Thanks to @F2Wang for making this third-party program, which is very helpful!

    opened by delonixsen 7
  • A question about 3D reconstruction

    Hi, I wonder whether your code uses bundle adjustment to optimize the 3D reconstruction. And why not use something such as ElasticFusion to reconstruct the whole scene?

    opened by pyni 7
  • AttributeError: 'NoneType' object has no attribute 'shape' while creating image masks

    Hello! I am Akmaral. First of all, thank you for this great work and for sharing it with us.

    I am getting the error "AttributeError: 'NoneType' object has no attribute 'shape'" while running 'python3 create_label_files.py LINEMOD/xxx' (step 6). I have successfully completed all previous steps and have registeredScene.ply inside my folder. Even updating my trimesh version doesn't help. Could you give me a hint on how to solve this problem?

    opened by ghost 6
  • Where is the origin of the world coordinates of point clouds produced by register_scene.py? Is it related to the markers?

    Hi, first of all, thanks for your work. I want to use an RGB-D camera to take RGB and depth images around one or more still objects, and I want to get a full pointcloud of the scene or of the single object. But I need to know the origin of the world coordinates of these pointclouds, because I need this information for manipulation with a UR5 robot. Can you give me some instructions about that? Thanks.

    opened by MengHao666 5
  • Adding my own model

    Hello, thank you for your work. I have a question: is it possible to add my own object model to the scene instead of creating a mesh from the reconstructed scene? How can I do it? Thank you.

    opened by luigifaticoso 4
  • About register_scene.py

    When I run register_scene.py, the reconstruction result is relatively bad. I think it is because of the video, but I haven't gotten a good result after several attempts. Do you have any guidance on how to record the video, or could there be another reason?

    opened by shanniruo 4
  • Question about step 5

    1. Is the registered pointcloud to be processed manually in step 5 the same one generated by register_scene.py? (It seems that register_scene.py already generates the mesh.)

    2. If the answer to the previous question is yes, can we just use the result from register_segmented.py without doing step 5?

    3. If we really need to use MeshLab, is it OK to just use the same settings as in your YouTube video?

    Thank you

    opened by JacksonnnTan 4
  • Issue with 2D and 3D corner coordinates

    Hello!

    Sorry for bothering you again, but there are some points I can't understand. I have 2 questions:

    1. I am confused that the 2D corner coordinates are not between 0 and 1, but bigger than 1. Don't you think this is because of the camera I am using? I am using an Intel RealSense D435.

    2. In corners = compute_projection(transformed, K), are transformed the 3D corner coordinates in the world frame or in my model frame?

    opened by ghost 4
  • create_label_files

    When I try to get the mask of the object, I get:

    Traceback (most recent call last):
      File "create_label_files.py", line 155, in
        cnt = max(contours, key=cv2.contourArea)
    ValueError: max() arg is an empty sequence

    I don't know how to handle it. Can anybody help me? Thank you very much!

    opened by Roy-815 0
  • ValueError: not enough values to unpack (expected 3, got 2)

    Lamp is assigned class label 0.
    0%| | 0/1199 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "create_label_files.py", line 155, in
        cv2.CHAIN_APPROX_SIMPLE)
    ValueError: not enough values to unpack (expected 3, got 2)

    What can I do? Please help!

    opened by shqmffl486 1
  • NameError: name 'registration' is not defined

    Hi @paperstiger @F2Wang! Please look into this issue:

    Traceback (most recent call last):
      File "compute_gt_poses.py", line 351, in <module>
        pose_graph = full_registration(path, max_correspondence_distance_coarse,
      File "compute_gt_poses.py", line 152, in full_registration
        pose_graph = registration.PoseGraph()
    NameError: name 'registration' is not defined

    The registration module itself is not imported, and I don't know where it is defined. Can you please check?
    opened by sowmyakavali 1
  • Support Open3D 0.15.1 and OpenCV 4.6

    • Newer open3d (0.15.1) has moved some APIs under the pipelines namespace.
    • The signature of cv2.findContours() in newer OpenCV (4.6) differs from older versions; a compatibility sketch follows this list.
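
    A minimal shim for both changes might look like the following sketch (find_contours is a hypothetical wrapper, not part of the repository):

    import cv2
    import open3d as o3d

    try:
        # open3d >= 0.15 keeps registration under the pipelines namespace
        registration = o3d.pipelines.registration
    except AttributeError:
        registration = o3d.registration  # older open3d releases

    def find_contours(binary_img):
        # OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returned
        # (image, contours, hierarchy).
        res = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
        return res[0] if len(res) == 2 else res[1]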

    I confirmed the script works well under the following conditions:

    # Python 3.9.13
    anyio==3.6.1
    argon2-cffi==21.3.0
    argon2-cffi-bindings==21.2.0
    asttokens==2.0.8
    attrs==22.1.0
    Babel==2.10.3
    backcall==0.2.0
    beautifulsoup4==4.11.1
    bleach==5.0.1
    certifi==2022.9.24
    cffi==1.15.1
    charset-normalizer==2.1.1
    colorama==0.4.5
    debugpy==1.6.3
    decorator==5.1.1
    defusedxml==0.7.1
    deprecation==2.1.0
    entrypoints==0.4
    executing==1.1.1
    fastjsonschema==2.16.2
    idna==3.4
    importlib-metadata==5.0.0
    ipykernel==6.16.0
    ipython==8.5.0
    ipython-genutils==0.2.0
    ipywidgets==8.0.2
    jedi==0.18.1
    Jinja2==3.1.2
    json5==0.9.10
    jsonschema==4.16.0
    jupyter-core==4.11.1
    jupyter-server==1.19.1
    jupyter_client==7.3.5
    jupyter_packaging==0.12.3
    jupyterlab==3.4.8
    jupyterlab-pygments==0.2.2
    jupyterlab-widgets==3.0.3
    jupyterlab_server==2.15.2
    MarkupSafe==2.1.1
    matplotlib-inline==0.1.6
    mistune==2.0.4
    nbclassic==0.4.5
    nbclient==0.7.0
    nbconvert==7.2.1
    nbformat==5.7.0
    nest-asyncio==1.5.6
    notebook==6.4.12
    notebook-shim==0.1.0
    numpy==1.23.3
    open3d==0.15.1
    opencv-contrib-python==4.6.0.66
    opencv-python==4.6.0.66
    packaging==21.3
    pandocfilters==1.5.0
    parso==0.8.3
    pickleshare==0.7.5
    Pillow==9.2.0
    prometheus-client==0.14.1
    prompt-toolkit==3.0.31
    psutil==5.9.2
    pure-eval==0.2.2
    pycparser==2.21
    Pygments==2.13.0
    pykdtree==1.3.5
    pyparsing==3.0.9
    pypng==0.20220715.0
    pyrealsense2==2.51.1.4348
    pyrsistent==0.18.1
    python-dateutil==2.8.2
    pytz==2022.4
    pywin32==304
    pywinpty==2.0.8
    pyzmq==24.0.1
    requests==2.28.1
    scipy==1.9.2
    Send2Trash==1.8.0
    six==1.16.0
    sniffio==1.3.0
    soupsieve==2.3.2.post1
    stack-data==0.5.1
    terminado==0.16.0
    tinycss2==1.1.1
    tomli==2.0.1
    tomlkit==0.11.5
    tornado==6.2
    tqdm==4.64.1
    traitlets==5.4.0
    trimesh==3.15.4
    urllib3==1.26.12
    wcwidth==0.2.5
    webencodings==0.5.1
    websocket-client==1.4.1
    widgetsnbextension==4.0.3
    zipp==3.9.0
    
    opened by hirohitokato 0
  • Issue with step 6

    When I run step 6, it raises this error:

    File "qhull.pyx", line 2431, in scipy.spatial.qhull.ConvexHull.init
    File "qhull.pyx", line 356, in scipy.spatial.qhull._Qhull.init
    scipy.spatial.qhull.QhullError:
    QH7023 qhull option warning: unknown 'Q' qhull option 'Qn', skip to next space
    QH6035 qhull option error: see previous warnings, use 'Qw' to override: 'qhull i QJn Pp QbB Qt' (last offset 9)

    While executing: | qhull i QJn Pp QbB Qt
    Options selected for Qhull 2019.1.r 2019/06/21: run-id 165439650 incidence Pprecision-ignore QbBound-unit-box 0.5 Qtriangulate _maxoutside 0

    opened by bckaiwang 0
  • Issue with step 4: register_segmented.py, TypeError occurs

    Hi, I found an error in register_segmented.py:

    100%|...| 42/42 [00:15<00:00, 2.68it/s]
    Apply post processing
    100%|...| 33/33 [00:00<00:00, 58.07it/s]

    Traceback (most recent call last):
      File "/home/youngwoo/ObjectDatasetTools/register_segmented.py", line 243, in
        plane_norm = normalize(np.array(plane_equation[:3]))
    TypeError: 'NoneType' object has no attribute 'getitem'

    I got this error after the processing finished. Could you please help me with it? Thank you!

    opened by youngwoo1 0