Code for Ditto: Building Digital Twins of Articulated Objects from Interaction

Overview

Ditto: Building Digital Twins of Articulated Objects from Interaction

Zhenyu Jiang, Cheng-Chun Hsu, Yuke Zhu

CVPR 2022, Oral

Project | arXiv


News

2022-04-28: We released the data generation code of Ditto here.

Introduction

Ditto (Digital Twins of Articulated Objects) is a model that reconstructs the part-level geometry and articulation model of an articulated object from observations before and after an interaction. Specifically, we use a PointNet++ encoder to encode the input point cloud observations and fuse the subsampled point features with a simple attention layer. We then use two independent decoders to propagate the fused point features into two sets of dense point features, one for geometry reconstruction and one for articulation estimation. We construct feature grids/planes by projecting and pooling the point features, and query local features from these constructed grids/planes. Conditioned on the local features, separate decoders predict occupancy, segmentation, and joint parameters with respect to the query points. In the end, we extract an explicit geometry and articulation model from the implicit decoders.
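For readers who prefer code, below is a minimal, illustrative PyTorch sketch of this two-branch design. All class names and dimensions are placeholders rather than the actual modules in this repository, and the real PointNet++ set abstraction and ConvONet feature grids/planes are replaced with tiny stand-ins so the snippet runs on its own.

import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Stand-in for the PointNet++ encoder: maps (B, N, 3) points to per-point features."""
    def __init__(self, dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, pts):
        return self.mlp(pts)  # (B, N, dim)

class AttentionFusion(nn.Module):
    """Fuses features of the two observations with a simple attention layer."""
    def __init__(self, dim=128):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, feat_before, feat_after):
        fused, _ = self.attn(feat_before, feat_after, feat_after)
        return fused  # (B, N, dim)

class ImplicitHead(nn.Module):
    """Predicts a per-query quantity (occupancy, segmentation, or joint parameters)
    conditioned on local features gathered at the query points."""
    def __init__(self, dim=128, out_dim=1):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim + 3, dim), nn.ReLU(), nn.Linear(dim, out_dim))

    def forward(self, query_pts, local_feat):
        return self.mlp(torch.cat([query_pts, local_feat], dim=-1))

# Toy forward pass with random data.
B, N, Q, D = 2, 1024, 256, 128
encoder, fusion = PointEncoder(D), AttentionFusion(D)
geom_decoder, artic_decoder = nn.Linear(D, D), nn.Linear(D, D)  # stand-ins for the two decoders
occ_head = ImplicitHead(D, out_dim=1)    # occupancy
seg_head = ImplicitHead(D, out_dim=1)    # mobile/static segmentation
joint_head = ImplicitHead(D, out_dim=7)  # joint parameters (illustrative size)

pc_before, pc_after = torch.rand(B, N, 3), torch.rand(B, N, 3)
query = torch.rand(B, Q, 3)

fused = fusion(encoder(pc_before), encoder(pc_after))
geom_feat, artic_feat = geom_decoder(fused), artic_decoder(fused)
# Ditto pools per-point features into feature grids/planes and interpolates local features
# at the query points; here we simply mean-pool and broadcast for brevity.
geom_local = geom_feat.mean(dim=1, keepdim=True).expand(-1, Q, -1)
artic_local = artic_feat.mean(dim=1, keepdim=True).expand(-1, Q, -1)

occupancy = occ_head(query, geom_local)        # (B, Q, 1)
segmentation = seg_head(query, artic_local)    # (B, Q, 1)
joint_params = joint_head(query, artic_local)  # (B, Q, 7)

At inference time, the occupancy and segmentation predictions are turned into explicit part meshes (e.g. via marching cubes), and the per-point joint predictions are aggregated into a single articulation estimate.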

If you find our work useful in your research, please consider citing.

Installation

  1. Create a conda environment and install required packages.
conda env create -f conda_env_gpu.yaml -n Ditto

You can change the PyTorch and CUDA versions in conda_env_gpu.yaml.

  2. Build the ConvONets dependencies by running python scripts/convonet_setup.py build_ext --inplace.

  3. Download the data, then unzip data.zip under the repo's root.

Training

# single GPU
python run.py experiment=Ditto_s2m

# multiple GPUs
python run.py trainer.gpus=4 +trainer.accelerator='ddp' experiment=Ditto_s2m

# multiple GPUs + wandb logging
python run.py trainer.gpus=4 +trainer.accelerator='ddp' logger=wandb logger.wandb.group=s2m experiment=Ditto_s2m

Testing

# only supports a single GPU
python run_test.py experiment=Ditto_s2m trainer.resume_from_checkpoint=/path/to/trained/model/

Demo

Here is a minimal demo that starts from multi-view depth maps captured before and after the interaction and ends with a reconstructed digital twin. To run the demo, you need to install this library for visualization.

We provide posed depth images of a real-world laptop for running the demo. You can download them from here and put them under data. You can also run the demo on your own data as long as it follows the same format.
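For intuition, here is a minimal sketch of the kind of preprocessing the demo performs: back-projecting posed depth maps into world-frame point clouds and fusing the views. It is not the notebook's actual code; the function names are illustrative, and it assumes you already have depth arrays, a 3x3 pinhole intrinsics matrix K, and 4x4 camera-to-world poses.

import numpy as np

def depth_to_world_points(depth, K, cam2world):
    """Back-project a depth map (H, W) into world-frame 3D points (M, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                                    # ignore missing depth
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z], axis=-1)               # camera frame
    pts_h = np.concatenate([pts_cam, np.ones((len(z), 1))], axis=-1)
    return (pts_h @ cam2world.T)[:, :3]                  # world frame

def fuse_views(depths, K, cam2world_poses):
    """Fuse several posed views (all before, or all after, the interaction)."""
    return np.concatenate(
        [depth_to_world_points(d, K, T) for d, T in zip(depths, cam2world_poses)], axis=0
    )

The fused clouds from before and after the interaction are then subsampled and fed to the trained model, as the demo notebook does.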

Data and pre-trained models

Data: here. Remember to cite Shape2Motion and Abbatematteo et al. as well as Ditto when using these datasets.

Pre-trained models: Shape2Motion dataset, Synthetic dataset.

Useful tips

  1. Run eval "$(python run.py -sc install=bash)" under the root directory to enable auto-completion for command-line options.

  2. Install pre-commit hooks with pip install pre-commit; pre-commit install to get automatic formatting before each commit.

Related Repositories

  1. Our code is based on this fantastic template Lightning-Hydra-Template.

  2. We use ConvONets as our backbone.

Citing

@inproceedings{jiang2022ditto,
   title={Ditto: Building Digital Twins of Articulated Objects from Interaction},
   author={Jiang, Zhenyu and Hsu, Cheng-Chun and Zhu, Yuke},
   booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
   year={2022}
}
Comments
  • Where do you save the digital twins?

    Hi! I'm new to the way your code is organized, and I wonder where the output of the model is. And how can I visualize it using the utils3d tools? Many thanks!

    opened by Yushi-Du 4
  • How to visualize the test results of shape2motion dataset?

    Hi, following the tutorial, I have managed to visualize the test results on the real-world dataset, which relies on depth images and RGB images as input. However, how do I visualize the test results on the shape2motion dataset? Thank you very much for your help!

    opened by buptmjj 4
  • The results of testing on the pre-trained model

    Hello, we have tested using the pre-trained model provided with this paper, but the results seem to be incorrect. Where might the problem be? We haven't changed the code at all. Thank you for your reply. Ditto_testresult

    opened by buptmjj 4
  • Are the models in the canonical object space?

    Hi,

    I noticed that in the demo, the laptop is converted from the depth image to camera coordinates, then to world coordinates. What's the reason for converting to world coordinates? Does it guarantee that the object is in the canonical object space?

    When training on the shape2motion data, is the training data in the canonical object space (with a canonical pose)? Or is it just in camera coordinates?

    opened by Jianghanxiao 3
  • What's the meaning of 'recenter' when inferring the pivot point?

    Hi,

    Thanks for your great work. When going through the demo code, I don't quite understand the recenter process. Can you help explain a bit? Based on my current understanding, the pivot point is already the averaged motion origin. Why do we want another recenter operation? What's the use of the double cross product? (See the illustrative check below.)

    if recenter: pivot_point = np.cross(axis, np.cross(pivot_point, axis))

    opened by Jianghanxiao 2
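    For readers with the same question: assuming the axis is unit length, the vector identity a × (p × a) = p − a(a·p) means the double cross product keeps only the component of the pivot point perpendicular to the joint axis; in other words, it picks the point on the joint line closest to the origin, making the pivot representation unique. A small, purely illustrative check with hypothetical values (not code from this repository):

    # Check the identity a x (p x a) = p - a (a . p) for a unit axis a.
    import numpy as np

    axis = np.array([0.0, 0.0, 1.0])          # hypothetical unit joint axis
    pivot_point = np.array([0.3, -0.2, 5.0])  # hypothetical averaged motion origin

    recentered = np.cross(axis, np.cross(pivot_point, axis))
    expected = pivot_point - axis * np.dot(axis, pivot_point)

    print(recentered)                          # [ 0.3 -0.2  0. ]
    print(np.allclose(recentered, expected))   # True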
  • What are the data contained in the camera2base.json file?

    Hi, I'd like to ask what data is contained in the camera2base.json file? It looks like the pose of the camera, but what does the base coordinate system refer to?

    opened by buptmengjj 2
  • Different visualization result in demo_depth_map.ipynb

    I have been trying to replicate the visualization result shown in demo_depth_map.ipynb (https://github.com/UT-Austin-RPL/Ditto/blob/master/notebooks/demo_depth_map.ipynb). The following are the steps I executed:

    1. Clone the Ditto repo and create a virtual env in PyCharm (Python 3.8.10). Then perform the rest in the created virtual env.
    2. Install the required packages with 'pip install -r requirements.txt'.
    3. Build the dependencies with 'python scripts/convonet_setup.py build_ext --inplace'.
    4. Collect all the required data and put it under the root/data directory, following the README (https://github.com/UT-Austin-RPL/Ditto#data-and-pre-trained-models, https://utexas.box.com/s/ujb2ky8y9vaog7nheth1n3tmm1rgx9t7, https://utexas.box.com/s/a4h001b3ciicrt3f71t4xd3wjsm04be7, https://utexas.box.com/s/zbf5bja20n2w6umryb1bcfbbcm3h2ysn).
    5. Install an older version of utils3d with 'pip install git+git://github.com/Steve-Tod/utils3d.git@bbd72687404436b37c90230a572891075aa8a53b' because the Pyrenderer in the newest version is located at a different path and causes an error in my env.
    6. Run all the cells in demo_depth_map.ipynb directly, getting the following results (from different viewpoints):

    output1 output2

    The prismatic joint axis looks different from the demo, and the meshes of both digital twins in my result seem to be incomplete.

    Below is the picture from the demo. download

    opened by RobinWangSD 2
  • FileNotFoundError: [Errno 2] No such file or directory:

    Hello, this error comes up when I run training. Is there any solution?

    File "C:\Users\Labor\anaconda3\envs\Ditto\lib\runpy.py", line 234, in _get_code_from_file with io.open_code(decoded_path) as f: FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Labor\Digital-Twin\Ditto\logs\runs\2022-05-19\Ditto_s2m-10-04-35\run.py'

    opened by NsiriRoua 12
Owner

UT Robot Perception and Learning Lab