[CVPR 2022] PyTorch implementation of the paper "Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions"

Overview

template-pose

PyTorch implementation of the paper "Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions" (accepted to CVPR 2022)

Van Nguyen Nguyen, Yinlin Hu, Yang Xiao, Mathieu Salzmann and Vincent Lepetit

Check out our paper and webpage for details!

[Figure: figures/method.png, overview of the method]

If our project is helpful for your research, please consider citing:

@inproceedings{nguyen2022template,
    title={Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions},
    author={Nguyen, Van Nguyen and Hu, Yinlin and Xiao, Yang and Salzmann, Mathieu and Lepetit, Vincent},
    booktitle={Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
    year={2022}}

Table of Contents

Methodology 🧑‍🎓

We introduce template-pose, which estimates the 3D pose of new objects (which can be very different from the training ones, e.g. objects of the LINEMOD dataset) given only their 3D models. Our method requires neither a training phase on these objects nor images depicting them (a minimal sketch of this idea is given after the table below).

Two settings are considered in this work:

Dataset             | Predict object ID | In-plane rotation
(Occlusion-)LINEMOD | Yes               | No
T-LESS              | No                | Yes
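
To make the idea concrete, below is a minimal, generic sketch of template matching with deep features: the CAD model is rendered from known viewpoints, the query crop and all templates are embedded with the same network, and the pose of the most similar template is returned. Function and variable names are placeholders for illustration; this is not the repository's actual pipeline.

import torch
import torch.nn.functional as F

def match_template(query_crop, template_images, template_poses, encoder):
    # query_crop:      (3, H, W) tensor, cropped RGB image of the object
    # template_images: (N, 3, H, W) tensor, renderings of the CAD model
    # template_poses:  list of N poses (e.g. rotation matrices) used for rendering
    # encoder:         any feature extractor mapping images to embedding vectors
    with torch.no_grad():
        query_feat = F.normalize(encoder(query_crop.unsqueeze(0)), dim=-1)  # (1, D)
        template_feats = F.normalize(encoder(template_images), dim=-1)      # (N, D)
    similarity = (template_feats @ query_feat.T).squeeze(1)                 # (N,)
    best = similarity.argmax().item()
    return template_poses[best], similarity[best].item()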

Installation 👨‍🔧

We recommend creating a new Anaconda environment to use template-pose. Use the following commands to set up a new environment:

conda env create -f environment.yml
conda activate template

Optional: BlenderProc must be installed if you want to render synthetic images. You can skip this step if you use our provided templates. More details can be found in Datasets.
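
For reference, a minimal BlenderProc 2 template-rendering script typically looks like the sketch below. The object path, camera radius, resolution and number of views are placeholder assumptions, and this is not the repository's actual rendering script; the real pipeline is described in Datasets.

# Run with: blenderproc run render_templates.py  (hypothetical file name)
import blenderproc as bproc
import numpy as np

bproc.init()

objs = bproc.loader.load_obj("models/obj_000001.ply")  # placeholder CAD model path

light = bproc.types.Light()
light.set_type("POINT")
light.set_location([1.0, -1.0, 1.0])
light.set_energy(200)

bproc.camera.set_resolution(640, 480)

# Sample a few viewpoints on a sphere around the object, looking at the origin
for _ in range(10):
    location = bproc.sampler.sphere([0.0, 0.0, 0.0], radius=0.6, mode="SURFACE")
    rotation = bproc.camera.rotation_from_forward_vec(-location)
    bproc.camera.add_camera_pose(bproc.math.build_transformation_mat(location, rotation))

data = bproc.renderer.render()
bproc.writer.write_hdf5("output/", data)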

Datasets 😺 🔌

Before downloading the datasets, you may change this line to define the $ROOT folder (to store data and results).
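
As a purely hypothetical illustration of what such a line might look like (the actual file and variable name in the repository may differ), the idea is that a single path controls where data, weights, and results are stored:

# Hypothetical example only; check the linked line in the repository for the real variable.
ROOT_DIR = "/home/user/template-pose-data"  # datasets, weights and results live under this folder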

There are two options:

  1. To download our pre-processed datasets (15GB) + the SUN397 dataset (37GB):
./data/download_preprocessed_data.sh

Optional: You can download the files from the following Google Drive links and unzip them manually. We recommend keeping the $DATA folder structure detailed in ./data/README to keep the pipeline simple.

  2. To download the original datasets and process them from scratch (processing GT poses, rendering templates, computing nearest neighbors). All the main steps are detailed in ./data/README:
./data/download_and_process_from_scratch.sh

For any training with a ResNet-50 backbone, we initialise the network with pretrained MoCo v2 features, which can be downloaded with the following command:

python -m lib.download_weight --model_name MoCov2
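
For reference, initialising a torchvision ResNet-50 with MoCo v2 weights usually looks like the sketch below. It assumes the standard MoCo checkpoint layout (query-encoder keys prefixed with module.encoder_q.) and an example checkpoint file name; the repository's own loading code may differ.

import torch
import torchvision

# Generic sketch: load MoCo v2 weights into a ResNet-50 backbone.
model = torchvision.models.resnet50()

checkpoint = torch.load("moco_v2_800ep_pretrain.pth.tar", map_location="cpu")  # example path
state_dict = {}
for key, value in checkpoint["state_dict"].items():
    # keep only the query-encoder weights, drop the projection head (fc)
    if key.startswith("module.encoder_q.") and not key.startswith("module.encoder_q.fc"):
        state_dict[key[len("module.encoder_q."):]] = value

missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys (only fc.* expected):", missing)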

T-LESS 🔌

1. To launch a training on T-LESS:

python train_tless.py --config_path ./config_run/TLESS.json

2. To reproduce the results on T-LESS:

To download the pretrained weights (by default, they are saved to $ROOT/pretrained/TLESS.pth):

python -m lib.download_weight --model_name TLESS

Optional: You can also download them manually from this link.

To evaluate the model with the pretrained weights:

python test_tless.py --config_path ./config_run/TLESS.json --checkpoint $ROOT/pretrained/TLESS.pth

LINEMOD and Occlusion-LINEMOD 😺

1. To launch a training on LINEMOD:

python train_linemod.py --config_path config_run/LM_$backbone_$split_name.json

For example, with the "base" backbone and split #1:

python train_linemod.py --config_path config_run/LM_baseNetwork_split1.json

2. To reproduce the results on LINEMOD:

To download the pretrained weights (by default, they are saved to $ROOT/pretrained):

python -m lib.download_weight --model_name LM_$backbone_$split_name

Optional: You can also download them manually from this link.

To evaluate a model given a checkpoint_path:

python test_linemod.py --config_path config_run/LM_$backbone_$split_name.json --checkpoint checkpoint_path

For example, with the "base" backbone and split #1:

python -m lib.download_weight --model_name LM_baseNetwork_split1
python test_linemod.py --config_path config_run/LM_baseNetwork_split1.json --checkpoint $ROOT/pretrained/LM_baseNetwork_split1.pth
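
For interpreting the reported pose errors, the angular error between a predicted and a ground-truth rotation is commonly measured as the geodesic distance on SO(3). The sketch below is a generic reference implementation with an illustrative threshold, not the repository's exact evaluation code.

import numpy as np

def rotation_error_deg(R_pred, R_gt):
    # Geodesic distance (in degrees) between two 3x3 rotation matrices.
    cos_angle = (np.trace(R_pred @ R_gt.T) - 1.0) / 2.0
    cos_angle = np.clip(cos_angle, -1.0, 1.0)  # guard against numerical drift
    return np.degrees(np.arccos(cos_angle))

def accuracy_at_threshold(pred_rotations, gt_rotations, threshold_deg=15.0):
    # Fraction of predictions within the angular threshold (threshold is illustrative).
    errors = [rotation_error_deg(p, g) for p, g in zip(pred_rotations, gt_rotations)]
    return float(np.mean([e <= threshold_deg for e in errors]))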

Acknowledgement

The code is adapted from PoseContrast, DTI-Clustering, CosyPose and BOP Toolkit. Many thanks to them!

The authors thank Martin Sundermeyer, Paul Wohlhart and Shreyas Hampali for their quick replies and feedback!

Contact

If you have any questions, feel free to open an issue or contact the first author at [email protected].

Comments
  • LINEMOD.json

    Hello author!

    Can I ask something? I ran the "python -m data.create_dataframe_linemod" command, but the results look like this (in linemod.json):

    "real_location": [ [ NaN, NaN, NaN ], [ NaN, NaN, NaN ], [ NaN, NaN, NaN ], [ NaN, NaN, NaN ], [ NaN, NaN,

    Is this result caused by the BlenderProc or Blender version?

    opened by shqmffl486 3
  • The code for generating predefined poses

    This code was very helpful to me. I see the *.npy file in ./lib/poses/predefined_poses. However, there seems to be nothing in the code about how these poses are generated. Maybe the author can share this part of the code?

    opened by wendaizhou 2
  • implementation details

    Hi Van-Nguyen,

    Thanks for the excellent work! But I have a few questions about the implementation details.

    1). How can we properly crop the target object from an RGB image? Or how did you know the 2D location and the size (even the in-plane rotation angle) of the object bounding box used for cropping the object? In my understanding, you need to use the ground truth object 6D pose to determine the object's 2D projection center and the cropping size, right?

    2). For creating the templates, how did you define the (virtual) camera distance to the object (or the diameter of the viewpoint sphere)? Since different objects have different sizes, the camera distance will affect the scale of the object in the rendered images.

    3). For the LM dataset, as mentioned in the supplementary material, the object in-plane rotation will be ignored. What does this mean? Do you mean that, for each crop, the ground-truth in-plane rotation is assumed to be known and we only estimate the object's viewpoint (out-of-plane rotation)?

    Looking forward to your reply and thank you very much!

    opened by dingdingcai 2
  • A question about the definition of global/local

    Hi Van-Nguyen,

    Thank you for your excellent work!

    I am a little confused by the definition of global and local in the paper. The paper points out that [2] employs global features, but both methods use local patches as input for training and inference. So, could you give a clearer explanation of the global/local features used in the experiments? Thank you.

    Regards, Rui

    opened by 63445538 2
  • Can this method be adapted to other datasets?

    @nv-nguyen Thanks for this interesting paper!!!

    I want to know: if I have a dataset composed of other strongly textured models, such as cars or desks, can I use the method from your paper to estimate their poses?

    Another problem I found is that I can run the training code on LINEMOD but not on T-LESS; the error is as follows.

    Traceback (most recent call last):
      File "train_tless.py", line 107, in <module>
        train_loss = training_utils.train(train_data=datasetLoader["train"],
      File "/home/rlk/template-pose-main/lib/datasets/tless/training_utils.py", line 64, in train
        tb_logger.add_scalar_dict_list('loss', [{'train_loss': meter_train_loss.avg,
    TypeError: add_scalar_dict_list() got an unexpected keyword argument 'step'

    opened by ghost 1
  • How to use this work for pose estimation on novel objects?

    Thanks for an interesting paper!

    Is it possible to use this work for pose estimation of novel objects given that I have a textured CAD model? I managed to produce the template images using your BlenderProc script, but I can't figure out how to proceed from there. Can you provide instructions?

    opened by mikkeljakobsen 1
  • Typo in paper

    On T-LESS [14], we follow the protocol of [31] by using a dense regular icosahedron with 2’536 viewpoints and 36 in-plane rotations for each rendered image. Altogether, this yields 92’232 templates per object.

    It should be 2562, not 2536: 2562 × 36 = 92,232, which matches the stated number of templates, whereas 2536 × 36 = 91,296.

    have a nice day

    opened by olivierp99 1
  • How to achieve "crop_frame"

    I used my own dataset but could not complete the cropping. I replaced the camera intrinsic matrix, but it still does not work. How is crop_frame supposed to be done?

    opened by BraveBoBo 0
  • train_linemod.py

    Hello!! @nv-nguyen

    Sorry to keep asking you questions. I want to train with my own synthetic data!

    I'm going to train with 4 new models added. But the log shows this problem:

    "seen
    WARNING:root:NaN or Inf found in input tensor.  (repeated many times)
    INFO:LM_ResNet50_split1:Epoch-78, seen -- Mean err: nan, Acc: 0.00, Rec : 0.93, Class and Pose : 0.00
    INFO:LM_ResNet50_split1:Validation time for epoch 78: 0.57 minutes"

    And this part of the log shows no warnings (this is okay):

    "Saving to /home/mount4t/template-pose_1/root/results/weights/LM_ResNet50_split1/model_epoch77.pth
    98%|████████████████████████████████▉| 78/80 [13:57:56<21:28, 644.07s/it]
    INFO:LM_ResNet50_split1:Epoch-78 -- Iter [1/1528] loss: 0.00, (pos: 0.81, neg: 0.05)
    INFO:LM_ResNet50_split1:Epoch-78 -- Iter [10/1528] loss: 0.00, (pos: 0.86, neg: 0.05)
    INFO:LM_ResNet50_split1:Epoch-78 -- Iter [20/1528] loss: 0.00, (pos: 0.82, neg: 0.03)
    INFO:LM_ResNet50_split1:Epoch-78 -- Iter [30/1528] loss: 0.17, (pos: 0.83, neg: 0.07)"

    So, what does "WARNING:root:NaN or Inf found in input tensor." mean?

    opened by shqmffl486 0
  • ask about process_gt_linemod

    Hello author!

    Thanks for the interesting paper!!

    I'm a little confused about the GT-processing code for LINEMOD. In the process_linemod function, you define all_poses = ['cam_R_w2c', 'cam_t_w2c']. If I use synthetic data generated by BlenderProc, can I use the cam_R_w2c and cam_t_w2c from scene_camera.json? And is it okay to use the camera pose relative to the world (cam_R_w2c and cam_t_w2c)? I don't think that is the object's exact 6D pose.

    opened by kgmin156 0
  • half_sphere_level2_and_level3.npy

    hello author!!

    I'm going to train on a new 3D model by adding it to an existing dataset. Should I modify the half_sphere_level2_and_level3.npy file? And how can I open that file? Help, please!

    opened by shqmffl486 1
  • The code for testing single image

    Hi @nv-nguyen, Thanks for the interesting paper!! I used test_linemod.py to evaluate the model. However, I didn't find visual results as shown in Figure 5 in the paper. Could you share the code for testing a single image?

    Looking forward to your reply and thank you very much!

    opened by liyuan-png 3