[CVPR 2022] PyTorch implementation of the paper "Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions"

Overview

template-pose

PyTorch implementation of the paper "Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions" (accepted to CVPR 2022)

Van Nguyen Nguyen, Yinlin Hu, Yang Xiao, Mathieu Salzmann and Vincent Lepetit

Check out our paper and webpage for details!

[Figure: figures/method.png, overview of the method]

If our project is helpful for your research, please consider citing:

@inproceedings{nguyen2022template,
    title={Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions},
    author={Nguyen, Van Nguyen and Hu, Yinlin and Xiao, Yang and Salzmann, Mathieu and Lepetit, Vincent},
    booktitle={Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
    year={2022}}

Table of Contents

Methodology 🧑‍🎓

We introduce template-pose, which estimates the 3D pose of new objects (which can be very different from the training ones, e.g. objects of the LINEMOD dataset) given only their 3D models. Our method requires neither a training phase on these objects nor images depicting them (a minimal sketch of this idea is given after the table below).

Two settings are considered in this work:

Dataset             | Predict object ID | In-plane rotation
(Occlusion-)LINEMOD | Yes               | No
T-LESS              | No                | Yes
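
To make the idea concrete, below is a minimal, generic sketch of template matching with deep features: the CAD model is rendered from known viewpoints, the query crop and all templates are embedded with the same network, and the pose of the most similar template is returned. Function and variable names are placeholders for illustration; this is not the repository's actual pipeline.

import torch
import torch.nn.functional as F

def match_template(query_crop, template_images, template_poses, encoder):
    # query_crop:      (3, H, W) tensor, cropped RGB image of the object
    # template_images: (N, 3, H, W) tensor, renderings of the CAD model
    # template_poses:  list of N poses (e.g. rotation matrices) used for rendering
    # encoder:         any feature extractor mapping images to embedding vectors
    with torch.no_grad():
        query_feat = F.normalize(encoder(query_crop.unsqueeze(0)), dim=-1)  # (1, D)
        template_feats = F.normalize(encoder(template_images), dim=-1)      # (N, D)
    similarity = (template_feats @ query_feat.T).squeeze(1)                 # (N,)
    best = similarity.argmax().item()
    return template_poses[best], similarity[best].item()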

Installation 👨‍🔧

We recommend creating a new Anaconda environment to use template-pose. Use the following commands to set up a new environment:

conda env create -f environment.yml
conda activate template

Optional: BlenderProc must be installed if you want to render synthetic images. You can skip this step if you use our provided templates. More details can be found in Datasets.
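
For reference, a minimal BlenderProc 2 template-rendering script typically looks like the sketch below. The object path, camera radius, resolution and number of views are placeholder assumptions, and this is not the repository's actual rendering script; the real pipeline is described in Datasets.

# Run with: blenderproc run render_templates.py  (hypothetical file name)
import blenderproc as bproc
import numpy as np

bproc.init()

objs = bproc.loader.load_obj("models/obj_000001.ply")  # placeholder CAD model path

light = bproc.types.Light()
light.set_type("POINT")
light.set_location([1.0, -1.0, 1.0])
light.set_energy(200)

bproc.camera.set_resolution(640, 480)

# Sample a few viewpoints on a sphere around the object, looking at the origin
for _ in range(10):
    location = bproc.sampler.sphere([0.0, 0.0, 0.0], radius=0.6, mode="SURFACE")
    rotation = bproc.camera.rotation_from_forward_vec(-location)
    bproc.camera.add_camera_pose(bproc.math.build_transformation_mat(location, rotation))

data = bproc.renderer.render()
bproc.writer.write_hdf5("output/", data)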

Datasets 😺 🔌

Before downloading the datasets, you may change this line to define the $ROOT folder (to store data and results).
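
As a purely hypothetical illustration of what such a line might look like (the actual file and variable name in the repository may differ), the idea is that a single path controls where data, weights, and results are stored:

# Hypothetical example only; check the linked line in the repository for the real variable.
ROOT_DIR = "/home/user/template-pose-data"  # datasets, weights and results live under this folder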

There are two options:

  1. To download our pre-processed datasets (15GB) + the SUN397 dataset (37GB):
./data/download_preprocessed_data.sh

Optional: You can download the files from the following Google Drive links and unzip them manually. We recommend keeping the $DATA folder structure detailed in ./data/README to keep the pipeline simple.

  2. To download the original datasets and process them from scratch (processing GT poses, rendering templates, computing nearest neighbors). All the main steps are detailed in ./data/README:
./data/download_and_process_from_scratch.sh

For any training with a ResNet-50 backbone, we initialise the network with pretrained MoCo v2 features, which can be downloaded with the following command:

python -m lib.download_weight --model_name MoCov2
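
For reference, initialising a torchvision ResNet-50 with MoCo v2 weights usually looks like the sketch below. It assumes the standard MoCo checkpoint layout (query-encoder keys prefixed with module.encoder_q.) and an example checkpoint file name; the repository's own loading code may differ.

import torch
import torchvision

# Generic sketch: load MoCo v2 weights into a ResNet-50 backbone.
model = torchvision.models.resnet50()

checkpoint = torch.load("moco_v2_800ep_pretrain.pth.tar", map_location="cpu")  # example path
state_dict = {}
for key, value in checkpoint["state_dict"].items():
    # keep only the query-encoder weights, drop the projection head (fc)
    if key.startswith("module.encoder_q.") and not key.startswith("module.encoder_q.fc"):
        state_dict[key[len("module.encoder_q."):]] = value

missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys (only fc.* expected):", missing)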

T-LESS 🔌

1. To launch a training on T-LESS:

python train_tless.py --config_path ./config_run/TLESS.json

2. To reproduce the results on T-LESS:

To download the pretrained weights (by default, they are saved to $ROOT/pretrained/TLESS.pth):

python -m lib.download_weight --model_name TLESS

Optional: You can also download them manually from this link.

To evaluate the model with the pretrained weights:

python test_tless.py --config_path ./config_run/TLESS.json --checkpoint $ROOT/pretrained/TLESS.pth

LINEMOD and Occlusion-LINEMOD 😺

1. To launch a training on LINEMOD:

python train_linemod.py --config_path config_run/LM_$backbone_$split_name.json

For example, with the "base" backbone and split #1:

python train_linemod.py --config_path config_run/LM_baseNetwork_split1.json

2. To reproduce the results on LINEMOD:

To download the pretrained weights (by default, they are saved to $ROOT/pretrained):

python -m lib.download_weight --model_name LM_$backbone_$split_name

Optional: You can also download them manually from this link.

To evaluate a model given a checkpoint_path:

python test_linemod.py --config_path config_run/LM_$backbone_$split_name.json --checkpoint checkpoint_path

For example, with the "base" backbone and split #1:

python -m lib.download_weight --model_name LM_baseNetwork_split1
python test_linemod.py --config_path config_run/LM_baseNetwork_split1.json --checkpoint $ROOT/pretrained/LM_baseNetwork_split1.pth
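
For interpreting the reported pose errors, the angular error between a predicted and a ground-truth rotation is commonly measured as the geodesic distance on SO(3). The sketch below is a generic reference implementation with an illustrative threshold, not the repository's exact evaluation code.

import numpy as np

def rotation_error_deg(R_pred, R_gt):
    # Geodesic distance (in degrees) between two 3x3 rotation matrices.
    cos_angle = (np.trace(R_pred @ R_gt.T) - 1.0) / 2.0
    cos_angle = np.clip(cos_angle, -1.0, 1.0)  # guard against numerical drift
    return np.degrees(np.arccos(cos_angle))

def accuracy_at_threshold(pred_rotations, gt_rotations, threshold_deg=15.0):
    # Fraction of predictions within the angular threshold (threshold is illustrative).
    errors = [rotation_error_deg(p, g) for p, g in zip(pred_rotations, gt_rotations)]
    return float(np.mean([e <= threshold_deg for e in errors]))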

Acknowledgement

The code is adapted from PoseContrast, DTI-Clustering, CosyPose and BOP Toolkit. Many thanks to them!

The authors thank Martin Sundermeyer, Paul Wohlhart and Shreyas Hampali for their quick replies and feedback!

Contact

If you have any questions, feel free to open an issue or contact the first author at [email protected].

Comments
  • LINEMOD.json

    Hello author!

    Can I ask something? I ran the "python -m data.create_dataframe_linemod" command, but the results look like this (in linemod.json):

    "real_location": [ [ NaN, NaN, NaN ], [ NaN, NaN, NaN ], [ NaN, NaN, NaN ], [ NaN, NaN, NaN ], [ NaN, NaN,

    Is this result caused by the BlenderProc or Blender version?

    opened by shqmffl486 3
  • The code for generating predefined poses

    This code was very helpful to me. I see the *.npy file in ./lib/poses/predefined_poses. However, there seems to be nothing in the code about how these poses are generated. Maybe the author can share this part of the code?

    opened by wendaizhou 2
  • implementation details

    Hi Van-Nguyen,

    Thanks for the excellent work! But I have a few questions about the implementation details.

    1). How can we properly crop the target object from an RGB image? Or how did you know the 2D location and the size (even the in-plane rotation angle) of the object bounding box used for cropping the object? In my understanding, you need to use the ground truth object 6D pose to determine the object's 2D projection center and the cropping size, right?

    2). For creating the templates, how did you define the (virtual) camera distance to the object (or the diameter of the viewpoint sphere)? Since different objects have different sizes, the camera distance will affect the scale of the object in the rendered images.

    3). For the LM dataset, as mentioned in the supplementary material, the object in-plane rotation will be ignored. What does this mean? Do you mean that, for each crop, the ground-truth in-plane rotation is assumed to be known and we only estimate the object's viewpoint (out-of-plane rotation)?

    Looking forward to your reply and thank you very much!

    opened by dingdingcai 2
  • A question about the definition of global/local

    Hi Van-Nguyen,

    Thank you for your excellent work!

    I am a little confused by the definition of global and local in the paper. The paper points out that [2] employs global features, but both methods use local patches as input for training and inference. So, could you give a clearer explanation of the global/local features used in the experiments? Thank you.

    Regards, Rui

    opened by 63445538 2
  • Can this method be adapted to other datasets?

    @nv-nguyen Thanks for this interesting paper!!!

    I want to know: if I have a dataset composed of other strongly textured models, such as cars or desks, can I use the method from your paper to estimate their poses?

    Another problem I found is that I can run the training code on LINEMOD but not on T-LESS; the error is as follows.

    Traceback (most recent call last):
      File "train_tless.py", line 107, in <module>
        train_loss = training_utils.train(train_data=datasetLoader["train"],
      File "/home/rlk/template-pose-main/lib/datasets/tless/training_utils.py", line 64, in train
        tb_logger.add_scalar_dict_list('loss', [{'train_loss': meter_train_loss.avg,
    TypeError: add_scalar_dict_list() got an unexpected keyword argument 'step'

    opened by ghost 1
  • How to use this work for pose estimation on novel objects?

    Thanks for an interesting paper!

    Is it possible to use this work for pose estimation of novel objects given that I have a textured CAD model? I managed to produce the template images using your BlenderProc script, but I can't figure out how to proceed from there. Can you provide instructions?

    opened by mikkeljakobsen 1
  • Typo in paper

    On T-LESS [14], we follow the protocol of [31] by using a dense regular icosahedron with 2’536 viewpoints and 36 in-plane rotations for each rendered image. Altogether, this yields 92’232 templates per object.

    It should be 2562, not 2536: 2562 × 36 = 92,232, which matches the stated number of templates, whereas 2536 × 36 = 91,296.

    have a nice day

    opened by olivierp99 1
  • How to achieve "crop_frame"

    I used my own dataset but could not complete the cropping. I replaced the camera intrinsic matrix, but it still does not work. How is crop_frame supposed to be done?

    opened by BraveBoBo 0
  • train_linemod.py

    Hello!! @nv-nguyen

    Sorry to keep asking you questions. I want to train with my own synthetic data!

    I'm going to train with 4 new models added. But the log shows this problem:

    "seen
    WARNING:root:NaN or Inf found in input tensor.  (repeated many times)
    INFO:LM_ResNet50_split1:Epoch-78, seen -- Mean err: nan, Acc: 0.00, Rec : 0.93, Class and Pose : 0.00
    INFO:LM_ResNet50_split1:Validation time for epoch 78: 0.57 minutes"

    And this part of the log shows no warnings (this is okay):

    "Saving to /home/mount4t/template-pose_1/root/results/weights/LM_ResNet50_split1/model_epoch77.pth
    98%|████████████████████████████████▉| 78/80 [13:57:56<21:28, 644.07s/it]
    INFO:LM_ResNet50_split1:Epoch-78 -- Iter [1/1528] loss: 0.00, (pos: 0.81, neg: 0.05)
    INFO:LM_ResNet50_split1:Epoch-78 -- Iter [10/1528] loss: 0.00, (pos: 0.86, neg: 0.05)
    INFO:LM_ResNet50_split1:Epoch-78 -- Iter [20/1528] loss: 0.00, (pos: 0.82, neg: 0.03)
    INFO:LM_ResNet50_split1:Epoch-78 -- Iter [30/1528] loss: 0.17, (pos: 0.83, neg: 0.07)"

    So, what does "WARNING:root:NaN or Inf found in input tensor." mean?

    opened by shqmffl486 0
  • ask about process_gt_linemod

    Hello author!

    Thanks for the interesting paper!!

    I'm a little confused about the GT-processing code for LINEMOD. In the process_linemod function, you define all_poses = ['cam_R_w2c', 'cam_t_w2c']. If I use synthetic data generated by BlenderProc, can I use the cam_R_w2c and cam_t_w2c from scene_camera.json? And is it okay to use the camera pose relative to the world (cam_R_w2c and cam_t_w2c)? I don't think that is the object's exact 6D pose.

    opened by kgmin156 0
  • half_sphere_level2_and_level3.npy

    hello author!!

    I'm going to train on a new 3D model by adding it to an existing dataset. Should I modify the half_sphere_level2_and_level3.npy file? And how can I open that file? Help, please!

    opened by shqmffl486 1
  • The code for testing single image

    Hi @nv-nguyen, Thanks for the interesting paper!! I used test_linemod.py to evaluate the model. However, I didn't find visual results as shown in Figure 5 in the paper. Could you share the code for testing a single image?

    Looking forward to your reply and thank you very much!

    opened by liyuan-png 3