Official PyTorch implementation of 6DRepNet: 6D rotation representation for unconstrained head pose estimation.

Overview


6D Rotation Representation for Unconstrained Head Pose Estimation (PyTorch)


Paper

Thorsten Hempel, Ahmed A. Abdelrahman, and Ayoub Al-Hamadi, "6D Rotation Representation for Unconstrained Head Pose Estimation", submitted to ICIP 2022. [ResearchGate] [arXiv]

Abstract

In this paper, we present a method for unconstrained end-to-end head pose estimation. We address the problem of ambiguous rotation labels by introducing the rotation matrix formalism for our ground truth data and propose a continuous 6D rotation matrix representation for efficient and robust direct regression. This way, our method can learn the full rotation appearance which is contrary to previous approaches that restrict the pose prediction to a narrow-angle for satisfactory results. In addition, we propose a geodesic distance-based loss to penalize our network with respect to the manifold geometry. Experiments on the public AFLW2000 and BIWI datasets demonstrate that our proposed method significantly outperforms other state-of-the-art methods by up to 20%.
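The two ingredients named here — the continuous 6D representation and the geodesic loss — are compact enough to sketch. Below is a minimal PyTorch sketch following the Gram–Schmidt mapping of Zhou et al.; the function names and the column-stacking convention are illustrative assumptions, not necessarily identical to this repo's utils:

    import torch
    import torch.nn.functional as F

    def rotation_matrix_from_6d(x6d: torch.Tensor) -> torch.Tensor:
        # Map a batch of 6D vectors (B, 6) to rotation matrices (B, 3, 3)
        # by Gram-Schmidt orthogonalization of the two 3D halves.
        a1, a2 = x6d[..., :3], x6d[..., 3:]
        b1 = F.normalize(a1, dim=-1)
        b2 = F.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
        b3 = torch.cross(b1, b2, dim=-1)  # completes a right-handed frame
        return torch.stack((b1, b2, b3), dim=-1)

    def geodesic_loss(R_pred: torch.Tensor, R_gt: torch.Tensor) -> torch.Tensor:
        # Angle of the relative rotation R_pred @ R_gt^T, averaged over the
        # batch; this is the geodesic distance on the SO(3) manifold.
        m = torch.bmm(R_pred, R_gt.transpose(1, 2))
        trace = m.diagonal(dim1=-2, dim2=-1).sum(-1)
        cos = torch.clamp((trace - 1.0) / 2.0, -1.0 + 1e-7, 1.0 - 1e-7)
        return torch.acos(cos).mean()

During training, the network's raw 6D output would first be mapped to a rotation matrix and then penalized against the ground truth, e.g. loss = geodesic_loss(rotation_matrix_from_6d(net(imgs)), R_gt).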


Trained on 300W-LP, tested on AFLW2000 and BIWI

The first Yaw/Pitch/Roll/MAE block reports results on AFLW2000, the second on BIWI.

| Method | Full Range | Yaw | Pitch | Roll | MAE | Yaw | Pitch | Roll | MAE |
|--------|:----------:|-----|-------|------|-----|-----|-------|------|-----|
| HopeNet (α=2) | N | 6.47 | 6.56 | 5.44 | 6.16 | 5.17 | 6.98 | 3.39 | 5.18 |
| HopeNet (α=1) | N | 6.92 | 6.64 | 5.67 | 6.41 | 4.81 | 6.61 | 3.27 | 4.90 |
| FSA-Net | N | 4.50 | 6.08 | 4.64 | 5.07 | 4.27 | 4.96 | 2.76 | 4.00 |
| HPE | N | 4.80 | 6.18 | 4.87 | 5.28 | 3.12 | 5.18 | 4.57 | 4.29 |
| QuatNet | N | 3.97 | 5.62 | 3.92 | 4.50 | 2.94 | 5.49 | 4.01 | 4.15 |
| WHENet-V | N | 4.44 | 5.75 | 4.31 | 4.83 | 3.60 | 4.10 | 2.73 | 3.48 |
| WHENet | Y/N | 5.11 | 6.24 | 4.92 | 5.42 | 3.99 | 4.39 | 3.06 | 3.81 |
| TriNet | Y | 4.04 | 5.77 | 4.20 | 4.67 | 4.11 | 4.76 | 3.05 | 3.97 |
| FDN | N | 3.78 | 5.61 | 3.88 | 4.42 | 4.52 | 4.70 | 2.56 | 3.93 |
| 6DRepNet | Y | 3.63 | 4.91 | 3.37 | 3.97 | 3.24 | 4.48 | 2.68 | 3.47 |

BIWI 70/30

| Method | Yaw | Pitch | Roll | MAE |
|--------|-----|-------|------|-----|
| HopeNet (α=1) | 3.29 | 3.39 | 3.00 | 3.23 |
| FSA-Net | 2.89 | 4.29 | 3.60 | 3.60 |
| TriNet | 2.93 | 3.04 | 2.44 | 2.80 |
| FDN | 3.00 | 3.98 | 2.88 | 3.29 |
| 6DRepNet | 2.69 | 2.92 | 2.36 | 2.66 |

Fine-tuned Models

Fine-tuned models can be downloaded here: https://drive.google.com/drive/folders/1V1pCV0BEW3mD-B9MogGrz_P91UhTtuE_?usp=sharing

Quick Start:

git clone https://github.com/thohemp/6DRepNet
cd 6DRepNet

Set up a virtual environment:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt  # Install required packages

To run the demo scripts, you need to install the face detector:

pip install git+https://github.com/elliottzheng/face-detection.git@master

Camera Demo:

python demo.py  --snapshot 6DRepNet_300W_LP_AFLW2000.pth \
                --cam 0
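For a single image instead of a camera stream, here is a hedged sketch modeled on demo.py. The 244×244 crop without normalization mirrors the demo's preprocessing as quoted in the comments below, and the utils helper name is taken from this repo; both may differ in current versions:

    import math
    import cv2
    import torch
    from face_detection import RetinaFace   # installed in the step above
    from model import SixDRepNet            # model definition from this repo
    import utils

    detector = RetinaFace(gpu_id=-1)        # -1 = CPU; use 0 for the first GPU
    model = SixDRepNet(backbone_name='RepVGG-B1g2', backbone_file='',
                       deploy=True, pretrained=False)
    state = torch.load('6DRepNet_300W_LP_AFLW2000.pth', map_location='cpu')
    model.load_state_dict(state.get('model_state_dict', state))  # unwrap if nested
    model.eval()

    frame = cv2.imread('face.jpg')
    box, landmarks, score = detector(frame)[0]   # first detected face
    x1, y1, x2, y2 = (int(v) for v in box)
    crop = cv2.resize(frame[y1:y2, x1:x2], (244, 244)) / 255.0
    inp = torch.from_numpy(crop.transpose(2, 0, 1)).float().unsqueeze(0)

    with torch.no_grad():
        R = model(inp)                           # (1, 3, 3) rotation matrix
    pitch, yaw, roll = (utils.compute_euler_angles_from_rotation_matrices(R)[0]
                        * 180.0 / math.pi).tolist()
    print(f'pitch {pitch:.1f}  yaw {yaw:.1f}  roll {roll:.1f}')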

Test/Train 6DRepNet

Preparing datasets

Download datasets:

  • 300W-LP, AFLW2000 from here.

  • BIWI (Biwi Kinect Head Pose Database) from here.

Store them in the datasets directory.

For 300W-LP and AFLW2000, we need to create a filename list.

python create_filename_list.py --root_dir datasets/300W_LP

The BIWI dataset needs to be preprocessed by a face detector to crop the faces out of the images. You can use the script provided here. For a 7:3 split of the BIWI dataset you can use the equivalent script here. We set the cropped image size to 256.
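As a concrete illustration of this cropping step, here is a hedged sketch; the linked FSA-Net scripts are the reference implementation, and the 0.4 margin and the helper name crop_face are illustrative assumptions:

    import cv2

    def crop_face(img, box, out_size=256, margin=0.4):
        # Expand the detector box by a relative margin and crop a square
        # out_size x out_size patch, clamped to the image borders.
        x1, y1, x2, y2 = box
        w, h = x2 - x1, y2 - y1
        x1 = max(int(x1 - margin * w), 0)
        y1 = max(int(y1 - margin * h), 0)
        x2 = min(int(x2 + margin * w), img.shape[1])
        y2 = min(int(y2 + margin * h), img.shape[0])
        return cv2.resize(img[y1:y2, x1:x2], (out_size, out_size))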

Testing:

python test.py  --batch_size 64 \
                --dataset AFLW2000 \
                --data_dir datasets/AFLW2000 \
                --filename_list datasets/AFLW2000/files.txt \
                --snapshot output/snapshots/1.pth \
                --show_viz False 

Training

Download the pre-trained RepVGG model 'RepVGG-B1g2-train.pth' from here and save it in the root directory.

python train.py --batch_size 64 \
                --num_epochs 30 \
                --lr 0.00001 \
                --dataset Pose_300W_LP \
                --data_dir datasets/300W_LP \
                --filename_list datasets/300W_LP/files.txt

Deploy models

To reparameterize the trained models into inference models, use the convert script.

python convert.py input-model.tar output-model.pth

Inference models are loaded with the flag deploy=True.

model = SixDRepNet(backbone_name='RepVGG-B1g2',
                    backbone_file='',
                    deploy=True,
                    pretrained=False)
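A converted snapshot can then be restored into such a deploy-mode model. A minimal sketch; the flat state-dict layout is an assumption, so adjust to whatever convert.py actually emits:

    import torch
    from model import SixDRepNet

    model = SixDRepNet(backbone_name='RepVGG-B1g2',
                       backbone_file='',
                       deploy=True,
                       pretrained=False)
    model.load_state_dict(torch.load('output-model.pth', map_location='cpu'))
    model.eval()  # RepVGG blocks are now single fused convolutions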

Citing

If you find our work useful, please cite the paper:

@misc{hempel20226d,
      title={6D Rotation Representation For Unconstrained Head Pose Estimation}, 
      author={Thorsten Hempel and Ahmed A. Abdelrahman and Ayoub Al-Hamadi},
      year={2022},
      eprint={2202.12555},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Comments
  • BIWI Dataset

    Hey, I cannot reproduce your results on the BIWI dataset. I'm comparing the X, Y, Z angles obtained from the ground-truth rotation matrix (transformed by the extrinsic calibration) with -pitch, yaw, and roll, respectively. I'm using your pip package. I crop the face with the RetinaFace detector, as you do in demo.py, and pass it to the model.predict() function. I instantiate the model without any parameters, so the path to the weights is the default. I have spotted one difference: in the README you write "The BIWI dataset needs to be preprocessed by a face detector to cut out the faces from the images. You can use the script provided [here](https://github.com/shamangary/FSA-Net/blob/master/data/TYY_create_db_biwi.py). For 7:3 splitting of the BIWI dataset you can use the equivalent script [here](https://github.com/shamangary/FSA-Net/blob/master/data/TYY_create_db_biwi_70_30.py). We set the cropped image size to 256." However, in model.predict() the crop is resized to 244 (which I believe is the longer edge of the picture; the shorter one is then scaled with an appropriate ratio). Is this intended?

    I cannot find any more differences, but the mean error is about 25 on X and Y and about 8 on Z. Can you help me figure it out?

    Best, Jan

    opened by janglinko-dac 7
  • Pip install failed

    When I install the package with pip install SixDRepNet, it returns:

        Collecting SixDRepNet
          Downloading SixDRepNet-0.1.1.tar.gz (23 kB)
          Preparing metadata (setup.py) ... error
          error: subprocess-exited-with-error

          × python setup.py egg_info did not run successfully.
          │ exit code: 1
          ╰─> [6 lines of output]
              Traceback (most recent call last):
                File "<string>", line 36, in <module>
                File "<string>", line 34, in <module>
                File "/private/var/folders/5l/9fsdwp_n1td91scw10n67zwc0000gp/T/pip-install-8xwcmbp1/sixdrepnet_2b06c43f46c5428d9c99677633d23b6e/setup.py", line 23, in <module>
                  long_description="".join(open("README.MD", "r").readlines()),
              FileNotFoundError: [Errno 2] No such file or directory: 'README.MD'
              [end of output]

          note: This error originates from a subprocess, and is likely not a problem with pip.
        error: metadata-generation-failed

        × Encountered error while generating package metadata.
        ╰─> See above for output.

        note: This is an issue with the package mentioned above, not pip.
        hint: See above for details.

    opened by GlennCGL 4
  • Pretrained weight cannot be downloaded

    Hi there,

    I am experimenting with the SixDRepNet_Detector, and I am running into the issue that the model's pretrained weights cannot be downloaded.

    The error message on Google Colab is shown in a screenshot attached to the issue.

    Thank you!

    opened by skyrockets-21 2
  • Questions regarding Learning full rotation appearance

    Hello there!!

    Thank you very much for your great work. I am really interested in it and would like to apply it to images with full rotation appearance.

    I have two questions regarding this: 1) Is your pre-trained model trained on full-rotation datasets (−180°, 180°), and is it capable of predicting head poses in images where the face cannot be seen? 2) If the answer to my first question is no, could you please guide me on which datasets I should use to fine-tune the pre-trained model so it learns the full orientation range?

    Thank you very much in advance for your consideration.

    opened by Matus-Tanonwong 2
  • pre-trained models

    Hi, thanks for this amazing work; I am really interested in it. I just want to test your network on the two test datasets (AFLW2000 and BIWI). I am wondering why you provide a separate .pth file for each test dataset (6DRepNet_300W_LP_AFLW2000.pth and 6DRepNet_300W_LP_BIWI.pth); shouldn't we test the network with a single pretrained model on both test datasets?

    I am looking forward to your response. Thanks!

    opened by SaharR1372 2
  • gap with results in papers

    Hi, thanks for your impressive paper and code. I tried to reproduce the reported performance with this repo: I followed all the instructions, trained on 300W-LP using train.py without changing any parameters, then evaluated on AFLW2000 using test.py. The results are below:

        Mine:  Yaw: 3.9897, Pitch: 5.0923, Roll: 3.6405, MAE: 4.2408
        Yours: Yaw: 3.63,   Pitch: 4.91,   Roll: 3.37,   MAE: 3.97

    Are there any other tricks or changes that should be applied to reproduce your results?

    opened by BenjaminGit001 2
  • Preprocessing in train.py and demo.py is different

    Thanks for your good work.

    I tried to test and train the 6DRepNet model and found some issues.

    1. Preprocessing code in train.py:
        normalize = transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225])
    
        transformations = transforms.Compose([transforms.Resize(240),
                                              transforms.RandomCrop(224),
                                              transforms.ToTensor(),
                                              normalize])
    

    Preprocessing code in demo.py:

                    img = frame[y_min:y_max, x_min:x_max]
                    # cv2.imshow("crop", img)
                    # cv2.waitKey(5)
                    img = cv2.resize(img, (244, 244)) / 255.0
                    img = img.transpose(2, 0, 1)
                    img = torch.from_numpy(img).type(torch.FloatTensor)
                    img = torch.Tensor(img).cuda(gpu)
    

    The normalization and the input size are different. (A sketch of a matched inference transform follows this issue.)

    2. I downloaded the pre-trained RepVGG model 'RepVGG-A0-train.pth' from here.

    Just using the demo.py code to test 9 faces in one image, the output is wrong: all 9 faces have the same yaw, pitch, and roll values.

    I also tested the fine-tuned models from here, and the pose values look fine.

    So what is the difference between the pre-trained RepVGG model and the fine-tuned models?

    opened by YaoQ 2
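    One way to remove the mismatch quoted above is to reuse the training transform (224-pixel crop plus ImageNet normalization) at inference time. A minimal sketch, assuming face crops arrive as BGR arrays from OpenCV; the CenterCrop stand-in for the training-time RandomCrop is an assumption:

        import cv2
        from PIL import Image
        from torchvision import transforms

        infer_tf = transforms.Compose([
            transforms.Resize(224),
            transforms.CenterCrop(224),   # deterministic stand-in for RandomCrop
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

        def preprocess(bgr_crop):
            # OpenCV delivers BGR; torchvision transforms expect RGB PIL images.
            rgb = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2RGB)
            return infer_tf(Image.fromarray(rgb)).unsqueeze(0)  # (1, 3, 224, 224)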
  • Fixed setup and broken imports

    • Fixed conflict with package and model (SixDRepNet) names by replacing package name with sixdrepnet.
    • Cleaned up imports to match the update
    • Replaced the package name in setup.py with sixdrepnet
    • Moved backbone into the sixdrepnet package
    opened by fabawi 1
  • Licence for Fine-tuned models

    Hi. Thanks for this interesting and wonderful piece of work!

    I have a question about licensing, as the title says: what is the licence for the fine-tuned models? Is it MIT like the code, or is it different?

    I want to use it as part of a study at work, but I am not skilled in machine learning and would like to use the model as-is!

    I don't intend to publish, redistribute, or incorporate the models into products, but even if it is for research purposes, under my workplace rules it still counts as commercial use. So I would like to ask you for more information about the licence for the models.

    I look forward to your response. Thank you.

    opened by imae-sound 1
  • Can't change GPU id in demo.py

    There is a CUDA error when changing the GPU to another id (anything except 0); a possible fix is sketched after this issue.

    Traceback (most recent call last):
      File "demo.py", line 136, in <module>
        R_pred = model(img)
      File "/mnt/data2/head_pose_estimation/codes/6DRepNet/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1186, in _call_impl
        return forward_call(*input, **kwargs)
      File "/mnt/data2/head_pose_estimation/codes/6DRepNet/model.py", line 48, in forward
        return utils.compute_rotation_matrix_from_ortho6d(x)
      File "/mnt/data2/head_pose_estimation/codes/6DRepNet/utils.py", line 146, in compute_rotation_matrix_from_ortho6d
        x = normalize_vector(x_raw, use_gpu)  # batch*3
      File "/mnt/data2/head_pose_estimation/codes/6DRepNet/utils.py", line 119, in normalize_vector
        v_mag = torch.max(v_mag, torch.autograd.Variable(torch.FloatTensor([1e-8]).cuda()))
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:0!

    opened by soroush-mim 1
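    The traceback above points at a constant tensor created with .cuda(), which always lands on cuda:0 regardless of where the input lives. A minimal device-agnostic sketch of normalize_vector (cf. utils.py); the clamp-based epsilon is an assumed, equivalent replacement for the original torch.max:

        import torch

        def normalize_vector(v: torch.Tensor) -> torch.Tensor:
            # L2 norm per row (batch*1); clamp keeps the division safe without
            # allocating a constant on a hard-coded device.
            v_mag = torch.sqrt((v ** 2).sum(dim=-1, keepdim=True))
            return v / torch.clamp(v_mag, min=1e-8)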
  • How to train the pre-trained model

    Hi, I tried to train the model from scratch, but it seems hard to reach performance comparable to training from the pre-trained model. My question is: how was the pre-trained model trained, or how can I reach similar performance from scratch?

    Thanks!

    opened by BenjaminGit001 1
  • Is the pretrained model for faces only?

    Thanks for the work! Is the pretrained model only for face pictures? If so, is there any other pretrained model for other objects, like boxes, bottles, shoes, etc.?

    opened by himmetozcan 0
  • Query regarding face pose axis visualisation

    I see that to construct the rotation matrix (R) from yaw, pitch, and roll values, you use the ZYX order, i.e. Rz * Ry * Rx, where Rz is the rotation about the z-axis, Ry the rotation about the y-axis, and Rx the rotation about the x-axis.

    But for visualisation, it looks like the order you use is XYZ, i.e. Rx * Ry * Rz, and you then use the column vectors of the resulting matrix as axis coordinates (https://github.com/thohemp/6DRepNet/blob/master/utils.py#L54). May I know why this is done? Am I missing something?

    Thanks.

    opened by shubhamwagh 3
Owner
Thorsten Hempel
Computer Vision, Robotics