Official PyTorch implementation of 6DRepNet: 6D Rotation representation for unconstrained head pose estimation.

Thorsten Hempel

Last update: Dec 23, 2022

Related tags

Deep Learning analysis estimation pytorch orientation facial head pose head-pose-estimation head-pose 6d pytorch-implementation biwi aflw2000

Overview

6D Rotation Representation for Unconstrained Head Pose Estimation (PyTorch)

Paper

Thorsten Hempel and Ahmed A. Abdelrahman and Ayoub Al-Hamadi, "6D Rotation Representation for Unconstrained Head Pose Estimation", submitted to ICIP 2022. [ResearchGate][Arxiv]

Abstract

In this paper, we present a method for unconstrained end-to-end head pose estimation. We address the problem of ambiguous rotation labels by introducing the rotation matrix formalism for our ground truth data and propose a continuous 6D rotation matrix representation for efficient and robust direct regression. This way, our method can learn the full rotation appearance which is contrary to previous approaches that restrict the pose prediction to a narrow-angle for satisfactory results. In addition, we propose a geodesic distance-based loss to penalize our network with respect to the $\textit{SO}(3)$ manifold geometry. Experiments on the public AFLW2000 and BIWI datasets demonstrate that our proposed method significantly outperforms other state-of-the-art methods by up to 20%.
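
The two ideas in the abstract, a continuous 6D representation and a geodesic loss, can be sketched in a few lines. This is a minimal illustration in plain Python, assuming the Gram-Schmidt-style mapping commonly used for 6D rotation representations and the standard angle-of-relative-rotation distance on SO(3). The function names are illustrative and are not taken from the repository.

```python
import math

def _normalize(v, eps=1e-8):
    """Scale a 3-vector to unit length, guarding against division by zero."""
    norm = max(math.sqrt(sum(c * c for c in v)), eps)
    return [c / norm for c in v]

def _cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rotation_from_6d(six):
    """Map six raw numbers to a 3x3 rotation matrix whose columns are x, y, z."""
    a1, a2 = six[:3], six[3:]
    x = _normalize(a1)
    z = _normalize(_cross(x, a2))
    y = _cross(z, x)
    # Build the matrix row by row; column j holds the j-th basis vector.
    return [[x[i], y[i], z[i]] for i in range(3)]

def geodesic_distance(r1, r2):
    """Angle in radians of the relative rotation r1 * r2^T."""
    # trace(r1 @ r2.T) equals the sum of element-wise products.
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_theta = min(1.0, max(-1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)

identity = rotation_from_6d([1, 0, 0, 0, 1, 0])
quarter_turn_z = rotation_from_6d([0, 1, 0, -1, 0, 0])
print(geodesic_distance(identity, quarter_turn_z))  # a quarter turn, pi / 2
```

Because any six numbers with two non-parallel halves map to a valid rotation, a network can regress them directly without an orthogonality constraint.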

Trained on 300W-LP, tested on AFLW2000 and BIWI (errors in degrees)

Method	Full Range	AFLW2000 Yaw	AFLW2000 Pitch	AFLW2000 Roll	AFLW2000 MAE	BIWI Yaw	BIWI Pitch	BIWI Roll	BIWI MAE
HopeNet ( $\alpha$ =2)	N	6.47	6.56	5.44	6.16	5.17	6.98	3.39	5.18
HopeNet ( $\alpha$ =1)	N	6.92	6.64	5.67	6.41	4.81	6.61	3.27	4.90
FSA-Net	N	4.50	6.08	4.64	5.07	4.27	4.96	2.76	4.00
HPE	N	4.80	6.18	4.87	5.28	3.12	5.18	4.57	4.29
QuatNet	N	3.97	5.62	3.92	4.50	2.94	5.49	4.01	4.15
WHENet-V	N	4.44	5.75	4.31	4.83	3.60	4.10	2.73	3.48
WHENet	Y/N	5.11	6.24	4.92	5.42	3.99	4.39	3.06	3.81
TriNet	Y	4.04	5.77	4.20	4.67	4.11	4.76	3.05	3.97
FDN	N	3.78	5.61	3.88	4.42	4.52	4.70	2.56	3.93

6DRepNet	Y	3.63	4.91	3.37	3.97	3.24	4.48	2.68	3.47
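
The MAE columns appear to be the plain average of the yaw, pitch, and roll errors. That is an inference from the numbers rather than something the table states, and it can be checked against the 6DRepNet row with a short sketch:

```python
# Per-axis errors in degrees, copied from the 6DRepNet row of the table.
rows = {
    "AFLW2000": {"yaw": 3.63, "pitch": 4.91, "roll": 3.37},
    "BIWI": {"yaw": 3.24, "pitch": 4.48, "roll": 2.68},
}

def mean_absolute_error(axis_errors):
    """Average the yaw, pitch, and roll errors into a single MAE value."""
    return sum(axis_errors.values()) / len(axis_errors)

for dataset, errors in rows.items():
    print(dataset, round(mean_absolute_error(errors), 2))
# prints "AFLW2000 3.97" then "BIWI 3.47"
```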

BIWI 70/30


Method	Yaw	Pitch	Roll	MAE
HopeNet ( $\alpha$ =1)	3.29	3.39	3.00	3.23
FSA-Net	2.89	4.29	3.60	3.60
TriNet	2.93	3.04	2.44	2.80
FDN	3.00	3.98	2.88	3.29

6DRepNet	2.69	2.92	2.36	2.66

Fine-tuned Models

Fine-tuned models can be downloaded from here: https://drive.google.com/drive/folders/1V1pCV0BEW3mD-B9MogGrz_P91UhTtuE_?usp=sharing

Quick Start:

git clone https://github.com/thohemp/6DRepNet
cd 6DRepNet

Set up a virtual environment:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt  # Install required packages

To run the demo scripts, you need to install the face detector:

pip install git+https://github.com/elliottzheng/face-detection.git@master

Camera Demo:

python demo.py  --snapshot 6DRepNet_300W_LP_AFLW2000.pth \
                --cam 0

Test/Train 6DRepNet

Preparing datasets

Download datasets:

300W-LP, AFLW2000 from here.
BIWI (Biwi Kinect Head Pose Database) from here

Store them in the datasets directory.

For 300W-LP and AFLW2000, we need to create a filename list:

python create_filename_list.py --root_dir datasets/300W_LP
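
For orientation, here is a rough sketch of what a filename-list step of this kind might do. It is a hypothetical stand-in, not the logic of create_filename_list.py: the pairing of each .jpg with a .mat annotation, the extension-free entries, and the files.txt default are all assumptions made for illustration.

```python
from pathlib import Path

def write_filename_list(root_dir, output_name="files.txt"):
    """List every .jpg under root_dir that has a sibling .mat annotation.

    Hypothetical stand-in for create_filename_list.py; the real script
    may apply different filters or a different output format.
    """
    root = Path(root_dir)
    entries = []
    for image in sorted(root.rglob("*.jpg")):
        if image.with_suffix(".mat").exists():
            # Store the path relative to root_dir, without the extension.
            entries.append(image.relative_to(root).with_suffix("").as_posix())
    (root / output_name).write_text("\n".join(entries) + "\n")
    return entries
```

Calling `write_filename_list("datasets/300W_LP")` would then produce a `datasets/300W_LP/files.txt` of the kind passed to `--filename_list`.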

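The BIWI dataset needs to be preprocessed by a face detector to cut out the faces from the images. You can use the script provided here (https://github.com/shamangary/FSA-Net/blob/master/data/TYY_create_db_biwi.py). For a 7:3 split of the BIWI dataset, you can use the equivalent script here (https://github.com/shamangary/FSA-Net/blob/master/data/TYY_create_db_biwi_70_30.py). We set the cropped image size to 256.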
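
As a rough illustration of a 7:3 split, the sketch below shuffles a list of samples with a fixed seed and cuts it at 70%. This is an assumption about what such a split could look like; the referenced script may split differently (for example by recording sequence), and `split_70_30` is a hypothetical helper.

```python
import random

def split_70_30(items, seed=0):
    """Shuffle items reproducibly and return (train, test) with a 7:3 ratio."""
    shuffled = list(items)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

train, test = split_70_30(range(10))
print(len(train), len(test))  # prints "7 3"
```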

Testing:

python test.py  --batch_size 64 \
                --dataset AFLW2000 \
                --data_dir datasets/AFLW2000 \
                --filename_list datasets/AFLW2000/files.txt \
                --snapshot output/snapshots/1.pth \
                --show_viz False
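
Since the network predicts rotation matrices while the reported metrics are per-axis angle errors, evaluation needs a conversion step. The sketch below shows one common way to do this, assuming the matrix was composed as Rz * Ry * Rx; whether test.py uses exactly this convention, and whether it wraps angle differences, is not confirmed here. Both function names are illustrative.

```python
import math

def euler_from_rotation_matrix(r):
    """Return (x, y, z) angles in degrees for a matrix built as Rz * Ry * Rx."""
    sy = math.sqrt(r[0][0] ** 2 + r[1][0] ** 2)
    if sy > 1e-6:
        x = math.atan2(r[2][1], r[2][2])
        y = math.atan2(-r[2][0], sy)
        z = math.atan2(r[1][0], r[0][0])
    else:
        # Gimbal lock: the z rotation is not observable, so fix it to zero.
        x = math.atan2(-r[1][2], r[1][1])
        y = math.atan2(-r[2][0], sy)
        z = 0.0
    return tuple(math.degrees(a) for a in (x, y, z))

def mean_abs_angle_error(pred_deg, true_deg):
    """Mean absolute difference in degrees, with each difference wrapped to [0, 180]."""
    total = 0.0
    for p, t in zip(pred_deg, true_deg):
        diff = abs(p - t) % 360.0
        total += min(diff, 360.0 - diff)
    return total / len(pred_deg)
```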

Training

Download pre-trained RepVGG model 'RepVGG-B1g2-train.pth' from here and save it in the root directory.

python train.py --batch_size 64 \
                --num_epochs 30 \
                --lr 0.00001 \
                --dataset Pose_300W_LP \
                --data_dir datasets/300W_LP \
                --filename_list datasets/300W_LP/files.txt

Deploy models

To reparameterize the trained models into inference models, use the convert script.

python convert.py input-model.tar output-model.pth

Inference models are loaded with the flag deploy=True.

model = SixDRepNet(backbone_name='RepVGG-B1g2',
                    backbone_file='',
                    deploy=True,
                    pretrained=False)
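
To give an idea of what reparameterization means for a RepVGG-style backbone, the sketch below folds BatchNorm statistics into the preceding convolution for a single output channel. It is only an illustration of conv-BN folding under the usual inference-mode BatchNorm formula. RepVGG additionally merges its parallel branches, and the sketch makes no claim about how convert.py is implemented. All names are illustrative.

```python
import math

def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into one channel's conv weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    fused_weight = [w * scale for w in weight]
    fused_bias = beta + (bias - mean) * scale
    return fused_weight, fused_bias

def conv(x, weight, bias):
    """Single-output-channel dot product standing in for a convolution."""
    return sum(w * v for w, v in zip(weight, x)) + bias

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm applied to a scalar activation."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta
```

After folding, `conv(x, fused_weight, fused_bias)` gives the same value as `batch_norm(conv(x, weight, bias), ...)`, so the BatchNorm layer can be dropped at inference time.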

Citing

If you find our work useful, please cite the paper:

@misc{hempel20226d,
      title={6D Rotation Representation For Unconstrained Head Pose Estimation}, 
      author={Thorsten Hempel and Ahmed A. Abdelrahman and Ayoub Al-Hamadi},
      year={2022},
      eprint={2202.12555},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Comments

BIWI Dataset

Hey, I cannot reproduce your results on the BIWI dataset. I'm comparing the X, Y, and Z angles obtained from the ground-truth rotation matrix (transformed by the extrinsic calibration) with -Pitch, Yaw, and Roll respectively. I'm using your pip package. I crop the face with the Retina Face detector, as you do in demo.py, and pass it to the model.predict() function. I instantiate the model without any parameters, so the path to the weights is the default.

I have spotted one difference. In the README you write: "The BIWI datasets needs be preprocessed by a face detector to cut out the faces from the images. You can use the script provided [here](https://github.com/shamangary/FSA-Net/blob/master/data/TYY_create_db_biwi.py). For 7:3 splitting of the BIWI dataset you can use the equivalent script [here](https://github.com/shamangary/FSA-Net/blob/master/data/TYY_create_db_biwi_70_30.py). We set the cropped image size to 256." However, in model.predict() the crop is resized to 244 (which I believe is the longer edge of the picture; the shorter edge is then scaled with an appropriate ratio). Is this intended?

I cannot find any other differences, but the mean error is about 25 on X and Y and about 8 on Z. Can you help me figure it out?

Best, Jan

opened by janglinko-dac 7
Pip install failed

When I install the package with pip install SixDRepNet, it returns:

Collecting SixDRepNet
  Downloading SixDRepNet-0.1.1.tar.gz (23 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "", line 36, in
        File "", line 34, in
        File "/private/var/folders/5l/9fsdwp_n1td91scw10n67zwc0000gp/T/pip-install-8xwcmbp1/sixdrepnet_2b06c43f46c5428d9c99677633d23b6e/setup.py", line 23, in
          long_description="".join(open("README.MD", "r").readlines()),
      FileNotFoundError: [Errno 2] No such file or directory: 'README.MD'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
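
The traceback points at setup.py opening README.MD, which was apparently not present in the downloaded archive. One defensive pattern, sketched below with a hypothetical helper name, is to resolve the README relative to setup.py and fall back to an empty description when it is missing. This is an illustration, not the fix applied in the repository. Shipping the README inside the source distribution would also be needed for the description to appear.

```python
from pathlib import Path

def read_long_description(base_dir, candidates=("README.MD", "README.md")):
    """Return the first README found in base_dir, or an empty string."""
    for name in candidates:
        path = Path(base_dir) / name
        if path.is_file():
            return path.read_text(encoding="utf-8")
    # A missing README should not abort metadata generation.
    return ""
```

In setup.py this could be used as `long_description=read_long_description(Path(__file__).parent)`.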

opened by GlennCGL 4
Pretrained weight cannot be downloaded

Hi there,

I am experimenting with the SixDRepNet_Detector, and the pretrained model weights cannot be downloaded.

Here is the error message on Google Colab:

Thank you!

opened by skyrockets-21 2
Questions regarding Learning full rotation appearance

Hello there!!

Thank you very much for your great work. I am really interested in your work and would like to implement it on images with full rotation appearance.

I have two questions regarding this:

1.) Is your pre-trained model trained on full-rotation-appearance datasets (-180, 180) and capable of predicting head poses on images in which faces cannot be seen?
2.) If the answer to my first question is no, could you please guide me on which datasets I should use for fine-tuning the pre-trained model to learn the full orientation appearance?

Thank you very much in advance for your consideration

opened by Matus-Tanonwong 2
pre-trained models

Hi, thanks for this amazing work. I am really interested in it. I just want to test your network on the two test datasets (AFLW2000 and BIWI). I am wondering why you provide two .pth files (6DRepNet_300W_LP_AFLW2000.pth and 6DRepNet_300W_LP_BIWI.pth), one for each test dataset. Shouldn't we test the network with a single pretrained model on both test datasets?

I am looking forward to your response, Thanks

opened by SaharR1372 2
gap with results in papers

Hi, thanks for your impressive paper and code. I tried to reproduce the reported performance with this repo. I followed all the instructions and trained on 300W-LP using train.py without changing any parameters, then evaluated on AFLW2000 using test.py. The results are below:

Mine: Yaw: 3.9897, Pitch: 5.0923, Roll: 3.6405, MAE: 4.2408
Yours: Yaw: 3.63, Pitch: 4.91, Roll: 3.37, MAE: 3.97

Are there any other tricks or changes that should be applied to reproduce your results?

opened by BenjaminGit001 2

Preprocessing in train.py and demo.py is different

Thanks for your good work.

I tried to test and train the 6DRepNet model and found some issues.

Preprocessing code in train.py:

    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])

    transformations = transforms.Compose([transforms.Resize(240),
                                          transforms.RandomCrop(224),
                                          transforms.ToTensor(),
                                          normalize])

Preprocessing code in demo.py:

    img = frame[y_min:y_max, x_min:x_max]
    # cv2.imshow("crop", img)
    # cv2.waitKey(5)
    img = cv2.resize(img, (244, 244)) / 255.0
    img = img.transpose(2, 0, 1)
    img = torch.from_numpy(img).type(torch.FloatTensor)
    img = torch.Tensor(img).cuda(gpu)

The normalization and the input size are different.
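
To make the normalization difference concrete, the sketch below applies both quoted recipes to a single RGB pixel. It uses the mean and std values from the train.py snippet and the division by 255 from the demo.py snippet. The constant and function names are illustrative, and resizing and cropping are left out.

```python
# Mean and std values as quoted in the train.py snippet.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def scale_only(rgb):
    """demo.py-style scaling as quoted: divide 8-bit values by 255."""
    return tuple(c / 255.0 for c in rgb)

def scale_and_normalize(rgb):
    """train.py-style: scale to [0, 1], then apply mean/std normalization."""
    scaled = scale_only(rgb)
    return tuple((c - m) / s
                 for c, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD))

pixel = (128, 128, 128)
print(scale_only(pixel))           # values stay within [0, 1]
print(scale_and_normalize(pixel))  # values are centered per channel
```

A model trained on the second form sees a different input distribution when fed the first, which is one plausible contributor to degraded predictions.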

I downloaded the pre-trained RepVGG model 'RepVGG-A0-train.pth' from here.

Using just the demo.py code to test one image containing 9 faces, the output is wrong: all 9 faces have the same yaw, roll, and pitch values.

I also tested the fine-tuned models from here, and the pose values look good.

So what is the difference between the pre-trained RepVGG model and the fine-tuned models?

opened by YaoQ 2

Fixed setup and broken imports
Fixed conflict with package and model (SixDRepNet) names by replacing package name with sixdrepnet.

Cleaned up imports to match the update

Replaced package name in setup.py by sixdrepnet

Moved backbone into the sixdrepnet package
opened by fabawi 1
Licence for Fine-tuned models

Hi. Thanks for this interesting and wonderful piece of work!

I have a question about licensing, as the title says. What is the licence for the fine-tuned models? Is it MIT like the code, or is it different?

I want to use it as part of a work study, but I am not skilled in machine learning and would like to use the model as is!

I don't intend to publish, redistribute, or incorporate the models into products, but even if it is for research purposes, under my work rules it still counts as commercial use. So I would like to ask for more information about the licence for the models.

I look forward to your response. Thank you.

opened by imae-sound 1
Can't change GPU id in demo.py

There is a CUDA error when changing the GPU to another id (other than 0):

Traceback (most recent call last):
  File "demo.py", line 136, in <module>
    R_pred = model(img)
  File "/mnt/data2/head_pose_estimation/codes/6DRepNet/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1186, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/data2/head_pose_estimation/codes/6DRepNet/model.py", line 48, in forward
    return utils.compute_rotation_matrix_from_ortho6d(x)
  File "/mnt/data2/head_pose_estimation/codes/6DRepNet/utils.py", line 146, in compute_rotation_matrix_from_ortho6d
    x = normalize_vector(x_raw, use_gpu) #batch*3
  File "/mnt/data2/head_pose_estimation/codes/6DRepNet/utils.py", line 119, in normalize_vector
    v_mag = torch.max(v_mag, torch.autograd.Variable(torch.FloatTensor([1e-8]).cuda()))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:0!

opened by soroush-mim 1
How to train pre-train model

Hi, I tried to train the model from scratch, but it seems hard to reach performance comparable to training from the pre-trained model. My question is: how can I train the pre-trained model myself, or how can I reach similar performance from scratch?

Thanks!

opened by BenjaminGit001 1
Is the pretrained model only for faces?

Thanks for the work! Is the pretrained model only for face pictures? If so, are there any other pretrained models for other objects, such as boxes, bottles, or shoes?

opened by himmetozcan 0
Query regarding face pose axis visualisation

I see that to construct rotation matrix(R) from yaw, pitch and roll angle values, you use zyx order i.e Rz * Ry * Rx, where Rz is rotation about z-axis, Ry is rotation about the y-axis, and Rx is rotation about the x-axis.

But for visualisation, it looks like the order you use is xyz i.e Rx * Ry * Rz and then use column vectors of this resulting matrix as axis coordinates (https://github.com/thohemp/6DRepNet/blob/master/utils.py#L54). May I know why this is done? Am I missing something?
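
The two composition orders mentioned in this question can be compared numerically. The sketch below uses the standard right-handed axis rotation matrices and illustrative function names. It shows that Rz * Ry * Rx and Rx * Ry * Rz generally differ, and that the transpose of Rx(x) * Ry(y) * Rz(z) equals Rz(-z) * Ry(-y) * Rx(-x). That identity may be relevant to the observation, but it does not by itself explain the choice made in utils.py.

```python
import math

def rot_x(a):
    """Rotation by angle a (radians) about the x-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    """Rotation by angle a (radians) about the y-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    """Rotation by angle a (radians) about the z-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def compose_zyx(x, y, z):
    """R = Rz * Ry * Rx."""
    return matmul(rot_z(z), matmul(rot_y(y), rot_x(x)))

def compose_xyz(x, y, z):
    """R = Rx * Ry * Rz."""
    return matmul(rot_x(x), matmul(rot_y(y), rot_z(z)))
```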

Thanks.

opened by shubhamwagh 3


Owner

Thorsten Hempel

Web service for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation based on OpenFace 2.0

OpenFace – a state-of-the art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

Deep Learning Head Pose Estimation using PyTorch.

Human head pose estimation using Keras over TensorFlow.

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

Code for BMVC2021 "MOS: A Low Latency and Lightweight Framework for Face Detection, Landmark Localization, and Head Pose Estimation"

The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

Unconstrained Text Detection with Box Supervisionand Dynamic Self-Training

Cross Quality LFW: A database for Analyzing Cross-Resolution Image Face Recognition in Unconstrained Environments

Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation"

Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

SE3 Pose Interp - Interpolate camera pose or trajectory in SE3, pose interpolation, trajectory interpolation

《Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement》(ECCV 2020) GitHub: [fig9]

Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation

Repository for the paper "PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation", CVPR 2021.

This repository contains codes of ICCV2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

Python scripts for performing 3D human pose estimation using the Mobile Human Pose model in ONNX.

Extreme Rotation Estimation using Dense Correlation Volumes