Deep Learning Head Pose Estimation using PyTorch.

Overview

Hopenet



Hopenet is an accurate and easy-to-use head pose estimation network. Models have been trained on the 300W-LP dataset and have been tested on real data with good qualitative performance.

For details about the method and quantitative results please check the CVPR Workshop paper.



new GoT trailer example video

new Conan-Cruise-Car example video

To use, please install PyTorch and OpenCV (for video); I believe that's all you need apart from the usual libraries such as numpy. You currently need a GPU to run Hopenet.
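If you are starting from scratch, something along these lines should cover it (package names are the standard PyPI ones, not pinned by this repo; pin versions as needed):

pip install numpy opencv-python torch torchvision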

To test on a video using dlib face detections (center of head will be jumpy):

python code/test_on_video_dlib.py --snapshot PATH_OF_SNAPSHOT --face_model PATH_OF_DLIB_MODEL --video PATH_OF_VIDEO --output_string STRING_TO_APPEND_TO_OUTPUT --n_frames N_OF_FRAMES_TO_PROCESS --fps FPS_OF_SOURCE_VIDEO

To test on a video using your own face detections (we recommend using dockerface, center of head will be smoother):

python code/test_on_video_dockerface.py --snapshot PATH_OF_SNAPSHOT --video PATH_OF_VIDEO --bboxes FACE_BOUNDING_BOX_ANNOTATIONS --output_string STRING_TO_APPEND_TO_OUTPUT --n_frames N_OF_FRAMES_TO_PROCESS --fps FPS_OF_SOURCE_VIDEO

Face bounding box annotations should be in Dockerface format (n_frame x_min y_min x_max y_max confidence).
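For reference, a single annotation line in that format might look like this (illustrative values only):

1 226.0 143.0 398.0 315.0 0.999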

Pre-trained models:

300W-LP, alpha 1

300W-LP, alpha 2

300W-LP, alpha 1, robust to image quality

For more information on what alpha stands for, please read the paper. The first two models are for validating the paper's results; for real data we suggest using the last model, as it is more robust to image quality and blur and gives good results on video.

Please open an issue if you have a problem.

Some very cool implementations of this work on other platforms by some cool people:

Gluon

MXNet

TensorFlow with Keras

A really cool lightweight version of HopeNet:

Deep Head Pose Light

If you find Hopenet useful in your research please cite:

@InProceedings{Ruiz_2018_CVPR_Workshops,
author = {Ruiz, Nataniel and Chong, Eunji and Rehg, James M.},
title = {Fine-Grained Head Pose Estimation Without Keypoints},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2018}
}

Nataniel Ruiz, Eunji Chong, James M. Rehg

Georgia Institute of Technology

Comments
  • The loss cannot get decreased during the training

    Hi natanielruiz:

    I was trying to reproduce your paper's results recently, but I found that the loss does not decrease when I train your model on the 300W_LP dataset. I used the same parameters you provided in your paper:

    alpha = 1, lr = 1e-5, and default parameters for the Adam optimizer.

    I ran your network for 25 epochs and the yaw loss oscillates around 3000, which means the MSE loss is still far too large for the yaw angle.

    Do you have any idea how to debug the network or solve this issue? Thank you very much for your help!

    opened by developer-mayuan 15
  • Having runtime error when train your Hopenet

    Hi natanielruiz:

    Firstly, I want to say thank you for your great work! I tested your pretrained model on my own dataset and it works great; the results are accurate and robust. I would now like to fine-tune your network with my own dataset, however, I found I cannot do it.

    I prepared the 300W_LP dataset and generated the file list based on the input expected by your code. (By the way, maybe you could provide the file-list generation code in your repository, which would make it self-contained.)

    Then, when I ran your train_hopenet.py code, I could sometimes get results for 1 or 2 epochs; however, it would always eventually give me the following error message:

    Loading data.
    Ready to train network.
    Epoch [1/5], Iter [100/7653] Losses: Yaw 4.5354, Pitch 4.0671, Roll 4.2844
    /opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [11,0,0] Assertion `t >= 0 && t < n_classes` failed.
    THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THCUNN/generic/ClassNLLCriterion.cu line=87 error=59 : device-side assert triggered
    Traceback (most recent call last):
      File "/home/foo/Academy/deep-head-pose/code/train_hopenet.py", line 166, in <module>
        alpha = args.alpha
      File "/home/foo/Ordnance/anaconda2/envs/Hopenet/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/foo/Ordnance/anaconda2/envs/Hopenet/lib/python2.7/site-packages/torch/nn/modules/loss.py", line 482, in forward
        self.ignore_index)
      File "/home/foo/Ordnance/anaconda2/envs/Hopenet/lib/python2.7/site-packages/torch/nn/functional.py", line 746, in cross_entropy
        return nll_loss(log_softmax(input), target, weight, size_average, ignore_index)
      File "/home/foo/Ordnance/anaconda2/envs/Hopenet/lib/python2.7/site-packages/torch/nn/functional.py", line 672, in nll_loss
        return _functions.thnn.NLLLoss.apply(input, target, weight, size_average, ignore_index)
      File "/home/foo/Ordnance/anaconda2/envs/Hopenet/lib/python2.7/site-packages/torch/nn/_functions/thnn/auto.py", line 47, in forward
        output, *ctx.additional_args)
    RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1503966894950/work/torch/lib/THCUNN/generic/ClassNLLCriterion.cu:87
    

    I did some searching, and the most promising answer is at the following link: https://discuss.pytorch.org/t/runtimeerror-cuda-runtime-error-59-device-side-assert-triggered-at-opt-conda-conda-bld-pytorch-1503970438496-work-torch-lib-thc-generic-thcstorage-c-32/9669/5

    It seems like in some cases the target labels are out of bounds for the network's output classes. The following is my running environment:

    Python 2.7.14 (with Anaconda), using a conda virtual environment
    pytorch 0.2.0 py27hc03bea1_4cu80 [cuda80] soumith
    torchvision 0.1.9 py27hdb88a65_1 soumith

    I would like to know if you have met this kind of problem before, and whether you can give me some ideas about how to solve it? Thank you very much for your help!
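    A minimal sanity check for this failure mode (a sketch based on the binning in datasets.py, not code from the repo): the assert t >= 0 && t < n_classes fires when a binned label falls outside [0, 66), which happens for pose angles outside roughly [-99, 99] degrees.

    import numpy as np

    bins = np.array(range(-99, 102, 3))  # same bin edges as datasets.py

    def pose_in_range(yaw, pitch, roll):
        # Angles below -99 degrees digitize to bin -1, and angles at or above
        # 99 degrees to bin 66; both crash a 66-class nn.CrossEntropyLoss.
        labels = np.digitize([yaw, pitch, roll], bins) - 1
        return bool(np.all((labels >= 0) & (labels < 66)))

    Filtering the training file list so that every sample passes a check like this avoids the device-side assert.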

    opened by developer-mayuan 14
  • torch.autograd.backward(loss_seq, grad_seq)      RuntimeError: invalid gradient at index 0 - expected shape [] but got [1]

    Hi: When I run train_hopenet.py, I get the error "RuntimeError: invalid gradient at index 0 - expected shape [] but got [1]". Can you tell me how to solve it? Thanks very much!
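    A sketch of the fix commonly reported for this error (it appears on newer PyTorch versions, where losses are 0-dim tensors, so the seed gradients passed to backward must be 0-dim as well; names mirror train_hopenet.py):

    # torch.tensor(1.0) has shape [] and matches the 0-dim losses,
    # unlike torch.Tensor(1), which has shape [1].
    loss_seq = [loss_yaw, loss_pitch, loss_roll]
    grad_seq = [torch.tensor(1.0).cuda(gpu) for _ in loss_seq]
    torch.autograd.backward(loss_seq, grad_seq)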

    opened by gnnbest 12
  • Different results from the paper (AFW dataset)

    Hi, thank you for your great work. I checked the performance of the pretrained models (300W-LP, alpha 1 and 300W-LP, alpha 2); the results on AFLW2000 are the same as in the paper. But when I test the models on the AFW dataset, the results are very different. I wrote my own code to compute the discrete predictions, which rounds to the nearest 15 degrees; the yaw accuracy is only 21.15%, and the mean absolute error is 53.3674. So even if I made some mistake calculating the discrete predictions, the MAE of yaw seems too large. I am wondering which step I am missing to reproduce the result in the paper. (I have made sure the input format is the same as the one required by datasets.py.)
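    For reference, one straightforward way to compute the discrete accuracy described above (a sketch, not the repo's evaluation code; pred and gt are continuous angles in degrees):

    import numpy as np

    def discrete_hit(pred, gt, bin_width=15.0):
        # Round both angles to the nearest 15-degree bin and count a hit
        # when prediction and ground truth land in the same bin.
        return np.round(np.asarray(pred) / bin_width) == np.round(np.asarray(gt) / bin_width)

    # accuracy = discrete_hit(pred_yaws, gt_yaws).mean()  # hypothetical arrays of yaw angles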

    opened by invigen 12
  • FaceDetection+CustomTraining+Validation +Performance validation

    @natanielruiz Hi, thanks for the awesome work and for sharing. I just had a few queries:

    1. For face detection, should we always use the FRCNN, or is there any possibility of using some other detection technique?
    2. How can we validate the output parameters, i.e. the pitch, roll and yaw values obtained from the algorithm?
    3. Are the values of pitch, roll and yaw generated based on the center of the face detection?
    4. Can we use your algorithm for training on a custom dataset, or can the shared model only be used on generic datasets?
    5. For custom training, can you provide some reference steps to achieve this?
    6. I ran your code on the CPU and it is very slow. Is there any provision in the future for running it on the CPU?

    Thanks again for the awesome work and for sharing.

    opened by abhigoku10 9
  • The performance of the pretrained model you provided is somewhat different

    Hi, I tested the pretrained model (300W-LP, alpha 2) you provided on AFLW2000 and found that the performance is somewhat different from what is shown in the paper. Is this model the best one from your algorithm?

    opened by tfygg 9
  • test on AFLW2000 error is so large

    Hi, first of all, thanks very much for your great work! I have some questions about it:

    1. I implemented and trained a TensorFlow model following this work. When I test on the training dataset (300W-LP) the error is yaw=2.1, pitch=1.9, roll=1.7, but when I test the model on the test dataset (AFLW2000) the error is very large: yaw=35.3, pitch=11.4, roll=12.5. I can't find where the problem is.
    2. In the AFLW2000 dataset, some face points have coordinates of -1. How do you crop the face ROI used to test head pose?
    3. In 300W-LP, the large-pose faces have erroneous face points. How do you process this dataset?

    Thanks very much!

    opened by flyduck 8
  • why combined loss is better?

    I would like to get an answer to this question, but I can't find one in the article. In my opinion, the answer may be that the combined loss is better suited for training the model. Many comments point out that the regression loss is hard to train with, while the classification loss is easier. What do you think?

    opened by xubaoquan33 8
  • I made an error when I was executing Python

    File "code/test_on_video_dlib.py", line 7, in import torch File "/usr/local/lib/python2.7/dist-packages/torch/init.py", line 53, in from torch._C import * ImportError: dlopen: cannot load any more object with static TLS

    opened by ytgcljj 6
  • Format, PEPify and improve Python 2/3 compatibility

    This patch does not introduce any functionality changes. Instead, it focuses on PEP 8 compliance (mostly fixing line-width violations and unused imports) and Python 2/3 compatibility (mostly wrapping print statements in parentheses and using Python 3's range).

    opened by kirillbobyrev 5
  • SegmentationFault in GPU

    @natanielruiz Hi, I ran your code on the GPU but I get:

    Ready to test network. 1 Segmentation fault (core dumped)

    I am using the following command:

    python code/test_on_video_dlib.py --snapshot "/home/teai/abhilashsk/hpe/snapshot/hopenet_alpha1.pkl" --face_model "/home/teai/abhilashsk/hpe/model/mmod_human_face_detector.dat" --video "/home/teai/abhilashsk/hpe/Data/1-FemaleNoGlasses-Normal.mp4" --output_string "output" --n_frames 300 --fps 20

    The code is also not fully using the GPU. I think some code changes are needed for it to work on the GPU; can you please let me know what has to be done?

    opened by mayanks888 5
  • i have created a colab file to test it out quickly

    Hi,

    I request you to add this to the repo if you feel it is relevant. It uses MTCNN for face detection.

    https://colab.research.google.com/drive/1vvntbLyVxxBHoVN0e6-pfs7gB3pp-VUS?usp=sharing

    Thanks!

    opened by maylad31 0
  • AFLW labels are broken?

    I am completely confused. It seems like my labels are either -1 or bigger than num_classes, but I don't know why. The label files look correct, but everything breaks when the labels are computed during binning.

    I would be pleased to get some help. Thank you.

    opened by ilyii 1
  • softmax

    code/test_on_video_dlib.py:146: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
      yaw_predicted = F.softmax(yaw)
    code/test_on_video_dlib.py:147: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
      pitch_predicted = F.softmax(pitch)
    code/test_on_video_dlib.py:148: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
      roll_predicted = F.softmax(roll)

    Should dim be 0, 1, or 2?
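    Assuming the (batch_size, 66) output shape discussed elsewhere in these issues, the bin dimension is 1, so a fix along these lines (a suggestion, not a change from the repository) should silence the warning:

    yaw_predicted = F.softmax(yaw, dim=1)
    pitch_predicted = F.softmax(pitch, dim=1)
    roll_predicted = F.softmax(roll, dim=1)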

    opened by song6cy 0
  • input sizes mismatch for nn.CrossEntropyLoss()

    Hi @natanielruiz, thanks for sharing your work.

    While trying to adapt it for a project, I stumbled upon a problem. In lines 160 to 161 of your train_hopenet.py file we have:

    # Forward pass
    yaw, pitch, roll = model(images)
    

    If I am not mistaken, the size of each angle predicted by the model is (batch_size, num_bins), for example (128, 66), which makes perfect sense because the fully connected layer has output size 66.

    While investigating the data handling in datasets.py, there is the following code block:

    # We get the pose in radians
    pose = utils.get_ypr_from_mat(mat_path)
    # And convert to degrees.
    pitch = pose[0] * 180 / np.pi
    yaw = pose[1] * 180 / np.pi
    roll = pose[2] * 180 / np.pi
    # Bin values
    bins = np.array(range(-99, 102, 3))
    labels = torch.LongTensor(np.digitize([yaw, pitch, roll], bins) - 1)
    

    Assuming that the head pose has 3 values, one for each angle, I would then get the bin of each angle in the labels variable, e.g. [30, 33, 33].

    The first code block is then combined with:

    label_yaw = Variable(labels[:,0]).cuda(gpu)
    label_pitch = Variable(labels[:,1]).cuda(gpu)
    label_roll = Variable(labels[:,2]).cuda(gpu)
    
    # Continuous labels
    label_yaw_cont = Variable(cont_labels[:,0]).cuda(gpu)
    label_pitch_cont = Variable(cont_labels[:,1]).cuda(gpu)
    label_roll_cont = Variable(cont_labels[:,2]).cuda(gpu)
    
    # Cross entropy loss
    loss_yaw = criterion(yaw, label_yaw)
    loss_pitch = criterion(pitch, label_pitch)
    loss_roll = criterion(roll, label_roll)
    

    with the criterion being nn.CrossEntropyLoss().cuda(gpu).

    This is where I get confused, because the sizes of the inputs do not seem to match: we have yaw with shape (128, 66), but label_yaw is of size (128, 1).

    Could you please tell me where I am going wrong? Any help is appreciated.
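    For what it's worth, nn.CrossEntropyLoss expects an input of shape (N, C) and a 1-D target of shape (N,) holding class indices, and labels[:, 0] produces exactly that 1-D (N,) tensor rather than (N, 1), so the shapes do line up. A minimal standalone check (not code from the repo):

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()
    yaw = torch.randn(128, 66)                 # (N, C) logits, one score per bin
    label_yaw = torch.randint(0, 66, (128,))   # (N,) class indices
    print(criterion(yaw, label_yaw))           # scalar loss, no size mismatch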

    Kind regards

    opened by ghost 1
  • rationale behind learning rate

    I was able to train your model on my own machine and get robust webcam estimations. I wonder what the rationale is behind disabling training (setting the learning rate to 0) for the first conv and bn of your ResNet backbone, and giving a 5x learning rate to the three fc layers? Also, would you suggest more epochs for a smaller model? I need to make this work on a peripheral device for work.
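    The pattern being asked about looks roughly like the per-parameter-group learning rates below (a sketch using a plain torchvision ResNet-50 as a stand-in for the Hopenet backbone; the repo splits parameters with its own helper functions):

    import torch
    import torchvision

    lr = 1e-5
    model = torchvision.models.resnet50(num_classes=66)

    # The ImageNet-pretrained stem (conv1 + bn1) is effectively frozen with
    # lr 0, the freshly initialized fc head learns 5x faster, and the rest
    # of the backbone trains at the base rate.
    stem = list(model.conv1.parameters()) + list(model.bn1.parameters())
    head = list(model.fc.parameters())
    excluded = {id(p) for p in stem} | {id(p) for p in head}
    body = [p for p in model.parameters() if id(p) not in excluded]

    optimizer = torch.optim.Adam([
        {'params': stem, 'lr': 0.0},
        {'params': body, 'lr': lr},
        {'params': head, 'lr': lr * 5},
    ], lr=lr)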

    Thanks very much. Great work.

    opened by simin75simin 0