WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

Overview

Yijun Zhou and James Gregson - BMVC2020

Abstract: We present an end-to-end head-pose estimation network designed to predict Euler angles through the full range of head yaws from a single RGB image. Existing methods perform well for frontal views but few target head pose from all viewpoints. This has applications in autonomous driving and retail. Our network builds on multi-loss approaches with changes to loss functions and training strategies adapted to wide range estimation. Additionally, we extract ground truth labelings of anterior views from a current panoptic dataset for the first time. The resulting Wide Headpose Estimation Network (WHENet) is the first fine-grained modern method applicable to the full range of head yaws (hence wide) yet also meets or beats state-of-the-art methods for frontal head pose estimation. Our network is compact and efficient for mobile devices and applications. ArXiv

Demo

We provide two use cases of WHENet in this repo: image input and video input. Please make sure you have installed all the requirements before running the demo code by running pip install -r requirements.txt. Additionally, please download the YOLOv3 model for head detection and put it under yolo_v3/data.

Image demo

To run WHENet with image input, please put the images and bbox.txt in one folder (e.g. Sample/) and run python demo.py.

The format of bbox.txt is shown below:

image_name,x_min y_min x_max y_max
mov_001_007585.jpeg,240 0 304 83
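
As an illustration of the format above, here is a minimal sketch of how such lines could be parsed. It is not the code used by demo.py; the function name parse_bbox_lines is hypothetical, and the sketch assumes the file contains only data lines (no header row) with integer pixel coordinates.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def parse_bbox_lines(lines: List[str]) -> Dict[str, List[Box]]:
    """Parse bbox.txt-style lines into {image_name: [boxes]} (hypothetical helper)."""
    boxes: Dict[str, List[Box]] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last comma: the name comes before it, the coordinates after it.
        name, sep, coords = line.rpartition(",")
        if not sep:
            raise ValueError(f"missing comma separator: {line!r}")
        parts = coords.split()
        if len(parts) != 4:
            raise ValueError(f"expected 4 coordinates, got: {line!r}")
        x_min, y_min, x_max, y_max = (int(p) for p in parts)
        boxes.setdefault(name, []).append((x_min, y_min, x_max, y_max))
    return boxes


if __name__ == "__main__":
    print(parse_bbox_lines(["mov_001_007585.jpeg,240 0 304 83"]))
```

Grouping boxes by image name is a design choice of this sketch, so that several heads in one image can share a key.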

Video/Webcam demo

We used YOLO_v3 in the video demo to get the cropped head image. In order to customize some of the functions we have put the yolo implementation and the pre-trained model in the repo. Hollywood head and Crowdhuman are used to train the head detection YOLO model.

demo_video.py [--video INPUT_VIDEO_PATH] [--snapshot WHENET_MODEL] [--display DISPLAY_OPTION] 
              [--score YOLO_CONFIDENCE_THRESHOLD] [--iou IOU_THRESHOLD] [--gpu GPU#] [--output OUTPUT_VIDEO_PATH]

Please set --video '' for webcam input.
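
The options above could be declared with argparse roughly as follows. This is a sketch, not the script's actual parser: the flag names come from the usage line, but every type and default value here is a placeholder assumption, and resolve_video_source is a hypothetical helper that maps the empty string to device index 0, the value conventionally passed to cv2.VideoCapture for the default webcam.

```python
import argparse
from typing import Union


def build_parser() -> argparse.ArgumentParser:
    # Flag names follow the usage line; types and defaults are assumptions.
    parser = argparse.ArgumentParser(prog="demo_video.py")
    parser.add_argument("--video", type=str, default="", help="input video path; '' selects the webcam")
    parser.add_argument("--snapshot", type=str, default=None, help="path to the WHENet model weights")
    parser.add_argument("--display", type=int, default=1, help="display option (placeholder default)")
    parser.add_argument("--score", type=float, default=0.5, help="YOLO confidence threshold (placeholder default)")
    parser.add_argument("--iou", type=float, default=0.5, help="IoU threshold (placeholder default)")
    parser.add_argument("--gpu", type=str, default="0", help="GPU id (placeholder default)")
    parser.add_argument("--output", type=str, default=None, help="output video path")
    return parser


def resolve_video_source(video: str) -> Union[int, str]:
    """Return a webcam device index for '' and the path unchanged otherwise."""
    return 0 if video == "" else video


if __name__ == "__main__":
    args = build_parser().parse_args(["--video", ""])
    print(resolve_video_source(args.video))
```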

Dependencies

Comments
  • Can't get it up and running, any help is appreciated

    I can't get this project up and running. Would it be possible to get some more instructions?

    What Python version are you using?

    Any other dependencies?

    What CPU and OS are you running?

    Using CUDA?

    opened by bobmoff 1
  • Swish activation layer error

    Hi, I have converted this model to UFF and ONNX, but in both cases I am not able to serialize it to a TRT engine because of the swish layer.

    Error message:
    ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin swish_f32 version 1
    ERROR: builtin_op_importers.cpp:3773 In function importFallbackPluginImporter: [8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
    ERROR: Network must have at least one output
    ERROR: Network validation failed.

    Kindly suggest solutions or alternatives if possible, thanks. P.S.: the model is trained really well! I tested it on faces with masks as well, very promising :)

    opened by randallsalvares 1
  • Trying to reproduce the small AFLW error

    I was trying to reproduce WHENet-V using the same network structure, preprocessor, etc. I could only get around 30 yaw loss and around 10 pitch and roll loss for AFLW2000, even after filtering out angles greater than 99 degrees. I saw in your paper that WHENet-V was trained on 300W and got an MAE on AFLW2000 below 5. I used Hopenet's preprocessor for 300W, a batch size of 16, a 1e-5 learning rate, the same network structure, 25 episodes of training on 300W, etc. What should I check or change to reproduce the small AFLW error?

    Also, in your opinion, can the model be further compressed to, say, a few hundred kilobytes (assuming more training data, etc.) with a small loss of accuracy in practice? Thanks.

    opened by simin75simin 0
  • Error in running demo_video.py

    Hi, I am running demo_video.py and get the following error:

    Traceback (most recent call last):
      File "demo_video.py", line 81, in <module>
        main(args)
      File "demo_video.py", line 54, in main
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    cv2.error: OpenCV(3.4.8) /io/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'
    

    Can you help me? Thanks a lot!

    opened by zwt233 0
  • Config folder is missing in MTC dataset

    Hi, thank you for the great work and amazing codebase. I have a question regarding dataset preparation. Regarding line 215 in prepare_images.py: the config folder is missing in the MTC dataset, although the downloaded dataset has the same size (270GB) as mentioned in the documentation. There is no mention of this config folder on the official website either. Any help is appreciated.

    opened by Aratrik 1
  • Inaccurate results for profile faces

    Hi,

    I'm currently testing the head pose estimation on profile-view faces, but I am getting unusual results. For test purposes, I'm using three profile-view images.

    For these three images, I get the following results:
    Yaw: [70.59546] Pitch: [34.594513] Roll: [33.04599]
    Yaw: [81.640686] Pitch: [-8.637505] Roll: [-13.95977]
    Yaw: [71.53331] Pitch: [10.675278] Roll: [13.468033]

    I am mainly focused on the yaw angles. These images clearly show strictly profile views, yet none of the detected yaw angles is close enough to 90 degrees.

    If anyone has any ideas as to what might be causing this issue, it would be highly appreciated.

    Thanks!

    opened by Mayur28 0
  • Error: This application failed to start because no Qt platform plugin could be initialized.

    Hi, I am using a Docker container to run this model, but it fails with an Aborted (core dumped) error.

    qt.qpa.xcb: could not connect to display :1
    qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/usr/local/lib/python3.5/dist-packages/cv2/qt/plugins" even though it was found.
    This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem

    This is the command I am using: python3 demo_video.py --video off.mp4 --display 0 --output output.mp4

    It seems the problem is with the --display argument.

    It takes a lot of time on this step.

    opened by khandriod 0
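
Regarding the `!_src.empty()` assertion in the demo_video.py traceback above: this OpenCV error is raised when cvtColor receives an empty frame, which commonly happens when the capture could not open the input or has reached the end of the video. The following is a minimal sketch of a guard, not a confirmed fix for this repo. The names FrameSource and iter_frames are hypothetical; the sketch only assumes an object with a read() method that returns an (ok, frame) pair, as cv2.VideoCapture.read() does.

```python
from typing import Any, Iterator, Protocol, Tuple


class FrameSource(Protocol):
    """Anything with a cv2.VideoCapture-style read() method."""

    def read(self) -> Tuple[bool, Any]: ...


def iter_frames(capture: FrameSource) -> Iterator[Any]:
    """Yield frames until the source reports a failed read."""
    while True:
        ok, frame = capture.read()
        # Stop instead of passing an empty frame on to cv2.cvtColor.
        if not ok or frame is None:
            break
        yield frame
```

With a real capture this would be used as `for frame in iter_frames(cv2.VideoCapture(path)): ...`, so the colour conversion only ever sees frames that were actually read.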