WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

Last update: Dec 26, 2022

Related tags

Deep Learning HeadPoseEstimation-WHENet

Overview

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

Yijun Zhou and James Gregson - BMVC2020

Abstract: We present an end-to-end head-pose estimation network designed to predict Euler angles through the full range of head yaws from a single RGB image. Existing methods perform well for frontal views but few target head pose from all viewpoints. This has applications in autonomous driving and retail. Our network builds on multi-loss approaches with changes to loss functions and training strategies adapted to wide range estimation. Additionally, we extract ground truth labelings of anterior views from a current panoptic dataset for the first time. The resulting Wide Headpose Estimation Network (WHENet) is the first fine-grained modern method applicable to the full range of head yaws (hence wide) yet also meets or beats state-of-the-art methods for frontal head pose estimation. Our network is compact and efficient for mobile devices and applications. ArXiv

Demo

This repo provides two use cases for WHENet: image input and video input. Before running the demo code, make sure you have installed all the requirements with pip install -r requirements.txt. Additionally, please download the YOLOv3 model for head detection and put it under yolo_v3/data.

Image demo

To run WHENet with image input, put the images and bbox.txt in one folder (e.g. Sample/) and run python demo.py.

The format of bbox.txt is shown below:

image_name,x_min y_min x_max y_max
mov_001_007585.jpeg,240 0 304 83
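
The format above can be read with a few lines of Python. The following is a minimal sketch, not the loader used by demo.py: the function name parse_bbox_lines is hypothetical, and it assumes the first line may be a header that should be skipped and that coordinates are integers.

```python
def parse_bbox_lines(lines):
    """Parse lines of the form 'image_name,x_min y_min x_max y_max'.

    Returns a list of (image_name, (x_min, y_min, x_max, y_max)) tuples.
    """
    boxes = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the header row, if the file contains one.
        if not line or line.startswith("image_name"):
            continue
        # Split on the last comma so the coordinates are always the final field.
        name, coords = line.rsplit(",", 1)
        x_min, y_min, x_max, y_max = (int(v) for v in coords.split())
        boxes.append((name, (x_min, y_min, x_max, y_max)))
    return boxes


if __name__ == "__main__":
    sample = [
        "image_name,x_min y_min x_max y_max",
        "mov_001_007585.jpeg,240 0 304 83",
    ]
    for name, box in parse_bbox_lines(sample):
        print(name, box)
```

To read a real file, pass an open file object, for example parse_bbox_lines(open("Sample/bbox.txt")).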

Video/Webcam demo

We use YOLO_v3 in the video demo to get the cropped head image. To allow some of its functions to be customized, we have put the YOLO implementation and the pre-trained model in the repo. The HollywoodHeads and CrowdHuman datasets were used to train the head detection YOLO model.

demo_video.py [--video INPUT_VIDEO_PATH] [--snapshot WHENET_MODEL] [--display DISPLAY_OPTION] 
              [--score YOLO_CONFIDENCE_THRESHOLD] [--iou IOU_THRESHOLD] [--gpu GPU#] [--output OUTPUT_VIDEO_PATH]

Please set --video '' for webcam input.
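
As an illustration of the options above, here is a hedged sketch of an argument parser that mirrors the usage line. The flag names come from the usage text; the types, the default for --video, and the helper resolve_video_source are assumptions made for this example and may differ from what demo_video.py actually does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Flag names follow the usage line; types and defaults are assumptions.
    parser = argparse.ArgumentParser(prog="demo_video.py")
    parser.add_argument("--video", type=str, default="",
                        help="input video path; '' selects the webcam")
    parser.add_argument("--snapshot", type=str, help="WHENet model weights")
    parser.add_argument("--display", type=str, help="display option")
    parser.add_argument("--score", type=float, help="YOLO confidence threshold")
    parser.add_argument("--iou", type=float, help="IoU threshold")
    parser.add_argument("--gpu", type=str, help="GPU id")
    parser.add_argument("--output", type=str, help="output video path")
    return parser


def resolve_video_source(video_arg: str, webcam_index: int = 0):
    """Return a webcam index for '' and the file path otherwise (hypothetical helper)."""
    return webcam_index if video_arg == "" else video_arg


if __name__ == "__main__":
    args = build_parser().parse_args(["--video", ""])
    print(resolve_video_source(args.video))
```

The value returned by resolve_video_source could be handed to cv2.VideoCapture, which accepts either a device index or a file name.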

Dependencies

EfficientNet https://github.com/qubvel/efficientnet
Yolo_v3 https://github.com/qqwweee/keras-yolo3

Comments

Can't get it up and running, any help is appreciated

I can't get this project up and running. Would it be possible to get some more instructions?

What Python version are you using?

Any other dependencies?

What CPU and OS are you running?

Are you using CUDA?

opened by bobmoff 1

Swish activation layer error

Hi, I have converted this model to UFF and ONNX, but in both cases I am not able to serialize it to a TRT engine because of the swish layer.

Error message:

ERROR: INVALID_ARGUMENT: getPluginCreator could not find plugin swish_f32 version 1
ERROR: builtin_op_importers.cpp:3773 In function importFallbackPluginImporter: [8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
ERROR: Network must have at least one output
ERROR: Network validation failed.

Kindly suggest solutions or alternatives if possible, thanks. P.S. The model is trained really well! I tested it on faces with masks as well; very promising :)
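
For context on the error above: swish is defined as x * sigmoid(x), so it can be written with a sigmoid and an element-wise multiply instead of a dedicated swish op. A workaround that is often tried for missing-plugin errors is to express the activation this way before export. The sketch below only shows the decomposition in plain Python; whether rewriting the layer resolves this particular conversion is an assumption, not something confirmed here.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x: float) -> float:
    """Swish built from two primitive ops: sigmoid and multiply."""
    return x * sigmoid(x)


if __name__ == "__main__":
    for value in (-2.0, 0.0, 2.0):
        print(value, swish(value))
```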

opened by randallsalvares 1

Trying to reproduce the small AFLW error

So I was trying to reproduce WHENet-V using the same network structure, preprocessor, etc. I could only get around 30 yaw loss and around 10 pitch and roll loss on AFLW2000, even after filtering out angles greater than 99 degrees. I saw in your paper that WHENet-V was trained on 300W and got an MAE on AFLW2000 below 5. I used Hopenet's preprocessor for 300W, a batch size of 16, a 1e-5 learning rate, the same network structure, 25 epochs of training on 300W, etc. What should I check or change to reproduce the small AFLW error?

Also, in your opinion, can the model be further compressed to, say, a few hundred kilobytes (assuming more training data, etc.) with a small loss of accuracy in practice? Thanks.
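
When comparing numbers like these, it helps to pin down the metric. The sketch below shows one way to compute a per-angle MAE after dropping samples whose ground-truth angles exceed 99 degrees, as the post describes. It is an assumption about the evaluation protocol, not the authors' evaluation code, and the function name filtered_mae is hypothetical.

```python
def filtered_mae(preds, labels, limit=99.0):
    """Per-angle mean absolute error in degrees.

    preds and labels are sequences of (yaw, pitch, roll) tuples. Samples whose
    ground-truth angles fall outside [-limit, limit] are dropped first.
    """
    kept = [(p, g) for p, g in zip(preds, labels)
            if all(abs(angle) <= limit for angle in g)]
    if not kept:
        raise ValueError("no samples left after filtering")
    n = len(kept)
    return tuple(sum(abs(p[i] - g[i]) for p, g in kept) / n for i in range(3))


if __name__ == "__main__":
    preds = [(10.0, 0.0, 0.0), (100.0, 5.0, 5.0), (-20.0, 4.0, 2.0)]
    labels = [(12.0, 1.0, 0.0), (120.0, 0.0, 0.0), (-25.0, 0.0, 0.0)]
    print(filtered_mae(preds, labels))
```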

opened by simin75simin 0

Error in running demo_video.py

Hi, I am running demo_video.py and get the following error:

Traceback (most recent call last):
  File "demo_video.py", line 81, in <module>
    main(args)
  File "demo_video.py", line 54, in main
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
cv2.error: OpenCV(3.4.8) /io/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'

Can you help me? Thanks a lot!
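
The !_src.empty() assertion in cvtColor means the frame passed in was empty. This typically happens when the video source could not be opened or the stream has ended. The sketch below guards against both cases; it is a hedged illustration rather than a patch to demo_video.py, and iter_frames is a hypothetical helper that works with any object exposing isOpened() and read() in the way cv2.VideoCapture does.

```python
def iter_frames(capture):
    """Yield frames from a cv2.VideoCapture-like object until reading fails."""
    if not capture.isOpened():
        raise IOError("could not open the video source; check the path or camera")
    while True:
        ok, frame = capture.read()
        # read() returns (False, None) when no frame is available.
        if not ok or frame is None:
            break
        yield frame


class FakeCapture:
    """Stand-in for cv2.VideoCapture so the sketch runs without OpenCV."""

    def __init__(self, frames, opened=True):
        self._frames = list(frames)
        self._opened = opened

    def isOpened(self):
        return self._opened

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


if __name__ == "__main__":
    for frame in iter_frames(FakeCapture(["frame0", "frame1"])):
        print(frame)
```

With OpenCV installed, the same loop can wrap cv2.VideoCapture(path), and cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) is then only called on frames that were actually read.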

opened by zwt233 0

Config folder is missing in MTC dataset

Hi, thank you for the great work and amazing codebase. I have a question regarding dataset preparation. Regarding line 215 in prepare_images.py: the config folder is missing from the MTC dataset, although the downloaded dataset has the same size (270 GB) as mentioned in the documentation. There is no mention of this config folder on the official website either. Any help is appreciated.

opened by Aratrik 1

Inaccurate results for profile faces

Hi,

I'm currently testing the head pose estimation on profile-view faces; however, I am getting unusual results. For test purposes, I'm using the following 3 profile-view images:

For these 3 images, I get the following results:
Yaw: [70.59546] Pitch: [34.594513] Roll: [33.04599]
Yaw: [81.640686] Pitch: [-8.637505] Roll: [-13.95977]
Yaw: [71.53331] Pitch: [10.675278] Roll: [13.468033]

I am mainly focused on the yaw angles. These images are clearly strict profile views; however, none of the detected yaw angles is close to 90 degrees.

If anyone has any ideas as to what might be causing this issue, it would be highly appreciated.

Thanks!
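
The gap between the reported yaws and a 90-degree profile can be quantified with a wrapped angular difference, which also stays meaningful near the ±180 degree boundary. This is a small sketch under the poster's assumption that a strict profile corresponds to a yaw of 90 degrees; the function name is hypothetical.

```python
def wrapped_abs_diff(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


if __name__ == "__main__":
    reported_yaws = [70.59546, 81.640686, 71.53331]
    for yaw in reported_yaws:
        print(yaw, round(wrapped_abs_diff(yaw, 90.0), 2))
```

For the three yaws above, the deviations come out at roughly 19.4, 8.4, and 18.5 degrees.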

opened by Mayur28 0

Error: This application failed to start because no Qt platform plugin could be initialized.

Hi, I am using a Docker container to run this model, but it throws an Aborted (core dumped) error.

qt.qpa.xcb: could not connect to display :1
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/usr/local/lib/python3.5/dist-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem

This is the command I am using: python3 demo_video.py --video off.mp4 --display 0 --output output.mp4

It seems the problem is with the --display argument.

It takes so much time on this step.

opened by khandriod 0
