ETH-XGaze baseline

Official implementation of the ETH-XGaze dataset baseline.

ETH-XGaze dataset

The ETH-XGaze dataset is a gaze estimation dataset consisting of over one million high-resolution images of varying gaze under extreme head poses. We establish a simple baseline on our ETH-XGaze dataset and other datasets. This repository includes the code and a pre-trained model. Please find more details about the dataset on our project page.

License

The code is released under the CC BY-NC-SA 4.0 license.

Requirements

  • Python 3.5
  • PyTorch 1.1.0, torchvision
  • opencv-python

For model training

  • h5py to load the training data
  • configparser

For testing

  • dlib for face and facial landmark detection (see the sketch below).
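
A minimal detection sketch with dlib, assuming the standard 68-point shape predictor file ('shape_predictor_68_face_landmarks.dat', downloaded separately) and a hypothetical input image name:

    import cv2
    import dlib
    import numpy as np

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

    image = cv2.imread('example/input/example.jpg')  # hypothetical file name
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    for face in detector(gray, 1):  # upsample once to catch smaller faces
        shape = predictor(gray, face)
        landmarks = np.array([[p.x, p.y] for p in shape.parts()])  # (68, 2)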

Training

  • You need to download the ETH-XGaze dataset for training. After downloading the data, make sure it is the pre-processed version with 224*224-pixel face patches, and put it under '\data\xgaze' (a minimal loading sketch follows this list).
  • Run python main.py to train the model.
  • The trained model will be saved under the 'ckpt' folder.
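
A minimal sketch of reading one subject's file from the pre-processed data with h5py; the file name and the keys 'face_patch' and 'face_gaze' are assumptions, so inspect your download first:

    import h5py

    with h5py.File('data/xgaze/train/subject0000.h5', 'r') as f:  # hypothetical file name
        print(list(f.keys()))            # inspect the available datasets
        face_patch = f['face_patch'][0]  # one 224x224x3 face image
        gaze = f['face_gaze'][0]         # (pitch, yaw) label in radians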

Test

The demo.py file shows how to perform gaze estimation on an input image. An example image is already in the 'example/input' folder.

  • First, you need to download the pre-trained model and put it under the 'ckpt' folder.
  • Then, run python demo.py for the test (a condensed inference sketch follows this list).
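
A condensed inference sketch; gaze_network, the checkpoint name, and the checkpoint key 'model_state' are taken to match demo.py but should be treated as assumptions, and the real demo also applies further input preprocessing:

    import cv2
    import torch
    from model import gaze_network  # the repository's model definition

    model = gaze_network()
    ckpt = torch.load('ckpt/epoch_24_ckpt.pth.tar', map_location='cpu')
    model.load_state_dict(ckpt['model_state'], strict=True)
    model.eval()  # freeze batch-norm statistics and disable dropout

    # the input is a normalized 224x224 face patch (see Data normalization)
    face = cv2.imread('example/input/face_patch.jpg')  # hypothetical file name
    x = torch.from_numpy(face.transpose(2, 0, 1)).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        pitchyaw = model(x).numpy()[0]  # predicted (pitch, yaw) in radians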

Data normalization

The 'normalization_example.py' script gives an example of data normalization, converting the raw dataset into normalized data.
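
A condensed sketch of the core warping step, following the procedure in "Revisiting Data Normalization for Appearance-Based Gaze Estimation"; the numeric defaults are illustrative, so check normalization_example.py for the exact values used here:

    import cv2
    import numpy as np

    def normalize_face(image, cam_matrix, hR, face_center,
                       focal_norm=960.0, distance_norm=600.0, roi=(224, 224)):
        # intrinsics of the virtual (normalized) camera
        cam_norm = np.array([[focal_norm, 0.0, roi[0] / 2],
                             [0.0, focal_norm, roi[1] / 2],
                             [0.0, 0.0, 1.0]])
        distance = np.linalg.norm(face_center)             # camera-to-face distance
        S = np.diag([1.0, 1.0, distance_norm / distance])  # scale to a fixed distance

        # rotation that looks at the face center and cancels roll,
        # built from the x-axis of the head rotation matrix hR
        forward = (face_center / distance).reshape(3)
        down = np.cross(forward, hR[:, 0])
        down /= np.linalg.norm(down)
        right = np.cross(down, forward)
        right /= np.linalg.norm(right)
        R = np.vstack([right, down, forward])

        # perspective warp from the real camera to the virtual one
        W = cam_norm @ S @ R @ np.linalg.inv(cam_matrix)
        return cv2.warpPerspective(image, W, roi), R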

Citation

If you use this code base and/or the ETH-XGaze dataset in your research, please cite the following publication:

@inproceedings{Zhang2020ETHXGaze,
  author    = {Xucong Zhang and Seonwook Park and Thabo Beeler and Derek Bradley and Siyu Tang and Otmar Hilliges},
  title     = {ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation},
  year      = {2020},
  booktitle = {European Conference on Computer Vision (ECCV)}
}

FAQ

Q: Where are the test set labels?
You can submit your test results to our leaderboard and get the evaluation. Please complete the registration first; otherwise, your request will be ignored. Link to the leaderboard.

Q: What is data normalization?
As we describe in our paper, data normalization is a method to crop the face/eye image without head rotation around the roll axis. Please refer to the following paper for details: Revisiting Data Normalization for Appearance-Based Gaze Estimation.

Q: Why convert the 3D gaze direction (vector) to a 2D gaze direction (pitch and yaw)? How to convert between 3D and 2D gaze directions?
Essentially, 2D pitch and yaw are enough to describe the gaze direction in the head coordinate system, and using 2D instead of 3D makes model training easier. The 'utils.py' file provides code examples for converting between them: pitchyaw_to_vector and vector_to_pitchyaw.
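
A sketch of the two conversions under one common sign convention (gaze pointing toward the camera along the negative z-axis); the exact convention in this repo's utils.py may differ:

    import numpy as np

    def pitchyaw_to_vector(pitchyaw):
        pitch, yaw = pitchyaw
        return np.array([-np.cos(pitch) * np.sin(yaw),
                         -np.sin(pitch),
                         -np.cos(pitch) * np.cos(yaw)])

    def vector_to_pitchyaw(v):
        v = v / np.linalg.norm(v)                    # unit gaze direction
        return np.array([np.arcsin(-v[1]),           # pitch
                         np.arctan2(-v[0], -v[2])])  # yaw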

Comments
  • Questions about the baseline structure

    Hi Xucong,

    Excellent work on ETH-XGaze! It really provides a diverse dataset for gaze estimation. I have a question regarding the baseline structure: why don't we compress the FC outputs through a tanh/sigmoid activation function to normalize the output a bit? Is there an intuition for using the raw outputs?

    Additionally, I suggest that in the demo code a model.eval() could be added before running the forward pass. https://github.com/xucong-zhang/ETH-XGaze/blob/ca2d991b8dea2b244f75dbb899c84afd15ed745c/demo.py#L158-L164

    Looking forward to your reply!

    Best, Yijun

    opened by Yijun88 9
  • Is there any way to get the eye patches given we have `xgaze_224` (cropped face version) and `annotations` (landmarks in original frames)?

    Hi everyone, I wonder whether there is any way to get the eye patches given we have

    • xgaze_224 (the cropped face version) and
    • annotations (landmarks in the original frames), because downloading the full-frame version is 7 TB, which will take a long time.

    Thanks, and I would appreciate it if anyone can help.

    opened by vuthede 6
  • Calibrating gaze vector to screen point?

    I'm trying to map the pred_gaze_np output to a 2D screen point. Is this something already implemented? If not, can you please help me with what approach should be followed? I tried a simple polynomial regression from the vector to screen points and the results are decent, but I'm wondering if there's a better approach.

    opened by ShreshthSaxena 5
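
    A minimal sketch of the polynomial-regression calibration described above, purely illustrative and not part of this repository: collect a few (pitch, yaw) predictions while the user looks at known on-screen targets, then fit the mapping.

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import PolynomialFeatures

        rng = np.random.default_rng(0)                  # placeholder calibration data
        gaze_samples = rng.normal(0.0, 0.2, (25, 2))    # (pitch, yaw) from the network
        screen_samples = rng.uniform(0, 1920, (25, 2))  # matching screen targets (px)

        reg = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
        reg.fit(gaze_samples, screen_samples)
        xy = reg.predict([[0.05, -0.10]])  # map a new prediction to pixels

    A geometric alternative is to intersect the 3D gaze ray with the screen plane, which requires the camera-to-screen extrinsic calibration.
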
  • XML files for camera parameters

    Hi,

    I noticed that in your code, camera parameters are required (see below for details). However, I did not find the folder or the files in your code. Could you please let me know where I can download these parameters/XML files?

    file_name = './calibration/cam_calibration/' + 'cam' + str(cam_id).zfill(2) + '.xml' (l.195 in normalization_example.py)

    Your help is very much appreciated.

    Thanks.

    opened by Frandre 4
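
    A sketch of reading such a file with OpenCV's FileStorage once you have it; the node names 'Camera_Matrix' and 'Distortion_Coefficients' follow OpenCV's standard calibration output and are an assumption here:

        import cv2

        fs = cv2.FileStorage('./calibration/cam_calibration/cam00.xml',
                             cv2.FILE_STORAGE_READ)
        camera_matrix = fs.getNode('Camera_Matrix').mat()
        distortion = fs.getNode('Distortion_Coefficients').mat()
        fs.release()
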
  • How to crop eyes in the pre-processed dataset

    Hello, I downloaded the 224x224 pre-processed dataset but found that only normalized face images are provided. Is there any way to extract cropped eyes? I need eye patches for my model. Thanks!

    opened by senfu 3
  • about the pre-trained model

    Hi Xucong,

    Excellent work on ETH-XGaze! I have now met an issue with the pre-trained model: I downloaded the model, but it can't be uncompressed. I don't know whether something is wrong with the source file. Looking forward to your reply!

    Best, LiuGang

    opened by ZERO-SPACE-X 3
  • pitch and yaw (raw outputs of the network) are not in HCS (head coordinate system)

    Hi, thanks for this great paper and dataset and also all of your previous valuable works in the field of appearance-based gaze estimation.

    I recently tried to use the raw output of the network, which is trained on the ETH-XGaze dataset, to estimate the PoG (Point of Gaze) in the CCS (Camera Coordinate System). So I used your normalization method and found the normalizing rotation matrix to transform the normalized gaze vector, which is in the HCS, into the 3D gaze vector in the CCS.

    But it seems that pitch and yaw are not in the HCS, because when everything is unchanged except the camera position, the network output changes. So if that is correct and pitch and yaw are not in the HCS, we need an extra step beyond the normalizing rotation matrix that compensates for head pose. But I can't find this step, and it is ambiguous to me.

    opened by ffletcherr 2
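
    A sketch of the inverse rotation step, assuming the normalized gaze vector was obtained as g_norm = R @ g_cam with the normalizing rotation R (rotation only, as in the data normalization paper); whether a further head-pose compensation is needed is exactly the question raised above:

        import numpy as np

        def denormalize_gaze(pitch, yaw, R):
            g_norm = np.array([-np.cos(pitch) * np.sin(yaw),
                               -np.sin(pitch),
                               -np.cos(pitch) * np.cos(yaw)])
            return R.T @ g_norm  # R is orthonormal, so R^-1 == R^T
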
  • Is gc the coordinate of the target or a calculated vector?

    gc_normalized = gc - face_center # gaze vector

    I am not sure whether gc is the coordinate of the target in the camera coordinate system or a vector already calculated in the head coordinate system. And why do this step? Please explain; thanks!

    opened by LazyKai 2
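
    An illustrative reading of that line, assuming gc is the 3D gaze target and face_center the 3D face center, both in camera coordinates; their difference is then the gaze direction before normalization:

        import numpy as np

        gc = np.array([120.0, -80.0, 10.0])           # hypothetical target position (mm)
        face_center = np.array([30.0, -20.0, 600.0])  # hypothetical face center (mm)
        gaze_vector = gc - face_center                # points from the face to the target
        gaze_vector /= np.linalg.norm(gaze_vector)    # unit gaze direction
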
  • Dataset structure

    Hi Xucong,

    Thank you so much for making this dataset and code available. I wanted to ask: is there any way we can get the structure of the dataset? That is to say, what each tar file contains, and so on, similar to what you did with MPIIFaceGaze. That was very helpful :) I was able to download the 448 dataset, but it's missing the json file for the train/test split, and the test set. You mentioned in https://github.com/xucong-zhang/ETH-XGaze/issues/9#issuecomment-795644070 that it's possible to get a needed missing file from the raw data; could you please clarify how?

    Many thanks :)

    opened by AbdouMechraoui 1
  • Few questions about the dataset (gaze, pose)

    Hello, I am trying to understand the data structure of the ETH-XGaze dataset. In 'OnePersonDataset', three values are returned when it is called: image, pose, and gaze. It seems that gaze is a combination of pitch and yaw in radians (please correct me if I am wrong). I am a little confused about what pose does during training. If the pose represents the direction the face is pointing, then how can the pose be defined with one number (unlike pitch and yaw)?

    opened by jasony93 0
  • Data download issues

    Hi, sorry to bother you. I am a student from Guangzhou University in China. I want to download the dataset for research and submitted the registration on the website according to the guidelines, but I haven't received the download link. My e-mail is [email protected]; could you please send me the download link? I have submitted the application again. Thanks for your great work and help!

    opened by Roylo-bot 0
  • What does the face_gaze mean in the annotation file?

    What does the face_gaze in the annotation file mean? It is not the left/right eye gaze or the mean of both eyes' gaze, right? Which 3D landmark do you use to calculate the face gaze?

    opened by LovePug-XC 0
  • How are the rvec and tvec calculated? Are they computed by solvePnP, with face_model_3d_coordinates and ldmk68s from the csv as parameters?

    I computed rvec and tvec with the function call below, but the result does not correspond to the rvec and tvec from the label csv.

    _, rvec, tvec = cv2.solvePnP(face_model_3d_coordinates, ldmk50, camera_matrix, distortion_coefficients, rvec, tvec, useExtrinsicGuess=True, flags=cv2.SOLVEPNP_ITERATIVE)

    opened by exploreTiny 0
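
    A generic, self-contained PnP sketch with placeholder data (not necessarily how the label csv was produced): an EPnP solve followed by an iterative refinement seeded with the first estimate, as in the call quoted above.

        import cv2
        import numpy as np

        rng = np.random.default_rng(0)
        face_model_3d = rng.uniform(-80, 80, (6, 3))  # placeholder 3D face model (mm)
        landmarks_2d = rng.uniform(0, 640, (6, 2))    # placeholder 2D detections (px)
        camera_matrix = np.array([[960.0, 0.0, 320.0],
                                  [0.0, 960.0, 240.0],
                                  [0.0, 0.0, 1.0]])

        ok, rvec, tvec = cv2.solvePnP(face_model_3d, landmarks_2d, camera_matrix,
                                      None, flags=cv2.SOLVEPNP_EPNP)
        ok, rvec, tvec = cv2.solvePnP(face_model_3d, landmarks_2d, camera_matrix,
                                      None, rvec, tvec, useExtrinsicGuess=True,
                                      flags=cv2.SOLVEPNP_ITERATIVE)
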
  • Data download request

    Hello, I am a student from Tsinghua University in China, and I want to download the data for research. I submitted the registration form but haven't received the download link. Could you please send me a download link? My email is [email protected]; the other information is in the registration form. Sorry to bother you; I am just in a bit of a hurry. Thanks a lot!

    opened by zdw-qingdao 1
  • cam_id or solvePnPRansac issue

    Hello, I ran into this problem:

    In demo.py, if I use HeadPoseEstimator() instead of estimateHeadPose(), this error occurs:

    --> 29     hr, ht, o_l, o_r, _ = head_pose_estimator(image, landmarks, camera_matrix[cam_id])
         30     ## the easy way to get head pose information, fast and simple
         31 #     facePts = face_model.reshape(6, 1, 3)
    
    NameError: name 'cam_id' is not defined
    

    If I replace camera_matrix[cam_id] with camera_matrix, this problem occurs:

    ---------------------------------------------------------------------------
    error                                     Traceback (most recent call last)
    <ipython-input-13-a4e70c1ea3f5> in <module>
         27     landmarks = landmarks.reshape(-1, 2)
         28     head_pose_estimator = HeadPoseEstimator()
    ---> 29     hr, ht, o_l, o_r, _ = head_pose_estimator(image, landmarks, camera_matrix)
         30     ## the easy way to get head pose information, fast and simple
         31 #     facePts = face_model.reshape(6, 1, 3)
    
    ~/ETH-XGaze/head_pose.py in __call__(self, frame, landmarks, intrinsics, target_io_dist, visualize)
        136         # Do PnP-based head pose fitting
        137         rvec, tvec, reprojected_points, o_l, o_r, face_model = \
    --> 138             self.head_pose_fit(landmarks, eos_mesh, intrinsics, scaling_factor)
        139         o_r_2D = cv.projectPoints(o_r, rvec, tvec, intrinsics, None)[0].reshape(2)
        140         o_l_2D = cv.projectPoints(o_l, rvec, tvec, intrinsics, None)[0].reshape(2)
    
    ~/ETH-XGaze/head_pose.py in head_pose_fit(self, landmarks_2D, deformed_mesh, intrinsics, scaling_factor)
         95         # Initial fit
         96         camera_matrix = intrinsics
    ---> 97         success, rvec, tvec, inliers = cv.solvePnPRansac(
         98             sfm_points_ibug_subset, landmarks_2D, camera_matrix, None,
         99             flags=cv.SOLVEPNP_EPNP)
    
    error: OpenCV(4.5.2) /tmp/pip-req-build-dccdjyga/opencv/modules/calib3d/src/solvepnp.cpp:241: error: (-215:Assertion failed) npoints >= 4 && npoints == std::max(ipoints.checkVector(2, CV_32F), ipoints.checkVector(2, CV_64F)) in function 'solvePnPRansac'
    

    Could you please help?

    opened by neeek2303 0