ETH-XGaze baseline

Official implementation of the ETH-XGaze dataset baseline.

ETH-XGaze dataset

The ETH-XGaze dataset is a gaze estimation dataset consisting of over one million high-resolution images of varying gaze under extreme head poses. We establish a simple baseline on our ETH-XGaze dataset and other datasets. This repository includes the code and a pre-trained model. Please find more details about the dataset on our project page.

License

The code is released under the CC BY-NC-SA 4.0 license.

Requirements

  • Python 3.5
  • PyTorch 1.1.0, torchvision
  • opencv-python

For model training

  • h5py to load the training data
  • configparser

For testing

  • dlib for face and facial landmark detection (see the sketch below).
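
A minimal detection sketch with dlib, assuming the standard 68-point shape predictor file ('shape_predictor_68_face_landmarks.dat', downloaded separately) and a hypothetical input image name:

    import cv2
    import dlib
    import numpy as np

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

    image = cv2.imread('example/input/example.jpg')  # hypothetical file name
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    for face in detector(gray, 1):  # upsample once to catch smaller faces
        shape = predictor(gray, face)
        landmarks = np.array([[p.x, p.y] for p in shape.parts()])  # (68, 2)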

Training

  • You need to download the ETH-XGaze dataset for training. After downloading the data, make sure it is the pre-processed version with 224*224-pixel face patches, and put it under '\data\xgaze' (a minimal loading sketch follows this list).
  • Run python main.py to train the model.
  • The trained model will be saved under the 'ckpt' folder.
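
A minimal sketch of reading one subject's file from the pre-processed data with h5py; the file name and the keys 'face_patch' and 'face_gaze' are assumptions, so inspect your download first:

    import h5py

    with h5py.File('data/xgaze/train/subject0000.h5', 'r') as f:  # hypothetical file name
        print(list(f.keys()))            # inspect the available datasets
        face_patch = f['face_patch'][0]  # one 224x224x3 face image
        gaze = f['face_gaze'][0]         # (pitch, yaw) label in radians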

Test

The demo.py file shows how to perform gaze estimation on an input image. An example image is already in the 'example/input' folder.

  • First, you need to download the pre-trained model and put it under the 'ckpt' folder.
  • Then, run python demo.py for the test (a condensed inference sketch follows this list).
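
A condensed inference sketch; gaze_network, the checkpoint name, and the checkpoint key 'model_state' are taken to match demo.py but should be treated as assumptions, and the real demo also applies further input preprocessing:

    import cv2
    import torch
    from model import gaze_network  # the repository's model definition

    model = gaze_network()
    ckpt = torch.load('ckpt/epoch_24_ckpt.pth.tar', map_location='cpu')
    model.load_state_dict(ckpt['model_state'], strict=True)
    model.eval()  # freeze batch-norm statistics and disable dropout

    # the input is a normalized 224x224 face patch (see Data normalization)
    face = cv2.imread('example/input/face_patch.jpg')  # hypothetical file name
    x = torch.from_numpy(face.transpose(2, 0, 1)).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        pitchyaw = model(x).numpy()[0]  # predicted (pitch, yaw) in radians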

Data normalization

The 'normalization_example.py' script gives an example of data normalization, converting the raw dataset into normalized data.
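
A condensed sketch of the core warping step, following the procedure in "Revisiting Data Normalization for Appearance-Based Gaze Estimation"; the numeric defaults are illustrative, so check normalization_example.py for the exact values used here:

    import cv2
    import numpy as np

    def normalize_face(image, cam_matrix, hR, face_center,
                       focal_norm=960.0, distance_norm=600.0, roi=(224, 224)):
        # intrinsics of the virtual (normalized) camera
        cam_norm = np.array([[focal_norm, 0.0, roi[0] / 2],
                             [0.0, focal_norm, roi[1] / 2],
                             [0.0, 0.0, 1.0]])
        distance = np.linalg.norm(face_center)             # camera-to-face distance
        S = np.diag([1.0, 1.0, distance_norm / distance])  # scale to a fixed distance

        # rotation that looks at the face center and cancels roll,
        # built from the x-axis of the head rotation matrix hR
        forward = (face_center / distance).reshape(3)
        down = np.cross(forward, hR[:, 0])
        down /= np.linalg.norm(down)
        right = np.cross(down, forward)
        right /= np.linalg.norm(right)
        R = np.vstack([right, down, forward])

        # perspective warp from the real camera to the virtual one
        W = cam_norm @ S @ R @ np.linalg.inv(cam_matrix)
        return cv2.warpPerspective(image, W, roi), R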

Citation

If you use this code base and/or the ETH-XGaze dataset in your research, please cite the following publication:

@inproceedings{Zhang2020ETHXGaze,
  author    = {Xucong Zhang and Seonwook Park and Thabo Beeler and Derek Bradley and Siyu Tang and Otmar Hilliges},
  title     = {ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation},
  year      = {2020},
  booktitle = {European Conference on Computer Vision (ECCV)}
}

FAQ

Q: Where are the test set labels?
You can submit your test results to our leaderboard and get the evaluation. Please complete the registration first; otherwise, your request will be ignored. Link to the leaderboard.

Q: What is data normalization?
As we describe in our paper, data normalization is a method to crop the face/eye image without head rotation around the roll axis. Please refer to the following paper for details: Revisiting Data Normalization for Appearance-Based Gaze Estimation.

Q: Why convert the 3D gaze direction (vector) to a 2D gaze direction (pitch and yaw)? How to convert between 3D and 2D gaze directions?
Essentially, 2D pitch and yaw are enough to describe the gaze direction in the head coordinate system, and using 2D instead of 3D makes model training easier. The 'utils.py' file provides code examples for converting between them: pitchyaw_to_vector and vector_to_pitchyaw.
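
A sketch of the two conversions under one common sign convention (gaze pointing toward the camera along the negative z-axis); the exact convention in this repo's utils.py may differ:

    import numpy as np

    def pitchyaw_to_vector(pitchyaw):
        pitch, yaw = pitchyaw
        return np.array([-np.cos(pitch) * np.sin(yaw),
                         -np.sin(pitch),
                         -np.cos(pitch) * np.cos(yaw)])

    def vector_to_pitchyaw(v):
        v = v / np.linalg.norm(v)                    # unit gaze direction
        return np.array([np.arcsin(-v[1]),           # pitch
                         np.arctan2(-v[0], -v[2])])  # yaw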

Comments
  • Questions about the baseline structure

    Hi Xucong,

    Excellent work on ETH-XGaze! It really provides a diverse dataset for gaze estimation. I have a question regarding the baseline structure: why don't we compress the FC outputs through a tanh/sigmoid activation function to normalize the output a bit? Is there an intuition for using the raw outputs?

    Additionally, I suggest that in the demo code a model.eval() could be added before running the forward pass. https://github.com/xucong-zhang/ETH-XGaze/blob/ca2d991b8dea2b244f75dbb899c84afd15ed745c/demo.py#L158-L164

    Looking forward to your reply!

    Best, Yijun

    opened by Yijun88 9
  • Is there any way to get the eye patches given we have `xgaze_224` (cropped face version) and `annotations` (landmarks in original frames)?

    Hi everyone, I wonder whether there is any way to get the eye patches given we have

    • xgaze_224 (the cropped face version) and
    • annotations (landmarks in the original frames), because downloading the full-frame version is 7 TB, which will take a long time.

    Thanks, and I would appreciate it if anyone can help.

    opened by vuthede 6
  • Calibrating gaze vector to screen point?

    I'm trying to map the pred_gaze_np output to a 2D screen point. Is this something already implemented? If not, can you please help me with what approach should be followed? I tried a simple polynomial regression from the vector to screen points and the results are decent, but I'm wondering if there's a better approach.

    opened by ShreshthSaxena 5
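
    A minimal sketch of the polynomial-regression calibration described above, purely illustrative and not part of this repository: collect a few (pitch, yaw) predictions while the user looks at known on-screen targets, then fit the mapping.

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import PolynomialFeatures

        rng = np.random.default_rng(0)                  # placeholder calibration data
        gaze_samples = rng.normal(0.0, 0.2, (25, 2))    # (pitch, yaw) from the network
        screen_samples = rng.uniform(0, 1920, (25, 2))  # matching screen targets (px)

        reg = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
        reg.fit(gaze_samples, screen_samples)
        xy = reg.predict([[0.05, -0.10]])  # map a new prediction to pixels

    A geometric alternative is to intersect the 3D gaze ray with the screen plane, which requires the camera-to-screen extrinsic calibration.
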
  • XML files for camera parameters

    Hi,

    I noticed that in your code, camera parameters are required (see below for details). However, I did not find the folder or the files in your code. Could you please let me know where I can download these parameters/XML files?

    file_name = './calibration/cam_calibration/' + 'cam' + str(cam_id).zfill(2) + '.xml' (l.195 in normalization_example.py)

    Your help is very much appreciated.

    Thanks.

    opened by Frandre 4
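
    A sketch of reading such a file with OpenCV's FileStorage once you have it; the node names 'Camera_Matrix' and 'Distortion_Coefficients' follow OpenCV's standard calibration output and are an assumption here:

        import cv2

        fs = cv2.FileStorage('./calibration/cam_calibration/cam00.xml',
                             cv2.FILE_STORAGE_READ)
        camera_matrix = fs.getNode('Camera_Matrix').mat()
        distortion = fs.getNode('Distortion_Coefficients').mat()
        fs.release()
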
  • How to crop eyes in the pre-processed dataset

    Hello, I downloaded the 224x224 pre-processed dataset but found that only normalized face images are provided. Is there any way to extract cropped eyes? I need eye patches for my model. Thanks!

    opened by senfu 3
  • about the pre-trained model

    Hi Xucong,

    Excellent work on ETH-XGaze! I have now met an issue with the pre-trained model: I downloaded the model, but it can't be uncompressed. I don't know whether something is wrong with the source file. Looking forward to your reply!

    Best, LiuGang

    opened by ZERO-SPACE-X 3
  • pitch and yaw (raw outputs of the network) are not in HCS (head coordinate system)

    Hi, thanks for this great paper and dataset and also all of your previous valuable works in the field of appearance-based gaze estimation.

    I recently tried to use the raw output of the network, which is trained on the ETH-XGaze dataset, to estimate the PoG (Point of Gaze) in the CCS (Camera Coordinate System). So I used your normalization method and found the normalizing rotation matrix to transform the normalized gaze vector, which is in the HCS, into the 3D gaze vector in the CCS.

    But it seems that pitch and yaw are not in the HCS, because when everything is unchanged except the camera position, the network output changes. So if that is correct and pitch and yaw are not in the HCS, we need an extra step beyond the normalizing rotation matrix that compensates for head pose. But I can't find this step, and it is ambiguous to me.

    opened by ffletcherr 2
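
    A sketch of the inverse rotation step, assuming the normalized gaze vector was obtained as g_norm = R @ g_cam with the normalizing rotation R (rotation only, as in the data normalization paper); whether a further head-pose compensation is needed is exactly the question raised above:

        import numpy as np

        def denormalize_gaze(pitch, yaw, R):
            g_norm = np.array([-np.cos(pitch) * np.sin(yaw),
                               -np.sin(pitch),
                               -np.cos(pitch) * np.cos(yaw)])
            return R.T @ g_norm  # R is orthonormal, so R^-1 == R^T
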
  • Is gc the coordinate of the target or a calculated vector?

    gc_normalized = gc - face_center # gaze vector

    I am not sure whether gc is the coordinate of the target in the camera coordinate system or a vector already calculated in the head coordinate system. And why do this step? Please explain; thanks!

    opened by LazyKai 2
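
    An illustrative reading of that line, assuming gc is the 3D gaze target and face_center the 3D face center, both in camera coordinates; their difference is then the gaze direction before normalization:

        import numpy as np

        gc = np.array([120.0, -80.0, 10.0])           # hypothetical target position (mm)
        face_center = np.array([30.0, -20.0, 600.0])  # hypothetical face center (mm)
        gaze_vector = gc - face_center                # points from the face to the target
        gaze_vector /= np.linalg.norm(gaze_vector)    # unit gaze direction
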
  • Dataset structure

    Hi Xucong,

    Thank you so much for making this dataset and code available. I wanted to ask: is there any way we can get the structure of the dataset? That is to say, what each tar file contains, and so on, similar to what you did with MPIIFaceGaze. That was very helpful :) I was able to download the 448 dataset, but it's missing the json file for the train/test split, and the test set. You mentioned in https://github.com/xucong-zhang/ETH-XGaze/issues/9#issuecomment-795644070 that it's possible to get a needed missing file from the raw data; could you please clarify how?

    Many thanks :)

    opened by AbdouMechraoui 1
  • Few questions about the dataset (gaze, pose)

    Hello, I am trying to understand the data structure of the ETH-XGaze dataset. In 'OnePersonDataset', three values are returned when it is called: image, pose, and gaze. It seems that gaze is a combination of pitch and yaw in radians (please correct me if I am wrong). I am a little confused about what pose does during training. If the pose represents the direction the face is pointing, then how can the pose be defined with one number (unlike pitch and yaw)?

    opened by jasony93 0
  • Data download issues

    Hi, sorry to bother you. I am a student from Guangzhou University in China. I want to download the dataset for research and submitted the registration on the website according to the guidelines, but I haven't received the download link. My e-mail is [email protected]; could you please send me the download link? I have submitted the application again. Thanks for your great work and help!

    opened by Roylo-bot 0
  • What does the face_gaze mean in the annotation file?

    What does the face_gaze in the annotation file mean? It is not the left/right eye gaze or the mean of both eyes' gaze, right? Which 3D landmark do you use to calculate the face gaze?

    opened by LovePug-XC 0
  • How are the rvec and tvec calculated? Are they computed by solvePnP, with face_model_3d_coordinates and ldmk68s from the csv as parameters?

    I computed rvec and tvec with the function call below, but the result does not correspond to the rvec and tvec from the label csv.

    _, rvec, tvec = cv2.solvePnP(face_model_3d_coordinates, ldmk50, camera_matrix, distortion_coefficients, rvec, tvec, useExtrinsicGuess=True, flags=cv2.SOLVEPNP_ITERATIVE)

    opened by exploreTiny 0
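
    A generic, self-contained PnP sketch with placeholder data (not necessarily how the label csv was produced): an EPnP solve followed by an iterative refinement seeded with the first estimate, as in the call quoted above.

        import cv2
        import numpy as np

        rng = np.random.default_rng(0)
        face_model_3d = rng.uniform(-80, 80, (6, 3))  # placeholder 3D face model (mm)
        landmarks_2d = rng.uniform(0, 640, (6, 2))    # placeholder 2D detections (px)
        camera_matrix = np.array([[960.0, 0.0, 320.0],
                                  [0.0, 960.0, 240.0],
                                  [0.0, 0.0, 1.0]])

        ok, rvec, tvec = cv2.solvePnP(face_model_3d, landmarks_2d, camera_matrix,
                                      None, flags=cv2.SOLVEPNP_EPNP)
        ok, rvec, tvec = cv2.solvePnP(face_model_3d, landmarks_2d, camera_matrix,
                                      None, rvec, tvec, useExtrinsicGuess=True,
                                      flags=cv2.SOLVEPNP_ITERATIVE)
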
  • Data download request

    Hello, I am a student from Tsinghua University in China, and I want to download the data for research. I submitted the registration form but haven't received the download link. Could you please send me a download link? My email is [email protected]; the other information is in the registration form. Sorry to bother you; I am just in a bit of a hurry. Thanks a lot!

    opened by zdw-qingdao 1
  • cam_id or solvePnPRansac issue

    Hello, I ran into this problem:

    In demo.py, if I use HeadPoseEstimator() instead of estimateHeadPose(), this error occurs:

    --> 29     hr, ht, o_l, o_r, _ = head_pose_estimator(image, landmarks, camera_matrix[cam_id])
         30     ## the easy way to get head pose information, fast and simple
         31 #     facePts = face_model.reshape(6, 1, 3)
    
    NameError: name 'cam_id' is not defined
    

    If I replace camera_matrix[cam_id] with camera_matrix, this problem occurs:

    ---------------------------------------------------------------------------
    error                                     Traceback (most recent call last)
    <ipython-input-13-a4e70c1ea3f5> in <module>
         27     landmarks = landmarks.reshape(-1, 2)
         28     head_pose_estimator = HeadPoseEstimator()
    ---> 29     hr, ht, o_l, o_r, _ = head_pose_estimator(image, landmarks, camera_matrix)
         30     ## the easy way to get head pose information, fast and simple
         31 #     facePts = face_model.reshape(6, 1, 3)
    
    ~/ETH-XGaze/head_pose.py in __call__(self, frame, landmarks, intrinsics, target_io_dist, visualize)
        136         # Do PnP-based head pose fitting
        137         rvec, tvec, reprojected_points, o_l, o_r, face_model = \
    --> 138             self.head_pose_fit(landmarks, eos_mesh, intrinsics, scaling_factor)
        139         o_r_2D = cv.projectPoints(o_r, rvec, tvec, intrinsics, None)[0].reshape(2)
        140         o_l_2D = cv.projectPoints(o_l, rvec, tvec, intrinsics, None)[0].reshape(2)
    
    ~/ETH-XGaze/head_pose.py in head_pose_fit(self, landmarks_2D, deformed_mesh, intrinsics, scaling_factor)
         95         # Initial fit
         96         camera_matrix = intrinsics
    ---> 97         success, rvec, tvec, inliers = cv.solvePnPRansac(
         98             sfm_points_ibug_subset, landmarks_2D, camera_matrix, None,
         99             flags=cv.SOLVEPNP_EPNP)
    
    error: OpenCV(4.5.2) /tmp/pip-req-build-dccdjyga/opencv/modules/calib3d/src/solvepnp.cpp:241: error: (-215:Assertion failed) npoints >= 4 && npoints == std::max(ipoints.checkVector(2, CV_32F), ipoints.checkVector(2, CV_64F)) in function 'solvePnPRansac'
    

    Could you please help?

    opened by neeek2303 0