The code and models for "Gaze Estimation using Transformer".

Overview

GazeTR

We provide the code of GazeTR-Hybrid in "Gaze Estimation using Transformer".

We recommend using the data-processing codes provided in GazeHub. You can directly run this method's code using the processed datasets.

Requirements

We built the project with PyTorch 1.7.0.

Learning-rate warmup is used, following the implementation linked here.
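The exact schedule is not spelled out in this README; as a minimal, non-authoritative sketch (the warmup_steps value is a hypothetical placeholder), a linear warmup can be written with PyTorch's LambdaLR:

import torch

def linear_warmup(optimizer, warmup_steps):
    # Linearly scale the learning rate from near zero up to its base value
    # over the first `warmup_steps` optimizer steps, then hold it constant.
    def lr_lambda(step):
        return min(1.0, float(step + 1) / float(warmup_steps))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# usage sketch:
# optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
# scheduler = linear_warmup(optimizer, warmup_steps=1000)
# call scheduler.step() after each optimizer.step()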

Usage

Directly use our code.

You should perform three steps to run our code.

  1. Prepare the data using our provided data processing codes.

  2. Modify the config/train/config_xx.yaml and config/test/config_xx.yaml.

  3. Run the commands.

To perform leave-one-person-out evaluation, you can run

python trainer/leave.py -s config/train/config_xx.yaml -p 0

Note that this command only performs training with person 0 held out. You should modify the -p parameter and repeat the command for each person; see the loop sketched below.
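For example, assuming a 15-person dataset such as MPIIFaceGaze (persons 0 through 14), the full evaluation could be scripted as a shell loop:

for p in $(seq 0 14); do python trainer/leave.py -s config/train/config_xx.yaml -p $p; done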

To perform training-test evaluation, you can run

python trainer/total.py -s config/train/config_xx.yaml    

To test your model, you can run

python trainer/leave.py -s config/train/config_xx.yaml -t config/test/config_xx.yaml -p 0

or

python trainer/total.py -s config/train/config_xx.yaml -t config/test/config_xx.yaml

Build your own project.

You can import the model in model.py for your own project.

We give an example below. Note that line 114 in model.py uses .cuda(); you should remove it if you run the model on CPU.

import torch
from model import Model

GazeTR = Model()

# a dummy batch of 10 face images, 3 x 224 x 224
img = torch.ones(10, 3, 224, 224).cuda()
img = {'face': img}
# dummy 2D gaze labels
label = torch.ones(10, 2).cuda()

# for training: pass images and labels to get the loss
loss = GazeTR(img, label)

# for testing: pass images only to get the predicted gaze
gaze = GazeTR(img)
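The model outputs 2D gaze angles. For evaluation, gaze estimation work typically converts these angles into a 3D direction vector and reports the angular error in degrees against the ground truth. Below is a minimal sketch of that computation, assuming the (pitch, yaw)-to-vector convention commonly used with GazeHub-processed labels; conventions vary by dataset, as the comments below discuss.

import numpy as np

def gaze_to_3d(gaze):
    # (pitch, yaw) in radians -> unit 3D gaze direction vector
    pitch, yaw = gaze
    return np.array([-np.cos(pitch) * np.sin(yaw),
                     -np.sin(pitch),
                     -np.cos(pitch) * np.cos(yaw)])

def angular_error(pred, gt):
    # angle in degrees between predicted and ground-truth gaze vectors
    cos = np.dot(pred, gt) / (np.linalg.norm(pred) * np.linalg.norm(gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))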

Pre-trained model

You can download it from Google Drive or Baidu Cloud Disk with the code 1234.

This model was pre-trained on the ETH-XGaze dataset for 50 epochs with a batch size of 512.
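To use the checkpoint, load it into the model before inference. A minimal sketch, assuming the download is a plain state_dict ('GazeTR-H-ETH.pt' is a hypothetical file name; the actual name and saved format may differ):

import torch
from model import Model

GazeTR = Model()
state_dict = torch.load('GazeTR-H-ETH.pt', map_location='cpu')
GazeTR.load_state_dict(state_dict)
GazeTR.eval()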

Performance

[Figures: Comparison A and Comparison B — performance comparisons with other gaze estimation methods, with links to their codes]

License

The code is released under the CC BY-NC-SA 4.0 license.

Contact

Please email any questions or comments to [email protected].

Comments
  • Reproducibility problem on MPIIFaceGaze

    Hi @yihuacheng, I trained your model on MPIIFaceGaze. I haven't made any changes to the training or dataset pre-processing scripts. I performed the leave-one-person-out evaluation on this dataset as described in your paper, using PyTorch 1.7.0. I got the following best angular errors per person:

    Person | Best error
    ------ | ----------
    0      | 2.37
    1      | 4.36
    2      | 4.41
    3      | 4.49
    4      | 3.05
    5      | 3.79
    6      | 3.07
    7      | 4.34
    8      | 4.44
    9      | 4.15
    10     | 5.89
    11     | 5.42
    12     | 4.09
    13     | 3.71
    14     | 6.23
    Mean   | 4.254

    The mean of these best angular errors comes out to 4.254, which is far from the reported 4.00. Please let me know if I am missing something here, and help me reproduce the reported results.

    opened by vikrant7 2
  • How to implement this model on the RT-Gene dataset?

    Thank you for your great work! I trained your model on RT-Gene, but the result is far from the paper's: the angular error is about 13 degrees. I used the face image as input to get the gaze angles directly. I would like to know how you trained on RT-Gene in your experiments.

    opened by swc1204 0
  • Can you share the pre-trained pure ViT model using ETH-XGaze?

    Hi, yihuacheng. I don't have the resources to train the pure ViT model on ETH-XGaze, but I want to know its error on the ETH-XGaze dataset. Can you share the pre-trained pure ViT model on ETH-XGaze? Thank you very much!

    opened by Rao2000 0
  • Conversion from CCS to SCS?

    Hi, Yihua, thanks for the great work on appearance-based gaze estimation. I have gone through the review paper and the codes in GazeHub. It seems that the way of acquiring Rs and Ts from CCS to SCS is not mentioned. It would be appreciated if you could elaborate a bit more on that or provide a reference link or paper. Thanks in advance.

    opened by nonlinearHuman 1
  • Question about MPII data processing

    Hi,

    I have some questions about the MPII dataset processing and the use of MPII data in the GazeTR model.

    In GazeTR reader.py, you define the decode function for MPII like this:

    def Decode_MPII(line):
        anno = edict()
        anno.face, anno.lefteye, anno.righteye = line[0], line[1], line[2]
        anno.name = line[3]
    
        anno.gaze3d, anno.head3d = line[5], line[6]
        anno.gaze2d, anno.head2d = line[7], line[8]
        return anno
    

    And in the data_processing_mpii.py file you provided, you process and write the annotations into the format like this:

    outfile.write("Face Left Right Grid Origin whicheye 2DPoint HeadRot HeadTrans ratio FaceCorner LeftEyeCorner RightEyeCorner\n")
    

    If you indeed used the same code to process the MPII dataset for training the GazeTR model, then this is not right. You can see that anno.gaze2d = line[7], which actually corresponds to HeadRot.

    Could you please give some explanation about how to correctly use the data_processing code and how to load the data in GazeTR?

    Thank you very much and best regards

    opened by ShijianXu 0
  • pitch yaw & gaze3d

    I think many gaze estimation works misunderstand these terms.

    1. Yaw/pitch is different from spherical coordinates, so your conversion function is wrong.
    2. Most datasets' gaze3d labels are in the camera coordinate system, so you can't just transform your output to 3D and then calculate the arccos.
    opened by brianw0924 3
OpenFace – a state-of-the-art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

OpenFace 2.2.0: a facial behavior analysis toolkit Over the past few years, there has been an increased interest in automatic facial behavior analysis

Tadas Baltrusaitis 5.8k Dec 31, 2022
Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

Noise Contrastive Estimation for pyTorch Overview This repository contains a re-implementation of the Noise Contrastive Estimation algorithm, implemen

Denis Emelin 42 Nov 24, 2022
Implementation of gaze tracking and demo

Predicting Customer Demand by Using Gaze Detecting and Object Tracking This project is the integration of gaze detecting and object tracking. Predict

null 2 Oct 20, 2022
Shitty gaze mouse controller

demo.mp4 shitty_gaze_mouse_cotroller install tensofflow, cv2 run the main.py and as it starts it will collect data so first raise your left eyebrow(bo

null 16 Aug 30, 2022
Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

t5-japanese Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts. The following is a list of models that

Kimio Kuramitsu 1 Dec 13, 2021
Monocular Depth Estimation - Weighted-average prediction from multiple pre-trained depth estimation models

merged_depth runs (1) AdaBins, (2) DiverseDepth, (3) MiDaS, (4) SGDepth, and (5) Monodepth2, and calculates a weighted-average per-pixel absolute dept

Pranav 39 Nov 21, 2022
VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VSR-Transformer By Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool This paper proposes a new Transformer for video super-resolution (called VSR-Transf

Jiezhang Cao 225 Nov 13, 2022
This repository contains codes of ICCV2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

SO-Pose This repository contains codes of ICCV2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation This paper is basically an

shangbuhuan 52 Nov 25, 2022
The codes for the work "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"

Swin-Unet The codes for the work "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"(https://arxiv.org/abs/2105.05537). A validatio

null 869 Jan 7, 2023
Light-weight network, depth estimation, knowledge distillation, real-time depth estimation, auxiliary data.

light-weight-depth-estimation Boosting Light-Weight Depth Estimation Via Knowledge Distillation, https://arxiv.org/abs/2105.06143 Junjie Hu, Chenyou F

Junjie Hu 13 Dec 10, 2022
A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''.

P-tuning A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''. How to use our code We have released the code

THUDM 562 Dec 27, 2022
Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction".

GNN_PPI Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction". Lear

Ursa Zrimsek 2 Dec 14, 2022
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

Swin Transformer 1.4k Dec 30, 2022
Collision risk estimation using stochastic motion models

collision_risk_estimation Collision risk estimation using stochastic motion models. This is a new approach, based on stochastic models, to predict the

Unmesh 7 Jun 26, 2022
The official codes of "Semi-supervised Models are Strong Unsupervised Domain Adaptation Learners".

SSL models are Strong UDA learners Introduction This is the official code of paper "Semi-supervised Models are Strong Unsupervised Domain Adaptation L

Yabin Zhang 26 Dec 26, 2022
The source codes for ACL 2021 paper 'BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data'

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data This repository provides the implementation details for

null 124 Dec 27, 2022