The code and models for "Gaze Estimation using Transformer".

Overview

GazeTR

We provide the code of GazeTR-Hybrid in "Gaze Estimation using Transformer".

We recommend using the data-processing codes provided in GazeHub. You can directly run this method's code using the processed datasets.

Requirements

We built the project with PyTorch 1.7.0.

Learning-rate warmup is used, following the implementation linked here.
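The exact schedule is not spelled out in this README; as a minimal, non-authoritative sketch (the warmup_steps value is a hypothetical placeholder), a linear warmup can be written with PyTorch's LambdaLR:

import torch

def linear_warmup(optimizer, warmup_steps):
    # Linearly scale the learning rate from near zero up to its base value
    # over the first `warmup_steps` optimizer steps, then hold it constant.
    def lr_lambda(step):
        return min(1.0, float(step + 1) / float(warmup_steps))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# usage sketch:
# optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
# scheduler = linear_warmup(optimizer, warmup_steps=1000)
# call scheduler.step() after each optimizer.step()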

Usage

Directly use our code.

You should perform three steps to run our code.

  1. Prepare the data using our provided data processing codes.

  2. Modify the config/train/config_xx.yaml and config/test/config_xx.yaml.

  3. Run the commands.

To perform leave-one-person-out evaluation, you can run

python trainer/leave.py -s config/train/config_xx.yaml -p 0

Note that this command only performs training with person 0 held out. You should modify the -p parameter and repeat the command for each person; see the loop sketched below.
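For example, assuming a 15-person dataset such as MPIIFaceGaze (persons 0 through 14), the full evaluation could be scripted as a shell loop:

for p in $(seq 0 14); do python trainer/leave.py -s config/train/config_xx.yaml -p $p; done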

To perform training-test evaluation, you can run

python trainer/total.py -s config/train/config_xx.yaml    

To test your model, you can run

python trainer/leave.py -s config/train/config_xx.yaml -t config/test/config_xx.yaml -p 0

or

python trainer/total.py -s config/train/config_xx.yaml -t config/test/config_xx.yaml

Build your own project.

You can import the model in model.py for your own project.

We give an example below. Note that line 114 in model.py uses .cuda(); you should remove it if you run the model on CPU.

import torch
from model import Model

GazeTR = Model()

# a dummy batch of 10 face images, 3 x 224 x 224
img = torch.ones(10, 3, 224, 224).cuda()
img = {'face': img}
# dummy 2D gaze labels
label = torch.ones(10, 2).cuda()

# for training: pass images and labels to get the loss
loss = GazeTR(img, label)

# for testing: pass images only to get the predicted gaze
gaze = GazeTR(img)
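The model outputs 2D gaze angles. For evaluation, gaze estimation work typically converts these angles into a 3D direction vector and reports the angular error in degrees against the ground truth. Below is a minimal sketch of that computation, assuming the (pitch, yaw)-to-vector convention commonly used with GazeHub-processed labels; conventions vary by dataset, as the comments below discuss.

import numpy as np

def gaze_to_3d(gaze):
    # (pitch, yaw) in radians -> unit 3D gaze direction vector
    pitch, yaw = gaze
    return np.array([-np.cos(pitch) * np.sin(yaw),
                     -np.sin(pitch),
                     -np.cos(pitch) * np.cos(yaw)])

def angular_error(pred, gt):
    # angle in degrees between predicted and ground-truth gaze vectors
    cos = np.dot(pred, gt) / (np.linalg.norm(pred) * np.linalg.norm(gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))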

Pre-trained model

You can download it from Google Drive or Baidu Cloud Disk with the code 1234.

This model was pre-trained on the ETH-XGaze dataset for 50 epochs with a batch size of 512.
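To use the checkpoint, load it into the model before inference. A minimal sketch, assuming the download is a plain state_dict ('GazeTR-H-ETH.pt' is a hypothetical file name; the actual name and saved format may differ):

import torch
from model import Model

GazeTR = Model()
state_dict = torch.load('GazeTR-H-ETH.pt', map_location='cpu')
GazeTR.load_state_dict(state_dict)
GazeTR.eval()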

Performance

[Figures: Comparison A and Comparison B — performance comparisons with other gaze estimation methods, with links to their codes]

License

The code is released under the CC BY-NC-SA 4.0 license.

Contact

Please email any questions or comments to [email protected].

Comments
  • Reproducibility problem on MPIIFaceGaze

    Hi @yihuacheng, I trained your model on MPIIFaceGaze. I haven't made any changes to the training or dataset pre-processing scripts. I performed the leave-one-person-out evaluation on this dataset as described in your paper, using PyTorch 1.7.0. I got the following best angular errors per person:

    Person | Best error
    ------ | ----------
    0      | 2.37
    1      | 4.36
    2      | 4.41
    3      | 4.49
    4      | 3.05
    5      | 3.79
    6      | 3.07
    7      | 4.34
    8      | 4.44
    9      | 4.15
    10     | 5.89
    11     | 5.42
    12     | 4.09
    13     | 3.71
    14     | 6.23
    Mean   | 4.254

    The mean of these best angular errors comes out to 4.254, which is far from the reported 4.00. Please let me know if I am missing something here, and help me reproduce the reported results.

    opened by vikrant7 2
  • How to implement this model on the RT-Gene dataset?

    Thank you for your great work! I trained your model on RT-Gene, but the result is far from the paper's: the angular error is about 13 degrees. I used the face image as input to get the gaze angles directly. I would like to know how you trained on RT-Gene in your experiments.

    opened by swc1204 0
  • Can you share the pre-trained pure ViT model using ETH-XGaze?

    Hi, yihuacheng. I don't have the resources to train the pure ViT model on ETH-XGaze, but I want to know its error on the ETH-XGaze dataset. Can you share the pre-trained pure ViT model on ETH-XGaze? Thank you very much!

    opened by Rao2000 0
  • Conversion from CCS to SCS?

    Hi, Yihua, thanks for the great work on appearance-based gaze estimation. I have gone through the review paper and the codes in GazeHub. It seems that the way of acquiring Rs and Ts from CCS to SCS is not mentioned. It would be appreciated if you could elaborate a bit more on that or provide a reference link or paper. Thanks in advance.

    opened by nonlinearHuman 1
  • Question about MPII data processing

    Hi,

    I have some questions about the MPII dataset processing and the use of MPII data in the GazeTR model.

    In GazeTR reader.py, you define the decode function for MPII like this:

    def Decode_MPII(line):
        anno = edict()
        anno.face, anno.lefteye, anno.righteye = line[0], line[1], line[2]
        anno.name = line[3]
    
        anno.gaze3d, anno.head3d = line[5], line[6]
        anno.gaze2d, anno.head2d = line[7], line[8]
        return anno
    

    And in the data_processing_mpii.py file you provided, you process and write the annotations into the format like this:

    outfile.write("Face Left Right Grid Origin whicheye 2DPoint HeadRot HeadTrans ratio FaceCorner LeftEyeCorner RightEyeCorner\n")
    

    If you indeed used the same code to process the MPII dataset for training the GazeTR model, then this is not right. You can see that anno.gaze2d = line[7], which actually corresponds to HeadRot.

    Could you please give some explanation about how to correctly use the data_processing code and how to load the data in GazeTR?

    Thank you very much and best regards

    opened by ShijianXu 0
  • pitch yaw & gaze3d

    I think many gaze estimation works misunderstand these terms.

    1. Yaw/pitch is different from spherical coordinates, so your conversion function is wrong.
    2. Most datasets' gaze3d labels are in the camera coordinate system, so you can't just transform your output to 3D and then calculate the arccos.
    opened by brianw0924 3
OpenFace – a state-of-the-art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

OpenFace 2.2.0: a facial behavior analysis toolkit Over the past few years, there has been an increased interest in automatic facial behavior analysis

Tadas Baltrusaitis 5.8k Dec 31, 2022
Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

Noise Contrastive Estimation for pyTorch Overview This repository contains a re-implementation of the Noise Contrastive Estimation algorithm, implemen

Denis Emelin 42 Nov 24, 2022
Implementation of gaze tracking and demo

Predicting Customer Demand by Using Gaze Detecting and Object Tracking This project is the integration of gaze detecting and object tracking. Predict

null 2 Oct 20, 2022
Shitty gaze mouse controller

demo.mp4 shitty_gaze_mouse_cotroller install tensofflow, cv2 run the main.py and as it starts it will collect data so first raise your left eyebrow(bo

null 16 Aug 30, 2022
Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

t5-japanese Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts. The following is a list of models that

Kimio Kuramitsu 1 Dec 13, 2021
Monocular Depth Estimation - Weighted-average prediction from multiple pre-trained depth estimation models

merged_depth runs (1) AdaBins, (2) DiverseDepth, (3) MiDaS, (4) SGDepth, and (5) Monodepth2, and calculates a weighted-average per-pixel absolute dept

Pranav 39 Nov 21, 2022
VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VSR-Transformer By Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool This paper proposes a new Transformer for video super-resolution (called VSR-Transf

Jiezhang Cao 225 Nov 13, 2022
This repository contains codes of ICCV2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

SO-Pose This repository contains codes of ICCV2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation This paper is basically an

shangbuhuan 52 Nov 25, 2022
The codes for the work "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"

Swin-Unet The codes for the work "Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation"(https://arxiv.org/abs/2105.05537). A validatio

null 869 Jan 7, 2023
Light-weight network, depth estimation, knowledge distillation, real-time depth estimation, auxiliary data.

light-weight-depth-estimation Boosting Light-Weight Depth Estimation Via Knowledge Distillation, https://arxiv.org/abs/2105.06143 Junjie Hu, Chenyou F

Junjie Hu 13 Dec 10, 2022
A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''.

P-tuning A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''. How to use our code We have released the code

THUDM 562 Dec 27, 2022
Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction".

GNN_PPI Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction". Lear

Ursa Zrimsek 2 Dec 14, 2022
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

Swin Transformer 1.4k Dec 30, 2022
Collision risk estimation using stochastic motion models

collision_risk_estimation Collision risk estimation using stochastic motion models. This is a new approach, based on stochastic models, to predict the

Unmesh 7 Jun 26, 2022
The official codes of "Semi-supervised Models are Strong Unsupervised Domain Adaptation Learners".

SSL models are Strong UDA learners Introduction This is the official code of paper "Semi-supervised Models are Strong Unsupervised Domain Adaptation L

Yabin Zhang 26 Dec 26, 2022
The source codes for ACL 2021 paper 'BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data'

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data This repository provides the implementation details for

null 124 Dec 27, 2022