Unified Gesture Recognition and Fingertip Detection
A unified convolutional neural network (CNN) algorithm that performs hand gesture recognition and fingertip detection simultaneously. The proposed algorithm uses a single network to predict both the finger class probabilities for classification and the fingertip positions for regression in one evaluation. The gesture is recognized from the finger class probabilities, and the fingertips are localized using both outputs. Instead of directly regressing the fingertip positions from the fully connected (FC) layer of the CNN, we regress an ensemble of fingertip positions from a fully convolutional network (FCN) and then take the ensemble average to obtain the final fingertip positions.
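The ensemble-average step can be illustrated with a minimal NumPy sketch. It assumes, hypothetically, that the FCN head outputs an array of shape (H, W, 10) in which every spatial cell holds one candidate set of five (x, y) fingertip coordinates normalized to [0, 1]; averaging over the H × W candidates gives the final positional output. The output shape, the coordinate ordering, and the name `ensemble_average` are illustrative assumptions, not the repository's code.

```python
import numpy as np

NUM_FINGERS = 5  # assumed order: thumb, index, middle, ring, pinky


def ensemble_average(fcn_output: np.ndarray) -> np.ndarray:
    """Collapse an ensemble of fingertip predictions into one estimate.

    fcn_output: array of shape (H, W, 2 * NUM_FINGERS) (assumed layout).
    Each of the H * W spatial cells is treated as one ensemble member
    predicting (x1, y1, ..., x5, y5).
    Returns an array of shape (NUM_FINGERS, 2) holding averaged (x, y) pairs.
    """
    if fcn_output.shape[-1] != 2 * NUM_FINGERS:
        raise ValueError("last axis must hold 2 * NUM_FINGERS coordinates")
    # Flatten the spatial grid so that each row is one ensemble member.
    members = fcn_output.reshape(-1, 2 * NUM_FINGERS)
    # Ensemble average over all members, then regroup into (x, y) pairs.
    return members.mean(axis=0).reshape(NUM_FINGERS, 2)


# Example: a 4 x 4 grid of noisy candidates around one set of fingertips.
rng = np.random.default_rng(0)
true_tips = np.array([[0.2, 0.6], [0.35, 0.2], [0.5, 0.15], [0.65, 0.2], [0.8, 0.35]])
noise = rng.normal(scale=0.02, size=(4, 4, 2 * NUM_FINGERS))
fcn_output = true_tips.reshape(1, 1, -1) + noise
print(ensemble_average(fcn_output).round(2))
```

Averaging many per-cell estimates smooths out the error of any single cell, which is the motivation the text gives for regressing an ensemble rather than a single FC output.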
Update
Robust real-time hand detection using yolo has been added to the first stage of the detection system for smoother performance, and most of the code has been cleaned up and restructured for ease of use. Previous versions are available in the release section.
Requirements
- TensorFlow-GPU==2.2.0
- OpenCV==4.2.0
- ImgAug==0.2.6
- Weights: Download the pre-trained weights files of the unified gesture recognition and fingertip detection model and put the weights folder in the working directory.
The weights folder contains three weights files. fingertip.h5 is for unified gesture recognition and fingertip detection, while yolo.h5 and solo.h5 are for the yolo (you only look once) and solo (single object localization) methods of hand detection.
Paper
To get more information about the proposed method and experiments, please go through the paper. Cite the paper as:
@article{alam2021unified,
  title     = {Unified learning approach for egocentric hand gesture recognition and fingertip detection},
  author    = {Alam, Mohammad Mahmudul and Islam, Mohammad Tariqul and Rahman, SM Mahbubur},
  journal   = {Pattern Recognition},
  volume    = {121},
  pages     = {108200},
  year      = {2021},
  publisher = {Elsevier},
}
Dataset
The proposed gesture recognition and fingertip detection model is trained on the Scut-Ego-Gesture Dataset, which contains eleven different single-hand gesture datasets. Eight of these eleven are used for experimentation. A detailed explanation of the dataset partition, along with the list of images used in the training, validation, and test sets, is provided in the dataset/ folder.
Network Architecture
To implement the algorithm, the following network architecture is proposed where a single CNN is utilized for both hand gesture recognition and fingertip detection.
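How the network's two outputs are combined can be sketched as follows. The sketch assumes, hypothetically, a length-five vector of per-finger probabilities from the classification branch and an already averaged (5, 2) fingertip array from the regression branch. Fingers whose probability exceeds a threshold (0.5 here, an assumed value) are treated as visible; the resulting binary pattern identifies the gesture, and only the fingertips of visible fingers are reported. The finger order, the threshold, and the name `decode` are illustrative assumptions, and mapping the pattern to one of the eight gesture names is left to a lookup table that is not shown.

```python
import numpy as np

FINGER_NAMES = ("thumb", "index", "middle", "ring", "pinky")  # assumed order


def decode(finger_probs, fingertips, threshold=0.5):
    """Combine the classification and regression outputs (hypothetical helper).

    finger_probs: shape (5,), probability that each finger is visible.
    fingertips:   shape (5, 2), averaged (x, y) position for each finger.
    Returns (pattern, tips): pattern is a tuple of 0/1 flags that can be
    looked up in a gesture table; tips maps each visible finger to (x, y).
    """
    finger_probs = np.asarray(finger_probs, dtype=float)
    fingertips = np.asarray(fingertips, dtype=float)
    visible = finger_probs > threshold
    pattern = tuple(int(v) for v in visible)
    # Report positions only for fingers the classifier considers visible.
    tips = {
        name: tuple(float(c) for c in fingertips[i])
        for i, name in enumerate(FINGER_NAMES)
        if visible[i]
    }
    return pattern, tips


probs = [0.1, 0.93, 0.88, 0.2, 0.05]
tips = [[0.2, 0.6], [0.35, 0.2], [0.5, 0.15], [0.65, 0.2], [0.8, 0.35]]
pattern, visible_tips = decode(probs, tips)
print(pattern)       # (0, 1, 1, 0, 0)
print(visible_tips)  # {'index': (0.35, 0.2), 'middle': (0.5, 0.15)}
```

Using the classification output as a gate means the regression branch can always emit five positions, while positions for hidden fingers are simply discarded.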
Prediction
To get the prediction on a single image, run the predict.py file. It runs the prediction on the sample image stored in the data/ folder. Here is the output for the sample.jpg image.
Real-Time!
To run in real time, clone the repository, download the weights files, and then run the real-time.py file.
directory > python real-time.py
In real-time execution, there are two stages. In the first stage, the hand is detected using either the you only look once (yolo) or the single object localization (solo) algorithm. By default, yolo is used. The detected hand region is then cropped and fed to the second stage for gesture recognition and fingertip detection.
Output
Here is the output of the unified gesture recognition and fingertip detection model for all 8 classes of the dataset, where each fingertip is detected and each finger is also classified.