Shuwa Gesture Toolkit is a framework that detects and classifies arbitrary gestures in short videos

Google

Last update: Dec 22, 2022


Overview

Shuwa Gesture Toolkit

Shuwa (手話) is Japanese for "Sign Language"

Shuwa Gesture Toolkit is a framework that detects and classifies arbitrary gestures in short videos. It is particularly useful for recognizing basic words in sign language. We collected thousands of example videos of people signing Japanese Sign Language (JSL) and Hong Kong Sign Language (HKSL) to train the baseline model for recognizing gestures and facial expressions.

The Shuwa Gesture Toolkit also allows you to train new gestures, so it can be trained to recognize any sign from any sign language in the world.

[Web Demo]

How it works

By combining pose, face, and hand detector results over multiple frames, we can capture most of what sign language understanding requires: body movement, facial movement, and hand gestures. We then use DD-Net as a recognizer to predict sign features, represented as an 832-dimensional vector. Finally, we use K-Nearest Neighbor classification to output the class prediction.
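
As a rough illustration of the first step, the sketch below concatenates per-frame pose, face, and hand keypoints and resamples a clip to a fixed number of frames. The function names, the flattened-coordinate layout, and the nearest-frame resampling are assumptions made for illustration; the toolkit's actual preprocessing may differ.

```python
from typing import List, Sequence

# Assumption: each detector returns a flat list of coordinates for one frame.
Keypoints = List[float]


def combine_frame(pose: Keypoints, face: Keypoints, hands: Keypoints) -> Keypoints:
    """Concatenate the three detector outputs for a single frame."""
    return list(pose) + list(face) + list(hands)


def resample_frames(frames: Sequence[Keypoints], target_len: int) -> List[Keypoints]:
    """Pick target_len frames at evenly spaced positions (nearest-frame sampling)."""
    if not frames:
        raise ValueError("need at least one frame")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    if target_len == 1:
        return [list(frames[0])]
    last = len(frames) - 1
    # int(x + 0.5) rounds the fractional position to the nearest frame index.
    indices = [int(i * last / (target_len - 1) + 0.5) for i in range(target_len)]
    return [list(frames[i]) for i in indices]


# Toy clip: 5 frames, each with 2 pose, 2 face, and 2 hand values.
video = [combine_frame([0.1 * t, 0.2], [0.3, 0.4], [0.5, 0.6]) for t in range(5)]
clip = resample_frames(video, 3)  # fixed-length input for a recognizer
```

Resampling to a fixed length lets clips of different durations share one input shape.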

All related models are listed below.

PoseNet: Pose detector model.
FaceMesh: Face keypoints detector model.
HandLandmarks: Hand keypoints detector model.
DD-Net: Skeleton-based action recognition model.
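
The final classification step can be sketched as a plain k-nearest-neighbor vote over stored feature vectors. This is a minimal, standard-library sketch, not the toolkit's implementation: `knn_predict`, the Euclidean metric, `k = 3`, and the toy 4-dimensional vectors (standing in for the 832-dimensional DD-Net features) are all assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, sign label)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query: Sequence[float], examples: Sequence[Example], k: int = 3) -> str:
    """Return the majority label among the k examples closest to query."""
    if not examples:
        raise ValueError("need at least one labelled example")
    nearest = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
    # Counter keeps first-seen order, so a tie goes to the closer label.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy feature vectors; real ones would come from the recognizer.
examples: List[Example] = [
    ([0.0, 0.0, 0.0, 0.0], "hello"),
    ([0.1, 0.0, 0.0, 0.0], "hello"),
    ([1.0, 1.0, 1.0, 1.0], "thanks"),
    ([0.9, 1.0, 1.0, 1.0], "thanks"),
]
prediction = knn_predict([0.05, 0.0, 0.0, 0.1], examples, k=3)
```

In a setup like this, adding a new sign only requires appending labelled vectors to `examples`; the feature extractor itself is not retrained.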

Installation

For macOS users:
Install Python 3.7 from the official python.org site for tkinter support.
Install dependencies:
```
pip3 install -r requirements.txt 
```
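
Since the demo relies on tkinter, it can help to confirm that the interpreter you installed can import it before running anything. The helper names below are hypothetical and not part of the toolkit.

```python
import sys


def tkinter_available() -> bool:
    """Return True if this interpreter can import tkinter."""
    try:
        import tkinter  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


def environment_report() -> dict:
    """Summarize the interpreter version and tkinter support."""
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "tkinter_available": tkinter_available(),
    }


if __name__ == "__main__":
    print(environment_report())
```

If `tkinter_available` is reported as false, reinstalling Python from python.org, as suggested above, is the first thing to try.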

Run Python Demo

python3 webcam_demo_knn.py

Use Record mode to add more signs.
Play mode.

Run Detector demo

You can try each detector individually by using these scripts.

FaceMesh

python3 face_landmark/webcam_demo_face.py

PoseNet

python3 posenet/webcam_demo_pose.py

HandLandmarks

python3 hand_landmark/webcam_demo_hand.py
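
Directory separators differ between Windows and POSIX shells, so a small launcher that builds the script paths with `pathlib` avoids typing them by hand. This is only a sketch: `DETECTOR_DEMOS` and `demo_command` are hypothetical names, and it assumes you run it from the repository root.

```python
import sys
from pathlib import Path
from typing import List

# Script locations taken from the commands above; pathlib picks the
# right separator for the current operating system.
DETECTOR_DEMOS = {
    "face": Path("face_landmark") / "webcam_demo_face.py",
    "pose": Path("posenet") / "webcam_demo_pose.py",
    "hand": Path("hand_landmark") / "webcam_demo_hand.py",
}


def demo_command(name: str) -> List[str]:
    """Build the argument list that launches one detector demo."""
    if name not in DETECTOR_DEMOS:
        raise KeyError("unknown detector demo: " + name)
    # sys.executable reuses the interpreter that is running this launcher.
    return [sys.executable, str(DETECTOR_DEMOS[name])]
```

To start a demo, pass the result to `subprocess.run`, for example `subprocess.run(demo_command("hand"), check=True)`.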

Deploy on the Web using Tensorflow.js

Instructions here

Train classifier from scratch

You can add a custom sign by using Record mode in the full demo program.
If you want to train the classifier from scratch instead, you can check out the process here.


Comments

Training script issue
Hi, thank you for sharing this code!

We would like to ask for help with an error that we encountered while running the training script on our own data, which contains only 10 words. In the Jupyter notebook, model.fit_generator() produces the following error:

InvalidArgumentError: Input to reshape is a tensor with 10 values, but the requested shape has 1 [[node Reshape (defined at /Users/abc/opt/anaconda3/envs/githubshuwa/lib/python3.7/site-packages/tensorflow_addons/losses/triplet.py:257) ]] [Op:__inference_train_function_13545]

We used the following tool versions:

Python 3.7

Tensorflow 2.6

We also changed the batch size from 32 to 1 because the demo web app expects the model input to have a batch dimension of 1.

We would like to ask:

What versions of Python and TensorFlow should we use?

Are there other possible causes for the error that we encountered?

How can we train with a batch size greater than 1 without affecting the input shape that the model is expected to have?

Thank you very much in advance!
opened by ARVRTest 3
Readme

Thank you for this interesting software. What version of Python should I use? 3.8 fails. 3.6 works, but is it best? Also, python3 hand_landmark\webcam_demo_hand.py fails on Linux. This needs a forward slash, not a backslash. Also, the crop_utils module can't be found. Does something need to be added to PYTHONPATH? Any other hints would be appreciated. BTW, I can't type input to record on extract_knn_features.py. Thanks again. I'm on Pop!_OS 20.04.

opened by MikeyBeez 3
The prediction of the js version of the pose model is different from the python version.

I ran inference with the JS version of the PoseNet model, and its predictions are worse than those of its Python counterpart. Is there a reason for the performance degradation between the Python and JS PoseNet models? @bit-kim

opened by arulpraveent 0
Web Demo Issue

Hi! We noticed that the output layer of the model was updated. The new architecture is now incompatible with the web demo application.

signClassify.js expects to get an array of 2 elements from classifyModel.predict(..), but the new model only returns 1 element. Please refer to Line 103 or 454 of ./web_demo/public/js/ML/signClassify.js

Does it mean the web demo will also be updated?

Thank you!!!

opened by ARVRTest 0
