Sign-to-Speech for Sign Language Understanding: A case study of Nigerian Sign Language

Last update: Oct 23, 2022

Overview

This repository contains the code, model, and deployment configs for the paper "Sign-to-Speech for Sign Language Understanding: A case study of Nigerian Sign Language", which was presented at the NeurIPS 2021 workshop on Machine Learning for the Developing World (ML4D).

Dataset

Our dataset is a novel Nigerian Sign Language dataset of 5,000 images covering 137 sign words and phrases, including the letters of the alphabet. The data was collected by more than 20 individuals, including a TV sign language broadcaster and students and teachers from two special education schools in Nigeria. The dataset is not publicly available at this time.

Model configs and code

To run the deployed model

Clone the repository and install the dependencies with pip install -r requirements.
On Linux, text-to-speech (TTS) engines may not be pre-installed. Install them with:

  sudo apt-get update && sudo apt-get install espeak ffmpeg libespeak1
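The README does not say which TTS library the scripts use. The espeak and libespeak1 packages suggest an eSpeak-based engine such as pyttsx3, whose Linux driver loads the eSpeak shared library. The sketch below assumes pyttsx3. It checks whether that library can be found, then speaks a meaning aloud or saves it to a file. The helper names are hypothetical.

```python
import ctypes.util
from typing import Optional


def espeak_library_found() -> bool:
    """Return True if the eSpeak shared library (installed by libespeak1) can be located."""
    return ctypes.util.find_library("espeak") is not None


def speak(text: str, out_path: Optional[str] = None) -> None:
    """Speak `text` through the speakers, or save it to `out_path` if one is given."""
    # Assumption: pyttsx3 is the TTS library; the README does not name it.
    import pyttsx3  # third-party; imported lazily so the check above needs no extra packages

    engine = pyttsx3.init()
    if out_path is None:
        engine.say(text)
    else:
        engine.save_to_file(text, out_path)
    engine.runAndWait()  # blocks until the queued speech (or file) is finished
```

For example, speak("thank you") plays the phrase, while speak("thank you", "thank_you.wav") writes it to a file instead. The filename here is only illustrative.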

From the project's root directory, start the DeepStack server with the custom model by running:

  sudo docker run -v ~/path/to/project_folder/deployed_model:/modelstore/detection -p 88:5000 deepquestai/deepstack
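The -p 88:5000 flag publishes DeepStack's internal port 5000 on port 88 of the host. A model placed in the mounted /modelstore/detection folder is served at /v1/vision/custom/<model name>, where the name is the model file's name without its extension. The following is a minimal client sketch that assumes the requests package is installed. The model name nsl_model is a hypothetical placeholder, because the README does not name the file in deployed_model.

```python
from typing import Any, Dict, List


def custom_model_url(port: int = 88, model_name: str = "nsl_model", host: str = "localhost") -> str:
    """Build the DeepStack endpoint for a custom model.

    DeepStack names the endpoint after the model file in /modelstore/detection
    (without its extension); "nsl_model" is a hypothetical placeholder.
    """
    return f"http://{host}:{port}/v1/vision/custom/{model_name}"


def detect_signs(image_path: str, port: int = 88, model_name: str = "nsl_model") -> List[Dict[str, Any]]:
    """Send an image to the DeepStack server and return its list of predictions.

    Each prediction is a dict with "label", "confidence", "x_min", "y_min", "x_max" and "y_max".
    """
    import requests  # third-party; imported lazily so custom_model_url has no dependencies

    with open(image_path, "rb") as image_file:
        response = requests.post(
            custom_model_url(port, model_name), files={"image": image_file}, timeout=30
        )
    response.raise_for_status()
    result = response.json()
    if not result.get("success", False):
        raise RuntimeError(f"DeepStack request failed: {result.get('error', 'unknown error')}")
    return result.get("predictions", [])
```

With the container running, detect_signs("hello.jpg") returns DeepStack's predictions for that image. The filename is hypothetical.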

- Detect sign language meanings in image files and generate realistic speech for the detected words.

Run the image_detection script on an image:

  python image_detection.py image_filename.file_extension

The default port is 88, which matches the -p 88:5000 mapping above. If the DeepStack server is published on a different port, run this instead:

  python image_detection.py image_filename.file_extension --deepstack-port port_number
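Both scripts accept a --deepstack-port option that defaults to 88. The following is a hypothetical reconstruction of the image script's command-line interface using argparse. It is not the repository's actual code.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI mirroring the documented usage of image_detection.py."""
    parser = argparse.ArgumentParser(
        description="Detect Nigerian Sign Language signs in an image and speak their meaning."
    )
    parser.add_argument("image", help="path to the input image, e.g. image_filename.jpg")
    parser.add_argument(
        "--deepstack-port",
        type=int,
        default=88,  # the default stated in the README
        help="port the DeepStack server is published on (default: 88)",
    )
    # argparse exposes "--deepstack-port" as the attribute "deepstack_port"
    return parser.parse_args(argv)
```

Calling parse_args(["hello.jpg", "--deepstack-port", "5000"]) yields a namespace whose deepstack_port is the integer 5000. Leaving the option out falls back to 88.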

Running the command above creates two new files in the project's root directory:

a copy of the image with a bounding box around the detected sign and its meaning written above the box;
an audio file of the detected sign's meaning.
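DeepStack reports each detection with a label, a confidence score, and the box corners x_min, y_min, x_max and y_max. The sketch below picks the most confident detection and computes where to draw its box and the caption above it. The 0.5 confidence threshold and 20-pixel caption height are illustrative assumptions, not values from the repository.

```python
from typing import Any, Dict, List, Optional, Tuple

Prediction = Dict[str, Any]
Box = Tuple[int, int, int, int]


def best_prediction(predictions: List[Prediction], min_confidence: float = 0.5) -> Optional[Prediction]:
    """Return the most confident prediction at or above min_confidence, or None."""
    confident = [p for p in predictions if p["confidence"] >= min_confidence]
    return max(confident, key=lambda p: p["confidence"], default=None)


def box_and_caption(pred: Prediction, text_height: int = 20) -> Tuple[Box, Tuple[int, int]]:
    """Return the box corners and the top-left point of the caption above the box.

    If the box touches the top edge of the image, the caption is clamped to y = 0.
    """
    box = (pred["x_min"], pred["y_min"], pred["x_max"], pred["y_max"])
    caption_xy = (pred["x_min"], max(pred["y_min"] - text_height, 0))
    return box, caption_xy
```

With Pillow, the annotation could then be drawn using ImageDraw.Draw(image).rectangle(box, outline="red", width=3) and .text(caption_xy, label, fill="red"). The detected label would also be passed to the TTS engine to produce the audio file.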

- Detect sign language meanings in live video (via webcam).

Run the live-feed detection script:

  python livefeed_detection.py

The default port is 88. If the DeepStack server is published on a different port, run this instead:

  python livefeed_detection.py --deepstack-port port_number

This starts the webcam and automatically detects any sign language words in view of the camera. Each sign's meaning is displayed, and its spoken equivalent plays immediately through the computer's audio output. Press **q** to quit the live video.
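The live mode repeats detection on every webcam frame. One way to structure that loop is sketched below. Detection, speech and the quit check are passed in as functions, so the logic can be tested without a camera. Speaking only when the detected meaning changes is an assumption of this sketch, meant to avoid repeating a sign held in view across many frames. It is not documented behavior of livefeed_detection.py.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

Frame = TypeVar("Frame")


def run_live_feed(
    frames: Iterable[Frame],
    detect: Callable[[Frame], Optional[str]],
    speak: Callable[[str], None],
    quit_requested: Callable[[], bool],
) -> List[str]:
    """Detect a sign in each frame and speak each newly detected meaning.

    `detect` returns the meaning of the most confident sign in the frame, or None.
    Returns the meanings that were spoken, in order.
    """
    spoken: List[str] = []
    previous: Optional[str] = None
    for frame in frames:
        meaning = detect(frame)
        # Assumption: announce a meaning only when it differs from the previous
        # frame's, so a sign held steady is not repeated on every frame.
        if meaning is not None and meaning != previous:
            speak(meaning)
            spoken.append(meaning)
        previous = meaning
        if quit_requested():  # e.g. the user pressed "q"
            break
    return spoken
```

In practice, frames could come from OpenCV's cv2.VideoCapture(0), and the quit check could be lambda: cv2.waitKey(1) & 0xFF == ord("q"). A blocking speak call pauses the loop while it talks, which is one reason to keep repeated announcements to a minimum.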

Citation

Coming soon!
