African language Speech Recognition - Speech-to-Text

Last update: Jan 5, 2023

Related tags

Deep Learning python machine-learning tensorflow keras deeplearning speech-to-text cml dvc mlops mlflow experiment-analysis audio-and-text-processing model-comparison-and-selection

Swahili-Speech-To-Text

Table of Contents

Swahili-Speech-To-Text
- Overview
- Scenario
- Approach
- Project Structure
  - data:
  - models:
  - notebooks:
  - scripts
  - tests:
  - logs:
  - root folder
- Installation guide

Overview

This repository is used for the week 4 challenge of 10 Academy. The instructions for this project can be found in the challenge document.

Scenario

The World Food Program wants to deploy an intelligent form that collects nutritional information about food bought and sold at markets in two African countries, Ethiopia and Kenya. The design requires selected people to install an app on their mobile phones; whenever they buy food, they activate the app by voice and register the list of items they just bought in their own language. The intelligent systems in the app are expected to transcribe the speech to text live and to organize the information in a database in an easy-to-process way.

You work for the Tenacious data science consultancy, which has been chosen to deliver speech-to-text technology for Swahili. Your responsibility is to build a deep learning model that is capable of transcribing speech to text. The model you produce should be accurate and robust against background noise.

Approach

The project is divided into the following phases:

- Data pre-processing
- Modelling using deep learning
- Serving predictions on a web interface
- Interpretation & Reporting

Project Structure

The repository contains a number of files, including Python scripts, Jupyter notebooks, PDFs and text files. Here is their structure with a brief explanation.

data:

the folder where the dataset csv files are stored

models:

the folder where models' pickle files are stored

notebooks:

EDA.ipynb: a Jupyter notebook for exploratory data analysis
Meta-data Generation.ipynb: a Jupyter notebook for extracting the metadata from the transcription and audio files
Audio preprocessing.ipynb: a Jupyter notebook for preprocessing the audio data
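
As a rough illustration of what extracting metadata from an audio file can involve, the sketch below reads a PCM WAV file with Python's standard `wave` module and returns its sample rate, channel count and duration. The function name `wav_metadata` and the returned keys are hypothetical; the notebook itself may use different libraries and fields.

```python
import wave
from pathlib import Path


def wav_metadata(path):
    """Return basic metadata for one PCM WAV file (hypothetical helper)."""
    path = Path(path)
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "file": path.name,
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            # Duration in seconds = number of frames / frames per second
            "duration_s": frames / rate,
        }
```

Calling `wav_metadata` on every file in the dataset and collecting the dictionaries gives one metadata row per recording, which can then be joined with the transcriptions.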

scripts

app_logger.py: a Python script for logging
file_handler.py: a Python script for reading and writing CSV, pickle and other files
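
A file-handling helper of this kind could look roughly like the sketch below. It is not the contents of file_handler.py: the function names are hypothetical, and it uses only the standard library, whereas the actual script may rely on other packages.

```python
import csv
import pickle
from pathlib import Path


def write_csv(rows, path, fieldnames):
    """Write a list of dicts to a CSV file with a header row."""
    with Path(path).open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv(path):
    """Read a CSV file into a list of dicts (all values are strings)."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def save_pickle(obj, path):
    """Serialize any picklable object to disk."""
    with Path(path).open("wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Load a pickled object. Only unpickle files you trust."""
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

Keeping all reads and writes behind a few functions like these means notebooks and scripts share one place where paths and encodings are handled.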

tests:

the folder containing unit tests for components in the scripts

logs:

the folder containing log files (if it doesn't exist it will be created once logging starts)
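
The "created once logging starts" behaviour can be sketched as follows. This is a minimal example under stated assumptions, not the code in app_logger.py: the function name `get_logger`, the `logs` default and the log format are all hypothetical.

```python
import logging
from pathlib import Path


def get_logger(name, log_dir="logs"):
    """Return a logger that writes to <log_dir>/<name>.log (hypothetical helper)."""
    log_path = Path(log_dir)
    # Create the folder on first use; do nothing if it already exists.
    log_path.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path / f"{name}.log", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A script would then call something like `get_logger("file_handler").info("loaded metadata")`, and the folder appears the first time any logger is requested.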

root folder

10 Academy Batch 4 - Week 3 Challenge.pdf: the challenge document
requirements.txt: a text file listing the project's dependencies
setup.py: a configuration file for installing the scripts as a package
README.md: Markdown text with a brief explanation of the project and the repository structure.

Installation guide

git clone https://github.com/10-Academy-Batch-4-Week-4/Swahili-Speech-To-Text
cd Swahili-Speech-To-Text
pip install -r requirements.txt

Comments

DVC initialization

There are a few alternative ways to construct a GDrive remote URL for different uses, such as a folder or subfolder in the root, shared folders not owned by your account, etc.

Further reading: https://dvc.org/doc/user-guide/setup-google-drive-remote#using-a-custom-google-cloud-project-recommended

Thank you

opened by heavye 1
meta data created

Creating the data frame failed because of a path problem. The error reads: "No such file or directory: '/content/drive/MyDrive/Week-4/speech_data/ALFFA_PUBLIC/ASR/SWAHILI/data/train/wav/SWH-05-20101106/SWH-05-20101107_16k-emission_swahili_05h30_-_06h00_tu_20101107_part1.wav'"
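
One way to surface this kind of problem before building the data frame is to check which of the referenced audio paths actually exist. The helper below is a hypothetical sketch, not code from the repository. In the reported path the folder is named SWH-05-20101106 while the file name starts with SWH-05-20101107; whether that mismatch is the cause is an assumption that the report does not confirm.

```python
from pathlib import Path


def find_missing(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]
```

Running `find_missing` over the list of expected WAV paths gives the full set of broken entries at once, rather than stopping at the first failing file.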

opened by heavye 1
