Arabic speech recognition, classification and text-to-speech.

ARBML

Last update: Dec 27, 2022

Overview

klaam

Arabic speech recognition, classification and text-to-speech using advanced models such as wav2vec and fastspeech2. This repository supports both training and prediction with pretrained models.

Usage

from klaam import SpeechClassification
model = SpeechClassification()
model.classify(wav_file)

from klaam import SpeechRecognition
model = SpeechRecognition()
model.transcribe(wav_file)

from klaam import TextToSpeech
model = TextToSpeech()
model.synthesize(sample_text)

There are two available models for recognition, targeting MSA and the Egyptian dialect. You can select either of them using the lang attribute:

 from klaam import SpeechRecognition
 model = SpeechRecognition(lang = 'msa')
 model.transcribe('file.wav')

Datasets

Dataset	Description	Link
MGB-3	Egyptian Arabic speech recognition in the wild. Every sentence was annotated by four annotators. More than 15 hours have been collected from YouTube.	requires registration here
ADI-5	More than 50 hours collected from Aljazeera TV. Covers four regional dialects, Egyptian (EGY), Levantine (LAV), Gulf (GLF) and North African (NOR), plus Modern Standard Arabic (MSA). This dataset is a part of the MGB-3 challenge.	requires registration here
Common voice	Multilingual dataset available on huggingface	here.
Arabic Speech Corpus	Arabic dataset with alignment and transcriptions	here.

Models

We currently support four models, three of which are available on transformers.

Language	Description	Source
Egyptian	Speech recognition	wav2vec2-large-xlsr-53-arabic-egyptian
Standard Arabic	Speech recognition	wav2vec2-large-xlsr-53-arabic
EGY, NOR, LAV, GLF, MSA	Speech classification	wav2vec2-large-xlsr-dialect-classification
Standard Arabic	Text-to-Speech	fastspeech2

Example Notebooks

Name	Description	Notebook
Demo	Classification, Recognition and Text-to-speech in a few lines of code.
Demo with mic	Audio Recognition and classification with recording.

Training

The scripts are a modification of jqueguiner/wav2vec2-sprint.

classification

This script is used for the classification task on the 5 classes.

python run_classifier.py \
   --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
   --output_dir=/path/to/output \
   --cache_dir=/path/to/cache/ \
   --freeze_feature_extractor \
   --num_train_epochs="50" \
   --per_device_train_batch_size="32" \
   --preprocessing_num_workers="1" \
   --learning_rate="3e-5" \
   --warmup_steps="20" \
   --evaluation_strategy="steps" \
   --save_steps="100" \
   --eval_steps="100" \
   --save_total_limit="1" \
   --logging_steps="100" \
   --do_eval \
   --do_train

Recognition

This script trains the pretrained model on the Egyptian dialect dataset.

python run_mgb3.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps" \
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train

This script can be used for training on Arabic Common Voice:

python run_common_voice.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --dataset_config_name="ar" \
    --output_dir=/path/to/output/ \
    --cache_dir=/path/to/cache \
    --overwrite_output_dir \
    --num_train_epochs="1" \
    --per_device_train_batch_size="32" \
    --per_device_eval_batch_size="32" \
    --evaluation_strategy="steps" \
    --learning_rate="3e-4" \
    --warmup_steps="500" \
    --fp16 \
    --freeze_feature_extractor \
    --save_steps="10" \
    --eval_steps="10" \
    --save_total_limit="1" \
    --logging_steps="10" \
    --group_by_length \
    --feat_proj_dropout="0.0" \
    --layerdrop="0.1" \
    --gradient_checkpointing \
    --do_train --do_eval \
    --max_train_samples 100 --max_val_samples 100

Text To Speech

We use the PyTorch implementation of fastspeech2 by ming024. The procedure is as follows:

Download the dataset

wget http://en.arabicspeechcorpus.com/arabic-speech-corpus.zip 
unzip arabic-speech-corpus.zip

Create multiple directories for data

mkdir -p raw_data/Arabic/Arabic preprocessed_data/Arabic/TextGrid/Arabic
cp arabic-speech-corpus/textgrid/* preprocessed_data/Arabic/TextGrid/Arabic

Prepare metadata

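import os

base_dir = '/content/arabic-speech-corpus'
lines = []
for lab_file in os.listdir(f'{base_dir}/lab'):
    # Each .lab file holds the transcript of the utterance with the same name.
    with open(f'{base_dir}/lab/{lab_file}', 'r') as f:
        lines.append(lab_file[:-4] + '|' + f.read())

# Write one "name|transcript" row per utterance.
with open(f'{base_dir}/metadata.csv', 'w') as f:
    f.write('\n'.join(lines))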

Clone my fork

git clone --depth 1 https://github.com/zaidalyafeai/FastSpeech2
cd FastSpeech2
pip install -r requirements.txt

Prepare alignments and preprocessed data

python3 prepare_align.py config/Arabic/preprocess.yaml
python3 preprocess.py config/Arabic/preprocess.yaml

Unzip vocoders

unzip hifigan/generator_LJSpeech.pth.tar.zip -d hifigan
unzip hifigan/generator_universal.pth.tar.zip -d hifigan

Start training

python3 train.py -p config/Arabic/preprocess.yaml -m config/Arabic/model.yaml -t config/Arabic/train.yaml

Comments

Refactored code and small changes
Restructured the codebase as mentioned in the issue #18

Unexpected changes:

Missing packages to run demo.ipynb: jupyter, inflect, matplotlib, gdown

Changed hardcoded paths in FastSpeech2 so that paths are passed to the module.

Removed inference.py as it doesn't correlate to anything actually being used in the library.
opened by sudomaze 22
Text to Speech
We are thinking of adding TTS models. Here are some possible architectures to use:

https://github.com/mozilla/TTS

https://github.com/ming024/FastSpeech2

help wanted
opened by zaidalyafeai 20
Functionality to split/align audio segments for training

The audio in two of the datasets we are using (MGB3 and MGB5) comes in long sequences of tens of minutes. This is impractical for training on any GPU: longer audio sequences cause out-of-memory errors even with a small batch size.

The solution is to split the audio into smaller audio segments of 15 to 30 seconds depending on the hardware used (GPU memory to a large extent).

This issue is to track adding a functionality to split the audio into smaller chunks that can fit into a GPU.
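The splitting described above can be sketched as follows. This is a minimal illustration and not code from the repository: the function name split_audio, the 20-second default and the rule for dropping a very short tail are all assumptions. It only handles the waveform; for recognition, the transcript would also have to be aligned to each chunk.

```python
import numpy as np


def split_audio(samples, sample_rate=16_000, max_seconds=20.0, min_seconds=1.0):
    """Split a 1-D waveform into consecutive chunks of at most max_seconds."""
    chunk_len = int(max_seconds * sample_rate)
    min_len = int(min_seconds * sample_rate)
    chunks = [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]
    # Drop a trailing fragment that is too short to be a useful example.
    if len(chunks) > 1 and len(chunks[-1]) < min_len:
        chunks.pop()
    return chunks


# Synthetic 65-second "recording" of silence, standing in for a real file.
audio = np.zeros(65 * 16_000, dtype=np.float32)
segments = split_audio(audio, max_seconds=20.0)
print([len(s) / 16_000 for s in segments])
```

Fixed-length splitting like this is enough when the label applies to the whole file, as in dialect classification. The chunk length would be tuned to the available GPU memory.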

opened by othrif 14
Sampling rate modifications
Hello @zaidalyafeai

For our bachelor thesis, a friend and I started working on dialect classification a while ago. We came across your repo and saw that you are working with the same corpus as we did. We want to investigate how the length of the provided samples influences the trained classifier when using wav2vec-xlsr as the base model.

After looking into your code, we were wondering why you read only the first 20 seconds of each file. Isn't this somewhat counterproductive, since we lose a lot of training data that way?

def speech_file_to_array_fn(batch):
    start = 0
    stop = 20
    srate = 16_000
    speech_array, sampling_rate = sf.read(batch["file"], start=start * srate, stop=stop * srate)
    batch["speech"] = librosa.resample(np.asarray(speech_array), sampling_rate, srate)
    batch["sampling_rate"] = srate
    batch["parent"] = batch["label"]
    return batch

Did you preprocess your data by cutting it into smaller pieces so that each is at most 20 seconds long? Or is it possible to read the whole files and generate batches according to the length of each file? The whole thing is not quite straightforward to implement.
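One way to use the whole file rather than only its first 20 seconds is to compute a list of (start, stop) frame ranges per file and read each range as a separate example. The sketch below is an assumption about how that could look, not the repository's approach; window_bounds is a hypothetical helper, and the total frame count would have to come from the audio file itself (for example via soundfile.info).

```python
def window_bounds(total_frames, srate=16_000, window_seconds=20):
    """Yield (start, stop) frame indices that cover a file in fixed windows."""
    step = window_seconds * srate
    for start in range(0, total_frames, step):
        # The last window is clipped to the end of the file.
        yield start, min(start + step, total_frames)


# A 45-second file at 16 kHz gives three windows; the last one is shorter.
bounds = list(window_bounds(45 * 16_000))
print(bounds)
```

Each pair could then be passed as the start and stop arguments of sf.read, in place of the fixed 0 and 20 * srate used in the function quoted above. Note that start and stop count frames at the file's own sampling rate, so srate here should match the file.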
opened by pascalfiv 6
Error opening training file, File contains data in an unknown format.

Hi Ziad, I tried running this script from the readme file to train the MSA model:

python run_common_voice.py --model_name_or_path="facebook/wav2vec2-large-xlsr-53" --dataset_config_name="ar" --output_dir=/path/to/output/ --cache_dir=/path/to/cache --overwrite_output_dir="yes" --num_train_epochs="1" --per_device_train_batch_size="32" --per_device_eval_batch_size="32" --evaluation_strategy="steps" --learning_rate="3e-4" --warmup_steps="500" --fp16="no" --freeze_feature_extractor="yes" --save_steps="10" --eval_steps="10" --save_total_limit="1" --logging_steps="10" --group_by_length="no" --feat_proj_dropout="0.0" --layerdrop="0.1" --do_train="yes" --do_eval="yes" --max_train_samples 100 --max_val_samples 100

And I got this message:

Traceback (most recent call last):
  File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_common_voice.py", line 511, in <module>
    main()
  File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_common_voice.py", line 400, in main
    train_dataset = train_dataset.map(
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 1955, in map
    return self._map_single(
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 520, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 487, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\fingerprint.py", line 458, in wrapper
    out = func(self, *args, **kwargs)
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 2320, in _map_single
    example = apply_function_on_filtered_inputs(example, i, offset=offset)
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 2220, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 1915, in decorated
    result = f(decorated_item, *args, **kwargs)
  File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_common_voice.py", line 394, in speech_file_to_array_fn
    speech_array, sampling_rate = torchaudio.load(batch["path"])
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torchaudio\backend\soundfile_backend.py", line 197, in load
    with soundfile.SoundFile(filepath, "r") as file:
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\soundfile.py", line 1183, in _open
    _error_check(_snd.sf_error(file_ptr),
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '/path/to/cache\downloads\extracted\31455a499a0212b1751dd0c1547b0d360037f6a8c0a69178647a45a577d0ff67\cv-corpus-6.1-2020-12-11/ar/clips/common_voice_ar_19225971.mp3': File contains data in an unknown format.

I think the reason is that the training files are in .mp3 instead of .wav. Any suggestions on how I can tackle this problem?
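The traceback shows torchaudio's soundfile backend failing on an .mp3 clip, which suggests that the installed audio backend cannot decode MP3. One possible workaround, not taken from the repository, is to convert the clips to WAV beforehand with ffmpeg. The sketch below only builds the command; the helper names and the output directory are hypothetical, and the training script would still need to be pointed at the converted files.

```python
import subprocess
from pathlib import Path


def ffmpeg_to_wav_cmd(src, dst_dir, sample_rate=16_000):
    """Build an ffmpeg command that converts src to a mono WAV file."""
    dst = Path(dst_dir) / (Path(src).stem + ".wav")
    # -y overwrites existing output, -ar sets the sample rate, -ac 1 downmixes to mono.
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(sample_rate), "-ac", "1", str(dst)]


def convert(src, dst_dir):
    """Run the conversion; requires ffmpeg to be installed and on PATH."""
    subprocess.run(ffmpeg_to_wav_cmd(src, dst_dir), check=True)


cmd = ffmpeg_to_wav_cmd("clips/common_voice_ar_19225971.mp3", "wav")
print(" ".join(cmd))
```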

opened by JihadZoabi 4
Speech Recognition Error

Speech Recognition

OSError: Can't load config for 'Zaid/wav2vec2-large-xlsr-dialect-classification'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'Zaid/wav2vec2-large-xlsr-dialect-classification' is the correct path to a directory containing a config.json file

opened by waleedahmed2090 2
assert batch_size * group_size < len(dataset) AssertionError when I train the model

hello everyone,

@zaidalyafeai @mustafa0x @elgeish @MagedSaeed

I tried to train the model on my dataset and this error comes out. Could you please help me? This is the traceback:

Traceback (most recent call last):
  File "/content/drive/MyDrive/FastSpeech2/train.py", line 198, in <module>
    main(args, configs)
  File "/content/drive/MyDrive/FastSpeech2/train.py", line 32, in main
    assert batch_size * group_size < len(dataset)
AssertionError
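The assertion in the traceback requires the dataset to contain more samples than batch_size * group_size, so a small custom dataset can trip it. As a rough sketch, the largest batch size that satisfies the check can be computed as below. The default group_size of 4 is an assumption about the FastSpeech2 training script, and max_batch_size is a hypothetical helper.

```python
def max_batch_size(num_samples, group_size=4):
    """Largest batch_size with batch_size * group_size < num_samples."""
    return max((num_samples - 1) // group_size, 0)


for n in (10, 64, 1000):
    print(n, max_batch_size(n))
```

If the result is too small to train with, the alternatives are adding more data or lowering the group size.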

thank you

opened by zaynabmu 2
argparse.ArgumentError appears when trying to train the module
I tried to train the model with both scripts from the readme file, and both resulted in argparse.ArgumentError. I tried running:

python run_mgb3.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps"\
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train \

and also

python run_common_voice.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --dataset_config_name="ar" \
    --output_dir=/path/to/output/ \
    --cache_dir=/path/to/cache \
    --overwrite_output_dir \
    --num_train_epochs="1" \
    --per_device_train_batch_size="32" \
    --per_device_eval_batch_size="32" \
    --evaluation_strategy="steps" \
    --learning_rate="3e-4" \
    --warmup_steps="500" \
    --fp16 \
    --freeze_feature_extractor \
    --save_steps="10" \
    --eval_steps="10" \
    --save_total_limit="1" \
    --logging_steps="10" \
    --group_by_length \
    --feat_proj_dropout="0.0" \
    --layerdrop="0.1" \
    --gradient_checkpointing \
    --do_train --do_eval \
    --max_train_samples 100 --max_val_samples 100

and both commands resulted in this error:

2022-04-24 19:02:16.824403: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-04-24 19:02:16.824670: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_mgb3.py", line 523, in <module>
    main()
  File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_mgb3.py", line 263, in main
    parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\hf_argparser.py", line 71, in __init__
    self._add_dataclass_arguments(dtype)
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\hf_argparser.py", line 166, in _add_dataclass_arguments
    self._parse_dataclass_field(parser, field)
  File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\hf_argparser.py", line 137, in _parse_dataclass_field
    parser.add_argument(field_name, **kwargs)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1440, in add_argument
    return self._add_action(action)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1805, in _add_action
    self._optionals._add_action(action)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1642, in _add_action
    action = super(_ArgumentGroup, self)._add_action(action)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1454, in _add_action
    self._check_conflict(action)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1591, in _check_conflict
    conflict_handler(action, confl_optionals)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1600, in _handle_conflict_error
    raise ArgumentError(action, message % conflict_string)
argparse.ArgumentError: argument --gradient_checkpointing: conflicting option string: --gradient_checkpointing

I reproduced the error on another machine. Any suggestions on how to fix it?
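The traceback ends inside argparse, which raises this error whenever the same option string is registered twice on one parser. Since HfArgumentParser builds a single parser from ModelArguments, DataTrainingArguments and TrainingArguments, a plausible cause (an assumption, not confirmed in this thread) is that two of those dataclasses both define gradient_checkpointing, for example because the installed transformers version already provides it. The standard-library snippet below reproduces the same message in isolation.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--gradient_checkpointing", action="store_true")

message = None
try:
    # Registering the same option string a second time is what argparse rejects.
    parser.add_argument("--gradient_checkpointing", action="store_true")
except argparse.ArgumentError as err:
    message = str(err)

print(message)
```

If that is indeed the cause, removing the duplicate field from the script's own dataclass, or installing the transformers version the script was written against, would be the two obvious things to try.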
opened by JihadZoabi 2
WER for Egyptian Arabic

Hi, I was wondering what the WER is for Egyptian Arabic, since I don't see a score on this page: https://huggingface.co/Zaid/wav2vec2-large-xlsr-53-arabic-egyptian

opened by loganlebanoff 2
Error loading model
404 Client Error: Not Found for url: https://huggingface.co/Zaid/wav2vec2-large-xlsr-53-arabic-egyptian/resolve/main/tf_model.h5

OSError: Can't load weights for 'Zaid/wav2vec2-large-xlsr-53-arabic-egyptian'. Make sure that:

'Zaid/wav2vec2-large-xlsr-53-arabic-egyptian' is a correct model identifier listed on 'https://huggingface.co/models'

or 'Zaid/wav2vec2-large-xlsr-53-arabic-egyptian' is the correct path to a directory containing a file named one of pytorch_model.bin, tf_model.h5, model.ckpt.

These two errors appear when I run the code. I modified the code from:

if lang == 'egy':
    model_dir = 'Zaid/wav2vec2-large-xlsr-53-arabic-egyptian'
elif lang == 'msa':
    model_dir = 'elgeish/wav2vec2-large-xlsr-53-arabic'

to:

if lang == "egy":
    model_dir = Wav2Vec2ForCTC.from_pretrained("Zaid/wav2vec2-large-xlsr-53-arabic-egyptian")
elif lang == "msa":
    model_dir = Wav2Vec2ForCTC.from_pretrained("elgeish/wav2vec2-large-xlsr-53-arabic")
    self.bw = True

as written on the Hugging Face site, but it is still not working. Thanks in advance.
opened by Ziad-Mohamedd 2
Timestamps

Thank you for this work -- شكرا! I tested this briefly and found the results to be quite good. Is there any way to get time-stamped results? (My use case is forced alignment)
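Rough timestamps can be derived from the frame-level CTC output of a wav2vec2 model, because each output frame covers a fixed stretch of audio. The sketch below is not part of klaam; it assumes about 20 ms per frame (320 samples at 16 kHz), a blank id of 0, and a list of per-frame argmax token ids as input, all of which would need checking against the actual model and tokenizer.

```python
def ctc_segments(frame_ids, blank_id=0, frame_seconds=0.02):
    """Collapse per-frame CTC ids into (token_id, start_s, end_s) tuples."""
    segments = []
    prev = blank_id
    for i, tok in enumerate(frame_ids):
        if tok != blank_id and tok != prev:
            # A new token starts at this frame.
            segments.append([tok, i * frame_seconds, (i + 1) * frame_seconds])
        elif tok != blank_id:
            # Same token repeated on consecutive frames: extend its end time.
            segments[-1][2] = (i + 1) * frame_seconds
        prev = tok
    return [tuple(s) for s in segments]


# Hypothetical frame ids: blank, token 5 twice, blank, token 5, token 7 three times, blank.
result = ctc_segments([0, 5, 5, 0, 5, 7, 7, 7, 0])
for token_id, start, end in result:
    print(token_id, round(start, 2), round(end, 2))
```

Mapping the token ids back to characters and grouping them into words would give word-level times. For accurate forced alignment, a dedicated aligner is likely a better fit than this greedy approximation.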

opened by mustafa0x 2
Help required to prepare the dataset to train the model.

Hello,

I am having a hard time training the model because of the dataset. I have downloaded the MGB-3 dataset and loaded it according to the error logs I came across. Could you please write the steps to prepare the dataset for training the model?

Thanks!

opened by muzamil47 0
Error

When I run the training/final step, I get this error. Can you advise?

^CTraceback (most recent call last):
  File "train.py", line 198, in <module>
    main(args, configs)
  File "train.py", line 93, in main
    nn.utils.clip_grad_norm_(model.parameters(), grad_clip_thresh)
  File "/home/layan/.local/lib/python3.6/site-packages/torch/nn/utils/clip_grad.py", line 36, in clip_grad_norm_
    total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type).to(device) for p in parameters]), norm_type)
  File "/home/layan/.local/lib/python3.6/site-packages/torch/nn/utils/clip_grad.py", line 36, in <listcomp>
    total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type).to(device) for p in parameters]), norm_type)
  File "/home/layan/.local/lib/python3.6/site-packages/torch/functional.py", line 1293, in norm
    return _VF.norm(input, p, dim=_dim, keepdim=keepdim)  # type: ignore
  File "/home/layan/.local/lib/python3.6/site-packages/torch/_VF.py", line 25, in __getattr__
    def __getattr__(self, attr):

opened by layansawalha 0
How to capture voice from audio device

Hi, I had a look at how to get text from an audio file, but did not find out how to capture the voice directly from the audio device while speaking, i.e. without saving the voice to a wave file.

opened by hajsf 0
[Proposal] Codebase refactoring
To organize the code and introduce testing and continuous integration, it would be beneficial to refactor the entire codebase.

TL;DR

Re-organizing the codebase to follow best practices and to introduce testing and continuous integration.

Separating the code into four parts: the package logic as an importable module, scripts for the scripts used for training/inference, notebooks for demos and simple scripts written as notebooks, and tests for testing the logic

Adding GitHub Actions to test build, logic of the package, auto-generate docs, and to publish the package to pypi

Moving from a pip and requirements.txt setup to conda for environment management and poetry for package management. This will ease development as the project scales.

Codebase refactoring

Mapping

file/dir | action | placement
-- | -- | --
FastSpeech2/* | moved | klaam/external/FastSpeech2/*
dialect_speech_corpus | moved | klaam/speech_corpus/dialect.py
egy_speech_corpus | moved | klaam/speech_corpus/egy.py
mor_speech_corpus | moved | klaam/speech_corpus/mor.py
samples | moved | samples
.gitignore | moved | .gitignore
LICENSE | moved | LICENSE
README.md | moved | README.md
audio_utils.py | moved | klaam/utils/audio.py
demo.ipynb | moved | notebooks/demo.ipynb
demo_with_mic.ipynb | moved | notebooks/demo_with_mix.ipynb
inference.ipynb | moved | notebooks/inference.ipynb
klaam.py | moved | klaam/run.py
klaam_logo.PNG | moved | misc/klaam_logo.png
models.py | moved | klaam/models/wav2vec.py
processors.py | moved | klaam/processors/custom_wave2vec.py
requirements.txt | removed |
run.sh | moved | scripts/run.sh
run_classifier.py | moved | scripts/run_classifier.py
run_common_voice.py | moved | scripts/run_common_voice.py
run_mgb3.py | moved | scripts/run_mgb3.py
run_mgb5.py | moved | scripts/run_mgb5.py
sample_run.sh | moved | scripts/sample_run.sh
utils.py | moved | klaam/utils/utils.py
 | added | docs
 | added | tests
 | added | .github
 | added | output
 | added | environment.yml
 | added | install.sh
 | added | mypy.ini
 | added | pyproject.toml
 | added | pytest.ini
 | added | ckpts

Tree Structure

root | level 1 | level 2 | description
-- | -- | -- | --
.github | | | github stuff (e.g. github issue templates, github actions workflows, etc.)
 | workflows | |
 | | build.yml | to test building of the package
 | | publish.yml | to publish the package to pypi
 | | tests.yml | to run tests
 | | docs.yml | to generate documentation
klaam | | | the logic for the package
 | utils | |
 | | audio.py |
 | | utils.py |
 | models | |
 | | wav2vec.py |
 | processors | |
 | | wave2vec.py |
 | external | |
 | | FastSpeech2/* |
 | speech_corpus | |
 | | dialect.py |
 | | egy.py |
 | | mor.py |
 | run.py | |
notebooks | | |
 | demo.ipynb | |
 | demo_with_mix.ipynb | |
 | inference.ipynb | |
scripts | | | set of scripts to be used to train/evaluate or anything external from the logic of the package
 | run.sh | |
 | run_classifier.py | |
 | run_common_voice.py | |
 | run_mgb3.py | |
 | run_mgb5.py | |
 | sample_run.sh | |
tests | | | set of tests to test logic within klaam
 | test_*.py | |
 | conftest.py | |
misc | | |
 | klaam_logo.png | |
samples | | |
 | demo.wav | |
ckpts | ... | | checkpoints of pre-trained models that were downloaded
docs | ... | | documentation files
output | ... | |
environment.yml | | | conda environment definition
install.sh | | | installation script to set up the conda environment and install dependencies using poetry
mypy.ini | | | mypy configuration
pyproject.toml | | | package definition and list of dependencies to be installed
pytest.ini | | | pytest configuration
LICENSE | | |
README.md | | |
.gitignore | | |

Environment/dependencies packages

conda is used to manage the environment and install essential libraries that are big/core to the package, e.g. TensorFlow, PyTorch, cudatools, etc.

poetry is used to manage dependencies and setup the package

pytest is used to enable unit/integration testing of the codebase

Commands

poetry add PACKAGE - to add a package (this will append to pyproject.toml)

If the package installation fails and there is no other way to add the package, install it using conda and add it to environment.yml manually (leave a comment next to the line).

Check the web for the right channels when installing packages using conda

poetry install - to install the package (package_name)

pytest tests - to run all tests manually

pytest tests/TEST_PATH - to run a specific test file (check pytest documentation for more information)

Edit - added the following sections: env/dep packages and commands
opened by sudomaze 6
Missing file

Hello Mustafa & Ziad,

I have looked at your awesome work, which is really helpful to me, but I have two questions. I am new to this field, so could you please share a good reference for understanding the difference between HiFi-GAN and MelGAN? I have checked many references online, but they were not that helpful. My second question is about the vocoder and speaker. When I tried different combinations and listened to the results, the HiFi-GAN vocoder with the universal speaker was the best combination. However, when I tried LJSpeech with HiFi-GAN, I received an error that the file generator_LJSpeech.pth.tar does not exist. The code points to FastSpeech2/hifigan/generator_LJSpeech.pth.tar, but this file does not exist.

opened by Maria-tawfik 1

