Arabic speech recognition, classification and text-to-speech.

Overview

klaam

Arabic speech recognition, classification and text-to-speech using advanced models such as wav2vec2 and FastSpeech2. This repository allows training and prediction using pretrained models.

Usage

from klaam import SpeechClassification
model = SpeechClassification()
model.classify(wav_file)

from klaam import SpeechRecognition
model = SpeechRecognition()
model.transcribe(wav_file)

from klaam import TextToSpeech
model = TextToSpeech()
model.synthesize(sample_text)

There are two available models for recognition, targeting MSA and the Egyptian dialect. You can select either of them using the lang attribute:

 from klaam import SpeechRecognition
 model = SpeechRecognition(lang='msa')
 model.transcribe('file.wav')

Datasets

MGB-3: Egyptian Arabic speech recognition in the wild. Every sentence was annotated by four annotators. More than 15 hours have been collected from YouTube. Requires registration here.
ADI-5: More than 50 hours collected from Aljazeera TV, covering four regional dialects: Egyptian (EGY), Levantine (LAV), Gulf (GLF) and North African (NOR), plus Modern Standard Arabic (MSA). This dataset is part of the MGB-3 challenge. Requires registration here.
Common Voice: Multilingual dataset available on Hugging Face here.
Arabic Speech Corpus: Arabic dataset with alignments and transcriptions here.
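
For example, the Arabic subset of Common Voice can be loaded with the Hugging Face datasets library. A minimal sketch (the legacy common_voice loader is assumed; newer releases use different dataset names):

from datasets import load_dataset

# Arabic portion of Common Voice; each example holds an audio path and its transcript.
train_ds = load_dataset("common_voice", "ar", split="train+validation")
test_ds = load_dataset("common_voice", "ar", split="test")
print(train_ds[0]["sentence"])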

Models

We currently support four models; three of them are available through the transformers library.

Egyptian (speech recognition): wav2vec2-large-xlsr-53-arabic-egyptian
Standard Arabic (speech recognition): wav2vec2-large-xlsr-53-arabic
EGY, NOR, LAV, GLF, MSA (speech classification): wav2vec2-large-xlsr-dialect-classification
Standard Arabic (text-to-speech): fastspeech2
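
These checkpoints can also be loaded directly with transformers. A minimal sketch using the Egyptian recognition checkpoint referenced later on this page (klaam wraps these steps; it is assumed the processor files are hosted alongside the model):

from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Pretrained Egyptian dialect recognition checkpoint from the Hugging Face Hub.
processor = Wav2Vec2Processor.from_pretrained("Zaid/wav2vec2-large-xlsr-53-arabic-egyptian")
model = Wav2Vec2ForCTC.from_pretrained("Zaid/wav2vec2-large-xlsr-53-arabic-egyptian")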

Example Notebooks

Demo: classification, recognition and text-to-speech in a few lines of code.
Demo with mic: audio recognition and classification with recording from the microphone.

Training

The scripts are a modification of jqueguiner/wav2vec2-sprint.

Classification

This script is used for the dialect classification task over the five classes (EGY, NOR, LAV, GLF, MSA).

python run_classifier.py \
   --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
   --output_dir=/path/to/output \
   --cache_dir=/path/to/cache/ \
   --freeze_feature_extractor \
   --num_train_epochs="50" \
   --per_device_train_batch_size="32" \
   --preprocessing_num_workers="1" \
   --learning_rate="3e-5" \
   --warmup_steps="20" \
   --evaluation_strategy="steps"\
   --save_steps="100" \
   --eval_steps="100" \
   --save_total_limit="1" \
   --logging_steps="100" \
   --do_eval \
   --do_train

Recognition

This script is used for training on the Egyptian dialect dataset (MGB-3).

python run_mgb3.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps"\
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train

This script can be used for training on the Arabic Common Voice dataset:

python run_common_voice.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --dataset_config_name="ar" \
    --output_dir=/path/to/output/ \
    --cache_dir=/path/to/cache \
    --overwrite_output_dir \
    --num_train_epochs="1" \
    --per_device_train_batch_size="32" \
    --per_device_eval_batch_size="32" \
    --evaluation_strategy="steps" \
    --learning_rate="3e-4" \
    --warmup_steps="500" \
    --fp16 \
    --freeze_feature_extractor \
    --save_steps="10" \
    --eval_steps="10" \
    --save_total_limit="1" \
    --logging_steps="10" \
    --group_by_length \
    --feat_proj_dropout="0.0" \
    --layerdrop="0.1" \
    --gradient_checkpointing \
    --do_train --do_eval \
    --max_train_samples 100 --max_val_samples 100

Text To Speech

We use the PyTorch implementation of FastSpeech2 by ming024. The procedure is as follows:

Download the dataset

wget http://en.arabicspeechcorpus.com/arabic-speech-corpus.zip 
unzip arabic-speech-corpus.zip 

Create multiple directories for data

mkdir -p raw_data/Arabic/Arabic preprocessed_data/Arabic/TextGrid/Arabic
cp arabic-speech-corpus/textgrid/* preprocessed_data/Arabic/TextGrid/Arabic

Prepare metadata

import os

base_dir = '/content/arabic-speech-corpus'
lines = []
for lab_file in os.listdir(f'{base_dir}/lab'):
    # Each .lab file contains the transcript for the wav file of the same name.
    with open(f'{base_dir}/lab/{lab_file}', 'r') as f:
        lines.append(lab_file[:-4] + '|' + f.read())

# Write the metadata in "name|transcript" format, one utterance per line.
with open(f'{base_dir}/metadata.csv', 'w') as f:
    f.write('\n'.join(lines))

Clone my fork

git clone --depth 1 https://github.com/zaidalyafeai/FastSpeech2
cd FastSpeech2
pip install -r requirements.txt

Prepare alignments and preprocessed data

python3 prepare_align.py config/Arabic/preprocess.yaml
python3 preprocess.py config/Arabic/preprocess.yaml

Unzip vocoders

unzip hifigan/generator_LJSpeech.pth.tar.zip -d hifigan
unzip hifigan/generator_universal.pth.tar.zip -d hifigan

Start training

python3 train.py -p config/Arabic/preprocess.yaml -m config/Arabic/model.yaml -t config/Arabic/train.yaml
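
After training, upstream FastSpeech2 provides a synthesize.py entry point. A hedged example of single-utterance synthesis, assuming the fork keeps the upstream interface and that a checkpoint exists at the given step (both are assumptions):

python3 synthesize.py --text "YOUR_TEXT" --restore_step 900000 --mode single \
    -p config/Arabic/preprocess.yaml -m config/Arabic/model.yaml -t config/Arabic/train.yaml
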
Comments
  • Refactored code and small changes

    • Restructured the codebase as mentioned in the issue #18
    • Unexpected changes:
      • Missing packages to run demo.ipynb: jupyter, inflect, matplotlib, gdown
      • Changed hardcoded paths in FastSpeech2 to paths that are passed to the module.
      • Removed inference.py as it doesn't correlate to anything actually being used in the library.
    opened by sudomaze 22
  • Text to Speech

    We are thinking of adding TTS models; here are some possible architectures to use:

    • https://github.com/mozilla/TTS
    • https://github.com/ming024/FastSpeech2
    help wanted 
    opened by zaidalyafeai 20
  • Functionality to split/align audio segments for training

    The audio in two of the datasets we are using (MGB3 and MGB5) come in long sequences of tens of minutes. This is impractical to use with any GPU for training. Longer sequences of audio will result in out of memory errors in GPUs even with a small batch size.

    The solution is to split the audio into smaller audio segments of 15 to 30 seconds depending on the hardware used (GPU memory to a large extent).

    This issue is to track adding a functionality to split the audio into smaller chunks that can fit into a GPU.
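
    A minimal sketch of such a splitting step (not part of the repo; just to illustrate the idea, assuming soundfile is installed):

    import soundfile as sf

    def split_wav(path, out_dir, chunk_seconds=20):
        # Read the full waveform and write fixed-length chunks that fit in GPU memory.
        speech, sr = sf.read(path)
        step = chunk_seconds * sr
        for i, start in enumerate(range(0, len(speech), step)):
            sf.write(f"{out_dir}/chunk_{i:04d}.wav", speech[start:start + step], sr)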

    opened by othrif 14
  • Sampling rate modifications

    Hello @zaidalyafeai

    For our bachelor thesis, a friend and I started working on dialect classification a while ago; now we came across your repo, and you're working with the same corpus as we did. We want to investigate how the length of the provided samples influences the trained classifier when using wav2vec-xlsr as the base model.

    After some investigation of your code, we were wondering why you only read the first 20 seconds of each file. Is this not somewhat counterproductive, as we lose a lot of training data through that?

     def speech_file_to_array_fn(batch):
            start = 0 
            stop = 20 
            srate = 16_000
            speech_array, sampling_rate = sf.read(batch["file"], start = start * srate , stop = stop * srate)
            batch["speech"] = librosa.resample(np.asarray(speech_array), sampling_rate, srate)
            batch["sampling_rate"] = srate
            batch["parent"] = batch["label"]
            return batch
    

    Did you preprocess your data by cutting it into smaller pieces so each is at most 20 seconds long? Or is it possible to read in the whole files and generate batches according to the length of each file? The whole thing is not quite straightforward to implement.

    opened by pascalfiv 6
  • Error opening training file, File contains data in an unknown format.

    Hi Ziad, I tried running this script from the readme file to train the MSA model:

    python run_common_voice.py --model_name_or_path="facebook/wav2vec2-large-xlsr-53" --dataset_config_name="ar" --output_dir=/path/to/output/ --cache_dir=/path/to/cache --overwrite_output_dir="yes" --num_train_epochs="1" --per_device_train_batch_size="32" --per_device_eval_batch_size="32" --evaluation_strategy="steps" --learning_rate="3e-4" --warmup_steps="500" --fp16="no" --freeze_feature_extractor="yes" --save_steps="10" --eval_steps="10" --save_total_limit="1" --logging_steps="10" --group_by_length="no" --feat_proj_dropout="0.0" --layerdrop="0.1" --do_train="yes" --do_eval="yes" --max_train_samples 100 --max_val_samples 100

    And I got this message:

    _Traceback (most recent call last): File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_common_voice.py", line 511, in main() File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_common_voice.py", line 400, in main train_dataset = train_dataset.map( File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 1955, in map return self._map_single( File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 520, in wrapper out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs) File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 487, in wrapper out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs) File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\fingerprint.py", line 458, in wrapper out = func(self, *args, **kwargs) File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 2320, in map_single example = apply_function_on_filtered_inputs(example, i, offset=offset) File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 2220, in apply_function_on_filtered_inputs processed_inputs = function(*fn_args, *additional_args, **fn_kwargs) File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 1915, in decorated result = f(decorated_item, *args, **kwargs) File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_common_voice.py", line 394, in speech_file_to_array_fn speech_array, sampling_rate = torchaudio.load(batch["path"]) File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torchaudio\backend\soundfile_backend.py", line 197, in load with soundfile.SoundFile(filepath, "r") as file: File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\soundfile.py", line 629, in init self._file = self._open(file, mode_int, closefd) File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\soundfile.py", line 1183, in _open _error_check(_snd.sf_error(file_ptr), File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\soundfile.py", line 1357, in error_check raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace')) RuntimeError: Error opening '/path/to/cache\downloads\extracted\31455a499a0212b1751dd0c1547b0d360037f6a8c0a69178647a45a577d0ff67\cv-corpus-6.1-2020-12-11/ar/clips/common_voice_ar_19225971.mp3': File contains data in an unknown format.

    I think the reason behind it is that the training files are in .mp3 instead of .wav. Any suggestions on how I can tackle this problem?
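
    One possible workaround (a sketch, not an official fix from this repo) is converting the clips to wav first, e.g. with pydub, which needs ffmpeg installed:

    from pydub import AudioSegment

    # Decode the mp3 clip and re-export it as a 16 kHz mono wav file.
    clip = AudioSegment.from_mp3("common_voice_ar_19225971.mp3")
    clip.set_frame_rate(16000).set_channels(1).export("common_voice_ar_19225971.wav", format="wav")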

    opened by JihadZoabi 4
  • Speech Recognition Error

    Speech Recognition

    OSError: Can't load config for 'Zaid/wav2vec2-large-xlsr-dialect-classification'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'Zaid/wav2vec2-large-xlsr-dialect-classification' is the correct path to a directory containing a config.json file

    opened by waleedahmed2090 2
  •  assert batch_size * group_size < len(dataset) AssertionError when I train the model

    hello everyone,

    @zaidalyafeai @mustafa0x @elgeish @MagedSaeed

    I tried to train the model on my dataset and this error came out. Could you please help me? This is the traceback: Traceback (most recent call last): File "/content/drive/MyDrive/FastSpeech2/train.py", line 198, in main(args, configs) File "/content/drive/MyDrive/FastSpeech2/train.py", line 32, in main assert batch_size * group_size < len(dataset) AssertionError

    thank you

    opened by zaynabmu 2
  • argparse.ArgumentError appears when trying to train the module

    I tried to train the module with both scripts that are in the readme file, and both resulted in argparse.ArgumentError. I tried running:

    python run_mgb3.py \
        --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
        --output_dir=/path/to/output \
        --cache_dir=/path/to/cache/ \
        --freeze_feature_extractor \
        --num_train_epochs="50" \
        --per_device_train_batch_size="32" \
        --preprocessing_num_workers="1" \
        --learning_rate="3e-5" \
        --warmup_steps="20" \
        --evaluation_strategy="steps"\
        --save_steps="100" \
        --eval_steps="100" \
        --save_total_limit="1" \
        --logging_steps="100" \
        --do_eval \
        --do_train \
    

    and also

    python run_common_voice.py \
        --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
        --dataset_config_name="ar" \
        --output_dir=/path/to/output/ \
        --cache_dir=/path/to/cache \
        --overwrite_output_dir \
        --num_train_epochs="1" \
        --per_device_train_batch_size="32" \
        --per_device_eval_batch_size="32" \
        --evaluation_strategy="steps" \
        --learning_rate="3e-4" \
        --warmup_steps="500" \
        --fp16 \
        --freeze_feature_extractor \
        --save_steps="10" \
        --eval_steps="10" \
        --save_total_limit="1" \
        --logging_steps="10" \
        --group_by_length \
        --feat_proj_dropout="0.0" \
        --layerdrop="0.1" \
        --gradient_checkpointing \
        --do_train --do_eval \
        --max_train_samples 100 --max_val_samples 100
    

    and both commands resulted in this error:

    _2022-04-24 19:02:16.824403: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found 2022-04-24 19:02:16.824670: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine. Traceback (most recent call last): File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_mgb3.py", line 523, in main() File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_mgb3.py", line 263, in main parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments)) File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\hf_argparser.py", line 71, in init self._add_dataclass_arguments(dtype) File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\hf_argparser.py", line 166, in _add_dataclass_arguments self._parse_dataclass_field(parser, field) File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\hf_argparser.py", line 137, in _parse_dataclass_field parser.add_argument(field_name, **kwargs) File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1440, in add_argument return self._add_action(action) File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1805, in _add_action self._optionals._add_action(action) File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1642, in _add_action action = super(_ArgumentGroup, self)._add_action(action) File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1454, in _add_action self._check_conflict(action) File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1591, in _check_conflict conflict_handler(action, confl_optionals) File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1600, in handle_conflict_error raise ArgumentError(action, message % conflict_string) argparse.ArgumentError: argument --gradient_checkpointing: conflicting option string: --gradient_checkpointing

    I reproduced the error by running it on another machine and still got it. Any suggestions on how to fix it?

    opened by JihadZoabi 2
  • WER for Egyptian Arabic

    Hi, I was wondering what the WER is for Egyptian Arabic, since I don't see a score on this page: https://huggingface.co/Zaid/wav2vec2-large-xlsr-53-arabic-egyptian

    opened by loganlebanoff 2
  • Error loading model

    404 Client Error: Not Found for url: https://huggingface.co/Zaid/wav2vec2-large-xlsr-53-arabic-egyptian/resolve/main/tf_model.h5

    OSError: Can't load weights for 'Zaid/wav2vec2-large-xlsr-53-arabic-egyptian'. Make sure that:

    • 'Zaid/wav2vec2-large-xlsr-53-arabic-egyptian' is a correct model identifier listed on 'https://huggingface.co/models'

    • or 'Zaid/wav2vec2-large-xlsr-53-arabic-egyptian' is the correct path to a directory containing a file named one of pytorch_model.bin, tf_model.h5, model.ckpt.

    These two errors appear when I run the code. However, I modified the code from: if lang == 'egy': model_dir = 'Zaid/wav2vec2-large-xlsr-53-arabic-egyptian' elif lang == 'msa': model_dir = 'elgeish/wav2vec2-large-xlsr-53-arabic'

    to :

    if lang == "egy": model_dir = Wav2Vec2ForCTC.from_pretrained("Zaid/wav2vec2-large-xlsr-53-arabic-egyptian") elif lang == "msa": model_dir = Wav2Vec2ForCTC.from_pretrained("elgeish/wav2vec2-large-xlsr-53-arabic") self.bw = True

    as it's written on the Hugging Face site, but it's still not working. Thanks in advance.

    opened by Ziad-Mohamedd 2
  • Timestamps

    Thank you for this work -- شكرا! I tested this briefly and found the results to be quite good. Is there any way to get time-stamped results? (My use case is forced alignment)

    opened by mustafa0x 2
  • Help required to prepare the dataset to train the model.

    Hello,

    I am having a hard time training the model because of the dataset. I have downloaded the MGB-3 dataset and loaded it according to the error logs I came across. Could you please write the steps to prepare the dataset for training the model?

    Thanks!

    opened by muzamil47 0
  • Error

    When I run the training/final step, I get this error can you advise? ^CTraceback (most recent call last): File "train.py", line 198, in main(args, configs) File "train.py", line 93, in main nn.utils.clip_grad_norm_(model.parameters(), grad_clip_thresh) File "/home/layan/.local/lib/python3.6/site-packages/torch/nn/utils/clip_grad.py", line 36, in clip_grad_norm_ total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type).to(device) for p in parameters]), norm_type) File "/home/layan/.local/lib/python3.6/site-packages/torch/nn/utils/clip_grad.py", line 36, in total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type).to(device) for p in parameters]), norm_type) File "/home/layan/.local/lib/python3.6/site-packages/torch/functional.py", line 1293, in norm return _VF.norm(input, p, dim=_dim, keepdim=keepdim) # type: ignore File "/home/layan/.local/lib/python3.6/site-packages/torch/_VF.py", line 25, in getattr def getattr(self, attr):


    opened by layansawalha 0
  • How to capture voice from audio device

    Hi, I had a look at how to get text from an audio file, but I did not find how to extract the voice directly from the audio device while speaking, i.e. without saving the voice into a wav file.

    opened by hajsf 0
  • [Proposal] Codebase refactoring

    To organize the code and introduce testing and continuous integration, it would be beneficial to refactor the entire codebase.

    TL;DR

    • Re-organizing the codebase to follow best practices and to introduce testing and continuous integration.
    • Separating the code into logic (the importable package), scripts (the scripts used for training/inference), notebooks (demos and simple scripts written as notebooks), and tests (tests of the logic)
    • Adding GitHub Actions to test the build and logic of the package, auto-generate docs, and publish the package to PyPI
    • Moving from a pip and requirements.txt setup to conda for environment management and poetry for package management. This will ease development as the project scales.

    Codebase refactoring

    Mapping

     file/dir | action | placement
     -- | -- | --
     FastSpeed2/* | moved | kaalm/external/FastSpeed2/*
     dialect_speech_corpus | moved | klaam/speech_corpus/dialect.py
     egy_speech_corpus | moved | klaam/speech_corpus/egy.py
     mor_speech_corpus | moved | klaam/speech_corpus/mor.py
     samples | moved | samples
     .gitignore | moved | .gitignore
     LICENSE | moved | LICENSE
     README.md | moved | README.md
     audio_utils.py | moved | klaam/utils/audio.py
     demo.ipynb | moved | notebooks/demo.ipynb
     demo_with_mic.ipynb | moved | notebooks/demo_with_mix.ipynb
     inference.ipynb | moved | notebooks/inference.ipynb
     klaam.py | moved | klaam/run.py
     klaam_logo.PNG | moved | misc/klaam_logo.png
     models.py | moved | klaam/models/wav2vec.py
     processors.py | moved | klaam/processors/custom_wave2vec.py
     requirements.txt | removed |
     run.sh | moved | scripts/run.sh
     run_classifier.py | moved | scripts/run_classifier.py
     run_common_voice.py | moved | scripts/run_common_voice.py
     run_mgb3.py | moved | scripts/run_mgb3.py
     run_mgb5.py | moved | scripts/run_mgb5.py
     sample_run.sh | moved | scripts/sample_run.sh
     utils.py | moved | klaam/utils/utils.py
       | added | docs
       | added | tests
       | added | .github
       | added | output
       | added | environment.yml
       | added | install.sh
       | added | mypi.ini
       | added | pyproject.toml
       | added | pytest.ini
       | added | ckpts

    Tree Structure

     root | level 1 | level2 | description
     -- | -- | -- | --
     .github |   |   | github stuff (e.g. github issue templates, github actions workflows, etc.)
       | workflows |   |
       |   | build.yml | to test building of the package
       |   | publish.yml | to publish the package to pypi
       |   | tests.yml | to run tests
       |   | docs.yml | to generate documentation
     klaam |   |   | the logic for the package
       | utils |   |
       |   | audio.py |
       |   | utils.py |
       | models |   |
       |   | wav2vec.py |
       | processors |   |
       |   | wave2vec.py |
       | external |   |
       |   | FastSpeed2/* |
       | speech_corpus |   |
       |   | dialect.py |
       |   | egy.py |
       |   | mor.py |
       | run.py |   |
     notebooks |   |   |
       | demo.ipynb |   |
       | demo_with_mix.ipynb |   |
       | inference.ipynb |   |
     scripts |   |   | set of scripts to be used to train/evaluate or anything external from the logic of the package
       | run.sh |   |
       | run_classifier.py |   |
       | run_common_voice.py |   |
       | run_mgb3.py |   |
       | run_mgb5.py |   |
       | sample_run.sh |   |
     tests |   |   | set of tests to test logics within klaam
       | test_*.py |   |
       | conftest.py |   |
     misc |   |   |
       | klaam_logo.png |   |
     samples |   |   |
       | demo.wav |   |
     ckpts | ... |   | checkpoints of pre-trained models that were downloaded
     docs | ... |   | documentation files
     output | ... |   |
     environment.yml |   |   | conda environment definition
     install.sh |   |   | installing script to setup conda environment and install dependecies using poetry
     mypy.ini |   |   | pylint configuration
     pyproject.toml |   |   | package definition and list of dependecies to be installed
     pytest.ini |   |   | pytest configuration
     LICENSE |   |   |
     README.md |   |   |
     .gitignore |   |   |

    Environment/dependencies packages

    • conda is used to manage the environment and install essential libraries that are big/core to the package, e.g. TensorFlow, PyTorch, cudatools, etc.
    • poetry is used to manage dependencies and setup the package
    • pytest is used to enable unit/integration testing of the codebase

    Commands

    • poetry add PACKAGE - to add a package (this will append to pyproject.toml)
      • If the package installation fails and you can't find another way to add the package, install it using conda and add it to environment.yml manually (leave a comment next to the line).
      • Check on the web for the right channels when installing packages with conda.
    • poetry install - to install the package (package_name)
    • pytest tests - to run all tests manually
    • pytest tests/TEST_PATH - to run a specific test file (check pytest documentation for more information)

    Edit - added the following sections: env/dep packages and commands

    opened by sudomaze 6
  • Missing file

    Hello Mustafa & Ziad,

    I have checked your awesome work, which is really helpful to me, but I have a question, please. I am new to this field, so could you please share with me a good reference to understand the difference between HiFi-GAN and MelGAN? I have checked a lot of references over the internet, but they were not that helpful! I also have another question related to the vocoder and speaker used. When I tried different combinations and listened to them, the HiFi-GAN vocoder with the universal speaker was the best combination, but when I tried the LJSpeech & HiFi-GAN combination, I received an error that the file generator_LJSpeech.pth.tar does not exist. When I checked the files and the code, I can see the code points to the path FastSpeech2/hifigan/generator_LJSpeech.pth.tar, but this file "generator_LJSpeech.pth.tar" does not exist.

    opened by Maria-tawfik 1