lasr
Lightning Automatic Speech Recognition
An MIT-licensed ASR research library, built on PyTorch Lightning, for developing end-to-end ASR models.
Introduction
PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research. PyTorch makes it easy to build complex AI models, but once research gets complicated and features like multi-GPU training, 16-bit precision, and TPU training are mixed in, it becomes easy to introduce bugs. PyTorch Lightning addresses exactly this problem: it structures your PyTorch code so that the details of training are abstracted away, which makes AI research scalable and fast to iterate on.
This project is an example of implementing an ASR system with PyTorch Lightning. In it, I trained a model consisting of a Conformer encoder and an LSTM decoder, optimized with joint CTC-attention. The name lasr stands for Lightning Automatic Speech Recognition. I hope it can serve as a guideline for those researching speech recognition.
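Joint CTC-attention training typically optimizes a weighted sum of the CTC loss computed on the encoder output and the cross-entropy loss of the attention decoder, L = λ·L_CTC + (1 - λ)·L_att. The sketch below shows only that interpolation. The function name `joint_ctc_attention_loss` and the default weight of 0.3 are illustrative assumptions, not values taken from lasr's configuration.

```python
def joint_ctc_attention_loss(ctc_loss, att_loss, ctc_weight=0.3):
    """Interpolate the two objectives: ctc_weight * L_ctc + (1 - ctc_weight) * L_att.

    Hypothetical helper for illustration. Only arithmetic is used, so it works
    with plain floats as well as scalar torch tensors.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


if __name__ == "__main__":
    # ctc_weight=1.0 reduces to pure CTC, ctc_weight=0.0 to pure attention.
    print(joint_ctc_attention_loss(2.0, 1.0, ctc_weight=0.3))
```

The CTC term encourages monotonic alignments between audio frames and labels, while the attention term models label dependencies; the weight trades one off against the other.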
Installation
Python 3.7 or higher is recommended.
I recommend creating a new virtual environment for this project (using virtualenv or conda).
Prerequisites
- Numpy:
pip install numpy
(Refer here for problems installing Numpy.)
- PyTorch: Refer to the PyTorch website to install the version that matches your environment.
- librosa:
conda install -c conda-forge librosa
(Refer here for problems installing librosa.)
- torchaudio:
pip install torchaudio==0.6.0
(Refer here for problems installing torchaudio.)
- sentencepiece:
pip install sentencepiece
(Refer here for problems installing sentencepiece.)
- pytorch-lightning:
pip install pytorch-lightning
(Refer here for problems installing pytorch-lightning.)
- hydra:
pip install hydra-core --upgrade
(Refer here for problems installing hydra.)
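To confirm that the prerequisites above are importable before installing lasr, a small check like the following may help. It is a sketch that assumes the usual import names (for example, `pytorch_lightning` for pytorch-lightning and `hydra` for hydra-core); it only checks that each module can be found, not which version is installed.

```python
import importlib.util

# Package name (as installed above) -> module name used in `import`.
REQUIRED_MODULES = {
    "numpy": "numpy",
    "torch": "torch",
    "librosa": "librosa",
    "torchaudio": "torchaudio",
    "sentencepiece": "sentencepiece",
    "pytorch-lightning": "pytorch_lightning",
    "hydra-core": "hydra",
}


def missing_prerequisites(modules=REQUIRED_MODULES):
    """Return the package names whose top-level module cannot be found."""
    # find_spec locates a module without importing it; None means "not found".
    return [pkg for pkg, mod in modules.items() if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All prerequisites are importable.")
```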
Install from source
Currently, only installation from source with setuptools is supported. Check out the source code and run the
following command:
pip install -e .
Install Apex (for 16-bit training)
For faster training, install NVIDIA's Apex library:
$ git clone https://github.com/NVIDIA/apex
$ cd apex
# ------------------------
# OPTIONAL: on your cluster you might need to load CUDA 10 or 9
# depending on how you installed PyTorch
# see available modules
$ module avail
# load correct CUDA before install
$ module load cuda-10.0
# ------------------------
# make sure you've loaded a gcc version > 4.0 and < 7.0
$ module load gcc-6.1.0
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
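If the build silently falls back to a Python-only install, the faster fused kernels are missing. The sketch below checks whether the compiled extension modules can be found. It assumes the module names produced by Apex's setup.py, `apex_C` for `--cpp_ext` and `amp_C` for `--cuda_ext`; treat those names as an assumption and adjust them if your Apex version differs.

```python
import importlib.util

# Assumed module names: "--cpp_ext" builds apex_C, "--cuda_ext" builds amp_C
# (among other CUDA kernels). The pure-Python package itself is "apex".
APEX_MODULES = ("apex", "apex_C", "amp_C")


def apex_build_status():
    """Map each Apex module name to whether it can be found on the import path."""
    return {name: importlib.util.find_spec(name) is not None for name in APEX_MODULES}


if __name__ == "__main__":
    status = apex_build_status()
    for name, found in status.items():
        print(f"{name:8s} {'found' if found else 'MISSING'}")
    if status["apex"] and not (status["apex_C"] and status["amp_C"]):
        print("apex is installed without its compiled C++/CUDA extensions.")
```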
Get Started
I use Hydra to manage all training configurations. If you are not familiar with Hydra, I recommend visiting the Hydra website. In short, Hydra is an open-source framework that simplifies the development of research applications by letting you compose a hierarchical configuration dynamically and override its values from the command line.
Training Speech Recognizer
You can train on the LibriSpeech dataset as follows:
$ python ./bin/main.py --dataset_path $DATASET_PATH --dataset_download True
Check the configurations at [link].
Troubleshooting and Contributing
If you have any questions, bug reports, or feature requests, please open an issue on GitHub.
I appreciate any kind of feedback or contribution. Feel free to go ahead with small changes such as bug fixes or documentation improvements. For major contributions and new features, please discuss them with the collaborators in the corresponding issues first.
Code Style
I follow PEP 8 for code style. The docstring style is especially important, because docstrings are used to generate the documentation.
License
This project is licensed under the MIT License; see the LICENSE.md file for details.
Author
- Soohwan Kim @sooftware
- Contacts: [email protected]