Wav2Vec2 STT Python
Beta Software
A simple Python library for speech recognition with wav2vec 2.0 models, distributed as binary wheels with few direct dependencies.
Requirements:
- Python 3.7+
- Platform: Linux x64 (Windows is a work in progress; macOS may work; PRs welcome)
- Python package requirements: cffi, numpy
- wav2vec 2.0 model (must be converted to a compatible format)
- Several are available ready-to-go on this project's releases page and below.
- You can convert your own models by following the instructions here.
Models:
Model | Download Size |
---|---|
Facebook Wav2Vec2 2.0 Base (960h) | 360 MB |
Facebook Wav2Vec2 2.0 Large (960h) | 1.18 GB |
Facebook Wav2Vec2 2.0 Large LV60 (960h) | 1.18 GB |
Facebook Wav2Vec2 2.0 Large LV60 Self (960h) | 1.18 GB |
Usage
import wave
from wav2vec2_stt import Wav2Vec2STT

# Load a converted model from its directory.
decoder = Wav2Vec2STT('model_dir')

# Read the raw audio frames from the WAV file, closing it afterwards.
with wave.open('tests/test.wav', 'rb') as wav_file:
    wav_samples = wav_file.readframes(wav_file.getnframes())

assert decoder.decode(wav_samples).strip().lower() == 'it depends on the context'
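Before passing audio to `decode`, it can help to confirm the file's format. The sketch below uses only the standard `wave` module. It assumes the model expects 16 kHz, 16-bit, mono PCM (the format wav2vec 2.0 models are typically trained on), which this README does not state explicitly. `read_wav_samples` is a hypothetical helper, not part of wav2vec2_stt.

```python
import wave

# Assumed input format (not confirmed by this README): 16 kHz, 16-bit, mono PCM.
EXPECTED_RATE = 16000
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample
EXPECTED_CHANNELS = 1


def read_wav_samples(path):
    """Return the raw PCM frames of a WAV file, or raise ValueError on a format mismatch."""
    with wave.open(path, 'rb') as wav_file:
        actual = (wav_file.getframerate(), wav_file.getsampwidth(), wav_file.getnchannels())
        expected = (EXPECTED_RATE, EXPECTED_SAMPLE_WIDTH, EXPECTED_CHANNELS)
        if actual != expected:
            raise ValueError(
                'unexpected WAV format %r (rate, width, channels); expected %r'
                % (actual, expected))
        return wav_file.readframes(wav_file.getnframes())
```

The returned bytes can then be passed to `decoder.decode(...)` in place of the manual `readframes` call in the usage example.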
Also includes a simple CLI for recognizing WAV files:
$ python -m wav2vec2_stt decode model test.wav
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt decode model test.wav test.wav
IT DEPENDS ON THE CONTEXT
IT DEPENDS ON THE CONTEXT
$ python -m wav2vec2_stt -h
usage: python -m wav2vec2_stt [-h] {decode} ...

positional arguments:
  {decode}    sub-command
    decode    decode one or more WAV files

optional arguments:
  -h, --help  show this help message and exit
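The CLI can also be driven from a script. This is a minimal sketch, assuming (as the sessions above suggest) that `decode` prints exactly one transcript line per input file, in the order given. `build_decode_command`, `pair_transcripts`, and `decode_files` are hypothetical helpers, not part of wav2vec2_stt.

```python
import subprocess
import sys


def build_decode_command(model_dir, wav_paths):
    """Build the argument list for `python -m wav2vec2_stt decode MODEL WAV...`."""
    return [sys.executable, '-m', 'wav2vec2_stt', 'decode', model_dir, *wav_paths]


def pair_transcripts(wav_paths, stdout_text):
    """Match each input path with its output line (assumes one line per file, in order)."""
    lines = stdout_text.splitlines()
    if len(lines) != len(wav_paths):
        raise ValueError('expected %d transcripts, got %d' % (len(wav_paths), len(lines)))
    # A list of pairs (not a dict) so the same file may appear more than once.
    return list(zip(wav_paths, lines))


def decode_files(model_dir, wav_paths):
    """Run the CLI and return (path, transcript) pairs; requires wav2vec2_stt to be installed."""
    result = subprocess.run(
        build_decode_command(model_dir, wav_paths),
        capture_output=True, text=True, check=True)
    return pair_transcripts(wav_paths, result.stdout)
```

For example, `decode_files('model', ['test.wav', 'test.wav'])` would run the second command shown above and pair each path with its transcript line.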
Installation/Building
Recommended installation via wheel from pip (requires a recent version of pip):
python -m pip install wav2vec2_stt
See setup.py for more details on building it yourself.
Author
- David Zurow (@daanzu)
License
This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE file for details. If this license is problematic for you, please contact me.
Acknowledgments
- Contains and uses code from PyTorch and torchaudio, licensed under the BSD 2-Clause License.