Wenet STT Python
Beta Software
Simple Python library, distributed via binary wheels with few direct dependencies, for easily using WeNet models for speech recognition.
Requirements:
- Python 3.7+ x64
- Platform: Windows/Linux/macOS
- Python package requirements: cffi, numpy
- WeNet model (must be in "runtime" format)
  - Several ready-to-use models are available on this project's releases page and are listed below.
Features:
- Synchronous decoding of a single utterance
- Streaming decoding, using a separate thread
Models:
Model | Download Size |
---|---|
gigaspeech_20210728_u2pp_conformer | 549 MB |
gigaspeech_20210811_conformer_bidecoder | 540 MB |
Usage
import wave

from wenet_stt import WenetSTTModel

# Load a "runtime" format model from the directory 'model_dir'
model = WenetSTTModel(WenetSTTModel.build_config('model_dir'))

# Read the raw audio frames from a WAV file and decode them as one utterance
with wave.open('tests/test.wav', 'rb') as wav_file:
    wav_samples = wav_file.readframes(wav_file.getnframes())
assert model.decode(wav_samples).lower() == 'it depends on the context'
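
Because the usage example passes raw frames from the WAV file straight to the decoder, it can help to check the file's format first. The following is a minimal sketch using only the standard library. The helper name `read_pcm_frames` is hypothetical, and the default expected format (16 kHz, mono, 16-bit) is an assumption, not a documented requirement of wenet_stt; adjust it to whatever your model expects.

```python
import wave


def read_pcm_frames(path, expected_rate=16000, expected_channels=1,
                    expected_sample_width=2):
    """Return the raw PCM bytes of a WAV file after checking its format.

    The default expected values (16 kHz, mono, 16-bit) are assumptions
    made for this sketch, not documented requirements of wenet_stt.
    """
    with wave.open(path, 'rb') as wav_file:
        actual = (wav_file.getframerate(),
                  wav_file.getnchannels(),
                  wav_file.getsampwidth())
        expected = (expected_rate, expected_channels, expected_sample_width)
        if actual != expected:
            raise ValueError(
                f"{path}: (rate, channels, sample width) is {actual}, "
                f"expected {expected}")
        # Read every frame in the file as one bytes object
        return wav_file.readframes(wav_file.getnframes())
```

The returned bytes could then be passed to `model.decode(...)` in place of the inline `wave` code above, so that a mismatched file fails with a clear error instead of a poor transcript.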
The package also includes a simple CLI for recognizing WAV files:
$ python -m wenet_stt decode model test.wav
IT DEPENDS ON THE CONTEXT
$ python -m wenet_stt decode model test.wav test.wav
IT DEPENDS ON THE CONTEXT
IT DEPENDS ON THE CONTEXT
$ python -m wenet_stt -h
usage: python -m wenet_stt [-h] {decode} ...
positional arguments:
{decode} sub-command
decode decode one or more WAV files
optional arguments:
-h, --help show this help message and exit
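
The `decode` sub-command shown above can also be driven from a script. The sketch below wraps it with `subprocess`; the helper names `build_decode_command` and `decode_files` are hypothetical, and the code assumes the CLI prints one transcript per input file, in order, as in the sample output above.

```python
import subprocess
import sys


def build_decode_command(model_dir, wav_paths):
    """Build the argument list for `python -m wenet_stt decode ...`."""
    wav_paths = list(wav_paths)
    if not wav_paths:
        raise ValueError("at least one WAV file is required")
    # Use the current interpreter so the same environment is used
    return [sys.executable, '-m', 'wenet_stt', 'decode', model_dir, *wav_paths]


def decode_files(model_dir, wav_paths):
    """Run the CLI and return its output lines.

    Assumes one transcript line per input file (an assumption based on
    the sample output, not a documented guarantee).
    """
    result = subprocess.run(
        build_decode_command(model_dir, wav_paths),
        capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

For example, `decode_files('model', ['test.wav', 'test.wav'])` would run the second command shown above and return its output lines. For many files or repeated calls, the Python API avoids reloading the model on every invocation.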
Installation/Building
The recommended installation method is the binary wheel from pip (requires a recent version of pip):
python -m pip install wenet_stt
For details on building from source, see the GitHub Actions build workflow.
Author
- David Zurow (@daanzu)
License
This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE file for details. If this license is problematic for you, please contact me.
Acknowledgments
- Contains and uses code from WeNet, licensed under the Apache-2.0 License, and other transitive dependencies (see source).