End-to-end text to speech system using gruut and onnx. There are 40 voices available across 8 languages.

Rhasspy

Last update: Dec 28, 2022

Overview

Larynx

$ docker run -it -p 5002:5002 rhasspy/larynx:en-us

Larynx's goals are:

"Good enough" synthesis to avoid using a cloud service
Faster than realtime performance on a Raspberry Pi 4 (with low quality vocoder)
Broad language support (8 languages)
Voices trained purely from public datasets

Samples

Listen to voice samples from all of the pre-trained models.

Docker Installation

Pre-built Docker images for each language are available for the following platforms:

linux/amd64 - desktop/laptop/server
linux/arm64 - Raspberry Pi 64-bit
linux/arm/v7 - Raspberry Pi 32-bit

Run the Larynx web server with:

$ docker run -it -p 5002:5002 rhasspy/larynx:<LANG>

where <LANG> is one of:

de-de - German
en-us - U.S. English
es-es - Spanish
fr-fr - French
it-it - Italian
nl - Dutch
ru-ru - Russian
sv-se - Swedish

Visit http://localhost:5002 for the test page. See http://localhost:5002/openapi/ for HTTP endpoint documentation.

A larger docker image with all languages is also available as rhasspy/larynx
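
As a minimal sketch of how the image tag and the port mapping fit together, the helper below assembles the docker run arguments from a language tag and a host port. The function name larynx_docker_command and the host_port parameter are illustrative and not part of Larynx; the language tags and the container port 5002 are taken from the commands above.

```python
import shlex
from typing import List

# Language tags listed above; each one is also a Docker image tag.
LANGUAGES = ("de-de", "en-us", "es-es", "fr-fr", "it-it", "nl", "ru-ru", "sv-se")

# Port used on the container side in the commands above.
CONTAINER_PORT = 5002


def larynx_docker_command(lang: str, host_port: int = 5002) -> List[str]:
    """Return the argument list for running the Larynx image for one language."""
    if lang not in LANGUAGES:
        raise ValueError(f"unknown language tag: {lang!r}")
    return [
        "docker", "run", "-it",
        # Host port goes on the left, container port on the right.
        "-p", f"{host_port}:{CONTAINER_PORT}",
        f"rhasspy/larynx:{lang}",
    ]


if __name__ == "__main__":
    # Print a command that serves U.S. English on host port 8080.
    print(shlex.join(larynx_docker_command("en-us", host_port=8080)))
```

With this layout, only the left-hand side of the -p mapping changes when the server should be reachable on a different host port.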

Debian Installation

Pre-built Debian packages are available for download.

There are three different kinds of packages, so you can install exactly what you want and no more:

larynx-tts_<VERSION>_<ARCH>.deb
- Base Larynx code and dependencies (always required)
- ARCH is one of amd64 (most desktops, laptops), armhf (32-bit Raspberry Pi), arm64 (64-bit Raspberry Pi)
larynx-tts-lang-<LANG>_<VERSION>_all.deb
- Language-specific data files (at least one required)
- See above for a list of languages
larynx-tts-voice-<VOICE>_<VERSION>_all.deb
- Voice-specific model files (at least one required)
- See samples to decide which voice(s) to choose

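As an example, let's say you want to use the "harvard-glow_tts" voice for English on an amd64 laptop for Larynx version 0.4.0. You would need to download these files:

larynx-tts_0.4.0_amd64.deb
larynx-tts-lang-en-us_0.4.0_all.deb
larynx-tts-voice-en-us-harvard-glow-tts_0.4.0_all.deb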

Once downloaded, you can install the packages all at once with:

sudo apt install \
  ./larynx-tts_0.4.0_amd64.deb \
  ./larynx-tts-lang-en-us_0.4.0_all.deb \
  ./larynx-tts-voice-en-us-harvard-glow-tts_0.4.0_all.deb
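
The naming pattern in the example above can be sketched as a small helper. This is an illustration only: debian_package_names is a hypothetical function, and replacing underscores with hyphens in the voice name is an assumption generalized from the single harvard-glow_tts example.

```python
from typing import List


def debian_package_names(version: str, arch: str, lang: str, voice: str) -> List[str]:
    """Build the three .deb file names for one language and one voice."""
    # The example package uses hyphens where the voice name has underscores
    # (harvard-glow_tts becomes harvard-glow-tts). Assumed to hold in general.
    voice_pkg = voice.replace("_", "-")
    return [
        f"larynx-tts_{version}_{arch}.deb",
        f"larynx-tts-lang-{lang}_{version}_all.deb",
        f"larynx-tts-voice-{lang}-{voice_pkg}_{version}_all.deb",
    ]


if __name__ == "__main__":
    for name in debian_package_names("0.4.0", "amd64", "en-us", "harvard-glow_tts"):
        print(name)
```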

From there, you may run the larynx command or larynx-server to start the web server.

Python Installation

$ pip install larynx

For Raspberry Pi (ARM), you will first need to manually install phonetisaurus.

For 32-bit ARM systems, a pre-built onnxruntime wheel is available (official 64-bit wheels are available in PyPI).

Language Download

Larynx uses gruut to transform text into phonemes. You must install the appropriate gruut language before using Larynx. U.S. English is included with gruut, but for other languages:

$ python3 -m gruut <LANGUAGE> download

Voice/Vocoder Download

Voices and vocoders are available to download from the release page. They can be extracted anywhere, and the directory simply needs to be referenced on the command line (e.g., --voices-dir /path/to/voices).

Web Server

You can run a local web server with:

$ python3 -m larynx.server --voices-dir /path/to/voices

Visit http://localhost:5002 to view the site and try out voices. See http://localhost:5002/openapi/ for documentation on the available HTTP endpoints.

The following default settings can be applied (for when they're not provided in an API call):

--quality - vocoder quality (high/medium/low, default: high)
--noise-scale - voice volatility (0-1, default: 0.333)
--length-scale - voice speed (<1 is faster, default: 1.0)

You may also set --voices-dir to change where your voices/vocoders are stored. The directory structure should be <language>/<name>-<model_type> (for example, en-us/harvard-glow_tts).

See --help for more options.

MaryTTS Compatible API

To use Larynx as a drop-in replacement for a MaryTTS server (e.g., for use with Home Assistant), run:

$ docker run -it -p 59125:5002 rhasspy/larynx:<LANG>

The /process HTTP endpoint should now work for voices formatted as <LANG>/<VOICE> such as en-us/harvard-glow_tts.

You can specify the vocoder by adding ;<VOCODER> to the MaryTTS voice.

For example: en-us/harvard-glow_tts;hifi_gan:vctk_small will use the lowest quality (but fastest) vocoder. This is usually necessary to get decent performance on a Raspberry Pi.

Available vocoders are:

hifi_gan:universal_large (best quality, slowest, default)
hifi_gan:vctk_medium (medium quality)
hifi_gan:vctk_small (lowest quality, fastest)
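
A rough sketch of a client for the MaryTTS-style endpoint follows. It composes the voice string with a vocoder suffix and builds a /process URL. The query parameter names (INPUT_TEXT, INPUT_TYPE, OUTPUT_TYPE, AUDIO, VOICE) are the usual MaryTTS ones; which of them Larynx actually reads is an assumption here, so check the /openapi/ page of a running server. The helper names and the quality-to-vocoder pairing are illustrative.

```python
import urllib.parse
import urllib.request

# Assumed pairing of quality levels with the vocoders listed above.
VOCODERS = {
    "high": "hifi_gan:universal_large",
    "medium": "hifi_gan:vctk_medium",
    "low": "hifi_gan:vctk_small",
}


def marytts_voice(lang: str, voice: str, quality: str = "high") -> str:
    """Return '<lang>/<voice>;<vocoder>' for use as the MaryTTS voice."""
    return f"{lang}/{voice};{VOCODERS[quality]}"


def process_url(text: str, voice: str, base_url: str = "http://localhost:59125") -> str:
    """Build a GET URL for the /process endpoint (parameter names assumed)."""
    query = urllib.parse.urlencode({
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "VOICE": voice,
    })
    return f"{base_url}/process?{query}"


def synthesize(text: str, voice: str, wav_path: str) -> None:
    """Fetch audio from a running server and write it to wav_path."""
    with urllib.request.urlopen(process_url(text, voice)) as response:
        data = response.read()
    with open(wav_path, "wb") as wav_file:
        wav_file.write(data)
```

Calling synthesize("Hello world.", marytts_voice("en-us", "harvard-glow_tts", "low"), "hello.wav") would request the fastest vocoder, provided a server is listening on port 59125.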

Command-Line Example

The command below synthesizes multiple sentences and saves them to a directory. The --csv command-line flag indicates that each sentence is of the form id|text where id will be the name of the WAV file.

$ cat << EOF |
s01|The birch canoe slid on the smooth planks.
s02|Glue the sheet to the dark blue background.
s03|It's easy to tell the depth of a well.
s04|These days a chicken leg is a rare dish.
s05|Rice is often served in round bowls.
s06|The juice of lemons makes fine punch.
s07|The box was thrown beside the parked truck.
s08|The hogs were fed chopped corn and garbage.
s09|Four hours of steady work faced us.
s10|Large size in stockings is hard to sell.
EOF
  larynx \
    --debug \
    --csv \
    --voice harvard-glow_tts \
    --quality high \
    --output-dir wavs \
    --denoiser-strength 0.001

You can use the --interactive flag instead of --output-dir to type sentences and have the audio played immediately using the play command from sox.
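
The same batch run can be driven from Python. The sketch below numbers sentences in the id|text form expected by --csv and pipes them to the larynx command using only the flags shown above. The helper names are made up for this example, and the code assumes that no sentence contains a | character.

```python
import subprocess
from typing import Iterable, List


def to_csv_lines(sentences: Iterable[str], prefix: str = "s") -> List[str]:
    """Number sentences as 's01|text', 's02|text', ... for the --csv flag."""
    return [f"{prefix}{i:02d}|{text}" for i, text in enumerate(sentences, start=1)]


def synthesize_to_dir(sentences: Iterable[str], output_dir: str = "wavs") -> None:
    """Pipe the id|text lines to the larynx command on standard input."""
    lines = "\n".join(to_csv_lines(sentences)) + "\n"
    subprocess.run(
        ["larynx", "--csv", "--voice", "harvard-glow_tts",
         "--quality", "high", "--output-dir", output_dir],
        input=lines,
        text=True,
        check=True,  # raise CalledProcessError if larynx exits with an error
    )
```

Each id becomes the name of a WAV file in the output directory, so the ids should be unique within one run.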

GlowTTS Settings

The GlowTTS voices support two additional parameters:

--noise-scale - determines the speaker volatility during synthesis (0-1, default is 0.333)
--length-scale - makes the voice speak slower (> 1) or faster (< 1); default is 1.0
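
For callers that build the command line programmatically, the two settings can be validated before they are passed on. This is a sketch: glow_tts_flags is a hypothetical helper, the 0 to 1 range for the noise scale comes from the text above, and requiring a positive length scale is an assumption.

```python
from typing import List


def glow_tts_flags(noise_scale: float = 0.333, length_scale: float = 1.0) -> List[str]:
    """Validate the two GlowTTS settings and return them as command-line flags."""
    if not 0.0 <= noise_scale <= 1.0:
        raise ValueError("noise_scale must be between 0 and 1")
    if length_scale <= 0.0:
        # Assumed constraint: a scale of zero or below has no sensible meaning.
        raise ValueError("length_scale must be positive")
    return ["--noise-scale", str(noise_scale), "--length-scale", str(length_scale)]
```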

Vocoder Settings

--denoiser-strength - runs the denoiser if > 0; a small value like 0.005 is recommended.

List Voices and Vocoders

$ larynx --list

Text to Speech Models

GlowTTS (40 voices)
- English (en-us, 21 voices)
  - blizzard_fls (F, accent, Blizzard)
  - cmu_aew (M, Arctic)
  - cmu_ahw (M, Arctic)
  - cmu_aup (M, accent, Arctic)
  - cmu_bdl (M, Arctic)
  - cmu_clb (F, Arctic)
  - cmu_eey (F, Arctic)
  - cmu_fem (M, Arctic)
  - cmu_jmk (M, Arctic)
  - cmu_ksp (M, accent, Arctic)
  - cmu_ljm (F, Arctic)
  - cmu_lnh (F, Arctic)
  - cmu_rms (M, Arctic)
  - cmu_rxr (M, Arctic)
  - cmu_slp (F, accent, Arctic)
  - cmu_slt (F, Arctic)
  - ek (F, accent, M-AILabs)
  - harvard (F, accent, CC/Attr/NC)
  - kathleen (F, CC0)
  - ljspeech (F, Public Domain)
  - mary_ann (F, M-AILabs)
- German (de-de, 5 voices)
  - thorsten (M, CC0)
  - eva_k (F, M-AILabs)
  - karlsson (M, M-AILabs)
  - rebecca_braunert_plunkett (F, M-AILabs)
  - pavoque (M, CC4/BY/NC/SA)
- French (fr-fr, 3 voices)
  - gilles_le_blanc (M, M-AILabs)
  - siwis (F, CC/Attr)
  - tom (M, ODbL)
- Spanish (es-es, 2 voices)
  - carlfm (M, public domain)
  - karen_savage (F, M-AILabs)
- Dutch (nl, 3 voices)
  - bart_de_leeuw (M, Apache2)
  - flemishguy (M, CC0)
  - rdh (M, CC0)
- Italian (it-it, 2 voices)
  - lisa (F, M-AILabs)
  - riccardo_fasol (M, Apache2)
- Swedish (sv-se, 1 voice)
  - talesyntese (M, CC0)
- Russian (ru-ru, 3 voices)
  - hajdurova (F, M-AILabs)
  - nikolaev (M, M-AILabs)
  - minaev (M, M-AILabs)
Tacotron2
- Coming soon

Vocoders

Hi-Fi GAN
- Universal large
- VCTK "medium"
- VCTK "small"
WaveGlow
- 256 channel trained on LJ Speech

Comments

Dot (.) stops synthesis

I am new to Larynx, so maybe my question can be answered easily and quickly, but I couldn't find anything to fix it.

Whenever a dot character is encountered, synthesis ends. I don't even need multiple sentences, but if it encounters something like X (feat. Y) it just says X feat. I am using Larynx through OpenTTS in Home Assistant, but this can easily be replicated in the GUI as well. So how exactly can I fix this? And maybe for later, how exactly can I synthesize multiple sentences? Thank you very much in advance, the voices are superb!
bug

opened by chainria 7
MaryTTS API interface is not 100% compatible
Hi Michael,

congratulations for your Larynx v1.0 release :partying_face: . Great work, as usual :slightly_smiling_face:.

I've been trying to use Larynx with the new SEPIA v0.24.0 client since it has an option now to use MaryTTS compatible TTS systems directly, but encountered some issues:

The /voices endpoint is not delivering information in the same format. The MaryTTS API response is: [voice] [language] [gender] [tech=hmm] but Larynx is giving [language]/[voice]. Since I'm automatically parsing the string it currently fails to get the right language.

The /voices endpoint will show all voices including the ones that haven't been downloaded yet.

The Larynx quality parameter is not accessible.

The last point is not really a MaryTTS compatibility issue, but it would be great to get each voice as 'low, medium, high' variation from the 'voices' endpoint, so the user could actually choose them from the list.

I believe the Larynx MaryTTS endpoints are mostly for Home-Assistant support and I'm not sure how HA is parsing the voices list (maybe it doesn't parse it at all or just uses the whole string), but it would be great to get the original format from the /voices endpoint. Would you be willing to make these changes? :innocent: :grin:
opened by fquirin 6
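
A client that has to cope with both formats described in this report could parse each line defensively. The sketch below is illustrative only: the Voice tuple and parse_voice_line are hypothetical names, and the sample lines in the usage note are constructed from the two formats quoted above rather than captured from a server.

```python
from typing import NamedTuple, Optional


class Voice(NamedTuple):
    name: str
    language: str
    gender: Optional[str] = None


def parse_voice_line(line: str) -> Voice:
    """Parse one line of /voices output in either format."""
    fields = line.split()
    if len(fields) >= 3:
        # MaryTTS style: "<voice> <language> <gender> <tech>"
        return Voice(name=fields[0], language=fields[1], gender=fields[2])
    if len(fields) == 1 and "/" in fields[0]:
        # Larynx style: "<language>/<voice>"
        language, name = fields[0].split("/", 1)
        return Voice(name=name, language=language)
    raise ValueError(f"unrecognized voice line: {line!r}")
```

For instance, parse_voice_line("en-us/harvard-glow_tts") yields the language en-us and the name harvard-glow_tts, with no gender information.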

Required versions for python and pip

I have a working setup on a recent linux box (with python 3.8). But now I have to use an older computer (python 3.5, pip 8.1.1) and I run into trouble:

 Using cached https://files.pythonhosted.org/packages/f8/4d/a2.../larynx-0.3.1.tar.gz
 Complete output from command python setup.py egg_info:
 Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-build-sbulg1mn/larynx/setup.py", line 13
    long_description: str = ""
                    ^
 SyntaxError: invalid syntax

What are the minimum versions required by larynx at the moment?

opened by svenha 6
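
The SyntaxError in this report points at an annotated assignment (long_description: str = ""), a syntax that Python accepts only from version 3.6 onward (PEP 526). The sketch below shows a version guard for that one feature. It explains the traceback but says nothing about the minimum version Larynx officially supports; the function name is made up for the example.

```python
import sys

# Annotated assignments such as `long_description: str = ""` need Python 3.6+.
MIN_ANNOTATION_VERSION = (3, 6)


def supports_variable_annotations(version_info=sys.version_info) -> bool:
    """Return True if the given interpreter version can parse annotated assignments."""
    return tuple(version_info[:2]) >= MIN_ANNOTATION_VERSION


if __name__ == "__main__":
    if not supports_variable_annotations():
        sys.exit("This interpreter is too old to parse the setup.py shown above.")
```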

SSL error when downloading new tts

Steps to reproduce:

Run larynx-server on NixOS with Docker
Attempt to download a tts

Full error output:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/.venv/lib/python3.7/site-packages/quart/app.py", line 1827, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/app/.venv/lib/python3.7/site-packages/quart/app.py", line 1875, in dispatch_request
    return await handler(**request_.view_args)
  File "/app/larynx/server.py", line 667, in api_download
    tts_model_dir = download_voice(voice_name, voices_dirs[0], url)
  File "/app/larynx/utils.py", line 78, in download_voice
    response = urllib.request.urlopen(link)
  File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 1367, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib/python3.7/urllib/request.py", line 1326, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>

bug

opened by 18fadly-anthony 5

German voice

Have you thought about using pavoque-data for a German voice? https://github.com/marytts/pavoque-data/releases/tag/v0.2 It has better quality compared with the thorsten audio dataset. It has a restriction on commercial use, but as far as I can see you already have a voice with that restriction.
enhancement

opened by alt131 5
How to change port number when running from docker

I'm using the docker command mentioned in https://github.com/rhasspy/larynx#docker-installation

However, when I flip the port number in the docker run command, it doesn't work the way I want it to. May I know the right way to host larynx on a different port, if there is one?

opened by thevickypedia 4
MaryTTS emulation and Home Assistant
I'm having trouble setting up the MaryTTS component in Home Assistant to work with Larynx. In particular, there are several parameters that can be defined in yaml. The docs give this example:

tts:
  - platform: marytts
    host: "localhost"
    port: 59125
    codec: "WAVE_FILE"
    voice: "cmu-slt-hsmm"
    language: "en_US"
    effect:
      Volume: "amount:2.0;"

Larynx is up and running and I can generate speech via localhost:59125. I'd like to use a specific voice and quality setting with Home Assistant's TTS. I tried setting the following:

...
voice: "harvard-glow_tts"
language: "en_us"
...

But Home Assistant's log shows an error saying that "en_us" is not a valid language ("en_US" is, though).

What are the correct parameters necessary to use a specific voice? And would it be possible to use an effect key to set the voice quality (high, medium, low)?
enhancement
opened by hawkeye217 4
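
The language check described in this report is about spelling: Home Assistant accepted en_US but not en_us. A sketch of converting Larynx-style tags to that spelling is shown below. The function name is hypothetical, and whether Home Assistant then selects the intended Larynx voice is not verified here.

```python
def to_home_assistant_locale(lang: str) -> str:
    """Convert 'en-us' or 'en_us' to 'en_US'; leave bare codes like 'nl' unchanged."""
    parts = lang.replace("-", "_").split("_")
    if len(parts) == 1:
        return parts[0].lower()
    # Lowercase language code, uppercase region code.
    return f"{parts[0].lower()}_{parts[1].upper()}"
```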
New languages need a link

Thanks @synesthesiam for this excellent tool.

I followed the 'Python installation' method (on Ubuntu 20.10) and added the language de-de via python3 -m gruut de-de download. Before I could use the new language, I had to add a link in ~/.local/lib/python3.8/site-packages/gruut/data/ to ~/.config/gruut/de-de; otherwise the new language was not found.
bug

opened by svenha 3
How to send text to larynx SERVER using BASH script?

Impressive program!

Using a bash script, how do I send text to the larynx SERVER?

When using the larynx CLI, there is a 5-15 second startup delay (which I'm trying to eliminate). So I'd like to start a server, then send the text to the server to avoid this startup delay.

Unfortunately, I've failed to find documentation or an example on how to connect to the server using a bash script. I've experimented with the "--daemon" option, and studied the larynx-server example, but failed to find a solution. (Please forgive my inexperience with TTS.)

I'd appreciate an example (or pointer to documentation) showing how to send text to the larynx server using a bash script. Thank you.

opened by gvimlag 2
Version/tag mismatch when downloading voices for 1.0.0 release
Looks like the Github version tag is 1.0 but the code is looking for 1.0.0. The assets exist on Github with 1.0 in the path, but I'm getting this error when trying to download voices from the web interface:

larynx.utils.VoiceDownloadError: Failed to download voice en-us_kathleen-glow_tts from http://github.com/rhasspy/larynx/releases/download/v1.0.0/en-us_kathleen-glow_tts.tar.gz: HTTP Error 404: Not Found
opened by hawkeye217 2
OpenAPI page broken

I am getting HTTP 500 returned when I go to http://localhost:5002/openapi/ - The browser page says "Fetch error undefined /openapi/swagger.json"

On the command line, I tried find /usr/local/python3/ -name '*swagger*' and only got results for the swagger_ui package in site-packages.
bug

opened by polski-g 2
Improve performance with caching

I hope to gain some understanding about how feasible and useful it is to cache certain (intermediate) outputs. If large parts of phrases are re-used often, couldn't they be cached (perhaps on multiple levels) to improve response time? And if so, the cache could be pre-populated with expected outputs by speaking them all once. For example a program that reads the time of day could have a cache for 'The time is' as well as numbers up to 59. The expected reduction in response time would depend on which parts of the process actually take the most time, which I'm not sure about.

opened by JeroenvdV 0
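
The whole-phrase variant of this idea can be sketched with a dictionary keyed on voice and text. Everything here is illustrative: SynthesisCache is not part of Larynx, and the synthesize callable stands in for whatever function actually produces audio. Caching partial phrases and joining the audio afterwards is not attempted.

```python
from typing import Callable, Dict, Iterable, Tuple


class SynthesisCache:
    """Memoize synthesized audio by (voice, text)."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize
        self._cache: Dict[Tuple[str, str], bytes] = {}

    def get(self, voice: str, text: str) -> bytes:
        """Return cached audio, synthesizing only on the first request."""
        key = (voice, text.strip())
        if key not in self._cache:
            self._cache[key] = self._synthesize(voice, key[1])
        return self._cache[key]

    def warm(self, voice: str, phrases: Iterable[str]) -> None:
        """Pre-populate the cache by synthesizing each expected phrase once."""
        for phrase in phrases:
            self.get(voice, phrase)
```

A clock application could call warm() at startup with its fixed phrases, after which repeated requests for the same phrase skip synthesis entirely. The cache is unbounded, so a long-running service would probably want an eviction policy.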
Dates like "1700s" and "1980s" are replaced with the current date
I was really confused for a moment the first time I encountered this. :-)

To reproduce:

Run the Larynx server via the latest Docker container.

Open the web interface. Paste a phrase like "It was popular in the distant past of the 1700s." and press "Speak".

https://user-images.githubusercontent.com/3179832/199944142-5600025a-cc4a-4821-9be1-9bed10e5ecf9.mp4
opened by dbohdan 0

voices-dir option of larynx.server doesn't work

I'm using larynx==1.1.0

Even when I specify --voices-dir with a directory that contains the correct voice models, it does not seem to be used.

python3 -m larynx.server --voices-dir /var/lib/larynx-model/voices

DEBUG:larynx.utils:Downloading voice/vocoder for en-us_cmu_eey-glow_tts to /root/.local/share/larynx/voices from http://github.com/rhasspy/larynx/releases/download/v1.0/en-us_cmu_eey-glow_tts.tar.gz
en-us_cmu_eey-glow_tts:  18%|████████████████████▍

opened by syundo0730 0

ImportError: cannot import name 'escape' from 'jinja2'

Python installation method results in the error in larynx-server on Ubuntu 22.04.1

(larynx_venv) user@hp-laptop:~/Downloads/larynx-master$ larynx-server 
Traceback (most recent call last):
  File "/home/user/Downloads/larynx-master/larynx_venv/bin/larynx-server", line 5, in <module>
    from larynx.server.__main__ import main
  File "/home/user/Downloads/larynx-master/larynx_venv/lib/python3.10/site-packages/larynx/server.py", line 24, in <module>
    import quart_cors
  File "/home/user/Downloads/larynx-master/larynx_venv/lib/python3.10/site-packages/quart_cors/__init__.py", line 5, in <module>
    from quart import abort, Blueprint, current_app, make_response, Quart, request, Response, websocket
  File "/home/user/Downloads/larynx-master/larynx_venv/lib/python3.10/site-packages/quart/__init__.py", line 3, in <module>
    from jinja2 import escape, Markup
ImportError: cannot import name 'escape' from 'jinja2' (/home/user/Downloads/larynx-master/larynx_venv/lib/python3.10/site-packages/jinja2/__init__.py)

opened by faveoled 1

Browser request for favicon.ico returns HTTP 500 error and error on console

It would be nice to trap the favicon.ico HTTP request and return a simple icon so that the browser does not get a 500 error and the client does not log the error message.

ERROR:larynx:404 Not Found: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
Traceback (most recent call last):
  File "/home/larynx/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1490, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/home/larynx/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1530, in dispatch_request
    self.raise_routing_exception(request_)
  File "/home/larynx/app/.venv/lib/python3.9/site-packages/quart/app.py", line 1038, in raise_routing_exception
    raise request.routing_exception
  File "/home/larynx/app/.venv/lib/python3.9/site-packages/quart/ctx.py", line 64, in match_request
    ) = self.url_adapter.match(  # type: ignore
  File "/home/larynx/app/.venv/lib/python3.9/site-packages/werkzeug/routing.py", line 2041, in match
    raise NotFound()
werkzeug.exceptions.NotFound: 404 Not Found: The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.

opened by mrdvt92 0

Releases(v1.1)

Owner

Rhasspy

Offline voice assistant

GitHub

Releases(v1.1)

v1.1(Nov 11, 2021)

Changed

v1.0(Oct 20, 2021)

[1.0.0] - 20 Oct 2021

Added

v0.5(Aug 23, 2021)

v0.4.0(Apr 23, 2021)

2021-03-28(Mar 28, 2021)
