This program uses a trial auth token of Azure Cognitive Services to do speech synthesis for you.

Levi Zim

Last update: Jan 5, 2023

Overview

🗣️ aspeak

A simple text-to-speech client for the Azure TTS API (trial). 😆

TL;DR: This program uses a trial auth token of Azure Cognitive Services to do speech synthesis for you.

You can try the Azure TTS API online: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech

Installation

$ pip install --upgrade aspeak

Limitations

Since we are using Azure Cognitive Services, there are some limitations:

| Quota | Free (F0) |
| --- | --- |
| Max number of transactions per certain time period per Speech service resource | |
| Real-time API. Prebuilt neural voices and custom neural voices. | 20 transactions per 60 seconds |
| Adjustable | No |
| HTTP-specific quotas | |
| Max audio length produced per request | 10 min |
| Max total number of distinct `<voice>` and `<audio>` tags in SSML | 50 |
| Websocket specific quotas | |
| Max audio length produced per turn | 10 min |
| Max total number of distinct `<voice>` and `<audio>` tags in SSML | 50 |
| Max SSML message size per turn | 64 KB |

This table is copied from the Azure Cognitive Services documentation.

These limits may change, so the table above may become outdated. Please refer to the latest Azure Cognitive Services documentation for current information.
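If you script many requests against the free tier, you may want to stay under the "20 transactions per 60 seconds" quota on your side instead of waiting for the service to reject you. Below is a minimal sliding-window limiter sketch in Python. It is not part of aspeak: the class name `SlidingWindowLimiter` is hypothetical, it only counts calls made by your own process, and it assumes one aspeak invocation counts as one transaction, which may not match how Azure actually bills requests.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most max_calls calls in any period-second window."""

    def __init__(self, max_calls: int = 20, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # start times of recent calls

    def acquire(self) -> float:
        """Block until another call is allowed; return the seconds spent waiting."""
        waited = 0.0
        while True:
            now = self._clock()
            # Forget calls that have slid out of the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return waited
            # Sleep until the oldest call leaves the window, then re-check.
            delay = self.period - (now - self._calls[0])
            self._sleep(delay)
            waited += delay
```

Call `limiter.acquire()` right before each aspeak run in your loop. The clock and sleep functions are injectable only so the limiter can be tested without real waiting.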

Attention: If the resulting audio is longer than 10 minutes, it will be truncated to 10 minutes and the program will not report an error.
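Since over-long output is truncated silently, one workaround is to split long input into smaller pieces and synthesize each piece separately. The sketch below is a minimal Python example, assuming that splitting at sentence-ending punctuation (ASCII or CJK) is acceptable for your text. The name `chunk_text` and the `max_chars` budget are hypothetical: characters are only a rough proxy for audio length (it depends on voice and rate), and they are not the same as the 64 KB SSML byte limit, so tune the budget yourself.

```python
import re
from typing import List

# A "sentence" is a run of non-terminators followed by terminators and trailing
# whitespace, or the leftover text at the very end.
_SENTENCE = re.compile(r"[^.!?。！？]*(?:[.!?。！？]+\s*|\Z)")


def chunk_text(text: str, max_chars: int = 2000) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    Joining the returned chunks gives back the original text exactly.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    pieces = [p for p in _SENTENCE.findall(text) if p]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        # Hard-split a single sentence that is longer than the budget on its own.
        while len(piece) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:max_chars])
            piece = piece[max_chars:]
        if len(current) + len(piece) <= max_chars:
            current += piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `aspeak -t` with its own `-o` file name; joining the resulting audio files is left to other tools.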

Using `aspeak` as a Python library

See DEVELOP.md for more details. You can find examples in src/examples.

Usage

usage: aspeak [-h] [-V | -L | -Q | [-t [TEXT] [-p PITCH] [-r RATE] [-S STYLE] [-R ROLE] [-d STYLE_DEGREE] | -s [SSML]]]
              [-f FILE] [-e ENCODING] [-o OUTPUT_PATH] [-l LOCALE] [-v VOICE]
              [--mp3 [-q QUALITY] | --ogg [-q QUALITY] | --webm [-q QUALITY] | --wav [-q QUALITY] | -F FORMAT] 

This program uses trial auth token of Azure Cognitive Services to do speech synthesis for you

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -L, --list-voices     list available voices, you can combine this argument with -v and -l
  -Q, --list-qualities-and-formats
                        list available qualities and formats
  -t [TEXT], --text [TEXT]
                        Text to speak. Left blank when reading from file/stdin
  -s [SSML], --ssml [SSML]
                        SSML to speak. Left blank when reading from file/stdin
  -f FILE, --file FILE  Text/SSML file to speak, default to `-`(stdin)
  -e ENCODING, --encoding ENCODING
                        Text/SSML file encoding, default to "utf-8"(Not for stdin!)
  -o OUTPUT_PATH, --output OUTPUT_PATH
                        Output file path, wav format by default
  --mp3                 Use mp3 format for output. (Only works when outputting to a file)
  --ogg                 Use ogg format for output. (Only works when outputting to a file)
  --webm                Use webm format for output. (Only works when outputting to a file)
  --wav                 Use wav format for output
  -F FORMAT, --format FORMAT
                        Set output audio format (experts only)
  -l LOCALE, --locale LOCALE
                        Locale to use, default to en-US
  -v VOICE, --voice VOICE
                        Voice to use
  -q QUALITY, --quality QUALITY
                        Output quality, default to 0

Options for --text:
  -p PITCH, --pitch PITCH
                        Set pitch, default to 0. Valid values include floats(will be converted to percentages), percentages such as 20% and -10%, absolute values like 300Hz, and
                        relative values like -20Hz, +2st and string values like x-low. See the documentation for more details.
  -r RATE, --rate RATE  Set speech rate, default to 0. Valid values include floats(will be converted to percentages), percentages like -20%, floats with postfix "f" (e.g. 2f means
                        doubling the default speech rate), and string values like x-slow. See the documentation for more details.
  -S STYLE, --style STYLE
                        Set speech style, default to "general"
  -R {Girl,Boy,YoungAdultFemale,YoungAdultMale,OlderAdultFemale,OlderAdultMale,SeniorFemale,SeniorMale}, --role {Girl,Boy,YoungAdultFemale,YoungAdultMale,OlderAdultFemale,OlderAdultMale,SeniorFemale,SeniorMale}
                        Specifies the speaking role-play. This only works for some Chinese voices!
  -d {values in range 0.01-2 (inclusive)}, --style-degree {values in range 0.01-2 (inclusive)}
                        Specifies the intensity of the speaking style.This only works for some Chinese voices!

Attention: If the result audio is longer than 10 minutes, the audio will be truncated to 10 minutes and the program will not report an error. Unreasonable high/low values for pitch
and rate will be clipped to reasonable values by Azure Cognitive Services.Please refer to the documentation for other limitations at
https://github.com/kxxt/aspeak/blob/main/README.md#limitations

If you don't specify -o, we will use your default speaker.
If you don't specify -t or -s, we will assume -t is provided.
You must specify a voice if you want to use the special options for --text.

Special Note for Pitch and Rate

rate: The speaking rate of the voice.
- If you use a float value (say 0.5), the value will be multiplied by 100% and become 50.00%.
- You can use the following values as well: x-slow, slow, medium, fast, x-fast, default.
- You can also use percentage values directly: +10%.
- You can also use a relative float value (with an f suffix), e.g. 1.2f:
  - According to the Azure documentation,
  - A relative value, expressed as a number that acts as a multiplier of the default.
  - For example, a value of 1f results in no change in the rate. A value of 0.5f results in a halving of the rate. A value of 3f results in a tripling of the rate.
pitch: The pitch of the voice.
- If you use a float value (say -0.5), the value will be multiplied by 100% and become -50.00%.
- You can also use the following values: x-low, low, medium, high, x-high, default.
- You can also use percentage values directly: +10%.
- You can also use a relative value (e.g. -2st or +80Hz):
  - According to the Azure documentation,
  - A relative value, expressed as a number preceded by "+" or "-" and followed by "Hz" or "st" that specifies an amount to change the pitch.
  - The "st" indicates the change unit is semitone, which is half of a tone (a half step) on the standard diatonic scale.
- You can also use an absolute value: e.g. 600Hz

Note: Unreasonable high/low values will be clipped to reasonable values by Azure Cognitive Services.
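To make the accepted forms concrete, here is a minimal Python sketch of a validator that normalizes pitch and rate values as described above. It is not aspeak's actual parser; `normalize_rate` and `normalize_pitch` are hypothetical names, and turning `1.2f` into the bare multiplier `1.2` is an assumption about what ends up in the SSML.

```python
import math
import re
from typing import Optional

RATE_KEYWORDS = {"x-slow", "slow", "medium", "fast", "x-fast", "default"}
PITCH_KEYWORDS = {"x-low", "low", "medium", "high", "x-high", "default"}
PERCENT = re.compile(r"[+-]?\d+(?:\.\d+)?%")                 # e.g. +10%, -20%
RELATIVE_PITCH = re.compile(r"[+-]\d+(?:\.\d+)?(?:Hz|st)")  # e.g. +80Hz, -2st
ABSOLUTE_PITCH = re.compile(r"\d+(?:\.\d+)?Hz")              # e.g. 600Hz


def _finite_float(arg: str) -> Optional[float]:
    try:
        value = float(arg)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def normalize_rate(arg: str) -> str:
    arg = arg.strip()
    value = _finite_float(arg)
    if value is not None:
        return f"{value * 100:.2f}%"  # plain float: 0.5 -> 50.00%
    if arg in RATE_KEYWORDS or PERCENT.fullmatch(arg):
        return arg
    if arg.endswith("f"):
        multiplier = _finite_float(arg[:-1])
        if multiplier is not None and multiplier > 0:
            return arg[:-1]  # assumed: send the bare multiplier, 1.2f -> 1.2
    raise ValueError(f"invalid rate: {arg!r}")


def normalize_pitch(arg: str) -> str:
    arg = arg.strip()
    value = _finite_float(arg)
    if value is not None:
        return f"{value * 100:.2f}%"  # plain float: -0.5 -> -50.00%
    if (arg in PITCH_KEYWORDS or PERCENT.fullmatch(arg)
            or RELATIVE_PITCH.fullmatch(arg) or ABSOLUTE_PITCH.fullmatch(arg)):
        return arg
    raise ValueError(f"invalid pitch: {arg!r}")
```

Note that a relative pitch needs an explicit sign, so `2st` is rejected while `+2st` is accepted; clipping of extreme values is still left to Azure.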

About Custom Style Degree and Role

According to the Azure documentation, style degree specifies the intensity of the speaking style. It is a floating point number between 0.01 and 2, inclusive.

At the time of writing, style degree adjustments are supported for Chinese (Mandarin, Simplified) neural voices.

According to the Azure documentation, role specifies the speaking role-play. The voice acts as a different age and gender, but the voice name isn't changed.

At the time of writing, role adjustments are supported for these Chinese (Mandarin, Simplified) neural voices: zh-CN-XiaomoNeural, zh-CN-XiaoxuanNeural, zh-CN-YunxiNeural, and zh-CN-YunyeNeural.
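If you write SSML yourself instead of using -S, -R and -d, style degree and role go on the same element as the style. The sketch below builds such a document in Python. The helper name `build_ssml` is hypothetical, and the element and attribute names (`mstts:express-as`, `style`, `styledegree`, `role`) and the mstts namespace URI follow my reading of Azure's SSML documentation, so double-check them against the current docs.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional
from xml.sax.saxutils import escape, quoteattr

# Roles accepted by aspeak's -R option (see the usage section).
ROLES = {"Girl", "Boy", "YoungAdultFemale", "YoungAdultMale",
         "OlderAdultFemale", "OlderAdultMale", "SeniorFemale", "SeniorMale"}


def build_ssml(text: str, voice: str, locale: str, style: str = "general",
               style_degree: Optional[float] = None, role: Optional[str] = None) -> str:
    """Wrap text in <mstts:express-as> with optional styledegree/role (hypothetical helper)."""
    attrs: Dict[str, str] = {"style": style}
    if style_degree is not None:
        if not 0.01 <= style_degree <= 2:
            raise ValueError("style degree must be between 0.01 and 2, inclusive")
        attrs["styledegree"] = str(style_degree)
    if role is not None:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role!r}")
        attrs["role"] = role
    attr_text = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs.items())
    ssml = (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang={quoteattr(locale)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as {attr_text}>{escape(text)}</mstts:express-as>"
        "</voice></speak>"
    )
    ET.fromstring(ssml)  # sanity check: raises ParseError if the markup is malformed
    return ssml
```

The result could be fed to aspeak's `-s` option, which reads SSML from stdin when left blank (e.g. `python make_ssml.py | aspeak -s`, where `make_ssml.py` is your own script printing the string).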

Examples

Speak "Hello, world" to the default speaker.

$ aspeak -t "Hello, world"

List all available voices.

$ aspeak -L

List all available voices for Chinese.

$ aspeak -L -l zh-CN

Get information about a voice.

$ aspeak -L -v en-US-SaraNeural

Output

Microsoft Server Speech Text to Speech Voice (en-US, SaraNeural)
Display Name: Sara
Local Name: Sara @ en-US
Locale: English (United States)
Gender: Female
ID: en-US-SaraNeural
Styles: ['cheerful', 'angry', 'sad']
Voice Type: Neural
Status: GA

Save synthesized speech to a file.

$ aspeak -t "Hello, world" -o output.wav

If you prefer mp3/ogg/webm, you can use the --mp3/--ogg/--webm options.

$ aspeak -t "Hello, world" -o output.mp3 --mp3
$ aspeak -t "Hello, world" -o output.ogg --ogg
$ aspeak -t "Hello, world" -o output.webm --webm

List available quality levels and formats

$ aspeak -Q

Output

Available qualities:
Qualities for wav:
-2: Riff8Khz16BitMonoPcm
-1: Riff16Khz16BitMonoPcm
 0: Riff24Khz16BitMonoPcm
 1: Riff24Khz16BitMonoPcm
Qualities for mp3:
-3: Audio16Khz32KBitRateMonoMp3
-2: Audio16Khz64KBitRateMonoMp3
-1: Audio16Khz128KBitRateMonoMp3
 0: Audio24Khz48KBitRateMonoMp3
 1: Audio24Khz96KBitRateMonoMp3
 2: Audio24Khz160KBitRateMonoMp3
 3: Audio48Khz96KBitRateMonoMp3
 4: Audio48Khz192KBitRateMonoMp3
Qualities for ogg:
-1: Ogg16Khz16BitMonoOpus
 0: Ogg24Khz16BitMonoOpus
 1: Ogg48Khz16BitMonoOpus
Qualities for webm:
-1: Webm16Khz16BitMonoOpus
 0: Webm24Khz16BitMonoOpus
 1: Webm24Khz16Bit24KbpsMonoOpus

Available formats:
- Riff8Khz16BitMonoPcm
- Riff16Khz16BitMonoPcm
- Audio16Khz128KBitRateMonoMp3
- Raw24Khz16BitMonoPcm
- Raw48Khz16BitMonoPcm
- Raw16Khz16BitMonoPcm
- Audio24Khz160KBitRateMonoMp3
- Ogg24Khz16BitMonoOpus
- Audio16Khz64KBitRateMonoMp3
- Raw8Khz8BitMonoALaw
- Audio24Khz16Bit48KbpsMonoOpus
- Ogg16Khz16BitMonoOpus
- Riff8Khz8BitMonoALaw
- Riff8Khz8BitMonoMULaw
- Audio48Khz192KBitRateMonoMp3
- Raw8Khz16BitMonoPcm
- Audio24Khz48KBitRateMonoMp3
- Raw24Khz16BitMonoTrueSilk
- Audio24Khz16Bit24KbpsMonoOpus
- Audio24Khz96KBitRateMonoMp3
- Webm24Khz16BitMonoOpus
- Ogg48Khz16BitMonoOpus
- Riff48Khz16BitMonoPcm
- Webm24Khz16Bit24KbpsMonoOpus
- Raw8Khz8BitMonoMULaw
- Audio16Khz16Bit32KbpsMonoOpus
- Audio16Khz32KBitRateMonoMp3
- Riff24Khz16BitMonoPcm
- Raw16Khz16BitMonoTrueSilk
- Audio48Khz96KBitRateMonoMp3
- Webm16Khz16BitMonoOpus

Increase/Decrease audio qualities

# Less than default quality.
$ aspeak -t "Hello, world" -o output.mp3 --mp3 -q=-1
# Best quality for mp3 (4 is the highest mp3 level listed by aspeak -Q)
$ aspeak -t "Hello, world" -o output.mp3 --mp3 -q=4
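If you drive aspeak from a script, it can help to treat the quality levels as a lookup table. The sketch below simply transcribes the `aspeak -Q` output shown above into a Python dictionary. `QUALITIES` and `resolve_format` are hypothetical names, and the mapping may change between aspeak versions, so regenerate it from `aspeak -Q` rather than trusting this copy.

```python
from typing import Dict

# Transcribed from the `aspeak -Q` output above; may differ in other aspeak versions.
QUALITIES: Dict[str, Dict[int, str]] = {
    "wav": {-2: "Riff8Khz16BitMonoPcm", -1: "Riff16Khz16BitMonoPcm",
            0: "Riff24Khz16BitMonoPcm", 1: "Riff24Khz16BitMonoPcm"},
    "mp3": {-3: "Audio16Khz32KBitRateMonoMp3", -2: "Audio16Khz64KBitRateMonoMp3",
            -1: "Audio16Khz128KBitRateMonoMp3", 0: "Audio24Khz48KBitRateMonoMp3",
            1: "Audio24Khz96KBitRateMonoMp3", 2: "Audio24Khz160KBitRateMonoMp3",
            3: "Audio48Khz96KBitRateMonoMp3", 4: "Audio48Khz192KBitRateMonoMp3"},
    "ogg": {-1: "Ogg16Khz16BitMonoOpus", 0: "Ogg24Khz16BitMonoOpus",
            1: "Ogg48Khz16BitMonoOpus"},
    "webm": {-1: "Webm16Khz16BitMonoOpus", 0: "Webm24Khz16BitMonoOpus",
             1: "Webm24Khz16Bit24KbpsMonoOpus"},
}


def resolve_format(container: str, quality: int = 0) -> str:
    """Return the Azure format name for a container ("wav", "mp3", ...) and -q level."""
    levels = QUALITIES[container]
    if quality not in levels:
        raise ValueError(f"quality {quality} is not available for {container}; "
                         f"choose from {min(levels)} to {max(levels)}")
    return levels[quality]
```

The returned name is one of the formats listed above, so it should also be usable with `-F`.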

Read text from file and speak it.

$ cat input.txt | aspeak

$ aspeak -f input.txt

with custom encoding:

$ aspeak -f input.txt -e gbk

Read from stdin and speak it.

$ aspeak

or (more verbose)

$ aspeak -f -

maybe you prefer:

$ aspeak -l zh-CN << EOF
我能吞下玻璃而不伤身体。
EOF

Speak Chinese.

$ aspeak -t "你好，世界！" -l zh-CN

Use a custom voice.

$ aspeak -t "你好，世界！" -v zh-CN-YunjianNeural

Custom pitch, rate and style

$ aspeak -t "你好，世界！" -v zh-CN-XiaoxiaoNeural -p 1.5 -r 0.5 -S sad
$ aspeak -t "你好，世界！" -v zh-CN-XiaoxiaoNeural -p=-10% -r=+5% -S cheerful
$ aspeak -t "你好，世界！" -v zh-CN-XiaoxiaoNeural -p=+40Hz -r=1.2f -S fearful
$ aspeak -t "你好，世界！" -v zh-CN-XiaoxiaoNeural -p=high -r=x-slow -S calm
$ aspeak -t "你好，世界！" -v zh-CN-XiaoxiaoNeural -p=+1st -r=-7% -S lyrical

Advanced Usage

Use a custom audio format for output

Note: When outputting to the default speaker, using a non-wav format may lead to white noise.

$ aspeak -t "Hello World" -F Riff48Khz16BitMonoPcm -o high-quality.wav
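The format names listed by `aspeak -Q` encode the sample rate (e.g. `48Khz`). Since non-wav formats may cause white noise on the speaker, a small helper can pick the RIFF (WAV) PCM format with the highest sample rate from that list. This is a hypothetical sketch that relies only on the naming pattern visible in the list above; it is not part of aspeak.

```python
import re
from typing import Iterable, Optional

_SAMPLE_RATE = re.compile(r"(\d+)Khz")


def sample_rate_hz(format_name: str) -> int:
    """Read the sample rate out of a name such as 'Riff48Khz16BitMonoPcm'."""
    match = _SAMPLE_RATE.search(format_name)
    if match is None:
        raise ValueError(f"no sample rate in {format_name!r}")
    return int(match.group(1)) * 1000


def best_riff_pcm(formats: Iterable[str]) -> Optional[str]:
    """Pick the RIFF PCM format with the highest sample rate, or None if there is none."""
    candidates = [f for f in formats if f.startswith("Riff") and f.endswith("Pcm")]
    return max(candidates, key=sample_rate_hz, default=None)
```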

About This Application

I found that Azure TTS can synthesize nearly authentic human voices, which is very interesting 😆.
I wrote this program to learn Azure Cognitive Services.
And I use this program daily, because espeak and festival output terrible 😨 audio.
- But I respect 🙌 their maintainers' work: both are good open-source software and can be used offline.
I hope you like it ❤️.

Alternative Applications

Comments

Could not extract token from webpage

This error was raised today:

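import re

import requests

# TRAIL_URL and the errors module are defined elsewhere in aspeak and are not shown in this issue.


def _get_auth_token() -> str: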
    """
    Get a trial auth token from the trial webpage.
    """
    response = requests.get(TRAIL_URL)
    if response.status_code != 200:
        raise errors.TokenRetrievalError(status_code=response.status_code)
    text = response.text

    # We don't need bs4, because a little regex is enough.

    match = re.search(r'\s+var\s+localizedResources\s+=\s+\{((.|\n)*?)\};', text, re.M)
    retrieval_error = errors.TokenRetrievalError(message='Could not extract token from webpage.',
                                                 status_code=response.status_code)
    if match is None:
        raise retrieval_error
    token = re.search(r'\s+token:\s*"([^"]+)"', match.group(1), re.M)
    if token is None:
        raise retrieval_error
    return token.group(1)

The page https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/#overview works normally in a browser on the same machine.

opened by ArthurDavidson 16

Running in python script

How can I run this from my Python script, passing translated_text instead of typing the text in? Here is my code:

def speak_paste():
    try:
        spoken_text = driver1.find_element_by_xpath("/html/body/div/div[2]/div[3]/span").text
        test_str = (spoken_text)
        res = " ".join(lookp_dict.get(ele, ele) for ele in test_str.split())
        pyperclip.copy(res)
        translator = deepl.Translator('')
        result = translator.translate_text((res), target_lang="ru", formality="less", preserve_formatting="1")
        translated_text = result.text

This takes subtitle text and translates it with DeepL; I just want to pass the translated text to Azure TTS to synthesize it.

opened by Funktionar 11

Faster audio output/processing

Is it possible to use this for real-time communication? It is slower than using Azure directly, and I use the DeepL API to talk with foreigners. I'd like to get the audio within 200 ms and output it to a sound device, if that's feasible.

opened by Funktionar 9

ImportError from urllib3: cannot import name 'Mapping' from 'collections'

/Users/meeia ~ aspeak -t "Hello, world"
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/bin/aspeak", line 5, in <module>
    from aspeak.__main__ import main
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/aspeak/__main__.py", line 1, in <module>
    from .cli import main
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/aspeak/cli/__init__.py", line 1, in <module>
    from .main import main
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/aspeak/cli/main.py", line 8, in <module>
    from .voices import list_voices
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/aspeak/cli/voices.py", line 1, in <module>
    import requests
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/requests/__init__.py", line 43, in <module>
    import urllib3
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/__init__.py", line 8, in <module>
    from .connectionpool import (
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connectionpool.py", line 29, in <module>
    from .connection import (
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/connection.py", line 39, in <module>
    from .util.ssl_ import (
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/util/__init__.py", line 3, in <module>
    from .connection import is_connection_dropped
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/util/connection.py", line 3, in <module>
    from .wait import wait_for_read
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/util/wait.py", line 1, in <module>
    from .selectors import (
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/urllib3/util/selectors.py", line 14, in <module>
    from collections import namedtuple, Mapping
ImportError: cannot import name 'Mapping' from 'collections' (/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/collections/__init__.py)
/Users/meeia ~

wontfix upstream

opened by cnmeeia 3

need help

need help.

C:\Users\公司>aspeak
Traceback (most recent call last):
  File "c:\users\公司\appdata\local\programs\python\python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\公司\appdata\local\programs\python\python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\公司\AppData\Local\Programs\Python\Python37\Scripts\aspeak.exe\__main__.py", line 5, in <module>
  File "c:\users\公司\appdata\local\programs\python\python37\lib\site-packages\aspeak\__main__.py", line 1, in <module>
    from .cli import main
  File "c:\users\公司\appdata\local\programs\python\python37\lib\site-packages\aspeak\cli\__init__.py", line 1, in <module>
    from .main import main
  File "c:\users\公司\appdata\local\programs\python\python37\lib\site-packages\aspeak\cli\main.py", line 11, in <module>
    from .parser import parser
  File "c:\users\公司\appdata\local\programs\python\python37\lib\site-packages\aspeak\cli\parser.py", line 5, in <module>
    from .value_parsers import pitch, rate, format
  File "c:\users\公司\appdata\local\programs\python\python37\lib\site-packages\aspeak\cli\value_parsers.py", line 26
    if (result := try_parse_float(arg)) and result[0]:
                ^
SyntaxError: invalid syntax

opened by jkcox2016 3

Use self-built TTS instead of `azure.cognitiveservices.speech`

I recently reimplemented some essential functions of azure.cognitiveservices.speech and adapted aspeak to use them.

It passes all the examples, and my own TTS implementation has run properly in my own code.

opened by flt6 1

Error: Speech synthesis canceled: CancellationReason.Error

Error: Speech synthesis canceled: CancellationReason.Error
WebSocket upgrade failed: Unspecified connection error (200). USP state: 2. Received audio size: 0 bytes.

This suddenly started happening today, and I don't know what's going on. Is it a problem with my network?

opened by mephisX 1

Speech synthesis canceled: CancellationReason.Error WebSocket operation failed. Internal error: 3. Error details: WS_ERROR_UNDERLYING_IO_ERROR USP state: 4. Received audio size: 1998720 bytes.

Speech synthesis canceled: CancellationReason.Error WebSocket operation failed. Internal error: 3. Error details: WS_ERROR_UNDERLYING_IO_ERROR USP state: 4. Received audio size: 1998720 bytes.

opened by RouderSky 1
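For failures like the two above, one thing worth trying when a run fails only occasionally (the second report received about 2 MB of audio before the I/O error) is to retry with exponential backoff. Below is a minimal generic sketch; the name `retry` is hypothetical, and it assumes aspeak exits with a non-zero status when synthesis is canceled, which I have not verified. Retrying is unlikely to help if the error happens on every request.

```python
import random
import subprocess
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(func: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
          retry_on: Tuple[Type[BaseException], ...] = (subprocess.CalledProcessError,),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func(); on an exception listed in retry_on, back off and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff (1 s, 2 s, 4 s, ...) with up to 10% random jitter.
            sleep(base_delay * (2 ** attempt) * (1 + 0.1 * random.random()))
    raise AssertionError("unreachable")
```

For example: `retry(lambda: subprocess.run(["aspeak", "-t", "Hello, world", "-o", "output.wav"], check=True))`.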

RuntimeError in ssml_to_speech_async

A RuntimeError occurred while calling ssml_to_speech_async on an instance of SpeechToFileService.

CRITICAL: Traceback (most recent call last):
  ......
  File "E:\***\tts.py", line 17, in tts
    return provider.ssml_to_speech_async(ssml,path=path)  # type: ignore
  File "F:\ProgramData\Miniconda3\lib\site-packages\aspeak\api\api.py", line 110, in wrapper
    self._setup_synthesizer(kwargs['path'])
  File "F:\ProgramData\Miniconda3\lib\site-packages\aspeak\api\api.py", line 139, in _setup_synthesizer
    self._synthesizer = speechsdk.SpeechSynthesizer(self._config, self._output)
  File "F:\ProgramData\Miniconda3\lib\site-packages\azure\cognitiveservices\speech\speech.py", line 1598, in __init__
    self._impl = self._get_impl(impl.SpeechSynthesizer, speech_config, audio_config,
  File "F:\ProgramData\Miniconda3\lib\site-packages\azure\cognitiveservices\speech\speech.py", line 1703, in _get_impl
    _impl = synth_type._from_config(speech_config._impl, None if audio_config is None else audio_config._impl)
RuntimeError: Exception with an error code: 0x8 (SPXERR_FILE_OPEN_FAILED)
[CALL STACK BEGIN]

    > pal_string_to_wstring
    - pal_string_to_wstring
    - synthesizer_create_speech_synthesizer_from_config
    - synthesizer_create_speech_synthesizer_from_config
    - 00007FFE37F772C4 (SymFromAddr() error: Attempt to access invalid address.)
    - 00007FFE37FC76A8 (SymFromAddr() error: Attempt to access invalid address.)
    - 00007FFE37FC87A8 (SymFromAddr() error: Attempt to access invalid address.)
    - PyArg_CheckPositional
    - Py_NewReference
    - PyEval_EvalFrameDefault
    - Py_NewReference
    - PyEval_EvalFrameDefault
    - PyFunction_Vectorcall
    - PyFunction_Vectorcall
    - PyMem_RawStrdup
    - Py_NewReference

[CALL STACK END]

tts.py

from aspeak import SpeechToFileService,AudioFormat,FileFormat

provider=None
fmt=AudioFormat(FileFormat.MP3,-1)

def init():
    global provider
    provider=SpeechToFileService(locale="zh-CN",audio_format=fmt)

def tts(ssml:str,path:str):
    if provider is None:
        init()
    return provider.ssml_to_speech_async(ssml,path=path)  # type: ignore

The thing is, this error seems to occur randomly, and only after I have created (and finished) more than 20 ssml_to_speech_async calls. The error also doesn't seem to be catchable with try.

bug wontfix upstream

opened by flt6 1

Use correct function at ssml_to_speech_async

I updated to the latest version and found that neither the ssml_to_speech_async example nor my own code runs correctly. In api/api.py, ssml_to_speech_async calls speak_ssml instead of speak_ssml_async, and I think this is the cause.

opened by flt6 1

Can you add an option to name the output file after the text?

For example, if the text is short, the output file name would be the text itself:

aspeak -t "Hello" -ot --mp3 would then produce "Hello.mp3".
enhancement wontfix

opened by bk111 1
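Since this request was closed as wontfix, a user-side workaround is to compute the file name yourself and pass it to `-o`. A minimal Python sketch follows; `filename_for_text` is a hypothetical helper that replaces characters Windows forbids in file names, collapses whitespace and caps the length. It does not handle Windows reserved device names such as `CON` or `NUL`.

```python
import re

# Characters that Windows forbids in file names, plus ASCII control characters.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def filename_for_text(text: str, extension: str = "mp3", max_length: int = 50) -> str:
    """Derive an output file name from the spoken text (hypothetical helper)."""
    stem = _UNSAFE.sub("_", text)
    # Collapse whitespace, cap the length and drop trailing dots/spaces.
    stem = re.sub(r"\s+", " ", stem).strip()[:max_length].rstrip(" .")
    return f"{stem or 'output'}.{extension}"
```

Usage: `subprocess.run(["aspeak", "-t", text, "-o", filename_for_text(text), "--mp3"], check=True)`.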

How can I use a variable as the input parameter?

print(type(names1["hot50_cn_topic_" + str(i) ][0:280]))  # result: str

input1 = names1["hot50_cn_topic_" + str(i) ][0:280]
input2 = "近日在全国多地，许多新冠感染者们已陆续转阴，回归到正常的生活和工作中。"  # "Recently, in many places across the country, many people infected with COVID-19 have tested negative and returned to normal life and work."

#os.system('aspeak -t names1["hot50_cn_topic_" + str(i) ][0:280] -o "./1/{}{}{}_".format(year, month, day)+str(i)+".mp3" -l zh-CN')  # result: the argument after -t is malformed
#os.system('aspeak -t input1 -v zh-CN-YunjianNeural -R YoungAdultMale -o "{}".mp3'.format(out1))  # result: -t input1 is malformed
#os.system('aspeak -t input2 -v zh-CN-YunjianNeural -R YoungAdultMale -o "{}".mp3'.format(out1))  # result: -t input2 is malformed

os.system('aspeak -t """近日在全国多地，许多新冠感染者们已陆续转阴，回归到正常的生活和工作中。""" -v zh-CN-YunjianNeural -R YoungAdultMale -o "{}".mp3'.format(out1))  # this one works

How can I replace the argument after -t with a variable?

opened by bk111 1
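The attempts in the question above fail because Python does not substitute variables inside a string literal, so the shell receives the literal word `input1` (or `names1[...]`) after `-t`. A common alternative is `subprocess.run` with an argument list, which hands each value to aspeak unchanged and needs no shell quoting. The helper names below are hypothetical; the flags are the ones documented in the usage section.

```python
import subprocess
from typing import List, Optional


def aspeak_command(text: str, voice: Optional[str] = None, locale: Optional[str] = None,
                   role: Optional[str] = None, output: Optional[str] = None,
                   mp3: bool = False) -> List[str]:
    """Build an argument list for the aspeak CLI (hypothetical helper)."""
    if role and not voice:
        # The usage notes say a voice is required for the special --text options.
        raise ValueError("role requires a voice")
    # "--text=..." keeps the parser from mistaking text that starts with "-" for an option.
    cmd = ["aspeak", "--text=" + text]
    if voice:
        cmd += ["-v", voice]
    if locale:
        cmd += ["-l", locale]
    if role:
        cmd += ["-R", role]
    if output:
        cmd += ["-o", output]
    if mp3:
        cmd.append("--mp3")
    return cmd


def speak(text: str, **options) -> None:
    """Run aspeak; check=True raises CalledProcessError on a non-zero exit status."""
    subprocess.run(aspeak_command(text, **options), check=True)
```

With the variables from the question, that would be `speak(input1, voice="zh-CN-YunjianNeural", role="YoungAdultMale", output=out1 + ".mp3", mp3=True)`. The same approach works for passing `translated_text` from a script.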
certifi-2022.9.24-py3-none-any.whl: 1 vulnerabilities (highest severity is: 6.8)
Vulnerable Library - certifi-2022.9.24-py3-none-any.whl

Python package for providing Mozilla's CA Bundle.

Library home page: https://files.pythonhosted.org/packages/1d/38/fa96a426e0c0e68aabc68e896584b83ad1eec779265a028e156ce509630e/certifi-2022.9.24-py3-none-any.whl

Path to dependency file: /tmp/ws-scm/aspeak

Path to vulnerable library: /tmp/ws-scm/aspeak,/requirements.txt

Vulnerabilities

| CVE | Severity | CVSS | Dependency | Type | Fixed in (certifi version) | Remediation Available |
| --- | --- | --- | --- | --- | --- | --- |
| CVE-2022-23491 | Medium | 6.8 | certifi-2022.9.24-py3-none-any.whl | Direct | certifi - 2022.12.07 | ❌ |

Details

CVE-2022-23491

Dependency Hierarchy:

:x: certifi-2022.9.24-py3-none-any.whl (Vulnerable Library)

Found in base branch: main

Vulnerability Details

Certifi is a curated collection of Root Certificates for validating the trustworthiness of SSL certificates while verifying the identity of TLS hosts. Certifi 2022.12.07 removes root certificates from "TrustCor" from the root store. These are in the process of being removed from Mozilla's trust store. TrustCor's root certificates are being removed pursuant to an investigation prompted by media reporting that TrustCor's ownership also operated a business that produced spyware. Conclusions of Mozilla's investigation can be found in the linked Google Group discussion.

Publish Date: 2022-12-07
URL: CVE-2022-23491

CVSS 3 Score Details (6.8)

Base Score Metrics:

Exploitability Metrics:

Attack Vector: Network

Attack Complexity: Low

Privileges Required: High

User Interaction: None

Scope: Changed

Impact Metrics:

Confidentiality Impact: None

Integrity Impact: High

Availability Impact: None

Suggested Fix

Type: Upgrade version

Origin: https://www.cve.org/CVERecord?id=CVE-2022-23491

Release Date: 2022-12-07

Fix Resolution: certifi - 2022.12.07

security vulnerability
opened by mend-bolt-for-github[bot] 0

After upgrading to 3.1.0, the "WebSocket upgrade failed" error still occurs.

Error: Speech synthesis canceled: CancellationReason.Error WebSocket upgrade failed: Unspecified connection error (200). USP state: 2. Received audio size: 0 bytes.

opened by carllx 2
ERROR: Cannot install aspeak because these package versions have conflicting dependencies.
To fix this you could try to:

loosen the range of package versions you've specified

remove package versions to allow pip attempt to solve the dependency conflict

ERROR: Cannot install aspeak==0.1, aspeak==0.1.1, aspeak==0.2.0, aspeak==0.2.1, aspeak==0.3.0, aspeak==0.3.1, aspeak==0.3.2, aspeak==1.0.0, aspeak==1.1.0, aspeak==1.1.1, aspeak==1.1.2, aspeak==1.1.3, aspeak==1.1.4, aspeak==1.2.0, aspeak==1.3.0, aspeak==1.3.1, aspeak==1.4.0, aspeak==1.4.1, aspeak==1.4.2, aspeak==2.0.0, aspeak==2.0.1, aspeak==2.1.0, aspeak==3.0.0, aspeak==3.0.1 and aspeak==3.0.2 because these package versions have conflicting dependencies.
opened by qirenzhidao 3

Releases (v3.1.0)

v3.1.0 (Nov 8, 2022)
- Re-enable the trial service
- Remove the deprecated API

Credit goes to @ujmyvq1582, who proposed a fix, and @flt6, who implemented it.

v3.0.2 (Sep 5, 2022)
- Fix incompatibility with Python < 3.8 (#26)

Read the change logs for v3.0 here.

v3.0.1 (Sep 5, 2022)
- Fix SpeechServiceBase.ssml_to_speech_async, which was a synchronous version in v3.0 (thanks to @flt6)

Read the change logs for v3.0 here.

v3.0.0 (Sep 4, 2022)
- Fix a critical bug that made aspeak < v3.0.0.dev1 fail (caused by changes on Microsoft's side)
- BREAKING CHANGE: remove the token-related API (the functionality no longer requires a token, but the service is more restricted)
- Deprecate the old API
- A lot of refactors
- Update docs & examples
- New API
- Re-export some commonly used types from azure.cognitiveservices.speech to simplify imports in user code

v3.0.0b2 (Sep 2, 2022)

v3.0.0b1 (Sep 2, 2022)
- New API
- Deprecate the old API
- Refactors
- Update docs & examples

v3.0.0.dev1 (Sep 1, 2022)
- Fix an error caused by an API change on Microsoft's side
- BREAKING CHANGE: remove the token-related API

v2.1.0 (Jul 1, 2022)
- Async mode support
- get_synthesizer method for the speech synthesizer
- Refactor typing
- Updated docs

🎉 Thanks to @EverythingSuckz for their contribution to this release!

v2.0.1 (Jun 26, 2022)

v2.0.0 (May 16, 2022)
v2.0.0 has finally arrived.

Changes:
- Remove the old Synthesizer API
- Implement the new functional API
- Migrate the CLI to the new API
- Better CLI help message
- More value formats for the pitch and rate arguments
- Argument checks for pitch/rate/format
- Many internal refactors
- Configure pylint
- Examples for the new API
- Documentation for the new API
- Better documentation
- Export error types to the top-level module

v2.0.0rc2 (May 16, 2022)

v2.0.0rc1 (May 16, 2022)
- Refactors

v2.0.0b2 (May 15, 2022)

v2.0.0b1 (May 15, 2022)

v2.0.0.dev3 (May 15, 2022)

v2.0.0.dev2 (May 14, 2022)

v2.0.0.dev1 (May 14, 2022)

v2.0.0.dev0 (May 14, 2022)
- Experimental Python API

v1.4.2 (May 12, 2022)
- Update the usage line
- Point out the limitations

v1.4.1 (May 11, 2022)
- Errors for ineffective command line options

v1.4.0 (May 11, 2022)
- Change the default speech rate to 0
- Add support for role and style degree
- Better documentation

v1.3.1 (May 8, 2022)

v1.3.0 (May 7, 2022)

v1.2.0 (May 5, 2022)

v1.1.4 (May 5, 2022)

v1.1.3 (May 5, 2022)

v1.1.2 (May 5, 2022)

v1.1.1 (May 3, 2022)

1.1.0 (May 3, 2022)

v1.0.0 (May 2, 2022)
- Support custom style

Owner

Levi Zim

Developer / Student / AI / Data Science :octocat: Telegram channel: t.me/kxxtchannel :octocat: PGP: 0x57670CCFA42CCF0A :octocat: Reading: t.me/kxxt_read
