Simple text to phones converter for multiple languages

CoML

Last update: Dec 29, 2022

Related tags

Text Data & NLP phonemizer

Overview

Phonemizer -- foʊnmaɪzɚ

The phonemizer allows simple phonemization of words and texts in many languages.
It provides both the phonemize command-line tool and the Python function phonemizer.phonemize.
It uses four backends: espeak, espeak-mbrola, festival and segments.
- espeak-ng supports a lot of languages and IPA (International Phonetic Alphabet) output.
- espeak-ng-mbrola uses the SAMPA phonetic alphabet instead of IPA but does not preserve word boundaries.
- festival currently supports only American English. It uses a custom phoneset, but it allows tokenization at the syllable level.
- segments is a Unicode tokenizer that builds a phonemization from a grapheme-to-phoneme mapping provided as a file by the user.

Installation

You need python>=3.6. If you really need to use python2, use an older version of the phonemizer.

Dependencies

You need to install festival, espeak-ng and mbrola on your system. On Debian/Ubuntu simply run:
```
  $ sudo apt-get install festival espeak-ng mbrola
```
When using the espeak-mbrola backend, additional mbrola voices must be installed (see here). On Debian/Ubuntu, list the possible voices with apt search mbrola.

Phonemizer

The simplest way is using pip:
```
  $ pip install phonemizer
```
OR install it from sources with:
```
  $ git clone https://github.com/bootphon/phonemizer
  $ cd phonemizer
  $ [sudo] python setup.py install
```
If you encounter an error such as ImportError: No module named setuptools during installation, refer to issue 11.

Docker image

Alternatively you can run the phonemizer within docker, using the provided `Dockerfile`. To build the docker image, run:

$ git clone https://github.com/bootphon/phonemizer
$ cd phonemizer
$ sudo docker build -t phonemizer .

Then run an interactive session with:

$ sudo docker run -it phonemizer /bin/bash

Testing

When installed from sources or within a Docker image, you can run the test suite from the root phonemizer folder (once you have installed pytest):

$ pip install pytest
$ pytest

Python usage

In Python import the phonemize function with from phonemizer import phonemize. See here for function documentation.
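As a minimal sketch of calling the function from Python (assuming phonemizer and the espeak backend are installed): the wrapper name to_phones is hypothetical, and the keyword arguments are limited to ones that appear elsewhere on this page (language, backend, strip, preserve_punctuation).

```python
def to_phones(lines, language="en-us", backend="espeak"):
    """Phonemize one utterance or a list of utterances; return a list.

    Hypothetical helper, not part of phonemizer. The import is done lazily
    so this module can be loaded even when phonemizer is not installed.
    """
    from phonemizer import phonemize  # needs phonemizer and the chosen backend

    if isinstance(lines, str):
        lines = [lines]  # wrap a single utterance in a list
    return phonemize(
        list(lines),
        language=language,
        backend=backend,
        strip=True,                  # drop trailing separators
        preserve_punctuation=False,  # punctuation is removed by default
    )
```

With espeak installed, to_phones(["hello world"]) should give a result close to the command-line example further down; the exact phones depend on the installed espeak version.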

Command-line examples

The examples below can also be run from Python using the phonemize function.

For a complete list of available options, run:

$ phonemize --help

See the installed backends with the --version option:

$ phonemize --version
phonemizer-2.2
available backends: espeak-ng-1.49.3, espeak-mbrola, festival-2.5.0, segments-2.0.1

Input/output examples

from stdin to stdout:

  $ echo "hello world" | phonemize
  həloʊ wɜːld

from file to stdout

  $ echo "hello world" > hello.txt
  $ phonemize hello.txt
  həloʊ wɜːld

from file to file

  $ phonemize hello.txt -o hello.phon --strip
  $ cat hello.phon
  həloʊ wɜːld

Backends

The default is to use espeak us-english:

  $ echo "hello world" | phonemize
  həloʊ wɜːld
  $ echo "hello world" | phonemize -l en-us -b espeak
  həloʊ wɜːld

Use festival US English instead

  $ echo "hello world" | phonemize -l en-us -b festival
  hhaxlow werld

In French, using espeak and espeak-mbrola, with custom token separators (see below). espeak-mbrola does not support word separation.

  $ echo "bonjour le monde" | phonemize -b espeak -l fr-fr -p ' ' -w '/w '
  b ɔ̃ ʒ u ʁ /w l ə /w m ɔ̃ d /w
  $ echo "bonjour le monde" | phonemize -b espeak-mbrola -l mb-fr1 -p ' ' -w '/w '
  b o~ Z u R l @ m o~ d

In Japanese, using segments

  $ echo 'konnichiwa' | phonemize -b segments -l japanese
  konnitʃiwa
  $ echo 'konnichiwa' | phonemize -b segments -l ./phonemizer/share/segments/japanese.g2p
  konnitʃiwa

Supported languages

The exhaustive list of supported languages is available with the command phonemize --list-languages [--backend <backend>].

Languages supported by espeak are available here.
Languages supported by espeak-mbrola are available here. Please note that the mbrola voices are not bundled with the phonemizer and must be installed separately.
Languages supported by festival are:
```
  en-us -> english-us
```

Languages supported by the segments backend are:

  chintang  -> ./phonemizer/share/segments/chintang.g2p
  cree      -> ./phonemizer/share/segments/cree.g2p
  inuktitut -> ./phonemizer/share/segments/inuktitut.g2p
  japanese  -> ./phonemizer/share/segments/japanese.g2p
  sesotho   -> ./phonemizer/share/segments/sesotho.g2p
  yucatec   -> ./phonemizer/share/segments/yucatec.g2p

Instead of a language you can also provide a file specifying a grapheme to phone mapping (see the files above for examples).
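To illustrate what a grapheme to phone mapping does, here is a minimal sketch using a greedy longest-match lookup. It is an assumption-laden illustration, not the segments backend: the function name apply_g2p is hypothetical, the mapping is an in-memory dict rather than a .g2p file, and the toy entries are not the contents of japanese.g2p.

```python
def apply_g2p(text, mapping):
    """Convert text to phones using greedy longest-match over `mapping`.

    `mapping` is a dict from grapheme strings to phone strings.
    Raises ValueError when no grapheme covers the current character.
    """
    longest = max(len(grapheme) for grapheme in mapping)
    phones = []
    i = 0
    while i < len(text):
        # try the longest candidate grapheme first, then shorter ones
        for size in range(min(longest, len(text) - i), 0, -1):
            grapheme = text[i:i + size]
            if grapheme in mapping:
                phones.append(mapping[grapheme])
                i += size
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r} at position {i}")
    return "".join(phones)


# Illustrative mapping only; not the contents of japanese.g2p.
toy_mapping = {"k": "k", "o": "o", "n": "n", "i": "i",
               "ch": "tʃ", "w": "w", "a": "a"}
print(apply_g2p("konnichiwa", toy_mapping))  # konnitʃiwa
```

Trying the longest grapheme first is what lets "ch" map to a single phone instead of failing on "c".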

Token separators

You can specify separators for phones, syllables (festival only) and words (except with espeak-mbrola).

$ echo "hello world" | phonemize -b festival -w ' ' -p ''
hhaxlow werld

$ echo "hello world" | phonemize -b festival -p ' ' -w ''
hh ax l ow w er l d

$ echo "hello world" | phonemize -b festival -p '-' -s '|'
hh-ax-l-|ow-| w-er-l-d-|

$ echo "hello world" | phonemize -b festival -p '-' -s '|' --strip
hh-ax-l|ow w-er-l-d

$ echo "hello world" | phonemize -b festival -p ' ' -s ';esyll ' -w ';eword '
hh ax l ;esyll ow ;esyll ;eword w er l d ;esyll ;eword
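The outputs above are consistent with a simple joining rule: every phone, syllable and word is followed by its separator, and --strip drops the trailing ones. The following is a minimal sketch of that rule, not the phonemizer implementation; the function name join_tokens and the nested-list input are hypothetical, and the syllable split is copied from the festival output above.

```python
def join_tokens(words, phone="", syllable="", word=" ", strip=False):
    """Join nested tokens (words > syllables > phones) with separators."""
    def join(items, sep):
        if strip:
            # separators only between tokens
            return sep.join(items)
        # a separator after every token, including the last one
        return "".join(item + sep for item in items)

    return join(
        [join([join(phones, phone) for phones in syllables], syllable)
         for syllables in words],
        word,
    )


# "hello world" as split by festival in the examples above
hello_world = [
    [["hh", "ax", "l"], ["ow"]],
    [["w", "er", "l", "d"]],
]
print(join_tokens(hello_world, phone="-", syllable="|", strip=True))
# hh-ax-l|ow w-er-l-d
```

Without strip, the same call ends every token with its separator, so the result also carries a trailing word separator.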

You cannot specify the same separator for several tokens (for instance a space for both phones and words):

$ echo "hello world" | phonemize -b festival -p ' ' -w ' '
fatal error: illegal separator with word=" ", syllable="" and phone=" ",
must be all differents if not empty
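The rule behind this error can be sketched in a few lines: separators may be empty, but the non-empty ones must all differ. The function name check_separators is hypothetical and the sketch raises ValueError rather than reproducing the command-line error handling.

```python
def check_separators(word=" ", syllable="", phone=""):
    """Raise ValueError unless all non-empty separators are distinct."""
    non_empty = [sep for sep in (word, syllable, phone) if sep]
    if len(set(non_empty)) != len(non_empty):
        raise ValueError(
            f'illegal separator with word="{word}", syllable="{syllable}" '
            f'and phone="{phone}", must be all different if not empty'
        )


check_separators(word=" ", phone="")            # accepted
check_separators(word=" ", syllable="|", phone="-")  # accepted
```

Calling check_separators(word=" ", phone=" ") raises, which mirrors the failing command above.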

Punctuation

By default the punctuation is removed in the phonemized output. You can preserve it using the --preserve-punctuation option (not supported by the espeak-mbrola backend):

$ echo "hello, world!" | phonemize --strip
həloʊ wɜːld

$ echo "hello, world!" | phonemize --preserve-punctuation --strip
həloʊ, wɜːld!
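One way to picture punctuation preservation is to split the text at punctuation marks, phonemize only the chunks in between, and put the marks back. This is a sketch under assumptions, not the phonemizer implementation: the mark set is illustrative, and phonemize_chunk is a stand-in callable (str.upper below) used in place of a real backend.

```python
import re

# Illustrative mark set; the real default may include more characters.
MARKS = re.compile(r"(\s*[;:,.!?]+\s*)")


def phonemize_keeping_marks(text, phonemize_chunk):
    """Apply phonemize_chunk between punctuation marks, keeping the marks."""
    # re.split with a capturing group keeps the separators at odd indices
    parts = MARKS.split(text)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            out.append(part)                   # punctuation, kept verbatim
        elif part:
            out.append(phonemize_chunk(part))  # text between marks
    return "".join(out)


print(phonemize_keeping_marks("hello, world!", str.upper))  # HELLO, WORLD!
```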

Espeak specific options

The espeak backend can output the stresses on phones:

  $ echo "hello world" | phonemize -l en-us -b espeak --with-stress
  həlˈoʊ wˈɜːld

The espeak backend can switch languages during phonemization (below from French to English). Use the --language-switch option to choose how these switches are handled:

  $ echo "j'aime le football" | phonemize -l fr-fr -b espeak --language-switch keep-flags
  [WARNING] fount 1 utterances containing language switches on lines 1
  [WARNING] extra phones may appear in the "fr-fr" phoneset
  [WARNING] language switch flags have been kept (applying "keep-flags" policy)
  ʒɛm lə- (en)fʊtbɔːl(fr)

  $ echo "j'aime le football" | phonemize -l fr-fr -b espeak --language-switch remove-flags
  [WARNING] fount 1 utterances containing language switches on lines 1
  [WARNING] extra phones may appear in the "fr-fr" phoneset
  [WARNING] language switch flags have been removed (applying "remove-flags" policy)
  ʒɛm lə- fʊtbɔːl

  $ echo "j'aime le football" | phonemize -l fr-fr -b espeak --language-switch remove-utterance
  [WARNING] removed 1 utterances containing language switches (applying "remove-utterance" policy)
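The three policies can be sketched as a post-processing step on espeak output. This is an illustration only: the function name apply_language_switch is hypothetical, the flag pattern is inferred from the "(en)" and "(fr)" markers shown above, and this sketch drops flagged utterances for remove-utterance, which may differ from what the backend returns.

```python
import re

# Flags such as "(en)" or "(fr)", as they appear in the outputs above.
FLAG = re.compile(r"\(.+?\)")


def apply_language_switch(utterances, policy="keep-flags"):
    """Post-process phonemized utterances according to the chosen policy."""
    if policy == "keep-flags":
        return list(utterances)
    if policy == "remove-flags":
        return [FLAG.sub("", utterance) for utterance in utterances]
    if policy == "remove-utterance":
        return [u for u in utterances if not FLAG.search(u)]
    raise ValueError(f"unknown policy: {policy!r}")


print(apply_language_switch(["ʒɛm lə- (en)fʊtbɔːl(fr)"], "remove-flags"))
# ['ʒɛm lə- fʊtbɔːl']
```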

Licence

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Comments

add direct access to punctuation regex

Fixes #99

First pass at adding this feature. Not extensively tested yet.

I took the approach @mmmaat suggested, which made the changes pretty minimal. But I'd be happy to rewrite it the way @hadware proposed if it's decided that's better.

opened by jncasey 14

[espeak][korean] end of espeak output discarded by phonemizer

echo "하늘은 파랗게 구름은 하얗게 실바람도 불어와 부풀은 내 마음 나뭇잎 푸르게 강물도 푸르게 아름다운 이곳에 내가 있고 네가 있네 우리는 이 땅 위에 우리는 태어나고 아름다운 이곳에 자랑스런 이곳에 살리라 찬란하게 빛나는 붉은 태양이 비추고 하얀 물결 넘치는 저 바다와 함께 있네 그 얼마나 좋은가 우리 사는 이곳에 사랑하는 그대와 노래하리 빰빠밤빠밤 빠바밤 빠바밤 빰빠 빠바바바바밤 오늘도 너를 만나러 가야지 말해야지 먼 훗날에 너와 나 살고 지고 영원한 이곳에 우리의 새 꿈을 만들어 보고파 찬란하게 빛나는
붉은 태양이 비추고 하얀 물결 넘치는 저 바다와 함께 있네 그 얼마나 좋은가 우리 사는 이곳에 사랑하는 그대와 사랑하며 노래하리 빰빠밤빠밤 빠바밤 빠바밤 빰빠 빠바바바바밤 빰빠밤빠밤 빠바밤 빠바밤 빰빠 빠바바바바밤 빰빠밤빠밤 빠바밤 빠바밤 빰빠 빠바바바바밤 빰빠밤빠밤 빠바밤 빠바밤 빰빠 빠바바바바밤 오오오오 봄여름이 지나면 가을 겨울이 온다네 아름다운 강산 너의 마음은 나의 마음 나의 마음은 너의 마음 너와 나는 한마음 너와 나 우리 영원히 영원히 사랑 영원히 영원히 우리 모두 다 모두 다 끝없이 다정해 end of the sentence" | phonemize
[WARNING] 1 utterances containing language switches on lines 1
[WARNING] extra phones may appear in the "en-us" phoneset
[WARNING] language switch flags have been kept (applying "keep-flags" policy)
(ko)hɐnɯɾɯn phɐɾɐtkhe ɡuɾɯmɯn hɐjɐtkhe siɫbɐɾɐmdo puɾʌwɐ puphuɾɯnnɛmɐɯmnɐmunnip phuɾɯqe ɡɐŋmuɫdo phuɾɯqe ɐɾɯmdɐun iqosenɛqɐ itkoneqɐ inne uɾinɯn i tɐŋ wie uɾinɯn thɛʌnɐqo ɐɾɯmdɐun iqose tɕɐɾɐŋsɯɾʌn iqose sɐliɾɐ tʃhɐnɾɐnhɐqe pinnɐnɯn pulɡɯn thɛjɐŋi pitʃhuqo hɐjɐnmuɫqjʌɫnʌmtʃhinɯn tɕʌ pɐdɐwɐ hɐmqe inne ɡɯ ʌɫmɐnɐ tɕot(enus) (ko)hɐjɐnmuɫqjʌɫnʌmtʃhinɯn tɕʌ pɐdɐwɐ hɐmqe inne ɡɯ ʌɫmɐnɐ tɕoɯnqɐ uɾi sɐnɯn iqose sɐɾɐŋhɐnɯn ɡɯdɛwɐ sɐɾɐŋhɐmjʌnoɾɛhɐɾi pɐmpɐbɐmpɐbɐm pɐbɐbɐm pɐbɐbɐm pɐmpɐ pɐbɐbɐbɐbɐbɐm pɐmpɐbɐmpɐbɐm pɐbɐbɐm pɐbɐbɐm pɐmpɐ pɐbɐbɐbɐbɐbɐm pɐmpɐbɐmpɐbɐm pɐbɐbɐm pɐbɐbɐm pɐmpɐ pɐbɐbɐbɐbɐbɐm pɐmpɐbɐmpɐbɐm pɐbɐbɐm pɐbɐ(enus)

I'm using WSL to convert Korean to IPA. For some reason the phonemizer takes only part of the sentence as input and does not process the characters after that. I tried using cat, echo, and the phonemizer file I/O (with the -o option), but the results are all the same.

bug

opened by Ldoun 10

Request: more flexibility around punctuation definitions
Is your feature request related to a problem? Please describe. I'd like more flexibility in defining punctuation, ideally by having access directly to the regex.

Specifically, instead of defining the characters to be counted as punctuation, I think it'd be more useful to me to define which characters are words to be phonemized, and treat everything else as punctuation.

Describe the solution you'd like Something as broad as [^\p{L}\p{M}0-9'] could work as a default, which from what I understand would capture everything that's not a number, unicode letter or its diacritics.

That may be overly broad, though, because I've run into trouble with espeak and characters from Cyrillic and Korean sets already, and I'd imagine characters from other less-supported languages could also be problematic.
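For readers who want to try this classification without a regex engine that supports \p{L} and \p{M} (the third-party regex package does), the same character class can be approximated with the standard unicodedata module. This is only a sketch of the idea in the request; the function names are hypothetical and, as with the proposed class, whitespace counts as a non-word character.

```python
import unicodedata


def is_word_char(ch):
    """True for Unicode letters (L*), marks (M*), ASCII digits and "'"."""
    if ch == "'" or ch in "0123456789":
        return True
    return unicodedata.category(ch)[0] in ("L", "M")


def split_runs(text):
    """Split text into (is_word, run) pairs of consecutive characters."""
    runs = []
    for ch in text:
        flag = is_word_char(ch)
        if runs and runs[-1][0] == flag:
            runs[-1] = (flag, runs[-1][1] + ch)  # extend the current run
        else:
            runs.append((flag, ch))              # start a new run
    return runs


print(split_runs("don't stop, Привет!"))
# [(True, "don't"), (False, ' '), (True, 'stop'), (False, ', '), (True, 'Привет'), (False, '!')]
```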

Describe alternatives you've considered In my local copy of phonemizer, I've played with hard-coding the punctuation regex like so:

marks = "[^a-zA-ZÀ-ÖØ-öø-ÿ0-9',.$@&+\\-=/\\\]"
self._marks_re = re.compile(fr'(\s*{marks}+\s*|\s*(?<!\d),\s*|\s*(?<!\d)\.(?!\d)\s*)+')

which captures everything that's not a latin character or a set of marks that the backends can pronounce, like "æt" for "@".

The back half of this is also an attempt to handle the problem raised in #87, though I haven't tested it much, and there may be some cases where it breaks.

feature request

opened by jncasey 9

fixes to --preserve-punctuation

This reworks a few things related to preserving punctuation.

Fixes the inconsistency described in #106. Now word separators appear in the same locations whether preserve_punctuation is True or False.

Addresses part of the problem described in #104 (but doesn't return the output to what it was in 0.3, which I believe was incorrect – they'd also need to use a non-None word separator, maybe, to get what they're after?)

Fixes #108 by refactoring the restore method to be iterative instead of recursive

A number of the tests had to be updated due to that first bullet point.

opened by jncasey 8

Adding preserve_empty_lines option

Adding the feature I requested in #95.

Not stripping out the empty lines was causing problems with the festival backend and preserve_punctuation, so I took the approach of stripping out the empty lines pre-phonemization and then reinserting them afterward.

I want to flag that I changed the conversion of the input text to a list from a generator to list comprehension here since I needed to run through the list a second time to preserve the empty lines, and I thought this made the code easier to read than making another generator. I'm assuming that this won't make a big difference on performance given how I think phonemizer is generally used.
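The approach described above (drop empty lines before phonemization, reinsert them afterward) can be sketched as follows. The names are hypothetical and phonemize_lines is a stand-in for a real backend call that maps a list of lines to a list of the same length.

```python
def phonemize_preserving_empty_lines(lines, phonemize_lines):
    """Phonemize non-empty lines, then put empty ones back in place."""
    # remember where each non-empty line was
    kept = [(index, line) for index, line in enumerate(lines) if line.strip()]
    phonemized = phonemize_lines([line for _, line in kept])

    result = [""] * len(lines)
    for (index, _), phones in zip(kept, phonemized):
        result[index] = phones
    return result


def fake_backend(batch):
    # stand-in for a real phonemization backend
    return [line.upper() for line in batch]


print(phonemize_preserving_empty_lines(["hello", "", "world"], fake_backend))
# ['HELLO', '', 'WORLD']
```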

opened by jncasey 8

Windows issue with NamedTemporaryFile

When I run phonemize on Windows 10 with Python 3.6, I still have an issue: it looks like the temp file isn't created at all (I can see that the backslash issue has been fixed).

>>> ph=phonemize('Hello World',strip=False,njobs=1,backend='espeak')
Failed to read file 'C:\Users\cinetec\AppData\Local\Temp\tmp5sigu2vf'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\cinetec\AppData\Local\Programs\Python\Python36\lib\site-packages\phonemizer-1.0.1-py3.6.egg\phonemizer\phonemize.py", line 94, in phonemize
    text, separator=separator, strip=strip, njobs=njobs)
  File "C:\Users\cinetec\AppData\Local\Programs\Python\Python36\lib\site-packages\phonemizer-1.0.1-py3.6.egg\phonemizer\backend.py", line 130, in phonemize
    out = self._phonemize_aux(self._list2str(text), separator, strip)
  File "C:\Users\cinetec\AppData\Local\Programs\Python\Python36\lib\site-packages\phonemizer-1.0.1-py3.6.egg\phonemizer\backend.py", line 235, in _phonemize_aux
    shlex.split(command, posix=False)).decode('utf8')
  File "C:\Users\cinetec\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "C:\Users\cinetec\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['espeak', '-ven-us', '--ipa=3', '-q', '-f', 'C:\\Users\\cinetec\\AppData\\Local\\Temp\\tmp5sigu2vf']' returned non-zero exit status 1.

opened by snowzhangy 8

State of the field and force alignment literature
Part of a review at openjournals/joss-reviews#3958

[ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?

The manuscript identifies a number of related software packages with which this program could interface, but there is a gap in its references to the force alignment literature. Within linguistics, especially phonetic analysis, force alignment is an important part of the research pipeline whereby an acoustic signal is segmented and aligned with a text transcript. This then allows corpus queries and phonetic analysis of segments. The paper briefly touched on this when discussing Kaldi (Povey, et al. 2011), but the state of the field is broader and this program has important implications for that field. I believe the paper would be improved by further review of that literature.

The most impactful piece of software in that field is the Forced Alignment and Vowel Extraction (FAVE) toolkit (Rosenfelder, et al. 2014), which converts orthographic transcriptions to phonetic transcriptions through dictionary lookups using the CMU pronunciation dictionary. This has the downside of not being able to handle out-of-dictionary words, which then require experimenter transcription or data exclusion.

Other researchers have been trying to improve coverage of force alignment to underdocumented languages and a major problem is the lack of grapheme-to-phoneme mappings (Barth, et al. 2020) or comprehensive pronunciation dictionaries (Johnson, Di Paolo, and Bell 2018). These can be substantial work and the language, orthographic system, or researcher time can limit the utility of these approaches.

These programs require a task similar to the one performed by this package, but do it in a seemingly different way. Comparing this package to the methods used in those packages will improve the paper by connecting it to a wider body of literature and identifying new potential areas of impact.

This is a really interesting project, and I'm excited to look further into the code!

joss

opened by chrisbrickhouse 7

instructions for docker users

How does one access one's files from inside the interactive session? That is, if I run sudo docker run -it phonemizer /bin/bash, I get transported inside a different space, where my files are not available. Right? And is this the only way to call phonemizer on a Mac?

Minor suggestion: In the instructions for docker users, add the explicit instruction to do git clone https://github.com/bootphon/phonemizer.git.

opened by alecristia 7
espeak-ng

Hi, I heard about espeak-ng and am considering installing it: https://github.com/espeak-ng/espeak-ng Do you know if phonemizer will work with espeak-ng? regards, Andrew
feature request help wanted

opened by cainesap 7

When phonemizing a text which has more than 100k utterances, it always gives a "RuntimeError"

Describe the bug

When phonemizing a text which has more than 100k utterances, it always raises a "RuntimeError", with messages including "espeak not installed on your system", "failed to find espeak library" and "invalid voice code 'cmn'", at around 900 utterances.

Phonemizer version phonemizer-3.0 available backends: espeak-ng-1.49.2, espeak-mbrola, festival-2.5.0, segments-2.2.0

System cat /proc/version: Linux version 4.15.0-106-generic (buildd@lcy01-amd64-016) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04))

python: Python 3.9.1 (default, Dec 11 2020, 14:32:07) [GCC 7.3.0] :: Anaconda, Inc. on linux

To reproduce

txtdict = txt2dict(text_path)
with open(scp_path) as f:
    for line in f.readlines():
        txt = txtdict.get(line[0])
        phone = phonemize(txt, backend='espeak', language='cmn', separator=Separator(word='/', phone=' ', syllable="-"))
        rows.append([wav, new_wav, txt, phone, new_phone])

Expected behavior
opened by YoungKang1222 6

Problems about Mandarin phoneme

When running phonemize with -l cmn or zh, the output is in the International Phonetic Alphabet instead of Mandarin phonemes.

Is this a problem with my code or a bug? How can I convert Chinese to Mandarin phonemes? I look forward to your reply, thanks!

Expected behavior I tried to convert Chinese to Mandarin phonemes with this command:

cat chinese.txt | PHONEMIZER_ESPEAK_PATH=$(which espeak) phonemize -o train_out.phn -p ' ' -w '' -l zh -j 70 --language-switch remove-flags

Result: fatal error: language "zh" is not supported by the espeak backend

Then I checked: espeak --voices includes Mandarin, and phonemize --list-languages reports: cmn -> Chinese (Mandarin)

Phonemizer version phonemizer-3.0 espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.2.0

System ubuntu 20.04
bug

opened by mynah15 6

Disparity between backends with punctuation

Describe the bug When using the default preserve_punctuation=False, the Festival backend ignores text that only contains punctuation, whereas the Espeak backend returns the empty string.

Phonemizer version

phonemizer-3.2.1
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.2.1

System Ubuntu 20.04.4 Linux kernel 5.15.0 Python 3.8.10

To reproduce

from phonemizer import phonemize

print(phonemize([".", "."], language="en-us", backend="festival"))
print(phonemize([".", "."], language="en-us", backend="espeak"))
print(phonemize([".", "."], language="mb-us1", backend="espeak-mbrola"))

Yields output

[]
['', '']
['', '']

Expected behavior Should output:

['', '']
['', '']
['', '']

bug

opened by agkphysics 1

EspeakBackend enters a corrupted state upon seeing some characters

Describe the bug When calling phonemize on an instance of EspeakBackend with the character "ꪁ", the backend enters a corrupted state where all succeeding phonemization (including in the sentence with "ꪁ") is incorrect.

Phonemizer version Phonemizer 3.2.1 Espeak NG 1.50

System Reproduced the bug both on Win10 and Ubuntu

To reproduce

from phonemizer.backend import EspeakBackend

texts = [
    "a, b, c, d, e, f, p, w, y, z",
    "ꪁ",
    "a, b, c, d, e, f, p, w, y, z"
]

backend = EspeakBackend(
    language="en-us", preserve_punctuation=True, with_stress=True,
    language_switch="remove-flags", words_mismatch="ignore"
)

for text in texts:
    print(backend.phonemize([text])[0])

Expected behavior Expected output:

ˈeɪ , bˈiː , sˈiː , dˈiː , ˈiː , ˈɛf , pˈiː , dˈʌbəljˌuː , wˈaɪ , zˈiː 

ˈeɪ , bˈiː , sˈiː , dˈiː , ˈiː , ˈɛf , pˈiː , dˈʌbəljˌuː , wˈaɪ , zˈiː

Actual output:

ˈeɪ , bˈiː , sˈiː , dˈiː , ˈiː , ˈɛf , pˈiː , dˈʌbəljˌuː , wˈaɪ , zˈiː 

ˈʌ , bˈʌ , sˈʌ , dˈʌ , ˈʌ , ˈʌf , pˈʌ , dˈʌbd-jʌ , wˈʌ , zˈʌ

bug espeak

opened by CorentinJ 1

Can't use multiple EspeakBackend objects with njobs=1

Describe the bug It seems that instantiation of multiple EspeakBackend objects is not correctly handled. All the objects start operating with the language used to instantiate the last object. Please refer to the example below.

Phonemizer version 3.0.1

System macOS 11.6.4 python 3.8.9 [Clang 13.0.0 (clang-1300.0.29.30)] on darwin

To reproduce

from phonemizer.backend import EspeakBackend

en_backend = EspeakBackend(
    "en-us",
    preserve_punctuation=True,
    with_stress=True,
    language_switch="remove-flags",
    words_mismatch="ignore",
)
en_sentence = ["I love to eat pizza everyday"]
print(en_backend.phonemize(en_sentence, njobs=1, strip=True)) # ['aɪ lˈʌv tʊ ˈiːt pˈiːtsə ˈɛvɹɪdˌeɪ']

de_backend = EspeakBackend(
    "de",
    preserve_punctuation=True,
    with_stress=True,
    language_switch="remove-flags",
    words_mismatch="ignore",
)
de_sentence = ["ich esse jeden tag gerne pizza."]
print(de_backend.phonemize(de_sentence, njobs=1, strip=True)) # ['ɪç ˈɛsə jˈeːdən tˈɑːk ɡˈɛɾnə pˈɪtsɑː.']

incorrect_en = en_backend.phonemize(en_sentence, njobs=1, strip=True)
en_with_de = de_backend.phonemize(en_sentence, njobs=1, strip=True)

assert en_with_de == incorrect_en
print(incorrect_en, en_with_de) 
# ['ˈiː lˈoːvə tˈoː eːˈɑːt pˈɪtsɑː ˈeːveːrˌyːdɛɪ'] 
# ['ˈiː lˈoːvə tˈoː eːˈɑːt pˈɪtsɑː ˈeːveːrˌyːdɛɪ']

Expected behavior Notice that incorrect_en is equal to en_with_de and not equal to en_sentence.

Additional context This problem happens only with njobs=1 and doesn't appear with njobs>1

bug espeak

opened by eeishaan 0

Use espeak phone X-SAMPA to language-specific SAMPA foldings.
Currently, to get the language-specific SAMPA form of each phoneme, a working (and espeak-friendly) installation of mbrola is required. This is problematic for several reasons:

- it requires an additional system-wide package install on Linux platforms (even though mbrola is available on most distributions)
- it requires the installation of a corresponding mbrola voice, which is impractical and/or quite heavy. Moreover, the entirety of the voice's speech data isn't actually used by espeak for the phonemization.
- the OSX/Windows support for mbrola is very bad.

The foldings are all here: https://github.com/espeak-ng/espeak-ng/tree/master/phsource/mbrola

espeak

opened by hadware 1

fatal error: language "mb-fr4" is not supported by the espeak-mbrola backend

echo "bonjour le monde" | phonemize -b espeak-mbrola -l mb-fr1 -p ' ' -w '/w '

Running this command gives the following error:

fatal error: language "mb-fr4" is not supported by the espeak-mbrola backend

opened by sravani40 1

Releases(v3.2.1)

v3.2.1(Jun 9, 2022)
bug fixes

Fixed a bug when trying to restore punctuation on a multiline text. See issue #129

Source code(tar.gz)
Source code(zip)
v3.2.0(May 23, 2022)
bug fixes

Fixed a bug when trying to restore punctuation on very long text. See #108

improvements

Improved consistency with the handling of word separators when preserving punctuation, and when using a word separator that is not a literal space character. See #106

new features

Added the option to define punctuation with a regular expression. Previously only strings were accepted. See #120

In the python API, the punctuation_marks parameter can now be passed to phonemize (or a backend constructor) as a re.Pattern that defines which characters will be matched as punctuation. Passing punctuation_marks as a str will continue to function as before, treating each character in the string as a punctuation mark.

Added the optional parameter --punctuation_marks_is_regex to the CLI interface. When used, the CLI will attempt to compile a re.Pattern from the value passed to --punctuation-marks.
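The two accepted forms of punctuation_marks described above (a str whose characters are each a mark, or a compiled re.Pattern) could be normalized into a single pattern along these lines. This is a sketch with a hypothetical helper name, not the code phonemizer uses, and it assumes a non-empty string.

```python
import re


def compile_marks(punctuation_marks):
    """Return a compiled pattern matching a single punctuation mark."""
    if isinstance(punctuation_marks, re.Pattern):
        return punctuation_marks  # already a regular expression
    if isinstance(punctuation_marks, str):
        # every character of the string is one punctuation mark;
        # re.escape protects characters such as "-", "]" or "^"
        return re.compile("[" + re.escape(punctuation_marks) + "]")
    raise TypeError("punctuation_marks must be a str or a re.Pattern")


print(compile_marks(";:,.!?").sub("", "hello, world!"))  # hello world
```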

v3.1.1(Mar 31, 2022)
ChangeLog

improvements

Preserve empty lines in texts when using --preserve-empty-lines. Without this option, empty lines used to be automatically dropped. See PR #103

new features

Type hinted most of phonemizer's API. This makes the usage of our API a bit clearer, and can be easily leveraged by IDEs and type checkers to prevent typing issues.

v3.0.1(Dec 18, 2021)
ChangeLog

improvements in README after JOSS reviews

bug fixes

The method BaseBackend.phonemize now raises a RuntimeError if the input text is a str instead of a list of str (was only logging an error message).

Preserve punctuation alignment when using --preserve-punctuation (a space was previously inserted before each punctuation token), see issue #97.

v3.0(Oct 25, 2021)
phonemizer-3.0

breaking change

Do not remove empty lines from output. For example:

# this is now
phonemize(["hello", "!??"]) == ['həloʊ ', '']
# this was
phonemize(["hello", "!??"]) == ['həloʊ ']

Default backend in the phonemize function is now espeak (was festival).

espeak-mbrola backend now requires espeak>=1.49.

--espeak-path option renamed as --espeak-library and PHONEMIZER_ESPEAK_PATH environment variable renamed as PHONEMIZER_ESPEAK_LIBRARY.

--festival-path option renamed as --festival-executable and PHONEMIZER_FESTIVAL_PATH environment variable renamed as PHONEMIZER_FESTIVAL_EXECUTABLE.

The methods backend.phonemize() from the backend classes take only a list of str as input text (was either a str or a list of str).

The methods backend.version() from the backend classes return a tuple of int instead of a str.

improvements

espeak and mbrola backends now rely on the espeak shared library using the ctypes Python module, instead of relying on the espeak executable through subprocesses. This implies drastic speed improvements, up to 40 times faster.

new features

New option --prepend-text to prepend the input text to phonemized utterances, so as to have both orthographic and phonemized available at output.

New option --tie for the espeak backend to display a tie character within multi-letter phonemes. (see issue #74).

New option --words-mismatch for the espeak backend. This makes it possible to detect when espeak merges consecutive words or drops a word from the orthographic text. Possible actions are to ignore those mismatches, to issue a warning for each line where a mismatch is detected, or to remove those lines from the output.
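The idea behind the words-mismatch check can be sketched by comparing word counts line by line. This is an illustration, not the espeak backend code: the function name and the mode names are hypothetical, words are assumed to be separated by whitespace, and in this sketch "remove" drops the line entirely, which may differ from what the backend returns.

```python
import warnings


def handle_words_mismatch(texts, phonemized, mode="ignore"):
    """Compare word counts line by line and apply the chosen action."""
    if mode not in ("ignore", "warn", "remove"):
        raise ValueError(f"unknown mode: {mode!r}")
    output = []
    for number, (text, phones) in enumerate(zip(texts, phonemized), start=1):
        mismatch = len(text.split()) != len(phones.split())
        if mismatch and mode == "warn":
            warnings.warn(f"words count mismatch on line {number}")
        if mismatch and mode == "remove":
            continue  # drop the mismatching line from the output
        output.append(phones)
    return output


texts = ["hello world", "a b c"]
phones = ["həloʊ wɜːld", "eɪbiː siː"]  # second line: two words merged
print(handle_words_mismatch(texts, phones, mode="remove"))
# ['həloʊ wɜːld']
```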

bugfixes

phonemizer's logger no longer conflicts with other loggers when imported from Python (see PR #61).

v2.2.2(Jan 6, 2021)
phonemizer-2.2.2

bugfixes

Fixed installation from source (bug introduced in 2.2.1, see issue #52).

Fixed a bug when trying to restore punctuation on an empty text (see issue #54).

Fixed an edge case bug when using custom punctuation marks (see issue #55).

Fixed regex issue that causes digits to be considered punctuation (see issue #60).

v2.2.1(Jul 24, 2020)
Changelog for phonemizer-2.2.1

improvements

From Python import the phonemize function using from phonemizer import phonemize instead of from phonemizer.phonemize import phonemize. The second import is still available for compatibility.

bugfixes

Fixed a minor bug in utils.chunks.

Fixed warnings on language switching for espeak backend when using parallel jobs (see issue #50).

Save file in utf-8 explicitly for Windows compat (see issue #43).

Fixed build and tests in Dockerfile (see issue #45).

v2.2(Feb 27, 2020)
ChangeLog

new features

New option --list-languages to list the available languages for a given backend from the command line.

The --sampa option of the espeak backend has been replaced by a new backend espeak-mbrola.

The former --sampa option (introduced in phonemizer-2.0) outputs phones that are not standard SAMPA but are adapted to the espeak TTS front-end.

On the other hand the espeak-mbrola backend allows espeak to output phones in standard SAMPA (adapted to the mbrola TTS front-end). This backend requires mbrola to be installed, as well as additional mbrola voices to support needed languages. This backend does not support word separation nor punctuation preservation.

bugfixes

Fixed issues with punctuation processing on some corner cases, see issues #39 and #40.

Improvements and updates in the documentation (Readme, phonemize --help and Python code).

Fixed a test when using espeak>=1.50.

Empty lines are correctly ignored when reading text from a file.

v2.1(Jan 29, 2020)
ChangeLog for phonemizer-2.1

new features

Possibility to preserve the punctuation (ignored and silently removed by default) in the phonemized output with the new option --preserve-punctuation from command line (or the equivalent preserve_punctuation from Python API). With the punctuation-marks option, one can override the default marks considered as punctuation.

It is now possible to specify the path to a custom espeak or festival executable (for instance to use a local installation or to test different versions). Either specify the PHONEMIZER_ESPEAK_PATH environment variable, the --espeak-path option from command line or use the EspeakBackend.set_espeak_path method from the Python API. Similarly for festival use PHONEMIZER_FESTIVAL_PATH, --festival-path or FestivalBackend.set_festival_path.

The --sampa option is now available for espeak (was available only for espeak-ng).

When using espeak with SAMPA output, some SAMPA phones are corrected to correspond to the normalized SAMPA alphabet (espeak seems not to respect it). The corrections are language specific. A correction file must be placed in phonemizer/share/espeak. This has been implemented only for French so far.

bugfixes

parses correctly the version of espeak-ng even for dev versions (e.g. 1.51-dev).

fixed an issue with espeak backend, where multiple phone separators can be present at the end of a word, see #31.

added an additional stress symbol - for espeak.

v2.0.1(Nov 7, 2019)
phonemizer-2.0.1

bugfixes

keep-flags was not the default argument for language_switch in the class EspeakBackend.

fixed an issue with punctuation processing in the espeak backend, see #26

improvements

log a warning if using python2.

v2.0(Oct 10, 2019)
ChangeLog

incompatible change

Starting with phonemizer-2.0 only python3 is supported. Compatibility with python2 is no longer ensured or tested (see https://pythonclock.org).

bugfixes

new --language-switch option to use with espeak backend to deal with language switching on phonemized output. In previous versions there was a bug in detection of the language switching flags (sometimes removed, sometimes not). Now you can choose to keep the flags, to remove them, or to delete the whole utterance.

bugfix in a test with espeak>=1.49.3.

bugfix using NamedTemporaryFile on windows, see #21.

bugfix when calling festival or espeak subprocesses on Windows, see #17.

bugfix in detecting recent versions of espeak-ng, see #18.

bugfix when using utf8 input on espeak backend (python2), see #19.

new features and improvements

new --sampa option to output phonemes in SAMPA alphabet instead of IPA, available for espeak-ng only.

new --with-stress option to use with espeak backend to not remove the stresses on phonemized output. For instance:

$ echo "hello world" | phonemize
həloʊ wɜːld
$ echo "hello world" | phonemize --with-stress
həlˈoʊ wˈɜːld

improved logging: by default only warnings are displayed; use the new --quiet option to suppress all log messages or --verbose to see all of them. Log messages now display the level name (debug/info/warning).

improved code organization:

backends are now implemented in the backend submodule as separate source files.

improved version string (displays uninstalled backends, moved outside of main for use from Python).

improved logger, implemented in its own module so that a call to phonemizer from the CLI or the API yields the same log messages.
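The logging behaviour described in this release (warnings by default, nothing with --quiet, everything with --verbose, level name in each message) could be set up along the following lines with the standard `logging` module. The function name `get_logger`, the logger name and the `verbosity` values are assumptions made for this sketch, not the phonemizer API.

```python
import logging


def get_logger(verbosity: str = "normal") -> logging.Logger:
    """Build a logger shared by CLI and API callers (illustrative only)."""
    levels = {
        "quiet": logging.CRITICAL + 1,  # above every standard level: silent
        "normal": logging.WARNING,      # default: warnings and errors only
        "verbose": logging.DEBUG,       # everything
    }
    logger = logging.getLogger("phonemizer-sketch")
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler()
    # Display the level name in front of each message.
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(levels[verbosity])
    logger.propagate = False
    return logger


log = get_logger("verbose")
log.info("initializing backend")
```

Keeping this in a single function means a command-line entry point and a library call can obtain an identically configured logger.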

v1.0(Dec 18, 2018)
incompatible changes

The following changes break the compatibility with previous versions of phonemizer (0.X.Y):

command-line phonemize program: new --backend <espeak|festival|segments> option; the default is now espeak en-us (was festival en-us),

it is now illegal to have the same separator at different levels (for instance a space for both word and phone),

from Python, the phonemize function must be imported as from phonemizer.phonemize import phonemize (was from phonemizer import phonemize).
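The separator rule above (no two levels may share the same separator) can be expressed as a small validation step. The class below is a hypothetical stand-in written for this illustration; its name, field defaults and error message are assumptions and do not reproduce the library's own separator class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchSeparator:
    """Token separators at phone, syllable and word level (hypothetical)."""

    phone: str = ""
    syllable: str = ""
    word: str = " "

    def __post_init__(self):
        # Empty strings mean "no separator at this level" and may repeat.
        used = [sep for sep in (self.phone, self.syllable, self.word) if sep]
        if len(used) != len(set(used)):
            raise ValueError(
                "separators must differ across levels, got "
                f"phone={self.phone!r}, syllable={self.syllable!r}, "
                f"word={self.word!r}"
            )


SketchSeparator(phone="-", syllable="|", word=" ")  # accepted
try:
    SketchSeparator(phone=" ", word=" ")  # a space at two levels
except ValueError as error:
    print(error)
```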

New backend segments for phonemization based on grapheme-to-phoneme mappings.

Major refactoring of the backends implementation and separators (as Python classes).

Input to phonemizer now supports utf8.

Better handling of errors (display of a meaningful message).

Fixed a bug in fetching espeak version on macos, see #14.

v0.3.3(Aug 29, 2018)

ChangeLog

Fixed a bug introduced in phonemizer-0.3.2 (apostrophes in festival backend). See #12.
v0.3.2(Jul 26, 2018)
ChangeLog

Continuous integration with travis-ci

Support for docker

Better support for different versions of espeak/festival

Minor bugfixes and improved tests

v0.3.1(Nov 13, 2017)
ChangeLog

New espeak or espeak-ng backend with more than 100 languages

Support for Python 2.7 and 3.5

Integration with zenodo for citation

Various bugfixes and minor improvements

v0.2(May 27, 2016)
