Python interface to ISLEX, an English IPA pronunciation dictionary with syllable and stress marking.

pysle


Pronounced like 'p' + 'isle'.

An interface to a pronunciation dictionary with stress markings (ISLEX, the international speech lexicon), along with some tools for comparing and aligning pronunciations (e.g. a list of phones someone said versus a standard or canonical dictionary pronunciation).

Table of contents

  1. Documentation
  2. Common Use Cases
  3. Version History
  4. Requirements
  5. ISLE Dictionary
  6. Installation
  7. Example usage
  8. Citing pysle
  9. Acknowledgements

Documentation

Automatically generated pdocs can be found here:

http://timmahrt.github.io/pysle/

The documentation is generated with the following command: pdoc ./pysle -d google -o docs

Common Use Cases

What can you do with this library?

  • look up the list of phones and syllables for canonical pronunciations of a word

    isletool.LexicalTool('ISLEdict.txt').lookup('cat')
  • map an actual pronunciation to a dictionary pronunciation (can be used to automatically find speech errors)

    pronunciationtools.findClosestPronunciation(isleDict, 'cat', ['k', 'æ',])
  • automatically syllabify a praat textgrid containing words and phones (e.g. force-aligned text) -- requires the praatIO library

    pysle.syllabifyTextgrid(isleDict, praatioTextgrid, "words", "phones")
  • search for words based on pronunciation

    isletool.LexicalTool('ISLEdict.txt').search('dVV') # Any word containing a 'd' followed by two vowels

    For example, you can search for words that start with a given sound, contain it word-medially, or have it in a stressed vowel position.

    see /tests/dictionary_search.py
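
As a rough illustration of what a pattern such as 'dVV' means, the sketch below slides the pattern over a list of phones and treats 'V' as "any vowel". It is not pysle's implementation: the vowel inventory is a simplified assumption, each pattern symbol is assumed to be a single character, and `symbol_matches` and `matches` are hypothetical helper names.

```python
# Assumed, simplified vowel inventory for this illustration only.
VOWELS = {"aʊ", "ei", "oʊ", "ɑɪ", "ɔi", "i", "u", "æ", "ɑ", "ɔ", "ə", "ɛ", "ɪ", "ʊ", "ʌ"}


def symbol_matches(symbol, phone):
    """'V' matches any vowel; any other symbol must equal the phone."""
    if symbol == "V":
        return phone in VOWELS
    return phone == symbol


def matches(pattern, phones):
    """Return True if the pattern occurs anywhere in the phone list."""
    # Stress marks are stripped so that 'ˈi' is treated like 'i'.
    stripped = [phone.lstrip("ˈˌ") for phone in phones]
    width = len(pattern)
    for start in range(len(stripped) - width + 1):
        window = stripped[start:start + width]
        if all(symbol_matches(s, p) for s, p in zip(pattern, window)):
            return True
    return False


print(matches("dVV", ["ɑ", "d", "i", "oʊ"]))  # a 'd' followed by two vowels
print(matches("dVV", ["d", "ɔ", "g"]))        # only one vowel after the 'd'
```

Matching on whole phones rather than on a flattened string keeps a diphthong such as 'ɔi' from being counted as two separate vowels.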

Version History

Pysle uses semantic versioning (Major.Minor.Patch)

Please view CHANGELOG.md for version history.

Requirements

  • Python 3.7.* or above (or below, probably)

See the project's travis-ci page for the specific versions of Python that pysle is currently tested under.

  • The praatIO library is required IF you want to use the textgrid functionality. It is not required for normal use.

ISLE Dictionary

pysle requires the ISLEdict pronunciation dictionary (copyright Mark Hasegawa-Johnson, licensed under the MIT open source license). This is bundled with pysle. However, you may want to use a subset of the pronunciations or you may want to add your own pronunciations.

In that case, please get the original file.

ISLEX github page

Direct link to the ISLEX file used in this project (ISLEdict.txt)

See examples/isletool_examples.py for an example of how to load a custom ISLEdict file.
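
If you only need a subset of the dictionary, one option is to filter the original file before loading it. The sketch below assumes each line has the form `word(pos) # phones #`; the sample lines are invented for illustration, and `headword` and `filter_entries` are hypothetical helpers, not part of pysle.

```python
def headword(line):
    """Return the word part of a 'word(pos) # phones #' line."""
    entry = line.split("#", 1)[0].strip()  # text before the pronunciation
    return entry.split("(", 1)[0]          # drop the '(pos,...)' suffix


def filter_entries(lines, wanted):
    """Keep only the lines whose headword is in `wanted`."""
    return [line for line in lines if headword(line) in wanted]


# Invented sample lines in the assumed format.
sample = [
    "cat(nn) # k ˈæ t #",
    "everyday(jj) # ˈɛ̃ . v ɹ i . d ˌei #",
    "bow(nn,vb,vbp) # b ˈoʊ #",
]
subset = filter_entries(sample, {"cat", "bow"})
print("\n".join(subset))
```

The filtered lines could then be written to a new file and loaded as a custom dictionary, as described in examples/isletool_examples.py.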

Installation

Pysle is on pypi and can be installed or upgraded from the command-line shell with pip like so

python -m pip install pysle --upgrade

Otherwise, to manually install, after downloading the source from github, from a command-line shell, navigate to the directory containing setup.py and type

python setup.py install

If python is not in your path, you'll need to enter the full path e.g.

C:\Python36\python.exe setup.py install

Example usage

Here is a typical usage

from pysle import isletool
isleDict = isletool.LexicalTool(r'C:\islev2.dict')  # raw string keeps the backslash literal
print(isleDict.lookup('catatonic')[0]) # Get the first pronunciation
# >> (([['k', 'ˌæ'], ['ɾ', 'ə'], ['t', 'ˈɑ'], ['n', 'ɪ', 'k']], [2, 0], [1, 1]),)
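
The printed structure is easier to read once unpacked. Judging from the output above, each entry appears to hold a syllable list, the indices of the stressed syllables, and the index of the stressed phone inside each of those syllables. That reading is inferred from this one example, not taken from the pysle documentation. The sketch works on a copy of the printed value, so it runs without pysle or the dictionary file; `describe_stress` is a hypothetical helper.

```python
# Value copied from the lookup output shown above.
first_pronunciation = (
    (
        [["k", "ˌæ"], ["ɾ", "ə"], ["t", "ˈɑ"], ["n", "ɪ", "k"]],
        [2, 0],
        [1, 1],
    ),
)


def describe_stress(entry):
    """Return (syllable_index, stressed_phone) pairs for one entry."""
    syllables, stressed_syllables, stressed_phones = entry
    return [
        (syl_i, syllables[syl_i][phone_i])
        for syl_i, phone_i in zip(stressed_syllables, stressed_phones)
    ]


stress_info = describe_stress(first_pronunciation[0])
print(stress_info)
# [(2, 'ˈɑ'), (0, 'ˌæ')]
```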

and another

from pysle import isletool
from pysle import pronunciationtools

isleDict = isletool.LexicalTool(r'C:\islev2.dict')  # raw string keeps the backslash literal

searchWord = 'another'
phoneList = ['n', '@', 'th', 'r'] # Actually produced (ASCII or IPA ok here)

returnList = pronunciationtools.findBestSyllabification(isleDict, searchWord, phoneList)
syllableList = returnList[2]
print(syllableList)
# >> [["''"], ['n', '@'], ['th', 'r']]
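
To give a feel for what mapping produced phones onto a dictionary pronunciation involves, here is a minimal sketch that aligns two phone lists with Python's standard `difflib.SequenceMatcher` and pads the gaps with `''`. It is an illustration only: pysle's own alignment may work differently, and the canonical phone list used here and the helper name `align_phones` are assumptions.

```python
from difflib import SequenceMatcher


def align_phones(canonical, actual, gap="''"):
    """Pad both phone lists with `gap` so that matching phones line up."""
    aligned_canonical, aligned_actual = [], []
    matcher = SequenceMatcher(a=canonical, b=actual, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        a_chunk, b_chunk = canonical[i1:i2], actual[j1:j2]
        width = max(len(a_chunk), len(b_chunk))
        # Pad the shorter chunk so both sides stay the same length.
        aligned_canonical += a_chunk + [gap] * (width - len(a_chunk))
        aligned_actual += b_chunk + [gap] * (width - len(b_chunk))
    return aligned_canonical, aligned_actual


canonical = ["ə", "n", "ʌ", "ð", "ɚ"]  # assumed dictionary phones
actual = ["n", "ʌ", "ð", "ɚ"]          # speaker dropped the first vowel
aligned = align_phones(canonical, actual)
print(aligned[0])
print(aligned[1])
```

After alignment both lists have the same length, so a phone that was dropped or inserted shows up as a gap marker at a known position.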

Please see /examples for example usage.

Citing pysle

Pysle is general-purpose code and doesn't need to be cited (you should cite the ISLEX project instead), but if you would like to cite it, you can do so as follows:

Tim Mahrt. Pysle. https://github.com/timmahrt/pysle, 2016.

Acknowledgements

Development of Pysle was possible thanks to NSF grant IIS 07-03624 to Jennifer Cole and Mark Hasegawa-Johnson, NSF grant BCS 12-51343 to Jennifer Cole, José Hualde, and Caroline Smith, and to the A*MIDEX project (n° ANR-11-IDEX-0001-02) to James Sneed German funded by the Investissements d'Avenir French Government program, managed by the French National Research Agency (ANR).

Comments
  • no dictionary

    from pysle import isletool
    
    isletool.LexicalTool('ISLEdict.txt').lookup('cat')
    

    I think you should list all data files in the MANIFEST.in file

    See https://realpython.com/pypi-publish-python-package/

    I do something similar for the Persian language: https://github.com/PasaOpasen/PersianG2P

    opened by PasaOpasen 15
  • [Question] Lookup by exact IPA match

    Say I want to disambiguate two pronunciations of the word "bow". I can get these from your library (small excerpt from a large number of matches):

    >>> isleDict.search('boʊ')
    ... ('bow', [('# b ˈoʊ #', ['nn', 'vb', 'vbp'])]), ('bow-and-quarter_line', [('# b ˈoʊ . n̩ # k w ˈɑ ɹ . t ə ɹ #', [])]),  ...
    

    I'm having trouble figuring out how to modify the regex to only get '# b ˈoʊ #' without the extra stuff. Any suggestions? I tried the regular anchor characters (^, $) without luck.

    opened by sevagh 8
  • Syllable markers for written word

    Hello!

    Firstly, thank you so much for your hard work on this. I'm working on a project to try and make reading more accessible, and your ISLE dictionary interface is a huge help. What I'm trying to do is visually tie together the pronunciation with the letters it stems from.

    Unfortunately, this is pretty tricky in English, particularly at the phonetic level. I assume it is quite doable at the syllabic level.

    I found the MOBY hyphenator project, and it seems I could merge the two dictionaries, however it is inconsistent with ISLE, for instance:

    eve¥ry¥day everyday(jj) # ˈɛ̃ . v ɹ i . d ˌei #

    In addition, it seems to be missing some very basic words like:

    $ grep -E '^ha[\^rd]+en' mhyph.txt | head -n1
    hard^en
    $ grep -E '^ha[\^rd]+er' mhyph.txt | head -n1
    $
    

    Any ideas on how to go about this?

    Thanks! Jacob

    opened by jacobSingh 8
  • No attribute 'isletool'

    Let's see:

    
    import pysle
    
    pysle.isletool.LexicalTool.lookup('cat')
    Traceback (most recent call last):
    
      File "<ipython-input-1-09f456e372af>", line 3, in <module>
        pysle.isletool.LexicalTool.lookup('cat')
    
    AttributeError: module 'pysle' has no attribute 'isletool'
    
    
    from pysle import isle
    Traceback (most recent call last):
    
      File "<ipython-input-2-23a7e2ed01cb>", line 1, in <module>
        from pysle import isle
    
    ImportError: cannot import name 'isle' from 'pysle' (C:\ProgramData\Anaconda3\lib\site-packages\pysle\__init__.py)
    
    
    from pysle import pronunciationTools
    Traceback (most recent call last):
    
      File "<ipython-input-3-23b867baacbb>", line 1, in <module>
        from pysle import pronunciationTools
    
    ImportError: cannot import name 'pronunciationTools' from 'pysle' (C:\ProgramData\Anaconda3\lib\site-packages\pysle\__init__.py)
    
    opened by PasaOpasen 4
  • 'ɛ̃' missing as vowel in isletool

    the vowel sound 'ɛ̃' is not accounted for in the search functionality of isletool.

    This means that a search like: isleDict.search("",numSyllables=2,multiword='no',stressedSyllable='only',wordInitial='only') should include ('welder', ['# w ˈɛ̃ l . d ɚ #']) but it does not (unless I'm misunderstanding something?)

    I was able to fix it for my needs by adding 'ɛ̃' to isletool.charList and isletool.vowelList, and changing line 211 of isletool.py to:

    "V": u"(?:aʊ|ei|oʊ|ɑɪ|ɔi|[iuæɑɔəɛɪʊʌɛ̃]):?",  # vowels
    
    opened by mcmahanp 4
  • "." as phone

    I don't know enough about linguistics to know if "." is a phone, but it comes up for some words. In some lines in my dictionary, like "pirouette", the entry ends with ". #", leading to "." showing up as a phone. Is this intended?

    from pysle import isletool
    isleDict = isletool.LexicalTool('../data/isledict/ISLEdict.txt')
    isleDict.data["pirouette"]
    #> ['# p ˌɪ . ɹ u . ˈɛ̃ . #', '# p ˌɪ . ɹ u . ˈɛ̃ t #']
    isleDict.lookup('pirouette')
    #> [([['p', 'ˌɪ'], ['ɹ', 'u'], ['ˈɛ̃', '.']], [2, 0], [0, 1]),
    #> ([['p', 'ˌɪ'], ['ɹ', 'u'], ['ˈɛ̃', 't']], [2, 0], [0, 1])]
    
    opened by wassname 4
  • Don't read the full isle dictionary into memory?

    Right now isledict is read fully into memory. This is probably fine if an application needs to be able to quickly access items over a long period, but for most use cases is probably unnecessary.

    I'm wondering if it might be possible to quickly search for desired content and load only that into memory?

    opened by timmahrt 2
  • Add transcribe function

    sentence = "do you want another pumpkinseed"
    phoneList = isletool.transcribe(isleDict, sentence, 'longest')
    print(phoneList)
    
    > 'du ju want ənʌðəɹ pʌmpkɪnsid'
    
    opened by timmahrt 1
  • multi word entries give single word pronunciation

    I'm not sure if this is intended, but multi-word entries only return the first word's pronunciation. E.g. for "australian_seal" we only get the pronunciation for "australian".

    from pysle import isletool
    isleDict = isletool.LexicalTool('../data/isledict/ISLEdict.txt')
    
    isleDict.data["australian_seal"]
    #> ['# ɑ . s t ɹ ˈei l . j n̩ # s i l #']
    
    isleDict.lookup('australian_seal')
    #> [([['ɑ'], ['s', 't', 'ɹ', 'ˈei', 'l'], ['j', 'n̩']], [1], [3])]
    

    Thanks for making this package!

    opened by wassname 1
  • Pysle 4

    The intention with version 4 is to modernize pysle a bit

    • [x] lazy load isle dictionary for faster boot times (~20s -> 0.5s)
    • [x] type hinting
    • [x] move to object oriented
    • [x] function signature/return type documentation
    • [x] unit tests
    • [x] simple use tutorial
    opened by timmahrt 0
  • Praattool syllabification fixes

    This PR does the following:

    • [x] fixes several important bugs related to syllable alignment
      • a variable set inside a loop was never properly initialized (if no stress was specified by ISLE, the stressed syllable of the current word would be marked the same as that of the previous word)
      • an argument to a function was mutated inside the function (if there were multiple entries for a word in ISLE, later entries would be penalized more strongly than earlier entries)
    • [x] adds more flexibility in error handling with praattools.syllabifyTextgrid
    • [x] requires praatio 5.0 or greater
    • [x] drops support for python 2.7
    opened by timmahrt 0
  • Standardize input representation of phonemes

    Phones in the ISLE dictionary are in IPA. There is no constraint on what the input to ISLE is. pysle tries to smudge the input to be IPA-like, but this is kind of hidden from the user. Non-IPA representations probably work sometimes, but this is a dangerous assumption.

    It's also possible that some parts of the code forgot this fact and assume the input is NOT IPA.

    The input representation /is/ preserved in the output, so that is good at least.

    Seems like a headache to validate everything is working ok.

    opened by timmahrt 1
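
Several of the comments above quote raw ISLEdict pronunciation strings, in which `#` appears to mark word boundaries and `.` syllable boundaries. The sketch below shows one way such a string could be split into words, syllables, and phones while skipping empty syllables, such as the one left by the trailing `.` in the 'pirouette' entry. It is a standalone illustration, not pysle's parser; `parse_pronunciation` is a hypothetical helper.

```python
def parse_pronunciation(raw):
    """Split '# a . b c # d #' into [[['a'], ['b', 'c']], [['d']]]."""
    words = []
    for word_chunk in raw.split("#"):
        syllables = [syl.split() for syl in word_chunk.split(".")]
        syllables = [syl for syl in syllables if syl]  # drop empty syllables
        if syllables:
            words.append(syllables)
    return words


pirouette = parse_pronunciation("# p ˌɪ . ɹ u . ˈɛ̃ . #")
seal = parse_pronunciation("# ɑ . s t ɹ ˈei l . j n̩ # s i l #")
print(len(pirouette), "word,", len(pirouette[0]), "syllables")
print(len(seal), "words; last word:", seal[-1])
```

Splitting on `#` first keeps every word of a multi-word entry, and filtering out empty syllables keeps `.` from appearing as a phone.
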
Owner
Tim
I write tools for working with speech data.