Python port of Google's libphonenumber

Overview

phonenumbers Python Library

This is a Python port of Google's libphonenumber library. It supports Python 2.5-2.7 and Python 3.x (in the same codebase, with no 2to3 conversion needed).

Original Java code is Copyright (C) 2009-2015 The Libphonenumber Authors.

Release HISTORY, derived from upstream release notes.

Installation

Install using pip with:

pip install phonenumbers

Example Usage

The main object that the library deals with is a PhoneNumber object. You can create this from a string representing a phone number using the parse function, but you also need to specify the country that the phone number is being dialled from (unless the number is in E.164 format, which is globally unique).

>>> import phonenumbers
>>> x = phonenumbers.parse("+442083661177", None)
>>> print(x)
Country Code: 44 National Number: 2083661177 Leading Zero: False
>>> type(x)
<class 'phonenumbers.phonenumber.PhoneNumber'>
>>> y = phonenumbers.parse("020 8366 1177", "GB")
>>> print(y)
Country Code: 44 National Number: 2083661177 Leading Zero: False
>>> x == y
True
>>> z = phonenumbers.parse("00 1 650 253 2222", "GB")  # as dialled from GB, not a GB number
>>> print(z)
Country Code: 1 National Number: 6502532222 Leading Zero: False

The PhoneNumber object that parse produces typically still needs to be validated, to check whether it's a possible number (e.g. it has the right number of digits) or a valid number (e.g. it's in an assigned exchange).

>>> z = phonenumbers.parse("+120012301", None)
>>> print(z)
Country Code: 1 National Number: 20012301 Leading Zero: False
>>> phonenumbers.is_possible_number(z)  # too few digits for USA
False
>>> phonenumbers.is_valid_number(z)
False
>>> z = phonenumbers.parse("+12001230101", None)
>>> print(z)
Country Code: 1 National Number: 2001230101 Leading Zero: False
>>> phonenumbers.is_possible_number(z)
True
>>> phonenumbers.is_valid_number(z)  # NPA 200 not used
False

The parse function will also fail completely (with a NumberParseException) on inputs that cannot be uniquely parsed, or that can't possibly be phone numbers.

>>> z = phonenumbers.parse("02081234567", None)  # no region, no + => unparseable
Traceback (most recent call last):
  File "phonenumbers/phonenumberutil.py", line 2350, in parse
    "Missing or invalid default region.")
phonenumbers.phonenumberutil.NumberParseException: (0) Missing or invalid default region.
>>> z = phonenumbers.parse("gibberish", None)
Traceback (most recent call last):
  File "phonenumbers/phonenumberutil.py", line 2344, in parse
    "The string supplied did not seem to be a phone number.")
phonenumbers.phonenumberutil.NumberParseException: (1) The string supplied did not seem to be a phone number.

Once you've got a phone number, a common task is to format it in a standardized format. There are a few formats available (under PhoneNumberFormat), and the format_number function does the formatting.

>>> phonenumbers.format_number(x, phonenumbers.PhoneNumberFormat.NATIONAL)
'020 8366 1177'
>>> phonenumbers.format_number(x, phonenumbers.PhoneNumberFormat.INTERNATIONAL)
'+44 20 8366 1177'
>>> phonenumbers.format_number(x, phonenumbers.PhoneNumberFormat.E164)
'+442083661177'

If your application has a UI that allows the user to type in a phone number, it's nice to get the formatting applied as the user types. The AsYouTypeFormatter object allows this.

>>> formatter = phonenumbers.AsYouTypeFormatter("US")
>>> formatter.input_digit("6")
'6'
>>> formatter.input_digit("5")
'65'
>>> formatter.input_digit("0")
'650'
>>> formatter.input_digit("2")
'650 2'
>>> formatter.input_digit("5")
'650 25'
>>> formatter.input_digit("3")
'650 253'
>>> formatter.input_digit("2")
'650-2532'
>>> formatter.input_digit("2")
'(650) 253-22'
>>> formatter.input_digit("2")
'(650) 253-222'
>>> formatter.input_digit("2")
'(650) 253-2222'

Sometimes, you've got a larger block of text that may or may not have some phone numbers inside it. For this, the PhoneNumberMatcher object provides the relevant functionality; you can iterate over it to retrieve a sequence of PhoneNumberMatch objects. Each of these match objects holds a PhoneNumber object together with information about where the match occurred in the original string.

>>> text = "Call me at 510-748-8230 if it's before 9:30, or on 703-4800500 after 10am."
>>> for match in phonenumbers.PhoneNumberMatcher(text, "US"):
...     print(match)
...
PhoneNumberMatch [11,23) 510-748-8230
PhoneNumberMatch [51,62) 703-4800500
>>> for match in phonenumbers.PhoneNumberMatcher(text, "US"):
...     print(phonenumbers.format_number(match.number, phonenumbers.PhoneNumberFormat.E164))
...
+15107488230
+17034800500

You might want to get some information about the location that corresponds to a phone number. The geocoder.description_for_number function does this, when possible.

>>> from phonenumbers import geocoder
>>> ch_number = phonenumbers.parse("0431234567", "CH")
>>> geocoder.description_for_number(ch_number, "de")
'Zürich'
>>> geocoder.description_for_number(ch_number, "en")
'Zurich'
>>> geocoder.description_for_number(ch_number, "fr")
'Zurich'
>>> geocoder.description_for_number(ch_number, "it")
'Zurigo'

For mobile numbers in some countries, you can also find out information about which carrier originally owned a phone number.

>>> from phonenumbers import carrier
>>> ro_number = phonenumbers.parse("+40721234567", "RO")
>>> carrier.name_for_number(ro_number, "en")
'Vodafone'

You might also be able to retrieve a list of time zone names that the number potentially belongs to.

>>> from phonenumbers import timezone
>>> gb_number = phonenumbers.parse("+447986123456", "GB")
>>> timezone.time_zones_for_number(gb_number)
('Atlantic/Reykjavik', 'Europe/London')

For more information about the other functionality available from the library, look in the unit tests or in the original libphonenumber project.

Memory Usage

The library includes a lot of metadata, potentially giving a significant memory overhead. There are two mechanisms for dealing with this.

  • The normal metadata for the core functionality of the library is loaded on-demand, on a region-by-region basis (i.e. the metadata for a region is only loaded on the first time it is needed).
  • Metadata for extended functionality is held in separate packages, which therefore need to be explicitly loaded separately. This affects:
    • The geocoding metadata, which is held in phonenumbers.geocoder and used by the geocoding functions (geocoder.description_for_number, geocoder.description_for_valid_number or geocoder.country_name_for_number).
    • The carrier metadata, which is held in phonenumbers.carrier and used by the mapping functions (carrier.name_for_number or carrier.name_for_valid_number).
    • The timezone metadata, which is held in phonenumbers.timezone and used by the timezone functions (timezone.time_zones_for_number or timezone.time_zones_for_geographical_number).

The phonenumberslite version of the library does not include the geocoder, carrier and timezone packages, which can be useful if you have problems installing the main phonenumbers library due to space/memory limitations.

If you need to ensure that the metadata memory use is accounted for at start of day (i.e. that a subsequent on-demand load of metadata will not cause a pause or memory exhaustion):

  • Force-load the normal metadata by calling phonenumbers.PhoneMetadata.load_all().
  • Force-load the extended metadata by importing the appropriate packages (phonenumbers.geocoder, phonenumbers.carrier, phonenumbers.timezone).


Project Layout

  • The python/ directory holds the Python code.
  • The resources/ directory is a copy of the resources/ directory from libphonenumber. This is not needed to run the Python code, but is needed when upstream changes to the master metadata need to be incorporated.
  • The tools/ directory holds the tools that are used to process upstream changes to the master metadata.

Comments

  • Problem installing 6.0 on newly installed Ubuntu machine

    I just tried installing phonenumbers on a newly installed Ubuntu 12.04 installation.

    I'm running these versions: Python 2.7.3, pip 1.5.1, virtualenv 1.11.1.

    To repeat:

    $ mkvirtualenv test
    $ pip install phonenumbers
    [... long list of byte compiled files ...]
    Cleaning up...
    Command /home/deploy/.virtualenvs/test/bin/python -c "import setuptools, tokenize;__file__='/home/deploy/.virtualenvs/test/build/phonenumbers/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-11ulaL-record/install-record.txt --single-version-externally-managed --compile --install-headers /home/deploy/.virtualenvs/test/include/site/python2.7 failed with error code -9 in /home/deploy/.virtualenvs/test/build/phonenumbers
    Storing debug log for failure in /home/deploy/.pip/pip.log
    

    I've no idea what error code -9 means, but it works if I install version 5.9.2 instead.

    opened by gaqzi 26
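
A note on the "error code -9" in the report above: when Python's subprocess machinery reports a negative exit status, it conventionally means the child process was terminated by the signal with that number. Assuming pip surfaced the status in that way, -9 would correspond to SIGKILL, which on Linux is often a sign of the out-of-memory killer. The helper below is a hypothetical sketch for decoding such a status on a POSIX system.

```python
import signal


def describe_exit_status(returncode):
    """Explain a subprocess-style return code (hypothetical helper)."""
    if returncode < 0:
        # Negative values mean "killed by signal -returncode" on POSIX.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


print(describe_exit_status(-9))
print(describe_exit_status(0))
```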
  • Enhancement: add type annotations to API surface

    I believe exceptions due to type errors when trying to parse the wrong datatype should be clearer. I recently ran into a situation where I was inadvertently passing a PhoneNumber object to the parse function. This resulted in the following error: AttributeError: 'PhoneNumber' object has no attribute 'find'

    Although I was passing a PhoneNumber object, the exception did not make it obvious that a string was required. It sent me investigating why that object lacked the attribute being called, instead of telling me that this was a type error and that the function expects a string in the first place. The message is even more ambiguous if a user passes an int (which seems a much more likely mistake), which raises: TypeError: object of type 'int' has no len()

    The following code reproduces these exceptions, depending on which value is passed to parse:

    import phonenumbers
    
    phone_number_int = 1111111111
    phone_number_str = str(phone_number_int)
    phone_number_object = phonenumbers.parse(phone_number_str, 'US')

    # Passing the int raises the TypeError quoted above;
    # passing phone_number_object instead raises the AttributeError.
    phone_number_error = phonenumbers.parse(phone_number_int)
    
    print(phone_number_error)
    

    I believe a simple check of the number variable that raises a TypeError if not a string type and lets the user know the function is expecting a string would be helpful, and I would be happy to raise a pull request with this change.

    Thank you.

    opened by mitchellkennedy 19
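
The kind of check proposed in the report above can also be applied on the caller's side. This is a minimal sketch under that assumption; require_text is a hypothetical helper, and phonenumbers.parse itself is left untouched.

```python
def require_text(number):
    """Raise a clear TypeError unless `number` is a string.

    Hypothetical pre-check meant to run before phonenumbers.parse().
    """
    if not isinstance(number, str):
        raise TypeError(
            "expected the phone number as a str, got %s" % type(number).__name__
        )
    return number


try:
    require_text(1111111111)
except TypeError as exc:
    print(exc)  # expected the phone number as a str, got int
```

A call site would then read phonenumbers.parse(require_text(value), "US"), so the wrong type is reported before the library is entered.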
  • Problems with excessive memory usage

    We are using phonenumbers to normalize phone numbers (for example, from 0660123123 to +48660123123 in ITU-T E.164 format), knowing the ISO country code the number is from and regardless of the format of the original number. This is a great piece of code which took us almost no time to integrate. Thanks for the porting work! However, after merging our code with these changes and deploying to the production server, our site suddenly stopped responding. We reverted quickly, tried a few changes, and finally narrowed it down to a single line of code:

    import phonenumbers
    

    It seems that the library is awfully memory-hungry. After further checking, we found out that just importing phonenumbers increases the memory used by the application by 50+ MB. This is a Django app, and before the import it used about 40 MB, so this is quite a significant increase. Moreover, there seem to be some very bad effects if the memory starts being swapped out to disk. Instead of the expected moderate slowdown, the app basically stops responding. It seems that the code in the library scans through all the memory it is using and tries to bring all of the data back into memory, which can be disastrous when several processes run in parallel: the processes continuously swap each other in and out, and the whole application stops responding.

    Just to give a bit of background: we are running our Django app in the "controlled" environment of Heroku, where one "dyno" only has 512 MB of RAM, and we are using gunicorn in front, which spawns several worker processes. This is in order to get scalability and be able to handle several requests in parallel per dyno. So far we could easily run 4 workers per dyno, but after adding phonenumbers, the swapping (which brings the site to a halt) kicks in already at the 2nd worker. We are running some other software there, like pgbouncer, and the OS itself takes a bit of memory, so we do not have all 512 MB available; it looks like we have around 180 MB. For us this means that, in order to get the same throughput with phonenumbers, we need 4x more dynos (and on Heroku you pay per dyno). Obviously that's not really acceptable. I know you mentioned on your wiki page that you are not as conscious about memory as the original Java library, because the library should be used in a server environment, but unfortunately that assumption does not hold for us.

    We are looking into ways to solve it, so I have the following questions. Is there a way to run the library in a less memory-hungry mode? How did the Java library handle it? Maybe we can help somehow in moving it in the right direction of being less memory-hungry? I guess the memory used comes from features such as type-ahead formatting, which are very useful for a mobile app but less (if at all) useful for Python code run on the server side (it seems better to do that in JavaScript), so maybe that part can be trimmed down (or even completely removed, if this is the culprit). We only need one piece of functionality: take a phone number and the ISO country code of where the number is from, and produce an ITU-T E.164 formatted number (regardless of the format of the original number). Maybe there is a way we could simply remove the other code and use the rest?

    Any help greatly appreciated.

    opened by potiuk 14
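
To put a number on an import's memory cost, as discussed in the report above, one option is the standard tracemalloc module. The sketch below is an assumption about how such a measurement could be made, not how the reporter measured it. It counts only allocations made through Python's allocator, so it will not match process-level figures exactly, and a module that is already imported will show roughly zero growth.

```python
import importlib
import tracemalloc


def import_cost_bytes(module_name):
    """Return the net Python heap growth caused by importing a module."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    importlib.import_module(module_name)
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


# "json" is only a stand-in here; pass "phonenumbers" to measure the library.
print(import_cost_bytes("json"))
```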
  • FIXED_LINE_OR_MOBILE issue

    I have a problem with FIXED_LINE_OR_MOBILE. My number, +1 414-719-88XX, is a mobile number with AT&T, but the library says it is a fixed line.

    On freecarrierlookup.com, however, I got the correct result:

    Phone Number: 14147198856
    Carrier: ATT Mobility
    Is Wireless: y (mobile)
    SMS Gateway Address: [email protected]
    MMS Gateway Address: [email protected]

    Please help. Where can we get the update?

    opened by anishmenon 10
  • PhoneNumberFormat: Not outputting specified formats.

    I have a list of GB numbers (some with leading zeros and some without).

    I am trying to format them all to E164 format but phonenumbers.PhoneNumberFormat.INTERNATIONAL/E164/NATIONAL all output the same wrong results.

    I can't replicate this on the Java web app, which outputs the numbers correctly.

    Code Block:

    z = phonenumbers.parse(str(number), "GB")
    if phonenumbers.is_valid_number(z):
        phonenumbers.format_number(z, phonenumbers.PhoneNumberFormat.INTERNATIONAL)
    

    Actual Output: Country Code: 44 National Number: XXXXXXXXXX

    Expected Output: +44XXXXXXXXXX

    Am I missing something?

    opened by SamuelMTDavies 9
  • Porting to Python 3

    Hello, this isn't really an issue, but I don't know where else to talk about this.

    I need to use phonenumbers with Python 3.3.

    The README talks about:

    • running 2to3, but I'm not sure how well that plays with pip.
    • the python3 branch, but the wording doesn't inspire a lot of confidence.

    Current porting efforts keep compatibility with Python 2.5, which makes the shared-source approach one order of magnitude harder than targeting Python 2.6 as the lowest supported version.

    Since I have some experience with porting to Python 3 (I ported Django, among other things), I tried to redo the porting for 2.6+ and 3.3+, which is all I need. It just required the following steps:

    • making relative imports explicit, e.g. from .util import ...
    • changing try blocks to use the except ... as ... syntax
    • removing long literals; I'm not sure why they're used instead of plain integers; this change makes a lot of noise in the tests
    • replacing long by int
    • adding text_type (unicode or str) and xrange (xrange or range) for compatibility
    • replacing sys.maxint by something else -- I used sys.maxsize but that's semantically wrong, another hardcoded constant would be better

    At this point, the library appears to work, i.e. I can reproduce the examples in the documentation. However, for some reason, the tests (python setup.py test) fail massively, as if no data could be found for any country.

    Finally I suggest the following fix to UnicodeMixin:

     class UnicodeMixin(object):  # pragma no cover
         """Define __str__ operator in terms of __unicode__ for Python 2/3"""
    +    # Emulate the way the interpreter looks up magic methods
         if sys.version_info >= (3, 0):
    -        __str__ = lambda x: x.__unicode__()
    +        __str__ = lambda x: type(x).__unicode__(x)
         else:
    -        __str__ = lambda x: unicode(x).encode('utf-8')
    +        __str__ = lambda x: type(x).__unicode__(x).encode('utf-8')
    

    It shouldn't matter except in weird subclassing schemes.

    This isn't really actionable, and I expect you to close this issue once you've read it, but I found it interesting to share this experience.

    [EDIT] I accidentally submitted the issue while I was writing; I've completed my description since then.

    opened by aaugustin 6
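
The UnicodeMixin change suggested above relies on the fact that the interpreter looks up special methods on the type, not on the instance. The following standalone sketch shows that behaviour with __str__; the Greeter class is purely illustrative and has nothing to do with the library's code.

```python
class Greeter(object):
    def __str__(self):
        return "class-level"


g = Greeter()
# Shadow the special method on this one instance only.
g.__str__ = lambda: "instance-level"

print(str(g))              # implicit lookup goes through type(g): class-level
print(g.__str__())         # explicit attribute access finds the shadow: instance-level
print(type(g).__str__(g))  # mirrors the implicit lookup: class-level
```

Calling type(x).__unicode__(x) in the mixin follows the same rule, which is why it should only differ from x.__unicode__() in unusual subclassing or instance-patching schemes.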
  • Geocode returning wrong lat/long

    I use phonenumbers and geocoder, and when I run the program to get the latitude and longitude, it always gives me the same value.

    my code:

    import phonenumbers
    from phonenumbers import carrier, geocoder
    from opencage.geocoder import OpenCageGeocode

    number = "+33752179624"

    # Parse once; the number is in international format, so no region is needed.
    pepnumber = phonenumbers.parse(number)

    # Textual description of the number's location (may be only a country name).
    location = geocoder.description_for_number(pepnumber, "fr")
    print(location)

    print(carrier.name_for_number(pepnumber, "fr"))

    # Use a distinct name so the phonenumbers geocoder module is not shadowed.
    opencage_geocoder = OpenCageGeocode("YOUR_OPENCAGE_API_KEY")
    query = str(location)

    results = opencage_geocoder.geocode(query)

    lat = results[0]['geometry']['lat']
    lng = results[0]['geometry']['lng']
    print(lat, lng)
    
    opened by Midaco-YT 5
  • Package started installing global-scope "tests" package

    91abecb66b131a7fe139b3149e49c4eadbe00a3c has added "tests" to the list of installed global-scope packages. This is a very bad idea since it's not namespaced at all, and can collide with other packages.

    If you need to install tests, please move them inside "phonenumbers" directory to avoid polluting the global namespace.

    opened by mgorny 5
  • RTL phone rendering issue.

    Hi,

    I am using Django 3.0 as a framework, and python-phonenumbers to render phone numbers in a Django template.

    LTR works great, but when the web page is in RTL (right to left, i.e. Arabic language) there's an issue rendering phone numbers.

    Example phone number: 0779537579

    parsed = phonenumbers.parse('0779537579', 'SA')
    phonenumbers.format_number(parsed, phonenumbers.PhoneNumberFormat.NATIONAL)
    

    1. Phone rendering in LTR: 0779 53 75 79
    2. Phone rendering in RTL: 79 75 53 0779

    Is there a workaround for this, or an option to say that this is an RTL one? Am I missing something?

    PS: if the phone number doesn't have a space, it is rendered well in the template. The space was inserted by phonenumbers (PhoneNumberFormat.NATIONAL).

    Amazing library. Hope to hear from you soon.

    Thank you

    opened by infosrabah 5
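
For the RTL rendering question above, a common workaround outside the library is to wrap the formatted string in Unicode directional isolate characters, so that a bidi-aware renderer lays the digit groups out left to right. The sketch below assumes that approach; isolate_ltr is a hypothetical helper, and in HTML templates a dir="ltr" attribute on the enclosing element is another way to get a similar effect.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_ltr(text):
    """Wrap text so bidi-aware renderers lay it out left to right."""
    return LRI + text + PDI


formatted = "0779 53 75 79"  # the NATIONAL-format string from the report
wrapped = isolate_ltr(formatted)
print(repr(wrapped))
```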
  • FIX error when number '0631464833' and region None

    This fixes the error "AttributeError: 'NoneType' object has no attribute 'country_code'". Look at line 2820:

    if region is None:
        metadata = None
    

    At line 2854, country_code = metadata.country_code fails when metadata is None and the phone number has no country prefix, such as '0631464833'.

    opened by KomarovAlea 5
  • Country codes not validated

    I read the contributing guidelines and hope this issue is related more to this repo than the Java one.

    When trying to use the PhoneNumberMatcher with invalid country codes, no error is thrown. E.g.

    for match in PhoneNumberMatcher("Some text with or without a number", "INVALID_CC"):
        print(match)

    The parse method in phonenumberutil.py does seem to do validation, but that's not reflected in the results.

    opened by McWillie 5
  • Typing Improvement: Use IntEnum for PhoneNumberFormat, PhoneNumberType, etc.

    Python 3.8.12, phonenumbers 8.12.33

    Currently, enum-like classes such as PhoneNumberFormat are plain classes. For type checking it would be much more convenient if you changed these to Enums or IntEnums (if you need them to pass as int for compatibility reasons). This would allow you to annotate functions like example_number_for_type as example_number_for_type(str, PhoneNumberType) rather than the much more lax example_number_for_type(str, int), which is really not that helpful as a type hint. And even if you don't end up changing the function signatures, it would at least be a win for third party apps, if they could require PhoneNumberType in their functions, rather than an int.

    Apart from that the constant values in these classes should really be annotated as Final if you aren't going to change it to an enum. That way type checkers could at the very least do some static analysis knowing that these values won't ever change at runtime. Although currently I don't think it is possible to define a Literal using a Final variable yet. So if I actually wanted to restrict function calls I'd have to define my own IntEnum that just mirrors your classes (or hardcode the Literals using the same integer values, which could break at any point in time).

    opened by Daverball 8
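
The "define my own IntEnum that mirrors your classes" workaround mentioned above could look roughly like this. It is a sketch only: PhoneNumberFormatEnum and describe are hypothetical names, and the integer values are assumed to match phonenumbers.PhoneNumberFormat, which is exactly the coupling the report warns could break.

```python
from enum import IntEnum


class PhoneNumberFormatEnum(IntEnum):
    """Hypothetical typed mirror of phonenumbers.PhoneNumberFormat."""
    E164 = 0
    INTERNATIONAL = 1
    NATIONAL = 2
    RFC3966 = 3


def describe(fmt: PhoneNumberFormatEnum) -> str:
    """Accept only the enum in the signature; it still behaves as an int."""
    return "%s (%d)" % (fmt.name, int(fmt))


print(describe(PhoneNumberFormatEnum.NATIONAL))  # NATIONAL (2)
```

Because IntEnum members compare equal to plain integers, such a member can be passed wherever the library expects the integer constant.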
  • Check for non-`None` expectations

    From https://github.com/daviddrysdale/python-phonenumbers/issues/200#issuecomment-910799543:

    it is possible for None to be returned, but a lot of implementation code expects a value and doesn't check for None

    applies to phonenumberutil.(example_number | invalid_example_number | example_number_for_type | example_number_for_non_geo_entity | region_code_for_number | ndd_prefix_for_region) and to a lesser extent re_util.fullmatch, as it is only ever used in a boolean context.

    opened by daviddrysdale 7
  • Caching compiled `re` patterns

    First of all, thank you for creating this Python port.

    In my use, I'm checking a large number of short strings to see if they are likely to be a phone number. I found that I'm making a lot of calls to re.compile() (for example here). Caching the patterns leads to about a 6-10x speedup for my use case.

    If this is something that would make sense given the contributing guidelines, I'd be happy to do a PR.

    opened by pvarsh 4
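
The caching idea described above can be illustrated with a generic memoized wrapper around re.compile. This is a sketch of the technique using functools.lru_cache, not the library's own code; cached_compile is a hypothetical name.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_compile(pattern, flags=0):
    """Compile a pattern once and reuse the compiled object afterwards."""
    return re.compile(pattern, flags)


first = cached_compile(r"\d{3}-\d{4}")
second = cached_compile(r"\d{3}-\d{4}")
print(first is second)                    # True
print(bool(first.fullmatch("555-1212")))  # True
```

An unbounded cache is reasonable when the set of patterns is fixed and small; with patterns built from user input, a finite maxsize would be the safer choice.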
  • performance of country_name_for_number

    Why is country_name_for_number so slow compared to region_code_for_number? I'd expect a lookup from region_code to country_name to be quite quick. It appears to check the validity of the number. I haven't figured out where most of the time is spent yet. Is there any way to speed things up?

    --Karl

    from timeit import timeit

    import phonenumbers
    from phonenumbers import parse, format_number, is_possible_number, is_possible_number_with_reason, is_valid_number, number_type
    from phonenumbers.geocoder import region_code_for_number, country_name_for_number, description_for_number
    
    # setup p for test
    p = parse('13035551212', 'US')
    print("parse_number:", timeit(lambda: parse('13035551212','US'), number=10000))
    print("format_number:", timeit(lambda: format_number(p, phonenumbers.PhoneNumberFormat.E164), number=10000))
    print("is_possible_number:", timeit(lambda: is_possible_number(p), number=10000))
    print("is_possible_number_with_reason:", timeit(lambda: is_possible_number_with_reason(p), number=10000))
    print("is_valid_number:", timeit(lambda: is_valid_number(p), number=10000))
    print("number_type:", timeit(lambda: number_type(p), number=10000))
    print("region_code_for_number:", timeit(lambda: region_code_for_number(p), number=10000))
    print("country_name_for_number:", timeit(lambda: country_name_for_number(p, 'en'), number=10000))
    print("description_for_number:", timeit(lambda: description_for_number(p, 'en'), number=10000))
    

    results

    ('parse_number:', 1.0535039901733398)
    ('format_number:', 0.021859169006347656)
    ('is_possible_number:', 0.036605119705200195)
    ('is_possible_number_with_reason:', 0.0437769889831543)
    ('is_valid_number:', 1.0846400260925293)
    ('number_type:', 1.1673779487609863)
    ('region_code_for_number:', 0.5287020206451416)
    ('country_name_for_number:', 4.455769062042236)
    ('description_for_number:', 1.7113380432128906)

    opened by kputland 3
Owner
David Drysdale