The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Overview

Maintainer wanted

I am looking for a new maintainer for the project, as I have not needed this library for well over 7 years now, because it is a C-only library and its original license is somewhat restrictive.

Introduction

The Levenshtein Python C extension module contains functions for fast computation of

  • Levenshtein (edit) distance, and edit operations
  • string similarity
  • approximate median strings, and generally string averaging
  • string sequence and set similarity

It supports both normal and Unicode strings.

Python 2.2 or newer is required; Python 3 is supported.
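
To make the first item in the list above concrete, here is a minimal pure-Python sketch of the edit distance, using the classic two-row dynamic-programming table. It only illustrates the quantity being computed; it is not the module's C implementation, and the name `edit_distance` is made up for this example.

```python
def edit_distance(a, b):
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

With the extension installed, `Levenshtein.distance("kitten", "sitting")` should give the same number, only much faster.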

StringMatcher.py is an example SequenceMatcher-like class built on top of Levenshtein. It lacks some of SequenceMatcher's functionality but has some extras of its own.
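
For orientation, this is the standard-library `difflib.SequenceMatcher` interface that StringMatcher.py is modelled on. The sketch uses only `difflib`; which of these calls have an exact counterpart in StringMatcher.py is an assumption to check against the file itself.

```python
from difflib import SequenceMatcher

# Compare two short strings; the first argument is the "junk" filter (none here).
m = SequenceMatcher(None, "abxcd", "abcd")

# Each block is (start in a, start in b, length); the last one is a
# zero-length sentinel marking the end of both strings.
blocks = m.get_matching_blocks()
for block in blocks:
    print(block.a, block.b, block.size)

# ratio() is 2 * matched characters / total length of both strings.
print(m.ratio())
```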

Levenshtein.c can also be used as a pure C library: define the NO_PYTHON preprocessor symbol (-DNO_PYTHON) when compiling it. The functionality is similar to that of the Python extension. No separate documentation is provided yet, so read the source. The two builds are not interchangeable, however:

  • C functions exported when compiling with -DNO_PYTHON (see Levenshtein.h) are not exported when compiling as a Python extension (and vice versa)
  • The Unicode character type used with -DNO_PYTHON is wchar_t, while the Python extension uses Py_UNICODE; they may be the same, but don't count on it

Installation

pip install python-Levenshtein

Documentation

gendoc.sh generates the HTML API documentation. You probably want a self-contained rather than an includable version, so run ./gendoc.sh --selfcontained. It needs Levenshtein to be installed already, as well as genextdoc.py.

License

Levenshtein is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

See the file COPYING for the full text of GNU General Public License version 2.

History

This package was long missing from the Python Package Index and available only as a source checkout, but it can now be found on PyPI again.

We needed to restore this package for the Go Mobile for Plone and Pywurfl projects, which depend on it.

Source code

Authors

  • Maintainer: Antti Haapala
  • Python 3 compatibility: Esa Määttä
  • Jonatas CD: Fixed documentation generation
  • Previous maintainer: Mikko Ohtamaa
  • Original code: David Necas (Yeti) <yeti at physics.muni.cz>
Comments
  • Usage docs on github

    I installed the module. When I look at the github docs, it says to run something to get docs. When I try to run that thing, it can't find gendoc.sh. So, now I have to chase down some doc tool.

    It would be handy to have some simple usage docs on github in the README. What are the main function points, and how are they used? What do I have to import?

    opened by dfrankow 9
  • cc1plus causes build failure

    on Ubuntu 14.04 server, gcc version 4.9.2 (Ubuntu 4.9.2-0ubuntu1~14.04)

    Downloading/unpacking editdistance
      Downloading editdistance-0.2.tar.gz
      Running setup.py (path:/tmp/pip_build_root/editdistance/setup.py) egg_info for package editdistance
    
    Installing collected packages: editdistance
      Running setup.py install for editdistance
        building 'editdistance.bycython' extension
        x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I./editdistance -I/usr/include/python2.7 -c editdistance/_editdistance.cpp -o build/temp.linux-x86_64-2.7/editdistance/_editdistance.o
        x86_64-linux-gnu-gcc: error trying to exec 'cc1plus': execvp: No such file or directory
        error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
        Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_root/editdistance/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-481IOZ-record/install-record.txt --single-version-externally-managed --compile:
        running install
        running build
        running build_py
        creating build
        creating build/lib.linux-x86_64-2.7
        creating build/lib.linux-x86_64-2.7/editdistance
        copying editdistance/__init__.py -> build/lib.linux-x86_64-2.7/editdistance
        copying editdistance/_editdistance.h -> build/lib.linux-x86_64-2.7/editdistance
        running build_ext
        building 'editdistance.bycython' extension
        creating build/temp.linux-x86_64-2.7
        creating build/temp.linux-x86_64-2.7/editdistance
        x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I./editdistance -I/usr/include/python2.7 -c editdistance/_editdistance.cpp -o build/temp.linux-x86_64-2.7/editdistance/_editdistance.o
        x86_64-linux-gnu-gcc: error trying to exec 'cc1plus': execvp: No such file or directory
        error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    
    opened by sfchen 5
  • Update License?

    Hiya,

    Is there any chance of changing the license of this module to make it less restrictive? Ideally I would suggest the MIT license, but you could also update it to GPL v3.

    opened by chrisjbryant 4
  • Integer overflow in lev_edit_distance()

    An integer overflow in lev_edit_distance() leads to a heap based buffer overflow.

    https://github.com/ztane/python-Levenshtein/blob/3a7412f38f1991c20d7a2765f30c2bb9cb1e63e0/Levenshtein/_levenshtein.c#L2278

    When len2 is greater than 1/4 of the maximum size_t value, the multiplication overflows, so the allocation is smaller than expected. In a 32-bit Python interpreter with a len2 of 1073741825 (0x40000001), the call to malloc ends up allocating 4 bytes (0x40000001 * 0x4 = 0x100000004, which wraps a 32-bit size_t to 0x4).
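
    The arithmetic above can be reproduced with a short sketch that models a 32-bit size_t in Python. The helper names `wrapped_size` and `checked_size` are made up for illustration; the second one shows the kind of guard that would reject the request instead of under-allocating, not the fix actually applied to the C code.

```python
SIZE_BITS = 32                # width of size_t in a 32-bit build
SIZE_MAX = 2**SIZE_BITS - 1


def wrapped_size(count, elem_size):
    """Byte count as an unsigned size_t multiplication would yield it."""
    return (count * elem_size) & SIZE_MAX


def checked_size(count, elem_size):
    """Byte count, or OverflowError if it does not fit in size_t."""
    if elem_size and count > SIZE_MAX // elem_size:
        raise OverflowError("allocation size overflows size_t")
    return count * elem_size


len2 = 0x40000001
print(hex(len2 * 4))          # 0x100000004, the mathematically correct size
print(wrapped_size(len2, 4))  # 4, what malloc would actually be asked for
```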

    C:\Users\test>"c:\Program Files (x86)\Python38-32\python.exe"
    Python 3.8.7 (tags/v3.8.7:6503f05, Dec 21 2020, 17:43:54) [MSC v.1928 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import Levenshtein
    >>> s = "A" * 1073741825
    >>> Levenshtein.ratio("BBBB", s)

    C:\Users\test>

    Additionally, throughout the code the return value of calls to PyString_GET_SIZE() is not checked for Py_INVALID_SIZE ((Py_ssize_t)-1). If an object is passed which forces the size to be invalid, the resulting Py_ssize_t error code is cast to size_t. The string sizes used in all subsequent operations then effectively span the entire addressable memory space. Exploiting this would likely be pretty tricky, but it would be easy to cause a crash if you can get an object with an invalid size into this function. I'm not entirely sure whether this is possible with string or unicode objects, but it seems likely.
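
    The effect of that cast can be seen from Python with `ctypes`, which converts to `size_t` the same way C does. This is only a sketch of the conversion on the machine running it; `to_size_t` is a hypothetical guard, not a function from the extension.

```python
import ctypes

invalid = -1  # the error value a signed size can carry

# ctypes performs no overflow checking, so this mirrors a C cast to size_t.
as_size_t = ctypes.c_size_t(invalid).value
bits = 8 * ctypes.sizeof(ctypes.c_size_t)
print(bits, hex(as_size_t))  # every bit set: the largest possible size


def to_size_t(n):
    """Refuse negative sizes instead of letting them wrap around."""
    if n < 0:
        raise ValueError("invalid (negative) size: %d" % n)
    return n
```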

    Here's an example: https://github.com/ztane/python-Levenshtein/blob/3a7412f38f1991c20d7a2765f30c2bb9cb1e63e0/Levenshtein/_levenshtein.c#L708

    opened by cris-spring 4
  • get_matching_blocks() fails on python3

    Hello. I use the fuzzywuzzy package, which relies on your repo, and I discovered that, when run on Python 3, the fuzzywuzzy tests produce output like this:

    ERROR: testWRatioUnicodeString (__main__.RatioTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_fuzzywuzzy.py", line 199, in testWRatioUnicodeString
        score = fuzz.WRatio(s1, s2, force_ascii=False)
      File "/home/crystal/github/fuzzywuzzy/fuzzywuzzy/fuzz.py", line 255, in WRatio
        partial = partial_ratio(p1, p2) * partial_scale
      File "/home/crystal/github/fuzzywuzzy/fuzzywuzzy/fuzz.py", line 77, in partial_ratio
        blocks = m.get_matching_blocks()
      File "/home/crystal/github/fuzzywuzzy/fuzzywuzzy/StringMatcher.py", line 57, in get_matching_blocks
        self._str1, self._str2)
    TypeError: inverse expected a list of edit operations
    

    StringMatcher.py is the same file as in your repo, and there seems to be a problem in the get_matching_blocks() function that occurs only on Python 3. I haven't written extensions, so I tried to find the root cause but failed :(

    Could you suggest a solution for this?

    opened by ojomio 4
  • Feature Request - Documentation on Release

    Would you mind providing the HTML docs that can be generated?

    It would be very helpful in deciding whether this is a library that I (or others) would like to work with.

    Thanks.

    opened by daryltucker 4
  • Crashes python interpreter in seq* with simple arguments

    Hello, I'm forwarding Debian bug #597609:

    $ gdb python
    GNU gdb (GDB) 7.6 (Debian 7.6-5)
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /usr/bin/python2.7...Reading symbols from /usr/lib/debug/usr/bin/python2.7...done.
    done.
    (gdb) run
    Starting program: /usr/bin/python2.7 
    warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
    warning: Could not load shared library symbols for linux-vdso.so.1.
    Do you need "set solib-search-path" or "set sysroot"?
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    Python 2.7.5+ (default, Sep 17 2013, 15:31:50) 
    [GCC 4.8.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import Levenshtein
    >>> Levenshtein.seqratio("hallo", "bla")
    
    Program received signal SIGSEGV, Segmentation fault.
    extract_stringlist (list='hallo', name=name@entry=0x7ffff64b2c00 "seqratio", n=n@entry=5, sizelist=sizelist@entry=0x7fffffffdd90, strlist=strlist@entry=0x7fffffffdd80) at Levenshtein.c:1166
    1166    Levenshtein.c: No such file or directory.
    (gdb) bt full
    #0  extract_stringlist (list='hallo', name=name@entry=0x7ffff64b2c00 "seqratio", n=n@entry=5, sizelist=sizelist@entry=0x7fffffffdd90, strlist=strlist@entry=0x7fffffffdd80)
        at Levenshtein.c:1166
            i = <optimized out>
            first = <unknown at remote 0x-2bd86a95779d78df>
    #1  0x00007ffff64ac59e in setseq_common (args=<optimized out>, name=name@entry=0x7ffff64b2c00 "seqratio", foo=..., lensum=lensum@entry=0x7fffffffddf8) at Levenshtein.c:1319
            n1 = 5
            n2 = 3
            strings1 = 0x0
            strings2 = 0x0
            sizes1 = 0x0
            sizes2 = 0x0
            strlist1 = 'hallo'
            strlist2 = 'bla'
            strseq1 = <optimized out>
            strseq2 = ['b', 'l', 'a']
            stringtype1 = <optimized out>
            stringtype2 = <optimized out>
            r = -1
    #2  0x00007ffff64b0676 in seqratio_py (self=<optimized out>, args=<optimized out>) at Levenshtein.c:1251
            lensum = 8
            r = <optimized out>
    #3  0x0000000000529e45 in call_function (oparg=<optimized out>, pp_stack=0x7fffffffdf00) at ../Python/ceval.c:4021
            flags = <optimized out>
            tstate = 0x9410a0
            func = <built-in function seqratio>
            w = <optimized out>
            na = <optimized out>
            nk = <optimized out>
            n = <optimized out>
            pfunc = 0xa2ed08
            x = <optimized out>
    #4  PyEval_EvalFrameEx (f=f@entry=Frame 0xa2eb90, for file <stdin>, line 1, in <module> (), throwflag=throwflag@entry=0) at ../Python/ceval.c:2666
            sp = 0xa2ed10
            stack_pointer = <optimized out>
            next_instr = 0x7ffff7ee5763 "Fd\002"
            opcode = <optimized out>
            oparg = <optimized out>
            why = WHY_NOT
            err = <optimized out>
            x = <optimized out>
            v = <optimized out>
            w = <optimized out>
            u = <optimized out>
    ---Type <return> to continue, or q <return> to quit---
            t = <optimized out>
            stream = 0x0
            fastlocals = 0xa2ed08
            freevars = <optimized out>
            retval = <optimized out>
            tstate = <optimized out>
            co = <optimized out>
            instr_ub = -1
            instr_lb = 0
            instr_prev = -1
            first_instr = <optimized out>
            names = <optimized out>
            consts = <optimized out>
            enter = '__enter__'
            exit = '__exit__'
    #5  0x00000000004c6544 in PyEval_EvalCodeEx (closure=0x0, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0, locals=<optimized out>, globals=<optimized out>, co=0x7ffff7f371b0)
        at ../Python/ceval.c:3253
            retval = 0x0
            fastlocals = 0xa2ed08
            freevars = 0xa2ed08
            u = <optimized out>
            f = Frame 0xa2eb90, for file <stdin>, line 1, in <module> ()
            tstate = 0x9410a0
            x = <optimized out>
    #6  PyEval_EvalCode (locals=<optimized out>, globals=<optimized out>, co=0x7ffff7f371b0) at ../Python/ceval.c:667
    No locals.
    #7  run_mod.42568 (mod=mod@entry=0xa2cce0, filename=filename@entry=0x5bb2b5 "<stdin>", globals=<optimized out>, locals=<optimized out>, flags=flags@entry=0x7fffffffe0c0, 
        arena=arena@entry=0x9ab2d0) at ../Python/pythonrun.c:1365
            co = 0x7ffff7f371b0
    #8  0x000000000043407e in PyRun_InteractiveOneFlags (fp=fp@entry=0x7ffff729d240 <_IO_2_1_stdin_>, filename=filename@entry=0x5bb2b5 "<stdin>", flags=flags@entry=0x7fffffffe0c0)
        at ../Python/pythonrun.c:852
            m = <optimized out>
            d = <optimized out>
            v = '>>> '
            w = '... '
            mod = 0xa2cce0
            arena = 0x9ab2d0
            ps1 = <optimized out>
            ps2 = 0x7ffff7eed3b4 "... "
            errcode = 0
    #9  0x000000000043419a in PyRun_InteractiveLoopFlags (fp=fp@entry=0x7ffff729d240 <_IO_2_1_stdin_>, filename=filename@entry=0x5bb2b5 "<stdin>", flags=flags@entry=0x7fffffffe0c0)
        at ../Python/pythonrun.c:772
            v = <optimized out>
    ---Type <return> to continue, or q <return> to quit---
            ret = <optimized out>
            local_flags = {cf_flags = 0}
    #10 0x000000000043484f in PyRun_AnyFileExFlags (fp=fp@entry=0x7ffff729d240 <_IO_2_1_stdin_>, filename=filename@entry=0x5bb2b5 "<stdin>", closeit=closeit@entry=0, 
        flags=flags@entry=0x7fffffffe0c0) at ../Python/pythonrun.c:741
            err = <optimized out>
    #11 0x00000000004353e3 in Py_Main (argc=<optimized out>, argv=0x7fffffffe278) at ../Modules/main.c:640
            c = <optimized out>
            sts = <optimized out>
            command = 0x0
            filename = 0x0
            module = 0x0
            fp = 0x7ffff729d240 <_IO_2_1_stdin_>
            p = <optimized out>
            unbuffered = <optimized out>
            skipfirstline = <optimized out>
            stdin_is_interactive = 1
            help = <optimized out>
            version = <optimized out>
            saw_unbuffered_flag = <optimized out>
            cf = {cf_flags = 0}
    #12 0x00007ffff6f17995 in __libc_start_main (main=0x4354a1 <main>, argc=1, ubp_av=0x7fffffffe278, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
        stack_end=0x7fffffffe268) at libc-start.c:260
            result = <optimized out>
            unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, -2830849066025673389, 5720549, 140737488347760, 0, 0, 2830849065964215635, 2830829193398394195}, mask_was_saved = 0}}, priv = {pad = {
                  0x0, 0x0, 0x5b8e60 <__libc_csu_init>, 0x7fffffffe278}, data = {prev = 0x0, cleanup = 0x0, canceltype = 6000224}}}
            not_first_call = <optimized out>
    #13 0x0000000000574a0e in _start ()
    No symbol table info available.
    (gdb) thread apply all backtrace
    
    Thread 1 (Thread 0x7ffff7fc1700 (LWP 8108)):
    #0  extract_stringlist (list='hallo', name=name@entry=0x7ffff64b2c00 "seqratio", n=n@entry=5, sizelist=sizelist@entry=0x7fffffffdd90, strlist=strlist@entry=0x7fffffffdd80)
        at Levenshtein.c:1166
    #1  0x00007ffff64ac59e in setseq_common (args=<optimized out>, name=name@entry=0x7ffff64b2c00 "seqratio", foo=..., lensum=lensum@entry=0x7fffffffddf8) at Levenshtein.c:1319
    #2  0x00007ffff64b0676 in seqratio_py (self=<optimized out>, args=<optimized out>) at Levenshtein.c:1251
    #3  0x0000000000529e45 in call_function (oparg=<optimized out>, pp_stack=0x7fffffffdf00) at ../Python/ceval.c:4021
    #4  PyEval_EvalFrameEx (f=f@entry=Frame 0xa2eb90, for file <stdin>, line 1, in <module> (), throwflag=throwflag@entry=0) at ../Python/ceval.c:2666
    #5  0x00000000004c6544 in PyEval_EvalCodeEx (closure=0x0, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0, locals=<optimized out>, globals=<optimized out>, co=0x7ffff7f371b0)
        at ../Python/ceval.c:3253
    #6  PyEval_EvalCode (locals=<optimized out>, globals=<optimized out>, co=0x7ffff7f371b0) at ../Python/ceval.c:667
    #7  run_mod.42568 (mod=mod@entry=0xa2cce0, filename=filename@entry=0x5bb2b5 "<stdin>", globals=<optimized out>, locals=<optimized out>, flags=flags@entry=0x7fffffffe0c0, 
        arena=arena@entry=0x9ab2d0) at ../Python/pythonrun.c:1365
    #8  0x000000000043407e in PyRun_InteractiveOneFlags (fp=fp@entry=0x7ffff729d240 <_IO_2_1_stdin_>, filename=filename@entry=0x5bb2b5 "<stdin>", flags=flags@entry=0x7fffffffe0c0)
        at ../Python/pythonrun.c:852
    #9  0x000000000043419a in PyRun_InteractiveLoopFlags (fp=fp@entry=0x7ffff729d240 <_IO_2_1_stdin_>, filename=filename@entry=0x5bb2b5 "<stdin>", flags=flags@entry=0x7fffffffe0c0)
        at ../Python/pythonrun.c:772
    #10 0x000000000043484f in PyRun_AnyFileExFlags (fp=fp@entry=0x7ffff729d240 <_IO_2_1_stdin_>, filename=filename@entry=0x5bb2b5 "<stdin>", closeit=closeit@entry=0, 
        flags=flags@entry=0x7fffffffe0c0) at ../Python/pythonrun.c:741
    #11 0x00000000004353e3 in Py_Main (argc=<optimized out>, argv=0x7fffffffe278) at ../Modules/main.c:640
    #12 0x00007ffff6f17995 in __libc_start_main (main=0x4354a1 <main>, argc=1, ubp_av=0x7fffffffe278, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
        stack_end=0x7fffffffe268) at libc-start.c:260
    #13 0x0000000000574a0e in _start ()
    (gdb) quit
    A debugging session is active.
    
            Inferior 1 [process 8108] will be killed.
    
    Quit anyway? (y or n) y
    
    opened by sandrotosi 3
  • Expectations from a maintainer?

    I read the notice about needing a maintainer for this project. Is this still something you are looking for? I frequently use this package at work and would love to see how I could help out.

    Maintainer wanted

    I am looking for a new maintainer to the project as it is apparent that I haven't had the need for this particular library for well over 7 years now, due to it being a C-only library and its somewhat restrictive original license.

    opened by creatorrr 2
  • editops missing replace operations in some cases

    Sample strings:

    TCTTTGGAGCACAAAACCAGTTGAAACATCAAATTCGTTTGATGTACTGAAGTCAGAGGACGCGCAGGGA
    TCTTTGGAGCACAAAACCAGTTGAAACATCATTATTCCTTCGTTTGATGTACTGAAGTCAGAGGACGCGCAGGGA

    In this case TTATT was inserted and AA was replaced with CC, but editops only returns one replace of C to A.

    Expected output: [('insert', 31, 31), ('insert', 31, 32), ('insert', 32, 34), ('insert', 32, 35), ('insert', 32, 36), ('replace', 32, 37), ('replace', 33, 38)]

    Actual output: [('insert', 31, 31), ('insert', 31, 32), ('insert', 32, 34), ('insert', 32, 35), ('insert', 32, 36), ('replace', 32, 37)]
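
    One way to check an edit script like the ones above is to apply it to the first string and compare the result with the second. The helper below is a hypothetical pure-Python stand-in, assuming each triple means (operation, position in source, position in destination) as in the outputs quoted above; it is not the module's own code.

```python
def apply_editops(ops, source, dest):
    """Apply (op, source_pos, dest_pos) triples to source, left to right."""
    out = []
    i = 0  # index of the next unconsumed character in source
    for op, spos, dpos in ops:
        out.append(source[i:spos])  # copy the untouched stretch
        i = spos
        if op == "insert":
            out.append(dest[dpos])
        elif op == "replace":
            out.append(dest[dpos])
            i += 1                  # the replaced character is consumed
        elif op == "delete":
            i += 1                  # skip the deleted character
        else:
            raise ValueError("unknown edit operation: %r" % (op,))
    out.append(source[i:])
    return "".join(out)


s1 = "TCTTTGGAGCACAAAACCAGTTGAAACATCAAATTCGTTTGATGTACTGAAGTCAGAGGACGCGCAGGGA"
s2 = "TCTTTGGAGCACAAAACCAGTTGAAACATCATTATTCCTTCGTTTGATGTACTGAAGTCAGAGGACGCGCAGGGA"
reported = [("insert", 31, 31), ("insert", 31, 32), ("insert", 32, 34),
            ("insert", 32, 35), ("insert", 32, 36), ("replace", 32, 37)]

print(apply_editops(reported, s1, s2) == s2)
```

    If the comparison holds, the reported six operations already form a complete script, and the question becomes whether a seventh operation is really needed.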

    opened by danylofitel 2
  • Added pip install instructions

    I added some pip install instructions to the README. While the github repo name and the pip package name match, that can't always be assumed. This saves the step of searching PyPI for the right package.

    opened by timworx 2
  • segfault error when using seqratio

    I don't think this is a bug; it is more of an enhancement / improvement.

    In [1]: import Levenshtein
    In [2]: Levenshtein.seqratio('ab', ['cd', 'ra', 'ab', 'abs'])
    Segmentation fault (core dumped)

    Note that the first parameter is a string, so this is not the appropriate usage. However, it feels like it should raise a TypeError at the Python level.
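
    A Python-level check of the kind suggested here could look like the sketch below. `require_string_sequence` is a hypothetical helper written for this illustration; it is not part of the Levenshtein module and says nothing about how the C code validates its arguments.

```python
def require_string_sequence(arg, func_name):
    """Return arg as a list of strings, or raise TypeError.

    A bare str or bytes is rejected even though it is technically a
    sequence, because passing one is almost always a mistake.
    """
    if isinstance(arg, (str, bytes)):
        raise TypeError(
            "%s expected a sequence of strings, got a single string" % func_name)
    items = list(arg)
    for item in items:
        if not isinstance(item, (str, bytes)):
            raise TypeError(
                "%s expected strings, got %s" % (func_name, type(item).__name__))
    return items


print(require_string_sequence(["cd", "ra", "ab", "abs"], "seqratio"))
try:
    require_string_sequence("ab", "seqratio")
except TypeError as exc:
    print("TypeError:", exc)
```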

    opened by cpersico 2
  • Operation weights not configurable

    Hello.

    Levenshtein distance generally allows configurable operation weights. I'd really like to see that here, because using it as an approximate substring matcher doesn't work when the insert operation is weighted the same as the other operations. It's hard to tweak search results when the weights cannot be changed.
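
    What is being asked for can be sketched in pure Python as an edit distance with one cost per operation. The function name and its keyword parameters are made up for this illustration; the module itself does not expose such weights, which is the point of the request.

```python
def weighted_distance(a, b, insert=1, delete=1, replace=1):
    """Minimum total cost of turning a into b with per-operation costs."""
    # prev[j] is the cost of turning the processed prefix of a into b[:j].
    prev = [j * insert for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * delete]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + delete,
                cur[j - 1] + insert,
                prev[j - 1] + (0 if ca == cb else replace),
            ))
        prev = cur
    return prev[-1]


# With equal weights this is the ordinary edit distance.
print(weighted_distance("kitten", "sitting"))          # 3
# With free insertions, a query found inside a longer text costs nothing.
print(weighted_distance("abc", "xxabcxx", insert=0))   # 0
```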

    opened by KthProg 1
  • Installation error: AttributeError: module 'plugin.setuptools' has no attribute 'load_plux_entrypoints'

    Python 3.9.10, pip 22.1, setuptools 62.3.2

    I get this error when trying to install with pip:

    AttributeError: module 'plugin.setuptools' has no attribute 'load_plux_entrypoints'
    

    Does this work under python 3.9?

    opened by awhillas 0
  • wheel for python310

    If you would like people to use this on Windows, please provide wheels for a recent Python:

    building 'Levenshtein._levenshtein' extension
          error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
          [end of output]
    

    And then I need 9.5 GB of Visual Studio.

    opened by fenchu 1
  • Incompatible architecture (apple silicon)

    Running into an error with this package when running a project on apple silicon (M1)

    Using Python specified in "runtime": python3.7
      File "/Users/name/.local/share/virtualenvs/library1CORs3LD/lib/python3.7/site-packages/Levenshtein/__init__.py", line 1, in <module>
        from Levenshtein import _levenshtein
    ImportError: dlopen(/Users/name/.local/share/virtualenvs/library-1CORs3LD/lib/python3.7/site-packages/Levenshtein/_levenshtein.cpython-37m-darwin.so, 0x0002): tried: '/Users/name/.local/share/virtualenvs/library1CORs3LD/lib/python3.7/site-packages/Levenshtein/_levenshtein.cpython-37m-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e')), '/usr/local/lib/_levenshtein.cpython-37m-darwin.so' (no such file), '/usr/lib/_levenshtein.cpython-37m-darwin.so' (no such file)
    

    (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'

    Not sure of a solution here

    opened by jer-tx 1
  • installation fails on macOS 12.1

    python 3.10

    the error -

    102 warnings generated.
    clang -bundle -undefined dynamic_lookup -arch arm64 -arch x86_64 -g -L/usr/local/opt/icu4c/lib set -gx CPPFLAGS -I/usr/local/opt/icu4c/include build/temp.macosx-10.9-universal2-3.10/Levenshtein/_levenshtein.o -o build/lib.macosx-10.9-universal2-3.10/Levenshtein/_levenshtein.cpython-310-darwin.so
    clang: error: unknown argument: '-gx'
    clang: error: no such file or directory: 'set'
    clang: error: no such file or directory: 'CPPFLAGS'
    error: command '/usr/bin/clang' failed with exit code 1
    [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: legacy-install-failure

    × Encountered error while trying to install package.
    ╰─> python-Levenshtein

    note: This is an issue with the package mentioned above, not pip.
    hint: See above for output from the failure.

    opened by advik-student-dev 1
  • installation fails on macOS 11.6

    Installing this via pip gives:

        Running setup.py install for python-Levenshtein ... error
        ERROR: Command errored out with exit status 1:
         command: /Users/david/eebo-revo/env/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/xw/ttr6x0x97j328t5rh8bmky9h0000gn/T/pip-install-63nolnkf/python-levenshtein_b71ed4700818417a946a26877422615d/setup.py'"'"'; __file__='"'"'/private/var/folders/xw/ttr6x0x97j328t5rh8bmky9h0000gn/T/pip-install-63nolnkf/python-levenshtein_b71ed4700818417a946a26877422615d/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/xw/ttr6x0x97j328t5rh8bmky9h0000gn/T/pip-record-wkm2rfxg/install-record.txt --single-version-externally-managed --compile --install-headers /Users/david/eebo-revo/env/include/site/python3.9/python-Levenshtein
             cwd: /private/var/folders/xw/ttr6x0x97j328t5rh8bmky9h0000gn/T/pip-install-63nolnkf/python-levenshtein_b71ed4700818417a946a26877422615d/
        Complete output (32 lines):
        running install
        running build
        running build_py
        creating build
        creating build/lib.macosx-11-x86_64-3.9
        creating build/lib.macosx-11-x86_64-3.9/Levenshtein
        copying Levenshtein/StringMatcher.py -> build/lib.macosx-11-x86_64-3.9/Levenshtein
        copying Levenshtein/__init__.py -> build/lib.macosx-11-x86_64-3.9/Levenshtein
        running egg_info
        writing python_Levenshtein.egg-info/PKG-INFO
        writing dependency_links to python_Levenshtein.egg-info/dependency_links.txt
        writing entry points to python_Levenshtein.egg-info/entry_points.txt
        writing namespace_packages to python_Levenshtein.egg-info/namespace_packages.txt
        writing requirements to python_Levenshtein.egg-info/requires.txt
        writing top-level names to python_Levenshtein.egg-info/top_level.txt
        adding license file 'COPYING' (matched pattern 'COPYING*')
        reading manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
        reading manifest template 'MANIFEST.in'
        warning: no previously-included files matching '*pyc' found anywhere in distribution
        warning: no previously-included files matching '*so' found anywhere in distribution
        warning: no previously-included files matching '.project' found anywhere in distribution
        warning: no previously-included files matching '.pydevproject' found anywhere in distribution
        writing manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
        copying Levenshtein/_levenshtein.c -> build/lib.macosx-11-x86_64-3.9/Levenshtein
        copying Levenshtein/_levenshtein.h -> build/lib.macosx-11-x86_64-3.9/Levenshtein
        running build_ext
        building 'Levenshtein._levenshtein' extension
        creating build/temp.macosx-11-x86_64-3.9
        creating build/temp.macosx-11-x86_64-3.9/Levenshtein
        clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/opt/libffi/include -I/usr/local/include -I/usr/local/opt/[email protected]/include -I/usr/local/opt/sqlite/include -I/Users/david/eebo-revo/env/include -I/usr/local/opt/[email protected]/Frameworks/Python.framework/Versions/3.9/include/python3.9 -c Levenshtein/_levenshtein.c -o build/temp.macosx-11-x86_64-3.9/Levenshtein/_levenshtein.o
        clang: error: invalid version number in 'MACOSX_DEPLOYMENT_TARGET=11'
        error: command '/usr/bin/clang' failed with exit code 1
        ----------------------------------------
    ERROR: Command errored out with exit status 1: /Users/david/eebo-revo/env/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/xw/ttr6x0x97j328t5rh8bmky9h0000gn/T/pip-install-63nolnkf/python-levenshtein_b71ed4700818417a946a26877422615d/setup.py'"'"'; __file__='"'"'/private/var/folders/xw/ttr6x0x97j328t5rh8bmky9h0000gn/T/pip-install-63nolnkf/python-levenshtein_b71ed4700818417a946a26877422615d/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/xw/ttr6x0x97j328t5rh8bmky9h0000gn/T/pip-record-wkm2rfxg/install-record.txt --single-version-externally-managed --compile --install-headers /Users/david/eebo-revo/env/include/site/python3.9/python-Levenshtein Check the logs for full command output.
    WARNING: You are using pip version 21.1.1; however, version 21.3.1 is available.
    You should consider upgrading via the '/Users/david/eebo-revo/env/bin/python3 -m pip install --upgrade pip' command.
    
    opened by hughjonesd 1
Owner
Antti Haapala