RapidFuzz is a fast string matching library for Python and C++

Max Bachmann

Last update: Jan 4, 2023

Overview

Rapid fuzzy string matching in Python and C++ using the Levenshtein Distance

Description

RapidFuzz is a fast string matching library for Python and C++ that uses the string similarity calculations from FuzzyWuzzy. However, two aspects set RapidFuzz apart from FuzzyWuzzy:

It is MIT licensed, so it can be used with whichever license you choose for your project, whereas using FuzzyWuzzy forces you to adopt the GPL license.
It is mostly written in C++ and, on top of this, comes with many algorithmic improvements that make string matching even faster while still providing the same results. More details on these performance improvements, in the form of benchmarks, can be found here.

Requirements

Python 3.5 or later
On Windows the Visual C++ 2019 redistributable is required

Installation

There are several ways to install RapidFuzz. The recommended methods are pip (the Python package manager) or conda (an open-source, cross-platform package manager).

with pip

RapidFuzz can be installed with pip the following way:

pip install rapidfuzz

There are pre-built binaries (wheels) of RapidFuzz for macOS (10.9 and later), Linux x86_64 and Windows. Wheels for armv6l (Raspberry Pi Zero) and armv7l (Raspberry Pi) are available on piwheels.

✖️ failure "ImportError: DLL load failed"

If you run into this error on Windows, the most likely reason is that the Visual C++ 2019 redistributable is not installed. It is required to find the C++ libraries (the 2019 redistributable covers the 2015, 2017 and 2019 versions).

with conda

RapidFuzz can be installed with conda:

conda install -c conda-forge rapidfuzz

from git

RapidFuzz can be installed directly from the source distribution by cloning the repository. This requires a C++14 capable compiler.

git clone https://github.com/maxbachmann/rapidfuzz.git
cd rapidfuzz
pip install .

Usage

Some simple functions are shown below. Complete documentation of all functions can be found here.

Scorers

Scorers in RapidFuzz can be found in the modules fuzz and string_metric.

Simple Ratio

> fuzz.ratio("this is a test", "this is a test!")
96.55171966552734
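
The score above is consistent with a normalized insertion/deletion similarity, 100 * 2 * LCS / (len(a) + len(b)), where LCS is the length of the longest common subsequence. The following is a minimal pure-Python sketch under that assumption; the names lcs_length and indel_ratio are hypothetical and are not part of the RapidFuzz API, and the real implementation is written in C++ and far faster.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, computed row by row."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ended before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one character from either string.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Hypothetical helper: similarity in [0, 100] based on the LCS length."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * 2 * lcs_length(a, b) / total


# 14 of the 29 characters in total are matched on each side: about 96.55
print(indel_ratio("this is a test", "this is a test!"))
```

The small difference in the trailing digits compared to the value printed above comes from floating-point precision, not from the formula.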

Partial Ratio

> fuzz.partial_ratio("this is a test", "this is a test!")
100.0

Token Sort Ratio

> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
90.90908813476562
> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100.0
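
The idea behind a token sort comparison can be sketched in a few lines: split both strings on whitespace, sort the tokens, join them again and compare the results. The sketch below uses difflib.SequenceMatcher from the standard library as a stand-in scorer, so it is an illustration of the technique and not RapidFuzz's implementation; stand_in_ratio and token_sort_ratio_sketch are hypothetical names.

```python
from difflib import SequenceMatcher


def stand_in_ratio(a: str, b: str) -> float:
    """Stand-in scorer (assumption): difflib similarity scaled to 0..100."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def token_sort_ratio_sketch(a: str, b: str) -> float:
    """Compare the strings after sorting their whitespace-separated tokens."""
    sorted_a = " ".join(sorted(a.split()))
    sorted_b = " ".join(sorted(b.split()))
    return stand_in_ratio(sorted_a, sorted_b)


s1 = "fuzzy wuzzy was a bear"
s2 = "wuzzy fuzzy was a bear"
print(stand_in_ratio(s1, s2))           # below 100: the word order differs
print(token_sort_ratio_sketch(s1, s2))  # 100.0: same tokens once sorted
```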

Token Set Ratio

> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
83.8709716796875
> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100.0
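
A token set comparison ignores repeated tokens. A common way to do this, and the one FuzzyWuzzy is generally described as using, is to split both strings into token sets, build strings from the shared tokens and from the shared tokens plus each side's remainder, and take the best pairwise score. The sketch below assumes that approach and again uses difflib as a stand-in scorer; all function names are hypothetical.

```python
from difflib import SequenceMatcher


def stand_in_ratio(a: str, b: str) -> float:
    """Stand-in scorer (assumption): difflib similarity scaled to 0..100."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def token_set_ratio_sketch(a: str, b: str) -> float:
    """Best score among the shared tokens and each side's combined tokens."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    common = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))
    # strip() removes the separator when one of the parts is empty.
    combined_a = f"{common} {only_a}".strip()
    combined_b = f"{common} {only_b}".strip()
    return max(
        stand_in_ratio(common, combined_a),
        stand_in_ratio(common, combined_b),
        stand_in_ratio(combined_a, combined_b),
    )


# The duplicated "fuzzy" disappears in the set, so both sides are identical.
print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100.0
```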

Process

The process module makes it possible to compare a string against a list of strings. This is generally more performant than calling the scorers directly from Python. Here are some examples of how to use the processors in RapidFuzz:

> from rapidfuzz import process, fuzz
> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
> process.extract("new york jets", choices, scorer=fuzz.WRatio, limit=2)
[('New York Jets', 100, 1), ('New York Giants', 78.57142639160156, 2)]
> process.extractOne("cowboys", choices, scorer=fuzz.WRatio)
("Dallas Cowboys", 90, 3)

The full documentation of processors can be found here
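
To make the shape of these results concrete, here is a minimal pure-Python sketch of what an extractOne-style helper does: score every choice and return the best one as a (choice, score, index) tuple. It uses a case-insensitive difflib score as a stand-in for fuzz.WRatio, so the scores will differ from the ones shown above; extract_one_sketch and simple_score are hypothetical names, not RapidFuzz functions.

```python
from difflib import SequenceMatcher


def simple_score(a: str, b: str) -> float:
    """Stand-in scorer (assumption): case-insensitive difflib similarity, 0..100."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def extract_one_sketch(query, choices, scorer=simple_score):
    """Return (choice, score, index) for the best-scoring choice, or None."""
    best = None
    for index, choice in enumerate(choices):
        score = scorer(query, choice)
        # Keep the first choice that reaches the highest score.
        if best is None or score > best[1]:
            best = (choice, score, index)
    return best


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract_one_sketch("new york jets", choices))  # ('New York Jets', 100.0, 1)
print(extract_one_sketch("cowboys", choices))
```

The third tuple element is the position of the match in choices, which is why the examples above report 1 for 'New York Jets' and 3 for "Dallas Cowboys".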

Benchmark

The following benchmark gives a quick performance comparison between RapidFuzz and FuzzyWuzzy. More detailed benchmarks for the string metrics can be found in the documentation. For this simple comparison I generated a list of 10,000 strings of length 10, which is compared to a sample of 100 elements from this list:

import random
import string

words = [
  ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(10))
  for _ in range(10_000)
]
samples = words[::len(words) // 100]

The first benchmark compares the performance of the scorers in FuzzyWuzzy and RapidFuzz when they are used directly from Python in the following way:

for sample in samples:
  for word in words:
    scorer(sample, word)

The following graph shows how many elements are processed per second with each of the scorers. There are big performance differences between the different scorers. However, each of the scorers is faster in RapidFuzz.
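
A throughput figure of this kind can be measured with a small timing harness. The sketch below is an assumption about how such a measurement might be set up, not the script used for the published graphs: it times the nested loop with time.perf_counter and uses a difflib-based stand-in scorer and a much smaller word list so that it runs quickly without any third-party package. To benchmark a real scorer, pass it in place of stand_in_scorer.

```python
import random
import string
import time
from difflib import SequenceMatcher


def stand_in_scorer(a: str, b: str) -> float:
    """Stand-in scorer (assumption): difflib similarity scaled to 0..100."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def comparisons_per_second(scorer, samples, words) -> float:
    """Time the nested loop and return the number of scorer calls per second."""
    start = time.perf_counter()
    for sample in samples:
        for word in words:
            scorer(sample, word)
    elapsed = time.perf_counter() - start
    return len(samples) * len(words) / elapsed


random.seed(0)  # reproducible test data
alphabet = string.ascii_letters + string.digits
# Smaller than the 10,000 strings described above, to keep the run short.
words = ["".join(random.choices(alphabet, k=10)) for _ in range(500)]
samples = words[:: len(words) // 10]

rate = comparisons_per_second(stand_in_scorer, samples, words)
print(f"{rate:,.0f} comparisons per second")
```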

The second benchmark compares the performance when the scorers are used in combination with extractOne in the following way:

for sample in samples:
  extractOne(sample, words, scorer=scorer)

The following graph shows how many elements are processed per second with each of the scorers. In RapidFuzz, using scorers through processors like extractOne is a lot faster than calling them directly. That's why processors should be used whenever possible.

License

RapidFuzz is licensed under the MIT license since I believe that everyone should be able to use it without being forced to adopt the GPL license. That's why the library is based on an older version of fuzzywuzzy that was MIT licensed as well. This old version of fuzzywuzzy can be found here.

Comments

Read the Docs Setup

Creating a pull request for the Read the Docs setup and starting the documentation. This PR addresses issue #17.

For more extensive documentation of the functions, we should add detailed docstrings to each of them. The main file can then be changed to pull those docstrings from the code.

opened by TrigonaMinima 18

2.13.3: test suite is failing because it cannot find `tests.common`

I'm packaging your module as an rpm package, so I'm using the typical PEP517-based build, install and test cycle used when building packages from a non-root account.

python3 -sBm build -w --no-isolation
Because build is called with --no-isolation, only locally installed modules are used throughout.
Install the .whl file in </install/prefix>.
Run pytest with PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>.

Here is the pytest output:

+ PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-rapidfuzz-2.13.3-2.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-rapidfuzz-2.13.3-2.fc35.x86_64/usr/lib/python3.8/site-packages
+ /usr/bin/pytest -ra --import-mode=importlib
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.8.15, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3, configfile: pyproject.toml, testpaths: tests
plugins: hypothesis-6.58.2
collected 75 items / 11 errors

================================================================================== ERRORS ==================================================================================
___________________________________________________________________ ERROR collecting tests/test_fuzz.py ____________________________________________________________________
ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/test_fuzz.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/test_fuzz.py:7: in <module>
    from .common import symmetric_scorer_tester, scorer_tester
E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
________________________________________________________ ERROR collecting tests/distance/test_DamerauLevenshtein.py ________________________________________________________
ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_DamerauLevenshtein.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/distance/test_DamerauLevenshtein.py:4: in <module>
    from ..common import GenericScorer
E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
_____________________________________________________________ ERROR collecting tests/distance/test_Hamming.py ______________________________________________________________
ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_Hamming.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/distance/test_Hamming.py:2: in <module>
    from ..common import GenericScorer
E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
______________________________________________________________ ERROR collecting tests/distance/test_Indel.py _______________________________________________________________
ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_Indel.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/distance/test_Indel.py:2: in <module>
    from ..common import GenericScorer
E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
_______________________________________________________________ ERROR collecting tests/distance/test_Jaro.py _______________________________________________________________
ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_Jaro.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/distance/test_Jaro.py:4: in <module>
    from ..common import GenericScorer
E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
___________________________________________________________ ERROR collecting tests/distance/test_JaroWinkler.py ____________________________________________________________
ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_JaroWinkler.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/distance/test_JaroWinkler.py:4: in <module>
    from ..common import GenericScorer
E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
______________________________________________________________ ERROR collecting tests/distance/test_LCSseq.py ______________________________________________________________
ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_LCSseq.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/distance/test_LCSseq.py:2: in <module>
    from ..common import GenericScorer
E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
___________________________________________________________ ERROR collecting tests/distance/test_Levenshtein.py ____________________________________________________________
ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_Levenshtein.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/distance/test_Levenshtein.py:3: in <module>
    from ..common import GenericScorer
E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
_______________________________________________________________ ERROR collecting tests/distance/test_OSA.py ________________________________________________________________
ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_OSA.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/distance/test_OSA.py:2: in <module>
    from ..common import GenericScorer
E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
_____________________________________________________________ ERROR collecting tests/distance/test_Postfix.py ______________________________________________________________
ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_Postfix.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/distance/test_Postfix.py:2: in <module>
    from ..common import GenericScorer
E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
______________________________________________________________ ERROR collecting tests/distance/test_Prefix.py ______________________________________________________________
ImportError while importing test module '/home/tkloczko/rpmbuild/BUILD/rapidfuzz-2.13.3/tests/distance/test_Prefix.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/distance/test_Prefix.py:2: in <module>
    from ..common import GenericScorer
E   ModuleNotFoundError: No module named 'tests.common'; 'tests' is not a package
========================================================================= short test summary info ==========================================================================
ERROR tests/test_fuzz.py
ERROR tests/distance/test_DamerauLevenshtein.py
ERROR tests/distance/test_Hamming.py
ERROR tests/distance/test_Indel.py
ERROR tests/distance/test_Jaro.py
ERROR tests/distance/test_JaroWinkler.py
ERROR tests/distance/test_LCSseq.py
ERROR tests/distance/test_Levenshtein.py
ERROR tests/distance/test_OSA.py
ERROR tests/distance/test_Postfix.py
ERROR tests/distance/test_Prefix.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 11 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================================================ 11 errors in 1.11s ============================================================================

Here is the list of modules installed in the build environment:

Package                       Version
----------------------------- -----------------
alabaster                     0.7.12
appdirs                       1.4.4
attrs                         22.1.0
Babel                         2.11.0
Brlapi                        0.8.3
build                         0.9.0
charset-normalizer            3.0.1
contourpy                     1.0.6
cssselect                     1.1.0
cycler                        0.11.0
Cython                        0.29.32
distro                        1.8.0
dnspython                     2.2.1
docutils                      0.19
exceptiongroup                1.0.0
extras                        1.0.0
fixtures                      4.0.0
fonttools                     4.38.0
gpg                           1.17.1-unknown
hypothesis                    6.58.2
idna                          3.4
imagesize                     1.4.1
importlib-metadata            5.1.0
iniconfig                     1.1.1
Jinja2                        3.1.2
kiwisolver                    1.4.4
latexcodec                    2.0.1
libcomps                      0.1.19
louis                         3.23.0
lxml                          4.9.1
MarkupSafe                    2.1.1
matplotlib                    3.6.2
numpy                         1.23.1
olefile                       0.46
packaging                     21.3
pbr                           5.9.0
pep517                        0.13.0
Pillow                        9.3.0
pip                           22.3.1
pluggy                        1.0.0
pybtex                        0.24.0
pybtex-docutils               1.0.2
Pygments                      2.13.0
PyGObject                     3.42.2
pyparsing                     3.0.9
pytest                        7.2.0
python-dateutil               2.8.2
pytz                          2022.4
PyYAML                        6.0
requests                      2.28.1
rpm                           4.17.0
scikit-build                  0.16.2
scour                         0.38.2
setuptools                    65.6.3
six                           1.16.0
snowballstemmer               2.2.0
sortedcontainers              2.4.0
Sphinx                        5.3.0
sphinxcontrib-applehelp       1.0.2.dev20221204
sphinxcontrib-bibtex          2.5.0
sphinxcontrib-devhelp         1.0.2.dev20221204
sphinxcontrib-htmlhelp        2.0.0
sphinxcontrib-jsmath          1.0.1.dev20221204
sphinxcontrib-qthelp          1.0.3.dev20221204
sphinxcontrib-serializinghtml 1.1.5
testtools                     2.5.0
tomli                         2.0.1
urllib3                       1.26.12
wheel                         0.38.4
zipp                          3.11.0

opened by kloczek 16

Fuzzy search by prefix?

Good day. Which approach or scorer would give better results when matching in the same order (by prefix)?

For example, if I search for the value "38050" in a list that contains "358", the result is ('358', 72.0, 8) because 3, 5 and 8 are present in 38050. That match is of no interest to me, since 3, 5 and 8 appear in a different order. A match for me would be a choice such as 380XX, which shares a prefix with 38050.

This issue is related to a question I asked yesterday, where it was suggested that I use another scorer or try different ones.

https://stackoverflow.com/questions/74093719/how-to-make-fuzzy-search-between-lists-showing-matches-and-not-found-elements

Thanks in advance.
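
The kind of prefix-based ranking described above can be sketched as follows. This is an illustration only, under the assumption that similarity is defined as the length of the shared prefix divided by the length of the longer string; prefix_similarity is a hypothetical helper, and the choices "38011" and "38050" are made-up values standing in for the 380XX pattern.

```python
from os.path import commonprefix


def prefix_similarity(a: str, b: str) -> float:
    """Hypothetical scorer: shared prefix length over the longer length, 0..100."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    # commonprefix compares character by character, not path components.
    return 100.0 * len(commonprefix([a, b])) / longest


query = "38050"
choices = ["358", "38011", "38050"]  # illustrative values only
ranked = sorted(choices, key=lambda c: prefix_similarity(query, c), reverse=True)
print(ranked)  # ['38050', '38011', '358']
```

With this definition "358" scores 20.0 against "38050" because only the leading "3" is shared, while "38011" scores 60.0.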
question

opened by RasecMalkic 13

Could you please provide compatibility with Cython 0.29.x?

I've been trying to package RapidFuzz for Gentoo. Unfortunately, we're nowhere near ready to switch to Cython 3.x, so the requirement on an alpha version of Cython makes it impossible for us to package it. Could you please consider providing compatibility with the current release versions of Cython?

enhancement

opened by mgorny 13

Issues packaging with cx_freeze

I'm currently trying to package an application that uses rapidfuzz using cx_freeze. The packaging is successful but when I try to run the application I get the following error.

implementation requires numpy to be installed

I'm using process.cdist, and I understand that numpy is required for the matrix output. However, I also import numpy without issue at the beginning of the module that calls rapidfuzz, so the dependency must be there.

Basically, I would like to understand whether there is a way to test which numpy submodules are required but failing to import.

Below is my setup.py

import sys
from cx_Freeze import setup, Executable
from setuptools import find_packages

options = {
    "build_exe": {
        "zip_include_packages": ["*"],
        "zip_exclude_packages": [],
        "build_exe": "dist\\",
        "includes": [
            "numpy",
            "numpy.int16",
            "numpy.int64",
            "_pytest._argcomplete",
            "_pytest._code.code",
            "_pytest._code.source",
            "_pytest._io.saferepr",
            "_pytest._io.terminalwriter",
            "_pytest._io.wcwidth",
            "_pytest._version",
            "_pytest.assertion.rewrite",
            "_pytest.assertion.truncate",
            "_pytest.assertion.util",
            "_pytest.cacheprovider",
            "_pytest.capture",
            "_pytest.compat",
            "_pytest.config.argparsing",
            "_pytest.config.compat",
            "_pytest.config.exceptions",
            "_pytest.config.findpaths",
            "_pytest.debugging",
            "_pytest.deprecated",
            "_pytest.doctest",
            "_pytest.faulthandler",
            "_pytest.fixtures",
            "_pytest.freeze_support",
            "_pytest.helpconfig",
            "_pytest.hookspec",
            "_pytest.junitxml",
            "_pytest.legacypath",
            "_pytest.logging",
            "_pytest.main",
            "_pytest.mark.expression",
            "_pytest.mark.structures",
            "_pytest.monkeypatch",
            "_pytest.nodes",
            "_pytest.nose",
            "_pytest.outcomes",
            "_pytest.pastebin",
            "_pytest.pathlib",
            "_pytest.pytester",
            "_pytest.pytester_assertions",
            "_pytest.python",
            "_pytest.python_api",
            "_pytest.python_path",
            "_pytest.recwarn",
            "_pytest.reports",
            "_pytest.runner",
            "_pytest.scope",
            "_pytest.setuponly",
            "_pytest.setupplan",
            "_pytest.skipping",
            "_pytest.stash",
            "_pytest.stepwise",
            "_pytest.terminal",
            "_pytest.threadexception",
            "_pytest.timing",
            "_pytest.tmpdir",
            "_pytest.unittest",
            "_pytest.unraisableexception",
            "_pytest.warning_types",
            "_pytest.warnings",
            "py._builtin",
            "py._path.local",
            "py._io.capture",
            "py._io.saferepr",
            "py._io.terminalwriter",
            "py._xmlgen",
            "py._error",
            "py._std",
            # builtin files imported by pytest using py.std implicit mechanism
            "argparse",
            "shlex",
            "warnings",
            "types",
            "rapidfuzz.utils_cpp",
            "rapidfuzz.utils_py",
            "rapidfuzz.process_py",
            "rapidfuzz.fuzz_py",
            "rapidfuzz.distance.Hamming_py",
            "rapidfuzz.process_cpp",
            "rapidfuzz.fuzz_cpp",
            "rapidfuzz.distance.Levenshtein_cpp",
            "rapidfuzz.distance.Levenshtein_py",
            "rapidfuzz.string_metric_cpp",
            "rapidfuzz.string_metric_py",
            "jinja2.ext",
            "jinja2",
        ],
        "include_files": ["tests/"],
    }
}


with open("README.md", "r") as f:
    LONG_DESCRIPTION = f.read()

setup(
    name="centralized_integrations",
    version="0.1",
    description="xxx",
    long_description=LONG_DESCRIPTION,
    long_description_content_type="text/markdown",
    author="xxx",
    author_email="xxx",
    url="xxx",
    license="BSD 3-Clause License",
    packages=find_packages(exclude=["ez_setup"]),
    options=options,
    include_package_data=True,
    executables=[Executable("cli/main.py", base=None)],
    entry_points="""
        [console_scripts]
        cli = cli.main:main
    """,
)

bug

opened by NaaN108 12

2.0.1 fails to build: Expected ':', found 'class'

Error compiling Cython file:
------------------------------------------------------------
...
)

from array import array

cdef extern from "rapidfuzz/details/types.hpp" namespace "rapidfuzz" nogil:
    cpdef enum class EditType:
              ^
------------------------------------------------------------

/disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_common.pxd:23:15: Expected ':', found 'class'

Error compiling Cython file:
------------------------------------------------------------
...
# distutils: language=c++
# cython: language_level=3, binding=True, linetrace=True

from rapidfuzz_capi cimport RF_String, PREPROCESSOR_STRUCT_VERSION, RF_Preprocessor
from cpp_common cimport (
^
------------------------------------------------------------

/disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_utils.pyx:5:0: 'cpp_common/is_valid_string.pxd' not found

Error compiling Cython file:
------------------------------------------------------------
...
# distutils: language=c++
# cython: language_level=3, binding=True, linetrace=True

from rapidfuzz_capi cimport RF_String, PREPROCESSOR_STRUCT_VERSION, RF_Preprocessor
from cpp_common cimport (
^
------------------------------------------------------------

/disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_utils.pyx:5:0: 'cpp_common/convert_string.pxd' not found

Error compiling Cython file:
------------------------------------------------------------
...
# distutils: language=c++
# cython: language_level=3, binding=True, linetrace=True

from rapidfuzz_capi cimport RF_String, PREPROCESSOR_STRUCT_VERSION, RF_Preprocessor
from cpp_common cimport (
^
------------------------------------------------------------

/disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_utils.pyx:5:0: 'cpp_common/hash_array.pxd' not found

Error compiling Cython file:
------------------------------------------------------------
...
# distutils: language=c++
# cython: language_level=3, binding=True, linetrace=True

from rapidfuzz_capi cimport RF_String, PREPROCESSOR_STRUCT_VERSION, RF_Preprocessor
from cpp_common cimport (
^
------------------------------------------------------------

/disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_utils.pyx:5:0: 'cpp_common/hash_sequence.pxd' not found

Error compiling Cython file:
------------------------------------------------------------
...
# distutils: language=c++
# cython: language_level=3, binding=True, linetrace=True

from rapidfuzz_capi cimport RF_String, PREPROCESSOR_STRUCT_VERSION, RF_Preprocessor
from cpp_common cimport (
^
------------------------------------------------------------

/disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_utils.pyx:5:0: 'cpp_common/conv_sequence.pxd' not found

Error compiling Cython file:
------------------------------------------------------------
...
    validate_string(sentence, "sentence must be a String")
    return default_process_impl(sentence)


cdef bool default_process_capi(sentence, RF_String* str_) except False:
    proc_str = conv_sequence(sentence)
              ^
------------------------------------------------------------

/disk-samsung/freebsd-ports/devel/py-rapidfuzz/work-py38/rapidfuzz-2.0.1/src/cython/cpp_utils.pyx:44:15: 'conv_sequence' is not a constant, variable or function identifier

Error compiling Cython file:
------------------------------------------------------------

Python-3.8 cython-0.29.26 FreeBSD 13

bug

opened by yurivict 12

Inconsistent behavior between process.extractOne and fuzz.ratio

I am using rapidfuzz to measure the similarity of Devanagari words. Here's a reproducible example.

from rapidfuzz import fuzz, process

word = 'मस्सा'
wordlist = {'शर्ट', 'वर्ट', 'वार्ट', 'वऑर्ट', 'वॉर्ट'}

process.extractOne(word, wordlist)
# gives: ('वार्ट', 100.0)

for i in wordlist: print(fuzz.ratio(i, word))
# gives
# 22.22222222222222
# 22.22222222222222
# 19.999999999999996
# 19.999999999999996
# 19.999999999999996

In the output of process.extractOne, the best score should be 22.22 instead of 100.0.

Environment

Python: 3.6.10 Rapidfuzz: 0.7.8

opened by TrigonaMinima 12

Asian languages usage

Hello,

Not an issue per se, but I was curious about possible compatibility with Asian languages. Chinese, Korean, and Japanese all perform very poorly with Levenshtein distance matching because they're not alphabet-based.

A solution would be to use a romanisation library to translate them to alphabet characters, in the same way that all characters are converted to lowercase.

I'd love to give a hand, but unfortunately my C++ is pretty rusty and I'd do more harm than good.

enhancement

opened by T1-Tolki 12

pure Python mode fails to import

This was reported here: https://github.com/python-poetry/poetry/issues/6078 It should be relatively simple to add these missing functions to the pure Python version.

@pekkarr how does the AUR ship packages? If it ships binaries, those appear to be broken; otherwise it would not even attempt to use the pure Python fallback version. You might want to set the environment variable RAPIDFUZZ_BUILD_EXTENSION while packaging, so that a build error does not silently result in packaging the pure Python version.

bug

opened by maxbachmann 10

Poetry update is failing for some reason with the latest

i3@sleipnir:~/Documents/Projects/libretrofuzz$ poetry update -vvv
Using virtualenv: /home/i3/.cache/pypoetry/virtualenvs/libretrofuzz-VjwDmzHD-py3.8
Updating dependencies
Resolving dependencies...
   1: fact: libretrofuzz is 2.4.1
   1: derived: libretrofuzz
   1: fact: libretrofuzz depends on beautifulsoup4 (^4.10.0)
   1: fact: libretrofuzz depends on questionary (^1.10.0)
   1: fact: libretrofuzz depends on typer (^0.5.0)
   1: fact: libretrofuzz depends on rapidfuzz (^2.0.8)
   1: fact: libretrofuzz depends on httpx (^0.23.0)
   1: fact: libretrofuzz depends on tqdm (^4.64.0)
   1: fact: libretrofuzz depends on prompt_toolkit (^3.0.30)
   1: fact: libretrofuzz depends on pillow (^7.0.0)
   1: fact: libretrofuzz depends on pytest (^5.2)
   1: fact: libretrofuzz depends on pytest (^5.2)
   1: selecting libretrofuzz (2.4.1)
   1: derived: pytest (>=5.2,<6.0)
   1: derived: pillow (>=7.0.0,<8.0.0)
   1: derived: prompt_toolkit (>=3.0.30,<4.0.0)
   1: derived: tqdm (>=4.64.0,<5.0.0)
   1: derived: httpx (>=0.23.0,<0.24.0)
   1: derived: rapidfuzz (>=2.0.8,<3.0.0)
   1: derived: typer[all] (>=0.5.0,<0.6.0)
   1: derived: questionary (>=1.10.0,<2.0.0)
   1: derived: beautifulsoup4 (>=4.10.0,<5.0.0)
PyPI: 15 packages found for pytest >=5.2,<6.0
   1: fact: pytest (5.4.3) depends on py (>=1.5.0)
   1: fact: pytest (5.4.3) depends on packaging (*)
   1: fact: pytest (5.4.3) depends on attrs (>=17.4.0)
   1: fact: pytest (5.4.3) depends on more-itertools (>=4.0.0)
   1: fact: pytest (5.4.3) depends on pluggy (>=0.12,<1.0)
   1: fact: pytest (5.4.3) depends on wcwidth (*)
   1: fact: pytest (5.4.3) depends on atomicwrites (>=1.0)
   1: fact: pytest (5.4.3) depends on colorama (*)
   1: selecting pytest (5.4.3)
   1: derived: colorama
   1: derived: atomicwrites (>=1.0)
   1: derived: wcwidth
   1: derived: pluggy (>=0.12,<1.0)
   1: derived: more-itertools (>=4.0.0)
   1: derived: attrs (>=17.4.0)
   1: derived: packaging
   1: derived: py (>=1.5.0)
PyPI: 5 packages found for pillow >=7.0.0,<8.0.0
   1: selecting pillow (7.2.0)
PyPI: 1 packages found for prompt-toolkit >=3.0.30,<4.0.0
   1: fact: prompt-toolkit (3.0.30) depends on wcwidth (*)
   1: selecting prompt-toolkit (3.0.30)
PyPI: No release information found for tqdm-2.0.0.dev0, skipping
PyPI: 1 packages found for tqdm >=4.64.0,<5.0.0
   1: fact: tqdm (4.64.0) depends on colorama (*)
   1: selecting tqdm (4.64.0)
PyPI: No release information found for httpx-0.0.1, skipping
PyPI: 1 packages found for httpx >=0.23.0,<0.24.0
   1: fact: httpx (0.23.0) depends on certifi (*)
   1: fact: httpx (0.23.0) depends on sniffio (*)
   1: fact: httpx (0.23.0) depends on rfc3986 (>=1.3,<2)
   1: fact: httpx (0.23.0) depends on httpcore (>=0.15.0,<0.16.0)
   1: selecting httpx (0.23.0)
   1: derived: httpcore (>=0.15.0,<0.16.0)
   1: derived: rfc3986[idna2008] (>=1.3,<2)
   1: derived: sniffio
   1: derived: certifi
PyPI: 15 packages found for rapidfuzz >=2.0.8,<3.0.0
   1: fact: rapidfuzz (2.3.0) depends on jarowinkler (>=1.2.0,<2.0.0)
   1: selecting rapidfuzz (2.3.0)
   1: derived: jarowinkler (>=1.2.0,<2.0.0)
PyPI: 1 packages found for typer >=0.5.0,<0.6.0
   1: fact: typer (0.5.0) depends on typer (0.5.0)
   1: fact: typer (0.5.0) depends on rich (>=10.11.0,<13.0.0)
   1: fact: typer (0.5.0) depends on shellingham (>=1.3.0,<2.0.0)
   1: fact: typer (0.5.0) depends on colorama (>=0.4.3,<0.5.0)
   1: fact: typer (0.5.0) depends on click (>=7.1.1,<9.0.0)
   1: selecting typer[all] (0.5.0)
   1: derived: click (>=7.1.1,<9.0.0)
   1: derived: colorama (>=0.4.3,<0.5.0)
   1: derived: shellingham (>=1.3.0,<2.0.0)
   1: derived: rich (>=10.11.0,<13.0.0)
   1: derived: typer (==0.5.0)
PyPI: 1 packages found for questionary >=1.10.0,<2.0.0
   1: fact: questionary (1.10.0) depends on prompt_toolkit (>=2.0,<4.0)
   1: selecting questionary (1.10.0)
PyPI: 3 packages found for beautifulsoup4 >=4.10.0,<5.0.0
   1: fact: beautifulsoup4 (4.11.1) depends on soupsieve (>1.2)
   1: selecting beautifulsoup4 (4.11.1)
   1: derived: soupsieve (>1.2)
PyPI: 17 packages found for wcwidth *
   1: selecting wcwidth (0.2.5)
PyPI: 3 packages found for pluggy >=0.12,<1.0
   1: selecting pluggy (0.13.1)
PyPI: 26 packages found for more-itertools >=4.0.0
   1: selecting more-itertools (8.13.0)
PyPI: 13 packages found for attrs >=17.4.0
   1: selecting attrs (21.4.0)
PyPI: 39 packages found for packaging *
   1: fact: packaging (21.3) depends on pyparsing (>=2.0.2,<3.0.5 || >3.0.5)
   1: selecting packaging (21.3)
   1: derived: pyparsing (>=2.0.2,!=3.0.5)
PyPI: No release information found for py-0.8.0-alpha2, skipping
PyPI: No release information found for py-0.9.0, skipping
PyPI: No release information found for py-1.4.32.dev1, skipping
PyPI: 12 packages found for py >=1.5.0
   1: selecting py (1.11.0)
PyPI: 1 packages found for httpcore >=0.15.0,<0.16.0
   1: fact: httpcore (0.15.0) depends on h11 (>=0.11,<0.13)
   1: fact: httpcore (0.15.0) depends on sniffio (>=1.0.0,<2.0.0)
   1: fact: httpcore (0.15.0) depends on anyio (>=3.0.0,<4.0.0)
   1: fact: httpcore (0.15.0) depends on certifi (*)
   1: selecting httpcore (0.15.0)
   1: derived: anyio (>=3.0.0,<4.0.0)
   1: derived: sniffio (>=1.0.0,<2.0.0)
   1: derived: h11 (>=0.11,<0.13)
PyPI: No release information found for rfc3986-0.0.0, skipping
PyPI: 5 packages found for rfc3986 >=1.3,<2
   1: fact: rfc3986 (1.5.0) depends on rfc3986 (1.5.0)
   1: fact: rfc3986 (1.5.0) depends on idna (*)
   1: selecting rfc3986[idna2008] (1.5.0)
   1: derived: idna
   1: derived: rfc3986 (==1.5.0)
PyPI: 3 packages found for sniffio >=1.0.0,<2.0.0
   1: selecting sniffio (1.2.0)
PyPI: No release information found for certifi-0, skipping
PyPI: 48 packages found for certifi *
   1: selecting certifi (2022.6.15)
PyPI: 1 packages found for jarowinkler >=1.2.0,<2.0.0
   1: selecting jarowinkler (1.2.0)
PyPI: 11 packages found for click >=7.1.1,<9.0.0
   1: fact: click (8.1.3) depends on colorama (*)
   1: selecting click (8.1.3)
PyPI: 29 packages found for soupsieve >1.2
   1: selecting soupsieve (2.3.2.post1)
PyPI: No release information found for pyparsing-1.1.2, skipping
PyPI: No release information found for pyparsing-1.2, skipping
PyPI: No release information found for pyparsing-1.3.3, skipping
PyPI: 39 packages found for pyparsing >=2.0.2,<3.0.5 || >3.0.5
   1: selecting pyparsing (3.0.9)
PyPI: 14 packages found for anyio >=3.0.0,<4.0.0
   1: fact: anyio (3.6.1) depends on idna (>=2.8)
   1: fact: anyio (3.6.1) depends on sniffio (>=1.1)
   1: selecting anyio (3.6.1)
   1: derived: idna (>=2.8)
PyPI: No release information found for h11-0.0.1, skipping
PyPI: 2 packages found for h11 >=0.11,<0.13
   1: selecting h11 (0.12.0)
PyPI: No release information found for rfc3986-0.0.0, skipping
PyPI: 1 packages found for rfc3986 1.5.0
   1: selecting rfc3986 (1.5.0)
PyPI: 3 packages found for colorama >=0.4.3,<0.5.0
   1: selecting colorama (0.4.5)
PyPI: 8 packages found for atomicwrites >=1.0
   1: selecting atomicwrites (1.4.1)
PyPI: 4 packages found for shellingham >=1.3.0,<2.0.0
   1: selecting shellingham (1.4.0)
PyPI: 25 packages found for rich >=10.11.0,<13.0.0
   1: fact: rich (12.5.1) depends on typing-extensions (>=4.0.0,<5.0)
   1: fact: rich (12.5.1) depends on pygments (>=2.6.0,<3.0.0)
   1: fact: rich (12.5.1) depends on commonmark (>=0.9.0,<0.10.0)
   1: selecting rich (12.5.1)
   1: derived: commonmark (>=0.9.0,<0.10.0)
   1: derived: pygments (>=2.6.0,<3.0.0)
   1: derived: typing-extensions (>=4.0.0,<5.0)
PyPI: 2 packages found for commonmark >=0.9.0,<0.10.0
   1: selecting commonmark (0.9.1)
PyPI: 15 packages found for pygments >=2.6.0,<3.0.0
   1: selecting pygments (2.12.0)
PyPI: 1 packages found for typer 0.5.0
   1: fact: typer (0.5.0) depends on click (>=7.1.1,<9.0.0)
   1: selecting typer (0.5.0)
PyPI: No release information found for idna-0.1, skipping
PyPI: 7 packages found for idna >=2.8
   1: selecting idna (3.3)
PyPI: 6 packages found for typing-extensions >=4.0.0,<5.0
   1: selecting typing-extensions (4.3.0)
   1: Version solving took 0.249 seconds.
   1: Tried 1 solutions.

Writing lock file

Finding the necessary packages for the current system

Package operations: 1 install, 0 updates, 0 removals

  • Installing rapidfuzz (2.3.0): Pending...
  • Installing rapidfuzz (2.3.0): Failed

  RuntimeError

  Unable to find installation candidates for rapidfuzz (2.3.0)

  at ~/.local/lib/python3.8/site-packages/poetry/installation/chooser.py:72 in choose_for
       68│ 
       69│             links.append(link)
       70│ 
       71│         if not links:
    →  72│             raise RuntimeError(
       73│                 "Unable to find installation candidates for {}".format(package)
       74│             )
       75│ 
       76│         # Get the best link

i3@sleipnir:~/Documents/Projects/libretrofuzz$

If I remove the ^ from rapidfuzz = "^2.0.8", it suddenly starts working, so it seems that one of the edits to pyproject.toml (or something similar) broke the build on PyPI.
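
For context, Poetry's caret operator allows updates that leave the left-most non-zero version component unchanged, which is why `^2.0.8` appears in the solver log above as `>=2.0.8,<3.0.0`. The following is a minimal sketch of that expansion. `expand_caret` is a hypothetical helper written for illustration; it is not part of Poetry and only handles plain numeric versions.

```python
def expand_caret(constraint: str) -> str:
    """Expand a caret constraint such as '^2.0.8' into an explicit range.

    Hypothetical helper for illustration only; pre-release tags and
    other suffixes are not supported.
    """
    if not constraint.startswith("^"):
        raise ValueError("not a caret constraint: " + constraint)

    parts = [int(p) for p in constraint[1:].split(".")]
    lower = ".".join(str(p) for p in parts)

    # Find the left-most non-zero component; if every component is zero,
    # the last given component is the one that gets bumped.
    index = len(parts) - 1
    for i, value in enumerate(parts):
        if value != 0:
            index = i
            break

    # Bump that component and reset everything to its right to zero.
    upper = list(parts)
    upper[index] += 1
    for j in range(index + 1, len(upper)):
        upper[j] = 0

    return ">={},<{}".format(lower, ".".join(str(p) for p in upper))


print(expand_caret("^2.0.8"))
```

Without the caret, the requirement `2.0.8` pins that exact version, so the resolver no longer selects 2.3.0.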

opened by i30817 9

Can't pip install using PyPy on Windows 10

Error output below:

× Building wheel for rapidfuzz (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [1852 lines of output]
      Not searching for unused variables given on the command line.
      -- The C compiler identification is MSVC 19.32.31332.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.32.31326/bin/Hostx86/x64/cl.exe - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- The CXX compiler identification is MSVC 19.32.31332.0
      CMake Warning (dev) at C:/Users/Kaman/AppData/Local/Temp/pip-build-env-dk8raibg/overlay/Lib/site-packages/cmake/data/share/cmake-3.22/Modules/CMakeDetermineCXXCompiler.cmake:162 (if):
        Policy CMP0054 is not set: Only interpret if() arguments as variables or
        keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
        details.  Use the cmake_policy command to set the policy and suppress this
        warning.

        Quoted variables like "MSVC" will no longer be dereferenced when the policy
        is set to NEW.  Since the policy is not set the OLD behavior will be used.
      Call Stack (most recent call first):
        CMakeLists.txt:4 (ENABLE_LANGUAGE)
      This warning is for project developers.  Use -Wno-dev to suppress it.

      CMake Warning (dev) at C:/Users/Kaman/AppData/Local/Temp/pip-build-env-dk8raibg/overlay/Lib/site-packages/cmake/data/share/cmake-3.22/Modules/CMakeDetermineCXXCompiler.cmake:183 (elseif):
        Policy CMP0054 is not set: Only interpret if() arguments as variables or
        keywords when unquoted.  Run "cmake --help-policy CMP0054" for policy
        details.  Use the cmake_policy command to set the policy and suppress this
        warning.

        Quoted variables like "MSVC" will no longer be dereferenced when the policy
        is set to NEW.  Since the policy is not set the OLD behavior will be used.
      Call Stack (most recent call first):
        CMakeLists.txt:4 (ENABLE_LANGUAGE)
      This warning is for project developers.  Use -Wno-dev to suppress it.

      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.32.31326/bin/Hostx86/x64/cl.exe - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Configuring done
      -- Generating done
      -- Build files have been written to: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/_cmake_test_compile/build
      -- The C compiler identification is MSVC 19.32.31332.0
      -- The CXX compiler identification is MSVC 19.32.31332.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.32.31326/bin/Hostx86/x64/cl.exe - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.32.31326/bin/Hostx86/x64/cl.exe - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found PythonInterp: C:/pypy/pypy.exe (found version "3.9.10")
      -- Could NOT find PythonLibs (missing: PYTHON_LIBRARIES) (found version "3.9.10")
      -- Found Python: C:/pypy/pypy.exe (found version "3.9.10") found components: Interpreter Development Development.Module Development.Embed
      Using packaged version of Taskflow
      -- CMAKE_ROOT: C:/Users/Kaman/AppData/Local/Temp/pip-build-env-dk8raibg/overlay/Lib/site-packages/cmake/data/share/cmake-3.22
      -- Looking for a CUDA compiler
      -- Looking for a CUDA compiler - NOTFOUND
      -- CMAKE_HOST_SYSTEM: Windows-10.0.19044
      -- CMAKE_BUILD_TYPE: Release
      -- CMAKE_CXX_COMPILER: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.32.31326/bin/Hostx86/x64/cl.exe
      -- CMAKE_CXX_COMPILER_ID: MSVC
      -- CMAKE_CXX_COMPILER_VERSION: 19.32.31332.0
      -- CMAKE_CXX_FLAGS: /DWIN32 /D_WINDOWS /W3 /GR /EHsc
      -- CMAKE_CUDA_COMPILER: NOTFOUND
      -- CMAKE_CUDA_COMPILER_ID:
      -- CMAKE_CUDA_COMPILER_VERSION:
      -- CMAKE_CUDA_FLAGS:
      -- CMAKE_MODULE_PATH: C:/Users/Kaman/AppData/Local/Temp/pip-build-env-dk8raibg/overlay/Lib/site-packages/skbuild/resources/cmake
      -- CMAKE_CURRENT_SOURCE_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/extern/taskflow
      -- CMAKE_CURRENT_BINARY_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/_skbuild/win-amd64-3.9/cmake-build/extern/taskflow
      -- CMAKE_EXE_LINKER_FLAGS: /machine:x64
      -- CMAKE_INSTALL_PREFIX: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/_skbuild/win-amd64-3.9/cmake-install
      -- CMAKE_MODULE_PATH: C:/Users/Kaman/AppData/Local/Temp/pip-build-env-dk8raibg/overlay/Lib/site-packages/skbuild/resources/cmake
      -- CMAKE_PREFIX_PATH:
      -- PROJECT_NAME: Taskflow
      -- TF_BUILD_BENCHMARKS: OFF
      -- TF_BUILD_CUDA: OFF
      -- TF_BUILD_TESTS: OFF
      -- TF_BUILD_EXAMPLES: OFF
      -- TF_INC_INSTALL_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/_skbuild/win-amd64-3.9/cmake-install/include
      -- TF_LIB_INSTALL_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/_skbuild/win-amd64-3.9/cmake-install/lib
      -- TF_UTEST_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/extern/taskflow/unittests
      -- TF_EXAMPLE_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/extern/taskflow/examples
      -- TF_BENCHMARK_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/extern/taskflow/benchmarks
      -- TF_3RD_PARTY_DIR: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/extern/taskflow/3rd-party
      -- Looking for pthread.h
      -- Looking for pthread.h - not found
      -- Found Threads: TRUE
      Using packaged version of rapidfuzz-cpp
      Using packaged version of jaro_winkler
      -- Performing Test Weak Link MODULE -> SHARED (gnu_ld_ignore) - Failed
      -- Performing Test Weak Link MODULE -> SHARED (osx_dynamic_lookup) - Failed
      -- Performing Test Weak Link MODULE -> SHARED (no_flag) - Failed
      _modinit_prefix:PyInit_
      _modinit_prefix:PyInit_
      _modinit_prefix:PyInit_
      _modinit_prefix:PyInit_
      _modinit_prefix:PyInit_
      _modinit_prefix:PyInit_
      _modinit_prefix:PyInit_
      _modinit_prefix:PyInit_
      _modinit_prefix:PyInit_
      _modinit_prefix:PyInit_
      -- Configuring done
      -- Generating done
      CMake Warning:
        Manually-specified variables were not used by the project:

          PYTHON_NumPy_INCLUDE_DIRS
          Python3_EXECUTABLE
          Python3_INCLUDE_DIR
          Python3_LIBRARY
          Python3_NumPy_INCLUDE_DIRS
          Python_NumPy_INCLUDE_DIRS
          SKBUILD


      -- Build files have been written to: C:/Users/Kaman/AppData/Local/Temp/pip-install-97kw0l1e/rapidfuzz_1509194bbaaf44eeb188dd098187530c/_skbuild/win-amd64-3.9/cmake-build
      [1/22] Building CXX object rapidfuzz\CMakeFiles\cpp_utils.dir\utils.cpp.obj
      cl : Command line warning D9025 : overriding '/W3' with '/W4'
      [2/22] Building CXX object rapidfuzz\CMakeFiles\cpp_utils.dir\cpp_utils.cxx.obj
      cl : Command line warning D9025 : overriding '/W3' with '/W4'
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(2519): warning C4100: '__pyx_self': unreferenced formal parameter
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(3387): warning C4127: conditional expression is constant
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(3399): warning C4127: conditional expression is constant
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(3767): warning C4127: conditional expression is constant
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(3779): warning C4127: conditional expression is constant
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(4869): warning C4127: conditional expression is constant
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(4911): warning C4127: conditional expression is constant
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(6373): warning C4100: 'boundscheck': unreferenced formal parameter
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(6372): warning C4100: 'wraparound': unreferenced formal parameter
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(6391): warning C4100: 'boundscheck': unreferenced formal parameter
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(6390): warning C4100: 'wraparound': unreferenced formal parameter
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(6409): warning C4100: 'boundscheck': unreferenced formal parameter
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(6408): warning C4100: 'wraparound': unreferenced formal parameter
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(8350): warning C4100: 'tstate': unreferenced formal parameter
      C:\pypy\Include\pypy_decl.h(24): warning C4505: 'PySlice_GetIndicesEx': unreferenced local function has been removed
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(3760) : warning C4702: unreachable code
      C:\Users\Kaman\AppData\Local\Temp\pip-install-97kw0l1e\rapidfuzz_1509194bbaaf44eeb188dd098187530c\rapidfuzz\cpp_utils.cxx(3380) : warning C4702: unreachable code
      [3/22] Linking CXX shared module rapidfuzz\cpp_utils.pypy39-pp73-win_amd64.pyd
      FAILED: rapidfuzz/cpp_utils.pypy39-pp73-win_amd64.pyd
      cmd.exe /C "cd . && C:\Users\Kaman\AppData\Local\Temp\pip-build-env-dk8raibg\overlay\Lib\site-packages\cmake\data\bin\cmake.exe -E vs_link_dll --intdir=rapidfuzz\CMakeFiles\cpp_utils.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x86\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x86\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2022\BUILDT~1\VC\Tools\MSVC\1432~1.313\bin\Hostx86\x64\link.exe /nologo rapidfuzz\CMakeFiles\cpp_utils.dir\cpp_utils.cxx.obj rapidfuzz\CMakeFiles\cpp_utils.dir\utils.cpp.obj  /out:rapidfuzz\cpp_utils.pypy39-pp73-win_amd64.pyd /implib:rapidfuzz\cpp_utils.lib /pdb:rapidfuzz\cpp_utils.pdb /dll /version:0.0 /machine:x64 /INCREMENTAL:NO /EXPORT:PyInit_cpp_utils  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib  && cd ."
      LINK: command "C:\PROGRA~2\MICROS~2\2022\BUILDT~1\VC\Tools\MSVC\1432~1.313\bin\Hostx86\x64\link.exe /nologo rapidfuzz\CMakeFiles\cpp_utils.dir\cpp_utils.cxx.obj rapidfuzz\CMakeFiles\cpp_utils.dir\utils.cpp.obj /out:rapidfuzz\cpp_utils.pypy39-pp73-win_amd64.pyd /implib:rapidfuzz\cpp_utils.lib /pdb:rapidfuzz\cpp_utils.pdb /dll /version:0.0 /machine:x64 /INCREMENTAL:NO /EXPORT:PyInit_cpp_utils kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST /MANIFESTFILE:rapidfuzz\cpp_utils.pypy39-pp73-win_amd64.pyd.manifest" failed (exit code 1104) with the following output:
      LINK : fatal error LNK1104: cannot open file 'python39.lib'
      [4/22] Building CXX object rapidfuzz\distance\CMakeFiles\_initialize.dir\_initialize.cxx.obj
      cl : Command line warning D9025 : overriding '/W3' with '/W4'

bug

opened by TingTingin 9

Add BK Tree implementation

It would make sense to add a BK Tree implementation for scorers that fulfill the triangle inequality. This would provide massive performance improvements for things like searches.
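
To illustrate the idea, here is a minimal BK-tree sketch in pure Python. It is not a proposed RapidFuzz API: the `BKTree` class and the plain `levenshtein` function are assumptions made for this example, and any scorer that is a true metric could be passed in instead. Each child is stored under its distance to the parent, so the triangle inequality lets a search skip every subtree whose edge label lies outside `[d - max_dist, d + max_dist]`.

```python
def levenshtein(a, b):
    """Plain dynamic-programming Levenshtein distance (a metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class BKTree:
    """Illustrative BK-tree; each node is a (word, {distance: child}) pair."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_dist):
        """Return sorted (distance, word) pairs with distance <= max_dist."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: only these subtrees can contain matches.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


words = ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]
tree = BKTree(levenshtein)
for w in words:
    tree.add(w)
print(tree.search("bok", 1))
```

The search visits only part of the tree, which is where the speedup over scoring every choice would come from; how large it is depends on the data and on `max_dist`.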

https://dl.acm.org/doi/10.1145/362003.362025
enhancement performance

opened by maxbachmann 2

BUG: `None` doesn't work with `process.cdist`

Passing None to process.cdist fails, although fuzz.ratio accepts None.

>>> from rapidfuzz import process
>>> process.cdist(
...     ["hello", "world"],
...     ["hi", None],
... )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Software\miniforge3\envs\dtoolkit\lib\site-packages\rapidfuzz\process_cpp.py", line 73, in cdist
    _cdist(
  File "src/rapidfuzz/process_cpp_impl.pyx", line 1508, in rapidfuzz.process_cpp_impl.cdist
  File "src/rapidfuzz/process_cpp_impl.pyx", line 1393, in rapidfuzz.process_cpp_impl.cdist_two_lists
  File "src/rapidfuzz/process_cpp_impl.pyx", line 1321, in rapidfuzz.process_cpp_impl.preprocess
  File "./src/rapidfuzz/cpp_common.pxd", line 332, in cpp_common.conv_sequence
  File "./src/rapidfuzz/cpp_common.pxd", line 300, in cpp_common.hash_sequence
TypeError: object of type 'NoneType' has no len()
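
One possible workaround is a small wrapper that assigns a fixed score to any pair containing `None` instead of handing it to the scorer. The sketch below is self-contained and makes several assumptions: `cdist_skip_none` and `stand_in_ratio` are hypothetical names, `stand_in_ratio` uses `difflib` only as a placeholder for a real scorer such as `fuzz.ratio`, and scoring missing values as 0 is a guess at the desired behavior. It returns nested lists rather than a NumPy array.

```python
from difflib import SequenceMatcher


def stand_in_ratio(s1, s2):
    """Placeholder scorer on a 0-100 scale (not RapidFuzz's fuzz.ratio)."""
    return 100.0 * SequenceMatcher(None, s1, s2).ratio()


def cdist_skip_none(queries, choices, scorer=stand_in_ratio, missing_score=0.0):
    """Pairwise score matrix that never passes None to the scorer."""
    return [
        [
            missing_score if q is None or c is None else scorer(q, c)
            for c in choices
        ]
        for q in queries
    ]


matrix = cdist_skip_none(["hello", "world"], ["hi", None])
print(matrix)
```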

enhancement

opened by Zeroto521 4

add SIMD support to more functions in the process module

SIMD support is still missing for:

[ ] process.extractOne

[ ] process.extract

[ ] process.cdist when both sequences are similar.

[ ] process.extract_iter

performance

opened by maxbachmann 0

add SIMD support for long sequences

For sequences longer than 64 characters, it would still be possible to calculate the similarity of multiple sequences in parallel using SIMD. However, for very long sequences it might be faster to compare the sequences individually, especially when a score_cutoff is specified.

performance

opened by maxbachmann 0

Releases(v2.13.7)

v2.13.7(Dec 20, 2022)
Fixed

fix function signature of get_requires_for_build_wheel

v2.13.6(Dec 11, 2022)
Changed

reformat changelog as reStructuredText to get rid of the m2r2 dependency

v2.13.5(Dec 10, 2022)
Added

added docs to sdist

Fixed

fix two cases of undefined behavior in process.cdist

v2.13.4(Dec 8, 2022)
Changed

handle float("nan") similar to None for query / choice, since this is common for non-existent data in tools like numpy

Fixed

fix handling of None/float("nan") in process.distance

use absolute imports inside tests

v2.13.3(Dec 3, 2022)
Fixed

improve handling of functions wrapped using functools.wraps

fix broken fallback to the Python implementation when an ImportError occurs on import. This can occur, for example, when the binary depends on libatomic but it is unavailable on the system

define CMAKE_C_COMPILER_AR/CMAKE_CXX_COMPILER_AR/CMAKE_C_COMPILER_RANLIB/CMAKE_CXX_COMPILER_RANLIB if they are not defined yet
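
The ImportError fallback mentioned in this release follows a common pattern: try to import the compiled extension and use a pure Python implementation if the import fails. The sketch below only illustrates that pattern and is not RapidFuzz's actual import logic; `_fast_hamming_ext` is a hypothetical module name that is not expected to exist.

```python
def _hamming_py(s1, s2):
    """Pure Python fallback: count the positions at which s1 and s2 differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s1, s2))


try:
    # Hypothetical compiled extension; the import is expected to fail here.
    from _fast_hamming_ext import hamming
    IMPLEMENTATION = "native"
except ImportError:
    # ModuleNotFoundError is a subclass of ImportError, so a missing module
    # and a module whose shared-library dependencies cannot be loaded are
    # both handled by this branch.
    hamming = _hamming_py
    IMPLEMENTATION = "python"

print(IMPLEMENTATION, hamming("karolin", "kathrin"))
```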

v2.13.2(Nov 5, 2022)
Fixed

fix incorrect results in Hamming.normalized_similarity

fix incorrect score_cutoff handling in pure python implementation of Postfix.normalized_distance and Prefix.normalized_distance

fix Levenshtein.normalized_similarity and Levenshtein.normalized_distance when used in combination with the process module

fuzz.partial_ratio was not always symmetric when len(s1) == len(s2)

v2.13.1(Nov 3, 2022)
Fixed

fix bug in normalized_similarity of most scorers, leading to incorrect results when used in combination with the process module

fix sse2 support

fix bug in JaroWinkler and Jaro when used in the pure python process module

forward kwargs in pure Python implementation of process.extract

v2.13.0(Oct 29, 2022)
Fixed

fix bug in Levenshtein.editops leading to crashes when used with score_hint

Changed

moved capi from rapidfuzz_capi into rapidfuzz, since the installation always succeeds now that there is a pure Python mode

add score_hint argument to process module

add score_hint argument to Levenshtein module

v2.12.0(Oct 24, 2022)
Changed

drop support for Python 3.6

Added

added Prefix/Suffix similarity

Fixed

fixed packaging with pyinstaller

v2.11.1(Oct 3, 2022)
Fixed

Fix segmentation fault in process.cdist when used with an empty query sequence

v2.11.0(Oct 2, 2022)
Changed

move jarowinkler dependency into rapidfuzz to simplify maintenance

Performance

add SIMD implementation for fuzz.ratio/fuzz.QRatio/Levenshtein/Indel/LCSseq/OSA to improve performance for short strings in cdist

v2.10.3(Sep 30, 2022)
Fixed

use scikit-build=0.14.1 on Linux, since scikit-build=0.15.0 fails to find the Python Interpreter

work around gcc bug in template type deduction

v2.10.2(Sep 27, 2022)
Fixed

fix support for cmake versions below 3.17

v2.10.1(Sep 25, 2022)
Changed

modernize cmake build to fix most conda-forge builds

v2.10.0(Sep 18, 2022)
Added

add editops to hamming distance

Performance

strip common affix in osa distance
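
Stripping the common affix works because a shared prefix or suffix does not contribute to an edit distance, so it can be dropped before the more expensive computation starts. A minimal sketch follows; `strip_common_affix` is a hypothetical helper for illustration and not the library's C++ implementation.

```python
def strip_common_affix(s1, s2):
    """Remove the shared prefix and suffix of two sequences."""
    # Length of the common prefix.
    prefix = 0
    limit = min(len(s1), len(s2))
    while prefix < limit and s1[prefix] == s2[prefix]:
        prefix += 1
    s1, s2 = s1[prefix:], s2[prefix:]

    # Length of the common suffix of what remains.
    suffix = 0
    limit = min(len(s1), len(s2))
    while suffix < limit and s1[-1 - suffix] == s2[-1 - suffix]:
        suffix += 1
    if suffix:
        s1, s2 = s1[:-suffix], s2[:-suffix]

    return s1, s2


print(strip_common_affix("prefix_abc_suffix", "prefix_xyz_suffix"))
```

Only the differing middle parts then need to be passed to the distance function.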

Fixed

ignore missing pandas in Python3.11 tests

v2.9.0(Sep 16, 2022)
Added

add optimal string alignment (OSA)

v2.8.0(Sep 11, 2022)
Fixed

fuzz.partial_ratio did not find the optimal alignment in some edge cases (#219)

Performance

improve performance of fuzz.partial_ratio

Changed

increased minimum C++ version to C++17 (see #255)

v2.7.0(Sep 11, 2022)
Performance

improve performance of Levenshtein.distance/Levenshtein.editops for long sequences.

Added

add score_hint parameter to Levenshtein.editops which allows the use of a faster implementation

Changed

all functions in the string_metric module now raise a deprecation warning. They are now only wrappers around their replacement functions, which makes them slower when used with the process module

v2.6.1(Sep 4, 2022)
Fixed

fix incorrect results of partial_ratio for long needles (#257)

v2.6.0(Aug 20, 2022)
Fixed

fix hashing for custom classes

Added

add support for slicing in Editops.__getitem__/Editops.__delitem__

add DamerauLevenshtein module

v2.5.0(Aug 14, 2022)
Added

added support for KeyboardInterrupt in processor module. It might still take a bit until the KeyboardInterrupt is registered, but it no longer runs all text comparisons after pressing Ctrl + C

Fixed

fix default scorer used by cdist to use C++ implementation if possible

v2.4.4(Aug 12, 2022)
Changed

Added support for Python3.11

v2.4.3(Aug 8, 2022)
Fixed

fix value range of jaro_similarity/jaro_winkler_similarity in the pure Python mode for the string_metric module

fix missing atomic symbol on arm 32 bit

v2.4.2(Jul 30, 2022)
Fixed

add missing symbol to pure Python version which prevented the usage of the fallback implementation

v2.4.1(Jul 29, 2022)
Fixed

fix version number

v2.4.0(Jul 29, 2022)
Fixed

fix banded Levenshtein implementation

Performance

improve performance and memory usage of Levenshtein.editops

memory usage is reduced from O(NM) to O(N)

performance is improved for long sequences

v2.3.0(Jul 22, 2022)
Added

add as_matching_blocks to Editops/Opcodes

add support for deletions from Editops

add Editops.apply/Opcodes.apply

add Editops.remove_subsequence

Changed

merge adjacent similar blocks in Opcodes

Fixed

fix usage of eval(repr(Editop)), eval(repr(Editops)), eval(repr(Opcode)) and eval(repr(Opcodes))

fix opcode conversion for empty source sequence

fix validation for empty Opcode list passed into Opcodes.__init__

v2.2.0(Jul 19, 2022)
Changed

added in-tree build backend to install cmake and ninja only when they are not installed yet and only when wheels are available

v2.1.4(Jul 17, 2022)
Changed

changed internal implementation of cdist to remove the build dependency on numpy

Added

added wheels for musllinux and manylinux ppc64le, s390x

v2.1.3(Jul 9, 2022)
Fixed

fix missing type stubs
