Fuzzy String Matching in Python

SeatGeek

Last update: Jan 1, 2023

Overview

FuzzyWuzzy

Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

Requirements

Python 2.7 or higher
difflib
python-Levenshtein (optional, provides a 4-10x speedup in String Matching, though may result in differing results for certain cases)
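An optional speedup dependency like this is usually wired in with an import fallback. The following is a minimal sketch of that pattern, not FuzzyWuzzy's actual source: it assumes the Levenshtein module installed by python-Levenshtein exposes ratio(s1, s2), and the helper name similarity is hypothetical.

```python
import difflib

try:
    # Provided by the optional python-Levenshtein package (assumption:
    # the module exposes ratio(s1, s2) returning a float in [0, 1]).
    from Levenshtein import ratio as _c_ratio
except ImportError:
    _c_ratio = None


def similarity(s1, s2):
    """Return a 0-100 similarity score (hypothetical helper)."""
    if _c_ratio is not None:
        score = _c_ratio(s1, s2)
    else:
        # Pure-Python fallback from the standard library (slower).
        score = difflib.SequenceMatcher(None, s1, s2).ratio()
    return int(round(100 * score))


print(similarity("this is a test", "this is a test!"))
```

Because the two backends use different matching algorithms, scores can differ slightly for some inputs, which is the caveat noted in the requirement above.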

For testing

pycodestyle
hypothesis
pytest

Installation

Using PIP via PyPI

pip install fuzzywuzzy

or the following to install python-Levenshtein too

pip install fuzzywuzzy[speedup]

Using PIP via Github

pip install git+git://github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Adding to your requirements.txt file (run pip install -r requirements.txt afterwards)

git+ssh://git@github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Manually via GIT

git clone git://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
cd fuzzywuzzy
python setup.py install

Usage

>>> from fuzzywuzzy import fuzz
>>> from fuzzywuzzy import process

Simple Ratio

>>> fuzz.ratio("this is a test", "this is a test!")
    97
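To see where a score like 97 can come from, here is a minimal sketch using only the standard library's difflib, which is listed in the requirements. The helper name simple_ratio is hypothetical, and the real fuzz.ratio may use python-Levenshtein instead when it is installed.

```python
from difflib import SequenceMatcher


def simple_ratio(s1, s2):
    """Score two strings from 0 to 100 using difflib (hypothetical helper)."""
    matcher = SequenceMatcher(None, s1, s2)
    # ratio() is 2 * M / T, where M is the number of matched characters
    # and T is the combined length of both strings.
    return int(round(100 * matcher.ratio()))


# 14 characters match out of 14 + 15, so 2 * 14 / 29 rounds to 97.
print(simple_ratio("this is a test", "this is a test!"))  # 97
```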

Partial Ratio

>>> fuzz.partial_ratio("this is a test", "this is a test!")
    100
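The idea behind a partial ratio is to compare the shorter string against the best-matching substring of the longer one. The sketch below is a brute-force illustration of that idea under stated assumptions: it tries every window of the shorter string's length, whereas the library itself may pick candidate windows differently. The name partial_ratio_sketch is hypothetical.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1, s2):
    """Best 0-100 score of the shorter string against any same-length window."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0
    best = 0.0
    # Slide a window of len(shorter) across the longer string.
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


# The first 14 characters of the longer string equal the shorter string.
print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100
```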

Token Sort Ratio

>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100
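Token sorting makes word order irrelevant: each string is split into tokens, the tokens are sorted, and the rejoined strings are compared. A minimal sketch, assuming simple lowercase-and-split preprocessing (the library's own preprocessing is more thorough) and a hypothetical helper name:

```python
from difflib import SequenceMatcher


def token_sort_ratio_sketch(s1, s2):
    """Compare two strings after sorting their whitespace-separated tokens."""
    def sort_tokens(text):
        return " ".join(sorted(text.lower().split()))

    a, b = sort_tokens(s1), sort_tokens(s2)
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


# Both inputs sort to "a bear fuzzy was wuzzy", so they compare as equal.
print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```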

Token Set Ratio

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100
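Token set comparison ignores duplicated words by working on sets of tokens. The sketch below follows the three-way comparison quoted later on this page for fuzz._token_set (intersection versus each "intersection plus remainder" string, and the two combined strings against each other). The preprocessing and helper names are simplifying assumptions, not the library's code.

```python
import re
from difflib import SequenceMatcher


def _ratio(a, b):
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def _tokens(text):
    # Simplified preprocessing: lowercase and split on non-word characters.
    return set(re.sub(r"\W+", " ", text.lower()).split())


def token_set_ratio_sketch(s1, s2):
    t1, t2 = _tokens(s1), _tokens(s2)
    sorted_sect = " ".join(sorted(t1 & t2))
    # Intersection followed by the tokens unique to each side.
    combined_1to2 = (sorted_sect + " " + " ".join(sorted(t1 - t2))).strip()
    combined_2to1 = (sorted_sect + " " + " ".join(sorted(t2 - t1))).strip()
    return max(
        _ratio(sorted_sect, combined_1to2),
        _ratio(sorted_sect, combined_2to1),
        _ratio(combined_1to2, combined_2to1),
    )


# The repeated "fuzzy" collapses in the set, so both sides are identical.
print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
```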

Process

>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
    ("Dallas Cowboys", 90)
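Conceptually, extraction scores every choice against the query and returns the best ones. The following is a rough sketch of that loop using a plain difflib-based scorer; the function names are hypothetical, and the scores will not match the output above exactly because the library's default scorer is more elaborate than this one.

```python
from difflib import SequenceMatcher


def simple_ratio(s1, s2):
    """Case-insensitive 0-100 similarity (hypothetical scorer)."""
    return int(round(100 * SequenceMatcher(None, s1.lower(), s2.lower()).ratio()))


def extract_sketch(query, choices, limit=5):
    """Return up to `limit` (choice, score) pairs, best first."""
    scored = [(choice, simple_ratio(query, choice)) for choice in choices]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def extract_one_sketch(query, choices):
    """Return the single best (choice, score) pair, or None if there are no choices."""
    best = extract_sketch(query, choices, limit=1)
    return best[0] if best else None


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract_sketch("new york jets", choices, limit=2))
print(extract_one_sketch("cowboys", choices))
```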

You can also pass additional parameters to extractOne method to make it use a specific scorer. A typical use case is to match file paths:

>>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
    ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
>>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
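The effect of swapping the scorer can be reproduced on a small scale. In this sketch the two scorers and extract_one_sketch are hypothetical stand-ins built on difflib, and the choices are made up for illustration; the point is only that an order-insensitive scorer can select a different best match than a plain character-level one.

```python
from difflib import SequenceMatcher


def simple_ratio(s1, s2):
    return int(round(100 * SequenceMatcher(None, s1.lower(), s2.lower()).ratio()))


def token_sort_ratio_sketch(s1, s2):
    a = " ".join(sorted(s1.lower().split()))
    b = " ".join(sorted(s2.lower().split()))
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def extract_one_sketch(query, choices, scorer=simple_ratio):
    """Return the best (choice, score) pair under the given scorer."""
    scored = ((choice, scorer(query, choice)) for choice in choices)
    return max(scored, key=lambda pair: pair[1], default=None)


choices = ["fuzzy bear", "bear fuzz buzz"]  # illustrative data only
print(extract_one_sketch("bear fuzzy", choices))
# ('bear fuzz buzz', 75)
print(extract_one_sketch("bear fuzzy", choices, scorer=token_sort_ratio_sketch))
# ('fuzzy bear', 100)
```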

Known Ports

FuzzyWuzzy is being ported to other languages too! Here are a few ports we know about:

Java: xpresso's fuzzywuzzy implementation
Java: fuzzywuzzy (java port)
Rust: fuzzyrusty (Rust port)
JavaScript: fuzzball.js (JavaScript port)
C++: Tmplt/fuzzywuzzy
C#: fuzzysharp (.Net port)
Go: go-fuzzywuzzy (Go port)
Free Pascal: FuzzyWuzzy.pas (Free Pascal port)
Kotlin multiplatform: FuzzyWuzzy-Kotlin
R: fuzzywuzzyR (R port)

Comments

Incompatible License

I see you've got issue #113 closed, however a simple reading of the GPL linked in the StringMatcher file doesn't just imply, but in fact definitively states, that your project must be licensed as GPL.

1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.

You have not copied the entire source verbatim, you've copied a select portion and incorporated it into a further derived work.

2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:

a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.

b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.

Emphasis added. When you decided to copy in a portion of the code from python-levenshtein you incidentally selected the GPL license for your project, and it's been apparently licensed as MIT / X11, but effectively and legally licensed as GPL, ever since.

I say this as someone who has a company project that's now been copy-lefted by your code, albeit accidentally and inadvertently.

opened by yawpitch 37

List supported python versions, bump minor

Now that #33 has been merged in, fuzzywuzzy looks good to go for python 3 compatibility. Might as well advertise this in the setup.py / on pypi. Version bumped so a new version can be pushed to pypi.

Cheers!

opened by JeffPaine 14

Remove query processing and default processor

Added test case to check that using a processor of the form lambda x: x['key'] doesn't fail.

Set default_processor to None and do not run processor on query.

Remove test case that checked if string reduced to 0 by processor, as string will no longer be processed.

Adjusted test cases in test_fuzzywuzzy_hypothesis to not use a processor, but not fail if scorer reduces query to empty string. If the user supplies a processor that modifies the choice so that it is no longer an exact match for the query, not finding an exact match would be the expected behavior.

Saw some relevant discussion around issues #77 and #141 etc. If processor doesn't run on the query, which I feel it CAN NOT, then processor must also default to None to avoid unexpected behaviors.

(also have a separate branch that only adds the new test case fwiw)

opened by nol13 13

UserWarning: Using slow pure-python SequenceMatcher

C:\Python27\lib\site-packages\fuzzywuzzy\fuzz.py:33: UserWarning: Using slow pure-python SequenceMatcher
  warnings.warn('Using slow pure-python SequenceMatcher')

The line of code I have for importing is from fuzzywuzzy import fuzz.

I'm running Python 2.7.8 on Windows 8, pip 1.5.6, and fuzzywuzzy 0.4.0.

opened by sylvia43 13

Upload latest version to pypi

Could we kindly upload the latest version to pypi (directions)?

I'd like to use this library on a python 3 project where using pip install would be greatly appreciated :smile: Cheers!

opened by JeffPaine 13

ImportError: cannot import name fuzz

Below is more information.

$> sudo pip install fuzzywuzzy   #works, no error

$> vi mytest.py
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

$>python mytest.py
...
ImportError: cannot import name fuzz

opened by harishvc 12

Fuzzy install via pip in Conda environment, python-Levenshtein warning

Hi...

I'm new to Conda, and just re-installed fuzzywuzzy using the conda version of pip. python-Levenshtein is also installed according to conda.

I'm using Python 3.3.

I'm not sure what to do about this...

Cheers !

opened by dpcuneowcg 12

Fuzzywuzzy 4x slower *with* python-Levenshtein installed

According to the intro page: "python-Levenshtein (optional, provides a 4-10x speedup in String Matching)"

I am fuzzy matching paragraphs with their best match from a significant list of paragraphs. When I use fuzzywuzzy without python-Levenshtein on Ubuntu, I receive the error that I should install it for a speed-up, and it took about 45 mins on a test set of paragraphs:

real 46m56.813s
user 125m45.952s
sys 0m0.432s

When I then installed python-Levenshtein, the error went away as expected, but running the same example set of paragraphs took over 200 minutes:

real 204m51.384s
user 522m49.436s
sys 0m21.376s

The results are identical except for one paragraph, which does match better with the slower, python-Levenshtein version. While I haven't timed an example, doing the same on the mac also seems significantly slower with python-Levenshtein installed.

The error message when python-Levenshtein is not installed says that the reason to install the package is for speed, so to me something isn't right: either it shouldn't be slower, or the error message should change to say that it should be installed for increased accuracy (if that is the case).

opened by Hooloovoo 11

Not python3 compliant

I get the following error:

File "/usr/lib/python3.4/site-packages/fuzzywuzzy/fuzz.py", line 49, in ratio
    s1, s2 = utils.make_type_consistent(s1, s2)
  File "/usr/lib/python3.4/site-packages/fuzzywuzzy/utils.py", line 43, in make_type_consistent
    elif isinstance(s1, unicode) and isinstance(s2, unicode):
NameError: name 'unicode' is not defined

opened by ashneo76 11

Support for other languages

Hi, First of all, thanks for maintaining this. I just noticed that both token_sort_ratio and token_set_ratio don't support Arabic characters. I don't know about other non-English ones, but at least they don't support Arabic. It returns 0 when comparing anything with an Arabic string, even if both are Arabic strings.

>>> print fuzz.token_sort_ratio("مرحبا جميعا", "مرحبا جميعا وشكرا لكم")
    0
>>> print fuzz.partial_ratio("مرحبا جميعا", "مرحبا جميعا وشكرا لكم")
    100

So I'm just wondering if this is a bug or if it simply doesn't support non-English characters? Thanks

opened by tester88 10

Fix for Python 3.7

Fixes #233

According to PEP 479, if raise StopIteration occurs directly in a generator, simply replace it with return.

This is both backwards and forwards compatible code.
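As an illustration of the PEP 479 change described here (a made-up generator, not the code touched by this fix), ending a generator with a bare return looks like this:

```python
def take_until_blank(lines):
    """Yield lines until the first blank one (hypothetical example)."""
    for line in lines:
        if not line.strip():
            # Older code might have used `raise StopIteration` here; on
            # Python 3.7+ that surfaces as a RuntimeError. A bare return
            # ends the generator cleanly on every Python version.
            return
        yield line


print(list(take_until_blank(["fuzzy", "wuzzy", "", "bear"])))  # ['fuzzy', 'wuzzy']
```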

opened by hb-alexbotello 9

NameError: name 'ratio' is not defined

While running the Fuzzy Wuzzy process.extract() method the following error is thrown :-

    matches = fw_process.extract(
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/process.py:168: in extract
    return heapq.nlargest(limit, sl, key=lambda i: i[1]) if limit is not None else \
/usr/lib/python3.8/heapq.py:563: in nlargest
    result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
/usr/lib/python3.8/heapq.py:563: in <listcomp>
    result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/process.py:117: in extractWithoutOrder
    score = scorer(processed_query, processed)
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:276: in WRatio
    base = ratio(p1, p2)
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:38: in decorator
    return func(*args, **kwargs)
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:29: in decorator
    return func(*args, **kwargs)
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:47: in decorator
    return func(*args, **kwargs)
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:28: in ratio
    return utils.intr(100 * m.ratio())
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <fuzzywuzzy.StringMatcher.StringMatcher object at 0x7f115ab32160>

    def ratio(self):
        if not self._ratio:
>           self._ratio = ratio(self._str1, self._str2)
E           NameError: name 'ratio' is not defined

I found that pinning the python-Levenshtein version to 0.12.2 solves the issue.

opened by DivanshuTak 1

How to decrease False positive matches? (process.extract / WRatio)

I am using the process.extract method, and I know it uses WRatio under the hood to calculate the score. In the following case I get a very high score of 90 even though the strings are hardly equal. Is there any way to fix this in WRatio?

inp_name = "america"
name_list = ["american Futures and Options Exchange"]
process.extractOne(inp_name, name_list)

Output--> ('american Futures and Options Exchange', 90.0, 0)

PS: I know of alternatives like fuzz.ratio, partial_ratio, and token_sort_ratio, but WRatio works pretty well for my use case, so any workaround would be appreciated. Thanks!

opened by Pranav082001 3

'list' object has no attribute 'items'

When trying to get the data of such a string 'hello 𝙎𝙈𝙈 world' using token_set_ratio(), no problems arise, but there is an error when calling process.extract().

If you remove the incomprehensible characters "SMM" from the line, then there is no error

Example:

strtest = 'hello 𝙎𝙈𝙈 world'
stroka = "word"
print(str(fuzz.token_set_ratio(stroka, strtest))) # OK
for message in process.extract(stroka, [strtest, 'sss'], limit=1): # ERROR
    pass

Error:

Traceback (most recent call last):
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 108, in extractWithoutOrder
    for key, choice in choices.items():
AttributeError: 'list' object has no attribute 'items'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Alexey\Documents\fuzzywuzzy\index.py", line 191, in <module>
    for message in process.extract(stroka, [strtest, 'sss'], limit=1): # ERROR
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 168, in extract
    return heapq.nlargest(limit, sl, key=lambda i: i[1]) if limit is not None else \
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\heapq.py", line 531, in nlargest
    result = max(it, default=sentinel, key=key)
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 117, in extractWithoutOrder
    score = scorer(processed_query, processed)
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\fuzz.py", line 288, in WRatio
    partial = partial_ratio(p1, p2) * partial_scale
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 38, in decorator
    return func(*args, **kwargs)
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 29, in decorator
    return func(*args, **kwargs)
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 47, in decorator
    return func(*args, **kwargs)
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\fuzz.py", line 47, in partial_ratio
    blocks = m.get_matching_blocks()
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\StringMatcher.py", line 58, in get_matching_blocks
    self._matching_blocks = matching_blocks(self.get_opcodes(),
ValueError: apply_edit edit operations are invalid or inapplicable

opened by syfulin 0

Removed a manual file handler pitfall

The problem

There was a case where the code opened and closed a file stream manually. Since Python closes the stream automatically when a 'with' block is used, it's better to use that instead of a manual close in order to remove a bug vector.

Solution

Refactored the code to remove the manual file handling.
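For readers unfamiliar with the pattern, a minimal sketch of reading a file through a 'with' block follows. The function name and file contents are hypothetical and unrelated to the file touched by this change.

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of a text file without trailing newlines."""
    # The with block closes the file even if an exception is raised inside it.
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


# Demonstration with a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "titles.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("fuzzy\nwuzzy\n")
    print(read_lines(sample))  # ['fuzzy', 'wuzzy']
```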

opened by NaelsonDouglas 0

token_set_ratio Degenerate Case

Referring to the description of token_set_ratio in the original blog post: if the SORTED_INTERSECTION is a strict subset of STRING2, the result ratio will be 100. E.g.,

fuzz.token_set_ratio("Deep Learning", "Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2")

yields 100. This is patently incorrect, and does not uphold the purported intuition ("because the SORTED_INTERSECTION component is always exactly the same, the scores increase when (a) that makes up a larger percentage of the full string, and (b) the string remainders are more similar").

Looking at fuzz._token_set, we see that it returns

max([
    ratio_func(sorted_sect, combined_1to2),
    ratio_func(sorted_sect, combined_2to1),
    ratio_func(combined_1to2, combined_2to1),
])

It appears the assumption is that the string remainder will never be empty. Perhaps something like this is more appropriate:

max([
    0 if sorted_sect == combined_1to2 else ratio_func(sorted_sect, combined_1to2),
    0 if sorted_sect == combined_2to1 else ratio_func(sorted_sect, combined_2to1),
    ratio_func(combined_1to2, combined_2to1),
])
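The difference between the two expressions can be checked with a self-contained sketch. Everything here is an approximation for illustration: the helper names are hypothetical, the preprocessing is simplified, and difflib stands in for ratio_func, so exact scores may differ from the library's.

```python
import re
from difflib import SequenceMatcher


def _ratio(a, b):
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def _parts(s1, s2):
    # Simplified preprocessing: lowercase and split on non-word characters.
    t1 = set(re.sub(r"\W+", " ", s1.lower()).split())
    t2 = set(re.sub(r"\W+", " ", s2.lower()).split())
    sect = " ".join(sorted(t1 & t2))
    one = (sect + " " + " ".join(sorted(t1 - t2))).strip()
    two = (sect + " " + " ".join(sorted(t2 - t1))).strip()
    return sect, one, two


def token_set_current(s1, s2):
    sect, one, two = _parts(s1, s2)
    return max(_ratio(sect, one), _ratio(sect, two), _ratio(one, two))


def token_set_proposed(s1, s2):
    sect, one, two = _parts(s1, s2)
    # Skip a comparison when the remainder on that side is empty.
    return max(
        0 if sect == one else _ratio(sect, one),
        0 if sect == two else _ratio(sect, two),
        _ratio(one, two),
    )


short = "Deep Learning"
long_title = ("Python Machine Learning: Machine Learning and Deep Learning "
              "with Python, scikit-learn, and TensorFlow 2")
print(token_set_current(short, long_title))   # 100
print(token_set_proposed(short, long_title))  # lower than 100
```

Under this sketch, identical inputs still score 100 with the proposed variant, because the third comparison is left unchanged.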
opened by rogerrohrbach 0

Mark repository as archived

Since this repo seems to have been deprecated, I suggest marking it as read-only in the settings. This also displays a banner on top of the page, which may be even easier to catch than the readme …

opened by Bibo-Joshi 0

Releases(0.18.0)

0.18.0(Feb 13, 2020)

0.16.0(Dec 18, 2017)
Add punctuation characters back in so process does something. [davidcellis]

Simpler alphabet and even fewer examples. [davidcellis]

Fewer examples and larger deadlines for Hypothesis. [davidcellis]

Slightly more examples. [davidcellis]

Attempt to fix the failing 2.7 and 3.6 python tests. [davidcellis]

Readme: add link to C++ port. [Lizard]

Fix tests on Python 3.3. [Jon Banafato]

Modify tox.ini and .travis.yml to install enum34 when running with Python 3.3 to allow hypothesis tests to pass.

Normalize Python versions. [Jon Banafato]

Enable Travis-CI tests for Python 3.6

Enable tests for all supported Python versions in tox.ini

Add Trove classifiers for Python 3.4 - 3.6 to setup.py

Note: Python 2.6 and 3.3 are no longer supported by the Python core team. Support for these can likely be dropped, but that's out of scope for this change set.

Fix typos. [Sven-Hendrik Haase]

0.15.1(Sep 21, 2017)
Fix setup.py (addresses #155) [Paul O'Leary McCann]

Merge remote-tracking branch 'upstream/master' into extract_optimizations. [nolan]

Seed random before generating benchmark strings. [nolan]

Cleaner implementation of same idea without new param, but adding existing full_process param to Q,W,UQ,UW. [nolan]

Fix benchmark only generate list once. [nolan]

Only run util.full_process once on query when using extract functions, add new benchmarks. [nolan]

0.15.0(Sep 21, 2017)
Add extras require to install python-levenshtein optionally. [Rolando Espinoza]

This allows to install python-levenshtein as dependency.

Fix link formatting in the README. [Alex Chan]

Add fuzzball.js JavaScript port link. [nolan]

Added Rust Port link. [Logan Collins]

Validate_string docstring. [davidcellis]

For full comparisons test that ONLY exact matches (after processing) are added. [davidcellis]

Add detailed docstrings to WRatio and QRatio comparisons. [davidcellis]

0.14.0(Feb 20, 2017)
Possible PEP-8 fix + make pep-8 warnings appear in test. [davidcellis]

Possible PEP-8 fix. [davidcellis]

Possible PEP-8 fix. [davidcellis]

Test for stderr log instead of warning. [davidcellis]

Convert warning.warn to logging.warning. [davidcellis]

Additional details for empty string warning from process. [davidcellis]

Enclose warnings.simplefilter() inside a with statement. [samkennerly]

0.13.0(Feb 20, 2017)
Support alternate git status output. [Jose Diaz-Gonzalez]

Split warning test into new test file, added to travis execution on 2.6 / pypy3. [davidcellis]

Remove hypothesis examples database from gitignore. [davidcellis]

Add check for warning to tests. [davidcellis]

Check processor and warn before scorer may remove processor. [davidcellis]

Renamed test - tidied docstring. [davidcellis]

Add token ratios to the list of scorers that skip running full_process as a processor. [davidcellis]

Added token_sort, token_set to test. [davidcellis]

Test docstrings/comments. [davidcellis]

Added py.test .cache/ removed duplicated build from gitignore. [davidcellis]

Added default_scorer, default_processor parameters to make it easier to change in the future. [davidcellis]

Rewrote extracts to explicitly use default values for processor and scorer. [davidcellis]

Changed Hypothesis tests to use pytest parameters. [davidcellis]

Added Hypothesis based tests for identical strings. [Ducksual]

Added test for simple 'a, b' string on process.extractOne. [Ducksual]

Process the query in process.extractWithoutOrder when using a scorer which does not do so. [Ducksual]

Mention that difflib and levenshtein results may differ. [Jose Diaz-Gonzalez]

0.12.0(Sep 14, 2016)
Declare support for universal wheels. [Thomas Grainger]

Clarify that license is GPLv2. [Gareth Tan]

0.11.1(Sep 14, 2016)
Add editorconfig. [Jose Diaz-Gonzalez]

Added tox.ini config file for easy local multi-environment testing; changed travis config to use py.test like tox; updated use of pep8 module to pycodestyle. [Pedro Rodrigues]

0.11.0(Jun 30, 2016)
Clean-up. [desmaisons_david]

Improving performance. [desmaisons_david]

Performance Improvement. [desmaisons_david]

Fix link to Levenshtein. [Brian J. McGuirk]

Fix readme links. [Brian J. McGuirk]

Add license to StringMatcher.py. [Jose Diaz-Gonzalez]

Closes #113

0.10.0(Jun 30, 2016)
Handle None inputs same as empty string (Issue #94) [Nick Miller]

0.9.0(Jun 30, 2016)
Pull down all keys when updating local copy. [Jose Diaz-Gonzalez]

0.8.2(Jun 30, 2016)
Remove the warning for "slow" sequence matcher on PyPy, where it's preferable to use the pure-python implementation. [Julian Berman]

0.8.1(Jun 30, 2016)
Minor release changes. [Jose Diaz-Gonzalez]

Clean up wiki link in readme. [Ewan Oglethorpe]

0.8.0(Nov 16, 2015)
Refer to Levenshtein distance in readme. Closes #88. [Jose Diaz-Gonzalez]

Added install step for travis to have pep8 available. [Pedro Rodrigues]

Added a pep8 test. The way I add the error 501 to the ignore tuple is probably wrong but from the docs and source code of pep8 I could not find any other way. [Pedro Rodrigues]

I also went ahead and removed the pep8 call from the release file.

Added python 3.5, pypy, and pypy3 to the travis config file. [Pedro Rodrigues]

Added another step to the release file to run the tests before releasing. [Pedro Rodrigues]

Fixed a few pep8 errors. Added a verification step in the release automation file. This step should probably be somewhere at git level. [Pedro Rodrigues]

Pep8. [Pedro Rodrigues]

Leaving TODOs in the code was never a good idea. [Pedro Rodrigues]

Changed return values to be rounded integers. [Pedro Rodrigues]

Added a test with the recovered data file. [Pedro Rodrigues]

Recovered titledata.csv. [Pedro Rodrigues]

Move extract test methods into the process test. [Shale Craig]

Somehow, they ended up in the RatioTest, despite asserting that the ProcessTest works.

0.7.0(Oct 2, 2015)
Use portable syntax for catching exception on tests. [Luis Madrigal]

[Fix] test against correct variable. [Luis Madrigal]

Add unit tests for validator decorators. [Luis Madrigal]

Move validators to decorator functions. [Luis Madrigal]

This allows easier composition and IMO makes the functions more readable

Fix typo: dictionery -> dictionary. [shale]

FizzyWuzzy -> FuzzyWuzzy typo correction. [shale]

Add check for gitchangelog. [Jose Diaz-Gonzalez]

0.6.2(Sep 3, 2015)
Ensure the rst-lint binary is available. [Jose Diaz-Gonzalez]

0.6.1(Sep 3, 2015)
Minor whitespace changes for PEP8. [Jose Diaz-Gonzalez]

0.6.0(Jul 20, 2015)
Added link to a java port. [Andriy Burkov]

Patched "name 'unicode' is not defined" python3. [Carlos Garay]

https://github.com/seatgeek/fuzzywuzzy/issues/80

Make process.extract accept {dict, list}-like choices. [Nathan Typanski]

Previously, process.extract expected lists or dictionaries, and tested this with isinstance() calls. In keeping with the spirit of Python (duck typing and all that), this change enables one to use extract() on any dict-like object for dict-like results, or any list-like object for list-like results.

So now we can (and, indeed, I've added tests for these uses) call extract() on things like:

a generator of strings ("any iterable")

a UserDict

custom user-made classes that "look like" dicts (or, really, anything with a .items() method that behaves like a dict)

plain old lists and dicts

The behavior is exactly the same for previous use cases of lists-and-dicts.

This change goes along nicely with PR #68, since those docs suggest dict-like behavior is valid, and this change makes that true.
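The duck-typing approach described in this entry can be sketched as follows. This is an illustrative helper with a hypothetical name, not the library's implementation: it tries the dict-like .items() method first and falls back to treating the argument as a plain iterable.

```python
def iter_choices(choices):
    """Return (key, choice) pairs for dict-like or list-like input."""
    try:
        # Dict-like objects: anything with an .items() method.
        return list(choices.items())
    except AttributeError:
        # Any other iterable, including generators: use positions as keys.
        return list(enumerate(choices))


print(iter_choices({"nyj": "New York Jets"}))              # [('nyj', 'New York Jets')]
print(iter_choices(name for name in ["Jets", "Giants"]))   # [(0, 'Jets'), (1, 'Giants')]
```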

Merge conflict. [Adam Cohen]

Improve docs for fuzzywuzzy.process. [Nathan Typanski]

The documentation for this module was dated and sometimes inaccurate. This overhauls the docs to accurately describe the current module, including detailing optional arguments that were not previously explained - e.g., limit argument to extract().

This change follows the Google Python Style Guide, which may be found at:

https://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Comments#Comments

0.5.0(Jul 20, 2015)
FIX: 0.4.0 is released, no need to specify 0.3.1 in README. [Josh Warner (Mac)]

Fixed a small typo. [Rostislav Semenov]

Reset processor and scorer defaults to None with argument checking. [foxxyz]

Catch generators without lengths. [Jeremiah Lowin]

Fixed python3 issue and deprecated assertion method. [foxxyz]

Fixed some docstrings, typos, python3 string method compatibility, some errors that crept in during rebase. [foxxyz]

[mod] The lambda in extract is not needed. [Olivier Le Thanh Duong]

[mod] Pass directly the defaults functions in the args

[mod] itertools.takewhile() can handle empty list just fine no need to test for it

[mod] Shorten extractOne by removing double if

[mod] Use a list comprehension in extract()

[mod] Autopep8 on process.py

[doc] Document make_type_consistent

[mod] bad_chars shortened

[enh] Move regex compilation outside the method, otherwise we don't get the benefit from it

[mod] Don't need all the blah just to redefine method from string module

[mod] Remove unused import

[mod] Autopep8 on string_processing.py

[mod] Rewrote asciidammit without recursion to make it more readable

[mod] Autopep8 on utils.py

[mod] Remove unused import

[doc] Add some doc to fuzz.py

[mod] Move the code to sort string in a separate function

[doc] Docstrings for WRatio, UWRatio

Add note on which package to install. Closes #67. [Jose Diaz-Gonzalez]

0.4.0(Oct 31, 2014)
Merge pull request #64 from ojomio/master. [Jose Diaz-Gonzalez]

In extractBests() and extractOne() use '>=' instead of '>'

Merge pull request #62 from ojomio/master. [Jose Diaz-Gonzalez]

Fixed python3 issue with SequenceMatcher import

0.3.3(Oct 22, 2014)
Update release script to make it more generic. [Jose Diaz-Gonzalez]

Merge pull request #60 from ojomio/master. [Jose Diaz-Gonzalez]

Fixed issue #59 - "partial" parameter for _token_set() is now honored

Merge pull request #54 from jlowin/patch-1. [Jose Diaz-Gonzalez]

Remove explicit check for lists

0.3.2(Sep 12, 2014)
Make release command an executable. [Jose Diaz-Gonzalez]

Simplify MANIFEST.in. [Jose Diaz-Gonzalez]

Add a release script. [Jose Diaz-Gonzalez]

Fix readme codeblock. [Jose Diaz-Gonzalez]

Minor formatting. [Jose Diaz-Gonzalez]

Update readme with proper installation notes. [Jose Diaz-Gonzalez]

Use version from fuzzywuzzy package. [Jose Diaz-Gonzalez]

Set version constant in __init__.py. [Jose Diaz-Gonzalez]

Update setup.py. [Jose Diaz-Gonzalez]

Rename LICENSE to LICENSE.txt. [Jose Diaz-Gonzalez]

Update packaging a bit. [Jose Diaz-Gonzalez]

0.3.0(Aug 24, 2014)
Allow choices to be a list or dict

Add testing for 3.4

Typo updates

Update readme, change formatting to RST

Fix package requirements

PEP8!


Owner

SeatGeek

GitHub http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
