Fuzzy String Matching in Python

SeatGeek

Last update: Jan 8, 2023

Related tags

Text Processing fuzzywuzzy

Overview

FuzzyWuzzy

Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

Requirements

Python 2.7 or higher
difflib
python-Levenshtein (optional, provides a 4-10x speedup in string matching, though results may differ in certain cases)

For testing

pycodestyle
hypothesis
pytest

Installation

Using PIP via PyPI

pip install fuzzywuzzy

or the following to install python-Levenshtein too

pip install fuzzywuzzy[speedup]

Using PIP via GitHub

pip install git+git://github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Adding to your requirements.txt file (run pip install -r requirements.txt afterwards)

git+ssh://git@github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Manually via GIT

git clone git://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
cd fuzzywuzzy
python setup.py install

Usage

>>> from fuzzywuzzy import fuzz
>>> from fuzzywuzzy import process

Simple Ratio

>>> fuzz.ratio("this is a test", "this is a test!")
    97
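
To make the score concrete, here is a minimal sketch of a comparable ratio built only on the standard library's difflib, which the requirements list above. The function name simple_ratio is hypothetical, and the library's own implementation may differ in details such as input validation.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity as a rounded integer percentage in [0, 100]."""
    # SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    # matched characters and T is the combined length of both strings.
    matcher = SequenceMatcher(None, s1, s2)
    return int(round(100 * matcher.ratio()))


# 14 matched characters, combined length 29: 2*14/29 = 0.9655...
print(simple_ratio("this is a test", "this is a test!"))  # 97
```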

Partial Ratio

>>> fuzz.partial_ratio("this is a test", "this is a test!")
    100
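
The idea behind a partial ratio can be sketched as a brute-force search: compare the shorter string against every window of the same length in the longer string and keep the best score. This is a simplified illustration with a hypothetical function name; the library appears to pick candidate windows from matching blocks rather than trying every offset.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best score of the shorter string against any same-length window of the longer."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0
    best = 0.0
    # Slide a window of len(shorter) characters across the longer string.
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100
```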

Token Sort Ratio

>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100
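
A rough sketch of why word order stops mattering: sort the tokens of each string before comparing. The function name is hypothetical, and the preprocessing here (lowercasing and whitespace splitting only) is an assumption that is simpler than what the library does.

```python
from difflib import SequenceMatcher


def token_sort_ratio_sketch(s1: str, s2: str) -> int:
    """Compare two strings after sorting their whitespace-separated tokens."""
    sorted1 = " ".join(sorted(s1.lower().split()))
    sorted2 = " ".join(sorted(s2.lower().split()))
    return int(round(100 * SequenceMatcher(None, sorted1, sorted2).ratio()))


# Both strings sort to "a bear fuzzy was wuzzy", so they compare as identical.
print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```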

Token Set Ratio

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100

Process

>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
    ("Dallas Cowboys", 90)

You can also pass additional parameters to the extractOne method to make it use a specific scorer. A typical use case is to match file paths:

>>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
    ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
>>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
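
The way a scorer plugs into extraction can be sketched with the standard library alone: score every choice, then keep the highest-scoring pairs. Both simple_scorer and extract_sketch are hypothetical names; the real default scorer is WRatio, so the scores produced here will not match the library's.

```python
import heapq
from difflib import SequenceMatcher


def simple_scorer(query: str, choice: str) -> int:
    """Hypothetical stand-in scorer: case-insensitive difflib ratio, 0-100."""
    return int(round(100 * SequenceMatcher(None, query.lower(), choice.lower()).ratio()))


def extract_sketch(query, choices, scorer=simple_scorer, limit=5):
    """Score every choice against the query and return the top `limit` pairs."""
    scored = ((choice, scorer(query, choice)) for choice in choices)
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
top_two = extract_sketch("new york jets", choices, limit=2)
print(top_two)
```

Because the scorer is just a callable taking two strings, swapping in a different comparison only requires passing another function.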

Known Ports

FuzzyWuzzy is being ported to other languages too! Here are a few ports we know about:

Java: xpresso's fuzzywuzzy implementation
Java: fuzzywuzzy (java port)
Rust: fuzzyrusty (Rust port)
JavaScript: fuzzball.js (JavaScript port)
C++: Tmplt/fuzzywuzzy
C#: fuzzysharp (.Net port)
Go: go-fuzzywuzzy (Go port)
Free Pascal: FuzzyWuzzy.pas (Free Pascal port)
Kotlin multiplatform: FuzzyWuzzy-Kotlin
R: fuzzywuzzyR (R port)

Comments

Incompatible License

I see you've got issue #113 closed, however a simple reading of the GPL linked in the StringMatcher file doesn't just imply, but in fact definitively states, that your project must be licensed as GPL.

1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.

You have not copied the entire source verbatim, you've copied a select portion and incorporated it into a further derived work.

2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:

a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.

b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.

Emphasis added. When you decided to copy in a portion of the code from python-levenshtein you incidentally selected the GPL license for your project, and it's been apparently licensed as MIT / X11, but effectively and legally licensed as GPL, ever since.

I say this as someone who has a company project that's now been copy-lefted by your code, albeit accidentally and inadvertently.

opened by yawpitch 37

List supported python versions, bump minor

Now that #33 has been merged in, fuzzywuzzy looks good to go for python 3 compatibility. Might as well advertise this in the setup.py / on pypi. Version bumped so a new version can be pushed to pypi.

Cheers!

opened by JeffPaine 14

Remove query processing and default processor

Added test case to check that using a processor of the form lambda x: x['key'] doesn't fail.

Set default_processor to None and do not run processor on query.

Remove test case that checked if string reduced to 0 by processor, as string will no longer be processed.

Adjusted test cases in test_fuzzywuzzy_hypothesis to not use a processor, but not fail if scorer reduces query to empty string. If the user supplies a processor that modifies the choice so that it is no longer an exact match for the query, not finding an exact match would be the expected behavior.

Saw some relevant discussion around issues #77 and #141 etc. If processor doesn't run on the query, which I feel it CAN NOT, then processor must also default to None to avoid unexpected behaviors.

(also have a separate branch that only adds the new test case fwiw)

opened by nol13 13

UserWarning: Using slow pure-python SequenceMatcher

C:\Python27\lib\site-packages\fuzzywuzzy\fuzz.py:33: UserWarning: Using slow pure-python SequenceMatcher
  warnings.warn('Using slow pure-python SequenceMatcher')

The line of code I have for importing is from fuzzywuzzy import fuzz.

I'm running Python 2.7.8 on Windows 8, pip 1.5.6, and fuzzywuzzy 0.4.0.
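
A warning like this typically comes from an optional-dependency fallback. The sketch below shows that general pattern under stated assumptions: the name ratio_with_fallback is hypothetical, and the code is not copied from fuzz.py.

```python
import warnings
from difflib import SequenceMatcher

try:
    # Optional C extension, provided by the python-Levenshtein package.
    import Levenshtein
    HAVE_SPEEDUP = True
except ImportError:
    HAVE_SPEEDUP = False
    warnings.warn("Using slow pure-python SequenceMatcher")


def ratio_with_fallback(s1: str, s2: str) -> int:
    """Use the C extension when available, otherwise fall back to difflib."""
    if HAVE_SPEEDUP:
        return int(round(100 * Levenshtein.ratio(s1, s2)))
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


print(ratio_with_fallback("this is a test", "this is a test!"))  # 97
```

Under this pattern the warning is informational: matching still works, only more slowly, until the optional package is installed.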

opened by sylvia43 13

Upload latest version to pypi

Could we kindly upload the latest version to pypi (directions)?

I'd like to use this library on a python 3 project where using pip install would be greatly appreciated :smile: Cheers!

opened by JeffPaine 13

ImportError: cannot import name fuzz

Below is more information.

$> sudo pip install fuzzywuzzy   #works, no error

$> vi mytest.py
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

$>python mytest.py
...
ImportError: cannot import name fuzz

opened by harishvc 12

Fuzzy install via pip in Conda environment, python-Levenshtein warning

Hi...

I'm new to Conda, and just re-installed fuzzywuzzy using the conda version of pip. python-Levenshtein is also installed according to conda.

I'm using Python 3.3.

I'm not sure what to do about this...

Cheers !

opened by dpcuneowcg 12

Fuzzywuzzy 4x slower *with* python-Levenshtein installed

According to the intro page: "python-Levenshtein (optional, provides a 4-10x speedup in String Matching)"

I am fuzzy matching paragraphs with their best match from a significant list of paragraphs. When I use fuzzywuzzy without python-Levenshtein on Ubuntu, I receive the error that I should install it for a speed-up, and it took about 45 mins on a test set of paragraphs:

real 46m56.813s
user 125m45.952s
sys 0m0.432s

When I then installed python-Levenshtein, the error went away as expected, but running the same example set of paragraphs took over 200 minutes:

real 204m51.384s
user 522m49.436s
sys 0m21.376s

The results are identical except for one paragraph, which does match better with the slower, python-Levenshtein version. While I haven't timed an example, doing the same on the mac also seems significantly slower with python-Levenshtein installed.

The error message when python-Levenshtein is not installed says that the reason to install the package is for speed, so to me something isn't right: either it shouldn't be slower, or the error message should change to say that it should be installed for increased accuracy (if that is the case).
needs more info

opened by Hooloovoo 11

Not python3 compliant

I get the following error:

File "/usr/lib/python3.4/site-packages/fuzzywuzzy/fuzz.py", line 49, in ratio
    s1, s2 = utils.make_type_consistent(s1, s2)
  File "/usr/lib/python3.4/site-packages/fuzzywuzzy/utils.py", line 43, in make_type_consistent
    elif isinstance(s1, unicode) and isinstance(s2, unicode):
NameError: name 'unicode' is not defined
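
The NameError arises because Python 3 has no unicode builtin. A common compatibility shim is sketched below; text_type and is_text are hypothetical names, and this is not necessarily how the project patched it.

```python
try:
    text_type = unicode  # Python 2: separate unicode type
except NameError:
    text_type = str  # Python 3: str is already Unicode text


def is_text(value) -> bool:
    """True when value is a text (Unicode) string on either Python version."""
    return isinstance(value, text_type)


print(is_text("abc"))  # True
```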

opened by ashneo76 11

Support for other languages

Hi, first of all, thanks for maintaining this. I just noticed that both token_sort_ratio and token_set_ratio don't support Arabic characters. I don't know about other non-English ones, but at least they don't support Arabic. They return 0 when comparing anything with an Arabic string, even if both are Arabic strings.

>>> print fuzz.token_sort_ratio("مرحبا جميعا", "مرحبا جميعا وشكرا لكم")
    0
>>> print fuzz.partial_ratio("مرحبا جميعا", "مرحبا جميعا وشكرا لكم")
    100

So I'm just wondering whether this is a bug or whether it simply doesn't support non-English characters. Thanks

opened by tester88 10

Fix for Python 3.7

Fixes #233

According to PEP 479, if raise StopIteration occurs directly in a generator, simply replace it with return.

This is both backwards and forwards compatible code.
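
As an illustration of the PEP 479 change, the generator below ends early with return where older code might have used raise StopIteration. The function is a made-up example, not code from the library.

```python
def take_until_empty(items):
    """Yield items until the first empty string is seen."""
    for item in items:
        if item == "":
            # Older code often wrote `raise StopIteration` here, which
            # becomes a RuntimeError inside generators on Python 3.7+.
            return
        yield item


print(list(take_until_empty(["a", "b", "", "c"])))  # ['a', 'b']
```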

opened by hb-alexbotello 9

NameError: name 'ratio' is not defined

While running the Fuzzy Wuzzy process.extract() method the following error is thrown :-

    matches = fw_process.extract(
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/process.py:168: in extract
    return heapq.nlargest(limit, sl, key=lambda i: i[1]) if limit is not None else \
/usr/lib/python3.8/heapq.py:563: in nlargest
    result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
/usr/lib/python3.8/heapq.py:563: in <listcomp>
    result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/process.py:117: in extractWithoutOrder
    score = scorer(processed_query, processed)
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:276: in WRatio
    base = ratio(p1, p2)
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:38: in decorator
    return func(*args, **kwargs)
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:29: in decorator
    return func(*args, **kwargs)
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:47: in decorator
    return func(*args, **kwargs)
../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:28: in ratio
    return utils.intr(100 * m.ratio())
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <fuzzywuzzy.StringMatcher.StringMatcher object at 0x7f115ab32160>

    def ratio(self):
        if not self._ratio:
>           self._ratio = ratio(self._str1, self._str2)
E           NameError: name 'ratio' is not defined

I found that pinning the python-Levenshtein version to 0.12.2 solves the issue.

opened by DivanshuTak 1

How to decrease False positive matches? (process.extract / WRatio)

I am using the process.extract method, and I know it uses WRatio under the hood to calculate the score. In the following case I get a very high score of 90 even though the strings are hardly equal. Is there any way to fix this in WRatio?

inp_name = "america"
name_list = ["american Futures and Options Exchange"]
process.extractOne(inp_name, name_list)

Output--> ('american Futures and Options Exchange', 90.0, 0)

PS: I know of alternatives like fuzz.ratio, partial_ratio, and token_sort_ratio, but WRatio works pretty well for my use case, so any workaround would be appreciated. Thanks!

opened by Pranav082001 3

'list' object has no attribute 'items'

When trying to get the data of such a string 'hello 𝙎𝙈𝙈 world' using token_set_ratio(), no problems arise, but there is an error when calling process.extract().

If you remove the incomprehensible characters "SMM" from the line, then there is no error

Example:

strtest = 'hello 𝙎𝙈𝙈 world'
stroka = "word"
print(str(fuzz.token_set_ratio(stroka, strtest))) # OK
for message in process.extract(stroka, [strtest, 'sss'], limit=1): # ERROR
    pass

Error:

Traceback (most recent call last):
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 108, in extractWithoutOrder
    for key, choice in choices.items():
AttributeError: 'list' object has no attribute 'items'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Alexey\Documents\fuzzywuzzy\index.py", line 191, in <module>
    for message in process.extract(stroka, [strtest, 'sss'], limit=1): # ERROR
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 168, in extract
    return heapq.nlargest(limit, sl, key=lambda i: i[1]) if limit is not None else \
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\heapq.py", line 531, in nlargest
    result = max(it, default=sentinel, key=key)
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 117, in extractWithoutOrder
    score = scorer(processed_query, processed)
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\fuzz.py", line 288, in WRatio
    partial = partial_ratio(p1, p2) * partial_scale
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 38, in decorator
    return func(*args, **kwargs)
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 29, in decorator
    return func(*args, **kwargs)
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 47, in decorator
    return func(*args, **kwargs)
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\fuzz.py", line 47, in partial_ratio
    blocks = m.get_matching_blocks()
  File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\StringMatcher.py", line 58, in get_matching_blocks
    self._matching_blocks = matching_blocks(self.get_opcodes(),
ValueError: apply_edit edit operations are invalid or inapplicable

opened by syfulin 0

Removed a manual file handler pitfall

The problem: There was a case where the code handled a file manually, opening and closing the stream by hand. Since Python closes streams automatically at the end of a 'with' block, it's better to use that instead of a manual close in order to remove a bug vector.

Solution: Refactored the code to remove the manual file handling.
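
The refactoring described here follows a standard pattern, sketched below with a hypothetical read_lines helper and a throwaway temporary file. The file name and contents are invented for the demo.

```python
import os
import tempfile


def read_lines(path):
    """Read all lines; the file is closed even if an exception is raised."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


# Demo on a temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "titles.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("first\nsecond\n")
    print(read_lines(demo_path))  # ['first', 'second']
```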

opened by NaelsonDouglas 0

token_set_ratio Degenerate Case

Referring to the description of token_set_ratio in the original blog post: if the SORTED_INTERSECTION is a strict subset of STRING2, the result ratio will be 100. E.g.,

fuzz.token_set_ratio("Deep Learning", "Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2")

yields 100. This is patently incorrect, and does not uphold the purported intuition ("because the SORTED_INTERSECTION component is always exactly the same, the scores increase when (a) that makes up a larger percentage of the full string, and (b) the string remainders are more similar").

Looking at fuzz._token_set, we see that it returns

max([
    ratio_func(sorted_sect, combined_1to2),
    ratio_func(sorted_sect, combined_2to1),
    ratio_func(combined_1to2, combined_2to1),
])

It appears the assumption is that the string remainder will never be empty. Perhaps something like this is more appropriate:

max([
    0 if sorted_sect == combined_1to2 else ratio_func(sorted_sect, combined_1to2),
    0 if sorted_sect == combined_2to1 else ratio_func(sorted_sect, combined_2to1),
    ratio_func(combined_1to2, combined_2to1),
])
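
The degenerate case can be reproduced with a small difflib-based sketch of the three-way comparison described above. The helper names are hypothetical, and the tokenization (lowercasing and whitespace splitting, with no punctuation handling) is an assumption that is simpler than the library's preprocessing.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_set_parts(s1: str, s2: str):
    """Build the shared-token string and the two combined strings."""
    tokens1, tokens2 = set(s1.lower().split()), set(s2.lower().split())
    sorted_sect = " ".join(sorted(tokens1 & tokens2))
    combined_1to2 = (sorted_sect + " " + " ".join(sorted(tokens1 - tokens2))).strip()
    combined_2to1 = (sorted_sect + " " + " ".join(sorted(tokens2 - tokens1))).strip()
    return sorted_sect, combined_1to2, combined_2to1


def token_set_ratio_sketch(s1: str, s2: str) -> int:
    sorted_sect, combined_1to2, combined_2to1 = token_set_parts(s1, s2)
    return max(
        _ratio(sorted_sect, combined_1to2),
        _ratio(sorted_sect, combined_2to1),
        _ratio(combined_1to2, combined_2to1),
    )


short = "Deep Learning"
long_title = ("Python Machine Learning: Machine Learning and Deep Learning "
              "with Python, scikit-learn, and TensorFlow 2")
# Every token of `short` occurs in `long_title`, so sorted_sect equals
# combined_1to2 and the first comparison alone already scores 100.
print(token_set_ratio_sketch(short, long_title))  # 100
```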

opened by rogerrohrbach 0

Mark repository as archived

Since this repo seems to have been deprecated, I suggest marking it as read-only in the settings. This also displays a banner at the top of the page, which may be even easier to catch than the readme …

opened by Bibo-Joshi 0

Releases(0.18.0)

0.18.0(Feb 13, 2020)

0.16.0(Dec 18, 2017)
Add punctuation characters back in so process does something. [davidcellis]

Simpler alphabet and even fewer examples. [davidcellis]

Fewer examples and larger deadlines for Hypothesis. [davidcellis]

Slightly more examples. [davidcellis]

Attempt to fix the failing 2.7 and 3.6 python tests. [davidcellis]

Readme: add link to C++ port. [Lizard]

Fix tests on Python 3.3. [Jon Banafato]

Modify tox.ini and .travis.yml to install enum34 when running with Python 3.3 to allow hypothesis tests to pass.

Normalize Python versions. [Jon Banafato]

Enable Travis-CI tests for Python 3.6

Enable tests for all supported Python versions in tox.ini

Add Trove classifiers for Python 3.4 - 3.6 to setup.py

Note: Python 2.6 and 3.3 are no longer supported by the Python core team. Support for these can likely be dropped, but that's out of scope for this change set.

Fix typos. [Sven-Hendrik Haase]

0.15.1(Sep 21, 2017)
Fix setup.py (addresses #155) [Paul O'Leary McCann]

Merge remote-tracking branch 'upstream/master' into extract_optimizations. [nolan]

Seed random before generating benchmark strings. [nolan]

Cleaner implementation of same idea without new param, but adding existing full_process param to Q,W,UQ,UW. [nolan]

Fix benchmark only generate list once. [nolan]

Only run util.full_process once on query when using extract functions, add new benchmarks. [nolan]

0.15.0(Sep 21, 2017)
Add extras require to install python-levenshtein optionally. [Rolando Espinoza]

This allows installing python-levenshtein as a dependency.

Fix link formatting in the README. [Alex Chan]

Add fuzzball.js JavaScript port link. [nolan]

Added Rust Port link. [Logan Collins]

Validate_string docstring. [davidcellis]

For full comparisons test that ONLY exact matches (after processing) are added. [davidcellis]

Add detailed docstrings to WRatio and QRatio comparisons. [davidcellis]

0.14.0(Feb 20, 2017)
Possible PEP-8 fix + make pep-8 warnings appear in test. [davidcellis]

Possible PEP-8 fix. [davidcellis]

Possible PEP-8 fix. [davidcellis]

Test for stderr log instead of warning. [davidcellis]

Convert warning.warn to logging.warning. [davidcellis]

Additional details for empty string warning from process. [davidcellis]

Enclose warnings.simplefilter() inside a with statement. [samkennerly]

0.13.0(Feb 20, 2017)
Support alternate git status output. [Jose Diaz-Gonzalez]

Split warning test into new test file, added to travis execution on 2.6 / pypy3. [davidcellis]

Remove hypothesis examples database from gitignore. [davidcellis]

Add check for warning to tests. [davidcellis]

Check processor and warn before scorer may remove processor. [davidcellis]

Renamed test - tidied docstring. [davidcellis]

Add token ratios to the list of scorers that skip running full_process as a processor. [davidcellis]

Added token_sort, token_set to test. [davidcellis]

Test docstrings/comments. [davidcellis]

Added py.test .cache/ removed duplicated build from gitignore. [davidcellis]

Added default_scorer, default_processor parameters to make it easier to change in the future. [davidcellis]

Rewrote extracts to explicitly use default values for processor and scorer. [davidcellis]

Changed Hypothesis tests to use pytest parameters. [davidcellis]

Added Hypothesis based tests for identical strings. [Ducksual]

Added test for simple 'a, b' string on process.extractOne. [Ducksual]

Process the query in process.extractWithoutOrder when using a scorer which does not do so. [Ducksual]

Mention that difflib and levenshtein results may differ. [Jose Diaz-Gonzalez]

0.12.0(Sep 14, 2016)
Declare support for universal wheels. [Thomas Grainger]

Clarify that license is GPLv2. [Gareth Tan]

0.11.1(Sep 14, 2016)
Add editorconfig. [Jose Diaz-Gonzalez]

Added tox.ini config file for easy local multi-environment testing; changed travis config to use py.test like tox; updated use of pep8 module to pycodestyle. [Pedro Rodrigues]

0.11.0(Jun 30, 2016)
Clean-up. [desmaisons_david]

Improving performance. [desmaisons_david]

Performance Improvement. [desmaisons_david]

Fix link to Levenshtein. [Brian J. McGuirk]

Fix readme links. [Brian J. McGuirk]

Add license to StringMatcher.py. [Jose Diaz-Gonzalez]

Closes #113

0.10.0(Jun 30, 2016)
Handle None inputs same as empty string (Issue #94) [Nick Miller]

0.9.0(Jun 30, 2016)
Pull down all keys when updating local copy. [Jose Diaz-Gonzalez]

0.8.2(Jun 30, 2016)
Remove the warning for "slow" sequence matcher on PyPy, where it's preferable to use the pure-python implementation. [Julian Berman]

0.8.1(Jun 30, 2016)
Minor release changes. [Jose Diaz-Gonzalez]

Clean up wiki link in readme. [Ewan Oglethorpe]

0.8.0(Nov 16, 2015)
Refer to Levenshtein distance in readme. Closes #88. [Jose Diaz-Gonzalez]

Added install step for travis to have pep8 available. [Pedro Rodrigues]

Added a pep8 test. The way I add the error 501 to the ignore tuple is probably wrong but from the docs and source code of pep8 I could not find any other way. [Pedro Rodrigues]

I also went ahead and removed the pep8 call from the release file.

Added python 3.5, pypy, and pypy3 to the travis config file. [Pedro Rodrigues]

Added another step to the release file to run the tests before releasing. [Pedro Rodrigues]

Fixed a few pep8 errors. Added a verification step in the release automation file. This step should probably be somewhere at git level. [Pedro Rodrigues]

Pep8. [Pedro Rodrigues]

Leaving TODOs in the code was never a good idea. [Pedro Rodrigues]

Changed return values to be rounded integers. [Pedro Rodrigues]

Added a test with the recovered data file. [Pedro Rodrigues]

Recovered titledata.csv. [Pedro Rodrigues]

Move extract test methods into the process test. [Shale Craig]

Somehow, they ended up in the RatioTest, despite asserting that the ProcessTest works.

0.7.0(Oct 2, 2015)
Use portable syntax for catching exception on tests. [Luis Madrigal]

[Fix] test against correct variable. [Luis Madrigal]

Add unit tests for validator decorators. [Luis Madrigal]

Move validators to decorator functions. [Luis Madrigal]

This allows easier composition and IMO makes the functions more readable

Fix typo: dictionery -> dictionary. [shale]

FizzyWuzzy -> FuzzyWuzzy typo correction. [shale]

Add check for gitchangelog. [Jose Diaz-Gonzalez]

0.6.2(Sep 3, 2015)
Ensure the rst-lint binary is available. [Jose Diaz-Gonzalez]

0.6.1(Sep 3, 2015)
Minor whitespace changes for PEP8. [Jose Diaz-Gonzalez]

0.6.0(Jul 20, 2015)
Added link to a java port. [Andriy Burkov]

Patched "name 'unicode' is not defined" python3. [Carlos Garay]

https://github.com/seatgeek/fuzzywuzzy/issues/80

Make process.extract accept {dict, list}-like choices. [Nathan Typanski]

Previously, process.extract expected lists or dictionaries, and tested this with isinstance() calls. In keeping with the spirit of Python (duck typing and all that), this change enables one to use extract() on any dict-like object for dict-like results, or any list-like object for list-like results.

So now we can (and, indeed, I've added tests for these uses) call extract() on things like:

a generator of strings ("any iterable")

a UserDict

custom user-made classes that "look like" dicts (or, really, anything with a .items() method that behaves like a dict)

plain old lists and dicts

The behavior is exactly the same for previous use cases of lists-and-dicts.
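
The duck-typing approach described above can be sketched as follows: try the dict-like interface first and fall back to plain iteration. iter_choices is a hypothetical helper written for illustration, not the function used inside process.extract.

```python
def iter_choices(choices):
    """Return an iterator of (key, choice) pairs for dict-like or list-like input."""
    try:
        # Dict-like: anything exposing an .items() method.
        pairs = choices.items()
    except AttributeError:
        # List-like or any other iterable: there are no keys to report.
        pairs = ((None, choice) for choice in choices)
    return iter(pairs)


print(list(iter_choices({"jets": "New York Jets"})))  # [('jets', 'New York Jets')]
print(list(iter_choices(["New York Jets"])))          # [(None, 'New York Jets')]
```

Checking for behavior rather than type is what lets generators and custom mapping classes work without isinstance() tests.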

This change goes along nicely with PR #68, since those docs suggest dict-like behavior is valid, and this change makes that true.

Merge conflict. [Adam Cohen]

Improve docs for fuzzywuzzy.process. [Nathan Typanski]

The documentation for this module was dated and sometimes inaccurate. This overhauls the docs to accurately describe the current module, including detailing optional arguments that were not previously explained - e.g., limit argument to extract().

This change follows the Google Python Style Guide, which may be found at:

https://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Comments#Comments

0.5.0(Jul 20, 2015)
FIX: 0.4.0 is released, no need to specify 0.3.1 in README. [Josh Warner (Mac)]

Fixed a small typo. [Rostislav Semenov]

Reset processor and scorer defaults to None with argument checking. [foxxyz]

Catch generators without lengths. [Jeremiah Lowin]

Fixed python3 issue and deprecated assertion method. [foxxyz]

Fixed some docstrings, typos, python3 string method compatibility, some errors that crept in during rebase. [foxxyz]

[mod] The lambda in extract is not needed. [Olivier Le Thanh Duong]

[mod] Pass directly the defaults functions in the args

[mod] itertools.takewhile() can handle empty list just fine no need to test for it

[mod] Shorten extractOne by removing double if

[mod] Use a list comprehension in extract()

[mod] Autopep8 on process.py

[doc] Document make_type_consistent

[mod] bad_chars shortened

[enh] Move regex compilation outside the method, otherwise we don't get the benefit from it

[mod] Don't need all the blah just to redefine method from string module

[mod] Remove unused import

[mod] Autopep8 on string_processing.py

[mod] Rewrote asciidammit without recursion to make it more readable

[mod] Autopep8 on utils.py

[mod] Remove unused import

[doc] Add some doc to fuzz.py

[mod] Move the code to sort string in a separate function

[doc] Docstrings for WRatio, UWRatio

Add note on which package to install. Closes #67. [Jose Diaz-Gonzalez]

0.4.0(Oct 31, 2014)
Merge pull request #64 from ojomio/master. [Jose Diaz-Gonzalez]

In extractBests() and extractOne() use '>=' instead of '>'

Merge pull request #62 from ojomio/master. [Jose Diaz-Gonzalez]

Fixed python3 issue with SequenceMatcher import

0.3.3(Oct 22, 2014)
Update release script to make it more generic. [Jose Diaz-Gonzalez]

Merge pull request #60 from ojomio/master. [Jose Diaz-Gonzalez]

Fixed issue #59 - "partial" parameter for _token_set() is now honored

Merge pull request #54 from jlowin/patch-1. [Jose Diaz-Gonzalez]

Remove explicit check for lists

0.3.2(Sep 12, 2014)
Make release command an executable. [Jose Diaz-Gonzalez]

Simplify MANIFEST.in. [Jose Diaz-Gonzalez]

Add a release script. [Jose Diaz-Gonzalez]

Fix readme codeblock. [Jose Diaz-Gonzalez]

Minor formatting. [Jose Diaz-Gonzalez]

Update readme with proper installation notes. [Jose Diaz-Gonzalez]

Use version from fuzzywuzzy package. [Jose Diaz-Gonzalez]

Set version constant in __init__.py. [Jose Diaz-Gonzalez]

Update setup.py. [Jose Diaz-Gonzalez]

Rename LICENSE to LICENSE.txt. [Jose Diaz-Gonzalez]

Update packaging a bit. [Jose Diaz-Gonzalez]

0.3.0(Aug 24, 2014)
Allow choices to be a list or dict

Add testing for 3.4

Typo updates

Update readme, change formatting to RST

Fix package requirements

PEP8!


Owner

SeatGeek

GitHub http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/

The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Contents Maintainer wanted Introduction Installation Documentation License History Source code Authors Maintainer wanted I am looking for a new mainta

1.2k Dec 16, 2022

strbind - lapidary text converter for translate an text file to the C-style string

strbind strbind - lapidary text converter for translate an text file to the C-style string. My motivation is fast adding large text chunks to the C co

1 Oct 22, 2021

Converts a Bangla numeric string to literal words.

Bangla Number in Words Converts a Bangla numeric string to literal words. Install $ pip install banglanum2words Usage

3 Aug 29, 2022

Implementation of hashids (http://hashids.org) in Python. Compatible with Python 2 and Python 3

hashids for Python 2.7 & 3 A python port of the JavaScript hashids implementation. It generates YouTube-like hashes from one or many numbers. Use hash

1.4k Jan 2, 2023

Python character encoding detector

Chardet: The Universal Character Encoding Detector Detects ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants) Big5, GB2312, EUC-TW, HZ-GB-2312, IS

1.8k Jan 8, 2023

Paranoid text spacing in Python

pangu.py Paranoid text spacing for good readability, to automatically insert whitespace between CJK (Chinese, Japanese, Korean) and half-width charact

194 Nov 19, 2022

An implementation of figlet written in Python

All of the documentation and the majority of the work done was by Christopher Jones ([email protected]). Packaged by Peter Waller <[email protected]>,

1.1k Jan 2, 2023

Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

3k Jan 2, 2023

Python flexible slugify function

awesome-slugify Python flexible slugify function PyPi: https://pypi.python.org/pypi/awesome-slugify Github: https://github.com/dimka665/awesome-slugif

471 Dec 20, 2022

Python Lex-Yacc

2.4k Dec 31, 2022

Python library for creating PEG parsers

PyParsing -- A Python Parsing Module Introduction The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the t

1.7k Dec 27, 2022

A simple Python module for parsing human names into their individual components

Name Parser A simple Python (3.2+ & 2.6+) module for parsing human names into their individual components. hn.title hn.first hn.middle hn.last hn.suff

574 Dec 20, 2022

Python port of Google's libphonenumber

phonenumbers Python Library This is a Python port of Google's libphonenumber library It supports Python 2.5-2.7 and Python 3.x (in the same codebase,

3.1k Dec 29, 2022

A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

Python User Agents user_agents is a Python library that provides an easy way to identify/detect devices like mobile phones, tablets and their capabili

1.3k Dec 22, 2022
