Fuzzy String Matching in Python

Overview

FuzzyWuzzy

Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.
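
For readers unfamiliar with the metric, Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below uses the textbook dynamic-programming recurrence; it is an illustration only, not the package's implementation, and the name levenshtein is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    # previous[j] is the distance between the prefix of `a` handled
    # so far and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# The two strings differ by a single inserted "!".
print(levenshtein("this is a test", "this is a test!"))
```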

Requirements

For testing

  • pycodestyle
  • hypothesis
  • pytest

Installation

Using PIP via PyPI

pip install fuzzywuzzy

or the following to install python-Levenshtein too

pip install fuzzywuzzy[speedup]

Using PIP via Github

pip install git+git://github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Adding to your requirements.txt file (run pip install -r requirements.txt afterwards)

git+ssh://git@github.com/seatgeek/fuzzywuzzy.git@0.18.0#egg=fuzzywuzzy

Manually via GIT

git clone git://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
cd fuzzywuzzy
python setup.py install

Usage

>>> from fuzzywuzzy import fuzz
>>> from fuzzywuzzy import process

Simple Ratio

>>> fuzz.ratio("this is a test", "this is a test!")
    97
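
When python-Levenshtein is not installed, the package falls back to a pure-Python SequenceMatcher (see the "Using slow pure-python SequenceMatcher" warning quoted further down). A minimal sketch of a comparable score built on the standard library's difflib, assuming the result is scaled to 0-100 and rounded to an integer; simple_ratio is a hypothetical name, not part of the package:

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity of two strings as a rounded integer from 0 to 100."""
    # SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    # matched characters and T is the combined length of both strings.
    matcher = SequenceMatcher(None, s1, s2)
    return int(round(100 * matcher.ratio()))


# 14 matched characters out of 14 + 15 in total: 2*14/29 rounds to 97.
print(simple_ratio("this is a test", "this is a test!"))
```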

Partial Ratio

>>> fuzz.partial_ratio("this is a test", "this is a test!")
    100
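
The idea behind a partial ratio is to score the shorter string against the best-matching stretch of the longer one. The sketch below does this with a brute-force sliding window over difflib scores. It is an assumption-laden illustration: the package appears to pick candidate windows from matching blocks rather than trying every offset, and partial_ratio_sketch is a hypothetical name.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best score of the shorter string against any equally long
    window of the longer string (brute-force sliding window)."""
    shorter, longer = sorted((s1, s2), key=len)
    if not shorter:
        return 0
    best = 0.0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


# The shorter string appears verbatim at the start of the longer one.
print(partial_ratio_sketch("this is a test", "this is a test!"))  # 100
```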

Token Sort Ratio

>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    91
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100
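
Token sorting makes the comparison insensitive to word order: both strings are split into tokens, the tokens are sorted, and the rejoined strings are compared. A minimal sketch, assuming lowercasing, a simple regex tokeniser, and difflib as the underlying scorer (all assumptions; token_sort_ratio_sketch is a hypothetical name):

```python
import re
from difflib import SequenceMatcher


def token_sort_ratio_sketch(s1: str, s2: str) -> int:
    """Compare two strings after lowercasing and sorting their tokens."""
    def normalise(text: str) -> str:
        # Assumption: a token is any run of word characters.
        return " ".join(sorted(re.findall(r"\w+", text.lower())))

    a, b = normalise(s1), normalise(s2)
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


# Both inputs normalise to "a bear fuzzy was wuzzy".
print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
```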

Token Set Ratio

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100
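
Token set comparison goes one step further and ignores repeated tokens. The sketch below follows the three-way comparison quoted in the "token_set_ratio Degenerate Case" comment further down (sorted_sect, combined_1to2, combined_2to1), with difflib and a regex tokeniser standing in for the package's own helpers; those two choices and the function names are assumptions.

```python
import re
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    """Rounded 0-100 difflib similarity (stand-in scorer)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_set_ratio_sketch(s1: str, s2: str) -> int:
    tokens1 = set(re.findall(r"\w+", s1.lower()))
    tokens2 = set(re.findall(r"\w+", s2.lower()))

    # Shared tokens, then each side's leftovers, all sorted.
    sorted_sect = " ".join(sorted(tokens1 & tokens2))
    rest_1to2 = " ".join(sorted(tokens1 - tokens2))
    rest_2to1 = " ".join(sorted(tokens2 - tokens1))

    combined_1to2 = f"{sorted_sect} {rest_1to2}".strip()
    combined_2to1 = f"{sorted_sect} {rest_2to1}".strip()

    return max(
        _ratio(sorted_sect, combined_1to2),
        _ratio(sorted_sect, combined_2to1),
        _ratio(combined_1to2, combined_2to1),
    )


# The duplicate "fuzzy" disappears once tokens are treated as a set.
print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
```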

Process

>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
    ("Dallas Cowboys", 90)

You can also pass additional parameters to the extractOne method to make it use a specific scorer. A typical use case is matching file paths (songs below is assumed to be a list of path strings):

>>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
    ('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
>>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
    ("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
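
Conceptually, the extract calls above score every choice against the query and keep the best results, and the scorer is just a callable taking (query, choice). A minimal standard-library sketch under those assumptions; extract_sketch, extract_one_sketch, default_scorer, and contains_scorer are hypothetical names, and the stand-in scorer is a plain difflib ratio, so its scores will not match the package's default scorer:

```python
import heapq
from difflib import SequenceMatcher


def default_scorer(query: str, choice: str) -> int:
    """Stand-in scorer (assumption): case-insensitive difflib ratio, 0-100."""
    matcher = SequenceMatcher(None, query.lower(), choice.lower())
    return int(round(100 * matcher.ratio()))


def contains_scorer(query: str, choice: str) -> int:
    """Toy custom scorer: 100 if the query is a substring, else 0."""
    return 100 if query.lower() in choice.lower() else 0


def extract_sketch(query, choices, scorer=default_scorer, limit=5):
    """Return up to `limit` (choice, score) pairs, best score first."""
    scored = ((choice, scorer(query, choice)) for choice in choices)
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


def extract_one_sketch(query, choices, scorer=default_scorer):
    """Return the single best (choice, score) pair, or None if empty."""
    best = extract_sketch(query, choices, scorer=scorer, limit=1)
    return best[0] if best else None


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract_sketch("new york jets", choices, limit=2))
print(extract_one_sketch("cowboys", choices, scorer=contains_scorer))
```

Swapping the scorer changes which choice wins without touching the ranking logic, which is why a token-based scorer can pick a different file path than the default in the example above.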

Known Ports

FuzzyWuzzy is being ported to other languages too! Here are a few ports we know about:

Comments
  • Incompatible License

    I see you've got issue #113 closed, however a simple reading of the GPL linked in the StringMatcher file doesn't just imply, but in fact definitively states, that your project must be licensed as GPL.

    1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.

    You have not copied the entire source verbatim, you've copied a select portion and incorporated it into a further derived work.

    2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:

    a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.

    Emphasis added. When you decided to copy in a portion of the code from python-levenshtein you incidentally selected the GPL license for your project, and it's been apparently licensed as MIT / X11, but effectively and legally licensed as GPL, ever since.

    I say this as someone who has a company project that's now been copy-lefted by your code, albeit accidentally and inadvertently.

    opened by yawpitch 37
  • List supported python versions, bump minor

    Now that #33 has been merged in, fuzzywuzzy looks good to go for python 3 compatibility. Might as well advertise this in the setup.py / on pypi. Version bumped so a new version can be pushed to pypi.

    Cheers!

    opened by JeffPaine 14
  • Remove query processing and default processor

    Added test case to check that using a processor of the form lambda x: x['key'] doesn't fail.

    Set default_processor to None and do not run processor on query.

    Remove test case that checked if string reduced to 0 by processor, as string will no longer be processed.

    Adjusted test cases in test_fuzzywuzzy_hypothesis to not use a processor, but not fail if scorer reduces query to empty string. If the user supplies a processor that modifies the choice so that it is no longer an exact match for the query, not finding an exact match would be the expected behavior.

    Saw some relevant discussion around issues #77 and #141 etc. If processor doesn't run on the query, which I feel it CAN NOT, then processor must also default to None to avoid unexpected behaviors.

    (also have a separate branch that only adds the new test case fwiw)

    opened by nol13 13
  • UserWarning: Using slow pure-python SequenceMatcher

    C:\Python27\lib\site-packages\fuzzywuzzy\fuzz.py:33: UserWarning: Using slow pure-python SequenceMatcher
      warnings.warn('Using slow pure-python SequenceMatcher')
    

    The line of code I have for importing is from fuzzywuzzy import fuzz.

    I'm running Python 2.7.8 on Windows 8, pip 1.5.6, and fuzzywuzzy 0.4.0.

    opened by sylvia43 13
  • Upload latest version to pypi

    Could we kindly upload the latest version to pypi (directions)?

    I'd like to use this library on a python 3 project where using pip install would be greatly appreciated :smile: Cheers!

    opened by JeffPaine 13
  • ImportError: cannot import name fuzz

    Below is more information.

    $> sudo pip install fuzzywuzzy   #works, no error
    
    $> vi mytest.py
    from fuzzywuzzy import fuzz
    from fuzzywuzzy import process
    
    $>python mytest.py
    ...
    ImportError: cannot import name fuzz
    
    opened by harishvc 12
  • Fuzzy install via pip in Conda environment,  python-Levenshtein warning

    Hi...

    I'm new to Conda, and just re-installed fuzzywuzzy using the conda version of pip. python-Levenshtein is also installed, according to conda.

    I'm using Python 3.3.

    I'm not sure what to do about this...

    Cheers !

    opened by dpcuneowcg 12
  • Fuzzywuzzy 4x slower *with* python-Levenshtein installed

    According to the intro page: "python-Levenshtein (optional, provides a 4-10x speedup in String Matching)"

    I am fuzzy matching paragraphs with their best match from a significant list of paragraphs. When I use fuzzywuzzy without python-Levenshtein on Ubuntu, I receive the error that I should install it for a speed-up, and it took about 45 mins on a test set of paragraphs:

    real 46m56.813s user 125m45.952s sys 0m0.432s

    When I then installed python-Levenshtein, the error went away as expected, but running the same example set of paragraphs took over 200 minutes:

    real 204m51.384s user 522m49.436s sys 0m21.376s

    The results are identical except for one paragraph, which does match better with the slower, python-Levenshtein version. While I haven't timed an example, doing the same on the mac also seems significantly slower with python-Levenshtein installed.

    The error message when python-Levenshtein is not installed says that the reason to install the package is for speed, so to me something isn't right: either it shouldn't be slower, or the error message should change to say that it should be installed for increased accuracy (if that is the case).

    needs more info 
    opened by Hooloovoo 11
  • Not python3 compliant

    I get the following error:

    File "/usr/lib/python3.4/site-packages/fuzzywuzzy/fuzz.py", line 49, in ratio
        s1, s2 = utils.make_type_consistent(s1, s2)
      File "/usr/lib/python3.4/site-packages/fuzzywuzzy/utils.py", line 43, in make_type_consistent
        elif isinstance(s1, unicode) and isinstance(s2, unicode):
    NameError: name 'unicode' is not defined
    
    opened by ashneo76 11
  • Support for other languages

    Hi, first of all, thanks for maintaining this. I just noticed that neither token_sort_ratio nor token_set_ratio supports Arabic characters. I don't know about other non-English scripts, but at least they don't support Arabic. They return 0 when comparing anything with an Arabic string, even if both strings are Arabic.

    >>> print fuzz.token_sort_ratio("مرحبا جميعا", "مرحبا جميعا وشكرا لكم")
    0
    >>> print fuzz.partial_ratio("مرحبا جميعا", "مرحبا جميعا وشكرا لكم")
    100
    
    

    So I'm wondering whether this is a bug or whether non-English characters are simply not supported. Thanks

    opened by tester88 10
  • Fix for Python 3.7

    Fixes #233

    According to PEP 479, if raise StopIteration occurs directly in a generator, simply replace it with return.

    This is both backwards and forwards compatible code.

    opened by hb-alexbotello 9
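
The PEP 479 change mentioned in the "Fix for Python 3.7" comment above can be illustrated with two small generators. This is a generic sketch with hypothetical function names, not the code touched by that pull request: on Python 3.7 and later, a StopIteration raised inside a generator body is converted to RuntimeError, while a bare return ends the generator cleanly on every version.

```python
def take_until_broken(items, stop_value):
    """Pre-PEP 479 style: ends the generator by raising StopIteration."""
    for item in items:
        if item == stop_value:
            raise StopIteration  # becomes RuntimeError on Python 3.7+
        yield item


def take_until_fixed(items, stop_value):
    """PEP 479-safe style: a bare return ends the generator cleanly."""
    for item in items:
        if item == stop_value:
            return
        yield item


print(list(take_until_fixed([1, 2, 3, 4], 3)))  # [1, 2]

try:
    list(take_until_broken([1, 2, 3, 4], 3))
except RuntimeError as exc:
    # Reached on Python 3.7 and later.
    print("RuntimeError:", exc)
```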
  • NameError: name 'ratio' is not defined

    While running the FuzzyWuzzy process.extract() method, the following error is thrown:

        matches = fw_process.extract(
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/process.py:168: in extract
        return heapq.nlargest(limit, sl, key=lambda i: i[1]) if limit is not None else \
    /usr/lib/python3.8/heapq.py:563: in nlargest
        result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
    /usr/lib/python3.8/heapq.py:563: in <listcomp>
        result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/process.py:117: in extractWithoutOrder
        score = scorer(processed_query, processed)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:276: in WRatio
        base = ratio(p1, p2)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:38: in decorator
        return func(*args, **kwargs)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:29: in decorator
        return func(*args, **kwargs)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/utils.py:47: in decorator
        return func(*args, **kwargs)
    ../../.local/share/virtualenvs/proj-NKfiPrkj/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:28: in ratio
        return utils.intr(100 * m.ratio())
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    self = <fuzzywuzzy.StringMatcher.StringMatcher object at 0x7f115ab32160>
    
        def ratio(self):
            if not self._ratio:
    >           self._ratio = ratio(self._str1, self._str2)
    E           NameError: name 'ratio' is not defined
    
    
    

    I found that pinning the python-Levenshtein version to 0.12.2 solves the issue.

    opened by DivanshuTak 1
  • How to decrease False positive matches? (process.extract / WRatio)

    I am using the process.extract method, and I know it uses WRatio under the hood to calculate the score. In the following case I get a very high score of 90 even though the strings are hardly equal. Is there any way to fix this in WRatio?

    from fuzzywuzzy import process

    inp_name = "america"
    name_list = ["american Futures and Options Exchange"]
    process.extractOne(inp_name, name_list)

    Output--> ('american Futures and Options Exchange', 90.0, 0)

    PS: I know about alternatives like fuzz.ratio, partial_ratio, and token_sort_ratio, but WRatio works pretty well for my use case, so any workaround would be appreciated. Thanks!

    opened by Pranav082001 3
  • 'list' object has no attribute 'items'

    Scoring a string such as 'hello 𝙎𝙈𝙈 world' with token_set_ratio() works without problems, but calling process.extract() on it raises an error.

    If you remove the unusual "𝙎𝙈𝙈" characters from the string, the error goes away.

    Example:

    strtest = 'hello 𝙎𝙈𝙈 world'
    stroka = "word"
    print(str(fuzz.token_set_ratio(stroka, strtest))) # OK
    for message in process.extract(stroka, [strtest, 'sss'], limit=1): # ERROR
        pass
    

    Error:

    Traceback (most recent call last):
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 108, in extractWithoutOrder
        for key, choice in choices.items():
    AttributeError: 'list' object has no attribute 'items'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "C:\Users\Alexey\Documents\fuzzywuzzy\index.py", line 191, in <module>
        for message in process.extract(stroka, [strtest, 'sss'], limit=1): # ERROR
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 168, in extract
        return heapq.nlargest(limit, sl, key=lambda i: i[1]) if limit is not None else \
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\heapq.py", line 531, in nlargest
        result = max(it, default=sentinel, key=key)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\process.py", line 117, in extractWithoutOrder
        score = scorer(processed_query, processed)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\fuzz.py", line 288, in WRatio
        partial = partial_ratio(p1, p2) * partial_scale
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 38, in decorator
        return func(*args, **kwargs)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 29, in decorator
        return func(*args, **kwargs)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\utils.py", line 47, in decorator
        return func(*args, **kwargs)
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\fuzz.py", line 47, in partial_ratio
        blocks = m.get_matching_blocks()
      File "C:\Users\Alexey\AppData\Local\Programs\Python\Python39\lib\site-packages\fuzzywuzzy\StringMatcher.py", line 58, in get_matching_blocks
        self._matching_blocks = matching_blocks(self.get_opcodes(),
    ValueError: apply_edit edit operations are invalid or inapplicable
    
    opened by syfulin 0
  • Removed a manual file handler pitfall

    The problem: the code opened and closed a file stream manually. Since Python closes streams automatically when they are opened in a 'with' block, it's better to use that instead of a manual close, which removes a potential source of bugs.

    Solution: refactored the code to remove the manual file handling.

    opened by NaelsonDouglas 0
  • token_set_ratio Degenerate Case

    Referring to the description of token_set_ratio in the original blog post: if the SORTED_INTERSECTION is a strict subset of STRING2, the result ratio will be 100. E.g.,

    fuzz.token_set_ratio("Deep Learning", "Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2")
    

    yields 100. This is patently incorrect, and does not uphold the purported intuition ("because the SORTED_INTERSECTION component is always exactly the same, the scores increase when (a) that makes up a larger percentage of the full string, and (b) the string remainders are more similar").

    Looking at fuzz._token_set, we see that it returns

    max(
        [
            ratio_func(sorted_sect, combined_1to2),
            ratio_func(sorted_sect, combined_2to1),
            ratio_func(combined_1to2, combined_2to1)
        ]
    )
    

    It appears the assumption is that the string remainder will never be empty. Perhaps something like this is more appropriate:

    max(
        [
            0 if sorted_sect == combined_1to2 else ratio_func(sorted_sect, combined_1to2),
            0 if sorted_sect == combined_2to1 else ratio_func(sorted_sect, combined_2to1),
            ratio_func(combined_1to2, combined_2to1)
        ]
    )
    
    opened by rogerrohrbach 0
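
The degenerate case reported in the token_set_ratio comment above can be reproduced with the standard library alone. The sketch below assumes difflib's SequenceMatcher as the ratio function and a simple regex tokeniser (both assumptions, not the package's code); guard_empty_remainder is a hypothetical flag that applies the guard proposed in the report.

```python
import re
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """Rounded 0-100 difflib similarity (stand-in for ratio_func)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_set(s1: str, s2: str, guard_empty_remainder: bool = False) -> int:
    t1 = set(re.findall(r"\w+", s1.lower()))
    t2 = set(re.findall(r"\w+", s2.lower()))
    sorted_sect = " ".join(sorted(t1 & t2))
    combined_1to2 = (sorted_sect + " " + " ".join(sorted(t1 - t2))).strip()
    combined_2to1 = (sorted_sect + " " + " ".join(sorted(t2 - t1))).strip()

    def sect_vs(combined: str) -> int:
        # With the guard on, an empty remainder no longer counts as
        # a perfect match against the intersection.
        if guard_empty_remainder and sorted_sect == combined:
            return 0
        return ratio(sorted_sect, combined)

    return max(
        sect_vs(combined_1to2),
        sect_vs(combined_2to1),
        ratio(combined_1to2, combined_2to1),
    )


title = ("Python Machine Learning: Machine Learning and Deep Learning "
         "with Python, scikit-learn, and TensorFlow 2")
print(token_set("Deep Learning", title))  # 100: every query token is in the title
print(token_set("Deep Learning", title, guard_empty_remainder=True))
```

With the guard enabled, two identical strings still score 100 through the third comparison, so only the subset case is affected.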
  • Mark repository as archived

    Since this repo seems to have been deprecated, I suggest marking it as read-only in the settings. This also displays a banner at the top of the page, which may be even easier to notice than the readme …

    opened by Bibo-Joshi 0
Releases(0.18.0)
  • 0.16.0(Dec 18, 2017)

    • Add punctuation characters back in so process does something. [davidcellis]

    • Simpler alphabet and even fewer examples. [davidcellis]

    • Fewer examples and larger deadlines for Hypothesis. [davidcellis]

    • Slightly more examples. [davidcellis]

    • Attempt to fix the failing 2.7 and 3.6 python tests. [davidcellis]

    • Readme: add link to C++ port. [Lizard]

    • Fix tests on Python 3.3. [Jon Banafato]

      Modify tox.ini and .travis.yml to install enum34 when running with Python 3.3 to allow hypothesis tests to pass.

    • Normalize Python versions. [Jon Banafato]

      • Enable Travis-CI tests for Python 3.6
      • Enable tests for all supported Python versions in tox.ini
      • Add Trove classifiers for Python 3.4 - 3.6 to setup.py

      Note: Python 2.6 and 3.3 are no longer supported by the Python core team. Support for these can likely be dropped, but that's out of scope for this change set.

    • Fix typos. [Sven-Hendrik Haase]

  • 0.15.1(Sep 21, 2017)

    • Fix setup.py (addresses #155) [Paul O'Leary McCann]

    • Merge remote-tracking branch 'upstream/master' into extract_optimizations. [nolan]

    • Seed random before generating benchmark strings. [nolan]

    • Cleaner implementation of same idea without new param, but adding existing full_process param to Q,W,UQ,UW. [nolan]

    • Fix benchmark only generate list once. [nolan]

    • Only run util.full_process once on query when using extract functions, add new benchmarks. [nolan]

  • 0.15.0(Sep 21, 2017)

    • Add extras require to install python-levenshtein optionally. [Rolando Espinoza]

      This allows python-levenshtein to be installed as an optional dependency.

    • Fix link formatting in the README. [Alex Chan]

    • Add fuzzball.js JavaScript port link. [nolan]

    • Added Rust Port link. [Logan Collins]

    • Validate_string docstring. [davidcellis]

    • For full comparisons test that ONLY exact matches (after processing) are added. [davidcellis]

    • Add detailed docstrings to WRatio and QRatio comparisons. [davidcellis]

  • 0.14.0(Feb 20, 2017)

    • Possible PEP-8 fix + make pep-8 warnings appear in test. [davidcellis]
    • Possible PEP-8 fix. [davidcellis]
    • Possible PEP-8 fix. [davidcellis]
    • Test for stderr log instead of warning. [davidcellis]
    • Convert warning.warn to logging.warning. [davidcellis]
    • Additional details for empty string warning from process. [davidcellis]
    • Enclose warnings.simplefilter() inside a with statement. [samkennerly]
  • 0.13.0(Feb 20, 2017)

    • Support alternate git status output. [Jose Diaz-Gonzalez]
    • Split warning test into new test file, added to travis execution on 2.6 / pypy3. [davidcellis]
    • Remove hypothesis examples database from gitignore. [davidcellis]
    • Add check for warning to tests. [davidcellis]
    • Check processor and warn before scorer may remove processor. [davidcellis]
    • Renamed test - tidied docstring. [davidcellis]
    • Add token ratios to the list of scorers that skip running full_process as a processor. [davidcellis]
    • Added token_sort, token_set to test. [davidcellis]
    • Test docstrings/comments. [davidcellis]
    • Added py.test .cache/ removed duplicated build from gitignore. [davidcellis]
    • Added default_scorer, default_processor parameters to make it easier to change in the future. [davidcellis]
    • Rewrote extracts to explicitly use default values for processor and scorer. [davidcellis]
    • Changed Hypothesis tests to use pytest parameters. [davidcellis]
    • Added Hypothesis based tests for identical strings. [Ducksual]
    • Added test for simple 'a, b' string on process.extractOne. [Ducksual]
    • Process the query in process.extractWithoutOrder when using a scorer which does not do so. [Ducksual]
    • Mention that difflib and levenshtein results may differ. [Jose Diaz-Gonzalez]
  • 0.12.0(Sep 14, 2016)

  • 0.11.1(Sep 14, 2016)

    • Add editorconfig. [Jose Diaz-Gonzalez]
    • Added tox.ini config file for easy local multi-environment testing; changed travis config to use py.test like tox; updated use of pep8 module to pycodestyle. [Pedro Rodrigues]
  • 0.11.0(Jun 30, 2016)

    • Clean-up. [desmaisons_david]

    • Improving performance. [desmaisons_david]

    • Performance Improvement. [desmaisons_david]

    • Fix link to Levenshtein. [Brian J. McGuirk]

    • Fix readme links. [Brian J. McGuirk]

    • Add license to StringMatcher.py. [Jose Diaz-Gonzalez]

      Closes #113

  • 0.10.0(Jun 30, 2016)

  • 0.9.0(Jun 30, 2016)

  • 0.8.2(Jun 30, 2016)

  • 0.8.1(Jun 30, 2016)

  • 0.8.0(Nov 16, 2015)

    • Refer to Levenshtein distance in readme. Closes #88. [Jose Diaz-Gonzalez]

    • Added install step for travis to have pep8 available. [Pedro Rodrigues]

    • Added a pep8 test. The way I add the error 501 to the ignore tuple is probably wrong but from the docs and source code of pep8 I could not find any other way. [Pedro Rodrigues]

      I also went ahead and removed the pep8 call from the release file.

    • Added python 3.5, pypy, and pypy3 to the travis config file. [Pedro Rodrigues]

    • Added another step to the release file to run the tests before releasing. [Pedro Rodrigues]

    • Fixed a few pep8 errors. Added a verification step in the release automation file. This step should probably be somewhere at git level. [Pedro Rodrigues]

    • Pep8. [Pedro Rodrigues]

    • Leaving TODOs in the code was never a good idea. [Pedro Rodrigues]

    • Changed return values to be rounded integers. [Pedro Rodrigues]

    • Added a test with the recovered data file. [Pedro Rodrigues]

    • Recovered titledata.csv. [Pedro Rodrigues]

    • Move extract test methods into the process test. [Shale Craig]

      Somehow, they ended up in the RatioTest, despite asserting that the ProcessTest works.

  • 0.7.0(Oct 2, 2015)

    • Use portable syntax for catching exception on tests. [Luis Madrigal]

    • [Fix] test against correct variable. [Luis Madrigal]

    • Add unit tests for validator decorators. [Luis Madrigal]

    • Move validators to decorator functions. [Luis Madrigal]

      This allows easier composition and IMO makes the functions more readable

    • Fix typo: dictionery -> dictionary. [shale]

    • FizzyWuzzy -> FuzzyWuzzy typo correction. [shale]

    • Add check for gitchangelog. [Jose Diaz-Gonzalez]

  • 0.6.2(Sep 3, 2015)

  • 0.6.1(Sep 3, 2015)

  • 0.6.0(Jul 20, 2015)

    • Added link to a java port. [Andriy Burkov]

    • Patched "name 'unicode' is not defined" python3. [Carlos Garay]

      https://github.com/seatgeek/fuzzywuzzy/issues/80

    • Make process.extract accept {dict, list}-like choices. [Nathan Typanski]

      Previously, process.extract expected lists or dictionaries, and tested this with isinstance() calls. In keeping with the spirit of Python (duck typing and all that), this change enables one to use extract() on any dict-like object for dict-like results, or any list-like object for list-like results.

      So now we can (and, indeed, I've added tests for these uses) call extract() on things like:

      • a generator of strings ("any iterable")
      • a UserDict
      • custom user-made classes that "look like" dicts (or, really, anything with a .items() method that behaves like a dict)
      • plain old lists and dicts

      The behavior is exactly the same for previous use cases of lists-and-dicts.

      This change goes along nicely with PR #68, since those docs suggest dict-like behavior is valid, and this change makes that true.

    • Merge conflict. [Adam Cohen]

    • Improve docs for fuzzywuzzy.process. [Nathan Typanski]

      The documentation for this module was dated and sometimes inaccurate. This overhauls the docs to accurately describe the current module, including detailing optional arguments that were not previously explained - e.g., limit argument to extract().

      This change follows the Google Python Style Guide, which may be found at:

      https://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Comments#Comments

  • 0.5.0(Jul 20, 2015)

    • FIX: 0.4.0 is released, no need to specify 0.3.1 in README. [Josh Warner (Mac)]

    • Fixed a small typo. [Rostislav Semenov]

    • Reset processor and scorer defaults to None with argument checking. [foxxyz]

    • Catch generators without lengths. [Jeremiah Lowin]

    • Fixed python3 issue and deprecated assertion method. [foxxyz]

    • Fixed some docstrings, typos, python3 string method compatibility, some errors that crept in during rebase. [foxxyz]

    • [mod] The lambda in extract is not needed. [Olivier Le Thanh Duong]

      [mod] Pass directly the defaults functions in the args

      [mod] itertools.takewhile() can handle empty list just fine no need to test for it

      [mod] Shorten extractOne by removing double if

      [mod] Use a list comprehension in extract()

      [mod] Autopep8 on process.py

      [doc] Document make_type_consistent

      [mod] bad_chars shortened

      [enh] Move regex compilation outside the method, otherwise we don't get the benefit from it

      [mod] Don't need all the blah just to redefine method from string module

      [mod] Remove unused import

      [mod] Autopep8 on string_processing.py

      [mod] Rewrote asciidammit without recursion to make it more readable

      [mod] Autopep8 on utils.py

      [mod] Remove unused import

      [doc] Add some doc to fuzz.py

      [mod] Move the code to sort string in a separate function

      [doc] Docstrings for WRatio, UWRatio

    • Add note on which package to install. Closes #67. [Jose Diaz-Gonzalez]

  • 0.4.0(Oct 31, 2014)

    • Merge pull request #64 from ojomio/master. [Jose Diaz-Gonzalez]

      In extractBests() and extractOne() use '>=' instead of '>'

    • Merge pull request #62 from ojomio/master. [Jose Diaz-Gonzalez]

      Fixed python3 issue with SequenceMatcher import

  • 0.3.3(Oct 22, 2014)

    • Update release script to make it more generic. [Jose Diaz-Gonzalez]
    • Merge pull request #60 from ojomio/master. [Jose Diaz-Gonzalez] Fixed issue #59 - "partial" parameter for _token_set() is now honored
    • Merge pull request #54 from jlowin/patch-1. [Jose Diaz-Gonzalez] Remove explicit check for lists
  • 0.3.2(Sep 12, 2014)

    • Make release command an executable. [Jose Diaz-Gonzalez]
    • Simplify MANIFEST.in. [Jose Diaz-Gonzalez]
    • Add a release script. [Jose Diaz-Gonzalez]
    • Fix readme codeblock. [Jose Diaz-Gonzalez]
    • Minor formatting. [Jose Diaz-Gonzalez]
    • Update readme with proper installation notes. [Jose Diaz-Gonzalez]
    • Use version from fuzzywuzzy package. [Jose Diaz-Gonzalez]
    • Set version constant in init.py. [Jose Diaz-Gonzalez]
    • Update setup.py. [Jose Diaz-Gonzalez]
    • Rename LICENSE to LICENSE.txt. [Jose Diaz-Gonzalez]
    • Update packaging a bit. [Jose Diaz-Gonzalez]
  • 0.3.0(Aug 24, 2014)

    • Allow choices to be a list or dict
    • Add testing for 3.4
    • Typo updates
    • Update readme, change formatting to RST
    • Fix package requirements
    • PEP8!