Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

Overview

The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.

  1. Diff:
    • Compare two blocks of plain text and efficiently return a list of differences.
    • Diff Demo
  2. Match:
    • Given a search string, find its best fuzzy match in a block of plain text. Weighted for both accuracy and location.
    • Match Demo
  3. Patch:
    • Apply a list of patches onto plain text. Use best-effort to apply patch even when the underlying text doesn't match.
    • Patch Demo

Originally built in 2006 to power Google Docs, this library is now available in C++, C#, Dart, Java, JavaScript, Lua, Objective C, and Python.

Reference

Languages

Although each language port of Diff Match Patch uses the same API, there are some language-specific notes.

A standardized speed test tracks the relative performance of diffs in each language.

Algorithms

This library implements Myer's diff algorithm which is generally considered to be the best general-purpose diff. A layer of pre-diff speedups and post-diff cleanups surround the diff algorithm, improving both performance and output quality.

This library also implements a Bitap matching algorithm at the heart of a flexible matching and patching strategy.

Comments
  • [WIP] initial typescript conversion

    [WIP] initial typescript conversion

    Following on from #58, I've kicked off the conversion of diff-match-patch starting with @NeilFraser's starting point as the first commit. I've included the built copy for now, though I may strip that out and rebase it based on feedback.

    One interesting thing between Neil's built version and my built version:

    diff --git a/typescript/built/diff_match_patch.js b/typescript/built/diff_match_patch.js
    index 8fbb030..dafa48a 100644
    --- a/typescript/built/diff_match_patch.js
    +++ b/typescript/built/diff_match_patch.js
    @@ -387,7 +387,7 @@ var diff_match_patch = /** @class */ (function () {
          */
         diff_match_patch.prototype.diff_linesToChars_ = function (text1, text2) {
             var lineArray = []; // e.g. lineArray[4] == 'Hello\n'
    -        var lineHash = {}; // e.g. lineHash['Hello\n'] == 4
    +        var lineHash = Object.create(null); // e.g. lineHash['Hello\n'] == 4
             // '\x00' is a valid character, but various debuggers don't like it.
             // So we'll insert a junk entry to avoid generating a null character.
             lineArray[0] = '';
    

    Not quite sure why typescript goes that route to make an object with a null prototype (that still behaves like an object).

    What works

    • Base tests ✅
    • Speed tests ❌
      • The speed tests use diff_match_patch_uncompressed.js which is not generated (by name) here
    • Typing! ✅

    Next steps

    • [ ] Determine how well the types line up with that of @types/diff-match-patch to ease migration
    • [ ] Make speed tests run
    • [ ] Determine how the uncompressed and compressed versions will be stored (I need direction here)
    opened by rgbkrk 9
  • simplify encodeURI using HttpUtility

    simplify encodeURI using HttpUtility

    This simplifies the encodeURI function, using HttpUtility instead of Uri. Tests pass. I suspect this also to be more performant (although I didn't test this specifically).

    opened by jhgbrt 6
  • dart version have something wrong, please fix it...

    dart version have something wrong, please fix it...

    like "message": "'Diff.==' ('(Diff) → bool') isn't a valid override of 'Object.==' ('(dynamic) →

    "message": "A value of type 'Operation' can't be assigned to a variable of type 'int'.",
    "startColumn": 27,
    "endColumn": 54,
    

    ...

    opened by BetterWorld-Liuser 5
  • C# - Exception when diffing long files (Invalid URI: The Uri string is too long.)

    C# - Exception when diffing long files (Invalid URI: The Uri string is too long.)

    I'm trying to diff several files sometimes up to around half a megabyte in size and with some of them I get a "Invalid URI: The Uri string is too long." exception. This is using the patch_make function.

    opened by XXLuigiMario 5
  • Line Diff causes an exception

    Line Diff causes an exception

    Running the example usage from here on the javascript version:

      const text1="this\n is\n a\n test\n";
      const text2="this\n wasn't\n a\n test\n";
      // Example code "as is":
      var dmp = new diff_match_patch();
      var a = dmp.diff_linesToChars_(text1, text2);
      var lineText1 = a[0];
      var lineText2 = a[1];
      var lineArray = a[2];
      var diffs = dmp.diff_main(lineText1, lineText2, false);
      dmp.diff_charsToLines_(diffs, lineArray);
    

    Causes this exception:

    Uncaught Error: Null input. (diff_main) at diff_match_patch.diff_main (:2:190) at :6:19

    opened by urigoren 5
  • 相同内容对比,Java和OC返回结果不一致!这样的问题该怎样解决?Compared with the same content, Java and OC return results are inconsistent! How can such a problem be solved?

    相同内容对比,Java和OC返回结果不一致!这样的问题该怎样解决?Compared with the same content, Java and OC return results are inconsistent! How can such a problem be solved?

    Compared with the same content, Java and OC return results are inconsistent! How can such a problem be solved?(相同内容对比,Java和OC返回结果不一致!这样的问题该怎样解决? Andios )Java on the left and OC on the right

    opened by xuxh6 4
  • Does not seem to work with big text

    Does not seem to work with big text

    Hi tried to compare two ancient greek codices of the New Testament but it seems to failed since for the Vaticanus and Sinaiticus I only have βιβλοϲγενεϲεωϲ|ιυ||χυ|υιου as common bloc.

    opened by kylak 3
  • Modified the Objective-C implementation to use ARC

    Modified the Objective-C implementation to use ARC

    This change does three things:

    1. Updates the Xcode project to support the latest Xcode (11.5),
    2. Make the Apple recommended settings to the Xcode project, and
    3. (most importantly) Updates the project and the code to use Automatic Reference Counting (ARC).

    Closes #97

    opened by stevenklassen8376 3
  • cleanupSemantic doesnt return anything

    cleanupSemantic doesnt return anything

    The last part of the example doesn't work because cleanupSemantic doesn't return the result.

    I looked at the code and indeed the function doesn't return anything.

    opened by jdruiter 3
  • Port C++ code to use Qt5 and its QStringRef

    Port C++ code to use Qt5 and its QStringRef

    Qt4 is no longer supported by Qt.

    Apart from that, porting to Qt5 should also improve the performance, as it has added QStringRef class and QString::leftRef, QString::midRef and QString::rightRef methods. I see that the code splits QStrings into substrings quite a lot, and it does so by making copies, but with QStringRef you can avoid making copies since QStringRef substrings reference the original QString.

    opened by nurupo 3
  • %0A in output

    %0A in output

    I encountered this in Java (using patch_toText(patch_make(...)), but I can see it on the JavaScript demo page, too, and I saw a past Google Code discussion touch on it for a Python usage, plus some Go code working around it.

    Enter the following into the demo page:

    Old version:

    a a

    New version:

    a a a

    (I entered both with trailing newlines, but the behavior is similar without.)

    Compute Patch gives:

    @@ -1,4 +1,6 @@
     a%0Aa%0A
    +a%0A
    

    I would have expected something more like:

    $ diff -u <(echo a; echo a) <(echo a; echo a; echo a) | tail -n +3
    @@ -1,2 +1,3 @@
     a
     a
    +a
    

    (Hmm, I notice only now that diff-match-match disagrees with diff -u about the number of lines involved. (At least, I think that's what the ,4/,6 and ,2/,3 are?))

    As a workaround, I can replace %0A sequences with the empty string, but as best I can tell, even that will leave various other characters escaped.

    I'm a little confused as to why URL encoding comes into this at all. I would expect for Patch.toString() to avoid all that, instead looking like this:

            text.append(aDiff.text.replaceAll("\n", ""))                                                                                                          
                .append("\n");                    
    

    (It seems strange that I need to remove internal \n characters at all. I haven't dug into it.)

    opened by cpovirk 3
  • diff_cleanupSemantic cleans last newline incorrect

    diff_cleanupSemantic cleans last newline incorrect

    The cleaner methodes like diff_cleanupSemantic not correct the last newline like expected. Sometimes (by multiple line insert/delete) the result grabs a newline character from the line before, and cut his own of. It would be better if complete lines are inserted/deleted. I made my own cleaner to correct this and I can post it here, but it should be done in the diff_cleanup.

    (tested only with the cs code)

    opened by Alisis33 4
  • [C#] Convert Javadoc in C# code to proper triple-slash documentation

    [C#] Convert Javadoc in C# code to proper triple-slash documentation

    I like this library and would like to give back: This PR converts the Javadoc in the C# source code to C# XML comments. With these changes, each class's documentation would be made usable by IDE's.

    Notably, this PR does not affect functionality, compat, etc. in any way, so there is no risk accepting this PR. Aside from the documentation convertion, my IDE auto-formatted the opening curly brace to start on the next line, as is the norm in C#.

    Edit: This PR is easier to review with whitespace ignored: https://github.com/google/diff-match-patch/pull/128/files?w=1

    opened by RenderMichael 1
  • Typescript (.ts) port of the file diff_match_patch_uncompressed.js

    Typescript (.ts) port of the file diff_match_patch_uncompressed.js

    This is useful for projects that use Typescript (e.g. a Visual Studio Code Extension). Tested using the same logic as is present in the Patch Demo. "visualSilicon" has also submitted a CLA for contributions to Google open-source projects.

    opened by ramin-zaghi 3
  • Wiki/Documentation: Suggested function name should be diff_wordsToChars

    Wiki/Documentation: Suggested function name should be diff_wordsToChars

    In the first sentence in the section about a word mode:

    https://github.com/google/diff-match-patch/wiki/Line-or-Word-Diffs#word-mode

    Computing a word-mode diff is exactly the same as the line-mode diff, except you will have to make a copy of diff_linesToChars and call it diff_linesToWords.

    The function should be renamed to "diff_wordsToChars", not "diff_linesToWords".

    Because words will be mapped to single chars, if you do the suggested modification. Analogous to lines that will be mapped to single chars in the original function.

    opened by ztiromoritz 0
  • Request for code review

    Request for code review

    Hi,

    I yesterday ran into a problem that my diff implementation (adopted from the internet) in some cases actually doesn't work reliably.

    Suppose I have a bit stream of zeros and ones, called simply "mine" (99658 bits) and "theirs" (99655 bits). These represent low-level magnetic information of two full revolutions of one and the same track on a 2DD floppy (nominally 100,000 bits).

    Apart from being of different lengths (theirs being by three bits shorter than mine), there is also a difference in the middle - 34,425-th bit is different, thanks to magnetic deterioration. Otherwise, the two bit streams are identical.

    image

    image

    image

    My implementation of the diff algorithm produces the following script that transforms mine into theirs (unlike the numbering in OpenOffice Calc, the numbering of bits is zero-based, hence off by 1). The first two seem to successfully deal with the difference in the middle. The latter two fail to remove the tail three extra bits in mine.

    • insertion of 1 bit from theirs at 34424 into mine at 34424,
    • deletion of 1 bit from mine at 34424,
    • deletion of 1 bit from mine at 99646,
    • deletion of 1 bit from mine at 99650.

    Because I've exhausted all my skills, I would like to ask if anyone could check my implementation of the diff algorithm and eventually fix the bug that's inside. Although I read Mr.Myers's paper "An O(ND) Difference Algorithm and its Variations," I'm not much more clever than before. The diff algorithm is implemented here (circa 100 lines of actual code): https://github.com/tomas-nestorovic/RIDE/blob/master/Main/src/Diff.h

    Edit: Attached mine.txt and theirs.txt with respective bit streams. theirs.txt mine.txt

    MANY THANKS IN ADVANCE FOR ANY RESPONSES! :-)

    opened by tomas-nestorovic 1
Owner
Google
Google ❤️ Open Source
Google
A flexible free and unlimited python tool to translate between different languages in a simple way using multiple translators.

deep-translator Translation for humans A flexible FREE and UNLIMITED tool to translate between different languages in a simple way using multiple tran

Nidhal Baccouri 806 Jan 4, 2023
Implements a polyglot REPL which supports multiple languages and shared meta-object protocol scope between REPLs.

MetaCall Polyglot REPL Description This repository implements a Polyglot REPL which shares the state of the meta-object protocol between the REPLs. Us

MetaCall 10 Dec 28, 2022
vFuzzer is a tool developed for fuzzing buffer overflows, For now, It can be used for fuzzing plain vanilla stack based buffer overflows

vFuzzer vFuzzer is a tool developed for fuzzing buffer overflows, For now, It can be used for fuzzing plain vanilla stack based buffer overflows, The

Vedant Bhalgama 5 Nov 12, 2022
flake8 plugin which forbids match statements (PEP 634)

flake8-match flake8 plugin which forbids match statements (PEP 634)

Anthony Sottile 25 Nov 1, 2022
YourCity is a platform to match people to their prefect city.

YourCity YourCity is a city matching App that matches users to their ideal city. It is a fullstack React App made with a Redux state manager and a bac

Nico G Pierson 6 Sep 25, 2021
Simple script to match riders with drivers.

theBestPooler Simple script to match riders with drivers. It's a greedy, unoptimised search, so no guarantees that it works. It just seems to work (ve

Devansh 1 Nov 22, 2021
Larvamatch - Find your larva or punk match.

LarvaMatch Find your larva or punk match. UI TBD API (not started) The API will allow you to specify a punk by token id to find a larva match, and vic

null 1 Jan 2, 2022
peace-performance (Rust) binding for python. To calculate star ratings and performance points for all osu! gamemodes

peace-performance-python Fast, To calculate star ratings and performance points for all osu! gamemodes peace-performance (Rust) binding for python bas

null 9 Sep 19, 2022
tox-gh is a tox plugin which helps running tox on GitHub Actions with multiple different Python versions on multiple workers in parallel

tox-gh is a tox plugin which helps running tox on GitHub Actions with multiple different Python versions on multiple workers in parallel. This project is inspired by tox-travis.

tox development team 19 Dec 26, 2022
A patch and keygen tools for typora.

A patch and keygen tools for typora.

Mason Shi 1.4k Apr 12, 2022
Translation patch for Hololive ERROR

Translation patch for Hololive ERROR How do I install the patch? Grab the Translation.zip file for the latest version from the releases page, and unzi

null 18 Jul 20, 2022
EasyBuild is a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way.

EasyBuild is a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way.

EasyBuild community 87 Dec 27, 2022
Hello World in different languages !

Hello World And some Examples in different Programming Languages This repository contains a big list of programming languages and some examples for th

AmirHossein Mohammadi 131 Dec 26, 2022
Multtable is a collection of multiplication table generators in various languages.

Multtable Multtable is a collection of multiplication table generators in various languages. This project was created as a joke based on one of my bro

pollen__ 7 Mar 5, 2022
My collection of mini-projects in various languages

Mini-Projects My collection of mini-projects in various languages About: This repository consists of a number of small projects. Most of these "mini-p

Siddhant Attavar 1 Jul 11, 2022
Similarity checking of sign languages

Similarity checking of sign languages This repository checks for similarity betw

Tonni Das Jui 1 May 13, 2022
Calculatrix is a project where I'll create plenty of calculators in a lot of differents languages

Calculatrix What is Calculatrix ? Calculatrix is a project where I'll create plenty of calculators in a lot of differents languages. I know this sound

null 1 Jun 14, 2022
Print 'text color' and 'text format' on Term with Python

term-printer Print 'text color' and 'text format' on Term with Python ※ It may not work depending on the OS and shell used. PIP $ pip install term-pri

ななといつ 10 Nov 12, 2022