Mesi
Mesi is a tool for measuring many-to-many similarity between long-form documents, such as Python source code or technical writing. The output helps identify which files in a collection are most similar to each other.
Installation
Python 3.9+ and pipx are recommended, although Python 3.6+ and/or pip will also work.
pipx install mesi
If you'd like to test out Mesi before installing it, use the remote execution feature of pipx, which will temporarily download Mesi and run it in an isolated virtual environment.
pipx run mesi --help
Usage
For a directory structure that looks like:
lab-one
├── StudentOne
│ ├── pyproject.toml
│ ├── deliverables
│ │ └── python_program.py
│ └── README.md
├── StudentTwo
│ ├── pyproject.toml
│ ├── deliverables
│ │ └── python_program.py
│ └── README.md
│
where similarity should be measured between each student's deliverables/python_program.py file, run the command:
mesi lab-one/*/deliverables/python_program.py
A lower distance in the produced table equates to a higher degree of similarity.
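To make the many-to-many comparison concrete, here is a minimal sketch of the idea in plain Python. It is not Mesi's actual implementation: the `levenshtein` and `pairwise_distances` helpers and the sample documents are hypothetical, written only to show how every pair of files can be scored and ranked so that the most similar pair appears first.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def pairwise_distances(documents: dict[str, str]) -> list[tuple[str, str, int]]:
    """Score every unordered pair of documents, smallest distance first."""
    rows = [
        (left, right, levenshtein(documents[left], documents[right]))
        for left, right in combinations(sorted(documents), 2)
    ]
    return sorted(rows, key=lambda row: row[2])


# Hypothetical stand-ins for the contents of each student's file.
documents = {
    "StudentOne": "print('hello world')",
    "StudentTwo": "print('hello, world')",
    "StudentThree": "import sys",
}

if __name__ == "__main__":
    for left, right, distance in pairwise_distances(documents):
        print(f"{left:<14}{right:<14}{distance}")
```

The first row of the result is the pair with the lowest distance, which is the pair most worth inspecting by hand.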
See the help menu (mesi --help) for additional options and configuration.
Algorithms
There are many algorithms to choose from when comparing string similarity! Mesi implements all the algorithms provided by TextDistance. In general, levenshtein is never a bad choice, which is why it is the default.
Bugs/Requests
Please use the GitHub issue tracker to submit bugs or request new features, options, or algorithms.
Dependencies
Mesi uses two primary dependencies for text similarity calculation: polyleven and TextDistance. Polyleven is the default, as its dedicated implementation of Levenshtein distance can be faster in most situations. However, if a different edit distance algorithm is requested, TextDistance's implementation is used instead.
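The split between the two libraries could look roughly like the sketch below. This is an assumption about the shape of the dispatch, not a copy of Mesi's code: `choose_backend` and `resolve_distance` are hypothetical names. The only library calls used are `polyleven.levenshtein` and the callable algorithm objects that TextDistance exposes at module level (for example `textdistance.hamming`), and both packages are imported lazily so the sketch loads even when they are not installed.

```python
import importlib
from typing import Callable


def choose_backend(algorithm: str = "levenshtein") -> str:
    """Name the library that would handle the requested algorithm."""
    # Assumption: only plain Levenshtein takes the polyleven fast path.
    return "polyleven" if algorithm == "levenshtein" else "textdistance"


def resolve_distance(algorithm: str = "levenshtein") -> Callable[[str, str], float]:
    """Return a callable that scores two strings with the chosen algorithm."""
    if choose_backend(algorithm) == "polyleven":
        # polyleven exposes a single function: levenshtein(a, b).
        return importlib.import_module("polyleven").levenshtein
    # TextDistance exposes each algorithm as a callable module attribute.
    textdistance = importlib.import_module("textdistance")
    return getattr(textdistance, algorithm)
```

With both packages installed, `resolve_distance("hamming")("test", "text")` would be served by TextDistance, while the default call goes through polyleven.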
License
Distributed under the terms of the GPL v3 license, mesi is free and open source software.