15 Repositories
Python deduplication Libraries
Python package for Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.
Videohash is a Python package for detecting near-duplicate videos (Perceptual Vi…
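Comparing two 64-bit hash values of this kind usually comes down to counting the bits in which they differ. The following is a minimal sketch, assuming the hashes are available as plain Python integers. The function names and the 10-bit threshold are illustrative assumptions and are not taken from the package's API.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions where two 64-bit hashes differ."""
    mask = (1 << 64) - 1  # keep only the low 64 bits
    return bin((hash_a ^ hash_b) & mask).count("1")


def is_near_duplicate(hash_a: int, hash_b: int, max_bits: int = 10) -> bool:
    """Treat two hashes as near-duplicates when few bits differ.

    The default threshold is an assumption chosen for illustration.
    """
    return hamming_distance(hash_a, hash_b) <= max_bits
```

A smaller distance means the two inputs are perceptually closer; a suitable threshold depends on the hashing scheme and the data.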
Very efficient backup system based on the git packfile format, providing fast incremental saves and global deduplication
Very efficient backup system based on the git packfile format, providing fast incremental saves and global deduplication (among and within files, including virtual machine images). Current release is 0.31, and the development branch is master. Please post problems or patches to the mailing list for discussion (see the end of the README below).
Imagededup - 😎 Finding duplicate images made easy
imagededup is a Python package that simplifies the task of finding exact and near duplicates in an image collection.
Simple, configuration-driven backup software for servers and workstations
It's your data. Keep it that way. borgmatic is simple, configuration-driven backup software for servers and works…
Deduplicating archiver with compression and authenticated encryption.
BorgBackup (short: Borg) is a deduplicating backup program. Optionally, it supports…
Distributed, blockchain based hashtables middleware for deduplication of file uploads to the cloud
distributed-blockchain-based-secure-file-dedupe: searching is distributed; the block and access list for each upload is unique and is stored in a single…
📧 CLI to deduplicate mails from mail boxes.
Command-line tool to deduplicate mails from a set of boxes. Features: duplicate detection based on cherry…
py-image-dedup is a tool to sort out or remove duplicates within a photo library
py-image-dedup is a tool to sort out or remove duplicates within a photo library. Unlike most other solutions, py-image-dedup intentionally uses an approximate image comparison to also detect duplicates of images that slightly differ in resolution, color or other minor details.
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Deduplication is the task of combining different representations of the same real-world entity.
Deduplication is the task of combining different representations of the same real-world entity. This package implements deduplication using active learning. Active learning allows for rapid training without having to provide a large, manually labelled dataset.
Find duplicate files
dupeGuru is a cross-platform (Linux, OS X, Windows) GUI tool to find duplicate files in a system. It is written mostly in Python 3 and has th…
🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
dedupe is a Python library that uses machine learning to perform fuzzy matching, deduplication and entity resolution quickly on…
UDdup - URLs Deduplication Tool
The tool gets a list of URLs, and removes "duplicate" pages in the sense of URL patterns that are probably repetitive…
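One way to picture deduplication by URL pattern is to reduce each URL to a coarse key and keep only the first URL seen for each key. The sketch below is an assumption-laden illustration, not UDdup's actual algorithm: the helper names `url_pattern` and `dedupe_urls` are hypothetical, and the key (host, path with numeric segments replaced, sorted query parameter names) is one simple choice among many.

```python
from urllib.parse import urlsplit


def url_pattern(url: str) -> tuple:
    """Reduce a URL to a coarse pattern key (illustrative heuristic only)."""
    parts = urlsplit(url)
    # Replace purely numeric path segments with a placeholder.
    segments = tuple(
        "{id}" if seg.isdigit() else seg
        for seg in parts.path.split("/")
        if seg
    )
    # Keep only the query parameter names, ignoring their values.
    params = tuple(sorted(p.split("=", 1)[0] for p in parts.query.split("&") if p))
    return (parts.netloc.lower(), segments, params)


def dedupe_urls(urls):
    """Keep the first URL for each pattern key, preserving input order."""
    seen = set()
    kept = []
    for url in urls:
        key = url_pattern(url)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept
```

With this key, `https://example.com/post/1` and `https://example.com/post/2` collapse to one entry, while `https://example.com/about` is kept separately.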