Find duplicate files

Overview

dupeGuru

dupeGuru is a cross-platform (Linux, OS X, Windows) GUI tool to find duplicate files in a system. It is written mostly in Python 3 and has the peculiarity of using multiple GUI toolkits, all using the same core Python code. On OS X, the UI layer is written in Objective-C and uses Cocoa. On Linux, it is written in Python and uses Qt5.

The Cocoa UI of dupeGuru is hosted in a separate repo: https://github.com/arsenetar/dupeguru-cocoa

Current status

The project is still looking for additional help, especially with:

  • OS X maintenance: reproducing bugs in the Cocoa version, building the package with the Cocoa UI.
  • Linux maintenance: reproducing bugs, maintaining the PPA repository and the Debian package.
  • Translations: updating missing strings in the Transifex project at https://www.transifex.com/voltaicideas/dupeguru-1
  • Documentation: keeping it up-to-date.

Contents of this folder

This folder contains the source for dupeGuru. Its documentation is in help, but is also available online in its built form. Here's how this source tree is organized:

  • core: Contains the core logic code for dupeGuru. It's Python code.
  • qt: UI code for the Qt toolkit. It's written in Python and uses PyQt.
  • images: Images used by the different UI codebases.
  • pkg: Skeleton files required to create the different packages.
  • help: Help document, written for Sphinx.
  • locale: .po files for localization.
  • hscommon: A collection of helpers used across HS applications.
  • qtlib: A collection of helpers used across Qt UI codebases of HS applications.

How to build dupeGuru from source

Windows & macOS specific additional instructions

For Windows instructions, see the Windows Instructions.

For macOS instructions (Qt version), see the macOS Instructions.

Prerequisites

System Setup

When building in a Linux-based environment, the following system packages or their equivalents are needed:

  • python3-pyqt5
  • pyqt5-dev-tools (on some systems, see note)
  • python3-wheel (for hsaudiotag3k)
  • python3-venv (only if using a virtual environment)
  • python3-dev
  • build-essential

Note: On some Linux systems, pyrcc5 is not put on the path when python3-pyqt5 is installed, which causes issues with the resource files (and icons). These systems should have a corresponding pyqt5-dev-tools package, which should also be installed. The presence of pyrcc5 can be checked with which pyrcc5. Debian-based systems need the extra package; Arch does not.
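
As a rough illustration of the check described in the note, the following Python sketch looks for pyrcc5 on the PATH using only the standard library. The helper name find_tool and the printed messages are illustrative assumptions and are not part of dupeGuru's build scripts.

```python
import shutil
from typing import Optional


def find_tool(name: str = "pyrcc5") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("pyrcc5")
    if path is None:
        # On Debian-based systems the note suggests installing pyqt5-dev-tools.
        print("pyrcc5 was not found on PATH")
    else:
        print(f"pyrcc5 found at {path}")
```

This does the same job as running which pyrcc5 in a shell, and can be handy inside a script that runs before the build.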

To create packages the following are also needed:

  • python3-setuptools
  • debhelper

Building with Make

dupeGuru comes with a makefile that can be used to build and run:

$ make && make run

Building without Make

$ cd <dupeGuru directory>
$ python3 -m venv --system-site-packages ./env
$ source ./env/bin/activate
$ pip install -r requirements.txt
$ python build.py
$ python run.py

Generating Debian/Ubuntu package

To generate packages, the extra requirements in requirements-extra.txt must be installed. The steps are as follows:

$ cd <dupeGuru directory>
$ python3 -m venv --system-site-packages ./env
$ source ./env/bin/activate
$ pip install -r requirements.txt -r requirements-extra.txt
$ python build.py --clean
$ python package.py

This can be made a one-liner (once in the directory) as:

$ bash -c "python3 -m venv --system-site-packages env && source env/bin/activate && pip install -r requirements.txt -r requirements-extra.txt && python build.py --clean && python package.py"

Running tests

The complete test suite is run with Tox 1.7+. If you have it installed system-wide, you don't even need to set up a virtualenv. Just cd into the root project folder and run tox.

If you don't have Tox system-wide, install it in your virtualenv with pip install tox and then run tox.

You can also run the automated tests without Tox. The extra requirements for running tests are in requirements-extra.txt, so you can run pip install -r requirements-extra.txt inside your virtualenv and then py.test core hscommon.

Comments
  • Add new contributor

    dupeGuru has currently only one maintainer, me. This is a dangerous situation that needs to be corrected.

    The goal is to eventually have another active maintainer, but before we can get there, the project needs more contributors. It is very much lacking on that side right now.

    Whatever your skills, if you are remotely interested in being a contributor, I'm interested in mentoring you. I've been saying so on the Contribute page for a while, but now I'm thinking it might be a better idea to advertise the need for contributors in a ticket. This way, it's clear whether someone has answered the call or not.

    So, if you would like to start contributing to dupeGuru but would like some guidance/mentorship, simply add a comment here and we'll get started.

    bug beginner 
    opened by hsoft 56
  • Package 4.0.4

    • [x] Update Version to 4.0.4
    • [x] Update Changelog
    • [x] Package Windows 64 bit
    • [x] Package Windows 32 bit
    • [ ] Package OSX Qt (experimental)
    • [ ] Package OSX Cocoa
    • [x] ~~Package .deb~~
    • [x] ~~Package .rpm (maybe)~~
    • [x] PPA Ubuntu LTS
    • [x] PPA Ubuntu Latest (LTS packages should work)
    • [x] Package Arch Linux
    • [x] Commit any changes needed to complete packing
    • [x] Tag Repository
    • [x] Make Github Release
    • [ ] Update Website
    opened by arsenetar 46
  • Package 4.1.0

    • [x] Merge remaining PRs #733 & #705
    • [x] Update Version to 4.1.0 #733
    • [x] Update Changelog #733
    • [x] Package Windows 64 bit
    • [x] Package Windows 32 bit
    • [ ] Package OSX Qt (experimental)
    • [ ] Package OSX Cocoa
    • [x] Package .deb
    • [x] Notify Ubuntu PPA maintainers
    • [x] Notify Arch Linux maintainers
    • [x] Make Github Release
    • [ ] Update website links for new packages.
    opened by arsenetar 37
  • dupeGuru PPA is outdated

    Hello, I'm not sure if this is the right place to raise this, but are there any plans to provide updated versions to the Ubuntu PPA?

    The last build is from 2017-08-25; the current Ubuntu release is 17.10 (Artful Aardvark), released in October 2017.

    I thought the building of the DEB files for the PPA is an automated process. So could this be checked and fixed? Unfortunately I'm lacking the skills to be more helpful.

    Linux 
    opened by seb-1204 33
  • Package 4.1.1

    • [x] Verify any other issues to fix before tag
    • [x] Update Version to 4.1.1
    • [x] Update Changelog
    • [x] Package Windows 64 bit
    • [x] Package Windows 32 bit
    • [x] Package OSX Qt (experimental) Packaging has been fixed
    • [x] Package OSX Cocoa
    • [x] Package .deb (x64 only, also debian source archive)
    • [x] Notify Ubuntu PPA maintainers
    • [x] Notify Arch Linux maintainers
    • [x] Make Github Release
    • [x] Update website links for new packages.
    opened by arsenetar 27
  • Language problems in 4.1.0

    First of all, thank you for this great project and all the awesome work!

    I just installed 4.1.0 on Windows, and the language only defaults to the system language. For example, if my system is in English, the interface is displayed in English. Changing it in the display settings seems to do nothing; even if I reboot my PC, the displayed language is still English. The same applies when I installed it on a Chinese system: the language defaults to Chinese, and there is simply no way to change it.

    Another issue is that on Transifex, I saw that the Chinese translation is 98% finished with only 4 strings left to be translated, yet in reality more than half of the interface is not translated.

    • OS: Windows 10 1903 x64
    • Version: 4.1.0

    bug Windows 
    opened by terrytw 19
  • Initial Update of Windows Packaging

    I tested the generated installers and executable files on a couple windows 10 machines and they seem to work fine. There are probably a few areas for improvement, namely:

    • Installing both x86 and x64 versions is not completely supported as the installer script is written right now. I think I know how to get this working without it being overly complicated.
    • The path to makensis is currently hard-coded in package.py; adding the ability to pass the path would be better.

    Right now I think as long as the program itself works and the packaging works well it would probably be good to get it out to see if there are additional issues.

    Ref #393

    opened by arsenetar 17
  • Post-scan re-prioritization

    Delta value + Dupes only mode is powerful, but not for all cases. For example, there's no way to re-prioritize cases like this one. A new tool would be needed with more powerful options, such as kind-based prioritization, folder-based prioritization, and so on. I'm not sure yet of the form it should take.

    enhancement 
    opened by hsoft 16
  • Dupeguru 4.1.1 does not start on Ubuntu 21.04 with Wayland

    Describe the bug

    Hi, I finally succeeded in installing the new Dupeguru 4.1.1 version on Ubuntu 21.04; I just had to add python3-pyqt5.

    Installation was fine, but when I launch Dupeguru, it doesn't start, and the terminal shows the following error message:

    $ dupeguru
    Warning: Ignoring XDG_SESSION_TYPE=wayland on Gnome. Use QT_QPA_PLATFORM=wayland to run on Wayland anyway.
    Traceback (most recent call last):
      File "/usr/bin/dupeguru", line 89, in <module>
        sys.exit(main())
      File "/usr/bin/dupeguru", line 72, in main
        from qt.app import DupeGuru
      File "/usr/share/dupeguru/qt/app.py", line 22, in <module>
        from core.app import AppMode, DupeGuru as DupeGuruModel
      File "/usr/share/dupeguru/core/app.py", line 24, in <module>
        from . import se, me, pe
      File "/usr/share/dupeguru/core/pe/__init__.py", line 1, in <module>
        from . import (  # noqa
      File "/usr/share/dupeguru/core/pe/block.py", line 9, in <module>
        from ._block import NoBlocksError, DifferentBlockCountError, avgdiff, getblocks2  # NOQA
    ModuleNotFoundError: No module named 'core.pe._block'

    Desktop:

    • OS: Ubuntu 21.04 / Gnome 3.38.5 / Wayland
    • Version 4.1.1

    Thanks !

    bug 
    opened by Valeryan24 15
  • Crash on startup on Ubuntu 20.04

    Hi, I just installed DupeGuru with the Ubuntu ppa - https://launchpad.net/~dupeguru/+archive/ubuntu/ppa - on the development version 20.04 LTS.

    But the program doesn't launch. Here is the error message:

    Warning: Ignoring XDG_SESSION_TYPE=wayland on Gnome. Use QT_QPA_PLATFORM=wayland to run on Wayland anyway.
    Gdk-Message: 18:41:40.726: Window 0x209b220 is a temporary window without parent, application will not be able to position it on screen.
    Gdk-Message: 18:41:40.726: Window 0x209b220 is a temporary window without parent, application will not be able to position it on screen.
    Gdk-Message: 18:41:40.726: Window 0x209b220 is a temporary window without parent, application will not be able to position it on screen.
    Gdk-Message: 18:41:40.726: Window 0x209b220 is a temporary window without parent, application will not be able to position it on screen.
    Traceback (most recent call last):
      File "/usr/bin/dupeguru", line 81, in <module>
        sys.exit(main())
      File "/usr/bin/dupeguru", line 66, in main
        from qt.app import DupeGuru
      File "/usr/share/dupeguru/qt/app.py", line 22, in <module>
        from core.app import AppMode, DupeGuru as DupeGuruModel
      File "/usr/share/dupeguru/core/app.py", line 24, in <module>
        from . import se, me, pe
      File "/usr/share/dupeguru/core/pe/__init__.py", line 1, in <module>
        from . import block, cache, exif, iphoto_plist, matchblock, matchexif, photo, prioritize, result_table, scanner  # noqa
      File "/usr/share/dupeguru/core/pe/block.py", line 9, in <module>
        from ._block import NoBlocksError, DifferentBlockCountError, avgdiff, getblocks2  # NOQA
    ModuleNotFoundError: No module named 'core.pe._block'

    https://framapic.org/V0Mwt71yWUPq/nKkKQ0meHsr5.png https://framapic.org/lYvGBlydaTrK/FrI6fXBeRpOe.png

    Thanks in advance for your help !

    bug Linux 
    opened by Valeryan24 15
  • Support for ubuntu 18.04

    Running dupeguru xenial distribution on 18.04 bionic beaver results in ModuleNotFoundError: No module named 'core.pe._block'

    The fix is to relink the proper libraries:

    sudo ln /usr/share/dupeguru/core/pe/_cache.cpython-35m-x86_64-linux-gnu.so /usr/share/dupeguru/core/pe/_cache.cpython-36m-x86_64-linux-gnu.so
    sudo ln /usr/share/dupeguru/core/pe/_block.cpython-35m-x86_64-linux-gnu.so /usr/share/dupeguru/core/pe/_block.cpython-36m-x86_64-linux-gnu.so
    sudo ln /usr/share/dupeguru/qt/pe/_block_qt.cpython-35m-x86_64-linux-gnu.so /usr/share/dupeguru/qt/pe/_block_qt.cpython-36m-x86_64-linux-gnu.so
    

    Could you add this fix to the distribution for 18.04 bionic beaver?

    opened by alexivkin 14
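
The ModuleNotFoundError in the reports above is consistent with how CPython looks up compiled extension modules: it only tries file names built from the module name plus one of the suffixes accepted by the running interpreter. The sketch below lists those candidate names. The helper names candidate_filenames and is_loadable_name are made up for illustration, and the sketch only covers the file-name lookup, not whether a binary built for another Python version is actually compatible.

```python
import importlib.machinery


def candidate_filenames(module_name):
    """File names the running interpreter would try for a compiled extension module."""
    return [module_name + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES]


def is_loadable_name(module_name, filename):
    """True if `filename` is one of the names this interpreter looks for."""
    return filename in candidate_filenames(module_name)


if __name__ == "__main__":
    # A file built for Python 3.5 keeps the 3.5 tag in its name, so a
    # different interpreter version will normally not look for it.
    old_name = "_block.cpython-35m-x86_64-linux-gnu.so"
    print("Names tried for _block:", candidate_filenames("_block"))
    print("Would", old_name, "be found?", is_loadable_name("_block", old_name))
```

Creating a link under the name the newer interpreter expects, as in the commands above, makes the file discoverable by that lookup.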
  • feat: Remove shelve picture cache

    • Remove shelve picture cache as it has had a fair number of historical issues. Original issue for which it was added should be long resolved. Additionally this allows additional consolidation of the various cache code and potentially dbs in the future.
    • Remove all related preferences and related code for changing cache backend between sqlite and shelve.
    opened by arsenetar 1
  • How to make sure program reads content of the file?

    I have 1.2 TB of data (photos and videos), out of which DupeGuru identified 400 GB of duplicates.

    My concern is that it finished analysis very fast (within one minute), and I didn't see much disk reading activity in Windows Task Manager. I would expect DupeGuru to spend significant time (30mins?) to check content of 400GB of duplicate files to calculate hashes. I have "partial hash for large files" option disabled.

    How to explain such behavior and how to make sure DupeGuru is actually checking files content and not just sizes? OS: Windows 11, filesystem: NTFS, disk: SSD NVME.

    bug 
    opened by rikuiki 5
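
For readers with the same question, here is a generic sketch of a common content-based approach: group files by size first, then hash only the files that share a size with at least one other file. This is an assumption-laden illustration of the general technique, not dupeGuru's actual scanning code; the function names, the SHA-256 choice, and the chunk size are all made up for the example. It shows one way to confirm that files in a group have identical content, and why a tool using a similar strategy may read far less data than the total size of the collection.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash the full content of a file in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return (duplicate groups, number of files actually hashed)."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    hashed = 0
    by_content = defaultdict(list)
    for size, group in by_size.items():
        if len(group) < 2:
            continue  # a unique size cannot have a duplicate, so skip reading it
        for path in group:
            by_content[(size, file_digest(path))].append(path)
            hashed += 1

    groups = [sorted(group) for group in by_content.values() if len(group) > 1]
    return groups, hashed
```

Running file_digest by hand on two files from the same reported group is a simple independent check that their content really matches.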
  • Behavior of "keep selection preference"

    Is your feature request related to a problem? Please describe.

    I am new to using dupeguru. I just installed the dmg on an M1 running OSX 12.6.2.

    When there are multiple audio files with different bitrates, does the "keep selection preference" prefer keeping the higher-quality audio file?

    I could only find high level documentation. If there is documentation that I missed that discusses these settings please refer me to it.

    Thank you in advance.

    opened by noahwallach 0
  • Error when scanning entire mac desktop as reference, and a couple of folders within the desktop as normal, and a few folders excluded

    Application Identifier: com.hardcoded-software.dupeguru
    Application Version: 4.0.3
    Mac OS X Version: Version 10.16 (Build 21G5046c)

    Traceback (most recent call last):
      File "build/dupeGuru.app/Contents/Resources/py/cocoa/inter.py", line 259, in pulse
      File "build/dupeGuru.app/Contents/Resources/py/hscommon/gui/progress_window.py", line 101, in pulse
      File "build/dupeGuru.app/Contents/Resources/py/core/app.py", line 323, in _job_error
      File "build/dupeGuru.app/Contents/Resources/py/hscommon/jobprogress/performer.py", line 43, in _async_run
      File "build/dupeGuru.app/Contents/Resources/py/core/app.py", line 780, in do
      File "build/dupeGuru.app/Contents/Resources/py/core/scanner.py", line 137, in get_dupe_groups
      File "build/dupeGuru.app/Contents/Resources/py/core/pe/scanner.py", line 31, in _getmatches
      File "build/dupeGuru.app/Contents/Resources/py/core/pe/matchblock.py", line 167, in getmatches
      File "build/dupeGuru.app/Contents/Resources/py/core/pe/matchblock.py", line 65, in prepare_pictures
      File "build/dupeGuru.app/Contents/Resources/py/core/pe/cache_shelve.py", line 129, in purge_outdated
      File "build/dupeGuru.app/Contents/Resources/py/core/pe/cache_shelve.py", line 47, in __delitem__
      File "build/dupeGuru.app/Contents/Resources/py/shelve.py", line 128, in __delitem__
    KeyError: b'id:16042'

    bug 
    opened by matttrv 0
  • Running Multiple Instances of DupeGuru in 1 computer

    Application Name: dupeGuru
    Version: 4.3.1
    Python: 3.8.13
    Operating System: Windows-10-10.0.17763-SP0

    Traceback (most recent call last):
      File "hscommon\gui\progress_window.py", line 111, in pulse
      File "core\app.py", line 300, in _job_completed
      File "core\fs.py", line 176, in commit
    sqlite3.OperationalError: database is locked

    opened by LumarMotta 1
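
The "database is locked" error in the report above is the message SQLite gives when one connection tries to write while another holds the write lock on the same database file. The sketch below reproduces that error class with two connections in a single process. It is only an illustration of SQLite's locking behavior under stated assumptions (a throwaway database, a hypothetical table t, and timeout=0 so the second writer fails immediately instead of waiting); it does not reproduce dupeGuru's own code path.

```python
import os
import sqlite3
import tempfile


def second_writer_error(db_path):
    """Return the error message a second writer gets while a first writer holds the lock."""
    # isolation_level=None means transactions are controlled explicitly below.
    first = sqlite3.connect(db_path, isolation_level=None)
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        first.execute("BEGIN IMMEDIATE")  # first connection takes the write lock
        first.execute("INSERT INTO t VALUES (1)")
        try:
            second.execute("BEGIN IMMEDIATE")  # second writer cannot get the lock
        except sqlite3.OperationalError as exc:
            return str(exc)
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(second_writer_error(os.path.join(folder, "demo.db")))
```

With the default timeout, the second connection would instead wait for a few seconds before giving up, which is why the error can appear only occasionally when two processes share one database file.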
Releases(4.3.1)
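Some Boring Research About Products Recognition, Duplicate Img Detection, Img Stitch, OCR

Products Recognition Introduction: Product recognition focuses on identifying the product information in shelf images within complex retail store scenarios. Main components: duplicate image detection [progress 4/10], image stitching [progress 0/10], object detection [progress 0/10], product recognition [progress 1/10], OCR [progress 1/10].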

zhenjieWang 18 Jan 27, 2022
Simple python code to fix your combo list by removing any text after a separator or removing duplicate combos

Combo List Fixer A simple python code to fix your combo list by removing any text after a separator or removing duplicate combos Removing any text aft

Hamidreza Dehghan 3 Dec 5, 2022
Near-Duplicate Video Retrieval with Deep Metric Learning

Near-Duplicate Video Retrieval with Deep Metric Learning This repository contains the Tensorflow implementation of the paper Near-Duplicate Video Retr

null 2 Jan 24, 2022
Python Sreamlit Duplicate Records Finder Remover

Python-Sreamlit-Duplicate-Records-Finder-Remover Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom w

RONALD KANYEPI 1 Jan 21, 2022
Automatically move or copy files based on metadata associated with the files. For example, file your photos based on EXIF metadata or use MP3 tags to file your music files.

Automatically move or copy files based on metadata associated with the files. For example, file your photos based on EXIF metadata or use MP3 tags to file your music files.

Rhet Turnbull 14 Nov 2, 2022
Dragon Age: Origins toolset to extract/build .erf files, patch language-specific .dlg files, and view the contents of files in the ERF or GFF format

DAOTools This is a set of tools for Dragon Age: Origins modding. It can patch the text lines of .dlg files, extract and build an .erf file, and view t

null 8 Dec 6, 2022
Analyse a forensic target (such as a directory) to find and report files found and not found from CIRCL hashlookup public service

Analyse a forensic target (such as a directory) to find and report files found and not found from CIRCL hashlookup public service. This tool can help a digital forensic investigator to know the context, origin of specific files during a digital forensic investigation.

hashlookup 96 Dec 20, 2022
This tool analyzes the json files generated by stream-lnd-htlcs to find hidden channel demand.

analyze_lnd_htlc Introduction Rebalancing channels is an important part of running a Lightning Network node. While it would be great if all channels c

Marimox 4 Dec 8, 2022
Find unused resource keys in properties files in a Salesforce Commerce Cloud project and get rid of them.

Find Unused Resource Keys Find unused resource keys in properties files in a Salesforce Commerce Cloud project and get rid of them. It looks through a

Noël 5 Jan 8, 2022
Find potentially sensitive files

find_files Find potentially sensitive files This script searchs for potentially sensitive files based off of file name or string contained in the file

null 4 Aug 20, 2022
Find vulnerable Log4j2 versions on disk and also inside Java Archive Files (Log4Shell CVE-2021-44228)

log4j-finder A Python3 script to scan the filesystem to find Log4j2 that is vulnerable to Log4Shell (CVE-2021-44228) It scans recursively both on disk

Fox-IT 431 Dec 22, 2022
A simple CLI application helps you to find giant files that are eating up your system storage

Large file finder Sometimes it's very hard to find if some giant files are eating up your system storage. We might need to hunt those down. This simpl

Rahul Baruri 5 Nov 18, 2022
A simple CLI based any Download Tool, that find files and let you stream or download thorugh WebTorrent CLI or Aria or any command tool

Privateer A simple CLI based any Download Tool, that find files and let you stream or download thorugh WebTorrent CLI or Aria or any command tool How

Shreyash Chavan 2 Apr 4, 2022
Library to create spreadsheet files compatible with MS Excel 97/2000/XP/2003 XLS files, on any platform.

xlwt This is a library for developers to use to generate spreadsheet files compatible with Microsoft Excel versions 95 to 2003. The package itself is

null 1k Dec 24, 2022
Organize Django settings into multiple files and directories. Easily override and modify settings. Use wildcards and optional settings files.

Organize Django settings into multiple files and directories. Easily override and modify settings. Use wildcards in settings file paths and mark setti

Nikita Sobolev 940 Jan 3, 2023
Bot simply search for the files from provided channel according to given query and gives link to those files as buttons!

Auto Filter Bot ㅤㅤㅤㅤㅤㅤㅤ ㅤㅤㅤㅤㅤㅤㅤ You can call this as an Auto Filter Bot if you like :D Bot simply search for the files from provided channel according

TroJanzHEX 89 Nov 23, 2022
Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Department for International Trade 206 Jan 2, 2023