A basic duplicate image detection service using perceptual image hash functions and nearest neighbor search, implemented using faiss, fastapi, and imagehash

Matthew Podolak

Last update: Nov 11, 2022

Related tags

Overview

Duplicate Image Detection

Getting Started

Install dependencies pip install -r requirements.txt
Run service python main.py

Testing

Test with pytest

How it Works

This system uses a perceptual hashing function, similar to Apple's CSAM Detection. Instead of generating image hashes using NeuralHash, it uses a difference hash (dHash), which is simpler and less computationally intensive as it doesn't require neural networks. Since we don't have the same privacy constraints as Apple, we will be using nearest neighbor searches to identify duplicate images.

Difference Hash

dHash is a perceptual hashing function that produces hash values that are resilient to image scaling, as well as changes in color, brightness, and aspect ratio [1]. There are 4 main steps for creating a difference hash for an image:

Convert to greyscale*
Resize image to (hash_size+1, hash_size)
Calculate horizontal gradient, reducing image size to (hash_size, hash_size)
Assign bits based on horizontal gradient values

*We convert the image to greyscale before resizing for optimal performance

Nearest Neighbors

Image hashes that we want to check for duplicates against will be stored in a binary index for fast and efficient nearest neighbor searches. We will use Hamming distance as a metric to determine the similarity between image hashes, for dHash, distances less than 10 (96.09% similarity) likely indicate similar/duplicate images [1].

References

[1] https://www.hackerfactor.com/blog/?/archives/529-Kind-of-Like-That.html

You might also like...

Project page of the paper 'Analyzing Perception-Distortion Tradeoff using Enhanced Perceptual Super-resolution Network' (ECCVW 2018)

EPSR (Enhanced Perceptual Super-resolution Network) paper This repo provides the test code, pretrained models, and results on benchmark datasets of ou

78 Nov 19, 2022

Near-Duplicate Video Retrieval with Deep Metric Learning

Near-Duplicate Video Retrieval with Deep Metric Learning This repository contains the Tensorflow implementation of the paper Near-Duplicate Video Retr

2 Jan 24, 2022

Evaluation Pipeline for our ECCV2020: Journey Towards Tiny Perceptual Super-Resolution.

Journey Towards Tiny Perceptual Super-Resolution Test code for our ECCV2020 paper: https://arxiv.org/abs/2007.04356 Our x4 upscaling pre-trained model

6 Mar 30, 2022

Source code for our paper "Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash"

Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash Abstract: Apple recently revealed its deep perceptual hashing system NeuralHash to

11 Dec 3, 2022

a basic code repository for basic task in CV(classification,detection,segmentation)

basic_cv a basic code repository for basic task in CV(classification,detection,segmentation,tracking) classification generate dataset train predict de

1 Oct 15, 2021

Implemented fully documented Particle Swarm Optimization algorithm (basic model with few advanced features) using Python programming language

Implemented fully documented Particle Swarm Optimization (PSO) algorithm in Python which includes a basic model along with few advanced features such as updating inertia weight, cognitive, social learning coefficients and maximum velocity of the particle.

9 Nov 29, 2022

Code for Subgraph Federated Learning with Missing Neighbor Generation (NeurIPS 2021)

To run the code Unzip the package to your local directory; Run 'pip install -r requirements.txt' to download required packages; Open file ~/nips_code/

32 Dec 26, 2022

A series of convenience functions to make basic image processing operations such as translation, rotation, resizing, skeletonization, and displaying Matplotlib images easier with OpenCV and Python.

imutils A series of convenience functions to make basic image processing functions such as translation, rotation, resizing, skeletonization, and displ

4.3k Jan 8, 2023

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Annoy Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given quer

10.6k Jan 4, 2023

A basic duplicate image detection service using perceptual image hash functions and nearest neighbor search, implemented using faiss, fastapi, and imagehash

Related tags

Overview

Duplicate Image Detection

Getting Started

Testing

How it Works

Difference Hash

Nearest Neighbors

References

You might also like...

Project page of the paper 'Analyzing Perception-Distortion Tradeoff using Enhanced Perceptual Super-resolution Network' (ECCVW 2018)

Near-Duplicate Video Retrieval with Deep Metric Learning

Evaluation Pipeline for our ECCV2020: Journey Towards Tiny Perceptual Super-Resolution.

Source code for our paper "Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash"

a basic code repository for basic task in CV(classification,detection,segmentation)

Implemented fully documented Particle Swarm Optimization algorithm (basic model with few advanced features) using Python programming language

Code for Subgraph Federated Learning with Missing Neighbor Generation (NeurIPS 2021)

A series of convenience functions to make basic image processing operations such as translation, rotation, resizing, skeletonization, and displaying Matplotlib images easier with OpenCV and Python.

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Owner

Matthew Podolak

Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning

Pytorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021)

K-Nearest Neighbor in Pytorch

Personal implementation of paper "Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval"

GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors

Weighted K Nearest Neighbors (kNN) algorithm implemented on python from scratch.

A library for building and serving multi-node distributed faiss indices.

Deep Image Search is an AI-based image search engine that includes deep transfor learning features Extraction and tree-based vectorized search.

From Fidelity to Perceptual Quality: A Semi-Supervised Approach for Low-Light Image Enhancement (CVPR'2020)