Find target hash collisions for Apple's NeuralHash perceptual hash function.💣

Overview

neural-hash-collider

Find target hash collisions for Apple's NeuralHash perceptual hash function.

For example, starting from a picture of this cat, we can find an adversarial image that has the same hash as the picture of the dog in this post:

python collide.py --image cat.jpg --target 59a34eabe31910abfb06f308

Cat image with NeuralHash 59a34eabe31910abfb06f308 Dog image with NeuralHash 59a34eabe31910abfb06f308

We can confirm the hash collision using nnhash.py from AsuharietYgvar/AppleNeuralHash2ONNX:

$ python nnhash.py dog.png
59a34eabe31910abfb06f308
$ python nnhash.py adv.png
59a34eabe31910abfb06f308

How it works

NeuralHash is a perceptual hash function that uses a neural network. Images are resized to 360x360 and passed through a neural network to produce a 128-dimensional feature vector. Then, the vector is projected onto R^96 using a 128x96 "seed" matrix. Finally, to produce a 96-bit hash, the 96-dimensional vector is thresholded: negative entries turn into a 0 bit, and non-negative entries turn into a 1 bit.

This entire process, except for the thresholding, is differentiable, so we can use gradient descent to find hash collisions. This is a well-known property of neural networks, that they are vulnerable to adversarial examples.

We can define a loss that captures how close an image is to a given target hash: this loss is basically just the NeuralHash algorithm as described above, but with the final "hard" thresholding step tweaked so that it is "soft" (in particular, differentiable). Exactly how this is done (choices of activation functions, parameters, etc.) can affect convergence, so it can require some experimentation. After choosing the loss function, we can follow the standard method to find adversarial examples for neural networks: gradient descent.

Details

The implementation currently does an alternating projections style attack to find an adversarial example that has the intended hash and also looks similar to the original. See collide.py for the full details. The implementation uses two different loss functions: one measures the distance to the target hash, and the other measures the quality of the perturbation (l2 norm + total variation). We first optimize for a collision, focusing only on matching the target hash. Once we find a projection, we alternate between minimizing the perturbation and ensuring that the hash value does not change. The attack has a number of parameters; run python collide.py --help or refer to the code for a full list. Tweaking these parameters can make a big difference in convergence time and the quality of the output.

The implementation also supports a flag --blur [sigma] that blurs the perturbation on every step of the search. This can slow down or break convergence, but on some examples, it can be helpful for getting results that look more natural and less like glitch art.

Examples

Reproducing the Lena/Barbara result from this post:

The first image above is the original Lena image. The second was produced with --target a426dae78cc63799d01adc32 to collide with Barbara. The third was produced with the additional argument --blur 1.0. The fourth is the original Barbara image. Checking their hashes:

$ python nnhash.py lena.png
32dac883f7b91bbf45a48296
$ python nnhash.py lena-adv.png
a426dae78cc63799d01adc32
$ python nnhash.py lena-adv-blur-1.0.png
a426dae78cc63799d01adc32
$ python nnhash.py barbara.png
a426dae78cc63799d01adc32

Reproducing the Picard/Sidious result from this post:

The first image above is the original Picard image. The second was produced with --target e34b3da852103c3c0828fbd1 --tv-weight 3e-4 to collide with Sidious. The third was produced with the additional argument --blur 0.5. The fourth is the original Sidious image. Checking their hashes:

$ python nnhash.py picard.png
73fae120ad3191075efd5580
$ python nnhash.py picard-adv.png
e34b2da852103c3c0828fbd1
$ python nnhash.py picard-adv-blur-0.5.png
e34b2da852103c3c0828fbd1
$ python nnhash.py sidious.png
e34b2da852103c3c0828fbd1

Prerequisites

  • Get Apple's NeuralHash model following the instructions in AsuharietYgvar/AppleNeuralHash2ONNX and either put all the files in this directory or supply the --model / --seed arguments
  • Install Python dependencies: pip install -r requirements.txt

Usage

Run python collide.py --image [path to image] --target [target hash] to generate a hash collision. Run python collide.py --help to see all the options, including some knobs you can tweak, like the learning rate and some other parameters.

Limitations

The code in this repository is intended to be a demonstration, and perhaps a starting point for other exploration. Tweaking the implementation (choice of loss function, choice of parameters, etc.) might produce much better results than this code currently achieves.

You might also like...
ghotok mail - lets you find available contact email addresses from target website
ghotok mail - lets you find available contact email addresses from target website

ghotok-mail ghotok mail - lets you find available contact email addresses from target website git clone https://github.com/josifkhan/ghotok-mail cd gh

Discord Token Finder - Find half of your target's token with just their ID.
Discord Token Finder - Find half of your target's token with just their ID.

Discord Token Finder - Find half of your target's token with just their ID.

Some Python scripts that fx(hash) users might find useful.

fx_hash_utils Some Python scripts that fx(hash) users might find useful. get_images This script downloads all the static images of the tokens generate

 From Fidelity to Perceptual Quality: A Semi-Supervised Approach for Low-Light Image Enhancement (CVPR'2020)
From Fidelity to Perceptual Quality: A Semi-Supervised Approach for Low-Light Image Enhancement (CVPR'2020)

Under-exposure introduces a series of visual degradation, i.e. decreased visibility, intensive noise, and biased color, etc. To address these problems, we propose a novel semi-supervised learning approach for low-light image enhancement.

Evaluation Pipeline for our ECCV2020: Journey Towards Tiny Perceptual Super-Resolution.

Journey Towards Tiny Perceptual Super-Resolution Test code for our ECCV2020 paper: https://arxiv.org/abs/2007.04356 Our x4 upscaling pre-trained model

Project page of the paper 'Analyzing Perception-Distortion Tradeoff using Enhanced Perceptual Super-resolution Network' (ECCVW 2018)
Project page of the paper 'Analyzing Perception-Distortion Tradeoff using Enhanced Perceptual Super-resolution Network' (ECCVW 2018)

EPSR (Enhanced Perceptual Super-resolution Network) paper This repo provides the test code, pretrained models, and results on benchmark datasets of ou

We provide useful util functions. When adding a util function, please add a description of the util function.

Utils Collection Motivation When we implement codes, we often search for util functions that are already implemented. Here, we are going to share util

This is my favourite function - the Rastrigin function.
This is my favourite function - the Rastrigin function.

This is my favourite function - the Rastrigin function. What sparked my curiosity and interest in the function was its complexity in terms of many local optimum points, which makes it particularly interesting and useful for testing any optimisation algorithms.

Azure-function-proxy - Basic proxy as an azure function serverless app

azure function proxy (for phishing) here are config files for using *[.]azureweb

Lambda-function - Python codes that allow notification of changes made to some services using the AWS Lambda Function
Lambda-function - Python codes that allow notification of changes made to some services using the AWS Lambda Function

AWS Lambda Function This repository contains python codes that allow notificatio

Find usage statistics (imports, function calls, attribute access) for Python code-bases

Python Library stats This is a small library that allows you to query some useful statistics for Python code-bases. We currently report library import

Visualize and compare datasets, target values and associations, with one line of code.
Visualize and compare datasets, target values and associations, with one line of code.

In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code! Sweetviz is an open-source Python library that generat

Visualize and compare datasets, target values and associations, with one line of code.
Visualize and compare datasets, target values and associations, with one line of code.

In-depth EDA (target analysis, comparison, feature analysis, correlation) in two lines of code! Sweetviz is an open-source Python library that generat

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)
Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021, official Pytorch implementatio

PyTorch code for the paper
PyTorch code for the paper "Curriculum Graph Co-Teaching for Multi-target Domain Adaptation" (CVPR2021)

PyTorch code for the paper "Curriculum Graph Co-Teaching for Multi-target Domain Adaptation" (CVPR2021) This repo presents PyTorch implementation of M

Red Team tool for exfiltrating files from a target's Google Drive that you have access to, via Google's API.

GD-Thief Red Team tool for exfiltrating files from a target's Google Drive that you(the attacker) has access to, via the Google Drive API. This includ

Recon is a script to perform a full recon on a target with the main tools to search for vulnerabilities.
Recon is a script to perform a full recon on a target with the main tools to search for vulnerabilities.

👑 Recon 👑 The step of recognizing a target in both Bug Bounties and Pentest can be very time-consuming. Thinking about it, I decided to create my ow

Python and C++ implementation of
Python and C++ implementation of "MarkerPose: Robust real-time planar target tracking for accurate stereo pose estimation". Accepted at LXCV @ CVPR 2021.

MarkerPose: Robust real-time planar target tracking for accurate stereo pose estimation This is a PyTorch and LibTorch implementation of MarkerPose: a

Generate Contextual Directory Wordlist For Target Org

PathPermutor Generate Contextual Directory Wordlist For Target Org This script generates contextual wordlist for any target org based on the set of UR

Comments
  • Hash given here differs from hash in nnhash.py?

    Hash given here differs from hash in nnhash.py?

    I have the following two images:

    I then ask nnhash.py from AsuharietYgvar/AppleNeuralHash2ONNX for the NeuralHash of the second picture (as instructed by the README in this repo) and get a result.

    $ python3 nnhash.py model.onnx neuralhash_128x96_seed1.dat ../images/target/palpatine_square.png 
    e34b2da852103c3c0828fbd1
    

    I then run the collider as instructed with that result as target hash and seem to find a collision:

    $ python3 collide.py --image ../images/source/picard_square.png --target e34b2da852103c3c0828fbd1
    [.../snip/...]
    iteration: 221/1000, best: 0, hash: e34b2da852103c3c0828fbd1, distance: 0, loss: 7.484
    

    Seems like everything worked fine, the hash displayed is equal to the target! But! If I now run nnhash.py on one of the saved pictures, it gives me a slightly different hash than the target?

    $ python3 nnhash.py model.onnx neuralhash_128x96_seed1.dat out_iter\=00220_dist\=00.png 
    e24b2da85210343c0828fbd1
    

    e34b2da852103c3c0828fbd1 e24b2da85210343c0828fbd1


    Is this expected behaviour that I don't understand? Do the two bytes not make a difference, somehow? Or is this only producing near-collisions instead of proper collisions?

    opened by lambdaTotoro 4
  • numpy not found

    numpy not found

    Thank you for this interesting demonstration!

    When I tried it on Ubuntu 21.04, I received the following error:

    Traceback (most recent call last): File "collide.py", line 3, in import numpy as np ImportError: No module named numpy

    It worked like a charm when I used "python3" instead of simply "python" as command. Maybe that's an interesting advice for the readme.

    opened by errotu 1
  • Save image immediately if distance is zero

    Save image immediately if distance is zero

    Obviously, after a distance of zero, there is nothing to improve upon, so it should be saved. This fixes a problem when a distance of zero would be reached but the model would continue without saving, and eventually would end with a non-zero distance (which is worse).

    opened by shaunakg 0
  • noise filtering.

    noise filtering.

    Try something like

                      xdelta = x - target_image
                      xdelta = xdelta - gaussian_filter(xdelta, sigma=1.5)
                      x = x - feedback*xdelta
    

    before clipping. E.g. measure the error to the target image, high pass it, and feed that back in.

    With feedback between 0.01 and 0.15 I often get pretty good results without breaking convergence.

    One thing to do can be to converge with lower feedback then gradually increase it while the hash is still matching. It won't fix it if it settled into glitch art but the higher feedback can polish off the last of the noise.

    Cheers.

    opened by gmaxwell 1
Owner
Anish Athalye
grad student @mit-pdos
Anish Athalye
Semi-hash-based Image Generator

pixel-planet Semi-hash-based Image Generator Utilizable for NFTs Generation Process Input is salted and hashed Colors (background, planet, stars) are

Bill Ni 3 Sep 1, 2022
Plots the graph of a function with ASCII characters.

ASCII Graph Plotter Plots the graph of a function with ASCII characters. See the change log here. Developed by InformaticFreak (c) 2021 How to use py

InformaticFreak 2 Apr 29, 2022
Source code for our paper "Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash"

Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash Abstract: Apple recently revealed its deep perceptual hashing system NeuralHash to

ml-research@TUDarmstadt 11 Dec 3, 2022
Compare the contents of your hosted and proxy repositories for coordinate collisions

Nexus Repository Manager dependency/namespace confusion checker This repository contains a script to check if you have artifacts containing the same n

Sonatype Community 59 Mar 31, 2022
Convert Apple NeuralHash model for CSAM Detection to ONNX.

Apple NeuralHash is a perceptual hashing method for images based on neural networks. It can tolerate image resize and compression.

Asuhariet Ygvar 1.5k Dec 31, 2022
Demonstrates iterative FGSM on Apple's NeuralHash model.

apple-neuralhash-attack Demonstrates iterative FGSM on Apple's NeuralHash model. TL;DR: It is possible to apply noise to CSAM images and make them loo

Lim Swee Kiat 11 Jun 23, 2022
Craxk is a SINGLE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reproduced by a single machine.

What is Craxk ? Craxk is a UNIQUE AND NON-REPLICABLE Hash that uses data from the hardware where it is executed to form a hash that can only be reprod

null 5 Jun 19, 2021
That Hash will name that hash type! Identify MD5, SHA256 and 300+ other hashes Comes with

Call for translators! We're looking for translators to help translate this spec for everyone! Read this documentation in the following languages 한국어 中

All Contributors 6.8k Jan 5, 2023
Analyse a forensic target (such as a directory) to find and report files found and not found from CIRCL hashlookup public service

Analyse a forensic target (such as a directory) to find and report files found and not found from CIRCL hashlookup public service. This tool can help a digital forensic investigator to know the context, origin of specific files during a digital forensic investigation.

hashlookup 96 Dec 20, 2022