# A Self-Supervised Descriptor for Image Copy Detection (SSCD)
This is the open-source codebase for "A Self-Supervised Descriptor for Image Copy Detection", recently accepted to CVPR 2022.
This work uses self-supervised contrastive learning with strong differential entropy regularization to create a fingerprint for image copy detection.
## About this codebase
This implementation is built on PyTorch Lightning, with some components from Classy Vision.
Our original experiments were conducted in a proprietary codebase using data files (fonts and emoji) that are not licensed for redistribution. This version uses Noto fonts and Twemoji emoji, via the AugLy project. As a result, models trained in this codebase perform slightly differently than our pretrained models.
## Pretrained models
We provide trained models from our original experiments to allow others to reproduce our evaluation results.
For convenience, we provide equivalent model files in a few formats:
- Files ending in `.classy.pt` are weight files using Classy Vision ResNe(X)t backbones, which is how these models were trained.
- Files ending in `.torchvision.pt` are weight files using Torchvision ResNet backbones. These files may be easier to integrate in Torchvision-based codebases. See model.py for how we integrate GeM pooling and L2 normalization into these models.
- Files ending in `.torchscript.pt` are standalone TorchScript models that can be used in any PyTorch project without any SSCD code.
We provide the following models:
| name | dataset | trunk | augmentations | dimensions | classy vision | torchvision | torchscript |
| --- | --- | --- | --- | --- | --- | --- | --- |
| sscd_disc_blur | DISC | ResNet50 | strong blur | 512 | link | link | link |
| sscd_disc_advanced | DISC | ResNet50 | advanced | 512 | link | link | link |
| sscd_disc_mixup | DISC | ResNet50 | advanced + mixup | 512 | link | link | link |
| sscd_disc_large | DISC | ResNeXt101 32x4 | advanced + mixup | 1024 | link | link | |
| sscd_imagenet_blur | ImageNet | ResNet50 | strong blur | 512 | link | link | link |
| sscd_imagenet_advanced | ImageNet | ResNet50 | advanced | 512 | link | link | link |
| sscd_imagenet_mixup | ImageNet | ResNet50 | advanced + mixup | 512 | link | link | link |
We recommend `sscd_disc_mixup` (ResNet50) as a default SSCD model, especially when comparing to other standard ResNet50 models, and `sscd_disc_large` (ResNeXt101) as a higher-accuracy alternative that uses a bit more compute.

Classy Vision and Torchvision use different default cardinality settings for ResNeXt101. We do not provide a Torchvision version of the `sscd_disc_large` model for this reason.
## Installation
If you only plan to use Torchscript models for inference, no installation steps are necessary: any environment with a recent version of PyTorch installed can run our Torchscript models.

For all other uses, see the installation steps below. The code is written for pytorch-lightning 1.5 (the latest version at the time of writing), and may need changes for future Lightning versions.
### Option 1: Install dependencies using Conda
Install and activate conda, then create a conda environment for SSCD as follows:
```bash
# Create conda environment
conda create --name sscd -c pytorch -c conda-forge \
  pytorch torchvision cudatoolkit=11.3 \
  "pytorch-lightning>=1.5,<1.6" lightning-bolts \
  faiss python-magic pandas numpy

# Activate environment
conda activate sscd

# Install Classy Vision and AugLy from PIP
python -m pip install classy_vision augly
```
You may need to select a `cudatoolkit` version that corresponds to the system CUDA library version you have installed. See the PyTorch documentation for supported combinations of pytorch, torchvision and cudatoolkit versions.

For a non-CUDA (CPU only) installation, replace `cudatoolkit=...` with `cpuonly`.
### Option 2: Install dependencies using PIP
```bash
# Create environment
python3 -m virtualenv ./venv

# Activate environment
source ./venv/bin/activate

# Install dependencies in this environment
python -m pip install -r ./requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113
```
The `--extra-index-url` option selects a newer version of CUDA libraries, required for NVidia A100 GPUs. This can be omitted if A100 support is not needed.
## Inference using SSCD models
This section describes how to use pretrained SSCD models for inference. To perform inference for DISC and Copydays evaluations, see Evaluation.
### Preprocessing
We recommend preprocessing images for inference either by resizing the small edge to 288 or by resizing the image to a square tensor.

Fixed-size square tensors are more efficient on GPUs, since they make better use of batching. Copy detection using square tensors benefits from resizing directly to the target tensor size; this skews the image and does not preserve aspect ratio, which differs from common practice for classification inference.
```python
from torchvision import transforms

normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225],
)
small_288 = transforms.Compose([
    transforms.Resize(288),
    transforms.ToTensor(),
    normalize,
])
skew_320 = transforms.Compose([
    transforms.Resize([320, 320]),
    transforms.ToTensor(),
    normalize,
])
```
### Inference using Torchscript
Torchscript files can be loaded directly in other projects without any SSCD code or dependencies.
```python
import torch
from PIL import Image

model = torch.jit.load("/path/to/sscd_disc_mixup.torchscript.pt")
img = Image.open("/path/to/image.png").convert('RGB')
batch = small_288(img).unsqueeze(0)
embedding = model(batch)[0, :]
```
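When processing many images, fixed-size square tensors can be batched. Here is a minimal sketch of batched inference, assuming the `model` and the `skew_320` transform defined above; the image paths are hypothetical placeholders:

```python
import torch
from PIL import Image

# hypothetical image paths; any list of images works
paths = ["/path/to/image1.png", "/path/to/image2.png"]
imgs = [Image.open(p).convert("RGB") for p in paths]

# skew_320 resizes every image to 320x320, so the tensors stack into one batch
batch = torch.stack([skew_320(img) for img in imgs])
with torch.no_grad():
    embeddings = model(batch)  # one descriptor per image
```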
These Torchscript models are prepared for inference. For other uses (e.g., fine-tuning), use model weight files, as described below.
### Load model weight files
To load model weight files, first construct the `Model` object, then load the weights using the standard `torch.load` and `load_state_dict` methods.
```python
import torch
from sscd.models.model import Model

model = Model("CV_RESNET50", 512, 3.0)
weights = torch.load("/path/to/sscd_disc_mixup.classy.pt")
model.load_state_dict(weights)
model.eval()
```
Once loaded, these models can be used interchangeably with Torchscript models for inference.
Model backbone strings can be found in the `Backbone` enum in model.py. Classy Vision models start with the prefix `CV_`, and Torchvision models start with `TV_`.
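For example, a minimal sketch loading the Torchvision variant of the same model, assuming the corresponding `.torchvision.pt` weight file and a `TV_RESNET50` entry in the `Backbone` enum:

```python
import torch
from sscd.models.model import Model

# the TV_ prefix selects a Torchvision ResNet50 backbone
model = Model("TV_RESNET50", 512, 3.0)
weights = torch.load("/path/to/sscd_disc_mixup.torchvision.pt")
model.load_state_dict(weights)
model.eval()
```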
## Using SSCD descriptors
SSCD models produce a 512-dimension L2-normalized descriptor for each input image (the "large" model uses 1024 dimensions). The similarity of two images with descriptors `a` and `b` can be measured by descriptor cosine similarity (`a.dot(b)`; higher is more similar), or equivalently by euclidean distance (`(a-b).norm()`; lower is more similar).
For the `sscd_disc_mixup` model, for example, DISC image pairs with embedding cosine similarity greater than `0.75` are copies with 90% precision. This corresponds to a euclidean distance less than `0.7`, or a squared euclidean distance less than `0.5`.
### Descriptor post-processing
For best results, we recommend additional descriptor processing when sample images from the target distribution are available. Centering (subtracting the mean) followed by L2 normalization, or whitening followed by L2 normalization, can improve accuracy.
Score normalization can make similarity more consistent and improve global accuracy metrics (but has no effect on ranking metrics).
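As one example, a minimal sketch of centering followed by L2 normalization, assuming descriptors from sample images of the target distribution are available (the names and data below are illustrative placeholders):

```python
import torch

def center_and_normalize(descriptors, mean):
    """Subtract a precomputed mean, then re-normalize to unit L2 norm."""
    return torch.nn.functional.normalize(descriptors - mean, dim=-1)

# estimate the mean from descriptors of sample images (placeholder data here)
sample_descriptors = torch.randn(1000, 512)
mean = sample_descriptors.mean(dim=0)

query_descriptors = torch.randn(10, 512)  # placeholder query descriptors
processed = center_and_normalize(query_descriptors, mean)
```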
## Other model formats
If pretrained models in another format (e.g. ONNX) would be useful to you, let us know by filing a feature request.
## Reproducing evaluation results
To reproduce evaluation results, see Evaluation.
## Training SSCD models
For information on how to train SSCD models, see Training.
## License
The SSCD codebase uses the CC BY-NC 4.0 International license.
## Citation
If you find our codebase useful, please consider giving a star and citing our work:
```bibtex
@article{pizzi2022self,
  title={A Self-Supervised Descriptor for Image Copy Detection},
  author={Pizzi, Ed and Roy, Sreya Dutta and Ravindra, Sugosh Nagavara and Goyal, Priya and Douze, Matthijs},
  journal={Proc. CVPR},
  year={2022}
}
```