Open source implementation of "A Self-Supervised Descriptor for Image Copy Detection" (SSCD).

A Self-Supervised Descriptor for Image Copy Detection (SSCD)

This is the open-source codebase for "A Self-Supervised Descriptor for Image Copy Detection", recently accepted to CVPR 2022.

This work uses self-supervised contrastive learning with strong differential entropy regularization to create a fingerprint for image copy detection.

[Figure: SSCD diagram]

About this codebase

This implementation is built on PyTorch Lightning, with some components from Classy Vision.

Our original experiments were conducted in a proprietary codebase using data files (fonts and emoji) that are not licensed for redistribution. This version uses Noto fonts and Twemoji emoji, via the AugLy project. As a result, models trained in this codebase perform slightly differently than our pretrained models.

Pretrained models

We provide trained models from our original experiments to allow others to reproduce our evaluation results.

For convenience, we provide equivalent model files in a few formats:

  • Files ending in .classy.pt are weight files using Classy Vision ResNe(X)t backbones, which is how these models were trained.
  • Files ending in .torchvision.pt are weight files using Torchvision ResNet backbones. These files may be easier to integrate in Torchvision-based codebases. See model.py for how we integrate GeM pooling and L2 normalization into these models.
  • Files ending in .torchscript.pt are standalone TorchScript models that can be used in any PyTorch project without any SSCD code.
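
As a rough illustration of what GeM pooling followed by L2 normalization computes, here is a minimal NumPy sketch. It is not the code in model.py: the function names gem_pool and l2_normalize, the exponent p=3.0, and the random feature map are all assumptions made for illustration.

```python
import numpy as np


def gem_pool(features: np.ndarray, p: float = 3.0, eps: float = 1e-6) -> np.ndarray:
    """Generalized-mean (GeM) pooling over the spatial axes of an (N, C, H, W) array."""
    clamped = np.clip(features, eps, None)  # keep values positive before the power
    return np.mean(clamped ** p, axis=(2, 3)) ** (1.0 / p)


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row of an (N, D) array to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


rng = np.random.default_rng(0)
feature_map = rng.random((2, 8, 4, 4))  # hypothetical stand-in for backbone activations
descriptors = l2_normalize(gem_pool(feature_map, p=3.0))
print(descriptors.shape)
```

With p=1 this reduces to ordinary average pooling. Larger values of p weight the strongest activations more heavily.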

We provide the following models:

| name | dataset | trunk | augmentations | dimensions | classy vision | torchvision | torchscript |
|---|---|---|---|---|---|---|---|
| sscd_disc_blur | DISC | ResNet50 | strong blur | 512 | link | link | link |
| sscd_disc_advanced | DISC | ResNet50 | advanced | 512 | link | link | link |
| sscd_disc_mixup | DISC | ResNet50 | advanced + mixup | 512 | link | link | link |
| sscd_disc_large | DISC | ResNeXt101 32x4 | advanced + mixup | 1024 | link | - | link |
| sscd_imagenet_blur | ImageNet | ResNet50 | strong blur | 512 | link | link | link |
| sscd_imagenet_advanced | ImageNet | ResNet50 | advanced | 512 | link | link | link |
| sscd_imagenet_mixup | ImageNet | ResNet50 | advanced + mixup | 512 | link | link | link |

We recommend sscd_disc_mixup (ResNet50) as a default SSCD model, especially when comparing to other standard ResNet50 models, and sscd_disc_large (ResNeXt101) as a higher accuracy alternative using a bit more compute.

Classy Vision and Torchvision use different default cardinality settings for ResNeXt101. We do not provide a Torchvision version of the sscd_disc_large model for this reason.

Installation

If you only plan to use TorchScript models for inference, no installation steps are necessary: any environment with a recent version of PyTorch installed can run our TorchScript models.

For all other uses, see installation steps below.

The code is written for pytorch-lightning 1.5 (the latest version at time of writing), and may need changes for future Lightning versions.

Option 1: Install dependencies using Conda

Install and activate conda, then create a conda environment for SSCD as follows:

# Create conda environment
conda create --name sscd -c pytorch -c conda-forge \
  pytorch torchvision cudatoolkit=11.3 \
  "pytorch-lightning>=1.5,<1.6" lightning-bolts \
  faiss python-magic pandas numpy

# Activate environment
conda activate sscd

# Install Classy Vision and AugLy from PIP:
python -m pip install classy_vision augly

You may need to select a cudatoolkit version that corresponds to the system CUDA library version you have installed. See PyTorch documentation for supported combinations of pytorch, torchvision and cudatoolkit versions.

For a non-CUDA (CPU only) installation, replace cudatoolkit=... with cpuonly.

Option 2: Install dependencies using PIP

# Create environment
python3 -m virtualenv ./venv

# Activate environment
source ./venv/bin/activate

# Install dependencies in this environment
python -m pip install -r ./requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113

The --extra-index-url option selects a newer version of CUDA libraries, required for NVIDIA A100 GPUs. This can be omitted if A100 support is not needed.

Inference using SSCD models

This section describes how to use pretrained SSCD models for inference. To perform inference for DISC and Copydays evaluations, see Evaluation.

Preprocessing

We recommend preprocessing images for inference by either resizing the small edge to 288 or resizing the image to a square tensor.

Fixed-size square tensors are more efficient on GPUs because they make better use of batching. Copy detection using square tensors benefits from resizing directly to the target tensor size. This skews the image and does not preserve aspect ratio, which differs from the common practice for classification inference.

from torchvision import transforms

normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225],
)
small_288 = transforms.Compose([
    transforms.Resize(288),
    transforms.ToTensor(),
    normalize,
])
skew_320 = transforms.Compose([
    transforms.Resize([320, 320]),
    transforms.ToTensor(),
    normalize,
])

Inference using Torchscript

Torchscript files can be loaded directly in other projects without any SSCD code or dependencies.

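import torch
from PIL import Image

# small_288 is the transform defined in the Preprocessing section.
model = torch.jit.load("/path/to/sscd_disc_mixup.torchscript.pt")
img = Image.open("/path/to/image.png").convert('RGB')
batch = small_288(img).unsqueeze(0)  # add a batch dimension: (1, 3, H, W)
with torch.no_grad():  # gradients are not needed for inference
    embedding = model(batch)[0, :]  # descriptor for the single input image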

These TorchScript models are prepared for inference. For other uses (e.g. fine-tuning), use model weight files, as described below.

Load model weight files

To load model weight files, first construct the Model object, then load the weights using the standard torch.load and load_state_dict methods.

import torch
from sscd.models.model import Model

model = Model("CV_RESNET50", 512, 3.0)
weights = torch.load("/path/to/sscd_disc_mixup.classy.pt")
model.load_state_dict(weights)
model.eval()

Once loaded, these models can be used interchangeably with Torchscript models for inference.

Model backbone strings can be found in the Backbone enum in model.py. Classy Vision models start with the prefix CV_ and Torchvision models start with TV_.

Using SSCD descriptors

SSCD models produce an L2-normalized descriptor for each input image. Descriptors have 512 dimensions, except for the "large" model, which uses 1024. The similarity of two images with descriptors a and b can be measured by descriptor cosine similarity (a.dot(b); higher is more similar), or equivalently using Euclidean distance ((a-b).norm(); lower is more similar).

For example, for the sscd_disc_mixup model, DISC image pairs with embedding cosine similarity greater than 0.75 are copies with 90% precision. This corresponds to a Euclidean distance less than about 0.7, or a squared Euclidean distance less than 0.5.
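
The correspondence between these thresholds follows from the identity ||a - b||^2 = 2 - 2 * (a . b), which holds for any pair of unit-length vectors. The NumPy sketch below checks the identity on random unit vectors and converts the 0.75 cosine threshold to a distance. The random vectors and the helper name cosine_to_distance are illustrative assumptions, not part of the SSCD codebase.

```python
import math

import numpy as np


def cosine_to_distance(cosine: float) -> float:
    """Euclidean distance between two unit vectors with the given cosine similarity."""
    return math.sqrt(2.0 - 2.0 * cosine)


rng = np.random.default_rng(0)
# Hypothetical stand-ins for two L2-normalized 512-dimensional descriptors.
a = rng.standard_normal(512)
a /= np.linalg.norm(a)
b = rng.standard_normal(512)
b /= np.linalg.norm(b)

cosine = float(a.dot(b))
distance = float(np.linalg.norm(a - b))
identity_holds = bool(np.isclose(distance ** 2, 2.0 - 2.0 * cosine))

threshold_distance = cosine_to_distance(0.75)   # roughly 0.707
threshold_squared = threshold_distance ** 2     # 2 - 2 * 0.75 = 0.5
print(identity_holds, threshold_distance, threshold_squared)
```

Because the mapping from cosine similarity to distance is monotonic, thresholding or ranking by either measure selects the same pairs.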

Descriptor post-processing

For best results, we recommend additional descriptor processing when sample images from the target distribution are available. Centering (subtracting the mean) followed by L2 normalization, or whitening followed by L2 normalization, can improve accuracy.
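
A minimal NumPy sketch of the centering variant is shown here, assuming a set of descriptors from sample images is already available as an array. The function names and the random data are hypothetical, and whitening is not covered.

```python
import numpy as np


def fit_center(reference: np.ndarray) -> np.ndarray:
    """Estimate the mean descriptor from an (N, D) array of sample descriptors."""
    return reference.mean(axis=0)


def center_and_normalize(descriptors: np.ndarray, mean: np.ndarray) -> np.ndarray:
    """Subtract the mean, then rescale each row to unit Euclidean length."""
    centered = descriptors - mean
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / np.maximum(norms, 1e-12)


rng = np.random.default_rng(0)
# Hypothetical stand-ins for descriptors of sample images and of query images.
reference = rng.standard_normal((1000, 512)) + 0.5
queries = rng.standard_normal((4, 512)) + 0.5

mean = fit_center(reference)
processed = center_and_normalize(queries, mean)
print(processed.shape)
```

The same mean should be applied to both query and reference descriptors so that they remain comparable.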

Score normalization can make similarity more consistent and improve global accuracy metrics (but has no effect on ranking metrics).
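
To illustrate why ranking metrics are unaffected, the sketch below uses one simple form of score normalization: subtracting a per-query bias computed from each query's similarity to a set of background images. This particular form, the parameters k and beta, and the random similarity matrices are assumptions for illustration and may differ from what the evaluation code does.

```python
import numpy as np


def normalize_scores(
    query_ref_sim: np.ndarray,
    query_background_sim: np.ndarray,
    k: int = 2,
    beta: float = 1.0,
) -> np.ndarray:
    """Subtract a per-query bias: beta times the mean of the top-k background similarities."""
    top_k = np.sort(query_background_sim, axis=1)[:, -k:]
    bias = top_k.mean(axis=1, keepdims=True)
    return query_ref_sim - beta * bias


rng = np.random.default_rng(0)
query_ref_sim = rng.uniform(-1, 1, size=(3, 5))          # queries x reference images
query_background_sim = rng.uniform(-1, 1, size=(3, 20))  # queries x background images

adjusted = normalize_scores(query_ref_sim, query_background_sim)

# Each row shifts by a constant, so the per-query ordering of references is unchanged.
same_ranking = np.array_equal(
    np.argsort(query_ref_sim, axis=1), np.argsort(adjusted, axis=1)
)
print(same_ranking)
```

Global metrics compare scores from all queries against a single threshold, so shifting each query's scores by a different amount can change which pairs pass it even though every per-query ranking stays the same.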

Other model formats

If pretrained models in another format (e.g. ONNX) would be useful for you, let us know by filing a feature request.

Reproducing evaluation results

To reproduce evaluation results, see Evaluation.

Training SSCD models

For information on how to train SSCD models, see Training.

License

The SSCD codebase uses the CC-NC 4.0 International license.

Citation

If you find our codebase useful, please consider giving it a star and citing it as:

@article{pizzi2022self,
  title={A Self-Supervised Descriptor for Image Copy Detection},
  author={Pizzi, Ed and Roy, Sreya Dutta and Ravindra, Sugosh Nagavara and Goyal, Priya and Douze, Matthijs},
  journal={Proc. CVPR},
  year={2022}
}