Overview

MVSS-Net

Code and models for ICCV 2021 paper: Image Manipulation Detection by Multi-View Multi-Scale Supervision


Update

To Be Done.

  • 21.12.17, Something new: MVSS-Net++

We now have an improved version of MVSS-Net, denoted as MVSS-Net++. Check here.

Environment

  • Ubuntu 16.04.6 LTS
  • Python 3.6
  • cuda10.1+cudnn7.6.3

Requirements

Usage

Dataset

An example of the dataset index file is given as data/CASIAv1plus.txt, where each line contains:

img_path mask_path label
  • label is 0 for an authentic image and 1 for a manipulated image.
  • For an authentic image, mask_path is "None".
  • For wild images without ground-truth masks, each line of the index should contain at least img_path.
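The index format above can be parsed with a few lines of Python. A minimal sketch; the helper name `parse_index_line` and the example paths are illustrative, not from the repo:

```python
# Parse one line of a dataset index file in the format: img_path mask_path label
# - mask_path is "None" for authentic images
# - wild images may list only img_path (no mask, no label)
def parse_index_line(line):
    parts = line.strip().split()
    img_path = parts[0]
    mask_path = parts[1] if len(parts) > 1 and parts[1] != "None" else None
    label = int(parts[2]) if len(parts) > 2 else None  # 0 = authentic, 1 = manipulated
    return img_path, mask_path, label

# Example lines in the style of data/CASIAv1plus.txt (paths are made up):
print(parse_index_line("Tp/tampered_001.jpg masks/tampered_001.png 1"))
print(parse_index_line("Au/authentic_001.jpg None 0"))
print(parse_index_line("wild/unknown_001.jpg"))
```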
Training sets
Test sets
  • DEFACTO-12k
  • Columbia
  • COVER
  • NIST16
  • CASIAv1plus: Note that some of the authentic images in CASIAv1 also appear in CASIAv2. With those images fully replaced by Corel images that are new to both CASIAv1 and CASIAv2, we constructed a revision of CASIAv1 termed CASIAv1plus. We recommend using CASIAv1plus as an alternative to the original CASIAv1.

Trained Models

We offer FCNs and MVSS-Nets trained on CASIAv2 and DEFACTO-84k, respectively. Please download the models and place them in the ckpt directory:

The performance of these models for image-level manipulation detection (metrics: AUC and image-level F1) is as follows. More details are reported in the paper.

Performance metric: AUC

| Model    | Training data | CASIAv1plus | Columbia | COVER | DEFACTO-12k |
|----------|---------------|-------------|----------|-------|-------------|
| MVSS-Net | CASIAv2       | 0.932       | 0.980    | 0.731 | 0.573       |
| MVSS-Net | DEFACTO-84k   | 0.771       | 0.563    | 0.525 | 0.886       |
| FCN      | CASIAv2       | 0.769       | 0.762    | 0.541 | 0.551       |
| FCN      | DEFACTO-84k   | 0.629       | 0.535    | 0.543 | 0.840       |

Performance metric: Image-level F1 (threshold = 0.5)

| Model    | Training data | CASIAv1plus | Columbia | COVER | DEFACTO-12k |
|----------|---------------|-------------|----------|-------|-------------|
| MVSS-Net | CASIAv2       | 0.759       | 0.802    | 0.244 | 0.404       |
| MVSS-Net | DEFACTO-84k   | 0.685       | 0.353    | 0.360 | 0.799       |
| FCN      | CASIAv2       | 0.684       | 0.481    | 0.180 | 0.458       |
| FCN      | DEFACTO-84k   | 0.561       | 0.492    | 0.511 | 0.709       |
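The image-level F1 numbers above threshold each image's predicted score at 0.5 before scoring. A minimal sketch of that computation (not the repo's evaluation code; `image_f1` is an illustrative helper):

```python
def image_f1(scores, labels, threshold=0.5):
    """F1 for image-level manipulation detection: a score above the
    threshold predicts 'manipulated' (label 1), otherwise 'authentic' (0)."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Four images: one true positive, one true negative, one false positive, one false negative
print(image_f1([0.9, 0.2, 0.7, 0.4], [1, 0, 0, 1]))  # 0.5
```

The same formula applies per pixel for pixel-level F1, with the predicted mask thresholded at the same default of 0.5.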

Inference & Evaluation

You can specify which pre-trained model to use by setting model_path in do_pred_and_eval.sh. Given a test_collection (e.g. CASIAv1plus or DEFACTO12k-test), the prediction maps and evaluation results will be saved under save_dir. The default threshold is set to 0.5.

bash do_pred_and_eval.sh $test_collection
#e.g. bash do_pred_and_eval.sh CASIAv1plus

For inference only, use the following command to skip evaluation:

bash do_pred.sh $test_collection
#e.g. bash do_pred.sh CASIAv1plus

Demo

  • demo.ipynb: A step-by-step notebook tutorial showing the usage of a pre-trained model to detect manipulation in a specific image.

Citation

If you find this work useful in your research, please consider citing:

@InProceedings{MVSS_2021ICCV,
  author    = {Chen, Xinru and Dong, Chengbo and Ji, Jiaqi and Cao, Juan and Li, Xirong},
  title     = {Image Manipulation Detection by Multi-View Multi-Scale Supervision},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  year      = {2021}
}

Acknowledgments

Contact

If you encounter any issue when running the code, please feel free to reach us, either by creating a new issue on GitHub or by email.

Comments
  • questions about pixel-f1

    Hi, thank you very much for your work. I read your paper and tested your code (CASIA v1.0) these days, but I have some doubts about the pixel-level F1 score. As shown in the attached image, with a fixed threshold the image-level F1 and pixel-level F1 are consistent, but I want to get the test results at the best threshold. How can I do that? Thanks.

    opened by chchshshhh 10
  • Ground truth edge maps for training

    Dear authors,

    Thanks for sharing the source code of MVSS-Net and it is really a great work!

    I am actually wondering how the edge loss is calculated during training. As CASIA v2 does not provide ground-truth edge maps, did you extract the edge maps from the ground-truth pixel masks yourselves? If so, could you please share how you did it?

    Thank you!

    opened by zjbthomas 2
  • Is this work?

    I have my own small test image corpus. I ran this project, and the result is around zero! CASIA or DEFACTO weights, no change. Maybe MVSS only works in special cases? Examples attached (original, fake, map, map on fake; please ignore the text):

    opened by Vadim2S 2
  • Questions about the Implementation and Time to release model.

    Hello! Thanks for your inspiring work; I am very interested in it. Over the last two weeks, I read your paper and decided to re-implement your network. The network is now roughly finished and I trained it on the CASIAv2 dataset, but the performance is not so good, so there must be something wrong with my code. I have a few questions:

    1. Does the red arrow in the network-structure figure mean upsampling by bilinear interpolation? It is a little confusing, as the legend does not mention it.
    2. What are the number of epochs and the scheduler of the training process? Following your paper, I set the number of epochs to 75 and multiply the learning rate by 0.1 at epochs [20, 35, 50].
    3. A question about the loss: for an authentic image, the loss consists only of the classification supervision; is this loss still multiplied by 0.04?
    4. My implementation of the network structure may contain other mistakes, so when will you release the model? I would be excited and grateful if you could open-source it. Thanks for your patience in reading my questions; I sincerely hope your research reaches the next level.
    opened by Codefmeister 2
  • Could not find the TianChi Model

    opened by garyhsu29 1
  • questions about training in the paper

    Hi, @dong03, thanks for your nice work! After reading your paper, I still have some questions:

    1. You say "We re-train FCN (Seg) and MVSS-Net (full setup) from scratch on CASIAv2." Do you pre-train ResFCN first and then use it to initialize MVSS-Net, or do you train the two models independently?
    2. What is the validation set used during training?
    opened by Senwang98 1
  • about the dataset

    Hello, I have just downloaded the CASIA dataset; however, it does not include mask files. I tried to derive masks from pixel differences, but they are not very accurate. Could you please tell me how you obtained the ground-truth masks? Thanks a lot.

    opened by romanticegg 1
  • columbia groundtruth

    Hello! May I ask whether you converted the ground truth of the Columbia dataset into binary mask images for training? After converting, I found the tampered boundaries were not converted cleanly and the results were not good enough. Could you share your converted ground truth?

    opened by laichou 0
  • corel images from CASIAv1plus dataset

    The Corel images included in CASIAv1plus.txt have different names from the Corel images downloaded from https://sites.google.com/site/dctresearch/Home/content-based-image-retrieval (the link included in data/README.md).

    opened by apournaras 0
  • columbia dataset

    It seems the image names in the Columbia dataset differ from those in the version downloaded from https://www.ee.columbia.edu/ln/dvmm/downloads/AuthSplicedDataSet/AuthSplicedDataSet.htm. Is this the right version of the dataset?

    opened by apournaras 0
  • About the optimal threshold

    From this picture, most models make fairly clear judgments about the tampered area in most cases. Why, then, can the F1 scores of most models be doubled or even tripled just by tuning to the optimal threshold? The effect of the threshold seems excessive.

    opened by areylng 0
  • generalization problem

    Hi, since there is no training code, I tried the pretrained models on my own dataset (real-world data that may undergo resampling, resizing, and multiple compressions). I tried the MVSS models pretrained on CASIA and DEFACTO, but the results are really disappointing (worse than ManTra-Net). Should the model be trained on my dataset first, or should I change some prediction parameters?

    opened by kkpssr 0
Owner
dong_chengbo