Overview

MVSS-Net

Code and models for ICCV 2021 paper: Image Manipulation Detection by Multi-View Multi-Scale Supervision


Update

To Be Done.

  • 21.12.17, Something new: MVSS-Net++

We now have an improved version of MVSS-Net, denoted as MVSS-Net++. Check here.

Environment

  • Ubuntu 16.04.6 LTS
  • Python 3.6
  • cuda10.1+cudnn7.6.3

Requirements

Usage

Dataset

An example of the dataset index file is given as data/CASIAv1plus.txt, where each line contains:

img_path mask_path label
  • label is 0 for an authentic image and 1 for a manipulated image.
  • For an authentic image, mask_path is "None".
  • For wild images without ground-truth masks, each line of the index should contain at least img_path.
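The index format above can be parsed with a few lines of Python. A minimal sketch; the helper name `parse_index_line` and the example paths are illustrative, not from the repo:

```python
# Parse one line of a dataset index file in the format: img_path mask_path label
# - mask_path is "None" for authentic images
# - wild images may list only img_path (no mask, no label)
def parse_index_line(line):
    parts = line.strip().split()
    img_path = parts[0]
    mask_path = parts[1] if len(parts) > 1 and parts[1] != "None" else None
    label = int(parts[2]) if len(parts) > 2 else None  # 0 = authentic, 1 = manipulated
    return img_path, mask_path, label

# Example lines in the style of data/CASIAv1plus.txt (paths are made up):
print(parse_index_line("Tp/tampered_001.jpg masks/tampered_001.png 1"))
print(parse_index_line("Au/authentic_001.jpg None 0"))
print(parse_index_line("wild/unknown_001.jpg"))
```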
Training sets
Test sets
  • DEFACTO-12k
  • Columbia
  • COVER
  • NIST16
  • CASIAv1plus: Note that some of the authentic images in CASIAv1 also appear in CASIAv2. With those images fully replaced by Corel images that are new to both CASIAv1 and CASIAv2, we constructed a revision of CASIAv1 termed CASIAv1plus. We recommend using CASIAv1plus as an alternative to the original CASIAv1.

Trained Models

We offer FCNs and MVSS-Nets trained on CASIAv2 and DEFACTO-84k, respectively. Please download the models and place them in the ckpt directory:

The performance of these models for image-level manipulation detection (metrics: AUC and image-level F1) is as follows. More details are reported in the paper.

Performance metric: AUC

| Model    | Training data | CASIAv1plus | Columbia | COVER | DEFACTO-12k |
|----------|---------------|-------------|----------|-------|-------------|
| MVSS-Net | CASIAv2       | 0.932       | 0.980    | 0.731 | 0.573       |
| MVSS-Net | DEFACTO-84k   | 0.771       | 0.563    | 0.525 | 0.886       |
| FCN      | CASIAv2       | 0.769       | 0.762    | 0.541 | 0.551       |
| FCN      | DEFACTO-84k   | 0.629       | 0.535    | 0.543 | 0.840       |

Performance metric: Image-level F1 (threshold = 0.5)

| Model    | Training data | CASIAv1plus | Columbia | COVER | DEFACTO-12k |
|----------|---------------|-------------|----------|-------|-------------|
| MVSS-Net | CASIAv2       | 0.759       | 0.802    | 0.244 | 0.404       |
| MVSS-Net | DEFACTO-84k   | 0.685       | 0.353    | 0.360 | 0.799       |
| FCN      | CASIAv2       | 0.684       | 0.481    | 0.180 | 0.458       |
| FCN      | DEFACTO-84k   | 0.561       | 0.492    | 0.511 | 0.709       |
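The image-level F1 numbers above threshold each image's predicted score at 0.5 before scoring. A minimal sketch of that computation (not the repo's evaluation code; `image_f1` is an illustrative helper):

```python
def image_f1(scores, labels, threshold=0.5):
    """F1 for image-level manipulation detection: a score above the
    threshold predicts 'manipulated' (label 1), otherwise 'authentic' (0)."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Four images: one true positive, one true negative, one false positive, one false negative
print(image_f1([0.9, 0.2, 0.7, 0.4], [1, 0, 0, 1]))  # 0.5
```

The same formula applies per pixel for pixel-level F1, with the predicted mask thresholded at the same default of 0.5.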

Inference & Evaluation

You can specify which pre-trained model to use by setting model_path in do_pred_and_eval.sh. Given a test_collection (e.g. CASIAv1plus or DEFACTO12k-test), the prediction maps and evaluation results will be saved under save_dir. The default threshold is set to 0.5.

bash do_pred_and_eval.sh $test_collection
#e.g. bash do_pred_and_eval.sh CASIAv1plus

For inference only, use the following command to skip evaluation:

bash do_pred.sh $test_collection
#e.g. bash do_pred.sh CASIAv1plus

Demo

  • demo.ipynb: A step-by-step notebook tutorial showing the usage of a pre-trained model to detect manipulation in a specific image.

Citation

If you find this work useful in your research, please consider citing:

@InProceedings{MVSS_2021ICCV,
  author    = {Chen, Xinru and Dong, Chengbo and Ji, Jiaqi and Cao, Juan and Li, Xirong},
  title     = {Image Manipulation Detection by Multi-View Multi-Scale Supervision},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  year      = {2021}
}

Acknowledgments

Contact

If you encounter any issue when running the code, please feel free to reach us, either by creating a new issue on GitHub or by email.

Comments
  • questions about pixel-f1

    Hi, thank you very much for your work. I read your paper and tested your code (CASIA v1.0) these days, but I have some doubts about the pixel-level F1 score. As shown in the attached image, with a fixed threshold the image-level F1 and pixel-level F1 are consistent, but I want to get the test results at the best threshold. How can I do that? Thanks.

    opened by chchshshhh 10
  • Ground truth edge maps for training

    Dear authors,

    Thanks for sharing the source code of MVSS-Net and it is really a great work!

    I am actually wondering how the edge loss is calculated during training. As CASIA v2 does not provide ground-truth edge maps, did you extract the edge maps from the ground-truth pixel masks yourselves? If so, could you please share how you did it?

    Thank you!

    opened by zjbthomas 2
  • Is this work?

    I have my own small test image corpus. I ran this project, and the result is around zero! CASIA or DEFACTO weights, no change. Maybe MVSS only works in special cases? Examples attached (original, fake, map, map on fake; please ignore the text):

    opened by Vadim2S 2
  • Questions about the Implementation and Time to release model.

    Hello! Thanks for your inspiring work; I am very interested in it. Over the last two weeks, I read your paper and decided to re-implement your network. The network is now roughly finished and I trained it on the CASIAv2 dataset, but the performance is not so good, so there must be something wrong with my code. I have a few questions:

    1. Does the red arrow in the network-structure figure mean upsampling by bilinear interpolation? It is a little confusing, as the legend does not mention it.
    2. What are the number of epochs and the scheduler of the training process? Following your paper, I set the number of epochs to 75 and multiply the learning rate by 0.1 at epochs [20, 35, 50].
    3. A question about the loss: for an authentic image, the loss consists only of the classification supervision; is this loss still multiplied by 0.04?
    4. My implementation of the network structure may contain other mistakes, so when will you release the model? I would be excited and grateful if you could open-source it. Thanks for your patience in reading my questions; I sincerely hope your research reaches the next level.
    opened by Codefmeister 2
  • Could not find the TianChi Model

    opened by garyhsu29 1
  • questions about training in the paper

    Hi, @dong03, thanks for your nice work! After reading your paper, I still have some questions:

    1. You say "We re-train FCN (Seg) and MVSS-Net (full setup) from scratch on CASIAv2." Do you pre-train ResFCN first and then use it to initialize MVSS-Net, or do you train the two models independently?
    2. What is the validation set used during training?
    opened by Senwang98 1
  • about the dataset

    Hello, I have just downloaded the CASIA dataset; however, it does not include mask files. I tried to derive masks from pixel differences, but they are not very accurate. Could you please tell me how you obtained the ground-truth masks? Thanks a lot.

    opened by romanticegg 1
  • columbia groundtruth

    Hello! May I ask whether you converted the ground truth of the Columbia dataset into binary mask images for training? After converting, I found the tampered boundaries were not converted cleanly and the results were not good enough. Could you share your converted ground truth?

    opened by laichou 0
  • corel images from CASIAv1plus dataset

    The Corel images included in CASIAv1plus.txt have different names from the Corel images downloaded from https://sites.google.com/site/dctresearch/Home/content-based-image-retrieval (the link included in data/README.md).

    opened by apournaras 0
  • columbia dataset

    It seems the image names in the Columbia dataset differ from those in the version downloaded from https://www.ee.columbia.edu/ln/dvmm/downloads/AuthSplicedDataSet/AuthSplicedDataSet.htm. Is this the right version of the dataset?

    opened by apournaras 0
  • About the optimal threshold

    From this picture, most models make fairly clear judgments about the tampered area in most cases. Why, then, can the F1 scores of most models be doubled or even tripled just by tuning to the optimal threshold? The effect of the threshold seems excessive.

    opened by areylng 0
  • generalization problem

    Hi, since there is no training code, I tried the pretrained models on my own dataset (real-world data that may undergo resampling, resizing, and multiple compressions). I tried the MVSS models pretrained on CASIA and DEFACTO, but the results are really disappointing (worse than ManTra-Net). Should the model be trained on my dataset first, or should I change some prediction parameters?

    opened by kkpssr 0
Owner
dong_chengbo