A denoising autoencoder with adversarial losses and attention mechanisms for face swapping.

Overview

faceswap-GAN

Adds adversarial loss and perceptual loss (VGGFace) to the auto-encoder architecture of deepfakes (the reddit user).

Updates

Date    Update
2018-08-27     Colab support: A Colab notebook for faceswap-GAN v2.2 is provided.
2018-07-25     Data preparation: Added a new notebook for video pre-processing, in which MTCNN is used for face detection as well as face alignment.
2018-06-29     Model architecture: faceswap-GAN v2.2 now supports different output resolutions: 64x64, 128x128, and 256x256. The default RESOLUTION = 64 can be changed in the config cell of the v2.2 notebook.
2018-06-25     New version: faceswap-GAN v2.2 has been released. The main improvements of the v2.2 model are its capability of generating realistic and consistent eye movements (results are shown below; Ctrl+F for "eyes"), as well as higher video quality with face alignment.
2018-06-06     Model architecture: Added the self-attention mechanism proposed in SAGAN to the v2 GAN model. (Note: there is still no official code release for SAGAN, so the implementation in this repo could be wrong. We'll keep an eye on it.)

Google Colab support

Here is a playground notebook for faceswap-GAN v2.2 on Google Colab. Users can train their own model in the browser.

[Update 2019/10/04] There seem to be import errors in the latest Colab environment due to inconsistent package versions. Please make sure that Keras and TensorFlow match the version numbers shown in the requirements section below.

Descriptions

faceswap-GAN v2.2

  • FaceSwap_GAN_v2.2_train_test.ipynb

    • Notebook for model training of faceswap-GAN model version 2.2.
    • This notebook also provides code for still image transformation at the bottom.
    • Requires additional training images generated through prep_binary_masks.ipynb.
  • FaceSwap_GAN_v2.2_video_conversion.ipynb

    • Notebook for video conversion of faceswap-GAN model version 2.2.
    • Face alignment using 5-point landmarks is introduced in the video conversion (a generic alignment sketch follows this list).
  • prep_binary_masks.ipynb

    • Notebook for training data preprocessing. Output binary masks are saved in the ./binary_masks/faceA_eyes and ./binary_masks/faceB_eyes folders.
    • Requires the face_alignment package. (An alternative method for generating binary masks, which requires neither the face_alignment nor the dlib package, can be found in MTCNN_video_face_detection_alignment.ipynb.)
  • MTCNN_video_face_detection_alignment.ipynb

    • This notebook performs face detection/alignment on the input video.
    • Detected faces are saved in ./faces/raw_faces and ./faces/aligned_faces for non-aligned/aligned results respectively.
    • Crude binary masks for the eyes are also generated and saved in ./faces/binary_masks_eyes. These masks can serve as a suboptimal alternative to the masks generated through prep_binary_masks.ipynb.
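
As a rough illustration of what 5-point alignment involves, here is a generic sketch, not the repository's actual code; the reference landmark positions and the align_face helper are made up for illustration:

    # Generic 5-point face alignment sketch: estimate a similarity transform
    # from detected landmarks to a fixed template, then warp the crop.
    import cv2
    import numpy as np

    # Hypothetical template positions for (left eye, right eye, nose tip,
    # mouth left, mouth right) in a 256x256 crop -- illustrative values only.
    REFERENCE_LANDMARKS = np.float32([
        [89, 108], [167, 108], [128, 152], [97, 196], [159, 196],
    ])

    def align_face(image, landmarks, size=256):
        """Warp image so its five landmarks match the reference template."""
        # Similarity transform = rotation + uniform scale + translation.
        matrix, _ = cv2.estimateAffinePartial2D(np.float32(landmarks),
                                                REFERENCE_LANDMARKS)
        return cv2.warpAffine(image, matrix, (size, size))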

Usage

  1. Run MTCNN_video_face_detection_alignment.ipynb to extract faces from videos. Manually move/rename the aligned face images into ./faceA/ or ./faceB/ folders.
  2. Run prep_binary_masks.ipynb to generate binary masks of training images.
    • You can skip this preprocessing step by (1) setting use_bm_eyes=False in the config cell of the train_test notebook, or (2) using the low-quality binary masks generated in step 1.
  3. Run FaceSwap_GAN_v2.2_train_test.ipynb to train models.
  4. Run FaceSwap_GAN_v2.2_video_conversion.ipynb to create videos using the trained models in step 3.
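
The RESOLUTION and use_bm_eyes names below come from the config cell of the train_test notebook as referenced above; the directory variables and exact defaults are an illustrative sketch, so consult the notebook for the authoritative option list:

    # Illustrative config-cell values (a sketch, not the full notebook config).
    RESOLUTION = 64        # output resolution: one of 64, 128, 256
    use_bm_eyes = True     # False skips the prep_binary_masks.ipynb masks
    img_dirA = './faceA'   # aligned faces for target A (assumed variable name)
    img_dirB = './faceB'   # aligned faces for target B (assumed variable name)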

Miscellaneous

Training data format

  • Face images should be placed in the ./faceA/ and ./faceB/ folders, one folder per target face.
  • Images will be resized to 256x256 during training.
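
A small sanity check along these lines can catch formatting problems before training; the snippet below is an assumed helper, not part of the repository, and mirrors the 256x256 resize with cv2:

    # Verify both training folders contain readable images and that each
    # image survives the 256x256 resize applied during training.
    import glob
    import os

    import cv2

    for folder in ("./faceA", "./faceB"):
        paths = glob.glob(os.path.join(folder, "*"))
        assert paths, "no training images found in %s" % folder
        for path in paths:
            img = cv2.imread(path)
            assert img is not None, "unreadable image: %s" % path
            cv2.resize(img, (256, 256))  # what training will do anyway
        print("%s: %d images OK" % (folder, len(paths)))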

Generative adversarial networks for face swapping

1. Architecture

(Architecture diagrams: encoder, decoder, and discriminator.)

2. Results

  • Improved output quality: Adversarial loss improves the reconstruction quality of generated images.

  • Additional results: This image shows 160 random results generated by v2 GAN with self-attention mechanism (image format: source -> mask -> transformed).

  • Evaluations: Evaluations of the output quality on Trump/Cage dataset can be found here.

The Trump/Cage images are obtained from the reddit user deepfakes' project on pastebin.com.

3. Features

  • VGGFace perceptual loss: Perceptual loss makes the direction of the eyeballs more realistic and consistent with the input face. It also smooths out artifacts in the segmentation mask, resulting in higher output quality.

  • Attention mask: The model predicts an attention mask that helps handle occlusion, eliminate artifacts, and produce a natural skin tone (a schematic of the alpha blending follows this list).

  • Configurable input/output resolution (v2.2): The model supports 64x64, 128x128, and 256x256 output resolutions.

  • Face tracking/alignment using MTCNN and Kalman filter in video conversion:

    • MTCNN is introduced for more stable detection and reliable face alignment (FA).
    • A Kalman filter smooths the bounding box positions over frames and eliminates jitter on the swapped face (a minimal smoothing sketch follows this list).
  • Eyes-aware training: Introduces a high reconstruction loss and an edge loss in the eye region, which guides the model to generate realistic eyes.
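
To make the attention-mask bullet concrete, here is a schematic of the alpha-blending idea; this is a sketch of the concept, not the repository's exact computation graph:

    # Alpha-blend the generated face with the input using the predicted mask:
    # mask ~ 1 keeps generated pixels, mask ~ 0 keeps the original pixels,
    # which is what helps with occlusion, artifacts, and skin tone.
    import numpy as np

    def blend_with_attention(input_bgr, generated_bgr, mask):
        alpha = np.clip(mask, 0.0, 1.0)[..., None]  # HxW -> HxWx1
        return alpha * generated_bgr + (1.0 - alpha) * input_bgr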
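
And a minimal sketch of the Kalman-filter smoothing idea, using OpenCV's cv2.KalmanFilter with a constant-velocity model; the state layout and noise covariances here are illustrative assumptions, not the repository's tuned values:

    import cv2
    import numpy as np

    # State: [cx, cy, vx, vy]; measurement: bounding-box center [cx, cy].
    kf = cv2.KalmanFilter(4, 2)
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], np.float32)
    kf.processNoiseCov = 1e-3 * np.eye(4, dtype=np.float32)      # assumed
    kf.measurementNoiseCov = 1e-1 * np.eye(2, dtype=np.float32)  # assumed

    def smooth_center(cx, cy):
        """Feed one detected box center per frame; returns smoothed center."""
        kf.predict()
        state = kf.correct(np.array([[cx], [cy]], np.float32))
        return float(state[0, 0]), float(state[1, 0])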

Frequently asked questions and troubleshooting

1. How does it work?

  • At a very high level, the method is a denoising autoencoder trained with the additional losses described above; the repository includes an abstract (but not exactly faithful) flowchart of the algorithm and an illustration of its objective functions. A schematic of how those objectives might combine follows below.
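
Grounded in the losses this README describes (reconstruction, adversarial, and VGGFace perceptual), a generator objective could combine them as sketched below; the weights and the LSGAN-style adversarial term are illustrative choices, not the repository's exact formulation:

    import tensorflow as tf

    def generator_loss(target, fake, d_fake_score, feat_target, feat_fake,
                       w_recon=1.0, w_adv=0.1, w_pl=0.01):  # weights: assumed
        # Reconstruction: L1 between the reconstructed face and the target.
        loss_recon = tf.reduce_mean(tf.abs(target - fake))
        # Adversarial: push the discriminator's score on fakes toward "real"
        # (an LSGAN-style term; the repo may use a different GAN loss).
        loss_adv = tf.reduce_mean(tf.square(d_fake_score - 1.0))
        # Perceptual: match VGGFace features of fake and target.
        loss_pl = tf.reduce_mean(tf.square(feat_target - feat_fake))
        return w_recon * loss_recon + w_adv * loss_adv + w_pl * loss_pl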

2. Previews look good, but the face does not transform in the output videos?

  • The model performs to its full potential when the input images are preprocessed with face alignment methods.

Requirements

Acknowledgments

Code borrows from tjwei, eriklindernoren, fchollet, keras-contrib, and reddit user deepfakes' project. The generative network is adapted from CycleGAN. Weights and scripts of MTCNN are from FaceNet. Illustrations are from irasutoya.

Comments
  • dlib video face detection takes massive amount of time

    I am not a Jupyter user, although I assume I have the repository set up correctly, as it seems to be doing work. I have been working with other code bases for a while using the same modules, so this is likely not a dependency issue.

    When I step through the code for dlib_video_face_detection.ipynb, I get to the code block where moviepy does some manipulation of an input video. According to the timestamp on the output, it will take a very long time to complete.

    0%| | 9/15887 [11:09<329:34:28, 74.72s/it]

    The target video is 1280x720, 00:08:50 long, with a data rate of 1413 kbps. My hardware consists of a 3.5 GHz i5, a GTX 1080, and 16 GB of RAM. Training large data sets has not been a problem for me, so I am unsure why processing a video frame by frame would take this long.

    What is the purpose of the

    from moviepy.editor import VideoFileClip  # import implied by the notebook

    output = '_.mp4'
    clip1 = VideoFileClip("x-cropped.mp4")
    clip = clip1.fl_image(process_video)  # .subclip(0,10)  # NOTE: this function expects color images!!
    %time clip.write_videofile(output, audio=False)
    

    block, besides running the process_video method on each frame? I have used the dlib module as a standalone script, and it processes the video in a handful of minutes, pulling many faces as a result.

    Would it be beneficial to pre-process the video file with ffmpeg before handing the work to the notebook? Perhaps rip the frames beforehand so that moviepy would not need to step frame by frame?

    opened by abduct 12
  • "Weights file not found." despite them being present

    try:
    	encoder.load_weights("models/encoder.h5")
    	decoder_A.load_weights("models/decoder_A.h5")
    	decoder_B.load_weights("models/decoder_B.h5")
    	# netDA.load_weights("models/netDA.h5") 
    	# netDB.load_weights("models/netDB.h5") 
    	print("model loaded.")
    except:
    	print("weights file not found.")
    	pass
    

    At this point in the code, it always fails, saying it couldn't find the weight files. They are located at ./faceswap-GAN-master/models/. Is this incorrect? I should note that the model is the Trump to Cage model from the deepfakes/faceswap project, and that I commented out netDA and netDB because they do not exist.

    Any help? Thank you.

    opened by Irastris 10
  • The shape of face didn't match, using default setting.

    opened by isleon 8
  • The new GAN version doesn't seem to work

    I've been training the new version for 10,000 + 6,000 iterations and the output (from the show_g function) doesn't even start to change... I saw some outputs where the network tried to mimic person B, but at the end of the day the three columns [test_A, path_A(test_A), path_B(test_A)] all look the same as test_A, and vice versa...

    Not only that, but when I tried turning use_mixup to False I got this error about a mismatched number of channels:

    number of input channels does not match corresponding dimension of filter, 3 != 6

    at the line output_real = netD(real) # positive

    It seems we have to manually change nc_D_inp to 3 instead of 6.

    Keep up the good work...

    opened by Nelthirion 8
  • gtx 1060 with 6G ram, out of memory

    I am running the script on a GTX 1060 laptop with 6 GB of memory; it halts at 56 iterations and says out of memory. Is there any way I could lower the memory requirement? Thanks.

    opened by chikiuso 7
  • error on face transform

    When I run this line in FaceSwap_GAN_v2.2_train_test.ipynb:

    result_img, result_rgb, result_mask = ftrans.transform(
        aligned_det_face_im,
        direction="AtoB",
        roi_coverage=0.93,
        color_correction="adain_xyz",
        IMAGE_SHAPE=(RESOLUTION, RESOLUTION, 3)
    )

    I get this error:

    Invalid argument: transpose expects a vector of size 3. But input(1) is a vector of size 4
      [[{{node model_22/conv2d_157/convolution-0-TransposeNHWCToNCHW-LayoutOptimizer}}]]
      [[concatenate_24/concat/_2775]]

    Has anybody encountered this?

    opened by krunt 5
  • Is it possible to enable multi GPU support?

    I have access to a machine that has two K60 Tesla cards attached to it. Running FaceSwap_GAN_v2_train.ipynb I can see in nvidia-smi that one GPU was at 100% usage and one was at 0% usage.

    Is there any configuration I need to do to allow the libraries that your script calls to use more than one GPU?

    opened by leftler 5
  • time is 1875.640280

    I am wondering if the training screen should change only every 1,875 seconds, and whether this speed is correct. Is this program unable to use CUDA?

    [2/150][50] Loss_DA: 0.205057 Loss_DB: 0.193327 Loss_GA: 0.415360 Loss_GB: 0.413623 time: 1875.640280
    [4/150][100] Loss_DA: 0.200707 Loss_DB: 0.211183 Loss_GA: 0.295002 Loss_GB: 0.341839 time: 3606.393274

    opened by bigsea00001 5
  • SOS, I don't have the weights files!

    try:
        encoder.load_weights("models/encoder.h5")
        decoder_A.load_weights("models/decoder_A.h5")
        decoder_B.load_weights("models/decoder_B.h5")
        # netDA.load_weights("models/netDA.h5")
        # netDB.load_weights("models/netDB.h5")
        print("model loaded.")
    except:
        print("weights file not found.")
        pass

    Loading failed; something is wrong with the weights files.

    I ran FaceSwap_GAN_v2.1_train at the beginning, and it threw this exception. Where can I find all the weight files, or which file do I need to run first?

    opened by Elephantameler 4
  • Could you change the input and output to 128, 128 for WGAN model?

    Hi, I see a WGAN in the temp folder. Could you help change the input and output to 128x128 for the WGAN model? I tried to change it myself but did not succeed. Thanks!!

    opened by chikiuso 4
  • Training stops at a certain iteration

    I have a GTX 1080 and have successfully trained the original NN, but when I try to train these models, they run for about an hour and then stop training at a given iteration. If I rerun the function, it stops at the same one.

    opened by ghost 4
  • Exception: URL fetch failure on https://github.com/rcmalli/keras-vggface/releases/download/v2.0/rcmalli_vggface_tf_notop_resnet50.h5 : None -- retrieval incomplete: got only 43057152 out of 94694792 bytes

    When running FaceSwap_GAN_v2.2_train_test, something went wrong:

    Exception: URL fetch failure on https://github.com/rcmalli/keras-vggface/releases/download/v2.0/rcmalli_vggface_tf_notop_resnet50.h5 : None -- retrieval incomplete: got only 43057152 out of 94694792 bytes

    How can I resolve it?

    opened by crd0429 0
  • Asian-celeb dataset download link


    [Asian-celeb dataset]

    • Training data (Asian-celeb)

    The dataset consists of crawled images of celebrities on the web. The images are covered under a Creative Commons Attribution-NonCommercial 4.0 International license (please read the license terms at http://creativecommons.org/licenses/by-nc/4.0/).


    [train_msra.tar.gz]

    MD5:c5b668f2204c400099b14f367069aef5

    Content: Train dataset called MS-Celeb-1M-v1c with 86,876 ids / 3,923,399 aligned images, cleaned from the MS-Celeb-1M dataset.

    This dataset has been excluded from both LFW and Asian-Celeb.

    Format: *.jpg

    Google: https://drive.google.com/file/d/1aaPdI0PkmQzRbWErazOgYtbLA1mwJIfK/view?usp=sharing

    [msra_lmk.tar.gz]

    MD5:7c053dd0462b4af243bb95b7b31da6e6

    Content: A list of five-point landmarks for the 3,923,399 images in MS-Celeb-1M-v1c.

    Format: <path> <label> <x1> <y1> <x2> <y2> <x3> <y3> <x4> <y4> <x5> <y5>

    where <path> is the path of an image in the tar file train_msceleb.tar.gz,

    <label> is an integer ranging from 0 to 86,875, and

    each (<x>, <y>) is the coordinate of a key point on the aligned image, in the order: left eye, right eye, nose tip, mouth left, mouth right.

    Google: https://drive.google.com/file/d/1FQ7P4ItyKCneNEvYfJhW2Kff7cOAFpgk/view?usp=sharing

    [train_celebrity.tar.gz]

    MD5:9f2e9858afb6c1032c4f9d7332a92064

    Content: Train dataset called Asian-Celeb with 93,979 ids/2,830,146 aligned images.

    This dataset has been excluded from both LFW and MS-Celeb-1M-v1c.

    Format: *.jpg

    Google: https://drive.google.com/file/d/1-p2UKlcX06MhRDJxJukSZKTz986Brk8N/view?usp=sharing

    [celebrity_lmk.tar.gz]

    MD5:9c0260c77c13fbb32692fc06a5dbfaf0

    Content: A list of five-point landmarks for the 2,830,146 images in Asian-Celeb.

    Format: <path> <label> <x1> <y1> <x2> <y2> <x3> <y3> <x4> <y4> <x5> <y5>

    where <path> is the path of an image in the tar file train_celebrity.tar.gz,

    <label> is an integer ranging from 86,876 to 196,319, and

    each (<x>, <y>) is the coordinate of a key point on the aligned image, in the order: left eye, right eye, nose tip, mouth left, mouth right.

    Google: https://drive.google.com/file/d/1sQVV9epoF_8jS3ge6DqbilpWk3UNE8U7/view?usp=sharing

    [testdata.tar.gz]

    MD5:f17c4712f7562ea6d45f0a158e59b792

    Content: Test dataset with 1,862,120 aligned images.

    Format: *.jpg

    Google: https://drive.google.com/file/d/1ghzuEQqmUFN3nVujfrZfBx_CeGUpWzuw/view?usp=sharing

    [testdata_lmk.tar]

    MD5:7e4995eb9976a2cfd2b23db05d76572c

    Content: A list of five-point landmarks for the 1,862,120 images in testdata.tar.gz.

    Features should be extracted in the same sequence and with the same count as this list.

    Format: <path> <x1> <y1> <x2> <y2> <x3> <y3> <x4> <y4> <x5> <y5>

    where <path> is the path of an image in the tar file testdata.tar.gz, and

    each (<x>, <y>) is the coordinate of a key point on the aligned image, in the order: left eye, right eye, nose tip, mouth left, mouth right.

    Google: https://drive.google.com/file/d/1lYzqnPyHXRVgXJYbEVh6zTXn3Wq4JO-I/view?usp=sharing

    [feature_tools.tar.gz]

    MD5:227b069d7a83aa43b0cb738c2252dbc4

    Content: Feature format transform tool and a sample feature file.

    Format: We use the same format as MegaFace (http://megaface.cs.washington.edu/), except that we merge all files into a single binary file.

    Google: https://drive.google.com/file/d/1bjZwOonyZ9KnxecuuTPVdY95mTIXMeuP/view?usp=sharing

    opened by AmesianX 0
  • AttributeError: 'str' object has no attribute 'decode'

    # from keras_vggface.vggface import VGGFace
    # VGGFace ResNet50
    # vggface = VGGFace(include_top=False, model='resnet50', input_shape=(224, 224, 3))

    from colab_demo.vggface_models import RESNET50
    vggface = RESNET50(include_top=False, weights=None, input_shape=(224, 224, 3))
    vggface.load_weights("rcmalli_vggface_tf_notop_resnet50.h5")

    # from keras.applications.resnet50 import ResNet50
    # vggface = ResNet50(include_top=False, input_shape=(224, 224, 3))
    # vggface.summary()

    model.build_pl_model(vggface_model=vggface, before_activ=loss_config["PL_before_activ"])
    model.build_train_functions(loss_weights=loss_weights, **loss_config)

    Error:

    AttributeError                            Traceback (most recent call last)
    <ipython-input> in <module>()
          6 from colab_demo.vggface_models import RESNET50
          7 vggface = RESNET50(include_top=False, weights=None, input_shape=(224, 224, 3))
    ----> 8 vggface.load_weights("rcmalli_vggface_tf_notop_resnet50.h5")
          9
         10 #from keras.applications.resnet50 import ResNet50

    1 frames
    /usr/local/lib/python3.7/dist-packages/keras/engine/topology.py in load_weights_from_hdf5_group(f, layers, reshape)
       3326     """
       3327     if 'keras_version' in f.attrs:
    -> 3328         original_keras_version = f.attrs['keras_version'].decode('utf8')
       3329     else:
       3330         original_keras_version = '1'

    AttributeError: 'str' object has no attribute 'decode'

    opened by LZZ383 2
  • AssertionError

    AssertionError                            Traceback (most recent call last)
    <ipython-input> in <module>()
         14
         15 model.build_pl_model(vggface_model=vggface, before_activ=loss_config["PL_before_activ"])
    ---> 16 model.build_train_functions(loss_weights=loss_weights, **loss_config)

    2 frames
    /usr/local/lib/python3.7/dist-packages/keras/legacy/interfaces.py in get_updates_arg_preprocessing(args, kwargs)
        652             kwargs['params'] = params
        653             return [opt], kwargs, []
    --> 654         elif len(args) == 3:
        655             if isinstance(args[1], (list, tuple)):
        656                 assert isinstance(args[2], dict)

    AssertionError:

    opened by varunp2k 0
  • 'FaceswapGANModel' object has no attribute 'netDA_train'

    AttributeError                            Traceback (most recent call last)
    <ipython-input> in <module>
        112 data_A = train_batchA.get_next_batch()
        113 data_B = train_batchB.get_next_batch()
    --> 114 errDA, errDB = model.train_one_batch_D(data_A=data_A, data_B=data_B)
        115 errDA_sum += errDA[0]
        116 errDB_sum += errDB[0]

    F:\faceswap-GAN-master\networks\faceswap_gan_model.py in train_one_batch_D(self, data_A, data_B)
        326         else:
        327             raise ValueError("Something's wrong with the input data generator.")
    --> 328         errDA = self.netDA_train([warped_A, target_A])
        329         errDB = self.netDB_train([warped_B, target_B])
        330         return errDA, errDB

    AttributeError: 'FaceswapGANModel' object has no attribute 'netDA_train'

    How can I solve this problem?

    opened by yangdazhuo816 1