MakeItTalk: Speaker-Aware Talking-Head Animation

Overview

This is the code repository implementing the paper:

MakeItTalk: Speaker-Aware Talking-Head Animation

Yang Zhou, Xintong Han, Eli Shechtman, Jose Echevarria, Evangelos Kalogerakis, Dingzeyu Li

SIGGRAPH Asia 2020

Abstract: We present a method that generates expressive talking-head videos from a single facial image with audio as the only input. In contrast to previous attempts to learn direct mappings from audio to raw pixels for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking-head dynamics. Another key component of our method is the prediction of facial landmarks reflecting the speaker-aware dynamics. Based on this intermediate representation, our method works with many portrait images in a single unified framework, including artistic paintings, sketches, 2D cartoon characters, Japanese mangas, and stylized caricatures. In addition, our method generalizes well for faces and characters that were not observed during training. We present extensive quantitative and qualitative evaluation of our method, in addition to user studies, demonstrating generated talking-heads of significantly higher quality compared to prior state-of-the-art methods.

[Project page] [Paper] [Video] [Arxiv] [Colab Demo] [Colab Demo TDLR]

Figure. Given an audio speech signal and a single portrait image as input (left), our model generates speaker-aware talking-head animations (right). Neither the speech signal nor the input face image is observed during model training. Our method creates both non-photorealistic cartoon animations (top) and natural human face videos (bottom).

Updates

  • facewarp source code and compile instructions
  • Pre-trained models
  • Google colab quick demo for natural faces [detail] [TDLR]
  • Training code for each module
  • Customized puppet creating tool

Requirements

  • Python 3.6 environment
conda create -n makeittalk_env python=3.6
conda activate makeittalk_env
sudo apt-get install ffmpeg
  • Python packages
pip install -r requirements.txt
  • winehq-stable (used on Ubuntu to run the pre-compiled Windows face-warping executable for cartoon animation)
sudo dpkg --add-architecture i386
wget -nc https://dl.winehq.org/wine-builds/winehq.key
sudo apt-key add winehq.key
sudo apt-add-repository 'deb https://dl.winehq.org/wine-builds/ubuntu/ xenial main'
sudo apt update
sudo apt install --install-recommends winehq-stable
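
If you want a quick sanity check that both external tools ended up on your PATH, here is a short illustrative Python snippet (not part of the repository):

import shutil

# The pipeline shells out to ffmpeg for video assembly; on Linux the cartoon
# warping step runs a Windows executable through wine.
for tool in ("ffmpeg", "wine"):
    path = shutil.which(tool)
    print(f"{tool}: {path if path else 'NOT FOUND'}")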

Pre-trained Models

Download the following pre-trained models to the examples/ckpt folder for testing your own animation.

Model                                Link to the model
Voice Conversion                     Link
Speech Content Module                Link
Speaker-aware Module                 Link
Image2Image Translation Module       Link
Non-photorealistic Warping (.exe)    Link
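
To confirm the downloads landed where the test scripts expect them, a small illustrative check (the exact file names depend on what you downloaded):

import os

# List everything under examples/ckpt with its size, so a truncated or
# misplaced download is easy to spot.
ckpt_dir = "examples/ckpt"
for name in sorted(os.listdir(ckpt_dir)):
    size_mb = os.path.getsize(os.path.join(ckpt_dir, name)) / 1e6
    print(f"{name}: {size_mb:.1f} MB")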

Animate Your Portraits!

  • Download the pre-trained embedding [here] and save it to the examples/dump folder.

Natural Human Faces / Paintings

  • Crop your portrait image to 256x256 and put it under the examples folder in .jpg format. Make sure the head is roughly centered (check the existing examples for reference); a minimal cropping sketch follows this list.

  • Put your test audio files under the examples folder as well, in .wav format.

  • Animate!

python main_end2end.py --jpg <portrait_file.jpg>
  • Use the additional arguments --amp_lip_x, --amp_lip_y, and --amp_pos to amplify the lip motion (in the x/y-axis directions) and the head motion displacements; the default values are 2.0, 2.0, and 0.5 respectively.
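
A minimal cropping sketch using Pillow (illustrative only; input.jpg and the output name are hypothetical, and the center crop assumes the head is roughly in the middle of the source photo):

from PIL import Image

img = Image.open("input.jpg").convert("RGB")
side = min(img.size)                          # side of the largest centered square
left, top = (img.width - side) // 2, (img.height - side) // 2
img = img.crop((left, top, left + side, top + side)).resize((256, 256))
img.save("examples/my_portrait.jpg")          # 256x256 .jpg under examples/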

Cartoon Faces

  • Put your test audio files under the examples folder as well, in .wav format.

  • Animate one of the existing puppets:

Available puppet names: wilk, roy, sketch, color, cartoonM, danbooru1 (see the example images in the repository).
python main_end2end_cartoon.py --jpg <puppet_face_file> --jpg_bg <background_file>
  • --jpg_bg takes a same-size image to use as the background of the animation, such as the puppet's body or an overall fixed backdrop. If you want to use a background, make sure the puppet face image (i.e. the --jpg image) is in .png format and transparent outside the face area. If you don't need any background, still supply a same-size image (e.g. a pure white one) to fill the argument slot; a minimal sketch for generating one follows this list.

  • Use the additional arguments --amp_lip_x, --amp_lip_y, and --amp_pos to amplify the lip motion (in the x/y-axis directions) and the head motion displacements; the default values are 2.0, 2.0, and 0.5 respectively.

  • create your own puppets (ToDo...)
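
If you only need a placeholder background for --jpg_bg, a short Pillow sketch (my_puppet.png is a hypothetical puppet face image):

from PIL import Image

# Create a pure white image with the same dimensions as the puppet face,
# just to fill the --jpg_bg argument slot.
face = Image.open("examples_cartoon/my_puppet.png")
bg = Image.new("RGB", face.size, "white")
bg.save("examples_cartoon/my_puppet_bg.jpg")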

Train

Train Voice Conversion Module

Todo...

Train Content Branch

  • Create a dataset root directory <root_dir>.

  • Dataset: Download the preprocessed dataset [here] and put it under <root_dir>/dump.

  • Train script: Run the script below. Models will be saved in <root_dir>/ckpt/<train_instance_name>.

    python main_train_content.py --train --write --root_dir <root_dir> --name <train_instance_name>
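
Before launching, you can sanity-check that the preprocessed dump is in place (an illustrative snippet; the path below stands in for your actual <root_dir>):

    import os

    root_dir = "./my_root_dir"                # hypothetical <root_dir>
    dump_dir = os.path.join(root_dir, "dump")
    files = os.listdir(dump_dir) if os.path.isdir(dump_dir) else []
    print(dump_dir, "->", files if files else "empty: download the dataset first")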

Train Speaker-Aware Branch

Todo...

Train Image-to-Image Translation

Todo...

License

Acknowledgement

We would like to thank Timothy Langlois for the narration, and Kaizhi Qian for the help with the voice conversion module. We thank Jakub Fiser for implementing the real-time GPU version of the triangle morphing algorithm. We thank Daichi Ito for sharing the caricature image and Dave Werner for Wilk, the gruff but ultimately lovable puppet.

This research is partially funded by NSF (EAGER-1942069) and a gift from Adobe. Our experiments were performed in the UMass GPU cluster obtained under the Collaborative Fund managed by the MassTech Collaborative.

Comments
  • AttributeError: 'NoneType' object has no attribute 'ndim'

    Traceback (most recent call last):
      File "main_end2end.py", line 73, in <module>
        shapes = predictor.get_landmarks(img)
      File "/usr/local/lib/python3.6/dist-packages/face_alignment/api.py", line 107, in get_landmarks
        return self.get_landmarks_from_image(image_or_path, detected_faces)
      File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
        return func(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/face_alignment/api.py", line 134, in get_landmarks_from_image
        if image.ndim == 2:
    AttributeError: 'NoneType' object has no attribute 'ndim'

    opened by ak9250 4
  • FileNotFoundError: [Errno 2] No such file or directory: 'examples/dump/celeb_withrot_val_au.pickle'

    Traceback (most recent call last):
      File "main_end2end.py", line 150, in <module>
        model = Audio2landmark_model(opt_parser, jpg_shape=shape_3d)
      File "/content/MakeItTalk/src/approaches/train_audio2landmark.py", line 85, in __init__
        num_window_step=1)
      File "/content/MakeItTalk/src/dataset/audio2landmark/audio2landmark_dataset.py", line 33, in __init__
        with open(os.path.join(self.dump_dir, '{}_{}_au.pickle'.format(dump_name, status)), 'rb') as fp:
    FileNotFoundError: [Errno 2] No such file or directory: 'examples/dump/celeb_withrot_val_au.pickle'

    opened by ak9250 3
  • facewarp.exe

    Hello, I'm trying to execute part of the main_end2end_cartoon.py script. Specifically, I am interested in the facewarp part, so what I actually run is:

    facewarp/facewarp.exe examples_cartoon/wilk.png out/triangulation.txt out/reference_points.txt out/warped_points.txt -novsync -dump

    but I get the following error:

    EXCEPTION:
            Failed to read image '-novsync'.
    

    When I omit both the -novsync and -dump arguments, the script seems to run but does not output anything.

    Am I missing something or is it a bug? Thanks, Tim

    opened by efthymisgeo 2
  • Merge updates from Yang's forked repo

    opened by dingzeyuli 2
  • No AutoVC in third party

    Hi! Thanks for a great project! I wanted to try the Colab demo, but found out that there are no AutoVC utils in the thirdparty directory; they are actually in src. So the import line should be changed to:

    from src.approaches.train_image_translation import Image_translation_block

    opened by kryzhikov 1
  • Two files are missing to train the model

    Hello, the program can't run without these models. The "ckpt_best_model.pth" and "ckpt_last_epoch.pth" files needed to train the model are missing.

    opened by baihua86 1
  • [Do not merge yet] add facewarp source code

    Description

    Add facewarp source code

    Todo Items before merging this PR

    @yzhou359, please add more commits to this branch/PR until all of these checkboxes are complete.

    • [x] Build the facewarp on your Windows machine
    • [x] Decide if we want to test on mac/Linux -- if it's too much work, let's leave it for the future.
    • [x] Update the name dingwarp to something more general
    • [x] Modify the existing README to make sure anyone can build on their machine

    If there is any other task that I missed, please add to the above list.

    Do not merge this PR before the above checklist is complete.

    Checklist:

    • [x] I have signed the Adobe Open Source CLA.
    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    • [x] I have read the CONTRIBUTING document.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    Do not merge 
    opened by dingzeyuli 0
  • Fix bugs for cartoon animation

    Description

    Fix bugs for cartoon animation. Fix bugs caused by path differences between Windows and Linux.

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    opened by yzhou359 0
  • How to only get the middle video as output

    I have been looking around this code because I wanted to see if I can create some cool animations for some images. How can I get the middle animation video as the output?

    opened by Pavankunchala 1
  • end2end CPU support

    Added CPU-only device support.

    Description

    • main_end2end now checks whether CUDA is available and, if not, defaults to the CPU.

    Motivation and Context

    • Allows inference on devices without CUDA-enabled GPUs.

    How Has This Been Tested?

    • Tested locally on custom animations

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [x] I have signed the Adobe Open Source CLA.
    • [x] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have read the CONTRIBUTING document.
    • [ ] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by yogeshchandrasekharuni 0
  • rebuild facewarp.exe

    Hello, I rebuilt facewarp.exe according to 'https://github.com/adobe-research/MakeItTalk/blob/main/facewarp/facewarp/README.md', but after running facewarp.exe the result is "Could find no file with path '%06d.tga' and index in the range 0-4". I guess the file facewarp.cpp is incomplete, or there is some other reason. I hope you can reply, as I want to modify facewarp.cpp to accomplish my goal.

    opened by ka106 0
  • Training Code Status?

    Hi, it's been over a year since the last commit on this repo. Can anyone update me on the status of the training code? When can we expect the entire training code?

    opened by mdv3101 0