FACIAL: Synthesizing Dynamic Talking Face With Implicit Attribute Learning. ICCV, 2021.

PyTorch implementation for the paper:

FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning

Chenxu Zhang, Yifan Zhao, Yifei Huang, Ming Zeng, Saifeng Ni, Madhukar Budagavi, Xiaohu Guo

ICCV 2021

Run demo on Google Colab

Open In Colab

Requirements

  • Python environment
    conda create -n audio_face
    conda activate audio_face
  • ffmpeg
    sudo apt-get install ffmpeg
  • Python packages
    pip install -r requirements.txt
  • Optionally, install OpenCV via conda
    conda install opencv
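
Before running the demo, it can help to confirm that the tools listed above are actually reachable. The following is a minimal sketch and not part of the repository: the helper name check_environment is hypothetical, and it assumes that ffmpeg is expected on the PATH and that OpenCV is importable as cv2.

```python
import importlib.util
import shutil


def check_environment(executables=("ffmpeg",), modules=("cv2",)):
    """Return {name: bool} saying whether each tool or module can be found."""
    report = {}
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        report[exe] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec returns None for a top-level module that is not installed.
        report[mod] = importlib.util.find_spec(mod) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Any entry reported as MISSING points back to the matching install command in the list above.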

Citation

@inproceedings{zhang2021facial,
  title={FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning},
  author={Zhang, Chenxu and Zhao, Yifan and Huang, Yifei and Zeng, Ming and Ni, Saifeng and Budagavi, Madhukar and Guo, Xiaohu},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  pages={3867--3876},
  year={2021}
}
Comments
  • Audio/Video Sync Problem

    Hi guys,

    I've already synthesized my own video, and the resolution seems good! However, the voice is sometimes out of sync with the video (I used Mandarin TTS speech).

    https://user-images.githubusercontent.com/53510802/149249264-89a4c5cb-5ef7-49f7-ba87-c999e7a8470c.mp4

    Has anyone run into a similar problem? Any suggestions are appreciated.

    Thanks a lot

    opened by MingZJU 14
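  • Two questions

    First, thank you to the author for releasing the project code for everyone to study. After a great deal of effort I finally got both the demo script and the released training script running, on CUDA 10.1 and TF 2.3. However, while generating video from new audio I ran into several issues, and I hope the author can answer them when time permits.

    1. When face2vid runs test_video.py, I get roughly 15 frames per second on an RTX 2080 Ti. The author mentioned using a 1080 Ti, which should not be any faster. For 1 minute of audio, audio preprocessing takes about 3 minutes and frame synthesis takes about another 3 minutes. The total time is more than 6 times the audio duration, so efficiency still seems far from real-time synthesis. The author has said that near-real-time performance is achievable; could you offer some ideas on how to improve efficiency?

    2. Audio preprocessing uses the DeepSpeech 0.1.0 model, which is very outdated. That project has since reached version 0.9.3 and now provides a pbmm model that supports Chinese. Why use the original pb model rather than a newer one? Could the author adjust the code to support the newer versions?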

    opened by yaleimeng 10
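
The timing figures reported in this issue (1 minute of audio, about 3 minutes of preprocessing, about 3 minutes of synthesis, rendering at roughly 15 frames per second) can be turned into a real-time factor. The sketch below is plain arithmetic, not a benchmark. The function names are hypothetical, and the 30 FPS output rate is an assumption taken from the paper's statement that videos are normalized to 30 FPS.

```python
def real_time_factor(audio_seconds, *stage_seconds):
    """Total processing time divided by audio duration (1.0 means real time)."""
    return sum(stage_seconds) / audio_seconds


def render_seconds(audio_seconds, video_fps, render_fps):
    """Time needed to render every frame of a clip at a given throughput."""
    return audio_seconds * video_fps / render_fps


# Figures from the report: 60 s of audio, ~180 s preprocessing, ~180 s synthesis.
rtf = real_time_factor(60, 180, 180)

# Assumed 30 FPS output rendered at the reported ~15 frames per second.
render = render_seconds(60, 30, 15)

print(rtf)     # 6.0
print(render)  # 120.0
```

Under these assumptions, rendering at 15 frames per second alone takes twice the audio duration, so both the preprocessing stage and the renderer would need to speed up to approach real time.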
  • How to train my own data?

    Thanks for sharing, @zhangchenxu528. I get bad results when using my own data, and the results seem to be affected by "latest_net_G.pth". How do I train a custom latest_net_G.pth? Or do I need to do some other work?

    opened by CruiseYuGH 10
  • 3DFaceReconstruction with PyTorch .mat files have different keys from provided video_preprocess!

    Hi. Hello to everyone at the repo! Much kudos to Professor @zhangchenxu528 for his great research. I just used a new video to generate a 3D face reconstruction with PyTorch (the original notebook used TensorFlow), as detailed here:

    https://github.com/sicxu/Deep3DFaceRecon_pytorch

    The generated .mat file has different keys from the ones in the provided video_preprocess folder. For example, handle_netface.py contains this line of code:

    faceshape[i-1,:] = loadmat(os.path.join(param_folder,str(i)+'.mat'))['coeff'][0,:]

    This means that the .mat file is expected to have the key 'coeff'! However, the .mat file from the PyTorch version only has the following keys: 'angle', 'exp', 'gamma', 'id', 'tex', and 'trans'. When these keys are concatenated, the total dimension equals that of the originally required 'coeff' key.

    Here is what I have done so far! However, I am not sure that the keys are concatenated in the correct order.

    import numpy as np

    # Concatenate the per-key coefficient rows into one flat vector.
    # NOTE: the order below is the one being asked about, not a confirmed layout.
    coeffs = np.concatenate((mat_contents['angle'][0],
                            mat_contents['exp'][0],
                            mat_contents['gamma'][0],
                            mat_contents['id'][0],
                            mat_contents['tex'][0],
                            mat_contents['trans'][0]))

    Can someone please confirm if this is the correct order?

    I have attached the two mat files here. 1.mat.orig.zip contains the .mat file in video_preprocess that has a 'coeff' key. The file 1.mat.zip contains the .mat file that doesn't have a 'coeff' key, but rather the following keys: 'angle', 'exp', 'gamma', 'id', 'tex', and 'trans'.

    1.mat.orig.zip 1.mat.zip

    opened by geek0075 6
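
One way to rebuild the flat vector is sketched below. It assumes the 257-value layout that Deep3DFaceReconstruction uses when it splits its coefficient vector: identity (80), expression (64), texture (80), rotation angles (3), lighting gamma (27), translation (3). That differs from the alphabetical order tried in this issue, so it should be verified against the 'coeff' array in the provided video_preprocess files before relying on it. The names COEFF_LAYOUT and pack_coeff are hypothetical.

```python
# Assumed layout of the flat 'coeff' vector: (key, length), in order.
COEFF_LAYOUT = (
    ("id", 80),
    ("exp", 64),
    ("tex", 80),
    ("angle", 3),
    ("gamma", 27),
    ("trans", 3),
)


def pack_coeff(mat_contents):
    """Flatten per-key coefficient rows into one list in COEFF_LAYOUT order.

    mat_contents maps each key to a 1 x N row (for example, what
    scipy.io.loadmat returns); only the first row of each entry is used.
    """
    coeff = []
    for key, length in COEFF_LAYOUT:
        row = [float(v) for v in mat_contents[key][0]]
        if len(row) != length:
            raise ValueError(f"{key}: expected {length} values, got {len(row)}")
        coeff.extend(row)
    return coeff


# Synthetic stand-in for a loaded .mat file: each key holds a constant row.
fake_mat = {key: [[float(i)] * n] for i, (key, n) in enumerate(COEFF_LAYOUT)}
coeff = pack_coeff(fake_mat)
print(len(coeff))  # 257
```

With real data, mat_contents would come from scipy.io.loadmat, and the result can be wrapped as numpy.asarray(coeff)[None, :] to get the 1 x 257 shape that handle_netface.py indexes with [0, :].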
  • Problem encountered when training on a custom dataset

    Hi, thank you for your awesome work! I am trying to use your method to train on a custom dataset, but the videos in my training set are 25 fps. I don't want to force the frame rate of those videos to 30. Can I achieve this by modifying the original code?

    https://github.com/zhangchenxu528/FACIAL/blob/95eaf8f40e44e8c1ad5ed047b874881fb6115b66/audio2face/audio_preprocessing.py#L41 Do I need to change only this line of code? Looking forward to your advice.

    opened by xiao-keeplearning 6
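
One way to think about the frame-rate question is that the audio features must end up with one vector per video frame, whatever the video rate is. The sketch below is not the repository's code. It assumes per-window audio features arrive at a fixed source rate (50 per second is used purely for illustration) and linearly interpolates them to a target rate such as 25 or 30 fps. The function name resample_features is hypothetical.

```python
def resample_features(features, src_rate, dst_rate):
    """Linearly interpolate a list of feature vectors from src_rate to dst_rate."""
    if not features:
        return []
    duration = len(features) / src_rate       # seconds of audio covered
    n_out = int(round(duration * dst_rate))   # number of frames at the target rate
    out = []
    for j in range(n_out):
        # Position of output frame j on the source index axis, clamped to the end.
        pos = min(j * src_rate / dst_rate, len(features) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(features) - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(features[lo], features[hi])])
    return out


# One second of dummy 1-D features at 50 per second.
src = [[float(i)] for i in range(50)]
at_30 = resample_features(src, 50, 30)
at_25 = resample_features(src, 50, 25)
print(len(at_30), len(at_25))  # 30 25
```

Whether a model trained on 30 fps data behaves well with 25 fps features is a separate question that this sketch does not address.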
  • How to get .mat file for every image?

    Hi, I am trying to make my own dataset, but I have no idea how to create the .mat files. When I use Deep3DFace to generate them, it asks for a txt file that records the coordinates of the eyes, nose, and mouth. However, the Deep3DFace author doesn't show how to make the txt file for every image. Could you please show us how to make the .mat files, or how to make the .txt file for every image?

    opened by psuu0001 6
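
A common way to produce such a file is to run any 68-point landmark detector and reduce its output to five points. The sketch below assumes the widely used 68-point indexing (eyes at 36-41 and 42-47, nose tip at 30, mouth corners at 48 and 54). It also assumes the reconstruction code wants one "x y" pair per line in the order left eye, right eye, nose, left mouth corner, right mouth corner. Both assumptions should be checked against the sample files shipped with the reconstruction code. The function names are hypothetical.

```python
def five_points_from_68(landmarks68):
    """Reduce 68 (x, y) landmarks to 5: eye centres, nose tip, mouth corners."""
    if len(landmarks68) != 68:
        raise ValueError("expected 68 landmarks")

    def centre(indices):
        xs = [landmarks68[i][0] for i in indices]
        ys = [landmarks68[i][1] for i in indices]
        return (sum(xs) / len(xs), sum(ys) / len(ys))

    return [
        centre(range(36, 42)),    # first eye: mean of its 6 contour points
        centre(range(42, 48)),    # second eye
        tuple(landmarks68[30]),   # nose tip
        tuple(landmarks68[48]),   # first mouth corner
        tuple(landmarks68[54]),   # second mouth corner
    ]


def format_landmark_txt(points):
    """Render points as text, one 'x y' pair per line."""
    return "\n".join(f"{x:.2f} {y:.2f}" for x, y in points) + "\n"


# Dummy landmarks: point i sits at (i, 2 * i), just to exercise the code.
dummy = [(float(i), float(2 * i)) for i in range(68)]
text = format_landmark_txt(five_points_from_68(dummy))
print(text)
```

Writing text to a file named after each image (for example with open(path, "w")) would then give one landmark file per frame.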
  • No accurate output for custom video.

    Hi @zhangchenxu528, thank you for the wonderful project! I looked at your demo video and the output (at even higher resolution) in Colab. I trained a model on a custom video using your training Colab notebook. I don't know why, but I didn't get a proper output video for a custom 4-minute video. Apart from the preprocessed data, everything I used is the same as provided in the Colab notebook.

    opened by TejaswiniiB 5
  • What's the standard for the training dataset?

    Hi, I have read your paper. It's a good idea to use 3D reconstruction for the audio-to-face task. I plan to make my own Chinese speech dataset, but I don't know what the standard is for making a dataset. Your paper mentions that you used 450 clips for training. I would like to know whether background music is allowed in the video clips, or whether there should be only one person talking.

    opened by psuu0001 5
  • How to create the "train1_openface" file?

    Hi, I am a little confused about the OpenFace file. Is it related to the Deep3DFaceReconstruction files? And how can I create the csv file? Looking forward to your reply. Thanks!

    opened by lshil00 4
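
For orientation, OpenFace's feature extraction writes a CSV with one row per frame, including action-unit intensity columns such as AU45_r (blink). The sketch below reads one such column with the standard library. It assumes the header may carry leading spaces, as some OpenFace versions emit, and that AU45_r is the column of interest, which is an assumption about this pipeline rather than something confirmed in this thread. The function name read_au_column is hypothetical.

```python
import csv
import io


def read_au_column(csv_file, column="AU45_r"):
    """Return one float per frame from an OpenFace-style CSV file object."""
    reader = csv.reader(csv_file)
    header = [name.strip() for name in next(reader)]  # tolerate ' AU45_r'
    idx = header.index(column)
    return [float(row[idx]) for row in reader if row]


# Tiny in-memory stand-in for an OpenFace output file.
sample = io.StringIO(
    "frame, timestamp, AU45_r\n"
    "1, 0.000, 0.00\n"
    "2, 0.033, 1.25\n"
    "3, 0.067, 2.50\n"
)
blink = read_au_column(sample)
print(blink)  # [0.0, 1.25, 2.5]
```

With a real file, the same function can be called inside `with open(path, newline="") as f:`.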
  • 'deepspeech/input_node', does not exist in the graph

    Thanks for the great work. However, when I run python audio_preprocessing.py, I get this error:

    Traceback (most recent call last):
      File "/home/ubuntu/muhiddin/FACIAL/audio2face/audio_preprocessing.py", line 61, in <module>
        processed_audio = process_audio(ds_fname, audio4deepspeech, fps) # generate audio feature
      File "/home/ubuntu/muhiddin/FACIAL/audio2face/audio_preprocessing.py", line 39, in process_audio
        processed_audio = audio_handler.process(audio, fps)
      File "/home/ubuntu/muhiddin/FACIAL/audio2face/_utils/audio_handler.py", line 58, in process
        return self.convert_to_deepspeech(audio, fps)
      File "/home/ubuntu/muhiddin/FACIAL/audio2face/_utils/audio_handler.py", line 107, in convert_to_deepspeech
        input_tensor = graph.get_tensor_by_name('deepspeech/input_node:0')
      File "/home/ubuntu/anaconda3/envs/audio_face/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 4150, in get_tensor_by_name
        return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
      File "/home/ubuntu/anaconda3/envs/audio_face/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 3974, in as_graph_element
        return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
      File "/home/ubuntu/anaconda3/envs/audio_face/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 4014, in _as_graph_element_locked
        raise KeyError("The name %s refers to a Tensor which does not "
    KeyError: "The name 'deepspeech/input_node:0' refers to a Tensor which does not exist. The operation, 'deepspeech/input_node', does not exist in the graph."

    Can you suggest a solution for this issue?

    opened by muxiddin19 3
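
A KeyError like this suggests that the tensor name requested in the code does not match the names stored in the loaded graph. A minimal way to investigate is to list the node names in the model file and look for a likely input. The sketch assumes the model is a frozen GraphDef .pb file and that TensorFlow's compat.v1 API is available. The helper names are hypothetical, and the TensorFlow import is deferred so that the filtering helper can be tried without it.

```python
def load_node_names(pb_path):
    """Read a frozen GraphDef file and return the names of all its nodes."""
    import tensorflow.compat.v1 as tf  # deferred: only needed for real files

    graph_def = tf.GraphDef()
    with tf.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return [node.name for node in graph_def.node]


def suggest_tensor_names(node_names, keyword="input"):
    """Turn node names containing `keyword` into ':0' tensor-name candidates."""
    return [f"{name}:0" for name in node_names if keyword in name.lower()]


# Made-up node names, only to show the filtering step.
example_nodes = ["input_node", "input_lengths", "lstm/cell", "logits"]
candidates = suggest_tensor_names(example_nodes)
print(candidates)  # ['input_node:0', 'input_lengths:0']
```

On a real model, suggest_tensor_names(load_node_names(path_to_pb)) gives names to compare against the 'deepspeech/input_node:0' string that audio_handler.py requests.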
  • Error in 5.1.2 Train face2video by yourself (optional 2)

    When I run the following command:

    python train.py --blink_path '../video_preprocess/train1_openface/train1_512_audio.csv' --name train3 --model pose2vid --dataroot ./datasets/train3/ --netG local --ngf 32 --num_D 3 --tf_log --niter_fix_global 0 --label_nc 0 --no_instance --save_epoch_freq 2 --lr=0.0001 --resize_or_crop resize --no_flip --verbose --n_local_enhancers 1

    I get this error:

    RuntimeError: CUDA error: out of memory
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    I am using a Linux server with 2 GPUs. Is the error related to this? Is there any solution?

    opened by muxiddin19 3
  • run on tensorflow2.x?

    TensorFlow 1.x is no longer supported in Google Colab:

    ValueError: Tensorflow 1 is unsupported in Colab.

    With TensorFlow 2.x, I modified the code in audio2face/_utils/audio_handler.py, changing "import tensorflow as tf" to "import tensorflow.compat.v1 as tf".

    After finishing the first several steps, however, I got the error below while running "python audio_preprocessing.py":

    Traceback (most recent call last):
      File "audio_preprocessing.py", line 60, in <module>
        processed_audio = process_audio(ds_fname, audio4deepspeech, fps) # generate audio feature
      File "audio_preprocessing.py", line 38, in process_audio
        processed_audio = audio_handler.process(audio, fps)
      File "/content/FACIAL/audio2face/_utils/audio_handler.py", line 59, in process
        return self.convert_to_deepspeech(audio, fps)
      File "/content/FACIAL/audio2face/_utils/audio_handler.py", line 108, in convert_to_deepspeech
        input_tensor = graph.get_tensor_by_name('deepspeech/input_node:0')
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 4128, in get_tensor_by_name
        return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3952, in as_graph_element
        return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3992, in _as_graph_element_locked
        raise KeyError("The name %s refers to a Tensor which does not "
    KeyError: "The name 'deepspeech/input_node:0' refers to a Tensor which does not exist. The operation, 'deepspeech/input_node', does not exist in the graph."

    Has anybody solved this problem?

    opened by mendynew 0
  • Hello I need

    In the paper I found the dataset statistics: "The proposed dataset contains rich samples of more than 450 video clips which are collected from the videos used by Agarwal et al. [1]. Each video clip lasts for around 1 minute. We re-normalize all videos to 30 FPS, forming 535,400 frames in total." Can you send me a link to these videos? Or could you share a drive link to download the model pre-trained on the 450 videos? Thank you again!

    opened by qcj1206 0
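  • How do I obtain the required parts from the face reconstruction .mat files?

    Hello, thank you very much for sharing your excellent work and code! I noticed that in the "train a new person on Google Colab" notebook you shared, in the deep3dreconstruction part of video-preprocess, the .mat files are 1×257-dimensional. How do you obtain this data from BFM or other models? Thank you!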

    opened by ax0057 0
  • render_netface_fitpose produces cropped rendered images

    Hi,

    I've generated the face model .mat files using Deep3DFaceRecon_pytorch and the facial features using OpenFace. However, the resulting rendered images used for training are heavily cropped. I am not sure whether the issue lies in render_netface_fitpose.py or fit_headpose.py, or whether the Deep3DFaceRecon_pytorch .mat files are still incorrect (I modified handle_netface.py as mentioned in #26, since the PyTorch version uses different keys). I have attached an example output and the corresponding true image. Thank you very much for your help.

    opened by NicoSperry 1
  • Un-natural shoulder movement.

    Hi, I have been working with your project for video synthesis. I have been able to generate video with good-quality lip sync, but the output video has some unnatural shoulder movements. Can you please suggest what I should do to keep the shoulders as they are in the input video?

    opened by rohaantahir 7