This repository contains the code for LipGAN. LipGAN was published as part of the paper titled "Towards Automatic Face-to-Face Translation".

Overview

LipGAN

Generate realistic talking faces for any human speech and face identity.

[Paper] | [Project Page] | [Demonstration Video]

image

Important Update:

A new, improved work that can produce significantly more accurate and natural results on moving talking face videos is available here: https://github.com/Rudrabha/Wav2Lip


Code without the MATLAB dependency is now available in the fully_pythonic branch. Note that the models in the two branches are not identical, and either one may perform better than the other in particular cases. The model used at the time of the paper's publication is the one with the MATLAB dependency, and it is the one that has been extensively tested. Feel free to experiment with the fully_pythonic branch if you do not want the MATLAB dependency. A Google Colab notebook is also available for the fully_pythonic branch. [Credits: Kirill]


Features

  • Can handle in-the-wild face poses and expressions.
  • Can handle speech in any language and is robust to background noise.
  • Paste faces back into the original video with minimal/no artefacts; can potentially correct lip-sync errors in dubbed movies!
  • Complete multi-GPU training code and pre-trained models available.
  • Fast inference code to generate results from the pre-trained models.

Prerequisites

  • Python >= 3.5
  • ffmpeg: sudo apt-get install ffmpeg
  • MATLAB R2016a (for audio preprocessing; this dependency will be removed in later versions)
  • Install necessary packages using pip install -r requirements.txt
  • Install keras-contrib: pip install git+https://www.github.com/keras-team/keras-contrib.git

Getting the weights

Download checkpoints of the following models into the logs/ folder:

Generating talking face videos using pretrained models (Inference)

LipGAN takes speech features in the form of MFCCs, so we first need to preprocess the input audio file to extract them. We use the create_mat.m script to create a .mat file for a given audio.

cd matlab
matlab -nodesktop
>> create_mat(input_wav_or_mp4_file, path_to_output.mat) % replace with file paths
>> exit
cd ..
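
If you prefer to skip MATLAB entirely, the fully_pythonic branch performs this step in Python. As a rough, illustrative sketch (not the exact code of that branch), MFCC extraction with librosa could look like the following; the sampling rate, n_mfcc, and hop_length below are placeholder assumptions and must match whatever the checkpoint you use was trained on:

import librosa
import numpy as np

# Load audio at a fixed sampling rate (16 kHz assumed here).
wav, sr = librosa.load('input.wav', sr=16000)

# Extract MFCCs; n_mfcc and hop_length are illustrative values only.
mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=13, hop_length=160)

# Save in a format analogous to the .mat file produced by create_mat.m.
np.savez('input_mfcc.npz', mfcc=mfcc)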

Usage #1: Generating correct lip motion on a random talking face video

Here, we are given an audio input (as .mat MFCC features) and a video of an identity speaking something entirely different. LipGAN can synthesize the correct lip motion for the given audio and overlay it on the given video of the speaking identity (Examples #1 and #2 in the above image).

python batch_inference.py --checkpoint_path <saved_checkpoint> --face <random_input_video> --fps <fps_of_input_video> --audio <guiding_audio_wav_file> --mat <mat_file_from_above> --results_dir <folder_to_save_generated_video>

The generated result_voice.mp4 will contain the input video lip-synced with the given input audio. Note that the FPS parameter defaults to 25; make sure you set it correctly for your own input video.
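
For example, for a 30 FPS input video, an invocation might look like the following (the checkpoint and file names here are placeholders, not files shipped with the repository):

python batch_inference.py --checkpoint_path logs/lipgan_checkpoint.h5 --face input_video.mp4 --fps 30 --audio speech.wav --mat speech.mat --results_dir results/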

Usage #2: Generating talking video from a single face image

Refer to example #3 in the above picture. Given an audio, LipGAN generates a correct mouth shape (viseme) at each time-step and overlays it on the input image. The sequence of generated mouth shapes yields a talking face video.

python batch_inference.py --checkpoint_path <saved_checkpoint> --face <random_input_face> --audio <guiding_audio_wav_file> --mat <mat_file_from_above> --results_dir <folder_to_save_generated_video>

Please use the --pads argument to correct for inaccurate face detections, such as the detector not covering the chin region. This can improve the results further; an illustration follows below.
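
As an illustration, the command below adds extra padding so the chin is included in the crop; the four values here are only an example, so check python batch_inference.py --help for the exact order and meaning of the pad values:

python batch_inference.py --checkpoint_path <saved_checkpoint> --face face.jpg --audio speech.wav --mat speech.mat --pads 0 20 0 0 --results_dir results/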

More options

python batch_inference.py --help

Training LipGAN

We illustrate the training pipeline using the LRS2 dataset. Adapting for other datasets would involve small modifications to the code.

Preprocess the dataset

We need to do two things: (i) save the MFCC features from the audio, and (ii) extract and save the facial crops of each frame in the video.

LRS2 dataset folder structure
data_root (mvlrs_v1)
├── main, pretrain (we use only main folder in this work)
|	├── list of folders
|	│   ├── five-digit numbered video files (.mp4)
Saving the MFCC features

We use MATLAB to save the MFCC files for all the videos present in the dataset. Refer to the fully_pythonic branch if you do not want to use MATLAB.

# Please copy the appropriate train split's filelist.txt to the filelists/ folder. The example below is shown for LRS2.
cd matlab
matlab -nodesktop
>> preprocess_mat('../filelists/train.txt', 'mvlrs_v1/main/') % replace with appropriate file paths for other datasets.
>> exit
cd ..
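
Conceptually, each video frame is later paired with the window of audio features that covers it in time. A minimal sketch of that alignment, assuming MFCC features at 100 timesteps per second and an illustrative window width (neither value is taken from this repository):

import numpy as np

def mfcc_window_for_frame(mfcc, frame_idx, fps, steps_per_sec=100, width=35):
    # Map the frame's timestamp to an MFCC timestep index and take a
    # fixed-size window around it. steps_per_sec and width are assumptions
    # for illustration, not the exact values used in this codebase.
    center = int(round((frame_idx / fps) * steps_per_sec))
    start = max(center - width // 2, 0)
    return mfcc[:, start:start + width]
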
Saving the Face Crops of all Video Frames

We preprocess the video files by detecting faces using a face detector from dlib.

# Please copy the appropriate split's filelist.txt to the filelists/ folder. The example below is shown for LRS2.
python preprocess.py --split [train|pretrain|val] --videos_data_root mvlrs_v1/ --final_data_root <folder_to_store_preprocessed_files>

### More options while preprocessing (like number of workers, image size etc.)
python preprocess.py --help
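
For reference, here is a minimal sketch of what the face-crop step does per video, assuming dlib and OpenCV are installed; the helper below is illustrative and not the exact code in preprocess.py:

import cv2
import dlib

detector = dlib.get_frontal_face_detector()

def save_face_crops(video_path, out_dir, img_size=96):
    # Detect a face in every frame and save a resized crop as <frame_idx>.jpg.
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        rects = detector(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if rects:
            r = rects[0]
            crop = frame[max(r.top(), 0):r.bottom(), max(r.left(), 0):r.right()]
            crop = cv2.resize(crop, (img_size, img_size))
            cv2.imwrite('{}/{}.jpg'.format(out_dir, idx), crop)
        idx += 1
    cap.release()
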
Final preprocessed folder structure
data_root (mvlrs_v1)
├── main, pretrain (we use only main folder in this work)
|	├── list of folders
|	│   ├── folders with five-digit video IDs 
|	│   |	 ├── 0.jpg, 1.jpg .... (extracted face crops of each frame)
|	│   |	 ├── 0.npz, 1.npz .... (mfcc features corresponding to each frame)

Train the generator only

As training LipGAN is computationally intensive, you can just train the generator alone for quick, decent results.

python train_unet.py --data_root <path_to_preprocessed_dataset>

### Extensive set of training options available. Please run and refer to:
python train_unet.py --help

Train LipGAN

python train.py --data_root <path_to_preprocessed_dataset>

### Extensive set of training options available. Please run and refer to:
python train.py --help

License and Citation

The software is licensed under the MIT License. Please cite the following paper if you use this code:

@inproceedings{KR:2019:TAF:3343031.3351066,
  author = {K R, Prajwal and Mukhopadhyay, Rudrabha and Philip, Jerin and Jha, Abhishek and Namboodiri, Vinay and Jawahar, C V},
  title = {Towards Automatic Face-to-Face Translation},
  booktitle = {Proceedings of the 27th ACM International Conference on Multimedia}, 
  series = {MM '19}, 
  year = {2019},
  isbn = {978-1-4503-6889-6},
  location = {Nice, France},
pages = {1428--1436},
  numpages = {9},
  url = {http://doi.acm.org/10.1145/3343031.3351066},
  doi = {10.1145/3343031.3351066},
  acmid = {3351066},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {cross-language talking face generation, lip synthesis, neural machine translation, speech to speech translation, translation systems, voice transfer},
}

Acknowledgements

Part of the MATLAB code is taken from an implementation of Talking Face Generation. We thank the authors for releasing their code.

Comments
  • When using video colab gets killed

    Hey @prajwalkr, thanks for sharing the codebase and models for such a wonderful project! Much appreciated. I have successfully been able to replicate Use Case 2 (using an image with lip sync) on both CPU and Google Colab.

    However, for Use Case 1 (using a video with lip sync), both my CPU and Google Colab kill the process since it consumes a lot of RAM. I have a short text about 75 characters long and an input video of 30 seconds at 60 FPS. How do I end up using it here?

    opened by aretius 16
  • Error with ffmpeg config for batch inference

    Hi, @Rudrabha @prajwalkr thanks for the amazing code and congrats on your paper!!

    I am trying to run inference but I get the following error message with ffmpeg:

    [buffer @ 0x6f2b80] Error setting option pix_fmt to value -1.
    [graph 0 input from stream 1:0 @ 0x6f2e20] Error applying options to the filter.
    Error opening filters!

    Is there some config requirement that I am missing? Looking forward to your response.

    opened by ayushchopra96 14
  • CONTINUOUS PLAYING

    I want to create a chatbot that uses text-to-speech and LipGAN for face animation. Is there a way that LipGAN can be used in real time to create something like a talking avatar that speaks via text-to-speech live? Any help will be valuable.

    opened by leonvit 9
  • Sample Dataset for usage #2

    Hi @Rudrabha, amazing work! I am working on generating video from image + audio, and it would be very helpful if you could post a sample image and audio file. I've been getting different errors every time I use a random image.

    Thanks!

    opened by shikhar-scs 9
  • Some problems about lip sync

    Hi @prajwalkr, thanks for sharing this revolutionary work. However, when I run the code on the same image that you gave in previous issues, I cannot get a satisfactory result. My result video is linked below. Could you give me some advice on improving the result or correcting my possible mistakes? Thanks a lot. https://www.youtube.com/watch?v=beuf71Wrg3g

    opened by tju-zxy 8
  • multi gpu inference

    Thanks for your great work. I tried to run inference on a machine with multiple GPUs; it detects all of them (I also set n_gpus to the number of GPUs). The dlib part works fine, but after it runs, the process halts just after "Model Created" and "Model Loaded".

    Could I have some hints on how to run on a multi-GPU machine? Thanks for your help!

    opened by chikiuso 7
  • May I ask how much GPU RAM is needed?

    Hi @prajwalkr, I tried to run it on a 1080 Ti with 11 GB of GPU RAM, but the system halts every time I run on the GPU. May I ask how much GPU RAM is needed? Thanks.

    opened by chikiuso 7
  • Generated Video Rambles & Stops (Bad Lip Sync)

    Thanks for developing this software, as it works very well! There's an issue where, given an audio clip, the generated video continues to "ramble" through it. Things I've tried:

    • Setting the correct FPS.
    • Setting the max seconds to match the audio/video clip.
    • Tried multiple video AND audio sources.
    • Reinstalling everything that coincides with your requirements.txt file.

    It's very easy to reproduce: record some audio and try it. The model will ramble right through it, and there will be an elongated pause at the end of the video with the person freezing. If I were to guess, maybe something with the MFCCs (I'm unfamiliar with this as a whole) isn't working properly, or an interpolation method needs to be implemented to sync video with audio. I am currently using the pretrained models at your Google Drive link. Any insight is appreciated!

    opened by ExponentialML 6
  • When using a single image UnboundLocalError: local variable 'full_frames' referenced before assignment

    When using a single image full_frames is not defined

    Traceback (most recent call last):
      File "batch_inference.py", line 228, in <module>
        main()
      File "batch_inference.py", line 178, in main
        print ("Number of frames to be used for inference: "+str(len(full_frames)))
    UnboundLocalError: local variable 'full_frames' referenced before assignment

    Also, FPS is a required parameter; the docs do not mention this for the single-image case.

    opened by seranus 6
  • Need Help - ValueError: Layer #37 (named "batch_normalization_34" in the current model) was found to correspond to layer conv2d_35 in the save file. However the new layer batch_normalization_34 expects 4 weights, but the saved weights have 2 elements.

    Hello Sir,

    I like your project very much and am trying it on Google Colab by following this link (https://colab.research.google.com/drive/1NLUwupCBsB1HrpEmOIHeMgU63sus2LxP). I am attaching video (output_00006.mp4) and audio (taunt.wav) files for your reference. After executing all steps successfully, I get the log below while running the last step. Please let me know if I am missing something, as I did not see an output file generated in the /content directory even after refreshing the folder in Google Colab.

    (log truncated: numpy FutureWarnings, "Using TensorFlow backend.", then "Number of frames available for inference: 3841", "(80, 328)", "Length of mel chunks: 95", tqdm progress over 61 batches, TensorFlow deprecation warnings and Tesla P100 GPU initialization messages, and the full Keras model summary, ending with Total params: 49,573,971; Trainable params: 49,543,123; Non-trainable params: 30,848)

    Model Created
    Traceback (most recent call last):
      File "batch_inference.py", line 217, in <module>
        main()
      File "batch_inference.py", line 193, in main
        model.load_weights(args.checkpoint_path)
      File "/usr/local/lib/python3.6/dist-packages/keras/engine/network.py", line 1166, in load_weights
        f, self.layers, reshape=reshape)
      File "/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py", line 1056, in load_weights_from_hdf5_group
        ' elements.')
    ValueError: Layer #37 (named "batch_normalization_34" in the current model) was found to correspond to layer conv2d_35 in the save file. However the new layer batch_normalization_34 expects 4 weights, but the saved weights have 2 elements.

    opened by chetancc 5
  • Really Bad Lip Sync Results For Use Case 1

    Hi,

    Thank you for sharing the model and it's great to see the progress made overall.

    When testing, I've observed that while the results are somewhat as expected for Use Case 2 (lip movements of a picture), they are really bad when generating correct lip motion on a random talking face video (Use Case 1 as described in the repository). I am comparing with the results shown on GitHub or discussed in the paper.

    In the results generated for Use Case 1, the lip motion seems almost the same as in the source video. Basically, I'm trying to understand whether it's supposed to be like that or not. It appears as if there is nearly no lip sync at all: the lip movements are almost like those in the source video and not indicative of the words in the input audio. If the input audio has a pause, the lip movements keep happening as long as the source video had lip movements.

    I'm sharing some examples of the results to get a better idea of the model's capabilities; the results are generated using a sample video of Obama and a sample audio of Obama.

    Here is another example with a different video of Obama and another sample audio of Obama:

    To generate the results, I got someone to create a Colab notebook, and they used the librosa approach. I can share the notebook in case you want to see if some error was made.

    Again, I really appreciate the model and feel it represents a great advance in the overall tech. I'm creating this issue just to see whether this quality of results is what is to be expected from the model or whether it can be improved. Thank you.

    opened by abm505 5
  • Bump numpy from 1.13.3 to 1.22.0

    Bumps numpy from 1.13.3 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)


    dependencies 
    opened by dependabot[bot] 0
  • Bump opencv-python from 4.0.0.21 to 4.2.0.32

    Bumps opencv-python from 4.0.0.21 to 4.2.0.32.

    Release notes

    Sourced from opencv-python's releases.

    4.2.0.32

    OpenCV version 4.2.0.

    Changes:

    • macOS environment updated from xcode8.3 to xcode 9.4
    • macOS uses now Qt 5 instead of Qt 4
    • Nasm version updated to Docker containers
    • multibuild updated

    Fixes:

    • don't use deprecated brew tap-pin, instead refer to the full package name when installing #267
    • replace get_config_var() with get_config_vars() in setup.py #274
    • add workaround for DLL errors in Windows Server #264

    4.1.2.30

    OpenCV version 4.1.2.

    Changes:

    • Python 3.8 builds added to the build matrix
    • Support for Python 3.4 builds dropped (Python 3.4 is in EOL)
    • multibuild updated
    • minor build logic changes
    • Docker images rebuilt

    Notes:

    Please note that Python 2.7 enters into EOL phase in January 2020. opencv-python Python 2.7 wheels won't be provided after that.

    4.1.1.26

    OpenCV version 4.1.1.

    Changes:

    ... (truncated)


    dependencies 
    opened by dependabot[bot] 0
Owner
Rudrabha Mukhopadhyay (PhD Scholar, CVIT, IIIT Hyderabad)