Audio2Face - Audio To Face With Python

FACEGOOD

Last update: Dec 26, 2022

Related tags

Deep Learning Audio2Face

Overview

Audio2Face

Description

We built a project that converts audio into blendshape weights and uses them to drive the digital human, xiaomei, in a UE project.

Base Module

The framework consists of three parts. In the formant network, we perform fixed-function analysis of the input audio clip. In the articulation network, we concatenate an emotional state vector to the output of each convolution layer after the ReLU activation. The fully connected layers at the end expand the 256+E abstract features to blendshape weights.
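The concatenation step in the articulation network can be illustrated with a minimal pure-Python sketch. The sizes used here (4 time steps, E = 16) and the helper name append_emotion are assumptions chosen for illustration; the project's real layers are built in TensorFlow and may be arranged differently.

```python
def append_emotion(features, emotion):
    """Concatenate the same emotion vector onto every time step's feature vector."""
    return [list(step) + list(emotion) for step in features]


# Hypothetical sizes: 4 time steps, 256 feature channels, E = 16 emotion components.
features = [[0.0] * 256 for _ in range(4)]
emotion = [0.1] * 16

augmented = append_emotion(features, emotion)

# Each time step now carries 256 + E values, which is the width the
# fully connected layers at the end would expand to blendshape weights.
print(len(augmented), len(augmented[0]))
```

The same emotion vector is repeated for every time step, so the layer that follows sees 256+E inputs per step.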

Usage

This pipeline shows how we use FACEGOOD Audio2Face.

Test video

Prepare data

step1: Record voice and video, and create the animation from the video in Maya. Note: the voice must contain vowels, exaggerated talking, and normal talking. The dialogue should cover as many pronunciations as possible.
step2: Process the voice with LPC to split it into segment frames corresponding to the animation frames in Maya.
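One way to picture the alignment in step2 is to cut one audio window per animation frame. The sketch below is only an illustration: the function name frame_windows, the 16 kHz sample rate, the 30 fps frame rate, and the 8320-sample window are all assumptions, and the windowing actually used in step1_LPC.py may differ.

```python
def frame_windows(num_samples, sample_rate, fps, window_samples):
    """Return (start, end) sample indices of one audio window per animation frame.

    Each window is centred on the middle of its animation frame and is
    clipped at the boundaries of the recording.
    """
    half = window_samples // 2
    num_frames = num_samples * fps // sample_rate  # whole animation frames covered
    windows = []
    for i in range(num_frames):
        centre = int(round((i + 0.5) * sample_rate / fps))
        start = max(0, centre - half)
        end = min(num_samples, centre + half)
        windows.append((start, end))
    return windows


# Hypothetical recording: 1 second of audio at 16 kHz, animation at 30 fps.
windows = frame_windows(num_samples=16000, sample_rate=16000, fps=30, window_samples=8320)
print(len(windows), windows[0], windows[-1])
```

With this layout, window i in the audio corresponds to animation frame i in Maya, so each audio segment can be paired with one row of exported blendshape weights.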

Input data

Use ExportBsWeights.py to export the weights file from Maya. This produces BS_name.npy and BS_value.npy.

Use step1_LPC.py to process the wav file into lpc_*.npy, which preprocesses the wav into 2D data.
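For readers unfamiliar with LPC, the following is a minimal pure-Python sketch of linear predictive coding using the autocorrelation method and the Levinson-Durbin recursion. It is not the code in step1_LPC.py; the function names, the LPC order, and the toy signal are assumptions. Applying it to every frame yields a frames-by-order table, which is one plausible reading of the "2D data" mentioned above.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of a frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion.

    Returns coefficients a[1..order] such that
    frame[t] is approximately sum_k a[k] * frame[t - k].
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:  # silent or perfectly predictable frame
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:]


# Toy signal (an assumption): a decaying exponential, so x[t] = 0.9 * x[t-1].
frame = [0.9 ** t for t in range(200)]
coeffs = lpc(frame, order=2)

# One row of coefficients per frame gives a 2D table (frames x order).
lpc_table = [lpc(f, order=2) for f in (frame, frame[::-1])]
print(coeffs)
```

For the toy signal the first coefficient comes out close to 0.9 and the second close to 0, which matches how the signal was generated.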

Train

We recommend using FACEGOOD Avatary to produce training data. It is fast and accurate. http://www.avatary.com

The training data is stored in dataSet1.

python step14_train.py --epochs 8 --dataSet dataSet1
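The two flags in the command above can be read with Python's standard argparse module. The sketch below only illustrates how such flags are typically parsed; the defaults and help texts are assumptions, and the real step14_train.py may handle its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used in the training command (hypothetical defaults)."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--epochs", type=int, default=8,
                        help="number of training epochs")
    parser.add_argument("--dataSet", type=str, default="dataSet1",
                        help="name of the data set folder to train on")
    return parser.parse_args(argv)


# Equivalent to: python step14_train.py --epochs 8 --dataSet dataSet1
args = parse_args(["--epochs", "8", "--dataSet", "dataSet1"])
print(args.epochs, args.dataSet)
```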

Test

In the /test folder, we supply a test application named AiSpeech.
We provide a pretrained model, zsmeif.pb.
In the /example/ueExample folder, we provide a packaged UE project containing a digital human created by FACEGOOD that can be driven by /AiSpeech/zsmeif.py.

You can follow the steps below to use it:

Make sure a microphone is connected to the computer.
Run the script in a terminal.

python zsmeif.py
When the terminal shows the message "run main", run FaceGoodLiveLink.exe, which is located in the /example/ueExample/ folder.
Click and hold the left mouse button on the screen in the UE project; you can then talk to the AI model and wait for the voice and animation response.

Dependencies

tensorflow-gpu 1.15

python-libs: pyaudio requests websocket websocket-client

Data

The testing data, Maya model, and UE4 test project can be downloaded from the link below.

data_all code : n6ty

GoogleDrive

Reference

Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion

Contact

Wechat: FACEGOOD_CHINA
Email：[email protected]
Discord: https://discord.gg/V46y6uTdw8

License

Audio2Face Core is released under the terms of the MIT license. See COPYING for more information or see https://opensource.org/licenses/MIT.

Comments

Difference between dataSet1 and dataSetx?

Hi, what is the difference between dataSet1 and dataSetx?

Does it mean different people? Could we combine all the data to train a person-independent model?

Thanks!

opened by John-Yao 2
Error when testing the model: Error Main loop: HTTPSConnectionPool(host='api.talkinggenie.com', port=443): Max retries exceeded with url: /api/v2/public/authToken (Caused by ProxyError('Cannot connect to proxy.', OSError(0, 'Error')))

2022-08-08 16:43:18.557937: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
WARNING:tensorflow:From D:\anaconda3\envs\audio2face_lqy\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:68: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2022-08-08 16:43:20.328710: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2022-08-08 16:43:20.347819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2022-08-08 16:43:20.356155: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2022-08-08 16:43:20.364649: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2022-08-08 16:43:20.373982: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2022-08-08 16:43:20.377125: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2022-08-08 16:43:20.390252: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2022-08-08 16:43:20.398360: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2022-08-08 16:43:20.407310: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2022-08-08 16:43:20.407496: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2022-08-08 16:43:20.407963: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2022-08-08 16:43:20.409759: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
2022-08-08 16:43:20.409940: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2022-08-08 16:43:20.410032: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2022-08-08 16:43:20.410148: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2022-08-08 16:43:20.410237: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2022-08-08 16:43:20.410325: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2022-08-08 16:43:20.410412: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2022-08-08 16:43:20.410500: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2022-08-08 16:43:20.410601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2022-08-08 16:43:20.998933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-08-08 16:43:20.999124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2022-08-08 16:43:20.999265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2022-08-08 16:43:20.999574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9640 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From D:\vedio2face\FACEGOOD-Audio2Face-main\FACEGOOD-Audio2Face-main\code\test\AiSpeech\lib\tensorflow\input_lpc_output_weight.py:20: FastGFile.init (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
the cpus number is: 0
the cpus number is: 1
run main

Error Main loop: HTTPSConnectionPool(host='api.talkinggenie.com', port=443): Max retries exceeded with url: /api/v2/public/authToken (Caused by ProxyError('Cannot connect to proxy.', OSError(0, 'Error')))

opened by Study-Li404 0
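Using it together with cloud rendering

Hello, I have taken a first look at your project and think there is considerable room for cooperation with our product.

We focus on cloud rendering technology: moving 3D applications such as UE4 and UNITY3D to the cloud and accessing them from thin clients such as a browser.

Our cloud rendering product already integrates voice input, intelligent voice interaction (Speech), and similar features. If you use it together with our cloud rendering, you can focus on the algorithm and 3D rendering.

For high-fidelity digital human scenarios, cloud rendering removes the dependence on the computing power of the end device.

Our integration demo: click here

I plan to run some preliminary tests first. If you are interested in deeper cooperation, please contact me.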

opened by jjunk1989 0
The blendshape weights I get are all very small, mostly on the order of 10^-5 to 10^-3. What could be the cause?

Below is a complete set of blendshape weights that I obtained:

0.0,-0.0,0.0,-0.0,0.0,-0.0,0.0,0.0,0.0,6.478356226580217e-05,0.0,0.0,7.74457112129312e-06,1.4016297427588142e-05,0.0003456445410847664,0.0,0.0,-1.0420791113574523e-05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.0,-0.0,1.4069833014218602e-05,0.0,0.0,0.0,0.0,-8.011257159523666e-05,0.0,1.212013557960745e-05,1.2120142855565064e-05,0.0020416416227817535,0.0020416416227817535,0.002010183408856392,0.002010183408856392,0.0,0.0,0.0,0.0,0.0,1.2120121027692221e-05,1.2120119208702818e-05,0.0008284172508865595,0.0008284170180559158,0.0,0.0,5.53658464923501e-05,5.536514800041914e-05,0.0029786918312311172,0.0029780641198158264,0.0,0.0,0.0028428025543689728,0.0007716789841651917,0.0,9.665172547101974e-05,9.665219113230705e-05,0.0012831706553697586,0.001283368095755577,0.0,0.0,0.0,0.0,0.006156831979751587,0.0003454512916505337,0.000345451757311821,0.0009102877229452133,0.0009102877229452133,0.0006938898004591465,0.00055687315762043,1.0965315595967695e-05,1.096533833333524e-05,0.0,-8.066563168540597e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.0,0.0,-0.0,0.0,-0.0,0.0,-0.0,0.0,0.0,-0.0,0.0,-0.0,0.0,-0.0,0.0,-0.0,0.0,-0.0,0.0,0.0

Additional information: the audio used is code\test\AiSpeech\res\xxx_00004.wav; I also tried other audio and got the same result.

opened by leetesla 0

Question about the implementation of motion loss

split_y = tf.split(y,2,0) # arguments: tensor, number of splits, axis
split_y_ = tf.split(y_,2,0) # arguments: tensor, number of splits, axis
# print(10)
y0 = split_y[0]
y1 = split_y[1]
y_0 = split_y_[0]
y_1 = split_y_[1]
loss_M = 2 * tf.reduce_mean(tf.square(y0 - y1 -y_0 + y_1))

Currently, the motion loss is not calculated on adjacent frames. tf.split() only splits the tensor into contiguous parts, so y0 and y1 are the first and second halves of the batch rather than neighboring frames.

y0 = y[::2, ...]
y1 = y[1::2, ...]
y_0 = y_[::2, ...]
y_1 = y_[1::2, ...]

Slicing the array with a step of 2 generates adjacent frames.
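The difference between the two pairings can be checked with a small pure-Python sketch. It assumes the rows of the batch are stored in temporal order; the helper names and the 8-frame toy batch are hypothetical and are not taken from the repository.

```python
def split_pairs(num_frames):
    """Row pairs compared when the batch is cut into two halves, as tf.split(y, 2, 0) does."""
    half = num_frames // 2
    return [(i, i + half) for i in range(half)]


def strided_pairs(num_frames):
    """Row pairs compared when even and odd rows are taken, as y[::2] and y[1::2] do."""
    return [(i, i + 1) for i in range(0, num_frames - 1, 2)]


def motion_loss(pred, target, pairs):
    """2 * mean squared difference between predicted and target frame-to-frame motion."""
    terms = []
    for a, b in pairs:
        for p_a, p_b, t_a, t_b in zip(pred[a], pred[b], target[a], target[b]):
            terms.append(((p_a - p_b) - (t_a - t_b)) ** 2)
    return 2 * sum(terms) / len(terms)


# Toy batch of 8 frames with one blendshape each: prediction ramps, target is flat.
pred = [[float(i)] for i in range(8)]
target = [[0.0] for _ in range(8)]

print(split_pairs(8))    # halves: frame i is paired with frame i + 4
print(strided_pairs(8))  # stride 2: frame i is paired with frame i + 1
print(motion_loss(pred, target, split_pairs(8)),
      motion_loss(pred, target, strided_pairs(8)))
```

On this toy batch the half-split pairing compares frames four steps apart, so it reports a much larger motion error than the adjacent pairing for the same prediction.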

opened by qhanson 0

Owner

FACEGOOD

Make a World of Avatars

