Code for paper 'Audio-Driven Emotional Video Portraits'.

Overview

Audio-Driven Emotional Video Portraits [CVPR2021]

Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu

[Project] [Paper]

[visualization]

Given an audio clip and a target video, our Emotional Video Portraits (EVP) approach generates emotion-controllable talking portraits and can change their emotion smoothly by interpolating in the latent space.
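For intuition, here is a minimal sketch of the latent-space interpolation idea, assuming emotion features are stored as NumPy vectors (the file names and feature format below are hypothetical; see the released --emo_feature inputs for the actual format):

import numpy as np

def interpolate_emotions(feat_a, feat_b, steps=10):
    """Linearly blend two emotion feature vectors in latent space."""
    for alpha in np.linspace(0.0, 1.0, steps):
        yield (1.0 - alpha) * feat_a + alpha * feat_b

# Hypothetical example: morph a neutral feature into an angry one.
neutral = np.load("features/neutral.npy")
angry = np.load("features/angry.npy")
blends = list(interpolate_emotions(neutral, angry, steps=5))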

Installation

We train and test with Python 3.6 and PyTorch. To install the dependencies, run:

pip install -r requirements.txt

Testing

  • Download the pre-trained models and data from google-drive (we release the results for two target persons, M003 and M030), unzip test.zip, and put the files in the corresponding places.

  • Step 1: audio2landmark

    The emotion of the predicted landmark motions can be manipulated with emotion features (recommended):

    python audio2lm/test.py --config config/target_test.yaml --audio path/to/audio --condition feature --emo_feature path/to/feature
    

    or by the emotional audio of the target person:

    python audio2lm/test.py --config config/target_test.yaml --audio path/to/audio --condition audio --emo_audio path/to/emo_audio
    

    The results will be stored in results/target.mov
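    For reference, the driving audio is consumed as MFCC features (see the mfcc/ folders discussed in the comments below). A minimal extraction sketch with librosa follows; librosa is not necessarily what the repo uses internally, and the sample rate is an assumption, not the repo's exact preprocessing:

    import librosa

    # Load the driving audio; 16 kHz mono is an assumption.
    wav, sr = librosa.load("path/to/audio.wav", sr=16000)
    # 13 coefficients per frame, matching the (13,) mfcc_k dimension
    # mentioned in the comments below.
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=13)  # shape (13, n_frames)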

  • Step 2: landmark2video

    A parametric 3D face model and a corresponding fitting algorithm are required here to regress the geometry, expression, and pose parameters from the predicted landmarks and the target video. We release some parameters of the testing results:

    lm2video/data/target/3DMM/3DMM: images and landmark positions of the video

    lm2video/data/target/3DMM/target_test: parameters of the target's video

    lm2video/data/target/3DMM/target_test_pose: pose parameters of the video

    lm2video/data/target/3DMM/test_results: parameters of the predicted landmarks
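    To sanity-check the released parameters before rendering, something like the following sketch may help (the .npy extension and file layout are assumptions; adjust to the actual files in the archive):

    import glob
    import numpy as np

    # Inspect the released fitting results; paths are from the list above.
    for path in sorted(glob.glob("lm2video/data/target/3DMM/test_results/*.npy"))[:3]:
        arr = np.load(path, allow_pickle=True)
        print(path, getattr(arr, "shape", type(arr)))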

    Here we use vid2vid to generate video from edge maps (a sketch of rasterizing landmarks into edge maps follows the steps below):

    1. Generate the testing data by running:

      python lm2video/lm2map.py
      

      and copy the results in lm2video/results/ to vid2vid/datasets/face/.

    2. Replace face_dataset.py and base_options.py in vid2vid with lm2video/face_dataset.py and lm2video/base_options.py, the 106-keypoint versions.

    3. Copy lm2video/data/target/latest_net_G0.pth to vid2vid/checkpoints/target/ and lm2video/test_target.sh to vid2vid/scripts/face/, then run:

      bash ./scripts/face/test_target.sh
      
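    For intuition about what lm2video/lm2map.py produces, here is a sketch of rasterizing one 106-point landmark frame into an edge map with OpenCV. The point connectivity below is hypothetical; the real grouping lives in lm2map.py:

    import cv2
    import numpy as np

    def draw_edgemap(landmarks, size=(512, 512)):
        """landmarks: (106, 2) array of pixel coordinates for one frame."""
        canvas = np.zeros((size[1], size[0], 3), dtype=np.uint8)
        pts = landmarks.astype(np.int32)
        # Hypothetical split: face contour first, then inner features.
        contour, inner = pts[:33], pts[33:]
        cv2.polylines(canvas, [contour], isClosed=False, color=(255, 255, 255), thickness=1)
        cv2.polylines(canvas, [inner], isClosed=False, color=(255, 255, 255), thickness=1)
        return canvas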

Training

  • Coming soon.

Citation

@article{ji2021audio,
  title={Audio-Driven Emotional Video Portraits},
  author={Ji, Xinya and Zhou, Hang and Wang, Kaisiyuan and Wu, Wayne and Loy, Chen Change and Cao, Xun and Xu, Feng},
  journal={arXiv preprint arXiv:2104.07452},
  year={2021}
}
Comments
  • The effectiveness of the "Cross-Reconstructed Emotion Disentanglement" module

    To ensure audio emotion and speech content are disentangled, you design a Cross-Reconstructed Emotion Disentanglement module in the paper. In my opinion, the emotion encoder and content encoder should be frozen once the disentanglement training is finished. But I found that the two pretrained models you provide for two different subjects have totally different weights in the emotion encoder and content encoder. So I guess that you finetune these two encoders together with the other parts when you train your audio2lm module, but how can you guarantee the disentanglement once you finetune these two encoders?

    opened by Dorniwang 3
  • About the landmark MEAN and PCA

    Thank you for the great work. Could you please share the details of calculating a specific person's landmark mean and PCA? Take a specific person's video data in MEAD, for example: we have multiple viewpoints, emotions, and intensities. Could you please tell which videos are used to compute the landmark mean and PCA? The front view with intensity 3? The front view with intensity 2? Some other combination, or even all the videos?

    I am trying to preprocess the MEAD dataset and do some face reenactment work; I would appreciate it a lot if you could share the preprocessing details. Looking forward to your reply!

    opened by DaddyJin 3
  • No such file or directory: base_weight.pickle

    Enjoyed the paper - thanks! Looking forward to seeing the training code :) - A few things:

    1. In test.py line 208, add_audio(config['video_dir'], opt.in_file) throws an error - AttributeError: 'Namespace' object has no attribute 'in_file' (the code runs if this line is commented out).
    2. Line 3 in M003_test.yaml, video_dir: audio2lm/result/M003.mp4, throws an error - OpenCV: FFMPEG: tag 0x44495658/'XVID' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)' - which can be fixed by changing it to video_dir: audio2lm/result/M003.avi.
    3. In section 2, running python lm2video/lm2map.py results in FileNotFoundError: [Errno 2] No such file or directory: './base_weight.pickle' - not sure how this file is generated or obtained. Any help much appreciated - thank you!
    opened by jeesunkim 2
  • The implementation of the "Cross-Reconstructed Emotion Disentanglement" module

    Hi @jixinya, thanks for your excellent work! I am curious about the implementation of Cross-Reconstructed Emotion Disentanglement. In the paper, you say, "Given four audio samples X_{i,m}, X_{j,n}, X_{j,m}, X_{i,n}" for disentangling. However, the implementation in this project is a little different: you sample 2 emotions and 3 contents, set X_{1,1}, X_{2,1}, X_{1,2}, X_{2,3} as inputs, and use X_{1,1}, X_{1,1}, X_{1,2}, X_{1,2} as the four targets to calculate the cross-reconstruction loss and self-reconstruction loss (as below). https://github.com/jixinya/EVP/blob/990ea8b085a450b6fcc2c28b817989191e173218/train/disentanglement/code/dataload.py#L103-L107 Could you please explain this? Hoping for your response.

    opened by yChenN1 0
  • How are the data in the voice and face keypoint files related?

    How are the data in the mfcc and face keypoint landmark files related, and are they time-aligned?

    |--train
    |---landmark
    |---dataSet-M030
       --landmark
       ---mfcc

    opened by pfeducode 0
  • 3DMM parameters

    Hi,

    I am trying to test with my own dataset but have some trouble generating the 3DMM parameters. I am wondering if you can explain the procedure for obtaining them.

    Thanks,

    opened by junegoo94 0
  • Error when generating the training data

    I am running python landmark/code/preprocess.py, but the path below does not exist. How should it be run?

    Traceback (most recent call last):
      File "/home/user/PycharmProjects/EVP/EVP/train/landmark/code/preprocess.py", line 145, in <module>
        a = np.load(path)
      File "/home/user/anaconda3/envs/evp/lib/python3.6/site-packages/numpy/lib/npyio.py", line 416, in load
        fid = stack.enter_context(open(os_fspath(file), "rb"))
    FileNotFoundError: [Errno 2] No such file or directory: '/home/user/PycharmProjects/EVP/EVP/train/landmark/dataset_M030/landmark/M030_fear_3_026/5.npy'

    opened by yangdaowu 1
  • Question about training

    There are some problems when I run python train/disentanglement/dtw/MFCC_dtw.py.

    1. Can you explain what lines 132-140 do? https://github.com/jixinya/EVP/blob/1f725b8e23f5e29f6211d74e3c08636de7053239/train/disentanglement/dtw/MFCC_dtw.py#L132-L140
    2. Is the dimension of mfcc_k really (13,)? An mfcc_k of that dimension cannot be fed into the network (it causes a dimension mismatch).

    Waiting for your reply. Thanks.

    opened by Sample-design-alt 0