Code for paper 'Audio-Driven Emotional Video Portraits'.

Overview

Audio-Driven Emotional Video Portraits [CVPR2021]

Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu

[Project] [Paper]

[visualization]

Given an audio clip and a target video, our Emotional Video Portraits (EVP) approach generates emotion-controllable talking portraits and can change their emotion smoothly by interpolating in the latent space.
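For intuition, here is a minimal sketch of the latent-space interpolation idea, assuming emotion features are stored as NumPy vectors (the file names and feature format below are hypothetical; see the released --emo_feature inputs for the actual format):

import numpy as np

def interpolate_emotions(feat_a, feat_b, steps=10):
    """Linearly blend two emotion feature vectors in latent space."""
    for alpha in np.linspace(0.0, 1.0, steps):
        yield (1.0 - alpha) * feat_a + alpha * feat_b

# Hypothetical example: morph a neutral feature into an angry one.
neutral = np.load("features/neutral.npy")
angry = np.load("features/angry.npy")
blends = list(interpolate_emotions(neutral, angry, steps=5))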

Installation

We train and test with Python 3.6 and PyTorch. To install the dependencies, run:

pip install -r requirements.txt

Testing

  • Download the pre-trained models and data from google-drive (we release the results for two target persons, M003 and M030), unzip test.zip, and put the files in the corresponding places.

  • Step 1: audio2landmark

    The emotion of the predicted landmark motions can be manipulated with emotion features (recommended):

    python audio2lm/test.py --config config/target_test.yaml --audio path/to/audio --condition feature --emo_feature path/to/feature
    

    or by the emotional audio of the target person:

    python audio2lm/test.py --config config/target_test.yaml --audio path/to/audio --condition audio --emo_audio path/to/emo_audio
    

    The results will be stored in results/target.mov
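    For reference, the driving audio is consumed as MFCC features (see the mfcc/ folders discussed in the comments below). A minimal extraction sketch with librosa follows; librosa is not necessarily what the repo uses internally, and the sample rate is an assumption, not the repo's exact preprocessing:

    import librosa

    # Load the driving audio; 16 kHz mono is an assumption.
    wav, sr = librosa.load("path/to/audio.wav", sr=16000)
    # 13 coefficients per frame, matching the (13,) mfcc_k dimension
    # mentioned in the comments below.
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=13)  # shape (13, n_frames)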

  • Step 2: landmark2video

    A parametric 3D face model and a corresponding fitting algorithm are required here to regress the geometry, expression, and pose parameters from the predicted landmarks and the target video. We release some parameters of the testing results:

    lm2video/data/target/3DMM/3DMM: images and landmark positions of the video

    lm2video/data/target/3DMM/target_test: parameters of the target's video

    lm2video/data/target/3DMM/target_test_pose: pose parameters of the video

    lm2video/data/target/3DMM/test_results: parameters of the predicted landmarks
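    To sanity-check the released parameters before rendering, something like the following sketch may help (the .npy extension and file layout are assumptions; adjust to the actual files in the archive):

    import glob
    import numpy as np

    # Inspect the released fitting results; paths are from the list above.
    for path in sorted(glob.glob("lm2video/data/target/3DMM/test_results/*.npy"))[:3]:
        arr = np.load(path, allow_pickle=True)
        print(path, getattr(arr, "shape", type(arr)))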

    Here we use vid2vid to generate video from edge maps (a sketch of rasterizing landmarks into edge maps follows the steps below):

    1. Generate the testing data by running:

      python lm2video/lm2map.py
      

      and copy the results in lm2video/results/ to vid2vid/datasets/face/.

    2. Replace face_dataset.py and base_options.py in vid2vid with lm2video/face_dataset.py and lm2video/base_options.py, the 106-keypoint versions.

    3. Copy lm2video/data/target/latest_net_G0.pth to vid2vid/checkpoints/target/ and lm2video/test_target.sh to vid2vid/scripts/face/, then run:

      bash ./scripts/face/test_target.sh
      
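    For intuition about what lm2video/lm2map.py produces, here is a sketch of rasterizing one 106-point landmark frame into an edge map with OpenCV. The point connectivity below is hypothetical; the real grouping lives in lm2map.py:

    import cv2
    import numpy as np

    def draw_edgemap(landmarks, size=(512, 512)):
        """landmarks: (106, 2) array of pixel coordinates for one frame."""
        canvas = np.zeros((size[1], size[0], 3), dtype=np.uint8)
        pts = landmarks.astype(np.int32)
        # Hypothetical split: face contour first, then inner features.
        contour, inner = pts[:33], pts[33:]
        cv2.polylines(canvas, [contour], isClosed=False, color=(255, 255, 255), thickness=1)
        cv2.polylines(canvas, [inner], isClosed=False, color=(255, 255, 255), thickness=1)
        return canvas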

Training

  • Coming soon.

Citation

@article{ji2021audio,
  title={Audio-Driven Emotional Video Portraits},
  author={Ji, Xinya and Zhou, Hang and Wang, Kaisiyuan and Wu, Wayne and Loy, Chen Change and Cao, Xun and Xu, Feng},
  journal={arXiv preprint arXiv:2104.07452},
  year={2021}
}
Comments
  • The effectiveness of the "Cross-Reconstructed Emotion Disentanglement" module

    To ensure audio emotion and speech content are disentangled, you design a Cross-Reconstructed Emotion Disentanglement module in the paper. In my opinion, the emotion encoder and content encoder should be frozen once the disentanglement training is finished. But I found that the two pretrained models you provide for two different subjects have totally different weights in the emotion encoder and content encoder. So I guess that you finetune these two encoders together with the other parts when you train your audio2lm module, but how can you guarantee the disentanglement once you finetune these two encoders?

    opened by Dorniwang 3
  • About the landmark MEAN and PCA

    Thank you for the great work. Could you please share the details of calculating a specific person's landmark mean and PCA? Take a specific person's video data in MEAD, for example: we have multiple viewpoints, emotions, and intensities. Could you please tell which videos are used to compute the landmark mean and PCA? The front view with intensity 3? The front view with intensity 2? Some other combination, or even all the videos?

    I am trying to preprocess the MEAD dataset and do some face reenactment work; I would appreciate it a lot if you could share the preprocessing details. Looking forward to your reply!

    opened by DaddyJin 3
  • No such file or directory: base_weight.pickle

    Enjoyed the paper - thanks! Looking forward to seeing the training code :) - A few things:

    1. In test.py line 208, add_audio(config['video_dir'], opt.in_file) throws an error - AttributeError: 'Namespace' object has no attribute 'in_file' (the code runs if this line is commented out).
    2. Line 3 in M003_test.yaml, video_dir: audio2lm/result/M003.mp4, throws an error - OpenCV: FFMPEG: tag 0x44495658/'XVID' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)' - which can be fixed by changing it to video_dir: audio2lm/result/M003.avi.
    3. In section 2, running python lm2video/lm2map.py results in FileNotFoundError: [Errno 2] No such file or directory: './base_weight.pickle' - not sure how this file is generated or obtained. Any help much appreciated - thank you!
    opened by jeesunkim 2
  • The implementation of the "Cross-Reconstructed Emotion Disentanglement" module

    Hi @jixinya, thanks for your excellent work! I am curious about the implementation of Cross-Reconstructed Emotion Disentanglement. In the paper, you say, "Given four audio samples X_{i,m}, X_{j,n}, X_{j,m}, X_{i,n}" for disentangling. However, the implementation in this project is a little different: you sample 2 emotions and 3 contents, set X_{1,1}, X_{2,1}, X_{1,2}, X_{2,3} as inputs, and use X_{1,1}, X_{1,1}, X_{1,2}, X_{1,2} as the four targets to calculate the cross-reconstruction loss and self-reconstruction loss (as below). https://github.com/jixinya/EVP/blob/990ea8b085a450b6fcc2c28b817989191e173218/train/disentanglement/code/dataload.py#L103-L107 Could you please explain this? Hoping for your response.

    opened by yChenN1 0
  • How are the data in the voice and face keypoint files related?

    How are the data in the mfcc and face keypoint landmark files related, and are they time-aligned?

    |--train
    |---landmark
    |---dataSet-M030
       --landmark
       ---mfcc

    opened by pfeducode 0
  • 3DMM parameters

    Hi,

    I am trying to test with my own dataset but have some trouble generating the 3DMM parameters. I am wondering if you can explain the procedure for obtaining them.

    Thanks,

    opened by junegoo94 0
  • Error when generating the training data

    I am running python landmark/code/preprocess.py, but the path below does not exist. How should it be run?

    Traceback (most recent call last):
      File "/home/user/PycharmProjects/EVP/EVP/train/landmark/code/preprocess.py", line 145, in <module>
        a = np.load(path)
      File "/home/user/anaconda3/envs/evp/lib/python3.6/site-packages/numpy/lib/npyio.py", line 416, in load
        fid = stack.enter_context(open(os_fspath(file), "rb"))
    FileNotFoundError: [Errno 2] No such file or directory: '/home/user/PycharmProjects/EVP/EVP/train/landmark/dataset_M030/landmark/M030_fear_3_026/5.npy'

    opened by yangdaowu 1
  • Question about training

    There are some problems when I run python train/disentanglement/dtw/MFCC_dtw.py.

    1. Can you explain what lines 132-140 do? https://github.com/jixinya/EVP/blob/1f725b8e23f5e29f6211d74e3c08636de7053239/train/disentanglement/dtw/MFCC_dtw.py#L132-L140
    2. Is the dimension of mfcc_k really (13,)? An mfcc_k of that dimension cannot be fed into the network (it causes a dimension mismatch).

    Waiting for your reply. Thanks.

    opened by Sample-design-alt 0