AudioDVP: Photorealistic Audio-driven Video Portraits


AudioDVP

This is the official implementation of Photorealistic Audio-driven Video Portraits.

Major Requirements

  • Ubuntu >= 18.04
  • PyTorch >= 1.2
  • GCC >= 7.5
  • NVCC >= 10.1
  • FFmpeg (with H.264 support)

For reference, the full environment setup is listed in enviroment.yml. (You don't need to install everything in it; just install packages as you run into import errors.)
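One entry in that file has been reported to break installation: the issue reports in the comments show pip failing on rasterize_triangles_cpp==0.0.0. That package appears to be the rasterizer kernel compiled locally in step 6, so it cannot be installed from PyPI. Below is a minimal sketch, using only the Python standard library, that drops such pinned entries from the environment file's lines before the environment is created. The helper names and the LOCAL_ONLY_PACKAGES set are hypothetical and not part of the repository.

```python
# Hypothetical set of packages that are built locally (for example by step 6)
# and therefore cannot be downloaded from PyPI.
LOCAL_ONLY_PACKAGES = {"rasterize_triangles_cpp"}


def _normalize(name):
    """pip treats '-' and '_' (and letter case) as equivalent in package names."""
    return name.strip().lower().replace("-", "_")


def strip_local_packages(lines, local_packages=LOCAL_ONLY_PACKAGES):
    """Return the lines of an environment file without local-only packages."""
    local = {_normalize(p) for p in local_packages}
    kept = []
    for line in lines:
        # "    - rasterize-triangles-cpp==0.0.0" -> "rasterize-triangles-cpp"
        entry = line.strip().lstrip("-").strip()
        name = entry.split("==")[0]
        if _normalize(name) in local:
            continue  # skip the pinned local-only package
        kept.append(line)
    return kept
```

For example, passing `Path("enviroment.yml").read_text().splitlines()` to `strip_local_packages` returns the remaining lines, which can be written to a new file and used with `conda env create -f`.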

Major implementation differences from the original paper

  • The geometry and texture parameters of the 3DMM are now initialized to zero and shared among all samples during fitting, which is more reasonable because every frame of the video shows the same person.

  • OpenCV is used instead of PIL for image editing operations.
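The first difference can be pictured as a parameter layout in which the geometry (alpha) and texture (beta) coefficients exist once per video, while each frame keeps its own expression (delta) coefficients. The sketch below is a minimal NumPy illustration under assumed conventions, not the repository's code: the linear model mean + B_id @ alpha + B_exp @ delta is the standard 3DMM formulation, the alpha/beta/delta names follow the loss terms printed in the training logs quoted in the comments, and the coefficient sizes (80/80/64) and the FittingParams class are hypothetical.

```python
import numpy as np

# Assumed coefficient sizes (hypothetical; not read from the repository).
ALPHA_DIM = 80   # geometry / identity
BETA_DIM = 80    # texture
DELTA_DIM = 64   # expression


class FittingParams:
    """Hypothetical container for the 3DMM coefficients fitted over one video."""

    def __init__(self, num_frames):
        # Shared by all frames and initialized to zero,
        # so fitting starts from the mean face and mean texture.
        self.alpha = np.zeros(ALPHA_DIM)
        self.beta = np.zeros(BETA_DIM)
        # One expression vector per frame.
        self.delta = np.zeros((num_frames, DELTA_DIM))

    def shape(self, mean_shape, id_basis, exp_basis, frame):
        """Linear 3DMM geometry of one frame: mean + B_id @ alpha + B_exp @ delta_i."""
        return mean_shape + id_basis @ self.alpha + exp_basis @ self.delta[frame]
```

Because alpha is a single vector, any update to it during fitting changes the geometry of every frame in the same way, while per-frame differences can only come from delta.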

Usage

1. Download face model data

  • Download Basel Face Model 2009. (Register and get 01_MorphableModel.mat.)

  • Download expression basis from 3DFace. (There is an Exp_Pca.bin in CoarseData.)

  • Download auxiliary files from Deep3DFaceReconstruction.

  • Put the data in renderer/data, following the structure below.

    renderer/data
    ├── 01_MorphableModel.mat
    ├── Exp_Pca.bin
    ├── BFM_front_idx.mat
    ├── BFM_exp_idx.mat
    ├── facemodel_info.mat
    ├── select_vertex_id.mat
    ├── std_exp.txt
    └── data.mat (generated by step 2 below)
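Before running step 2, a quick check that the required files are in place can save a failed run. This is a minimal sketch using only the Python standard library; the file list is copied from the tree above, and missing_face_model_files is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Files that must be present before running build_data.py (from the tree above).
REQUIRED_FILES = [
    "01_MorphableModel.mat",
    "Exp_Pca.bin",
    "BFM_front_idx.mat",
    "BFM_exp_idx.mat",
    "facemodel_info.mat",
    "select_vertex_id.mat",
    "std_exp.txt",
]


def missing_face_model_files(data_dir="renderer/data"):
    """Return the required files that are not found in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_face_model_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All face model files found; data.mat can now be built.")
```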
    

2. Build data

cd renderer/
python build_data.py

3. Download the pretrained ATnet model

  • The link is here.
  • Put atnet_lstm_18.pth in vendor/ATVGnet/model.

4. Download the ResNet pretrained on VGGFace2

  • The link is here.
  • Put resnet50_ft_weight.pkl in weights.

5. Download the Trump speech video

  • The link is here. (Video courtesy of The White House.)
  • Put it in data/video.

6. Compile the CUDA rasterizer kernel

cd renderer/kernels
python setup.py build_ext --inplace
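Because build_ext --inplace places the compiled module inside renderer/kernels, one way to confirm the build succeeded is to look it up in that directory. This is a minimal sketch using the standard library's importlib; the default module name rasterize_triangles_cpp is an assumption based on the package name in the issue reports in the comments, so adjust it to match setup.py. The helper name is hypothetical.

```python
from importlib.machinery import PathFinder


def extension_available(module_name="rasterize_triangles_cpp",
                        search_dir="renderer/kernels"):
    """Return True if module_name can be found in search_dir.

    PathFinder.find_spec searches only the given directories, so sys.path
    is left untouched. The default module name is an assumption.
    """
    return PathFinder.find_spec(module_name, [search_dir]) is not None


if __name__ == "__main__":
    if extension_available():
        print("Rasterizer extension found.")
    else:
        print("Rasterizer extension not found; rerun step 6.")
```

Finding the module only shows that the build produced it; it does not prove the kernel was compiled for the GPU in use.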

7. Run the demo script

# Explanation of every step is provided.
./scripts/demo.sh

Since we provide both training and inference code, we are not uploading a pretrained model for now. The expected result, generated with the synthesized audio in data/test_audio, is provided in data/sample_result.mp4.

Acknowledgment

This work is built upon many great open-source projects and datasets.

Notes

  • Our method is built upon Deep Video Portraits.
  • Our method adopts a person-specific Audio2Expression module, which is less robust than a universal one trained on a large dataset such as Lip Reading Sentences in the Wild. Training a universal module is encouraged! Fortunately, our method works quite well on WaveNet-synthesized audio such as that provided in data/test_audio.
  • The code has NOT been fully tested on a clean machine.
  • There is a known bug in the rasterizer: in some corner cases, a few pixels of the rendered face are black (not assigned any color) due to floating-point error, which I have not been able to fix.
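One possible post-processing workaround for these unassigned pixels, not part of the repository and not the authors' method, is to fill each black pixel inside the face from its valid neighbours. The sketch below assumes NumPy, an (H, W, 3) rendering, and a boolean face mask; the function name is hypothetical. It would also overwrite pixels that are legitimately pure black inside the mask, and a pixel whose four neighbours are all holes is left unchanged.

```python
import numpy as np


def fill_black_pixels(image, mask):
    """Fill pure-black pixels inside a face mask from their valid 4-neighbours.

    image: (H, W, 3) rendered face, uint8 or float.
    mask:  (H, W) boolean array, True where the face is rendered.
    Returns a new array with the same dtype as image.
    """
    out = image.astype(np.float64)  # astype returns a copy
    # Candidate holes: inside the mask and exactly black in every channel.
    holes = mask & np.all(image == 0, axis=2)
    valid = mask & ~holes
    h, w = holes.shape
    for y, x in zip(*np.nonzero(holes)):
        neighbours = [
            out[ny, nx]
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if 0 <= ny < h and 0 <= nx < w and valid[ny, nx]
        ]
        if neighbours:
            out[y, x] = np.mean(neighbours, axis=0)
    if np.issubdtype(image.dtype, np.integer):
        out = np.rint(out)
    return out.astype(image.dtype)
```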

Disclaimer

We made this code publicly available to benefit the graphics and vision community. Please DO NOT misuse the code for malicious purposes.

Citation

@article{wen2020audiodvp,
    author={Xin Wen and Miao Wang and Christian Richardt and Ze-Yin Chen and Shi-Min Hu},
    journal={IEEE Transactions on Visualization and Computer Graphics}, 
    title={Photorealistic Audio-driven Video Portraits}, 
    year={2020},
    volume={26},
    number={12},
    pages={3457-3466},
    doi={10.1109/TVCG.2020.3023573}
}

License

BSD

Comments
  • Step "crop and resize video frames" is too slow

    The following step in the demo is very slow:

    # crop and resize video frames
    python utils/crop_portrait.py \
        --data_dir $video_dir \
        --crop_level 2.0 \
        --vertical_adjust 0.2

    Does anyone else have the same problem, or is it expected to be this slow?

    opened by House-Leo 6
  • Can't find expression basis

    Hi,

    Thank you for your amazing work! I want to run your code, but I can't find the expression basis at https://github.com/Juyong/3DFace. TAT

    Did I miss something important?

    opened by qianyunw 6
  • RuntimeError: triangles.is_contiguous() INTERNAL ASSERT FAILED at "rasterize_triangles.cpp"

    This error occurs when I run the train.py script, and I don't know how to fix it:

    python train.py --data_dir $video_dir --num_epoch 20 --serial_batches False --display_freq 400 --print_freq 400 --batch_size 5

    opened by iamchenxin-coder 4
  • When I get to the "dataset [singledataset] was created" step, something goes wrong

    Error in rasterize_triangles_cuda_forward: invalid device function
    Error in rasterize_triangles_cuda_forward: invalid device function
    (epoch: 0, iters: 400, data: 0.293, comp: 0.205) Photometric: 0.00000 Landmark: 1836.84644 Alpha: 0.05272 Beta: 0.00000 Delta: 63.04750

    opened by supersarr 3
  • triangles must be contiguous

    RuntimeError: triangles.is_contiguous() INTERNAL ASSERT FAILED at rasterize_triangles.cpp:47, please report a bug to PyTorch. triangles must be contiguous

    I ran into this problem. Do you know how to solve it? Thanks.

    duplicate 
    opened by ghost 3
  • RuntimeError: triangles.is_contiguous() INTERNAL ASSERT FAILED, can you give some advice?

    When I run:

    # 3D face reconstruction
    python train.py \
        --data_dir $video_dir \
        --num_epoch 20 \
        --serial_batches False \
        --display_freq 400 \
        --print_freq 400 \
        --batch_size 5

    opened by anthonyyuan 3
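  • Help with the Audio2expression model

    Hello, and thank you for open-sourcing the code. We noticed that your Audio2expression model uses only three convolutional layers plus one fully connected layer. On TensorBoard the loss drops to about 3e-3, but inference with this model shows two problems: 1. the lip movements are small in amplitude; 2. the lips jitter (as if shivering). We tried modifying Audio2expression (adding more fully connected layers, increasing batch_size), which lowered the loss to 1.5e-4, but the results got worse (the jitter became stronger, with repeated shaking). Do you have any suggestions? Thanks!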

    opened by 821029883 2
  • ./scripts/demo.sh: line 104: 30820 Killed (runs out of memory)

    ./scripts/demo.sh: line 104: 30820 Killed. The process runs out of memory; at which step should I reduce memory usage?

    ./scripts/demo.sh: line 104: 30820 Killed /home/chenbl/anaconda3/envs/cbl/bin/python3 vendor/neural-face-renderer/train.py --dataroot $video_dir/nfr/AB --name fr --model nfr --checkpoints_dir $video_dir/checkpoints --netG unet_256 --direction BtoA --lambda_L1 100 --dataset_mode temporal --norm batch --pool_size 0 --use_refine --input_nc 21 --Nw 7 --batch_size 8 --preprocess none --num_threads 4 --n_epochs 6 --n_epochs_decay 0 --load_size 256

    opened by ghost 2
  • pretrained model

    Hello, can you provide a pretrained model? I would like to see results on other examples. Do you have a Baidu Cloud link?

    opened by chensheng-code 2
  • Could not find the rasterize_triangles_cpp

    ERROR: Could not find a version that satisfies the requirement rasterize_triangles_cpp==0.0.0 (from versions: none)
    ERROR: No matching distribution found for rasterize_triangles_cpp==0.0.0

    What I tried: 1) switching to various package sources did not help; 2) removing "==0.0.0" did not help either.

    opened by doctorimage 2
  • Fix issues #9 and #23

    Fixes the error that occurs when using PyTorch 1.8 and CUDA 11.1:

    RuntimeError: triangles.is_contiguous() INTERNAL ASSERT FAILED at "rasterize_triangles.cpp":47, please report a bug to PyTorch. triangles must be contiguous
    
    opened by quqixun 1