# AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis

Project Page | Paper

PyTorch implementation for the paper "AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis".
## Prerequisites
- You can create an anaconda environment called `adnerf` with:

  ```bash
  conda env create -f environment.yml
  conda activate adnerf
  ```
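  As a quick sanity check (our addition, not part of the original instructions), confirm that PyTorch in the new environment sees a GPU, since NeRF training is impractical on CPU:

  ```python
  # Sanity check: AD-NeRF training assumes a CUDA-capable GPU.
  import torch

  print(torch.__version__)
  print(torch.cuda.is_available())  # should print True on a GPU machine
  ```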
- PyTorch3D: we recommend installing from a local clone

  ```bash
  git clone https://github.com/facebookresearch/pytorch3d.git
  cd pytorch3d && pip install -e .
  ```
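  To confirm the editable install is importable (our suggestion, not from the original README):

  ```python
  # Verify that the editable PyTorch3D install works outside the clone directory.
  import pytorch3d

  print(pytorch3d.__version__)
  ```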
- Put "01_MorphableModel.mat" (the Basel Face Model 2009 file) into `data_util/face_tracking/3DMM/`, then run:

  ```bash
  cd data_util/face_tracking
  python convert_BFM.py
  ```
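  If the conversion fails, the usual cause is a wrong or truncated .mat file. A minimal inspection sketch of our own (assuming the file is a pre-v7.3 MATLAB file, which scipy can read directly):

  ```python
  # Inspect the raw Basel Face Model file before conversion.
  from scipy.io import loadmat

  bfm = loadmat("data_util/face_tracking/3DMM/01_MorphableModel.mat")
  # List the stored arrays and their shapes; "__"-prefixed keys are loadmat metadata.
  for key, value in bfm.items():
      if not key.startswith("__"):
          print(key, getattr(value, "shape", type(value)))
  ```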
## Train AD-NeRF
- Data Preprocess ($id is `Obama` in this example)

  ```bash
  bash process_data.sh Obama
  ```

  - Input: a portrait video at 25fps containing voice audio (`dataset/vids/$id.mp4`)
  - Output: a folder `dataset/$id` that contains all files for training
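  To sanity-check the preprocessing output, you can look at the extracted audio features. This is our own sketch, not part of the repo; it assumes `aud.npy` holds one DeepSpeech feature window per extracted video frame:

  ```python
  # Check the audio features produced by preprocessing.
  import numpy as np

  aud = np.load("dataset/Obama/aud.npy")
  # First axis is assumed to match the number of video frames (25 fps).
  print(aud.shape)
  ```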
- Train Two NeRFs (Head-NeRF and Torso-NeRF)

  - Train Head-NeRF with command

    ```bash
    python NeRFs/HeadNeRF/run_nerf.py --config dataset/$id/HeadNeRF_config.txt
    ```

  - Copy the latest trained model from `dataset/$id/logs/$id_head` to `dataset/$id/logs/$id_com` (see the sketch after this list)

  - Train Torso-NeRF with command

    ```bash
    python NeRFs/TorsoNeRF/run_nerf.py --config dataset/$id/TorsoNeRF_config.txt
    ```
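The copy step above can be scripted. A minimal sketch, assuming checkpoints are saved as iteration-numbered `.tar` files as in NeRF-pytorch (on which this code is based):

```python
# Copy the newest Head-NeRF checkpoint into the combined log directory
# so Torso-NeRF training can start from it.
import glob
import os
import shutil

person_id = "Obama"  # $id from the commands above
head_dir = f"dataset/{person_id}/logs/{person_id}_head"
com_dir = f"dataset/{person_id}/logs/{person_id}_com"

os.makedirs(com_dir, exist_ok=True)
# Assumption: checkpoint files end in .tar, NeRF-pytorch style.
latest = max(glob.glob(os.path.join(head_dir, "*.tar")), key=os.path.getmtime)
shutil.copy(latest, com_dir)
print("copied", latest, "->", com_dir)
```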
## Run AD-NeRF for rendering
- Reconstruct the original video with audio input

  ```bash
  python NeRFs/TorsoNeRF/run_nerf.py --config dataset/$id/TorsoNeRFTest_config.txt --aud_file=dataset/$id/aud.npy --test_size=300
  ```

- Drive the target person with another audio input (see the note after this list)

  ```bash
  python NeRFs/TorsoNeRF/run_nerf.py --config dataset/$id/TorsoNeRFTest_config.txt --aud_file=${deepspeechfile.npy} --test_size=-1
  ```
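`${deepspeechfile.npy}` is a placeholder for DeepSpeech features extracted from your own audio (see Acknowledgments). Before rendering, you can verify that the new feature file has the same per-frame layout as the one produced by preprocessing. A minimal check of our own; `my_audio_features.npy` is a hypothetical path:

```python
# Compare a new DeepSpeech feature file against the one from preprocessing;
# everything except the number of frames (first axis) should match.
import numpy as np

reference = np.load("dataset/Obama/aud.npy")
driving = np.load("my_audio_features.npy")  # hypothetical: your own audio's features
print("reference:", reference.shape, "driving:", driving.shape)
assert reference.shape[1:] == driving.shape[1:], "feature layout mismatch"
```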
## Acknowledgments

We use face-parsing.PyTorch for parsing head and torso maps, and DeepSpeech for audio feature extraction. The NeRF model implementation is based on NeRF-pytorch.