Unofficial PyTorch implementation of the paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing"

Overview

One-Shot Free-View Neural Talking Head Synthesis

Driving | FOMM | Ours:
[image]

Free-View:
[image]

Train:

python run.py --config config/vox-256.yaml --device_ids 0,1,2,3,4,5,6,7

Demo:

python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame

free-view (e.g. yaw=20, pitch=roll=0):

python demo.py --config config/vox-256.yaml --checkpoint path/to/checkpoint --source_image path/to/source --driving_video path/to/driving --relative --adapt_scale --find_best_frame --free_view --yaw 20 --pitch 0 --roll 0
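
To make the effect of the yaw, pitch and roll angles concrete, here is a minimal sketch of how three angles in degrees can be turned into a 3x3 rotation matrix. It assumes pitch rotates about the x axis, yaw about the y axis and roll about the z axis, composed as pitch · yaw · roll; the axis convention and order used by this repository may differ. The function names `rotation_matrix`, `rotate_point` and `matmul3` are hypothetical and not part of demo.py.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(yaw_deg, pitch_deg, roll_deg):
    """Build a rotation matrix from angles in degrees.

    Assumed convention (not confirmed by the repository):
    pitch about x, yaw about y, roll about z, composed as pitch @ yaw @ roll.
    """
    y, p, r = (math.radians(v) for v in (yaw_deg, pitch_deg, roll_deg))
    pitch_mat = [[1.0, 0.0, 0.0],
                 [0.0, math.cos(p), -math.sin(p)],
                 [0.0, math.sin(p), math.cos(p)]]
    yaw_mat = [[math.cos(y), 0.0, math.sin(y)],
               [0.0, 1.0, 0.0],
               [-math.sin(y), 0.0, math.cos(y)]]
    roll_mat = [[math.cos(r), -math.sin(r), 0.0],
                [math.sin(r), math.cos(r), 0.0],
                [0.0, 0.0, 1.0]]
    return matmul3(matmul3(pitch_mat, yaw_mat), roll_mat)


def rotate_point(rot, point):
    """Apply a 3x3 rotation matrix to a 3D point."""
    return [sum(rot[i][k] * point[k] for k in range(3)) for i in range(3)]


# yaw=20, pitch=roll=0 turns a forward-facing direction sideways.
forward = [0.0, 0.0, 1.0]
turned = rotate_point(rotation_matrix(20, 0, 0), forward)
```

With all three angles at zero the matrix is the identity, so the head pose is left unchanged.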

Note: run crop-video.py --inp driving_video.mp4 to get crop suggestions and crop the driving video as prompted.

Acknowledgement:

Thanks to NV, AliaksandrSiarohin and DeepHeadPose

Comments
  • share log.txt

    @zhanglonghao1992 could you provide the log.txt from your training run? I think it would help people check whether their own training is heading in the right direction.

    opened by romanvey 9
  • How to solve

    How can I solve "ValueError: ImageIO does not generally support reading folders."? Thank you very much.

    python demo.py --config config/a.yaml --checkpoint checkpoint/00000500-checkpoint.pth.tar --source_image photo/a.jpg --driving_video driving_video/a.mp4
    demo.py:28: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
      config = yaml.load(f)
    /apply/anaconda3/envs/One-Shot_Free-View_Neural_Talking_Head_Synthesis/lib/python3.6/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
      return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
    demo.py:290: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
      config = yaml.load(f)
    estimate jacobian: False
    demo.py:75: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
      pred = F.softmax(pred)
    0%| | 0/986 [00:00<?, ?it/s]
    /apply/anaconda3/envs/One-Shot_Free-View_Neural_Talking_Head_Synthesis/lib/python3.6/site-packages/torch/nn/functional.py:4004: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
      "Default grid_sample and affine_grid behavior has changed "
    /apply/anaconda3/envs/One-Shot_Free-View_Neural_Talking_Head_Synthesis/lib/python3.6/site-packages/torch/nn/functional.py:1806: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
      warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
    100%|█████████████████████████████████| 986/986 [02:06<00:00, 7.81it/s]
    Traceback (most recent call last):
      File "demo.py", line 304, in
        imageio.mimsave(opt.result_video, [img_as_ubyte(frame) for frame in predictions], fps=fps)
      File "/apply/anaconda3/envs/One-Shot_Free-View_Neural_Talking_Head_Synthesis/lib/python3.6/site-packages/imageio/core/functions.py", line 283, in mimwrite
        with imopen(uri, "wI", plugin=format) as file:
      File "/apply/anaconda3/envs/One-Shot_Free-View_Neural_Talking_Head_Synthesis/lib/python3.6/site-packages/imageio/core/imopen.py", line 223, in imopen
        raise err_type(err_msg)
    ValueError: ImageIO does not generally support reading folders. Limited support may be available via specific plugins. Specify the plugin explicitly using the plugin kwarg, e.g. plugin='DICOM'
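
    The traceback shows all 986 frames being predicted and the failure only happening when the video is written, so one plausible cause (an assumption, not confirmed in this thread) is that the output path passed to imageio.mimsave points at a directory rather than a file. The sketch below normalizes such a path before saving. `resolve_result_path` and the default name `result.mp4` are hypothetical and not part of demo.py.

```python
import os


def resolve_result_path(result_video, default_name="result.mp4"):
    """Return a file path suitable for writing a video.

    If the given path is an existing directory or ends with a path
    separator, a default file name is appended. If the path has no
    extension, ".mp4" is added so the writer can infer the format.
    """
    looks_like_dir = result_video.endswith(("/", os.sep))
    path = os.path.normpath(result_video)
    if looks_like_dir or os.path.isdir(path):
        path = os.path.join(path, default_name)
    _, ext = os.path.splitext(path)
    if not ext:
        path += ".mp4"
    return path


# Hypothetical usage before the final save call in a demo script:
# out_path = resolve_result_path(opt.result_video)
# imageio.mimsave(out_path, frames, fps=fps)
```

    If the path already names a file with an extension, it is returned unchanged apart from normalization.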

    opened by 805094591 5
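  • Some problems found when running inference with the pretrained model

    Hello, first of all, thank you very much for the open-source code and the pretrained model. While running inference with vox-256-new (https://www.mediafire.com/folder/fcvtkn21j57bb/TalkingHead_Update), I ran into some results that did not match my expectations, so I would like to ask about them.

    As shown in the figure below, I tried to output some intermediate results of the driving process, hoping to reproduce the effect in Fig. 4 of the original paper. The reconstructed image and the final result both match expectations, but the intermediate results look strange.

    image

    In addition, I tried editing t in the translation, that is, adding an offset to x, y and z in kp_source['value'] respectively. As shown in the figure below, the results for x and y match expectations and correspond to left-right and up-down translation, but the result for z is strange. I expected z to correspond to something like near and far.

    image

    Finally, I tried editing yaw (±45°), pitch (±20°) and roll (±20°). Yaw and roll work well, but pitch is slightly worse. I am not sure whether this is related to the strange behavior of z above.

    image

    I previously tried to reproduce this paper myself and ran into similar problems. Besides that, another issue is that the occlusion map is all 1, which is also strange. I hope we can discuss this together.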
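
    For readers following this thread, here is a minimal sketch of the keypoint transformation described in the paper, where each keypoint is the rotated canonical keypoint plus a translation plus an expression deformation (x_k = R x_c,k + t + δ_k). It shows why adding an offset to t shifts every keypoint by the same amount. The function name and the plain-list data layout are assumptions for illustration and do not mirror the repository's tensors.

```python
def transform_keypoints(canonical_kp, rot, t, deltas):
    """Apply x_k = R @ x_c,k + t + delta_k to every keypoint.

    canonical_kp: list of [x, y, z] canonical keypoints
    rot:          3x3 rotation matrix as nested lists
    t:            [tx, ty, tz] translation shared by all keypoints
    deltas:       list of [dx, dy, dz] per-keypoint expression deformations
    """
    out = []
    for kp, delta in zip(canonical_kp, deltas):
        rotated = [sum(rot[i][j] * kp[j] for j in range(3)) for i in range(3)]
        out.append([rotated[i] + t[i] + delta[i] for i in range(3)])
    return out


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
canonical = [[0.0, 0.0, 0.0], [0.5, -0.5, 0.25]]
no_expression = [[0.0, 0.0, 0.0] for _ in canonical]

# Shifting only the z component of t moves all keypoints along z.
shifted = transform_keypoints(canonical, identity, [0.0, 0.0, 0.2], no_expression)
```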

    opened by Honlan 4
  • Early training behavior

    Hi. First of all, thanks for your code. I'm training the model from scratch. I've found that the images from the generator look similar to the source image but do not follow the driving frame (pose, emotion, and so on). Perhaps this is because I'm at the very start of training (only a single epoch has passed). Is this behavior expected? Please find the log image attached: Images_epoch_0_iteration_2000_rec jpg_00000000

    opened by GLivshits 4
  • About rotation replacing the Jacobians

    Very good results. When I tried to reproduce this paper, I found that the Jacobian matrix was removed in the v3 version on arXiv. I removed the Jacobians and got very worrying results. I would like to ask whether you use the rotation parameter to replace the Jacobian. I would also like to know whether the code will be open-sourced in the near future.

    opened by begins233 4
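  • A question about retraining

    Hello, thank you for sharing. I ran into some problems and would like to ask about them. First, for dataset processing I followed https://github.com/AliaksandrSiarohin/video-preprocessing. Since the dataset is fairly large, I am first trying a small subset to see whether training runs at all. (The processed dataset is placed in one folder, split into train and test.) The hopenet weights also need to be added: https://github.com/natanielruiz/deep-head-pose. After doing this, no error is reported, but the program stays stuck at Loading hopenet...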

    opened by DWCTOD 3
  • How to train with TalkingHead-1KH?

    Has anyone here managed to train using this repo and the TalkingHead-1KH dataset? I believe the author (@zhanglonghao1992) has trained on this dataset before, but the code in the repo seems to default to training with VoxCeleb. It's not clear to me how to use the TalkingHead-1KH dataset out-of-the-box with this code. Should additional pre-processing steps be done, similar to those done to VoxCeleb here? Would you be willing to elaborate on what you did exactly to get, I assume, reasonable results when compared to VoxCeleb, @zhanglonghao1992?

    opened by yahskapar 2
  • answer about keypoint_detector.py

    Line 173

    yaw = self.fc_roll(out)
    pitch = self.fc_pitch(out)
    roll = self.fc_yaw(out)  
    t = self.fc_t(out)           
    exp = self.fc_exp(out)
    

    Why is yaw computed with fc_roll, and why is roll computed with fc_yaw?

    opened by hongju-jeong 2
  • headpose error

    Here is the result when I make headpose_pred_to_degree() return 0: https://user-images.githubusercontent.com/46595349/142132076-dd5cffc6-fa64-4c64-9f01-63d649137db8.mp4 and here is the original result: https://user-images.githubusercontent.com/46595349/142132085-b4707739-c2e0-4bf1-91fb-be4f318cb775.mp4 It seems the head pose prediction does not work correctly. P.S. Here are the original video and picture:

    https://user-images.githubusercontent.com/46595349/142132892-a1c39ba6-9546-4eba-bf27-bcb120260c40.mp4 ouy2
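
    As background for this discussion, here is a minimal sketch of how a binned head-pose prediction is commonly converted to degrees. It assumes the Hopenet-style convention of 66 classification bins, each 3 degrees wide, with the expected bin index mapped to roughly -99 to +96 degrees; whether headpose_pred_to_degree() in this repository does exactly this is an assumption. The function name below is hypothetical.

```python
import math


def headpose_logits_to_degrees(logits):
    """Convert per-bin logits to an angle in degrees.

    Assumed convention: softmax over the bins, take the expected bin
    index, then map it with `index * 3 - 99`.
    """
    # Numerically stable softmax.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Expected bin index under the predicted distribution.
    expected_bin = sum(i * p for i, p in enumerate(probs))
    return expected_bin * 3.0 - 99.0


# A prediction concentrated on bin 33 maps to 0 degrees.
confident = [0.0] * 66
confident[33] = 1000.0
angle = headpose_logits_to_degrees(confident)
```

    Forcing the result to 0 for all three angles, as in the experiment above, corresponds to a frontal pose under this convention.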

    opened by highway007 2
  • some questions about rotation matrix and jacobian

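    Hello, thank you for continuing to update the code!

    There are some things about the rotation matrix R and the Jacobian J that I do not fully understand and would like to ask about:

    1. The 11.5 update used rotation to replace the Jacobian, then the update on the 11th crossed this out and added the note "not working", and the rotation part of the code was removed as well. Does this mean the rotation matrix had no effect, so the Jacobian is still used?

    2. The latest code does not really look like "correct rotation matrix" to me. It seems as if the rotation matrix is not used, or did I miss something?

    There is one more thought about free-view on which I would like to hear your opinion:

    The paper does not seem to explain in much detail why this approach gives good free-view results. The message seems to be: the yaw, pitch and roll values control the degree of free-view, and rotating the keypoints --> produces the free-view effect. Does the quality of the result then rely on the network's ability to generate images, so that it can generate regions even if they were never seen in source_image?

    I hope to hear back from you. Thank you for your hard work.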

    opened by Vijayue 2
  • ValueError: Could not find a format to write the specified file in mode 'I'

    Thank you for your hard work.

    I am trying to run demo.py and I keep getting this error. Is there anything I'm missing for running the code? I use an .mp4 video as the driving video and the code runs to the end, so I have a list of prediction frames, but it cannot write the final video.
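
    A commonly reported cause of this message is that imageio has no video backend installed for writing .mp4 files, which is usually provided by the separate imageio-ffmpeg package. Whether that applies to this particular environment is an assumption. The sketch below only checks whether the backend module can be found; `missing_video_backends` is a hypothetical helper name.

```python
import importlib.util


def missing_video_backends(candidates=("imageio_ffmpeg",)):
    """Return the names of candidate backend modules that are not installed."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is None]


missing = missing_video_backends()
if missing:
    # Installing the package (for example `pip install imageio-ffmpeg`)
    # is the usual remedy when the ffmpeg backend is absent.
    print("Missing video backends:", ", ".join(missing))
```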

    opened by laleh-samadfam 2
  • [The results in some first epochs]

    Hi, thank you for your implementation! I am using your code to train on a V100. I used:

    • spade-based generator
    • estimate_jacobian=False
    • num_repeats changed to 1 instead of 75, to see the result after each epoch more quickly

    Here is the result after 2 epochs. Does it look sensible to you after 2 epochs? If not, there might be something wrong in my dataset preparation.

    https://user-images.githubusercontent.com/19920599/210220080-7aee3517-de24-471e-97ad-ae5ba58f459d.mp4

    opened by vuthede 1
  • Speeding up talking head synthesis

    I'm using a modified version of this project as a part of another project where talking head synthesis is a component. I'm curious if anyone here (author included) has any recommendations for speeding up the actual talking head synthesis itself, especially if it's used to augment, for example, tens of thousands of videos. I've achieved some moderate success just splitting up computation into different processes (given multiple processes running on GPUs with a larger memory pool, such as 48GB), but this is a bit clunky and also appears to mess up some videos by making them choppy at certain segments once stitched back together from multiple processes. Perhaps I missed it, but I don't really see any parameters to modify an inference-time batch size, or anything like that, within the code itself.

    Any other ideas?
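
    One possible direction, sketched here under the assumption that the generator can accept several driving frames stacked in one batch (not confirmed for this repository), is to group frames into fixed-size batches inside a single process. Keeping one ordered pass over the frames also avoids the stitching step that can make segments choppy. `iter_batches` is a hypothetical helper and the batch size is arbitrary.

```python
def iter_batches(frames, batch_size):
    """Yield consecutive slices of `frames` with at most `batch_size` items.

    Order is preserved, so concatenating the outputs batch by batch
    gives back the original frame order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(frames), batch_size):
        yield frames[start:start + batch_size]


# Stand-in for decoded driving frames; real code would hold image arrays.
driving_frames = list(range(7))
batches = list(iter_batches(driving_frames, batch_size=3))
```

    Each batch would then be passed through the model once instead of once per frame, with the per-frame outputs appended in order.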

    opened by yahskapar 0
  • Audio channel missing in result.mp4 (from demo.py)?

    I am able to synthesize neural talking heads with 'demo.py' but I am not getting a synchronized audio channel in the result.mp4 file - the audio in the driving video is lost.

    Perhaps there is a setting to keep the audio?
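
    The demo output is written from a list of image frames, so it carries no audio track. One workaround, sketched here as an assumption rather than a feature of demo.py, is to copy the audio from the driving video into the result with the ffmpeg command-line tool afterwards. The helper names and file names below are hypothetical, and ffmpeg is assumed to be installed and on PATH.

```python
import subprocess


def build_mux_command(silent_video, driving_video, output_video):
    """Build an ffmpeg command that combines video from the first input
    with the audio track (if any) from the second input."""
    return [
        "ffmpeg", "-y",
        "-i", silent_video,      # input 0: generated video without audio
        "-i", driving_video,     # input 1: driving video that has the audio
        "-map", "0:v:0",         # take the video stream from input 0
        "-map", "1:a:0?",        # take the first audio stream from input 1 if present
        "-c:v", "copy",          # do not re-encode the video
        "-c:a", "aac",           # encode audio as AAC for .mp4 containers
        "-shortest",             # stop at the end of the shorter stream
        output_video,
    ]


def mux_audio(silent_video, driving_video, output_video):
    """Run the ffmpeg command; raises CalledProcessError on failure."""
    subprocess.run(build_mux_command(silent_video, driving_video, output_video),
                   check=True)


command = build_mux_command("result.mp4", "driving.mp4", "result_with_audio.mp4")
```

    The audio stays in sync only if the result has the same frame rate and start point as the driving video.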

    opened by larryheck 0
  • Using jacobians or rotation matrix for warping features

    Hi @zhanglonghao1992 , Thanks for sharing your implementation of the paper!

    As I understand it, the first and last versions of the paper differ in how the Jacobians are used for feature deformation: the Jacobians are replaced with the rotation matrices of the source and the driving. I see in lines 50-60 of your implementation of the dense motion model that you commented out the use of the rotation matrices and now use only the locations of the keypoints. Why did you do this?

    Thanks in advance, Tal

    opened by talbenh 0
Owner
ZLH
Unofficial PyTorch implementation of MobileViT based on paper "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".

MobileViT RegNet Unofficial PyTorch implementation of MobileViT based on paper MOBILEVIT: LIGHT-WEIGHT, GENERAL-PURPOSE, AND MOBILE-FRIENDLY VISION TR

Hong-Jia Chen 91 Dec 2, 2022
Unofficial pytorch implementation of the paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution"

DFSA Unofficial pytorch implementation of the ICCV 2021 paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution" (p

null 2 Nov 15, 2021
Unofficial PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution

PyTorch reimplementation of the paper Swin Transformer V2: Scaling Up Capacity and Resolution [arXiv 2021].

Christoph Reich 122 Dec 12, 2022
This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

This is an unofficial implementation of the paper “Student-Teacher Feature Pyramid Matching for Unsupervised Anomaly Detection”.

haifeng xia 32 Oct 26, 2022
Unofficial Tensorflow-Keras implementation of Fastformer based on paper [Fastformer: Additive Attention Can Be All You Need](https://arxiv.org/abs/2108.09084).

Fastformer-Keras Unofficial Tensorflow-Keras implementation of Fastformer based on paper Fastformer: Additive Attention Can Be All You Need. Tensorflo

Yam Peleg 10 Jan 30, 2022
Unofficial implementation of the paper: PonderNet: Learning to Ponder in TensorFlow

PonderNet-TensorFlow This is an Unofficial Implementation of the paper: PonderNet: Learning to Ponder in TensorFlow. Official PyTorch Implementation:

null 1 Oct 23, 2022
Unofficial Tensorflow 2 implementation of the paper Implicit Neural Representations with Periodic Activation Functions

Siren: Implicit Neural Representations with Periodic Activation Functions The unofficial Tensorflow 2 implementation of the paper Implicit Neural Repr

Seyma Yucer 2 Jun 27, 2022
Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.

HiFi-GAN+ This project is an unofficial implementation of the HiFi-GAN+ model for audio bandwidth extension, from the paper Bandwidth Extension is All

Brent M. Spell 134 Dec 30, 2022
This is an unofficial PyTorch implementation of Meta Pseudo Labels

This is an unofficial PyTorch implementation of Meta Pseudo Labels. The official Tensorflow implementation is here.

Jungdae Kim 320 Jan 8, 2023
An unofficial PyTorch implementation of a federated learning algorithm, FedAvg.

Federated Averaging (FedAvg) in PyTorch An unofficial implementation of FederatedAveraging (or FedAvg) algorithm proposed in the paper Communication-E

Seok-Ju Hahn 123 Jan 6, 2023
Unofficial PyTorch implementation of Attention Free Transformer (AFT) layers by Apple Inc.

aft-pytorch Unofficial PyTorch implementation of Attention Free Transformer's layers by Zhai, et al. [abs, pdf] from Apple Inc. Installation You can i

Rishabh Anand 184 Dec 12, 2022
Unofficial PyTorch implementation of Neural Additive Models (NAM) by Agarwal, et al.

nam-pytorch Unofficial PyTorch implementation of Neural Additive Models (NAM) by Agarwal, et al. [abs, pdf] Installation You can access nam-pytorch vi

Rishabh Anand 11 Mar 14, 2022
Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) in PyTorch

alias-free-gan-pytorch Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) This implementation

Kim Seonghyeon 502 Jan 3, 2023
Unofficial Pytorch Implementation of WaveGrad2

WaveGrad 2 — Unofficial PyTorch Implementation WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis Unofficial PyTorch+Lightning Implementati

MINDs Lab 104 Nov 29, 2022
The author's officially unofficial PyTorch BigGAN implementation.

BigGAN-PyTorch The author's officially unofficial PyTorch BigGAN implementation. This repo contains code for 4-8 GPU training of BigGANs from Large Sc

Andy Brock 2.6k Jan 2, 2023
Unofficial PyTorch Implementation of UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

UnivNet UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation This is an unofficial PyTorch

MINDs Lab 170 Jan 4, 2023
StarGAN-ZSVC: Unofficial PyTorch Implementation

This repository is an unofficial PyTorch implementation of StarGAN-ZSVC by Matthew Baas and Herman Kamper. This repository provides both model architectures and the code to inference or train them.

Jirayu Burapacheep 11 Aug 28, 2022
Unofficial pytorch implementation for Self-critical Sequence Training for Image Captioning. and others.

An Image Captioning codebase This is a codebase for image captioning research. It supports: Self critical training from Self-critical Sequence Trainin

Ruotian(RT) Luo 906 Jan 3, 2023
Unofficial PyTorch Implementation of UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

UnivNet UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation This is an unofficial PyTorch

MINDs Lab 54 Aug 30, 2021