TalkingHead-1KH Dataset
TalkingHead-1KH is a talking-head dataset consisting of YouTube videos, originally created as a benchmark for face-vid2vid:
One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing
Ting-Chun Wang (NVIDIA), Arun Mallya (NVIDIA), Ming-Yu Liu (NVIDIA)
https://nvlabs.github.io/face-vid2vid/
https://arxiv.org/abs/2011.15126
The dataset consists of 500k video clips, of which about 80k are at a resolution greater than 512x512. Only videos under permissive licenses are included. Note that the number of videos differs from that in the original paper because a more robust preprocessing script was used to split the videos. For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.
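As an illustrative sketch of the 512x512 threshold mentioned above, clips could be filtered by resolution as follows. The tuple layout here is hypothetical; the actual data_list metadata files may use a different format.

```python
# Hypothetical metadata rows of the form (clip_id, width, height).
# The real data_list files may be laid out differently.

def filter_high_res(rows, min_size=512):
    """Keep clips whose resolution exceeds min_size x min_size."""
    kept = []
    for clip_id, width, height in rows:
        if width > min_size and height > min_size:
            kept.append(clip_id)
    return kept

rows = [
    ("clip_a", 1920, 1080),
    ("clip_b", 480, 360),
    ("clip_c", 1280, 720),
]
print(filter_high_res(rows))  # ['clip_a', 'clip_c']
```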
Download
Unzip the video metadata
First, unzip the metadata and put it under the root directory:
unzip data_list.zip
Unit test
This step downloads a small subset of the dataset to verify the scripts are working on your computer. You can also skip this step if you want to directly download the entire dataset.
bash videos_download_and_crop.sh small
The processed clips should appear in small/cropped_clips.
Download the entire dataset
Please run
bash videos_download_and_crop.sh train
The script will automatically download the YouTube videos, split them into short clips, and then crop and trim them to include only the face regions. The final processed clips should appear in train/cropped_clips.
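For reference, the trim-and-crop step can be sketched as an ffmpeg invocation. This is illustrative only: the actual logic lives in videos_download_and_crop.sh, and the timestamps and crop coordinates below are placeholders, not values from the dataset.

```python
# Illustrative sketch: build an ffmpeg argv that trims a clip to its time
# boundaries and crops a fixed face region. The real pipeline is
# videos_download_and_crop.sh; all concrete values here are made up.

def build_crop_command(src, dst, start, end, w, h, x, y):
    """Return an ffmpeg argv trimming [start, end) and cropping a w x h box at (x, y)."""
    return [
        "ffmpeg", "-i", src,
        "-ss", start, "-to", end,               # trim to the clip boundaries
        "-filter:v", f"crop={w}:{h}:{x}:{y}",   # keep only the face region
        "-c:a", "copy",                         # pass audio through unchanged
        dst,
    ]

cmd = build_crop_command("raw.mp4", "clip.mp4", "00:01:05", "00:01:15", 512, 512, 128, 64)
print(" ".join(cmd))
```

Run through subprocess.run(cmd), this would produce one cropped clip per detected face region; the real script derives the crop box from face detections rather than fixed coordinates.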
Evaluation
To download the evaluation set, which consists only of 1080p videos, please run
bash videos_download_and_crop.sh val
The processed clips should appear in val/cropped_clips.
We also provide the reconstruction results synthesized by our model here. For each video, we use only the first frame to reconstruct all the following frames.
Furthermore, for comparison with methods trained on the VoxCeleb2 dataset, we also provide results from another model trained on VoxCeleb2. Please find those reconstruction results here.
Licenses
The individual videos were published on YouTube by their respective authors under the Creative Commons BY 3.0 license. The metadata file, the download script, the processing script, and the documentation are made available under the MIT license. You can use, redistribute, and adapt them, as long as you (a) give appropriate credit by citing our paper, (b) indicate any changes that you've made, and (c) distribute any derivative works under the same license.
Privacy
When collecting the data, we were careful to only include videos that – to the best of our knowledge – were intended for free use and redistribution by their respective authors. That said, we are committed to protecting the privacy of individuals who do not wish their videos to be included.
If you would like to remove your video from the dataset, you can either
- Go to YouTube and change the license of your video, or remove your video entirely.
- Contact [email protected]. Please include your YouTube video link in the email.
Acknowledgements
This webpage borrows heavily from the FFHQ-dataset page.
Citation
If you use this dataset for your work, please cite:
@inproceedings{wang2021facevid2vid,
  title={One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing},
  author={Ting-Chun Wang and Arun Mallya and Ming-Yu Liu},
  booktitle={CVPR},
  year={2021}
}