MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]

by Kaisiyuan Wang, Qianyi Wu, Linsen Song, Zhuoqian Yang, Wayne Wu, Chen Qian, Ran He, Yu Qiao, Chen Change Loy.

Introduction

This repository is for our ECCV2020 paper MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation.

Multi-view Emotional Audio-visual Dataset

To cope with the challenge of realistic and natural emotional talking-face generation, we build the Multi-view Emotional Audio-visual Dataset (MEAD), a talking-face video corpus featuring 60 actors and actresses talking with 8 different emotions at 3 different intensity levels. High-quality audio-visual clips are captured at 7 different view angles in a strictly-controlled environment. Together with the dataset, we also release an emotional talking-face generation baseline that enables the manipulation of both emotion and its intensity. For more information about the dataset, please refer to here.
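
As a quick orientation, here is a minimal sketch of how one might enumerate the audio clips of a single speaker. The per-actor layout <emotion>/level_<n>/<id>.m4a is an assumption inferred from paths quoted in the comments below (e.g. fear/level_3/028.m4a); adjust the glob if your download is organized differently.

from pathlib import Path

# Dataset dimensions described above: 8 emotions at 3 intensity levels.
EMOTIONS = ["angry", "disgust", "contempt", "fear",
            "happy", "sad", "surprised", "neutral"]
LEVELS = [1, 2, 3]

def list_audio_clips(actor_dir):
    # Yield (emotion, level, path) for every audio clip of one actor,
    # assuming the hypothetical layout <actor>/<emotion>/level_<n>/<id>.m4a.
    root = Path(actor_dir)
    for emotion in EMOTIONS:
        for level in LEVELS:
            for clip in sorted(root.glob(f"{emotion}/level_{level}/*.m4a")):
                yield emotion, level, clip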


Installation

This repository is based on PyTorch; please follow the official installation instructions here. The code is tested with PyTorch 1.0 and Python 3.6 on Ubuntu 16.04.
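
Before training, it may help to confirm that the installed versions match the tested setup; a quick sanity check:

# Print the installed Python/PyTorch versions and CUDA availability.
import sys
import torch

print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())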

Usage

Training set & Testing set Split

Please refer to Section 6, "Speech Corpus of MEAD", in the supplementary material. The speech corpus is divided into three parts (i.e., common, generic, and emotion-related). For each intensity level, we directly use the last 10 sentences of the neutral category and the last 6 sentences of each of the other seven emotion categories as the testing set. Note that all sentences in the testing set come from the "emotion-related" part. If you want to manipulate the emotion category, you can use all 40 sentences of the neutral category as input samples.
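
As an illustration only, here is a sketch of that split in code. It assumes sentences are numbered 001 upward per emotion and intensity level, with 40 sentences for neutral and 30 for each other emotion (counts observed in the released clips); the function name is hypothetical.

def split_ids(emotion):
    # Hold out the last 10 neutral sentences, or the last 6 sentences of
    # any other emotion, per intensity level (as described above).
    n_total = 40 if emotion == "neutral" else 30   # assumed clip counts
    n_test = 10 if emotion == "neutral" else 6
    ids = [f"{i:03d}" for i in range(1, n_total + 1)]   # "001", "002", ...
    return ids[:-n_test], ids[-n_test:]                 # (train, test)

train_ids, test_ids = split_ids("happy")   # test_ids == ["025", ..., "030"]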

Training

  1. Download the dataset from here. We package the audio-visual data of each actor in a single folder named "MXXX" or "WXXX", where "M" and "W" indicate actor and actress, respectively.
  2. As MEAD requires different modules to achieve different functions, we separate its training into three stages. In each stage, set up the corresponding configuration (.yaml file) accordingly and run the commands below (a driver script that chains all three stages follows after Stage 3):

Stage 1: Audio-to-Landmarks Module

cd Audio2Landmark
python train.py --config config.yaml

Stage 2: Neutral-to-Emotion Transformer

cd Neutral2Emotion
python train.py --config config.yaml

Stage 3: Refinement Network

cd Refinement
python train.py --config config.yaml
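
If you want to chain the three stages, a small driver script along these lines should work; it simply replays the commands above and assumes each stage directory contains its own config.yaml.

# Run the three training stages in sequence, mirroring the commands above.
import subprocess

for stage in ["Audio2Landmark", "Neutral2Emotion", "Refinement"]:
    print("=== Training stage:", stage, "===")
    subprocess.run(["python", "train.py", "--config", "config.yaml"],
                   cwd=stage, check=True)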

Testing

  1. First, download the pretrained models and put them in the models folder.
  2. Second, download the demo audio data.
  3. Run the following command to generate a talking sequence with a specific emotion:
cd Refinement
python demo.py --config config_demo.yaml

You can try different emotions by replacing the number with any integer from 0 to 7:

  • 0: angry
  • 1: disgust
  • 2: contempt
  • 3: fear
  • 4: happy
  • 5: sad
  • 6: surprised
  • 7: neutral

In addition, you can try compound emotions by setting two different emotions at the same time; see the mapping sketch below.
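
For convenience, the same mapping as a Python dict. The exact config key that consumes these IDs is not documented here, so the compound-emotion line is only a sketch.

# Emotion IDs used by the demo, as listed above.
EMOTION_IDS = {
    "angry": 0, "disgust": 1, "contempt": 2, "fear": 3,
    "happy": 4, "sad": 5, "surprised": 6, "neutral": 7,
}

# Hypothetical compound emotion: combine two different IDs at once.
compound = (EMOTION_IDS["happy"], EMOTION_IDS["surprised"])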


  4. The results are stored in the outputs folder.

Citation

If you find this code useful for your research, please cite our paper:

@inproceedings{kaisiyuan2020mead,
 author = {Wang, Kaisiyuan and Wu, Qianyi and Song, Linsen and Yang, Zhuoqian and Wu, Wayne and Qian, Chen and He, Ran and Qiao, Yu and Loy, Chen Change},
 title = {MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation},
 booktitle = {ECCV},
 month = {August},
 year = {2020}
} 
Comments
  • MEAD - Missing views for speaker W017

    I downloaded the full Part 0 from Google Drive, and am planning to use this dataset for my research. However, while preprocessing it I noticed that many views are missing from speaker W017 (it looks like the only view that exists is "down"). Is this intended? Do you plan on releasing these extra views in the future? Thanks.

    opened by miraodasilva 5
  • Demo code error

    Hi, I am unable to run the demo code to generate the facial images. When I try to run demo.py under the Refinement module after modifying the paths to the model files and list files appropriately, the files audio_test_all.txt and video_list_test.txt (specified in config_demo.yml) appear to be missing. Also, what should the variable gan_path in config_demo.yml be set to?

    opened by ssinha89 4
  • How to generate the .list or pickle

    Hi, thank you for this great dataset. But when I try to train this work step by step, I run into some problems. (1) Audio2Landmark dataloader: how do I generate audio_demo.list, and where is ./lists/landmarks.pickle? Could you tell me how to generate these files? E.g., in audio_demo.txt, why is an entry "M003_07_1_output_01/000.pickle reference.jpg"? (2) video_list.txt: how do I generate the video list? E.g., why is an entry "M003/M003_01_1_output_01/000.jpg"? Could you release the generation code?

    opened by Neroaway 3
  • The demo.py file does not work well; how should the 'gan_path' parameter be set?

    Hi, I am unable to run the demo code to generate the facial images. When I try to run demo.py under the Refinement module after modifying the paths to the model files and list files appropriately, the files audio_test_all.txt and video_list_test.txt (specified in config_demo.yml) appear to be missing. Also, what should the variable gan_path in config_demo.yml be set to?

    Actually, when running demo.py, you just need a single file. I have uploaded it, namely audio_demo.txt, along with the reference image. By the way, please feel free to download the testing audio data, which is provided on Google Drive.

    Originally posted by @uniBruce in https://github.com/uniBruce/Mead/issues/3#issuecomment-821160951

    opened by QUTGXX 2
  • Some questions about the Refinement network

    I found some bugs in demo.py. For example, some parameters of the function draw_heatmap_from_78_landMark are missing when it is called, and the parameters w and h are swapped.

    When I tried to fix these bugs, I got the following outputs (images attached): the N2E result, the Audio2Landmark heatmap, and the Refinement result.

    It is obvious that the final Refinement result has too much noise. I wonder whether the N2E and Audio2Landmark networks are working normally, and how to get an accurate output from the Refinement network.

    opened by JoeFang123 2
  • Link to Pretrained models

    Hi, thanks for releasing the dataset and the code. The link for the pretrained models needed to run the test code is currently not working; could you kindly update the repository with the correct hyperlink?

    opened by ssinha89 2
  • Speech Corpus Text Files

    Thank you for making your dataset and method available. I would like to ask whether the text of the corpus is available somewhere in txt (or another form), or do we need to take it from the supplementary PDF?

    opened by filby89 1
  • Why is 403 displayed at the end of downloading video data?

    Hi, thanks for your project. I get a 403 error at the end of downloading the video data, but downloading the audio data works fine. Can you suggest a solution? (screenshot attached)

    opened by Baenny 1
  • How to generate images (384x384) from video?

    Hi, thanks for your work! However, the repository does not include the code for generating images from the videos. I think this code is important for reproducing this work. Could you release it? Thank you very much!

    opened by Neroaway 1
  • Audio clips with duration >7 seconds

    Thanks for your efforts in producing the MEAD dataset. We are looking forward to working with it.

    It seems to me that some audio clips are longer than the maximum duration suggested in the paper. The Fig. 1 plots in the supplementary material, as well as the text on its first page, suggest that the maximum "sentence duration" is 7 seconds. https://wywu.github.io/projects/MEAD/support/MEAD-supp.pdf

    From my example below, I believe you should be able to replicate a case where the duration is 17 seconds. To replicate, you could try downloading the audio.tar file for M034 from Google Drive, extracting it, and then running the following Python code:

    >>> import librosa
    >>> sentence_path = './fear/level_3/028.m4a'
    >>> y, sr = librosa.load(sentence_path, sr=None)
    >>> librosa.get_duration(y=y, sr=sr)
    17.237333333333332
    >>> sr
    48000
    >>> librosa.__version__
    '0.8.1'
    
    opened by mjschlauch 1
  • Audio pickles do not match images from video

    Hi, when I use preprocess_mfcc.py to create audio pickles, I find that the number of pickles does not match the number of images extracted from the video. E.g., angry/level_1/001.m4a produces 97 pickles, but video/front/angry/level_1/001.mp4 produces 98 images. Have you encountered this problem, and how can it be solved? Thank you!

    opened by Neroaway 1
  • Camera Calibrations

    Hi, congrats on the fantastic dataset! It would be really helpful if it were possible to extract 3D data from these videos. Do you have the camera calibrations for the cameras you used?

    opened by jsaunders909 1
  • Link to dataset is broken

    Hi, I found that the link to the dataset provided in the article and in this repo is broken: https://wywu.github.io/projects/MEAD/MEAD.html Could you provide a new one?

    opened by neeek2303 3
  • The correspondence of audio snippets between different emotions

    Hi,

    I downloaded part of the dataset and found that the audio snippets are not arranged in the correspondence I expected.

    I thought the number in the filename indicated the content of the audio, e.g., 001.mp4 in disgusted should have the same content as 001.mp4 in neutral. But unfortunately, they are not the same in M003. And I don't know why 30 snippets are provided for emotions other than neutral, while there are 40 snippets for neutral.

    Could you explain why this is? And could you provide the correspondence between the different snippets? It is really hard to use the dataset if the correspondence is not provided.

    Thanks

    opened by cyj907 5