Repository for the MeshTalk supplemental material and code; the (already approved) 16 GHS captures that our lab will make publicly available will be linked here once they are released.

Overview


This repository contains code to run MeshTalk for face animation from audio. If you use MeshTalk, please cite:

@inproceedings{richard2021meshtalk,
    author    = {Richard, Alexander and Zollh\"ofer, Michael and Wen, Yandong and de la Torre, Fernando and Sheikh, Yaser},
    title     = {MeshTalk: 3D Face Animation From Speech Using Cross-Modality Disentanglement},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {1173-1182}
}

Supplemental Material

Watch the video

Running MeshTalk

Dependencies

ffmpeg
numpy
torch         (tested with v1.10.0)
pytorch3d     (tested with v0.4.0)
torchaudio    (tested with v0.10.0)

Animating a Face Mesh from Audio

Download the pretrained models and unzip them. Make sure your PYTHONPATH contains the MeshTalk root directory (export PYTHONPATH=<your_meshtalk_root_directory>).

Then, run

python animate_face.py --model_dir <your_pretrained_model_dir> --audio_file <your_speech_snippet.wav> --output <your_output_file.mp4>

See a description of the command line arguments via python animate_face.py --help. We provide a neutral face template mesh in assets/face_template.obj. Note that the rendered results look slightly different from the paper and supplemental video because this repository uses a different (open source) rendering engine.
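
Note that the pretrained models assume the topology of this template (6172 vertices, as discussed in the issues below). A quick sanity check before animating your own mesh, sketched with pytorch3d's load_obj:

```python
from pytorch3d.io import load_obj

# Minimal sanity check: the pretrained MeshTalk models expect the
# 6172-vertex topology of the provided template (see the issues below).
verts, faces, aux = load_obj("assets/face_template.obj")
print(verts.shape)  # expected: torch.Size([6172, 3])
```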

Training your own MeshTalk version

We are in the process of releasing high-quality 3D face captures of 16 subjects (a subset of the dataset used in this paper). We will link to the dataset here once it is available.

License

The code and dataset are released under the CC-NC 4.0 International license.

Comments
  • Can I change the OBJ model?

    If I want to change the OBJ face, what are the requirements? Or is there a template for the face you use, so that many faces can be created from the template? I read other issues and learned that not all OBJ files can be used. Does the number of vertices of the mesh need to be the same? Does the face size need to be the same?

    This is a cool project.

    opened by ALIENMINT 6
  • asset files creation

    Hi, I ran the custom audio expressions on your neutral mesh object and it worked well. I want to run the audio on my own custom model OBJ files, and I have created the OBJ files for my person model. How do I generate the corresponding asset files (face_mean.npy, face_std, forehead_mask, and neck_mask)? Are these files generated from the OBJ file, or am I supposed to resize the OBJ file to 6172 vertices in order to use the existing asset files? Thank you for your help in advance.
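
    As a general note (not the authors' confirmed pipeline): face_mean and face_std are plausibly per-vertex statistics over the tracked training meshes, which could be computed for your own capture along these lines; the input file name and shape are hypothetical:

    ```python
    import numpy as np

    # Hypothetical input: a sequence of tracked meshes, shape (num_frames, 6172, 3).
    meshes = np.load("my_tracked_meshes.npy")  # assumed file, not part of the repo
    np.save("face_mean.npy", meshes.mean(axis=0))  # per-vertex mean, (6172, 3)
    np.save("face_std.npy", meshes.std(axis=0))    # per-vertex std, (6172, 3)
    ```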

    opened by programmeddeath1 6
  • new obj

    I have a new OBJ file with 6172 points derived from the default OBJ file. Q1: What is the meaning of the face_mean and face_std files and the two smoothing .txt files? Are they the mean face and the deviation from it? Q2: How do I create the face_mean and face_std files and the smoothing .txt files?

    opened by luoww1992 5
  • Training parameters

    Hello,

    I am trying to train MeshTalk on the VOCA dataset; however, the loss explodes if I use a learning rate of 1e-4 or higher, and keeps oscillating around 0.2 if I use a lower learning rate (which does not lead to realistic results). I was wondering what training parameters were used in the paper?

    I am using the following parameters:

    • number of frames T = 128
    • optimizer: SGD with lr=9e-5 (at the moment), momentum=0.9, nesterov=True
    • M_upper = 5 and M_lower = 5
    • batch_size = 16
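
    For reference, these settings expressed as PyTorch code (`model` stands in for the networks being trained):

    ```python
    import torch

    def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
        # The configuration quoted above: SGD with lr=9e-5, momentum=0.9, Nesterov.
        return torch.optim.SGD(model.parameters(), lr=9e-5,
                               momentum=0.9, nesterov=True)
    ```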

    Thanks for any help!

    opened by UttaranB127 5
  • mesh faces missing for multiface

    The mesh (.obj) provided by multiface has almost 2000 fewer faces than the meshtalk mesh (.obj). I wonder how to cope with this. Should I do some remeshing to connect the isolated vertices?

    opened by songtoy 4
  • How to use a different obj model?

    Incredible work, thanks! I have a question about using a different OBJ model. I tried to use an OBJ model file created by DECA, but got an error:

    ```
    (meshtalk) ubuntu@ubuntu-X640-G30:/data/cx/GANs/meshtalk$ python animate_face.py --model_dir weights/pretrained_models --audio_file test.wav --output outputs --face_template myasset/mzd.obj
    /home/ubuntu/.local/lib/python3.8/site-packages/torchaudio/backend/utils.py:53: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
      warnings.warn(
    load assets...
    load models...
    Loaded: weights/pretrained_models/vertex_unet.pkl
    Loaded: weights/pretrained_models/context_model.pkl
    Loaded: weights/pretrained_models/encoder.pkl
    animate face mesh...
    /home/ubuntu/.local/lib/python3.8/site-packages/torch/functional.py:515: UserWarning: stft will require the return_complex parameter be explicitly specified in a future PyTorch release. Use return_complex=False to preserve the current behavior or return_complex=True to return a complex output. (Triggered internally at /pytorch/aten/src/ATen/native/SpectralOps.cpp:653.)
      return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore
    /home/ubuntu/.local/lib/python3.8/site-packages/torch/functional.py:515: UserWarning: The function torch.rfft is deprecated and will be removed in a future PyTorch release. Use the new torch.fft module functions, instead, by importing torch.fft and calling torch.fft.fft or torch.fft.rfft. (Triggered internally at /pytorch/aten/src/ATen/native/SpectralOps.cpp:590.)
      return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore
    Traceback (most recent call last):
      File "animate_face.py", line 93, in <module>
        geom = template_verts.cuda().view(1, 1, 6172, 3).expand(-1, T, -1, -1).contiguous()
    RuntimeError: shape '[1, 1, 6172, 3]' is invalid for input of size 15069
    ```

    What should I do if I want to animate different obj files?
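
    The size in the error message already hints at the mismatch: 15069 = 5023 × 3, and 5023 vertices is the FLAME topology that DECA outputs, while the pretrained models expect 6172 vertices. A quick check of the arithmetic:

    ```python
    import torch

    # view(1, 1, 6172, 3) needs 6172 * 3 = 18516 elements, but a DECA/FLAME
    # mesh has 5023 vertices, i.e. 5023 * 3 = 15069 elements (the size in the error).
    deca_verts = torch.zeros(5023, 3)   # stand-in for the mesh from the report
    assert deca_verts.numel() == 15069
    assert 6172 * 3 == 18516
    ```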

    opened by AdamMayor2018 4
  • Different topology from multiface dataset?

    I find that the number of vertices in your given template object differs from the one I downloaded from the multiface dataset. In particular, the details of the mouth are quite different. Could you please share more information about the experiments?

    opened by chenerg 3
  • Context model - how to train?

    Hello,

    How do I train the autoregressive model for inference? In the forward function, what would the first expression_one_hot tensor be? I understand that subsequent inputs would be the labels output of the previous timestep.

    ```python
    def forward(self, expression_one_hot: th.Tensor, audio_code: th.Tensor):
        x = self.embedding(expression_one_hot)

        for layer in self.context_layers:
            x = layer(x, audio_code)
            x = F.leaky_relu(x, 0.2)

        logits = self.logits(x)
        logprobs = F.log_softmax(logits, dim=-1)
        probs = F.softmax(logprobs, dim=-1)
        labels = th.argmax(logprobs, dim=-1)

        return {"logprobs": logprobs, "probs": probs, "labels": labels}
    ```
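
    A common pattern for such models (a hypothetical sketch, not the authors' confirmed procedure): train with teacher forcing on ground-truth expression codes, and at inference seed the input with zeros, feeding the predicted labels back in step by step:

    ```python
    import torch as th
    import torch.nn.functional as F

    def rollout(model, audio_code, T=128, H=16, C=128):
        """Hypothetical autoregressive inference; T frames, H heads, and
        C categories per head are assumed shapes, not names from the repo."""
        one_hot = th.zeros(1, T, H, C)  # all-zero "start" expression input
        for t in range(T):
            out = model(one_hot, audio_code)  # the forward() quoted above
            # write the step-t prediction back so later steps condition on it
            one_hot[:, t] = F.one_hot(out["labels"][:, t], C).float()
        return one_hot
    ```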
    

    Thanks

    opened by karthik-mohankumar 3
  • Do you have any uv texture mapping files?

    Hi. I am very impressed with your wonderful research. Thank you so much for sharing the great results. I want to render a texture onto the output generated by this model. Is there a UV texture mapping file that matches the output?

    opened by shovelingpig 3
  • Audio features are different from your paper statement

    Hi, I found that the audio preprocessing uses a simple transformation in your code (load_audio & audio_chunking). This differs from the statement in the paper, which says: "Our audio data is recorded at 16kHz. For each tracked mesh, we compute the Mel spectrogram of a 600ms audio snippet starting 500ms before and ending 100ms after the respective visual frame. We extract 80-dimensional Mel spectral features every 10ms, using 1,024 frequency bins and a window size of 800 for the underlying Fourier transform."

    I didn't find any Mel spectrogram calculation in your code. Why are they different? Is the current version better than Mel spectral features?
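
    For comparison, a minimal sketch of the Mel features as the paper describes them (assuming torchaudio; hop_length=160 follows from 10 ms at 16 kHz, and speech.wav is a placeholder):

    ```python
    import torchaudio

    # Mel features per the paper's description: 16 kHz audio, 80 Mel bands
    # every 10 ms, 1024 frequency bins, window size 800. Not the repo's code.
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=16000, n_fft=1024, win_length=800,
        hop_length=160, n_mels=80,
    )
    audio, sample_rate = torchaudio.load("speech.wav")  # placeholder input
    features = mel(audio)  # shape: (channels, 80, num_frames)
    ```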

    opened by kjhgfdsaas 3
  • Build pytorch3d 0.4.0 failed with torch1.10

    I tried to build pytorch3d 0.4.0 from source with torch 1.10, the same versions as in the readme, but it always fails. The log is below:

    ```
    /home/local/gcc-5.3.0/bin/gcc -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -DTHRUST_IGNORE_CUB_VERSION_CHECK -I/home/Projects/github_projects/pytorch3d/pytorch3d/csrc -I/home/software_packages/cub-1.10.0 -I/home/anaconda3/envs/torch1.10/lib/python3.7/site-packages/torch/include -I/home/anaconda3/envs/torch1.10/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/anaconda3/envs/torch1.10/lib/python3.7/site-packages/torch/include/TH -I/home/anaconda3/envs/torch1.10/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.2/include -I/home/anaconda3/envs/torch1.10/include/python3.7m -c /home/Projects/github_projects/pytorch3d/pytorch3d/csrc/rasterize_meshes/rasterize_meshes_cpu.cpp -o build/temp.linux-x86_64-3.7/home/Projects/github_projects/pytorch3d/pytorch3d/csrc/rasterize_meshes/rasterize_meshes_cpu.o -std=c++14 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    /home/Projects/github_projects/pytorch3d/pytorch3d/csrc/rasterize_meshes/rasterize_meshes_cpu.cpp: In function ‘std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> RasterizeMeshesNaiveCpu(const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, std::tuple<int, int>, float, int, bool, bool, bool)’:
    /home/Projects/github_projects/pytorch3d/pytorch3d/csrc/rasterize_meshes/rasterize_meshes_cpu.cpp:294:28: error: converting to ‘std::tuple<float, int, float, float, float, float>’ from initializer list would use explicit constructor ‘constexpr std::tuple< <template-parameter-1-1> >::tuple(_UElements&& ...) [with _UElements = {const float&, int&, const float&, const float&, const float&, const float&}; <template-parameter-2-2> = void; _Elements = {float, int, float, float, float, float}]’
                   q[idx_top_k] = {
                                ^
    error: command '/home/local/gcc-5.3.0/bin/gcc' failed with exit status 1
    Building wheel for pytorch3d (setup.py) ... error
    ERROR: Failed building wheel for pytorch3d
    ```
    

    Does pytorch3d 0.4.0 really support torch 1.10? I see the requirement is torch < 1.7.1 on the pytorch3d 0.4.0 page and torch < 1.9.1 on the pytorch3d main page.

    My environment:

    • centos 7
    • gcc 5.3.0
    • cuda 10.2
    • cub 1.10
    • python 3.7 (conda environment)
    • torch1.10
    • pytorch3d 0.4.0
    opened by wikiwen 3
  • Which data was used for the pre-trained model

    Hi! The paper mentions the following:

    We release a subset of 16 subjects of this dataset and our model using only these subjects as a baseline to compare against

    Since multiface was released with only 13 identities, can you please confirm what was used for the released pre-trained model? (e.g., the 13 identities in multiface? Those plus 3 other identities? Or another set of 16 identities?)

    Thank you!

    opened by luizgh 0
Releases: pretrained_models_v1.0

Owner: Meta Research