The Face Synthetics dataset is a collection of diverse synthetic face images with ground truth labels.

Overview


It was introduced in our paper Fake It Till You Make It: Face analysis in the wild using synthetic data alone.

Our dataset contains:

  • 100,000 images of faces at 512 x 512 pixel resolution
  • 70 standard facial landmark annotations
  • per-pixel semantic class annotations

It can be used to train machine learning systems for face-related tasks such as landmark localization and face parsing. As the paper shows, synthetic data can match real data in accuracy while also opening up new approaches where manual labelling would be impossible.

Some images also include hands and off-center distractor faces in addition to the primary face centered in the image.

The Face Synthetics dataset can be used for non-commercial research, and is licensed under the license found in LICENSE.txt.

Downloading the dataset

A sample dataset with 100 images (34MB) can be downloaded from here

A sample dataset with 1000 images (320MB) can be downloaded from here

A full dataset of 100,000 images (32GB) can be downloaded from here

Dataset layout

The Face Synthetics dataset is a single .zip file containing color images, segmentation images, and 2D landmark coordinates in per-frame text files.

dataset.zip
├── {frame_id}.png        # Rendered image of a face
├── {frame_id}_seg.png    # Segmentation image, where each pixel has an integer value mapping to the categories below
├── {frame_id}_ldmks.txt  # Landmark annotations for 70 facial landmarks, with one (x, y) coordinate pair per row

Our landmark annotations follow the 68 landmark scheme from iBUG with two additional points for the pupil centers. Please note that our 2D landmarks are projections of 3D points and do not follow the outline of the face/lips/eyebrows in the way that is common from manually annotated landmarks. They can be thought of as an "x-ray" version of 2D landmarks.
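
For illustration, below is a minimal loading sketch using NumPy and Pillow; it is not part of the official release. The frame id is hypothetical, and it assumes that each row of the landmarks file holds one whitespace-separated x y pair and that the two pupil-center points come after the 68 iBUG points.

# Minimal loading sketch; the frame id and pupil-point ordering are assumptions.
import numpy as np
from PIL import Image

frame_id = "000000"  # hypothetical frame id

image = np.asarray(Image.open(f"{frame_id}.png"))    # 512 x 512 RGB render
landmarks = np.loadtxt(f"{frame_id}_ldmks.txt")      # shape (70, 2): one (x, y) pair per row
assert landmarks.shape == (70, 2)

ibug_points = landmarks[:68]    # standard iBUG 68-landmark scheme
pupil_centers = landmarks[68:]  # assumed: the two extra pupil-center points
print(image.shape, pupil_centers)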

Each pixel in the segmentation image will belong to one of the following classes:

BACKGROUND = 0
SKIN = 1
NOSE = 2
RIGHT_EYE = 3
LEFT_EYE = 4
RIGHT_BROW = 5
LEFT_BROW = 6
RIGHT_EAR = 7
LEFT_EAR = 8
MOUTH_INTERIOR = 9
TOP_LIP = 10
BOTTOM_LIP = 11
NECK = 12
HAIR = 13
BEARD = 14
CLOTHING = 15
GLASSES = 16
HEADWEAR = 17
FACEWEAR = 18
IGNORE = 255

Pixels marked as IGNORE should be ignored during training.

Notes:

  • Opaque eyeglass lenses are labelled as GLASSES, while transparent lenses are labelled with the class visible behind them.
  • For bushy eyebrows, a few eyebrow pixels may extend beyond the boundary of the face. These pixels are labelled as IGNORE.
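
Because every pixel holds one of the integer class ids above, a common first step is to build a validity mask that drops IGNORE pixels before computing losses or statistics. The snippet below is a minimal sketch of this, reusing the hypothetical frame id from the earlier example; with a framework such as PyTorch, the same effect is usually obtained by passing ignore_index=255 to the cross-entropy loss.

# Minimal sketch: read a segmentation image and exclude IGNORE pixels.
import numpy as np
from PIL import Image

IGNORE = 255
NUM_CLASSES = 19  # class ids 0-18 listed above

frame_id = "000000"  # hypothetical frame id
seg = np.asarray(Image.open(f"{frame_id}_seg.png"))  # one integer class id per pixel

valid = seg != IGNORE                                 # pixels to keep during training
counts = np.bincount(seg[valid].ravel(), minlength=NUM_CLASSES)
for class_id, num_pixels in enumerate(counts):
    print(f"class {class_id}: {num_pixels} pixels")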

Disclaimer

Some of our rendered faces may be close in appearance to the faces of real people. Any such similarity is naturally unintentional, as it would be in a dataset of real images, where people may appear similar to others unknown to them.

Generalization to real data

For best results, we suggest you follow the methodology described in our paper (citation below). In particular, note the need for 1) data augmentation, and 2) a translation layer when evaluating on real-data benchmarks that use different annotation conventions.
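
As one concrete example of the kind of augmentation this refers to, below is a minimal sketch of a horizontal flip that also swaps the left/right class ids from the table above. It is only an illustration: flipping the landmarks as well would require a scheme-specific index reordering, which is omitted here.

# Minimal horizontal-flip augmentation sketch (image and segmentation only).
import numpy as np

# (right, left) class id pairs from the segmentation classes above.
LR_PAIRS = [(3, 4), (5, 6), (7, 8)]  # eyes, brows, ears

def hflip(image, seg):
    # Mirror both arrays left-to-right.
    image_f = image[:, ::-1].copy()
    seg_f = seg[:, ::-1].copy()
    # After mirroring, anatomical RIGHT_*/LEFT_* labels must be swapped.
    for right_id, left_id in LR_PAIRS:
        right_mask = seg_f == right_id
        seg_f[seg_f == left_id] = right_id
        seg_f[right_mask] = left_id
    return image_f, seg_f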

Our dataset strives to be as diverse as possible and generalizes to real test data as described in the paper. However, you may encounter situations that it does not cover and/or where generalization is less successful. We recommend that machine learning practitioners always test models on real data that is representative of the target deployment scenario.

Citation

If you use the Face Synthetics dataset in your research, please cite the following paper:

@misc{wood2021fake,
    title={Fake It Till You Make It: Face analysis in the wild using synthetic data alone},
    author={Erroll Wood and Tadas Baltru\v{s}aitis and Charlie Hewitt and Sebastian Dziadzio and Matthew Johnson and Virginia Estellers and Thomas J. Cashman and Jamie Shotton},
    year={2021},
    eprint={2109.15102},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
Comments
  • Failing to download the full dataset

    Hello guys, thank you for the great work. I've tried (numerous times) to download the full dataset (32GB) using different internet connections (an AWS machine as well), but each time the download eventually failed. Any suggestions? Maybe there is a mirror?

    opened by roey1rg 5
  • Multi view rendered images

    First of all, thank you for this great work. I was wondering if the dataset could be expanded to provide images rendered from multiple camera directions, something similar to CO3D from Facebook. This would enable many additional research possibilities.

    opened by eldercrow 1
  • Is the process of attaching hair and eyebrows to head manual or automated?

    Thanks for this great work. I wonder whether the process of attaching hair and eyebrows to the head is manual or automated? If it is done manually, how long does it take?

    Your responses are appreciated.

    Regards

    opened by X-niper 1
  • Could you release corresponding synthetic 3d model?

    Hi, Thanks for your great work!

    I work on face synthesis and noticed this great dataset.

    Could you release the corresponding synthetic 3D models for the synthetic images?

    It could open up more research topics on the dataset. Thanks!

    opened by John-Yao 1
  • Render passes

    Thank you for this dataset. Very cool project.

    Did you capture render passes, or are project files available to rerender?

    I'd love to be able to experiment with normal, curvature, depth, etc

    opened by caseybasichis 1
  • Face annotation conversion

    Greetings! Thanks for sharing such inspirational results!

    I'm setting up a set of training experiments with the dataset. I found that the landmark annotation format does not map directly to 300W, WFLW, etc.:

    1. Different keypoint positions.
    2. Invisible landmarks are drawn as direct projections. In contrast, other annotation schemes move landmarks to the edge of the visible part of the face.

    So do you plan to a) provide a converter to the mentioned benchmarks, or b) share 3D landmark annotations? I would be very grateful.

    Sample from your dataset: Microsoft_3

    Example from WFLW dataset: WFLW_4

    opened by moonlight99 1
  • train val test split

    Hi guys, thank you for the great work. It seems that all the data is in one big directory; is there a standard way to split the data into train / val / test?

    opened by lkdci 1
  • Request for split compression

    Dear all,

    I've tried to download the full dataset of 100,000 images (32GB) several times via the Google Chrome browser and wget, but I cannot download it because of network errors.

    I think the size of the compressed dataset is too large to download at once.

    Thus, I request that you upload the dataset using split compression. It would make it possible to retry a part even if the download fails.

    Best regards, Vujadeyoon

    opened by vujadeyoon 1
  • How can we get a bounding box?

    Hello, Thanks for your great work

    I am currently studying this work.

    I want to know how you get the 256 x 256 bounding box. Is there any specific method or additional detection network for it?

    opened by rlgnswk 0
  • This repo is missing important files

    There are important files that Microsoft projects should all have that are not present in this repository. A pull request has been opened to add the missing file(s). When the pr is merged this issue will be closed automatically.

    Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

    Merge this pull request

    opened by microsoft-github-policy-service[bot] 0
  • Adding Microsoft SECURITY.MD

    Please accept this contribution adding the standard Microsoft SECURITY.MD :lock: file to help the community understand the security policy and how to safely report security issues. GitHub uses the presence of this file to light-up security reminders and a link to the file. This pull request commits the latest official SECURITY.MD file from https://github.com/microsoft/repo-templates/blob/main/shared/SECURITY.md.

    Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

    opened by microsoft-github-policy-service[bot] 0
  • wrap3d manually or with python scripts?

    Hi, you have mentioned that you register your head models with r3ds. I want to know whether this process is automated or manual. Specifically, how do you choose the corresponding keypoints, which consumes quite a lot of time? After wrapping, do you modify the wrapped results further to get better results?

    opened by X-niper 0
  • Alpha background masks.

    If you could release the Alpha masks shown in the paper, they would be really useful for training background matting models (for example PaddlePaddleSeg PP-Matting)

    The segmentation images only contain a binary background class, so applying these as masks can result in a halo effect.

    opened by GilesBathgate 0
  • Mirroring FaceSynthetics on Hugging Face

    Hello! Thank you for creating and publishing this dataset 😄 Would you be interested in uploading it to the Hugging Face dataset hub? Hosting is free, and I know our users would find this dataset extremely valuable. Beyond helping with discoverability, datasets on Hugging Face can be used with the datasets library (https://github.com/huggingface/datasets), which enables things like streaming and also provides a ton of efficient data-manipulation tools.

    We have guides on how to upload datasets if this is something you're interested in, but I'm also happy to help out with this myself!

    cc: @osanseviero

    opened by NimaBoscarino 0
  • 6d pose annotations

    Hi! First, I would also like to express my admiration for this work.

    Then please add my request for releasing 6d pose information. I think it would be interesting to train estimators for 6d pose on your dataset due to having perfect ground truth data in contrast to training on 300w-lp.

    Thank you

    Michael

    opened by DaWelter 5
  • Full annotations

    Hello! Thanks for the amazing work! Do you plan to release all types of annotations? I mean depth, normals, UVs, all points for the head. That would be cool!

    opened by r3krut 1