StyleGAN-Human: A Data-Centric Odyssey of Human Generation

stylegan-human

Last update: Jan 8, 2023

Overview

Abstract: Unconditional human image generation is an important task in vision and graphics, which enables various applications in the creative industry. Existing studies in this field mainly focus on "network engineering" such as designing new components and objective functions. This work takes a data-centric perspective and investigates multiple critical aspects in "data engineering", which we believe would complement the current practice. To facilitate a comprehensive study, we collect and annotate a large-scale human image dataset with over 230K samples capturing diverse poses and textures. Equipped with this large dataset, we rigorously investigate three essential factors in data engineering for StyleGAN-based human generation, namely data size, data distribution, and data alignment. Extensive experiments reveal several valuable observations w.r.t. these aspects: 1) Large-scale data, more than 40K images, are needed to train a high-fidelity unconditional human generation model with vanilla StyleGAN. 2) A balanced training set helps improve the generation quality with rare face poses compared to the long-tailed counterpart, whereas simply balancing the clothing texture distribution does not effectively bring an improvement. 3) Human GAN models with body centers for alignment outperform models trained using face centers or pelvis points as alignment anchors. In addition, a model zoo and human editing applications are demonstrated to facilitate future research in the community.
Keyword: Human Image Generation, Data-Centric, StyleGAN

Jianglin Fu, Shikai Li, Yuming Jiang, Kwan-Yee Lin, Chen Qian, Chen Change Loy, Wayne Wu, and Ziwei Liu
[Demo Video] | [Project Page] | [Paper]

Updates

[26/04/2022] Technical report released!
[22/04/2022] Technical report will be released before May.
[21/04/2022] The codebase and project page are created.

Model Zoo

Structure	1024x512	512x256
StyleGAN1	stylegan_human_v1_1024.pkl	to be released
StyleGAN2	stylegan_human_v2_1024.pkl	stylegan_human_v2_512.pkl
StyleGAN3	to be released	stylegan_human_v3_512.pkl

Web Demo

Integrated into Huggingface Spaces 🤗 using Gradio. Try out the Web Demo for generation and interpolation.

We provide a Colab demo that lets you synthesize images with the provided models and visualize style-mixing, interpolation, and attribute editing. The notebook guides you through installing the necessary environment and downloading the pretrained models. The output images can be found in ./StyleGAN-Human/outputs/. Hope you enjoy!

Usage

System requirements

The original codebases are stylegan (TensorFlow), stylegan2-ada (PyTorch), and stylegan3 (PyTorch), released by NVIDIA.
We tested with Python 3.8.5 and PyTorch 1.9.1 with CUDA 11.1. (See https://pytorch.org for PyTorch install instructions.)

Installation

To work with this project on your own machine, install the environment as follows:

conda env create -f environment.yml
conda activate stylehuman
# Optional: TensorFlow 1.x is required for StyleGAN1.
pip install nvidia-pyindex
pip install nvidia-tensorflow[horovod]
pip install nvidia-tensorboard==1.15

Extra notes:

If you run into CUDA version conflicts, try emptying LD_LIBRARY_PATH. For example:

LD_LIBRARY_PATH=; python generate.py --outdir=out/stylegan_human_v2_1024 --trunc=1 --seeds=1,3,5,7 \
    --network=pretrained_models/stylegan_human_v2_1024.pkl --version 2

We found the following troubleshooting links might be helpful: 1., 2.
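
The same workaround can be applied when launching the script from Python. The sketch below is illustrative only: `clean_cuda_env` and `run_generate` are hypothetical helpers, not part of this repository. They copy the current environment, blank `LD_LIBRARY_PATH` for the child process, and build the `generate.py` command shown above.

```python
import os
import subprocess
import sys


def clean_cuda_env(base_env=None):
    """Return a copy of the environment with LD_LIBRARY_PATH emptied (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_LIBRARY_PATH"] = ""
    return env


def run_generate(extra_args, dry_run=True):
    """Build the generate.py command; run it with the cleaned environment unless dry_run is set."""
    cmd = [sys.executable, "generate.py", *extra_args]
    if dry_run:
        return cmd
    return subprocess.run(cmd, env=clean_cuda_env(), check=True)


if __name__ == "__main__":
    # Dry run: only prints the command that would be executed.
    print(run_generate([
        "--outdir=out/stylegan_human_v2_1024", "--trunc=1", "--seeds=1,3,5,7",
        "--network=pretrained_models/stylegan_human_v2_1024.pkl", "--version", "2",
    ]))
```

Passing the environment explicitly keeps the change local to the child process, so the rest of the session is unaffected.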

Pretrained models

Please place the pretrained models downloaded from the Model Zoo above in the folder 'pretrained_models'.
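
Before running the scripts, it can help to confirm that the checkpoints are where the commands expect them. The following is a minimal sketch, assuming the file names listed in the Model Zoo table; `missing_models` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path

# File names taken from the Model Zoo table; entries marked "to be released" are omitted.
RELEASED_MODELS = [
    "stylegan_human_v1_1024.pkl",
    "stylegan_human_v2_1024.pkl",
    "stylegan_human_v2_512.pkl",
    "stylegan_human_v3_512.pkl",
]


def missing_models(folder="pretrained_models", names=RELEASED_MODELS):
    """Return the expected checkpoint names that are not present in `folder`."""
    root = Path(folder)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    for name in missing_models():
        print(f"not found: pretrained_models/{name}")
```

You only need the checkpoints for the versions you intend to run, so a non-empty result is not necessarily an error.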

Generate full-body human images using our pretrained model

# Generate human full-body images without truncation
python generate.py --outdir=outputs/generate/stylegan_human_v2_1024 --trunc=1 --seeds=1,3,5,7 --network=pretrained_models/stylegan_human_v2_1024.pkl --version 2

# Generate human full-body images with truncation 
python generate.py --outdir=outputs/generate/stylegan_human_v2_1024 --trunc=0.8 --seeds=0-10 --network=pretrained_models/stylegan_human_v2_1024.pkl --version 2

# Generate human full-body images using stylegan V1
python generate.py --outdir=outputs/generate/stylegan_human_v1_1024 --network=pretrained_models/stylegan_human_v1_1024.pkl --version 1 --seeds=1,3,5

# Generate human full-body images using stylegan V3
python generate.py --outdir=outputs/generate/stylegan_human_v3_512 --network=pretrained_models/stylegan_human_v3_512.pkl --version 3 --seeds=1,3,5
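
The commands above pass `--seeds` either as a comma-separated list (`1,3,5,7`) or as a range (`0-10`). The sketch below shows one way such a value could be parsed. It is an assumption modelled on the helper in NVIDIA's upstream scripts, where a range includes both endpoints; `parse_seeds` is a hypothetical name, and the parsing in this repository's `generate.py` may differ.

```python
import re


def parse_seeds(spec):
    """Parse 'a-b' (assumed inclusive range) or 'a,b,c' into a list of ints."""
    match = re.fullmatch(r"(\d+)-(\d+)", spec.strip())
    if match:
        lo, hi = int(match.group(1)), int(match.group(2))
        return list(range(lo, hi + 1))
    return [int(part) for part in spec.split(",")]


if __name__ == "__main__":
    print(parse_seeds("1,3,5,7"))
    print(parse_seeds("0-10"))
```

Under this assumption, `--seeds=0-10` would produce eleven images, one per seed.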

Note: The following demos are generated based on models related to StyleGAN V2 (stylegan_human_v2_512.pkl and stylegan_human_v2_1024.pkl). If you want to see results for V1 or V3, you need to change the loading method of the corresponding models.

Interpolation

python interpolation.py --network=pretrained_models/stylegan_human_v2_1024.pkl  --seeds=85,100 --outdir=outputs/inter_gifs
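
As a rough illustration of what interpolating between two seeds involves, the sketch below draws one latent vector per seed and blends them linearly. It makes several assumptions: seeds are mapped to latents the way NVIDIA's upstream `generate.py` does it, the latent dimension is 512, and the blend happens in z space. `interpolation.py` may instead interpolate in w space or use a different schedule.

```python
import numpy as np


def latent_from_seed(seed, z_dim=512):
    """Draw a latent vector for a seed (assumed to match the upstream seeding convention)."""
    return np.random.RandomState(seed).randn(1, z_dim)


def interpolate(z_a, z_b, num_frames=8):
    """Return num_frames latents moving linearly from z_a to z_b."""
    ts = np.linspace(0.0, 1.0, num_frames)
    return [(1.0 - t) * z_a + t * z_b for t in ts]


if __name__ == "__main__":
    frames = interpolate(latent_from_seed(85), latent_from_seed(100))
    print(len(frames), frames[0].shape)
```

Each interpolated latent would then be passed through the generator to render one frame of the GIF.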

Style-mixing image using stylegan2

python style_mixing.py --network=pretrained_models/stylegan_human_v2_1024.pkl --rows=85,100,75,458,1500 \
    --cols=55,821,1789,293 --styles=0-3 --outdir=outputs/stylemixing
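
Conceptually, style mixing copies the per-layer style codes of one sample into another for a chosen set of layers (here `--styles=0-3`). The sketch below illustrates the idea on plain arrays. It assumes w+ codes of shape `[num_ws, w_dim]` with 18 layers and 512 dimensions; `mix_styles` is a hypothetical helper, not the function used by `style_mixing.py`.

```python
import numpy as np


def mix_styles(w_row, w_col, style_layers):
    """Return a copy of w_row whose listed layers are replaced by those of w_col."""
    mixed = w_row.copy()
    mixed[style_layers] = w_col[style_layers]
    return mixed


if __name__ == "__main__":
    num_ws, w_dim = 18, 512  # assumed sizes for illustration
    rng = np.random.RandomState(0)
    w_row = rng.randn(num_ws, w_dim)  # stand-in for a "row" sample's w+ code
    w_col = rng.randn(num_ws, w_dim)  # stand-in for a "column" sample's w+ code
    mixed = mix_styles(w_row, w_col, style_layers=[0, 1, 2, 3])
    print(mixed.shape)
```

In StyleGAN-style generators, earlier layers tend to control coarse attributes and later layers finer ones, so the choice of layers determines what is transferred.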

Style-mixing video using stylegan2

python stylemixing_video.py --network=pretrained_models/stylegan_human_v2_1024.pkl --row-seed=3859 \
    --col-seeds=3098,31759,3791 --col-styles=8-12 --trunc=0.8 --outdir=outputs/stylemixing_video
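
Several commands in this README pass a `--trunc` value (1 for no truncation, 0.8 in the command above). The sketch below shows the standard truncation trick used by StyleGAN-style generators: each w code is pulled toward the average code by a factor psi. It is an illustration on plain arrays; the function name is hypothetical, and the scripts apply truncation inside the generator rather than through this helper.

```python
import numpy as np


def truncate(w, w_avg, psi):
    """Move w toward w_avg; psi=1 leaves w unchanged, psi=0 returns w_avg."""
    return w_avg + psi * (w - w_avg)


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    w_avg = rng.randn(1, 512)  # stand-in for the generator's tracked average w
    w = rng.randn(1, 512)      # stand-in for a sampled w
    print(np.abs(truncate(w, w_avg, 0.8) - w_avg).mean() < np.abs(w - w_avg).mean())
```

Lower psi values trade diversity for fidelity, which is why the examples use 1 to show the full distribution and values such as 0.8 or 0.6 for cleaner samples.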

Editing with InterfaceGAN, StyleSpace, and Sefa

python edit.py --network pretrained_models/stylegan_human_v2_1024.pkl --attr_name upper_length \
    --seeds 61531,61570,61571,61610 --outdir outputs/edit_results

Note:

'upper_length' and 'bottom_length' are the 'attr_name' values available for the demo.
Layers to control and editing strength are set in edit/edit_config.py.
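
To make the roles of "layers to control" and "editing strength" concrete, the sketch below applies a linear, InterfaceGAN-style edit to a w+ code: a direction vector scaled by the strength is added only to the selected layers. This is a simplified illustration under assumed shapes. `apply_edit` is a hypothetical helper, the values are not those in edit/edit_config.py, and StyleSpace and SeFa edits work differently in detail.

```python
import numpy as np


def apply_edit(w_plus, direction, strength, layers):
    """Add strength * direction to the chosen layers of a [num_ws, w_dim] code."""
    edited = w_plus.copy()
    edited[layers] += strength * direction
    return edited


if __name__ == "__main__":
    num_ws, w_dim = 18, 512  # assumed sizes for illustration
    rng = np.random.RandomState(0)
    w_plus = rng.randn(num_ws, w_dim)
    direction = rng.randn(w_dim)  # stand-in for a learned attribute direction
    edited = apply_edit(w_plus, direction, strength=2.0, layers=[4, 5, 6])
    print(np.count_nonzero(np.any(edited != w_plus, axis=1)))
```

Restricting the edit to a few layers is what keeps an attribute change such as garment length from altering unrelated aspects of the image.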

Demo for InsetGAN

We implement a quick demo using the key idea from InsetGAN: combine a face generated by an FFHQ model with a human body generated by our pretrained model, then optimize both the face and body latent codes to obtain a coherent full-body image. Before running the script, download the FFHQ face model (or use your own face model), as well as the pretrained face landmark model and the pretrained CNN face detection model for dlib.

python insetgan.py --body_network=pretrained_models/stylegan_human_v2_1024.pkl --face_network=pretrained_models/ffhq.pkl \
    --body_seed=82 --face_seed=43  --trunc=0.6 --outdir=outputs/insetgan/ --video 1

Results

Editing

InsetGAN re-implementation

For more demos, please visit our project page.

TODO List

Release 1024x512 version of StyleGAN-Human based on StyleGAN3
Release 512x256 version of StyleGAN-Human based on StyleGAN1
Extension of downstream application (InsetGAN): Add a face inversion interface to support fusing a user face image with a StyleGAN-Human body image
Add Inversion Script into the provided editing pipeline
Release Dataset

Citation

If you find this work useful for your research, please consider citing our paper:

@article{fu2022styleganhuman,
      title={StyleGAN-Human: A Data-Centric Odyssey of Human Generation}, 
      author={Fu, Jianglin and Li, Shikai and Jiang, Yuming and Lin, Kwan-Yee and Qian, Chen and Loy, Chen-Change and Wu, Wayne and Liu, Ziwei},
      journal={arXiv preprint},
      volume={arXiv:2204.11823},
      year={2022}
}

Acknowledgement

Part of the code is borrowed from stylegan (tensorflow), stylegan2-ada (pytorch), stylegan3 (pytorch).

Comments

Questions about SHHQ-1.0 FID

Hi! Your work is so meaningful! I'm trying to calculate the FID score using your released model trained with SHHQ-1.0 (~40K, SHHQ-1.0_sg2_512.pk). I used it to generate 50K images and calculated FID against SHHQ-1.0, but my FID score (7.12) is much higher than the one you reported (3.68). Could you share more details about your reported FID? Which images did you use to calculate FID? Is there a test set used for FID? Also, could you release the training settings of the 40K model and the 230K model? It is not clear whether they were trained with similar iterations or epochs.

opened by SeanChen0220 5

alignment.py doesnt work

run:
!python alignment.py --image-folder /content/input --output-folder /content/output. ,

but I got

Traceback (most recent call last):
  File "alignment.py", line 221, in <module>
    run(cmd_args)
  File "alignment.py", line 144, in run
    comb, segmentation, bg, ori_img = human_seg.run(image,None)  #mybg) 
  File "/content/StyleGAN-Human/PP_HumanSeg/deploy/infer.py", line 100, in run
    processed_imgs, ori_shapes = self.preprocess(img)
  File "/content/StyleGAN-Human/PP_HumanSeg/deploy/infer.py", line 92, in preprocess
    processed_img = self.compose(img)[0]
  File "/usr/local/lib/python3.7/dist-packages/paddleseg/transforms/transforms.py", line 56, in __call__
    if 'img' not in data.keys():
AttributeError: 'numpy.ndarray' object has no attribute 'keys'

My environment

colab pro.
input folder: /content/input; only one image, of which size is (512, 1024), PIL image.
To solve module error, !pip install paddleseg paddlepaddle

I believe the problem is here: self.compose = T.Compose(self.cfg.transforms).
self.cfg.transforms expects a dictionary with the key 'img',
but I can't get past it.

Please help.

opened by jujemu 4

Distorted faces and distorted patterns when using PTI

I tried two methods in order to find the latent code with your pre-trained model, stylegan_human_v2_1024.pkl and e4e_w+.pt

1 - I started with some seeds (e.g. 0, 4000, 4050, 4020, 5500, 5000, 10) and generated some images. Then I treated those images as real images and tried to find their latent representations using ("python run_pti.py"), I used the default config in "pti_configs" folder. In some cases, the face, and in others, the pattern of the cloth is distorted (please see the following pictures)

2 - I downloaded the sample images from your GitHub and did the same procedure. The aforementioned problems are more noticeable. I got distorted faces and patterns.

I was wondering if you have any advice for these cases.

opened by resa-git 3

Improve speed of PTI encoding

Thank you for releasing the code for PTI encoding of whole-body images. I would like to know if there is a way to speed up the PTI encoding process.

opened by onyekaokonji 2

Issue with new release

Thanks for releasing the inversion script. Projection works, but the results are poor. We can try hyperparameter tuning, but you also provide another option, e4e (w+), which fails with a KeyError:

File "/StyleGAN_Human/pti/training/coaches/base_coach.py", line 136, in initilize_e4e
    opts = ckpt['opts']
KeyError: 'opts'

I tried printing the keys in ckpt (loaded from e4ew+.pt), and there is no key "opts". Could you please tell us what can be done to get e4e-based inversion working? I expect it will give better output than projection.

opened by Lakshmanaraja 2

What are the meanings of seeds when editing images?

Hi, thanks for your great job and the released codes.

When I tried to edit the images, I found that some seeds need to be fed into the model, e.g. [60948,60965,61174,61210,61511,61598,61610] #bottom -> long. What do these seeds mean, and how can I obtain the corresponding seeds if I would like to perform other kinds of editing?

Thank you for your help.

opened by picksh 2

FID of the StyleGAN model

Dear StyleGAN-Human team,

Thank you for sharing this great work, I really like it.

Could you upload the FID of each StyleGAN model? Have you tried using Projected GAN or Vision-aided GAN to improve the generation quality?

Thank you for your help.

Best Wishes,

Zongze

opened by betterze 2

insetgan.py cudnnConvolutionBiasActivationForward

Hello, I am trying to run the code of insetgan on Google Colab and I've run in this issue.

Previously, the code was running without errors.

Has anyone run in the same issue? How could this problem be solved?

This is the output of insetgan.py:

Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/usr/local/lib/python3.7/dist-packages/torchvision/models/_utils.py:209: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
  f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "
/usr/local/lib/python3.7/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: /usr/local/lib/python3.7/dist-packages/lpips/weights/v0.1/alex.pth
Traceback (most recent call last):
  File "insetgan.py", line 398, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "insetgan.py", line 372, in main
    _, body_crop, _ = insgan.detect_face_dlib(body_img)
  File "insetgan.py", line 110, in detect_face_dlib
    output_size=256)
  File "/content/StyleGAN-Human/utils/face_alignment.py", line 40, in align_face_for_insetgan
    lm, face_rect = get_landmark(img, detector, predictor)
  File "/content/StyleGAN-Human/utils/face_alignment.py", line 18, in get_landmark
    dets = detector(img, 1)
RuntimeError: Error while calling cudnnConvolutionBiasActivationForward( context(), &alpha1, descriptor(data), data.device(), (const cudnnFilterDescriptor_t)filter_handle, filters.device(), (const cudnnConvolutionDescriptor_t)conv_handle, (cudnnConvolutionFwdAlgo_t)forward_algo, forward_workspace, forward_workspace_size_in_bytes, &alpha2, out_desc, out, descriptor(biases), biases.device(), identity_activation_descriptor(), out_desc, out) in file /tmp/pip-install-7f5va9na/dlib_dd01df0de7e94b8893437a0dcaab1525/dlib/cuda/cudnn_dlibapi.cpp:1237. code: 9, reason: CUDNN_STATUS_NOT_SUPPORTED

opened by ikros98 1

Fine-Tuning with original data

I'm trying to additionally train a model for StyleGAN3 using my own images (512x512 resolution).

python train.py --outdir=training_results/sg3/ --cfg=stylegan3-r --gpus=8 --batch=32 --gamma=12.4 \
    --mirror=1 --aug=noaug --data=data/OrgData/ --square=True --snap=250

When I run the prepared script, I get the following error:

Traceback (most recent call last):
  File "train_stylegan_human.py", line 293, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "train_stylegan_human.py", line 288, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train_stylegan_human.py", line 98, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "train_stylegan_human.py", line 49, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/home/rd/img_studio/source/submodules/StyleGAN3/training/training_loop.py", line 163, in training_loop
    misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
  File "/home/rd/img_studio/source/submodules/StyleGAN3/torch_utils/misc.py", line 164, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (512) must match the size of tensor b (1024) at non-singleton dimension 1

I set --cbase=16384 referring to the StyleGAN3 issue, but the result did not change.

Can someone please tell me how to deal with this?

opened by ccc07130 0

Is the model trained on the original dataset, or aligned before training?

Thanks for your great work. I got access to the SHHQ dataset recently and have a few questions.

Alignment of images: the dataset images have many different backgrounds. Which is better: batch-aligning the images to create a new dataset and training on that, or including alignment and other preprocessing as part of training? What is the best practice, and which one did you follow?

In the paper, you mention Gaussian blur and changing to a uniform background. May I assume your alignment script does both?

How should the segmentation masks provided with the dataset (for alignment) be used in this training?

opened by Lakshmanaraja 0

how to directly input 18 different codes in w+ space?

I found two functions in your code, G.mapping and G.synthesis. If I want to input 18 different codes in w+ space directly, without using the affine transformation in G.synthesis, how can I achieve that? Thanks a lot!

opened by linziqu 0

Question about FID computation

@stylegan-human, the FID I am getting for stylegan_human_v2_1024.pkl seems higher than what is reported in the paper.

I did the following two methods for FID computation:

Generated 10 K images and calculated FID using pytorch-fid to get FID of ~59

Use the calc_metrics.py from the stylegan2-ada-pytorch repo and calculate the fid50k_full to be 50.0

Both of these calculations are done with respect to the SHHQ-1.0 40K images.

Just curious what dataset was used as reference to compute the FID on your generated images! Is it possible to share the FID computation code for reproducibility? Thanks for your consideration.

opened by koutilya-pnvr 0

[QUESTION] Use custom dataset labels

How can I prepare my dataset with labels to train this model? I labeled my dataset as described in the PyTorch docs.

Here is an example from my CSV: "Image_Name""test.png", "Male":True, "Hair_Black":False, "Hair_Brown":True

The original StyleGAN library uses only integer labels. How can I benefit from the labels in my dataset during training? Thanks

opened by Avedena 1

Please Release the training code

Thank you for the amazing job you are doing! You have achieved amazing results and the generated images look fantastic!

Please release the training code so we can train the model further (and on top of ) on our own time for personal needs. I see a lot of people asking for it - please make it available for us . Thank you !

opened by redyellowRain 7

style mixing with only change the style of the cloth

Hi, Thanks for your amazing job,

From the style mixing results, the middle mixing will influence the cloth and the id appearance at the same time, I am curious about whether you try to point out the mixing way only changing the style of cloth when keeping the identity.

opened by linziqu 0
