ADGAN - Implementation of the paper "Controllable Person Image Synthesis with Attribute-Decomposed GAN"

Overview

ADGAN

PyTorch | project page | paper

PyTorch implementation for controllable person image synthesis.

Controllable Person Image Synthesis with Attribute-Decomposed GAN
Yifang Men, Yiming Mao, Yuning Jiang, Wei-Ying Ma, Zhouhui Lian. Peking University & ByteDance AI Lab. CVPR 2020 (Oral).

Component Attribute Transfer

Pose Transfer

Requirement

  • python 3
  • pytorch (>= 1.0)
  • torchvision
  • numpy
  • scipy
  • scikit-image
  • pillow
  • pandas
  • tqdm
  • dominate
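
For example, the dependencies above can be installed with pip (versions are not pinned by this repo; choose a PyTorch build that matches your CUDA setup):

pip install "torch>=1.0" torchvision numpy scipy scikit-image pillow pandas tqdm dominate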

Getting Started

You can directly download our generated images (on DeepFashion) from Google Drive.

Installation

  • Clone this repo:
git clone https://github.com/menyifang/ADGAN.git
cd ADGAN

Data Preparation

We use the DeepFashion dataset and provide our dataset split files, extracted keypoint files, and extracted segmentation files for convenience.

The recommended dataset structure is:

+-- deepfashion
|   +-- fashion_resize
|       +-- train (files in 'train.lst')
|          +-- e.g. fashionMENDenimid0000008001_1front.jpg
|       +-- test (files in 'test.lst')
|          +-- e.g. fashionMENDenimid0000056501_1front.jpg
|       +-- trainK (keypoints of person images)
|          +-- e.g. fashionMENDenimid0000008001_1front.jpg.npy
|       +-- testK
|          +-- e.g. fashionMENDenimid0000056501_1front.jpg.npy
|   +-- semantic_merge
|   +-- fashion-resize-pairs-train.csv
|   +-- fashion-resize-pairs-test.csv
|   +-- fashion-resize-annotation-pairs-train.csv
|   +-- fashion-resize-annotation-pairs-test.csv
|   +-- train.lst
|   +-- test.lst
|   +-- vgg19-dcbb9e9d.pth
|   +-- vgg_conv.pth
...
  1. Person images
python tool/generate_fashion_datasets.py

Note: in our setting, we center-crop the DeepFashion images to a resolution of 176x256.
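
A minimal sketch of such a center crop with Pillow is shown below; the paths are placeholders, and tool/generate_fashion_datasets.py remains the authoritative script:

from PIL import Image

def center_crop(path_in, path_out, target_w=176, target_h=256):
    # Center-crop an image to target_w x target_h (assumes the source is at least that large).
    img = Image.open(path_in)
    w, h = img.size
    left = (w - target_w) // 2
    top = (h - target_h) // 2
    img.crop((left, top, left + target_w, top + target_h)).save(path_out)

# Hypothetical usage:
# center_crop('raw_person_image.jpg', 'deepfashion/fashion_resize/train/cropped.jpg')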

  2. Keypoints files
  • Download train/test pairs and train/test keypoint annotations from Google Drive, including fashion-resize-pairs-train.csv, fashion-resize-pairs-test.csv, fashion-resize-annotation-train.csv, and fashion-resize-annotation-test.csv. Put these four files under the deepfashion directory.
  • Generate the pose heatmaps. Launch
python tool/generate_pose_map_fashion.py
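
For intuition, a minimal sketch of rendering 18 keypoints into per-joint Gaussian heatmaps follows; the (y, x) ordering, the sigma value, and the file layout are assumptions here, and tool/generate_pose_map_fashion.py is the authoritative script:

import numpy as np

def keypoints_to_heatmaps(kps, height=256, width=176, sigma=6):
    # kps: (18, 2) array of (y, x) joint coordinates, with -1 marking missing joints.
    maps = np.zeros((height, width, len(kps)), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    for i, (y, x) in enumerate(kps):
        if y < 0 or x < 0:
            continue  # a missing keypoint keeps an all-zero map
        maps[:, :, i] = np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2.0 * sigma ** 2))
    return maps

example_kps = np.full((18, 2), -1.0)
example_kps[0] = [30, 88]  # e.g. a nose joint near the top centre of a 256x176 image
print(keypoints_to_heatmaps(example_kps).shape)  # (256, 176, 18)
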
  3. Segmentation files
  • Extract human segmentation results from an existing human parser (e.g. Look Into Person) and merge them into 8 categories. Our segmentation results are provided on Google Drive, including 'semantic_merge2' and 'semantic_merge3', which use different merge schemes. Put one of them under the deepfashion directory.
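
As an illustration, one plausible way to collapse the 20 LIP parsing labels into 8 categories is sketched below. The 8 target categories follow the list discussed in the Comments section; this specific mapping is a guess (especially for hats, gloves, socks, shoes, and scarves), and the provided semantic_merge2/semantic_merge3 files define the actual schemes:

import numpy as np

# Hypothetical mapping from the 20 LIP labels to 8 merged categories:
# 0 background, 1 hair, 2 face, 3 upper clothes, 4 pants, 5 skirt, 6 arm, 7 leg
LIP_TO_MERGED = {
    0: 0,                      # Background
    1: 1, 2: 1,                # Hat, Hair
    3: 6, 4: 2,                # Glove, Sunglasses
    5: 3, 6: 3, 7: 3,          # UpperClothes, Dress, Coat
    8: 7, 9: 4, 10: 3, 11: 3,  # Socks, Pants, Jumpsuits, Scarf
    12: 5, 13: 2,              # Skirt, Face
    14: 6, 15: 6,              # Left-arm, Right-arm
    16: 7, 17: 7,              # Left-leg, Right-leg
    18: 7, 19: 7,              # Left-shoe, Right-shoe
}

def merge_parsing(parsing):
    # parsing: (H, W) int array of LIP labels -> (H, W) array with the 8 merged labels.
    merged = np.zeros_like(parsing)
    for lip_label, merged_label in LIP_TO_MERGED.items():
        merged[parsing == lip_label] = merged_label
    return merged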

Optionally, you can also generate these files by yourself.

  1. Keypoints files

We use OpenPose to generate keypoints.

  • Download the pose estimator from Google Drive and put it under the root folder ADGAN.
  • Change the paths input_folder and output_path in tool/compute_coordinates.py, then launch
python2 compute_coordinates.py
  2. Dataset split files
python2 tool/create_pairs_dataset.py

Train a model

bash ./scripts/train.sh 

Test a model

Download our pretrained model from Google Drive. Modify your data path and launch

bash ./scripts/test.sh 
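
For reference, scripts/test.sh wraps test.py; a flag set reported to work by a user (see the Comments section below) looks roughly like the following, where the data root, checkpoint folder name, and epoch are placeholders to adjust to your setup:

python test.py \
    --dataroot deepfashion \
    --dirSem deepfashion \
    --pairLst deepfashion/fashion-resize-pairs-test.csv \
    --checkpoints_dir ./checkpoints \
    --results_dir ./results \
    --name <checkpoint_folder_name> \
    --model adgan --phase test --dataset_mode keypoint \
    --norm instance --batchSize 1 --resize_or_crop no --gpu_ids 0 \
    --BP_input_nc 18 --no_flip --which_model_netG ADGen --which_epoch 800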

Evaluation

We adopt SSIM, IS, DS, and CX for evaluation. This part was completed by Yiming Mao.

1) SSIM

For SSIM evaluation, TensorFlow 1.4.1 (Python 3) is required.

python tool/getMetrics_market.py
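
As a rough sanity check (not the official script), mean SSIM can also be estimated with scikit-image; the directory layout and matching filenames are assumptions here, and tool/getMetrics_market.py produces the reported numbers:

import os
import numpy as np
from skimage.io import imread
from skimage.metrics import structural_similarity  # channel_axis requires scikit-image >= 0.19

def mean_ssim(gt_dir, gen_dir):
    # Average SSIM over generated images paired with ground-truth images of the same filename.
    scores = []
    for name in sorted(os.listdir(gen_dir)):
        gt = imread(os.path.join(gt_dir, name))
        gen = imread(os.path.join(gen_dir, name))
        scores.append(structural_similarity(gt, gen, channel_axis=-1, data_range=255))
    return float(np.mean(scores))

# Hypothetical paths:
# print(mean_ssim('deepfashion/fashion_resize/test', 'results/generated'))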

2) DS Score

Download the SSD model pretrained on VOC (300x300) and install the proper Caffe version of SSD. Put it in the ssd_score folder.

python compute_ssd_score_fashion.py --input_dir path/to/generated/images

3) CX (Contextual Score)

Refer to the 'cx' folder to compute the contextual score.
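
For intuition, a compact sketch of contextual similarity between two feature maps (in the spirit of Mechrez et al.) is shown below; the bandwidth h, the choice of VGG layers, and the direction of the max/mean vary between implementations, and the 'cx' folder is the authoritative version:

import torch

def contextual_similarity(feat_x, feat_y, h=0.5, eps=1e-5):
    # feat_x, feat_y: (1, C, H, W) feature maps, e.g. taken from a VGG layer.
    x = feat_x.flatten(2).squeeze(0).t()          # (Nx, C) feature vectors
    y = feat_y.flatten(2).squeeze(0).t()          # (Ny, C)
    mu = y.mean(dim=0, keepdim=True)              # centre both sets on the target mean
    x = (x - mu)
    y = (y - mu)
    x = x / (x.norm(dim=1, keepdim=True) + eps)
    y = y / (y.norm(dim=1, keepdim=True) + eps)
    dist = 1.0 - x @ y.t()                        # pairwise cosine distances (Nx, Ny)
    dist_rel = dist / (dist.min(dim=1, keepdim=True).values + eps)
    w = torch.exp((1.0 - dist_rel) / h)
    cx_ij = w / w.sum(dim=1, keepdim=True)        # normalised contextual similarity
    return cx_ij.max(dim=1).values.mean()         # CX score; a loss would use -log of this

x = torch.randn(1, 64, 32, 32)
y = torch.randn(1, 64, 32, 32)
print(contextual_similarity(x, y).item())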

Citation

If you use this code for your research, please cite our paper:

@inproceedings{men2020controllable,
  title={Controllable Person Image Synthesis with Attribute-Decomposed GAN},
  author={Men, Yifang and Mao, Yiming and Jiang, Yuning and Ma, Wei-Ying and Lian, Zhouhui},
  booktitle={Computer Vision and Pattern Recognition (CVPR), 2020 IEEE Conference on},
  year={2020}
}


Acknowledgments

Our code is based on PATN; thanks to its authors for their great work.

Comments
  • Pretrained model not generating proper images

    Hi,

    I'm trying to generate images with the pretrained model and the provided preprocessed dataset, but I'm only getting random pixels. I wonder if I'm missing anything in my setup not mentioned in the README file. Really appreciate your help!

    Sample output (attached image): fashionMENJackets_Vestsid0000724701_2side.jpg___fashionMENJackets_Vestsid0000724701_1front.jpg_vis

    My test.sh:

    python test.py \
        --dataroot deepfashion \
        --dirSem deepfashion \
        --pairLst deepfashion/fashion-resize-pairs-test.csv \
        --checkpoints_dir ./checkpoints \
        --results_dir ./results \
        --name fashion_AdaGen_sty512_nres8_lre3_SS_fc_vgg_cxloss_ss_merge3 \
        --model adgan \
        --phase test \
        --dataset_mode keypoint \
        --norm instance \
        --batchSize 1 \
        --resize_or_crop no \
        --gpu_ids 0 \
        --BP_input_nc 18 \
        --no_flip \
        --which_model_netG ADGen \
        --which_epoch 800

    My folder structure:

    ADGAN
    ├── checkpoints
    │   ├── fashion_AdaGen_sty512_nres8_lre3_SS_fc_vgg_cxloss_ss_merge3
    │   │   ├── 1000_net_netG.pth
    │   │   ├── 800_net_netG.pth
    │   │   ├── loss_log.txt
    │   │   ├── opt.txt
    ├── cx
    ├── data
    ├── deepfashion
    │   ├── fashion-resize-annotation-test.csv
    │   ├── fashion-resize-annotation-train.csv
    │   ├── fashion-resize-pairs-test.csv
    │   ├── fashion-resize-pairs-train.csv
    │   ├── resized
    │   ├── semantic_merge2
    │   ├── semantic_merge3
    │   ├── test
    │   ├── testK
    │   ├── test.lst
    │   ├── train
    │   ├── trainK
    │   ├── train.lst
    │   ├── vgg19-dcbb9e9d.pth
    │   └── vgg_conv.pth
    ├── gif
    ├── losses
    ├── models
    ├── options
    ├── README.md
    ├── scripts
    ├── ssd_score
    ├── test.py
    ├── tool
    ├── train.py
    └── util

    I also fixed a hardcoded path in model_adgen.py locally.

    opened by JiamingFB 5
  • How long does training take?

    Hi @menyifang ,

    How long did it roughly take to train the pretrained model with 2 V100 GPUs? A few days or weeks? (I read your paper but it doesn't seem to be mentioned there.)

    opened by JiamingFB 3
  • What is the mapping of the semantic map of person image to the merged K=8 attribute?

    I am trying to map the segmentation mask output to the merged (K=8) indexes. The current indexes I have as input are:

    np.array(('Background',  # always index 0
              'Hat', 'Hair', 'Glove', 'Sunglasses', 'UpperClothes', 'Dress', 'Coat',
              'Socks', 'Pants', 'Jumpsuits', 'Scarf', 'Skirt', 'Face',
              'Left-arm', 'Right-arm', 'Left-leg', 'Right-leg', 'Left-shoe', 'Right-shoe'))

    and the merged indexes are: background, hair, face, upper clothes, pants, skirt, arm, and leg.

    Is there code you could share where this operation is performed? I am trying to reuse the pre-trained model.

    opened by nitthilan 2
  • Not able to reproduce the result

    I am not able to reproduce the results using the pre-trained models. Attached output: fashionWOMENJackets_Coatsid0000417103_1front.jpg___fashionWOMENJackets_Coatsid0000417103_2side.jpg_vis

    The above is the output I am getting. Can you predict why I am getting this issue?

    opened by nitthilan 2
  • RuntimeError: The size of tensor a (256) must match the size of tensor b (176) at non-singleton dimension 3

    File "/home/user20202735/ADGAN/models/model_adgen.py", line 37, in forward style = self.enc_style(img_B, sem_B) File "/home/user20202735/.conda/envs/adgan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/home/user20202735/ADGAN/models/model_adgen.py", line 130, in forward xi = x.mul(semi)

    opened by fengbuck 1
  • why batch norm here?

    Hi, thanks for your great work first!

    I'm trying to reproduce your code, but I cannot understand why this code uses F.batch_norm in AdaptiveInstanceNorm2d. Why not just F.instance_norm?

    https://github.com/menyifang/ADGAN/blob/4dd70649ad136829b92dd6a1a823af7594a0220f/models/model_adgen.py#L355-L362

    opened by budui 1
  • About the perceptual loss

    In your paper, the perceptual loss was:

    # vggsubmod refers to certain layer of VGG.
    Lper = L1(gram(vggsubmod(x)), gram(vggsubmod(y)))
    

    However in your implementation: https://github.com/menyifang/ADGAN/blob/d948cb135801c83295e9427cab5d7d738436aa95/losses/L1_plus_perceptualLoss.py#L63-L72

    Could you give some explanation of that? Thanks.

    opened by mazzzystar 1
  • fixed the hardcoded path in model_adgen.py

    The path to /data/deepfashion/vgg19-dcbb9e9d.pth was hardcoded to your local machine path, so I made some changes to make it more dynamic; it was a pain when I tried running this model today. But honestly, good work on this model. Keep it up please <3.

    opened by Ziad-Usama 0
  • run bash ./script/train.sh

    After preparing the environment and running the command "bash ./script/train.sh", I got an error like "RuntimeError: The size of tensor a (750) must match the size of tensor b (176) at non-singleton dimension 3". Can you answer this question? Thank you very much!

    opened by XuJ1E 1
  • Run time error during test

    I ran bash ./scripts/test.sh to test using the pre-trained 800-netG model.

    data is arranged as follows:

    +-- deepfashion
    |   +-- fashion_resize
    |       +-- train (files in 'train.lst')
    |          +-- e.g. fashionMENDenimid0000008001_1front.jpg
    |       +-- test (files in 'test.lst')
    |          +-- e.g. fashionMENDenimid0000056501_1front.jpg
    |       +-- trainK (keypoints of person images)
    |          +-- e.g. fashionMENDenimid0000008001_1front.jpg.npy
    |       +-- testK
    |          +-- e.g. fashionMENDenimid0000056501_1front.jpg.npy
    |   +-- semantic_merge
    |   +-- fashion-resize-pairs-train.csv
    |   +-- fashion-resize-pairs-test.csv
    |   +-- fashion-resize-annotation-pairs-train.csv
    |   +-- fashion-resize-annotation-pairs-test.csv
    |   +-- train.lst
    |   +-- test.lst
    |   +-- vgg19-dcbb9e9d.pth
    |   +-- vgg_conv.pth
    ...
    

    code reference

    https://github.com/menyifang/ADGAN/blob/c76647172e923573b4012b6c17a1b3938155aedd/data/keypoint.py#L52:L88
    

    I got following runtime error :

    /ADGAN/data/keypoint.py", line 80, in __getitem__
     BP1 = BP1.transpose(2, 0) #c,w,h
     IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
    

    debug output :

    >>>BP1_img.shape
    (256, 176)
    

    Any suggestions on how to solve this?

    opened by EMHussain 0
  • Download error - In-shop Clothes Retrieval Benchmark

    When I downloaded "In-shop Clothes Retrieval Benchmark", I got the error "You do not have permission to download this document." for the following 9 files:

    • In-shop Clothes Retrieval Benchmark/README.txt
    • In-shop Clothes Retrieval Benchmark/Img/img.zip
    • In-shop Clothes Retrieval Benchmark/Anno/list_item_inshop.txt
    • In-shop Clothes Retrieval Benchmark/Anno/list_description_inshop.json
    • In-shop Clothes Retrieval Benchmark/Anno/list_landmarks_inshop.txt
    • In-shop Clothes Retrieval Benchmark/Eval/list_eval_partition.txt
    • In-shop Clothes Retrieval Benchmark/Anno/attributes/list_attr_cloth.txt
    • In-shop Clothes Retrieval Benchmark/Anno/list_bbox_inshop.txt
    • In-shop Clothes Retrieval Benchmark/Anno/attributes/list_attr_items.txt

    Is it OK to skip these files?

    opened by eastchun 0
  • Dataset download locked by passwd

    Hi, I am trying to download the data (so much data!). Anyway, it said the .ds_stre file is password protected and asked me for a password. Could you help me with this?

    In fact, img_highres_seg-004 and img_highres-003 are password protected.

    opened by eastchun 1
  • Issue on compute_coordinates.py

    I'm running compute_coordinates.py in order to recalculate keypoints, but when I run it I get the following warning (which I suppose indicates some issue with image sizes):

    tensorflow:Model was constructed with shape Tensor("input_5:0", shape=(1, 368, 368, 3), dtype=float32) for input (1, 368, 368, 3), but it was re-called on a Tensor with incompatible shape (None, 184, 126, 3).

    The script runs, but it creates files with all keypoints set to -1.

    What could be the issue?

    opened by EnricoBeltramo 2