Neural-Texture-Extraction-Distribution

The PyTorch implementation for our paper "Neural Texture Extraction and Distribution for Controllable Person Image Synthesis" (CVPR2022 Oral).

ArXiv | Get Started

Overview

We propose a Neural-Texture-Extraction-Distribution operation for controllable person image synthesis. Our model can be used to control the pose and appearance of a reference image:

  • Pose Control

  • Appearance Control

News

  • 2022.4.30 Colab demos are provided for quick exploration.
  • 2022.4.28 Code for PyTorch is available now!

Installation

Requirements

  • Python 3
  • PyTorch 1.7.1
  • CUDA 10.2

Conda Installation

# 1. Create a conda virtual environment.
conda create -n NTED python=3.6
conda activate NTED
conda install -c pytorch pytorch=1.7.1 torchvision cudatoolkit=10.2

# 2. Clone the Repo and Install dependencies
git clone --recursive https://github.com/RenYurui/Neural-Texture-Extraction-Distribution.git
pip install -r requirements.txt

# 3. Install mmfashion (for appearance control only)
pip install mmcv==0.5.1
pip install pycocotools==2.0.4
cd ./scripts
chmod +x insert_mmfashion2mmdetection.sh
./insert_mmfashion2mmdetection.sh
cd ../third_part/mmdetection
pip install -v -e .
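
After installation, you can quickly confirm that the expected PyTorch and CUDA versions are visible. This is a minimal, optional sanity check and not part of the official repository; the expected version numbers are taken from the Requirements above.

# Optional sanity check (illustrative, not part of the repo):
# verify that the PyTorch/CUDA versions match the requirements.
import torch

print("PyTorch version:", torch.__version__)        # expected: 1.7.1
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)           # expected: 10.2
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))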

Demo

Several demos are provided. Please first download the required resources by running:

cd scripts
./download_demos.sh

Pose Transfer

Run the following command to generate the pose transfer results:

PATH_TO_OUTPUT=./demo_results
python demo.py \
--config ./config/fashion_512.yaml \
--which_iter 495400 \
--name fashion_512 \
--file_pairs ./txt_files/demo.txt \
--input_dir ./demo_images \
--output_dir $PATH_TO_OUTPUT
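
The --file_pairs argument points to a text file listing the image pairs to process. If you want to run the demo on your own pairs, a file in the same spirit can be written as below; the comma-separated "reference,target" layout is an assumption, so check the provided ./txt_files/demo.txt for the exact format before using it.

# Hypothetical helper for building a custom pairs file for demo.py.
# The "reference,target" line format is an assumption -- verify it against
# the provided ./txt_files/demo.txt before use.
pairs = [
    ("reference_01.jpg", "target_pose_01.jpg"),
    ("reference_02.jpg", "target_pose_02.jpg"),
]
with open("./txt_files/my_pairs.txt", "w") as f:
    for reference, target in pairs:
        f.write(f"{reference},{target}\n")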

Appearance Control

Run the following command for the appearance control demo:

python appearance_control.py \
--config ./config/fashion_512.yaml \
--name fashion_512 \
--which_iter 495400 \
--input_dir ./demo_images \
--file_pairs ./txt_files/appearance_control.txt
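
Appearance control depends on the mmcv/mmdetection components installed in step 3 of the installation. The snippet below is a small optional check that those packages are importable; it is not part of the repository.

# Optional check (illustrative): confirm the mmfashion/mmdetection install.
import mmcv
import mmdet  # installed in editable mode from ./third_part/mmdetection

print("mmcv:", mmcv.__version__)    # 0.5.1 was installed above
print("mmdet:", mmdet.__version__)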

Colab Demo

Please check the Colab Demos for pose control and appearance control.

Dataset

  • Download img_highres.zip of the DeepFashion Dataset from In-shop Clothes Retrieval Benchmark.

  • Unzip img_highres.zip. You will need to request the password from the dataset maintainers. Then rename the extracted folder to img and put it under the ./dataset/deepfashion directory.

  • We split the train/test set following GFLA. Several images with significant occlusions are removed from the training set. Download the train/test pairs and the keypoints pose.zip extracted with Openpose by running:

    cd scripts
    ./download_dataset.sh

    Or you can download these files manually:

    • Download the train/test pairs from Google Drive, including train_pairs.txt, test_pairs.txt, train.lst, and test.lst. Put these files under the ./dataset/deepfashion directory.
    • Download the keypoints pose.rar extracted with Openpose from Google Drive. Unzip it and put the obtained folder under the ./dataset/deepfashion directory.
  • Run the following command to convert the images into an lmdb dataset (a quick sanity check of the result is sketched after this list).

    python -m scripts.prepare_data \
    --root ./dataset/deepfashion \
    --out ./dataset/deepfashion
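
Once the conversion finishes, you can inspect the resulting lmdb environment to make sure it is non-empty. The snippet below is an illustrative check only; the exact output path and key layout are defined by scripts/prepare_data.py and may differ (for example, the environment may live in a resolution-specific subdirectory of --out).

# Illustrative lmdb sanity check (not part of the repo); adjust the path if
# prepare_data.py writes into a subdirectory of ./dataset/deepfashion.
import lmdb

env = lmdb.open("./dataset/deepfashion", readonly=True, lock=False, readahead=False)
with env.begin(write=False) as txn:
    print("entries:", txn.stat()["entries"])
    for i, (key, value) in enumerate(txn.cursor()):
        print(key[:80], "->", len(value), "bytes")
        if i >= 4:  # only peek at the first few records
            break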

Training

This project supports multi-GPU training. The following command shows an example of training the model on 512x352 images with 4 GPUs.

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--nproc_per_node=4 \
--master_port 1234 train.py \
--config ./config/fashion_512.yaml \
--name $name_of_your_experiment

All configurations for this experiment are stored in ./config/fashion_512.yaml. If you change the number of GPUs, you may need to adjust batch_size in ./config/fashion_512.yaml so that the effective total batch size stays the same (a small worked example follows).
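
As a hedged example, if the config's batch_size is the per-GPU value (please confirm this against fashion_512.yaml and the data loader code), keeping the total batch size constant when moving from 4 GPUs to 2 looks like this; the numbers are purely illustrative.

# Illustrative arithmetic only; whether batch_size in fashion_512.yaml is
# per-GPU or total should be confirmed in the config/data loader.
total_batch_size = 16        # hypothetical total batch size used with 4 GPUs (4 per GPU)
num_gpus = 2                 # new GPU count
per_gpu_batch_size = total_batch_size // num_gpus
print(per_gpu_batch_size)    # -> 8: the value to set when training on 2 GPUs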

Inference

  • Download the trained weights for 512x352 images and 256x176 images. Put the obtained checkpoints under ./result/fashion_512 and ./result/fashion_256 respectively.

  • Run the following code to evaluate the trained model:

    # run evaluation for 512x352 images
    python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port 12345 inference.py \
    --config ./config/fashion_512.yaml \
    --name fashion_512 \
    --no_resume \
    --output_dir ./result/fashion_512/inference 
    
    # run evaluation for 256x176 images
    python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port 12345 inference.py \
    --config ./config/fashion_256.yaml \
    --name fashion_256 \
    --no_resume \
    --output_dir ./result/fashion_256/inference 

The generated images are saved in ./result/fashion_512/inference and ./result/fashion_256/inference.
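
If you want a quick external check of the generated images, a third-party FID tool can be run on the inference outputs. This is not the repository's official evaluation; the pytorch-fid package and the ground-truth directory below are assumptions, and the numbers it produces may differ from those reported in the paper.

# Illustrative only: FID via the third-party pytorch-fid package
# (pip install pytorch-fid). The real-image folder is a placeholder.
import subprocess

subprocess.run([
    "python", "-m", "pytorch_fid",
    "./result/fashion_512/inference",   # generated images
    "./dataset/deepfashion/img",        # placeholder path to real test images
], check=True)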

Comments
  • I only found 'extraction_softmax' in the code. Are they the same?

    https://github.com/RenYurui/Neural-Texture-Extraction-Distribution/blob/6e3f0e94c4b6a14ceafa4f8d82950e6b4674a0b3/trainers/extraction_distribution_trainer.py#L188

    opened by olream 2
  • Incomplete pose file

    Hi Ren Yurui,

    Thanks for your work.

    I encountered an error when training the model: Neural-Texture-Extraction-Distribution-main/dataset/deepfashion/pose/WOMEN/Blouses_Shirts/id_00001612/02_4_full.txt not found. The pair is listed in train_pairs.txt, but the corresponding file does not exist in the pose folder. I also found that the number of images in the img folder is larger than the number of files in the pose folder. So I would like to ask whether the pose file you provided is incomplete. If so, could you release the full pose file?

    Thanks!

    opened by moonlight703 2
  • Where is "hook_softmax"?

    I see that the attention maps are produced via hook_softmax in the code:

    In extraction_distribution_trainer.py: attn_image = attn2image(info['hook_softmax'], info['semantic_distribution'], input_image)

    However, I could not find this key in the model. Where is it defined? Thanks!

    opened by WeiYu021 1
  • Released Model Link Broken

    Hi Yurui,

    Thank you for sharing this great work with the community! Nice to see your new work released!

    The link for the 256x176 weights seems broken; would you be able to take a look at this? Also, the discriminator weights do not seem to be included in the pickle file for the 512x352 weights. Could you by chance release the trained discriminator as well?

    Thanks! Aiyu

    opened by cuiaiyu 1
  • Add docker env and web demo

    Hey @RenYurui! 👋

    This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

    This also means we can make a web demo where other people can try out your model! View it here: https://replicate.com/renyurui/controllable-person-synthesis. Currently, only the pose manipulation aspect is supported.

    We've added some examples to the web demo; please click the black "Claim this model" button so you can own and edit it.

    In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible by implementing models we like. 😊

    opened by vccheng2001 0
  • Size mismatch when loading the pretrained 256x176 model

    Hi Yurui,

    When I try to load the checkpoint for the 256x176 model with the given configuration, I get size mismatch errors.

    I then found that the 256x176 checkpoint has the same file name as the 512x352 checkpoint, so as a sanity check I loaded the 256x176 checkpoint with the 512x352 configuration, and it works. I think the link for the 256x176 checkpoint may actually contain the 512x352 checkpoint.

    Could you please take a look at this?

    Thank you again! Aiyu

    The error log is attached.

     Error(s) in loading state_dict for Generator:
            Unexpected key(s) in state_dict: "reference_encoder.convs.4.blur.kernel", "reference_encoder.convs.4.conv.weight", "reference_encoder.convs.4.activate.bias", "reference_encoder.convs.4.extraction_operations.0.value_conv.weight", "reference_encoder.convs.4.extraction_operations.0.value_conv.bias", "reference_encoder.convs.4.extraction_operations.0.semantic_extraction_filter.weight", "reference_encoder.convs.4.extraction_operations.1.value_conv.weight", "reference_encoder.convs.4.extraction_operations.1.value_conv.bias", "reference_encoder.convs.4.extraction_operations.1.semantic_extraction_filter.weight", "skeleton_encoder.convs.4.blur.kernel", "skeleton_encoder.convs.4.conv.weight", "skeleton_encoder.convs.4.activate.bias", "target_image_renderer.convs.5.conv0.conv.weight", "target_image_renderer.convs.5.conv0.blur.kernel", "target_image_renderer.convs.5.conv0.activate.bias", "target_image_renderer.convs.5.conv1.conv.weight", "target_image_renderer.convs.5.conv1.activate.bias", "target_image_renderer.convs.5.to_rgb.upsample.kernel", "target_image_renderer.convs.5.to_rgb.conv.weight", "target_image_renderer.convs.5.to_rgb.conv.bias", "target_image_renderer.convs.4.conv0.distribution_operation.semantic_distribution_filter.weight", "target_image_renderer.convs.4.conv0.distribution_operation.semantic_distribution_filter.bias", "target_image_renderer.convs.4.conv1.distribution_operation.semantic_distribution_filter.weight", "target_image_renderer.convs.4.conv1.distribution_operation.semantic_distribution_filter.bias".
            size mismatch for reference_encoder.first.conv.weight: copying a param with shape torch.Size([64, 3, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 3, 1, 1]).
            size mismatch for reference_encoder.first.activate.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
            size mismatch for reference_encoder.convs.0.conv.weight: copying a param with shape torch.Size([128, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
            size mismatch for reference_encoder.convs.0.activate.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
            size mismatch for reference_encoder.convs.0.extraction_operations.0.value_conv.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
            size mismatch for reference_encoder.convs.0.extraction_operations.0.value_conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
            size mismatch for reference_encoder.convs.0.extraction_operations.0.semantic_extraction_filter.weight: copying a param with shape torch.Size([64, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 3, 3]).
            size mismatch for reference_encoder.convs.0.extraction_operations.1.value_conv.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
            size mismatch for reference_encoder.convs.0.extraction_operations.1.value_conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
            size mismatch for reference_encoder.convs.0.extraction_operations.1.semantic_extraction_filter.weight: copying a param with shape torch.Size([64, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 3, 3]).
            size mismatch for reference_encoder.convs.1.conv.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
            size mismatch for reference_encoder.convs.1.activate.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
            size mismatch for reference_encoder.convs.1.extraction_operations.0.value_conv.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
            size mismatch for reference_encoder.convs.1.extraction_operations.0.value_conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
            size mismatch for reference_encoder.convs.1.extraction_operations.0.semantic_extraction_filter.weight: copying a param with shape torch.Size([64, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 512, 3, 3]).
            size mismatch for reference_encoder.convs.1.extraction_operations.1.value_conv.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
            size mismatch for reference_encoder.convs.1.extraction_operations.1.value_conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
            size mismatch for reference_encoder.convs.1.extraction_operations.1.semantic_extraction_filter.weight: copying a param with shape torch.Size([64, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 512, 3, 3]).
            size mismatch for reference_encoder.convs.2.conv.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
            size mismatch for reference_encoder.convs.3.extraction_operations.0.value_conv.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
            size mismatch for reference_encoder.convs.3.extraction_operations.0.semantic_extraction_filter.weight: copying a param with shape torch.Size([32, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 512, 1, 1]).
            size mismatch for reference_encoder.convs.3.extraction_operations.1.value_conv.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
            size mismatch for reference_encoder.convs.3.extraction_operations.1.semantic_extraction_filter.weight: copying a param with shape torch.Size([32, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 512, 1, 1]).
            size mismatch for skeleton_encoder.first.conv.weight: copying a param with shape torch.Size([64, 20, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 20, 1, 1]).
            size mismatch for skeleton_encoder.first.activate.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
            size mismatch for skeleton_encoder.convs.0.conv.weight: copying a param with shape torch.Size([128, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
            size mismatch for skeleton_encoder.convs.0.activate.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
            size mismatch for skeleton_encoder.convs.1.conv.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
            size mismatch for skeleton_encoder.convs.1.activate.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
            size mismatch for skeleton_encoder.convs.2.conv.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
            size mismatch for target_image_renderer.convs.2.conv0.distribution_operation.semantic_distribution_filter.weight: copying a param with shape torch.Size([32, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 512, 3, 3]).
            size mismatch for target_image_renderer.convs.2.conv0.distribution_operation.semantic_distribution_filter.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
            size mismatch for target_image_renderer.convs.2.conv1.distribution_operation.semantic_distribution_filter.weight: copying a param with shape torch.Size([32, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 512, 3, 3]).
            size mismatch for target_image_renderer.convs.2.conv1.distribution_operation.semantic_distribution_filter.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
    
    opened by cuiaiyu 1
  • Checking test data

    Hey, thanks for your work! I checked some of the test images (more specifically fashionMENShirts_Polosid0000180202_4full.jpg) and got very good results.

    [image: 02_4_full_2_fashionMENPantsid0000014302_7additional]

    I was very impressed by the model's ability to reproduce the left-shoulder tattoo, so I added some black lines and got the same result.

    [image: WhatsApp Image 2022-07-07 at 11 10 57 AM_2_fashionMENPantsid0000014302_7additional]

    Why did the model not generate the new lines but instead give the same result?

    Thanks

    opened by orydatadudes 1
  • The metrics reported in the NTED paper are inconsistent with the ones in GFLA

    I noticed that the quantitative results in your new NTED paper are inconsistent with those in GFLA. Could you explain this and point out the differences in your metrics calculation?

    [image: NTED results]  [image: GFLA results]

    opened by mlyarthur 3