Neural-Texture-Extraction-Distribution

The PyTorch implementation for our paper "Neural Texture Extraction and Distribution for Controllable Person Image Synthesis" (CVPR2022 Oral).

ArXiv | Get Started

Overview

We propose a Neural-Texture-Extraction-Distribution operation for controllable person image synthesis. Our model can be used to control the pose and appearance of a reference image:

  • Pose Control

  • Appearance Control

News

  • 2022.4.30 Colab demos are provided for quick exploration.
  • 2022.4.28 Code for PyTorch is available now!

Installation

Requirements

  • Python 3
  • PyTorch 1.7.1
  • CUDA 10.2

Conda Installation

# 1. Create a conda virtual environment.
conda create -n NTED python=3.6
conda activate NTED
conda install -c pytorch pytorch=1.7.1 torchvision cudatoolkit=10.2

# 2. Clone the Repo and Install dependencies
git clone --recursive https://github.com/RenYurui/Neural-Texture-Extraction-Distribution.git
pip install -r requirements.txt

# 3. Install mmfashion (for appearance control only)
pip install mmcv==0.5.1
pip install pycocotools==2.0.4
cd ./scripts
chmod +x insert_mmfashion2mmdetection.sh
./insert_mmfashion2mmdetection.sh
cd ../third_part/mmdetection
pip install -v -e .
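
After installation, you can quickly confirm that the expected PyTorch and CUDA versions are visible. This is a minimal, optional sanity check and not part of the official repository; the expected version numbers are taken from the Requirements above.

# Optional sanity check (illustrative, not part of the repo):
# verify that the PyTorch/CUDA versions match the requirements.
import torch

print("PyTorch version:", torch.__version__)        # expected: 1.7.1
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)           # expected: 10.2
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))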

Demo

Several demos are provided. Please first download the required resources by running:

cd scripts
./download_demos.sh

Pose Transfer

Run the following command to generate the pose transfer results:

PATH_TO_OUTPUT=./demo_results
python demo.py \
--config ./config/fashion_512.yaml \
--which_iter 495400 \
--name fashion_512 \
--file_pairs ./txt_files/demo.txt \
--input_dir ./demo_images \
--output_dir $PATH_TO_OUTPUT
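
The --file_pairs argument points to a text file listing the image pairs to process. If you want to run the demo on your own pairs, a file in the same spirit can be written as below; the comma-separated "reference,target" layout is an assumption, so check the provided ./txt_files/demo.txt for the exact format before using it.

# Hypothetical helper for building a custom pairs file for demo.py.
# The "reference,target" line format is an assumption -- verify it against
# the provided ./txt_files/demo.txt before use.
pairs = [
    ("reference_01.jpg", "target_pose_01.jpg"),
    ("reference_02.jpg", "target_pose_02.jpg"),
]
with open("./txt_files/my_pairs.txt", "w") as f:
    for reference, target in pairs:
        f.write(f"{reference},{target}\n")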

Appearance Control

Run the following command for the appearance control demo:

python appearance_control.py \
--config ./config/fashion_512.yaml \
--name fashion_512 \
--which_iter 495400 \
--input_dir ./demo_images \
--file_pairs ./txt_files/appearance_control.txt
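
Appearance control depends on the mmcv/mmdetection components installed in step 3 of the installation. The snippet below is a small optional check that those packages are importable; it is not part of the repository.

# Optional check (illustrative): confirm the mmfashion/mmdetection install.
import mmcv
import mmdet  # installed in editable mode from ./third_part/mmdetection

print("mmcv:", mmcv.__version__)    # 0.5.1 was installed above
print("mmdet:", mmdet.__version__)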

Colab Demo

Please check the Colab Demos for pose control and appearance control.

Dataset

  • Download img_highres.zip of the DeepFashion Dataset from In-shop Clothes Retrieval Benchmark.

  • Unzip img_highres.zip. You will need to request the password from the dataset maintainers. Then rename the extracted folder to img and put it under the ./dataset/deepfashion directory.

  • We split the train/test set following GFLA. Several images with significant occlusions are removed from the training set. Download the train/test pairs and the keypoints pose.zip extracted with Openpose by running:

    cd scripts
    ./download_dataset.sh

    Or you can download these files manually:

    • Download the train/test pairs from Google Drive, including train_pairs.txt, test_pairs.txt, train.lst, and test.lst. Put these files under the ./dataset/deepfashion directory.
    • Download the keypoints pose.rar extracted with Openpose from Google Drive. Unzip it and put the obtained folder under the ./dataset/deepfashion directory.
  • Run the following command to convert the images into an lmdb dataset (a quick sanity check of the result is sketched after this list).

    python -m scripts.prepare_data \
    --root ./dataset/deepfashion \
    --out ./dataset/deepfashion
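
Once the conversion finishes, you can inspect the resulting lmdb environment to make sure it is non-empty. The snippet below is an illustrative check only; the exact output path and key layout are defined by scripts/prepare_data.py and may differ (for example, the environment may live in a resolution-specific subdirectory of --out).

# Illustrative lmdb sanity check (not part of the repo); adjust the path if
# prepare_data.py writes into a subdirectory of ./dataset/deepfashion.
import lmdb

env = lmdb.open("./dataset/deepfashion", readonly=True, lock=False, readahead=False)
with env.begin(write=False) as txn:
    print("entries:", txn.stat()["entries"])
    for i, (key, value) in enumerate(txn.cursor()):
        print(key[:80], "->", len(value), "bytes")
        if i >= 4:  # only peek at the first few records
            break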

Training

This project supports multi-GPU training. The following command shows an example of training the model on 512x352 images with 4 GPUs.

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--nproc_per_node=4 \
--master_port 1234 train.py \
--config ./config/fashion_512.yaml \
--name $name_of_your_experiment

All configurations for this experiment are stored in ./config/fashion_512.yaml. If you change the number of GPUs, you may need to adjust batch_size in ./config/fashion_512.yaml so that the effective total batch size stays the same (a small worked example follows).
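
As a hedged example, if the config's batch_size is the per-GPU value (please confirm this against fashion_512.yaml and the data loader code), keeping the total batch size constant when moving from 4 GPUs to 2 looks like this; the numbers are purely illustrative.

# Illustrative arithmetic only; whether batch_size in fashion_512.yaml is
# per-GPU or total should be confirmed in the config/data loader.
total_batch_size = 16        # hypothetical total batch size used with 4 GPUs (4 per GPU)
num_gpus = 2                 # new GPU count
per_gpu_batch_size = total_batch_size // num_gpus
print(per_gpu_batch_size)    # -> 8: the value to set when training on 2 GPUs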

Inference

  • Download the trained weights for 512x352 images and 256x176 images. Put the obtained checkpoints under ./result/fashion_512 and ./result/fashion_256 respectively.

  • Run the following code to evaluate the trained model:

    # run evaluation for 512x352 images
    python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port 12345 inference.py \
    --config ./config/fashion_512.yaml \
    --name fashion_512 \
    --no_resume \
    --output_dir ./result/fashion_512/inference 
    
    # run evaluation for 256x176 images
    python -m torch.distributed.launch \
    --nproc_per_node=1 \
    --master_port 12345 inference.py \
    --config ./config/fashion_256.yaml \
    --name fashion_256 \
    --no_resume \
    --output_dir ./result/fashion_256/inference 

The generated images are saved in ./result/fashion_512/inference and ./result/fashion_256/inference.
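
If you want a quick external check of the generated images, a third-party FID tool can be run on the inference outputs. This is not the repository's official evaluation; the pytorch-fid package and the ground-truth directory below are assumptions, and the numbers it produces may differ from those reported in the paper.

# Illustrative only: FID via the third-party pytorch-fid package
# (pip install pytorch-fid). The real-image folder is a placeholder.
import subprocess

subprocess.run([
    "python", "-m", "pytorch_fid",
    "./result/fashion_512/inference",   # generated images
    "./dataset/deepfashion/img",        # placeholder path to real test images
], check=True)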

Comments
  • I only found 'extraction_softmax' in the code. Are they the same?

    https://github.com/RenYurui/Neural-Texture-Extraction-Distribution/blob/6e3f0e94c4b6a14ceafa4f8d82950e6b4674a0b3/trainers/extraction_distribution_trainer.py#L188

    opened by olream 2
  • Incomplete pose file

    Hi Ren Yurui,

    Thanks for your work.

    I encountered an error when training the model: Neural-Texture-Extraction-Distribution-main/dataset/deepfashion/pose/WOMEN/Blouses_Shirts/id_00001612/02_4_full.txt not found. The pair is listed in train_pairs.txt, but the corresponding file does not exist in the pose folder. I also found that the number of images in the img folder is larger than the number of files in the pose folder. So I would like to ask whether the pose file you provided is incomplete. If so, could you release the full pose file?

    Thanks!

    opened by moonlight703 2
  • Where is "hook_softmax"?

    I see that the attention maps are produced via hook_softmax in the code:

    In extraction_distribution_trainer.py: attn_image = attn2image(info['hook_softmax'], info['semantic_distribution'], input_image)

    However, I could not find this key in the model. Where is it defined? Thanks!

    opened by WeiYu021 1
  • Released Model Link Broken

    Hi Yurui,

    Thank you for sharing this great work with the community! Nice to see your new work released!

    The link for the 256x176 weights seems broken; would you be able to take a look at this? Also, the discriminator weights do not seem to be included in the pickle file for the 512x352 weights. Could you by chance release the trained discriminator as well?

    Thanks! Aiyu

    opened by cuiaiyu 1
  • Add docker env and web demo

    Hey @RenYurui! 👋

    This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

    This also means we can make a web demo where other people can try out your model! View it here: https://replicate.com/renyurui/controllable-person-synthesis. Currently, only the pose manipulation aspect is supported.

    We've added some examples to the web demo; please click the black "Claim this model" button so you can own and edit it.

    In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible by implementing models we like. 😊

    opened by vccheng2001 0
  • Size mismatch when loading the pretrained 256x176 model

    Hi Yurui,

    When I try to load the checkpoint for the 256x176 model with the given configuration, I get size mismatch errors.

    I then found that the 256x176 checkpoint has the same file name as the 512x352 checkpoint, so as a sanity check I loaded the 256x176 checkpoint with the 512x352 configuration, and it works. I think the link for the 256x176 checkpoint may actually contain the 512x352 checkpoint.

    Could you please take a look at this?

    Thank you again! Aiyu

    The error log is attached.

     Error(s) in loading state_dict for Generator:
            Unexpected key(s) in state_dict: "reference_encoder.convs.4.blur.kernel", "reference_encoder.convs.4.conv.weight", "reference_encoder.convs.4.activate.bias", "reference_encoder.convs.4.extraction_operations.0.value_conv.weight", "reference_encoder.convs.4.extraction_operations.0.value_conv.bias", "reference_encoder.convs.4.extraction_operations.0.semantic_extraction_filter.weight", "reference_encoder.convs.4.extraction_operations.1.value_conv.weight", "reference_encoder.convs.4.extraction_operations.1.value_conv.bias", "reference_encoder.convs.4.extraction_operations.1.semantic_extraction_filter.weight", "skeleton_encoder.convs.4.blur.kernel", "skeleton_encoder.convs.4.conv.weight", "skeleton_encoder.convs.4.activate.bias", "target_image_renderer.convs.5.conv0.conv.weight", "target_image_renderer.convs.5.conv0.blur.kernel", "target_image_renderer.convs.5.conv0.activate.bias", "target_image_renderer.convs.5.conv1.conv.weight", "target_image_renderer.convs.5.conv1.activate.bias", "target_image_renderer.convs.5.to_rgb.upsample.kernel", "target_image_renderer.convs.5.to_rgb.conv.weight", "target_image_renderer.convs.5.to_rgb.conv.bias", "target_image_renderer.convs.4.conv0.distribution_operation.semantic_distribution_filter.weight", "target_image_renderer.convs.4.conv0.distribution_operation.semantic_distribution_filter.bias", "target_image_renderer.convs.4.conv1.distribution_operation.semantic_distribution_filter.weight", "target_image_renderer.convs.4.conv1.distribution_operation.semantic_distribution_filter.bias".
            size mismatch for reference_encoder.first.conv.weight: copying a param with shape torch.Size([64, 3, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 3, 1, 1]).
            size mismatch for reference_encoder.first.activate.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
            size mismatch for reference_encoder.convs.0.conv.weight: copying a param with shape torch.Size([128, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
            size mismatch for reference_encoder.convs.0.activate.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
            size mismatch for reference_encoder.convs.0.extraction_operations.0.value_conv.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
            size mismatch for reference_encoder.convs.0.extraction_operations.0.value_conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
            size mismatch for reference_encoder.convs.0.extraction_operations.0.semantic_extraction_filter.weight: copying a param with shape torch.Size([64, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 3, 3]).
            size mismatch for reference_encoder.convs.0.extraction_operations.1.value_conv.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
            size mismatch for reference_encoder.convs.0.extraction_operations.1.value_conv.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
            size mismatch for reference_encoder.convs.0.extraction_operations.1.semantic_extraction_filter.weight: copying a param with shape torch.Size([64, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 3, 3]).
            size mismatch for reference_encoder.convs.1.conv.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
            size mismatch for reference_encoder.convs.1.activate.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
            size mismatch for reference_encoder.convs.1.extraction_operations.0.value_conv.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
            size mismatch for reference_encoder.convs.1.extraction_operations.0.value_conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
            size mismatch for reference_encoder.convs.1.extraction_operations.0.semantic_extraction_filter.weight: copying a param with shape torch.Size([64, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 512, 3, 3]).
            size mismatch for reference_encoder.convs.1.extraction_operations.1.value_conv.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
            size mismatch for reference_encoder.convs.1.extraction_operations.1.value_conv.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
            size mismatch for reference_encoder.convs.1.extraction_operations.1.semantic_extraction_filter.weight: copying a param with shape torch.Size([64, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 512, 3, 3]).
            size mismatch for reference_encoder.convs.2.conv.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
            size mismatch for reference_encoder.convs.3.extraction_operations.0.value_conv.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
            size mismatch for reference_encoder.convs.3.extraction_operations.0.semantic_extraction_filter.weight: copying a param with shape torch.Size([32, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 512, 1, 1]).
            size mismatch for reference_encoder.convs.3.extraction_operations.1.value_conv.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 1, 1]).
            size mismatch for reference_encoder.convs.3.extraction_operations.1.semantic_extraction_filter.weight: copying a param with shape torch.Size([32, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 512, 1, 1]).
            size mismatch for skeleton_encoder.first.conv.weight: copying a param with shape torch.Size([64, 20, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 20, 1, 1]).
            size mismatch for skeleton_encoder.first.activate.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
            size mismatch for skeleton_encoder.convs.0.conv.weight: copying a param with shape torch.Size([128, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
            size mismatch for skeleton_encoder.convs.0.activate.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]).
            size mismatch for skeleton_encoder.convs.1.conv.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
            size mismatch for skeleton_encoder.convs.1.activate.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]).
            size mismatch for skeleton_encoder.convs.2.conv.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
            size mismatch for target_image_renderer.convs.2.conv0.distribution_operation.semantic_distribution_filter.weight: copying a param with shape torch.Size([32, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 512, 3, 3]).
            size mismatch for target_image_renderer.convs.2.conv0.distribution_operation.semantic_distribution_filter.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
            size mismatch for target_image_renderer.convs.2.conv1.distribution_operation.semantic_distribution_filter.weight: copying a param with shape torch.Size([32, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 512, 3, 3]).
            size mismatch for target_image_renderer.convs.2.conv1.distribution_operation.semantic_distribution_filter.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
    
    opened by cuiaiyu 1
  • Checking test data

    Hey, thanks for your work! I checked some of the test images (more specifically fashionMENShirts_Polosid0000180202_4full.jpg) and got very good results.

    [image: 02_4_full_2_fashionMENPantsid0000014302_7additional]

    I was very impressed by the model's ability to reproduce the left-shoulder tattoo, so I added some black lines and got the same result.

    [image: WhatsApp Image 2022-07-07 at 11 10 57 AM_2_fashionMENPantsid0000014302_7additional]

    Why did the model not generate the new lines but instead give the same result?

    Thanks

    opened by orydatadudes 1
  • The metrics reported in the NTED paper are inconsistent with the ones in GFLA

    I noticed that the quantitative results in your new NTED paper are inconsistent with those in GFLA. Could you explain this and point out the differences in your metrics calculation?

    [image: NTED results]  [image: GFLA results]

    opened by mlyarthur 3