HairCLIP: Design Your Hair by Text and Reference Image

Overview

This repository hosts the official PyTorch implementation of the paper: "HairCLIP: Design Your Hair by Text and Reference Image".

Our single framework supports hairstyle and hair color editing, individually or jointly, and conditional inputs can come from either the image or the text domain.

Tianyi Wei¹, Dongdong Chen², Wenbo Zhou¹, Jing Liao³, Zhentao Tan¹, Lu Yuan², Weiming Zhang¹, Nenghai Yu¹
¹University of Science and Technology of China, ²Microsoft Cloud AI, ³City University of Hong Kong

Abstract

Hair editing is an interesting and challenging problem in computer vision and graphics. Many existing methods require well-drawn sketches or masks as conditional inputs for editing; however, these interactions are neither straightforward nor efficient. In order to free users from the tedious interaction process, this paper proposes a new hair editing interaction mode, which enables manipulating hair attributes individually or jointly based on the texts or reference images provided by users. For this purpose, we encode the image and text conditions in a shared embedding space and propose a unified hair editing framework by leveraging the powerful image-text representation capability of the Contrastive Language-Image Pre-Training (CLIP) model. With the carefully designed network structures and loss functions, our framework can perform high-quality hair editing in a disentangled manner. Extensive experiments demonstrate the superiority of our approach in terms of manipulation accuracy, visual realism of editing results, and irrelevant attribute preservation.
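
As a mental model of the pipeline described above, the sketch below shows how the pieces could fit together: the condition (text or reference image) is encoded with CLIP, a hair mapper predicts an offset in StyleGAN's W+ latent space, and the generator decodes the edited latent. This is a conceptual illustration only, not the authors' implementation; `clip_model`, `hair_mapper`, `stylegan_generator`, and `preprocess` are placeholder names.

    import torch

    # Conceptual sketch (not the official implementation). Text and reference-image
    # conditions are encoded into CLIP's shared embedding space; a trained hair mapper
    # predicts an offset for the W+ latent code (shape [1, 18, 512]), and the StyleGAN
    # generator decodes the edited latent into the edited face image.

    @torch.no_grad()
    def edit_hair(w_plus, condition_embedding, hair_mapper, stylegan_generator):
        """Apply a hair edit to a W+ latent code using a CLIP condition embedding."""
        delta_w = hair_mapper(w_plus, condition_embedding)  # predicted latent offset
        return stylegan_generator(w_plus + delta_w)         # decode the edited latent

    # Text and image conditions land in the same embedding space, e.g. (placeholders):
    #   text_emb  = clip_model.encode_text(clip.tokenize(["red hair"]).cuda())
    #   image_emb = clip_model.encode_image(preprocess(reference_img).unsqueeze(0).cuda())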

Comparison

Comparison to Text-Driven Image Manipulation Methods

Comparison to Hair Transfer Methods

Application

Hair Interpolation

Generalization Ability to Unseen Descriptions

Cross-Modal Conditional Inputs

To Do

  • Release testing code
  • Release pretrained model
  • Release training code

Comments

  • Expect hair color change only, but the hairstyle of some results changes

    I want to change only the hair color on FFHQ data, but the hairstyle of some of the results changes as well. Did I do something wrong? The following is my command:

    python scripts/inference.py \
    --exp_dir=./experiment \
    --checkpoint_path=../pretrained_models/hairclip.pt \
    --latents_test_path=./latents.pt \
    --editing_type=color \
    --input_type=text \
    --color_description=red

    00001-0000-red hair

    opened by kasim0226 7
  • Demo Play?

    Hi. 🤗 This is awesome work. 👍 Thanks to all of you, the contributors. 🌹 I am wondering whether you have any plans to make a public demo on Hugging Face Spaces or a similar platform. 🤔

    opened by ZenMoore 3
  • Can I use my own image for testing?

    Hello, can I use my own image for testing? I found that the input is a test_face.pt file (the test dataset?), and I did not find where an input image is read in the code. The only thing that looks like an input image is w (torch.Size([1, 18, 512])), but that is not the shape of a picture.

    opened by liuzhuangyuan 2
  • The generated image is quite different from the reference image

    I tested the model and found that the hairstyle of the generated image is quite different from that of the reference image. Here is my test script. The reference image is selected from the CelebAMask-HQ dataset. Is there a problem with my test process?

    python scripts/inference.py \
    --exp_dir=../outputs/0321/ \
    --checkpoint_path=../pretrained_models/hairclip.pt \
    --latents_test_path=../pretrained_models/test_faces.pt \
    --editing_type=both \
    --input_type=image_image \
    --color_ref_img_test_path=../input/16 \
    --hairstyle_ref_img_test_path=../input/16 \
    --num_of_ref_img 1

    opened by 1273545169 2
  • About pretrained UNet inference

    mask_512 = (torch.unsqueeze(torch.max(labels_predict, 1)[1], 1)==13).float()

    1. Why does hair equal 13, and the background not equal 13?
    2. The UNet inference results have 19 channels; what do they mean?

    (A sketch of how this mask extraction appears to work is included after this comments list.)

    opened by eeric 2
  • About the training details

    Thank you for your great project!

    In the paper, you said, "We train and evaluate our hair mapper on the CelebA-HQ dataset. Since we use e4e [43] as our inversion encoder, we follow its division of the training set and test set." However, I found that e4e used the FFHQ dataset for training and the CelebA-HQ test dataset for evaluation, so I am confused. My question is: how did you split the training and test sets on the CelebA-HQ dataset?

    opened by bb12346 2
  • About the modulation module

    Hi, great work! I have a question about the modulation module of the mapper network. I assume the dimensions of x and e should be 1x1xC. If so, what are the mean and std of x? A channel-wise average? And what are the output dimensions of fr(e) and fb(e)? (One possible reading of this module is sketched after this comments list.)

    Thanks.

    opened by janchen0611 2
  • Error when testing with two images

    Input command:

    E:\Linux\XSpace\papers\HairCLIP\mapper>python scripts/inference.py --exp_dir=E:\Linux\XSpace\papers\HairCLIP\data\exp --checkpoint_path=F:\Dataset\CelebA\Data\hairclip.pt --latents_test_path=F:\Dataset\CelebA\Data\test_faces.pt --editing_type=color --input_type=image --hairstyle_description="hairstyle_list.txt" --color_ref_img_test_path=E:\Linux\XSpace\papers\HairCLIP\data\ref

    An error occurred at x = clip_model.encode_image(masked_generated_renormed) in latent_mappers.py. The error message is as follows:

    *** RuntimeError: The following operation failed in the TorchScript interpreter.
    Traceback of TorchScript, serialized code (most recent call last):
      File "code/torch/multimodal/model/multimodal_transformer/___torch_mangle_9591.py", line 19, in encode_image
        return (_0).forward(input, )
      ...
      File "code/torch/torch/nn/modules/activation/___torch_mangle_9369.py", line 38, in forward
        attn_output_weights = torch.bmm(q2, torch.transpose(k0, 1, 2))
    Traceback of TorchScript, original code (most recent call last):
      /opt/conda/lib/python3.7/site-packages/torch/nn/functional.py(4294): multi_head_attention_forward
      ...
      /root/workspace/multimodal-pytorch/multimodal/model/multimodal_transformer.py(221): visual_forward
    RuntimeError: cublas runtime error : unknown error at C:/cb/pytorch_1000000000000/work/aten/src/THC/THCBlas.cu:225

    (Pdb) img_tensor.shape
    torch.Size([1, 3, 1024, 1024])

    Is the size of the input tensor wrong?

    opened by hello-lx 1
  • Is this normal speed?

    Hello, I want to ask whether the speed of running inference.py for testing is normal. This is my command:

    cd mapper
    python scripts/inference.py \
    --exp_dir=/home/ps/HairCLIP/mapper/path/to/experiment \
    --checkpoint_path=/home/ps/HairCLIP/pretrained_models/hairclip.pt \
    --latents_test_path=/home/ps/HairCLIP/mapper/path/to/test_faces.pt \
    --editing_type=hairstyle \
    --input_type=text \
    --hairstyle_description="/home/ps/HairCLIP/mapper/hairstyle_list.txt"

    opened by sunhaha123 1
  • Hairstyles can only show so much

    First of all, thanks for your excellent work! There are many hairstyles in hairstyle_list.txt, but after trying all of them I found only a few distinct styles in the result images; most results more or less repeat the following images.

    • cornrows cut hairstyle

    • crew cut hairstyle (the dots on the left lens of the glasses in the right image are the mouse cursor)

    The following is my command:

    python scripts/inference.py \
    --exp_dir=../result/test_1/ \
    --checkpoint_path=../pretrained_models/hairclip.pt \
    --latents_test_path=../inference_data/test_1/latent.pt \
    --editing_type=hairstyle \
    --input_type=text \
    --hairstyle_description="hairstyle_list.txt"

    What's the problem? Should I train with my own dataset?

    I list some hairstyles that have the same effect:

      1. The same as cornrows: crown braid hairstyle, dreadlocks hairstyle, finger waves hairstyle, french braid hairstyle, and so on.
      2. The same as crew cut hairstyle: caesar cut hairstyle, dido flip hairstyle, extensions hairstyle, fade hairstyle, fauxhawk hairstyle, frosted tops hairstyle, full crown hairstyle, harvard clip hairstyle, high and tight hairstyle, hime cut hairstyle, hi-top fade hairstyle, and so on.
    opened by ZziTaiLeo 1
  • Hosting the HairCLIP model

    Hi!

    First off, thank you for your work!

    I'm trying to create a Colab notebook to play with your model, but since the weights are hosted on Google Drive, its download limits seem to prevent me from simply downloading them with gdown or wget.

    Could I download the model and move it to another hosting service (e.g., archive.org) to avoid this issue? Of course, I would add references to all the authors and parties involved.

    Again, thanks for your work!

    opened by ouhenio 1
  • F and C

    Hello. I noticed that the network structure diagram in the paper may be drawn incorrectly: F should stand for fine, meaning high-level semantic information, and C should stand for coarse, meaning low-level semantic information.

    opened by 123456klk1 0
  • Question about the dataset split (train.pt and test.pt)

    @wty-ustc Thank you for the amazing work! I tried to split CelebA-HQ using the official list_eval_partition.txt and got 24183/2993/2824 images for the training/validation/test splits. However, I found that the length of train.pt is 24176, so I'm confused about which data you used.

    opened by ssxxx1a 0
  • About video hair editing

    Thank you for your great work! Do you think video hair editing based on HairCLIP is achievable? I made a small attempt, but the hairstyle region is still hard to control, and consistency of the hairstyle across frames is quite difficult to maintain. Can you give me some insights about video hairstyle editing?

    opened by ZziTaiLeo 0
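
Regarding the "About pretrained UNet inference" question above: the quoted line suggests the face parser outputs 19 class channels (the CelebAMask-HQ label set), the per-pixel argmax gives a label map, and class index 13 corresponds to hair. Below is a minimal sketch of that reading; the label ordering and the parser network itself are assumptions inferred from the issue, not confirmed details.

    import torch

    def hair_mask_from_parsing(labels_predict: torch.Tensor) -> torch.Tensor:
        """Extract a binary hair mask from face-parsing logits.

        Assumes labels_predict has shape [B, 19, H, W] (one channel per
        CelebAMask-HQ class) and that class index 13 is 'hair', matching the
        line quoted in the issue above.
        """
        label_map = torch.max(labels_predict, 1)[1]    # [B, H, W] per-pixel class index
        mask = (label_map.unsqueeze(1) == 13).float()  # [B, 1, H, W], 1.0 where hair
        return mask

    # Usage sketch (parser_net is a hypothetical face-parsing network):
    #   labels_predict = parser_net(image_512)
    #   mask_512 = hair_mask_from_parsing(labels_predict)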
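
On the "About the modulation module" question above: the question describes normalizing the mapper feature x by its own mean and std and then scaling and shifting it with two learned functions of the CLIP condition embedding e. The sketch below is one plausible reading with channel-wise statistics; the class name, layer shapes, and exact normalization are assumptions, not the paper's definition.

    import torch
    import torch.nn as nn

    class ConditionModulation(nn.Module):
        """Sketch of a condition-modulated layer: normalize x, then scale and shift
        it with functions of the condition embedding e (one plausible reading of
        the module discussed in the issue above, not the paper's exact form)."""

        def __init__(self, dim: int, cond_dim: int):
            super().__init__()
            self.f_gamma = nn.Linear(cond_dim, dim)  # per-channel scale predicted from e
            self.f_beta = nn.Linear(cond_dim, dim)   # per-channel shift predicted from e

        def forward(self, x: torch.Tensor, e: torch.Tensor) -> torch.Tensor:
            # x: [B, C] mapper feature; e: [B, cond_dim] CLIP condition embedding.
            mu = x.mean(dim=1, keepdim=True)            # channel-wise mean of x
            sigma = x.std(dim=1, keepdim=True) + 1e-8   # channel-wise std of x
            x_norm = (x - mu) / sigma
            return self.f_gamma(e) * x_norm + self.f_beta(e)
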
Owner
Ph.D Student @ University of Science and Technology of China