Official implementation for: Blended Diffusion for Text-driven Editing of Natural Images.

Overview

Blended Diffusion for Text-driven Editing of Natural Images

Omri Avrahami, Dani Lischinski, Ohad Fried

Abstract: Natural language offers a highly intuitive interface for image editing. In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask. We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit towards a user-provided text prompt, with a denoising diffusion probabilistic model (DDPM) to generate natural-looking results. To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent at a progression of noise levels. In addition, we show that adding augmentations to the diffusion process mitigates adversarial results. We compare against several baselines and related methods, both qualitatively and quantitatively, and show that our method outperforms these solutions in terms of overall realism, ability to preserve the background and matching the text. Finally, we show several text-driven editing applications, including adding a new object to an image, removing/replacing/altering existing objects, background replacement, and image extrapolation.
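The blending step described in the abstract can be illustrated with a small sketch. This is a toy, standard-library-only illustration rather than the authors' implementation: the flattened four-pixel "image", the function names `noise_to_level` and `blend`, the `alpha_bar` value, and the `guided_latent` stand-in are all assumptions made for the example. The noising formula is the standard DDPM forward process.

```python
import math
import random


def noise_to_level(x0, alpha_bar, rng):
    """Standard DDPM forward noising: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [signal * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


def blend(foreground, background, mask):
    """Keep foreground where mask is 1 and background where mask is 0."""
    return [m * f + (1.0 - m) * b for f, b, m in zip(foreground, background, mask)]


rng = random.Random(0)
input_image = [0.2, 0.4, 0.6, 0.8]      # flattened toy "image" (hypothetical values)
mask = [0.0, 1.0, 1.0, 0.0]             # 1 marks the region to edit
guided_latent = [0.9, -0.3, 0.5, 0.1]   # stand-in for one text-guided denoising step

# Noise the input image to the current noise level, then paste the guided
# latent into the masked region and the noised input everywhere else.
background = noise_to_level(input_image, alpha_bar=0.5, rng=rng)
latent = blend(guided_latent, background, mask)
```

In a real sampler this blend would be repeated at every noise level of the reverse process, so the region outside the mask always tracks the (noised) input image.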

Applications

Multiple synthesis results for the same prompt

Synthesis results for different prompts

Altering part of an existing object

Background replacement

Scribble-guided editing

Text-guided extrapolation

Composing several applications

Code availability

Full code will be released soon.

Comments
  • Question about training


    Hi, this is really impressive work! Two questions here.

    1. Does the overall text-guided image editing process use only pre-trained models, without any extra training or fine-tuning?
    2. If it does not require any further fine-tuning or training, what is the purpose of the diffusion guidance loss (which combines the CLIP loss and the background preservation loss)?

    Thanks in advance for your clarification!

    opened by JacksonCakes 4
  • model_output_size


    In image_editor.py:

        self.model.load_state_dict(
            torch.load(
                "checkpoints/256x256_diffusion_uncond.pt"
                if self.args.model_output_size == 256
                else "checkpoints/512x512_diffusion.pt",
                map_location="cpu",
            )
        )

    The code mentions '512x512_diffusion'. Is it a conditional or an unconditional model? (Could you share its download link?) It seems natural that your method is built on a pretrained unconditional model. If '512x512_diffusion' is a conditional model, what is the condition for face data, for example? Could you help me figure this out?

    opened by fido20160817 2
  • Scribble-guided editing


    Hi! I wonder whether a loss such as MSE or LPIPS is used between the user-provided scribbles and the scribbled regions of $\widehat{x}_0$, in addition to the CLIP loss. I am curious how the shapes and colors stay consistent when only a text prompt with no specific description, e.g., "blanket" in Fig. 9, is given.

    opened by wileewang 2
  • AttributeError: 'PosixPath' object has no attribute 'with_stem'


    Thanks for open-sourcing your code. I really appreciate it.

    I tried to run your code in Google Colab with a GPU runtime.

    I got an error, and I couldn't find a solution despite googling.

    The error message is as follows: AttributeError: 'PosixPath' object has no attribute 'with_stem'

    I think it's related to the "pathlib" module. Maybe pathlib doesn't work well with the latest Python 3.x version, which I'm using.

    I hope this error will be solved soon QQ

    ----- Detailed description -----

    I ran the following command:

        python main.py -p "rock" -i "input_example/img.png" --mask "input_example/mask.png" --output_path "output" --batch_size 1

    And I got the following output:

        Using device: cuda:0
        tcmalloc: large alloc 2209964032 bytes == 0x89ebe000 @ 0x7fbdefa2cb6b 0x7fbdefa4c379 0x7fbd30b1026e 0x7fbd30b119e2 0x7fbd334aeee1 0x7fbdd598e236 0x7fbdd541ef98 0x593784 0x594731 0x548cc1 0x51566f 0x549e0e 0x593fce 0x5118f8 0x549e0e 0x4bcb19 0x59582d 0x595b69 0x62026d 0x55de15 0x59af67 0x515655 0x549e0e 0x4bca8a 0x5134a6 0x549576 0x593fce 0x548ae9 0x5127f1 0x4bc98a 0x532b86
        Setting up [LPIPS] perceptual loss: trunk [vgg], v[0.1], spatial [off]
        Loading model from: /usr/local/lib/python3.7/dist-packages/lpips/weights/v0.1/vgg.pth
        Start iterations 0
        0% 0/75 [00:00<?, ?it/s]
        clip_loss - 867.99
        range_loss - 0.00

        Traceback (most recent call last):
          File "main.py", line 8, in <module>
            image_editor.edit_image_by_prompt()
          File "/content/drive/MyDrive/ws/blended-diffusion/optimization/image_editor.py", line 266, in edit_image_by_prompt
            visualization_path = visualization_path.with_stem(
        AttributeError: 'PosixPath' object has no attribute 'with_stem'
        0% 0/75 [00:01<?, ?it/s]

    The code below is where the error occurs. It is from blended-diffusion/optimization/image_editor.py.

        line 1   - from pathlib import Path
        ...
        line 261 - for b in range(self.args.batch_size):
        line 262 -     pred_image = sample["pred_xstart"][b]
        line 263 -     visualization_path = Path(
        line 264 -         os.path.join(self.args.output_path, self.args.output_file)
        line 265 -     )
        line 266 -     visualization_path = visualization_path.with_stem(

    opened by ngys321 1
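For context on the error above: `Path.with_stem` was added to `pathlib` in Python 3.9, and the log in this issue shows a Python 3.7 environment, so the method is missing there. A minimal workaround sketch, assuming an upgrade to Python 3.9+ is not an option, builds the same result from `with_name`, which exists in older versions. The helper name `with_stem_compat` and the example stem are hypothetical and are not taken from the repository.

```python
from pathlib import Path


def with_stem_compat(path, new_stem):
    """Return path with its stem replaced and its suffix kept (works before Python 3.9)."""
    return path.with_name(new_stem + path.suffix)


# Hypothetical usage: append an index to the output file name.
original = Path("output") / "output.png"
renamed = with_stem_compat(original, original.stem + "_0")
print(renamed)
```

On Python 3.9 or later, `path.with_stem(new_stem)` gives the same result directly.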
  • Purpose of skip_timesteps


    Hi, I would like to ask what the purpose of skip_timesteps in the code is. I can't seem to find any information about it in the paper.

    opened by JacksonCakes 1
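One plausible reading of `skip_timesteps`, sketched below, follows a convention seen in several guided-diffusion-based samplers: the reverse process skips the noisiest steps and starts from the input image noised to the first remaining timestep, so more of the original image structure survives. This is an assumption that has not been verified against this repository, and `reverse_timesteps` is a hypothetical helper name.

```python
def reverse_timesteps(num_timesteps, skip_timesteps=0):
    """List the timesteps visited by the reverse process, noisiest first.

    Assumed convention: the first `skip_timesteps` (noisiest) steps are
    dropped, so sampling starts at index num_timesteps - skip_timesteps - 1.
    """
    if not 0 <= skip_timesteps < num_timesteps:
        raise ValueError("skip_timesteps must be in [0, num_timesteps)")
    return list(range(num_timesteps - skip_timesteps))[::-1]


# Illustrative values only: 100 sampling steps with the 25 noisiest skipped.
steps = reverse_timesteps(100, skip_timesteps=25)
print(steps[0], steps[-1], len(steps))
```

Under this reading, a larger `skip_timesteps` keeps the result closer to the input image, while a smaller value gives the model more freedom to change it.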
Minimal diffusion models - Minimal code and simple experiments to play with Denoising Diffusion Probabilistic Models (DDPMs)

Minimal code and simple experiments to play with Denoising Diffusion Probabilist

Rithesh Kumar 16 Oct 6, 2022
An image base contains 490 images for training (400 cars and 90 boats), and another 21 images for testing

SVM Data An image base contains 490 images for training (400 cars and 90 boats), and another 21 images for testing. Preprocess

Achraf Rahouti 3 Nov 30, 2021
Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data - Official PyTorch Implementation (CVPR 2022)

Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data (CVPR 2022) Potentials of primitive shapes f

null 31 Sep 27, 2022
A Repository of Community-Driven Natural Instructions

A Repository of Community-Driven Natural Instructions TLDR; this repository maintains a community effort to create a large collection of tasks and the

AI2 244 Jan 4, 2023
Flybirds - BDD-driven natural language automated testing framework, presented by Trip Flight

Flybird | English Version Behavior-driven development (abbreviated BDD) is a software process philosophy or

Ctrip, Inc. 706 Dec 30, 2022
Pytorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech"

GradTTS Unofficial Pytorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (arxiv) About this repo This is an unoffic

HeyangXue1997 103 Dec 23, 2022
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

DiffGAN-TTS - PyTorch Implementation PyTorch implementation of DiffGAN-TTS: High

Keon Lee 157 Jan 1, 2023
Official PyTorch implementation for FastDPM, a fast sampling algorithm for diffusion probabilistic models

Official PyTorch implementation for "On Fast Sampling of Diffusion Probabilistic Models". FastDPM generation on CIFAR-10, CelebA, and LSUN datasets. S

Zhifeng Kong 68 Dec 26, 2022
Official implementation for "Style Transformer for Image Inversion and Editing" (CVPR 2022)

Style Transformer for Image Inversion and Editing (CVPR2022) https://arxiv.org/abs/2203.07932 Existing GAN inversion methods fail to provide latent co

Xueqi Hu 153 Dec 2, 2022
Official PyTorch implementation of "IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos", CVPRW 2021

IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos Introduction This repo is official PyTorch implementatio

Gyeongsik Moon 29 Sep 24, 2022
A Python script that creates subtitles of a given length from text paragraphs that can be easily imported into any Video Editing software such as FinalCut Pro for further adjustments.

Text to Subtitles - Python This python file creates subtitles of a given length from text paragraphs that can be easily imported into any Video Editin

Dmytro North 9 Dec 24, 2022
InDuDoNet+: A Model-Driven Interpretable Dual Domain Network for Metal Artifact Reduction in CT Images

InDuDoNet+: A Model-Driven Interpretable Dual Domain Network for Metal Artifact Reduction in CT Images Hong Wang, Yuexiang Li, Haimiao Zhang, Deyu Men

Hong Wang 4 Dec 27, 2022
(ICCV 2021) Official code of "Dressing in Order: Recurrent Person Image Generation for Pose Transfer, Virtual Try-on and Outfit Editing."

Dressing in Order (DiOr) [Paper] [Webpage] [Running this code] The official implementation of "Dressing in Order: Recurrent Person Image Gene

Aiyu Cui 277 Dec 28, 2022
Official code release for: EditGAN: High-Precision Semantic Image Editing

Official code release for: EditGAN: High-Precision Semantic Image Editing

null 565 Jan 5, 2023
[SIGGRAPH 2022 Journal Track] AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars Fangzhou Hong1*  Mingyuan Zhang1*  Liang Pan1  Zhongang Cai1,2,3  Lei Yang2 

Fangzhou Hong 749 Jan 4, 2023
Pytorch Implementation of DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis (TTS Extension)

DiffSinger - PyTorch Implementation PyTorch implementation of DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis (TTS Extension). Status

Keon Lee 152 Jan 2, 2023
Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in Pytorch

Retrieval-Augmented Denoising Diffusion Probabilistic Models (wip) Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in P

Phil Wang 55 Jan 1, 2023
Implementation of GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation (ICLR 2022).

GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation [OpenReview] [arXiv] [Code] The official implementation of GeoDiff: A Geome

Minkai Xu 155 Dec 26, 2022
Pytorch re-implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition (CVPR 2022)

SwinTextSpotter This is the pytorch implementation of Paper: SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text R

mxin262 183 Jan 3, 2023