<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# DiffEdit
[DiffEdit: Diffusion-based semantic image editing with mask guidance](https://huggingface.co/papers/2210.11427) is by Guillaume Couairon, Jakob Verbeek, Holger Schwenk, and Matthieu Cord.
The abstract from the paper is:
*Image generation has recently seen tremendous advances, with diffusion models allowing to synthesize convincing images for a large variety of text prompts. In this article, we propose DiffEdit, a method to take advantage of text-conditioned diffusion models for the task of semantic image editing, where the goal is to edit an image based on a text query. Semantic image editing is an extension of image generation, with the additional constraint that the generated image should be as similar as possible to a given input image. Current editing methods based on diffusion models usually require to provide a mask, making the task much easier by treating it as a conditional inpainting task. In contrast, our main contribution is able to automatically generate a mask highlighting regions of the input image that need to be edited, by contrasting predictions of a diffusion model conditioned on different text prompts. Moreover, we rely on latent inference to preserve content in those regions of interest and show excellent synergies with mask-based diffusion. DiffEdit achieves state-of-the-art editing performance on ImageNet. In addition, we evaluate semantic image editing in more challenging settings, using images from the COCO dataset as well as text-based generated images.*
The original codebase can be found at [Xiang-cd/DiffEdit-stable-diffusion](https://github.com/Xiang-cd/DiffEdit-stable-diffusion), and you can try it out in this [demo](https://blog.problemsolversguild.com/technical/research/2022/11/02/DiffEdit-Implementation.html).
This pipeline was contributed by [clarencechen](https://github.com/clarencechen). ❤️
## Tips
* The pipeline can generate masks that can be fed into other inpainting pipelines; a sketch of this handoff appears after this list.
* In order to generate an image using this pipeline, both an image mask (either manually specified, or generated from source and target prompts with [`~StableDiffusionDiffEditPipeline.generate_mask`]) and a set of partially inverted latents (generated using [`~StableDiffusionDiffEditPipeline.invert`]) _must_ be provided as arguments when calling the pipeline to generate the final edited image. The end-to-end sketch after this list shows how these pieces fit together.
* The function [`~StableDiffusionDiffEditPipeline.generate_mask`] exposes two prompt arguments, `source_prompt` and `target_prompt`, that let you control the locations of the semantic edits in the final generated image. Say you wanted to translate from "cat" to "dog"; the edit direction is then "cat -> dog". To reflect this in the generated mask, you simply have to set the embeddings related to the phrases including "cat" to `source_prompt` and "dog" to `target_prompt`.
* When generating partially inverted latents using `invert`, assign a caption or text embedding describing the overall image to the `prompt` argument to help guide the inverse latent sampling process. In most cases, the source concept is sufficiently descriptive to yield good results, but feel free to explore alternatives.
* When calling the pipeline to generate the final edited image, assign the source concept to `negative_prompt` and the target concept to `prompt`. Taking the above example, you simply have to set the embeddings related to the phrases including "cat" to `negative_prompt` and "dog" to `prompt`.
* If you wanted to reverse the direction in the example above, i.e., "dog -> cat", then it's recommended to (a sketch of the swap appears after this list):
    * Swap the `source_prompt` and `target_prompt` in the arguments to `generate_mask`.
    * Change the input prompt in [`~StableDiffusionDiffEditPipeline.invert`] to include "dog".
    * Swap the `prompt` and `negative_prompt` in the arguments to call the pipeline to generate the final edited image.
* The source and target prompts, or their corresponding embeddings, can also be automatically generated. Please refer to the [DiffEdit](../../using-diffusers/diffedit) guide for more details.
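
The tips above decompose into three calls: generate a mask, partially invert the image, then denoise toward the target concept. Below is a minimal end-to-end sketch; the checkpoint, example image URL, prompts, and resolution are illustrative assumptions rather than requirements (see the [DiffEdit](../../using-diffusers/diffedit) guide for a full walkthrough).

```py
import torch
from diffusers import DDIMInverseScheduler, DDIMScheduler, StableDiffusionDiffEditPipeline
from diffusers.utils import load_image

pipeline = StableDiffusionDiffEditPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16, use_safetensors=True
)
# DiffEdit relies on DDIM inversion, so pair the scheduler with its inverse.
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config)
pipeline.enable_model_cpu_offload()

# Example input image; any RGB image works.
img_url = "https://github.com/Xiang-cd/DiffEdit-stable-diffusion/raw/main/assets/origin.png"
raw_image = load_image(img_url).resize((768, 768))

source_prompt = "a bowl of fruits"
target_prompt = "a bowl of pears"

# 1. Contrast predictions under the two prompts to get an edit mask.
mask_image = pipeline.generate_mask(
    image=raw_image, source_prompt=source_prompt, target_prompt=target_prompt
)
# 2. Partially invert the image to latents, guided by a caption of the source image.
inv_latents = pipeline.invert(prompt=source_prompt, image=raw_image).latents
# 3. Denoise the inverted latents toward the target concept, editing inside the mask.
edited_image = pipeline(
    prompt=target_prompt,
    mask_image=mask_image,
    image_latents=inv_latents,
    negative_prompt=source_prompt,
).images[0]
```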
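Reversing the edit direction only requires swapping the roles of the two concepts, as described in the tips. Continuing from the sketch above ("pears -> fruits" here), a hedged illustration:

```py
# Reverse the edit direction by swapping the roles of the two concepts.
mask_image = pipeline.generate_mask(
    image=raw_image, source_prompt=target_prompt, target_prompt=source_prompt
)
inv_latents = pipeline.invert(prompt=target_prompt, image=raw_image).latents
reversed_image = pipeline(
    prompt=source_prompt,
    mask_image=mask_image,
    image_latents=inv_latents,
    negative_prompt=target_prompt,
).images[0]
```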
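As noted in the first tip, the mask from [`~StableDiffusionDiffEditPipeline.generate_mask`] can also feed other inpainting pipelines. A rough sketch, continuing from the variables above and assuming the mask comes back as a latent-resolution binary array that must be upscaled to image resolution; the inpainting checkpoint is only an example:

```py
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Convert the latent-resolution mask array to a full-resolution grayscale PIL image.
mask_pil = Image.fromarray((mask_image.squeeze() * 255).astype("uint8"), mode="L").resize((768, 768))

inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
)
inpaint.enable_model_cpu_offload()
inpainted = inpaint(prompt=target_prompt, image=raw_image, mask_image=mask_pil).images[0]
```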
## StableDiffusionDiffEditPipeline
[[autodoc]] StableDiffusionDiffEditPipeline
	- all
	- generate_mask
	- invert
	- __call__
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput