<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Textual Inversion

[Textual Inversion](https://huggingface.co/papers/2208.01618) is a method for generating personalized images of a concept. It works by fine-tuning a model's word embeddings on 3-5 images of a concept (for example, pixel art) that is associated with a unique token (`<sks>`). This allows you to use the `<sks>` token in your prompt to trigger the model to generate pixel art images.

Textual Inversion weights are very lightweight, typically only a few KBs, because they are just word embeddings. However, this also means the word embeddings need to be loaded after the model is loaded with [`~DiffusionPipeline.from_pretrained`].

```py
import torch
from diffusers import AutoPipelineForText2Image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")
```

Load the word embeddings with [`~loaders.TextualInversionLoaderMixin.load_textual_inversion`] and include the unique token in the prompt to activate its generation.

```py
pipeline.load_textual_inversion("sd-concepts-library/gta5-artwork")
prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, <gta5-artwork> style"
pipeline(prompt).images[0]
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_txt_embed.png" />
</div>

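Because a Textual Inversion only adds a placeholder token to the tokenizer and its embedding to the text encoder, you can inspect or remove it after loading. The snippet below is a minimal sketch that continues from the pipeline above and assumes a recent diffusers release that provides [`~loaders.TextualInversionLoaderMixin.unload_textual_inversion`].

```py
# The placeholder token is now part of the tokenizer vocabulary,
# so it resolves to a real token id instead of the unknown token id.
print(pipeline.tokenizer.convert_tokens_to_ids("<gta5-artwork>"))

# Remove the loaded embeddings again (recent diffusers releases only).
pipeline.unload_textual_inversion()
```
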
Textual Inversion can also be trained to learn *negative embeddings* to steer generation away from unwanted characteristics such as "blurry" or "ugly". This is useful for improving image quality.

EasyNegative is a widely used negative embedding that contains multiple learned negative concepts. Load the negative embedding and specify the file name and the token associated with it. Pass the token to `negative_prompt` in your pipeline to activate it.

```py
import torch
from diffusers import AutoPipelineForText2Image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")
pipeline.load_textual_inversion(
    "EvilEngine/easynegative",
    weight_name="easynegative.safetensors",
    token="easynegative"
)
prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration"
negative_prompt = "easynegative"
pipeline(prompt, negative_prompt=negative_prompt).images[0]
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png" />
</div>

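Both kinds of embeddings can be combined in the same pipeline by calling [`~loaders.TextualInversionLoaderMixin.load_textual_inversion`] once per embedding. The example below is a minimal sketch that reuses the repositories from this page, generating in the `<gta5-artwork>` style while applying the EasyNegative embedding.

```py
import torch
from diffusers import AutoPipelineForText2Image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Load a style concept and a negative embedding into the same pipeline.
pipeline.load_textual_inversion("sd-concepts-library/gta5-artwork")
pipeline.load_textual_inversion(
    "EvilEngine/easynegative",
    weight_name="easynegative.safetensors",
    token="easynegative"
)

prompt = "A cute brown bear eating a slice of pizza, stunning color scheme, masterpiece, illustration, <gta5-artwork> style"
pipeline(prompt, negative_prompt="easynegative").images[0]
```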