mirror of
https://github.com/huggingface/diffusers.git
synced 2025-12-24 13:24:49 +08:00
[docs] Textual inversion inference (#3473)
* add textual inversion inference to docs * add to toctree --------- Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
# Textual inversion

[[open-in-colab]]

The [`StableDiffusionPipeline`] supports textual inversion, a technique that enables a model like Stable Diffusion to learn a new concept from just a few sample images. The concept is stored as a learned embedding tied to a new placeholder token, which you can then use in your prompts like any other word. This gives you more control over the generated images and allows you to tailor the model towards specific concepts. You can get started quickly with a collection of community-created concepts in the [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer).

This guide will show you how to run inference with textual inversion using a pre-learned concept from the Stable Diffusion Conceptualizer. If you're interested in teaching a model new concepts with textual inversion, take a look at the [Textual Inversion](./training/text_inversion) training guide.

Log in to your Hugging Face account:

```py
from huggingface_hub import notebook_login

notebook_login()
```
Import the necessary libraries and create a helper function that arranges the generated images in a grid:

```py
import torch
from PIL import Image

from diffusers import StableDiffusionPipeline


def image_grid(imgs, rows, cols):
    """Paste `rows * cols` equally sized PIL images into a single grid image."""
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    # image i goes to column i % cols and row i // cols, so the grid fills row by row
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
```
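`image_grid` pastes image `i` at column `i % cols` and row `i // cols`, so the images fill the grid row by row and the final canvas is `cols * w` pixels wide and `rows * h` pixels tall. The stdlib-only sketch below reproduces just that box arithmetic without PIL, which makes the layout easy to check; `grid_boxes` is a hypothetical helper for illustration, not part of Diffusers.

```python
def grid_boxes(n_images, rows, cols, w, h):
    """Return the (x, y) top-left corner where each image would be pasted.

    Mirrors the arithmetic in image_grid: image i goes to column i % cols
    and row i // cols, so the grid is filled row by row.
    """
    if n_images != rows * cols:
        raise ValueError(f"expected {rows * cols} images, got {n_images}")
    return [((i % cols) * w, (i // cols) * h) for i in range(n_images)]


# Four 512x512 images in a 2x2 grid.
print(grid_boxes(4, rows=2, cols=2, w=512, h=512))
# [(0, 0), (512, 0), (0, 512), (512, 512)]
```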
Pick a Stable Diffusion checkpoint and a pre-learned concept from the [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer):

```py
pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
repo_id_embeds = "sd-concepts-library/cat-toy"
```
Now you can load a pipeline and pass the pre-learned concept to it with `load_textual_inversion`:

```py
# float16 weights roughly halve memory use compared to float32; this assumes a CUDA GPU
pipeline = StableDiffusionPipeline.from_pretrained(pretrained_model_name_or_path, torch_dtype=torch.float16).to("cuda")

# download the learned embedding from the Hub and register its placeholder token
pipeline.load_textual_inversion(repo_id_embeds)
```
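Conceptually, loading a textual inversion embedding adds the placeholder token to the tokenizer's vocabulary and writes its learned vector into a new row of the text encoder's token-embedding table; the existing rows, the UNet, and the VAE are left unchanged. The toy sketch below illustrates only that idea with plain Python lists. The `ToyTextEncoder` class, its whitespace tokenizer, and the 4-dimensional vectors are hypothetical stand-ins, not the Diffusers or CLIP implementation.

```python
class ToyTextEncoder:
    """Hypothetical stand-in for a tokenizer plus a token-embedding table."""

    def __init__(self, vocab, dim):
        self.dim = dim
        self.token_to_id = {}
        self.embeddings = []  # one row (list of floats) per token id
        for token in vocab:
            # placeholder zeros; a real model has pretrained vectors here
            self.add_token(token, [0.0] * dim)

    def add_token(self, token, vector):
        # Textual inversion only appends a new row; existing rows stay as they are.
        if token in self.token_to_id:
            raise ValueError(f"token {token!r} already exists")
        if len(vector) != self.dim:
            raise ValueError(f"expected a vector of length {self.dim}")
        self.token_to_id[token] = len(self.embeddings)
        self.embeddings.append(list(vector))

    def encode(self, prompt):
        # Whitespace tokenization keeps the sketch short; CLIP uses a BPE tokenizer.
        return [self.embeddings[self.token_to_id[t]] for t in prompt.split()]


encoder = ToyTextEncoder(["a", "photo", "of"], dim=4)
# made-up numbers standing in for a learned concept embedding
encoder.add_token("<cat-toy>", [0.1, -0.2, 0.3, 0.05])
print(encoder.token_to_id["<cat-toy>"])  # 3
print(encoder.encode("a photo of <cat-toy>")[-1])  # [0.1, -0.2, 0.3, 0.05]
```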
Create a prompt with the pre-learned concept by using the special placeholder token `<cat-toy>`, and choose how many images to generate per pipeline call (`num_samples`) and how many times to call the pipeline (`num_rows`):

```py
prompt = "a graffiti in a favela wall with a <cat-toy> on it"

num_samples = 2  # images generated per pipeline call (columns of the grid)
num_rows = 2  # number of pipeline calls (rows of the grid)
```
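The pipeline call below takes a `guidance_scale` argument, which controls classifier-free guidance: at each denoising step, Stable Diffusion predicts the noise once conditioned on the prompt and once unconditionally (on an empty or negative prompt), then combines them as `uncond + guidance_scale * (text - uncond)`. A scale of 1 reduces to the text-conditioned prediction alone, and larger values push the result further toward the prompt. The sketch below applies that combination to plain lists of floats; `apply_guidance` is a hypothetical helper, and the real pipeline operates on tensors.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance on plain lists of floats.

    Extrapolates from the unconditional prediction toward (and past)
    the text-conditioned one by a factor of guidance_scale.
    """
    if len(noise_uncond) != len(noise_text):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


uncond = [0.0, 1.0]
text = [0.5, 0.5]
print(apply_guidance(uncond, text, 1.0))  # [0.5, 0.5]
print(apply_guidance(uncond, text, 7.5))  # [3.75, -2.75]
```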
Then run the pipeline (feel free to adjust parameters like `num_inference_steps` and `guidance_scale` to see how they affect image quality), collect the generated images, and visualize them with the helper function you created at the beginning:

```py
all_images = []
for _ in range(num_rows):
    # each call returns num_samples images, which form one row of the grid
    images = pipeline(prompt, num_images_per_prompt=num_samples, num_inference_steps=50, guidance_scale=7.5).images
    all_images.extend(images)

grid = image_grid(all_images, rows=num_rows, cols=num_samples)
grid  # displays the grid when it is the last line of a notebook cell
```
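Beyond the single run above (its output grid is shown below), you could compare settings more systematically by enumerating combinations of `num_inference_steps` and `guidance_scale` and calling the pipeline once per combination. The sketch below only builds the list of keyword-argument dicts with the standard library; the `sweep_configs` helper and the specific values are illustrative assumptions, and the pipeline call is left as a comment so the snippet runs without a GPU.

```python
from itertools import product


def sweep_configs(steps_options, guidance_options):
    """Build one kwargs dict per (num_inference_steps, guidance_scale) pair."""
    return [
        {"num_inference_steps": steps, "guidance_scale": scale}
        for steps, scale in product(steps_options, guidance_options)
    ]


configs = sweep_configs([25, 50], [5.0, 7.5, 10.0])
print(len(configs))  # 6
print(configs[0])  # {'num_inference_steps': 25, 'guidance_scale': 5.0}

# With the pipeline from this guide loaded, each config could be used as:
# for kwargs in configs:
#     images = pipeline(prompt, num_images_per_prompt=num_samples, **kwargs).images
```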
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/textual_inversion_inference.png">
</div>