Compare commits

...

235 Commits

Author SHA1 Message Date
Patrick von Platen
6b2d9e6acd [Draft] MultiControlNet 2023-03-09 10:46:24 +00:00
Steven Liu
78507bda24 [docs] Build notebooks from Markdown (#2570)
* 📝 add mechanism for building colab notebook

* 🖍 add notebooks to correct folder

* 🖍 fix folder name
2023-03-08 12:13:19 +01:00
YiYi Xu
d2a5247a1f Add notebook doc img2img (#2472)
* convert img2img.mdx into notebook doc

* fix

* Update docs/source/en/using-diffusers/img2img.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

---------

Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-03-07 20:00:56 -10:00
Will Berman
309d8cf9ab add deps table check updated to ci (#2590) 2023-03-07 15:24:44 -08:00
Steven Liu
b285d94e10 [docs] Move Textual Inversion training examples to docs (#2576)
* 📝 add textual inversion examples to docs

* 🖍 apply feedback

* 🖍 add colab link
2023-03-07 14:21:18 -08:00
clarencechen
55660cfb6d Improve dynamic thresholding and extend to DDPM and DDIM Schedulers (#2528)
* Improve dynamic threshold

* Update code

* Add dynamic threshold to ddim and ddpm

* Encapsulate and leverage code copy mechanism

Update style

* Clean up DDPM/DDIM constructor arguments

* add test

* also add to unipc

---------

Co-authored-by: Peter Lin <peterlin9863@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-03-07 23:10:26 +01:00
Michael Gartsbein
46bef6e31d community stablediffusion controlnet img2img pipeline (#2584)
Co-authored-by: mishka <gartsocial@gmail.com>
2023-03-07 13:31:56 -08:00
Patrick von Platen
22a31760c4 [Docs] Weight prompting using compel (#2574)
* add docs

* correct

* finish

* Apply suggestions from code review

Co-authored-by: Will Berman <wlbberman@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* update deps table

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

---------

Co-authored-by: Will Berman <wlbberman@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-03-07 20:26:33 +01:00
zxypro
f0b661b8fb [Docs]Fix invalid link to Pokemons dataset (#2583) 2023-03-07 14:26:09 +01:00
Isamu Isozaki
8552fd7efa Added multitoken training for textual inversion. Issue 369 (#661)
* Added multitoken training for textual inversion

* Updated assertion

* Removed duplicate save code

* Fixed undefined bug

* Fixed save

* Added multitoken clip model +util helper

* Removed code splitting

* Removed class

* Fixed errors

* Fixed errors

* Added loading functionality

* Loading via dict instead

* Fixed bug of invalid index being loaded

* Fixed adding placeholder token only adding 1 token

* Fixed bug when initializing tokens

* Fixed bug when initializing tokens

* Removed flawed logic

* Fixed vector shuffle

* Fixed tokenizer's inconsistent __call__ method

* Fixed tokenizer's inconsistent __call__ method

* Handling list input

* Added exception for adding invalid tokens to token map

* Removed unnecessary files and started working on progressive tokens

* Set at minimum load one token

* Changed to global step

* Added method to load automatic1111 tokens

* Fixed bug in load

* Quality+style fixes

* Update quality/style fixes

* Cast embeddings to fp16 when loading

* Fixed quality

* Started moving things over

* Clearing diffs

* Clearing diffs

* Moved everything

* Requested changes
2023-03-07 12:09:36 +01:00
Hu Ye
e09a7d01c8 fix the default value of doc (#2539) 2023-03-07 11:40:22 +01:00
Pedro Cuenca
d3ce6f4b1e Support revision in Flax text-to-image training (#2567)
Support revision in Flax text-to-image training.
2023-03-07 08:16:31 +01:00
Steven Liu
ff91f154ee Update quicktour (#2463)
* first draft of updated quicktour

* 🖍 apply feedbacks

* 🖍 apply feedback and minor edits

* 🖍 add link to safety checker
2023-03-06 13:45:36 -08:00
Steven Liu
62bea2df36 [docs] Move text-to-image LoRA training from blog to docs (#2527)
* include text2image lora training in docs

* 🖍 apply feedback

* 🖍 minor edits
2023-03-06 13:45:07 -08:00
Steven Liu
9136be14a7 [docs] Move DreamBooth training materials to docs (#2547)
* move dbooth github stuff to docs

* add notebooks

* 🖍 minor shuffle

* 🖍 fix markdown table

* 🖍 apply feedback

*  make style

* 🖍 minor fix in code snippet
2023-03-06 13:44:30 -08:00
Steven Liu
7004ff55d5 [docs] Move relevant code for text2image to docs (#2537)
* move relevant code from text2image on GitHub to docs

* 🖍 add inference for text2image with flax

* 🖍 apply feedback
2023-03-06 13:43:45 -08:00
Will Berman
ca7ca11bcd community controlnet inpainting pipelines (#2561)
* community controlnet inpainting pipelines

* add community member attribution re: @pcuenca
2023-03-06 12:55:31 -08:00
YiYi Xu
c7da8fd233 add intermediate logging for dreambooth training script (#2557)
* add  intermediate logging
---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2023-03-06 08:13:12 -10:00
Patrick von Platen
b8bfef2ab9 make style 2023-03-06 19:11:45 +01:00
haixinxu
f3f626d556 Allow textual_inversion_flax script to use save_steps and revision flag (#2075)
* Update textual_inversion_flax.py

* Update textual_inversion_flax.py

* Typo

sorry.

* Format source
2023-03-06 19:11:27 +01:00
YiYi Xu
b7b4683bdc allow Attend-and-excite pipeline work with different image sizes (#2476)
add attn_res variable

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2023-03-06 08:06:54 -10:00
Patrick von Platen
56958e1177 [Training] Fix tensorboard typo (#2566) 2023-03-06 15:13:38 +01:00
Patrick von Platen
ec021923d2 [Unet1d] correct docs (#2565) 2023-03-06 14:36:28 +01:00
Patrick von Platen
1598a57958 make style 2023-03-06 10:51:03 +00:00
Haofan Wang
63805f8af7 Support convert LoRA safetensors into diffusers format (#2403)
* add lora convertor

* Update convert_lora_safetensor_to_diffusers.py

* Update README.md

* Update convert_lora_safetensor_to_diffusers.py
2023-03-06 11:50:46 +01:00
Sean Sube
9920c333c6 add OnnxStableDiffusionUpscalePipeline pipeline (#2158)
* [Onnx] add Stable Diffusion Upscale pipeline

* add a test for the OnnxStableDiffusionUpscalePipeline

* check for VAE config before adjusting scaling factor

* update test assertions, lint fixes

* run fix-copies target

* switch test checkpoint to one hosted on huggingface

* partially restore attention mask

* reshape embeddings after running text encoder

* add longer nightly test for ONNX upscale pipeline

* use package import to fix tests

* fix scheduler compatibility and class labels dtype

* use more precise type

* remove LMS from fast tests

* lookup latent and timestamp types

* add docs for ONNX upscaling, rename lookup table

* replace deprecated pipeline names in ONNX docs
2023-03-06 11:48:01 +01:00
Patrick von Platen
f38e3626cd make style 2023-03-06 10:40:18 +00:00
ForserX
5f826a35fb Add custom vae (diffusers type) to onnx converter (#2325) 2023-03-06 11:39:55 +01:00
Will Berman
f7278638e4 ema step, don't empty cuda cache (#2563) 2023-03-06 10:54:56 +01:00
Vico Chu
b36cbd4fba Fix: controlnet docs format (#2559) 2023-03-06 09:25:21 +01:00
Naga Sai Abhinay
2e3541d7f4 [Community Pipeline] Unclip Image Interpolation (#2400)
* unclip img interpolation poc

* Added code sample and refactoring.
2023-03-05 16:55:30 -08:00
Sanchit Gandhi
2b4f849db9 [PipelineTesterMixin] Handle non-image outputs for attn slicing test (#2504)
* [PipelineTesterMixin] Handle non-image outputs for batch/sinle inference test

* style

---------

Co-authored-by: William Berman <WLBberman@gmail.com>
2023-03-05 15:36:47 -08:00
Dhruv Nair
e4c356d3f6 Fix for InstructPix2PixPipeline to allow for prompt embeds to be passed in without prompts. (#2456)
* fix check inputs to allow prompt embeds in instruct pix2pix

* linting

* add reference comment to check inputs

* remove comment

* style changes

---------

Co-authored-by: Will Berman <wlbberman@gmail.com>
2023-03-05 11:42:50 -08:00
Nicolas Patry
2ea1da89ab Fix regression introduced in #2448 (#2551)
* Fix regression introduced in #2448

* Style.
2023-03-04 16:11:57 +01:00
Steven Liu
fa6d52d594 Training tutorial (#2473)
* first draft

*  minor edits

*  minor fixes

* 🖍 apply feedbacks

* 🖍 apply feedback and minor edits
2023-03-03 15:41:03 -08:00
Will Berman
a72a057d62 move test num_images_per_prompt to pipeline mixin (#2488)
* attend and excite batch test causing timeouts

* move test num_images_per_prompt to pipeline mixin

* style

* prompt_key -> self.batch_params
2023-03-03 11:45:07 -08:00
Laveraaa
2f489571a7 Update pipeline_stable_diffusion_inpaint_legacy.py resize to integer multiple of 8 instead of 32 for init image and mask (#2350)
Update pipeline_stable_diffusion_inpaint_legacy.py

Change resize to integer multiple of 8 instead of 32
2023-03-03 19:08:22 +01:00
alvanli
e75eae3711 Bug Fix: Remove explicit message argument in deprecate (#2421)
Remove explicit message argument
2023-03-03 19:03:16 +01:00
Alex McKinney
5e5ce13e2f adds xformers support to train_unconditional.py (#2520) 2023-03-03 18:35:59 +01:00
Patrick von Platen
7f0f7e1e91 Correct section docs (#2540) 2023-03-03 18:34:34 +01:00
Patrick von Platen
3d2648d743 [Post release] Push post release (#2546) 2023-03-03 18:11:01 +01:00
Nicolas Patry
1f4deb697f Adding support for safetensors and LoRa. (#2448)
* Adding support for `safetensors` and LoRa.

* Adding metadata.
2023-03-03 18:00:19 +01:00
Patrick von Platen
f20c8f5a1a Release: v0.14.0 2023-03-03 16:45:08 +01:00
Patrick von Platen
5b6582cf73 [Model offload] Add nice warning (#2543)
* [Model offload] Add nice warning

* Treat sequential and model offload differently.

Sequential raises an error because the operation would fail with a
cryptic warning later.

* Forcibly move to cpu when offloading.

* make style

* one more fix

* make fix-copies

* up

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-03-03 16:44:10 +01:00
Anton Lozhkov
4f0141a67d Fix ONNX checkpoint loading (#2544)
* Revert "Disable ONNX tests (#2509)"

This reverts commit a0549fea44.

* add external weights

* + pb

* style
2023-03-03 16:08:56 +01:00
Patrick von Platen
1021929313 Small fixes for controlnet (#2542)
* Small fixes for controlnet

* finish links
2023-03-03 14:20:43 +01:00
Fkunn1326
8f21a9f0e2 Fix convert SD to diffusers error (#1979) (#2529)
Fix convert sd 768 error (#1979)

Co-authored-by: tweeter0830 <tweeter0830@users.noreply.github.com>
2023-03-02 19:11:40 +01:00
Isamu Isozaki
d9b9533c7e Textual inv make save log both steps (#2178)
* Initial commit

* removed images

* Made logging the same as save

* Removed logging function

* Quality fixes

* Quality fixes

* Tested

* Added support back for validation_epochs

* Fixing styles

* Did changes

* Change to log_validation

* Add extra space after wandb import

* Add extra space after wandb

Co-authored-by: Will Berman <wlbberman@gmail.com>

* Fixed spacing

---------

Co-authored-by: Will Berman <wlbberman@gmail.com>
2023-03-02 19:04:18 +01:00
Ilmari Heikkinen
801484840a 8k Stable Diffusion with tiled VAE (#1441)
* Tiled VAE for high-res text2img and img2img

* vae tiling, fix formatting

* enable_vae_tiling API and tests

* tiled vae docs, disable tiling for images that would have only one tile

* tiled vae tests, use channels_last memory format

* tiled vae tests, use smaller test image

* tiled vae tests, remove tiling test from fast tests

* up

* up

* make style

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* make style

* improve naming

* finish

* apply suggestions

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* up

---------

Co-authored-by: Ilmari Heikkinen <ilmari@fhtr.org>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-03-02 17:42:32 +01:00
Takuma Mori
8dfff7c015 Add a ControlNet model & pipeline (#2407)
* add scaffold
- copied convert_controlnet_to_diffusers.py from
convert_original_stable_diffusion_to_diffusers.py

* Add support to load ControlNet (WIP)
- this makes Missking Key error on ControlNetModel

* Update to convert ControlNet without error msg
- init impl for StableDiffusionControlNetPipeline
- init impl for ControlNetModel

* cleanup of commented out

* split create_controlnet_diffusers_config()
from create_unet_diffusers_config()

- add config: hint_channels

* Add input_hint_block, input_zero_conv and
middle_block_out
- this makes missing key error on loading model

* add unet_2d_blocks_controlnet.py
- copied from unet_2d_blocks.py as impl CrossAttnDownBlock2D,DownBlock2D
- this makes missing key error on loading model

* Add loading for input_hint_block, zero_convs
and middle_block_out

- this makes no error message on model loading

* Copy from UNet2DConditionalModel except __init__

* Add ultra primitive test for ControlNetModel
inference

* Support ControlNetModel inference
- without exceptions

* copy forward() from UNet2DConditionModel

* Impl ControlledUNet2DConditionModel inference
- test_controlled_unet_inference passed

* Frozen weight & biases for training

* Minimized version of ControlNet/ControlledUnet
- test_modules_controllnet.py passed

* make style

* Add support model loading for minimized ver

* Remove all previous version files

* from_pretrained and inference test passed

* copied from pipeline_stable_diffusion.py
except `__init__()`

* Impl pipeline, pixel match test (almost) passed.

* make style

* make fix-copies

* Fix to add import ControlNet blocks
for `make fix-copies`

* Remove einops dependency

* Support  np.ndarray, PIL.Image for controlnet_hint

* set default config file as lllyasviel's

* Add support grayscale (hw) numpy array

* Add and update docstrings

* add control_net.mdx

* add control_net.mdx to toctree

* Update copyright year

* Fix to add PIL.Image RGB->BGR conversion
- thanks @Mystfit

* make fix-copies

* add basic fast test for controlnet

* add slow test for controlnet/unet

* Ignore down/up_block len check on ControlNet

* add a copy from test_stable_diffusion.py

* Accept controlnet_hint is None

* merge pipeline_stable_diffusion.py diff

* Update class name to SDControlNetPipeline

* make style

* Baseline fast test almost passed (w long desc)

* still needs investigate.

Following didn't passed descriped in TODO comment:
- test_stable_diffusion_long_prompt
- test_stable_diffusion_no_safety_checker

Following didn't passed same as stable_diffusion_pipeline:
- test_attention_slicing_forward_pass
- test_inference_batch_single_identical
- test_xformers_attention_forwardGenerator_pass
these seems come from calc accuracy.

* Add note comment related vae_scale_factor

* add test_stable_diffusion_controlnet_ddim

* add assertion for vae_scale_factor != 8

* slow test of pipeline almost passed
Failed: test_stable_diffusion_pipeline_with_model_offloading
- ImportError: `enable_model_offload` requires `accelerate v0.17.0` or higher

but currently latest version == 0.16.0

* test_stable_diffusion_long_prompt passed

* test_stable_diffusion_no_safety_checker passed

- due to its model size, move to slow test

* remove PoC test files

* fix num_of_image, prompt length issue add add test

* add support List[PIL.Image] for controlnet_hint

* wip

* all slow test passed

* make style

* update for slow test

* RGB(PIL)->BGR(ctrlnet) conversion

* fixes

* remove manual num_images_per_prompt test

* add document

* add `image` argument docstring

* make style

* Add line to correct conversion

* add controlnet_conditioning_scale (aka control_scales
strength)

* rgb channel ordering by default

* image batching logic

* Add control image descriptions for each checkpoint

* Only save controlnet model in conversion script

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

typo

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/control_net.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* add gerated image example

* a depth mask -> a depth map

* rename control_net.mdx to controlnet.mdx

* fix toc title

* add ControlNet abstruct and link

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py

Co-authored-by: dqueue <dbyqin@gmail.com>

* remove controlnet constructor arguments re: @patrickvonplaten

* [integration tests] test canny

* test_canny fixes

* [integration tests] test_depth

* [integration tests] test_hed

* [integration tests] test_mlsd

* add channel order config to controlnet

* [integration tests] test normal

* [integration tests] test_openpose test_scribble

* change height and width to default to conditioning image

* [integration tests] test seg

* style

* test_depth fix

* [integration tests] size fixes

* [integration tests] cpu offloading

* style

* generalize controlnet embedding

* fix conversion script

* Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Style adapted to the documentation of pix2pix

* merge main by hand

* style

* [docs] controlling generation doc nits

* correct some things

* add: controlnetmodel to autodoc.

* finish docs

* finish

* finish 2

* correct images

* finish controlnet

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* uP

* upload model

* up

* up

---------

Co-authored-by: William Berman <WLBberman@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: dqueue <dbyqin@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-03-02 15:34:07 +01:00
Will Berman
1a6fa69ab6 PipelineTesterMixin parameter configuration refactor (#2502)
* attend and excite batch test causing timeouts

* PipelineTesterMixin argument configuration refactor

* error message text re: @yiyixuxu

* remove eta re: @patrickvonplaten
2023-03-01 12:44:55 -08:00
Patrick von Platen
664b4de9e2 [Tests] Fix slow tests (#2526)
* [Tests] Fix slow tests

* [Tests] Fix slow tsets
2023-03-01 16:21:33 +01:00
Pedro Cuenca
e4a9fb3b74 Bring Flax attention naming in sync with PyTorch (#2511)
Bring flax attention naming in sync with PyTorch.
2023-03-01 12:28:56 +01:00
Patrick von Platen
eadf0e2555 [Copyright] 2023 (#2524) 2023-03-01 10:31:00 +01:00
Will Berman
856dad57bb is_safetensors_compatible refactor (#2499)
* is_safetensors_compatible refactor

* files list comma
2023-02-28 20:37:42 -08:00
Pedro Cuenca
a75ac3fa8d Sequential cpu offload: require accelerate 0.14.0 (#2517)
* Sequential cpu offload: require accelerate 0.14.0.

* Import is_accelerate_version

* Missing copy.
2023-02-28 20:34:36 +01:00
Pedro Cuenca
477aaa96d0 Use "hub" directory for cache instead of "diffusers" (#2005)
* Use "hub" directory for cache instead of "diffusers"

* Import cache locations from huggingface_hub

I verified that the constants are available in huggingface_hub version
0.10.0, which is the minimum we require.

Co-authored-by: Lucain Pouget <lucainp@gmail.com>

* make style

* Move cached directories to new location.

* make style

* Apply suggestions by @Wauplin

Co-authored-by: Lucain <lucainp@gmail.com>

* Fix is_file

* Ignore symlinks.

Especially important if we want to ensure that the user may want to invoke the
process again later, if they are keeping multiple envs with different
versions.

* Style

---------

Co-authored-by: Lucain Pouget <lucainp@gmail.com>
2023-02-28 20:01:02 +01:00
Sayak Paul
e3a2c7f02c [Docs] Include more information in the "controlling generation" doc (#2434)
* edit controlling generation doc.

* add: demo link to pix2pix zero docs.

* refactor oanorama a bit.

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* pix: typo.

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-28 19:51:35 +05:30
Will Berman
1586186eea pix2pix tests no write to fs (#2497)
* attend and excite batch test causing timeouts

* pix2pix tests, no write to fs
2023-02-27 15:26:28 -08:00
Will Berman
42beaf1d23 move pipeline based test skips out of pipeline mixin (#2486) 2023-02-27 10:02:34 -08:00
Will Berman
824cb538b1 attend and excite batch test causing timeouts (#2498) 2023-02-27 10:01:59 -08:00
Patrick von Platen
a0549fea44 Disable ONNX tests (#2509) 2023-02-27 18:58:15 +01:00
Patrick von Platen
1c36a1239e [Docs] Improve safetensors (#2508)
* [Docs] Improve safetensors

* Apply suggestions from code review
2023-02-27 18:39:02 +01:00
Pedro Cuenca
48a2eb33f9 Add 4090 benchmark (PyTorch 2.0) (#2503)
* Add 4090 benchmark (PyTorch 2.0)

* Small changes in nomenclature.
2023-02-27 18:26:00 +01:00
Patrick von Platen
0e975e5ff6 [Safetensors] Make sure metadata is saved (#2506)
* [Safetensors] Make sure metadata is saved

* make style
2023-02-27 16:24:40 +01:00
Will Berman
7f43f65235 image_noiser -> image_normalizer comment (#2496) 2023-02-26 23:03:23 -08:00
Omer Bar Tal
6960e72225 add MultiDiffusion to controlling generation (#2490) 2023-02-25 14:28:17 +05:30
Pedro Cuenca
5de4347663 Fix test train_unconditional (#2481)
* Fix tensorboard tracking with `accelerate` @ `main`

* Fix `train_unconditional.py` with accelerate from main.
2023-02-24 14:31:16 -08:00
Pedro Cuenca
54bc882d96 mps test fixes (#2470)
* Skip variant tests (UNet1d, UNetRL) on mps.

mish op not yet supported.

* Exclude a couple of panorama tests on mps

They are too slow for fast CI.

* Exclude mps panorama from more tests.

* mps: exclude all fast panorama tests as they keep failing.
2023-02-24 15:19:53 +01:00
Haofan Wang
589faa8c88 Update train_text_to_image_lora.py (#2464)
* Update train_text_to_image_lora.py

* Update train_text_to_image_lora.py
2023-02-23 11:08:21 +05:30
Sayak Paul
39a3c77e0d fix: code snippet of instruct pix2pix from the docs. (#2446) 2023-02-21 18:07:22 +05:30
YiYi Xu
17ecf72d44 add demo (#2436)
Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan>
2023-02-20 06:25:28 -10:00
Michael Gartsbein
f3fac68c55 small bugfix at StableDiffusionDepth2ImgPipeline call to check_inputs and batch size calculation (#2423)
small bugfix in call to check_inputs and batch size calculation

Co-authored-by: mishka <gartsocial@gmail.com>
2023-02-20 11:55:47 +01:00
Sayak Paul
8f1fe75b4c [Docs] Add a note on SDEdit (#2433)
add: note on SDEdit
2023-02-20 16:19:51 +05:30
Patrick von Platen
2ab4fcdb43 fix transformers naming (#2430) 2023-02-20 08:38:30 +01:00
Patrick von Platen
d7cfa0baa2 make style 2023-02-20 09:34:30 +02:00
Haofan Wang
4135558a78 Update pipeline_utils.py (#2415) 2023-02-20 08:34:13 +01:00
YiYi Xu
45572c2485 fix the get_indices function (#2418)
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2023-02-19 18:34:13 -10:00
Sayak Paul
5f65ef4d0a remove author names. (#2428)
* remove author names.

* add: demo link to panorama.
2023-02-20 07:19:57 +05:30
Patrick von Platen
c85efbb9ff Fix deprecation warning (#2426)
Deprecation warning should only hit at version 0.15
2023-02-20 00:13:03 +02:00
Will Berman
1e5eaca754 stable unclip integration tests turn on memory saving (#2408)
* stable unclip integration tests turn on memory saving

* add note on turning on memory saving
2023-02-18 19:24:52 -08:00
Patrick von Platen
55de50921f make style 2023-02-18 00:00:01 +02:00
Patrick von Platen
3231712b7d Post release 0.14 2023-02-17 23:57:46 +02:00
Patrick von Platen
b2c1e0d6d4 Release: v0.13.0 2023-02-17 23:38:05 +02:00
Will Berman
bfdffbea32 add xformers 0.0.16 warning message (#2345)
* add xformers 0.0.16 warning message

* fix version check to check whole version string
2023-02-17 13:25:46 -08:00
YiYi Xu
770d3b3c29 add index page (#2401)
* add index page

* update

---------

Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan>
2023-02-17 22:24:16 +01:00
Pedro Cuenca
780b3a4f8c Fix typo in AttnProcessor2_0 symbol (#2404)
Fix typo in AttnProcessor2_0 symbol.
2023-02-17 22:21:18 +02:00
Will Berman
07547dfacd controlling generation doc nits (#2406)
controlling generation docs fixes
2023-02-17 22:20:53 +02:00
Will Berman
5979089713 Revert "Release: v0.13.0" (#2405)
This reverts commit 024c4376fb.
2023-02-17 10:48:16 -08:00
Patrick von Platen
024c4376fb Release: v0.13.0 2023-02-17 18:46:00 +02:00
daquexian
0866e85e76 apply_forward_hook simply returns if no accelerate (#2387)
Signed-off-by: daquexian <daquexian566@gmail.com>
2023-02-17 17:27:23 +01:00
Will Berman
d2e2c611bc controlling generation docs (#2388)
* controlling generation docs

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* up

* up

* uP

* up

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-02-17 17:20:37 +01:00
Amiruddin Nagri
b6b73d97b4 Fixing typos in documentation (#2389)
Fixing typos in outgoing links
2023-02-17 16:42:59 +01:00
Omer Bar Tal
38de964343 add MultiDiffusionPanorama pipeline (#2393)
* add MultiDiffusionPanorama pipeline

* fix docs naming

* update pipeline name, remove redundant tests

* apply styling.

* debugging information.

* fix: assertion values.

* fix-copies.

* update docs

* update docs

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-02-17 16:39:50 +01:00
Patrick von Platen
14b950705a Add ddim inversion pix2pix (#2397)
* add

* finish

* add tests

* add tests

* up

* up

* pull from main

* uP

* Apply suggestions from code review

* finish

* Update docs/source/en/_toctree.yml

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* finish

* clean docs

* next

* next

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* up

* up

---------

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-17 16:27:51 +01:00
Manuel Brack
01a80807de Add semantic guidance pipeline (#2223)
* Add semantic guidance pipeline

* Fix style

* Refactor Pipeline

* Pipeline documentation

* Add documentation

* Fix style and quality

* Fix doctree

* Add tests for SEGA

* Update src/diffusers/pipelines/semantic_stable_diffusion/pipeline_semantic_stable_diffusion.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/semantic_stable_diffusion/pipeline_semantic_stable_diffusion.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/semantic_stable_diffusion/pipeline_semantic_stable_diffusion.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Make compatible with half precision

* Change deprecation warning to throw an exception

* update

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-17 15:54:15 +01:00
patil-suraj
291ecdacd3 quikc doc fix 2023-02-17 15:45:54 +01:00
patil-suraj
350a510335 fix docs 2023-02-17 15:25:55 +01:00
Sayak Paul
867a217d14 add: inversion to pix2pix zero docs. (#2398)
* add: inversion to pix2pix zero docs.

* add: comment to emphasize the use of flan to generate.

* more nits.
2023-02-17 14:51:58 +01:00
Suraj Patil
0c0bb085e1 Torch2.0 scaled_dot_product_attention processor (#2303)
* add sdpa processor

* don't use it by default

* add some checks and style

* typo

* support torch sdpa in dreambooth example

* use torch attn proc by default when available

* typo

* add attn mask

* fix naming

* being doc

* doc

* Apply suggestions from code review

* polish

* torctree

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* better name

* style

* add benchamrk table

* Update docs/source/en/optimization/torch2.0.mdx

* up

* fix example

* check if processor is None

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* add fp32 benchmakr

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-17 14:22:26 +01:00
Sayak Paul
c5fa13aa0d [Pipelines] Add a section on generating captions and embeddings for Pix2Pix Zero (#2395)
* add: section on generating embeddings.

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* apply changes from code review.

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-17 13:21:21 +01:00
Pedro Cuenca
351b37ea73 Fix UniPC tests and remove some test warnings (#2396)
* Change solver_type to match the previous tests.

* Prevent warnings about scale_model_inputs

* Prevent console log about division by zero.
2023-02-17 13:20:20 +01:00
Patrick von Platen
2e0d489a4e [Pix2Pix] Add utility function (#2385)
* [Pix2Pix] Add utility function

* improve

* update

* Apply suggestions from code review

* uP

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_pix2pix_zero.py
2023-02-17 10:49:00 +01:00
Sayak Paul
abd5dcbbf1 [Pix2Pix Zero] Fix slow tests (#2391)
* fix: slow tests.

* retrieving the slices.

* fix: assertion.

* debugging.

* debugging.

* debugging

* debugging.

* debugging

* debugging.

* debugging.

* debugging

* debugging

* change debugging.

* change debugging.

* fix: tests for pix2pix zero.
2023-02-17 10:35:50 +01:00
Wenliang Zhao
d45bb937ab [Docs] Fix UniPC docs (#2386)
* fix typos in the doc

* restyle the code
2023-02-17 08:10:56 +02:00
Tianlei Wu
568b73fdf8 Fix stable diffusion onnx pipeline error when batch_size > 1 (#2366)
fix safety_checker for batch_size > 1
2023-02-16 23:57:33 +01:00
Patrick von Platen
8e1cae5d66 Revert "[Pix2Pix0] Add utility function to get edit vector" (#2384)
Revert "[Pix2Pix0] Add utility function to get edit vector (#2383)"

This reverts commit 857c04cfba.
2023-02-16 23:00:27 +01:00
Patrick von Platen
857c04cfba [Pix2Pix0] Add utility function to get edit vector (#2383)
uP
2023-02-16 22:59:53 +01:00
YiYi Xu
2e7a28652a Attend and excite 2 (#2369)
* attend and excite pipeline

* update

update docstring example

remove visualization

remove the base class attention control

remove dependency on stable diffusion pipeline

always apply gaussian filter with default setting

remove run_standard_sd argument

hardcode attention_res and scale_range (related to step size)

Update docs/source/en/api/pipelines/stable_diffusion/attend_and_excite.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Update tests/pipelines/stable_diffusion_2/test_stable_diffusion_attend_and_excite.py

Co-authored-by: Will Berman <wlbberman@gmail.com>

Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py

Co-authored-by: Will Berman <wlbberman@gmail.com>

Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py

Co-authored-by: Will Berman <wlbberman@gmail.com>

revert test_float16_inference

revert change to the batch related tests

fix test_float16_inference

handle batch

remove the deprecation message

remove None check, step_size

remove debugging logging

add slow test

indices_to_alter -> indices

add check_input

* skip mps

* style

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* indices -> token_indices
---------

Co-authored-by: evin <evinpinarornek@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-16 11:15:54 -10:00
Patrick von Platen
f243282e3e [Dummy imports] Add missing if else statements for SD] (#2381)
* [Dummy imports] Add missing if else statements for SD]

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-16 21:53:07 +01:00
Patrick von Platen
ca980fd0d1 [Examples] Make sure EMA works with any device (#2382)
* Fix EMA

* up

* update
2023-02-16 21:27:47 +01:00
Pedro Cuenca
a60f5555f5 Make diffusers importable with transformers < 4.26 (#2380) 2023-02-16 20:17:33 +01:00
Patrick von Platen
90a624f697 improve tests 2023-02-16 20:42:00 +02:00
fxmarty
d32e9391f9 Replace torch.concat calls by torch.cat (#2378)
replace torch.concat by torch.cat
2023-02-16 19:36:33 +01:00
Wenliang Zhao
aaaec06487 add the UniPC scheduler (#2373)
* add UniPC scheduler

* add the return type to the functions

* code quality check

* add tests

* finish docs

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-16 19:19:06 +01:00
Pedro Cuenca
2777264ee8 enable_model_cpu_offload (#2285)
* enable_model_offload PoC

It's surprisingly more involved than expected, see comments in the PR.

* Rename final_offload_hook

* Invoke the vae forward hook manually.

* Completely remove decoder.

* Style

* apply_forward_hook decorator

* Rename method.

* Style

* Copy enable_model_cpu_offload

* Fix copies.

* Remove comment.

* Fix copies

* Missing import

* Fix doc-builder style.

* Merge main and fix again.

* Add docs

* Fix docs.

* Add a couple of tests.

* style
2023-02-16 19:06:36 +01:00
Sayak Paul
6eaebe8278 [Utils] Adds store() and restore() methods to EMAModel (#2302)
* add store and restore() methods to EMAModel.

* Update src/diffusers/training_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style with doc builder

* remove explicit listing.

* Apply suggestions from code review

Co-authored-by: Will Berman <wlbberman@gmail.com>

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* chore: better variable naming.

* better treatment of temp_stored_params

Co-authored-by: patil-suraj <surajp815@gmail.com>

* make style

* remove temporary params from earth 🌎

* make fix-copies.

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Will Berman <wlbberman@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>
2023-02-16 15:20:25 +01:00
Will Berman
b214bb25f8 train_text_to_image EMAModel saving (#2341) 2023-02-16 14:40:28 +01:00
Suraj Patil
de9ce9e936 [SchedulingPNDM ] reset cur_model_output after each call (#2376)
reset cur_model_output
2023-02-16 14:38:42 +01:00
Susung Hong
fa35750d3b Add Self-Attention-Guided (SAG) Stable Diffusion pipeline (#2193)
* Add Stable Diffusion Sw/ elf-Attention Guidance

* Modify __init__.py

* Register attention storing processor

* Update pipeline_stable_diffusion_sag.py

* Editing default value

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update dummy_torch_and_transformers_objects.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update pipeline_stable_diffusion_sag.py

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Create test_stable_diffusion_sag.py

* Create self_attention_guidance.py

* Update pipeline_stable_diffusion_sag.py

* Update test_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Rename self_attention_guidance.py to self_attention_guidance.mdx

* Update self_attention_guidance.mdx

* Update self_attention_guidance.mdx

* Update _toctree.yml

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Fixing order

* Update pipeline_stable_diffusion_sag.py

* fixing import order

* fix order

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* Naming change

* Noting pred_x0

* Adding some fast tests

* Update pipeline_stable_diffusion_sag.py

* Update test_stable_diffusion_sag.py

* Update test_stable_diffusion_sag.py

* Update test_stable_diffusion_sag.py

* Update docs/source/en/api/pipelines/stable_diffusion/self_attention_guidance.mdx

* implement gaussian_blur

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

* fix tests

* Update pipeline_stable_diffusion_sag.py

* Update pipeline_stable_diffusion_sag.py

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Will Berman <wlbberman@gmail.com>
2023-02-16 13:04:49 +01:00
Sayak Paul
fd3d5502d4 [Pipelines] Adds pix2pix zero (#2334)
* add: support for BLIP generation.

* add: support for editing synthetic images.

* remove unnecessary comments.

* add inits and run make fix-copies.

* version change of diffusers.

* fix: condition for loading the captioner.

* default conditions_input_image to False.

* guidance_amount -> cross_attention_guidance_amount

* fix inputs to check_inputs()

* fix: attribute.

* fix: prepare_attention_mask() call.

* debugging.

* better placement of references.

* remove torch.no_grad() decorations.

* put torch.no_grad() context before the first denoising loop.

* detach() latents before decoding them.

* put deocding in a torch.no_grad() context.

* add reconstructed image for debugging.

* no_grad(0

* apply formatting.

* address one-off suggestions from the draft PR.

* back to torch.no_grad() and add more elaborate comments.

* refactor prepare_unet() per Patrick's suggestions.

* more elaborate description for .

* formatting.

* add docstrings to the methods specific to pix2pix zero.

* suspecting a redundant noise prediction.

* needed for gradient computation chain.

* less hacks.

* fix: attention mask handling within the processor.

* remove attention reference map computation.

* fix: cross attn args.

* fix: prcoessor.

* store attention maps.

* fix: attention processor.

* update docs and better treatment to xa args.

* update the final noise computation call.

* change xa args call.

* remove xa args option from the pipeline.

* add: docs.

* first test.

* fix: url call.

* fix: argument call.

* remove image conditioning for now.

* 🚨 add: fast tests.

* explicit placement of the xa attn weights.

* add: slow tests 🐢

* fix: tests.

* edited direction embedding should be on the same device as prompt_embeds.

* debugging message.

* debugging.

* add pix2pix zero pipeline for a non-deterministic test.

* debugging/

* remove debugging message.

* make caption generation _

* address comments (part I).

* address PR comments (part II)

* fix: DDPM test assertion.

* refactor doc.

* address PR comments (part III).

* fix: type annotation for the scheduler.

* apply styling.

* skip_mps and add note on embeddings in the docs.
2023-02-16 11:20:38 +01:00
Patrick von Platen
e5810e686e [Variant] Add "variant" as input kwarg so to have better UX when downloading no_ema or fp16 weights (#2305)
* [Variant] Add variant loading mechanism

* clean

* improve further

* up

* add tests

* add some first tests

* up

* up

* use path splittetx

* add deprecate

* deprecation warnings

* improve docs

* up

* up

* up

* fix tests

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* correct code format

* fix warning

* finish

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update docs/source/en/using-diffusers/loading.mdx

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Apply suggestions from code review

Co-authored-by: Will Berman <wlbberman@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* correct loading docs

* finish

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Will Berman <wlbberman@gmail.com>
2023-02-16 11:02:58 +01:00
Damian Stewart
e3ddbe25ed Fix 3-way merging with the checkpoint_merger community pipeline (#2355)
correctly locate 3rd file; also correct misleading docs
2023-02-16 10:52:41 +01:00
Will Berman
46def7265f checkpointing_steps_total_limit->checkpoints_total_limit (#2374) 2023-02-16 00:28:58 -08:00
Will Berman
296b01e1a1 add total number checkpoints to training scripts (#2367)
* add total number checkpoints to training scripts

* Update examples/dreambooth/train_dreambooth.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-02-15 23:58:06 -08:00
Will Berman
a3ae46610f schedulers add glide noising schedule (#2347) 2023-02-15 23:51:33 -08:00
meg
c613288c9b Funky spacing issue (#2368)
There isn't a space between the "Scope" paragraph and "Ethical Guidelines", here: https://huggingface.co/docs/diffusers/main/en/conceptual/ethical_guidelines , yet I can't see that in the preview. In this PR, I'm simply adding some spaces in the hopes that it resolves the issue.....
2023-02-15 17:36:31 -08:00
Patrick von Platen
4c52982a0b [Tests] Add MPS skip decorator (#2362)
* finish

* Apply suggestions from code review

* fix indent and import error in test_stable_diffusion_depth

---------

Co-authored-by: William Berman <WLBberman@gmail.com>
2023-02-15 22:17:25 +01:00
Will Berman
2a49fac864 KarrasDiffusionSchedulers type note (#2365) 2023-02-15 12:37:56 -08:00
Kashif Rasul
51b61b69c5 [Docs] initial docs about KarrasDiffusionSchedulers (#2349)
* initial docs about KarrasDiffusionSchedulers

* typo

* grammer

* Update docs/source/en/api/schedulers/overview.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* do not list the schedulers explicitly

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-15 10:19:57 -08:00
Patrick von Platen
666d80a1c8 fix some tests 2023-02-15 10:22:06 +00:00
Patrick von Platen
91925fbb76 Fix callback type hints - no optional function argument (#2357)
replace type hints
2023-02-14 14:35:05 -08:00
Ben Evans
0db19da01f Log Unconditional Image Generation Samples to W&B (#2287)
* Log Unconditional Image Generation Samples to WandB

* Check for wandb installation and parity between onnxruntime script

* Log epoch to wandb

* Check for tensorboard logger early on

* style fixes

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-14 23:11:12 +01:00
Will Berman
62b3c9e06a unCLIP variant (#2297)
* pipeline_variant

* Add docs for when clip_stats_path is specified

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip_img2img.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_unclip_img2img.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* prepare_latents # Copied from re: @patrickvonplaten

* NoiseAugmentor->ImageNormalizer

* stable_unclip_prior default to None re: @patrickvonplaten

* prepare_prior_extra_step_kwargs

* prior denoising scale model input

* {DDIM,DDPM}Scheduler -> KarrasDiffusionSchedulers re: @patrickvonplaten

* docs

* Update docs/source/en/api/pipelines/stable_unclip.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-14 11:28:57 -08:00
Will Berman
e55687e1e1 unet check length inputs (#2327)
* unet check length input

* prep test file for changes

* correct all tests

* clean up

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-13 16:25:50 -08:00
Will Berman
9e8ee2ace1 dreambooth checkpointing tests and docs (#2339) 2023-02-13 14:16:32 -08:00
Will Berman
6782b70dd3 github issue forum link (#2335)
* github issue forum link

* Update .github/ISSUE_TEMPLATE/config.yml

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-13 11:21:14 -08:00
Will Berman
f190714e77 karlo image variation use kakaobrain upload (#2338) 2023-02-13 10:53:33 -08:00
Patrick von Platen
6cbd7b8b27 [Tests] Remove unnecessary tests (#2337) 2023-02-13 18:27:41 +01:00
Patrick von Platen
bc0cee9d1c [Latent Upscaling] Remove unused noise (#2298) 2023-02-13 18:06:26 +01:00
Patrick von Platen
1f5f17c5b4 [Versatile Diffusion] Fix tests (#2336) 2023-02-13 18:04:50 +01:00
Patrick von Platen
98c1a8e793 [Docs] Fix ethical guidelines docs (#2333) 2023-02-13 14:15:53 +01:00
Plat
0850b88fa1 Fix typo in load_pipeline_from_original_stable_diffusion_ckpt() method (#2320)
fix typo
2023-02-13 12:26:56 +01:00
bddppq
5d4f59ee96 Fix running LoRA with xformers (#2286)
* Fix running LoRA with xformers

* support disabling xformers

* reformat

* Add test
2023-02-13 11:58:18 +01:00
Giada Pistilli
f2eae16849 Add ethical guidelines (#2330)
* add ethical guidelines

* update file name

* edit file name

* update toctree

* Update docs/source/en/conceptual/ethical_guidelines.mdx

* Update docs/source/en/conceptual/ethical_guidelines.mdx

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-13 10:43:40 +01:00
Patrick von Platen
120844aadf [Tests] Refactor push tests (#2329)
* [Tests] Refactor push tests

* correct
2023-02-13 10:06:11 +01:00
Naga Sai Abhinay
a688c7bdfb [Community Pipeline] UnCLIP Text Interpolation Pipeline (#2257)
* UnCLIP Text Interpolation Pipeline

* Formatter fixes

* Changes based on feedback

* Formatting fix

* Formatting fix

* isort formatting fix(?)

* Remove duplicate code

* Formatting fix

* Refactor __call__ and change example in readme.

* Update examples/community/unclip_text_interpolation.py

Refactor to linter formatting

Co-authored-by: Will Berman <wlbberman@gmail.com>

---------

Co-authored-by: Will Berman <wlbberman@gmail.com>
2023-02-12 22:16:18 -08:00
Will Berman
1e7f965442 convert ckpt script docstring fixes (#2293)
* convert ckpt script docstring fixes

* Update src/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-10 13:57:49 -08:00
Will Berman
beb59abfa0 remove ddpm test_full_inference (#2291)
* remove ddpm test_full_inference

* style
2023-02-10 13:51:07 -08:00
Patrick von Platen
96c2279bcd Correct fast tests (#2314)
* correct some

* Apply suggestions from code review

* correct

* Update tests/pipelines/altdiffusion/test_alt_diffusion_img2img.py

* Final
2023-02-10 14:12:34 +01:00
Patrick von Platen
716286f19d Fast CPU tests should also run on main (#2313)
add fast tests
2023-02-10 12:46:01 +01:00
Patrick von Platen
e83b43612b make style 2023-02-10 13:07:46 +02:00
erkams
1be7df0205 [LoRA] Freezing the model weights (#2245)
* [LoRA] Freezing the model weights

Freeze the model weights since we don't need to calculate grads for them.

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Apply suggestions from code review

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2023-02-09 11:45:11 +01:00
Patrick von Platen
62a15cec6e make style 2023-02-09 12:27:44 +02:00
Ben Evans
f3c848383a Run same number of DDPM steps in inference as training (#2263)
Resolves ValueError: `num_inference_steps`: 1000 cannot be larger than `self.config.train_timesteps`: 50 as the unet model trained with this scheduler can only handle maximal 50 timesteps.
2023-02-09 10:36:38 +01:00
Will Berman
fd5c3c09af misc fixes (#2282)
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-08 09:02:42 -08:00
Patrick von Platen
648090e26e fix pix2pix docs (#2290) 2023-02-08 16:38:18 +01:00
Patrick von Platen
1ed6b77781 [Examples] Test all examples on CPU (#2289)
* [Examples] Test all examples on CPU

* add

* correct

* Apply suggestions from code review
2023-02-08 15:59:13 +01:00
Chenguo Lin
9d0d070996 EMA: fix state_dict() and load_state_dict() & add cur_decay_value (#2146)
* EMA: fix `state_dict()` & add `cur_decay_value`

* EMA: fix a bug in `load_state_dict()`

'float' object (`state_dict["power"]`) has no attribute 'get'.

* del train_unconditional_ort.py
2023-02-08 10:44:50 +01:00
Isamu Isozaki
c1971a53bc Textual inv save log memory (#2184)
* Quality check and adding tokenizer

* Adapted stable diffusion to mixed precision+finished up style fixes

* Fixed based on patrick's review

* Fixed oom from number of validation images

* Removed unnecessary np.array conversion

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-08 10:37:10 +01:00
Patrick von Platen
41db2dbf90 correct tests 2023-02-08 11:12:51 +02:00
Patrick von Platen
a7ca03aa85 Replace flake8 with ruff and update black (#2279)
* before running make style

* remove left overs from flake8

* finish

* make fix-copies

* final fix

* more fixes
2023-02-07 23:46:23 +01:00
Patrick von Platen
f5ccffecf7 Use accelerate save & loading hooks to have better checkpoint structure (#2048)
* better accelerated saving

* up

* finish

* finish

* uP

* up

* up

* fix

* Apply suggestions from code review

* correct ema

* Remove @

* up

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/training/dreambooth.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-07 20:03:59 +01:00
Pedro Cuenca
e619db24be mps cross-attention hack: don't crash on fp16 (#2258)
* mps cross-attention hack: don't crash on fp16

* Make conversion explicit.
2023-02-07 19:51:33 +01:00
wfng92
111228cb39 Fix torchvision.transforms and transforms function naming clash (#2274)
* Fix torchvision.transforms and transforms function naming clash

* Update unconditional script for onnx

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-07 17:36:32 +01:00
Patrick von Platen
bbb46ad3d5 [Tests] Fix slow tests (#2271) 2023-02-07 14:42:12 +01:00
wfng92
b1dad2e9d3 Make center crop and random flip as args for unconditional image generation (#2259)
* Add center crop and horizontal flip to args

* Update command to use center crop and random flip

* Add center crop and horizontal flip to args

* Update command to use center crop and random flip
2023-02-07 11:58:31 +01:00
Patrick von Platen
cd52475560 [Examples] Remove datasets important that is not needed (#2267)
* [Examples] Remove datasets important that is not needed

* remove from lora tambien
2023-02-07 11:55:34 +01:00
Patrick von Platen
0f04e799dc fix vae pt script 2023-02-07 08:34:19 +00:00
YiYi Xu
1051ca81a6 Stable Diffusion Latent Upscaler (#2059)
* Modify UNet2DConditionModel

- allow skipping mid_block

- adding a norm_group_size argument so that we can set the `num_groups` for group norm using `num_channels//norm_group_size`

- allow user to set dimension for the timestep embedding (`time_embed_dim`)

- the kernel_size for `conv_in` and `conv_out` is now configurable

- add random fourier feature layer (`GaussianFourierProjection`) for `time_proj`

- allow user to add the time and class embeddings before passing through the projection layer together - `time_embedding(t_emb + class_label))`

- added 2 arguments `attn1_types` and `attn2_types`

  * currently we have argument `only_cross_attention`: when it's set to `True`, we will have a to the
`BasicTransformerBlock` block with 2 cross-attention , otherwise we
get a self-attention followed by a cross-attention; in k-upscaler, we need to have blocks that include just one cross-attention, or self-attention -> cross-attention;
so I added `attn1_types` and `attn2_types` to the unet's argument list to allow user specify the attention types for the 2 positions in each block;  note that I stil kept
the `only_cross_attention` argument for unet for easy configuration, but it will be converted to `attn1_type` and `attn2_type` when passing down to the down blocks

- the position of downsample layer and upsample layer is now configurable

- in k-upscaler unet, there is only one skip connection per each up/down block (instead of each layer in stable diffusion unet), added `skip_freq = "block"` to support
this use case

- if user passes attention_mask to unet, it will prepare the mask and pass a flag to cross attention processer to skip the `prepare_attention_mask` step
inside cross attention block

add up/down blocks for k-upscaler

modify CrossAttention class

- make the `dropout` layer in `to_out` optional

- `use_conv_proj` - use conv instead of linear for all projection layers (i.e. `to_q`, `to_k`, `to_v`, `to_out`) whenever possible. note that when it's used to do cross
attention, to_k, to_v has to be linear because the `encoder_hidden_states` is not 2d

- `cross_attention_norm` - add an optional layernorm on encoder_hidden_states

- `attention_dropout`: add an optional dropout on attention score

adapt BasicTransformerBlock

- add an ada groupnorm layer  to conditioning attention input with timestep embedding

- allow skipping the FeedForward layer in between the attentions

- replaced the only_cross_attention argument with attn1_type and attn2_type for more flexible configuration

update timestep embedding: add new act_fn  gelu and an optional act_2

modified ResnetBlock2D

- refactored with AdaGroupNorm class (the timestep scale shift normalization)

- add `mid_channel` argument - allow the first conv to have a different output dimension from the second conv

- add option to use input AdaGroupNorm on the input instead of groupnorm

- add options to add a dropout layer after each conv

- allow user to set the bias in conv_shortcut (needed for k-upscaler)

- add gelu

adding conversion script for k-upscaler unet

add pipeline

* fix attention mask

* fix a typo

* fix a bug

* make sure model can be used with GPU

* make pipeline work with fp16

* fix an error in BasicTransfomerBlock

* make style

* fix typo

* some more fixes

* uP

* up

* correct more

* some clean-up

* clean time proj

* up

* uP

* more changes

* remove the upcast_attention=True from unet config

* remove attn1_types, attn2_types etc

* fix

* revert incorrect changes up/down samplers

* make style

* remove outdated files

* Apply suggestions from code review

* attention refactor

* refactor cross attention

* Apply suggestions from code review

* update

* up

* update

* Apply suggestions from code review

* finish

* Update src/diffusers/models/cross_attention.py

* more fixes

* up

* up

* up

* finish

* more corrections of conversion state

* act_2 -> act_2_fn

* remove dropout_after_conv from ResnetBlock2D

* make style

* simplify KAttentionBlock

* add fast test for latent upscaler pipeline

* add slow test

* slow test fp16

* make style

* add doc string for pipeline_stable_diffusion_latent_upscale

* add api doc page for latent upscaler pipeline

* deprecate attention mask

* clean up embeddings

* simplify resnet

* up

* clean up resnet

* up

* correct more

* up

* up

* improve a bit more

* correct more

* more clean-ups

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* add docstrings for new unet config

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* # Copied from

* encode the image if not latent

* remove force casting vae to fp32

* fix

* add comments about preconditioning parameters from k-diffusion paper

* attn1_type, attn2_type -> add_self_attention

* clean up get_down_block and get_up_block

* fix

* fixed a typo(?) in ada group norm

* update slice attention processer for cross attention

* update slice

* fix fast test

* update the checkpoint

* finish tests

* fix-copies

* fix-copy for modeling_text_unet.py

* make style

* make style

* fix f-string

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix import

* correct changes

* fix resnet

* make fix-copies

* correct euler scheduler

* add missing #copied from for preprocess

* revert

* fix

* fix copies

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/api/pipelines/stable_diffusion/latent_upscale.mdx

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/diffusers/models/cross_attention.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* clean up conversion script

* KDownsample2d,KUpsample2d -> KDownsample2D,KUpsample2D

* more

* Update src/diffusers/models/unet_2d_condition.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* remove prepare_extra_step_kwargs

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix a typo in timestep embedding

* remove num_image_per_prompt

* fix fasttest

* make style + fix-copies

* fix

* fix xformer test

* fix style

* doc string

* make style

* fix-copies

* docstring for time_embedding_norm

* make style

* final finishes

* make fix-copies

* fix tests

---------

Co-authored-by: yiyixuxu <yixu@yis-macbook-pro.lan>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-02-07 09:11:57 +01:00
Patrick von Platen
3b66cc0fc1 make style 2023-02-07 08:11:22 +00:00
chavinlo
717a956a02 Create convert_vae_pt_to_diffusers.py (#2215)
* Create convert_vae_pt_to_diffusers.py

Just a simple script to convert VAE.pt files to diffusers format
Tested with: https://huggingface.co/WarriorMama777/OrangeMixs/blob/main/VAEs/orangemix.vae.pt

* Update convert_vae_pt_to_diffusers.py

Forgot to add the function call

* make style

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: chavinlo <example@example.com>
2023-02-07 09:10:34 +01:00
Jorge C. Gomes
d43972ae71 Fixes prompt input checks in StableDiffusion img2img pipeline (#2206)
* Fixes prompt input checks in img2img

Allows providing prompt_embeds instead of the prompt, which is not currently possible as the first check fails.
This becomes the same as the function found in 8267c78445/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py (L393)

* Continues the fix

This also needs to be fixed. Becomes consistent with 8267c78445/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py (L558)

I've now tested this implementation, and it produces the expected results.
2023-02-07 09:10:24 +01:00
Fazzie-Maqianli
ffed2420c4 fix distributed init twice (#2252)
fix colossalai dreambooth
2023-02-07 08:55:39 +01:00
Pedro Cuenca
8178c840f2 Mention training problems with xFormers 0.0.16 (#2254) 2023-02-06 11:19:26 +01:00
nickkolok
3a0d3da66f Fix a typo: bfloa16 -> bfloat16 (#2243) 2023-02-06 09:14:08 +01:00
psychedelicious
22c1ba56c2 Fix k_dpm_2 & k_dpm_2_a on MPS (#2241)
Needed to convert `timesteps` to `float32` a bit sooner.

Fixes #1537
2023-02-05 23:45:15 +01:00
Pedro Cuenca
7386e7730c Show error when loading safety_checker from_flax (#2187)
* Show error when loading safety_checker `from_flax`

* fix style
2023-02-04 20:55:11 +01:00
Pedro Cuenca
154a7865fc [Flax DDPM] Make key optional so default pipelines don't fail (#2176)
Make `key` optional so default pipelines don't fail.
2023-02-04 20:45:20 +01:00
Robin Hutmacher
9baa29e9c0 Fix typo in StableDiffusionInpaintPipeline (#2197)
* Fix typo in StableDiffusionInpaintPipeline

* Add embedded prompt handling

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-03 19:03:15 +01:00
Jorge C. Gomes
58c416ab0c Fixes LoRAXFormersCrossAttnProcessor (#2207)
Related to #2124 
The current implementation is throwing a shape mismatch error. Which makes sense, as this line is obviously missing, comparing to XFormersCrossAttnProcessor and LoRACrossAttnProcessor.

I don't have formal tests, but I compared `LoRACrossAttnProcessor` and `LoRAXFormersCrossAttnProcessor` ad-hoc, and they produce the same results with this fix.
2023-02-03 18:10:48 +01:00
Isamu Isozaki
d46d78c584 Hotfix textual inv logging (#2183) 2023-02-03 18:08:46 +01:00
Patrick von Platen
05168e5d83 make style 2023-02-03 19:03:13 +02:00
Justin Merrell
948022e1e8 fix: flagged_images implementation (#1947)
Flagged images would be set to the blank image instead of the original image that contained the NSF concept for optional viewing.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-02-03 18:02:56 +01:00
Patrick von Platen
2f9a70aa85 [LoRA] Make sure validation works in multi GPU setup (#2172)
* [LoRA] Make sure validation works in multi GPU setup

* more fixes

* up
2023-02-03 16:50:10 +01:00
Sayak Paul
e43e206dc7 removes ~s in favor of full-fledged links. (#2229)
remove ~ in favor of full-fledged links.
2023-02-03 20:18:39 +05:30
Will Berman
99c39b4012 [nit] negative_prompt typo (#2227)
* negative_prompt typo

* fix
2023-02-03 14:05:50 +01:00
dymil
7547f9b475 Fix timestep dtype in legacy inpaint (#2120)
* Fix timestep dtype in legacy inpaint

This matches the structure in the text2img, img2img, and inpaint ONNX pipelines

* Fix style in dtype patch
2023-02-03 13:04:21 +01:00
Prathik Rao
a87e87fcbe refactor onnxruntime integration (#2042)
* refactor onnxruntime integration

* fix requirements.txt bug

* make style

* add support for textual_inversion

* make style

* add readme

* cleanup README files

* 1/27/2023 update to training scripts

* make style

* 1/30 update to train_unconditional

* style with black-22.8.0

---------

Co-authored-by: Prathik Rao <prathikrao@microsoft.com>
Co-authored-by: anton- <anton@huggingface.co>
2023-02-03 12:04:59 +01:00
Dudu Moshe
ecadcdefe1 [Bug] scheduling_ddpm: fix variance in the case of learned_range type. (#2090)
scheduling_ddpm: fix variance in the case of learned_range type.

In the case of learned_range variance type, there are missing logs
and exponent comparing to the theory (see "Improved Denoising Diffusion
Probabilistic Models" section 3.1 equation 15:
https://arxiv.org/pdf/2102.09672.pdf).
2023-02-03 09:42:42 +01:00
Pedro Cuenca
2bbd532990 Docs: short section on changing the scheduler in Flax (#2181)
* Short doc on changing the scheduler in Flax.

* Apply fix from @patil-suraj

Co-authored-by: Suraj Patil <surajp815@gmail.com>

---------

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2023-02-02 18:52:21 +01:00
Adalberto
68ef0666e2 Create train_dreambooth_inpaint_lora.py (#2205)
* Create train_dreambooth_inpaint_lora.py

* Update train_dreambooth_inpaint_lora.py

* Update train_dreambooth_inpaint_lora.py

* Update train_dreambooth_inpaint_lora.py

* Update train_dreambooth_inpaint_lora.py
2023-02-02 13:15:15 +01:00
Kashif Rasul
7ac95703cd add CITATION.cff (#2211)
add citation.cff
2023-02-02 12:46:44 +01:00
Pedro Cuenca
3816c9ad9f Update xFormers docs (#2208)
Update xFormers docs.
2023-02-01 19:56:32 +01:00
Patrick von Platen
8267c78445 [Loading] Better error message on missing keys (#2198)
* up

* finish
2023-02-01 14:22:39 +01:00
Muyang Li
4fc7084875 Fix a dimension bug in Transform2d (#2144)
The dimension does not match when `inner_dim` is not equal to `in_channels`.
2023-02-01 10:11:45 +01:00
Sayak Paul
9213d81bd0 add: guide on kerascv conversion tool. (#2169)
* add: guide on kerascv conversion tool.

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* address additional suggestions from review.

* change links to documentation-images.

* add separate links for training and inference goodies from diffusers.

* address Patrick's comments.

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2023-02-01 09:41:00 +01:00
Asad Memon
dd3cae3327 Pass LoRA rank to LoRALinearLayer (#2191) 2023-02-01 09:40:02 +01:00
Patrick von Platen
f73d0b6bec [Docs] remove license (#2188) 2023-01-31 22:11:32 +01:00
Patrick von Platen
d0d7ffffbd [Docs] Add components to docs (#2175) 2023-01-31 22:11:14 +01:00
Abhishek Varma
87cf88ed3d Use requests instead of wget in convert_from_ckpt.py (#2168)
-- This commit adopts `requests` in place of `wget` to fetch config `.yaml`
   files as part of `load_pipeline_from_original_stable_diffusion_ckpt` API.
-- This was done because in Windows PowerShell one needs to explicitly ensure
   that `wget` binary is part of the PATH variable. If not present, this leads
   to the code not being able to download the `.yaml` config file.

Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Co-authored-by: Abhishek Varma <abhishek@nod-labs.com>
2023-01-31 14:35:45 +01:00
Patrick von Platen
60d915fbed make style 2023-01-31 11:46:48 +00:00
1lint
d1efefe15e [Breaking change] fix legacy inpaint noise and resize mask tensor (#2147)
* fix legacy inpaint noise and resize mask tensor

* updated legacy inpaint pipe test expected_slice
2023-01-31 12:44:35 +01:00
Sayak Paul
7d96b38b70 [examples] Fix CLI argument in the launch script command for text2image with LoRA (#2171)
* Update README.md

* Update README.md
2023-01-31 09:47:09 +01:00
Dudu Moshe
cedafb8600 [Bug]: fix DDPM scheduler arbitrary infer steps count. (#2076)
scheduling_ddpm: fix evaluate with lower timesteps count than train.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-01-31 09:13:26 +01:00
Patrick von Platen
69caa96472 fix slow test 2023-01-31 07:39:30 +00:00
hysts
da113364df Add instance prompt to model card of lora dreambooth example (#2112) 2023-01-31 08:14:25 +01:00
Pedro Cuenca
44f6bc81c7 Don't copy when unwrapping model (#2166)
* Don't copy when unwrapping model.

Otherwise an exception is raised when using fp16.

* Remove unused import
2023-01-30 20:18:20 +01:00
Pedro Cuenca
164b6e0532 Section on using LoRA alpha / scale (#2139)
* Section on using LoRA alpha / scale.

* Accept suggestion

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Clarify on merge.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-01-30 14:14:46 +01:00
Patrick von Platen
a6610db7a8 [Design philosopy] Create official doc (#2140)
* finish more

* finish philosophy

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Will Berman <wlbberman@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Will Berman <wlbberman@gmail.com>
2023-01-30 09:27:37 +01:00
Pedro Cuenca
0b68101a13 [diffusers-cli] Fix typo in accelerate and transformers versions (#2154)
Fix typo in accelerate and transformers versions.
2023-01-30 09:04:45 +01:00
Ayan Das
125d783076 fix typo in EMAModel's load_state_dict() (#2151)
Possible typo introduced in 7c82a16fc1
2023-01-29 13:23:18 +01:00
Pedro Cuenca
fdf70cb54b Fix typo (#2138) 2023-01-27 20:08:56 +01:00
Nicolas Patry
20396e2bd2 Adding some safetensors docs. (#2122)
* Tmp.

* Adding more docs.

* Doc style.

* Remove the argument `use_safetensors=True`.

* doc-builder
2023-01-27 18:20:50 +01:00
Will Berman
2cf34e6db4 [from_pretrained] only load config one time (#2131) 2023-01-27 08:23:55 -08:00
Patrick von Platen
04ad948673 make style 2 - sorry 2023-01-27 16:54:40 +02:00
Patrick von Platen
97ef5e0665 make style 2023-01-27 16:52:04 +02:00
Patrick von Platen
31be42209d Don't call the Hub if local_files_only is specifiied (#2119)
Don't call the Hub if
2023-01-27 09:42:33 +02:00
RahulBhalley
43c5ac2be7 Typo fix: torwards -> towards (#2134) 2023-01-27 08:20:18 +01:00
Ji soo Kim
c750a82374 Fix typos in loaders.py (#2137)
Fix typo in loaders.py
2023-01-27 08:20:07 +01:00
Patrick von Platen
0c39f53cbb Allow lora from pipeline (#2129)
* [LoRA] All to use in inference with pipeline

* [LoRA] allow cross attention kwargs passed to pipeline

* finish
2023-01-27 08:19:46 +01:00
Will Berman
0a5948e7f4 remove redundant allow_patterns (#2130) 2023-01-26 13:22:28 -08:00
Patrick von Platen
f653ded7ed [LoRA] Make sure LoRA can be disabled after it's run (#2128) 2023-01-26 21:26:11 +01:00
Will Berman
e92d43feb0 [nit] torch_dtype used twice in doc string (#2126) 2023-01-26 11:19:20 -08:00
hysts
7436e30c72 Fix model card of LoRA (#2114)
Fix
2023-01-26 19:08:45 +01:00
Will Berman
14976500ed fuse attention mask (#2111)
* fuse attention mask

* lint

* use 0 beta when no attention mask re: @Birch-san
2023-01-26 08:36:07 -08:00
Cyberes
96af5bf7d9 Fix unable to save_pretrained when using pathlib (#1972)
* fix PosixPath is not JSON serializable

* use PosixPath

* forgot elif like a dummy
2023-01-26 16:53:34 +01:00
Patrick von Platen
bbc2a03052 [Import Utils] Fix naming (#2118) 2023-01-26 15:54:59 +01:00
Suraj Patil
1e216be895 make scaling factor a config arg of vae/vqvae (#1860)
* make scaling factor cnfig arg of vae

* fix

* make flake happy

* fix ldm

* fix upscaler

* qualirty

* Apply suggestions from code review

Co-authored-by: Anton Lozhkov <anton@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* solve conflicts, addres some comments

* examples

* examples min version

* doc

* fix type

* typo

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* remove duplicate line

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Anton Lozhkov <anton@huggingface.co>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-01-26 14:37:19 +01:00
Pedro Cuenca
915a563611 Allow UNet2DModel to use arbitrary class embeddings (#2080)
* Allow `UNet2DModel` to use arbitrary class embeddings.

We can currently use class conditioning in `UNet2DConditionModel`, but
not in `UNet2DModel`. However, `UNet2DConditionModel` requires text
conditioning too, which is unrelated to other types of conditioning.
This commit makes it possible for `UNet2DModel` to be conditioned on
entities other than timesteps. This is useful for training /
research purposes. We can currently train models to perform
unconditional image generation or text-to-image generation, but it's not
straightforward to train a model to perform class-conditioned image
generation, if text conditioning is not required.

We could potentiall use `UNet2DConditionModel` for class-conditioning
without text embeddings by using down/up blocks without
cross-conditioning. However:
- The mid block currently requires cross attention.
- We are required to provide `encoder_hidden_states` to `forward`.

* Style

* Align class conditioning, add docstring for `num_class_embeds`.

* Copy docstring to versatile_diffusion UNetFlatConditionModel
2023-01-26 13:46:32 +01:00
Pedro Cuenca
0856137337 [textual inversion] Allow validation images (#2077)
* [textual inversion] Allow validation images.

* Change key to `validation`

* Specify format instead of transposing.

As discussed with @sayakpaul.

* Style

Co-authored-by: isamu-isozaki <isamu.website@gmail.com>
2023-01-26 09:20:03 +01:00
Suraj Patil
946d1cb200 [dreambooth] check the low-precision guard before preparing model (#2102)
check the dtype before preparing model
2023-01-25 11:06:33 -08:00
Patrick von Platen
09779cbb40 [Bump version] 0.13.0dev0 & Deprecate predict_epsilon (#2109)
* [Bump version] 0.13

* Bump model up

* up
2023-01-25 17:59:02 +01:00
Patrick von Platen
b0cc7c202b make style 2023-01-25 16:03:56 +02:00
Oren WANG
fb98acf03b [lora] Fix bug with training without validation (#2106) 2023-01-25 14:56:13 +01:00
466 changed files with 33203 additions and 2838 deletions

View File

@@ -1,4 +1,7 @@
contact_links:
- name: Blank issue
url: https://github.com/huggingface/diffusers/issues/new
about: General usage questions and community discussions
about: Other
- name: Forum
url: https://discuss.huggingface.co/
about: General usage questions and community discussions

View File

@@ -13,6 +13,7 @@ jobs:
with:
commit_sha: ${{ github.sha }}
package: diffusers
notebook_folder: diffusers_doc
languages: en ko
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}

View File

@@ -27,9 +27,8 @@ jobs:
pip install .[quality]
- name: Check quality
run: |
black --check --preview examples tests src utils scripts
isort --check-only examples tests src utils scripts
flake8 examples tests src utils scripts
black --check examples tests src utils scripts
ruff examples tests src utils scripts
doc-builder style src/diffusers docs/source --max_len 119 --check_only --path_to_docs docs/source
check_repository_consistency:
@@ -48,3 +47,4 @@ jobs:
run: |
python utils/check_copies.py
python utils/check_dummies.py
make deps_table_check_updated

View File

@@ -36,6 +36,11 @@ jobs:
runner: docker-cpu
image: diffusers/diffusers-onnxruntime-cpu
report: onnx_cpu
- name: PyTorch Example CPU tests on Ubuntu
framework: pytorch_examples
runner: docker-cpu
image: diffusers/diffusers-pytorch-cpu
report: torch_cpu
name: ${{ matrix.config.name }}
@@ -90,6 +95,13 @@ jobs:
--make-reports=tests_${{ matrix.config.report }} \
tests/
- name: Run example PyTorch CPU tests
if: ${{ matrix.config.framework == 'pytorch_examples' }}
run: |
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
--make-reports=tests_${{ matrix.config.report }} \
examples/test_examples.py
- name: Failure short reports
if: ${{ failure() }}
run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt

165
.github/workflows/push_tests_fast.yml vendored Normal file
View File

@@ -0,0 +1,165 @@
name: Slow tests on main
on:
push:
branches:
- main
env:
DIFFUSERS_IS_CI: yes
HF_HOME: /mnt/cache
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
PYTEST_TIMEOUT: 600
RUN_SLOW: no
jobs:
run_fast_tests:
strategy:
fail-fast: false
matrix:
config:
- name: Fast PyTorch CPU tests on Ubuntu
framework: pytorch
runner: docker-cpu
image: diffusers/diffusers-pytorch-cpu
report: torch_cpu
- name: Fast Flax CPU tests on Ubuntu
framework: flax
runner: docker-cpu
image: diffusers/diffusers-flax-cpu
report: flax_cpu
- name: Fast ONNXRuntime CPU tests on Ubuntu
framework: onnxruntime
runner: docker-cpu
image: diffusers/diffusers-onnxruntime-cpu
report: onnx_cpu
- name: PyTorch Example CPU tests on Ubuntu
framework: pytorch_examples
runner: docker-cpu
image: diffusers/diffusers-pytorch-cpu
report: torch_cpu
name: ${{ matrix.config.name }}
runs-on: ${{ matrix.config.runner }}
container:
image: ${{ matrix.config.image }}
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
defaults:
run:
shell: bash
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev -y
python -m pip install -e .[quality,test]
python -m pip install -U git+https://github.com/huggingface/transformers
python -m pip install git+https://github.com/huggingface/accelerate
- name: Environment
run: |
python utils/print_env.py
- name: Run fast PyTorch CPU tests
if: ${{ matrix.config.framework == 'pytorch' }}
run: |
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
--make-reports=tests_${{ matrix.config.report }} \
tests/
- name: Run fast Flax TPU tests
if: ${{ matrix.config.framework == 'flax' }}
run: |
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
-s -v -k "Flax" \
--make-reports=tests_${{ matrix.config.report }} \
tests/
- name: Run fast ONNXRuntime CPU tests
if: ${{ matrix.config.framework == 'onnxruntime' }}
run: |
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
-s -v -k "Onnx" \
--make-reports=tests_${{ matrix.config.report }} \
tests/
- name: Run example PyTorch CPU tests
if: ${{ matrix.config.framework == 'pytorch_examples' }}
run: |
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
--make-reports=tests_${{ matrix.config.report }} \
examples/test_examples.py
- name: Failure short reports
if: ${{ failure() }}
run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: pr_${{ matrix.config.report }}_test_reports
path: reports
run_fast_tests_apple_m1:
name: Fast PyTorch MPS tests on MacOS
runs-on: [ self-hosted, apple-m1 ]
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Clean checkout
shell: arch -arch arm64 bash {0}
run: |
git clean -fxd
- name: Setup miniconda
uses: ./.github/actions/setup-miniconda
with:
python-version: 3.9
- name: Install dependencies
shell: arch -arch arm64 bash {0}
run: |
${CONDA_RUN} python -m pip install --upgrade pip
${CONDA_RUN} python -m pip install -e .[quality,test]
${CONDA_RUN} python -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
${CONDA_RUN} python -m pip install git+https://github.com/huggingface/accelerate
${CONDA_RUN} python -m pip install -U git+https://github.com/huggingface/transformers
- name: Environment
shell: arch -arch arm64 bash {0}
run: |
${CONDA_RUN} python utils/print_env.py
- name: Run fast PyTorch tests on M1 (MPS)
shell: arch -arch arm64 bash {0}
env:
HF_HOME: /System/Volumes/Data/mnt/cache
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
run: |
${CONDA_RUN} python -m pytest -n 0 -s -v --make-reports=tests_torch_mps tests/
- name: Failure short reports
if: ${{ failure() }}
run: cat reports/tests_torch_mps_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: pr_torch_mps_test_reports
path: reports

3
.gitignore vendored
View File

@@ -169,3 +169,6 @@ tags
# dependencies
/transformers
# ruff
.ruff_cache

40
CITATION.cff Normal file
View File

@@ -0,0 +1,40 @@
cff-version: 1.2.0
title: 'Diffusers: State-of-the-art diffusion models'
message: >-
If you use this software, please cite it using the
metadata from this file.
type: software
authors:
- given-names: Patrick
family-names: von Platen
- given-names: Suraj
family-names: Patil
- given-names: Anton
family-names: Lozhkov
- given-names: Pedro
family-names: Cuenca
- given-names: Nathan
family-names: Lambert
- given-names: Kashif
family-names: Rasul
- given-names: Mishig
family-names: Davaadorj
- given-names: Thomas
family-names: Wolf
repository-code: 'https://github.com/huggingface/diffusers'
abstract: >-
Diffusers provides pretrained diffusion models across
multiple modalities, such as vision and audio, and serves
as a modular toolbox for inference and training of
diffusion models.
keywords:
- deep-learning
- pytorch
- image-generation
- diffusion
- text2image
- image2image
- score-based-generative-modeling
- stable-diffusion
license: Apache-2.0
version: 0.12.1

View File

@@ -1,5 +1,5 @@
<!---
Copyright 2022 The HuggingFace Team. All rights reserved.
Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -177,7 +177,7 @@ Follow these steps to start contributing ([supported Python versions](https://gi
$ make style
```
🧨 Diffusers also uses `flake8` and a few custom scripts to check for coding mistakes. Quality
🧨 Diffusers also uses `ruff` and a few custom scripts to check for coding mistakes. Quality
control runs in CI, however you can also run the same checks with:
```bash

View File

@@ -9,9 +9,8 @@ modified_only_fixup:
$(eval modified_py_files := $(shell python utils/get_modified_files.py $(check_dirs)))
@if test -n "$(modified_py_files)"; then \
echo "Checking/fixing $(modified_py_files)"; \
black --preview $(modified_py_files); \
isort $(modified_py_files); \
flake8 $(modified_py_files); \
black $(modified_py_files); \
ruff $(modified_py_files); \
else \
echo "No library .py files were modified"; \
fi
@@ -41,9 +40,8 @@ repo-consistency:
# this target runs checks on all files
quality:
black --check --preview $(check_dirs)
isort --check-only $(check_dirs)
flake8 $(check_dirs)
black --check $(check_dirs)
ruff $(check_dirs)
doc-builder style src/diffusers docs/source --max_len 119 --check_only --path_to_docs docs/source
python utils/check_doc_toc.py
@@ -57,8 +55,8 @@ extra_style_checks:
# this target runs checks on all files and potentially modifies some of them
style:
black --preview $(check_dirs)
isort $(check_dirs)
black $(check_dirs)
ruff $(check_dirs) --fix
${MAKE} autogenerate_code
${MAKE} extra_style_checks

View File

@@ -467,12 +467,12 @@ image.save("ddpm_generated_image.png")
- [Unconditional Diffusion with continuous scheduler](https://huggingface.co/google/ncsnpp-ffhq-1024)
**Other Image Notebooks**:
* [image-to-image generation with Stable Diffusion](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg),
* [tweak images via repeated Stable Diffusion seeds](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg),
* [image-to-image generation with Stable Diffusion](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) ![Open In Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb),
* [tweak images via repeated Stable Diffusion seeds](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) ![Open In Colab](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb),
**Diffusers for Other Modalities**:
* [Molecule conformation generation](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/geodiff_molecule_conformation.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg),
* [Model-based reinforcement learning](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/reinforcement_learning_with_diffusers.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg),
* [Molecule conformation generation](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/geodiff_molecule_conformation.ipynb) ![Open In Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/geodiff_molecule_conformation.ipynb),
* [Model-based reinforcement learning](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/reinforcement_learning_with_diffusers.ipynb) ![Open In Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/reinforcement_learning_with_diffusers.ipynb),
### Web Demos
If you just want to play around with some web demos, you can try out the following 🚀 Spaces:

View File

@@ -1,5 +1,5 @@
<!---
Copyright 2022- The HuggingFace Team. All rights reserved.
Copyright 2023- The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

9
docs/source/_config.py Normal file
View File

@@ -0,0 +1,9 @@
# docstyle-ignore
INSTALL_CONTENT = """
# Diffusers installation
! pip install diffusers transformers datasets accelerate
# To install from source instead of the last release, comment the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/diffusers.git
"""
notebook_first_cells = [{"type": "code", "content": INSTALL_CONTENT}]

View File

@@ -8,6 +8,10 @@
- local: installation
title: Installation
title: Get started
- sections:
- local: tutorials/basic_training
title: Train a diffusion model
title: Tutorials
- sections:
- sections:
- local: using-diffusers/loading
@@ -18,6 +22,8 @@
title: Configuring Pipelines, Models, and Schedulers
- local: using-diffusers/custom_pipeline_overview
title: Loading and Adding Custom Pipelines
- local: using-diffusers/kerascv
title: Using KerasCV Stable Diffusion Checkpoints in Diffusers
title: Loading & Hub
- sections:
- local: using-diffusers/unconditional_image_generation
@@ -30,6 +36,8 @@
title: Text-Guided Image-Inpainting
- local: using-diffusers/depth2img
title: Text-Guided Depth-to-Image
- local: using-diffusers/controlling_generation
title: Controlling generation
- local: using-diffusers/reusing_seeds
title: Reusing seeds for deterministic generation
- local: using-diffusers/reproducibility
@@ -38,6 +46,10 @@
title: Community Pipelines
- local: using-diffusers/contribute_pipeline
title: How to contribute a Pipeline
- local: using-diffusers/using_safetensors
title: Using safetensors
- local: using-diffusers/weighted_prompts
title: Weighting Prompts
title: Pipelines for Inference
- sections:
- local: using-diffusers/rl
@@ -51,6 +63,8 @@
- sections:
- local: optimization/fp16
title: Memory and Speed
- local: optimization/torch2.0
title: Torch2.0 support
- local: optimization/xformers
title: xFormers
- local: optimization/onnx
@@ -70,17 +84,19 @@
- local: training/text_inversion
title: Textual Inversion
- local: training/dreambooth
title: Dreambooth
title: DreamBooth
- local: training/text2image
title: Text-to-image fine-tuning
title: Text-to-image
- local: training/lora
title: LoRA Support in Diffusers
title: Low-Rank Adaptation of Large Language Models (LoRA)
title: Training
- sections:
- local: conceptual/philosophy
title: Philosophy
- local: conceptual/contribution
title: How to contribute?
- local: conceptual/ethical_guidelines
title: Diffusers' Ethical Guidelines
title: Conceptual Guides
- sections:
- sections:
@@ -126,6 +142,8 @@
title: Safe Stable Diffusion
- local: api/pipelines/score_sde_ve
title: Score SDE VE
- local: api/pipelines/semantic_stable_diffusion
title: Semantic Guidance
- sections:
- local: api/pipelines/stable_diffusion/overview
title: Overview
@@ -141,11 +159,25 @@
title: Image-Variation
- local: api/pipelines/stable_diffusion/upscale
title: Super-Resolution
- local: api/pipelines/stable_diffusion/latent_upscale
title: Stable-Diffusion-Latent-Upscaler
- local: api/pipelines/stable_diffusion/pix2pix
title: InstructPix2Pix
- local: api/pipelines/stable_diffusion/attend_and_excite
title: Attend and Excite
- local: api/pipelines/stable_diffusion/pix2pix_zero
title: Pix2Pix Zero
- local: api/pipelines/stable_diffusion/self_attention_guidance
title: Self-Attention Guidance
- local: api/pipelines/stable_diffusion/panorama
title: MultiDiffusion Panorama
- local: api/pipelines/stable_diffusion/controlnet
title: Text-to-Image Generation with ControlNet Conditioning
title: Stable Diffusion
- local: api/pipelines/stable_diffusion_2
title: Stable Diffusion 2
- local: api/pipelines/stable_unclip
title: Stable unCLIP
- local: api/pipelines/stochastic_karras_ve
title: Stochastic Karras VE
- local: api/pipelines/unclip
@@ -162,6 +194,8 @@
title: Overview
- local: api/schedulers/ddim
title: DDIM
- local: api/schedulers/ddim_inverse
title: DDIMInverse
- local: api/schedulers/ddpm
title: DDPM
- local: api/schedulers/deis
@@ -190,6 +224,8 @@
title: Singlestep DPM-Solver
- local: api/schedulers/stochastic_karras_ve
title: Stochastic Kerras VE
- local: api/schedulers/unipc
title: UniPCMultistepScheduler
- local: api/schedulers/score_sde_ve
title: VE-SDE
- local: api/schedulers/score_sde_vp

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -34,6 +34,7 @@ Any pipeline object can be saved locally with [`~DiffusionPipeline.save_pretrain
- __call__
- device
- to
- components
## ImagePipelineOutput
By default diffusion pipelines return an object of class

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# Loaders
There are many weights to train adapter neural networks for diffusion models, such as
There are many ways to train adapter neural networks for diffusion models, such as
- [Textual Inversion](./training/text_inversion.mdx)
- [LoRA](https://github.com/cloneofsimo/lora)
- [Hypernetworks](https://arxiv.org/abs/1609.09106)

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -64,6 +64,12 @@ The models are built on the base class ['ModelMixin'] that is a `torch.nn.module
## PriorTransformerOutput
[[autodoc]] models.prior_transformer.PriorTransformerOutput
## ControlNetOutput
[[autodoc]] models.controlnet.ControlNetOutput
## ControlNetModel
[[autodoc]] ControlNetModel
## FlaxModelMixin
[[autodoc]] FlaxModelMixin

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -46,6 +46,7 @@ available a colab notebook to directly try them out.
|---|---|:---:|:---:|
| [alt_diffusion](./alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation | -
| [audio_diffusion](./audio_diffusion) | [**Audio Diffusion**](https://github.com/teticio/audio_diffusion.git) | Unconditional Audio Generation |
| [controlnet](./api/pipelines/stable_diffusion/controlnet) | [**ControlNet with Stable Diffusion**](https://arxiv.org/abs/2302.05543) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/controlnet.ipynb)
| [cycle_diffusion](./cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation |
| [dance_diffusion](./dance_diffusion) | [**Dance Diffusion**](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation |
| [ddpm](./ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation |
@@ -57,13 +58,24 @@ available a colab notebook to directly try them out.
| [pndm](./pndm) | [**Pseudo Numerical Methods for Diffusion Models on Manifolds**](https://arxiv.org/abs/2202.09778) | Unconditional Image Generation |
| [score_sde_ve](./score_sde_ve) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
| [score_sde_vp](./score_sde_vp) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
| [stable_diffusion](./stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb)
| [stable_diffusion](./stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)
| [stable_diffusion](./stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
| [stable_diffusion_2](./stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation |
| [semantic_stable_diffusion](./semantic_stable_diffusion) | [**SEGA: Instructing Diffusion using Semantic Dimensions**](https://arxiv.org/abs/2301.12247) | Text-to-Image Generation |
| [stable_diffusion_text2img](./stable_diffusion/text2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb)
| [stable_diffusion_img2img](./stable_diffusion/img2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)
| [stable_diffusion_inpaint](./stable_diffusion/inpaint) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
| [stable_diffusion_panorama](./stable_diffusion/panorama) | [**MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation**](https://arxiv.org/abs/2302.08113) | Text-Guided Panorama View Generation |
| [stable_diffusion_pix2pix](./stable_diffusion/pix2pix) | [**InstructPix2Pix: Learning to Follow Image Editing Instructions**](https://arxiv.org/abs/2211.09800) | Text-Based Image Editing |
| [stable_diffusion_pix2pix_zero](./stable_diffusion/pix2pix_zero) | [**Zero-shot Image-to-Image Translation**](https://arxiv.org/abs/2302.03027) | Text-Based Image Editing |
| [stable_diffusion_attend_and_excite](./stable_diffusion/attend_and_excite) | [**Attend and Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models**](https://arxiv.org/abs/2301.13826) | Text-to-Image Generation |
| [stable_diffusion_self_attention_guidance](./stable_diffusion/self_attention_guidance) | [**Self-Attention Guidance**](https://arxiv.org/abs/2210.00939) | Text-to-Image Generation |
| [stable_diffusion_image_variation](./stable_diffusion/image_variation) | [**Stable Diffusion Image Variations**](https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations) | Image-to-Image Generation |
| [stable_diffusion_latent_upscale](./stable_diffusion/latent_upscale) | [**Stable Diffusion Latent Upscaler**](https://twitter.com/StabilityAI/status/1590531958815064065) | Text-Guided Super Resolution Image-to-Image |
| [stable_diffusion_2](./stable_diffusion_2/) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation |
| [stable_diffusion_2](./stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Image Inpainting |
| [stable_diffusion_2](./stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Depth-to-Image Text-Guided Generation |
| [stable_diffusion_2](./stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image |
| [stable_diffusion_safe](./stable_diffusion_safe) | [**Safe Stable Diffusion**](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb)
| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Text-to-Image Generation |
| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Image-to-Image Text-Guided Generation |
| [stochastic_karras_ve](./stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation |
| [unclip](./unclip) | [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) | Text-to-Image Generation |
| [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation |

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,79 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Semantic Guidance
Semantic Guidance for Diffusion Models was proposed in [SEGA: Instructing Diffusion using Semantic Dimensions](https://arxiv.org/abs/2301.12247) and provides strong semantic control over the image generation.
Small changes to the text prompt usually result in entirely different output images. However, with SEGA a variety of changes to the image are enabled that can be controlled easily and intuitively, and stay true to the original image composition.
The abstract of the paper is the following:
*Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. However, achieving one-shot generation that aligns with the user's intent is nearly impossible, yet small changes to the input prompt often result in very different images. This leaves the user with little semantic control. To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. This semantic guidance (SEGA) allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception. We demonstrate SEGA's effectiveness on a variety of tasks and provide evidence for its versatility and flexibility.*
*Overview*:
| Pipeline | Tasks | Colab | Demo
|---|---|:---:|:---:|
| [pipeline_semantic_stable_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/semantic_stable_diffusion/pipeline_semantic_stable_diffusion) | *Text-to-Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/semantic-image-editing/blob/main/examples/SemanticGuidance.ipynb) | [Coming Soon](https://huggingface.co/AIML-TUDA)
## Tips
- The Semantic Guidance pipeline can be used with any [Stable Diffusion](./api/pipelines/stable_diffusion/text2img) checkpoint.
### Run Semantic Guidance
The interface of [`SemanticStableDiffusionPipeline`] provides several additional parameters to influence the image generation.
Exemplary usage may look like this:
```python
import torch
from diffusers import SemanticStableDiffusionPipeline
pipe = SemanticStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
out = pipe(
prompt="a photo of the face of a woman",
num_images_per_prompt=1,
guidance_scale=7,
editing_prompt=[
"smiling, smile", # Concepts to apply
"glasses, wearing glasses",
"curls, wavy hair, curly hair",
"beard, full beard, mustache",
],
reverse_editing_direction=[False, False, False, False], # Direction of guidance i.e. increase all concepts
edit_warmup_steps=[10, 10, 10, 10], # Warmup period for each concept
edit_guidance_scale=[4, 5, 5, 5.4], # Guidance scale for each concept
edit_threshold=[
0.99,
0.975,
0.925,
0.96,
], # Threshold for each concept. Threshold equals the percentile of the latent space that will be discarded. I.e. threshold=0.99 uses 1% of the latent dimensions
edit_momentum_scale=0.3, # Momentum scale that will be added to the latent guidance
edit_mom_beta=0.6, # Momentum beta
edit_weights=[1, 1, 1, 1, 1], # Weights of the individual concepts against each other
)
```
For more examples check the colab notebook.
## StableDiffusionSafePipelineOutput
[[autodoc]] pipelines.semantic_stable_diffusion.SemanticStableDiffusionPipelineOutput
- all
## SemanticStableDiffusionPipeline
[[autodoc]] SemanticStableDiffusionPipeline
- all
- __call__

View File

@@ -0,0 +1,75 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Attend and Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
## Overview
Attend and Excite for Stable Diffusion was proposed in [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://attendandexcite.github.io/Attend-and-Excite/) and provides textual attention control over the image generation.
The abstract of the paper is the following:
*Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. However, achieving one-shot generation that aligns with the user's intent is nearly impossible, yet small changes to the input prompt often result in very different images. This leaves the user with little semantic control. To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. This semantic guidance (SEGA) allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception. We demonstrate SEGA's effectiveness on a variety of tasks and provide evidence for its versatility and flexibility.*
Resources
* [Project Page](https://attendandexcite.github.io/Attend-and-Excite/)
* [Paper](https://arxiv.org/abs/2301.13826)
* [Original Code](https://github.com/AttendAndExcite/Attend-and-Excite)
* [Demo](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite)
## Available Pipelines:
| Pipeline | Tasks | Colab | Demo
|---|---|:---:|:---:|
| [pipeline_semantic_stable_diffusion_attend_and_excite.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_semantic_stable_diffusion_attend_and_excite) | *Text-to-Image Generation* | - | https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite
### Usage example
```python
import torch
from diffusers import StableDiffusionAttendAndExcitePipeline
model_id = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
pipe = pipe.to("cuda")
prompt = "a cat and a frog"
# use get_indices function to find out indices of the tokens you want to alter
pipe.get_indices(prompt)
token_indices = [2, 5]
seed = 6141
generator = torch.Generator("cuda").manual_seed(seed)
images = pipe(
prompt=prompt,
token_indices=token_indices,
guidance_scale=7.5,
generator=generator,
num_inference_steps=50,
max_iter_to_alter=25,
).images
image = images[0]
image.save(f"../images/{prompt}_{seed}.png")
```
## StableDiffusionAttendAndExcitePipeline
[[autodoc]] StableDiffusionAttendAndExcitePipeline
- all
- __call__

View File

@@ -0,0 +1,167 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Text-to-Image Generation with ControlNet Conditioning
## Overview
[Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) by Lvmin Zhang and Maneesh Agrawala.
Using the pretrained models we can provide control images (for example, a depth map) to control Stable Diffusion text-to-image generation so that it follows the structure of the depth image and fills in the details.
The abstract of the paper is the following:
*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.*
This model was contributed by the amazing community contributor [takuma104](https://huggingface.co/takuma104) ❤️ .
Resources:
* [Paper](https://arxiv.org/abs/2302.05543)
* [Original Code](https://github.com/lllyasviel/ControlNet)
## Available Pipelines:
| Pipeline | Tasks | Demo
|---|---|:---:|
| [StableDiffusionControlNetPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_controlnet.py) | *Text-to-Image Generation with ControlNet Conditioning* | [Colab Example](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/controlnet.ipynb)
## Usage example
In the following we give a simple example of how to use a *ControlNet* checkpoint with Diffusers for inference.
The inference pipeline is the same for all pipelines:
* 1. Take an image and run it through a pre-conditioning processor.
* 2. Run the pre-processed image through the [`StableDiffusionControlNetPipeline`].
Let's have a look at a simple example using the [Canny Edge ControlNet](https://huggingface.co/lllyasviel/sd-controlnet-canny).
```python
from diffusers import StableDiffusionControlNetPipeline
from diffusers.utils import load_image
# Let's load the popular vermeer image
image = load_image(
"https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
```
![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png)
Next, we process the image to get the canny image. This is step *1.* - running the pre-conditioning processor. The pre-conditioning processor is different for every ControlNet. Please see the model cards of the [official checkpoints](#controlnet-with-stable-diffusion-1.5) for more information about other models.
First, we need to install opencv:
```
pip install opencv-contrib-python
```
Next, let's also install all required Hugging Face libraries:
```
pip install diffusers transformers git+https://github.com/huggingface/accelerate.git
```
Then we can retrieve the canny edges of the image.
```python
import cv2
from PIL import Image
import numpy as np
image = np.array(image)
low_threshold = 100
high_threshold = 200
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
```
Let's take a look at the processed image.
![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_canny_edged.png)
Now, we load the official [Stable Diffusion 1.5 Model](runwayml/stable-diffusion-v1-5) as well as the ControlNet for canny edges.
```py
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
```
To speed-up things and reduce memory, let's enable model offloading and use the fast [`UniPCMultistepScheduler`].
```py
from diffusers import UniPCMultistepScheduler
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# this command loads the individual model components on GPU on-demand.
pipe.enable_model_cpu_offload()
```
Finally, we can run the pipeline:
```py
generator = torch.manual_seed(0)
out_image = pipe(
"disco dancer with colorful lights", num_inference_steps=20, generator=generator, image=canny_image
).images[0]
```
This should take only around 3-4 seconds on GPU (depending on hardware). The output image then looks as follows:
![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/vermeer_disco_dancing.png)
**Note**: To see how to run all other ControlNet checkpoints, please have a look at [ControlNet with Stable Diffusion 1.5](#controlnet-with-stable-diffusion-1.5)
<!-- TODO: add space -->
## Available checkpoints
ControlNet requires a *control image* in addition to the text-to-image *prompt*.
Each pretrained model is trained using a different conditioning method that requires different images for conditioning the generated outputs. For example, Canny edge conditioning requires the control image to be the output of a Canny filter, while depth conditioning requires the control image to be a depth map. See the overview and image examples below to know more.
All checkpoints can be found under the authors' namespace [lllyasviel](https://huggingface.co/lllyasviel).
### ControlNet with Stable Diffusion 1.5
| Model Name | Control Image Overview| Control Image Example | Generated Image Example |
|---|---|---|---|
|[lllyasviel/sd-controlnet-canny](https://huggingface.co/lllyasviel/sd-controlnet-canny)<br/> *Trained with canny edge detection* | A monochrome image with white edges on a black background.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_bird_canny.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_bird_canny.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_bird_canny_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_bird_canny_1.png"/></a>|
|[lllyasviel/sd-controlnet-depth](https://huggingface.co/lllyasviel/sd-controlnet-depth)<br/> *Trained with Midas depth estimation* |A grayscale image with black representing deep areas and white representing shallow areas.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_vermeer_depth.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_vermeer_depth.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_depth_2.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_depth_2.png"/></a>|
|[lllyasviel/sd-controlnet-hed](https://huggingface.co/lllyasviel/sd-controlnet-hed)<br/> *Trained with HED edge detection (soft edge)* |A monochrome image with white soft edges on a black background.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_bird_hed.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_bird_hed.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_bird_hed_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_bird_hed_1.png"/></a> |
|[lllyasviel/sd-controlnet-mlsd](https://huggingface.co/lllyasviel/sd-controlnet-mlsd)<br/> *Trained with M-LSD line detection* |A monochrome image composed only of white straight lines on a black background.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_room_mlsd.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_room_mlsd.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_mlsd_0.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_mlsd_0.png"/></a>|
|[lllyasviel/sd-controlnet-normal](https://huggingface.co/lllyasviel/sd-controlnet-normal)<br/> *Trained with normal map* |A [normal mapped](https://en.wikipedia.org/wiki/Normal_mapping) image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_human_normal.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_human_normal.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_human_normal_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_human_normal_1.png"/></a>|
|[lllyasviel/sd-controlnet-openpose](https://huggingface.co/lllyasviel/sd-controlnet_openpose)<br/> *Trained with OpenPose bone image* |A [OpenPose bone](https://github.com/CMU-Perceptual-Computing-Lab/openpose) image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_human_openpose.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_human_openpose.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_human_openpose_0.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_human_openpose_0.png"/></a>|
|[lllyasviel/sd-controlnet-scribble](https://huggingface.co/lllyasviel/sd-controlnet_scribble)<br/> *Trained with human scribbles* |A hand-drawn monochrome image with white outlines on a black background.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_vermeer_scribble.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_vermeer_scribble.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_scribble_0.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_vermeer_scribble_0.png"/></a> |
|[lllyasviel/sd-controlnet-seg](https://huggingface.co/lllyasviel/sd-controlnet_seg)<br/>*Trained with semantic segmentation* |An [ADE20K](https://groups.csail.mit.edu/vision/datasets/ADE20K/)'s segmentation protocol image.|<a href="https://huggingface.co/takuma104/controlnet_dev/blob/main/gen_compare/control_images/converted/control_room_seg.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/control_images/converted/control_room_seg.png"/></a>|<a href="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"><img width="64" src="https://huggingface.co/takuma104/controlnet_dev/resolve/main/gen_compare/output_images/diffusers/output_room_seg_1.png"/></a> |
## StableDiffusionControlNetPipeline
[[autodoc]] StableDiffusionControlNetPipeline
- all
- __call__
- enable_attention_slicing
- disable_attention_slicing
- enable_vae_slicing
- disable_vae_slicing
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -20,6 +20,9 @@ The original codebase can be found here: [CampVis/stable-diffusion](https://gith
[`StableDiffusionImg2ImgPipeline`] is compatible with all Stable Diffusion checkpoints for [Text-to-Image](./text2img)
The pipeline uses the diffusion-denoising mechanism proposed by SDEdit ([SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations](https://arxiv.org/abs/2108.01073)
proposed by Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon).
[[autodoc]] StableDiffusionImg2ImgPipeline
- all
- __call__

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,33 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Stable Diffusion Latent Upscaler
## StableDiffusionLatentUpscalePipeline
The Stable Diffusion Latent Upscaler model was created by [Katherine Crowson](https://github.com/crowsonkb/k-diffusion) in collaboration with [Stability AI](https://stability.ai/). It can be used on top of any [`StableDiffusionUpscalePipeline`] checkpoint to enhance its output image resolution by a factor of 2.
A notebook that demonstrates the original implementation can be found here:
- [Stable Diffusion Upscaler Demo](https://colab.research.google.com/drive/1o1qYJcFeywzCIdkfKJy7cTpgZTCM2EI4)
Available Checkpoints are:
- *stabilityai/latent-upscaler*: [stabilityai/sd-x2-latent-upscaler](https://huggingface.co/stabilityai/sd-x2-latent-upscaler)
[[autodoc]] StableDiffusionLatentUpscalePipeline
- all
- __call__
- enable_sequential_cpu_offload
- enable_attention_slicing
- disable_attention_slicing
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -31,7 +31,10 @@ For more details about how Stable Diffusion works and how it differs from the ba
| [StableDiffusionDepth2ImgPipeline](./depth2img) | **Experimental** *Depth-to-Image Text-Guided Generation * | | Coming soon
| [StableDiffusionImageVariationPipeline](./image_variation) | **Experimental** *Image Variation Generation * | | [🤗 Stable Diffusion Image Variations](https://huggingface.co/spaces/lambdalabs/stable-diffusion-image-variations)
| [StableDiffusionUpscalePipeline](./upscale) | **Experimental** *Text-Guided Image Super-Resolution * | | Coming soon
| [StableDiffusionLatentUpscalePipeline](./latent_upscale) | **Experimental** *Text-Guided Image Super-Resolution * | | Coming soon
| [StableDiffusionInstructPix2PixPipeline](./pix2pix) | **Experimental** *Text-Based Image Editing * | | [InstructPix2Pix: Learning to Follow Image Editing Instructions](https://huggingface.co/spaces/timbrooks/instruct-pix2pix)
| [StableDiffusionAttendAndExcitePipeline](./attend_and_excite) | **Experimental** *Text-to-Image Generation * | | [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite)
| [StableDiffusionPix2PixZeroPipeline](./pix2pix_zero) | **Experimental** *Text-Based Image Editing * | | [Zero-shot Image-to-Image Translation](https://arxiv.org/abs/2302.03027)

View File

@@ -0,0 +1,58 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation
## Overview
[MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation](https://arxiv.org/abs/2302.08113) by Omer Bar-Tal, Lior Yariv, Yaron Lipman, and Tali Dekel.
The abstract of the paper is the following:
*Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image, and fast adaptation to new tasks still remains an open challenge, currently mostly addressed by costly and long re-training and fine-tuning or ad-hoc adaptations to specific image generation tasks. In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or finetuning. At the center of our approach is a new generation process, based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints. We show that MultiDiffusion can be readily applied to generate high quality and diverse images that adhere to user-provided controls, such as desired aspect ratio (e.g., panorama), and spatial guiding signals, ranging from tight segmentation masks to bounding boxes.
Resources:
* [Project Page](https://multidiffusion.github.io/).
* [Paper](https://arxiv.org/abs/2302.08113).
* [Original Code](https://github.com/omerbt/MultiDiffusion).
* [Demo](https://huggingface.co/spaces/weizmannscience/MultiDiffusion).
## Available Pipelines:
| Pipeline | Tasks | Demo
|---|---|:---:|
| [StableDiffusionPanoramaPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_panorama.py) | *Text-Guided Panorama View Generation* | [🤗 Space](https://huggingface.co/spaces/weizmannscience/MultiDiffusion)) |
<!-- TODO: add Colab -->
## Usage example
```python
import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler
model_ckpt = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_ckpt, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(model_ckpt, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "a photo of the dolomites"
image = pipe(prompt).images[0]
image.save("dolomites.png")
```
## StableDiffusionPanoramaPipeline
[[autodoc]] StableDiffusionPanoramaPipeline
- __call__
- all

View File

@@ -60,7 +60,7 @@ def download_image(url):
image = download_image(url)
prompt = "make the mountains snowy"
edit = pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images[0]
images = pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images
images[0].save("snowy_mountains.png")
```

View File

@@ -0,0 +1,291 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Zero-shot Image-to-Image Translation
## Overview
[Zero-shot Image-to-Image Translation](https://arxiv.org/abs/2302.03027).
The abstract of the paper is the following:
*Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse and high-quality images. However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is hard for users to come up with a perfect text prompt that accurately describes every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting. We first automatically discover editing directions that reflect desired edits in the text embedding space. To preserve the general content structure after editing, we further propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. In addition, our method does not need additional training for these edits and can directly use the existing pre-trained text-to-image diffusion model. We conduct extensive experiments and show that our method outperforms existing and concurrent works for both real and synthetic image editing.*
Resources:
* [Project Page](https://pix2pixzero.github.io/).
* [Paper](https://arxiv.org/abs/2302.03027).
* [Original Code](https://github.com/pix2pixzero/pix2pix-zero).
* [Demo](https://huggingface.co/spaces/pix2pix-zero-library/pix2pix-zero-demo).
## Tips
* The pipeline can be conditioned on real input images. Check out the code examples below to know more.
* The pipeline exposes two arguments namely `source_embeds` and `target_embeds`
that let you control the direction of the semantic edits in the final image to be generated. Let's say,
you wanted to translate from "cat" to "dog". In this case, the edit direction will be "cat -> dog". To reflect
this in the pipeline, you simply have to set the embeddings related to the phrases including "cat" to
`source_embeds` and "dog" to `target_embeds`. Refer to the code example below for more details.
* When you're using this pipeline from a prompt, specify the _source_ concept in the prompt. Taking
the above example, a valid input prompt would be: "a high resolution painting of a **cat** in the style of van gough".
* If you wanted to reverse the direction in the example above, i.e., "dog -> cat", then it's recommended to:
* Swap the `source_embeds` and `target_embeds`.
* Change the input prompt to include "dog".
* To learn more about how the source and target embeddings are generated, refer to the [original
paper](https://arxiv.org/abs/2302.03027). Below, we also provide some directions on how to generate the embeddings.
* Note that the quality of the outputs generated with this pipeline is dependent on how good the `source_embeds` and `target_embeds` are. Please, refer to [this discussion](#generating-source-and-target-embeddings) for some suggestions on the topic.
## Available Pipelines:
| Pipeline | Tasks | Demo
|---|---|:---:|
| [StableDiffusionPix2PixZeroPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_pix2pix_zero.py) | *Text-Based Image Editing* | [🤗 Space](https://huggingface.co/spaces/pix2pix-zero-library/pix2pix-zero-demo) |
<!-- TODO: add Colab -->
## Usage example
### Based on an image generated with the input prompt
```python
import requests
import torch
from diffusers import DDIMScheduler, StableDiffusionPix2PixZeroPipeline
def download(embedding_url, local_filepath):
r = requests.get(embedding_url)
with open(local_filepath, "wb") as f:
f.write(r.content)
model_ckpt = "CompVis/stable-diffusion-v1-4"
pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained(
model_ckpt, conditions_input_image=False, torch_dtype=torch.float16
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.to("cuda")
prompt = "a high resolution painting of a cat in the style of van gogh"
src_embs_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/embeddings_sd_1.4/cat.pt"
target_embs_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/embeddings_sd_1.4/dog.pt"
for url in [src_embs_url, target_embs_url]:
download(url, url.split("/")[-1])
src_embeds = torch.load(src_embs_url.split("/")[-1])
target_embeds = torch.load(target_embs_url.split("/")[-1])
images = pipeline(
prompt,
source_embeds=src_embeds,
target_embeds=target_embeds,
num_inference_steps=50,
cross_attention_guidance_amount=0.15,
).images
images[0].save("edited_image_dog.png")
```
### Based on an input image
When the pipeline is conditioned on an input image, we first obtain an inverted
noise from it using a `DDIMInverseScheduler` with the help of a generated caption. Then
the inverted noise is used to start the generation process.
First, let's load our pipeline:
```py
import torch
from transformers import BlipForConditionalGeneration, BlipProcessor
from diffusers import DDIMScheduler, DDIMInverseScheduler, StableDiffusionPix2PixZeroPipeline
captioner_id = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(captioner_id)
model = BlipForConditionalGeneration.from_pretrained(captioner_id, torch_dtype=torch.float16, low_cpu_mem_usage=True)
sd_model_ckpt = "CompVis/stable-diffusion-v1-4"
pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained(
sd_model_ckpt,
caption_generator=model,
caption_processor=processor,
torch_dtype=torch.float16,
safety_checker=None,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config)
pipeline.enable_model_cpu_offload()
```
Then, we load an input image for conditioning and obtain a suitable caption for it:
```py
import requests
from PIL import Image
img_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/test_images/cats/cat_6.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB").resize((512, 512))
caption = pipeline.generate_caption(raw_image)
```
Then we employ the generated caption and the input image to get the inverted noise:
```py
generator = torch.manual_seed(0)
inv_latents = pipeline.invert(caption, image=raw_image, generator=generator).latents
```
Now, generate the image with edit directions:
```py
# See the "Generating source and target embeddings" section below to
# automate the generation of these captions with a pre-trained model like Flan-T5 as explained below.
source_prompts = ["a cat sitting on the street", "a cat playing in the field", "a face of a cat"]
target_prompts = ["a dog sitting on the street", "a dog playing in the field", "a face of a dog"]
source_embeds = pipeline.get_embeds(source_prompts, batch_size=2)
target_embeds = pipeline.get_embeds(target_prompts, batch_size=2)
image = pipeline(
caption,
source_embeds=source_embeds,
target_embeds=target_embeds,
num_inference_steps=50,
cross_attention_guidance_amount=0.15,
generator=generator,
latents=inv_latents,
negative_prompt=caption,
).images[0]
image.save("edited_image.png")
```
## Generating source and target embeddings
The authors originally used the [GPT-3 API](https://openai.com/api/) to generate the source and target captions for discovering
edit directions. However, we can also leverage open source and public models for the same purpose.
Below, we provide an end-to-end example with the [Flan-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) model
for generating captions and [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) for
computing embeddings on the generated captions.
**1. Load the generation model**:
```py
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl", device_map="auto", torch_dtype=torch.float16)
```
**2. Construct a starting prompt**:
```py
source_concept = "cat"
target_concept = "dog"
source_text = f"Provide a caption for images containing a {source_concept}. "
"The captions should be in English and should be no longer than 150 characters."
target_text = f"Provide a caption for images containing a {target_concept}. "
"The captions should be in English and should be no longer than 150 characters."
```
Here, we're interested in the "cat -> dog" direction.
**3. Generate captions**:
We can use a utility like so for this purpose.
```py
def generate_captions(input_prompt):
input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(
input_ids, temperature=0.8, num_return_sequences=16, do_sample=True, max_new_tokens=128, top_k=10
)
return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```
And then we just call it to generate our captions:
```py
source_captions = generate_captions(source_text)
target_captions = generate_captions(target_concept)
```
We encourage you to play around with the different parameters supported by the
`generate()` method ([documentation](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.generation_tf_utils.TFGenerationMixin.generate)) for the generation quality you are looking for.
**4. Load the embedding model**:
Here, we need to use the same text encoder model used by the subsequent Stable Diffusion model.
```py
from diffusers import StableDiffusionPix2PixZeroPipeline
pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")
tokenizer = pipeline.tokenizer
text_encoder = pipeline.text_encoder
```
**5. Compute embeddings**:
```py
import torch
def embed_captions(sentences, tokenizer, text_encoder, device="cuda"):
with torch.no_grad():
embeddings = []
for sent in sentences:
text_inputs = tokenizer(
sent,
padding="max_length",
max_length=tokenizer.model_max_length,
truncation=True,
return_tensors="pt",
)
text_input_ids = text_inputs.input_ids
prompt_embeds = text_encoder(text_input_ids.to(device), attention_mask=None)[0]
embeddings.append(prompt_embeds)
return torch.concatenate(embeddings, dim=0).mean(dim=0).unsqueeze(0)
source_embeddings = embed_captions(source_captions, tokenizer, text_encoder)
target_embeddings = embed_captions(target_captions, tokenizer, text_encoder)
```
And you're done! [Here](https://colab.research.google.com/drive/1tz2C1EdfZYAPlzXXbTnf-5PRBiR8_R1F?usp=sharing) is a Colab Notebook that you can use to interact with the entire process.
Now, you can use these embeddings directly while calling the pipeline:
```py
from diffusers import DDIMScheduler
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
images = pipeline(
prompt,
source_embeds=source_embeddings,
target_embeds=target_embeddings,
num_inference_steps=50,
cross_attention_guidance_amount=0.15,
).images
images[0].save("edited_image_dog.png")
```
## StableDiffusionPix2PixZeroPipeline
[[autodoc]] StableDiffusionPix2PixZeroPipeline
- __call__
- all

View File

@@ -0,0 +1,64 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Self-Attention Guidance (SAG)
## Overview
[Self-Attention Guidance](https://arxiv.org/abs/2210.00939) by Susung Hong et al.
The abstract of the paper is the following:
*Denoising diffusion models (DDMs) have been drawing much attention for their appreciable sample quality and diversity. Despite their remarkable performance, DDMs remain black boxes on which further study is necessary to take a profound step. Motivated by this, we delve into the design of conventional U-shaped diffusion models. More specifically, we investigate the self-attention modules within these models through carefully designed experiments and explore their characteristics. In addition, inspired by the studies that substantiate the effectiveness of the guidance schemes, we present plug-and-play diffusion guidance, namely Self-Attention Guidance (SAG), that can drastically boost the performance of existing diffusion models. Our method, SAG, extracts the intermediate attention map from a diffusion model at every iteration and selects tokens above a certain attention score for masking and blurring to obtain a partially blurred input. Subsequently, we measure the dissimilarity between the predicted noises obtained from feeding the blurred and original input to the diffusion model and leverage it as guidance. With this guidance, we observe apparent improvements in a wide range of diffusion models, e.g., ADM, IDDPM, and Stable Diffusion, and show that the results further improve by combining our method with the conventional guidance scheme. We provide extensive ablation studies to verify our choices.*
Resources:
* [Project Page](https://ku-cvlab.github.io/Self-Attention-Guidance).
* [Paper](https://arxiv.org/abs/2210.00939).
* [Original Code](https://github.com/KU-CVLAB/Self-Attention-Guidance).
* [Demo](https://colab.research.google.com/github/SusungHong/Self-Attention-Guidance/blob/main/SAG_Stable.ipynb).
## Available Pipelines:
| Pipeline | Tasks | Demo
|---|---|:---:|
| [StableDiffusionSAGPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py) | *Text-to-Image Generation* | [Colab](https://colab.research.google.com/github/SusungHong/Self-Attention-Guidance/blob/main/SAG_Stable.ipynb) |
## Usage example
```python
import torch
from diffusers import StableDiffusionSAGPipeline
from accelerate.utils import set_seed
pipe = StableDiffusionSAGPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
seed = 8978
prompt = "."
guidance_scale = 7.5
num_images_per_prompt = 1
sag_scale = 1.0
set_seed(seed)
images = pipe(
prompt, num_images_per_prompt=num_images_per_prompt, guidance_scale=guidance_scale, sag_scale=sag_scale
).images
images[0].save("example.png")
```
## StableDiffusionSAGPipeline
[[autodoc]] StableDiffusionSAGPipeline
- __call__
- all

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -17,7 +17,7 @@ specific language governing permissions and limitations under the License.
The Stable Diffusion model was created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [runway](https://github.com/runwayml), and [LAION](https://laion.ai/). The [`StableDiffusionPipeline`] is capable of generating photo-realistic images given any text input using Stable Diffusion.
The original codebase can be found here:
- *Stable Diffusion V1*: [CampVis/stable-diffusion](https://github.com/CompVis/stable-diffusion)
- *Stable Diffusion V1*: [CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion)
- *Stable Diffusion v2*: [Stability-AI/stablediffusion](https://github.com/Stability-AI/stablediffusion)
Available Checkpoints are:
@@ -36,4 +36,6 @@ Available Checkpoints are:
- enable_vae_slicing
- disable_vae_slicing
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention
- enable_vae_tiling
- disable_vae_tiling

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -24,7 +24,7 @@ The abstract of the paper is the following:
| Pipeline | Tasks | Colab | Demo
|---|---|:---:|:---:|
| [pipeline_stable_diffusion_safe.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion_safe/pipeline_stable_diffusion_safe.py) | *Text-to-Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb) | -
| [pipeline_stable_diffusion_safe.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion_safe/pipeline_stable_diffusion_safe.py) | *Text-to-Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb) | [![Huggingface Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/AIML-TUDA/unsafe-vs-safe-stable-diffusion)
## Tips
@@ -58,7 +58,7 @@ You may use the 4 configurations defined in the [Safe Latent Diffusion paper](ht
>>> out = pipeline(prompt=prompt, **SafetyConfig.MAX)
```
The following configurations are available: `SafetyConfig.WEAK`, `SafetyConfig.MEDIUM`, `SafetyConfig.STRONg`, and `SafetyConfig.MAX`.
The following configurations are available: `SafetyConfig.WEAK`, `SafetyConfig.MEDIUM`, `SafetyConfig.STRONG`, and `SafetyConfig.MAX`.
### How to load and use different schedulers.

View File

@@ -0,0 +1,97 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Stable unCLIP
Stable unCLIP checkpoints are finetuned from [stable diffusion 2.1](./stable_diffusion_2) checkpoints to condition on CLIP image embeddings.
Stable unCLIP also still conditions on text embeddings. Given the two separate conditionings, stable unCLIP can be used
for text guided image variation. When combined with an unCLIP prior, it can also be used for full text to image generation.
## Tips
Stable unCLIP takes a `noise_level` as input during inference. `noise_level` determines how much noise is added
to the image embeddings. A higher `noise_level` increases variation in the final un-noised images. By default,
we do not add any additional noise to the image embeddings i.e. `noise_level = 0`.
### Available checkpoints:
TODO
### Text-to-Image Generation
```python
import torch
from diffusers import StableUnCLIPPipeline
pipe = StableUnCLIPPipeline.from_pretrained(
"fusing/stable-unclip-2-1-l", torch_dtype=torch.float16
) # TODO update model path
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
images = pipe(prompt).images
images[0].save("astronaut_horse.png")
```
### Text guided Image-to-Image Variation
```python
import requests
import torch
from PIL import Image
from io import BytesIO
from diffusers import StableUnCLIPImg2ImgPipeline
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
"fusing/stable-unclip-2-1-l-img2img", torch_dtype=torch.float16
) # TODO update model path
pipe = pipe.to("cuda")
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))
prompt = "A fantasy landscape, trending on artstation"
images = pipe(prompt, init_image).images
images[0].save("fantasy_landscape.png")
```
### StableUnCLIPPipeline
[[autodoc]] StableUnCLIPPipeline
- all
- __call__
- enable_attention_slicing
- disable_attention_slicing
- enable_vae_slicing
- disable_vae_slicing
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention
### StableUnCLIPImg2ImgPipeline
[[autodoc]] StableUnCLIPImg2ImgPipeline
- all
- __call__
- enable_attention_slicing
- disable_attention_slicing
- enable_vae_slicing
- disable_vae_slicing
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,21 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Inverse Denoising Diffusion Implicit Models (DDIMInverse)
## Overview
This scheduler is the inverted scheduler of [Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) (DDIM) by Jiaming Song, Chenlin Meng and Stefano Ermon.
The implementation is mostly based on the DDIM inversion definition of [Null-text Inversion for Editing Real Images using Guided Diffusion Models](https://arxiv.org/pdf/2211.09794.pdf)
## DDIMInverseScheduler
[[autodoc]] DDIMInverseScheduler

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -43,11 +43,12 @@ To this end, the design of schedulers is such that:
The following table summarizes all officially supported schedulers, their corresponding paper
| Scheduler | Paper |
|---|---|
| [ddim](./ddim) | [**Denoising Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) |
| [ddim_inverse](./ddim_inverse) | [**Denoising Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) |
| [ddpm](./ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) |
| [deis](./deis) | [**DEISMultistepScheduler**](https://arxiv.org/abs/2204.13902) |
| [singlestep_dpm_solver](./singlestep_dpm_solver) | [**Singlestep DPM-Solver**](https://arxiv.org/abs/2206.00927) |
| [multistep_dpm_solver](./multistep_dpm_solver) | [**Multistep DPM-Solver**](https://arxiv.org/abs/2206.00927) |
| [heun](./heun) | [**Heun scheduler inspired by Karras et. al paper**](https://arxiv.org/abs/2206.00364) |
@@ -62,6 +63,7 @@ The following table summarizes all officially supported schedulers, their corres
| [euler](./euler) | [**Euler scheduler**](https://arxiv.org/abs/2206.00364) |
| [euler_ancestral](./euler_ancestral) | [**Euler Ancestral scheduler**](https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L72) |
| [vq_diffusion](./vq_diffusion) | [**VQDiffusionScheduler**](https://arxiv.org/abs/2111.14822) |
| [unipc](./unipc) | [**UniPCMultistepScheduler**](https://arxiv.org/abs/2302.04867) |
| [repaint](./repaint) | [**RePaint scheduler**](https://arxiv.org/abs/2201.09865) |
## API
@@ -83,4 +85,8 @@ The class [`SchedulerOutput`] contains the outputs from any schedulers `step(...
### KarrasDiffusionSchedulers
`KarrasDiffusionSchedulers` encompasses the main generalization of schedulers in Diffusers. The schedulers in this class are distinguished, at a high level, by their noise sampling strategy; the type of network and scaling; and finally the training strategy or how the loss is weighed.
The different schedulers, depending on the type of ODE solver, fall into the above taxonomy and provide a good abstraction for the design of the main schedulers implemented in Diffusers. The schedulers in this class are given below:
[[autodoc]] schedulers.scheduling_utils.KarrasDiffusionSchedulers

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,24 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# UniPC
## Overview
UniPC is a training-free framework designed for the fast sampling of diffusion models, which consists of a corrector (UniC) and a predictor (UniP) that share a unified analytical form and support arbitrary orders.
For more details about the method, please refer to the [[paper]](https://arxiv.org/abs/2302.04867) and the [[code]](https://github.com/wl-zhao/UniPC).
Fast Sampling of Diffusion Models with Exponential Integrator.
## UniPCMultistepScheduler
[[autodoc]] UniPCMultistepScheduler

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -177,7 +177,7 @@ Follow these steps to start contributing ([supported Python versions](https://gi
$ make style
```
🧨 Diffusers also uses `flake8` and a few custom scripts to check for coding mistakes. Quality
🧨 Diffusers also uses `ruff` and a few custom scripts to check for coding mistakes. Quality
control runs in CI, however you can also run the same checks with:
```bash

View File

@@ -0,0 +1,49 @@
# 🧨 Diffusers Ethical Guidelines
## Preamble
[Diffusers](https://huggingface.co/docs/diffusers/index) provides pre-trained diffusion models and serves as a modular toolbox for inference and training.
Given its real case applications in the world and potential negative impacts on society, we think it is important to provide the project with ethical guidelines to guide the development, users contributions, and usage of the Diffusers library.
The risks associated with using this technology are still being examined, but to name a few: copyrights issues for artists; deep-fake exploitation; sexual content generation in inappropriate contexts; non-consensual impersonation; harmful social biases perpetuating the oppression of marginalized groups.
We will keep tracking risks and adapt the following guidelines based on the community's responsiveness and valuable feedback.
## Scope
The Diffusers community will apply the following ethical guidelines to the projects development and help coordinate how the community will integrate the contributions, especially concerning sensitive topics related to ethical concerns.
## Ethical guidelines
The following ethical guidelines apply generally, but we will primarily implement them when dealing with ethically sensitive issues while making a technical choice. Furthermore, we commit to adapting those ethical principles over time following emerging harms related to the state of the art of the technology in question.
- **Transparency**: we are committed to being transparent in managing PRs, explaining our choices to users, and making technical decisions.
- **Consistency**: we are committed to guaranteeing our users the same level of attention in project management, keeping it technically stable and consistent.
- **Simplicity**: with a desire to make it easy to use and exploit the Diffusers library, we are committed to keeping the projects goals lean and coherent.
- **Accessibility**: the Diffusers project helps lower the entry bar for contributors who can help run it even without technical expertise. Doing so makes research artifacts more accessible to the community.
- **Reproducibility**: we aim to be transparent about the reproducibility of upstream code, models, and datasets when made available through the Diffusers library.
- **Responsibility**: as a community and through teamwork, we hold a collective responsibility to our users by anticipating and mitigating this technology's potential risks and dangers.
## Examples of implementations: Safety features and Mechanisms
The team works daily to make the technical and non-technical tools available to deal with the potential ethical and social risks associated with diffusion technology. Moreover, the community's input is invaluable in ensuring these features' implementation and raising awareness with us.
- [**Community tab**](https://huggingface.co/docs/hub/repositories-pull-requests-discussions): it enables the community to discuss and better collaborate on a project.
- **Bias exploration and evaluation**: the Hugging Face team provides a [space](https://huggingface.co/spaces/society-ethics/DiffusionBiasExplorer) to demonstrate the biases in Stable Diffusion interactively. In this sense, we support and encourage bias explorers and evaluations.
- **Encouraging safety in deployment**
- [**Safe Stable Diffusion**](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion_safe): It mitigates the well-known issue that models, like Stable Diffusion, that are trained on unfiltered, web-crawled datasets tend to suffer from inappropriate degeneration. Related paper: [Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models](https://arxiv.org/abs/2211.05105).
- **Staged released on the Hub**: in particularly sensitive situations, access to some repositories should be restricted. This staged release is an intermediary step that allows the repositorys authors to have more control over its use.
- **Licensing**: [OpenRAILs](https://huggingface.co/blog/open_rail), a new type of licensing, allow us to ensure free access while having a set of restrictions that ensure more responsible use.

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,6 +12,99 @@ specific language governing permissions and limitations under the License.
# Philosophy
- Readability and clarity are preferred over highly optimized code. A strong importance is put on providing readable, intuitive and elementary code design. *E.g.*, the provided [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) are separated from the provided [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and use well-commented code that can be read alongside the original paper.
- Diffusers is **modality independent** and focuses on providing pretrained models and tools to build systems that generate **continuous outputs**, *e.g.* vision and audio. This is one of the guiding goals even if the initial pipelines are devoted to vision tasks.
- Diffusion models and schedulers are provided as concise, elementary building blocks. In contrast, diffusion pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementations and can include components of other libraries, such as text encoders. Examples of diffusion pipelines are [Glide](https://github.com/openai/glide-text2im), [Latent Diffusion](https://github.com/CompVis/latent-diffusion) and [Stable Diffusion](https://github.com/compvis/stable-diffusion).
🧨 Diffusers provides **state-of-the-art** pretrained diffusion models across multiple modalities.
Its purpose is to serve as a **modular toolbox** for both inference and training.
We aim at building a library that stands the test of time and therefore take API design very seriously.
In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefore, most of our design choices are based on [PyTorch's Design Principles](https://pytorch.org/docs/stable/community/design.html#pytorch-design-philosophy). Let's go over the most important ones:
## Usability over Performance
- While Diffusers has many built-in performance-enhancing features (see [Memory and Speed](https://huggingface.co/docs/diffusers/optimization/fp16)), models are always loaded with the highest precision and lowest optimization. Therefore, by default diffusion pipelines are always instantiated on CPU with float32 precision if not otherwise defined by the user. This ensures usability across different platforms and accelerators and means that no complex installations are required to run the library.
- Diffusers aim at being a **light-weight** package and therefore has very few required dependencies, but many soft dependencies that can improve performance (such as `accelerate`, `safetensors`, `onnx`, etc...). We strive to keep the library as lightweight as possible so that it can be added without much concern as a dependency on other packages.
- Diffusers prefers simple, self-explainable code over condensed, magic code. This means that short-hand code syntaxes such as lambda functions, and advanced PyTorch operators are often not desired.
## Simple over easy
As PyTorch states, **explicit is better than implicit** and **simple is better than complex**. This design philosophy is reflected in multiple parts of the library:
- We follow PyTorch's API with methods like [`DiffusionPipeline.to`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.to) to let the user handle device management.
- Raising concise error messages is preferred to silently correct erroneous input. Diffusers aims at teaching the user, rather than making the library as easy to use as possible.
- Complex model vs. scheduler logic is exposed instead of magically handled inside. Schedulers/Samplers are separated from diffusion models with minimal dependencies on each other. This forces the user to write the unrolled denoising loop. However, the separation allows for easier debugging and gives the user more control over adapting the denoising process or switching out diffusion models or schedulers.
- Separately trained components of the diffusion pipeline, *e.g.* the text encoder, the unet, and the variational autoencoder, each have their own model class. This forces the user to handle the interaction between the different model components, and the serialization format separates the model components into different files. However, this allows for easier debugging and customization. Dreambooth or textual inversion training
is very simple thanks to diffusers' ability to separate single components of the diffusion pipeline.
## Tweakable, contributor-friendly over abstraction
For large parts of the library, Diffusers adopts an important design principle of the [Transformers library](https://github.com/huggingface/transformers), which is to prefer copy-pasted code over hasty abstractions. This design principle is very opinionated and stands in stark contrast to popular design principles such as [Don't repeat yourself (DRY)](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself).
In short, just like Transformers does for modeling files, diffusers prefers to keep an extremely low level of abstraction and very self-contained code for pipelines and schedulers.
Functions, long code blocks, and even classes can be copied across multiple files which at first can look like a bad, sloppy design choice that makes the library unmaintainable.
**However**, this design has proven to be extremely successful for Transformers and makes a lot of sense for community-driven, open-source machine learning libraries because:
- Machine Learning is an extremely fast-moving field in which paradigms, model architectures, and algorithms are changing rapidly, which therefore makes it very difficult to define long-lasting code abstractions.
- Machine Learning practitioners like to be able to quickly tweak existing code for ideation and research and therefore prefer self-contained code over one that contains many abstractions.
- Open-source libraries rely on community contributions and therefore must build a library that is easy to contribute to. The more abstract the code, the more dependencies, the harder to read, and the harder to contribute to. Contributors simply stop contributing to very abstract libraries out of fear of breaking vital functionality. If contributing to a library cannot break other fundamental code, not only is it more inviting for potential new contributors, but it is also easier to review and contribute to multiple parts in parallel.
At Hugging Face, we call this design the **single-file policy** which means that almost all of the code of a certain class should be written in a single, self-contained file. To read more about the philosophy, you can have a look
at [this blog post](https://huggingface.co/blog/transformers-design-philosophy).
In diffusers, we follow this philosophy for both pipelines and schedulers, but only partly for diffusion models. The reason we don't follow this design fully for diffusion models is because almost all diffusion pipelines, such
as [DDPM](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/ddpm), [Stable Diffusion](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/stable_diffusion/overview#stable-diffusion-pipelines), [UnCLIP (Dalle-2)](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/unclip#overview) and [Imagen](https://imagen.research.google/) all rely on the same diffusion model, the [UNet](https://huggingface.co/docs/diffusers/api/models#diffusers.UNet2DConditionModel).
Great, now you should have generally understood why 🧨 Diffusers is designed the way it is 🤗.
We try to apply these design principles consistently across the library. Nevertheless, there are some minor exceptions to the philosophy or some unlucky design choices. If you have feedback regarding the design, we would ❤️ to hear it [directly on GitHub](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).
## Design Philosophy in Details
Now, let's look a bit into the nitty-gritty details of the design philosophy. Diffusers essentially consist of three major classes, [pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines), [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models), and [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
Let's walk through more in-detail design decisions for each class.
### Pipelines
Pipelines are designed to be easy to use (therefore do not follow [*Simple over easy*](#simple-over-easy) 100%)), are not feature complete, and should loosely be seen as examples of how to use [models](#models) and [schedulers](#schedulers) for inference.
The following design principles are followed:
- Pipelines follow the single-file policy. All pipelines can be found in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as its done for [`src/diffusers/pipelines/stable-diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [#Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251).
- Pipelines all inherit from [`DiffusionPipeline`]
- Every pipeline consists of different model and scheduler components, that are documented in the [`model_index.json` file](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline and can be shared between pipelines with [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function.
- Every pipeline should be loadable via the [`DiffusionPipeline.from_pretrained`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained) function.
- Pipelines should be used **only** for inference.
- Pipelines should be very readable, self-explanatory, and easy to tweak.
- Pipelines should be designed to build on top of each other and be easy to integrate into higher-level APIs.
- Pipelines are **not** intended to be feature-complete user interfaces. For future complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner)
- Every pipeline should have one and only one way to run it via a `__call__` method. The naming of the `__call__` arguments should be shared across all pipelines.
- Pipelines should be named after the task they are intended to solve.
- In almost all cases, novel diffusion pipelines shall be implemented in a new pipeline folder/file.
### Models
Models are designed as configurable toolboxes that are natural extensions of [PyTorch's Module class](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). They only partly follow the **single-file policy**.
The following design principles are followed:
- Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context.
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_condition.py), [`transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformer_2d.py), etc...
- Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc... **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy.
- Models intend to expose complexity, just like PyTorch's module does, and give clear error messages.
- Models all inherit from `ModelMixin` and `ConfigMixin`.
- Models can be optimized for performance when it doesnt demand major code changes, keeps backward compatibility, and gives significant memory or compute gain.
- Models should by default have the highest precision and lowest performance setting.
- To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
- Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
- The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and
readable longterm, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/cross_attention.py).
### Schedulers
Schedulers are responsible to guide the denoising process for inference as well as to define a noise schedule for training. They are designed as individual classes with loadable configuration files and strongly follow the **single-file policy**.
The following design principles are followed:
- All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
- Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained.
- One scheduler python file corresponds to one scheduler algorithm (as might be defined in a paper).
- If schedulers share similar functionalities, we can make use of the `#Copied from` mechanism.
- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](./using-diffusers/schedulers.mdx).
- Every scheduler has to have a `set_num_inference_steps`, and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called.
- Every scheduler exposes the timesteps to be "looped over" via a `timesteps` attribute, which is an array of timesteps the model will be called upon
- The `step(...)` function takes a predicted model output and the "current" sample (x_t) and returns the "previous", slightly more denoised sample (x_t-1).
- Given the complexity of diffusion schedulers, the `step` function does not expose all the complexity and can be a bit of a "black box".
- In almost all cases, novel schedulers shall be implemented in a new scheduling file.

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -36,6 +36,7 @@ available a colab notebook to directly try them out.
|---|---|:---:|:---:|
| [alt_diffusion](./api/pipelines/alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation |
| [audio_diffusion](./api/pipelines/audio_diffusion) | [**Audio Diffusion**](https://github.com/teticio/audio-diffusion.git) | Unconditional Audio Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb)
| [controlnet](./api/pipelines/stable_diffusion/controlnet) | [**ControlNet with Stable Diffusion**](https://arxiv.org/abs/2302.05543) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/controlnet.ipynb)
| [cycle_diffusion](./api/pipelines/cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation |
| [dance_diffusion](./api/pipelines/dance_diffusion) | [**Dance Diffusion**](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation |
| [ddpm](./api/pipelines/ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation |
@@ -47,13 +48,24 @@ available a colab notebook to directly try them out.
| [pndm](./api/pipelines/pndm) | [**Pseudo Numerical Methods for Diffusion Models on Manifolds**](https://arxiv.org/abs/2202.09778) | Unconditional Image Generation |
| [score_sde_ve](./api/pipelines/score_sde_ve) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
| [score_sde_vp](./api/pipelines/score_sde_vp) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
| [stable_diffusion](./api/pipelines/stable_diffusion/text2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb)
| [stable_diffusion](./api/pipelines/stable_diffusion/img2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)
| [stable_diffusion](./api/pipelines/stable_diffusion/inpaint) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation |
| [semantic_stable_diffusion](./api/pipelines/semantic_stable_diffusion) | [**Semantic Guidance**](https://arxiv.org/abs/2301.12247) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/semantic-image-editing/blob/main/examples/SemanticGuidance.ipynb)
| [stable_diffusion_text2img](./api/pipelines/stable_diffusion/text2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb)
| [stable_diffusion_img2img](./api/pipelines/stable_diffusion/img2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)
| [stable_diffusion_inpaint](./api/pipelines/stable_diffusion/inpaint) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
| [stable_diffusion_panorama](./api/pipelines/stable_diffusion/panorama) | [**MultiDiffusion**](https://multidiffusion.github.io/) | Text-to-Panorama Generation |
| [stable_diffusion_pix2pix](./api/pipelines/stable_diffusion/pix2pix) | [**InstructPix2Pix**](https://github.com/timothybrooks/instruct-pix2pix) | Text-Guided Image Editing|
| [stable_diffusion_pix2pix_zero](./api/pipelines/stable_diffusion/pix2pix_zero) | [**Zero-shot Image-to-Image Translation**](https://pix2pixzero.github.io/) | Text-Guided Image Editing |
| [stable_diffusion_attend_and_excite](./api/pipelines/stable_diffusion/attend_and_excite) | [**Attend and Excite for Stable Diffusion**](https://attendandexcite.github.io/Attend-and-Excite/) | Text-to-Image Generation |
| [stable_diffusion_self_attention_guidance](./api/pipelines/stable_diffusion/self_attention_guidance) | [**Self-Attention Guidance**](https://ku-cvlab.github.io/Self-Attention-Guidance) | Text-to-Image Generation |
| [stable_diffusion_image_variation](./stable_diffusion/image_variation) | [**Stable Diffusion Image Variations**](https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations) | Image-to-Image Generation |
| [stable_diffusion_latent_upscale](./stable_diffusion/latent_upscale) | [**Stable Diffusion Latent Upscaler**](https://twitter.com/StabilityAI/status/1590531958815064065) | Text-Guided Super Resolution Image-to-Image |
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation |
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Image Inpainting |
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Depth-Conditional Stable Diffusion**](https://github.com/Stability-AI/stablediffusion#depth-conditional-stable-diffusion) | Depth-to-Image Generation |
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image |
| [stable_diffusion_safe](./api/pipelines/stable_diffusion_safe) | [**Safe Stable Diffusion**](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb)
| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Text-to-Image Generation |
| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Image-to-Image Text-Guided Generation |
| [stochastic_karras_ve](./api/pipelines/stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation |
| [unclip](./api/pipelines/unclip) | [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) | Text-to-Image Generation |
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation |

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -133,6 +133,35 @@ images = pipe([prompt] * 32).images
You may see a small performance boost in VAE decode on multi-image batches. There should be no performance impact on single-image batches.
## Tiled VAE decode and encode for large images
Tiled VAE processing makes it possible to work with large images on limited VRAM. For example, generating 4k images in 8GB of VRAM. Tiled VAE decoder splits the image into overlapping tiles, decodes the tiles, and blends the outputs to make the final image.
You want to couple this with [`~StableDiffusionPipeline.enable_attention_slicing`] or [`~StableDiffusionPipeline.enable_xformers_memory_efficient_attention`] to further minimize memory use.
To use tiled VAE processing, invoke [`~StableDiffusionPipeline.enable_vae_tiling`] in your pipeline before inference. For example:
```python
import torch
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16,
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "a beautiful landscape photograph"
pipe.enable_vae_tiling()
pipe.enable_xformers_memory_efficient_attention()
image = pipe([prompt], width=3840, height=2224, num_inference_steps=20).images[0]
```
The output image will have some tile-to-tile tone variation from the tiles having separate decoders, but you shouldn't see sharp seams between the tiles. The tiling is turned off for images that are 512x512 or smaller.
<a name="sequential_offloading"></a>
## Offloading to CPU with accelerate for memory savings
For additional memory savings, you can offload the weights to CPU and only load them to GPU when performing the forward pass.
@@ -156,7 +185,13 @@ image = pipe(prompt).images[0]
And you can get the memory consumption to < 3GB.
If is also possible to chain it with attention slicing for minimal memory consumption (< 2GB).
Note that this method works at the submodule level, not on whole models. This is the best way to minimize memory consumption, but inference is much slower due to the iterative nature of the process. The UNet component of the pipeline runs several times (as many as `num_inference_steps`); each time, the different submodules of the UNet are sequentially onloaded and then offloaded as they are needed, so the number of memory transfers is large.
<Tip>
Consider using <a href="#model_offloading">model offloading</a> as another point in the optimization space: it will be much faster, but memory savings won't be as large.
</Tip>
It is also possible to chain offloading with attention slicing for minimal memory consumption (< 2GB).
```Python
import torch
@@ -177,6 +212,55 @@ image = pipe(prompt).images[0]
**Note**: When using `enable_sequential_cpu_offload()`, it is important to **not** move the pipeline to CUDA beforehand or else the gain in memory consumption will only be minimal. See [this issue](https://github.com/huggingface/diffusers/issues/1934) for more information.
<a name="model_offloading"></a>
## Model offloading for fast inference and memory savings
[Sequential CPU offloading](#sequential_offloading), as discussed in the previous section, preserves a lot of memory but makes inference slower, because submodules are moved to GPU as needed, and immediately returned to CPU when a new module runs.
Full-model offloading is an alternative that moves whole models to the GPU, instead of handling each model's constituent _modules_. This results in a negligible impact on inference time (compared with moving the pipeline to `cuda`), while still providing some memory savings.
In this scenario, only one of the main components of the pipeline (typically: text encoder, unet and vae)
will be in the GPU while the others wait in the CPU. Compoments like the UNet that run for multiple iterations will stay on GPU until they are no longer needed.
This feature can be enabled by invoking `enable_model_cpu_offload()` on the pipeline, as shown below.
```Python
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16,
)
prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_model_cpu_offload()
image = pipe(prompt).images[0]
```
This is also compatible with attention slicing for additional memory savings.
```Python
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16,
)
prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing(1)
image = pipe(prompt).images[0]
```
<Tip>
This feature requires `accelerate` version 0.17.0 or larger.
</Tip>
## Using Channels Last memory format
Channels last memory format is an alternative way of ordering NCHW tensors in memory preserving dimensions ordering. Channels last tensors ordered in such a way that channels become the densest dimension (aka storing images pixel-per-pixel). Since not all operators currently support channels last format it may result in a worst performance, so it's better to try it and see if it works for your model.
@@ -210,6 +294,7 @@ torch.set_grad_enabled(False)
n_experiments = 2
unet_runs_per_experiment = 50
# load inputs
def generate_inputs():
sample = torch.randn(2, 4, 64, 64).half().cuda()
@@ -288,6 +373,8 @@ pipe = StableDiffusionPipeline.from_pretrained(
# use jitted unet
unet_traced = torch.jit.load("unet_traced.pt")
# del pipe.unet
class TracedUNet(torch.nn.Module):
def __init__(self):

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -21,13 +21,13 @@ specific language governing permissions and limitations under the License.
## Stable Diffusion Inference
The snippet below demonstrates how to use the ONNX runtime. You need to use `StableDiffusionOnnxPipeline` instead of `StableDiffusionPipeline`. You also need to download the weights from the `onnx` branch of the repository, and indicate the runtime provider you want to use.
The snippet below demonstrates how to use the ONNX runtime. You need to use `OnnxStableDiffusionPipeline` instead of `StableDiffusionPipeline`. You also need to download the weights from the `onnx` branch of the repository, and indicate the runtime provider you want to use.
```python
# make sure you're logged in with `huggingface-cli login`
from diffusers import StableDiffusionOnnxPipeline
from diffusers import OnnxStableDiffusionPipeline
pipe = StableDiffusionOnnxPipeline.from_pretrained(
pipe = OnnxStableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
revision="onnx",
provider="CUDAExecutionProvider",
@@ -37,6 +37,37 @@ prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
The snippet below demonstrates how to use the ONNX runtime with the Stable Diffusion upscaling pipeline.
```python
from diffusers import OnnxStableDiffusionPipeline, OnnxStableDiffusionUpscalePipeline
prompt = "a photo of an astronaut riding a horse on mars"
steps = 50
txt2img = OnnxStableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
revision="onnx",
provider="CUDAExecutionProvider",
)
small_image = txt2img(
prompt,
num_inference_steps=steps,
).images[0]
generator = torch.manual_seed(0)
upscale = OnnxStableDiffusionUpscalePipeline.from_pretrained(
"ssube/stable-diffusion-x4-upscaler-onnx",
provider="CUDAExecutionProvider",
)
large_image = upscale(
prompt,
small_image,
generator=generator,
num_inference_steps=steps,
).images[0]
```
## Known Issues
- Generating multiple prompts in a batch seems to take too much memory. While we look into it, you may need to iterate instead of batching.

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,208 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Accelerated PyTorch 2.0 support in Diffusers
Starting from version `0.13.0`, Diffusers supports the latest optimization from the upcoming [PyTorch 2.0](https://pytorch.org/get-started/pytorch-2.0/) release. These include:
1. Support for accelerated transformers implementation with memory-efficient attention no extra dependencies required.
2. [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) support for extra performance boost when individual models are compiled.
## Installation
To benefit from the accelerated transformers implementation and `torch.compile`, we will need to install the nightly version of PyTorch, as the stable version is yet to be released. The first step is to install CUDA 11.7 or CUDA 11.8,
as PyTorch 2.0 does not support the previous versions. Once CUDA is installed, torch nightly can be installed using:
```bash
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu117
```
## Using accelerated transformers and torch.compile.
1. **Accelerated Transformers implementation**
PyTorch 2.0 includes an optimized and memory-efficient attention implementation through the [`torch.nn.functional.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention) function, which automatically enables several optimizations depending on the inputs and the GPU type. This is similar to the `memory_efficient_attention` from [xFormers](https://github.com/facebookresearch/xformers), but built natively into PyTorch.
These optimizations will be enabled by default in Diffusers if PyTorch 2.0 is installed and if `torch.nn.functional.scaled_dot_product_attention` is available. To use it, just install `torch 2.0` as suggested above and simply use the pipeline. For example:
```Python
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
If you want to enable it explicitly (which is not required), you can do so as shown below.
```Python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.models.cross_attention import AttnProcessor2_0
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipe.unet.set_attn_processor(AttnProcessor2_0())
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
This should be as fast and memory efficient as `xFormers`. More details [in our benchmark](#benchmark).
2. **torch.compile**
To get an additional speedup, we can use the new `torch.compile` feature. To do so, we simply wrap our `unet` with `torch.compile`. For more information and different options, refer to the
[torch compile docs](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html).
```python
import torch
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to(
"cuda"
)
pipe.unet = torch.compile(pipe.unet)
batch_size = 10
prompt = "A photo of an astronaut riding a horse on marse."
images = pipe(prompt, num_inference_steps=steps, num_images_per_prompt=batch_size).images
```
Depending on the type of GPU, `compile()` can yield between 2-9% of _additional speed-up_ over the accelerated transformer optimizations. Note, however, that compilation is able to squeeze more performance improvements in more recent GPU architectures such as Ampere (A100, 3090), Ada (4090) and Hopper (H100).
Compilation takes some time to complete, so it is best suited for situations where you need to prepare your pipeline once and then perform the same type of inference operations multiple times.
## Benchmark
We conducted a simple benchmark on different GPUs to compare vanilla attention, xFormers, `torch.nn.functional.scaled_dot_product_attention` and `torch.compile+torch.nn.functional.scaled_dot_product_attention`.
For the benchmark we used the the [stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) model with 50 steps. The `xFormers` benchmark is done using the `torch==1.13.1` version, while the accelerated transformers optimizations are tested using nightly versions of PyTorch 2.0. The tables below summarize the results we got.
The `Speed over xformers` columns denote the speed-up gained over `xFormers` using the `torch.compile+torch.nn.functional.scaled_dot_product_attention`.
### FP16 benchmark
The table below shows the benchmark results for inference using `fp16`. As we can see, `torch.nn.functional.scaled_dot_product_attention` is as fast as `xFormers` (sometimes slightly faster/slower) on all the GPUs we tested.
And using `torch.compile` gives further speed-up of up of 10% over `xFormers`, but it's mostly noticeable on the A100 GPU.
___The time reported is in seconds.___
| GPU | Batch Size | Vanilla Attention | xFormers | PyTorch2.0 SDPA | SDPA + torch.compile | Speed over xformers (%) |
| --- | --- | --- | --- | --- | --- | --- |
| A100 | 10 | 12.02 | 8.7 | 8.79 | 7.89 | 9.31 |
| A100 | 16 | 18.95 | 13.57 | 13.67 | 12.25 | 9.73 |
| A100 | 32 (1) | OOM | 26.56 | 26.68 | 24.08 | 9.34 |
| A100 | 64 | | 52.51 | 53.03 | 47.81 | 8.95 |
| | | | | | | |
| A10 | 4 | 13.94 | 9.81 | 10.01 | 9.35 | 4.69 |
| A10 | 8 | 27.09 | 19 | 19.53 | 18.33 | 3.53 |
| A10 | 10 | 33.69 | 23.53 | 24.19 | 22.52 | 4.29 |
| A10 | 16 | OOM | 37.55 | 38.31 | 36.81 | 1.97 |
| A10 | 32 (1) | | 77.19 | 78.43 | 76.64 | 0.71 |
| A10 | 64 (1) | | 173.59 | 158.99 | 155.14 | 10.63 |
| | | | | | | |
| T4 | 4 | 38.81 | 30.09 | 29.74 | 27.55 | 8.44 |
| T4 | 8 | OOM | 55.71 | 55.99 | 53.85 | 3.34 |
| T4 | 10 | OOM | 68.96 | 69.86 | 65.35 | 5.23 |
| T4 | 16 | OOM | 111.47 | 113.26 | 106.93 | 4.07 |
| | | | | | | |
| V100 | 4 | 9.84 | 8.16 | 8.09 | 7.65 | 6.25 |
| V100 | 8 | OOM | 15.62 | 15.44 | 14.59 | 6.59 |
| V100 | 10 | OOM | 19.52 | 19.28 | 18.18 | 6.86 |
| V100 | 16 | OOM | 30.29 | 29.84 | 28.22 | 6.83 |
| | | | | | | |
| 3090 | 4 | 10.04 | 7.82 | 7.89 | 7.47 | 4.48 |
| 3090 | 8 | 19.27 | 14.97 | 15.04 | 14.22 | 5.01 |
| 3090 | 10| 24.08 | 18.7 | 18.7 | 17.69 | 5.40 |
| 3090 | 16 | OOM | 29.06 | 29.06 | 28.2 | 2.96 |
| 3090 | 32 (1) | | 58.05 | 58 | 54.88 | 5.46 |
| 3090 | 64 (1) | | 126.54 | 126.03 | 117.33 | 7.28 |
| | | | | | | |
| 3090 Ti | 4 | 9.07 | 7.14 | 7.15 | 6.81 | 4.62 |
| 3090 Ti | 8 | 17.51 | 13.65 | 13.72 | 12.99 | 4.84 |
| 3090 Ti | 10 (2) | 21.79 | 16.85 | 16.93 | 16.02 | 4.93 |
| 3090 Ti | 16 | OOM | 26.1 | 26.28 | 25.46 | 2.45 |
| 3090 Ti | 32 (1) | | 51.78 | 52.04 | 49.15 | 5.08 |
| 3090 Ti | 64 (1) | | 112.02 | 112.33 | 103.91 | 7.24 |
| | | | | | | |
| 4090 | 4 | 10.48 | 8.37 | 8.32 | 8.01 | 4.30 |
| 4090 | 8 | 14.33 | 10.22 | 10.42 | 9.78 | 4.31 |
| 4090 | 16 | | 17.07 | 17.46 | 17.15 | -0.47 |
| 4090 | 32 (1) | | 39.03 | 39.86 | 37.97 | 2.72 |
| 4090 | 64 (1) | | 77.29 | 79.44 | 77.67 | -0.49 |
### FP32 benchmark
The table below shows the benchmark results for inference using `fp32`. In this case, `torch.nn.functional.scaled_dot_product_attention` is faster than `xFormers` on all the GPUs we tested.
Using `torch.compile` in addition to the accelerated transformers implementation can yield up to 19% performance improvement over `xFormers` in Ampere and Ada cards, and up to 20% (Ampere) or 28% (Ada) over vanilla attention.
| GPU | Batch Size | Vanilla Attention | xFormers | PyTorch2.0 SDPA | SDPA + torch.compile | Speed over xformers (%) | Speed over vanilla (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A100 | 4 | 16.56 | 12.42 | 12.2 | 11.84 | 4.67 | 28.50 |
| A100 | 10 | OOM | 29.93 | 29.44 | 28.5 | 4.78 | |
| A100 | 16 | | 47.08 | 46.27 | 44.8 | 4.84 | |
| A100 | 32 | | 92.89 | 91.34 | 88.35 | 4.89 | |
| A100 | 64 | | 185.3 | 182.71 | 176.48 | 4.76 | |
| | | | | | | |
| A10 | 1 | 10.59 | 8.81 | 7.51 | 7.35 | 16.57 | 30.59 |
| A10 | 4 | 34.77 | 27.63 | 22.77 | 22.07 | 20.12 | 36.53 |
| A10 | 8 | | 56.19 | 43.53 | 43.86 | 21.94 | |
| A10 | 16 | | 116.49 | 88.56 | 86.64 | 25.62 | |
| A10 | 32 | | 221.95 | 175.74 | 168.18 | 24.23 | |
| A10 | 48 | | 333.23 | 264.84 | | 20.52 | |
| | | | | | | |
| T4 | 1 | 28.2 | 24.49 | 23.93 | 23.56 | 3.80 | 16.45 |
| T4 | 2 | 52.77 | 45.7 | 45.88 | 45.06 | 1.40 | 14.61 |
| T4 | 4 | OOM | 85.72 | 85.78 | 84.48 | 1.45 | |
| T4 | 8 | | 149.64 | 150.75 | 148.4 | 0.83 | |
| | | | | | | |
| V100 | 1 | 7.4 | 6.84 | 6.8 | 6.66 | 2.63 | 10.00 |
| V100 | 2 | 13.85 | 12.81 | 12.66 | 12.35 | 3.59 | 10.83 |
| V100 | 4 | OOM | 25.73 | 25.31 | 24.78 | 3.69 | |
| V100 | 8 | | 43.95 | 43.37 | 42.25 | 3.87 | |
| V100 | 16 | | 84.99 | 84.73 | 82.55 | 2.87 | |
| | | | | | | |
| 3090 | 1 | 7.09 | 6.78 | 6.11 | 6.03 | 11.06 | 14.95 |
| 3090 | 4 | 22.69 | 21.45 | 18.67 | 18.09 | 15.66 | 20.27 |
| 3090 | 8 | | 42.59 | 36.75 | 35.59 | 16.44 | |
| 3090 | 16 | | 85.35 | 72.37 | 70.25 | 17.69 | |
| 3090 | 32 (1) | | 162.05 | 138.99 | 134.53 | 16.98 | |
| 3090 | 48 | | 241.91 | 207.75 | | 14.12 | |
| | | | | | | |
| 3090 Ti | 1 | 6.45 | 6.19 | 5.64 | 5.49 | 11.31 | 14.88 |
| 3090 Ti | 4 | 20.32 | 19.31 | 16.9 | 16.37 | 15.23 | 19.44 |
| 3090 Ti | 8 (2) | | 37.93 | 33.05 | 31.99 | 15.66 | |
| 3090 Ti | 16 | | 75.37 | 65.25 | 64.32 | 14.66 | |
| 3090 Ti | 32 (1) | | 142.55 | 124.44 | 120.74 | 15.30 | |
| 3090 Ti | 48 | | 213.19 | 186.55 | | 12.50 | |
| | | | | | | |
| 4090 | 1 | 5.54 | 4.99 | 4.51 | 4.44 | 11.02 | 19.86 |
| 4090 | 4 | 13.67 | 11.4 | 10.3 | 9.84 | 13.68 | 28.02 |
| 4090 | 8 | | 19.79 | 17.13 | 16.19 | 18.19 | |
| 4090 | 16 | | 38.62 | 33.14 | 32.31 | 16.34 | |
| 4090 | 32 (1) | | 76.57 | 65.96 | 62.05 | 18.96 | |
| 4090 | 48 | | 114.44 | 98.78 | | 13.68 | |
(1) Batch Size >= 32 requires enable_vae_slicing() because of https://github.com/pytorch/pytorch/issues/81665
This is required for PyTorch 1.13.1, and also for PyTorch 2.0 and batch size of 64
For more details about how this benchmark was run, please refer to [this PR](https://github.com/huggingface/diffusers/pull/2303).

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -14,13 +14,22 @@ specific language governing permissions and limitations under the License.
We recommend the use of [xFormers](https://github.com/facebookresearch/xformers) for both inference and training. In our tests, the optimizations performed in the attention blocks allow for both faster speed and reduced memory consumption.
Installing xFormers has historically been a bit involved, as binary distributions were not always up to date. Fortunately, the project has [very recently](https://github.com/facebookresearch/xformers/pull/591) integrated a process to build pip wheels as part of the project's continuous integration, so this should improve a lot starting from xFormers version 0.0.16.
Until xFormers 0.0.16 is deployed, you can install pip wheels using [`TestPyPI`](https://test.pypi.org/project/formers/). These are the steps that worked for us in a Linux computer to install xFormers version 0.0.15:
Starting from version `0.0.16` of xFormers, released on January 2023, installation can be easily performed using pre-built pip wheels:
```bash
pip install pyre-extensions==0.0.23
pip install -i https://test.pypi.org/simple/ formers==0.0.15.dev376
pip install xformers
```
We'll update these instructions when the wheels are published to the official PyPI repository.
<Tip>
The xFormers PIP package requires the latest version of PyTorch (1.13.1 as of xFormers 0.0.16). If you need to use a previous version of PyTorch, then we recommend you install xFormers from source using [the project instructions](https://github.com/facebookresearch/xformers#installing-xformers).
</Tip>
After xFormers is installed, you can use `enable_xformers_memory_efficient_attention()` for faster inference and reduced memory consumption, as discussed [here](fp16#memory-efficient-attention).
<Tip warning={true}>
According to [this issue](https://github.com/huggingface/diffusers/issues/2234#issuecomment-1416931212), xFormers `v0.0.16` cannot be used for training (fine-tune or Dreambooth) in some GPUs. If you observe that problem, please install a development version as indicated in that comment.
</Tip>

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -10,10 +10,25 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
[[open-in-colab]]
# Quicktour
Get up and running with 🧨 Diffusers quickly!
Whether you're a developer or an everyday user, this quick tour will help you get started and show you how to use [`DiffusionPipeline`] for inference.
Diffusion models are trained to denoise random Gaussian noise step-by-step to generate a sample of interest, such as an image or audio. This has sparked a tremendous amount of interest in generative AI, and you have probably seen examples of diffusion generated images on the internet. 🧨 Diffusers is a library aimed at making diffusion models widely accessible to everyone.
Whether you're a developer or an everyday user, this quicktour will introduce you to 🧨 Diffusers and help you get up and generating quickly! There are three main components of the library to know about:
* The [`DiffusionPipeline`] is a high-level end-to-end class designed to rapidly generate samples from pretrained diffusion models for inference.
* Popular pretrained [model](./api/models) architectures and modules that can be used as building blocks for creating diffusion systems.
* Many different [schedulers](./api/schedulers/overview) - algorithms that control how noise is added for training, and how to generate denoised images during inference.
The quicktour will show you how to use the [`DiffusionPipeline`] for inference, and then walk you through how to combine a model and scheduler to replicate what's happening inside the [`DiffusionPipeline`].
<Tip>
The quicktour is a simplified version of the introductory 🧨 Diffusers [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) to help you get started quickly. If you want to learn more about 🧨 Diffusers goal, design philosophy, and additional details about it's core API, check out the notebook!
</Tip>
Before you begin, make sure you have all the necessary libraries installed:
@@ -21,32 +36,32 @@ Before you begin, make sure you have all the necessary libraries installed:
pip install --upgrade diffusers accelerate transformers
```
- [`accelerate`](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training
- [`transformers`](https://huggingface.co/docs/transformers/index) is required to run the most popular diffusion models, such as [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview)
- [🤗 Accelerate](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training.
- [🤗 Transformers](https://huggingface.co/docs/transformers/index) is required to run the most popular diffusion models, such as [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview).
## DiffusionPipeline
The [`DiffusionPipeline`] is the easiest way to use a pre-trained diffusion system for inference. You can use the [`DiffusionPipeline`] out-of-the-box for many tasks across different modalities. Take a look at the table below for some supported tasks:
The [`DiffusionPipeline`] is the easiest way to use a pretrained diffusion system for inference. It is an end-to-end system containing the model and the scheduler. You can use the [`DiffusionPipeline`] out-of-the-box for many tasks. Take a look at the table below for some supported tasks, and for a complete list of supported tasks, check out the [🧨 Diffusers Summary](./api/pipelines/overview#diffusers-summary) table.
| **Task** | **Description** | **Pipeline**
|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|
| Unconditional Image Generation | generate an image from gaussian noise | [unconditional_image_generation](./using-diffusers/unconditional_image_generation`) |
| Unconditional Image Generation | generate an image from Gaussian noise | [unconditional_image_generation](./using-diffusers/unconditional_image_generation) |
| Text-Guided Image Generation | generate an image given a text prompt | [conditional_image_generation](./using-diffusers/conditional_image_generation) |
| Text-Guided Image-to-Image Translation | adapt an image guided by a text prompt | [img2img](./using-diffusers/img2img) |
| Text-Guided Image-Inpainting | fill the masked part of an image given the image, the mask and a text prompt | [inpaint](./using-diffusers/inpaint) |
| Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | [depth2image](./using-diffusers/depth2image) |
| Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | [depth2img](./using-diffusers/depth2img) |
For more in-detail information on how diffusion pipelines function for the different tasks, please have a look at the [**Using Diffusers**](./using-diffusers/overview) section.
Start by creating an instance of a [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download.
You can use the [`DiffusionPipeline`] for any [checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads) stored on the Hugging Face Hub.
In this quicktour, you'll load the [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) checkpoint for text-to-image generation.
As an example, start by creating an instance of [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download.
You can use the [`DiffusionPipeline`] for any [Diffusers' checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads).
In this guide though, you'll use [`DiffusionPipeline`] for text-to-image generation with [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion).
<Tip warning={true}>
For [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion), please carefully read its [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) before running the model.
This is due to the improved image generation capabilities of the model and the potentially harmful content that could be produced with it.
Please, head over to your stable diffusion model of choice, *e.g.* [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5), and read the license.
For [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) models, please carefully read the [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) first before running the model. 🧨 Diffusers implements a [`safety_checker`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) to prevent offensive or harmful content, but the model's improved image generation capabilities can still produce potentially harmful content.
You can load the model as follows:
</Tip>
Load the model with the [`~DiffusionPipeline.from_pretrained`] method:
```python
>>> from diffusers import DiffusionPipeline
@@ -54,77 +69,245 @@ You can load the model as follows:
>>> pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
```
The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components.
Because the model consists of roughly 1.4 billion parameters, we strongly recommend running it on GPU.
You can move the generator object to GPU, just like you would in PyTorch.
The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. You'll see that the Stable Diffusion pipeline is composed of the [`UNet2DConditionModel`] and [`PNDMScheduler`] among other things:
```py
>>> pipeline
StableDiffusionPipeline {
"_class_name": "StableDiffusionPipeline",
"_diffusers_version": "0.13.1",
...,
"scheduler": [
"diffusers",
"PNDMScheduler"
],
...,
"unet": [
"diffusers",
"UNet2DConditionModel"
],
"vae": [
"diffusers",
"AutoencoderKL"
]
}
```
We strongly recommend running the pipeline on a GPU because the model consists of roughly 1.4 billion parameters.
You can move the generator object to a GPU, just like you would in PyTorch:
```python
>>> pipeline.to("cuda")
```
Now you can use the `pipeline` on your text prompt:
Now you can pass a text prompt to the `pipeline` to generate an image, and then access the denoised image. By default, the image output is wrapped in a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object.
```python
>>> image = pipeline("An image of a squirrel in Picasso style").images[0]
>>> image
```
The output is by default wrapped into a [PIL Image object](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class).
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/image_of_squirrel_painting.png"/>
</div>
You can save the image by simply calling:
Save the image by calling `save`:
```python
>>> image.save("image_of_squirrel_painting.png")
```
**Note**: You can also use the pipeline locally by downloading the weights via:
### Local pipeline
You can also use the pipeline locally. The only difference is you need to download the weights first:
```
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
```
and then loading the saved weights into the pipeline.
Then load the saved weights into the pipeline:
```python
>>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
```
Running the pipeline is then identical to the code above as it's the same model architecture.
Now you can run the pipeline as you would in the section above.
```python
>>> generator.to("cuda")
>>> image = generator("An image of a squirrel in Picasso style").images[0]
>>> image.save("image_of_squirrel_painting.png")
```
### Swapping schedulers
Diffusion systems can be used with multiple different [schedulers](./api/schedulers/overview) each with their
pros and cons. By default, Stable Diffusion runs with [`PNDMScheduler`], but it's very simple to
use a different scheduler. *E.g.* if you would instead like to use the [`EulerDiscreteScheduler`] scheduler,
you could use it as follows:
Different schedulers come with different denoising speeds and quality trade-offs. The best way to find out which one works best for you is to try them out! One of the main features of 🧨 Diffusers is to allow you to easily switch between schedulers. For example, to replace the default [`PNDMScheduler`] with the [`EulerDiscreteScheduler`], load it with the [`~diffusers.ConfigMixin.from_config`] method:
```python
```py
>>> from diffusers import EulerDiscreteScheduler
>>> pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
>>> # change scheduler to Euler
>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
```
For more in-detail information on how to change between schedulers, please refer to the [Using Schedulers](./using-diffusers/schedulers) guide.
Try generating an image with the new scheduler and see if you notice a difference!
[Stability AI's](https://stability.ai/) Stable Diffusion model is an impressive image generation model
and can do much more than just generating images from text. We have dedicated a whole documentation page,
just for Stable Diffusion [here](./conceptual/stable_diffusion).
In the next section, you'll take a closer look at the components - the model and scheduler - that make up the [`DiffusionPipeline`] and learn how to use these components to generate an image of a cat.
If you want to know how to optimize Stable Diffusion to run on less memory, higher inference speeds, on specific hardware, such as Mac, or with [ONNX Runtime](https://onnxruntime.ai/), please have a look at our
optimization pages:
## Models
- [Optimized PyTorch on GPU](./optimization/fp16)
- [Mac OS with PyTorch](./optimization/mps)
- [ONNX](./optimization/onnx)
- [OpenVINO](./optimization/open_vino)
Most models take a noisy sample, and at each timestep it predicts the *noise residual* (other models learn to predict the previous sample directly or the velocity or [`v-prediction`](https://github.com/huggingface/diffusers/blob/5e5ce13e2f89ac45a0066cb3f369462a3cf1d9ef/src/diffusers/schedulers/scheduling_ddim.py#L110)), the difference between a less noisy image and the input image. You can mix and match models to create other diffusion systems.
If you want to fine-tune or train your diffusion model, please have a look at the [**training section**](./training/overview)
Models are initiated with the [`~ModelMixin.from_pretrained`] method which also locally caches the model weights so it is faster the next time you load the model. For the quicktour, you'll load the [`UNet2DModel`], a basic unconditional image generation model with a checkpoint trained on cat images:
Finally, please be considerate when distributing generated images publicly 🤗.
```py
>>> from diffusers import UNet2DModel
>>> repo_id = "google/ddpm-cat-256"
>>> model = UNet2DModel.from_pretrained(repo_id)
```
To access the model parameters, call `model.config`:
```py
>>> model.config
```
The model configuration is a 🧊 frozen 🧊 dictionary, which means those parameters can't be changed after the model is created. This is intentional and ensures that the parameters used to define the model architecture at the start remain the same, while other parameters can still be adjusted during inference.
Some of the most important parameters are:
* `sample_size`: the height and width dimension of the input sample.
* `in_channels`: the number of input channels of the input sample.
* `down_block_types` and `up_block_types`: the type of down- and upsampling blocks used to create the UNet architecture.
* `block_out_channels`: the number of output channels of the downsampling blocks; also used in reverse order for the number of input channels of the upsampling blocks.
* `layers_per_block`: the number of ResNet blocks present in each UNet block.
To use the model for inference, create the image shape with random Gaussian noise. It should have a `batch` axis because the model can receive multiple random noises, a `channel` axis corresponding to the number of input channels, and a `sample_size` axis for the height and width of the image:
```py
>>> import torch
>>> torch.manual_seed(0)
>>> noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
>>> noisy_sample.shape
torch.Size([1, 3, 256, 256])
```
For inference, pass the noisy image to the model and a `timestep`. The `timestep` indicates how noisy the input image is, with more noise at the beginning and less at the end. This helps the model determine its position in the diffusion process, whether it is closer to the start or the end. Use the `sample` method to get the model output:
```py
>>> with torch.no_grad():
... noisy_residual = model(sample=noisy_sample, timestep=2).sample
```
To generate actual examples though, you'll need a scheduler to guide the denoising process. In the next section, you'll learn how to couple a model with a scheduler.
## Schedulers
Schedulers manage going from a noisy sample to a less noisy sample given the model output - in this case, it is the `noisy_residual`.
<Tip>
🧨 Diffusers is a toolbox for building diffusion systems. While the [`DiffusionPipeline`] is a convenient way to get started with a pre-built diffusion system, you can also choose your own the model and scheduler components separately to build a custom diffusion system.
</Tip>
For the quicktour, you'll instantiate the [`DDPMScheduler`] with it's [`~diffusers.ConfigMixin.from_config`] method:
```py
>>> from diffusers import DDPMScheduler
>>> scheduler = DDPMScheduler.from_config(repo_id)
>>> scheduler
DDPMScheduler {
"_class_name": "DDPMScheduler",
"_diffusers_version": "0.13.1",
"beta_end": 0.02,
"beta_schedule": "linear",
"beta_start": 0.0001,
"clip_sample": true,
"clip_sample_range": 1.0,
"num_train_timesteps": 1000,
"prediction_type": "epsilon",
"trained_betas": null,
"variance_type": "fixed_small"
}
```
<Tip>
💡 Notice how the scheduler is instantiated from a configuration. Unlike a model, a scheduler does not have trainable weights and is parameter-free!
</Tip>
Some of the most important parameters are:
* `num_train_timesteps`: the length of the denoising process or in other words, the number of timesteps required to process random Gaussian noise into a data sample.
* `beta_schedule`: the type of noise schedule to use for inference and training.
* `beta_start` and `beta_end`: the start and end noise values for the noise schedule.
To predict a slightly less noisy image, pass the following to the scheduler's [`~diffusers.DDPMScheduler.step`] method: model output, `timestep`, and current `sample`.
```py
>>> less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample
>>> less_noisy_sample.shape
```
The `less_noisy_sample` can be passed to the next `timestep` where it'll get even less noisier! Let's bring it all together now and visualize the entire denoising process.
First, create a function that postprocesses and displays the denoised image as a `PIL.Image`:
```py
>>> import PIL.Image
>>> import numpy as np
>>> def display_sample(sample, i):
... image_processed = sample.cpu().permute(0, 2, 3, 1)
... image_processed = (image_processed + 1.0) * 127.5
... image_processed = image_processed.numpy().astype(np.uint8)
... image_pil = PIL.Image.fromarray(image_processed[0])
... display(f"Image at step {i}")
... display(image_pil)
```
To speed up the denoising process, move the input and model to a GPU:
```py
>>> model.to("cuda")
>>> noisy_sample = noisy_sample.to("cuda")
```
Now create a denoising loop that predicts the residual of the less noisy sample, and computes the less noisy sample with the scheduler:
```py
>>> import tqdm
>>> sample = noisy_sample
>>> for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
... # 1. predict noise residual
... with torch.no_grad():
... residual = model(sample, t).sample
... # 2. compute less noisy image and set x_t -> x_t-1
... sample = scheduler.step(residual, t, sample).prev_sample
... # 3. optionally look at image
... if (i + 1) % 50 == 0:
... display_sample(sample, i + 1)
```
Sit back and watch as a cat is generated from nothing but noise! 😻
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/diffusion-quicktour.png"/>
</div>
## Next steps
Hopefully you generated some cool images with 🧨 Diffusers in this quicktour! For your next steps, you can:
* Train or finetune a model to generate your own images in the [training](./tutorials/basic_training) tutorial.
* See example official and community [training or finetuning scripts](https://github.com/huggingface/diffusers/tree/main/examples#-diffusers-examples) for a variety of use cases.
* Learn more about loading, accessing, changing and comparing schedulers in the [Using different Schedulers](./using-diffusers/schedulers) guide.
* Explore prompt engineering, speed and memory optimizations, and tips and tricks for generating higher quality images with the [Stable Diffusion](./stable_diffusion) guide.
* Dive deeper into speeding up 🧨 Diffusers with guides on [optimized PyTorch on a GPU](./optimization/fp16), and inference guides for running [Stable Diffusion on Apple Silicon (M1/M2)](./optimization/mps) and [ONNX Runtime](./optimization/onnx).

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -10,55 +10,67 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# DreamBooth fine-tuning example
# DreamBooth
[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text-to-image models like stable diffusion given just a few (3~5) images of a subject.
[[open-in-colab]]
[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text-to-image models like Stable Diffusion given just a few (3-5) images of a subject. It allows the model to generate contextualized images of the subject in different scenes, poses, and views.
![Dreambooth examples from the project's blog](https://dreambooth.github.io/DreamBooth_files/teaser_static.jpg)
_Dreambooth examples from the [project's blog](https://dreambooth.github.io)._
<small>Dreambooth examples from the <a href="https://dreambooth.github.io">project's blog.</a></small>
The [Dreambooth training script](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth) shows how to implement this training procedure on a pre-trained Stable Diffusion model.
This guide will show you how to finetune DreamBooth with the [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) model for various GPU sizes, and with Flax. All the training scripts for DreamBooth used in this guide can be found [here](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth) if you're interested in digging deeper and seeing how things work.
<Tip warning={true}>
Dreambooth fine-tuning is very sensitive to hyperparameters and easy to overfit. We recommend you take a look at our [in-depth analysis](https://huggingface.co/blog/dreambooth) with recommended settings for different subjects, and go from there.
</Tip>
## Training locally
### Installing the dependencies
Before running the scripts, make sure to install the library's training dependencies. We also recommend to install `diffusers` from the `main` github branch.
Before running the scripts, make sure you install the library's training dependencies. We also recommend installing 🧨 Diffusers from the `main` GitHub branch:
```bash
pip install git+https://github.com/huggingface/diffusers
pip install -U -r diffusers/examples/dreambooth/requirements.txt
```
xFormers is not part of the training requirements, but [we recommend you install it if you can](../optimization/xformers). It could make your training faster and less memory intensive.
xFormers is not part of the training requirements, but we recommend you [install](../optimization/xformers) it if you can because it could make your training faster and less memory intensive.
After all dependencies have been set up you can configure a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
After all the dependencies have been set up, initialize a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
```bash
accelerate config
```
In this example we'll use model version `v1-4`, so please visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4) and carefully read the license before proceeding.
To setup a default 🤗 Accelerate environment without choosing any configurations:
The command below will download and cache the model weights from the Hub because we use the model's Hub id `CompVis/stable-diffusion-v1-4`. You may also clone the repo locally and use the local path in your system where the checkout was saved.
```bash
accelerate config default
```
### Dog toy example
Or if your environment doesn't support an interactive shell like a notebook, you can use:
In this example we'll use [these images](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ) to add a new concept to Stable Diffusion using the Dreambooth process. They will be our training data. Please, download them and place them somewhere in your system.
```py
from accelerate.utils import write_basic_config
Then you can launch the training script using:
write_basic_config()
```
## Finetuning
<Tip warning={true}>
DreamBooth finetuning is very sensitive to hyperparameters and easy to overfit. We recommend you take a look at our [in-depth analysis](https://huggingface.co/blog/dreambooth) with recommended settings for different subjects to help you choose the appropriate hyperparameters.
</Tip>
<frameworkcontent>
<pt>
Let's try DreamBooth with a [few images of a dog](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ); download and save them to a directory and then set the `INSTANCE_DIR` environment variable to that path:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export OUTPUT_DIR="path_to_saved_model"
```
Then you can launch the training script (you can find the full training script [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py)) with the following command:
```bash
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
@@ -72,13 +84,44 @@ accelerate launch train_dreambooth.py \
--lr_warmup_steps=0 \
--max_train_steps=400
```
</pt>
<jax>
If you have access to TPUs or want to train even faster, you can try out the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_flax.py). The Flax training script doesn't support gradient checkpointing or gradient accumulation, so you'll need a GPU with at least 30GB of memory.
### Training with a prior-preserving loss
Before running the script, make sure you have the requirements installed:
Prior preservation is used to avoid overfitting and language-drift. Please, refer to the paper to learn more about it if you are interested. For prior preservation, we use other images of the same class as part of the training process. The nice thing is that we can generate those images using the Stable Diffusion model itself! The training script will save the generated images to a local path we specify.
```bash
pip install -U -r requirements.txt
```
According to the paper, it's recommended to generate `num_epochs * num_samples` images for prior preservation. 200-300 works well for most cases.
Now you can launch the training script with the following command:
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"
python train_dreambooth_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=5e-6 \
--max_train_steps=400
```
</jax>
</frameworkcontent>
## Finetuning with prior-preserving loss
Prior preservation is used to avoid overfitting and language-drift (check out the [paper](https://arxiv.org/abs/2208.12242) to learn more if you're interested). For prior preservation, you use other images of the same class as part of the training process. The nice thing is that you can generate those images using the Stable Diffusion model itself! The training script will save the generated images to a local path you specify.
The author's recommend generating `num_epochs * num_samples` images for prior preservation. In most cases, 200-300 images work well.
<frameworkcontent>
<pt>
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
@@ -102,32 +145,148 @@ accelerate launch train_dreambooth.py \
--num_class_images=200 \
--max_train_steps=800
```
</pt>
<jax>
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
### Saving checkpoints while training
python train_dreambooth_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=5e-6 \
--num_class_images=200 \
--max_train_steps=800
```
</jax>
</frameworkcontent>
It's easy to overfit while training with Dreambooth, so sometimes it's useful to save regular checkpoints during the process. One of the intermediate checkpoints might work better than the final model! To use this feature you need to pass the following argument to the training script:
## Finetuning the text encoder and UNet
The script also allows you to finetune the `text_encoder` along with the `unet`. In our experiments (check out the [Training Stable Diffusion with DreamBooth using 🧨 Diffusers](https://huggingface.co/blog/dreambooth) post for more details), this yields much better results, especially when generating images of faces.
<Tip warning={true}>
Training the text encoder requires additional memory and it won't fit on a 16GB GPU. You'll need at least 24GB VRAM to use this option.
</Tip>
Pass the `--train_text_encoder` argument to the training script to enable finetuning the `text_encoder` and `unet`:
<frameworkcontent>
<pt>
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_text_encoder \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--use_8bit_adam
--gradient_checkpointing \
--learning_rate=2e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800
```
</pt>
<jax>
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
python train_dreambooth_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_text_encoder \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=2e-6 \
--num_class_images=200 \
--max_train_steps=800
```
</jax>
</frameworkcontent>
## Finetuning with LoRA
You can also use Low-Rank Adaptation of Large Language Models (LoRA), a fine-tuning technique for accelerating training large models, on DreamBooth. For more details, take a look at the [LoRA training](training/lora#dreambooth) guide.
## Saving checkpoints while training
It's easy to overfit while training with Dreambooth, so sometimes it's useful to save regular checkpoints during the training process. One of the intermediate checkpoints might actually work better than the final model! Pass the following argument to the training script to enable saving checkpoints:
```bash
--checkpointing_steps=500
```
This will save the full training state in subfolders of your `output_dir`. Subfolder names begin with the prefix `checkpoint-`, and then the number of steps performed so far; for example: `checkpoint-1500` would be a checkpoint saved after 1500 training steps.
This saves the full training state in subfolders of your `output_dir`. Subfolder names begin with the prefix `checkpoint-`, followed by the number of steps performed so far; for example, `checkpoint-1500` would be a checkpoint saved after 1500 training steps.
#### Resuming training from a saved checkpoint
### Resume training from a saved checkpoint
If you want to resume training from any of the saved checkpoints, you can pass the argument `--resume_from_checkpoint` and then indicate the name of the checkpoint you want to use. You can also use the special string `"latest"` to resume from the last checkpoint saved (i.e., the one with the largest number of steps). For example, the following would resume training from the checkpoint saved after 1500 steps:
If you want to resume training from any of the saved checkpoints, you can pass the argument `--resume_from_checkpoint` to the script and specify the name of the checkpoint you want to use. You can also use the special string `"latest"` to resume from the last saved checkpoint (the one with the largest number of steps). For example, the following would resume training from the checkpoint saved after 1500 steps:
```bash
--resume_from_checkpoint="checkpoint-1500"
```
This would be a good opportunity to tweak some of your hyperparameters if you wish.
This is a good opportunity to tweak some of your hyperparameters if you wish.
#### Performing inference using a saved checkpoint
### Inference from a saved checkpoint
Saved checkpoints are stored in a format suitable for resuming training. They not only include the model weights, but also the state of the optimizer, data loaders and learning rate.
Saved checkpoints are stored in a format suitable for resuming training. They not only include the model weights, but also the state of the optimizer, data loaders, and learning rate.
You can use a checkpoint for inference, but first you need to convert it to an inference pipeline. This is how you could do it:
If you have **`"accelerate>=0.16.0"`** installed, use the following code to run
inference from an intermediate checkpoint.
```python
from diffusers import DiffusionPipeline, UNet2DConditionModel
from transformers import CLIPTextModel
import torch
# Load the pipeline with the same arguments (model, revision) that were used for training
model_id = "CompVis/stable-diffusion-v1-4"
unet = UNet2DConditionModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/unet")
# if you have trained with `--args.train_text_encoder` make sure to also load the text encoder
text_encoder = CLIPTextModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/text_encoder")
pipeline = DiffusionPipeline.from_pretrained(model_id, unet=unet, text_encoder=text_encoder, dtype=torch.float16)
pipeline.to("cuda")
# Perform inference, or save, or push to the hub
pipeline.save_pretrained("dreambooth-pipeline")
```
If you have **`"accelerate<0.16.0"`** installed, you need to convert it to an inference pipeline first:
```python
from accelerate import Accelerator
@@ -156,15 +315,37 @@ pipeline = DiffusionPipeline.from_pretrained(
pipeline.save_pretrained("dreambooth-pipeline")
```
### Training on a 16GB GPU
## Optimizations for different GPU sizes
With the help of gradient checkpointing and the 8-bit optimizer from [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), it's possible to train dreambooth on a 16GB GPU.
Depending on your hardware, there are a few different ways to optimize DreamBooth on GPUs from 16GB to just 8GB!
### xFormers
[xFormers](https://github.com/facebookresearch/xformers) is a toolbox for optimizing Transformers, and it include a [memory-efficient attention](https://facebookresearch.github.io/xformers/components/ops.html#module-xformers.ops) mechanism that is used in 🧨 Diffusers. You'll need to [install xFormers](./optimization/xformers) and then add the following argument to your training script:
```bash
--enable_xformers_memory_efficient_attention
```
xFormers is not available in Flax.
### Set gradients to none
Another way you can lower your memory footprint is to [set the gradients](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) to `None` instead of zero. However, this may change certain behaviors, so if you run into any issues, try removing this argument. Add the following argument to your training script to set the gradients to `None`:
```bash
--set_grads_to_none
```
### 16GB GPU
With the help of gradient checkpointing and [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) 8-bit optimizer, it's possible to train DreamBooth on a 16GB GPU. Make sure you have bitsandbytes installed:
```bash
pip install bitsandbytes
```
Then pass the `--use_8bit_adam` option to the training script.
Then pass the `--use_8bit_adam` option to the training script:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
@@ -191,25 +372,18 @@ accelerate launch train_dreambooth.py \
--max_train_steps=800
```
### Fine-tune the text encoder in addition to the UNet
### 12GB GPU
The script also allows to fine-tune the `text_encoder` along with the `unet`. It has been observed experimentally that this gives much better results, especially on faces. Please, refer to [our blog](https://huggingface.co/blog/dreambooth) for more details.
To enable this option, pass the `--train_text_encoder` argument to the training script.
<Tip>
Training the text encoder requires additional memory, so training won't fit on a 16GB GPU. You'll need at least 24GB VRAM to use this option.
</Tip>
To run DreamBooth on a 12GB GPU, you'll need to enable gradient checkpointing, the 8-bit optimizer, xFormers, and set the gradients to `None`:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_text_encoder \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
@@ -218,8 +392,10 @@ accelerate launch train_dreambooth.py \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--use_8bit_adam
--gradient_checkpointing \
--gradient_accumulation_steps=1 --gradient_checkpointing \
--use_8bit_adam \
--enable_xformers_memory_efficient_attention \
--set_grads_to_none \
--learning_rate=2e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
@@ -227,19 +403,25 @@ accelerate launch train_dreambooth.py \
--max_train_steps=800
```
### Training on a 8 GB GPU:
### 8 GB GPU
Using [DeepSpeed](https://www.deepspeed.ai/) it's even possible to offload some
tensors from VRAM to either CPU or NVME, allowing training to proceed with less GPU memory.
For 8GB GPUs, you'll need the help of [DeepSpeed](https://www.deepspeed.ai/) to offload some
tensors from the VRAM to either the CPU or NVME, enabling training with less GPU memory.
DeepSpeed needs to be enabled with `accelerate config`. During configuration,
answer yes to "Do you want to use DeepSpeed?". Combining DeepSpeed stage 2, fp16
mixed precision, and offloading both the model parameters and the optimizer state to CPU, it's
possible to train on under 8 GB VRAM. The drawback is that this requires more system RAM (about 25 GB). See [the DeepSpeed documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more configuration options.
Run the following command to configure your 🤗 Accelerate environment:
Changing the default Adam optimizer to DeepSpeed's special version of Adam
`deepspeed.ops.adam.DeepSpeedCPUAdam` gives a substantial speedup, but enabling
it requires the system's CUDA toolchain version to be the same as the one installed with PyTorch. 8-bit optimizers don't seem to be compatible with DeepSpeed at the moment.
```bash
accelerate config
```
During configuration, confirm that you want to use DeepSpeed. Now it's possible to train on under 8GB VRAM by combining DeepSpeed stage 2, fp16 mixed precision, and offloading the model parameters and the optimizer state to the CPU. The drawback is that this requires more system RAM, about 25 GB. See [the DeepSpeed documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more configuration options.
You should also change the default Adam optimizer to DeepSpeed's optimized version of Adam
[`deepspeed.ops.adam.DeepSpeedCPUAdam`](https://deepspeed.readthedocs.io/en/latest/optimizers.html#adam-cpu) for a substantial speedup. Enabling `DeepSpeedCPUAdam` requires your system's CUDA toolchain version to be the same as the one installed with PyTorch.
8-bit optimizers don't seem to be compatible with DeepSpeed at the moment.
Launch training with the following command:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
@@ -269,7 +451,10 @@ accelerate launch train_dreambooth.py \
## Inference
Once you have trained a model, inference can be done using the `StableDiffusionPipeline`, by simply indicating the path where the model was saved. Make sure that your prompts include the special `identifier` used during training (`sks` in the previous examples).
Once you have trained a model, specify the path to where the model is saved, and use it for inference in the [`StableDiffusionPipeline`]. Make sure your prompts include the special `identifier` used during training (`sks` in the previous examples).
If you have **`"accelerate>=0.16.0"`** installed, you can use the following code to run
inference from an intermediate checkpoint:
```python
from diffusers import StableDiffusionPipeline
@@ -284,4 +469,4 @@ image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("dog-bucket.png")
```
You may also run inference from [any of the saved training checkpoints](#performing-inference-using-a-saved-checkpoint).
You may also run inference from any of the [saved training checkpoints](#inference-from-a-saved-checkpoint).

View File

@@ -10,54 +10,151 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# LoRA Support in Diffusers
# Low-Rank Adaptation of Large Language Models (LoRA)
Diffusers supports LoRA for faster fine-tuning of Stable Diffusion, allowing greater memory efficiency and easier portability.
[[open-in-colab]]
Low-Rank Adaption of Large Language Models was first introduced by Microsoft in
[LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) by *Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen*.
<Tip warning={true}>
In a nutshell, LoRA allows adapting pretrained models by adding pairs of rank-decomposition weight matrices (called **update matrices**)
to existing weights and **only** training those newly added weights. This has a couple of advantages:
- Previous pretrained weights are kept frozen so that the model is not so prone to [catastrophic forgetting](https://www.pnas.org/doi/10.1073/pnas.1611835114).
- Rank-decomposition matrices have significantly fewer parameters than the original model, which means that trained LoRA weights are easily portable.
- LoRA matrices are generally added to the attention layers of the original model and they control to which extent the model is adapted toward new training images via a `scale` parameter.
**__Note that the usage of LoRA is not just limited to attention layers. In the original LoRA work, the authors found out that just amending
the attention layers of a language model is sufficient to obtain good downstream performance with great efficiency. This is why, it's common
to just add the LoRA weights to the attention layers of a model.__**
[cloneofsimo](https://github.com/cloneofsimo) was the first to try out LoRA training for Stable Diffusion in the popular [lora](https://github.com/cloneofsimo/lora) GitHub repository.
<Tip>
LoRA allows us to achieve greater memory efficiency since the pretrained weights are kept frozen and only the LoRA weights are trained, thereby
allowing us to run fine-tuning on consumer GPUs like Tesla T4, RTX 3080 or even RTX 2080 Ti! One can get access to GPUs like T4 in the free
tiers of Kaggle Kernels and Google Colab Notebooks.
Currently, LoRA is only supported for the attention layers of the [`UNet2DConditionalModel`].
</Tip>
## Getting started with LoRA for fine-tuning
[Low-Rank Adaptation of Large Language Models (LoRA)](https://arxiv.org/abs/2106.09685) is a training method that accelerates the training of large models while consuming less memory. It adds pairs of rank-decomposition weight matrices (called **update matrices**) to existing weights, and **only** trains those newly added weights. This has a couple of advantages:
Stable Diffusion can be fine-tuned in different ways:
- Previous pretrained weights are kept frozen so the model is not as prone to [catastrophic forgetting](https://www.pnas.org/doi/10.1073/pnas.1611835114).
- Rank-decomposition matrices have significantly fewer parameters than the original model, which means that trained LoRA weights are easily portable.
- LoRA matrices are generally added to the attention layers of the original model. 🧨 Diffusers provides the [`~diffusers.loaders.UNet2DConditionLoadersMixin.load_attn_procs`] method to load the LoRA weights into a model's attention layers. You can control the extent to which the model is adapted toward new training images via a `scale` parameter.
- The greater memory-efficiency allows you to run fine-tuning on consumer GPUs like the Tesla T4, RTX 3080 or even the RTX 2080 Ti! GPUs like the T4 are free and readily accessible in Kaggle or Google Colab notebooks.
* [Textual inversion](https://huggingface.co/docs/diffusers/main/en/training/text_inversion)
* [DreamBooth](https://huggingface.co/docs/diffusers/main/en/training/dreambooth)
* [Text2Image fine-tuning](https://huggingface.co/docs/diffusers/main/en/training/text2image)
<Tip>
We provide two end-to-end examples that show how to run fine-tuning with LoRA:
💡 LoRA is not only limited to attention layers. The authors found that amending
the attention layers of a language model is sufficient to obtain good downstream performance with great efficiency. This is why it's common to just add the LoRA weights to the attention layers of a model. Check out the [Using LoRA for efficient Stable Diffusion fine-tuning](https://huggingface.co/blog/lora) blog for more information about how LoRA works!
* [DreamBooth](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#training-with-low-rank-adaptation-of-large-language-models-lora)
* [Text2Image](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-lora)
</Tip>
If you want to perform DreamBooth training with LoRA, for instance, you would run:
[cloneofsimo](https://github.com/cloneofsimo) was the first to try out LoRA training for Stable Diffusion in the popular [lora](https://github.com/cloneofsimo/lora) GitHub repository. 🧨 Diffusers now supports finetuning with LoRA for [text-to-image generation](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-lora) and [DreamBooth](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#training-with-low-rank-adaptation-of-large-language-models-lora). This guide will show you how to do both.
If you'd like to store or share your model with the community, login to your Hugging Face account (create [one](hf.co/join) if you don't have one already):
```bash
huggingface-cli login
```
## Text-to-image
Finetuning a model like Stable Diffusion, which has billions of parameters, can be slow and difficult. With LoRA, it is much easier and faster to finetune a diffusion model. It can run on hardware with as little as 11GB of GPU RAM without resorting to tricks such as 8-bit optimizers.
### Training[[text-to-image-training]]
Let's finetune [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset to generate your own Pokémon.
To start, make sure you have the `MODEL_NAME` and `DATASET_NAME` environment variables set. The `OUTPUT_DIR` and `HUB_MODEL_ID` variables are optional and specify where to save the model to on the Hub:
```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="/sddata/finetune/lora/pokemon"
export HUB_MODEL_ID="pokemon-lora"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"
```
There are some flags to be aware of before you start training:
* `--push_to_hub` stores the trained LoRA embeddings on the Hub.
* `--report_to=wandb` reports and logs the training results to your Weights & Biases dashboard (as an example, take a look at this [report](https://wandb.ai/pcuenq/text2image-fine-tune/runs/b4k1w0tn?workspace=user-pcuenq)).
* `--learning_rate=1e-04`, you can afford to use a higher learning rate than you normally would with LoRA.
Now you're ready to launch the training (you can find the full training script [here](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py)):
```bash
accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME \
--dataloader_num_workers=8 \
--resolution=512 --center_crop --random_flip \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--max_train_steps=15000 \
--learning_rate=1e-04 \
--max_grad_norm=1 \
--lr_scheduler="cosine" --lr_warmup_steps=0 \
--output_dir=${OUTPUT_DIR} \
--push_to_hub \
--hub_model_id=${HUB_MODEL_ID} \
--report_to=wandb \
--checkpointing_steps=500 \
--validation_prompt="A pokemon with blue eyes." \
--seed=1337
```
### Inference[[text-to-image-inference]]
Now you can use the model for inference by loading the base model in the [`StableDiffusionPipeline`] and then the [`DPMSolverMultistepScheduler`]:
```py
>>> import torch
>>> from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
>>> model_base = "runwayml/stable-diffusion-v1-5"
>>> pipe = StableDiffusionPipeline.from_pretrained(model_base, torch_dtype=torch.float16)
>>> pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
```
Load the LoRA weights from your finetuned model *on top of the base model weights*, and then move the pipeline to a GPU for faster inference. When you merge the LoRA weights with the frozen pretrained model weights, you can optionally adjust how much of the weights to merge with the `scale` parameter:
<Tip>
💡 A `scale` value of `0` is the same as not using your LoRA weights and you're only using the base model weights, and a `scale` value of `1` means you're only using the fully finetuned LoRA weights. Values between `0` and `1` interpolates between the two weights.
</Tip>
```py
>>> pipe.unet.load_attn_procs(model_path)
>>> pipe.to("cuda")
# use half the weights from the LoRA finetuned model and half the weights from the base model
>>> image = pipe(
... "A pokemon with blue eyes.", num_inference_steps=25, guidance_scale=7.5, cross_attention_kwargs={"scale": 0.5}
... ).images[0]
# use the weights from the fully finetuned LoRA model
>>> image = pipe("A pokemon with blue eyes.", num_inference_steps=25, guidance_scale=7.5).images[0]
>>> image.save("blue_pokemon.png")
```
## DreamBooth
[DreamBooth](https://arxiv.org/abs/2208.12242) is a finetuning technique for personalizing a text-to-image model like Stable Diffusion to generate photorealistic images of a subject in different contexts, given a few images of the subject. However, DreamBooth is very sensitive to hyperparameters and it is easy to overfit. Some important hyperparameters to consider include those that affect the training time (learning rate, number of training steps), and inference time (number of steps, scheduler type).
<Tip>
💡 Take a look at the [Training Stable Diffusion with DreamBooth using 🧨 Diffusers](https://huggingface.co/blog/dreambooth) blog for an in-depth analysis of DreamBooth experiments and recommended settings.
</Tip>
### Training[[dreambooth-training]]
Let's finetune [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) with DreamBooth and LoRA with some 🐶 [dog images](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ). Download and save these images to a directory.
To start, make sure you have the `MODEL_NAME` and `INSTANCE_DIR` (path to directory containing images) environment variables set. The `OUTPUT_DIR` variables is optional and specifies where to save the model to on the Hub:
```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"
```
There are some flags to be aware of before you start training:
* `--push_to_hub` stores the trained LoRA embeddings on the Hub.
* `--report_to=wandb` reports and logs the training results to your Weights & Biases dashboard (as an example, take a look at this [report](https://wandb.ai/pcuenq/text2image-fine-tune/runs/b4k1w0tn?workspace=user-pcuenq)).
* `--learning_rate=1e-04`, you can afford to use a higher learning rate than you normally would with LoRA.
Now you're ready to launch the training (you can find the full training script [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py)):
```bash
accelerate launch train_dreambooth_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
@@ -78,78 +175,40 @@ accelerate launch train_dreambooth_lora.py \
--push_to_hub
```
A similar process can be followed to fully fine-tune Stable Diffusion on a custom dataset using the
`examples/text_to_image/train_text_to_image_lora.py` script.
### Inference[[dreambooth-inference]]
Refer to the respective examples linked above to learn more.
Now you can use the model for inference by loading the base model in the [`StableDiffusionPipeline`]:
```py
>>> import torch
>>> from diffusers import StableDiffusionPipeline
>>> model_base = "runwayml/stable-diffusion-v1-5"
>>> pipe = StableDiffusionPipeline.from_pretrained(model_base, torch_dtype=torch.float16)
```
Load the LoRA weights from your finetuned DreamBooth model *on top of the base model weights*, and then move the pipeline to a GPU for faster inference. When you merge the LoRA weights with the frozen pretrained model weights, you can optionally adjust how much of the weights to merge with the `scale` parameter:
<Tip>
When using LoRA we can use a much higher learning rate (typically 1e-4 as opposed to ~1e-6) compared to non-LoRA Dreambooth fine-tuning.
💡 A `scale` value of `0` is the same as not using your LoRA weights and you're only using the base model weights, and a `scale` value of `1` means you're only using the fully finetuned LoRA weights. Values between `0` and `1` interpolates between the two weights.
</Tip>
But there is no free lunch. For the given dataset and expected generation quality, you'd still need to experiment with
different hyperparameters. Here are some important ones:
* Training time
* Learning rate
* Number of training steps
* Inference time
* Number of steps
* Scheduler type
Additionally, you can follow [this blog](https://huggingface.co/blog/dreambooth) that documents some of our experimental
findings for performing DreamBooth training of Stable Diffusion.
When fine-tuning, the LoRA update matrices are only added to the attention layers. To enable this, we added new weight
loading functionalities. Their details are available [here](https://huggingface.co/docs/diffusers/main/en/api/loaders).
## Inference
Assuming you used the `examples/text_to_image/train_text_to_image_lora.py` to fine-tune Stable Diffusion on the [Pokemon
dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions), you can perform inference like so:
```py
from diffusers import StableDiffusionPipeline
import torch
model_path = "sayakpaul/sd-model-finetuned-lora-t4"
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")
prompt = "A pokemon with blue eyes."
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png")
```
Here are some example images you can expect:
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pokemon-collage.png"/>
[`sayakpaul/sd-model-finetuned-lora-t4`](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4) contains [LoRA fine-tuned update matrices](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/pytorch_lora_weights.bin)
which is only 3 MBs in size. During inference, the pre-trained Stable Diffusion checkpoints are loaded alongside these update
matrices and then they are combined to run inference.
You can use the [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) library to retrieve the base model
from [`sayakpaul/sd-model-finetuned-lora-t4`](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4) like so:
```py
from huggingface_hub.repocard import RepoCard
>>> pipe.unet.load_attn_procs(model_path)
>>> pipe.to("cuda")
# use half the weights from the LoRA finetuned model and half the weights from the base model
card = RepoCard.load("sayakpaul/sd-model-finetuned-lora-t4")
base_model = card.data.to_dict()["base_model"]
# 'CompVis/stable-diffusion-v1-4'
```
>>> image = pipe(
... "A picture of a sks dog in a bucket.",
... num_inference_steps=25,
... guidance_scale=7.5,
... cross_attention_kwargs={"scale": 0.5},
... ).images[0]
# use the weights from the fully finetuned LoRA model
And then you can use `pipe = StableDiffusionPipeline.from_pretrained(base_model, torch_dtype=torch.float16)`.
This is especially useful when you don't want to hardcode the base model identifier during initializing the `StableDiffusionPipeline`.
Inference for DreamBooth training remains the same. Check
[this section](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#inference-1) for more details.
## Known limitations
* Currently, we only support LoRA for the attention layers of [`UNet2DConditionModel`](https://huggingface.co/docs/diffusers/main/en/api/models#diffusers.UNet2DConditionModel).
>>> image = pipe("A picture of a sks dog in a bucket.", num_inference_steps=25, guidance_scale=7.5).images[0]
>>> image.save("bucket-dog.png")
```

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -11,20 +11,15 @@ specific language governing permissions and limitations under the License.
-->
# Stable Diffusion text-to-image fine-tuning
The [`train_text_to_image.py`](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image) script shows how to fine-tune the stable diffusion model on your own dataset.
# Text-to-image
<Tip warning={true}>
The text-to-image fine-tuning script is experimental. It's easy to overfit and run into issues like catastrophic forgetting. We recommend to explore different hyperparameters to get the best results on your dataset.
The text-to-image fine-tuning script is experimental. It's easy to overfit and run into issues like catastrophic forgetting. We recommend you explore different hyperparameters to get the best results on your dataset.
</Tip>
## Running locally
### Installing the dependencies
Text-to-image models like Stable Diffusion generate an image from a text prompt. This guide will show you how to finetune the [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) model on your own dataset with PyTorch and Flax. All the training scripts for text-to-image finetuning used in this guide can be found in this [repository](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image) if you're interested in taking a closer look.
Before running the scripts, make sure to install the library's training dependencies:
@@ -33,32 +28,51 @@ pip install git+https://github.com/huggingface/diffusers.git
pip install -U -r requirements.txt
```
And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:
And initialize an [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
```bash
accelerate config
```
You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.
You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
Run the following command to authenticate your token
```bash
huggingface-cli login
```
If you have already cloned the repo, then you won't need to go through these steps. Instead, you can pass the path to your local checkout to the training script and it will be loaded from there.
### Hardware Requirements for Fine-tuning
## Hardware requirements
Using `gradient_checkpointing` and `mixed_precision` it should be possible to fine tune the model on a single 24GB GPU. For higher `batch_size` and faster training it's better to use GPUs with more than 30GB of GPU memory. You can also use JAX / Flax for fine-tuning on TPUs or GPUs, see [below](#flax-jax-finetuning) for details.
Using `gradient_checkpointing` and `mixed_precision`, it should be possible to finetune the model on a single 24GB GPU. For higher `batch_size`'s and faster training, it's better to use GPUs with more than 30GB of GPU memory. You can also use JAX/Flax for fine-tuning on TPUs or GPUs, which will be covered [below](#flax-jax-finetuning).
### Fine-tuning Example
You can reduce your memory footprint even more by enabling memory efficient attention with xFormers. Make sure you have [xFormers installed](./optimization/xformers) and pass the `--enable_xformers_memory_efficient_attention` flag to the training script.
The following script will launch a fine-tuning run using [Justin Pinkneys' captioned Pokemon dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions), available in Hugging Face Hub.
xFormers is not available for Flax.
## Upload model to Hub
Store your model on the Hub by adding the following argument to the training script:
```bash
--push_to_hub
```
## Save and load checkpoints
It is a good idea to regularly save checkpoints in case anything happens during training. To save a checkpoint, pass the following argument to the training script:
```bash
--checkpointing_steps=500
```
Every 500 steps, the full training state is saved in a subfolder in the `output_dir`. The checkpoint has the format `checkpoint-` followed by the number of steps trained so far. For example, `checkpoint-1500` is a checkpoint saved after 1500 training steps.
To load a checkpoint to resume training, pass the argument `--resume_from_checkpoint` to the training script and specify the checkpoint you want to resume from. For example, the following argument resumes training from the checkpoint saved after 1500 training steps:
```bash
--resume_from_checkpoint="checkpoint-1500"
```
## Fine-tuning
<frameworkcontent>
<pt>
Launch the [PyTorch training script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) for a fine-tuning run on the [Pokémon BLIP captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) dataset like this:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
@@ -80,9 +94,9 @@ accelerate launch train_text_to_image.py \
--output_dir="sd-pokemon-model"
```
To run on your own training files you need to prepare the dataset according to the format required by `datasets`. You can upload your dataset to the Hub, or you can prepare a local folder with your files. [This documentation](https://huggingface.co/docs/datasets/v2.4.0/en/image_load#imagefolder-with-metadata) explains how to do it.
To finetune on your own dataset, prepare the dataset according to the format required by 🤗 [Datasets](https://huggingface.co/docs/datasets/index). You can [upload your dataset to the Hub](https://huggingface.co/docs/datasets/image_dataset#upload-dataset-to-the-hub), or you can [prepare a local folder with your files](https://huggingface.co/docs/datasets/image_dataset#imagefolder).
You should modify the script if you wish to use custom loading logic. We have left pointers in the code in the appropriate places :)
Modify the script if you want to use custom loading logic. We left pointers in the code in the appropriate places to help you. 🤗 The example script below shows how to finetune on a local dataset in `TRAIN_DIR` and where to save the model to in `OUTPUT_DIR`:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
@@ -104,25 +118,19 @@ accelerate launch train_text_to_image.py \
--lr_scheduler="constant" --lr_warmup_steps=0 \
--output_dir=${OUTPUT_DIR}
```
</pt>
<jax>
With Flax, it's possible to train a Stable Diffusion model faster on TPUs and GPUs thanks to [@duongna211](https://github.com/duongna21). This is very efficient on TPU hardware but works great on GPUs too. The Flax training script doesn't support features like gradient checkpointing or gradient accumulation yet, so you'll need a GPU with at least 30GB of memory or a TPU v3.
Once training is finished the model will be saved to the `OUTPUT_DIR` specified in the command. To load the fine-tuned model for inference, just pass that path to `StableDiffusionPipeline`:
Before running the script, make sure you have the requirements installed:
```python
from diffusers import StableDiffusionPipeline
model_path = "path_to_saved_model"
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
pipe.to("cuda")
image = pipe(prompt="yoda").images[0]
image.save("yoda-pokemon.png")
```bash
pip install -U -r requirements_flax.txt
```
### Flax / JAX fine-tuning
Now you can launch the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_flax.py) like this:
Thanks to [@duongna211](https://github.com/duongna21) it's possible to fine-tune Stable Diffusion using Flax! This is very efficient on TPU hardware but works great on GPUs too. You can use the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_flax.py) like this:
```Python
```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export dataset_name="lambdalabs/pokemon-blip-captions"
@@ -136,3 +144,77 @@ python train_text_to_image_flax.py \
--max_grad_norm=1 \
--output_dir="sd-pokemon-model"
```
To finetune on your own dataset, prepare the dataset according to the format required by 🤗 [Datasets](https://huggingface.co/docs/datasets/index). You can [upload your dataset to the Hub](https://huggingface.co/docs/datasets/image_dataset#upload-dataset-to-the-hub), or you can [prepare a local folder with your files](https://huggingface.co/docs/datasets/image_dataset#imagefolder).
Modify the script if you want to use custom loading logic. We left pointers in the code in the appropriate places to help you. 🤗 The example script below shows how to finetune on a local dataset in `TRAIN_DIR`:
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export TRAIN_DIR="path_to_your_dataset"
python train_text_to_image_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$TRAIN_DIR \
--resolution=512 --center_crop --random_flip \
--train_batch_size=1 \
--mixed_precision="fp16" \
--max_train_steps=15000 \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--output_dir="sd-pokemon-model"
```
</jax>
</frameworkcontent>
## LoRA
You can also use Low-Rank Adaptation of Large Language Models (LoRA), a fine-tuning technique for accelerating training large models, for fine-tuning text-to-image models. For more details, take a look at the [LoRA training](lora#text-to-image) guide.
## Inference
Now you can load the fine-tuned model for inference by passing the model path or model name on the Hub to the [`StableDiffusionPipeline`]:
<frameworkcontent>
<pt>
```python
from diffusers import StableDiffusionPipeline
model_path = "path_to_saved_model"
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
pipe.to("cuda")
image = pipe(prompt="yoda").images[0]
image.save("yoda-pokemon.png")
```
</pt>
<jax>
```python
import jax
import numpy as np
from flax.jax_utils import replicate
from flax.training.common_utils import shard
from diffusers import FlaxStableDiffusionPipeline
model_path = "path_to_saved_model"
pipe, params = FlaxStableDiffusionPipeline.from_pretrained(model_path, dtype=jax.numpy.bfloat16)
prompt = "yoda pokemon"
prng_seed = jax.random.PRNGKey(0)
num_inference_steps = 50
num_samples = jax.device_count()
prompt = num_samples * [prompt]
prompt_ids = pipeline.prepare_inputs(prompt)
# shard inputs and rng
params = replicate(params)
prng_seed = jax.random.split(prng_seed, jax.device_count())
prompt_ids = shard(prompt_ids)
images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
image.save("yoda-pokemon.png")
```
</jax>
</frameworkcontent>

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -14,74 +14,85 @@ specific language governing permissions and limitations under the License.
# Textual Inversion
Textual Inversion is a technique for capturing novel concepts from a small number of example images in a way that can later be used to control text-to-image pipelines. It does so by learning new 'words' in the embedding space of the pipeline's text encoder. These special words can then be used within text prompts to achieve very fine-grained control of the resulting images.
[[open-in-colab]]
[Textual Inversion](https://arxiv.org/abs/2208.01618) is a technique for capturing novel concepts from a small number of example images. While the technique was originally demonstrated with a [latent diffusion model](https://github.com/CompVis/latent-diffusion), it has since been applied to other model variants like [Stable Diffusion](https://huggingface.co/docs/diffusers/main/en/conceptual/stable_diffusion). The learned concepts can be used to better control the images generated from text-to-image pipelines. It learns new "words" in the text encoder's embedding space, which are used within text prompts for personalized image generation.
![Textual Inversion example](https://textual-inversion.github.io/static/images/editing/colorful_teapot.JPG)
_By using just 3-5 images you can teach new concepts to a model such as Stable Diffusion for personalized image generation ([image source](https://github.com/rinongal/textual_inversion))._
<small>By using just 3-5 images you can teach new concepts to a model such as Stable Diffusion for personalized image generation <a href="https://github.com/rinongal/textual_inversion">(image source)</a></small>
This technique was introduced in [An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion](https://arxiv.org/abs/2208.01618). The paper demonstrated the concept using a [latent diffusion model](https://github.com/CompVis/latent-diffusion) but the idea has since been applied to other variants such as [Stable Diffusion](https://huggingface.co/docs/diffusers/main/en/conceptual/stable_diffusion).
This guide will show you how to train a [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) model with Textual Inversion. All the training scripts for Textual Inversion used in this guide can be found [here](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion) if you're interested in taking a closer look at how things work under the hood.
<Tip>
## How It Works
There is a community-created collection of trained Textual Inversion models in the [Stable Diffusion Textual Inversion Concepts Library](https://huggingface.co/sd-concepts-library) which are readily available for inference. Over time, this'll hopefully grow into a useful resource as more concepts are added!
![Diagram from the paper showing overview](https://textual-inversion.github.io/static/images/training/training.JPG)
_Architecture Overview from the [textual inversion blog post](https://textual-inversion.github.io/)_
</Tip>
Before a text prompt can be used in a diffusion model, it must first be processed into a numerical representation. This typically involves tokenizing the text, converting each token to an embedding and then feeding those embeddings through a model (typically a transformer) whose output will be used as the conditioning for the diffusion model.
Textual inversion learns a new token embedding (v* in the diagram above). A prompt (that includes a token which will be mapped to this new embedding) is used in conjunction with a noised version of one or more training images as inputs to the generator model, which attempts to predict the denoised version of the image. The embedding is optimized based on how well the model does at this task - an embedding that better captures the object or style shown by the training images will give more useful information to the diffusion model and thus result in a lower denoising loss. After many steps (typically several thousand) with a variety of prompt and image variants the learned embedding should hopefully capture the essence of the new concept being taught.
## Usage
To train your own textual inversions, see the [example script here](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion).
There is also a notebook for training:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb)
And one for inference:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb)
In addition to using concepts you have trained yourself, there is a community-created collection of trained textual inversions in the new [Stable Diffusion public concepts library](https://huggingface.co/sd-concepts-library) which you can also use from the inference notebook above. Over time this will hopefully grow into a useful resource as more examples are added.
## Example: Running locally
The `textual_inversion.py` script [here](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion) shows how to implement the training procedure and adapt it for stable diffusion.
### Installing the dependencies
Before running the scripts, make sure to install the library's training dependencies.
Before you begin, make sure you install the library's training dependencies:
```bash
pip install diffusers[training] accelerate transformers
pip install diffusers accelerate transformers
```
And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:
After all the dependencies have been set up, initialize a [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:
```bash
accelerate config
```
### Cat toy example
You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.
You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).
Run the following command to authenticate your token
To setup a default 🤗 Accelerate environment without choosing any configurations:
```bash
huggingface-cli login
accelerate config default
```
If you have already cloned the repo, then you won't need to go through these steps.
Or if your environment doesn't support an interactive shell like a notebook, you can use:
<br>
```bash
from accelerate.utils import write_basic_config
Now let's get our dataset.Download 3-4 images from [here](https://drive.google.com/drive/folders/1fmJMs25nxS_rSNqS5hTcRdLem_YQXbq5) and save them in a directory. This will be our training data.
write_basic_config()
```
And launch the training using
Finally, you try and [install xFormers](https://huggingface.co/docs/diffusers/main/en/training/optimization/xformers) to reduce your memory footprint with xFormers memory-efficient attention. Once you have xFormers installed, add the `--enable_xformers_memory_efficient_attention` argument to the training script. xFormers is not supported for Flax.
## Upload model to Hub
If you want to store your model on the Hub, add the following argument to the training script:
```bash
--push_to_hub
```
## Save and load checkpoints
It is often a good idea to regularly save checkpoints of your model during training. This way, you can resume training from a saved checkpoint if your training is interrupted for any reason. To save a checkpoint, pass the following argument to the training script to save the full training state in a subfolder in `output_dir` every 500 steps:
```bash
--checkpointing_steps=500
```
To resume training from a saved checkpoint, pass the following argument to the training script and the specific checkpoint you'd like to resume from:
```bash
--resume_from_checkpoint="checkpoint-1500"
```
## Finetuning
For your training dataset, download these [images of a cat statue](https://drive.google.com/drive/folders/1fmJMs25nxS_rSNqS5hTcRdLem_YQXbq5) and store them in a directory.
Set the `MODEL_NAME` environment variable to the model repository id, and the `DATA_DIR` environment variable to the path of the directory containing the images. Now you can launch the [training script](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion.py):
<Tip>
💡 A full training run takes ~1 hour on one V100 GPU. While you're waiting for the training to complete, feel free to check out [how Textual Inversion works](#how-it-works) in the section below if you're curious!
</Tip>
<frameworkcontent>
<pt>
```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATA_DIR="path-to-dir-containing-images"
@@ -100,14 +111,56 @@ accelerate launch textual_inversion.py \
--lr_warmup_steps=0 \
--output_dir="textual_inversion_cat"
```
</pt>
<jax>
If you have access to TPUs, try out the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion_flax.py) to train even faster (this'll also work for GPUs). With the same configuration settings, the Flax training script should be at least 70% faster than the PyTorch training script! ⚡️
A full training run takes ~1 hour on one V100 GPU.
Before you begin, make sure you install the Flax specific dependencies:
```bash
pip install -U -r requirements_flax.txt
```
### Inference
Then you can launch the [training script](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion_flax.py):
Once you have trained a model using above command, the inference can be done simply using the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export DATA_DIR="path-to-dir-containing-images"
python textual_inversion_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$DATA_DIR \
--learnable_property="object" \
--placeholder_token="<cat-toy>" --initializer_token="toy" \
--resolution=512 \
--train_batch_size=1 \
--max_train_steps=3000 \
--learning_rate=5.0e-04 --scale_lr \
--output_dir="textual_inversion_cat"
```
</jax>
</frameworkcontent>
### Intermediate logging
If you're interested in following along with your model training progress, you can save the generated images from the training process. Add the following arguments to the training script to enable intermediate logging:
- `validation_prompt`, the prompt used to generate samples (this is set to `None` by default and intermediate logging is disabled)
- `num_validation_images`, the number of sample images to generate
- `validation_steps`, the number of steps before generating `num_validation_images` from the `validation_prompt`
```bash
--validation_prompt="A <cat-toy> backpack"
--num_validation_images=4
--validation_steps=100
```
## Inference
Once you have trained a model, you can use it for inference with the [`StableDiffusionPipeline]. Make sure you include the `placeholder_token` in your prompt, in this case, it is `<cat-toy>`.
<frameworkcontent>
<pt>
```python
from diffusers import StableDiffusionPipeline
@@ -120,3 +173,43 @@ image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("cat-backpack.png")
```
</pt>
<jax>
```python
import jax
import numpy as np
from flax.jax_utils import replicate
from flax.training.common_utils import shard
from diffusers import FlaxStableDiffusionPipeline
model_path = "path-to-your-trained-model"
pipe, params = FlaxStableDiffusionPipeline.from_pretrained(model_path, dtype=jax.numpy.bfloat16)
prompt = "A <cat-toy> backpack"
prng_seed = jax.random.PRNGKey(0)
num_inference_steps = 50
num_samples = jax.device_count()
prompt = num_samples * [prompt]
prompt_ids = pipeline.prepare_inputs(prompt)
# shard inputs and rng
params = replicate(params)
prng_seed = jax.random.split(prng_seed, jax.device_count())
prompt_ids = shard(prompt_ids)
images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
image.save("cat-backpack.png")
```
</jax>
</frameworkcontent>
## How it works
![Diagram from the paper showing overview](https://textual-inversion.github.io/static/images/training/training.JPG)
<small>Architecture overview from the Textual Inversion <a href="https://textual-inversion.github.io/">blog post.</a></small>
Usually, text prompts are tokenized into an embedding before being passed to a model, which is often a transformer. Textual Inversion does something similar, but it learns a new token embedding, `v*`, from a special token `S*` in the diagram above. The model output is used to condition the diffusion model, which helps the diffusion model understand the prompt and new concepts from just a few example images.
To do this, Textual Inversion uses a generator model and noisy versions of the training images. The generator tries to predict less noisy versions of the images, and the token embedding `v*` is optimized based on how well the generator does. If the token embedding successfully captures the new concept, it gives more useful information to the diffusion model and helps create clearer images with less noise. This optimization process typically occurs after several thousand steps of exposure to a variety of prompt and image variants.

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,414 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
[[open-in-colab]]
# Train a diffusion model
Unconditional image generation is a popular application of diffusion models that generates images that look like those in the dataset used for training. Typically, the best results are obtained from finetuning a pretrained model on a specific dataset. You can find many of these checkpoints on the [Hub](https://huggingface.co/search/full-text?q=unconditional-image-generation&type=model), but if you can't find one you like, you can always train your own!
This tutorial will teach you how to train a [`UNet2DModel`] from scratch on a subset of the [Smithsonian Butterflies](https://huggingface.co/datasets/huggan/smithsonian_butterflies_subset) dataset to generate your own 🦋 butterflies 🦋.
<Tip>
💡 This training tutorial is based on the [Training with 🧨 Diffusers](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook. For additional details and context about diffusion models like how they work, check out the notebook!
</Tip>
Before you begin, make sure you have 🤗 Datasets installed to load and preprocess image datasets, and 🤗 Accelerate, to simplify training on any number of GPUs. The following command will also install [TensorBoard](https://www.tensorflow.org/tensorboard) to visualize training metrics (you can also use [Weights & Biases](https://docs.wandb.ai/) to track your training).
```bash
!pip install diffusers[training]
```
We encourage you to share your model with the community, and in order to do that, you'll need to login to your Hugging Face account (create one [here](https://hf.co/join) if you don't already have one!). You can login from a notebook and enter your token when prompted:
```py
>>> from huggingface_hub import notebook_login
>>> notebook_login()
```
Or login in from the terminal:
```bash
huggingface-cli login
```
Since the model checkpoints are quite large, install [Git-LFS](https://git-lfs.com/) to version these large files:
```bash
!sudo apt -qq install git-lfs
!git config --global credential.helper store
```
## Training configuration
For convenience, create a `TrainingConfig` class containing the training hyperparameters (feel free to adjust them):
```py
>>> from dataclasses import dataclass
>>> @dataclass
... class TrainingConfig:
... image_size = 128 # the generated image resolution
... train_batch_size = 16
... eval_batch_size = 16 # how many images to sample during evaluation
... num_epochs = 50
... gradient_accumulation_steps = 1
... learning_rate = 1e-4
... lr_warmup_steps = 500
... save_image_epochs = 10
... save_model_epochs = 30
... mixed_precision = "fp16" # `no` for float32, `fp16` for automatic mixed precision
... output_dir = "ddpm-butterflies-128" # the model name locally and on the HF Hub
... push_to_hub = True # whether to upload the saved model to the HF Hub
... hub_private_repo = False
... overwrite_output_dir = True # overwrite the old model when re-running the notebook
... seed = 0
>>> config = TrainingConfig()
```
## Load the dataset
You can easily load the [Smithsonian Butterflies](https://huggingface.co/datasets/huggan/smithsonian_butterflies_subset) dataset with the 🤗 Datasets library:
```py
>>> from datasets import load_dataset
>>> config.dataset_name = "huggan/smithsonian_butterflies_subset"
>>> dataset = load_dataset(config.dataset_name, split="train")
```
<Tip>
💡 You can find additional datasets from the [HugGan Community Event](https://huggingface.co/huggan) or you can use your own dataset by creating a local [`ImageFolder`](https://huggingface.co/docs/datasets/image_dataset#imagefolder). Set `config.dataset_name` to the repository id of the dataset if it is from the HugGan Community Event, or `imagefolder` if you're using your own images.
</Tip>
🤗 Datasets uses the [`~datasets.Image`] feature to automatically decode the image data and load it as a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html) which we can visualize:
```py
>>> import matplotlib.pyplot as plt
>>> fig, axs = plt.subplots(1, 4, figsize=(16, 4))
>>> for i, image in enumerate(dataset[:4]["image"]):
... axs[i].imshow(image)
... axs[i].set_axis_off()
>>> fig.show()
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/butterflies_ds.png"/>
</div>
The images are all different sizes though, so you'll need to preprocess them first:
* `Resize` changes the image size to the one defined in `config.image_size`.
* `RandomHorizontalFlip` augments the dataset by randomly mirroring the images.
* `Normalize` is important to rescale the pixel values into a [-1, 1] range, which is what the model expects.
```py
>>> from torchvision import transforms
>>> preprocess = transforms.Compose(
... [
... transforms.Resize((config.image_size, config.image_size)),
... transforms.RandomHorizontalFlip(),
... transforms.ToTensor(),
... transforms.Normalize([0.5], [0.5]),
... ]
... )
```
Use 🤗 Datasets' [`~datasets.Dataset.set_transform`] method to apply the `preprocess` function on the fly during training:
```py
>>> def transform(examples):
... images = [preprocess(image.convert("RGB")) for image in examples["image"]]
... return {"images": images}
>>> dataset.set_transform(transform)
```
Feel free to visualize the images again to confirm that they've been resized. Now you're ready to wrap the dataset in a [DataLoader](https://pytorch.org/docs/stable/data#torch.utils.data.DataLoader) for training!
```py
>>> import torch
>>> train_dataloader = torch.utils.data.DataLoader(dataset, batch_size=config.train_batch_size, shuffle=True)
```
## Create a UNet2DModel
Pretrained models in 🧨 Diffusers are easily created from their model class with the parameters you want. For example, to create a [`UNet2DModel`]:
```py
>>> from diffusers import UNet2DModel
>>> model = UNet2DModel(
... sample_size=config.image_size, # the target image resolution
... in_channels=3, # the number of input channels, 3 for RGB images
... out_channels=3, # the number of output channels
... layers_per_block=2, # how many ResNet layers to use per UNet block
... block_out_channels=(128, 128, 256, 256, 512, 512), # the number of output channels for each UNet block
... down_block_types=(
... "DownBlock2D", # a regular ResNet downsampling block
... "DownBlock2D",
... "DownBlock2D",
... "DownBlock2D",
... "AttnDownBlock2D", # a ResNet downsampling block with spatial self-attention
... "DownBlock2D",
... ),
... up_block_types=(
... "UpBlock2D", # a regular ResNet upsampling block
... "AttnUpBlock2D", # a ResNet upsampling block with spatial self-attention
... "UpBlock2D",
... "UpBlock2D",
... "UpBlock2D",
... "UpBlock2D",
... ),
... )
```
It is often a good idea to quickly check the sample image shape matches the model output shape:
```py
>>> sample_image = dataset[0]["images"].unsqueeze(0)
>>> print("Input shape:", sample_image.shape)
Input shape: torch.Size([1, 3, 128, 128])
>>> print("Output shape:", model(sample_image, timestep=0).sample.shape)
Output shape: torch.Size([1, 3, 128, 128])
```
Great! Next, you'll need a scheduler to add some noise to the image.
## Create a scheduler
The scheduler behaves differently depending on whether you're using the model for training or inference. During inference, the scheduler generates image from the noise. During training, the scheduler takes a model output - or a sample - from a specific point in the diffusion process and applies noise to the image according to a *noise schedule* and an *update rule*.
Let's take a look at the [`DDPMScheduler`] and use the `add_noise` method to add some random noise to the `sample_image` from before:
```py
>>> import torch
>>> from PIL import Image
>>> from diffusers import DDPMScheduler
>>> noise_scheduler = DDPMScheduler(num_train_timesteps=1000)
>>> noise = torch.randn(sample_image.shape)
>>> timesteps = torch.LongTensor([50])
>>> noisy_image = noise_scheduler.add_noise(sample_image, noise, timesteps)
>>> Image.fromarray(((noisy_image.permute(0, 2, 3, 1) + 1.0) * 127.5).type(torch.uint8).numpy()[0])
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/noisy_butterfly.png"/>
</div>
The training objective of the model is to predict the noise added to the image. The loss at this step can be calculated by:
```py
>>> import torch.nn.functional as F
>>> noise_pred = model(noisy_image, timesteps).sample
>>> loss = F.mse_loss(noise_pred, noise)
```
## Train the model
By now, you have most of the pieces to start training the model and all that's left is putting everything together.
First, you'll need an optimizer and a learning rate scheduler:
```py
>>> from diffusers.optimization import get_cosine_schedule_with_warmup
>>> optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate)
>>> lr_scheduler = get_cosine_schedule_with_warmup(
... optimizer=optimizer,
... num_warmup_steps=config.lr_warmup_steps,
... num_training_steps=(len(train_dataloader) * config.num_epochs),
... )
```
Then, you'll need a way to evaluate the model. For evaluation, you can use the [`DDPMPipeline`] to generate a batch of sample images and save it as a grid:
```py
>>> from diffusers import DDPMPipeline
>>> import math
>>> def make_grid(images, rows, cols):
... w, h = images[0].size
... grid = Image.new("RGB", size=(cols * w, rows * h))
... for i, image in enumerate(images):
... grid.paste(image, box=(i % cols * w, i // cols * h))
... return grid
>>> def evaluate(config, epoch, pipeline):
... # Sample some images from random noise (this is the backward diffusion process).
... # The default pipeline output type is `List[PIL.Image]`
... images = pipeline(
... batch_size=config.eval_batch_size,
... generator=torch.manual_seed(config.seed),
... ).images
... # Make a grid out of the images
... image_grid = make_grid(images, rows=4, cols=4)
... # Save the images
... test_dir = os.path.join(config.output_dir, "samples")
... os.makedirs(test_dir, exist_ok=True)
... image_grid.save(f"{test_dir}/{epoch:04d}.png")
```
Now you can wrap all these components together in a training loop with 🤗 Accelerate for easy TensorBoard logging, gradient accumulation, and mixed precision training. To upload the model to the Hub, write a function to get your repository name and information and then push it to the Hub.
<Tip>
💡 The training loop below may look intimidating and long, but it'll be worth it later when you launch your training in just one line of code! If you can't wait and want to start generating images, feel free to copy and run the code below. You can always come back and examine the training loop more closely later, like when you're waiting for your model to finish training. 🤗
</Tip>
```py
>>> from accelerate import Accelerator
>>> from huggingface_hub import HfFolder, Repository, whoami
>>> from tqdm.auto import tqdm
>>> from pathlib import Path
>>> import os
>>> def get_full_repo_name(model_id: str, organization: str = None, token: str = None):
... if token is None:
... token = HfFolder.get_token()
... if organization is None:
... username = whoami(token)["name"]
... return f"{username}/{model_id}"
... else:
... return f"{organization}/{model_id}"
>>> def train_loop(config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler):
... # Initialize accelerator and tensorboard logging
... accelerator = Accelerator(
... mixed_precision=config.mixed_precision,
... gradient_accumulation_steps=config.gradient_accumulation_steps,
... log_with="tensorboard",
... logging_dir=os.path.join(config.output_dir, "logs"),
... )
... if accelerator.is_main_process:
... if config.push_to_hub:
... repo_name = get_full_repo_name(Path(config.output_dir).name)
... repo = Repository(config.output_dir, clone_from=repo_name)
... elif config.output_dir is not None:
... os.makedirs(config.output_dir, exist_ok=True)
... accelerator.init_trackers("train_example")
... # Prepare everything
... # There is no specific order to remember, you just need to unpack the
... # objects in the same order you gave them to the prepare method.
... model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
... model, optimizer, train_dataloader, lr_scheduler
... )
... global_step = 0
... # Now you train the model
... for epoch in range(config.num_epochs):
... progress_bar = tqdm(total=len(train_dataloader), disable=not accelerator.is_local_main_process)
... progress_bar.set_description(f"Epoch {epoch}")
... for step, batch in enumerate(train_dataloader):
... clean_images = batch["images"]
... # Sample noise to add to the images
... noise = torch.randn(clean_images.shape).to(clean_images.device)
... bs = clean_images.shape[0]
... # Sample a random timestep for each image
... timesteps = torch.randint(
... 0, noise_scheduler.num_train_timesteps, (bs,), device=clean_images.device
... ).long()
... # Add noise to the clean images according to the noise magnitude at each timestep
... # (this is the forward diffusion process)
... noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)
... with accelerator.accumulate(model):
... # Predict the noise residual
... noise_pred = model(noisy_images, timesteps, return_dict=False)[0]
... loss = F.mse_loss(noise_pred, noise)
... accelerator.backward(loss)
... accelerator.clip_grad_norm_(model.parameters(), 1.0)
... optimizer.step()
... lr_scheduler.step()
... optimizer.zero_grad()
... progress_bar.update(1)
... logs = {"loss": loss.detach().item(), "lr": lr_scheduler.get_last_lr()[0], "step": global_step}
... progress_bar.set_postfix(**logs)
... accelerator.log(logs, step=global_step)
... global_step += 1
... # After each epoch you optionally sample some demo images with evaluate() and save the model
... if accelerator.is_main_process:
... pipeline = DDPMPipeline(unet=accelerator.unwrap_model(model), scheduler=noise_scheduler)
... if (epoch + 1) % config.save_image_epochs == 0 or epoch == config.num_epochs - 1:
... evaluate(config, epoch, pipeline)
... if (epoch + 1) % config.save_model_epochs == 0 or epoch == config.num_epochs - 1:
... if config.push_to_hub:
... repo.push_to_hub(commit_message=f"Epoch {epoch}", blocking=True)
... else:
... pipeline.save_pretrained(config.output_dir)
```
Phew, that was quite a bit of code! But you're finally ready to launch the training with 🤗 Accelerate's [`~accelerate.notebook_launcher`] function. Pass the function the training loop, all the training arguments, and the number of processes (you can change this value to the number of GPUs available to you) to use for training:
```py
>>> from accelerate import notebook_launcher
>>> args = (config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler)
>>> notebook_launcher(train_loop, args, num_processes=1)
```
Once training is complete, take a look at the final 🦋 images 🦋 generated by your diffusion model!
```py
>>> import glob
>>> sample_images = sorted(glob.glob(f"{config.output_dir}/samples/*.png"))
>>> Image.open(sample_images[-1])
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/butterflies_final.png"/>
</div>
## Next steps
Unconditional image generation is one example of a task that can be trained. You can explore other tasks and training techniques by visiting the [🧨 Diffusers Training Examples](./training/overview) page. Here are some examples of what you can learn:
* [Textual Inversion](./training/text_inversion), an algorithm that teaches a model a specific visual concept and integrates it into the generated image.
* [DreamBooth](./training/dreambooth), a technique for generating personalized images of a subject given several input images of the subject.
* [Guide](./training/text2image) to finetuning a Stable Diffusion model on your own dataset.
* [Guide](./training/lora) to using LoRA, a memory-efficient technique for finetuning really large models faster.

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

Some files were not shown because too many files have changed in this diff Show More