Compare commits

...

376 Commits

Author SHA1 Message Date
Dhruv Nair
50a96ca719 update 2024-01-18 10:14:04 +00:00
Steven Liu
5ca062e011 [docs] Fix missing API function (#6604)
fix?
2024-01-17 13:59:09 -08:00
Linoy Tsaban
619e3ab6f6 [bug fix] advanced dreambooth lora sdxl - fixes bugs described in #6486 (#6599)
* fixes bugs:
1. redundant retraction
2. param clone
3. stopping optimization of text encoder params

* param upscaling

* style
2024-01-17 20:11:45 +05:30
Patrick von Platen
9e2804f720 Update pr_test_peft_backend.yml to use 1 process for testing (#6613) 2024-01-17 19:25:30 +05:30
Aryan V S
9112028ed8 FreeInit (#6315)
* freeinit

* update freeinit implementation based on review

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* fix

* another fix

* refactor

* fix timesteps missing bug

* apply suggestions from review

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* add test for freeinit

* apply suggestions from review

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* refactor

* fix test

* fix tensor not on same device

* update

* remove return_intermediate_results

* fix broken freeinit test

* update animatediff docs

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-01-17 17:17:07 +05:30
Steve Rhoades
dce06680d2 Fixes torch.compile() compatible training (#6589)
resolve conflicts
2024-01-17 07:47:03 +05:30
Bhavay Malhotra
dd63168319 Update installation.md (#6438)
* Update installation.md

* Update installation.md

* Update installation.md
2024-01-16 13:41:38 -08:00
Celestial Phineas
1040dfd9cc [Fix] Multiple image conditionings in a single batch for StableDiffusionControlNetPipeline (#6334)
* [Fix] Multiple image conditionings in a single batch for `StableDiffusionControlNetPipeline`.

* Refactor `check_inputs` in `StableDiffusionControlNetPipeline` to avoid redundant codes.

* Make the behavior of MultiControlNetModel to be the same to the original ControlNetModel

* Keep the code change minimum for nested list support

* Add fast test `test_inference_nested_image_input`

* Remove redundant check for nested image condition in `check_inputs`

Remove `len(image) == len(prompt)` check out of `check_image()`

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Better `ValueError` message for incompatible nested image list size

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Fix syntax error in `check_inputs`

* Remove warning message for multi-ControlNets with multiple prompts

* Fix a typo in test_controlnet.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Add test case for multiple prompts, single image conditioning in `StableDiffusionMultiControlNetPipelineFastTests`

* Improved `ValueError` message for nested `controlnet_conditioning_scale`

* Documenting the behavior of image list as `StableDiffusionControlNetPipeline` input

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-16 10:40:55 -10:00
Sayak Paul
49a4b377c1 remove omegaconf from the residues 👋 (#6600)
remove omegaconf 👋
2024-01-16 21:55:41 +05:30
JuanCarlosPi
dff35a86e4 Change in ip-adapter docs. CLIPVisionModelWithProjection should be im… (#6597)
Change in ip-adapter docs. CLIPVisionModelWithProjection should be imported from transformers, not diffusers
2024-01-16 08:18:13 -08:00
Yondon Fu
8842bcadb9 [SVD] Return np.ndarray when output_type="np" (#6507)
[SVD] Fix output_type="np"
2024-01-16 14:51:36 +05:30
Steve Rhoades
181280baba Fixes training resuming: Advanced Dreambooth LoRa Training (#6566)
* Fixes #6418 Advanced Dreambooth LoRa Training

* change order of import to fix nit

* fix nit, use cast_training_params

* remove torch.compile fix, will move to a new PR

* remove unnecessary import
2024-01-16 14:30:49 +05:30
Charchit Sharma
53f498d2a4 Use of Posix to better support Windows compatibility in testing_utils (#6587)
* changes in utils

* removed loc
2024-01-16 10:27:29 +02:00
Charchit Sharma
990860911f change to posix for better Windows support for lora loaders (#6590)
* posix lora

* changes and style fix
2024-01-16 13:46:29 +05:30
Fabio Rigano
23eed39702 Fix path generation in IP Adapter (#6564)
* Fix path generation on Windows

* Update set_default_attn_processors

* Use pathlib

* Fix quality

* Fix copy

* Revert changes in set_default_attn_processors

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-16 11:25:22 +05:30
YiYi Xu
fefed44543 update slow test for SDXL k-diffusion pipeline (#6588)
update expected slice
2024-01-15 18:54:33 -10:00
Dong
814f56d2fe 🐛 fix ip-adapter controlnet img2img missing code (#6528)
* 🐛 fix ip-adapter controlnet img2img missing code

* 📝 edit test

* 📝 edit test

* 📝 run make style and quality

* 🎨 remove slow tests
2024-01-15 16:41:09 -10:00
SangKim
96d6e16550 Enable image resizing to adjust its height and width in StableDiffusionXLInstructPix2PixPipeline (#6581)
* Enable image resizing to adjust its height and width in StableDiffusionXLInstructPix2PixPipeline

* Ensure that validation is performed at every 'validation_step', not at every step
2024-01-16 07:50:34 +05:30
Aryan V S
c11de13588 [training] fix training resuming problem for fp16 (SD LoRA DreamBooth) (#6554)
* fix training resume

* update

* update
2024-01-16 07:27:06 +05:30
Patrick von Platen
357855f8fc [Docs] Fix controlnet indent (#6578) 2024-01-15 18:12:55 +02:00
Fabio Rigano
f825221b5d [Community Pipeline] IPAdapter FaceID (#6276)
* Add support for IPAdapter FaceID

* Add docs

* Move subfolder to kwargs

* Fix quality

* Fix image encoder loading

* Fix loading + add test

* Move to community folder

* Fix style

* Revert constant update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-15 16:43:54 +02:00
Aryan V S
119d734f6e [AnimateDiff+Controlnet] Fix multicontrolnet support (#6551)
* fix multicontrolnet support

* update README with multicontrolnet example
2024-01-15 16:36:54 +02:00
Sayak Paul
cb4b3f0b78 [OmegaConf] replace it with yaml (#6488)
* remove omegaconf from convert_from_ckpt.

* remove from single_file.

* change to string based ubscription.

* style

* okay

* fix: vae_param

* no . indexing.

* style

* style

* turn getattrs into explicit if/else

* style

* propagate changes to ldm_uncond.

* propagate to gligen

* propagate to if.

* fix: quotes.

* propagate to audioldm.

* propagate to audioldm2

* propagate to musicldm.

* propagate to vq_diffusion

* propagate to zero123.

* remove omegaconf from diffusers codebase.
2024-01-15 20:02:10 +05:30
Haofan Wang
3d574b3bbe Fix a bug of flip in SDXL training script (#6547)
* Update train_text_to_image_sdxl.py

* Update train_text_to_image_lora_sdxl.py
2024-01-15 16:28:04 +02:00
Charchit Sharma
09903774d9 Make T2I Adapter SDXL Training Script torch.compile compatible (#6577)
update for t2i_adapter
2024-01-15 19:42:56 +05:30
dependabot[bot]
d6a70d8ba8 Bump jinja2 from 3.1.2 to 3.1.3 in /examples/research_projects/realfill (#6539)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-15 16:10:10 +02:00
Charchit Sharma
e3103e171f Make InstructPix2Pix SDXL Training Script torch.compile compatible (#6576)
* changes for pix2pix_sdxl

* style fix
2024-01-15 17:54:03 +05:30
Charchit Sharma
b053053ac9 Make InstructPix2Pix Training Script torch.compile compatible (#6558)
* added torch.compile for pix2pix

* required changes
2024-01-15 17:03:22 +05:30
Vinh H. Pham
08702fc1cb Make text-to-image SDXL LoRA Training Script torch.compile compatible (#6556)
make compile compatible
2024-01-15 16:58:16 +05:30
Vinh H. Pham
7ce89e979c Make text-to-image SD LoRA Training Script torch.compile compatible (#6555)
make compile compatible
2024-01-15 16:55:08 +05:30
gzguevara
05faf3263b SDXL text-to-image torch compatible (#6550)
* torch compatible

* code quality fix

* ruff style

* ruff format
2024-01-15 16:49:11 +05:30
Sayak Paul
a080f0d3a2 [Training Utils] create a utility for casting the lora params during training. (#6553)
create a utility for casting the lora params during training.
2024-01-15 13:51:13 +05:30
Sayak Paul
79df50388d [Training] fix training resuming problem when using FP16 (SDXL LoRA DreamBooth) (#6514)
* fix: training resume from fp16.

* add: comment

* remove residue from another branch.

* remove more residues.

* thanks to Younes; no hacks.

* style.

* clean things a bit and modularize _set_state_dict_into_text_encoder

* add comment about the fix detailed.
2024-01-12 17:11:06 +05:30
Vinh H. Pham
7d631825b0 Make Dreambooth SD Training Script torch.compile compatible (#6532)
* support compile

* make style

* move unwrap_model inside function

* change unwrap call

* run make style

* Update examples/dreambooth/train_dreambooth.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Revert "Update examples/dreambooth/train_dreambooth.py"

This reverts commit 70ab09732e.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-12 12:50:15 +05:30
gzguevara
33d2b5b087 SD text-to-image torch compile compatible (#6519)
* added unwrapper

* fiz typo
2024-01-12 09:28:35 +05:30
Suvaditya Mukherjee
f486d34b04 Make ControlNet SD Training Script torch.compile compatible (#6525)
* update: make controlnet script torch compile compatible

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>

* update: correct earlier mistakes for compilation

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>

* update: fix code style issues

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>

---------

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>
2024-01-12 09:27:26 +05:30
Charchit Sharma
e44b205e0b Make ControlNet SDXL Training Script torch.compile compatible (#6526)
* make torch.compile compatible

* fix quality
2024-01-12 09:25:09 +05:30
Vinh H. Pham
60cb44323d Make Dreambooth SD LoRA Training Script torch.compile compatible (#6534)
support compile
2024-01-12 09:24:03 +05:30
Radamés Ajna
1dd0ac9401 [DPO Training] pass tracker name as argument (#6542)
pass tracker name as argumentw
2024-01-12 09:15:39 +05:30
Yassine El Boudouri
c6b04589b6 Remove conversion to RGB (#6479)
* Remove conversion to RGB

* Add a Conversion Function

* Add type hint for convert_method

* Update src/diffusers/utils/loading_utils.py

Update docstring

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docstring

* Optimize imports

* Optimize imports (2)

* Reformat code

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-12 07:20:24 +05:30
Sayak Paul
5dc3471380 [SVD] support generators that are created on GPU (#6484)
* debug generator

* fix?

* fix?

* fix

* remove print.

* revert none check
2024-01-11 20:08:18 +05:30
Aryan V S
9df566e6da [Community] StyleAligned Pipeline (#6489)
* add stylealigned sdxl pipeline

* bugfix

* update docs

* remove einops dependency

* update README

* update example docstring
2024-01-11 14:35:55 +01:00
Sayak Paul
be0b425762 [Training] make checkpointing compatible when using torch.compile (part II) (#6511)
make checkpointing compatible when using torch.compile.
2024-01-11 18:37:30 +05:30
jquintanilla4
da843b3d53 .load_ip_adapter in StableDiffusionXLAdapterPipeline (#6246)
* Added testing notebook and .load_ip_adapter to XLAdapterPipeline

* Added annotations

* deleted testing notebook

* Update src/diffusers/pipelines/t2i_adapter/pipeline_stable_diffusion_xl_adapter.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* code clean up

* Add feature_extractor and image_encoder to components

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-11 13:04:08 +05:30
dg845
17cece072a Fix bug in LCM Distillation Scripts when args.unet_time_cond_proj_dim is used (#6523)
* Fix bug where unet's time_cond_proj_dim is not set correctly if using args.unet_time_cond_proj_dim.

* make style
2024-01-11 08:21:07 +05:30
Steven Liu
a551ddf928 [docs] mask_blur and padding_mask_crop (#6498)
new inpaint features
2024-01-10 08:14:34 -08:00
Steven Liu
1d57892980 [docs] Callbacks (#6471)
edits
2024-01-10 08:14:07 -08:00
antoine-scenario
3e8b63216e Add IP-Adapter to StableDiffusionXLControlNetImg2ImgPipeline (#6293)
* add IP-Adapter to StableDiffusionXLControlNetImg2ImgPipeline

Update src/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl_img2img.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

fix tests

* fix failing test

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-09 22:02:11 -10:00
YiYi Xu
dd4459ad79 [Refactor] splitingResnetBlock2D into multiple blocks (#6166)
---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-09 21:38:05 -10:00
YiYi Xu
6313645b6b add StableDiffusionXLKDiffusionPipeline (#6447)
---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-09 16:29:01 -10:00
Rahul Raman
2d1f2182cc example: Train Instruct pix2 pix with lora implementation (#6469)
* base template file - train_instruct_pix2pix.py

* additional import and parser argument requried for lora

* finetune only instructpix2pix model -- no need to include these layers

* inject lora layers

* freeze unet model -- only lora layers are trained

* training modifications to train only lora parameters

* store only lora parameters

* move train script to research project

* run quality and style code checks

* move train script to a new folder

* add README

* update README

* update references in README

---------

Co-authored-by: Rahul Raman <rahulraman@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-10 06:38:19 +05:30
Steven Liu
3be7c96e28 [docs] Stable video diffusion (#6472)
svd
2024-01-09 09:21:58 -08:00
Steven Liu
3c79dd9dbe [docs] PEFT adapter API (#6499)
follow up
2024-01-09 08:09:15 -08:00
Steven Liu
9d767916da [docs] Fast diffusion (#6470)
* edits

* fix

* feedback
2024-01-09 08:08:31 -08:00
Patrick von Platen
ae7cd5ad4c Link issue template to discussions 2024-01-09 17:00:15 +01:00
Sayak Paul
4497b3ec98 [Training] make DreamBooth SDXL LoRA training script compatible with torch.compile (#6483)
* make it torch.compile comaptible

* make the text encoder compatible too.

* style
2024-01-09 20:11:26 +05:30
Yifan Zhou
fc63ebdd3a [Community Pipeline] Rerender-A-Video: Zero-Shot Video-to-Video Translation (#6332)
* upload codes and doc

* lint

* lint

* lint

* update code

* remove blank lines

* Fix load url
2024-01-09 14:55:34 +01:00
Sayak Paul
8b6fae4ce5 [SVD] fix: vae type (#6475)
fix: vae type
2024-01-09 14:50:02 +01:00
jiqing-feng
aa1797e109 enable stable-xl textual inversion (#6421)
* enable stable-xl textual inversion

* check if optimizer_2 exists

* check text_encoder_2 before using

* add textual inversion for sdxl in a single file

* fix style

* fix example style

* reset for error changes

* add readme for sdxl

* fix style

* disable autocast as it will cause cast error when weight_dtype=bf16

* fix spelling error

* fix style and readme and 8bit optimizer

* add README_sdxl.md link

* add tracker key on log_validation

* run style

* rm the second center crop

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-09 15:12:33 +05:30
Patrick von Platen
5bacc2f5af [SAG] Support more schedulers, add better error message and make tests faster (#6465)
* finish

* finish

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-09 09:24:38 +05:30
Yasuna
6ae7e8112a [Docs] update: tutorials ja | INDEX.md, TUTORIAL_OVERVIEW.md, TOCTREE.yml (#6338)
* add tutorials to toctree.yml

* fix title

* fix words

* add overview ja

* fix diffusion to 拡散

* fix line 21

* add space

* delete supported pipline

* fix tutorial_overview.md

* fix space

* fix typo

* Delete docs/source/ja/tutorials/using_peft_for_inference.md

this file is not translated

* Delete docs/source/ja/tutorials/basic_training.md

this file is not translated

* Delete docs/source/ja/tutorials/autopipeline.md

this file is not translated

* fix toctree
2024-01-08 09:06:46 -08:00
Sayak Paul
e0f349c2b0 correct reviewers in PR template (#6485) 2024-01-08 19:12:04 +05:30
Sayak Paul
774f5c4581 minor changes to the SVD doc (#6466)
minor changes
2024-01-06 08:40:46 +05:30
Sayak Paul
a483a8eddf Rename REAMDE.md to README.md 2024-01-05 20:49:44 +05:30
Lucain
9a9daee724 Fix offline mode import (#6467) 2024-01-05 15:34:40 +01:00
Sayak Paul
585f941366 [Core] introduce PeftAdapterMixin module. (#6416)
* introduce integrations module.

* remove duplicate methods.

* better imports.

* move to loaders.py

* remove peftadaptermixin from modelmixin.

* add: peftadaptermixin selectively.

* add: entry to _toctree

* Empty-Commit
2024-01-05 18:18:28 +05:30
Dhruv Nair
86a26761ac Correctly handle creating model index json files when setting compiled modules in pipelines. (#6436)
update
2024-01-05 18:02:09 +05:30
Liang Hou
6ef2b8a92f Fix amused paper link (#6462) 2024-01-05 13:12:09 +01:00
Vinh H. Pham
3848606c7e [Community Pipeline] Add gluegen (#6433)
* init works

* add gluegen pipeline

* add gluegen code

* add another way to load language adapter

* make style

* Update README.md

* change doc
2024-01-05 13:05:40 +01:00
Sayak Paul
2a97067b84 [Experimental] Diffusion LoRA DPO training (#6422)
* add: experimental script for diffusion dpo training.

* random_crop cli.

* fix: caption tokenization.

* fix: pixel_values index.

* fix: grad?

* debug

* fix: reduction.

* fixes in the loss calculation.

* style

* fix: unwrap call.

* fix: validation inference.

* add: initial sdxl script

* debug

* make sure images in the tuple are of same res

* fix model_max_length

* report print

* boom

* fix: numerical issues.

* fix: resolution

* comment about resize.

* change the order of the training transformation.

* save call.

* debug

* remove print

* manually detaching necessary?

* use the same vae for validation.

* add: readme.
2024-01-05 16:40:06 +05:30
Sayak Paul
ae060fc4f1 [feat] introduce unload_lora(). (#6451)
* introduce unload_lora.

* fix-copies
2024-01-05 16:22:11 +05:30
Sayak Paul
9d945b2b90 0.25.0 post release (#6358)
* post release

* style

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-05 16:13:27 +05:30
Junsheng121
d184291c7d null-text-inversion-pipeline-implementation (#6329)
* null-text-inversion-implementation

* edited

* edited

* edited

* edited

* edited

* edit

* makestyle

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-05 11:35:21 +01:00
Sayak Paul
0a0bb526aa [LoRA depcrecation] LoRA depcrecation trilogy (#6450)
* edebug

* debug

* more debug

* more more debug

* remove tests for LoRAAttnProcessors.

* rename
2024-01-05 15:48:20 +05:30
Linoy Tsaban
2fada8dc1b [bug fix] fixes #6444 - checkpointing save issue in advanced dreambooth lora sdxl script (#6464)
* unwrap text encoder when saving hook only for full text encoder tuning

* unwrap text encoder when saving hook only for full text encoder tuning

* save embeddings in each checkpoint as well

* save embeddings in each checkpoint as well

* save embeddings in each checkpoint as well

* Update examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-05 15:35:24 +05:30
jiqing-feng
f2d51a28f7 Intel Gen 4 Xeon and later support bf16 (#6367)
* Intel Gen 4 Xeon and later support bf16

* fix bf16 notes
2024-01-05 11:47:28 +05:30
Horseee
811fd06292 [Doc] Add DeepCache in section optimization/General optimizations (#6390)
* add documentation for DeepCache

* fix typo

* add wandb url for DeepCache

* fix some typos

* add item in _toctree.yml

* update formats for arguments

* Update deepcache.md

* Update docs/source/en/optimization/deepcache.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* add StableDiffusionXLPipeline in doc

* Separate SDPipeline and SDXLPipeline

* Add the paper link of ablation experiments for hyper-parameters

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-01-05 09:57:08 +05:30
dg845
f3d1333e02 Improve LCM(-LoRA) Distillation Scripts (#6420)
* Make WDS pipeline interpolation type configurable.

* Make the VAE encoding batch size configurable.

* Make lora_alpha and lora_dropout configurable for LCM LoRA scripts.

* Generalize scalings_for_boundary_conditions function and make the timestep scaling configurable.

* Make LoRA target modules configurable for LCM-LoRA scripts.

* Move resolve_interpolation_mode to src/diffusers/training_utils.py and make interpolation type configurable in non-WDS script.

* apply suggestions from review
2024-01-05 06:55:13 +05:30
Steven Liu
acd926f4f2 [docs] Fix local links (#6440)
fix local links
2024-01-04 09:59:11 -08:00
Lucain
691d8d3e15 Respect offline mode when loading pipeline (#6456)
* Respect offline mode when loading model

* default to local entry if connectionerror
2024-01-04 17:05:55 +01:00
Sayak Paul
e7c0af5e71 add: amused paper link. (#6453) 2024-01-04 13:44:54 +05:30
Sayak Paul
107e02160a [LoRA tests] fix stuff related to assertions arising from the recent changes. (#6448)
* debug

* debug test_with_different_scales_fusion_equivalence

* use the right method.

* place it right.

* let's see.

* let's see again

* alright then.

* add a comment.
2024-01-04 12:55:15 +05:30
sayakpaul
6dbef45e6e Revert "debug"
This reverts commit 7715e6c31c.
2024-01-04 10:39:38 +05:30
sayakpaul
7715e6c31c debug 2024-01-04 10:39:00 +05:30
sayakpaul
05b3d36a25 Revert "debug"
This reverts commit fb4aec0ce3.
2024-01-04 10:38:04 +05:30
sayakpaul
fb4aec0ce3 debug 2024-01-04 10:37:28 +05:30
Sayak Paul
63de23e3db disable running peft non-peft lora test in the peft env. (#6437)
* disable running peft non-peft lora test in the peft env.

* Empty-Commit
2024-01-04 10:18:46 +05:30
Chi
2993257f2a Batter way to write binarize() function. (#6394)
* I added a new doc string to the class. This is more flexible to understanding other developers what are doing and where it's using.

* Update src/diffusers/models/unet_2d_blocks.py

This changes suggest by maintener.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/models/unet_2d_blocks.py

Add suggested text

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update unet_2d_blocks.py

I changed the Parameter to Args text.

* Update unet_2d_blocks.py

proper indentation set in this file.

* Update unet_2d_blocks.py

a little bit of change in the act_fun argument line.

* I run the black command to reformat style in the code

* Update unet_2d_blocks.py

similar doc-string add to have in the original diffusion repository.

* Batter way to write binarize function

* Solve check_code_quality error

* My mistake to run pull request but not reformated file

* Update image_processor.py

* remove extra variable and space

* Update image_processor.py

* Run ruff libarary to reformat my file

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-04 09:32:08 +05:30
Sayak Paul
aad18faa3e Update README_sdxl.md to update the LR (#6432)
Update README_sdxl.md
2024-01-03 20:55:51 +05:30
Sayak Paul
d700140076 [LoRA deprecation] handle rest of the stuff related to deprecated lora stuff. (#6426)
* handle rest of the stuff related to deprecated lora stuff.

* fix: copies

* don't modify the uNet in-place.

* fix: temporal autoencoder.

* manually remove lora layers.

* don't copy unet.

* alright

* remove lora attn processors from unet3d

* fix: unet3d.

* styl

* Empty-Commit
2024-01-03 20:54:09 +05:30
Sayak Paul
2e4dc3e25d [LoRA] add: test to check if peft loras are loadable in non-peft envs. (#6400)
* add: test to check if peft loras are loadable in non-peft envs.

* add torch_device approrpiately.

* fix: get_dummy_inputs().

* test logits.

* rename

* debug

* debug

* fix: generator

* new assertion values after fixing the seed.

* shape

* remove print statements and settle this.

* to update values.

* change values when lora config is initialized under a fixed seed.

* update colab link

* update notebook link

* sanity restored by getting the exact same values without peft.
2024-01-03 09:57:49 +05:30
YiYi Xu
3e2961f0b4 [doc] update inpaint doc to use apply_overlay (#6364)
add doc

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-01-02 11:16:36 -10:00
Vinh H. Pham
79c380bc80 Correct how apply_overlay read crop_coords (#6417)
correct reading variables
2024-01-02 19:31:12 +01:00
Aryan V S
e30b661437 Update lpw_xl pipeline to latest diffusers (#6411)
* add clip_skip, freeu, qkv

* fix

* add ip-adapter support

* callback on step end

* update

* fix NoneType bug

* fix

* add guidance scale embedding

* add textual inversion
2024-01-02 16:28:45 +01:00
Linoy Tsaban
b4077af212 [bug fix] using snr gamma and prior preservation loss in the dreambooth lora sdxl training scripts (#6356)
* change timesteps used to calculate snr when --with_prior_preservation is enabled

* change timesteps used to calculate snr when --with_prior_preservation is enabled (canonical script)

* style

* revert canonical script to before snr gamma change

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-02 09:21:39 -06:00
Daniel Socek
9f2bff502e [svd] fix noise_aug_strength type in svd pipe (#6389) 2024-01-02 14:45:07 +01:00
CyrusVorwald
0cb92717f9 add StableDiffusionXLControlNetInpaintPipeline to auto pipeline (#6302)
* add StableDiffusionXLControlNetInpaintPipeline to auto pipeline

* fixed style
2024-01-02 14:44:48 +01:00
Fabio Rigano
86714b72d0 Add unload_ip_adapter method (#6192)
* Add unload_ip_adapter method

* Update attn_processors with original layers

* Add test

* Use set_default_attn_processor

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-02 14:40:46 +01:00
Sayak Paul
61f6c5472a [LoRA] Remove the use of depcrecated loRA functionalities such as LoRAAttnProcessor (#6369)
* start deprecating loraattn.

* fix

* wrap into unet_lora_state_dict

* utilize text_encoder_lora_params

* utilize text_encoder_attn_modules

* debug

* debug

* remove print

* don't use text encoder for test_stable_diffusion_lora

* load the procs.

* set_default_attn_processor

* fix: set_default_attn_processor call.

* fix: lora_components[unet_lora_params]

* checking for 3d.

* 3d.

* more fixes.

* debug

* debug

* debug

* debug

* more debug

* more debug

* more debug

* more debug

* more debug

* more debug

* hack.

* remove comments and prep for a PR.

* appropriate set_lora_weights()

* fix

* fix: test_unload_lora_sd

* fix: test_unload_lora_sd

* use dfault attebtion processors.

* debu

* debug nan

* debug nan

* debug nan

* use NaN instead of inf

* remove comments.

* fix: test_text_encoder_lora_state_dict_unchanged

* attention processor default

* default attention processors.

* default

* style
2024-01-02 18:14:04 +05:30
lookas
17546020fc Fix #6409 (#6410)
* Update value_guided_sampling.py

Fix #6409

* Comply code style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-02 11:16:21 +05:30
2510
8a366b835c Fix gradient-checkpointing option is ignored in SDXL+LoRA training. (#6388) (#6402)
* Fix gradient-checkpointing option is ignored in SDXL+LoRA training. (#6388)

* Fix gradient-checkpointing option is ignored in SD+LoRA training.

* Fix gradient checkpoint is not applied to text encoders. (SDXL+LoRA)

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-01 08:51:04 +05:30
Sayak Paul
61d223c884 add: CUDA graph details. (#6408) 2023-12-31 13:43:26 +05:30
apolinário
bf725e044e Add new WebUI conversion state_dict_utils to __init__ utils (#6404)
* Add new state_dict_utils to __init__ utils

* style

---------

Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com>
2023-12-30 09:31:39 -06:00
apolinário
1622265e13 Add WebUI format support to Advanced Training Script (#6403)
* Add WebUI format support to Advanced Training Script

* style

---------

Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com>
2023-12-30 08:45:49 -06:00
apolinário
0b63ad5ad5 Create convert_diffusers_sdxl_lora_to_webui.py (#6395)
* Create convert_diffusers_sdxl_lora_to_webui.py

* Move some conversion logic to utils

* fix logging import

* Add usage example

---------

Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com>
2023-12-30 08:15:11 -06:00
Sayak Paul
6a376ceea2 [LoRA] remove unnecessary components from lora peft test suite (#6401)
remove unnecessary components from lora peft suite/
2023-12-30 18:25:40 +05:30
gzguevara
9f283b01d2 changed w&b report link (#6387) 2023-12-29 19:49:11 +05:30
Sayak Paul
203724e9d9 [Docs] add note on fp16 in fast diffusion (#6380)
add note on fp16
2023-12-29 09:38:50 +05:30
gzguevara
e7044a4221 multi-subject-dreambooth-inpainting with 🤗 datasets (#6378)
* files added

* fixing code quality

* fixing code quality

* fixing code quality

* fixing code quality

* sorted import block

* seperated import wandb

* ruff on script

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-29 09:33:49 +05:30
Sayak Paul
034b39b8cb [docs] add details concerning diffusers-specific bits. (#6375)
add details concerning diffusers-specific bits.
2023-12-28 23:12:49 +05:30
Sayak Paul
2db73f4a50 remove delete documentation trigger workflows. (#6373) 2023-12-28 18:26:14 +05:30
Adrian Punga
84d7faebe4 Fix support for MPS in KDPM2AncestralDiscreteScheduler (#6365)
Fix support for MPS

MPS doesn't support float64
2023-12-28 10:22:02 +01:00
YiYi Xu
4c483deb90 [refactor embeddings] gligen + ip-adapter (#6244)
* refactor ip-adapter-imageproj, gligen

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2023-12-27 18:48:42 -10:00
Sayak Paul
1ac07d8a8d [Training examples] Follow up of #6306 (#6346)
* add to dreambooth lora.

* add: t2i lora.

* add: sdxl t2i lora.

* style

* lcm lora sdxl.

* unwrap

* fix: enable_adapters().
2023-12-28 07:37:50 +05:30
apolinário
1fff527702 Fix keys for lora format on advanced training scripts (#6361)
fix keys for lora format on advanced training scripts
2023-12-27 11:38:03 -06:00
apolinário
645a62bf3b Add PEFT to advanced training script (#6294)
* Fix ProdigyOPT in SDXL Dreambooth script

* style

* style

* Add PEFT to Advanced Training Script

* style

* style

*  style 

* change order for logic operation

* add lora alpha

* style

* Align PEFT to new format

* Update train_dreambooth_lora_sdxl_advanced.py

Apply #6355 fix

---------

Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com>
2023-12-27 10:00:32 -03:00
Dhruv Nair
6414d4e4f9 Fix chunking in SVD (#6350)
fix
2023-12-27 13:07:41 +01:00
Andy W
43672b4a22 Fix "push_to_hub only create repo in consistency model lora SDXL training script" (#6102)
* fix

* style fix

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-27 15:25:19 +05:30
dg845
9df3d84382 Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. (#6279)
Fix bug when creating the guidance embeddings using multiple GPUs.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-27 14:25:21 +05:30
Jianqi Pan
c751449011 fix: use retrieve_latents (#6337) 2023-12-27 10:44:26 +05:30
Dhruv Nair
c1e8bdf1d4 Move ControlNetXS into Community Folder (#6316)
* update

* update

* update

* update

* update

* make style

* remove docs

* update

* move to research folder.

* fix-copies

* remove _toctree entry.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-27 08:15:23 +05:30
Sayak Paul
78b87dc25a [LoRA] make LoRAs trained with peft loadable when peft isn't installed (#6306)
* spit diffusers-native format from the get go.

* rejig the peft_to_diffusers mapping.
2023-12-27 08:01:10 +05:30
Will Berman
0af12f1f8a amused update links to new repo (#6344)
* amused update links to new repo

* lint
2023-12-26 22:46:28 +01:00
Justin Ruan
6e123688dc Remove unused parameters and fixed FutureWarning (#6317)
* Remove unused parameters and fixed `FutureWarning`

* Fixed wrong config instance

* update unittest for `DDIMInverseScheduler`
2023-12-26 22:09:10 +01:00
YiYi Xu
f0a588b8e2 adding auto1111 features to inpainting pipeline (#6072)
* add inpaint_full_res

* fix

* update

* move get_crop_region to image processor

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* move apply_overlay to image processor

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-26 10:20:29 -10:00
priprapre
fa31704420 [SDXL-IP2P] Update README_sdxl, Replace the link for wandb log with the correct run (#6270)
Replace the link for wandb log with the correct run
2023-12-26 21:13:11 +01:00
Sayak Paul
9d79991da0 [Docs] fix: video rendering on svd. (#6330)
fix: video rendering on svd.
2023-12-26 21:05:22 +01:00
Will Berman
7d865ac9c6 amused other pipelines docs (#6343)
other pipelines
2023-12-26 20:20:32 +01:00
Dhruv Nair
fb02316db8 Add AnimateDiff conversion scripts (#6340)
* add scripts

* update
2023-12-26 22:40:00 +05:30
Dhruv Nair
98a2b3d2d8 Update Animatediff docs (#6341)
* update

* update

* update
2023-12-26 22:39:46 +05:30
Dhruv Nair
2026ec0a02 Interruptable Pipelines (#5867)
* add interruptable pipelines

* add tests

* updatemsmq

* add interrupt property

* make fix copies

* Revert "make fix copies"

This reverts commit 914b35332b.

* add docs

* add tutorial

* Update docs/source/en/tutorials/interrupting_diffusion_process.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tutorials/interrupting_diffusion_process.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update

* fix quality issues

* fix

* update

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-26 22:39:26 +05:30
dg845
3706aa3305 Add rescale_betas_zero_snr Argument to DDPMScheduler (#6305)
* Add rescale_betas_zero_snr argument to DDPMScheduler.

* Propagate rescale_betas_zero_snr changes to DDPMParallelScheduler.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-26 17:54:30 +01:00
Sayak Paul
d4f10ea362 [Diffusion fast] add doc for diffusion fast (#6311)
* add doc for diffusion fast

* add entry to _toctree

* Apply suggestions from code review

* fix titlew

* fix: title entry

* add note about fuse_qkv_projections
2023-12-26 22:19:55 +05:30
Younes Belkada
3aba99af8f [Peft / Lora] Add adapter_names in fuse_lora (#5823)
* add adapter_name in fuse

* add tesrt

* up

* fix CI

* adapt from suggestion

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* change to `require_peft_version_greater`

* change variable names in test

* Update src/diffusers/loaders/lora.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* break into 2 lines

* final comments

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-12-26 16:54:47 +01:00
Sayak Paul
6683f97959 [Training] Add datasets version of LCM LoRA SDXL (#5778)
* add: script to train lcm lora for sdxl with 🤗 datasets

* suit up the args.

* remove comments.

* fix num_update_steps

* fix batch unmarshalling

* fix num_update_steps_per_epoch

* fix; dataloading.

* fix microconditions.

* unconditional predictions debug

* fix batch size.

* no need to use use_auth_token

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* make vae encoding batch size an arg

* final serialization in kohya

* style

* state dict rejigging

* feat: no separate teacher unet.

* debug

* fix state dict serialization

* debug

* debug

* debug

* remove prints.

* remove kohya utility and make style

* fix serialization

* fix

* add test

* add peft dependency.

* add: peft

* remove peft

* autocast device determination from accelerator

* autocast

* reduce lora rank.

* remove unneeded space

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* style

* remove prompt dropout.

* also save in native diffusers ckpt format.

* debug

* debug

* debug

* better formation of the null embeddings.

* remove space.

* autocast fixes.

* autocast fix.

* hacky

* remove lora_sayak

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* style

* make log validation leaner.

* move back enabled in.

* fix: log_validation call.

* add: checkpointing tests

* taking my chances to see if disabling autocasting has any effect?

* start debugging

* name

* name

* name

* more debug

* more debug

* index

* remove index.

* print length

* print length

* print length

* move unet.train() after add_adapter()

* disable some prints.

* enable_adapters() manually.

* remove prints.

* some changes.

* fix params_to_optimize

* more fixes

* debug

* debug

* remove print

* disable grad for certain contexts.

* Add support for IPAdapterFull (#5911)

* Add support for IPAdapterFull


Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix a bug in `add_noise` function  (#6085)

* fix

* copies

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>

* [Advanced Diffusion Script] Add Widget default text (#6100)

add widget

* [Advanced Training Script] Fix pipe example (#6106)

* IP-Adapter for StableDiffusionControlNetImg2ImgPipeline (#5901)

* adapter for StableDiffusionControlNetImg2ImgPipeline

* fix-copies

* fix-copies

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* IP adapter support for most pipelines (#5900)

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_instruct_pix2pix.py

* update tests

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_panorama.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion_safe/pipeline_stable_diffusion_safe.py

* support ip-adapter in src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_text2img.py

* support ip-adapter in src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_img2img.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py

* revert changes to sd_attend_and_excite and sd_upscale

* make style

* fix broken tests

* update ip-adapter implementation to latest

* apply suggestions from review

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* fix: lora_alpha

* make vae casting conditional/

* param upcasting

* propagate comments from https://github.com/huggingface/diffusers/pull/6145

Co-authored-by: dg845 <dgu8957@gmail.com>

* [Peft] fix saving / loading when unet is not "unet" (#6046)

* [Peft] fix saving / loading when unet is not "unet"

* Update src/diffusers/loaders/lora.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* undo stablediffusion-xl changes

* use unet_name to get unet for lora helpers

* use unet_name

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* [Wuerstchen] fix fp16 training and correct lora args (#6245)

fix fp16 training

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* [docs] fix: animatediff docs (#6339)

fix: animatediff docs

* add: note about the new script in readme_sdxl.

* Revert "[Peft] fix saving / loading when unet is not "unet" (#6046)"

This reverts commit 4c7e983bb5.

* Revert "[Wuerstchen] fix fp16 training and correct lora args (#6245)"

This reverts commit 0bb9cf0216.

* Revert "[docs] fix: animatediff docs (#6339)"

This reverts commit 11659a6f74.

* remove tokenize_prompt().

* assistive comments around enable_adapters() and diable_adapters().

---------

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Fabio Rigano <57982783+fabiorigano@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
Co-authored-by: Charchit Sharma <charchitsharma11@gmail.com>
Co-authored-by: Aryan V S <contact.aryanvs@gmail.com>
Co-authored-by: dg845 <dgu8957@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2023-12-26 21:22:05 +05:30
Sayak Paul
4e7b0cb396 [docs] fix: animatediff docs (#6339)
fix: animatediff docs
2023-12-26 19:13:49 +05:30
Kashif Rasul
35b81fffae [Wuerstchen] fix fp16 training and correct lora args (#6245)
fix fp16 training

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-26 11:40:04 +01:00
Kashif Rasul
e0d8c910e9 [Peft] fix saving / loading when unet is not "unet" (#6046)
* [Peft] fix saving / loading when unet is not "unet"

* Update src/diffusers/loaders/lora.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* undo stablediffusion-xl changes

* use unet_name to get unet for lora helpers

* use unet_name

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-26 11:39:28 +01:00
dg845
a3d31e3a3e Change LCM-LoRA README Script Example Learning Rates to 1e-4 (#6304)
Change README LCM-LoRA example learning rates to 1e-4.
2023-12-25 21:29:20 +05:30
Jianqi Pan
84c403aedb fix: cannot set guidance_scale (#6326)
fix: set guidance_scale
2023-12-25 21:16:57 +05:30
Sayak Paul
f4b0b26f7e [Tests] Speed up example tests (#6319)
* remove validation args from textual onverson tests

* reduce number of train steps in textual inversion tests

* fix: directories.

* debig

* fix: directories.

* remove validation tests from textual onversion

* try reducing the time of test_text_to_image_checkpointing_use_ema

* fix: directories

* speed up test_text_to_image_checkpointing

* speed up test_text_to_image_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* fix

* speed up test_instruct_pix2pix_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* set checkpoints_total_limit to 2.

* test_text_to_image_lora_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints speed up

* speed up test_unconditional_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* debug

* fix: directories.

* speed up test_instruct_pix2pix_checkpointing_checkpoints_total_limit

* speed up: test_controlnet_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* speed up test_controlnet_sdxl

* speed up dreambooth tests

* speed up test_dreambooth_lora_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* speed up test_custom_diffusion_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* speed up test_text_to_image_lora_sdxl_text_encoder_checkpointing_checkpoints_total_limit

* speed up # checkpoint-2 should have been deleted

* speed up examples/text_to_image/test_text_to_image.py::TextToImage::test_text_to_image_checkpointing_checkpoints_total_limit

* additional speed ups

* style
2023-12-25 19:50:48 +05:30
Sayak Paul
89459a5d56 fix: lora peft dummy components (#6308)
* fix: lora peft dummy components

* fix: dummy components
2023-12-25 11:26:45 +05:30
Sayak Paul
008d9818a2 fix: t2i apdater paper link (#6314) 2023-12-25 10:45:14 +05:30
mwkldeveloper
2d43094ffc fix RuntimeError: Input type (float) and bias type (c10::Half) should be the same in train_text_to_image_lora.py (#6259)
* fix RuntimeError: Input type (float) and bias type (c10::Half) should be the same

* format source code

* format code

* remove the autocast blocks within the pipeline

* add autocast blocks to pipeline caller in train_text_to_image_lora.py
2023-12-24 14:34:35 +05:30
Celestial Phineas
7c05b975b7 Fix typos in the ValueError for a nested image list as StableDiffusionControlNetPipeline input. (#6286)
Fixed typos in the `ValueError` for a nested image list as input.
2023-12-24 14:32:24 +05:30
Dhruv Nair
fe574c8b29 LoRA Unfusion test fix (#6291)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-24 14:31:48 +05:30
Sayak Paul
90b9479903 [LoRA PEFT] fix LoRA loading so that correct alphas are parsed (#6225)
* initialize alpha too.

* add: test

* remove config parsing

* store rank

* debug

* remove faulty test
2023-12-24 09:59:41 +05:30
apolinário
df76a39e1b Fix Prodigy optimizer in SDXL Dreambooth script (#6290)
* Fix ProdigyOPT in SDXL Dreambooth script

* style

* style
2023-12-22 06:42:04 -06:00
Bingxin Ke
3369bc810a [Community Pipeline] Add Marigold Monocular Depth Estimation (#6249)
* [Community Pipeline] Add Marigold Monocular Depth Estimation

- add single-file pipeline
- update README

* fix format - add one blank line

* format script with ruff

* use direct image link in example code

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-22 15:41:46 +05:30
Pedro Cuenca
7fe47596af Allow diffusers to load with Flax, w/o PyTorch (#6272) 2023-12-22 09:37:30 +01:00
Dhruv Nair
59d1caa238 Remove peft tests from old lora backend tests (#6273)
update
2023-12-22 13:35:52 +05:30
Dhruv Nair
c022e52923 Remove ONNX inpaint legacy (#6269)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-22 13:35:21 +05:30
Will Berman
4039815276 open muse (#5437)
amused

rename

Update docs/source/en/api/pipelines/amused.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

AdaLayerNormContinuous default values

custom micro conditioning

micro conditioning docs

put lookup from codebook in constructor

fix conversion script

remove manual fused flash attn kernel

add training script

temp remove training script

add dummy gradient checkpointing func

clarify temperatures is an instance variable by setting it

remove additional SkipFF block args

hardcode norm args

rename tests folder

fix paths and samples

fix tests

add training script

training readme

lora saving and loading

non-lora saving/loading

some readme fixes

guards

Update docs/source/en/api/pipelines/amused.md

Co-authored-by: Suraj Patil <surajp815@gmail.com>

Update examples/amused/README.md

Co-authored-by: Suraj Patil <surajp815@gmail.com>

Update examples/amused/train_amused.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

vae upcasting

add fp16 integration tests

use tuple for micro cond

copyrights

remove casts

delegate to torch.nn.LayerNorm

move temperature to pipeline call

upsampling/downsampling changes
2023-12-21 11:40:55 -08:00
Sayak Paul
5b186b7128 [Refactor] move ldm3d out of stable_diffusion. (#6263)
ldm3d.
2023-12-21 18:59:55 +05:30
Sayak Paul
ab0459f2b7 [Deprecated pipelines] remove pix2pix zero from init (#6268)
remove pix2pix zero from init
2023-12-21 18:17:28 +05:30
Sayak Paul
9c7cc36011 [Refactor] move panorama out of stable_diffusion (#6262)
* move panorama out.

* fix: diffedit

* fix: import.

* fix: impirt
2023-12-21 18:17:05 +05:30
Sayak Paul
325f6c53ed [Refactor] move attend and excite out of stable_diffusion. (#6261)
* move attend and excite out.

* fix: import

* fix diffedit
2023-12-21 16:49:32 +05:30
Benjamin Bossan
43979c2890 TST Fix LoRA test that fails with PEFT >= 0.7.0 (#6216)
See #6185 for context.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-21 11:50:05 +01:00
Sayak Paul
9ea6ac1b07 [Refactor] move sag out of stable_diffusion (#6264)
move sag out of .
2023-12-21 16:09:49 +05:30
Sayak Paul
2c34c7d6dd [Refactor] move gligen out of stable diffusion. (#6265)
* move gligen out of stable diffusion.

* fix: import

* fix import module
2023-12-21 15:26:52 +05:30
Sayak Paul
bffadde126 [Refactor] move k diffusion out of stable_diffusion (#6267)
move k diffusion out of stable_diffusion
2023-12-21 15:24:24 +05:30
YShow
35a969d297 [Training] remove depcreated method from lora scripts again (#6266)
* remove depcreated method from lora scripts

* check code quality
2023-12-21 14:17:52 +05:30
sayakpaul
c5ff469d0e Revert "move attend and excite out of stable_diffusion"
This reverts commit bcecfbc873.
2023-12-21 12:35:58 +05:30
sayakpaul
bcecfbc873 move attend and excite out of stable_diffusion 2023-12-21 12:35:09 +05:30
Sayak Paul
6269045c5b [Refactor] move diffedit out of stable_diffusion (#6260)
* move diffedit out of stable_diffuson

* fix: import

* style

* fix: import
2023-12-21 12:26:36 +05:30
lvzi
6ca9c4af05 fix: unscale fp16 gradient problem & potential error (#6086) (#6231)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-21 09:09:26 +05:30
dependabot[bot]
0532cece97 Bump transformers from 4.34.0 to 4.36.0 in /examples/research_projects/realfill (#6255)
Bump transformers in /examples/research_projects/realfill

Bumps [transformers](https://github.com/huggingface/transformers) from 4.34.0 to 4.36.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.34.0...v4.36.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-21 09:03:17 +05:30
Sayak Paul
22b45304bf [Refactor upsamplers and downsamplers] separate out upsamplers and downsamplers. (#6128)
* separate out upsamplers and downsamplers.

* import all the necessary blocks in resnet for backward comp.

* move upsample2d and downsample2d to utils.

* move downsample_2d to downsamplers.py

* apply feedback

* fix import

* samplers -> sampling
2023-12-20 21:01:33 +05:30
Beinsezii
457abdf2cf EulerAncestral add rescale_betas_zero_snr (#6187)
* EulerAncestral add `rescale_betas_zero_snr`

Uses same infinite sigma fix from EulerDiscrete. Interestingly the
ancestral version had the opposite problem: too much contrast instead of
too little.

* UT for EulerAncestral `rescale_betas_zero_snr`

* EulerAncestral upcast samples during step()

It helps this scheduler too, particularly when the model is using bf16.

While the noise dtype is still the model's it's automatically upcasted
for the add so all it affects is determinism.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-20 13:09:25 +05:30
hako-mikan
ff43dba7ea [Fix] Fix Regional Prompting Pipeline (#6188)
* Update regional_prompting_stable_diffusion.py

* reformat

* reformat

* reformat

* reformat

* reformat

* reformat

* reformat

* regormat

* reformat

* reformat

* reformat

* reformat

* Update regional_prompting_stable_diffusion.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-20 10:37:19 +05:30
Steven Liu
5433962992 [docs] Batched seeds (#6237)
batched seed
2023-12-19 16:50:18 -08:00
raven
df476d9f63 [Docs] Fix a code example in the ControlNet Inpainting documentation (#6236)
fix document on masked image in inpainting controlnet
2023-12-19 13:14:37 -08:00
YiYi Xu
3e71a20650 [refactor embeddings]pixart-alpha (#6212)
pixart-alpha

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2023-12-19 07:07:24 -10:00
Sayak Paul
bf40d7d82a add peft dependency to fast push tests (#6229)
* add peft dependency

* add peft dependency at the correct place.
2023-12-19 13:26:25 +05:30
Dhruv Nair
32ff4773d4 ControlNetXS fixes. (#6228)
update
2023-12-19 11:58:34 +05:30
Sayak Paul
288ceebea5 [T2I LoRA training] fix: unscale fp16 gradient problem (#6119)
* fix: unscale fp16 gradient problem

* fix for dreambooth lora sdxl

* make the type-casting conditional.

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-19 09:54:17 +05:30
Sayak Paul
9221da4063 fix: init for vae during pixart tests (#6215)
* fix: init for vae during pixart tests

* print the values

* add flatten

* correct assertion value for test_inference

* correct assertion values for test_inference_non_square_images

* run styling

* debug test_inference_with_multiple_images_per_prompt

* fix assertion values for test_inference_with_multiple_images_per_prompt
2023-12-18 18:16:57 -10:00
YiYi Xu
57fde871e1 offload the optional module image_encoder (#6151)
* offload image_encoder

* add test

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-18 15:10:01 -10:00
Fabio Rigano
68e962395c Add converter method for ip adapters (#6150)
* Add converter method for ip adapters

* Move converter method

* Update to image proj converter

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-18 13:46:43 -10:00
Dhruv Nair
781775ea56 Slow Test for Pipelines minor fixes (#6221)
update
2023-12-19 00:45:51 +05:30
Patrick von Platen
fa3c86beaf [SVD] Fix guidance scale (#6002)
* [SVD] Fix guidance scale

* make style
2023-12-18 19:33:24 +01:00
Haofan Wang
7d0a47f387 Update train_text_to_image_lora.py (#6144)
* Update train_text_to_image_lora.py

* Fix typo?

---------

Co-authored-by: M. Tolga Cangöz <46008593+standardAI@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-18 19:33:05 +01:00
Aryan V S
67b3d3267e Support img2img and inpaint in lpw-xl (#6114)
* add img2img and inpaint support to lpw-xl

* update community README

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-18 19:19:11 +01:00
TilmannR
4e77056885 Update README.md (#6191)
Typo: The script for LoRA training is `train_text_to_image_lora_prior.py` not `train_text_to_image_prior_lora.py`.

Alternatively you could rename the file and keep the README.md unchanged.
2023-12-18 19:08:29 +01:00
Dhruv Nair
a0c54828a1 Deprecate Pipelines (#6169)
* deprecate pipe

* make style

* update

* add deprecation message

* format

* remove tests for deprecated pipelines

* remove deprecation message

* make style

* fix copies

* clean up

* clean

* clean

* clean

* clean up

* clean up

* clean up toctree

* clean up

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-18 23:08:29 +05:30
Patrick von Platen
8d891e6e1b [Torch Compile] Fix torch compile for svd vae (#6217) 2023-12-18 18:21:17 +01:00
Patrick von Platen
cce1fe2d41 [Text-to-Video] Clean up pipeline (#6213)
* make style

* make style

* make style

* make style
2023-12-18 18:21:09 +01:00
Abin Thomas
d816bcb5e8 Fix t2i. blog url (#6205) 2023-12-18 09:12:28 -08:00
d8ahazard
6976cab7ca Fix possible re-conversion issues after extracting from safetensors (#6097)
* Fix possible re-conversion issues after extracting from diffusers

Properly rename specific vae keys.

* Whoops
2023-12-18 11:51:20 +01:00
Dhruv Nair
fcbed3fa79 Fix SDXL Inpainting from single file with Refiner Model (#6147)
* update

* update

* update
2023-12-18 11:45:37 +01:00
Sayak Paul
b98b314b7a [Training] remove depcreated method from lora scripts. (#6207)
remove depcreated method from lora scripts.
2023-12-18 15:52:43 +05:30
Omar Sanseviero
74558ff65b Nit fix to training params (#6200) 2023-12-18 11:06:16 +01:00
Yudong Jin
49644babd3 Fix the test script in examples/text_to_image/README.md (#6209)
* Update examples/text_to_image/README.md

* Update examples/text_to_image/README.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-18 15:36:00 +05:30
Sayak Paul
56b3b21693 [Refactor autoencoders] feat: introduce autoencoders module (#6129)
* feat: introduce autoencoders module

* more changes for styling and copy fixing

* path changes in the docs.

* fix: import structure in init.

* fix controlnetxs import
2023-12-18 12:42:15 +05:30
Sayak Paul
9cef07da5a [Benchmarks] fix: lcm benchmarking reporting (#6198)
* fix: lcm benchmarking reporting

* fix generate_csv_dict call.
2023-12-17 15:32:11 +05:30
Sayak Paul
2d94c7838e [Core] feat: enable fused attention projections for other SD and SDXL pipelines (#6179)
* feat: enable fused attention projections for other SD and SDXL pipelines

* add: test for SD fused projections.
2023-12-16 08:45:54 +05:30
Sayak Paul
a81334e3f0 [LoRA] add an error message when dealing with _best_guess_weight_name ofline (#6184)
* add an error message when dealing with _best_guess_weight_name ofline

* simplify condition
2023-12-16 08:36:08 +05:30
Dhruv Nair
d704a730cd Compile test fix (#6104)
* update

* update
2023-12-15 18:34:46 +05:30
dg845
49db233b35 Clean Up Comments in LCM(-LoRA) Distillation Scripts. (#6145)
* Clean up comments in LCM(-LoRA) distillation scripts.

* Calculate predicted source noise noise_pred correctly for all prediction_types.

* make style

* apply suggestions from review

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-15 18:18:16 +05:30
Dhruv Nair
93ea26f272 Add PEFT to training deps (#6148)
add peft to training deps

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-15 08:39:59 +05:30
Dhruv Nair
f5dfe2a8b0 LoRA test fixes (#6163)
* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-15 08:39:41 +05:30
Patrick von Platen
4836cfad98 [Sigmas] Keep sigmas on CPU (#6173)
* correct

* Apply suggestions from code review

* make style
2023-12-15 07:43:18 +05:30
Kuba
1ccbfbb663 [docs] Add missing \ in lora.md (#6174) 2023-12-14 16:55:43 -08:00
Linoy Tsaban
29dfe22a8e [advanced dreambooth lora sdxl training script] load pipeline for inference only if validation prompt is used (#6171)
* load pipeline for inference only if validation prompt is used

* move things outside

* load pipeline for inference only if validation prompt is used

* fix readme when validation prompt is used

---------

Co-authored-by: linoytsaban <linoy@huggingface.co>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
2023-12-14 11:45:33 -06:00
Aryan V S
56806cdbfd Add missing subclass docs, Fix broken example in SD_safe (#6116)
* fix broken example in pipeline_stable_diffusion_safe

* fix typo in pipeline_stable_diffusion_pix2pix_zero

* add missing docs

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-14 09:20:30 -08:00
Steven Liu
8ccc76ab37 [docs] IP-Adapter API doc (#6140)
add ip-adapter

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-14 09:19:37 -08:00
Monohydroxides
c46711e895 [Community] Add SDE Drag pipeline (#6105)
* Add community pipeline: sde_drag.py

* Update README.md

* Update README.md

Update example code and visual example

* Update sde_drag.py

Update code example.
2023-12-14 20:47:20 +05:30
Sayak Paul
1d686bac81 [feat: Benchmarking Workflow] add stuff for a benchmarking workflow (#5839)
* add poc for benchmarking workflow.

* import

* fix argument

* fix: argument

* fix: path

* fix

* fix

* path

* output csv files.

* workflow cleanup

* append token

* add utility to push to hf dataset

* fix: kw arg

* better reporting

* fix: headers

* better formatting of the numbers.

* better type annotation

* fix: formatting

* moentarily disable check

* push results.

* remove disable check

* introduce base classes.

* img2img class

* add inpainting pipeline

* intoduce base benchmark class.

* add img2img and inpainting

* feat: utility to compare changes

* fix

* fix import

* add args

* basepath

* better exception handling

* better path handling

* fix

* fix

* remove

* ifx

* fix

* add: support for controlnet.

* image_url -> url

* move images to huggingface hub

* correct urls.

* root_ckpt

* flush before benchmarking

* don't install accelerate from source

* add runner

* simplify Diffusers Benchmarking step

* change runner

* fix: subprocess call.

* filter percentage values

* fix controlnet benchmark

* add t2i adapters.

* fix filter columns

* fix t2i adapter benchmark

* fix init.

* fix

* remove safetensors flag

* fix args print

* fix

* feat: run_command

* add adapter resolution mapping

* benchmark t2i adapter fix.

* fix adapter input

* fix

* convert to L.

* add flush() add appropriate places

* better filtering

* okay

* get env for torch

* convert to float

* fix

* filter out nans.

* better coment

* sdxl

* sdxl for other benchmarks.

* fix: condition

* fix: condition for inpainting

* fix: mapping for resolution

* fix

* include kandinsky and wuerstchen

* fix: Wuerstchen

* Empty-Commit

* [Community] AnimateDiff + Controlnet Pipeline (#5928)

* begin work on animatediff + controlnet pipeline

* complete todos, uncomment multicontrolnet, input checks

Co-Authored-By: EdoardoBotta <botta.edoardo@gmail.com>

* update

Co-Authored-By: EdoardoBotta <botta.edoardo@gmail.com>

* add example

* update community README

* Update examples/community/README.md

---------

Co-authored-by: EdoardoBotta <botta.edoardo@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* EulerDiscreteScheduler add `rescale_betas_zero_snr` (#6024)

* EulerDiscreteScheduler add `rescale_betas_zero_snr`

* Revert "[Community] AnimateDiff + Controlnet Pipeline (#5928)"

This reverts commit 821726d7c0.

* Revert "EulerDiscreteScheduler add `rescale_betas_zero_snr` (#6024)"

This reverts commit 3dc2362b5a.

* add SDXL turbo

* add lcm lora to the mix as well.

* fix

* increase steps to 2 when running turbo i2i

* debug

* debug

* debug

* fix for good

* fix and isolate better

* fuse lora so that torch compile works with peft

* fix: LCMLoRA

* better identification for LCM

* change to cron job

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Aryan V S <contact.aryanvs@gmail.com>
Co-authored-by: EdoardoBotta <botta.edoardo@gmail.com>
Co-authored-by: Beinsezii <39478211+Beinsezii@users.noreply.github.com>
2023-12-12 11:03:34 +05:30
M. Tolga Cangöz
0a401b95b7 [Docs] Fix typos (#6122)
Fix typos and trim trailing whitespaces
2023-12-11 10:55:28 -08:00
Edward Li
664e931bcb Correct type annotation for VaeImageProcessor.numpy_to_pil (#6111)
From `(np.ndarray) -> PIL.Image.Image` to `(np.ndarray) -> List[PIL.Image.Image]`.
2023-12-11 15:22:04 +05:30
Aryan V S
88bdd97ccd IP adapter support for most pipelines (#5900)
* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_instruct_pix2pix.py

* update tests

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_panorama.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion_safe/pipeline_stable_diffusion_safe.py

* support ip-adapter in src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_text2img.py

* support ip-adapter in src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_img2img.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py

* revert changes to sd_attend_and_excite and sd_upscale

* make style

* fix broken tests

* update ip-adapter implementation to latest

* apply suggestions from review

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-10 21:19:14 +05:30
Charchit Sharma
08b453e382 IP-Adapter for StableDiffusionControlNetImg2ImgPipeline (#5901)
* adapter for StableDiffusionControlNetImg2ImgPipeline

* fix-copies

* fix-copies

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-09 11:02:55 +05:30
apolinário
2a111bc9fe [Advanced Training Script] Fix pipe example (#6106) 2023-12-08 15:56:35 +01:00
apolinário
16e6997f0d [Advanced Diffusion Script] Add Widget default text (#6100)
add widget
2023-12-08 12:45:27 +01:00
YiYi Xu
3b9b98656e Fix a bug in add_noise function (#6085)
* fix

* copies

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2023-12-07 11:35:28 -10:00
Fabio Rigano
b65928b556 Add support for IPAdapterFull (#5911)
* Add support for IPAdapterFull


Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-07 06:40:39 -10:00
Beinsezii
6bf1ca2c79 EulerDiscreteScheduler add rescale_betas_zero_snr (#6024)
* EulerDiscreteScheduler add `rescale_betas_zero_snr`
2023-12-06 21:51:04 -10:00
Aryan V S
978dec9014 [Community] AnimateDiff + Controlnet Pipeline (#5928)
* begin work on animatediff + controlnet pipeline

* complete todos, uncomment multicontrolnet, input checks

Co-Authored-By: EdoardoBotta <botta.edoardo@gmail.com>

* update

Co-Authored-By: EdoardoBotta <botta.edoardo@gmail.com>

* add example

* update community README

* Update examples/community/README.md

---------

Co-authored-by: EdoardoBotta <botta.edoardo@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-06 21:01:41 -10:00
Dhruv Nair
79a7ab92d1 Fix clearing backend cache from device agnostic testing (#6075)
update
2023-12-07 11:18:31 +05:30
Younes Belkada
c2717317f0 [PEFT] Adapt example scripts to use PEFT (#5388)
* adapt example scripts to use PEFT

* Update examples/text_to_image/train_text_to_image_lora.py

* fix

* add for SDXL

* oops

* make sure to install peft

* fix

* fix

* fix dreambooth and lora

* more fixes

* add peft to requirements.txt

* fix

* final fix

* add peft version in requirements

* remove comment

* change variable names

* add few lines in readme

* add to reqs

* style

* fix issues

* fix lora dreambooth xl tests

* init_lora_weights to gaussian and add out proj where missing

* ammend requirements.

* ammend requirements.txt

* add correct peft versions

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-07 09:39:29 +05:30
Ian
bf7f9b49a2 Fix typing inconsistency in Euler discrete scheduler (#6052) 2023-12-06 23:45:16 +01:00
UmerHA
e192ae08d3 Add ControlNet-XS support (#5827)
* Check in 23-10-05

* check-in 23-10-06

* check-in 23-10-07 2pm

* check-in 23-10-08

* check-in 231009T1200

* check-in 230109

* checkin 231010

* init + forward run

* checkin

* checkin

* ControlNetXSModel is now saveable+loadable

* Forward works

* checkin

* Pipeline works with `no_control=True`

* checkin

* debug: save intermediate outputs of resnet

* checkin

* Understood time error + fixed connection error

* checkin

* checkin 231106T1600

* turned off detailled debug prints

* time debug logs

* small fix

* Separated control_scale for connections/time

* simplified debug logging

* Full denoising works with control scale = 0

* aligned logs

* Added control_attention_head_dim param

* Passing n_heads instead of dim_head into ctrl unet

* Fixed ctrl midblock bug

* Cleanup

* Fixed time dtype bug

* checkin

* 1. from_unet, 2. base passed, 3. all unet params

* checkin

* Finished docstrings

* cleanup

* make style

* checkin

* more tests pass

* Fixed tests

* removed debug logs

* make style + quality

* make fix-copies

* fixed documentation

* added cnxs to doc toc

* added control start/end param

* Update controlnetxs_sdxl.md

* tried to fix copies..

* Fixed norm_num_groups in from_unet

* added sdxl-depth test

* created SD2.1 controlnet-xs pipeline

* re-added debug logs

* Adjusting group norm ; readded logs

* Added debug log statements

* removed debug logs ; started tests for sd2.1

* updated sd21 tests

* fixed tests

* fixed tests

* slightly increased error tolerance for 1 test

* make style & quality

* Added docs for CNXS-SD

* make fix-copies

* Fixed sd compile test ; fixed gradient ckpointing

* vae downs = cnxs conditioning downs; removed guess

* make style & quality

* Fixed tests

* fixed test

* Incorporated review feedback

* simplified control model surgery

* fixed tests & make style / quality

* Updated docs; deleted pip & cursor files

* Rolled back minimal change to resnet

* Update resnet.py

* Update resnet.py

* Update src/diffusers/models/controlnetxs.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/models/controlnetxs.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Incorporated review feedback

* Update docs/source/en/api/pipelines/controlnetxs_sdxl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/controlnetxs.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/controlnetxs.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/controlnetxs.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/controlnetxs.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/controlnetxs.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/pipelines/controlnet_xs/pipeline_controlnet_xs.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/controlnetxs.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/pipelines/controlnet_xs/pipeline_controlnet_xs_sd_xl.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Incorporated doc feedback

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2023-12-06 23:33:47 +01:00
Steven Liu
87a09d66f3 [docs] SDXL Turbo (#6065)
api docs
2023-12-06 14:33:14 -08:00
Lucain
75ada25048 Harmonize HF environment variables + deprecate use_auth_token (#6066)
* Harmonize HF environment variables + deprecate use_auth_token

* fix import

* fix
2023-12-06 22:22:31 +01:00
Patrick von Platen
2243a59483 [Euler Discrete] Fix sigma (#6078)
* [Euler Discrete] Fix sigma

* make style
2023-12-06 19:59:38 +01:00
apolinário
466d32c442 [Advanced Diffusion Training] Cache latents to avoid VAE passes for every training step (#6076)
* add cache latents

* style
2023-12-06 14:46:53 +01:00
Dhruv Nair
20ba1fdbbd Disable Tests Fetcher (#6060)
update
2023-12-06 18:10:11 +05:30
Pedro Cuenca
ab6672fecd Use CC12M for LCM WDS training example (#5908)
* Fix SD scripts - there are only 2 items per batch

* Adjustments to make the SDXL scripts work with other datasets

* Use public webdataset dataset for examples

* make style

* Minor tweaks to the readmes.

* Stress that the database is illustrative.
2023-12-06 10:35:36 +01:00
Dhruv Nair
f90a5139a2 fix 2023-12-06 06:03:58 +00:00
Sayak Paul
a2bc2e14b9 [feat] allow SDXL pipeline to run with fused QKV projections (#6030)
* debug

* from step

* print

* turn sigma a list

* make str

* init_noise_sigma

* comment

* remove prints

* feat: introduce fused projections

* change to a better name

* no grad

* device.

* device

* dtype

* okay

* print

* more print

* fix: unbind -> split

* fix: qkv >-> k

* enable disable

* apply attention processor within the method

* attn processors

* _enable_fused_qkv_projections

* remove print

* add fused projection to vae

* add todos.

* add: documentation and cleanups.

* add: test for qkv projection fusion.

* relax assertions.

* relax further

* fix: docs

* fix-copies

* correct error message.

* Empty-Commit

* better conditioning on disable_fused_qkv_projections

* check

* check processor

* bfloat16 computation.

* check latent dtype

* style

* remove copy temporarily

* cast latent to bfloat16

* fix: vae -> self.vae

* remove print.

* add _change_to_group_norm_32

* comment out stuff that didn't work

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* reflect patrick's suggestions.

* fix imports

* fix: disable call.

* fix more

* fix device and dtype

* fix conditions.

* fix more

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-06 07:33:26 +05:30
Arsalan
f427345ab1 Device agnostic testing (#5612)
* utils and test modifications to enable device agnostic testing

* device for manual seed in unet1d

* fix generator condition in vae test

* consistency changes to testing

* make style

* add device agnostic testing changes to source and one model test

* make dtype check fns private, log cuda fp16 case

* remove dtype checks from import utils, move to testing_utils

* adding tests for most model classes and one pipeline

* fix vae import
2023-12-05 19:04:13 +05:30
apolinário
6e221334cd [advanced_dreambooth_lora_sdxl_tranining_script] save embeddings locally fix (#6058)
* Update train_dreambooth_lora_sdxl_advanced.py

* remove global function args from dreamboothdataset class

* style

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-05 13:52:34 +01:00
Patrick von Platen
53bc30dd45 [From single file] remove depr warning (#6043) 2023-12-05 18:12:25 +05:30
Radamés Ajna
eacf5e34eb Fix demofusion (#6049)
* Update pipeline_demofusion_sdxl.py

* Update README.md
2023-12-05 18:10:46 +05:30
Dhruv Nair
4c05f7856a Ldm unet convert fix (#6038)
* fix

* fix ldm conversion

* fix linting
2023-12-05 18:01:02 +05:30
Dhruv Nair
bbd3572044 Pin Ruff Version (#6059)
pinn ruff
2023-12-05 17:51:37 +05:30
Dhruv Nair
f948778322 Move kandinsky convert script (#6047)
move kandinsky convert script
2023-12-05 15:12:37 +05:30
Steven Liu
4684ea2fe8 [docs] #Copied from mechanism (#6007)
* copied from section

* feedback
2023-12-04 10:12:52 -08:00
Steven Liu
b64f835ea7 [docs] Add Kandinsky 3 (#5988)
* add

* fix api docs

* edits
2023-12-04 10:11:15 -08:00
Linoy Tsaban
880c0fdd36 [advanced dreambooth lora training script][bug_fix] change token_abstraction type to str (#6040)
* improve help tags

* style fix

* changes token_abstraction type to string.
support multiple concepts for pivotal using a comma separated string.

* style fixup

* changed logger to warning (not yet available)

* moved the token_abstraction parsing to be in the same block as where we create the mapping of identifier to token

---------

Co-authored-by: Linoy <linoy@huggingface.co>
2023-12-04 18:38:44 +01:00
RuoyiDu
c36f1c3160 [Community Pipeline] DemoFusion: Democratising High-Resolution Image Generation With No $$$ (#6022)
* Add files via upload

* Update README.md

* Update pipeline_demofusion_sdxl.py

* Update pipeline_demofusion_sdxl.py

* Update examples/community/README.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-04 19:44:57 +05:30
takuoko
0a08d41961 [Feature] Support IP-Adapter Plus (#5915)
* Support IP-Adapter Plus

* fix format

* restore before black format

* restore before black format

* generic

* Refactor PerceiverAttention

* format

* fix test and refactor PerceiverAttention

* generic encode_image

* keep attention implementation

* merge tests

* encode_image backward compatible

* code quality

* fix controlnet inpaint pipeline

* refactor FFN

* refactor FFN

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2023-12-04 12:43:34 +01:00
Levi McCallum
e185084a5d Add variant argument to dreambooth lora sdxl advanced (#6021)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-04 12:04:15 +01:00
Dhruv Nair
b21729225a Update Tests Fetcher (#5950)
* update setup and deps table

* update

* update

* update

* up

* up

* update

* up

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* quality fix

* fix failure reporting

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-04 12:59:41 +05:30
Parth38
8a812e4e14 Update value_guided_sampling.py (#6027)
* Update value_guided_sampling.py

Changed the scheduler step function as predict_epsilon parameter is not there in latest  DDPM Scheduler

* Update value_guided_sampling.md

Updated a link to a working notebook

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-04 10:36:25 +05:30
gujing
bf92e746c0 fix StableDiffusionTensorRT super args error (#6009) 2023-12-04 10:06:23 +05:30
Linoy Tsaban
b785a155d6 [advanced dreambooth lora sdxl training script] improve help tags (#6035)
* improve help tags

* style fix

---------

Co-authored-by: Linoy <linoy@huggingface.co>
2023-12-04 09:41:25 +05:30
Sayak Paul
d486f0e846 [LoRA serialization] fix: duplicate unet prefix problem. (#5991)
* fix: duplicate unet prefix problem.

* Update src/diffusers/loaders/lora.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-02 21:35:16 +05:30
Sayak Paul
3351270627 [PixArt Tests] remove fast tests from slow suite (#5945)
remove fast tests from slow suite
2023-12-02 20:58:27 +05:30
Junsong Chen
4520e1221a adapt PixArtAlphaPipeline for pixart-lcm model (#5974)
* adapt PixArtAlphaPipeline for pixart-lcm model

* remove original_inference_steps from __call__

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-02 13:30:40 +05:30
Long(Tony) Lian
618260409f LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft (#6023)
* LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft

* Use main in the revision in the examples

* Add "Copied from" statements in comments

* Fix formatting with ruff
2023-12-01 09:58:25 -10:00
Patrick von Platen
dadd55fb36 Post Release: v0.24.0 (#5985)
* Post Release: v0.24.0

* post pone deprecation

* post pone deprecation

* Add model_index.json
2023-12-01 18:43:44 +01:00
YiYi Xu
1b6c7ea74e [schedulers] create self.sigmas during __init__ (#6006)
* fix dpm
* all scheulers
2023-12-01 07:15:37 -10:00
YiYi Xu
b41f809a4e [Kandinsky 3.0] Follow-up TODOs (#5944)
clean-up kendinsky 3.0
2023-12-01 07:14:22 -10:00
Patrick von Platen
0f55c17e17 fix style 2023-12-01 15:59:34 +00:00
Charchit Sharma
5058d27f12 added attention_head_dim, attention_type, resolution_idx (#6011) 2023-12-01 16:26:58 +01:00
M. Tolga Cangöz
748c1b3ec7 [Docs] Update a link (#6014)
* Update the location of Python's version

* Trim trailing whitespace
2023-12-01 16:26:25 +01:00
M. Tolga Cangöz
523507034f [logging] Fix assertion bug (#6012)
Fix assertion bug
2023-12-01 16:26:04 +01:00
hako-mikan
46c751e970 [Community Pipeline] Regional Prompting Pipeline (#6015)
* Update README.md

* Update README.md

* Add files via upload

* Update README.md

* Update examples/community/README.md

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-01 16:22:59 +01:00
Patrick von Platen
bc1d28c888 [From Single File] Allow Text Encoder to be passed (#6020)
Allow text encoder to be passed
2023-12-01 16:19:04 +01:00
Sayak Paul
af378c1dd1 [Easy] minor edits to setup.py (#5996)
minor edits to setup
2023-12-01 20:38:46 +05:30
Steven Liu
6ba4c5395f [docs] Fix SVD video (#6004)
Update svd.md
2023-12-01 16:07:47 +01:00
Linoy Tsaban
c1e4529541 [advanced_dreambooth_lora_sdxl_tranining_script] readme fix (#6019)
readme
2023-12-01 15:14:57 +01:00
Linoy Tsaban
d29d97b616 [examples/advanced_diffusion_training] bug fixes and improvements for LoRA Dreambooth SDXL advanced training script (#5935)
* imports and readme bug fixes

* bug fix - ensures text_encoder params are dtype==float32 (when using pivotal tuning) even if the rest of the model is loaded in fp16

* added pivotal tuning to readme

* mapping token identifier to new inserted token in validation prompt (if used)

* correct default value of --train_text_encoder_frac

* change default value of  --adam_weight_decay_text_encoder

* validation prompt generations when using pivotal tuning bug fix

* style fix

* textual inversion embeddings name change

* style fix

* bug fix - stopping text encoder optimization halfway

* readme - will include token abstraction and new inserted tokens when using pivotal tuning
- added type to --num_new_tokens_per_abstraction

* style fix

---------

Co-authored-by: Linoy Tsaban <linoy@huggingface.co>
2023-12-01 14:18:43 +01:00
Jongho Choi
7d4a257c7f Remove a duplicated line? (#6010)
Update __init__.py
2023-12-01 15:49:36 +05:30
Kristian Mischke
141cd52d56 Fix LLMGroundedDiffusionPipeline super class arguments (#5993)
* make `requires_safety_checker` a kwarg instead of a positional argument as it's more future-proof

* apply `make style` formatting edits

* add image_encoder to arguments and pass to super constructor
2023-11-30 10:15:14 -10:00
Steven Liu
f72b28c75b [docs] Fix video link (#5986)
Update svd.md
2023-11-29 20:52:25 +01:00
Suraj Patil
ada8109d5b Fix SVD doc (#5983)
fix url
2023-11-29 19:55:05 +01:00
Patrick von Platen
b34acbdcbc [SDXL Turbo] Add some docs (#5982)
* add diffusers example

* add diffusers example

* Comment about making it faster

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2023-11-29 19:52:07 +01:00
Suraj Patil
63f767ef15 Add SVD (#5895)
* begin model

* finish blocks

* add_embedding

* addition_time_embed_dim

* use TimestepEmbedding

* fix temporal res block

* fix time_pos_embed

* fix add_embedding

* add conversion script

* fix model

* up

* add new resnet blocks

* make forward work

* return sample in original shape

* fix temb shape in TemporalResnetBlock

* add spatio temporal transformers

* add vae blocks

* fix blocks

* update

* update

* fix shapes in Alphablender and add time activation in res blcok

* use new blocks

* style

* fix temb shape

* fix SpatioTemporalResBlock

* reuse TemporalBasicTransformerBlock

* fix TemporalBasicTransformerBlock

* use TransformerSpatioTemporalModel

* fix TransformerSpatioTemporalModel

* fix time_context dim

* clean up

* make temb optional

* add blocks

* rename model

* update conversion script

* remove UNetMidBlockSpatioTemporal

* add in init

* remove unused arg

* remove unused arg

* remove more unsed args

* up

* up

* check for None

* update vae

* update up/mid blocks for decoder

* begin pipeline

* adapt scheduler

* add guidance scalings

* fix norm eps in temporal transformers

* add temporal autoencoder

* make pipeline run

* fix frame decodig

* decode in float32

* decode n frames at a time

* pass decoding_t to decode_latents

* fix decode_latents

* vae encode/decode in fp32

* fix dtype in TransformerSpatioTemporalModel

* type image_latents same as image_embeddings

* allow using differnt eps in temporal block for video decoder

* fix default values in vae

* pass num frames in decode

* switch spatial to temporal for mixing in VAE

* fix num frames during split decoding

* cast alpha to sample dtype

* fix attention in MidBlockTemporalDecoder

* fix typo

* fix guidance_scales dtype

* fix missing activation in TemporalDecoder

* skip_post_quant_conv

* add vae conversion

* style

* take guidance scale as input

* up

* allow passing PIL to export_video

* accept fps as arg

* add pipeline and vae in init

* remove hack

* use AutoencoderKLTemporalDecoder

* don't scale image latents

* add unet tests

* clean up unet

* clean TransformerSpatioTemporalModel

* add slow svd test

* clean up

* make temb optional in Decoder mid block

* fix norm eps in TransformerSpatioTemporalModel

* clean up temp decoder

* clean up

* clean up

* use c_noise values for timesteps

* use math for log

* update

* fix copies

* doc

* upcast vae

* update forward pass for gradient checkpointing

* make added_time_ids is tensor

* up

* fix upcasting

* remove post quant conv

* add _resize_with_antialiasing

* fix _compute_padding

* cleanup model

* more cleanup

* more cleanup

* more cleanup

* remove freeu

* remove attn slice

* small clean

* up

* up

* remove extra step kwargs

* remove eta

* remove dropout

* remove callback

* remove merge factor args

* clean

* clean up

* move to dedicated folder

* remove attention_head_dim

* docstr and small fix

* update unet doc strings

* rename decoding_t

* correct linting

* store c_skip and c_out

* cleanup

* clean TemporalResnetBlock

* more cleanup

* clean up vae

* clean up

* begin doc

* more cleanup

* up

* up

* doc

* Improve

* better naming

* better naming

* better naming

* better naming

* better naming

* better naming

* better naming

* better naming

* Apply suggestions from code review

* Default chunk size to None

* add example

* Better

* Apply suggestions from code review

* update doc

* Update src/diffusers/pipelines/stable_diffusion_video/pipeline_stable_diffusion_video.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* style

* Get torch compile working

* up

* rename

* fix doc

* add chunking

* torch compile

* torch compile

* add modelling outputs

* torch compile

* Improve chunking

* Apply suggestions from code review

* Update docs/source/en/using-diffusers/svd.md

* Close diff tag

* remove slicing

* resnet docstr

* add docstr in resnet

* rename

* Apply suggestions from code review

* update tests

* Fix output type latents

* fix more

* fix more

* Update docs/source/en/using-diffusers/svd.md

* fix more

* add pipeline tests

* remove unused arg

* clean  up

* make sure get_scaling receives tensors

* fix euler scheduler

* fix get_scalings

* simply euler for now

* remove old test file

* use randn_tensor to create noise

* fix device for rand tensor

* increase expected_max_difference

* fix test_inference_batch_single_identical

* actually fix test_inference_batch_single_identical

* disable test_save_load_float16

* skip test_float16_inference

* skip test_inference_batch_single_identical

* fix test_xformers_attention_forwardGenerator_pass

* Apply suggestions from code review

* update StableVideoDiffusionPipelineSlowTests

* update image

* add diffusers example

* fix more

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
2023-11-29 19:13:36 +01:00
PENGUINLIONG
d1b2a1a957 Fixed custom module importing on Windows (#5891)
* Fixed custom module importing on Windows

Windows use back slash and `os.path.join()` follows that convention.

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

* Update pipeline_utils.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Lucain <lucainp@gmail.com>
2023-11-29 16:33:04 +01:00
Kashif Rasul
01782c220e [Wuerstchen] Adapt lora training example scripts to use PEFT (#5959)
* Adapt lora example scripts to use PEFT

* add to_out.0
2023-11-29 16:18:20 +01:00
vahramtadevosyan
d63a498c3b [Pipeline] Add TextToVideoZeroSDXLPipeline (#4695)
* integrated sdxl for the text2video-zero pipeline

* make fix-copies

* fixed CI issues

* make fix-copies

* added docs and `copied from` statements

* added fast tests

* made a small change in docs

* quality+style check fix

* updated docs. added controlnet inference with sdxl

* added device compatibility for fast tests

* fixed docstrings

* changing vae upcasting

* remove torch.empty_cache to speed up inference

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* made fast tests to run on dummy models only, fixed copied from statements

* fixed testing utils imports

* Added bullet points for SDXL support

* fixed formatting & quality

* Update tests/pipelines/text_to_video/test_text_to_video_zero_sdxl.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/pipelines/text_to_video/test_text_to_video_zero_sdxl.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fixed minor error for merging

* fixed updates of sdxl

* made fast tests inherit from `PipelineTesterMixin` and run in 3-4secs on CPU

* make style && make quality

* reimplemented fast tests w/o default attn processor

* make style & make quality

* make fix-copies

* make fix-copies

* fixed docs

* make style & make quality & make fix-copies

* bug fix in cross attention

* make style && make quality

* make fix-copies

* fix gpu issues

* make fix-copies

* updated pipeline signature

---------

Co-authored-by: Vahram <vahram.tadevosyan@lambda-loginnode02.cm.cluster>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2023-11-29 16:10:43 +01:00
Marko Kostiv
6a4aad43dc Controlnet ssd 1b support (#5779)
* Add SSD-1B support for controlnet model

* Add conditioning_channels into ControlNet init from unet

* Fix black formatting

* Isort fixes

* Adds SSD-1B controlnet pipeline test with UNetMidBlock2D as mid block

* Overrides failing ssd-1b tests

* Fixes tests after main branch update

* Fixes code quality checks

---------

Co-authored-by: Marko Kostiv <marko@linearity.io>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-11-29 16:10:01 +01:00
Steven Liu
ddd8bd53ed [docs] LCM training (#5796)
* first draft

* feedback
2023-11-29 16:08:05 +01:00
JuanCarlosPi
9f7b2cf2dc Support of ip-adapter to the StableDiffusionControlNetInpaintPipeline (#5887)
* Change pipeline_controlnet_inpaint.py to add ip-adapter support. Changes are similar to those in pipeline_controlnet

* Change tests for the StableDiffusionControlNetInpaintPipeline by adding image_encoder: None

* Update src/diffusers/pipelines/controlnet/pipeline_controlnet_inpaint.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2023-11-29 16:00:24 +01:00
Sayak Paul
895c4b704b [LoRA refactor] move several state dict conversion utils out of lora.py (#5955)
* move several state dict conversion utils out of lora.py

* check

* check

* check

* check

* check

* check

* check

* revert back

* check

* check

* again check

* maybe fix?

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-11-29 20:24:21 +05:30
Linh Nguyen
636feba552 Rename output_dir argument (#5916)
Fix typo in output_dir argument: "text-inversion-model" → "dreambooth-model"
2023-11-29 15:47:16 +01:00
Andrés Romero
79dc7df03e [bug fix] Inpainting for MultiAdapter (#5922)
* bug in MultiAdapter for Inpainting

* adapter_input is a list for MultiAdapter

---------

Co-authored-by: andres <andres@hax.ai>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-11-29 15:46:26 +01:00
Charchit Sharma
6031ecbd23 added doc for Kandinsky3.0 (#5937)
* added en doc for Kandinsky3.0

* required changes

* Update docs/source/en/api/pipelines/kandinsky3.md

* Update docs/source/en/api/pipelines/kandinsky3.md

* Update docs/source/en/api/pipelines/kandinsky3.md

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-11-29 15:32:00 +01:00
Sayak Paul
fdd003d8e2 [Tests] Refactor test_examples.py for better readability (#5946)
* control and custom diffusion

* dreambooth

* instructpix2pix and dreambooth ckpting

* t2i adapters.

* text to image ft

* textual inversion

* unconditional

* workflows

* import fix

* fix import
2023-11-29 18:43:59 +05:30
Steven Liu
172acc98b9 [docs] Update pipeline list (#5952)
add to list
2023-11-29 14:08:39 +01:00
estelleafl
5ae3c3a56b [ldm3d] Ldm3d upscaler to community pipeline (#5870)
---------
Co-authored-by: Aflalo <estellea@isl-gpu27.rr.intel.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2023-11-28 09:00:39 -10:00
Soumik Rakshit
21bc59ab24 fix: minor typo in docstring (#5961) 2023-11-28 18:18:34 +05:30
Steven Liu
50a749e909 [docs] Fix space (#5898)
* fix

* minor edits
2023-11-27 11:50:59 -08:00
YiYi Xu
d9075be494 [load_textual_inversion]: allow multiple tokens (#5837)
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2023-11-27 06:52:36 -10:00
Patrick von Platen
b135b6e905 [From_pretrained] Fix warning (#5948) 2023-11-27 14:35:19 +01:00
T. Xu
14a0d21d2e [Community Pipeline] Diffusion Posterior Sampling for General Noisy Inverse Problems (#5939)
* [community pipeline] dps impl

* add type checking

* pass ruff check

* ruff formatter
2023-11-27 14:29:42 +01:00
Patrick von Platen
ebf581e85f [Tests] Make sure that we don't run tests multiple times (#5949)
* [Tests] Make sure that we don't run tests mulitple times

* [Tests] Make sure that we don't run tests mulitple times

* [Tests] Make sure that we don't run tests mulitple times
2023-11-27 14:18:56 +01:00
Patrick von Platen
e550163b9f [Vae] Make sure all vae's work with latent diffusion models (#5880)
* add comments to explain the code better

* add comments to explain the code better

* add comments to explain the code better

* add comments to explain the code better

* add comments to explain the code better

* fix more

* fix more

* fix more

* fix more

* fix more

* fix more
2023-11-27 14:17:47 +01:00
Viktor Grygorchuk
20f0cbc88f fix: error on device for lpw_stable_diffusion_xl pipeline if pipe.enable_sequential_cpu_offload() enabled (#5885)
fix: set device for pipe.enable_sequential_cpu_offload()
2023-11-27 13:47:47 +01:00
Chi
d72a24b790 Replace multiple variables with one variable. (#5715)
* I added a new doc string to the class. This is more flexible to understanding other developers what are doing and where it's using.

* Update src/diffusers/models/unet_2d_blocks.py

This changes suggest by maintener.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/models/unet_2d_blocks.py

Add suggested text

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update unet_2d_blocks.py

I changed the Parameter to Args text.

* Update unet_2d_blocks.py

proper indentation set in this file.

* Update unet_2d_blocks.py

a little bit of change in the act_fun argument line.

* I run the black command to reformat style in the code

* Update unet_2d_blocks.py

similar doc-string add to have in the original diffusion repository.

* I enhanced the code by replacing multiple redundant variables with a single variable, as they all served the same purpose. Additionally, I utilized the get_activation function for improved flexibility in choosing activation functions.

* Using as black package to reformated my file

* reverte some changes

* Remove conv_out_padding variables and using as conv_in_padding

* conv_out_padding create and add them into the code.

* run black command to solving styling problem

* add little bit space between comment and import statement

* I am utilizing the ruff library to address the style issues in my Makefile.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-11-27 13:34:52 +01:00
ginjia
d3cda804e7 add LoRA weights load and fuse support for IPEX pipeline (#5920)
add IPEX pipeline LoRA weights loading support
2023-11-27 13:32:43 +01:00
dg845
07eac4d65a Fix LCM Stable Diffusion distillation bug related to parsing unet_time_cond_proj_dim (#5893)
* Fix bug related to parsing unet_time_cond_proj_dim.

* Fix analogous bug in the SD-XL LCM distillation script.
2023-11-27 13:00:40 +01:00
Iván de Prado
c079cae3d4 Avoid computing min() that is expensive when do_normalize is False in the image processor (#5896)
Avoid computing min() that is expensive when do_normalize is False

Avoid extra computing when do_normalize is False
2023-11-27 12:46:26 +01:00
Wang, Yi
c7bfb8b22a set the model to train state before accelerator prepare (#5099)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-11-27 12:43:49 +01:00
dg845
67d070749a Add Custom Timesteps Support to LCMScheduler and Supported Pipelines (#5874)
* Add custom timesteps support to LCMScheduler.

* Add custom timesteps support to StableDiffusionPipeline.

* Add custom timesteps support to StableDiffusionXLPipeline.

* Add custom timesteps support to remaining Stable Diffusion pipelines which support LCMScheduler (img2img, inpaint).

* Add custom timesteps support to remaining Stable Diffusion XL pipelines which support LCMScheduler (img2img, inpaint).

* Add custom timesteps support to StableDiffusionControlNetPipeline.

* Add custom timesteps support to T21 Stable Diffusion (XL) Adapters.

* Clean up Stable Diffusion inpaint tests.

* Manually add support for custom timesteps to AltDiffusion pipelines since make fix-copies doesn't appear to work correctly (it deletes the whole pipeline).

* make style

* Refactor pipeline timestep handling into the retrieve_timesteps function.
2023-11-27 12:39:14 +01:00
Aryan V S
9c357bda3f Deprecate KarrasVeScheduler and ScoreSdeVpScheduler (#5269)
* deprecated: KarrasVeScheduler, ScoreSdeVpScheduler

* delete tests relevant to deprecated schedulers

* chore: run make style

* fix: import error caused due to incorrect _import_structure after deprecation

* fix: ScoreSdeVpScheduler was not importable from diffusers

* remove import added by assumption

* Update src/diffusers/schedulers/__init__.py as suggested by @patrickvonplaten

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make it a part deprecated

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix

* fix

* fix doc

* fix doc....again.......

* remove karras_ve test folder

Co-Authored-By: YiYi Xu <yixu310@gmail.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2023-11-27 12:33:02 +01:00
Sayak Paul
3f7c3511dc [Core] add support for gradient checkpointing in transformer_2d (#5943)
add support for gradient checkpointing in transformer_2d
2023-11-27 16:21:12 +05:30
Junsong Chen
7d6f30e89b [Fix: pixart-alpha] random 512px resolution bug (#5842)
* [Fix: pixart-alpha]
add ASPECT_RATIO_512_BIN in use_resolution_binning for random 512px image generation.

* add slow test file for 512px generation without resolution binning

* fix: slow tests for resolution binning.

---------

Co-authored-by: jschen <chenjunsong4@h-partners.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-11-27 13:35:35 +05:30
Patrick von Platen
6d2e19f746 [Examples] Allow downloading variant model files (#5531)
* add variant

* add variant

* Apply suggestions from code review

* reformat

* fix: textual_inversion.py

* fix: variant in model_info

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2023-11-27 10:43:20 +05:30
Patrick von Platen
2a7f43a73b correct num inference steps 2023-11-24 17:09:26 +00:00
Patrick von Platen
b978334d71 [@cene555][Kandinsky 3.0] Add Kandinsky 3.0 (#5913)
* finalize

* finalize

* finalize

* add slow test

* add slow test

* add slow test

* Fix more

* add slow test

* fix more

* fix more

* fix more

* fix more

* fix more

* fix more

* fix more

* fix more

* fix more

* Better

* Fix more

* Fix more

* add slow test

* Add auto pipelines

* add slow test

* Add all

* add slow test

* add slow test

* add slow test

* add slow test

* add slow test

* Apply suggestions from code review

* add slow test

* add slow test
2023-11-24 17:46:00 +01:00
Sayak Paul
e5f232f76b [Docs] add: 8bit inference with pixart alpha (#5814)
* add: 8bit inference with pixart alpha

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add: note on 4bit.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* address comment

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-11-24 20:36:33 +05:30
Linoy Tsaban
3003ff4947 [bug fix] fix small bug in readme template of sdxl lora training script (#5914)
readme improvement and metadata fix
2023-11-23 19:08:49 +01:00
Linoy Tsaban
5ffa603244 [bug fix] fix small bug in readme template of sdxl lora training script (#5906)
* readme bug fix

* style fix

---------

Co-authored-by: Linoy Tsaban <linoy@huggingface.co>
2023-11-23 12:11:50 +01:00
Linoy Tsaban
0eeee618cf Adds an advanced version of the SD-XL DreamBooth LoRA training script supporting pivotal tuning (#5883)
* sdxl dreambooth lora training script with pivotal tuning

* bug fix - args missing from parse_args

* code quality fixes

* comment unnecessary code from TokenEmbedding handler class

* fixup

---------

Co-authored-by: Linoy Tsaban <linoy@huggingface.co>
2023-11-22 16:27:56 +01:00
Andrés Romero
93f1a14cab ControlNet+Adapter pipeline, and ControlNet+Adapter+Inpaint pipeline (#5869)
* ControlNet+Adapter pipeline, and +Inpaint pipeline


---------

Co-authored-by: andres <andres@hax.ai>
2023-11-21 08:59:29 -10:00
Patrick von Platen
13d73d9303 [Lora] Seperate logic (#5809)
* [Lora] Seperate logic

* [Lora] Seperate logic

* [Lora] Seperate logic

* add comments to explain the code better

* add comments to explain the code better
2023-11-21 18:58:37 +01:00
YiYi Xu
ba352aea29 [feat] IP Adapters (author @okotaku ) (#5713)
* add ip-adapter


---------

Co-authored-by: okotaku <to78314910@gmail.com>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-21 07:34:30 -10:00
Linoy Tsaban
6fac1369d0 Add features to the Dreambooth LoRA SDXL training script (#5508)
* Additions:
- support for different lr for text encoder
- support for Prodigy optimizer
- support for min snr gamma
- support for custom captions and dataset loading from the hub

* adjusted --caption_column behaviour (to -not- use the second column of the dataset by default if --caption_column is not provided)

* fixed --output_dir / --model_dir_name confusion

* added --repeats, --adam_weight_decay_text_encoder
+ some fixes

* Update examples/dreambooth/train_dreambooth_lora_sdxl.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update examples/dreambooth/train_dreambooth_lora_sdxl.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update examples/dreambooth/train_dreambooth_lora_sdxl.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* - import compute_snr from diffusers/training_utils.py
- cluster adamw together
- when using 'prodigy', if --train_text_encoder == True and --text_encoder_lr != --learning rate, changes the lr of the text encoders optimization params to be --learning_rate (otherwise errors)

* shape fixes when custom captions are used

* formatting and a little cleanup

* code styling

* --repeats default value fixed, changed to 1

* bug fix - removed redundant lines of embedding concatenation when using prior_preservation (that duplicated class_prompt embeddings)

* changed dataset loading logic according to the following usecases (to avoid unnecessary dependency on datasets)-
1. user provides --dataset_name
2. user provides local dir --instance_data_dir that contains a metadata .jsonl file
3. user provides local dir --instance_data_dir that contains only images
in cases [1,2] we import datasets and use load_dataset method, in case [3] we process the data same as in the original script setting

* styling fix

* arg name fix

* adjusted the --repeats logic

* -removed redundant arg and 'if' when loading local folder with prompts
-updated readme template
-some default val fixes
-custom caption tests

* image path fix for readme

* code style

* bug fix

* --caption_column arg

* readme fix

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Linoy Tsaban <linoy@huggingface.co>
2023-11-21 17:38:43 +01:00
Steven Liu
1093f9d615 [docs] MusicLDM (#5854)
* fix

* feedback
2023-11-21 15:27:41 +01:00
Aryan V S
81780882b8 Addition of new callbacks to controlnets (#5812)
* add new callbacks to src/diffusers/pipelines/controlnet/pipeline_controlnet.py

* update callbacks

* fix repeated kwarg

* update

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-11-21 15:22:20 +01:00
Dhruv Nair
ebc7bedeb7 Add tests fetcher (#5848)
* add tests fetcher to utils

* add test fetcher

* update

* update

* remove unused dependency version check script

* update

* fix mistake

* update

* update

* update

* update

* update

* update

* update

* remove concurrency params

* update

* update

* update

* update

* update

* update

* move test fetcher to dedicated workflow
2023-11-21 18:01:44 +05:30
co63oc
ee519cfef5 Update README.md (#5855) 2023-11-21 11:56:13 +01:00
Steven Liu
7457aa67cb [docs] Loader APIs (#5813)
* first draft

* remove old loader doc

* start adding lora code examples

* finish

* add link to loralinearlayer

* feedback

* fix
2023-11-20 10:53:13 -08:00
M. Tolga Cangöz
c72a173906 Revert "[Docs] Update and make improvements" (#5858)
* Revert "[`Docs`] Update and make improvements (#5819)"

This reverts commit c697f52476.

* Update README.md

* Update memory.md

* Update basic_training.md

* Update write_own_pipeline.md

* Update fp16.md

* Update basic_training.md

* Update write_own_pipeline.md

* Update write_own_pipeline.md
2023-11-20 10:22:21 -08:00
dg845
dc21498b43 Update LCMScheduler Inference Timesteps to be More Evenly Spaced (#5836)
* Change LCMScheduler.set_timesteps to pick more evenly spaced inference timesteps.

* Change inference_indices implementation to better match previous behavior.

* Add num_inference_steps=26 test case to test_inference_steps.

* run CI

---------

Co-authored-by: patil-suraj <surajp815@gmail.com>
2023-11-20 15:46:10 +01:00
Patrick von Platen
3303aec5f8 make style 2023-11-20 12:54:52 +01:00
ginjia
4abbbff618 fix an issue that ipex occupy too much memory, it will not impact per… (#5625)
* fix an issue that ipex occupy too much memory, it will not impact performance

* make style

---------

Co-authored-by: root <jun.chen@intel.com>
Co-authored-by: Meng Guoqing <guoqing.meng@intel.com>
2023-11-20 12:43:29 +01:00
Younes Belkada
fda297703f [test / peft] Fix silent behaviour on PR tests (#5852)
Update pr_test_peft_backend.yml
2023-11-20 12:42:58 +01:00
Roy Hvaara
2695ba8e9d [JAX] Replace uses of jax.devices("cpu") with jax.local_devices(backend="cpu") (#5864)
An upcoming change to JAX will include non-local (addressable) CPU devices in jax.devices() when JAX is used multicontroller-style, where there are multiple Python processes.

This change preserves the current behavior by replacing uses of jax.devices("cpu"), which previously only returned local devices, with jax.local_devices("cpu"), which will return local devices both now and in the future.

This change is always safe (i.e., it should always preserve the previous behavior), but it may sometimes be unnecessary if code is never used in a multicontroller setting.

Co-authored-by: Peter Hawkins <phawkins@google.com>
2023-11-20 12:32:50 +01:00
Aryan V S
3ab921166d [Community] [WIP] LCM Interpolation Pipeline (#5767)
* wip: add interpolate pipeline for lcm

* update documentation

* update documentation
2023-11-20 12:18:17 +01:00
Kashif Rasul
6b04d61cf6 [Styling] stylify using ruff (#5841)
* ruff format

* not need to use doc-builder's black styling as the doc is styled in ruff

* make fix-copies

* comment

* use run_ruff
2023-11-20 11:48:34 +01:00
Sayak Paul
9c7f7fc475 [ControlNet] fix import in single file loading (#5834)
fix import

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2023-11-20 15:50:36 +05:30
Aryan V S
4adad57e57 UnboundLocalError in SDXLInpaint.prepare_latents() (#5648)
* fix: UnboundLocalError with image_latents

* chore: run make style, quality, fix-copies

* revert changes from make fix-copies

* revert changes from make fix-copies

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2023-11-19 22:39:51 -10:00
Younes Belkada
4e54dfe985 [Tests/LoRA/PEFT] Test also on PEFT / transformers / accelerate latest (#5820)
* add also peft latest on peft CI

* up

* up

* up

* Update .github/workflows/pr_test_peft_backend.yml

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-11-17 19:17:31 +01:00
Sourab Mangrulkar
6f1435332b Speed up the peft lora unload (#5741)
* Update peft_utils.py

* fix bug

* make the util backwards compatible.

Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* fix import issue

* refactor the backward compatibilty condition

* rename the conditional variable

* address comments

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address comment

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-11-17 19:15:44 +01:00
Patrick von Platen
c6f90daea6 [PEFT] Unpin peft (#5850) 2023-11-17 19:15:02 +01:00
Will Berman
2a84e8bb5a fix memory consistency decoder test (#5828)
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2023-11-17 09:31:01 -08:00
Lucain
c896b841e4 Set usedforsecurity=False in hashlib methods (FIPS compliance) (#5790)
* Set usedforsecurity=False in hashlib methods (FIPS compliance)

* update version dependency

* bump hfh version

* bump hfh version
2023-11-17 14:56:58 +01:00
Sayak Paul
69412d0a15 [Docs] add: japanese sdxl as a reference (#5844)
add: japanese sdxl as a reference
2023-11-17 18:14:02 +05:30
Patrick von Platen
913986afa5 Improve setup.py and add dependency check (#5826)
* put peft in requirements

* correct peft

* correct installs

* make style

* make style
2023-11-17 12:05:26 +01:00
Steven Liu
ff573ae245 [docs] Fix title (#5831)
fix section title
2023-11-17 08:53:11 +05:30
M. Tolga Cangöz
c697f52476 [Docs] Update and make improvements (#5819)
Update and make improvements
2023-11-16 13:47:25 -08:00
Suraj Patil
a042909c83 LCM-LoRA docs (#5782)
* begin doc

* fix examples

* add in toctree

* fix toctree

* improve copy

* improve introductions

* add lcm doc

* fix filename

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* address Sayak's comments

* remove controlnet aux

* open in colab

* move to Specific pipeline examples

* update controlent and adapter examples

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-16 16:19:27 +01:00
Suraj Patil
64cbd8e27a Support LCM in ControlNet and Adapter pipelines. (#5822)
* support lcm

* fix tests

* fix tests
2023-11-16 14:59:50 +01:00
Aryan V S
038b42db94 Improve docs and type hints (#5759)
* improvement: docs and type hints

* improvement: docs and type hints

minor refactor

* improvement: docs and type hints

* update with suggestions from review

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2023-11-16 14:00:32 +05:30
M. Tolga Cangöz
ecbe27a07f [Docs] Fix typos and update files at API's Pipelines page 2 (#5748)
* Fix typos, update, add Copyright info, and trim trailing whitespace

* Update docs/source/en/api/pipelines/text_to_video_zero.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* 1 second is not a long video, but 6 seconds is

* Update text_to_video_zero.md

* Update text_to_video_zero.md

* Update text_to_video_zero.md

* Update wuerstchen.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-15 10:54:55 -08:00
M. Tolga Cangöz
3ad4207d1f [Docs] Fix typos, update, and add visualizations at Using Diffusers' Pipelines for Inference Page (#5649)
* Fix typos, update, add visualizations

* Update sdxl.md

* Update controlnet.md

* Update docs/source/en/using-diffusers/shap-e.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/shap-e.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update diffedit.md

* Update kandinsky.md

* Update sdxl.md

* Update controlnet.md

* Update docs/source/en/using-diffusers/controlnet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/controlnet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update controlnet.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-15 10:30:01 -08:00
MilkClouds
3517fb9430 fix: enabled num_images_per_prompt>1 for lpw_stable_diffusion_xl (community pipeline) (#5807)
* fix: enabled num_images_per_prompt>1 for lpw_stable_diffusion_xl

* style: fixed isort
2023-11-15 07:40:29 -10:00
Dhruv Nair
cdadb023a2 Make Video Tests faster (#5787)
* update test

* update
2023-11-15 10:56:01 +05:30
M. Tolga Cangöz
51fd3dd206 [Docs] Remove .to('cuda') before .enable_model_cpu_offload() (#5795)
Remove .to('cuda') before cpu_offload, trim trailing whitespaces
2023-11-14 17:20:54 -08:00
Pedro Gabriel Gengo Lourenço
98457580c0 Fixed documentation of consistency decoder (#5768)
* Fixed doc for consistency decoder

* Style fix
2023-11-14 15:27:26 -08:00
M. Tolga Cangöz
3b37488fa3 Fix typos, improve, update at main-page files and .github files (#5588)
* Update keywords; remove version cuz it changes constantly?

* Update if necessary

* Fix typos and links

* version_range_max from PyTorch's setup.py; fix typos; 1 checklist is enough?

* Fix a typo

* Fix typos

* There is already a Blank issue link at the bottom of the page; direct to Diffusers' forum

* Add 🌐 Translating a New Language? page

* Update .github/ISSUE_TEMPLATE/translate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update PHILOSOPHY.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update PHILOSOPHY.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update PHILOSOPHY.md

* Update PHILOSOPHY.md

* Update CONTRIBUTING.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update CONTRIBUTING.md

* Update CONTRIBUTING.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update PHILOSOPHY.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update PHILOSOPHY.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Add X account link

* Update setup.py

* Update CITATION.cff

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-14 11:46:52 -08:00
Kadir Nar
c7260ce253 🔧 Fix import codes in diffusers library (#5792)
* 🔧 Fix import codes in diffusers library

* Refactor imports in community examples
2023-11-14 11:37:59 -08:00
M. Tolga Cangöz
8092017d3f [Docs] Fix typos and update files at API's Pipelines page 1 (#5744)
* Fix typos, update, add Copyright info, and trim trailing whitespace

* Update alt_diffusion.md

* Remove nonoperational demo

* Update docs/source/en/api/pipelines/consistency_models.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/latent_consistency_models.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-14 10:36:20 -08:00
Steven Liu
bae14c8bcb [docs] Update training docs (#5512)
* first draft

* try hfoption syntax

* fix hfoption id

* add text2image

* fix tag

* feedback

* feedbacks

* add textual inversion

* DreamBooth

* lora

* controlnet

* instructpix2pix

* custom diffusion

* t2i

* separate training methods and models

* sdxl

* kandinsky

* wuerstchen

* light edits
2023-11-14 10:29:56 -08:00
Sayak Paul
ded93f798c [Refactor] refactor loaders.py to make it cleaner and leaner. (#5771)
* refactor loaders.py to make it cleaner and leaner.

* refactor loaders init

* inits.

* textual inversion to the init.

* inits.

* remove certain modules from the main init.

* AttnProcsLayers

* fix imports

* avoid circular import.

* fix circular import pt 2.

* address PR comments

* imports

* fix: imports.

* remove from main init for avoiding circular deps.

* remove spurious deps.

* fix-copies.

* fix imports.

* more debug

* more debug

* Apply suggestions from code review

* Apply suggestions from code review

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-11-14 12:54:28 +01:00
Sayak Paul
a5720e9e31 [PixArt-Alpha] Fix PixArt-Alpha pipeline when number of images to generate is more than 1 (#5752)
* does this fix things?

* attention mask use

* attention mask order

* better masking.

* add: tesrt

* remove mask_featur

* test

* debug

* fix: tests

* deprecate mask_feature

* add deprecation test

* add slow test

* add print statements to retrieve the assertion values.

* fix for the 1024 fast tes

* fix tesy

* fix the remaining

* Apply suggestions from code review

* more debug

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-11-14 12:45:35 +01:00
Richard Löwenström
16d500455b Use greater than or equal to in version comparisons for peft (#5785) 2023-11-14 12:17:51 +01:00
Yusuke Suzuki
210a07b13c fix exception around NSFW filter on flax stable diffusion (#5675) 2023-11-14 12:16:52 +01:00
Patrick von Platen
0a0ebc7cb4 [LCM] Better error message (#5788) 2023-11-14 12:08:15 +01:00
Patrick von Platen
81df9c85de Unwrap models everywhere (#5789)
more debug
2023-11-14 12:08:03 +01:00
takuoko
bfe94a3993 [Enhacne] Support maybe_raise_or_warn for peft (#5653)
* Support maybe_raise_or_warn for peft

* fix by comment

* unwrap function
2023-11-14 11:40:35 +01:00
Lukas Kuhn
c9c5436c94 download_from_original_stable_diffusion_ckpt initializes correct default pipeline for SDXL (#5784)
* feat: sdxl will be automatically detected as pipeline_class

* fix: formatting

* fix: formatting with black

* fix: import pipeline wrongly sorted
2023-11-14 11:35:26 +01:00
Sourab Mangrulkar
9c8eca702c add lora delete feature (#5738)
* add lora delete feature

* added tests and changed condition

* deal with corner cases

* more corner cases

* rename to `delete_adapter_layers` for consistency

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
2023-11-14 10:51:13 +01:00
co63oc
069123f66e Update checkpoint_merger.py (#5780) 2023-11-14 10:46:11 +01:00
Sayak Paul
ed759f0aee [PixArt-Alpha] Introduce resolution binning (#5739)
* feat: add resolution binning

Co-authored-by: lawrence-cj <jschen@mail.dlut.edu.cn>

* rename

* debug

* add :test

* remove unused variable

* set resolution_binning to False.

---------

Co-authored-by: lawrence-cj <jschen@mail.dlut.edu.cn>
2023-11-14 08:34:59 +05:30
Long(Tony) Lian
5b231aa38b Fix the pipeline name in the examples for LMD+ pipeline. Add a colab link to pipeline README. (#5775)
* Fix the pipeline name in the examples for LMD+ pipeline

* Add LMD+ colab link

* Apply code formatting

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-11-14 07:43:37 +05:30
M. Tolga Cangöz
a359ff7644 [Docs] Fix typos and update files at API's Main Classes, Models, and Schedulers pages (#5720)
* Fix typos, update, add Copyright info, and trim trailing whitespaces

* Update docs/source/en/api/loaders.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/models/autoencoder_tiny.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/models/autoencoder_tiny.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-13 14:32:59 -08:00
Steven Liu
4b45a1e147 [docs] Use other checkpoints with inpaint (#5590)
* tip about inpaint checkpoints

* expand section

* feedback
2023-11-13 12:39:30 -08:00
Steven Liu
f782ca112a [docs] Callbacks (#5735)
* updates

* feedback
2023-11-13 12:11:07 -08:00
Steven Liu
80e78d2cac [docs] Custom community components (#5732)
* fixes

* feedback
2023-11-13 11:01:52 -08:00
JacobYuan7
4d3b4e00ed Update the reference for text_to_video.md (#5706)
* Update the reference for text_to_video.md

The original reference (VideoFusion) might be misleading. VideoFusion is not open-sourced. I am the co-first author of ModelScopeT2V. I change the referred paper to the right one.

* Update docs/source/en/api/pipelines/text_to_video.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-13 19:06:18 +01:00
Thuan H. Nguyen
8fcd52febb Correct code for distributed training of RealFill (#5740)
Correct code for distributed training
2023-11-13 19:01:15 +01:00
Nicolas Hug
0488810f61 Fix realfill example compatibility with latest torchvision version (#5736) 2023-11-13 18:55:17 +01:00
Patrick von Platen
ef7787ea59 make style 2023-11-13 18:54:53 +01:00
Jianqi Pan
1ce4b5f3e3 fix: fix forward function signature of controlnet reference_only pipeline example (#5717)
fix: ignore other args

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-11-13 18:54:21 +01:00
Kashif Rasul
c9f847a70f [Wuerstchen] fix for when USE_PEFT_BACKEND is True (#5704)
* fix for when USE_PEFT_BACKEND is True

* Update modeling_wuerstchen_prior.py

* revert change

* add lora tests
2023-11-13 18:37:55 +01:00
Younes Belkada
8789d0b6c7 fix styling issues on main (#5754)
fix styling issues
2023-11-13 20:05:15 +05:30
Long(Tony) Lian
b1fbef544c Add LLM-grounded Diffusion (LMD+) pipeline (#5634)
* Add LLM-grounded Diffusion (LMD+) pipeline

* Update the formatting

* Applied formatting
2023-11-11 12:51:05 +05:30
YiYi Xu
a3476d5bd6 Controlnet Img2img: pass height and width to image_processor.preprocess (#5665)
pass height and width to image_processor.preprocess

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2023-11-10 06:53:20 -10:00
Sayak Paul
1477865e48 post release v0.23.0 (#5730)
* post release

* fix: variant test

* up

* fix: test
2023-11-10 16:35:44 +05:30
aihao
1f87f83e68 add load_datasete data_dir parameter (#5747) 2023-11-09 23:03:14 -10:00
Sayak Paul
77ba494b29 [ConsistencyDecoder] fix: doc type (#5745)
fix: doc type
2023-11-10 14:05:22 +05:30
707 changed files with 78815 additions and 17378 deletions

View File

@@ -1,5 +1,5 @@
name: "\U0001F41B Bug Report"
description: Report a bug on diffusers
description: Report a bug on Diffusers
labels: [ "bug" ]
body:
- type: markdown
@@ -10,7 +10,7 @@ body:
Thus, issues are of the same importance as pull requests when contributing to this library ❤️.
In order to make your issue as **useful for the community as possible**, let's try to stick to some simple guidelines:
- 1. Please try to be as precise and concise as possible.
*Give your issue a fitting title. Assume that someone which very limited knowledge of diffusers can understand your issue. Add links to the source code, documentation other issues, pull requests etc...*
*Give your issue a fitting title. Assume that someone which very limited knowledge of Diffusers can understand your issue. Add links to the source code, documentation other issues, pull requests etc...*
- 2. If your issue is about something not working, **always** provide a reproducible code snippet. The reader should be able to reproduce your issue by **only copy-pasting your code snippet into a Python shell**.
*The community cannot solve your issue if it cannot reproduce it. If your bug is related to training, add your training script and make everything needed to train public. Otherwise, just add a simple Python code snippet.*
- 3. Add the **minimum** amount of code / context that is needed to understand, reproduce your issue.
@@ -19,7 +19,7 @@ body:
- type: markdown
attributes:
value: |
For more in-detail information on how to write good issues you can have a look [here](https://huggingface.co/course/chapter8/5?fw=pt)
For more in-detail information on how to write good issues you can have a look [here](https://huggingface.co/course/chapter8/5?fw=pt).
- type: textarea
id: bug-description
attributes:
@@ -47,7 +47,7 @@ body:
attributes:
label: System Info
description: Please share your system info with us. You can run the command `diffusers-cli env` and copy-paste its output below.
placeholder: diffusers version, platform, python version, ...
placeholder: Diffusers version, platform, Python version, ...
validations:
required: true
- type: textarea
@@ -55,7 +55,7 @@ body:
attributes:
label: Who can help?
description: |
Your issue will be replied to more quickly if you can figure out the right person to tag with @
Your issue will be replied to more quickly if you can figure out the right person to tag with @.
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
All issues are read by one of the core maintainers, so if you don't know who to tag, just leave this blank and
@@ -66,7 +66,7 @@ body:
Questions on DiffusionPipeline (Saving, Loading, From pretrained, ...):
Questions on pipelines:
- Stable Diffusion @yiyixuxu @DN6 @patrickvonplaten @sayakpaul @patrickvonplaten
- Stable Diffusion @yiyixuxu @DN6 @sayakpaul @patrickvonplaten
- Stable Diffusion XL @yiyixuxu @sayakpaul @DN6 @patrickvonplaten
- Kandinsky @yiyixuxu @patrickvonplaten
- ControlNet @sayakpaul @yiyixuxu @DN6 @patrickvonplaten

View File

@@ -1,7 +1,4 @@
contact_links:
- name: Blank issue
url: https://github.com/huggingface/diffusers/issues/new
about: Other
- name: Forum
url: https://discuss.huggingface.co/
about: General usage questions and community discussions
- name: Questions / Discussions
url: https://github.com/huggingface/diffusers/discussions
about: General usage questions and community discussions

View File

@@ -1,5 +1,5 @@
---
name: "\U0001F680 Feature request"
name: "\U0001F680 Feature Request"
about: Suggest an idea for this project
title: ''
labels: ''
@@ -8,13 +8,13 @@ assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...].
**Describe the solution you'd like**
**Describe the solution you'd like.**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
**Describe alternatives you've considered.**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
**Additional context.**
Add any other context or screenshots about the feature request here.

View File

@@ -1,5 +1,5 @@
name: "\U0001F31F New model/pipeline/scheduler addition"
description: Submit a proposal/request to implement a new diffusion model / pipeline / scheduler
name: "\U0001F31F New Model/Pipeline/Scheduler Addition"
description: Submit a proposal/request to implement a new diffusion model/pipeline/scheduler
labels: [ "New model/pipeline/scheduler" ]
body:
@@ -19,7 +19,7 @@ body:
description: |
Please note that if the model implementation isn't available or if the weights aren't open-source, we are less likely to implement it in `diffusers`.
options:
- label: "The model implementation is available"
- label: "The model implementation is available."
- label: "The model weights are available (Only relevant if addition is not a scheduler)."
- type: textarea

29
.github/ISSUE_TEMPLATE/translate.md vendored Normal file
View File

@@ -0,0 +1,29 @@
---
name: 🌐 Translating a New Language?
about: Start a new translation effort in your language
title: '[<languageCode>] Translating docs to <languageName>'
labels: WIP
assignees: ''
---
<!--
Note: Please search to see if an issue already exists for the language you are trying to translate.
-->
Hi!
Let's bring the documentation to all the <languageName>-speaking community 🌐.
Who would want to translate? Please follow the 🤗 [TRANSLATING guide](https://github.com/huggingface/diffusers/blob/main/docs/TRANSLATING.md). Here is a list of the files ready for translation. Let us know in this issue if you'd like to translate any, and we'll add your name to the list.
Some notes:
* Please translate using an informal tone (imagine you are talking with a friend about Diffusers 🤗).
* Please translate in a gender-neutral way.
* Add your translations to the folder called `<languageCode>` inside the [source folder](https://github.com/huggingface/diffusers/tree/main/docs/source).
* Register your translation in `<languageCode>/_toctree.yml`; please follow the order of the [English version](https://github.com/huggingface/diffusers/blob/main/docs/source/en/_toctree.yml).
* Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue. Please ping @stevhliu for review.
* 🙋 If you'd like others to help you with the translation, you can also post in the 🤗 [forums](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63).
Thank you so much for your help! 🤗

View File

@@ -19,10 +19,10 @@ Fixes # (issue)
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/diffusers/blob/main/CONTRIBUTING.md)?
- [ ] Did you read our [philosophy doc](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md) (important for complex PRs)?
- [ ] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case.
- [ ] Was this discussed/approved via a GitHub issue or the [forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63)? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the
[documentation guidelines](https://github.com/huggingface/diffusers/tree/main/docs), and
[here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
[here are tips on formatting docstrings](https://github.com/huggingface/diffusers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?
@@ -31,14 +31,14 @@ Fixes # (issue)
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @
<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @.
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
Please tag fewer than 3 people.
Core library:
- Schedulers: @williamberman and @patrickvonplaten
- Schedulers: @yiyixuxu and @patrickvonplaten
- Pipelines: @patrickvonplaten and @sayakpaul
- Training examples: @sayakpaul and @patrickvonplaten
- Docs: @stevhliu and @yiyixuxu

View File

@@ -4,7 +4,7 @@ description: Sets up miniconda in your ${RUNNER_TEMP} environment and gives you
inputs:
python-version:
description: If set to any value, dont use sudo to clean the workspace
description: If set to any value, don't use sudo to clean the workspace
required: false
type: string
default: "3.9"

52
.github/workflows/benchmark.yml vendored Normal file
View File

@@ -0,0 +1,52 @@
name: Benchmarking tests
on:
schedule:
- cron: "30 1 1,15 * *" # every 2 weeks on the 1st and the 15th of every month at 1:30 AM
env:
DIFFUSERS_IS_CI: yes
HF_HOME: /mnt/cache
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
jobs:
torch_pipelines_cuda_benchmark_tests:
name: Torch Core Pipelines CUDA Benchmarking Tests
strategy:
fail-fast: false
max-parallel: 1
runs-on: [single-gpu, nvidia-gpu, a10, ci]
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m pip install -e .[quality,test]
python -m pip install pandas
- name: Environment
run: |
python utils/print_env.py
- name: Diffusers Benchmarking
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.DIFFUSERS_BOT_TOKEN }}
BASE_PATH: benchmark_outputs
run: |
export TOTAL_GPU_MEMORY=$(python -c "import torch; print(torch.cuda.get_device_properties(0).total_memory / (1024**3))")
cd benchmarks && mkdir ${BASE_PATH} && python run_all.py && python push_results.py
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: benchmark_test_reports
path: benchmarks/benchmark_outputs

View File

@@ -1,14 +0,0 @@
name: Delete doc comment
on:
workflow_run:
workflows: ["Delete doc comment trigger"]
types:
- completed
jobs:
delete:
uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@main
secrets:
comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}

View File

@@ -1,12 +0,0 @@
name: Delete doc comment trigger
on:
pull_request:
types: [ closed ]
jobs:
delete:
uses: huggingface/doc-builder/.github/workflows/delete_doc_comment_trigger.yml@main
with:
pr_number: ${{ github.event.number }}

View File

@@ -27,9 +27,8 @@ jobs:
pip install .[quality]
- name: Check quality
run: |
black --check examples tests src utils scripts
ruff examples tests src utils scripts
doc-builder style src/diffusers docs/source --max_len 119 --check_only --path_to_docs docs/source
ruff check examples tests src utils scripts
ruff format examples tests src utils scripts --check
check_repository_consistency:
runs-on: ubuntu-latest

170
.github/workflows/pr_test_fetcher.yml vendored Normal file
View File

@@ -0,0 +1,170 @@
name: Fast tests for PRs - Test Fetcher
on: workflow_dispatch
env:
DIFFUSERS_IS_CI: yes
OMP_NUM_THREADS: 4
MKL_NUM_THREADS: 4
PYTEST_TIMEOUT: 60
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
setup_pr_tests:
name: Setup PR Tests
runs-on: docker-cpu
container:
image: diffusers/diffusers-pytorch-cpu
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
defaults:
run:
shell: bash
outputs:
matrix: ${{ steps.set_matrix.outputs.matrix }}
test_map: ${{ steps.set_matrix.outputs.test_map }}
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m pip install -e .[quality,test]
- name: Environment
run: |
python utils/print_env.py
echo $(git --version)
- name: Fetch Tests
run: |
python utils/tests_fetcher.py | tee test_preparation.txt
- name: Report fetched tests
uses: actions/upload-artifact@v3
with:
name: test_fetched
path: test_preparation.txt
- id: set_matrix
name: Create Test Matrix
# The `keys` is used as GitHub actions matrix for jobs, i.e. `models`, `pipelines`, etc.
# The `test_map` is used to get the actual identified test files under each key.
# If no test to run (so no `test_map.json` file), create a dummy map (empty matrix will fail)
run: |
if [ -f test_map.json ]; then
keys=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); d = list(test_map.keys()); print(json.dumps(d))')
test_map=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); print(json.dumps(test_map))')
else
keys=$(python3 -c 'keys = ["dummy"]; print(keys)')
test_map=$(python3 -c 'test_map = {"dummy": []}; print(test_map)')
fi
echo $keys
echo $test_map
echo "matrix=$keys" >> $GITHUB_OUTPUT
echo "test_map=$test_map" >> $GITHUB_OUTPUT
run_pr_tests:
name: Run PR Tests
needs: setup_pr_tests
if: contains(fromJson(needs.setup_pr_tests.outputs.matrix), 'dummy') != true
strategy:
fail-fast: false
max-parallel: 2
matrix:
modules: ${{ fromJson(needs.setup_pr_tests.outputs.matrix) }}
runs-on: docker-cpu
container:
image: diffusers/diffusers-pytorch-cpu
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
defaults:
run:
shell: bash
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m pip install -e .[quality,test]
python -m pip install accelerate
- name: Environment
run: |
python utils/print_env.py
- name: Run all selected tests on CPU
run: |
python -m pytest -n 2 --dist=loadfile -v --make-reports=${{ matrix.modules }}_tests_cpu ${{ fromJson(needs.setup_pr_tests.outputs.test_map)[matrix.modules] }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: |
cat reports/${{ matrix.modules }}_tests_cpu_stats.txt
cat reports/${{ matrix.modules }}_tests_cpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v3
with:
name: ${{ matrix.modules }}_test_reports
path: reports
run_staging_tests:
strategy:
fail-fast: false
matrix:
config:
- name: Hub tests for models, schedulers, and pipelines
framework: hub_tests_pytorch
runner: docker-cpu
image: diffusers/diffusers-pytorch-cpu
report: torch_hub
name: ${{ matrix.config.name }}
runs-on: ${{ matrix.config.runner }}
container:
image: ${{ matrix.config.image }}
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
defaults:
run:
shell: bash
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m pip install -e .[quality,test]
- name: Environment
run: |
python utils/print_env.py
- name: Run Hub tests for models, schedulers, and pipelines on a staging env
if: ${{ matrix.config.framework == 'hub_tests_pytorch' }}
run: |
HUGGINGFACE_CO_STAGING=true python -m pytest \
-m "is_staging_test" \
--make-reports=tests_${{ matrix.config.report }} \
tests
- name: Failure short reports
if: ${{ failure() }}
run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: pr_${{ matrix.config.report }}_test_reports
path: reports

View File

@@ -20,20 +20,15 @@ jobs:
strategy:
fail-fast: false
matrix:
config:
- name: LoRA
framework: lora
runner: docker-cpu
image: diffusers/diffusers-pytorch-cpu
report: torch_cpu_lora
lib-versions: ["main", "latest"]
name: ${{ matrix.config.name }}
name: LoRA - ${{ matrix.lib-versions }}
runs-on: ${{ matrix.config.runner }}
runs-on: docker-cpu
container:
image: ${{ matrix.config.image }}
image: diffusers/diffusers-pytorch-cpu
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
defaults:
@@ -50,18 +45,21 @@ jobs:
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m pip install -e .[quality,test]
python -m pip install git+https://github.com/huggingface/accelerate.git
python -m pip install -U git+https://github.com/huggingface/transformers.git
python -m pip install -U git+https://github.com/huggingface/peft.git
if [ "${{ matrix.lib-versions }}" == "main" ]; then
python -m pip install -U git+https://github.com/huggingface/peft.git
python -m pip install -U git+https://github.com/huggingface/transformers.git
python -m pip install -U git+https://github.com/huggingface/accelerate.git
else
python -m pip install -U peft transformers accelerate
fi
- name: Environment
run: |
python utils/print_env.py
- name: Run fast PyTorch LoRA CPU tests with PEFT backend
if: ${{ matrix.config.framework == 'lora' }}
run: |
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v \
--make-reports=tests_${{ matrix.config.report }} \
tests/lora/test_lora_layers_peft.py
tests/lora/test_lora_layers_peft.py

View File

@@ -113,9 +113,10 @@ jobs:
- name: Run example PyTorch CPU tests
if: ${{ matrix.config.framework == 'pytorch_examples' }}
run: |
python -m pip install peft
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
--make-reports=tests_${{ matrix.config.report }} \
examples/test_examples.py
examples
- name: Failure short reports
if: ${{ failure() }}

View File

@@ -189,7 +189,7 @@ jobs:
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
-s -v -k "not Flax and not Onnx and not PEFTLoRALoading" \
--make-reports=tests_peft_cuda \
tests/lora/

View File

@@ -5,6 +5,10 @@ on:
branches:
- main
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
env:
DIFFUSERS_IS_CI: yes
HF_HOME: /mnt/cache
@@ -94,9 +98,10 @@ jobs:
- name: Run example PyTorch CPU tests
if: ${{ matrix.config.framework == 'pytorch_examples' }}
run: |
python -m pip install peft
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
--make-reports=tests_${{ matrix.config.report }} \
examples/test_examples.py
examples
- name: Failure short reports
if: ${{ failure() }}

View File

@@ -13,6 +13,10 @@ env:
PYTEST_TIMEOUT: 600
RUN_SLOW: no
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
run_fast_tests_apple_m1:
name: Fast PyTorch MPS tests on MacOS

10
.gitignore vendored
View File

@@ -1,4 +1,4 @@
# Initially taken from Github's Python gitignore file
# Initially taken from GitHub's Python gitignore file
# Byte-compiled / optimized / DLL files
__pycache__/
@@ -34,7 +34,7 @@ wheels/
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# Usually these files are written by a Python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
@@ -153,7 +153,7 @@ debug.env
# vim
.*.swp
#ctags
# ctags
tags
# pre-commit
@@ -164,6 +164,7 @@ tags
# DS_Store (MacOS)
.DS_Store
# RL pipelines may produce mp4 outputs
*.mp4
@@ -173,4 +174,5 @@ tags
# ruff
.ruff_cache
wandb
# wandb
wandb

View File

@@ -31,10 +31,12 @@ keywords:
- deep-learning
- pytorch
- image-generation
- hacktoberfest
- diffusion
- text2image
- image2image
- score-based-generative-modeling
- stable-diffusion
- stable-diffusion-diffusers
license: Apache-2.0
version: 0.12.1

View File

@@ -7,7 +7,7 @@ We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
nationality, personal appearance, race, caste, color, religion, or sexual identity
and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
@@ -24,7 +24,7 @@ community include:
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall diffusers community
overall Diffusers community
Examples of unacceptable behavior include:
@@ -117,8 +117,8 @@ the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
version 2.1, available at
https://www.contributor-covenant.org/version/2/1/code_of_conduct.html.
Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).

View File

@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.
We ❤️ contributions from the open-source community! Everyone is welcome, and all types of participation not just code are valued and appreciated. Answering questions, helping others, reaching out, and improving the documentation are all immensely valuable to the community, so don't be afraid and get involved if you're up for it!
Everyone is encouraged to start by saying 👋 in our public Discord channel. We discuss the latest trends in diffusion models, ask questions, show off personal projects, help each other with contributions, or just hang out ☕. <a href="https://Discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/Discord/823813159592001537?color=5865F2&logo=Discord&logoColor=white"></a>
Everyone is encouraged to start by saying 👋 in our public Discord channel. We discuss the latest trends in diffusion models, ask questions, show off personal projects, help each other with contributions, or just hang out ☕. <a href="https://discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=Discord&logoColor=white"></a>
Whichever way you choose to contribute, we strive to be part of an open, welcoming, and kind community. Please, read our [code of conduct](https://github.com/huggingface/diffusers/blob/main/CODE_OF_CONDUCT.md) and be mindful to respect it during your interactions. We also recommend you become familiar with the [ethical guidelines](https://huggingface.co/docs/diffusers/conceptual/ethical_guidelines) that guide our project and ask you to adhere to the same principles of transparency and responsibility.
@@ -28,11 +28,11 @@ the core library.
In the following, we give an overview of different ways to contribute, ranked by difficulty in ascending order. All of them are valuable to the community.
* 1. Asking and answering questions on [the Diffusers discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers) or on [Discord](https://discord.gg/G7tWnz98XR).
* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose)
* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues)
* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose).
* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues).
* 4. Fix a simple issue, marked by the "Good first issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
* 5. Contribute to the [documentation](https://github.com/huggingface/diffusers/tree/main/docs/source).
* 6. Contribute a [Community Pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3Acommunity-examples)
* 6. Contribute a [Community Pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3Acommunity-examples).
* 7. Contribute to the [examples](https://github.com/huggingface/diffusers/tree/main/examples).
* 8. Fix a more difficult issue, marked by the "Good second issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22).
* 9. Add a new pipeline, model, or scheduler, see ["New Pipeline/Model"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) and ["New scheduler"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22) issues. For this contribution, please have a look at [Design Philosophy](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md).
@@ -40,7 +40,7 @@ In the following, we give an overview of different ways to contribute, ranked by
As said before, **all contributions are valuable to the community**.
In the following, we will explain each contribution a bit more in detail.
For all contributions 4.-9. you will need to open a PR. It is explained in detail how to do so in [Opening a pull request](#how-to-open-a-pr)
For all contributions 4-9, you will need to open a PR. It is explained in detail how to do so in [Opening a pull request](#how-to-open-a-pr).
### 1. Asking and answering questions on the Diffusers discussion forum or on the Diffusers Discord
@@ -91,12 +91,12 @@ open a new issue nevertheless and link to the related issue.
New issues usually include the following.
#### 2.1. Reproducible, minimal bug reports.
#### 2.1. Reproducible, minimal bug reports
A bug report should always have a reproducible code snippet and be as minimal and concise as possible.
This means in more detail:
- Narrow the bug down as much as you can, **do not just dump your whole code file**
- Format your code
- Narrow the bug down as much as you can, **do not just dump your whole code file**.
- Format your code.
- Do not include any external libraries except for Diffusers depending on them.
- **Always** provide all necessary information about your environment; for this, you can run: `diffusers-cli env` in your shell and copy-paste the displayed information to the issue.
- Explain the issue. If the reader doesn't know what the issue is and why it is an issue, she cannot solve it.
@@ -105,9 +105,9 @@ This means in more detail:
For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
You can open a bug report [here](https://github.com/huggingface/diffusers/issues/new/choose).
You can open a bug report [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=bug&projects=&template=bug-report.yml).
#### 2.2. Feature requests.
#### 2.2. Feature requests
A world-class feature request addresses the following points:
@@ -125,21 +125,21 @@ Awesome! Tell us what problem it solved for you.
You can open a feature request [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=).
#### 2.3 Feedback.
#### 2.3 Feedback
Feedback about the library design and why it is good or not good helps the core maintainers immensely to build a user-friendly library. To understand the philosophy behind the current design philosophy, please have a look [here](https://huggingface.co/docs/diffusers/conceptual/philosophy). If you feel like a certain design choice does not fit with the current design philosophy, please explain why and how it should be changed. If a certain design choice follows the design philosophy too much, hence restricting use cases, explain why and how it should be changed.
If a certain design choice is very useful for you, please also leave a note as this is great feedback for future design decisions.
You can open an issue about feedback [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).
#### 2.4 Technical questions.
#### 2.4 Technical questions
Technical questions are mainly about why certain code of the library was written in a certain way, or what a certain part of the code does. Please make sure to link to the code in question and please provide detail on
why this part of the code is difficult to understand.
You can open an issue about a technical question [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=bug&template=bug-report.yml).
#### 2.5 Proposal to add a new model, scheduler, or pipeline.
#### 2.5 Proposal to add a new model, scheduler, or pipeline
If the diffusion model community released a new model, pipeline, or scheduler that you would like to see in the Diffusers library, please provide the following information:
@@ -156,14 +156,14 @@ You can open a request for a model/pipeline/scheduler [here](https://github.com/
Answering issues on GitHub might require some technical knowledge of Diffusers, but we encourage everybody to give it a try even if you are not 100% certain that your answer is correct.
Some tips to give a high-quality answer to an issue:
- Be as concise and minimal as possible
- Be as concise and minimal as possible.
- Stay on topic. An answer to the issue should concern the issue and only the issue.
- Provide links to code, papers, or other sources that prove or encourage your point.
- Answer in code. If a simple code snippet is the answer to the issue or shows how the issue can be solved, please provide a fully reproducible code snippet.
Also, many issues tend to be simply off-topic, duplicates of other issues, or irrelevant. It is of great
help to the maintainers if you can answer such issues, encouraging the author of the issue to be
more precise, provide the link to a duplicated issue or redirect them to [the forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR)
more precise, provide the link to a duplicated issue or redirect them to [the forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR).
If you have verified that the issued bug report is correct and requires a correction in the source code,
please have a look at the next sections.
@@ -202,7 +202,7 @@ Please have a look at [this page](https://github.com/huggingface/diffusers/tree/
### 6. Contribute a community pipeline
[Pipelines](https://huggingface.co/docs/diffusers/api/pipelines/overview) are usually the first point of contact between the Diffusers library and the user.
Pipelines are examples of how to use Diffusers [models](https://huggingface.co/docs/diffusers/api/models) and [schedulers](https://huggingface.co/docs/diffusers/api/schedulers/overview).
Pipelines are examples of how to use Diffusers [models](https://huggingface.co/docs/diffusers/api/models/overview) and [schedulers](https://huggingface.co/docs/diffusers/api/schedulers/overview).
We support two types of pipelines:
- Official Pipelines
@@ -242,27 +242,27 @@ We support two types of training examples:
Research training examples are located in [examples/research_projects](https://github.com/huggingface/diffusers/tree/main/examples/research_projects) whereas official training examples include all folders under [examples](https://github.com/huggingface/diffusers/tree/main/examples) except the `research_projects` and `community` folders.
The official training examples are maintained by the Diffusers' core maintainers whereas the research training examples are maintained by the community.
This is because of the same reasons put forward in [6. Contribute a community pipeline](#contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author.
Both official training and research examples consist of a directory that contains one or more training scripts, a requirements.txt file, and a README.md file. In order for the user to make use of the
training examples, it is required to clone the repository:
```
```bash
git clone https://github.com/huggingface/diffusers
```
as well as to install all additional dependencies required for training:
```
```bash
pip install -r /examples/<your-example-folder>/requirements.txt
```
Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt).
Training examples of the Diffusers library should adhere to the following philosophy:
- All the code necessary to run the examples should be found in a single Python file
- One should be able to run the example from the command line with `python <your-example>.py --args`
- All the code necessary to run the examples should be found in a single Python file.
- One should be able to run the example from the command line with `python <your-example>.py --args`.
- Examples should be kept simple and serve as **an example** on how to use Diffusers for training. The purpose of example scripts is **not** to create state-of-the-art diffusion models, but rather to reproduce known training schemes without adding too much custom logic. As a byproduct of this point, our examples also strive to serve as good educational materials.
To contribute an example, it is highly recommended to look at already existing examples such as [dreambooth](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) to get an idea of how they should look like.
@@ -281,7 +281,7 @@ If you are contributing to the official training examples, please also make sure
usually more complicated to solve than [Good first issues](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
The issue description usually gives less guidance on how to fix the issue and requires
a decent understanding of the library by the interested contributor.
If you are interested in tackling a second good issue, feel free to open a PR to fix it and link the PR to the issue. If you see that a PR has already been opened for this issue but did not get merged, have a look to understand why it wasn't merged and try to open an improved PR.
If you are interested in tackling a good second issue, feel free to open a PR to fix it and link the PR to the issue. If you see that a PR has already been opened for this issue but did not get merged, have a look to understand why it wasn't merged and try to open an improved PR.
Good second issues are usually more difficult to get merged compared to good first issues, so don't hesitate to ask for help from the core maintainers. If your PR is almost finished the core maintainers can also jump into your PR and commit to it in order to get it merged.
### 9. Adding pipelines, models, schedulers
@@ -337,8 +337,8 @@ to be merged;
9. Add high-coverage tests. No quality testing = no merge.
- If you are adding new `@slow` tests, make sure they pass using
`RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
CircleCI does not run the slow tests, but GitHub actions does every night!
10. All public methods must have informative docstrings that work nicely with markdown. See `[pipeline_latent_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py)` for an example.
CircleCI does not run the slow tests, but GitHub Actions does every night!
10. All public methods must have informative docstrings that work nicely with markdown. See [`pipeline_latent_diffusion.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py) for an example.
11. Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos, and other non-text files. We prefer to leverage a hf.co hosted `dataset` like
[`hf-internal-testing`](https://huggingface.co/hf-internal-testing) or [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images) to place these files.
If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate your images
@@ -355,7 +355,7 @@ You will need basic `git` proficiency to be able to contribute to
manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.
Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/main/setup.py#L244)):
Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/main/setup.py#L265)):
1. Fork the [repository](https://github.com/huggingface/diffusers) by
clicking on the 'Fork' button on the repository's page. This creates a copy of the code
@@ -364,7 +364,7 @@ under your GitHub user account.
2. Clone your fork to your local disk, and add the base repository as a remote:
```bash
$ git clone git@github.com:<your Github handle>/diffusers.git
$ git clone git@github.com:<your GitHub handle>/diffusers.git
$ cd diffusers
$ git remote add upstream https://github.com/huggingface/diffusers.git
```
@@ -402,7 +402,7 @@ with this command:
$ pip install -e ".[test]"
```
You can run the full test suite with the following command, but it takes
You can also run the full test suite with the following command, but it takes
a beefy machine to produce a result in a decent amount of time now that
Diffusers has grown a lot. Here is the command for it:
@@ -410,7 +410,7 @@ Diffusers has grown a lot. Here is the command for it:
$ make test
```
🧨 Diffusers relies on `black` and `isort` to format its source code
🧨 Diffusers relies on `ruff` and `isort` to format its source code
consistently. After you make changes, apply automatic style corrections and code verifications
that can't be automated in one go with:
@@ -430,7 +430,7 @@ make a commit with `git commit` to record your changes locally:
```bash
$ git add modified_file.py
$ git commit
$ git commit -m "A descriptive message about your changes."
```
It is a good idea to sync your copy of the code with the original
@@ -493,7 +493,7 @@ To avoid pinging the upstream repository which adds reference notes to each upst
when syncing the main branch of a forked repository, please, follow these steps:
1. When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead, merge directly into the forked main.
2. If a PR is absolutely necessary, use the following steps after checking out your branch:
```
```bash
$ git checkout -b your-branch-for-syncing
$ git pull --squash --no-commit upstream main
$ git commit -m '<your message without GitHub references>'
@@ -502,4 +502,4 @@ $ git push --set-upstream origin your-branch-for-syncing
### Style guide
For documentation strings, 🧨 Diffusers follows the [google style](https://google.github.io/styleguide/pyguide.html).
For documentation strings, 🧨 Diffusers follows the [Google style](https://google.github.io/styleguide/pyguide.html).

View File

@@ -3,14 +3,14 @@
# make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!)
export PYTHONPATH = src
check_dirs := examples scripts src tests utils
check_dirs := examples scripts src tests utils benchmarks
modified_only_fixup:
$(eval modified_py_files := $(shell python utils/get_modified_files.py $(check_dirs)))
@if test -n "$(modified_py_files)"; then \
echo "Checking/fixing $(modified_py_files)"; \
black $(modified_py_files); \
ruff $(modified_py_files); \
ruff check $(modified_py_files) --fix; \
ruff format $(modified_py_files);\
else \
echo "No library .py files were modified"; \
fi
@@ -40,23 +40,21 @@ repo-consistency:
# this target runs checks on all files
quality:
black --check $(check_dirs)
ruff $(check_dirs)
doc-builder style src/diffusers docs/source --max_len 119 --check_only --path_to_docs docs/source
ruff check $(check_dirs) setup.py
ruff format --check $(check_dirs) setup.py
python utils/check_doc_toc.py
# Format source code automatically and check is there are any problems left that need manual fixing
extra_style_checks:
python utils/custom_init_isort.py
doc-builder style src/diffusers docs/source --max_len 119 --path_to_docs docs/source
python utils/check_doc_toc.py --fix_and_overwrite
# this target runs checks on all files and potentially modifies some of them
style:
black $(check_dirs)
ruff $(check_dirs) --fix
ruff check $(check_dirs) setup.py --fix
ruff format $(check_dirs) setup.py
${MAKE} autogenerate_code
${MAKE} extra_style_checks

View File

@@ -22,7 +22,7 @@ In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefor
## Usability over Performance
- While Diffusers has many built-in performance-enhancing features (see [Memory and Speed](https://huggingface.co/docs/diffusers/optimization/fp16)), models are always loaded with the highest precision and lowest optimization. Therefore, by default diffusion pipelines are always instantiated on CPU with float32 precision if not otherwise defined by the user. This ensures usability across different platforms and accelerators and means that no complex installations are required to run the library.
- Diffusers aim at being a **light-weight** package and therefore has very few required dependencies, but many soft dependencies that can improve performance (such as `accelerate`, `safetensors`, `onnx`, etc...). We strive to keep the library as lightweight as possible so that it can be added without much concern as a dependency on other packages.
- Diffusers aims to be a **light-weight** package and therefore has very few required dependencies, but many soft dependencies that can improve performance (such as `accelerate`, `safetensors`, `onnx`, etc...). We strive to keep the library as lightweight as possible so that it can be added without much concern as a dependency on other packages.
- Diffusers prefers simple, self-explainable code over condensed, magic code. This means that short-hand code syntaxes such as lambda functions, and advanced PyTorch operators are often not desired.
## Simple over easy
@@ -31,13 +31,13 @@ As PyTorch states, **explicit is better than implicit** and **simple is better t
- We follow PyTorch's API with methods like [`DiffusionPipeline.to`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.to) to let the user handle device management.
- Raising concise error messages is preferred to silently correct erroneous input. Diffusers aims at teaching the user, rather than making the library as easy to use as possible.
- Complex model vs. scheduler logic is exposed instead of magically handled inside. Schedulers/Samplers are separated from diffusion models with minimal dependencies on each other. This forces the user to write the unrolled denoising loop. However, the separation allows for easier debugging and gives the user more control over adapting the denoising process or switching out diffusion models or schedulers.
- Separately trained components of the diffusion pipeline, *e.g.* the text encoder, the unet, and the variational autoencoder, each have their own model class. This forces the user to handle the interaction between the different model components, and the serialization format separates the model components into different files. However, this allows for easier debugging and customization. Dreambooth or textual inversion training
is very simple thanks to diffusers' ability to separate single components of the diffusion pipeline.
- Separately trained components of the diffusion pipeline, *e.g.* the text encoder, the UNet, and the variational autoencoder, each has their own model class. This forces the user to handle the interaction between the different model components, and the serialization format separates the model components into different files. However, this allows for easier debugging and customization. DreamBooth or Textual Inversion training
is very simple thanks to Diffusers' ability to separate single components of the diffusion pipeline.
## Tweakable, contributor-friendly over abstraction
For large parts of the library, Diffusers adopts an important design principle of the [Transformers library](https://github.com/huggingface/transformers), which is to prefer copy-pasted code over hasty abstractions. This design principle is very opinionated and stands in stark contrast to popular design principles such as [Don't repeat yourself (DRY)](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself).
In short, just like Transformers does for modeling files, diffusers prefers to keep an extremely low level of abstraction and very self-contained code for pipelines and schedulers.
In short, just like Transformers does for modeling files, Diffusers prefers to keep an extremely low level of abstraction and very self-contained code for pipelines and schedulers.
Functions, long code blocks, and even classes can be copied across multiple files which at first can look like a bad, sloppy design choice that makes the library unmaintainable.
**However**, this design has proven to be extremely successful for Transformers and makes a lot of sense for community-driven, open-source machine learning libraries because:
- Machine Learning is an extremely fast-moving field in which paradigms, model architectures, and algorithms are changing rapidly, which therefore makes it very difficult to define long-lasting code abstractions.
@@ -47,24 +47,24 @@ Functions, long code blocks, and even classes can be copied across multiple file
At Hugging Face, we call this design the **single-file policy** which means that almost all of the code of a certain class should be written in a single, self-contained file. To read more about the philosophy, you can have a look
at [this blog post](https://huggingface.co/blog/transformers-design-philosophy).
In diffusers, we follow this philosophy for both pipelines and schedulers, but only partly for diffusion models. The reason we don't follow this design fully for diffusion models is because almost all diffusion pipelines, such
as [DDPM](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/ddpm), [Stable Diffusion](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/stable_diffusion/overview#stable-diffusion-pipelines), [UnCLIP (Dalle-2)](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/unclip#overview) and [Imagen](https://imagen.research.google/) all rely on the same diffusion model, the [UNet](https://huggingface.co/docs/diffusers/api/models#diffusers.UNet2DConditionModel).
In Diffusers, we follow this philosophy for both pipelines and schedulers, but only partly for diffusion models. The reason we don't follow this design fully for diffusion models is because almost all diffusion pipelines, such
as [DDPM](https://huggingface.co/docs/diffusers/api/pipelines/ddpm), [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview#stable-diffusion-pipelines), [unCLIP (DALL·E 2)](https://huggingface.co/docs/diffusers/api/pipelines/unclip) and [Imagen](https://imagen.research.google/) all rely on the same diffusion model, the [UNet](https://huggingface.co/docs/diffusers/api/models/unet2d-cond).
Great, now you should have generally understood why 🧨 Diffusers is designed the way it is 🤗.
We try to apply these design principles consistently across the library. Nevertheless, there are some minor exceptions to the philosophy or some unlucky design choices. If you have feedback regarding the design, we would ❤️ to hear it [directly on GitHub](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).
## Design Philosophy in Details
Now, let's look a bit into the nitty-gritty details of the design philosophy. Diffusers essentially consist of three major classes, [pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines), [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models), and [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
Let's walk through more in-detail design decisions for each class.
Now, let's look a bit into the nitty-gritty details of the design philosophy. Diffusers essentially consists of three major classes: [pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines), [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models), and [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
Let's walk through more detailed design decisions for each class.
### Pipelines
Pipelines are designed to be easy to use (therefore do not follow [*Simple over easy*](#simple-over-easy) 100%)), are not feature complete, and should loosely be seen as examples of how to use [models](#models) and [schedulers](#schedulers) for inference.
Pipelines are designed to be easy to use (therefore do not follow [*Simple over easy*](#simple-over-easy) 100%), are not feature complete, and should loosely be seen as examples of how to use [models](#models) and [schedulers](#schedulers) for inference.
The following design principles are followed:
- Pipelines follow the single-file policy. All pipelines can be found in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as its done for [`src/diffusers/pipelines/stable-diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [#Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251).
- Pipelines all inherit from [`DiffusionPipeline`]
- Pipelines all inherit from [`DiffusionPipeline`].
- Every pipeline consists of different model and scheduler components, that are documented in the [`model_index.json` file](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline and can be shared between pipelines with [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function.
- Every pipeline should be loadable via the [`DiffusionPipeline.from_pretrained`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained) function.
- Pipelines should be used **only** for inference.
@@ -83,14 +83,14 @@ The following design principles are followed:
- Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context.
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_condition.py), [`transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformer_2d.py), etc...
- Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc... **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy.
- Models intend to expose complexity, just like PyTorch's module does, and give clear error messages.
- Models intend to expose complexity, just like PyTorch's `Module` class, and give clear error messages.
- Models all inherit from `ModelMixin` and `ConfigMixin`.
- Models can be optimized for performance when it doesnt demand major code changes, keeps backward compatibility, and gives significant memory or compute gain.
- Models can be optimized for performance when it doesnt demand major code changes, keep backward compatibility, and give significant memory or compute gain.
- Models should by default have the highest precision and lowest performance setting.
- To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
- Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
- The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and
readable longterm, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
### Schedulers
@@ -99,10 +99,10 @@ Schedulers are responsible to guide the denoising process for inference as well
The following design principles are followed:
- All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
- Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained.
- One scheduler python file corresponds to one scheduler algorithm (as might be defined in a paper).
- One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper).
- If schedulers share similar functionalities, we can make use of the `#Copied from` mechanism.
- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](./using-diffusers/schedulers.md).
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](./docs/source/en/using-diffusers/schedulers.md).
- Every scheduler has to have a `set_num_inference_steps`, and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called.
- Every scheduler exposes the timesteps to be "looped over" via a `timesteps` attribute, which is an array of timesteps the model will be called upon.
- The `step(...)` function takes a predicted model output and the "current" sample (x_t) and returns the "previous", slightly more denoised sample (x_t-1).

View File

@@ -1,3 +1,19 @@
<!---
Copyright 2022 - The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p align="center">
<br>
<img src="https://raw.githubusercontent.com/huggingface/diffusers/main/docs/source/en/imgs/diffusers_library.jpg" width="400"/>
@@ -14,7 +30,10 @@
<img alt="GitHub release" src="https://static.pepy.tech/badge/diffusers/month">
</a>
<a href="CODE_OF_CONDUCT.md">
<img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg">
<img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg">
</a>
<a href="https://twitter.com/diffuserslib">
<img alt="X account" src="https://img.shields.io/twitter/url/https/twitter.com/diffuserslib.svg?style=social&label=Follow%20%40diffuserslib">
</a>
</p>
@@ -24,11 +43,11 @@
- State-of-the-art [diffusion pipelines](https://huggingface.co/docs/diffusers/api/pipelines/overview) that can be run in inference with just a few lines of code.
- Interchangeable noise [schedulers](https://huggingface.co/docs/diffusers/api/schedulers/overview) for different diffusion speeds and output quality.
- Pretrained [models](https://huggingface.co/docs/diffusers/api/models) that can be used as building blocks, and combined with schedulers, for creating your own end-to-end diffusion systems.
- Pretrained [models](https://huggingface.co/docs/diffusers/api/models/overview) that can be used as building blocks, and combined with schedulers, for creating your own end-to-end diffusion systems.
## Installation
We recommend installing 🤗 Diffusers in a virtual environment from PyPi or Conda. For more details about installing [PyTorch](https://pytorch.org/get-started/locally/) and [Flax](https://flax.readthedocs.io/en/latest/#installation), please refer to their official documentation.
We recommend installing 🤗 Diffusers in a virtual environment from PyPI or Conda. For more details about installing [PyTorch](https://pytorch.org/get-started/locally/) and [Flax](https://flax.readthedocs.io/en/latest/#installation), please refer to their official documentation.
### PyTorch
@@ -58,7 +77,7 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi
## Quickstart
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 4000+ checkpoints):
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 16000+ checkpoints):
```python
from diffusers import DiffusionPipeline
@@ -75,14 +94,13 @@ You can also dig into the models and schedulers toolbox to build your own diffus
from diffusers import DDPMScheduler, UNet2DModel
from PIL import Image
import torch
import numpy as np
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda")
scheduler.set_timesteps(50)
sample_size = model.config.sample_size
noise = torch.randn((1, 3, sample_size, sample_size)).to("cuda")
noise = torch.randn((1, 3, sample_size, sample_size), device="cuda")
input = noise
for t in scheduler.timesteps:
@@ -117,8 +135,7 @@ You can look out for [issues](https://github.com/huggingface/diffusers/issues) y
- See [New model/pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) to contribute exciting new diffusion models / diffusion pipelines
- See [New scheduler](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22)
Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white"></a>. We discuss the hottest trends about diffusion models, help each other with contributions, personal projects or
just hang out ☕.
Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white"></a>. We discuss the hottest trends about diffusion models, help each other with contributions, personal projects or just hang out ☕.
## Popular Tasks & Pipelines
@@ -141,12 +158,12 @@ just hang out ☕.
</tr>
<tr>
<td>Text-to-Image</td>
<td><a href="https://huggingface.co/docs/diffusers/api/pipelines/unclip">unclip</a></td>
<td><a href="https://huggingface.co/docs/diffusers/api/pipelines/unclip">unCLIP</a></td>
<td><a href="https://huggingface.co/kakaobrain/karlo-v1-alpha"> kakaobrain/karlo-v1-alpha </a></td>
</tr>
<tr>
<td>Text-to-Image</td>
<td><a href="https://huggingface.co/docs/diffusers/api/pipelines/if">DeepFloyd IF</a></td>
<td><a href="https://huggingface.co/docs/diffusers/api/pipelines/deepfloyd_if">DeepFloyd IF</a></td>
<td><a href="https://huggingface.co/DeepFloyd/IF-I-XL-v1.0"> DeepFloyd/IF-I-XL-v1.0 </a></td>
</tr>
<tr>
@@ -156,12 +173,12 @@ just hang out ☕.
</tr>
<tr style="border-top: 2px solid black">
<td>Text-guided Image-to-Image</td>
<td><a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/controlnet">Controlnet</a></td>
<td><a href="https://huggingface.co/docs/diffusers/api/pipelines/controlnet">ControlNet</a></td>
<td><a href="https://huggingface.co/lllyasviel/sd-controlnet-canny"> lllyasviel/sd-controlnet-canny </a></td>
</tr>
<tr>
<td>Text-guided Image-to-Image</td>
<td><a href="https://huggingface.co/docs/diffusers/api/pipelines/pix2pix">Instruct Pix2Pix</a></td>
<td><a href="https://huggingface.co/docs/diffusers/api/pipelines/pix2pix">InstructPix2Pix</a></td>
<td><a href="https://huggingface.co/timbrooks/instruct-pix2pix"> timbrooks/instruct-pix2pix </a></td>
</tr>
<tr>
@@ -171,7 +188,7 @@ just hang out ☕.
</tr>
<tr style="border-top: 2px solid black">
<td>Text-guided Image Inpainting</td>
<td><a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/inpaint">Stable Diffusion Inpaint</a></td>
<td><a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/inpaint">Stable Diffusion Inpainting</a></td>
<td><a href="https://huggingface.co/runwayml/stable-diffusion-inpainting"> runwayml/stable-diffusion-inpainting </a></td>
</tr>
<tr style="border-top: 2px solid black">
@@ -202,9 +219,9 @@ just hang out ☕.
- https://github.com/deep-floyd/IF
- https://github.com/bentoml/BentoML
- https://github.com/bmaltais/kohya_ss
- +3000 other amazing GitHub repositories 💪
- +7000 other amazing GitHub repositories 💪
Thank you for using us ❤️
Thank you for using us ❤️.
## Credits

316
benchmarks/base_classes.py Normal file
View File

@@ -0,0 +1,316 @@
import os
import sys
import torch
from diffusers import (
AutoPipelineForImage2Image,
AutoPipelineForInpainting,
AutoPipelineForText2Image,
ControlNetModel,
LCMScheduler,
StableDiffusionAdapterPipeline,
StableDiffusionControlNetPipeline,
StableDiffusionXLAdapterPipeline,
StableDiffusionXLControlNetPipeline,
T2IAdapter,
WuerstchenCombinedPipeline,
)
from diffusers.utils import load_image
sys.path.append(".")
from utils import ( # noqa: E402
BASE_PATH,
PROMPT,
BenchmarkInfo,
benchmark_fn,
bytes_to_giga_bytes,
flush,
generate_csv_dict,
write_to_csv,
)
RESOLUTION_MAPPING = {
"runwayml/stable-diffusion-v1-5": (512, 512),
"lllyasviel/sd-controlnet-canny": (512, 512),
"diffusers/controlnet-canny-sdxl-1.0": (1024, 1024),
"TencentARC/t2iadapter_canny_sd14v1": (512, 512),
"TencentARC/t2i-adapter-canny-sdxl-1.0": (1024, 1024),
"stabilityai/stable-diffusion-2-1": (768, 768),
"stabilityai/stable-diffusion-xl-base-1.0": (1024, 1024),
"stabilityai/stable-diffusion-xl-refiner-1.0": (1024, 1024),
"stabilityai/sdxl-turbo": (512, 512),
}
class BaseBenchmak:
pipeline_class = None
def __init__(self, args):
super().__init__()
def run_inference(self, args):
raise NotImplementedError
def benchmark(self, args):
raise NotImplementedError
def get_result_filepath(self, args):
pipeline_class_name = str(self.pipe.__class__.__name__)
name = (
args.ckpt.replace("/", "_")
+ "_"
+ pipeline_class_name
+ f"-bs@{args.batch_size}-steps@{args.num_inference_steps}-mco@{args.model_cpu_offload}-compile@{args.run_compile}.csv"
)
filepath = os.path.join(BASE_PATH, name)
return filepath
class TextToImageBenchmark(BaseBenchmak):
pipeline_class = AutoPipelineForText2Image
def __init__(self, args):
pipe = self.pipeline_class.from_pretrained(args.ckpt, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
if args.run_compile:
if not isinstance(pipe, WuerstchenCombinedPipeline):
pipe.unet.to(memory_format=torch.channels_last)
print("Run torch compile")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
if hasattr(pipe, "movq") and getattr(pipe, "movq", None) is not None:
pipe.movq.to(memory_format=torch.channels_last)
pipe.movq = torch.compile(pipe.movq, mode="reduce-overhead", fullgraph=True)
else:
print("Run torch compile")
pipe.decoder = torch.compile(pipe.decoder, mode="reduce-overhead", fullgraph=True)
pipe.vqgan = torch.compile(pipe.vqgan, mode="reduce-overhead", fullgraph=True)
pipe.set_progress_bar_config(disable=True)
self.pipe = pipe
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
)
def benchmark(self, args):
flush()
print(f"[INFO] {self.pipe.__class__.__name__}: Running benchmark with: {vars(args)}\n")
time = benchmark_fn(self.run_inference, self.pipe, args) # in seconds.
memory = bytes_to_giga_bytes(torch.cuda.max_memory_allocated()) # in GBs.
benchmark_info = BenchmarkInfo(time=time, memory=memory)
pipeline_class_name = str(self.pipe.__class__.__name__)
flush()
csv_dict = generate_csv_dict(
pipeline_cls=pipeline_class_name, ckpt=args.ckpt, args=args, benchmark_info=benchmark_info
)
filepath = self.get_result_filepath(args)
write_to_csv(filepath, csv_dict)
print(f"Logs written to: {filepath}")
flush()
class TurboTextToImageBenchmark(TextToImageBenchmark):
def __init__(self, args):
super().__init__(args)
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
guidance_scale=0.0,
)
class LCMLoRATextToImageBenchmark(TextToImageBenchmark):
lora_id = "latent-consistency/lcm-lora-sdxl"
def __init__(self, args):
super().__init__(args)
self.pipe.load_lora_weights(self.lora_id)
self.pipe.fuse_lora()
self.pipe.scheduler = LCMScheduler.from_config(self.pipe.scheduler.config)
def get_result_filepath(self, args):
pipeline_class_name = str(self.pipe.__class__.__name__)
name = (
self.lora_id.replace("/", "_")
+ "_"
+ pipeline_class_name
+ f"-bs@{args.batch_size}-steps@{args.num_inference_steps}-mco@{args.model_cpu_offload}-compile@{args.run_compile}.csv"
)
filepath = os.path.join(BASE_PATH, name)
return filepath
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
guidance_scale=1.0,
)
def benchmark(self, args):
flush()
print(f"[INFO] {self.pipe.__class__.__name__}: Running benchmark with: {vars(args)}\n")
time = benchmark_fn(self.run_inference, self.pipe, args) # in seconds.
memory = bytes_to_giga_bytes(torch.cuda.max_memory_allocated()) # in GBs.
benchmark_info = BenchmarkInfo(time=time, memory=memory)
pipeline_class_name = str(self.pipe.__class__.__name__)
flush()
csv_dict = generate_csv_dict(
pipeline_cls=pipeline_class_name, ckpt=self.lora_id, args=args, benchmark_info=benchmark_info
)
filepath = self.get_result_filepath(args)
write_to_csv(filepath, csv_dict)
print(f"Logs written to: {filepath}")
flush()
class ImageToImageBenchmark(TextToImageBenchmark):
pipeline_class = AutoPipelineForImage2Image
url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/1665_Girl_with_a_Pearl_Earring.jpg"
image = load_image(url).convert("RGB")
def __init__(self, args):
super().__init__(args)
self.image = self.image.resize(RESOLUTION_MAPPING[args.ckpt])
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
image=self.image,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
)
class TurboImageToImageBenchmark(ImageToImageBenchmark):
def __init__(self, args):
super().__init__(args)
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
image=self.image,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
guidance_scale=0.0,
strength=0.5,
)
class InpaintingBenchmark(ImageToImageBenchmark):
pipeline_class = AutoPipelineForInpainting
mask_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/overture-creations-5sI6fQgYIuo_mask.png"
mask = load_image(mask_url).convert("RGB")
def __init__(self, args):
super().__init__(args)
self.image = self.image.resize(RESOLUTION_MAPPING[args.ckpt])
self.mask = self.mask.resize(RESOLUTION_MAPPING[args.ckpt])
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
image=self.image,
mask_image=self.mask,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
)
class ControlNetBenchmark(TextToImageBenchmark):
pipeline_class = StableDiffusionControlNetPipeline
aux_network_class = ControlNetModel
root_ckpt = "runwayml/stable-diffusion-v1-5"
url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/canny_image_condition.png"
image = load_image(url).convert("RGB")
def __init__(self, args):
aux_network = self.aux_network_class.from_pretrained(args.ckpt, torch_dtype=torch.float16)
pipe = self.pipeline_class.from_pretrained(self.root_ckpt, controlnet=aux_network, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.set_progress_bar_config(disable=True)
self.pipe = pipe
if args.run_compile:
pipe.unet.to(memory_format=torch.channels_last)
pipe.controlnet.to(memory_format=torch.channels_last)
print("Run torch compile")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
pipe.controlnet = torch.compile(pipe.controlnet, mode="reduce-overhead", fullgraph=True)
self.image = self.image.resize(RESOLUTION_MAPPING[args.ckpt])
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
image=self.image,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
)
class ControlNetSDXLBenchmark(ControlNetBenchmark):
pipeline_class = StableDiffusionXLControlNetPipeline
root_ckpt = "stabilityai/stable-diffusion-xl-base-1.0"
def __init__(self, args):
super().__init__(args)
class T2IAdapterBenchmark(ControlNetBenchmark):
pipeline_class = StableDiffusionAdapterPipeline
aux_network_class = T2IAdapter
root_ckpt = "CompVis/stable-diffusion-v1-4"
url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/canny_for_adapter.png"
image = load_image(url).convert("L")
def __init__(self, args):
aux_network = self.aux_network_class.from_pretrained(args.ckpt, torch_dtype=torch.float16)
pipe = self.pipeline_class.from_pretrained(self.root_ckpt, adapter=aux_network, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.set_progress_bar_config(disable=True)
self.pipe = pipe
if args.run_compile:
pipe.unet.to(memory_format=torch.channels_last)
pipe.adapter.to(memory_format=torch.channels_last)
print("Run torch compile")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
pipe.adapter = torch.compile(pipe.adapter, mode="reduce-overhead", fullgraph=True)
self.image = self.image.resize(RESOLUTION_MAPPING[args.ckpt])
class T2IAdapterSDXLBenchmark(T2IAdapterBenchmark):
pipeline_class = StableDiffusionXLAdapterPipeline
root_ckpt = "stabilityai/stable-diffusion-xl-base-1.0"
url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/canny_for_adapter_sdxl.png"
image = load_image(url)
def __init__(self, args):
super().__init__(args)

View File

@@ -0,0 +1,26 @@
import argparse
import sys
sys.path.append(".")
from base_classes import ControlNetBenchmark, ControlNetSDXLBenchmark # noqa: E402
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckpt",
type=str,
default="lllyasviel/sd-controlnet-canny",
choices=["lllyasviel/sd-controlnet-canny", "diffusers/controlnet-canny-sdxl-1.0"],
)
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--num_inference_steps", type=int, default=50)
parser.add_argument("--model_cpu_offload", action="store_true")
parser.add_argument("--run_compile", action="store_true")
args = parser.parse_args()
benchmark_pipe = (
ControlNetBenchmark(args) if args.ckpt == "lllyasviel/sd-controlnet-canny" else ControlNetSDXLBenchmark(args)
)
benchmark_pipe.benchmark(args)

View File

@@ -0,0 +1,29 @@
import argparse
import sys
sys.path.append(".")
from base_classes import ImageToImageBenchmark, TurboImageToImageBenchmark # noqa: E402
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckpt",
type=str,
default="runwayml/stable-diffusion-v1-5",
choices=[
"runwayml/stable-diffusion-v1-5",
"stabilityai/stable-diffusion-2-1",
"stabilityai/stable-diffusion-xl-refiner-1.0",
"stabilityai/sdxl-turbo",
],
)
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--num_inference_steps", type=int, default=50)
parser.add_argument("--model_cpu_offload", action="store_true")
parser.add_argument("--run_compile", action="store_true")
args = parser.parse_args()
benchmark_pipe = ImageToImageBenchmark(args) if "turbo" not in args.ckpt else TurboImageToImageBenchmark(args)
benchmark_pipe.benchmark(args)

View File

@@ -0,0 +1,28 @@
import argparse
import sys
sys.path.append(".")
from base_classes import InpaintingBenchmark # noqa: E402
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckpt",
type=str,
default="runwayml/stable-diffusion-v1-5",
choices=[
"runwayml/stable-diffusion-v1-5",
"stabilityai/stable-diffusion-2-1",
"stabilityai/stable-diffusion-xl-base-1.0",
],
)
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--num_inference_steps", type=int, default=50)
parser.add_argument("--model_cpu_offload", action="store_true")
parser.add_argument("--run_compile", action="store_true")
args = parser.parse_args()
benchmark_pipe = InpaintingBenchmark(args)
benchmark_pipe.benchmark(args)

View File

@@ -0,0 +1,28 @@
import argparse
import sys
sys.path.append(".")
from base_classes import T2IAdapterBenchmark, T2IAdapterSDXLBenchmark # noqa: E402
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckpt",
type=str,
default="TencentARC/t2iadapter_canny_sd14v1",
choices=["TencentARC/t2iadapter_canny_sd14v1", "TencentARC/t2i-adapter-canny-sdxl-1.0"],
)
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--num_inference_steps", type=int, default=50)
parser.add_argument("--model_cpu_offload", action="store_true")
parser.add_argument("--run_compile", action="store_true")
args = parser.parse_args()
benchmark_pipe = (
T2IAdapterBenchmark(args)
if args.ckpt == "TencentARC/t2iadapter_canny_sd14v1"
else T2IAdapterSDXLBenchmark(args)
)
benchmark_pipe.benchmark(args)

View File

@@ -0,0 +1,23 @@
import argparse
import sys
sys.path.append(".")
from base_classes import LCMLoRATextToImageBenchmark # noqa: E402
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckpt",
type=str,
default="stabilityai/stable-diffusion-xl-base-1.0",
)
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--num_inference_steps", type=int, default=4)
parser.add_argument("--model_cpu_offload", action="store_true")
parser.add_argument("--run_compile", action="store_true")
args = parser.parse_args()
benchmark_pipe = LCMLoRATextToImageBenchmark(args)
benchmark_pipe.benchmark(args)

View File

@@ -0,0 +1,40 @@
import argparse
import sys
sys.path.append(".")
from base_classes import TextToImageBenchmark, TurboTextToImageBenchmark # noqa: E402
ALL_T2I_CKPTS = [
"runwayml/stable-diffusion-v1-5",
"segmind/SSD-1B",
"stabilityai/stable-diffusion-xl-base-1.0",
"kandinsky-community/kandinsky-2-2-decoder",
"warp-ai/wuerstchen",
"stabilityai/sdxl-turbo",
]
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckpt",
type=str,
default="runwayml/stable-diffusion-v1-5",
choices=ALL_T2I_CKPTS,
)
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--num_inference_steps", type=int, default=50)
parser.add_argument("--model_cpu_offload", action="store_true")
parser.add_argument("--run_compile", action="store_true")
args = parser.parse_args()
benchmark_cls = None
if "turbo" in args.ckpt:
benchmark_cls = TurboTextToImageBenchmark
else:
benchmark_cls = TextToImageBenchmark
benchmark_pipe = benchmark_cls(args)
benchmark_pipe.benchmark(args)

View File

@@ -0,0 +1,72 @@
import glob
import sys
import pandas as pd
from huggingface_hub import hf_hub_download, upload_file
from huggingface_hub.utils._errors import EntryNotFoundError
sys.path.append(".")
from utils import BASE_PATH, FINAL_CSV_FILE, GITHUB_SHA, REPO_ID, collate_csv # noqa: E402
def has_previous_benchmark() -> str:
csv_path = None
try:
csv_path = hf_hub_download(repo_id=REPO_ID, repo_type="dataset", filename=FINAL_CSV_FILE)
except EntryNotFoundError:
csv_path = None
return csv_path
def filter_float(value):
if isinstance(value, str):
return float(value.split()[0])
return value
def push_to_hf_dataset():
all_csvs = sorted(glob.glob(f"{BASE_PATH}/*.csv"))
collate_csv(all_csvs, FINAL_CSV_FILE)
# If there's an existing benchmark file, we should report the changes.
csv_path = has_previous_benchmark()
if csv_path is not None:
current_results = pd.read_csv(FINAL_CSV_FILE)
previous_results = pd.read_csv(csv_path)
numeric_columns = current_results.select_dtypes(include=["float64", "int64"]).columns
numeric_columns = [
c for c in numeric_columns if c not in ["batch_size", "num_inference_steps", "actual_gpu_memory (gbs)"]
]
for column in numeric_columns:
previous_results[column] = previous_results[column].map(lambda x: filter_float(x))
# Calculate the percentage change
current_results[column] = current_results[column].astype(float)
previous_results[column] = previous_results[column].astype(float)
percent_change = ((current_results[column] - previous_results[column]) / previous_results[column]) * 100
# Format the values with '+' or '-' sign and append to original values
current_results[column] = current_results[column].map(str) + percent_change.map(
lambda x: f" ({'+' if x > 0 else ''}{x:.2f}%)"
)
# There might be newly added rows. So, filter out the NaNs.
current_results[column] = current_results[column].map(lambda x: x.replace(" (nan%)", ""))
# Overwrite the current result file.
current_results.to_csv(FINAL_CSV_FILE, index=False)
commit_message = f"upload from sha: {GITHUB_SHA}" if GITHUB_SHA is not None else "upload benchmark results"
upload_file(
repo_id=REPO_ID,
path_in_repo=FINAL_CSV_FILE,
path_or_fileobj=FINAL_CSV_FILE,
repo_type="dataset",
commit_message=commit_message,
)
if __name__ == "__main__":
push_to_hf_dataset()

97
benchmarks/run_all.py Normal file
View File

@@ -0,0 +1,97 @@
import glob
import subprocess
import sys
from typing import List
sys.path.append(".")
from benchmark_text_to_image import ALL_T2I_CKPTS # noqa: E402
PATTERN = "benchmark_*.py"
class SubprocessCallException(Exception):
pass
# Taken from `test_examples_utils.py`
def run_command(command: List[str], return_stdout=False):
"""
Runs `command` with `subprocess.check_output` and will potentially return the `stdout`. Will also properly capture
if an error occurred while running `command`
"""
try:
output = subprocess.check_output(command, stderr=subprocess.STDOUT)
if return_stdout:
if hasattr(output, "decode"):
output = output.decode("utf-8")
return output
except subprocess.CalledProcessError as e:
raise SubprocessCallException(
f"Command `{' '.join(command)}` failed with the following error:\n\n{e.output.decode()}"
) from e
def main():
python_files = glob.glob(PATTERN)
for file in python_files:
print(f"****** Running file: {file} ******")
# Run with canonical settings.
if file != "benchmark_text_to_image.py":
command = f"python {file}"
run_command(command.split())
command += " --run_compile"
run_command(command.split())
# Run variants.
for file in python_files:
if file == "benchmark_text_to_image.py":
for ckpt in ALL_T2I_CKPTS:
command = f"python {file} --ckpt {ckpt}"
if "turbo" in ckpt:
command += " --num_inference_steps 1"
run_command(command.split())
command += " --run_compile"
run_command(command.split())
elif file == "benchmark_sd_img.py":
for ckpt in ["stabilityai/stable-diffusion-xl-refiner-1.0", "stabilityai/sdxl-turbo"]:
command = f"python {file} --ckpt {ckpt}"
if ckpt == "stabilityai/sdxl-turbo":
command += " --num_inference_steps 2"
run_command(command.split())
command += " --run_compile"
run_command(command.split())
elif file == "benchmark_sd_inpainting.py":
sdxl_ckpt = "stabilityai/stable-diffusion-xl-base-1.0"
command = f"python {file} --ckpt {sdxl_ckpt}"
run_command(command.split())
command += " --run_compile"
run_command(command.split())
elif file in ["benchmark_controlnet.py", "benchmark_t2i_adapter.py"]:
sdxl_ckpt = (
"diffusers/controlnet-canny-sdxl-1.0"
if "controlnet" in file
else "TencentARC/t2i-adapter-canny-sdxl-1.0"
)
command = f"python {file} --ckpt {sdxl_ckpt}"
run_command(command.split())
command += " --run_compile"
run_command(command.split())
if __name__ == "__main__":
main()

98
benchmarks/utils.py Normal file
View File

@@ -0,0 +1,98 @@
import argparse
import csv
import gc
import os
from dataclasses import dataclass
from typing import Dict, List, Union
import torch
import torch.utils.benchmark as benchmark
GITHUB_SHA = os.getenv("GITHUB_SHA", None)
BENCHMARK_FIELDS = [
"pipeline_cls",
"ckpt_id",
"batch_size",
"num_inference_steps",
"model_cpu_offload",
"run_compile",
"time (secs)",
"memory (gbs)",
"actual_gpu_memory (gbs)",
"github_sha",
]
PROMPT = "ghibli style, a fantasy landscape with castles"
BASE_PATH = os.getenv("BASE_PATH", ".")
TOTAL_GPU_MEMORY = float(os.getenv("TOTAL_GPU_MEMORY", torch.cuda.get_device_properties(0).total_memory / (1024**3)))
REPO_ID = "diffusers/benchmarks"
FINAL_CSV_FILE = "collated_results.csv"
@dataclass
class BenchmarkInfo:
time: float
memory: float
def flush():
"""Wipes off memory."""
gc.collect()
torch.cuda.empty_cache()
torch.cuda.reset_max_memory_allocated()
torch.cuda.reset_peak_memory_stats()
def bytes_to_giga_bytes(bytes):
return f"{(bytes / 1024 / 1024 / 1024):.3f}"
def benchmark_fn(f, *args, **kwargs):
t0 = benchmark.Timer(
stmt="f(*args, **kwargs)",
globals={"args": args, "kwargs": kwargs, "f": f},
num_threads=torch.get_num_threads(),
)
return f"{(t0.blocked_autorange().mean):.3f}"
def generate_csv_dict(
pipeline_cls: str, ckpt: str, args: argparse.Namespace, benchmark_info: BenchmarkInfo
) -> Dict[str, Union[str, bool, float]]:
"""Packs benchmarking data into a dictionary for latter serialization."""
data_dict = {
"pipeline_cls": pipeline_cls,
"ckpt_id": ckpt,
"batch_size": args.batch_size,
"num_inference_steps": args.num_inference_steps,
"model_cpu_offload": args.model_cpu_offload,
"run_compile": args.run_compile,
"time (secs)": benchmark_info.time,
"memory (gbs)": benchmark_info.memory,
"actual_gpu_memory (gbs)": f"{(TOTAL_GPU_MEMORY):.3f}",
"github_sha": GITHUB_SHA,
}
return data_dict
def write_to_csv(file_name: str, data_dict: Dict[str, Union[str, bool, float]]):
"""Serializes a dictionary into a CSV file."""
with open(file_name, mode="w", newline="") as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=BENCHMARK_FIELDS)
writer.writeheader()
writer.writerow(data_dict)
def collate_csv(input_files: List[str], output_file: str):
"""Collates multiple identically structured CSVs into a single CSV file."""
with open(output_file, mode="w", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=BENCHMARK_FIELDS)
writer.writeheader()
for file in input_files:
with open(file, mode="r") as infile:
reader = csv.DictReader(infile)
for row in reader:
writer.writerow(row)

View File

@@ -40,7 +40,6 @@ RUN python3.9 -m pip install --no-cache-dir --upgrade pip && \
numpy \
scipy \
tensorboard \
transformers \
omegaconf
transformers
CMD ["/bin/bash"]

View File

@@ -40,7 +40,6 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
scipy \
tensorboard \
transformers \
omegaconf \
pytorch-lightning
CMD ["/bin/bash"]

View File

@@ -40,7 +40,6 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
scipy \
tensorboard \
transformers \
omegaconf \
xformers
CMD ["/bin/bash"]

View File

@@ -1,10 +1,22 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
### Translating the Diffusers documentation into your language
As part of our mission to democratize machine learning, we'd love to make the Diffusers library available in many more languages! Follow the steps below if you want to help translate the documentation into your language 🙏.
**🗞️ Open an issue**
To get started, navigate to the [Issues](https://github.com/huggingface/diffusers/issues) page of this repo and check if anyone else has opened an issue for your language. If not, open a new issue by selecting the "Translation template" from the "New issue" button.
To get started, navigate to the [Issues](https://github.com/huggingface/diffusers/issues) page of this repo and check if anyone else has opened an issue for your language. If not, open a new issue by selecting the "🌐 Translating a New Language?" from the "New issue" button.
Once an issue exists, post a comment to indicate which chapters you'd like to work on, and we'll add your name to the list.
@@ -16,7 +28,7 @@ First, you'll need to [fork the Diffusers repo](https://docs.github.com/en/get-s
Once you've forked the repo, you'll want to get the files on your local machine for editing. You can do that by cloning the fork with Git as follows:
```bash
git clone https://github.com/YOUR-USERNAME/diffusers.git
git clone https://github.com/<YOUR-USERNAME>/diffusers.git
```
**📋 Copy-paste the English version with a new language code**
@@ -29,10 +41,10 @@ You'll only need to copy the files in the [`docs/source/en`](https://github.com/
```bash
cd ~/path/to/diffusers/docs
cp -r source/en source/LANG-ID
cp -r source/en source/<LANG-ID>
```
Here, `LANG-ID` should be one of the ISO 639-1 or ISO 639-2 language codes -- see [here](https://www.loc.gov/standards/iso639-2/php/code_list.php) for a handy table.
Here, `<LANG-ID>` should be one of the ISO 639-1 or ISO 639-2 language codes -- see [here](https://www.loc.gov/standards/iso639-2/php/code_list.php) for a handy table.
**✍️ Start translating**
@@ -40,7 +52,7 @@ The fun part comes - translating the text!
The first thing we recommend is translating the part of the `_toctree.yml` file that corresponds to your doc chapter. This file is used to render the table of contents on the website.
> 🙋 If the `_toctree.yml` file doesn't yet exist for your language, you can create one by copy-pasting from the English version and deleting the sections unrelated to your chapter. Just make sure it exists in the `docs/source/LANG-ID/` directory!
> 🙋 If the `_toctree.yml` file doesn't yet exist for your language, you can create one by copy-pasting from the English version and deleting the sections unrelated to your chapter. Just make sure it exists in the `docs/source/<LANG-ID>/` directory!
The fields you should add are `local` (with the name of the file containing the translation; e.g. `autoclass_tutorial`), and `title` (with the title of the doc in your language; e.g. `Load pretrained instances with an AutoClass`) -- as a reference, here is the `_toctree.yml` for [English](https://github.com/huggingface/diffusers/blob/main/docs/source/en/_toctree.yml):

View File

@@ -19,6 +19,8 @@
title: Train a diffusion model
- local: tutorials/using_peft_for_inference
title: Inference with PEFT
- local: tutorials/fast_diffusion
title: Accelerate inference of text-to-image diffusion models
title: Tutorials
- sections:
- sections:
@@ -72,26 +74,32 @@
title: Overview
- local: using-diffusers/sdxl
title: Stable Diffusion XL
- local: using-diffusers/lcm
title: Latent Consistency Models
- local: using-diffusers/sdxl_turbo
title: SDXL Turbo
- local: using-diffusers/kandinsky
title: Kandinsky
- local: using-diffusers/controlnet
title: ControlNet
- local: using-diffusers/callback
title: Callback
- local: using-diffusers/shap-e
title: Shap-E
- local: using-diffusers/diffedit
title: DiffEdit
- local: using-diffusers/distilled_sd
title: Distilled Stable Diffusion inference
- local: using-diffusers/callback
title: Pipeline callbacks
- local: using-diffusers/reproducibility
title: Create reproducible pipelines
- local: using-diffusers/custom_pipeline_examples
title: Community pipelines
- local: using-diffusers/contribute_pipeline
title: Contribute a community pipeline
- local: using-diffusers/inference_with_lcm_lora
title: Latent Consistency Model-LoRA
- local: using-diffusers/inference_with_lcm
title: Latent Consistency Model
- local: using-diffusers/svd
title: Stable Video Diffusion
title: Specific pipeline examples
- sections:
- local: training/overview
@@ -100,26 +108,38 @@
title: Create a dataset for training
- local: training/adapt_a_model
title: Adapt a model to a new task
- local: training/unconditional_training
title: Unconditional image generation
- local: training/text_inversion
title: Textual Inversion
- local: training/dreambooth
title: DreamBooth
- local: training/text2image
title: Text-to-image
- local: training/lora
title: Low-Rank Adaptation of Large Language Models (LoRA)
- local: training/controlnet
title: ControlNet
- local: training/instructpix2pix
title: InstructPix2Pix Training
- local: training/custom_diffusion
title: Custom Diffusion
- local: training/t2i_adapters
title: T2I-Adapters
- local: training/ddpo
title: Reinforcement learning training with DDPO
- sections:
- local: training/unconditional_training
title: Unconditional image generation
- local: training/text2image
title: Text-to-image
- local: training/sdxl
title: Stable Diffusion XL
- local: training/kandinsky
title: Kandinsky 2.2
- local: training/wuerstchen
title: Wuerstchen
- local: training/controlnet
title: ControlNet
- local: training/t2i_adapters
title: T2I-Adapters
- local: training/instructpix2pix
title: InstructPix2Pix
title: Models
- sections:
- local: training/text_inversion
title: Textual Inversion
- local: training/dreambooth
title: DreamBooth
- local: training/lora
title: LoRA
- local: training/custom_diffusion
title: Custom Diffusion
- local: training/lcm_distill
title: Latent Consistency Distillation
- local: training/ddpo
title: Reinforcement learning training with DDPO
title: Methods
title: Training
- sections:
- local: using-diffusers/other-modalities
@@ -140,6 +160,8 @@
title: xFormers
- local: optimization/tome
title: Token merging
- local: optimization/deepcache
title: DeepCache
title: General optimizations
- sections:
- local: using-diffusers/stable_diffusion_jax_how_to
@@ -174,13 +196,25 @@
- sections:
- local: api/configuration
title: Configuration
- local: api/loaders
title: Loaders
- local: api/logging
title: Logging
- local: api/outputs
title: Outputs
title: Main Classes
- sections:
- local: api/loaders/ip_adapter
title: IP-Adapter
- local: api/loaders/lora
title: LoRA
- local: api/loaders/single_file
title: Single files
- local: api/loaders/textual_inversion
title: Textual Inversion
- local: api/loaders/unet
title: UNet
- local: api/loaders/peft
title: PEFT
title: Loaders
- sections:
- local: api/models/overview
title: Overview
@@ -216,14 +250,12 @@
- sections:
- local: api/pipelines/overview
title: Overview
- local: api/pipelines/alt_diffusion
title: AltDiffusion
- local: api/pipelines/amused
title: aMUSEd
- local: api/pipelines/animatediff
title: AnimateDiff
- local: api/pipelines/attend_and_excite
title: Attend-and-Excite
- local: api/pipelines/audio_diffusion
title: Audio Diffusion
- local: api/pipelines/audioldm
title: AudioLDM
- local: api/pipelines/audioldm2
@@ -231,15 +263,13 @@
- local: api/pipelines/auto_pipeline
title: AutoPipeline
- local: api/pipelines/blip_diffusion
title: BLIP Diffusion
title: BLIP-Diffusion
- local: api/pipelines/consistency_models
title: Consistency Models
- local: api/pipelines/controlnet
title: ControlNet
- local: api/pipelines/controlnet_sdxl
title: ControlNet with Stable Diffusion XL
- local: api/pipelines/cycle_diffusion
title: Cycle Diffusion
- local: api/pipelines/dance_diffusion
title: Dance Diffusion
- local: api/pipelines/ddim
@@ -258,6 +288,8 @@
title: Kandinsky 2.1
- local: api/pipelines/kandinsky_v22
title: Kandinsky 2.2
- local: api/pipelines/kandinsky3
title: Kandinsky 3
- local: api/pipelines/latent_consistency_models
title: Latent Consistency Models
- local: api/pipelines/latent_diffusion
@@ -267,27 +299,15 @@
- local: api/pipelines/musicldm
title: MusicLDM
- local: api/pipelines/paint_by_example
title: Paint By Example
- local: api/pipelines/paradigms
title: Parallel Sampling of Diffusion Models
- local: api/pipelines/pix2pix_zero
title: Pix2Pix Zero
title: Paint by Example
- local: api/pipelines/pixart
title: PixArt
- local: api/pipelines/pndm
title: PNDM
- local: api/pipelines/repaint
title: RePaint
- local: api/pipelines/score_sde_ve
title: Score SDE VE
title: PixArt-α
- local: api/pipelines/self_attention_guidance
title: Self-Attention Guidance
- local: api/pipelines/semantic_stable_diffusion
title: Semantic Guidance
- local: api/pipelines/shap_e
title: Shap-E
- local: api/pipelines/spectrogram_diffusion
title: Spectrogram Diffusion
- sections:
- local: api/pipelines/stable_diffusion/overview
title: Overview
@@ -307,12 +327,16 @@
title: Stable Diffusion 2
- local: api/pipelines/stable_diffusion/stable_diffusion_xl
title: Stable Diffusion XL
- local: api/pipelines/stable_diffusion/sdxl_turbo
title: SDXL Turbo
- local: api/pipelines/stable_diffusion/latent_upscale
title: Latent upscaler
- local: api/pipelines/stable_diffusion/upscale
title: Super-resolution
- local: api/pipelines/stable_diffusion/k_diffusion
title: K-Diffusion
- local: api/pipelines/stable_diffusion/ldm3d_diffusion
title: LDM3D Text-to-(RGB, Depth)
title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
- local: api/pipelines/stable_diffusion/adapter
title: Stable Diffusion T2I-Adapter
- local: api/pipelines/stable_diffusion/gligen
@@ -320,26 +344,16 @@
title: Stable Diffusion
- local: api/pipelines/stable_unclip
title: Stable unCLIP
- local: api/pipelines/stochastic_karras_ve
title: Stochastic Karras VE
- local: api/pipelines/model_editing
title: Text-to-image model editing
- local: api/pipelines/text_to_video
title: Text-to-video
- local: api/pipelines/text_to_video_zero
title: Text2Video-Zero
- local: api/pipelines/unclip
title: unCLIP
- local: api/pipelines/latent_diffusion_uncond
title: Unconditional Latent Diffusion
- local: api/pipelines/unidiffuser
title: UniDiffuser
- local: api/pipelines/value_guided_sampling
title: Value-guided sampling
- local: api/pipelines/versatile_diffusion
title: Versatile Diffusion
- local: api/pipelines/vq_diffusion
title: VQ Diffusion
- local: api/pipelines/wuerstchen
title: Wuerstchen
title: Pipelines

View File

@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Activation functions
Customized activation functions for supporting various models in 🤗 Diffusers.
@@ -12,4 +24,4 @@ Customized activation functions for supporting various models in 🤗 Diffusers.
## ApproximateGELU
[[autodoc]] models.activations.ApproximateGELU
[[autodoc]] models.activations.ApproximateGELU

View File

@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Attention Processor
An attention processor is a class for applying different types of attention mechanisms.
@@ -8,6 +20,9 @@ An attention processor is a class for applying different types of attention mech
## AttnProcessor2_0
[[autodoc]] models.attention_processor.AttnProcessor2_0
## FusedAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
## LoRAAttnProcessor
[[autodoc]] models.attention_processor.LoRAAttnProcessor

View File

@@ -12,9 +12,9 @@ specific language governing permissions and limitations under the License.
# VAE Image Processor
The [`VaeImageProcessor`] provides a unified API for [`StableDiffusionPipeline`]'s to prepare image inputs for VAE encoding and post-processing outputs once they're decoded. This includes transformations such as resizing, normalization, and conversion between PIL Image, PyTorch, and NumPy arrays.
The [`VaeImageProcessor`] provides a unified API for [`StableDiffusionPipeline`]s to prepare image inputs for VAE encoding and post-processing outputs once they're decoded. This includes transformations such as resizing, normalization, and conversion between PIL Image, PyTorch, and NumPy arrays.
All pipelines with [`VaeImageProcessor`] accepts PIL Image, PyTorch tensor, or NumPy arrays as image inputs and returns outputs based on the `output_type` argument by the user. You can pass encoded image latents directly to the pipeline and return latents from the pipeline as a specific output with the `output_type` argument (for example `output_type="pt"`). This allows you to take the generated latents from one pipeline and pass it to another pipeline as input without leaving the latent space. It also makes it much easier to use multiple pipelines together by passing PyTorch tensors directly between different pipelines.
All pipelines with [`VaeImageProcessor`] accept PIL Image, PyTorch tensor, or NumPy arrays as image inputs and return outputs based on the `output_type` argument by the user. You can pass encoded image latents directly to the pipeline and return latents from the pipeline as a specific output with the `output_type` argument (for example `output_type="latent"`). This allows you to take the generated latents from one pipeline and pass it to another pipeline as input without leaving the latent space. It also makes it much easier to use multiple pipelines together by passing PyTorch tensors directly between different pipelines.
## VaeImageProcessor
@@ -24,4 +24,4 @@ All pipelines with [`VaeImageProcessor`] accepts PIL Image, PyTorch tensor, or N
The [`VaeImageProcessorLDM3D`] accepts RGB and depth inputs and returns RGB and depth outputs.
[[autodoc]] image_processor.VaeImageProcessorLDM3D
[[autodoc]] image_processor.VaeImageProcessorLDM3D

View File

@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Overview
The APIs in this section are more experimental and prone to breaking changes. Most of them are used internally for development, but they may also be useful to you if you're interested in building a diffusion model with some custom parts or if you're interested in some of our helper utilities for working with 🤗 Diffusers.

View File

@@ -1,49 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Loaders
Adapters (textual inversion, LoRA, hypernetworks) allow you to modify a diffusion model to generate images in a specific style without training or finetuning the entire model. The adapter weights are typically only a tiny fraction of the pretrained model's which making them very portable. 🤗 Diffusers provides an easy-to-use `LoaderMixin` API to load adapter weights.
<Tip warning={true}>
🧪 The `LoaderMixins` are highly experimental and prone to future changes. To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`.
</Tip>
## UNet2DConditionLoadersMixin
[[autodoc]] loaders.UNet2DConditionLoadersMixin
## TextualInversionLoaderMixin
[[autodoc]] loaders.TextualInversionLoaderMixin
## StableDiffusionXLLoraLoaderMixin
[[autodoc]] loaders.StableDiffusionXLLoraLoaderMixin
## LoraLoaderMixin
[[autodoc]] loaders.LoraLoaderMixin
## FromSingleFileMixin
[[autodoc]] loaders.FromSingleFileMixin
## FromOriginalControlnetMixin
[[autodoc]] loaders.FromOriginalControlnetMixin
## FromOriginalVAEMixin
[[autodoc]] loaders.FromOriginalVAEMixin

View File

@@ -0,0 +1,25 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# IP-Adapter
[IP-Adapter](https://hf.co/papers/2308.06721) is a lightweight adapter that enables prompting a diffusion model with an image. This method decouples the cross-attention layers of the image and text features. The image features are generated from an image encoder. Files generated from IP-Adapter are only ~100MBs.
<Tip>
Learn how to load an IP-Adapter checkpoint and image in the [IP-Adapter](../../using-diffusers/loading_adapters#ip-adapter) loading guide.
</Tip>
## IPAdapterMixin
[[autodoc]] loaders.ip_adapter.IPAdapterMixin

View File

@@ -0,0 +1,32 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# LoRA
LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. This produces a smaller file (~100 MBs) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the UNet, text encoder or both. There are two classes for loading LoRA weights:
- [`LoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`LoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
<Tip>
To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
</Tip>
## LoraLoaderMixin
[[autodoc]] loaders.lora.LoraLoaderMixin
## StableDiffusionXLLoraLoaderMixin
[[autodoc]] loaders.lora.StableDiffusionXLLoraLoaderMixin

View File

@@ -0,0 +1,25 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# PEFT
Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`] to load an adapter.
<Tip>
Refer to the [Inference with PEFT](../../tutorials/using_peft_for_inference.md) tutorial for an overview of how to use PEFT in Diffusers for inference.
</Tip>
## PeftAdapterMixin
[[autodoc]] loaders.peft.PeftAdapterMixin

View File

@@ -0,0 +1,37 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Single files
Diffusers supports loading pretrained pipeline (or model) weights stored in a single file, such as a `ckpt` or `safetensors` file. These single file types are typically produced from community trained models. There are three classes for loading single file weights:
- [`FromSingleFileMixin`] supports loading pretrained pipeline weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
- [`FromOriginalVAEMixin`] supports loading a pretrained [`AutoencoderKL`] from pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
- [`FromOriginalControlnetMixin`] supports loading pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
<Tip>
To learn more about how to load single file weights, see the [Load different Stable Diffusion formats](../../using-diffusers/other-formats) loading guide.
</Tip>
## FromSingleFileMixin
[[autodoc]] loaders.single_file.FromSingleFileMixin
## FromOriginalVAEMixin
[[autodoc]] loaders.single_file.FromOriginalVAEMixin
## FromOriginalControlnetMixin
[[autodoc]] loaders.single_file.FromOriginalControlnetMixin

View File

@@ -0,0 +1,27 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Textual Inversion
Textual Inversion is a training method for personalizing models by learning new text embeddings from a few example images. The file produced from training is extremely small (a few KBs) and the new embeddings can be loaded into the text encoder.
[`TextualInversionLoaderMixin`] provides a function for loading Textual Inversion embeddings from Diffusers and Automatic1111 into the text encoder and loading a special token to activate the embeddings.
<Tip>
To learn more about how to load Textual Inversion embeddings, see the [Textual Inversion](../../using-diffusers/loading_adapters#textual-inversion) loading guide.
</Tip>
## TextualInversionLoaderMixin
[[autodoc]] loaders.textual_inversion.TextualInversionLoaderMixin

View File

@@ -0,0 +1,27 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# UNet
Some training methods - like LoRA and Custom Diffusion - typically target the UNet's attention layers, but these training methods can also target other non-attention layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.LoraLoaderMixin.load_lora_weights`] function instead.
The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters.
<Tip>
To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
</Tip>
## UNet2DConditionLoadersMixin
[[autodoc]] loaders.unet.UNet2DConditionLoadersMixin

View File

@@ -51,7 +51,7 @@ logger.warning("WARN")
All methods of the logging module are documented below. The main methods are
[`logging.get_verbosity`] to get the current level of verbosity in the logger and
[`logging.set_verbosity`] to set the verbosity to the level of your choice.
[`logging.set_verbosity`] to set the verbosity to the level of your choice.
In order from the least verbose to the most verbose:

View File

@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# AsymmetricAutoencoderKL
Improved larger variational autoencoder (VAE) model with KL loss for inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://arxiv.org/abs/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.
@@ -6,7 +18,7 @@ The abstract from the paper is:
*StableDiffusion is a revolutionary text-to-image generator that is causing a stir in the world of image generation and editing. Unlike traditional methods that learn a diffusion model in pixel space, StableDiffusion learns a diffusion model in the latent space via a VQGAN, ensuring both efficiency and quality. It not only supports image generation tasks, but also enables image editing for real images, such as image inpainting and local editing. However, we have observed that the vanilla VQGAN used in StableDiffusion leads to significant information loss, causing distortion artifacts even in non-edited image regions. To this end, we propose a new asymmetric VQGAN with two simple designs. Firstly, in addition to the input from the encoder, the decoder contains a conditional branch that incorporates information from task-specific priors, such as the unmasked image region in inpainting. Secondly, the decoder is much heavier than the encoder, allowing for more detailed recovery while only slightly increasing the total inference cost. The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged. Our asymmetric VQGAN can be widely used in StableDiffusion-based inpainting and local editing methods. Extensive experiments demonstrate that it can significantly improve the inpainting and editing performance, while maintaining the original text-to-image capability. The code is available at https://github.com/buxiangzhiren/Asymmetric_VQGAN*
Evaluation results can be found in section 4.1 of the original paper.
Evaluation results can be found in section 4.1 of the original paper.
## Available checkpoints
@@ -16,40 +28,33 @@ Evaluation results can be found in section 4.1 of the original paper.
## Example Usage
```python
from io import BytesIO
from PIL import Image
import requests
from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline
from diffusers.utils import load_image, make_image_grid
def download_image(url: str) -> Image.Image:
response = requests.get(url)
return Image.open(BytesIO(response.content)).convert("RGB")
prompt = "a photo of a person"
prompt = "a photo of a person with beard"
img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"
image = download_image(img_url).resize((256, 256))
mask_image = download_image(mask_url).resize((256, 256))
original_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.vae = AsymmetricAutoencoderKL.from_pretrained("cross-attention/asymmetric-autoencoder-kl-x-1-5")
pipe.to("cuda")
image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
image.save("image.jpeg")
image = pipe(prompt=prompt, image=original_image, mask_image=mask_image).images[0]
make_image_grid([original_image, mask_image, image], rows=1, cols=3)
```
## AsymmetricAutoencoderKL
[[autodoc]] models.autoencoder_asym_kl.AsymmetricAutoencoderKL
[[autodoc]] models.autoencoders.autoencoder_asym_kl.AsymmetricAutoencoderKL
## AutoencoderKLOutput
[[autodoc]] models.autoencoder_kl.AutoencoderKLOutput
[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
## DecoderOutput
[[autodoc]] models.vae.DecoderOutput
[[autodoc]] models.autoencoders.vae.DecoderOutput

View File

@@ -1,6 +1,18 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Tiny AutoEncoder
Tiny AutoEncoder for Stable Diffusion (TAESD) was introduced in [madebyollin/taesd](https://github.com/madebyollin/taesd) by Ollin Boer Bohan. It is a tiny distilled version of Stable Diffusion's VAE that can quickly decode the latents in a [`StableDiffusionPipeline`] or [`StableDiffusionXLPipeline`] almost instantly.
Tiny AutoEncoder for Stable Diffusion (TAESD) was introduced in [madebyollin/taesd](https://github.com/madebyollin/taesd) by Ollin Boer Bohan. It is a tiny distilled version of Stable Diffusion's VAE that can quickly decode the latents in a [`StableDiffusionPipeline`] or [`StableDiffusionXLPipeline`] almost instantly.
To use with Stable Diffusion v-2.1:
@@ -16,7 +28,7 @@ pipe = pipe.to("cuda")
prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("cheesecake.png")
image
```
To use with Stable Diffusion XL 1.0
@@ -33,7 +45,7 @@ pipe = pipe.to("cuda")
prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("cheesecake_sdxl.png")
image
```
## AutoencoderTiny
@@ -42,4 +54,4 @@ image.save("cheesecake_sdxl.png")
## AutoencoderTinyOutput
[[autodoc]] models.autoencoder_tiny.AutoencoderTinyOutput
[[autodoc]] models.autoencoders.autoencoder_tiny.AutoencoderTinyOutput

View File

@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# AutoencoderKL
The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.
@@ -14,21 +26,24 @@ from the original format using [`FromOriginalVAEMixin.from_single_file`] as foll
```py
from diffusers import AutoencoderKL
url = "https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors" # can also be local file
url = "https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors" # can also be a local file
model = AutoencoderKL.from_single_file(url)
```
## AutoencoderKL
[[autodoc]] AutoencoderKL
- decode
- encode
- all
## AutoencoderKLOutput
[[autodoc]] models.autoencoder_kl.AutoencoderKLOutput
[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
## DecoderOutput
[[autodoc]] models.vae.DecoderOutput
[[autodoc]] models.autoencoders.vae.DecoderOutput
## FlaxAutoencoderKL

View File

@@ -1,10 +1,22 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ControlNet
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang and Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
The abstract from the paper is:
*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.*
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
## Loading from the original format

View File

@@ -1,8 +1,20 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Models
🤗 Diffusers provides pretrained models for popular algorithms and modules to create custom diffusion systems. The primary function of models is to denoise an input sample as modeled by the distribution \\(p_{\theta}(x_{t-1}|x_{t})\\).
🤗 Diffusers provides pretrained models for popular algorithms and modules to create custom diffusion systems. The primary function of models is to denoise an input sample as modeled by the distribution \\(p_{\theta}(x_{t-1}|x_{t})\\).
All models are built from the base [`ModelMixin`] class which is a [`torch.nn.module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) providing basic functionality for saving and loading models, locally and from the Hugging Face Hub.
All models are built from the base [`ModelMixin`] class which is a [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) providing basic functionality for saving and loading models, locally and from the Hugging Face Hub.
## ModelMixin
[[autodoc]] ModelMixin
@@ -13,4 +25,4 @@ All models are built from the base [`ModelMixin`] class which is a [`torch.nn.mo
## PushToHubMixin
[[autodoc]] utils.PushToHubMixin
[[autodoc]] utils.PushToHubMixin

View File

@@ -1,7 +1,18 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Prior Transformer
The Prior Transformer was originally introduced in [Hierarchical Text-Conditional Image Generation with CLIP Latents
](https://huggingface.co/papers/2204.06125) by Ramesh et al. It is used to predict CLIP image embeddings from CLIP text embeddings; image embeddings are predicted through a denoising diffusion process.
The Prior Transformer was originally introduced in [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://huggingface.co/papers/2204.06125) by Ramesh et al. It is used to predict CLIP image embeddings from CLIP text embeddings; image embeddings are predicted through a denoising diffusion process.
The abstract from the paper is:
@@ -13,4 +24,4 @@ The abstract from the paper is:
## PriorTransformerOutput
[[autodoc]] models.prior_transformer.PriorTransformerOutput
[[autodoc]] models.prior_transformer.PriorTransformerOutput

View File

@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Transformer2D
A Transformer model for image-like data from [CompVis](https://huggingface.co/CompVis) that is based on the [Vision Transformer](https://huggingface.co/papers/2010.11929) introduced by Dosovitskiy et al. The [`Transformer2DModel`] accepts discrete (classes of vector embeddings) or continuous (actual embeddings) inputs.

View File

@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Transformer Temporal
A Transformer model for video-like data.
@@ -8,4 +20,4 @@ A Transformer model for video-like data.
## TransformerTemporalModelOutput
[[autodoc]] models.transformer_temporal.TransformerTemporalModelOutput
[[autodoc]] models.transformer_temporal.TransformerTemporalModelOutput

View File

@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# UNetMotionModel
The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 2D UNet model.

View File

@@ -1,6 +1,18 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# UNet1DModel
The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 1D UNet model.
The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 1D UNet model.
The abstract from the paper is:
@@ -10,4 +22,4 @@ The abstract from the paper is:
[[autodoc]] UNet1DModel
## UNet1DOutput
[[autodoc]] models.unet_1d.UNet1DOutput
[[autodoc]] models.unet_1d.UNet1DOutput

View File

@@ -1,6 +1,18 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# UNet2DConditionModel
The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 2D UNet conditional model.
The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 2D UNet conditional model.
The abstract from the paper is:
@@ -16,4 +28,4 @@ The abstract from the paper is:
[[autodoc]] models.unet_2d_condition_flax.FlaxUNet2DConditionModel
## FlaxUNet2DConditionOutput
[[autodoc]] models.unet_2d_condition_flax.FlaxUNet2DConditionOutput
[[autodoc]] models.unet_2d_condition_flax.FlaxUNet2DConditionOutput

View File

@@ -1,6 +1,18 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# UNet2DModel
The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 2D UNet model.
The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 2D UNet model.
The abstract from the paper is:
@@ -10,4 +22,4 @@ The abstract from the paper is:
[[autodoc]] UNet2DModel
## UNet2DOutput
[[autodoc]] models.unet_2d.UNet2DOutput
[[autodoc]] models.unet_2d.UNet2DOutput

View File

@@ -1,6 +1,18 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# UNet3DConditionModel
The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 3D UNet conditional model.
The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on it's number of dimensions and whether it is a conditional model or not. This is a 3D UNet conditional model.
The abstract from the paper is:
@@ -10,4 +22,4 @@ The abstract from the paper is:
[[autodoc]] UNet3DConditionModel
## UNet3DConditionOutput
[[autodoc]] models.unet_3d_condition.UNet3DConditionOutput
[[autodoc]] models.unet_3d_condition.UNet3DConditionOutput

View File

@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# VQModel
The VQ-VAE model was introduced in [Neural Discrete Representation Learning](https://huggingface.co/papers/1711.00937) by Aaron van den Oord, Oriol Vinyals and Koray Kavukcuoglu. The model is used in 🤗 Diffusers to decode latent representations into images. Unlike [`AutoencoderKL`], the [`VQModel`] works in a quantized latent space.
@@ -12,4 +24,4 @@ The abstract from the paper is:
## VQEncoderOutput
[[autodoc]] models.vq_model.VQEncoderOutput
[[autodoc]] models.vq_model.VQEncoderOutput

View File

@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Normalization layers
Customized normalization layers for supporting various models in 🤗 Diffusers.
@@ -10,6 +22,10 @@ Customized normalization layers for supporting various models in 🤗 Diffusers.
[[autodoc]] models.normalization.AdaLayerNormZero
## AdaLayerNormSingle
[[autodoc]] models.normalization.AdaLayerNormSingle
## AdaGroupNorm
[[autodoc]] models.normalization.AdaGroupNorm
[[autodoc]] models.normalization.AdaGroupNorm

View File

@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# Outputs
All models outputs are subclasses of [`~utils.BaseOutput`], data structures containing all the information returned by the model. The outputs can also be used as tuples or dictionaries.
All model outputs are subclasses of [`~utils.BaseOutput`], data structures containing all the information returned by the model. The outputs can also be used as tuples or dictionaries.
For example:
@@ -64,4 +64,4 @@ To check a specific pipeline or model output, refer to its corresponding API doc
## ImageTextPipelineOutput
[[autodoc]] ImageTextPipelineOutput
[[autodoc]] ImageTextPipelineOutput

View File

@@ -1,47 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# AltDiffusion
AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://huggingface.co/papers/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu.
The abstract from the paper is:
*In this work, we present a conceptually simple and effective method to train a strong bilingual multimodal representation model. Starting from the pretrained multimodal representation model CLIP released by OpenAI, we switched its text encoder with a pretrained multilingual text encoder XLM-R, and aligned both languages and image representations by a two-stage training schema consisting of teacher learning and contrastive learning. We validate our method through evaluations of a wide range of tasks. We set new state-of-the-art performances on a bunch of tasks including ImageNet-CN, Flicker30k- CN, and COCO-CN. Further, we obtain very close performances with CLIP on almost all tasks, suggesting that one can simply alter the text encoder in CLIP for extended capabilities such as multilingual understanding.*
## Tips
`AltDiffusion` is conceptually the same as [Stable Diffusion](./stable_diffusion/overview).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## AltDiffusionPipeline
[[autodoc]] AltDiffusionPipeline
- all
- __call__
## AltDiffusionImg2ImgPipeline
[[autodoc]] AltDiffusionImg2ImgPipeline
- all
- __call__
## AltDiffusionPipelineOutput
[[autodoc]] pipelines.alt_diffusion.AltDiffusionPipelineOutput
- all
- __call__

View File

@@ -0,0 +1,48 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# aMUSEd
aMUSEd was introduced in [aMUSEd: An Open MUSE Reproduction](https://huggingface.co/papers/2401.01808) by Suraj Patil, William Berman, Robin Rombach, and Patrick von Platen.
Amused is a lightweight text to image model based off of the [MUSE](https://arxiv.org/abs/2301.00704) architecture. Amused is particularly useful in applications that require a lightweight and fast model such as generating many images quickly at once.
Amused is a vqvae token based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with muse, it uses the smaller text encoder CLIP-L/14 instead of t5-xxl. Due to its small parameter count and few forward pass generation process, amused can generate many images quickly. This benefit is seen particularly at larger batch sizes.
The abstract from the paper is:
*We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpretable. Additionally, MIM can be fine-tuned to learn additional styles with only a single image. We hope to encourage further exploration of MIM by demonstrating its effectiveness on large-scale text-to-image generation and releasing reproducible training code. We also release checkpoints for two models which directly produce images at 256x256 and 512x512 resolutions.*
| Model | Params |
|-------|--------|
| [amused-256](https://huggingface.co/amused/amused-256) | 603M |
| [amused-512](https://huggingface.co/amused/amused-512) | 608M |
## AmusedPipeline
[[autodoc]] AmusedPipeline
- __call__
- all
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention
[[autodoc]] AmusedImg2ImgPipeline
- __call__
- all
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention
[[autodoc]] AmusedInpaintPipeline
- __call__
- all
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention

View File

@@ -14,11 +14,11 @@ specific language governing permissions and limitations under the License.
## Overview
[AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning](https://arxiv.org/abs/2307.04725) by Yuwei Guo, Ceyuan Yang*, Anyi Rao, Yaohui Wang, Yu Qiao, Dahua Lin, Bo Dai
[AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning](https://arxiv.org/abs/2307.04725) by Yuwei Guo, Ceyuan Yang, Anyi Rao, Yaohui Wang, Yu Qiao, Dahua Lin, Bo Dai.
The abstract of the paper is the following:
With the advance of text-to-image models (e.g., Stable Diffusion) and corresponding personalization techniques such as DreamBooth and LoRA, everyone can manifest their imagination into high-quality images at an affordable cost. Subsequently, there is a great demand for image animation techniques to further combine generated static images with motion dynamics. In this report, we propose a practical framework to animate most of the existing personalized text-to-image models once and for all, saving efforts in model-specific tuning. At the core of the proposed framework is to insert a newly initialized motion modeling module into the frozen text-to-image model and train it on video clips to distill reasonable motion priors. Once trained, by simply injecting this motion modeling module, all personalized versions derived from the same base T2I readily become text-driven models that produce diverse and personalized animated images. We conduct our evaluation on several public representative personalized text-to-image models across anime pictures and realistic photographs, and demonstrate that our proposed framework helps these models generate temporally smooth animation clips while preserving the domain and diversity of their outputs. Code and pre-trained weights will be publicly available at this https URL .
*With the advance of text-to-image models (e.g., Stable Diffusion) and corresponding personalization techniques such as DreamBooth and LoRA, everyone can manifest their imagination into high-quality images at an affordable cost. Subsequently, there is a great demand for image animation techniques to further combine generated static images with motion dynamics. In this report, we propose a practical framework to animate most of the existing personalized text-to-image models once and for all, saving efforts in model-specific tuning. At the core of the proposed framework is to insert a newly initialized motion modeling module into the frozen text-to-image model and train it on video clips to distill reasonable motion priors. Once trained, by simply injecting this motion modeling module, all personalized versions derived from the same base T2I readily become text-driven models that produce diverse and personalized animated images. We conduct our evaluation on several public representative personalized text-to-image models across anime pictures and realistic photographs, and demonstrate that our proposed framework helps these models generate temporally smooth animation clips while preserving the domain and diversity of their outputs. Code and pre-trained weights will be publicly available at [this https URL](https://animatediff.github.io/).*
## Available Pipelines
@@ -28,7 +28,7 @@ With the advance of text-to-image models (e.g., Stable Diffusion) and correspond
## Available checkpoints
Motion Adapter checkpoints can be found under [guoyww](https://huggingface.co/guoyww/). These checkpoints are meant to work with any model based on Stable Diffusion 1.4/1.5
Motion Adapter checkpoints can be found under [guoyww](https://huggingface.co/guoyww/). These checkpoints are meant to work with any model based on Stable Diffusion 1.4/1.5.
## Usage example
@@ -38,16 +38,21 @@ The following example demonstrates how to use a *MotionAdapter* checkpoint with
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif
# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16)
scheduler = DDIMScheduler.from_pretrained(
model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
model_id,
subfolder="scheduler",
clip_sample=False,
timestep_spacing="linspace",
beta_schedule="linear",
steps_offset=1,
)
pipe.scheduler = scheduler
@@ -70,6 +75,7 @@ output = pipe(
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
Here are some sample outputs:
@@ -88,7 +94,7 @@ Here are some sample outputs:
<Tip>
AnimateDiff tends to work better with finetuned Stable Diffusion models. If you plan on using a scheduler that can clip samples, make sure to disable it by setting `clip_sample=False` in the scheduler as this can also have an adverse effect on generated samples.
AnimateDiff tends to work better with finetuned Stable Diffusion models. If you plan on using a scheduler that can clip samples, make sure to disable it by setting `clip_sample=False` in the scheduler as this can also have an adverse effect on generated samples. Additionally, the AnimateDiff checkpoints can be sensitive to the beta schedule of the scheduler. We recommend setting this to `linear`.
</Tip>
@@ -98,18 +104,25 @@ Motion LoRAs are a collection of LoRAs that work with the `guoyww/animatediff-mo
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif
# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16)
pipe.load_lora_weights(
"guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out"
)
scheduler = DDIMScheduler.from_pretrained(
model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
model_id,
subfolder="scheduler",
clip_sample=False,
beta_schedule="linear",
timestep_spacing="linspace",
steps_offset=1,
)
pipe.scheduler = scheduler
@@ -132,6 +145,7 @@ output = pipe(
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
<table>
@@ -160,21 +174,30 @@ Then you can use the following code to combine Motion LoRAs.
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif
# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16)
pipe.load_lora_weights("diffusers/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")
pipe.load_lora_weights("diffusers/animatediff-motion-lora-pan-left", adapter_name="pan-left")
pipe.load_lora_weights(
"diffusers/animatediff-motion-lora-zoom-out", adapter_name="zoom-out",
)
pipe.load_lora_weights(
"diffusers/animatediff-motion-lora-pan-left", adapter_name="pan-left",
)
pipe.set_adapters(["zoom-out", "pan-left"], adapter_weights=[1.0, 1.0])
scheduler = DDIMScheduler.from_pretrained(
model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
model_id,
subfolder="scheduler",
clip_sample=False,
timestep_spacing="linspace",
beta_schedule="linear",
steps_offset=1,
)
pipe.scheduler = scheduler
@@ -197,6 +220,7 @@ output = pipe(
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
<table>
@@ -211,6 +235,67 @@ export_to_gif(frames, "animation.gif")
</tr>
</table>
## Using FreeInit
[FreeInit: Bridging Initialization Gap in Video Diffusion Models](https://arxiv.org/abs/2312.07537) by Tianxing Wu, Chenyang Si, Yuming Jiang, Ziqi Huang, Ziwei Liu.
FreeInit is an effective method that improves temporal consistency and overall quality of videos generated using video-diffusion-models without any addition training. It can be applied to AnimateDiff, ModelScope, VideoCrafter and various other video generation models seamlessly at inference time, and works by iteratively refining the latent-initialization noise. More details can be found it the paper.
The following example demonstrates the usage of FreeInit.
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16).to("cuda")
pipe.scheduler = DDIMScheduler.from_pretrained(
model_id,
subfolder="scheduler",
beta_schedule="linear",
clip_sample=False,
timestep_spacing="linspace",
steps_offset=1
)
# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()
# enable FreeInit
# Refer to the enable_free_init documentation for a full list of configurable parameters
pipe.enable_free_init(method="butterworth", use_fast_sampling=True)
# run inference
output = pipe(
prompt="a panda playing a guitar, on a boat, in the ocean, high quality",
negative_prompt="bad quality, worse quality",
num_frames=16,
guidance_scale=7.5,
num_inference_steps=20,
generator=torch.Generator("cpu").manual_seed(666),
)
# disable FreeInit
pipe.disable_free_init()
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
<Tip warning={true}>
FreeInit is not really free - the improved quality comes at the cost of extra computation. It requires sampling a few extra times depending on the `num_iters` parameter that is set when enabling it. Setting the `use_fast_sampling` parameter to `True` can improve the overall performance (at the cost of lower quality compared to when `use_fast_sampling=False` but still better results than vanilla video generation models).
</Tip>
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## AnimateDiffPipeline
@@ -219,6 +304,8 @@ export_to_gif(frames, "animation.gif")
- __call__
- enable_freeu
- disable_freeu
- enable_free_init
- disable_free_init
- enable_vae_slicing
- disable_vae_slicing
- enable_vae_tiling
@@ -227,4 +314,3 @@ export_to_gif(frames, "animation.gif")
## AnimateDiffPipelineOutput
[[autodoc]] pipelines.animatediff.AnimateDiffPipelineOutput

View File

@@ -16,7 +16,7 @@ Attend-and-Excite for Stable Diffusion was proposed in [Attend-and-Excite: Atten
The abstract from the paper is:
*Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. However, achieving one-shot generation that aligns with the user's intent is nearly impossible, yet small changes to the input prompt often result in very different images. This leaves the user with little semantic control. To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. This semantic guidance (SEGA) allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception. We demonstrate SEGA's effectiveness on a variety of tasks and provide evidence for its versatility and flexibility.*
*Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt. While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt. We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt. Moreover, we find that in some cases the model also fails to correctly bind attributes (e.g., colors) to their corresponding subjects. To help mitigate these failure cases, we introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness of the generated images. Using an attention-based formulation of GSN, dubbed Attend-and-Excite, we guide the model to refine the cross-attention units to attend to all subject tokens in the text prompt and strengthen - or excite - their activations, encouraging the model to generate all subjects described in the text prompt. We compare our approach to alternative approaches and demonstrate that it conveys the desired concepts more faithfully across a range of text prompts.*
You can find additional information about Attend-and-Excite on the [project page](https://attendandexcite.github.io/Attend-and-Excite/), the [original codebase](https://github.com/AttendAndExcite/Attend-and-Excite), or try it out in a [demo](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite).
@@ -34,4 +34,4 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

View File

@@ -1,37 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Audio Diffusion
[Audio Diffusion](https://github.com/teticio/audio-diffusion) is by Robert Dargavel Smith, and it leverages the recent advances in image generation from diffusion models by converting audio samples to and from Mel spectrogram images.
The original codebase, training scripts and example notebooks can be found at [teticio/audio-diffusion](https://github.com/teticio/audio-diffusion).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## AudioDiffusionPipeline
[[autodoc]] AudioDiffusionPipeline
- all
- __call__
## AudioPipelineOutput
[[autodoc]] pipelines.AudioPipelineOutput
## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput
## Mel
[[autodoc]] Mel

View File

@@ -19,9 +19,9 @@ sound effects, human speech and music.
The abstract from the paper is:
*Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general audio based on text descriptions. However, previous studies in TTA have limited generation quality with high computational costs. In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn the continuous audio representations from contrastive language-audio pretraining (CLAP) latents. The pretrained CLAP models enable us to train LDMs with audio embedding while providing text embedding as a condition during sampling. By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency. Trained on AudioCaps with a single GPU, AudioLDM achieves state-of-the-art TTA performance measured by both objective and subjective metrics (e.g., frechet distance). Moreover, AudioLDM is the first TTA system that enables various text-guided audio manipulations (e.g., style transfer) in a zero-shot fashion. Our implementation and demos are available at https://audioldm.github.io.*
*Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general audio based on text descriptions. However, previous studies in TTA have limited generation quality with high computational costs. In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn the continuous audio representations from contrastive language-audio pretraining (CLAP) latents. The pretrained CLAP models enable us to train LDMs with audio embedding while providing text embedding as a condition during sampling. By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency. Trained on AudioCaps with a single GPU, AudioLDM achieves state-of-the-art TTA performance measured by both objective and subjective metrics (e.g., frechet distance). Moreover, AudioLDM is the first TTA system that enables various text-guided audio manipulations (e.g., style transfer) in a zero-shot fashion. Our implementation and demos are available at [this https URL](https://audioldm.github.io/).*
The original codebase can be found at [haoheliu/AudioLDM](https://github.com/haoheliu/AudioLDM).
The original codebase can be found at [haoheliu/AudioLDM](https://github.com/haoheliu/AudioLDM).
## Tips
@@ -47,4 +47,4 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
- __call__
## AudioPipelineOutput
[[autodoc]] pipelines.AudioPipelineOutput
[[autodoc]] pipelines.AudioPipelineOutput

View File

@@ -12,36 +12,23 @@ specific language governing permissions and limitations under the License.
# AudioLDM 2
AudioLDM 2 was proposed in [AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining](https://arxiv.org/abs/2308.05734)
by Haohe Liu et al. AudioLDM 2 takes a text prompt as input and predicts the corresponding audio. It can generate
text-conditional sound effects, human speech and music.
AudioLDM 2 was proposed in [AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining](https://arxiv.org/abs/2308.05734) by Haohe Liu et al. AudioLDM 2 takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional sound effects, human speech and music.
Inspired by [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview), AudioLDM 2
is a text-to-audio _latent diffusion model (LDM)_ that learns continuous audio representations from text embeddings. Two
text encoder models are used to compute the text embeddings from a prompt input: the text-branch of [CLAP](https://huggingface.co/docs/transformers/main/en/model_doc/clap)
and the encoder of [Flan-T5](https://huggingface.co/docs/transformers/main/en/model_doc/flan-t5). These text embeddings
are then projected to a shared embedding space by an [AudioLDM2ProjectionModel](https://huggingface.co/docs/diffusers/main/api/pipelines/audioldm2#diffusers.AudioLDM2ProjectionModel).
A [GPT2](https://huggingface.co/docs/transformers/main/en/model_doc/gpt2) _language model (LM)_ is used to auto-regressively
predict eight new embedding vectors, conditional on the projected CLAP and Flan-T5 embeddings. The generated embedding
vectors and Flan-T5 text embeddings are used as cross-attention conditioning in the LDM. The [UNet](https://huggingface.co/docs/diffusers/main/en/api/pipelines/audioldm2#diffusers.AudioLDM2UNet2DConditionModel)
of AudioLDM 2 is unique in the sense that it takes **two** cross-attention embeddings, as opposed to one cross-attention
conditioning, as in most other LDMs.
Inspired by [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview), AudioLDM 2 is a text-to-audio _latent diffusion model (LDM)_ that learns continuous audio representations from text embeddings. Two text encoder models are used to compute the text embeddings from a prompt input: the text-branch of [CLAP](https://huggingface.co/docs/transformers/main/en/model_doc/clap) and the encoder of [Flan-T5](https://huggingface.co/docs/transformers/main/en/model_doc/flan-t5). These text embeddings are then projected to a shared embedding space by an [AudioLDM2ProjectionModel](https://huggingface.co/docs/diffusers/main/api/pipelines/audioldm2#diffusers.AudioLDM2ProjectionModel). A [GPT2](https://huggingface.co/docs/transformers/main/en/model_doc/gpt2) _language model (LM)_ is used to auto-regressively predict eight new embedding vectors, conditional on the projected CLAP and Flan-T5 embeddings. The generated embedding vectors and Flan-T5 text embeddings are used as cross-attention conditioning in the LDM. The [UNet](https://huggingface.co/docs/diffusers/main/en/api/pipelines/audioldm2#diffusers.AudioLDM2UNet2DConditionModel) of AudioLDM 2 is unique in the sense that it takes **two** cross-attention embeddings, as opposed to one cross-attention conditioning, as in most other LDMs.
The abstract of the paper is the following:
*Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific objectives and biases that can significantly differ from those of other types. To bring us closer to a unified perspective of audio generation, this paper proposes a framework that utilizes the same learning method for speech, music, and sound effect generation. Our framework introduces a general representation of audio, called language of audio (LOA). Any audio can be translated into LOA based on AudioMAE, a self-supervised pre-trained representation learning model. In the generation process, we translate any modalities into LOA by using a GPT-2 model, and we perform self-supervised audio generation learning with a latent diffusion model conditioned on LOA. The proposed framework naturally brings advantages such as in-context learning abilities and reusable self-supervised pretrained AudioMAE and latent diffusion models. Experiments on the major benchmarks of text-to-audio, text-to-music, and text-to-speech demonstrate new state-of-the-art or competitive performance to previous approaches.*
*Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific objectives and biases that can significantly differ from those of other types. To bring us closer to a unified perspective of audio generation, this paper proposes a framework that utilizes the same learning method for speech, music, and sound effect generation. Our framework introduces a general representation of audio, called "language of audio" (LOA). Any audio can be translated into LOA based on AudioMAE, a self-supervised pre-trained representation learning model. In the generation process, we translate any modalities into LOA by using a GPT-2 model, and we perform self-supervised audio generation learning with a latent diffusion model conditioned on LOA. The proposed framework naturally brings advantages such as in-context learning abilities and reusable self-supervised pretrained AudioMAE and latent diffusion models. Experiments on the major benchmarks of text-to-audio, text-to-music, and text-to-speech demonstrate state-of-the-art or competitive performance against previous approaches. Our code, pretrained model, and demo are available at [this https URL](https://audioldm.github.io/audioldm2).*
This pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi). The original codebase can be
found at [haoheliu/audioldm2](https://github.com/haoheliu/audioldm2).
This pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi). The original codebase can be found at [haoheliu/audioldm2](https://github.com/haoheliu/audioldm2).
## Tips
### Choosing a checkpoint
AudioLDM2 comes in three variants. Two of these checkpoints are applicable to the general task of text-to-audio
generation. The third checkpoint is trained exclusively on text-to-music generation.
AudioLDM2 comes in three variants. Two of these checkpoints are applicable to the general task of text-to-audio generation. The third checkpoint is trained exclusively on text-to-music generation.
All checkpoints share the same model size for the text encoders and VAE. They differ in the size and depth of the UNet.
All checkpoints share the same model size for the text encoders and VAE. They differ in the size and depth of the UNet.
See table below for details on the three checkpoints:
| Checkpoint | Task | UNet Model Size | Total Model Size | Training Data / h |
@@ -54,7 +41,7 @@ See table below for details on the three checkpoints:
* Descriptive prompt inputs work best: use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context specific (e.g. "water stream in a forest" instead of "stream").
* It's best to use general terms like "cat" or "dog" instead of specific names or abstract objects the model may not be familiar with.
* Using a **negative prompt** can significantly improve the quality of the generated waveform, by guiding the generation away from terms that correspond to poor quality audio. Try using a negative prompt of "Low quality."
* Using a **negative prompt** can significantly improve the quality of the generated waveform, by guiding the generation away from terms that correspond to poor quality audio. Try using a negative prompt of "Low quality."
### Controlling inference
@@ -63,7 +50,7 @@ See table below for details on the three checkpoints:
### Evaluating generated waveforms:
* The quality of the generated waveforms can vary significantly based on the seed. Try generating with different seeds until you find a satisfactory generation
* The quality of the generated waveforms can vary significantly based on the seed. Try generating with different seeds until you find a satisfactory generation.
* Multiple waveforms can be generated in one go: set `num_waveforms_per_prompt` to a value greater than 1. Automatic scoring will be performed between the generated waveforms and prompt text, and the audios ranked from best to worst accordingly.
The following example demonstrates how to construct good music generation using the aforementioned tips: [example](https://huggingface.co/docs/diffusers/main/en/api/pipelines/audioldm2#diffusers.AudioLDM2Pipeline.__call__.example).
@@ -88,4 +75,4 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
- forward
## AudioPipelineOutput
[[autodoc]] pipelines.AudioPipelineOutput
[[autodoc]] pipelines.AudioPipelineOutput

View File

@@ -35,18 +35,18 @@ image = pipeline(prompt, num_inference_steps=25).images[0]
<Tip>
Check out the [AutoPipeline](/tutorials/autopipeline) tutorial to learn how to use this API!
Check out the [AutoPipeline](../../tutorials/autopipeline) tutorial to learn how to use this API!
</Tip>
`AutoPipeline` supports text-to-image, image-to-image, and inpainting for the following diffusion models:
- [Stable Diffusion](./stable_diffusion)
- [Stable Diffusion](./stable_diffusion/overview)
- [ControlNet](./controlnet)
- [Stable Diffusion XL (SDXL)](./stable_diffusion/stable_diffusion_xl)
- [DeepFloyd IF](./if)
- [Kandinsky](./kandinsky)
- [Kandinsky 2.2](./kandinsky#kandinsky-22)
- [DeepFloyd IF](./deepfloyd_if)
- [Kandinsky 2.1](./kandinsky)
- [Kandinsky 2.2](./kandinsky_v22)
## AutoPipelineForText2Image
@@ -56,7 +56,6 @@ Check out the [AutoPipeline](/tutorials/autopipeline) tutorial to learn how to u
- from_pretrained
- from_pipe
## AutoPipelineForImage2Image
[[autodoc]] AutoPipelineForImage2Image
@@ -70,5 +69,3 @@ Check out the [AutoPipeline](/tutorials/autopipeline) tutorial to learn how to u
- all
- from_pretrained
- from_pipe

View File

@@ -1,13 +1,25 @@
# Blip Diffusion
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Blip Diffusion was proposed in [BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing](https://arxiv.org/abs/2305.14720). It enables zero-shot subject-driven generation and control-guided zero-shot generation.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# BLIP-Diffusion
BLIP-Diffusion was proposed in [BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing](https://arxiv.org/abs/2305.14720). It enables zero-shot subject-driven generation and control-guided zero-shot generation.
The abstract from the paper is:
*Subject-driven text-to-image generation models create novel renditions of an input subject based on text prompts. Existing models suffer from lengthy fine-tuning and difficulties preserving the subject fidelity. To overcome these limitations, we introduce BLIP-Diffusion, a new subject-driven image generation model that supports multimodal control which consumes inputs of subject images and text prompts. Unlike other subject-driven generation models, BLIP-Diffusion introduces a new multimodal encoder which is pre-trained to provide subject representation. We first pre-train the multimodal encoder following BLIP-2 to produce visual representation aligned with the text. Then we design a subject representation learning task which enables a diffusion model to leverage such visual representation and generates new subject renditions. Compared with previous methods such as DreamBooth, our model enables zero-shot subject-driven generation, and efficient fine-tuning for customized subject with up to 20x speedup. We also demonstrate that BLIP-Diffusion can be flexibly combined with existing techniques such as ControlNet and prompt-to-prompt to enable novel subject-driven generation and editing applications.*
*Subject-driven text-to-image generation models create novel renditions of an input subject based on text prompts. Existing models suffer from lengthy fine-tuning and difficulties preserving the subject fidelity. To overcome these limitations, we introduce BLIP-Diffusion, a new subject-driven image generation model that supports multimodal control which consumes inputs of subject images and text prompts. Unlike other subject-driven generation models, BLIP-Diffusion introduces a new multimodal encoder which is pre-trained to provide subject representation. We first pre-train the multimodal encoder following BLIP-2 to produce visual representation aligned with the text. Then we design a subject representation learning task which enables a diffusion model to leverage such visual representation and generates new subject renditions. Compared with previous methods such as DreamBooth, our model enables zero-shot subject-driven generation, and efficient fine-tuning for customized subject with up to 20x speedup. We also demonstrate that BLIP-Diffusion can be flexibly combined with existing techniques such as ControlNet and prompt-to-prompt to enable novel subject-driven generation and editing applications. Project page at [this https URL](https://dxli94.github.io/BLIP-Diffusion-website/).*
The original codebase can be found at [salesforce/LAVIS](https://github.com/salesforce/LAVIS/tree/main/projects/blip-diffusion). You can find the official BLIP Diffusion checkpoints under the [hf.co/SalesForce](https://hf.co/SalesForce) organization.
The original codebase can be found at [salesforce/LAVIS](https://github.com/salesforce/LAVIS/tree/main/projects/blip-diffusion). You can find the official BLIP-Diffusion checkpoints under the [hf.co/SalesForce](https://hf.co/SalesForce) organization.
`BlipDiffusionPipeline` and `BlipDiffusionControlNetPipeline` were contributed by [`ayushtues`](https://github.com/ayushtues/).

View File

@@ -1,10 +1,22 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Consistency Models
Consistency Models were proposed in [Consistency Models](https://huggingface.co/papers/2303.01469) by Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever.
The abstract from the paper is:
*Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256. *
*Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256.*
The original codebase can be found at [openai/consistency_models](https://github.com/openai/consistency_models), and additional checkpoints are available at [openai](https://huggingface.co/openai).
@@ -27,17 +39,18 @@ For an additional speed-up, use `torch.compile` to generate multiple images in <
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
# Multistep sampling
# Timesteps can be explicitly specified; the particular timesteps below are from the original Github repo:
# Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L83
for _ in range(10):
image = pipe(timesteps=[17, 0]).images[0]
image.show()
```
## ConsistencyModelPipeline
[[autodoc]] ConsistencyModelPipeline
- all
- __call__
## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput

View File

@@ -12,13 +12,13 @@ specific language governing permissions and limitations under the License.
# ControlNet
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang and Maneesh Agrawala.
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
The abstract from the paper is:
*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.*
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
This model was contributed by [takuma104](https://huggingface.co/takuma104). ❤️
@@ -67,7 +67,6 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
- load_textual_inversion
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
## FlaxStableDiffusionControlNetPipeline
@@ -76,5 +75,4 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
- __call__
## FlaxStableDiffusionControlNetPipelineOutput
[[autodoc]] pipelines.stable_diffusion.FlaxStableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.FlaxStableDiffusionPipelineOutput

View File

@@ -12,13 +12,13 @@ specific language governing permissions and limitations under the License.
# ControlNet with Stable Diffusion XL
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang and Maneesh Agrawala.
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
The abstract from the paper is:
*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.*
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
You can find additional smaller Stable Diffusion XL (SDXL) ControlNet checkpoints from the 🤗 [Diffusers](https://huggingface.co/diffusers) Hub organization, and browse [community-trained](https://huggingface.co/models?other=stable-diffusion-xl&other=controlnet) checkpoints on the Hub.
@@ -28,7 +28,7 @@ You can find additional smaller Stable Diffusion XL (SDXL) ControlNet checkpoint
</Tip>
If you don't see a checkpoint you're interested in, you can train your own SDXL ControlNet with our [training script](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).
If you don't see a checkpoint you're interested in, you can train your own SDXL ControlNet with our [training script](../../../../../examples/controlnet/README_sdxl).
<Tip>
@@ -50,6 +50,6 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
[[autodoc]] StableDiffusionXLControlNetInpaintPipeline
- all
- __call__
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

View File

@@ -1,33 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Cycle Diffusion
Cycle Diffusion is a text guided image-to-image generation model proposed in [Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance](https://huggingface.co/papers/2210.05559) by Chen Henry Wu, Fernando De la Torre.
The abstract from the paper is:
*Diffusion models have achieved unprecedented performance in generative modeling. The commonly-adopted formulation of the latent code of diffusion models is a sequence of gradually denoised samples, as opposed to the simpler (e.g., Gaussian) latent space of GANs, VAEs, and normalizing flows. This paper provides an alternative, Gaussian formulation of the latent space of various diffusion models, as well as an invertible DPM-Encoder that maps images into the latent space. While our formulation is purely based on the definition of diffusion models, we demonstrate several intriguing consequences. (1) Empirically, we observe that a common latent space emerges from two diffusion models trained independently on related domains. In light of this finding, we propose CycleDiffusion, which uses DPM-Encoder for unpaired image-to-image translation. Furthermore, applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors. (2) One can guide pre-trained diffusion models and GANs by controlling the latent codes in a unified, plug-and-play formulation based on energy-based models. Using the CLIP model and a face recognition model as guidance, we demonstrate that diffusion models have better coverage of low-density sub-populations and individuals than GANs.*
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## CycleDiffusionPipeline
[[autodoc]] CycleDiffusionPipeline
- all
- __call__
## StableDiffusionPiplineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

View File

@@ -16,7 +16,6 @@ specific language governing permissions and limitations under the License.
Dance Diffusion is the first in a suite of generative audio tools for producers and musicians released by [Harmonai](https://github.com/Harmonai-org).
The original codebase of this implementation can be found at [Harmonai-org](https://github.com/Harmonai-org/sample-generator).
<Tip>
@@ -30,4 +29,4 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
- __call__
## AudioPipelineOutput
[[autodoc]] pipelines.AudioPipelineOutput
[[autodoc]] pipelines.AudioPipelineOutput

View File

@@ -26,4 +26,4 @@ The original codebase can be found at [ermongroup/ddim](https://github.com/ermon
- __call__
## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput

View File

@@ -10,32 +10,31 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# DeepFloyd IF
# DeepFloyd IF
## Overview
DeepFloyd IF is a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding.
The model is a modular composed of a frozen text encoder and three cascaded pixel diffusion modules:
DeepFloyd IF is a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding.
The model is a modular composed of a frozen text encoder and three cascaded pixel diffusion modules:
- Stage 1: a base model that generates 64x64 px image based on text prompt,
- Stage 2: a 64x64 px => 256x256 px super-resolution model, and a
- Stage 2: a 64x64 px => 256x256 px super-resolution model, and
- Stage 3: a 256x256 px => 1024x1024 px super-resolution model
Stage 1 and Stage 2 utilize a frozen text encoder based on the T5 transformer to extract text embeddings,
which are then fed into a UNet architecture enhanced with cross-attention and attention pooling.
Stage 3 is [Stability's x4 Upscaling model](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler).
The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID score of 6.66 on the COCO dataset.
Stage 1 and Stage 2 utilize a frozen text encoder based on the T5 transformer to extract text embeddings, which are then fed into a UNet architecture enhanced with cross-attention and attention pooling.
Stage 3 is [Stability AI's x4 Upscaling model](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler).
The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID score of 6.66 on the COCO dataset.
Our work underscores the potential of larger UNet architectures in the first stage of cascaded diffusion models and depicts a promising future for text-to-image synthesis.
## Usage
Before you can use IF, you need to accept its usage conditions. To do so:
1. Make sure to have a [Hugging Face account](https://huggingface.co/join) and be logged in
1. Make sure to have a [Hugging Face account](https://huggingface.co/join) and be logged in.
2. Accept the license on the model card of [DeepFloyd/IF-I-XL-v1.0](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0). Accepting the license on the stage I model card will auto accept for the other IF models.
3. Make sure to login locally. Install `huggingface_hub`
3. Make sure to login locally. Install `huggingface_hub`:
```sh
pip install huggingface_hub --upgrade
```
run the login function in a Python shell
run the login function in a Python shell:
```py
from huggingface_hub import login
@@ -48,7 +47,7 @@ and enter your [Hugging Face Hub access token](https://huggingface.co/docs/hub/s
Next we install `diffusers` and dependencies:
```sh
pip install diffusers accelerate transformers safetensors
pip install -q diffusers accelerate transformers
```
The following sections give more in-detail examples of how to use IF. Specifically:
@@ -73,20 +72,17 @@ The following sections give more in-detail examples of how to use IF. Specifical
- *Stage-3*
- [stabilityai/stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler)
**Demo**
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/DeepFloyd/IF)
**Google Colab**
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/deepfloyd_if_free_tier_google_colab.ipynb)
### Text-to-Image Generation
By default diffusers makes use of [model cpu offloading](https://huggingface.co/docs/diffusers/optimization/fp16#model-offloading-for-fast-inference-and-memory-savings)
to run the whole IF pipeline with as little as 14 GB of VRAM.
By default diffusers makes use of [model cpu offloading](../../optimization/memory#model-offloading) to run the whole IF pipeline with as little as 14 GB of VRAM.
```python
from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
from diffusers.utils import pt_to_pil, make_image_grid
import torch
# stage 1
@@ -117,48 +113,43 @@ generator = torch.manual_seed(1)
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)
# stage 1
image = stage_1(
stage_1_output = stage_1(
prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_I.png")
#pt_to_pil(stage_1_output)[0].save("./if_stage_I.png")
# stage 2
image = stage_2(
image=image,
stage_2_output = stage_2(
image=stage_1_output,
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_embeds,
generator=generator,
output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_II.png")
#pt_to_pil(stage_2_output)[0].save("./if_stage_II.png")
# stage 3
image = stage_3(prompt=prompt, image=image, noise_level=100, generator=generator).images
image[0].save("./if_stage_III.png")
stage_3_output = stage_3(prompt=prompt, image=stage_2_output, noise_level=100, generator=generator).images
#stage_3_output[0].save("./if_stage_III.png")
make_image_grid([pt_to_pil(stage_1_output)[0], pt_to_pil(stage_2_output)[0], stage_3_output[0]], rows=1, rows=3)
```
### Text Guided Image-to-Image Generation
The same IF model weights can be used for text-guided image-to-image translation or image variation.
In this case just make sure to load the weights using the [`IFInpaintingPipeline`] and [`IFInpaintingSuperResolutionPipeline`] pipelines.
In this case just make sure to load the weights using the [`IFImg2ImgPipeline`] and [`IFImg2ImgSuperResolutionPipeline`] pipelines.
**Note**: You can also directly move the weights of the text-to-image pipelines to the image-to-image pipelines
without loading them twice by making use of the [`~DiffusionPipeline.components()`] function as explained [here](#converting-between-different-pipelines).
without loading them twice by making use of the [`~DiffusionPipeline.components`] argument as explained [here](#converting-between-different-pipelines).
```python
from diffusers import IFImg2ImgPipeline, IFImg2ImgSuperResolutionPipeline, DiffusionPipeline
from diffusers.utils import pt_to_pil
from diffusers.utils import pt_to_pil, load_image, make_image_grid
import torch
from PIL import Image
import requests
from io import BytesIO
# download image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
original_image = Image.open(BytesIO(response.content)).convert("RGB")
original_image = load_image(url)
original_image = original_image.resize((768, 512))
# stage 1
@@ -189,29 +180,30 @@ generator = torch.manual_seed(1)
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)
# stage 1
image = stage_1(
stage_1_output = stage_1(
image=original_image,
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_embeds,
generator=generator,
output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_I.png")
#pt_to_pil(stage_1_output)[0].save("./if_stage_I.png")
# stage 2
image = stage_2(
image=image,
stage_2_output = stage_2(
image=stage_1_output,
original_image=original_image,
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_embeds,
generator=generator,
output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_II.png")
#pt_to_pil(stage_2_output)[0].save("./if_stage_II.png")
# stage 3
image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("./if_stage_III.png")
stage_3_output = stage_3(prompt=prompt, image=stage_2_output, generator=generator, noise_level=100).images
#stage_3_output[0].save("./if_stage_III.png")
make_image_grid([original_image, pt_to_pil(stage_1_output)[0], pt_to_pil(stage_2_output)[0], stage_3_output[0]], rows=1, rows=4)
```
### Text Guided Inpainting Generation
@@ -224,24 +216,16 @@ without loading them twice by making use of the [`~DiffusionPipeline.components(
```python
from diffusers import IFInpaintingPipeline, IFInpaintingSuperResolutionPipeline, DiffusionPipeline
from diffusers.utils import pt_to_pil
from diffusers.utils import pt_to_pil, load_image, make_image_grid
import torch
from PIL import Image
import requests
from io import BytesIO
# download image
url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/if/person.png"
response = requests.get(url)
original_image = Image.open(BytesIO(response.content)).convert("RGB")
original_image = original_image
original_image = load_image(url)
# download mask
url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/if/glasses_mask.png"
response = requests.get(url)
mask_image = Image.open(BytesIO(response.content))
mask_image = mask_image
mask_image = load_image(url)
# stage 1
stage_1 = IFInpaintingPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
@@ -271,7 +255,7 @@ generator = torch.manual_seed(1)
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)
# stage 1
image = stage_1(
stage_1_output = stage_1(
image=original_image,
mask_image=mask_image,
prompt_embeds=prompt_embeds,
@@ -279,11 +263,11 @@ image = stage_1(
generator=generator,
output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_I.png")
#pt_to_pil(stage_1_output)[0].save("./if_stage_I.png")
# stage 2
image = stage_2(
image=image,
stage_2_output = stage_2(
image=stage_1_output,
original_image=original_image,
mask_image=mask_image,
prompt_embeds=prompt_embeds,
@@ -291,11 +275,12 @@ image = stage_2(
generator=generator,
output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_II.png")
#pt_to_pil(stage_1_output)[0].save("./if_stage_II.png")
# stage 3
image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("./if_stage_III.png")
stage_3_output = stage_3(prompt=prompt, image=stage_2_output, generator=generator, noise_level=100).images
#stage_3_output[0].save("./if_stage_III.png")
make_image_grid([original_image, mask_image, pt_to_pil(stage_1_output)[0], pt_to_pil(stage_2_output)[0], stage_3_output[0]], rows=1, rows=5)
```
### Converting between different pipelines
@@ -332,13 +317,13 @@ pipe.to("cuda")
You can also run the diffusion process for a shorter number of timesteps.
This can either be done with the `num_inference_steps` argument
This can either be done with the `num_inference_steps` argument:
```py
pipe("<prompt>", num_inference_steps=30)
```
Or with the `timesteps` argument
Or with the `timesteps` argument:
```py
from diffusers.pipelines.deepfloyd_if import fast27_timesteps
@@ -347,8 +332,7 @@ pipe("<prompt>", timesteps=fast27_timesteps)
```
When doing image variation or inpainting, you can also decrease the number of timesteps
with the strength argument. The strength argument is the amount of noise to add to
the input image which also determines how many steps to run in the denoising process.
with the strength argument. The strength argument is the amount of noise to add to the input image which also determines how many steps to run in the denoising process.
A smaller number will vary the image less but run faster.
```py
@@ -362,18 +346,19 @@ You can also use [`torch.compile`](../../optimization/torch2.0). Note that we ha
with IF and it might not give expected results.
```py
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
pipe.to("cuda")
pipe.text_encoder = torch.compile(pipe.text_encoder)
pipe.unet = torch.compile(pipe.unet)
pipe.text_encoder = torch.compile(pipe.text_encoder, mode="reduce-overhead", fullgraph=True)
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
```
### Optimizing for memory
When optimizing for GPU memory, we can use the standard diffusers cpu offloading APIs.
When optimizing for GPU memory, we can use the standard diffusers CPU offloading APIs.
Either the model based CPU offloading,
@@ -410,23 +395,21 @@ pipe = DiffusionPipeline.from_pretrained(
prompt_embeds, negative_embeds = pipe.encode_prompt("<prompt>")
```
For CPU RAM constrained machines like google colab free tier where we can't load all
model components to the CPU at once, we can manually only load the pipeline with
the text encoder or unet when the respective model components are needed.
For CPU RAM constrained machines like Google Colab free tier where we can't load all model components to the CPU at once, we can manually only load the pipeline with
the text encoder or UNet when the respective model components are needed.
```py
from diffusers import IFPipeline, IFSuperResolutionPipeline
import torch
import gc
from transformers import T5EncoderModel
from diffusers.utils import pt_to_pil
from diffusers.utils import pt_to_pil, make_image_grid
text_encoder = T5EncoderModel.from_pretrained(
"DeepFloyd/IF-I-XL-v1.0", subfolder="text_encoder", device_map="auto", load_in_8bit=True, variant="8bit"
)
# text to image
pipe = DiffusionPipeline.from_pretrained(
"DeepFloyd/IF-I-XL-v1.0",
text_encoder=text_encoder, # pass the previously instantiated 8bit text encoder
@@ -448,14 +431,14 @@ pipe = IFPipeline.from_pretrained(
)
generator = torch.Generator().manual_seed(0)
image = pipe(
stage_1_output = pipe(
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_embeds,
output_type="pt",
generator=generator,
).images
pt_to_pil(image)[0].save("./if_stage_I.png")
#pt_to_pil(stage_1_output)[0].save("./if_stage_I.png")
# Remove the pipeline so we can load the super-resolution pipeline
del pipe
@@ -469,24 +452,24 @@ pipe = IFSuperResolutionPipeline.from_pretrained(
)
generator = torch.Generator().manual_seed(0)
image = pipe(
image=image,
stage_2_output = pipe(
image=stage_1_output,
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_embeds,
output_type="pt",
generator=generator,
).images
pt_to_pil(image)[0].save("./if_stage_II.png")
#pt_to_pil(stage_2_output)[0].save("./if_stage_II.png")
make_image_grid([pt_to_pil(stage_1_output)[0], pt_to_pil(stage_2_output)[0]], rows=1, rows=2)
```
## Available Pipelines:
| Pipeline | Tasks | Colab
|---|---|:---:|
| [pipeline_if.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/deepfloyd_if/pipeline_if.py) | *Text-to-Image Generation* | - |
| [pipeline_if_superresolution.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/deepfloyd_if/pipeline_if.py) | *Text-to-Image Generation* | - |
| [pipeline_if_superresolution.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/deepfloyd_if/pipeline_if_superresolution.py) | *Text-to-Image Generation* | - |
| [pipeline_if_img2img.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/deepfloyd_if/pipeline_if_img2img.py) | *Image-to-Image Generation* | - |
| [pipeline_if_img2img_superresolution.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/deepfloyd_if/pipeline_if_img2img_superresolution.py) | *Image-to-Image Generation* | - |
| [pipeline_if_inpainting.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/deepfloyd_if/pipeline_if_inpainting.py) | *Image-to-Image Generation* | - |

View File

@@ -22,7 +22,7 @@ The original codebase can be found at [Xiang-cd/DiffEdit-stable-diffusion](https
This pipeline was contributed by [clarencechen](https://github.com/clarencechen). ❤️
## Tips
## Tips
* The pipeline can generate masks that can be fed into other inpainting pipelines.
* In order to generate an image using this pipeline, both an image mask (source and target prompts can be manually specified or generated, and passed to [`~StableDiffusionDiffEditPipeline.generate_mask`])
@@ -42,7 +42,7 @@ the phrases including "cat" to `negative_prompt` and "dog" to `prompt`.
* Swap the `source_prompt` and `target_prompt` in the arguments to `generate_mask`.
* Change the input prompt in [`~StableDiffusionDiffEditPipeline.invert`] to include "dog".
* Swap the `prompt` and `negative_prompt` in the arguments to call the pipeline to generate the final edited image.
* The source and target prompts, or their corresponding embeddings, can also be automatically generated. Please refer to the [DiffEdit](/using-diffusers/diffedit) guide for more details.
* The source and target prompts, or their corresponding embeddings, can also be automatically generated. Please refer to the [DiffEdit](../../using-diffusers/diffedit) guide for more details.
## StableDiffusionDiffEditPipeline
[[autodoc]] StableDiffusionDiffEditPipeline
@@ -52,4 +52,4 @@ the phrases including "cat" to `negative_prompt` and "dog" to `prompt`.
- __call__
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

View File

@@ -32,4 +32,4 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
- __call__
## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput

View File

@@ -9,7 +9,7 @@ specific language governing permissions and limitations under the License.
# Kandinsky 2.1
Kandinsky 2.1 is created by [Arseniy Shakhmatov](https://github.com/cene555), [Anton Razzhigaev](https://github.com/razzant), [Aleksandr Nikolich](https://github.com/AlexWortega), [Igor Pavlov](https://github.com/boomb0om), [Andrey Kuznetsov](https://github.com/kuznetsoffandrey) and [Denis Dimitrov](https://github.com/denndimitrov).
Kandinsky 2.1 is created by [Arseniy Shakhmatov](https://github.com/cene555), [Anton Razzhigaev](https://github.com/razzant), [Aleksandr Nikolich](https://github.com/AlexWortega), [Vladimir Arkhipkin](https://github.com/oriBetelgeuse), [Igor Pavlov](https://github.com/boomb0om), [Andrey Kuznetsov](https://github.com/kuznetsoffandrey), and [Denis Dimitrov](https://github.com/denndimitrov).
The description from it's GitHub page is:
@@ -23,13 +23,19 @@ Check out the [Kandinsky Community](https://huggingface.co/kandinsky-community)
</Tip>
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## KandinskyPriorPipeline
[[autodoc]] KandinskyPriorPipeline
- all
- __call__
- interpolate
## KandinskyPipeline
[[autodoc]] KandinskyPipeline

View File

@@ -0,0 +1,49 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Kandinsky 3
Kandinsky 3 is created by [Vladimir Arkhipkin](https://github.com/oriBetelgeuse),[Anastasia Maltseva](https://github.com/NastyaMittseva),[Igor Pavlov](https://github.com/boomb0om),[Andrei Filatov](https://github.com/anvilarth),[Arseniy Shakhmatov](https://github.com/cene555),[Andrey Kuznetsov](https://github.com/kuznetsoffandrey),[Denis Dimitrov](https://github.com/denndimitrov), [Zein Shaheen](https://github.com/zeinsh)
The description from it's Github page:
*Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
Its architecture includes 3 main components:
1. [FLAN-UL2](https://huggingface.co/google/flan-ul2), which is an encoder decoder model based on the T5 architecture.
2. New U-Net architecture featuring BigGAN-deep blocks doubles depth while maintaining the same number of parameters.
3. Sber-MoVQGAN is a decoder proven to have superior results in image restoration.
The original codebase can be found at [ai-forever/Kandinsky-3](https://github.com/ai-forever/Kandinsky-3).
<Tip>
Check out the [Kandinsky Community](https://huggingface.co/kandinsky-community) organization on the Hub for the official model checkpoints for tasks like text-to-image, image-to-image, and inpainting.
</Tip>
<Tip>
Make sure to check out the schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## Kandinsky3Pipeline
[[autodoc]] Kandinsky3Pipeline
- all
- __call__
## Kandinsky3Img2ImgPipeline
[[autodoc]] Kandinsky3Img2ImgPipeline
- all
- __call__

View File

@@ -9,7 +9,7 @@ specific language governing permissions and limitations under the License.
# Kandinsky 2.2
Kandinsky 2.1 is created by [Arseniy Shakhmatov](https://github.com/cene555), [Anton Razzhigaev](https://github.com/razzant), [Aleksandr Nikolich](https://github.com/AlexWortega), [Igor Pavlov](https://github.com/boomb0om), [Andrey Kuznetsov](https://github.com/kuznetsoffandrey) and [Denis Dimitrov](https://github.com/denndimitrov).
Kandinsky 2.2 is created by [Arseniy Shakhmatov](https://github.com/cene555), [Anton Razzhigaev](https://github.com/razzant), [Aleksandr Nikolich](https://github.com/AlexWortega), [Vladimir Arkhipkin](https://github.com/oriBetelgeuse), [Igor Pavlov](https://github.com/boomb0om), [Andrey Kuznetsov](https://github.com/kuznetsoffandrey), and [Denis Dimitrov](https://github.com/denndimitrov).
The description from it's GitHub page is:
@@ -23,6 +23,12 @@ Check out the [Kandinsky Community](https://huggingface.co/kandinsky-community)
</Tip>
<Tip>
Make sure to check out the schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## KandinskyV22PriorPipeline
[[autodoc]] KandinskyV22PriorPipeline

View File

@@ -1,10 +1,22 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Latent Consistency Models
Latent Consistency Models (LCMs) were proposed in [Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference](https://arxiv.org/abs/2310.04378) by Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao.
Latent Consistency Models (LCMs) were proposed in [Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference](https://huggingface.co/papers/2310.04378) by Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao.
The abstract of the [paper](https://arxiv.org/pdf/2310.04378.pdf) is as follows:
The abstract of the paper is as follows:
*Latent Diffusion models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (rombach et al). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference.*
*Latent Diffusion models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (rombach et al). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference. Project Page: [this https URL](https://latent-consistency-models.github.io/).*
A demo for the [SimianLuo/LCM_Dreamshaper_v7](https://huggingface.co/SimianLuo/LCM_Dreamshaper_v7) checkpoint can be found [here](https://huggingface.co/spaces/SimianLuo/Latent_Consistency_Model).

View File

@@ -18,7 +18,7 @@ The abstract from the paper is:
*By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.*
The original codebase can be found at [Compvis/latent-diffusion](https://github.com/CompVis/latent-diffusion).
The original codebase can be found at [CompVis/latent-diffusion](https://github.com/CompVis/latent-diffusion).
<Tip>
@@ -37,4 +37,4 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
- __call__
## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput

View File

@@ -1,35 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Unconditional Latent Diffusion
Unconditional Latent Diffusion was proposed in [High-Resolution Image Synthesis with Latent Diffusion Models](https://huggingface.co/papers/2112.10752) by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer.
The abstract from the paper is:
*By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.*
The original codebase can be found at [CompVis/latent-diffusion](https://github.com/CompVis/latent-diffusion).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## LDMPipeline
[[autodoc]] LDMPipeline
- all
- __call__
## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput

View File

@@ -1,35 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Text-to-image model editing
[Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://huggingface.co/papers/2303.08084) is by Hadas Orgad, Bahjat Kawar, and Yonatan Belinkov. This pipeline enables editing diffusion model weights, such that its assumptions of a given concept are changed. The resulting change is expected to take effect in all prompt generations related to the edited concept.
The abstract from the paper is:
*Text-to-image diffusion models often make implicit assumptions about the world when generating images. While some assumptions are useful (e.g., the sky is blue), they can also be outdated, incorrect, or reflective of social biases present in the training data. Thus, there is a need to control these assumptions without requiring explicit user input or costly re-training. In this work, we aim to edit a given implicit assumption in a pre-trained diffusion model. Our Text-to-Image Model Editing method, TIME for short, receives a pair of inputs: a "source" under-specified prompt for which the model makes an implicit assumption (e.g., "a pack of roses"), and a "destination" prompt that describes the same setting, but with a specified desired attribute (e.g., "a pack of blue roses"). TIME then updates the model's cross-attention layers, as these layers assign visual meaning to textual tokens. We edit the projection matrices in these layers such that the source prompt is projected close to the destination prompt. Our method is highly efficient, as it modifies a mere 2.2% of the model's parameters in under one second. To evaluate model editing approaches, we introduce TIMED (TIME Dataset), containing 147 source and destination prompt pairs from various domains. Our experiments (using Stable Diffusion) show that TIME is successful in model editing, generalizes well for related prompts unseen during editing, and imposes minimal effect on unrelated generations.*
You can find additional information about model editing on the [project page](https://time-diffusion.github.io/), [original codebase](https://github.com/bahjat-kawar/time-diffusion), and try it out in a [demo](https://huggingface.co/spaces/bahjat-kawar/time-diffusion).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## StableDiffusionModelEditingPipeline
[[autodoc]] StableDiffusionModelEditingPipeline
- __call__
- all
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

View File

@@ -13,20 +13,17 @@ specific language governing permissions and limitations under the License.
# MusicLDM
MusicLDM was proposed in [MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies](https://huggingface.co/papers/2308.01546) by Ke Chen, Yusong Wu, Haohe Liu, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
MusicLDM takes a text prompt as input and predicts the corresponding music sample.
MusicLDM takes a text prompt as input and predicts the corresponding music sample.
Inspired by [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview) and [AudioLDM](https://huggingface.co/docs/diffusers/api/pipelines/audioldm/overview),
Inspired by [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview) and [AudioLDM](https://huggingface.co/docs/diffusers/api/pipelines/audioldm),
MusicLDM is a text-to-music _latent diffusion model (LDM)_ that learns continuous audio representations from [CLAP](https://huggingface.co/docs/transformers/main/model_doc/clap)
latents.
MusicLDM is trained on a corpus of 466 hours of music data. Beat-synchronous data augmentation strategies are applied to
the music samples, both in the time domain and in the latent space. Using beat-synchronous data augmentation strategies
encourages the model to interpolate between the training samples, but stay within the domain of the training data. The
result is generated music that is more diverse while staying faithful to the corresponding style.
MusicLDM is trained on a corpus of 466 hours of music data. Beat-synchronous data augmentation strategies are applied to the music samples, both in the time domain and in the latent space. Using beat-synchronous data augmentation strategies encourages the model to interpolate between the training samples, but stay within the domain of the training data. The result is generated music that is more diverse while staying faithful to the corresponding style.
The abstract of the paper is the following:
*In this paper, we present MusicLDM, a state-of-the-art text-to-music model that adapts Stable Diffusion and AudioLDM architectures to the music domain. We achieve this by retraining the contrastive language-audio pretraining model (CLAP) and the Hifi-GAN vocoder, as components of MusicLDM, on a collection of music data samples. Then, we leverage a beat tracking model and propose two different mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup, to encourage the model to generate music more diverse while still staying faithful to the corresponding style.*
*Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. However, generating music, as a special type of audio, presents unique challenges due to limited availability of music data and sensitive issues related to copyright and plagiarism. In this paper, to tackle these challenges, we first construct a state-of-the-art text-to-music model, MusicLDM, that adapts Stable Diffusion and AudioLDM architectures to the music domain. We achieve this by retraining the contrastive language-audio pretraining model (CLAP) and the Hifi-GAN vocoder, as components of MusicLDM, on a collection of music data samples. Then, to address the limitations of training data and to avoid plagiarism, we leverage a beat tracking model and propose two different mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup, which recombine training audio directly or via a latent embeddings space, respectively. Such mixup strategies encourage the model to interpolate between musical training samples and generate new music within the convex hull of the training data, making the generated music more diverse while still staying faithful to the corresponding style. In addition to popular evaluation metrics, we design several new evaluation metrics based on CLAP score to demonstrate that our proposed MusicLDM and beat-synchronous mixup strategies improve both the quality and novelty of generated music, as well as the correspondence between input text and generated music.*
This pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi).
@@ -52,4 +49,4 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
## MusicLDMPipeline
[[autodoc]] MusicLDMPipeline
- all
- __call__
- __call__

View File

@@ -31,6 +31,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
| Pipeline | Tasks |
|---|---|
| [AltDiffusion](alt_diffusion) | image2image |
| [AnimateDiff](animatediff) | text2video |
| [Attend-and-Excite](attend_and_excite) | text2image |
| [Audio Diffusion](audio_diffusion) | image2audio |
| [AudioLDM](audioldm) | text2audio |
@@ -39,6 +40,8 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
| [Consistency Models](consistency_models) | unconditional image generation |
| [ControlNet](controlnet) | text2image, image2image, inpainting |
| [ControlNet with Stable Diffusion XL](controlnet_sdxl) | text2image |
| [ControlNet-XS](controlnetxs) | text2image |
| [ControlNet-XS with Stable Diffusion XL](controlnetxs_sdxl) | text2image |
| [Cycle Diffusion](cycle_diffusion) | image2image |
| [Dance Diffusion](dance_diffusion) | unconditional audio generation |
| [DDIM](ddim) | unconditional image generation |
@@ -46,33 +49,37 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
| [DeepFloyd IF](deepfloyd_if) | text2image, image2image, inpainting, super-resolution |
| [DiffEdit](diffedit) | inpainting |
| [DiT](dit) | text2image |
| [GLIGEN](gligen) | text2image |
| [GLIGEN](stable_diffusion/gligen) | text2image |
| [InstructPix2Pix](pix2pix) | image editing |
| [Kandinsky](kandinsky) | text2image, image2image, inpainting, interpolation |
| [Kandinsky 2.1](kandinsky) | text2image, image2image, inpainting, interpolation |
| [Kandinsky 2.2](kandinsky_v22) | text2image, image2image, inpainting |
| [Kandinsky 3](kandinsky3) | text2image, image2image |
| [Latent Consistency Models](latent_consistency_models) | text2image |
| [Latent Diffusion](latent_diffusion) | text2image, super-resolution |
| [LDM3D](ldm3d_diffusion) | text2image, text-to-3D |
| [LDM3D](stable_diffusion/ldm3d_diffusion) | text2image, text-to-3D, text-to-pano, upscaling |
| [MultiDiffusion](panorama) | text2image |
| [MusicLDM](musicldm) | text2audio |
| [PaintByExample](paint_by_example) | inpainting |
| [Paint by Example](paint_by_example) | inpainting |
| [ParaDiGMS](paradigms) | text2image |
| [Pix2Pix Zero](pix2pix_zero) | image editing |
| [PixArt-α](pixart) | text2image |
| [PNDM](pndm) | unconditional image generation |
| [RePaint](repaint) | inpainting |
| [ScoreSdeVe](score_sde_ve) | unconditional image generation |
| [Score SDE VE](score_sde_ve) | unconditional image generation |
| [Self-Attention Guidance](self_attention_guidance) | text2image |
| [Semantic Guidance](semantic_stable_diffusion) | text2image |
| [Shap-E](shap_e) | text-to-3D, image-to-3D |
| [Spectrogram Diffusion](spectrogram_diffusion) | |
| [Stable Diffusion](stable_diffusion/overview) | text2image, image2image, depth2image, inpainting, image variation, latent upscaler, super-resolution |
| [Stable Diffusion Model Editing](model_editing) | model editing |
| [Stable Diffusion XL](stable_diffusion_xl) | text2image, image2image, inpainting |
| [Stable Diffusion XL](stable_diffusion/stable_diffusion_xl) | text2image, image2image, inpainting |
| [Stable Diffusion XL Turbo](stable_diffusion/sdxl_turbo) | text2image, image2image, inpainting |
| [Stable unCLIP](stable_unclip) | text2image, image variation |
| [KarrasVe](karras_ve) | unconditional image generation |
| [T2I Adapter](adapter) | text2image |
| [Stochastic Karras VE](stochastic_karras_ve) | unconditional image generation |
| [T2I-Adapter](stable_diffusion/adapter) | text2image |
| [Text2Video](text_to_video) | text2video, video2video |
| [Text2Video Zero](text_to_video_zero) | text2video |
| [UnCLIP](unclip) | text2image, image variation |
| [Text2Video-Zero](text_to_video_zero) | text2video |
| [unCLIP](unclip) | text2image, image variation |
| [Unconditional Latent Diffusion](latent_diffusion_uncond) | unconditional image generation |
| [UniDiffuser](unidiffuser) | text2image, image2text, image variation, text variation, unconditional image generation, unconditional audio generation |
| [Value-guided planning](value_guided_sampling) | value guided sampling |

View File

@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# Paint By Example
# Paint by Example
[Paint by Example: Exemplar-based Image Editing with Diffusion Models](https://huggingface.co/papers/2211.13227) is by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen.
@@ -22,7 +22,7 @@ The original codebase can be found at [Fantasy-Studio/Paint-by-Example](https://
## Tips
PaintByExample is supported by the official [Fantasy-Studio/Paint-by-Example](https://huggingface.co/Fantasy-Studio/Paint-by-Example) checkpoint. The checkpoint is warm-started from [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) to inpaint partly masked images conditioned on example and reference images.
Paint by Example is supported by the official [Fantasy-Studio/Paint-by-Example](https://huggingface.co/Fantasy-Studio/Paint-by-Example) checkpoint. The checkpoint is warm-started from [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) to inpaint partly masked images conditioned on example and reference images.
<Tip>
@@ -36,4 +36,4 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
- __call__
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

View File

@@ -22,19 +22,12 @@ You can find additional information about MultiDiffusion on the [project page](h
## Tips
While calling [`StableDiffusionPanoramaPipeline`], it's possible to specify the `view_batch_size` parameter to be > 1.
While calling [`StableDiffusionPanoramaPipeline`], it's possible to specify the `view_batch_size` parameter to be > 1.
For some GPUs with high performance, this can speedup the generation process and increase VRAM usage.
To generate panorama-like images make sure you pass the width parameter accordingly. We recommend a width value of 2048 which is the default.
Circular padding is applied to ensure there are no stitching artifacts when working with
panoramas to ensure a seamless transition from the rightmost part to the leftmost part.
By enabling circular padding (set `circular_padding=True`), the operation applies additional
crops after the rightmost point of the image, allowing the model to "see” the transition
from the rightmost part to the leftmost part. This helps maintain visual consistency in
a 360-degree sense and creates a proper “panorama” that can be viewed using 360-degree
panorama viewers. When decoding latents in Stable Diffusion, circular padding is applied
to ensure that the decoded latents match in the RGB space.
Circular padding is applied to ensure there are no stitching artifacts when working with panoramas to ensure a seamless transition from the rightmost part to the leftmost part. By enabling circular padding (set `circular_padding=True`), the operation applies additional crops after the rightmost point of the image, allowing the model to "see” the transition from the rightmost part to the leftmost part. This helps maintain visual consistency in a 360-degree sense and creates a proper “panorama” that can be viewed using 360-degree panorama viewers. When decoding latents in Stable Diffusion, circular padding is applied to ensure that the decoded latents match in the RGB space.
For example, without circular padding, there is a stitching artifact (default):
![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/indoor_%20no_circular_padding.png)
@@ -54,4 +47,4 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
- all
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

View File

@@ -1,54 +0,0 @@
<!--Copyright 2023 ParaDiGMS authors and The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Parallel Sampling of Diffusion Models
[Parallel Sampling of Diffusion Models](https://huggingface.co/papers/2305.16317) is by Andy Shih, Suneel Belkhale, Stefano Ermon, Dorsa Sadigh, Nima Anari.
The abstract from the paper is:
*Diffusion models are powerful generative models but suffer from slow sampling, often taking 1000 sequential denoising steps for one sample. As a result, considerable efforts have been directed toward reducing the number of denoising steps, but these methods hurt sample quality. Instead of reducing the number of denoising steps (trading quality for speed), in this paper we explore an orthogonal approach: can we run the denoising steps in parallel (trading compute for speed)? In spite of the sequential nature of the denoising steps, we show that surprisingly it is possible to parallelize sampling via Picard iterations, by guessing the solution of future denoising steps and iteratively refining until convergence. With this insight, we present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel. ParaDiGMS is the first diffusion sampling method that enables trading compute for speed and is even compatible with existing fast sampling techniques such as DDIM and DPMSolver. Using ParaDiGMS, we improve sampling speed by 2-4x across a range of robotics and image generation models, giving state-of-the-art sampling speeds of 0.2s on 100-step DiffusionPolicy and 16s on 1000-step StableDiffusion-v2 with no measurable degradation of task reward, FID score, or CLIP score.*
The original codebase can be found at [AndyShih12/paradigms](https://github.com/AndyShih12/paradigms), and the pipeline was contributed by [AndyShih12](https://github.com/AndyShih12). ❤️
## Tips
This pipeline improves sampling speed by running denoising steps in parallel, at the cost of increased total FLOPs.
Therefore, it is better to call this pipeline when running on multiple GPUs. Otherwise, without enough GPU bandwidth
sampling may be even slower than sequential sampling.
The two parameters to play with are `parallel` (batch size) and `tolerance`.
- If it fits in memory, for a 1000-step DDPM you can aim for a batch size of around 100
(for example, 8 GPUs and `batch_per_device=12` to get `parallel=96`). A higher batch size
may not fit in memory, and lower batch size gives less parallelism.
- For tolerance, using a higher tolerance may get better speedups but can risk sample quality degradation.
If there is quality degradation with the default tolerance, then use a lower tolerance like `0.001`.
For a 1000-step DDPM on 8 A100 GPUs, you can expect around a 3x speedup from [`StableDiffusionParadigmsPipeline`] compared to the [`StableDiffusionPipeline`]
by setting `parallel=80` and `tolerance=0.1`.
🤗 Diffusers offers [distributed inference support](../training/distributed_inference) for generating multiple prompts
in parallel on multiple GPUs. But [`StableDiffusionParadigmsPipeline`] is designed for speeding up sampling of a single prompt by using multiple GPUs.
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## StableDiffusionParadigmsPipeline
[[autodoc]] StableDiffusionParadigmsPipeline
- __call__
- all
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

View File

@@ -1,284 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Pix2Pix Zero
[Zero-shot Image-to-Image Translation](https://huggingface.co/papers/2302.03027) is by Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, Yijun Li, Jingwan Lu, and Jun-Yan Zhu.
The abstract from the paper is:
*Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse and high-quality images. However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is hard for users to come up with a perfect text prompt that accurately describes every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting. We first automatically discover editing directions that reflect desired edits in the text embedding space. To preserve the general content structure after editing, we further propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. In addition, our method does not need additional training for these edits and can directly use the existing pre-trained text-to-image diffusion model. We conduct extensive experiments and show that our method outperforms existing and concurrent works for both real and synthetic image editing.*
You can find additional information about Pix2Pix Zero on the [project page](https://pix2pixzero.github.io/), [original codebase](https://github.com/pix2pixzero/pix2pix-zero), and try it out in a [demo](https://huggingface.co/spaces/pix2pix-zero-library/pix2pix-zero-demo).
## Tips
* The pipeline can be conditioned on real input images. Check out the code examples below to know more.
* The pipeline exposes two arguments namely `source_embeds` and `target_embeds`
that let you control the direction of the semantic edits in the final image to be generated. Let's say,
you wanted to translate from "cat" to "dog". In this case, the edit direction will be "cat -> dog". To reflect
this in the pipeline, you simply have to set the embeddings related to the phrases including "cat" to
`source_embeds` and "dog" to `target_embeds`. Refer to the code example below for more details.
* When you're using this pipeline from a prompt, specify the _source_ concept in the prompt. Taking
the above example, a valid input prompt would be: "a high resolution painting of a **cat** in the style of van gough".
* If you wanted to reverse the direction in the example above, i.e., "dog -> cat", then it's recommended to:
* Swap the `source_embeds` and `target_embeds`.
* Change the input prompt to include "dog".
* To learn more about how the source and target embeddings are generated, refer to the [original
paper](https://arxiv.org/abs/2302.03027). Below, we also provide some directions on how to generate the embeddings.
* Note that the quality of the outputs generated with this pipeline is dependent on how good the `source_embeds` and `target_embeds` are. Please, refer to [this discussion](#generating-source-and-target-embeddings) for some suggestions on the topic.
## Available Pipelines:
| Pipeline | Tasks | Demo
|---|---|:---:|
| [StableDiffusionPix2PixZeroPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_pix2pix_zero.py) | *Text-Based Image Editing* | [🤗 Space](https://huggingface.co/spaces/pix2pix-zero-library/pix2pix-zero-demo) |
<!-- TODO: add Colab -->
## Usage example
### Based on an image generated with the input prompt
```python
import requests
import torch
from diffusers import DDIMScheduler, StableDiffusionPix2PixZeroPipeline
def download(embedding_url, local_filepath):
r = requests.get(embedding_url)
with open(local_filepath, "wb") as f:
f.write(r.content)
model_ckpt = "CompVis/stable-diffusion-v1-4"
pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained(
model_ckpt, conditions_input_image=False, torch_dtype=torch.float16
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.to("cuda")
prompt = "a high resolution painting of a cat in the style of van gogh"
src_embs_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/embeddings_sd_1.4/cat.pt"
target_embs_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/embeddings_sd_1.4/dog.pt"
for url in [src_embs_url, target_embs_url]:
download(url, url.split("/")[-1])
src_embeds = torch.load(src_embs_url.split("/")[-1])
target_embeds = torch.load(target_embs_url.split("/")[-1])
images = pipeline(
prompt,
source_embeds=src_embeds,
target_embeds=target_embeds,
num_inference_steps=50,
cross_attention_guidance_amount=0.15,
).images
images[0].save("edited_image_dog.png")
```
### Based on an input image
When the pipeline is conditioned on an input image, we first obtain an inverted
noise from it using a `DDIMInverseScheduler` with the help of a generated caption. Then
the inverted noise is used to start the generation process.
First, let's load our pipeline:
```py
import torch
from transformers import BlipForConditionalGeneration, BlipProcessor
from diffusers import DDIMScheduler, DDIMInverseScheduler, StableDiffusionPix2PixZeroPipeline
captioner_id = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(captioner_id)
model = BlipForConditionalGeneration.from_pretrained(captioner_id, torch_dtype=torch.float16, low_cpu_mem_usage=True)
sd_model_ckpt = "CompVis/stable-diffusion-v1-4"
pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained(
sd_model_ckpt,
caption_generator=model,
caption_processor=processor,
torch_dtype=torch.float16,
safety_checker=None,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config)
pipeline.enable_model_cpu_offload()
```
Then, we load an input image for conditioning and obtain a suitable caption for it:
```py
import requests
from PIL import Image
img_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/test_images/cats/cat_6.png"
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB").resize((512, 512))
caption = pipeline.generate_caption(raw_image)
```
Then we employ the generated caption and the input image to get the inverted noise:
```py
generator = torch.manual_seed(0)
inv_latents = pipeline.invert(caption, image=raw_image, generator=generator).latents
```
Now, generate the image with edit directions:
```py
# See the "Generating source and target embeddings" section below to
# automate the generation of these captions with a pre-trained model like Flan-T5 as explained below.
source_prompts = ["a cat sitting on the street", "a cat playing in the field", "a face of a cat"]
target_prompts = ["a dog sitting on the street", "a dog playing in the field", "a face of a dog"]
source_embeds = pipeline.get_embeds(source_prompts, batch_size=2)
target_embeds = pipeline.get_embeds(target_prompts, batch_size=2)
image = pipeline(
caption,
source_embeds=source_embeds,
target_embeds=target_embeds,
num_inference_steps=50,
cross_attention_guidance_amount=0.15,
generator=generator,
latents=inv_latents,
negative_prompt=caption,
).images[0]
image.save("edited_image.png")
```
## Generating source and target embeddings
The authors originally used the [GPT-3 API](https://openai.com/api/) to generate the source and target captions for discovering
edit directions. However, we can also leverage open source and public models for the same purpose.
Below, we provide an end-to-end example with the [Flan-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) model
for generating captions and [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) for
computing embeddings on the generated captions.
**1. Load the generation model**:
```py
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl", device_map="auto", torch_dtype=torch.float16)
```
**2. Construct a starting prompt**:
```py
source_concept = "cat"
target_concept = "dog"
source_text = f"Provide a caption for images containing a {source_concept}. "
"The captions should be in English and should be no longer than 150 characters."
target_text = f"Provide a caption for images containing a {target_concept}. "
"The captions should be in English and should be no longer than 150 characters."
```
Here, we're interested in the "cat -> dog" direction.
**3. Generate captions**:
We can use a utility like so for this purpose.
```py
def generate_captions(input_prompt):
input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(
input_ids, temperature=0.8, num_return_sequences=16, do_sample=True, max_new_tokens=128, top_k=10
)
return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```
And then we just call it to generate our captions:
```py
source_captions = generate_captions(source_text)
target_captions = generate_captions(target_concept)
```
We encourage you to play around with the different parameters supported by the
`generate()` method ([documentation](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.generation_tf_utils.TFGenerationMixin.generate)) for the generation quality you are looking for.
**4. Load the embedding model**:
Here, we need to use the same text encoder model used by the subsequent Stable Diffusion model.
```py
from diffusers import StableDiffusionPix2PixZeroPipeline
pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")
tokenizer = pipeline.tokenizer
text_encoder = pipeline.text_encoder
```
**5. Compute embeddings**:
```py
import torch
def embed_captions(sentences, tokenizer, text_encoder, device="cuda"):
with torch.no_grad():
embeddings = []
for sent in sentences:
text_inputs = tokenizer(
sent,
padding="max_length",
max_length=tokenizer.model_max_length,
truncation=True,
return_tensors="pt",
)
text_input_ids = text_inputs.input_ids
prompt_embeds = text_encoder(text_input_ids.to(device), attention_mask=None)[0]
embeddings.append(prompt_embeds)
return torch.concatenate(embeddings, dim=0).mean(dim=0).unsqueeze(0)
source_embeddings = embed_captions(source_captions, tokenizer, text_encoder)
target_embeddings = embed_captions(target_captions, tokenizer, text_encoder)
```
And you're done! [Here](https://colab.research.google.com/drive/1tz2C1EdfZYAPlzXXbTnf-5PRBiR8_R1F?usp=sharing) is a Colab Notebook that you can use to interact with the entire process.
Now, you can use these embeddings directly while calling the pipeline:
```py
from diffusers import DDIMScheduler
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
images = pipeline(
prompt,
source_embeds=source_embeddings,
target_embeds=target_embeddings,
num_inference_steps=50,
cross_attention_guidance_amount=0.15,
).images
images[0].save("edited_image_dog.png")
```
## StableDiffusionPix2PixZeroPipeline
[[autodoc]] StableDiffusionPix2PixZeroPipeline
- __call__
- all

View File

@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# PixArt
# PixArt-α
![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/pixart/header_collage.png)
@@ -24,13 +24,126 @@ You can find the original codebase at [PixArt-alpha/PixArt-alpha](https://github
Some notes about this pipeline:
* It uses a Transformer backbone (instead of a UNet) for denoising. As such it has a similar architecture as [DiT](./dit.md).
* It was trained using text conditions computed from T5. This aspect makes the pipeline better at following complex text prompts with intricate details.
* It uses a Transformer backbone (instead of a UNet) for denoising. As such it has a similar architecture as [DiT](./dit).
* It was trained using text conditions computed from T5. This aspect makes the pipeline better at following complex text prompts with intricate details.
* It is good at producing high-resolution images at different aspect ratios. To get the best results, the authors recommend some size brackets which can be found [here](https://github.com/PixArt-alpha/PixArt-alpha/blob/08fbbd281ec96866109bdd2cdb75f2f58fb17610/diffusion/data/datasets/utils.py).
* It rivals the quality of state-of-the-art text-to-image generation systems (as of this writing) such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient than them.
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## Inference with under 8GB GPU VRAM
Run the [`PixArtAlphaPipeline`] with under 8GB GPU VRAM by loading the text encoder in 8-bit precision. Let's walk through a full-fledged example.
First, install the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) library:
```bash
pip install -U bitsandbytes
```
Then load the text encoder in 8-bit:
```python
from transformers import T5EncoderModel
from diffusers import PixArtAlphaPipeline
import torch
text_encoder = T5EncoderModel.from_pretrained(
"PixArt-alpha/PixArt-XL-2-1024-MS",
subfolder="text_encoder",
load_in_8bit=True,
device_map="auto",
)
pipe = PixArtAlphaPipeline.from_pretrained(
"PixArt-alpha/PixArt-XL-2-1024-MS",
text_encoder=text_encoder,
transformer=None,
device_map="auto"
)
```
Now, use the `pipe` to encode a prompt:
```python
with torch.no_grad():
prompt = "cute cat"
prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt)
```
Since text embeddings have been computed, remove the `text_encoder` and `pipe` from the memory, and free up som GPU VRAM:
```python
import gc
def flush():
gc.collect()
torch.cuda.empty_cache()
del text_encoder
del pipe
flush()
```
Then compute the latents with the prompt embeddings as inputs:
```python
pipe = PixArtAlphaPipeline.from_pretrained(
"PixArt-alpha/PixArt-XL-2-1024-MS",
text_encoder=None,
torch_dtype=torch.float16,
).to("cuda")
latents = pipe(
negative_prompt=None,
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_embeds,
prompt_attention_mask=prompt_attention_mask,
negative_prompt_attention_mask=negative_prompt_attention_mask,
num_images_per_prompt=1,
output_type="latent",
).images
del pipe.transformer
flush()
```
<Tip>
Notice that while initializing `pipe`, you're setting `text_encoder` to `None` so that it's not loaded.
</Tip>
Once the latents are computed, pass it off to the VAE to decode into a real image:
```python
with torch.no_grad():
image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor, return_dict=False)[0]
image = pipe.image_processor.postprocess(image, output_type="pil")[0]
image.save("cat.png")
```
By deleting components you aren't using and flushing the GPU VRAM, you should be able to run [`PixArtAlphaPipeline`] with under 8GB GPU VRAM.
![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/pixart/8bits_cat.png)
If you want a report of your memory-usage, run this [script](https://gist.github.com/sayakpaul/3ae0f847001d342af27018a96f467e4e).
<Tip warning={true}>
Text embeddings computed in 8-bit can impact the quality of the generated images because of the information loss in the representation space caused by the reduced precision. It's recommended to compare the outputs with and without 8-bit.
</Tip>
While loading the `text_encoder`, you set `load_in_8bit` to `True`. You could also specify `load_in_4bit` to bring your memory requirements down even further to under 7GB.
## PixArtAlphaPipeline
[[autodoc]] PixArtAlphaPipeline
- all
- __call__
- __call__

View File

@@ -1,35 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# PNDM
[Pseudo Numerical methods for Diffusion Models on manifolds](https://huggingface.co/papers/2202.09778) (PNDM) is by Luping Liu, Yi Ren, Zhijie Lin and Zhou Zhao.
The abstract from the paper is:
*Denoising Diffusion Probabilistic Models (DDPMs) can generate high-quality samples such as image and audio samples. However, DDPMs require hundreds to thousands of iterations to produce final samples. Several prior works have successfully accelerated DDPMs through adjusting the variance schedule (e.g., Improved Denoising Diffusion Probabilistic Models) or the denoising equation (e.g., Denoising Diffusion Implicit Models (DDIMs)). However, these acceleration methods cannot maintain the quality of samples and even introduce new noise at a high speedup rate, which limit their practicability. To accelerate the inference process while keeping the sample quality, we provide a fresh perspective that DDPMs should be treated as solving differential equations on manifolds. Under such a perspective, we propose pseudo numerical methods for diffusion models (PNDMs). Specifically, we figure out how to solve differential equations on manifolds and show that DDIMs are simple cases of pseudo numerical methods. We change several classical numerical methods to corresponding pseudo numerical methods and find that the pseudo linear multi-step method is the best in most situations. According to our experiments, by directly using pre-trained models on Cifar10, CelebA and LSUN, PNDMs can generate higher quality synthetic images with only 50 steps compared with 1000-step DDIMs (20x speedup), significantly outperform DDIMs with 250 steps (by around 0.4 in FID) and have good generalization on different variance schedules.*
The original codebase can be found at [luping-liu/PNDM](https://github.com/luping-liu/PNDM).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## PNDMPipeline
[[autodoc]] PNDMPipeline
- all
- __call__
## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput

Some files were not shown because too many files have changed in this diff Show More