Compare commits

...

341 Commits

Author SHA1 Message Date
yiyixuxu
88c704684c up 2024-02-15 04:32:54 +00:00
yiyixuxu
a45fed728c up 2024-02-15 04:25:41 +00:00
yiyixuxu
754b0532d7 fix 2024-02-15 04:15:01 +00:00
yiyixuxu
80251ed035 draft 2024-02-15 03:50:12 +00:00
yiyixuxu
ef8c0bf51d fix 2024-02-15 02:45:31 +00:00
yiyixuxu
04be74ed94 fix 2024-02-15 00:45:41 +00:00
yiyixuxu
e4bee5d8df fix a error 2024-02-15 00:42:19 +00:00
yiyixuxu
9b1ff58b40 first draft 2024-02-14 23:44:15 +00:00
Alex Umnov
e7696e20f9 Updated lora inference instructions (#6913)
* Updated lora inference instructions

* Update examples/dreambooth/README.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update README.md

* Update README.md

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-13 09:35:20 +05:30
Piyush Thakur
4b89aeffe1 [Type annotations] fixed in save_model_card (#6948)
fixed type annotations

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-13 08:56:45 +05:30
Steven Liu
0a1daadef8 [docs] Community pipelines (#6929)
fix
2024-02-12 10:38:13 -08:00
Sayak Paul
371f765908 [Diffusers -> Original SD conversion] fix things (#6933)
* fix: bias loading bug

* fixes for SDXL

* apply changes to the conversion script to match single_file_utils.py

* do transpose to match the single file loading logic.
2024-02-12 17:30:22 +05:30
Piyush Thakur
75aee39eac [Model Card] standardize T2I Adapter Sdxl model card (#6947)
standardize model card template t21-adapter-sdxl
2024-02-12 16:43:20 +05:30
Dhruv Nair
215e6804d3 Unpin torch versions in CI (#6945)
* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-12 16:01:05 +05:30
Disty0
9254d1f39a Pass device to enable_model_cpu_offload in maybe_free_model_hooks (#6937) 2024-02-12 13:42:32 +05:30
Piyush Thakur
e1bdcc7af3 [Model Card] standardize T2I Sdxl Lora model card (#6944)
* standardize model card template t2i-lora-sdxl

* type annotations
2024-02-12 11:45:40 +05:30
Dhruv Nair
84905ca728 Update PixArt Alpha test module to match src module (#6943)
update
2024-02-12 11:01:33 +05:30
Piyush Thakur
6f336650c3 [Model Card] standardize T2I Sdxl model card (#6942)
standardize model card template t2i-sdxl
2024-02-12 10:01:20 +05:30
Piyush Thakur
06a042cd0e [Model Card] standardize T2I Lora model card (#6940)
standardize model card t2i-lora
2024-02-12 10:01:13 +05:30
Piyush Thakur
8772496586 [Model Card] standardize T2I model card (#6939)
* standardize model card

* fix base_model
2024-02-12 10:00:41 +05:30
dg845
35fd84be27 Replace hardcoded values in SchedulerCommonTest with properties (#5479)
---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-02-10 21:34:20 -10:00
YiYi Xu
f2756253e6 Fix a bug in AutoPipeline.from_pipe when switching pipeline with optional components (#6820)
* fix

* add tests

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-09 21:56:52 -10:00
YiYi Xu
0071478d9e allow attention processors to have different signatures (#6915)
add

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-09 21:56:19 -10:00
Sayak Paul
7c8cab313e post release 0.26.2 (#6885)
* post release

* style

* Empty-Commit

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-02-09 07:36:38 -10:00
Sayak Paul
ca9ed5e8d1 [LoRA] deprecate certain lora methods from the old backend. (#6889)
* deprecate certain lora methods from the old backend.

* uncomment necessary things.

* safe remove old lora backend 👋
2024-02-09 17:14:32 +01:00
Bingxin Ke
98b6bee1a1 [Community Pipeline][Bug Fix] marigold_depth_estimation: input image value range (#6787)
[FIX] IMPORTANT: rgb normalization
2024-02-09 16:55:37 +01:00
Dhruv Nair
ab7113487c More IPAdapter test fixes (#6888)
update
2024-02-09 13:54:58 +05:30
camaro
59c307f1d5 Standardize model card for Controlnet (#6910)
* controlnet

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-09 12:29:08 +05:30
Sayak Paul
159885adc6 correct hub_token exposition behaviour (thanks to @bghira). (#6918) 2024-02-08 18:38:27 -10:00
Aryan
7337eea59b [refactor] unnecessary lines in decode_latents in video pipelines (#6682)
* refactor decode latents in video pipelines

* make fix-copies
2024-02-08 17:10:52 -10:00
camaro
f07899a57c Standardize model card for Controlnet SDXL (#6908)
controlnet-sdxl
2024-02-09 07:53:39 +05:30
camaro
a83cc0c0bc Standardize model card for Controlnet flax (#6909)
* controlnet-flax

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-09 07:52:56 +05:30
C Q
db5194a45d Fix Compatibility Issues in stable_diffusion_xl_reference.py (#6251)
* Fix Compatibility Issues in stable_diffusion_xl_reference.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-08 10:52:59 -10:00
shaoxiaowang
e6c9c2513f fix examples/community/pipeline_stable_diffusion_xl_instantid.py (#6759)
Co-authored-by: wangshaoxiao <wangshaoxiao@xiaomi.com>
2024-02-08 10:09:40 -10:00
Kyunghwan Kim
d643b6691f Add vae tiling and slicing in img2img and inpaint (#6871)
* Add vae tiling in img2img and inpaint

* Add vae tiling not slicing
2024-02-08 09:50:00 -10:00
Michael
f5c9be3a0a Remove <cat-toy> validation prompt from examples/textual_inversion/textual_inversion_sdxl.py (#6877)
Remove <cat-toy> validation prompt from textual_inversion_sdxl.py

The `<cat-toy>` validation prompt is a default choice for the example task in the README. But no other part of `textual_inversion_sdxl.py` references the cat toy and `textual_inversion.py` has a default validation prompt of `None` as well.

So bring `textual_inversion_sdxl.py` in line with `textual_inversion.py` and change default validation prompt to `None`
2024-02-08 09:46:06 -10:00
Laisky.Cai
1824d0050e fix: load_image should support PIL Image (#6904)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-08 08:25:37 -10:00
Sayak Paul
30e5e81d58 change to 2024 in the license (#6902)
change to 2024
2024-02-08 08:19:31 -10:00
Masamune Ishihara
8de78001df Add fps argument to export_to_gif function. (#6786) 2024-02-08 21:59:51 +05:30
Patryk Bartkowiak
3ac2357794 changed positional parameters to named parameters like in docs (#6905)
Co-authored-by: Patryk Bartkowiak <patryk.bartkowiak@tcl.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-02-08 21:39:03 +05:30
Ehsan Akhgari
17808a091e Fix bug when converting checkpoint to diffusers format (#6900)
This fixes #6899.
2024-02-08 18:52:11 +05:30
Sayak Paul
491a933a1b [I2VGenXL] attention_head_dim in the UNet (#6872)
* attention_head_dim

* debug

* print more info

* correct num_attention_heads behaviour

* down_block_num_attention_heads -> num_attention_heads.

* correct the image link in doc.

* add: deprecation for num_attention_head

* fix: test argument to use attention_head_dim

* more fixes.

* quality

* address comments.

* remove depcrecation.
2024-02-08 12:30:14 +05:30
Sayak Paul
aa82df52e7 [IP Adapters] introduce ip_adapter_image_embeds in the SD pipeline call (#6868)
* add: support for passing ip adapter image embeddings

* debugging

* make feature_extractor unloading conditioned on safety_checker

* better condition

* type annotation

* index to look into value slices

* more debugging

* debugging

* serialize embeddings dict

* better conditioning

* remove unnecessary prints.

* Update src/diffusers/loaders/ip_adapter.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* make fix-copies and styling.

* styling and further copy fixing.

* fix: check_inputs call in controlnet sdxl img2img pipeline

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-08 11:10:10 +05:30
Srimanth Agastyaraju
a11b0f83b7 Fix: training resume from fp16 for SDXL Consistency Distillation (#6840)
* Fix: training resume from fp16 for lcm distill lora sdxl

* Fix coding quality - run linter

* Fix 1 - shift mixed precision cast before optimizer

* Fix 2 - State dict errors by removing load_lora_into_unet

* Update train_lcm_distill_lora_sdxl.py - Revert default cache dir to None

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-08 11:09:29 +05:30
Sayak Paul
1835510524 Remove torch_dtype in to() to end deprecation (#6886)
* remove torch_dtype from to()

* remove torch_dtype from usage scripts.

* remove old lora backend

* Revert "remove old lora backend"

This reverts commit adcddf6ba4.
2024-02-08 09:38:57 +05:30
camaro
4a3d52850b fix: keyword argument mismatch (#6895) 2024-02-08 09:37:56 +05:30
YiYi Xu
97d004b9b4 [ip-adapter] make sure length of scale is same as number of ip-adapters when using set_ip_adapter_scale (#6884)
add

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-07 10:13:12 -10:00
Sayak Paul
76696dca55 [Model Card] standardize dreambooth model card (#6729)
* feat: standarize model card creation for dreambooth training.

* correct 'inference

* remove comments.

* take component out of kwargs

* style

* add: card template to have a leaner description.

* widget support.

* propagate changes to train_dreambooth_lora

* propagate changes to custom diffusion

* make widget properly type-annotated
2024-02-07 15:07:11 +05:30
Félix Sanz
17612de451 fix: typo in callback function name and property (#6834)
* fix: callback function name is incorrect

On this tutorial there is a function defined and then used inside `callback_on_step_end` argument, but the name was not correct (mismatch)

* fix: typo in num_timestep (correct is num_timesteps)

fixed property name
2024-02-06 12:05:40 -08:00
Dhruv Nair
994360f7a5 Fix last IP Adapter test (#6875)
update
2024-02-06 08:53:40 -10:00
Dhruv Nair
e6a48db633 Refactor Deepfloyd IF tests. (#6855)
* update

* update

* update
2024-02-06 16:43:17 +05:30
sayakpaul
4f1df69d1a Revert "add attention_head_dim"
This reverts commit 15f6b22466.
2024-02-06 14:48:49 +05:30
sayakpaul
15f6b22466 add attention_head_dim 2024-02-06 14:48:07 +05:30
Sayak Paul
e6fd9ada3a [I2vGenXL] clean up things (#6845)
* remove _to_tensor

* remove _to_tensor definition

* remove _collapse_frames_into_batch

* remove lora for not bloating the code.

* remove sample_size.

* simplify code a bit more

* ensure timesteps are always in tensor.
2024-02-06 09:22:07 +05:30
Edward Li
493228a708 Fix AutoencoderTiny with use_slicing (#6850)
* Fix `AutoencoderTiny` with `use_slicing`

When using slicing with AutoencoderTiny, the encoder mistakenly encodes the entire batch for every image in the batch.

* Fixed formatting issue
2024-02-05 09:18:22 -10:00
Dhruv Nair
8bf046b7fb Add single file and IP Adapter support to PIA Pipeline (#6851)
update
2024-02-05 16:23:18 +05:30
Dhruv Nair
bb99623d09 Update IP Adapter tests to use cosine similarity distance (#6806)
* update

* update
2024-02-05 16:22:59 +05:30
Dhruv Nair
fdf55b1f1c Fix posix path issue in testing utils (#6849)
update
2024-02-05 08:57:18 +05:30
小咩Goat
c6f8c310c3 Fix forward pass in UNetMotionModel when gradient checkpoint is enabled (#6744)
fix #6742

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-02-05 08:04:01 +05:30
YiYi Xu
64909f17b7 update IP-adapter code in UNetMotionModel (#6828)
fix

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-05 07:26:46 +05:30
Dhruv Nair
f09ca909c8 Multiple small fixes to Video Pipeline docs (#6805)
* update

* update

* update

* Update src/diffusers/pipelines/i2vgen_xl/pipeline_i2vgen_xl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* update

* update

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-05 07:24:38 +05:30
YiYi Xu
a5fc62f819 add self.use_ada_layer_norm_* params back to BasicTransformerBlock (#6841)
fix sd reference community ppeline

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-04 11:16:44 -10:00
Linoy Tsaban
fbdf26bac5 [dreambooth lora sdxl] add sdxl micro conditioning (#6795)
* add micro conditioning

* remove redundant lines

* style

* fix missing 's'

* fix missing shape bug due to missing RGB if statement

* remove redundant if, change arg order

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-04 16:00:09 +02:00
Fabio Rigano
13001ee315 Bugfix in IPAdapterFaceID (#6835) 2024-02-03 08:56:55 -10:00
Linoy Tsaban
65329aed98 [advanced dreambooth lora sdxl script] new features + bug fixes (#6691)
* add noise_offset param

* micro conditioning - wip

* image processing adjusted and moved to support micro conditioning

* change time ids to be computed inside train loop

* change time ids to be computed inside train loop

* change time ids to be computed inside train loop

* time ids shape fix

* move token replacement of validation prompt to the same section of instance prompt and class prompt

* add offset noise to sd15 advanced script

* fix token loading during validation

* fix token loading during validation in sdxl script

* a little clean

* style

* a little clean

* style

* sdxl script - a little clean + minor path fix

sd 1.5 script - change default resolution value

* ad 1.5 script - minor path fix

* fix missing comma in code example in model card

* clean up commented lines

* style

* remove time ids computed outside training loop - no longer used now that we utilize micro-conditioning, as all time ids are now computed inside the training loop

* style

* [WIP] - added draft readme, building off of examples/dreambooth/README.md

* readme

* readme

* readme

* readme

* readme

* readme

* readme

* readme

* removed --crops_coords_top_left from CLI args

* style

* fix missing shape bug due to missing RGB if statement

* add blog mention at the start of the reamde as well

* Update examples/advanced_diffusion_training/README.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* change note to render nicely as well

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-03 17:33:43 +02:00
Stephen
02338c9317 Change path to posix (testing_utils.py) (#6803)
change path to pathlib as_posix

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-03 12:44:13 +05:30
Younes Belkada
15ed53d272 Fixes LoRA SDXL training script with DDP + PEFT (#6816)
Update train_dreambooth_lora_sdxl.py
2024-02-03 09:46:32 +05:30
UmerHA
9cc59ba089 [Contributor Experience] Fix test collection on MPS (#6808)
* Update testing_utils.py

* Update testing_utils.py
2024-02-02 20:59:00 +05:30
YiYi Xu
adcbe674a4 [refactor]Scheduler.set_begin_index (#6728) 2024-02-01 09:51:02 -10:00
Sayak Paul
ec9840a5db [Refactor] harmonize the module structure for models in tests (#6738)
* harmonize the module structure for models in tests

* make the folders modules.

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-01 14:23:39 +05:30
YiYi Xu
093a03a1a1 add is_torchvision_available (#6800)
* add

* remove transformer

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-01-31 20:01:44 -10:00
Patrick von Platen
c3369f5673 fix torchvision import (#6796) 2024-01-31 12:13:10 -10:00
Sayak Paul
04cd6adf8c [Feat] add I2VGenXL for image-to-video generation (#6665)
---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-31 10:38:51 -10:00
YiYi Xu
66722dbea7 [sdxl k-diffusion pipeline]move sigma to device (#6757)
move sigma to device

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-31 09:29:15 -10:00
YiYi Xu
2e8d18e699 [IP-Adapter] Support multiple IP-Adapters (#6573)
---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Alvaro Somoza <somoza.alvaro@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-31 07:11:15 -10:00
Steven Liu
03373de0db [docs] Add missing parameter (#6775)
add missing param
2024-01-31 08:53:40 -08:00
Dhruv Nair
56bea6b4a1 Add PIA Model/Pipeline (#6698)
* update

* update

* updaet

* add tests and docs

* clean up

* add to toctree

* fix copies

* pr review feedback

* fix copies

* fix tests

* update docs

* update

* update

* update docs

* update

* update

* update

* update
2024-01-31 18:00:17 +02:00
Dhruv Nair
d7dc0ffd79 Fix setting scaling factor in VAE config (#6779)
fix
2024-01-31 19:47:22 +05:30
Kashif Rasul
97ee616971 add ipo, hinge and cpo loss to dpo trainer (#6788)
add ipo and hinge loss to dpo trainer
2024-01-31 16:41:31 +05:30
Sayak Paul
0fc62d1702 [Kandinsky tests] add is_flaky to test_model_cpu_offload_forward_pass (#6762)
* add is_flaky to test_model_cpu_offload_forward_pass

* style

* update

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-01-31 14:51:12 +05:30
Dhruv Nair
f4d3f913f4 Pin torch < 2.2.0 in test runners (#6780)
* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-31 13:41:18 +05:30
Viet Nguyen
1cab64b3be Update train_diffusion_dpo.py (#6754)
* Update train_diffusion_dpo.py

Address #6702

* Update train_diffusion_dpo_sdxl.py

* Empty-Commit

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-31 12:46:23 +05:30
Sayak Paul
8d7dc85312 add note about serialization (#6764) 2024-01-31 12:45:40 +05:30
dg845
87a92f779c Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten (#6736)
Fix bug in ResnetBlock2D.forward when not USE_PEFT_BACKEND and using scale_shift for time emb where the lora scale  gets overwritten.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-30 14:43:48 -10:00
Yunxuan Xiao
0db766ba77 [DDPMScheduler] Load alpha_cumprod to device to avoid redundant data movement. (#6704)
* load cumprod tensor to device

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

* fixing ci

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

* make fix-copies

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

---------

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
2024-01-30 13:19:37 -10:00
Dhruv Nair
8e94663503 Update export to video to support new tensor_to_vid function in video pipelines (#6715)
update
2024-01-30 19:43:33 +05:30
YiYi Xu
b09b90e24c udpate ip-adapter slow tests (#6760)
* udpate slices

* up

* hopefully last one

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-01-29 17:55:41 -10:00
Sajad Norouzi
058b47553e Fix mixed precision fine-tuning for text-to-image-lora-sdxl example. (#6751)
* Fix mixed precision fine-tuning for text-to-image-lora-sdxl example.

* fix text_encoder_two bug.

---------

Co-authored-by: Sajad Norouzi <sajadn@dev-dsk-sajadn-2a-87239470.us-west-2.amazon.com>
2024-01-30 06:55:02 +05:30
xhedit
7f58a76f48 Update lora.md with a more accurate description of rank (#6724)
* Update lora.md with a more accurate description of rank

* Update docs/source/en/training/lora.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-01-29 09:41:51 -08:00
Sayak Paul
09b7bfce91 [Core] move transformer scripts to transformers modules (#6747)
* move transformer scripts to transformers modules

* move transformer model test

* move prior transformer test to  directory

* fix doc path

* correct doc path

* add: __init__.py
2024-01-29 22:28:28 +05:30
Fabio Rigano
5d8b1987ec Add unload_textual_inversion method (#6656)
* Add unload_textual_inversion

* Fix dicts in tokenizer

* Fix quality

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Fix variable name after last update

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-29 06:26:22 -10:00
Sayak Paul
acd1962769 correct hflip arg (#6743) 2024-01-28 17:42:34 +05:30
Stephen
5b1b80a5b6 Change os.path to pathlib Path (#6737)
Change os.path to pathlib
2024-01-28 10:24:13 +05:30
gzguevara
8581d9bce4 changed to posix unet (#6719)
changed to posix

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-27 16:31:52 +05:30
Yingtian Liu
c101066227 Correct SNR weighted loss in v-prediction case by only adding 1 to SNR on the denominator (#6307)
* fix minsnr implementation for v-prediction case

* format code

* always compute snr when snr_gamma is specified

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-27 09:18:09 +05:30
Sayak Paul
d4c7ab7bf1 [Hub] feat: explicitly tag to diffusers when using push_to_hub (#6678)
* feat: explicitly tag to diffusers when using push_to_hub

* remove tags.

* reset repo.

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix: tests

* fix: push_to_hub behaviour for tagging from save_pretrained

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

* import fixes.

* add library name to existing model card.

* add: standalone test for generate_model_card

* fix tests for standalone method

* moved library_name to a better place.

* merge create_model_card and generate_model_card.

* fix test

* address lucain's comments

* fix return identation

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

* address further comments.

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lucain <lucainp@gmail.com>
2024-01-26 23:01:48 +05:30
dg845
ea9dc3fa90 Add UFOGenScheduler to Community Examples (#6650)
* Add UFOGenScheduler with diffusers imports changed from relative to absolute.

* make style

* Add community README entry for UFOGenScheduler.
2024-01-26 15:11:14 +02:00
dg845
b4220e97b1 Add Community Example Consistency Training Script (#6717)
* initial commit for unconditional/class-conditional consistency training script

* make style

* Add entry for consistency training script in community README.

* Move consistency training script from community to research_projects/consistency_training

* Add requirements.txt and README to research_projects/consistency_training directory.

* Manually revert community README changes for consistency training.

* Fix path to script after moving script to research projects.

* Add option to load U-Net weights from pretrained model.

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-26 15:10:57 +02:00
Dhruv Nair
dc85b578c2 Move tests for SD inference variant pipelines into their own modules (#6707)
* update

* update

* update
2024-01-26 14:09:41 +02:00
Dhruv Nair
0d927c7542 Add IP Adapters to slow tests (#6714)
update
2024-01-25 19:52:50 -10:00
Andrew Ishutin
5b93338235 fix custom diffusion training with concept list (#6710)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-26 11:17:51 +08:00
Aryan V S
7c1c705f60 fix community README (#6645) 2024-01-25 17:52:52 -08:00
Aryan V S
9e72016468 [docs] AnimateDiff Video-to-Video (#6712)
* add animatediff vid2vid to docs

* Update docs/source/en/api/pipelines/animatediff.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions from review

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-01-25 17:51:43 -08:00
Patrick von Platen
3e9716f22b Correct sigmas cpu settings (#6708) 2024-01-26 09:32:24 +08:00
Steven Liu
87bfbc320d [docs] UViT2D (#6643)
* uvit2d

* fix

* fix?

* add correct paper

* fix paths

* update abstract
2024-01-25 09:37:28 -08:00
Aryan V S
a517f665a4 AnimateDiff Video to Video (#6328)
* begin animatediff img2video and video2video

* revert animatediff to original implementation

* add img2video as pipeline

* update

* add vid2vid pipeline

* update imports

* update

* remove copied from line for check_inputs

* update

* update examples

* add multi-batch support

* fix __init__.py files

* move img2vid to community

* update community readme and examples

* fix

* make fix-copies

* add vid2vid batch params

* apply suggestions from review

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* add test for animatediff vid2vid

* torch.stack -> torch.cat

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* make style

* docs for vid2vid

* update

* fix prepare_latents

* fix docs

* remove img2vid

* update README to :main

* remove slow test

* refactor pipeline output

* update docs

* update docs

* merge community readme from :main

* final fix i promise

* add support for url in animatediff example

* update example

* update callbacks to latest implementation

* Update src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix merge

* Apply suggestions from code review

* remove callback and callback_steps as suggested in review

* Update tests/pipelines/animatediff/test_animatediff_video2video.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix import error caused due to unet refactor in #6630

* fix numpy import error after tensor2vid refactor in #6626

* make fix-copies

* fix numpy error

* fix progress bar test

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-24 18:22:26 +05:30
Brandon Strong
16748d1eba SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) (#6449)
* sd1.5 support in separate script

A quick adaptation to support people interested in using this method on 1.5 models.

* sd15 prompt text encoding and unet conversions

as per @linoytsaban 's recommendations. Testing would be appreciated,

* Readability and quality improvements

Removed some mentions of SDXL, and some arguments that don't apply to sd 1.5, and cleaned up some comments.

* make style/quality commands

* tracker rename and run-it doc

* Update examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

* Update examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-01-24 11:20:08 +02:00
Haofan Wang
c9081a8abd [Fix bugs] pipeline_controlnet_sd_xl.py (#6653)
* Update pipeline_controlnet_sd_xl.py

* Update pipeline_controlnet_xs_sd_xl.py
2024-01-24 09:18:12 +05:30
Yasuna
0eb68d9ddb [Docs] update: tutorials ja | AUTOPIPELINE.md (#6629)
* add en file

* translate 1-118 lines

* add text

* add toctree

* fix

* fix typo

* fix link
2024-01-23 09:19:55 -08:00
Haofan Wang
9941b3f124 Add InstantID Pipeline (#6673)
* add instantid pipeline

* format

* Update README.md

* Update README.md

* format

---------

Co-authored-by: ResearcherXman <xhs.research@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-23 17:25:23 +05:30
Ayush Mangal
16b9f98b48 [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow (#6057)
* Add instaflow community pipeline

* Make styling fixes

* Add lora

* Fix formatting

* Add docs

* Update README.md

* Update README.md

* Remove do LORA

* Update readme

* Update README.md

* Update README.md

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-23 13:52:50 +02:00
Dhruv Nair
fee93c81eb [Refactor] Update from single file (#6428)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update'

* update

* update

* update

* update

* update

* update

* up

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* up

* update

* update

* update

* update

* update'

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* clean

* update

* update

* clean up

* clean up

* update

* clean

* clean

* update

* updaet

* clean up

* fix docs

* update

* update

* Revert "update"

This reverts commit dbfb8f1ea9.

* update

* update

* update

* update

* fix controlnet

* fix scheduler

* fix controlnet tests
2024-01-23 14:42:03 +05:30
Sayak Paul
5308cce994 [Tests] Test for passing local config file to from_single_file() (#6638)
make config file local too.
2024-01-23 14:21:23 +05:30
YiYi Xu
318556b20e fix dpm related slow test failure (#6680)
fix

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-01-22 18:52:05 -10:00
Dhruv Nair
6620eda357 Standardise outputs for video pipelines (#6626)
* update

* update

* update

* update

* update

* update

* update

* clean up

* clean up
2024-01-23 10:07:07 +05:30
Sayak Paul
1f0705adcf [Big refactor] move unets to unets module 🦋 (#6630)
* move unets to  module 🦋

* parameterize unet-level import.

* fix flax unet2dcondition model import

* models __init__

* mildly depcrecating models.unet_2d_blocks in favor of models.unets.unet_2d_blocks.

* noqa

* correct depcrecation behaviour

* inherit from the actual classes.

* Empty-Commit

* backwards compatibility for unet_2d.py

* backward compatibility for unet_2d_condition

* bc for unet_1d

* bc for unet_1d_blocks
2024-01-23 08:57:58 +05:30
M. Tolga Cangöz
5e96333cb2 Update README (#6669)
Update number of checkpoints and repositories in README
2024-01-22 08:08:07 -08:00
Sayak Paul
da95a28ff6 [Diffusion DPO] apply fixes from #6547 (#6668)
apply fixes from #6547
2024-01-22 20:14:54 +05:30
Dhruv Nair
d66d554dc2 Add tearDown method to LoRA tests. (#6660)
* update

* update
2024-01-22 14:00:37 +05:30
Junsong Chen
c7df846dec add Sa-Solver (#5975)
* add Sa-Solver



---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: scxue <xueshuchen17@mails.ucas.edu.cn>
Co-authored-by: jschen <chenjunsong4@h-partners.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-01-21 21:37:44 -10:00
Vinh H. Pham
8e7bbfbe5a add padding_mask_crop to all inpaint pipelines (#6360)
* add padding_mask_crop
---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-21 19:42:22 -10:00
YiYi Xu
e2773c6255 fix SDXL-kdiffusion tests (#6647)
🤞🤞🤞

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-01-19 17:37:29 -10:00
YiYi Xu
ac61eefc9f fix DPM Scheduler with use_karras_sigmas option (#6477)
* fix

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-19 07:08:22 -10:00
HelloWorldBeginner
f95615b823 Fixed the bug related to saving DeepSpeed models. (#6628)
* Fixed the bug related to saving DeepSpeed models.

* Add information about training SD models using DeepSpeed to the README.

* Apply suggestions from code review

---------

Co-authored-by: mhh001 <mahonghao1@huawei.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-19 19:21:57 +05:30
SangKim
a9288b49c9 Modularize InstructPix2Pix SDXL inferencing during and after training in examples (#6569) 2024-01-19 15:47:34 +05:30
elucida
c54419658b refactor: extract init/forward function in UNet2DConditionModel (#6478)
* - extract function for stage in UNet2DConditionModel init & forward
- Add new function get_mid_block() to unet_2d_blocks.py

* add type hint to get_mid_block aligned with get_up_block and get_down_block; rename _set_xxx function

* add type hint and  use keyword arguments

* remove `copy from` in versatile diffusion
2024-01-19 12:13:34 +02:00
Aryan V S
6382663dc8 [Community] Experimental AnimateDiff Image to Video (open to improvements) (#6509)
* add animatediff img2vid

* fix

* Update examples/community/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix code snippet between ip adapter face id and animatediff img2vid

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-19 12:05:41 +02:00
spezialspezial
58b8dce129 Update convert_from_ckpt.py / read checkpoint config yaml contents (#6633)
Update convert_from_ckpt.py / read config yaml contents

Added missing reading of config yaml file contents
2024-01-19 08:25:03 +05:30
Dhruv Nair
a65ca8a059 Fix failing tests due to Posix Path (#6627)
update
2024-01-18 19:19:57 +05:30
Steven Liu
5ca062e011 [docs] Fix missing API function (#6604)
fix?
2024-01-17 13:59:09 -08:00
Linoy Tsaban
619e3ab6f6 [bug fix] advanced dreambooth lora sdxl - fixes bugs described in #6486 (#6599)
* fixes bugs:
1. redundant retraction
2. param clone
3. stopping optimization of text encoder params

* param upscaling

* style
2024-01-17 20:11:45 +05:30
Patrick von Platen
9e2804f720 Update pr_test_peft_backend.yml to use 1 process for testing (#6613) 2024-01-17 19:25:30 +05:30
Aryan V S
9112028ed8 FreeInit (#6315)
* freeinit

* update freeinit implementation based on review

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* fix

* another fix

* refactor

* fix timesteps missing bug

* apply suggestions from review

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* add test for freeinit

* apply suggestions from review

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* refactor

* fix test

* fix tensor not on same device

* update

* remove return_intermediate_results

* fix broken freeinit test

* update animatediff docs

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-01-17 17:17:07 +05:30
Steve Rhoades
dce06680d2 Fixes torch.compile() compatible training (#6589)
resolve conflicts
2024-01-17 07:47:03 +05:30
Bhavay Malhotra
dd63168319 Update installation.md (#6438)
* Update installation.md

* Update installation.md

* Update installation.md
2024-01-16 13:41:38 -08:00
Celestial Phineas
1040dfd9cc [Fix] Multiple image conditionings in a single batch for StableDiffusionControlNetPipeline (#6334)
* [Fix] Multiple image conditionings in a single batch for `StableDiffusionControlNetPipeline`.

* Refactor `check_inputs` in `StableDiffusionControlNetPipeline` to avoid redundant codes.

* Make the behavior of MultiControlNetModel to be the same to the original ControlNetModel

* Keep the code change minimum for nested list support

* Add fast test `test_inference_nested_image_input`

* Remove redundant check for nested image condition in `check_inputs`

Remove `len(image) == len(prompt)` check out of `check_image()`

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Better `ValueError` message for incompatible nested image list size

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Fix syntax error in `check_inputs`

* Remove warning message for multi-ControlNets with multiple prompts

* Fix a typo in test_controlnet.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Add test case for multiple prompts, single image conditioning in `StableDiffusionMultiControlNetPipelineFastTests`

* Improved `ValueError` message for nested `controlnet_conditioning_scale`

* Documenting the behavior of image list as `StableDiffusionControlNetPipeline` input

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-16 10:40:55 -10:00
Sayak Paul
49a4b377c1 remove omegaconf from the residues 👋 (#6600)
remove omegaconf 👋
2024-01-16 21:55:41 +05:30
JuanCarlosPi
dff35a86e4 Change in ip-adapter docs. CLIPVisionModelWithProjection should be im… (#6597)
Change in ip-adapter docs. CLIPVisionModelWithProjection should be imported from transformers, not diffusers
2024-01-16 08:18:13 -08:00
Yondon Fu
8842bcadb9 [SVD] Return np.ndarray when output_type="np" (#6507)
[SVD] Fix output_type="np"
2024-01-16 14:51:36 +05:30
Steve Rhoades
181280baba Fixes training resuming: Advanced Dreambooth LoRa Training (#6566)
* Fixes #6418 Advanced Dreambooth LoRa Training

* change order of import to fix nit

* fix nit, use cast_training_params

* remove torch.compile fix, will move to a new PR

* remove unnecessary import
2024-01-16 14:30:49 +05:30
Charchit Sharma
53f498d2a4 Use of Posix to better support Windows compatibility in testing_utils (#6587)
* changes in utils

* removed loc
2024-01-16 10:27:29 +02:00
Charchit Sharma
990860911f change to posix for better Windows support for lora loaders (#6590)
* posix lora

* changes and style fix
2024-01-16 13:46:29 +05:30
Fabio Rigano
23eed39702 Fix path generation in IP Adapter (#6564)
* Fix path generation on Windows

* Update set_default_attn_processors

* Use pathlib

* Fix quality

* Fix copy

* Revert changes in set_default_attn_processors

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-16 11:25:22 +05:30
YiYi Xu
fefed44543 update slow test for SDXL k-diffusion pipeline (#6588)
update expected slice
2024-01-15 18:54:33 -10:00
Dong
814f56d2fe 🐛 fix ip-adapter controlnet img2img missing code (#6528)
* 🐛 fix ip-adapter controlnet img2img missing code

* 📝 edit test

* 📝 edit test

* 📝 run make style and quality

* 🎨 remove slow tests
2024-01-15 16:41:09 -10:00
SangKim
96d6e16550 Enable image resizing to adjust its height and width in StableDiffusionXLInstructPix2PixPipeline (#6581)
* Enable image resizing to adjust its height and width in StableDiffusionXLInstructPix2PixPipeline

* Ensure that validation is performed at every 'validation_step', not at every step
2024-01-16 07:50:34 +05:30
Aryan V S
c11de13588 [training] fix training resuming problem for fp16 (SD LoRA DreamBooth) (#6554)
* fix training resume

* update

* update
2024-01-16 07:27:06 +05:30
Patrick von Platen
357855f8fc [Docs] Fix controlnet indent (#6578) 2024-01-15 18:12:55 +02:00
Fabio Rigano
f825221b5d [Community Pipeline] IPAdapter FaceID (#6276)
* Add support for IPAdapter FaceID

* Add docs

* Move subfolder to kwargs

* Fix quality

* Fix image encoder loading

* Fix loading + add test

* Move to community folder

* Fix style

* Revert constant update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-15 16:43:54 +02:00
Aryan V S
119d734f6e [AnimateDiff+Controlnet] Fix multicontrolnet support (#6551)
* fix multicontrolnet support

* update README with multicontrolnet example
2024-01-15 16:36:54 +02:00
Sayak Paul
cb4b3f0b78 [OmegaConf] replace it with yaml (#6488)
* remove omegaconf from convert_from_ckpt.

* remove from single_file.

* change to string based ubscription.

* style

* okay

* fix: vae_param

* no . indexing.

* style

* style

* turn getattrs into explicit if/else

* style

* propagate changes to ldm_uncond.

* propagate to gligen

* propagate to if.

* fix: quotes.

* propagate to audioldm.

* propagate to audioldm2

* propagate to musicldm.

* propagate to vq_diffusion

* propagate to zero123.

* remove omegaconf from diffusers codebase.
2024-01-15 20:02:10 +05:30
Haofan Wang
3d574b3bbe Fix a bug of flip in SDXL training script (#6547)
* Update train_text_to_image_sdxl.py

* Update train_text_to_image_lora_sdxl.py
2024-01-15 16:28:04 +02:00
Charchit Sharma
09903774d9 Make T2I Adapter SDXL Training Script torch.compile compatible (#6577)
update for t2i_adapter
2024-01-15 19:42:56 +05:30
dependabot[bot]
d6a70d8ba8 Bump jinja2 from 3.1.2 to 3.1.3 in /examples/research_projects/realfill (#6539)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-15 16:10:10 +02:00
Charchit Sharma
e3103e171f Make InstructPix2Pix SDXL Training Script torch.compile compatible (#6576)
* changes for pix2pix_sdxl

* style fix
2024-01-15 17:54:03 +05:30
Charchit Sharma
b053053ac9 Make InstructPix2Pix Training Script torch.compile compatible (#6558)
* added torch.compile for pix2pix

* required changes
2024-01-15 17:03:22 +05:30
Vinh H. Pham
08702fc1cb Make text-to-image SDXL LoRA Training Script torch.compile compatible (#6556)
make compile compatible
2024-01-15 16:58:16 +05:30
Vinh H. Pham
7ce89e979c Make text-to-image SD LoRA Training Script torch.compile compatible (#6555)
make compile compatible
2024-01-15 16:55:08 +05:30
gzguevara
05faf3263b SDXL text-to-image torch compatible (#6550)
* torch compatible

* code quality fix

* ruff style

* ruff format
2024-01-15 16:49:11 +05:30
Sayak Paul
a080f0d3a2 [Training Utils] create a utility for casting the lora params during training. (#6553)
create a utility for casting the lora params during training.
2024-01-15 13:51:13 +05:30
Sayak Paul
79df50388d [Training] fix training resuming problem when using FP16 (SDXL LoRA DreamBooth) (#6514)
* fix: training resume from fp16.

* add: comment

* remove residue from another branch.

* remove more residues.

* thanks to Younes; no hacks.

* style.

* clean things a bit and modularize _set_state_dict_into_text_encoder

* add comment about the fix detailed.
2024-01-12 17:11:06 +05:30
Vinh H. Pham
7d631825b0 Make Dreambooth SD Training Script torch.compile compatible (#6532)
* support compile

* make style

* move unwrap_model inside function

* change unwrap call

* run make style

* Update examples/dreambooth/train_dreambooth.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Revert "Update examples/dreambooth/train_dreambooth.py"

This reverts commit 70ab09732e.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-12 12:50:15 +05:30
gzguevara
33d2b5b087 SD text-to-image torch compile compatible (#6519)
* added unwrapper

* fiz typo
2024-01-12 09:28:35 +05:30
Suvaditya Mukherjee
f486d34b04 Make ControlNet SD Training Script torch.compile compatible (#6525)
* update: make controlnet script torch compile compatible

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>

* update: correct earlier mistakes for compilation

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>

* update: fix code style issues

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>

---------

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>
2024-01-12 09:27:26 +05:30
Charchit Sharma
e44b205e0b Make ControlNet SDXL Training Script torch.compile compatible (#6526)
* make torch.compile compatible

* fix quality
2024-01-12 09:25:09 +05:30
Vinh H. Pham
60cb44323d Make Dreambooth SD LoRA Training Script torch.compile compatible (#6534)
support compile
2024-01-12 09:24:03 +05:30
Radamés Ajna
1dd0ac9401 [DPO Training] pass tracker name as argument (#6542)
pass tracker name as argumentw
2024-01-12 09:15:39 +05:30
Yassine El Boudouri
c6b04589b6 Remove conversion to RGB (#6479)
* Remove conversion to RGB

* Add a Conversion Function

* Add type hint for convert_method

* Update src/diffusers/utils/loading_utils.py

Update docstring

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docstring

* Optimize imports

* Optimize imports (2)

* Reformat code

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-12 07:20:24 +05:30
Sayak Paul
5dc3471380 [SVD] support generators that are created on GPU (#6484)
* debug generator

* fix?

* fix?

* fix

* remove print.

* revert none check
2024-01-11 20:08:18 +05:30
Aryan V S
9df566e6da [Community] StyleAligned Pipeline (#6489)
* add stylealigned sdxl pipeline

* bugfix

* update docs

* remove einops dependency

* update README

* update example docstring
2024-01-11 14:35:55 +01:00
Sayak Paul
be0b425762 [Training] make checkpointing compatible when using torch.compile (part II) (#6511)
make checkpointing compatible when using torch.compile.
2024-01-11 18:37:30 +05:30
jquintanilla4
da843b3d53 .load_ip_adapter in StableDiffusionXLAdapterPipeline (#6246)
* Added testing notebook and .load_ip_adapter to XLAdapterPipeline

* Added annotations

* deleted testing notebook

* Update src/diffusers/pipelines/t2i_adapter/pipeline_stable_diffusion_xl_adapter.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* code clean up

* Add feature_extractor and image_encoder to components

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-11 13:04:08 +05:30
dg845
17cece072a Fix bug in LCM Distillation Scripts when args.unet_time_cond_proj_dim is used (#6523)
* Fix bug where unet's time_cond_proj_dim is not set correctly if using args.unet_time_cond_proj_dim.

* make style
2024-01-11 08:21:07 +05:30
Steven Liu
a551ddf928 [docs] mask_blur and padding_mask_crop (#6498)
new inpaint features
2024-01-10 08:14:34 -08:00
Steven Liu
1d57892980 [docs] Callbacks (#6471)
edits
2024-01-10 08:14:07 -08:00
antoine-scenario
3e8b63216e Add IP-Adapter to StableDiffusionXLControlNetImg2ImgPipeline (#6293)
* add IP-Adapter to StableDiffusionXLControlNetImg2ImgPipeline

Update src/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl_img2img.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

fix tests

* fix failing test

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-09 22:02:11 -10:00
YiYi Xu
dd4459ad79 [Refactor] splitingResnetBlock2D into multiple blocks (#6166)
---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-09 21:38:05 -10:00
YiYi Xu
6313645b6b add StableDiffusionXLKDiffusionPipeline (#6447)
---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-09 16:29:01 -10:00
Rahul Raman
2d1f2182cc example: Train Instruct pix2 pix with lora implementation (#6469)
* base template file - train_instruct_pix2pix.py

* additional import and parser argument requried for lora

* finetune only instructpix2pix model -- no need to include these layers

* inject lora layers

* freeze unet model -- only lora layers are trained

* training modifications to train only lora parameters

* store only lora parameters

* move train script to research project

* run quality and style code checks

* move train script to a new folder

* add README

* update README

* update references in README

---------

Co-authored-by: Rahul Raman <rahulraman@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-10 06:38:19 +05:30
Steven Liu
3be7c96e28 [docs] Stable video diffusion (#6472)
svd
2024-01-09 09:21:58 -08:00
Steven Liu
3c79dd9dbe [docs] PEFT adapter API (#6499)
follow up
2024-01-09 08:09:15 -08:00
Steven Liu
9d767916da [docs] Fast diffusion (#6470)
* edits

* fix

* feedback
2024-01-09 08:08:31 -08:00
Patrick von Platen
ae7cd5ad4c Link issue template to discussions 2024-01-09 17:00:15 +01:00
Sayak Paul
4497b3ec98 [Training] make DreamBooth SDXL LoRA training script compatible with torch.compile (#6483)
* make it torch.compile comaptible

* make the text encoder compatible too.

* style
2024-01-09 20:11:26 +05:30
Yifan Zhou
fc63ebdd3a [Community Pipeline] Rerender-A-Video: Zero-Shot Video-to-Video Translation (#6332)
* upload codes and doc

* lint

* lint

* lint

* update code

* remove blank lines

* Fix load url
2024-01-09 14:55:34 +01:00
Sayak Paul
8b6fae4ce5 [SVD] fix: vae type (#6475)
fix: vae type
2024-01-09 14:50:02 +01:00
jiqing-feng
aa1797e109 enable stable-xl textual inversion (#6421)
* enable stable-xl textual inversion

* check if optimizer_2 exists

* check text_encoder_2 before using

* add textual inversion for sdxl in a single file

* fix style

* fix example style

* reset for error changes

* add readme for sdxl

* fix style

* disable autocast as it will cause cast error when weight_dtype=bf16

* fix spelling error

* fix style and readme and 8bit optimizer

* add README_sdxl.md link

* add tracker key on log_validation

* run style

* rm the second center crop

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-09 15:12:33 +05:30
Patrick von Platen
5bacc2f5af [SAG] Support more schedulers, add better error message and make tests faster (#6465)
* finish

* finish

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-09 09:24:38 +05:30
Yasuna
6ae7e8112a [Docs] update: tutorials ja | INDEX.md, TUTORIAL_OVERVIEW.md, TOCTREE.yml (#6338)
* add tutorials to toctree.yml

* fix title

* fix words

* add overview ja

* fix diffusion to 拡散

* fix line 21

* add space

* delete supported pipline

* fix tutorial_overview.md

* fix space

* fix typo

* Delete docs/source/ja/tutorials/using_peft_for_inference.md

this file is not translated

* Delete docs/source/ja/tutorials/basic_training.md

this file is not translated

* Delete docs/source/ja/tutorials/autopipeline.md

this file is not translated

* fix toctree
2024-01-08 09:06:46 -08:00
Sayak Paul
e0f349c2b0 correct reviewers in PR template (#6485) 2024-01-08 19:12:04 +05:30
Sayak Paul
774f5c4581 minor changes to the SVD doc (#6466)
minor changes
2024-01-06 08:40:46 +05:30
Sayak Paul
a483a8eddf Rename REAMDE.md to README.md 2024-01-05 20:49:44 +05:30
Lucain
9a9daee724 Fix offline mode import (#6467) 2024-01-05 15:34:40 +01:00
Sayak Paul
585f941366 [Core] introduce PeftAdapterMixin module. (#6416)
* introduce integrations module.

* remove duplicate methods.

* better imports.

* move to loaders.py

* remove peftadaptermixin from modelmixin.

* add: peftadaptermixin selectively.

* add: entry to _toctree

* Empty-Commit
2024-01-05 18:18:28 +05:30
Dhruv Nair
86a26761ac Correctly handle creating model index json files when setting compiled modules in pipelines. (#6436)
update
2024-01-05 18:02:09 +05:30
Liang Hou
6ef2b8a92f Fix amused paper link (#6462) 2024-01-05 13:12:09 +01:00
Vinh H. Pham
3848606c7e [Community Pipeline] Add gluegen (#6433)
* init works

* add gluegen pipeline

* add gluegen code

* add another way to load language adapter

* make style

* Update README.md

* change doc
2024-01-05 13:05:40 +01:00
Sayak Paul
2a97067b84 [Experimental] Diffusion LoRA DPO training (#6422)
* add: experimental script for diffusion dpo training.

* random_crop cli.

* fix: caption tokenization.

* fix: pixel_values index.

* fix: grad?

* debug

* fix: reduction.

* fixes in the loss calculation.

* style

* fix: unwrap call.

* fix: validation inference.

* add: initial sdxl script

* debug

* make sure images in the tuple are of same res

* fix model_max_length

* report print

* boom

* fix: numerical issues.

* fix: resolution

* comment about resize.

* change the order of the training transformation.

* save call.

* debug

* remove print

* manually detaching necessary?

* use the same vae for validation.

* add: readme.
2024-01-05 16:40:06 +05:30
Sayak Paul
ae060fc4f1 [feat] introduce unload_lora(). (#6451)
* introduce unload_lora.

* fix-copies
2024-01-05 16:22:11 +05:30
Sayak Paul
9d945b2b90 0.25.0 post release (#6358)
* post release

* style

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-05 16:13:27 +05:30
Junsheng121
d184291c7d null-text-inversion-pipeline-implementation (#6329)
* null-text-inversion-implementation

* edited

* edited

* edited

* edited

* edited

* edit

* makestyle

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-05 11:35:21 +01:00
Sayak Paul
0a0bb526aa [LoRA depcrecation] LoRA depcrecation trilogy (#6450)
* edebug

* debug

* more debug

* more more debug

* remove tests for LoRAAttnProcessors.

* rename
2024-01-05 15:48:20 +05:30
Linoy Tsaban
2fada8dc1b [bug fix] fixes #6444 - checkpointing save issue in advanced dreambooth lora sdxl script (#6464)
* unwrap text encoder when saving hook only for full text encoder tuning

* unwrap text encoder when saving hook only for full text encoder tuning

* save embeddings in each checkpoint as well

* save embeddings in each checkpoint as well

* save embeddings in each checkpoint as well

* Update examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-05 15:35:24 +05:30
jiqing-feng
f2d51a28f7 Intel Gen 4 Xeon and later support bf16 (#6367)
* Intel Gen 4 Xeon and later support bf16

* fix bf16 notes
2024-01-05 11:47:28 +05:30
Horseee
811fd06292 [Doc] Add DeepCache in section optimization/General optimizations (#6390)
* add documentation for DeepCache

* fix typo

* add wandb url for DeepCache

* fix some typos

* add item in _toctree.yml

* update formats for arguments

* Update deepcache.md

* Update docs/source/en/optimization/deepcache.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* add StableDiffusionXLPipeline in doc

* Separate SDPipeline and SDXLPipeline

* Add the paper link of ablation experiments for hyper-parameters

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-01-05 09:57:08 +05:30
dg845
f3d1333e02 Improve LCM(-LoRA) Distillation Scripts (#6420)
* Make WDS pipeline interpolation type configurable.

* Make the VAE encoding batch size configurable.

* Make lora_alpha and lora_dropout configurable for LCM LoRA scripts.

* Generalize scalings_for_boundary_conditions function and make the timestep scaling configurable.

* Make LoRA target modules configurable for LCM-LoRA scripts.

* Move resolve_interpolation_mode to src/diffusers/training_utils.py and make interpolation type configurable in non-WDS script.

* apply suggestions from review
2024-01-05 06:55:13 +05:30
Steven Liu
acd926f4f2 [docs] Fix local links (#6440)
fix local links
2024-01-04 09:59:11 -08:00
Lucain
691d8d3e15 Respect offline mode when loading pipeline (#6456)
* Respect offline mode when loading model

* default to local entry if connectionerror
2024-01-04 17:05:55 +01:00
Sayak Paul
e7c0af5e71 add: amused paper link. (#6453) 2024-01-04 13:44:54 +05:30
Sayak Paul
107e02160a [LoRA tests] fix stuff related to assertions arising from the recent changes. (#6448)
* debug

* debug test_with_different_scales_fusion_equivalence

* use the right method.

* place it right.

* let's see.

* let's see again

* alright then.

* add a comment.
2024-01-04 12:55:15 +05:30
sayakpaul
6dbef45e6e Revert "debug"
This reverts commit 7715e6c31c.
2024-01-04 10:39:38 +05:30
sayakpaul
7715e6c31c debug 2024-01-04 10:39:00 +05:30
sayakpaul
05b3d36a25 Revert "debug"
This reverts commit fb4aec0ce3.
2024-01-04 10:38:04 +05:30
sayakpaul
fb4aec0ce3 debug 2024-01-04 10:37:28 +05:30
Sayak Paul
63de23e3db disable running peft non-peft lora test in the peft env. (#6437)
* disable running peft non-peft lora test in the peft env.

* Empty-Commit
2024-01-04 10:18:46 +05:30
Chi
2993257f2a Batter way to write binarize() function. (#6394)
* I added a new doc string to the class. This is more flexible to understanding other developers what are doing and where it's using.

* Update src/diffusers/models/unet_2d_blocks.py

This changes suggest by maintener.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/models/unet_2d_blocks.py

Add suggested text

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update unet_2d_blocks.py

I changed the Parameter to Args text.

* Update unet_2d_blocks.py

proper indentation set in this file.

* Update unet_2d_blocks.py

a little bit of change in the act_fun argument line.

* I run the black command to reformat style in the code

* Update unet_2d_blocks.py

similar doc-string add to have in the original diffusion repository.

* Batter way to write binarize function

* Solve check_code_quality error

* My mistake to run pull request but not reformated file

* Update image_processor.py

* remove extra variable and space

* Update image_processor.py

* Run ruff libarary to reformat my file

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-04 09:32:08 +05:30
Sayak Paul
aad18faa3e Update README_sdxl.md to update the LR (#6432)
Update README_sdxl.md
2024-01-03 20:55:51 +05:30
Sayak Paul
d700140076 [LoRA deprecation] handle rest of the stuff related to deprecated lora stuff. (#6426)
* handle rest of the stuff related to deprecated lora stuff.

* fix: copies

* don't modify the uNet in-place.

* fix: temporal autoencoder.

* manually remove lora layers.

* don't copy unet.

* alright

* remove lora attn processors from unet3d

* fix: unet3d.

* styl

* Empty-Commit
2024-01-03 20:54:09 +05:30
Sayak Paul
2e4dc3e25d [LoRA] add: test to check if peft loras are loadable in non-peft envs. (#6400)
* add: test to check if peft loras are loadable in non-peft envs.

* add torch_device approrpiately.

* fix: get_dummy_inputs().

* test logits.

* rename

* debug

* debug

* fix: generator

* new assertion values after fixing the seed.

* shape

* remove print statements and settle this.

* to update values.

* change values when lora config is initialized under a fixed seed.

* update colab link

* update notebook link

* sanity restored by getting the exact same values without peft.
2024-01-03 09:57:49 +05:30
YiYi Xu
3e2961f0b4 [doc] update inpaint doc to use apply_overlay (#6364)
add doc

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-01-02 11:16:36 -10:00
Vinh H. Pham
79c380bc80 Correct how apply_overlay read crop_coords (#6417)
correct reading variables
2024-01-02 19:31:12 +01:00
Aryan V S
e30b661437 Update lpw_xl pipeline to latest diffusers (#6411)
* add clip_skip, freeu, qkv

* fix

* add ip-adapter support

* callback on step end

* update

* fix NoneType bug

* fix

* add guidance scale embedding

* add textual inversion
2024-01-02 16:28:45 +01:00
Linoy Tsaban
b4077af212 [bug fix] using snr gamma and prior preservation loss in the dreambooth lora sdxl training scripts (#6356)
* change timesteps used to calculate snr when --with_prior_preservation is enabled

* change timesteps used to calculate snr when --with_prior_preservation is enabled (canonical script)

* style

* revert canonical script to before snr gamma change

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-02 09:21:39 -06:00
Daniel Socek
9f2bff502e [svd] fix noise_aug_strength type in svd pipe (#6389) 2024-01-02 14:45:07 +01:00
CyrusVorwald
0cb92717f9 add StableDiffusionXLControlNetInpaintPipeline to auto pipeline (#6302)
* add StableDiffusionXLControlNetInpaintPipeline to auto pipeline

* fixed style
2024-01-02 14:44:48 +01:00
Fabio Rigano
86714b72d0 Add unload_ip_adapter method (#6192)
* Add unload_ip_adapter method

* Update attn_processors with original layers

* Add test

* Use set_default_attn_processor

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-02 14:40:46 +01:00
Sayak Paul
61f6c5472a [LoRA] Remove the use of depcrecated loRA functionalities such as LoRAAttnProcessor (#6369)
* start deprecating loraattn.

* fix

* wrap into unet_lora_state_dict

* utilize text_encoder_lora_params

* utilize text_encoder_attn_modules

* debug

* debug

* remove print

* don't use text encoder for test_stable_diffusion_lora

* load the procs.

* set_default_attn_processor

* fix: set_default_attn_processor call.

* fix: lora_components[unet_lora_params]

* checking for 3d.

* 3d.

* more fixes.

* debug

* debug

* debug

* debug

* more debug

* more debug

* more debug

* more debug

* more debug

* more debug

* hack.

* remove comments and prep for a PR.

* appropriate set_lora_weights()

* fix

* fix: test_unload_lora_sd

* fix: test_unload_lora_sd

* use dfault attebtion processors.

* debu

* debug nan

* debug nan

* debug nan

* use NaN instead of inf

* remove comments.

* fix: test_text_encoder_lora_state_dict_unchanged

* attention processor default

* default attention processors.

* default

* style
2024-01-02 18:14:04 +05:30
lookas
17546020fc Fix #6409 (#6410)
* Update value_guided_sampling.py

Fix #6409

* Comply code style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-02 11:16:21 +05:30
2510
8a366b835c Fix gradient-checkpointing option is ignored in SDXL+LoRA training. (#6388) (#6402)
* Fix gradient-checkpointing option is ignored in SDXL+LoRA training. (#6388)

* Fix gradient-checkpointing option is ignored in SD+LoRA training.

* Fix gradient checkpoint is not applied to text encoders. (SDXL+LoRA)

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-01 08:51:04 +05:30
Sayak Paul
61d223c884 add: CUDA graph details. (#6408) 2023-12-31 13:43:26 +05:30
apolinário
bf725e044e Add new WebUI conversion state_dict_utils to __init__ utils (#6404)
* Add new state_dict_utils to __init__ utils

* style

---------

Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com>
2023-12-30 09:31:39 -06:00
apolinário
1622265e13 Add WebUI format support to Advanced Training Script (#6403)
* Add WebUI format support to Advanced Training Script

* style

---------

Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com>
2023-12-30 08:45:49 -06:00
apolinário
0b63ad5ad5 Create convert_diffusers_sdxl_lora_to_webui.py (#6395)
* Create convert_diffusers_sdxl_lora_to_webui.py

* Move some conversion logic to utils

* fix logging import

* Add usage example

---------

Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com>
2023-12-30 08:15:11 -06:00
Sayak Paul
6a376ceea2 [LoRA] remove unnecessary components from lora peft test suite (#6401)
remove unnecessary components from lora peft suite/
2023-12-30 18:25:40 +05:30
gzguevara
9f283b01d2 changed w&b report link (#6387) 2023-12-29 19:49:11 +05:30
Sayak Paul
203724e9d9 [Docs] add note on fp16 in fast diffusion (#6380)
add note on fp16
2023-12-29 09:38:50 +05:30
gzguevara
e7044a4221 multi-subject-dreambooth-inpainting with 🤗 datasets (#6378)
* files added

* fixing code quality

* fixing code quality

* fixing code quality

* fixing code quality

* sorted import block

* seperated import wandb

* ruff on script

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-29 09:33:49 +05:30
Sayak Paul
034b39b8cb [docs] add details concerning diffusers-specific bits. (#6375)
add details concerning diffusers-specific bits.
2023-12-28 23:12:49 +05:30
Sayak Paul
2db73f4a50 remove delete documentation trigger workflows. (#6373) 2023-12-28 18:26:14 +05:30
Adrian Punga
84d7faebe4 Fix support for MPS in KDPM2AncestralDiscreteScheduler (#6365)
Fix support for MPS

MPS doesn't support float64
2023-12-28 10:22:02 +01:00
YiYi Xu
4c483deb90 [refactor embeddings] gligen + ip-adapter (#6244)
* refactor ip-adapter-imageproj, gligen

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2023-12-27 18:48:42 -10:00
Sayak Paul
1ac07d8a8d [Training examples] Follow up of #6306 (#6346)
* add to dreambooth lora.

* add: t2i lora.

* add: sdxl t2i lora.

* style

* lcm lora sdxl.

* unwrap

* fix: enable_adapters().
2023-12-28 07:37:50 +05:30
apolinário
1fff527702 Fix keys for lora format on advanced training scripts (#6361)
fix keys for lora format on advanced training scripts
2023-12-27 11:38:03 -06:00
apolinário
645a62bf3b Add PEFT to advanced training script (#6294)
* Fix ProdigyOPT in SDXL Dreambooth script

* style

* style

* Add PEFT to Advanced Training Script

* style

* style

*  style 

* change order for logic operation

* add lora alpha

* style

* Align PEFT to new format

* Update train_dreambooth_lora_sdxl_advanced.py

Apply #6355 fix

---------

Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com>
2023-12-27 10:00:32 -03:00
Dhruv Nair
6414d4e4f9 Fix chunking in SVD (#6350)
fix
2023-12-27 13:07:41 +01:00
Andy W
43672b4a22 Fix "push_to_hub only create repo in consistency model lora SDXL training script" (#6102)
* fix

* style fix

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-27 15:25:19 +05:30
dg845
9df3d84382 Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. (#6279)
Fix bug when creating the guidance embeddings using multiple GPUs.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-27 14:25:21 +05:30
Jianqi Pan
c751449011 fix: use retrieve_latents (#6337) 2023-12-27 10:44:26 +05:30
Dhruv Nair
c1e8bdf1d4 Move ControlNetXS into Community Folder (#6316)
* update

* update

* update

* update

* update

* make style

* remove docs

* update

* move to research folder.

* fix-copies

* remove _toctree entry.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-27 08:15:23 +05:30
Sayak Paul
78b87dc25a [LoRA] make LoRAs trained with peft loadable when peft isn't installed (#6306)
* spit diffusers-native format from the get go.

* rejig the peft_to_diffusers mapping.
2023-12-27 08:01:10 +05:30
Will Berman
0af12f1f8a amused update links to new repo (#6344)
* amused update links to new repo

* lint
2023-12-26 22:46:28 +01:00
Justin Ruan
6e123688dc Remove unused parameters and fixed FutureWarning (#6317)
* Remove unused parameters and fixed `FutureWarning`

* Fixed wrong config instance

* update unittest for `DDIMInverseScheduler`
2023-12-26 22:09:10 +01:00
YiYi Xu
f0a588b8e2 adding auto1111 features to inpainting pipeline (#6072)
* add inpaint_full_res

* fix

* update

* move get_crop_region to image processor

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* move apply_overlay to image processor

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-26 10:20:29 -10:00
priprapre
fa31704420 [SDXL-IP2P] Update README_sdxl, Replace the link for wandb log with the correct run (#6270)
Replace the link for wandb log with the correct run
2023-12-26 21:13:11 +01:00
Sayak Paul
9d79991da0 [Docs] fix: video rendering on svd. (#6330)
fix: video rendering on svd.
2023-12-26 21:05:22 +01:00
Will Berman
7d865ac9c6 amused other pipelines docs (#6343)
other pipelines
2023-12-26 20:20:32 +01:00
Dhruv Nair
fb02316db8 Add AnimateDiff conversion scripts (#6340)
* add scripts

* update
2023-12-26 22:40:00 +05:30
Dhruv Nair
98a2b3d2d8 Update Animatediff docs (#6341)
* update

* update

* update
2023-12-26 22:39:46 +05:30
Dhruv Nair
2026ec0a02 Interruptable Pipelines (#5867)
* add interruptable pipelines

* add tests

* updatemsmq

* add interrupt property

* make fix copies

* Revert "make fix copies"

This reverts commit 914b35332b.

* add docs

* add tutorial

* Update docs/source/en/tutorials/interrupting_diffusion_process.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tutorials/interrupting_diffusion_process.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update

* fix quality issues

* fix

* update

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-26 22:39:26 +05:30
dg845
3706aa3305 Add rescale_betas_zero_snr Argument to DDPMScheduler (#6305)
* Add rescale_betas_zero_snr argument to DDPMScheduler.

* Propagate rescale_betas_zero_snr changes to DDPMParallelScheduler.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-26 17:54:30 +01:00
Sayak Paul
d4f10ea362 [Diffusion fast] add doc for diffusion fast (#6311)
* add doc for diffusion fast

* add entry to _toctree

* Apply suggestions from code review

* fix titlew

* fix: title entry

* add note about fuse_qkv_projections
2023-12-26 22:19:55 +05:30
Younes Belkada
3aba99af8f [Peft / Lora] Add adapter_names in fuse_lora (#5823)
* add adapter_name in fuse

* add tesrt

* up

* fix CI

* adapt from suggestion

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* change to `require_peft_version_greater`

* change variable names in test

* Update src/diffusers/loaders/lora.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* break into 2 lines

* final comments

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-12-26 16:54:47 +01:00
Sayak Paul
6683f97959 [Training] Add datasets version of LCM LoRA SDXL (#5778)
* add: script to train lcm lora for sdxl with 🤗 datasets

* suit up the args.

* remove comments.

* fix num_update_steps

* fix batch unmarshalling

* fix num_update_steps_per_epoch

* fix; dataloading.

* fix microconditions.

* unconditional predictions debug

* fix batch size.

* no need to use use_auth_token

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* make vae encoding batch size an arg

* final serialization in kohya

* style

* state dict rejigging

* feat: no separate teacher unet.

* debug

* fix state dict serialization

* debug

* debug

* debug

* remove prints.

* remove kohya utility and make style

* fix serialization

* fix

* add test

* add peft dependency.

* add: peft

* remove peft

* autocast device determination from accelerator

* autocast

* reduce lora rank.

* remove unneeded space

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* style

* remove prompt dropout.

* also save in native diffusers ckpt format.

* debug

* debug

* debug

* better formation of the null embeddings.

* remove space.

* autocast fixes.

* autocast fix.

* hacky

* remove lora_sayak

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* style

* make log validation leaner.

* move back enabled in.

* fix: log_validation call.

* add: checkpointing tests

* taking my chances to see if disabling autocasting has any effect?

* start debugging

* name

* name

* name

* more debug

* more debug

* index

* remove index.

* print length

* print length

* print length

* move unet.train() after add_adapter()

* disable some prints.

* enable_adapters() manually.

* remove prints.

* some changes.

* fix params_to_optimize

* more fixes

* debug

* debug

* remove print

* disable grad for certain contexts.

* Add support for IPAdapterFull (#5911)

* Add support for IPAdapterFull


Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix a bug in `add_noise` function  (#6085)

* fix

* copies

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>

* [Advanced Diffusion Script] Add Widget default text (#6100)

add widget

* [Advanced Training Script] Fix pipe example (#6106)

* IP-Adapter for StableDiffusionControlNetImg2ImgPipeline (#5901)

* adapter for StableDiffusionControlNetImg2ImgPipeline

* fix-copies

* fix-copies

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* IP adapter support for most pipelines (#5900)

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_instruct_pix2pix.py

* update tests

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_panorama.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion_safe/pipeline_stable_diffusion_safe.py

* support ip-adapter in src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_text2img.py

* support ip-adapter in src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_img2img.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py

* revert changes to sd_attend_and_excite and sd_upscale

* make style

* fix broken tests

* update ip-adapter implementation to latest

* apply suggestions from review

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* fix: lora_alpha

* make vae casting conditional/

* param upcasting

* propagate comments from https://github.com/huggingface/diffusers/pull/6145

Co-authored-by: dg845 <dgu8957@gmail.com>

* [Peft] fix saving / loading when unet is not "unet" (#6046)

* [Peft] fix saving / loading when unet is not "unet"

* Update src/diffusers/loaders/lora.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* undo stablediffusion-xl changes

* use unet_name to get unet for lora helpers

* use unet_name

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* [Wuerstchen] fix fp16 training and correct lora args (#6245)

fix fp16 training

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* [docs] fix: animatediff docs (#6339)

fix: animatediff docs

* add: note about the new script in readme_sdxl.

* Revert "[Peft] fix saving / loading when unet is not "unet" (#6046)"

This reverts commit 4c7e983bb5.

* Revert "[Wuerstchen] fix fp16 training and correct lora args (#6245)"

This reverts commit 0bb9cf0216.

* Revert "[docs] fix: animatediff docs (#6339)"

This reverts commit 11659a6f74.

* remove tokenize_prompt().

* assistive comments around enable_adapters() and diable_adapters().

---------

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Fabio Rigano <57982783+fabiorigano@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
Co-authored-by: Charchit Sharma <charchitsharma11@gmail.com>
Co-authored-by: Aryan V S <contact.aryanvs@gmail.com>
Co-authored-by: dg845 <dgu8957@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2023-12-26 21:22:05 +05:30
Sayak Paul
4e7b0cb396 [docs] fix: animatediff docs (#6339)
fix: animatediff docs
2023-12-26 19:13:49 +05:30
Kashif Rasul
35b81fffae [Wuerstchen] fix fp16 training and correct lora args (#6245)
fix fp16 training

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-26 11:40:04 +01:00
Kashif Rasul
e0d8c910e9 [Peft] fix saving / loading when unet is not "unet" (#6046)
* [Peft] fix saving / loading when unet is not "unet"

* Update src/diffusers/loaders/lora.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* undo stablediffusion-xl changes

* use unet_name to get unet for lora helpers

* use unet_name

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-26 11:39:28 +01:00
dg845
a3d31e3a3e Change LCM-LoRA README Script Example Learning Rates to 1e-4 (#6304)
Change README LCM-LoRA example learning rates to 1e-4.
2023-12-25 21:29:20 +05:30
Jianqi Pan
84c403aedb fix: cannot set guidance_scale (#6326)
fix: set guidance_scale
2023-12-25 21:16:57 +05:30
Sayak Paul
f4b0b26f7e [Tests] Speed up example tests (#6319)
* remove validation args from textual onverson tests

* reduce number of train steps in textual inversion tests

* fix: directories.

* debig

* fix: directories.

* remove validation tests from textual onversion

* try reducing the time of test_text_to_image_checkpointing_use_ema

* fix: directories

* speed up test_text_to_image_checkpointing

* speed up test_text_to_image_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* fix

* speed up test_instruct_pix2pix_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* set checkpoints_total_limit to 2.

* test_text_to_image_lora_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints speed up

* speed up test_unconditional_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* debug

* fix: directories.

* speed up test_instruct_pix2pix_checkpointing_checkpoints_total_limit

* speed up: test_controlnet_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* speed up test_controlnet_sdxl

* speed up dreambooth tests

* speed up test_dreambooth_lora_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* speed up test_custom_diffusion_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* speed up test_text_to_image_lora_sdxl_text_encoder_checkpointing_checkpoints_total_limit

* speed up # checkpoint-2 should have been deleted

* speed up examples/text_to_image/test_text_to_image.py::TextToImage::test_text_to_image_checkpointing_checkpoints_total_limit

* additional speed ups

* style
2023-12-25 19:50:48 +05:30
Sayak Paul
89459a5d56 fix: lora peft dummy components (#6308)
* fix: lora peft dummy components

* fix: dummy components
2023-12-25 11:26:45 +05:30
Sayak Paul
008d9818a2 fix: t2i apdater paper link (#6314) 2023-12-25 10:45:14 +05:30
mwkldeveloper
2d43094ffc fix RuntimeError: Input type (float) and bias type (c10::Half) should be the same in train_text_to_image_lora.py (#6259)
* fix RuntimeError: Input type (float) and bias type (c10::Half) should be the same

* format source code

* format code

* remove the autocast blocks within the pipeline

* add autocast blocks to pipeline caller in train_text_to_image_lora.py
2023-12-24 14:34:35 +05:30
Celestial Phineas
7c05b975b7 Fix typos in the ValueError for a nested image list as StableDiffusionControlNetPipeline input. (#6286)
Fixed typos in the `ValueError` for a nested image list as input.
2023-12-24 14:32:24 +05:30
Dhruv Nair
fe574c8b29 LoRA Unfusion test fix (#6291)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-24 14:31:48 +05:30
Sayak Paul
90b9479903 [LoRA PEFT] fix LoRA loading so that correct alphas are parsed (#6225)
* initialize alpha too.

* add: test

* remove config parsing

* store rank

* debug

* remove faulty test
2023-12-24 09:59:41 +05:30
apolinário
df76a39e1b Fix Prodigy optimizer in SDXL Dreambooth script (#6290)
* Fix ProdigyOPT in SDXL Dreambooth script

* style

* style
2023-12-22 06:42:04 -06:00
Bingxin Ke
3369bc810a [Community Pipeline] Add Marigold Monocular Depth Estimation (#6249)
* [Community Pipeline] Add Marigold Monocular Depth Estimation

- add single-file pipeline
- update README

* fix format - add one blank line

* format script with ruff

* use direct image link in example code

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-22 15:41:46 +05:30
Pedro Cuenca
7fe47596af Allow diffusers to load with Flax, w/o PyTorch (#6272) 2023-12-22 09:37:30 +01:00
Dhruv Nair
59d1caa238 Remove peft tests from old lora backend tests (#6273)
update
2023-12-22 13:35:52 +05:30
Dhruv Nair
c022e52923 Remove ONNX inpaint legacy (#6269)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-22 13:35:21 +05:30
Will Berman
4039815276 open muse (#5437)
amused

rename

Update docs/source/en/api/pipelines/amused.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

AdaLayerNormContinuous default values

custom micro conditioning

micro conditioning docs

put lookup from codebook in constructor

fix conversion script

remove manual fused flash attn kernel

add training script

temp remove training script

add dummy gradient checkpointing func

clarify temperatures is an instance variable by setting it

remove additional SkipFF block args

hardcode norm args

rename tests folder

fix paths and samples

fix tests

add training script

training readme

lora saving and loading

non-lora saving/loading

some readme fixes

guards

Update docs/source/en/api/pipelines/amused.md

Co-authored-by: Suraj Patil <surajp815@gmail.com>

Update examples/amused/README.md

Co-authored-by: Suraj Patil <surajp815@gmail.com>

Update examples/amused/train_amused.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

vae upcasting

add fp16 integration tests

use tuple for micro cond

copyrights

remove casts

delegate to torch.nn.LayerNorm

move temperature to pipeline call

upsampling/downsampling changes
2023-12-21 11:40:55 -08:00
Sayak Paul
5b186b7128 [Refactor] move ldm3d out of stable_diffusion. (#6263)
ldm3d.
2023-12-21 18:59:55 +05:30
Sayak Paul
ab0459f2b7 [Deprecated pipelines] remove pix2pix zero from init (#6268)
remove pix2pix zero from init
2023-12-21 18:17:28 +05:30
Sayak Paul
9c7cc36011 [Refactor] move panorama out of stable_diffusion (#6262)
* move panorama out.

* fix: diffedit

* fix: import.

* fix: impirt
2023-12-21 18:17:05 +05:30
Sayak Paul
325f6c53ed [Refactor] move attend and excite out of stable_diffusion. (#6261)
* move attend and excite out.

* fix: import

* fix diffedit
2023-12-21 16:49:32 +05:30
Benjamin Bossan
43979c2890 TST Fix LoRA test that fails with PEFT >= 0.7.0 (#6216)
See #6185 for context.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-21 11:50:05 +01:00
Sayak Paul
9ea6ac1b07 [Refactor] move sag out of stable_diffusion (#6264)
move sag out of .
2023-12-21 16:09:49 +05:30
Sayak Paul
2c34c7d6dd [Refactor] move gligen out of stable diffusion. (#6265)
* move gligen out of stable diffusion.

* fix: import

* fix import module
2023-12-21 15:26:52 +05:30
Sayak Paul
bffadde126 [Refactor] move k diffusion out of stable_diffusion (#6267)
move k diffusion out of stable_diffusion
2023-12-21 15:24:24 +05:30
YShow
35a969d297 [Training] remove depcreated method from lora scripts again (#6266)
* remove depcreated method from lora scripts

* check code quality
2023-12-21 14:17:52 +05:30
sayakpaul
c5ff469d0e Revert "move attend and excite out of stable_diffusion"
This reverts commit bcecfbc873.
2023-12-21 12:35:58 +05:30
sayakpaul
bcecfbc873 move attend and excite out of stable_diffusion 2023-12-21 12:35:09 +05:30
Sayak Paul
6269045c5b [Refactor] move diffedit out of stable_diffusion (#6260)
* move diffedit out of stable_diffuson

* fix: import

* style

* fix: import
2023-12-21 12:26:36 +05:30
lvzi
6ca9c4af05 fix: unscale fp16 gradient problem & potential error (#6086) (#6231)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-21 09:09:26 +05:30
dependabot[bot]
0532cece97 Bump transformers from 4.34.0 to 4.36.0 in /examples/research_projects/realfill (#6255)
Bump transformers in /examples/research_projects/realfill

Bumps [transformers](https://github.com/huggingface/transformers) from 4.34.0 to 4.36.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.34.0...v4.36.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-21 09:03:17 +05:30
Sayak Paul
22b45304bf [Refactor upsamplers and downsamplers] separate out upsamplers and downsamplers. (#6128)
* separate out upsamplers and downsamplers.

* import all the necessary blocks in resnet for backward comp.

* move upsample2d and downsample2d to utils.

* move downsample_2d to downsamplers.py

* apply feedback

* fix import

* samplers -> sampling
2023-12-20 21:01:33 +05:30
Beinsezii
457abdf2cf EulerAncestral add rescale_betas_zero_snr (#6187)
* EulerAncestral add `rescale_betas_zero_snr`

Uses same infinite sigma fix from EulerDiscrete. Interestingly the
ancestral version had the opposite problem: too much contrast instead of
too little.

* UT for EulerAncestral `rescale_betas_zero_snr`

* EulerAncestral upcast samples during step()

It helps this scheduler too, particularly when the model is using bf16.

While the noise dtype is still the model's it's automatically upcasted
for the add so all it affects is determinism.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-20 13:09:25 +05:30
hako-mikan
ff43dba7ea [Fix] Fix Regional Prompting Pipeline (#6188)
* Update regional_prompting_stable_diffusion.py

* reformat

* reformat

* reformat

* reformat

* reformat

* reformat

* reformat

* regormat

* reformat

* reformat

* reformat

* reformat

* Update regional_prompting_stable_diffusion.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-20 10:37:19 +05:30
Steven Liu
5433962992 [docs] Batched seeds (#6237)
batched seed
2023-12-19 16:50:18 -08:00
raven
df476d9f63 [Docs] Fix a code example in the ControlNet Inpainting documentation (#6236)
fix document on masked image in inpainting controlnet
2023-12-19 13:14:37 -08:00
YiYi Xu
3e71a20650 [refactor embeddings]pixart-alpha (#6212)
pixart-alpha

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2023-12-19 07:07:24 -10:00
Sayak Paul
bf40d7d82a add peft dependency to fast push tests (#6229)
* add peft dependency

* add peft dependency at the correct place.
2023-12-19 13:26:25 +05:30
Dhruv Nair
32ff4773d4 ControlNetXS fixes. (#6228)
update
2023-12-19 11:58:34 +05:30
Sayak Paul
288ceebea5 [T2I LoRA training] fix: unscale fp16 gradient problem (#6119)
* fix: unscale fp16 gradient problem

* fix for dreambooth lora sdxl

* make the type-casting conditional.

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-19 09:54:17 +05:30
Sayak Paul
9221da4063 fix: init for vae during pixart tests (#6215)
* fix: init for vae during pixart tests

* print the values

* add flatten

* correct assertion value for test_inference

* correct assertion values for test_inference_non_square_images

* run styling

* debug test_inference_with_multiple_images_per_prompt

* fix assertion values for test_inference_with_multiple_images_per_prompt
2023-12-18 18:16:57 -10:00
YiYi Xu
57fde871e1 offload the optional module image_encoder (#6151)
* offload image_encoder

* add test

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-18 15:10:01 -10:00
Fabio Rigano
68e962395c Add converter method for ip adapters (#6150)
* Add converter method for ip adapters

* Move converter method

* Update to image proj converter

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-18 13:46:43 -10:00
Dhruv Nair
781775ea56 Slow Test for Pipelines minor fixes (#6221)
update
2023-12-19 00:45:51 +05:30
Patrick von Platen
fa3c86beaf [SVD] Fix guidance scale (#6002)
* [SVD] Fix guidance scale

* make style
2023-12-18 19:33:24 +01:00
Haofan Wang
7d0a47f387 Update train_text_to_image_lora.py (#6144)
* Update train_text_to_image_lora.py

* Fix typo?

---------

Co-authored-by: M. Tolga Cangöz <46008593+standardAI@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-18 19:33:05 +01:00
Aryan V S
67b3d3267e Support img2img and inpaint in lpw-xl (#6114)
* add img2img and inpaint support to lpw-xl

* update community README

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-18 19:19:11 +01:00
TilmannR
4e77056885 Update README.md (#6191)
Typo: The script for LoRA training is `train_text_to_image_lora_prior.py` not `train_text_to_image_prior_lora.py`.

Alternatively you could rename the file and keep the README.md unchanged.
2023-12-18 19:08:29 +01:00
Dhruv Nair
a0c54828a1 Deprecate Pipelines (#6169)
* deprecate pipe

* make style

* update

* add deprecation message

* format

* remove tests for deprecated pipelines

* remove deprecation message

* make style

* fix copies

* clean up

* clean

* clean

* clean

* clean up

* clean up

* clean up toctree

* clean up

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-18 23:08:29 +05:30
Patrick von Platen
8d891e6e1b [Torch Compile] Fix torch compile for svd vae (#6217) 2023-12-18 18:21:17 +01:00
Patrick von Platen
cce1fe2d41 [Text-to-Video] Clean up pipeline (#6213)
* make style

* make style

* make style

* make style
2023-12-18 18:21:09 +01:00
Abin Thomas
d816bcb5e8 Fix t2i. blog url (#6205) 2023-12-18 09:12:28 -08:00
d8ahazard
6976cab7ca Fix possible re-conversion issues after extracting from safetensors (#6097)
* Fix possible re-conversion issues after extracting from diffusers

Properly rename specific vae keys.

* Whoops
2023-12-18 11:51:20 +01:00
Dhruv Nair
fcbed3fa79 Fix SDXL Inpainting from single file with Refiner Model (#6147)
* update

* update

* update
2023-12-18 11:45:37 +01:00
Sayak Paul
b98b314b7a [Training] remove depcreated method from lora scripts. (#6207)
remove depcreated method from lora scripts.
2023-12-18 15:52:43 +05:30
Omar Sanseviero
74558ff65b Nit fix to training params (#6200) 2023-12-18 11:06:16 +01:00
Yudong Jin
49644babd3 Fix the test script in examples/text_to_image/README.md (#6209)
* Update examples/text_to_image/README.md

* Update examples/text_to_image/README.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-18 15:36:00 +05:30
Sayak Paul
56b3b21693 [Refactor autoencoders] feat: introduce autoencoders module (#6129)
* feat: introduce autoencoders module

* more changes for styling and copy fixing

* path changes in the docs.

* fix: import structure in init.

* fix controlnetxs import
2023-12-18 12:42:15 +05:30
Sayak Paul
9cef07da5a [Benchmarks] fix: lcm benchmarking reporting (#6198)
* fix: lcm benchmarking reporting

* fix generate_csv_dict call.
2023-12-17 15:32:11 +05:30
Sayak Paul
2d94c7838e [Core] feat: enable fused attention projections for other SD and SDXL pipelines (#6179)
* feat: enable fused attention projections for other SD and SDXL pipelines

* add: test for SD fused projections.
2023-12-16 08:45:54 +05:30
Sayak Paul
a81334e3f0 [LoRA] add an error message when dealing with _best_guess_weight_name ofline (#6184)
* add an error message when dealing with _best_guess_weight_name ofline

* simplify condition
2023-12-16 08:36:08 +05:30
Dhruv Nair
d704a730cd Compile test fix (#6104)
* update

* update
2023-12-15 18:34:46 +05:30
dg845
49db233b35 Clean Up Comments in LCM(-LoRA) Distillation Scripts. (#6145)
* Clean up comments in LCM(-LoRA) distillation scripts.

* Calculate predicted source noise noise_pred correctly for all prediction_types.

* make style

* apply suggestions from review

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-15 18:18:16 +05:30
Dhruv Nair
93ea26f272 Add PEFT to training deps (#6148)
add peft to training deps

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-15 08:39:59 +05:30
Dhruv Nair
f5dfe2a8b0 LoRA test fixes (#6163)
* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-15 08:39:41 +05:30
Patrick von Platen
4836cfad98 [Sigmas] Keep sigmas on CPU (#6173)
* correct

* Apply suggestions from code review

* make style
2023-12-15 07:43:18 +05:30
Kuba
1ccbfbb663 [docs] Add missing \ in lora.md (#6174) 2023-12-14 16:55:43 -08:00
Linoy Tsaban
29dfe22a8e [advanced dreambooth lora sdxl training script] load pipeline for inference only if validation prompt is used (#6171)
* load pipeline for inference only if validation prompt is used

* move things outside

* load pipeline for inference only if validation prompt is used

* fix readme when validation prompt is used

---------

Co-authored-by: linoytsaban <linoy@huggingface.co>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
2023-12-14 11:45:33 -06:00
Aryan V S
56806cdbfd Add missing subclass docs, Fix broken example in SD_safe (#6116)
* fix broken example in pipeline_stable_diffusion_safe

* fix typo in pipeline_stable_diffusion_pix2pix_zero

* add missing docs

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-14 09:20:30 -08:00
Steven Liu
8ccc76ab37 [docs] IP-Adapter API doc (#6140)
add ip-adapter

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-14 09:19:37 -08:00
Monohydroxides
c46711e895 [Community] Add SDE Drag pipeline (#6105)
* Add community pipeline: sde_drag.py

* Update README.md

* Update README.md

Update example code and visual example

* Update sde_drag.py

Update code example.
2023-12-14 20:47:20 +05:30
Sayak Paul
1d686bac81 [feat: Benchmarking Workflow] add stuff for a benchmarking workflow (#5839)
* add poc for benchmarking workflow.

* import

* fix argument

* fix: argument

* fix: path

* fix

* fix

* path

* output csv files.

* workflow cleanup

* append token

* add utility to push to hf dataset

* fix: kw arg

* better reporting

* fix: headers

* better formatting of the numbers.

* better type annotation

* fix: formatting

* moentarily disable check

* push results.

* remove disable check

* introduce base classes.

* img2img class

* add inpainting pipeline

* intoduce base benchmark class.

* add img2img and inpainting

* feat: utility to compare changes

* fix

* fix import

* add args

* basepath

* better exception handling

* better path handling

* fix

* fix

* remove

* ifx

* fix

* add: support for controlnet.

* image_url -> url

* move images to huggingface hub

* correct urls.

* root_ckpt

* flush before benchmarking

* don't install accelerate from source

* add runner

* simplify Diffusers Benchmarking step

* change runner

* fix: subprocess call.

* filter percentage values

* fix controlnet benchmark

* add t2i adapters.

* fix filter columns

* fix t2i adapter benchmark

* fix init.

* fix

* remove safetensors flag

* fix args print

* fix

* feat: run_command

* add adapter resolution mapping

* benchmark t2i adapter fix.

* fix adapter input

* fix

* convert to L.

* add flush() add appropriate places

* better filtering

* okay

* get env for torch

* convert to float

* fix

* filter out nans.

* better coment

* sdxl

* sdxl for other benchmarks.

* fix: condition

* fix: condition for inpainting

* fix: mapping for resolution

* fix

* include kandinsky and wuerstchen

* fix: Wuerstchen

* Empty-Commit

* [Community] AnimateDiff + Controlnet Pipeline (#5928)

* begin work on animatediff + controlnet pipeline

* complete todos, uncomment multicontrolnet, input checks

Co-Authored-By: EdoardoBotta <botta.edoardo@gmail.com>

* update

Co-Authored-By: EdoardoBotta <botta.edoardo@gmail.com>

* add example

* update community README

* Update examples/community/README.md

---------

Co-authored-by: EdoardoBotta <botta.edoardo@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* EulerDiscreteScheduler add `rescale_betas_zero_snr` (#6024)

* EulerDiscreteScheduler add `rescale_betas_zero_snr`

* Revert "[Community] AnimateDiff + Controlnet Pipeline (#5928)"

This reverts commit 821726d7c0.

* Revert "EulerDiscreteScheduler add `rescale_betas_zero_snr` (#6024)"

This reverts commit 3dc2362b5a.

* add SDXL turbo

* add lcm lora to the mix as well.

* fix

* increase steps to 2 when running turbo i2i

* debug

* debug

* debug

* fix for good

* fix and isolate better

* fuse lora so that torch compile works with peft

* fix: LCMLoRA

* better identification for LCM

* change to cron job

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Aryan V S <contact.aryanvs@gmail.com>
Co-authored-by: EdoardoBotta <botta.edoardo@gmail.com>
Co-authored-by: Beinsezii <39478211+Beinsezii@users.noreply.github.com>
2023-12-12 11:03:34 +05:30
M. Tolga Cangöz
0a401b95b7 [Docs] Fix typos (#6122)
Fix typos and trim trailing whitespaces
2023-12-11 10:55:28 -08:00
Edward Li
664e931bcb Correct type annotation for VaeImageProcessor.numpy_to_pil (#6111)
From `(np.ndarray) -> PIL.Image.Image` to `(np.ndarray) -> List[PIL.Image.Image]`.
2023-12-11 15:22:04 +05:30
Aryan V S
88bdd97ccd IP adapter support for most pipelines (#5900)
* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_instruct_pix2pix.py

* update tests

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_panorama.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion_safe/pipeline_stable_diffusion_safe.py

* support ip-adapter in src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_text2img.py

* support ip-adapter in src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_img2img.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py

* revert changes to sd_attend_and_excite and sd_upscale

* make style

* fix broken tests

* update ip-adapter implementation to latest

* apply suggestions from review

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-10 21:19:14 +05:30
Charchit Sharma
08b453e382 IP-Adapter for StableDiffusionControlNetImg2ImgPipeline (#5901)
* adapter for StableDiffusionControlNetImg2ImgPipeline

* fix-copies

* fix-copies

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-09 11:02:55 +05:30
apolinário
2a111bc9fe [Advanced Training Script] Fix pipe example (#6106) 2023-12-08 15:56:35 +01:00
966 changed files with 61947 additions and 23102 deletions

View File

@@ -1,4 +1,4 @@
contact_links:
- name: Forum
url: https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63
- name: Questions / Discussions
url: https://github.com/huggingface/diffusers/discussions
about: General usage questions and community discussions

View File

@@ -38,7 +38,7 @@ members/contributors who may be interested in your PR.
Core library:
- Schedulers: @williamberman and @patrickvonplaten
- Schedulers: @yiyixuxu and @patrickvonplaten
- Pipelines: @patrickvonplaten and @sayakpaul
- Training examples: @sayakpaul and @patrickvonplaten
- Docs: @stevhliu and @yiyixuxu

52
.github/workflows/benchmark.yml vendored Normal file
View File

@@ -0,0 +1,52 @@
name: Benchmarking tests
on:
schedule:
- cron: "30 1 1,15 * *" # every 2 weeks on the 1st and the 15th of every month at 1:30 AM
env:
DIFFUSERS_IS_CI: yes
HF_HOME: /mnt/cache
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
jobs:
torch_pipelines_cuda_benchmark_tests:
name: Torch Core Pipelines CUDA Benchmarking Tests
strategy:
fail-fast: false
max-parallel: 1
runs-on: [single-gpu, nvidia-gpu, a10, ci]
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m pip install -e .[quality,test]
python -m pip install pandas
- name: Environment
run: |
python utils/print_env.py
- name: Diffusers Benchmarking
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.DIFFUSERS_BOT_TOKEN }}
BASE_PATH: benchmark_outputs
run: |
export TOTAL_GPU_MEMORY=$(python -c "import torch; print(torch.cuda.get_device_properties(0).total_memory / (1024**3))")
cd benchmarks && mkdir ${BASE_PATH} && python run_all.py && python push_results.py
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: benchmark_test_reports
path: benchmarks/benchmark_outputs

View File

@@ -1,14 +0,0 @@
name: Delete doc comment
on:
workflow_run:
workflows: ["Delete doc comment trigger"]
types:
- completed
jobs:
delete:
uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@main
secrets:
comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}

View File

@@ -1,12 +0,0 @@
name: Delete doc comment trigger
on:
pull_request:
types: [ closed ]
jobs:
delete:
uses: huggingface/doc-builder/.github/workflows/delete_doc_comment_trigger.yml@main
with:
pr_number: ${{ github.event.number }}

View File

@@ -59,7 +59,7 @@ jobs:
- name: Run fast PyTorch LoRA CPU tests with PEFT backend
run: |
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v \
--make-reports=tests_${{ matrix.config.report }} \
tests/lora/test_lora_layers_peft.py

View File

@@ -34,11 +34,6 @@ jobs:
runner: docker-cpu
image: diffusers/diffusers-pytorch-cpu
report: torch_cpu_models_schedulers
- name: LoRA
framework: lora
runner: docker-cpu
image: diffusers/diffusers-pytorch-cpu
report: torch_cpu_lora
- name: Fast Flax CPU tests
framework: flax
runner: docker-cpu
@@ -94,14 +89,6 @@ jobs:
--make-reports=tests_${{ matrix.config.report }} \
tests/models tests/schedulers tests/others
- name: Run fast PyTorch LoRA CPU tests
if: ${{ matrix.config.framework == 'lora' }}
run: |
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx and not Dependency" \
--make-reports=tests_${{ matrix.config.report }} \
tests/lora
- name: Run fast Flax TPU tests
if: ${{ matrix.config.framework == 'flax' }}
run: |

View File

@@ -189,7 +189,7 @@ jobs:
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
-s -v -k "not Flax and not Onnx and not PEFTLoRALoading" \
--make-reports=tests_peft_cuda \
tests/lora/

View File

@@ -98,6 +98,7 @@ jobs:
- name: Run example PyTorch CPU tests
if: ${{ matrix.config.framework == 'pytorch_examples' }}
run: |
python -m pip install peft
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
--make-reports=tests_${{ matrix.config.report }} \
examples

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -3,7 +3,7 @@
# make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!)
export PYTHONPATH = src
check_dirs := examples scripts src tests utils
check_dirs := examples scripts src tests utils benchmarks
modified_only_fixup:
$(eval modified_py_files := $(shell python utils/get_modified_files.py $(check_dirs)))

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -77,7 +77,7 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi
## Quickstart
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 15000+ checkpoints):
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 19000+ checkpoints):
```python
from diffusers import DiffusionPipeline
@@ -219,7 +219,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
- https://github.com/deep-floyd/IF
- https://github.com/bentoml/BentoML
- https://github.com/bmaltais/kohya_ss
- +6000 other amazing GitHub repositories 💪
- +8000 other amazing GitHub repositories 💪
Thank you for using us ❤️.

316
benchmarks/base_classes.py Normal file
View File

@@ -0,0 +1,316 @@
import os
import sys
import torch
from diffusers import (
AutoPipelineForImage2Image,
AutoPipelineForInpainting,
AutoPipelineForText2Image,
ControlNetModel,
LCMScheduler,
StableDiffusionAdapterPipeline,
StableDiffusionControlNetPipeline,
StableDiffusionXLAdapterPipeline,
StableDiffusionXLControlNetPipeline,
T2IAdapter,
WuerstchenCombinedPipeline,
)
from diffusers.utils import load_image
sys.path.append(".")
from utils import ( # noqa: E402
BASE_PATH,
PROMPT,
BenchmarkInfo,
benchmark_fn,
bytes_to_giga_bytes,
flush,
generate_csv_dict,
write_to_csv,
)
RESOLUTION_MAPPING = {
"runwayml/stable-diffusion-v1-5": (512, 512),
"lllyasviel/sd-controlnet-canny": (512, 512),
"diffusers/controlnet-canny-sdxl-1.0": (1024, 1024),
"TencentARC/t2iadapter_canny_sd14v1": (512, 512),
"TencentARC/t2i-adapter-canny-sdxl-1.0": (1024, 1024),
"stabilityai/stable-diffusion-2-1": (768, 768),
"stabilityai/stable-diffusion-xl-base-1.0": (1024, 1024),
"stabilityai/stable-diffusion-xl-refiner-1.0": (1024, 1024),
"stabilityai/sdxl-turbo": (512, 512),
}
class BaseBenchmak:
pipeline_class = None
def __init__(self, args):
super().__init__()
def run_inference(self, args):
raise NotImplementedError
def benchmark(self, args):
raise NotImplementedError
def get_result_filepath(self, args):
pipeline_class_name = str(self.pipe.__class__.__name__)
name = (
args.ckpt.replace("/", "_")
+ "_"
+ pipeline_class_name
+ f"-bs@{args.batch_size}-steps@{args.num_inference_steps}-mco@{args.model_cpu_offload}-compile@{args.run_compile}.csv"
)
filepath = os.path.join(BASE_PATH, name)
return filepath
class TextToImageBenchmark(BaseBenchmak):
pipeline_class = AutoPipelineForText2Image
def __init__(self, args):
pipe = self.pipeline_class.from_pretrained(args.ckpt, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
if args.run_compile:
if not isinstance(pipe, WuerstchenCombinedPipeline):
pipe.unet.to(memory_format=torch.channels_last)
print("Run torch compile")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
if hasattr(pipe, "movq") and getattr(pipe, "movq", None) is not None:
pipe.movq.to(memory_format=torch.channels_last)
pipe.movq = torch.compile(pipe.movq, mode="reduce-overhead", fullgraph=True)
else:
print("Run torch compile")
pipe.decoder = torch.compile(pipe.decoder, mode="reduce-overhead", fullgraph=True)
pipe.vqgan = torch.compile(pipe.vqgan, mode="reduce-overhead", fullgraph=True)
pipe.set_progress_bar_config(disable=True)
self.pipe = pipe
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
)
def benchmark(self, args):
flush()
print(f"[INFO] {self.pipe.__class__.__name__}: Running benchmark with: {vars(args)}\n")
time = benchmark_fn(self.run_inference, self.pipe, args) # in seconds.
memory = bytes_to_giga_bytes(torch.cuda.max_memory_allocated()) # in GBs.
benchmark_info = BenchmarkInfo(time=time, memory=memory)
pipeline_class_name = str(self.pipe.__class__.__name__)
flush()
csv_dict = generate_csv_dict(
pipeline_cls=pipeline_class_name, ckpt=args.ckpt, args=args, benchmark_info=benchmark_info
)
filepath = self.get_result_filepath(args)
write_to_csv(filepath, csv_dict)
print(f"Logs written to: {filepath}")
flush()
class TurboTextToImageBenchmark(TextToImageBenchmark):
def __init__(self, args):
super().__init__(args)
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
guidance_scale=0.0,
)
class LCMLoRATextToImageBenchmark(TextToImageBenchmark):
lora_id = "latent-consistency/lcm-lora-sdxl"
def __init__(self, args):
super().__init__(args)
self.pipe.load_lora_weights(self.lora_id)
self.pipe.fuse_lora()
self.pipe.scheduler = LCMScheduler.from_config(self.pipe.scheduler.config)
def get_result_filepath(self, args):
pipeline_class_name = str(self.pipe.__class__.__name__)
name = (
self.lora_id.replace("/", "_")
+ "_"
+ pipeline_class_name
+ f"-bs@{args.batch_size}-steps@{args.num_inference_steps}-mco@{args.model_cpu_offload}-compile@{args.run_compile}.csv"
)
filepath = os.path.join(BASE_PATH, name)
return filepath
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
guidance_scale=1.0,
)
def benchmark(self, args):
flush()
print(f"[INFO] {self.pipe.__class__.__name__}: Running benchmark with: {vars(args)}\n")
time = benchmark_fn(self.run_inference, self.pipe, args) # in seconds.
memory = bytes_to_giga_bytes(torch.cuda.max_memory_allocated()) # in GBs.
benchmark_info = BenchmarkInfo(time=time, memory=memory)
pipeline_class_name = str(self.pipe.__class__.__name__)
flush()
csv_dict = generate_csv_dict(
pipeline_cls=pipeline_class_name, ckpt=self.lora_id, args=args, benchmark_info=benchmark_info
)
filepath = self.get_result_filepath(args)
write_to_csv(filepath, csv_dict)
print(f"Logs written to: {filepath}")
flush()
class ImageToImageBenchmark(TextToImageBenchmark):
pipeline_class = AutoPipelineForImage2Image
url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/1665_Girl_with_a_Pearl_Earring.jpg"
image = load_image(url).convert("RGB")
def __init__(self, args):
super().__init__(args)
self.image = self.image.resize(RESOLUTION_MAPPING[args.ckpt])
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
image=self.image,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
)
class TurboImageToImageBenchmark(ImageToImageBenchmark):
def __init__(self, args):
super().__init__(args)
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
image=self.image,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
guidance_scale=0.0,
strength=0.5,
)
class InpaintingBenchmark(ImageToImageBenchmark):
pipeline_class = AutoPipelineForInpainting
mask_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/overture-creations-5sI6fQgYIuo_mask.png"
mask = load_image(mask_url).convert("RGB")
def __init__(self, args):
super().__init__(args)
self.image = self.image.resize(RESOLUTION_MAPPING[args.ckpt])
self.mask = self.mask.resize(RESOLUTION_MAPPING[args.ckpt])
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
image=self.image,
mask_image=self.mask,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
)
class ControlNetBenchmark(TextToImageBenchmark):
pipeline_class = StableDiffusionControlNetPipeline
aux_network_class = ControlNetModel
root_ckpt = "runwayml/stable-diffusion-v1-5"
url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/canny_image_condition.png"
image = load_image(url).convert("RGB")
def __init__(self, args):
aux_network = self.aux_network_class.from_pretrained(args.ckpt, torch_dtype=torch.float16)
pipe = self.pipeline_class.from_pretrained(self.root_ckpt, controlnet=aux_network, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.set_progress_bar_config(disable=True)
self.pipe = pipe
if args.run_compile:
pipe.unet.to(memory_format=torch.channels_last)
pipe.controlnet.to(memory_format=torch.channels_last)
print("Run torch compile")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
pipe.controlnet = torch.compile(pipe.controlnet, mode="reduce-overhead", fullgraph=True)
self.image = self.image.resize(RESOLUTION_MAPPING[args.ckpt])
def run_inference(self, pipe, args):
_ = pipe(
prompt=PROMPT,
image=self.image,
num_inference_steps=args.num_inference_steps,
num_images_per_prompt=args.batch_size,
)
class ControlNetSDXLBenchmark(ControlNetBenchmark):
pipeline_class = StableDiffusionXLControlNetPipeline
root_ckpt = "stabilityai/stable-diffusion-xl-base-1.0"
def __init__(self, args):
super().__init__(args)
class T2IAdapterBenchmark(ControlNetBenchmark):
pipeline_class = StableDiffusionAdapterPipeline
aux_network_class = T2IAdapter
root_ckpt = "CompVis/stable-diffusion-v1-4"
url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/canny_for_adapter.png"
image = load_image(url).convert("L")
def __init__(self, args):
aux_network = self.aux_network_class.from_pretrained(args.ckpt, torch_dtype=torch.float16)
pipe = self.pipeline_class.from_pretrained(self.root_ckpt, adapter=aux_network, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.set_progress_bar_config(disable=True)
self.pipe = pipe
if args.run_compile:
pipe.unet.to(memory_format=torch.channels_last)
pipe.adapter.to(memory_format=torch.channels_last)
print("Run torch compile")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
pipe.adapter = torch.compile(pipe.adapter, mode="reduce-overhead", fullgraph=True)
self.image = self.image.resize(RESOLUTION_MAPPING[args.ckpt])
class T2IAdapterSDXLBenchmark(T2IAdapterBenchmark):
pipeline_class = StableDiffusionXLAdapterPipeline
root_ckpt = "stabilityai/stable-diffusion-xl-base-1.0"
url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/canny_for_adapter_sdxl.png"
image = load_image(url)
def __init__(self, args):
super().__init__(args)

View File

@@ -0,0 +1,26 @@
import argparse
import sys
sys.path.append(".")
from base_classes import ControlNetBenchmark, ControlNetSDXLBenchmark # noqa: E402
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckpt",
type=str,
default="lllyasviel/sd-controlnet-canny",
choices=["lllyasviel/sd-controlnet-canny", "diffusers/controlnet-canny-sdxl-1.0"],
)
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--num_inference_steps", type=int, default=50)
parser.add_argument("--model_cpu_offload", action="store_true")
parser.add_argument("--run_compile", action="store_true")
args = parser.parse_args()
benchmark_pipe = (
ControlNetBenchmark(args) if args.ckpt == "lllyasviel/sd-controlnet-canny" else ControlNetSDXLBenchmark(args)
)
benchmark_pipe.benchmark(args)

View File

@@ -0,0 +1,29 @@
import argparse
import sys
sys.path.append(".")
from base_classes import ImageToImageBenchmark, TurboImageToImageBenchmark # noqa: E402
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckpt",
type=str,
default="runwayml/stable-diffusion-v1-5",
choices=[
"runwayml/stable-diffusion-v1-5",
"stabilityai/stable-diffusion-2-1",
"stabilityai/stable-diffusion-xl-refiner-1.0",
"stabilityai/sdxl-turbo",
],
)
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--num_inference_steps", type=int, default=50)
parser.add_argument("--model_cpu_offload", action="store_true")
parser.add_argument("--run_compile", action="store_true")
args = parser.parse_args()
benchmark_pipe = ImageToImageBenchmark(args) if "turbo" not in args.ckpt else TurboImageToImageBenchmark(args)
benchmark_pipe.benchmark(args)

View File

@@ -0,0 +1,28 @@
import argparse
import sys
sys.path.append(".")
from base_classes import InpaintingBenchmark # noqa: E402
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckpt",
type=str,
default="runwayml/stable-diffusion-v1-5",
choices=[
"runwayml/stable-diffusion-v1-5",
"stabilityai/stable-diffusion-2-1",
"stabilityai/stable-diffusion-xl-base-1.0",
],
)
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--num_inference_steps", type=int, default=50)
parser.add_argument("--model_cpu_offload", action="store_true")
parser.add_argument("--run_compile", action="store_true")
args = parser.parse_args()
benchmark_pipe = InpaintingBenchmark(args)
benchmark_pipe.benchmark(args)

View File

@@ -0,0 +1,28 @@
import argparse
import sys
sys.path.append(".")
from base_classes import T2IAdapterBenchmark, T2IAdapterSDXLBenchmark # noqa: E402
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckpt",
type=str,
default="TencentARC/t2iadapter_canny_sd14v1",
choices=["TencentARC/t2iadapter_canny_sd14v1", "TencentARC/t2i-adapter-canny-sdxl-1.0"],
)
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--num_inference_steps", type=int, default=50)
parser.add_argument("--model_cpu_offload", action="store_true")
parser.add_argument("--run_compile", action="store_true")
args = parser.parse_args()
benchmark_pipe = (
T2IAdapterBenchmark(args)
if args.ckpt == "TencentARC/t2iadapter_canny_sd14v1"
else T2IAdapterSDXLBenchmark(args)
)
benchmark_pipe.benchmark(args)

View File

@@ -0,0 +1,23 @@
import argparse
import sys
sys.path.append(".")
from base_classes import LCMLoRATextToImageBenchmark # noqa: E402
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckpt",
type=str,
default="stabilityai/stable-diffusion-xl-base-1.0",
)
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--num_inference_steps", type=int, default=4)
parser.add_argument("--model_cpu_offload", action="store_true")
parser.add_argument("--run_compile", action="store_true")
args = parser.parse_args()
benchmark_pipe = LCMLoRATextToImageBenchmark(args)
benchmark_pipe.benchmark(args)

View File

@@ -0,0 +1,40 @@
import argparse
import sys
sys.path.append(".")
from base_classes import TextToImageBenchmark, TurboTextToImageBenchmark # noqa: E402
ALL_T2I_CKPTS = [
"runwayml/stable-diffusion-v1-5",
"segmind/SSD-1B",
"stabilityai/stable-diffusion-xl-base-1.0",
"kandinsky-community/kandinsky-2-2-decoder",
"warp-ai/wuerstchen",
"stabilityai/sdxl-turbo",
]
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--ckpt",
type=str,
default="runwayml/stable-diffusion-v1-5",
choices=ALL_T2I_CKPTS,
)
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--num_inference_steps", type=int, default=50)
parser.add_argument("--model_cpu_offload", action="store_true")
parser.add_argument("--run_compile", action="store_true")
args = parser.parse_args()
benchmark_cls = None
if "turbo" in args.ckpt:
benchmark_cls = TurboTextToImageBenchmark
else:
benchmark_cls = TextToImageBenchmark
benchmark_pipe = benchmark_cls(args)
benchmark_pipe.benchmark(args)

View File

@@ -0,0 +1,72 @@
import glob
import sys
import pandas as pd
from huggingface_hub import hf_hub_download, upload_file
from huggingface_hub.utils._errors import EntryNotFoundError
sys.path.append(".")
from utils import BASE_PATH, FINAL_CSV_FILE, GITHUB_SHA, REPO_ID, collate_csv # noqa: E402
def has_previous_benchmark() -> str:
csv_path = None
try:
csv_path = hf_hub_download(repo_id=REPO_ID, repo_type="dataset", filename=FINAL_CSV_FILE)
except EntryNotFoundError:
csv_path = None
return csv_path
def filter_float(value):
if isinstance(value, str):
return float(value.split()[0])
return value
def push_to_hf_dataset():
all_csvs = sorted(glob.glob(f"{BASE_PATH}/*.csv"))
collate_csv(all_csvs, FINAL_CSV_FILE)
# If there's an existing benchmark file, we should report the changes.
csv_path = has_previous_benchmark()
if csv_path is not None:
current_results = pd.read_csv(FINAL_CSV_FILE)
previous_results = pd.read_csv(csv_path)
numeric_columns = current_results.select_dtypes(include=["float64", "int64"]).columns
numeric_columns = [
c for c in numeric_columns if c not in ["batch_size", "num_inference_steps", "actual_gpu_memory (gbs)"]
]
for column in numeric_columns:
previous_results[column] = previous_results[column].map(lambda x: filter_float(x))
# Calculate the percentage change
current_results[column] = current_results[column].astype(float)
previous_results[column] = previous_results[column].astype(float)
percent_change = ((current_results[column] - previous_results[column]) / previous_results[column]) * 100
# Format the values with '+' or '-' sign and append to original values
current_results[column] = current_results[column].map(str) + percent_change.map(
lambda x: f" ({'+' if x > 0 else ''}{x:.2f}%)"
)
# There might be newly added rows. So, filter out the NaNs.
current_results[column] = current_results[column].map(lambda x: x.replace(" (nan%)", ""))
# Overwrite the current result file.
current_results.to_csv(FINAL_CSV_FILE, index=False)
commit_message = f"upload from sha: {GITHUB_SHA}" if GITHUB_SHA is not None else "upload benchmark results"
upload_file(
repo_id=REPO_ID,
path_in_repo=FINAL_CSV_FILE,
path_or_fileobj=FINAL_CSV_FILE,
repo_type="dataset",
commit_message=commit_message,
)
if __name__ == "__main__":
push_to_hf_dataset()

97
benchmarks/run_all.py Normal file
View File

@@ -0,0 +1,97 @@
import glob
import subprocess
import sys
from typing import List
sys.path.append(".")
from benchmark_text_to_image import ALL_T2I_CKPTS # noqa: E402
PATTERN = "benchmark_*.py"
class SubprocessCallException(Exception):
pass
# Taken from `test_examples_utils.py`
def run_command(command: List[str], return_stdout=False):
"""
Runs `command` with `subprocess.check_output` and will potentially return the `stdout`. Will also properly capture
if an error occurred while running `command`
"""
try:
output = subprocess.check_output(command, stderr=subprocess.STDOUT)
if return_stdout:
if hasattr(output, "decode"):
output = output.decode("utf-8")
return output
except subprocess.CalledProcessError as e:
raise SubprocessCallException(
f"Command `{' '.join(command)}` failed with the following error:\n\n{e.output.decode()}"
) from e
def main():
python_files = glob.glob(PATTERN)
for file in python_files:
print(f"****** Running file: {file} ******")
# Run with canonical settings.
if file != "benchmark_text_to_image.py":
command = f"python {file}"
run_command(command.split())
command += " --run_compile"
run_command(command.split())
# Run variants.
for file in python_files:
if file == "benchmark_text_to_image.py":
for ckpt in ALL_T2I_CKPTS:
command = f"python {file} --ckpt {ckpt}"
if "turbo" in ckpt:
command += " --num_inference_steps 1"
run_command(command.split())
command += " --run_compile"
run_command(command.split())
elif file == "benchmark_sd_img.py":
for ckpt in ["stabilityai/stable-diffusion-xl-refiner-1.0", "stabilityai/sdxl-turbo"]:
command = f"python {file} --ckpt {ckpt}"
if ckpt == "stabilityai/sdxl-turbo":
command += " --num_inference_steps 2"
run_command(command.split())
command += " --run_compile"
run_command(command.split())
elif file == "benchmark_sd_inpainting.py":
sdxl_ckpt = "stabilityai/stable-diffusion-xl-base-1.0"
command = f"python {file} --ckpt {sdxl_ckpt}"
run_command(command.split())
command += " --run_compile"
run_command(command.split())
elif file in ["benchmark_controlnet.py", "benchmark_t2i_adapter.py"]:
sdxl_ckpt = (
"diffusers/controlnet-canny-sdxl-1.0"
if "controlnet" in file
else "TencentARC/t2i-adapter-canny-sdxl-1.0"
)
command = f"python {file} --ckpt {sdxl_ckpt}"
run_command(command.split())
command += " --run_compile"
run_command(command.split())
if __name__ == "__main__":
main()

98
benchmarks/utils.py Normal file
View File

@@ -0,0 +1,98 @@
import argparse
import csv
import gc
import os
from dataclasses import dataclass
from typing import Dict, List, Union
import torch
import torch.utils.benchmark as benchmark
GITHUB_SHA = os.getenv("GITHUB_SHA", None)
BENCHMARK_FIELDS = [
"pipeline_cls",
"ckpt_id",
"batch_size",
"num_inference_steps",
"model_cpu_offload",
"run_compile",
"time (secs)",
"memory (gbs)",
"actual_gpu_memory (gbs)",
"github_sha",
]
PROMPT = "ghibli style, a fantasy landscape with castles"
BASE_PATH = os.getenv("BASE_PATH", ".")
TOTAL_GPU_MEMORY = float(os.getenv("TOTAL_GPU_MEMORY", torch.cuda.get_device_properties(0).total_memory / (1024**3)))
REPO_ID = "diffusers/benchmarks"
FINAL_CSV_FILE = "collated_results.csv"
@dataclass
class BenchmarkInfo:
time: float
memory: float
def flush():
"""Wipes off memory."""
gc.collect()
torch.cuda.empty_cache()
torch.cuda.reset_max_memory_allocated()
torch.cuda.reset_peak_memory_stats()
def bytes_to_giga_bytes(bytes):
return f"{(bytes / 1024 / 1024 / 1024):.3f}"
def benchmark_fn(f, *args, **kwargs):
t0 = benchmark.Timer(
stmt="f(*args, **kwargs)",
globals={"args": args, "kwargs": kwargs, "f": f},
num_threads=torch.get_num_threads(),
)
return f"{(t0.blocked_autorange().mean):.3f}"
def generate_csv_dict(
pipeline_cls: str, ckpt: str, args: argparse.Namespace, benchmark_info: BenchmarkInfo
) -> Dict[str, Union[str, bool, float]]:
"""Packs benchmarking data into a dictionary for latter serialization."""
data_dict = {
"pipeline_cls": pipeline_cls,
"ckpt_id": ckpt,
"batch_size": args.batch_size,
"num_inference_steps": args.num_inference_steps,
"model_cpu_offload": args.model_cpu_offload,
"run_compile": args.run_compile,
"time (secs)": benchmark_info.time,
"memory (gbs)": benchmark_info.memory,
"actual_gpu_memory (gbs)": f"{(TOTAL_GPU_MEMORY):.3f}",
"github_sha": GITHUB_SHA,
}
return data_dict
def write_to_csv(file_name: str, data_dict: Dict[str, Union[str, bool, float]]):
"""Serializes a dictionary into a CSV file."""
with open(file_name, mode="w", newline="") as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=BENCHMARK_FIELDS)
writer.writeheader()
writer.writerow(data_dict)
def collate_csv(input_files: List[str], output_file: str):
"""Collates multiple identically structured CSVs into a single CSV file."""
with open(output_file, mode="w", newline="") as outfile:
writer = csv.DictWriter(outfile, fieldnames=BENCHMARK_FIELDS)
writer.writeheader()
for file in input_files:
with open(file, mode="r") as infile:
reader = csv.DictReader(infile)
for row in reader:
writer.writerow(row)

View File

@@ -24,9 +24,9 @@ ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
torch \
torchvision \
torchaudio \
torch==2.1.2 \
torchvision==0.16.2 \
torchaudio==2.1.2 \
onnxruntime \
--extra-index-url https://download.pytorch.org/whl/cpu && \
python3 -m pip install --no-cache-dir \

View File

@@ -24,9 +24,9 @@ ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
torch \
torchvision \
torchaudio \
torch==2.1.2 \
torchvision==0.16.2 \
torchaudio==2.1.2 \
"onnxruntime-gpu>=1.13.1" \
--extra-index-url https://download.pytorch.org/whl/cu117 && \
python3 -m pip install --no-cache-dir \

View File

@@ -40,7 +40,6 @@ RUN python3.9 -m pip install --no-cache-dir --upgrade pip && \
numpy \
scipy \
tensorboard \
transformers \
omegaconf
transformers
CMD ["/bin/bash"]

View File

@@ -40,7 +40,6 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
scipy \
tensorboard \
transformers \
omegaconf \
pytorch-lightning
CMD ["/bin/bash"]

View File

@@ -40,7 +40,6 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
scipy \
tensorboard \
transformers \
omegaconf \
xformers
CMD ["/bin/bash"]

View File

@@ -1,5 +1,5 @@
<!---
Copyright 2023- The HuggingFace Team. All rights reserved.
Copyright 2024- The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -19,6 +19,8 @@
title: Train a diffusion model
- local: tutorials/using_peft_for_inference
title: Inference with PEFT
- local: tutorials/fast_diffusion
title: Accelerate inference of text-to-image diffusion models
title: Tutorials
- sections:
- sections:
@@ -158,6 +160,8 @@
title: xFormers
- local: optimization/tome
title: Token merging
- local: optimization/deepcache
title: DeepCache
title: General optimizations
- sections:
- local: using-diffusers/stable_diffusion_jax_how_to
@@ -198,6 +202,8 @@
title: Outputs
title: Main Classes
- sections:
- local: api/loaders/ip_adapter
title: IP-Adapter
- local: api/loaders/lora
title: LoRA
- local: api/loaders/single_file
@@ -206,6 +212,8 @@
title: Textual Inversion
- local: api/loaders/unet
title: UNet
- local: api/loaders/peft
title: PEFT
title: Loaders
- sections:
- local: api/models/overview
@@ -220,6 +228,8 @@
title: UNet3DConditionModel
- local: api/models/unet-motion
title: UNetMotionModel
- local: api/models/uvit2d
title: UViT2DModel
- local: api/models/vq
title: VQModel
- local: api/models/autoencoderkl
@@ -242,14 +252,12 @@
- sections:
- local: api/pipelines/overview
title: Overview
- local: api/pipelines/alt_diffusion
title: AltDiffusion
- local: api/pipelines/amused
title: aMUSEd
- local: api/pipelines/animatediff
title: AnimateDiff
- local: api/pipelines/attend_and_excite
title: Attend-and-Excite
- local: api/pipelines/audio_diffusion
title: Audio Diffusion
- local: api/pipelines/audioldm
title: AudioLDM
- local: api/pipelines/audioldm2
@@ -264,12 +272,6 @@
title: ControlNet
- local: api/pipelines/controlnet_sdxl
title: ControlNet with Stable Diffusion XL
- local: api/pipelines/controlnetxs
title: ControlNet-XS
- local: api/pipelines/controlnetxs_sdxl
title: ControlNet-XS with Stable Diffusion XL
- local: api/pipelines/cycle_diffusion
title: Cycle Diffusion
- local: api/pipelines/dance_diffusion
title: Dance Diffusion
- local: api/pipelines/ddim
@@ -282,6 +284,8 @@
title: DiffEdit
- local: api/pipelines/dit
title: DiT
- local: api/pipelines/i2vgenxl
title: I2VGen-XL
- local: api/pipelines/pix2pix
title: InstructPix2Pix
- local: api/pipelines/kandinsky
@@ -300,26 +304,16 @@
title: MusicLDM
- local: api/pipelines/paint_by_example
title: Paint by Example
- local: api/pipelines/paradigms
title: Parallel Sampling of Diffusion Models
- local: api/pipelines/pix2pix_zero
title: Pix2Pix Zero
- local: api/pipelines/pia
title: Personalized Image Animator (PIA)
- local: api/pipelines/pixart
title: PixArt-α
- local: api/pipelines/pndm
title: PNDM
- local: api/pipelines/repaint
title: RePaint
- local: api/pipelines/score_sde_ve
title: Score SDE VE
- local: api/pipelines/self_attention_guidance
title: Self-Attention Guidance
- local: api/pipelines/semantic_stable_diffusion
title: Semantic Guidance
- local: api/pipelines/shap_e
title: Shap-E
- local: api/pipelines/spectrogram_diffusion
title: Spectrogram Diffusion
- sections:
- local: api/pipelines/stable_diffusion/overview
title: Overview
@@ -345,6 +339,8 @@
title: Latent upscaler
- local: api/pipelines/stable_diffusion/upscale
title: Super-resolution
- local: api/pipelines/stable_diffusion/k_diffusion
title: K-Diffusion
- local: api/pipelines/stable_diffusion/ldm3d_diffusion
title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
- local: api/pipelines/stable_diffusion/adapter
@@ -354,26 +350,16 @@
title: Stable Diffusion
- local: api/pipelines/stable_unclip
title: Stable unCLIP
- local: api/pipelines/stochastic_karras_ve
title: Stochastic Karras VE
- local: api/pipelines/model_editing
title: Text-to-image model editing
- local: api/pipelines/text_to_video
title: Text-to-video
- local: api/pipelines/text_to_video_zero
title: Text2Video-Zero
- local: api/pipelines/unclip
title: unCLIP
- local: api/pipelines/latent_diffusion_uncond
title: Unconditional Latent Diffusion
- local: api/pipelines/unidiffuser
title: UniDiffuser
- local: api/pipelines/value_guided_sampling
title: Value-guided sampling
- local: api/pipelines/versatile_diffusion
title: Versatile Diffusion
- local: api/pipelines/vq_diffusion
title: VQ Diffusion
- local: api/pipelines/wuerstchen
title: Wuerstchen
title: Pipelines

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,25 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# IP-Adapter
[IP-Adapter](https://hf.co/papers/2308.06721) is a lightweight adapter that enables prompting a diffusion model with an image. This method decouples the cross-attention layers of the image and text features. The image features are generated from an image encoder. Files generated from IP-Adapter are only ~100MBs.
<Tip>
Learn how to load an IP-Adapter checkpoint and image in the [IP-Adapter](../../using-diffusers/loading_adapters#ip-adapter) loading guide.
</Tip>
## IPAdapterMixin
[[autodoc]] loaders.ip_adapter.IPAdapterMixin

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,25 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# PEFT
Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`] to load an adapter.
<Tip>
Refer to the [Inference with PEFT](../../tutorials/using_peft_for_inference.md) tutorial for an overview of how to use PEFT in Diffusers for inference.
</Tip>
## PeftAdapterMixin
[[autodoc]] loaders.peft.PeftAdapterMixin

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -30,8 +30,8 @@ To learn more about how to load single file weights, see the [Load different Sta
## FromOriginalVAEMixin
[[autodoc]] loaders.single_file.FromOriginalVAEMixin
[[autodoc]] loaders.autoencoder.FromOriginalVAEMixin
## FromOriginalControlnetMixin
[[autodoc]] loaders.single_file.FromOriginalControlnetMixin
[[autodoc]] loaders.controlnet.FromOriginalControlNetMixin

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -49,12 +49,12 @@ make_image_grid([original_image, mask_image, image], rows=1, cols=3)
## AsymmetricAutoencoderKL
[[autodoc]] models.autoencoder_asym_kl.AsymmetricAutoencoderKL
[[autodoc]] models.autoencoders.autoencoder_asym_kl.AsymmetricAutoencoderKL
## AutoencoderKLOutput
[[autodoc]] models.autoencoder_kl.AutoencoderKLOutput
[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
## DecoderOutput
[[autodoc]] models.vae.DecoderOutput
[[autodoc]] models.autoencoders.vae.DecoderOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -54,4 +54,4 @@ image
## AutoencoderTinyOutput
[[autodoc]] models.autoencoder_tiny.AutoencoderTinyOutput
[[autodoc]] models.autoencoders.autoencoder_tiny.AutoencoderTinyOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -33,14 +33,17 @@ model = AutoencoderKL.from_single_file(url)
## AutoencoderKL
[[autodoc]] AutoencoderKL
- decode
- encode
- all
## AutoencoderKLOutput
[[autodoc]] models.autoencoder_kl.AutoencoderKLOutput
[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
## DecoderOutput
[[autodoc]] models.vae.DecoderOutput
[[autodoc]] models.autoencoders.vae.DecoderOutput
## FlaxAutoencoderKL

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -24,4 +24,4 @@ The abstract from the paper is:
## PriorTransformerOutput
[[autodoc]] models.prior_transformer.PriorTransformerOutput
[[autodoc]] models.transformers.prior_transformer.PriorTransformerOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -38,4 +38,4 @@ It is assumed one of the input classes is the masked latent pixel. The predicted
## Transformer2DModelOutput
[[autodoc]] models.transformer_2d.Transformer2DModelOutput
[[autodoc]] models.transformers.transformer_2d.Transformer2DModelOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -16,8 +16,8 @@ A Transformer model for video-like data.
## TransformerTemporalModel
[[autodoc]] models.transformer_temporal.TransformerTemporalModel
[[autodoc]] models.transformers.transformer_temporal.TransformerTemporalModel
## TransformerTemporalModelOutput
[[autodoc]] models.transformer_temporal.TransformerTemporalModelOutput
[[autodoc]] models.transformers.transformer_temporal.TransformerTemporalModelOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -22,4 +22,4 @@ The abstract from the paper is:
[[autodoc]] UNetMotionModel
## UNet3DConditionOutput
[[autodoc]] models.unet_3d_condition.UNet3DConditionOutput
[[autodoc]] models.unets.unet_3d_condition.UNet3DConditionOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -22,4 +22,4 @@ The abstract from the paper is:
[[autodoc]] UNet1DModel
## UNet1DOutput
[[autodoc]] models.unet_1d.UNet1DOutput
[[autodoc]] models.unets.unet_1d.UNet1DOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -22,10 +22,10 @@ The abstract from the paper is:
[[autodoc]] UNet2DConditionModel
## UNet2DConditionOutput
[[autodoc]] models.unet_2d_condition.UNet2DConditionOutput
[[autodoc]] models.unets.unet_2d_condition.UNet2DConditionOutput
## FlaxUNet2DConditionModel
[[autodoc]] models.unet_2d_condition_flax.FlaxUNet2DConditionModel
[[autodoc]] models.unets.unet_2d_condition_flax.FlaxUNet2DConditionModel
## FlaxUNet2DConditionOutput
[[autodoc]] models.unet_2d_condition_flax.FlaxUNet2DConditionOutput
[[autodoc]] models.unets.unet_2d_condition_flax.FlaxUNet2DConditionOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -22,4 +22,4 @@ The abstract from the paper is:
[[autodoc]] UNet2DModel
## UNet2DOutput
[[autodoc]] models.unet_2d.UNet2DOutput
[[autodoc]] models.unets.unet_2d.UNet2DOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -22,4 +22,4 @@ The abstract from the paper is:
[[autodoc]] UNet3DConditionModel
## UNet3DConditionOutput
[[autodoc]] models.unet_3d_condition.UNet3DConditionOutput
[[autodoc]] models.unets.unet_3d_condition.UNet3DConditionOutput

View File

@@ -0,0 +1,39 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# UVit2DModel
The [U-ViT](https://hf.co/papers/2301.11093) model is a vision transformer (ViT) based UNet. This model incorporates elements from ViT (considers all inputs such as time, conditions and noisy image patches as tokens) and a UNet (long skip connections between the shallow and deep layers). The skip connection is important for predicting pixel-level features. An additional 3x3 convolutional block is applied prior to the final output to improve image quality.
The abstract from the paper is:
*Currently, applying diffusion models in pixel space of high resolution images is difficult. Instead, existing approaches focus on diffusion in lower dimensional spaces (latent diffusion), or have multiple super-resolution levels of generation referred to as cascades. The downside is that these approaches add additional complexity to the diffusion framework. This paper aims to improve denoising diffusion for high resolution images while keeping the model as simple as possible. The paper is centered around the research question: How can one train a standard denoising diffusion models on high resolution images, and still obtain performance comparable to these alternate approaches? The four main findings are: 1) the noise schedule should be adjusted for high resolution images, 2) It is sufficient to scale only a particular part of the architecture, 3) dropout should be added at specific locations in the architecture, and 4) downsampling is an effective strategy to avoid high resolution feature maps. Combining these simple yet effective techniques, we achieve state-of-the-art on image generation among diffusion models without sampling modifiers on ImageNet.*
## UVit2DModel
[[autodoc]] UVit2DModel
## UVit2DConvEmbed
[[autodoc]] models.unets.uvit_2d.UVit2DConvEmbed
## UVitBlock
[[autodoc]] models.unets.uvit_2d.UVitBlock
## ConvNextBlock
[[autodoc]] models.unets.uvit_2d.ConvNextBlock
## ConvMlmLayer
[[autodoc]] models.unets.uvit_2d.ConvMlmLayer

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,47 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# AltDiffusion
AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://huggingface.co/papers/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu.
The abstract from the paper is:
*In this work, we present a conceptually simple and effective method to train a strong bilingual/multilingual multimodal representation model. Starting from the pre-trained multimodal representation model CLIP released by OpenAI, we altered its text encoder with a pre-trained multilingual text encoder XLM-R, and aligned both languages and image representations by a two-stage training schema consisting of teacher learning and contrastive learning. We validate our method through evaluations of a wide range of tasks. We set new state-of-the-art performances on a bunch of tasks including ImageNet-CN, Flicker30k-CN, COCO-CN and XTD. Further, we obtain very close performances with CLIP on almost all tasks, suggesting that one can simply alter the text encoder in CLIP for extended capabilities such as multilingual understanding. Our models and code are available at [this https URL](https://github.com/FlagAI-Open/FlagAI).*
## Tips
`AltDiffusion` is conceptually the same as [Stable Diffusion](./stable_diffusion/overview).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## AltDiffusionPipeline
[[autodoc]] AltDiffusionPipeline
- all
- __call__
## AltDiffusionImg2ImgPipeline
[[autodoc]] AltDiffusionImg2ImgPipeline
- all
- __call__
## AltDiffusionPipelineOutput
[[autodoc]] pipelines.alt_diffusion.AltDiffusionPipelineOutput
- all
- __call__

View File

@@ -0,0 +1,48 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# aMUSEd
aMUSEd was introduced in [aMUSEd: An Open MUSE Reproduction](https://huggingface.co/papers/2401.01808) by Suraj Patil, William Berman, Robin Rombach, and Patrick von Platen.
Amused is a lightweight text to image model based off of the [MUSE](https://arxiv.org/abs/2301.00704) architecture. Amused is particularly useful in applications that require a lightweight and fast model such as generating many images quickly at once.
Amused is a vqvae token based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with muse, it uses the smaller text encoder CLIP-L/14 instead of t5-xxl. Due to its small parameter count and few forward pass generation process, amused can generate many images quickly. This benefit is seen particularly at larger batch sizes.
The abstract from the paper is:
*We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpretable. Additionally, MIM can be fine-tuned to learn additional styles with only a single image. We hope to encourage further exploration of MIM by demonstrating its effectiveness on large-scale text-to-image generation and releasing reproducible training code. We also release checkpoints for two models which directly produce images at 256x256 and 512x512 resolutions.*
| Model | Params |
|-------|--------|
| [amused-256](https://huggingface.co/amused/amused-256) | 603M |
| [amused-512](https://huggingface.co/amused/amused-512) | 608M |
## AmusedPipeline
[[autodoc]] AmusedPipeline
- __call__
- all
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention
[[autodoc]] AmusedImg2ImgPipeline
- __call__
- all
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention
[[autodoc]] AmusedInpaintPipeline
- __call__
- all
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -25,6 +25,7 @@ The abstract of the paper is the following:
| Pipeline | Tasks | Demo
|---|---|:---:|
| [AnimateDiffPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff.py) | *Text-to-Video Generation with AnimateDiff* |
| [AnimateDiffVideoToVideoPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py) | *Video-to-Video Generation with AnimateDiff* |
## Available checkpoints
@@ -32,22 +33,29 @@ Motion Adapter checkpoints can be found under [guoyww](https://huggingface.co/gu
## Usage example
### AnimateDiffPipeline
AnimateDiff works with a MotionAdapter checkpoint and a Stable Diffusion model checkpoint. The MotionAdapter is a collection of Motion Modules that are responsible for adding coherent motion across image frames. These modules are applied after the Resnet and Attention blocks in Stable Diffusion UNet.
The following example demonstrates how to use a *MotionAdapter* checkpoint with Diffusers for inference based on StableDiffusion-1.4/1.5.
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif
# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16)
scheduler = DDIMScheduler.from_pretrained(
model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
model_id,
subfolder="scheduler",
clip_sample=False,
timestep_spacing="linspace",
beta_schedule="linear",
steps_offset=1,
)
pipe.scheduler = scheduler
@@ -70,6 +78,7 @@ output = pipe(
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
Here are some sample outputs:
@@ -88,28 +97,143 @@ Here are some sample outputs:
<Tip>
AnimateDiff tends to work better with finetuned Stable Diffusion models. If you plan on using a scheduler that can clip samples, make sure to disable it by setting `clip_sample=False` in the scheduler as this can also have an adverse effect on generated samples.
AnimateDiff tends to work better with finetuned Stable Diffusion models. If you plan on using a scheduler that can clip samples, make sure to disable it by setting `clip_sample=False` in the scheduler as this can also have an adverse effect on generated samples. Additionally, the AnimateDiff checkpoints can be sensitive to the beta schedule of the scheduler. We recommend setting this to `linear`.
</Tip>
### AnimateDiffVideoToVideoPipeline
AnimateDiff can also be used to generate visually similar videos or enable style/character/background or other edits starting from an initial video, allowing you to seamlessly explore creative possibilities.
```python
import imageio
import requests
import torch
from diffusers import AnimateDiffVideoToVideoPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif
from io import BytesIO
from PIL import Image
# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffVideoToVideoPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16).to("cuda")
scheduler = DDIMScheduler.from_pretrained(
model_id,
subfolder="scheduler",
clip_sample=False,
timestep_spacing="linspace",
beta_schedule="linear",
steps_offset=1,
)
pipe.scheduler = scheduler
# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()
# helper function to load videos
def load_video(file_path: str):
images = []
if file_path.startswith(('http://', 'https://')):
# If the file_path is a URL
response = requests.get(file_path)
response.raise_for_status()
content = BytesIO(response.content)
vid = imageio.get_reader(content)
else:
# Assuming it's a local file path
vid = imageio.get_reader(file_path)
for frame in vid:
pil_image = Image.fromarray(frame)
images.append(pil_image)
return images
video = load_video("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-input-1.gif")
output = pipe(
video = video,
prompt="panda playing a guitar, on a boat, in the ocean, high quality",
negative_prompt="bad quality, worse quality",
guidance_scale=7.5,
num_inference_steps=25,
strength=0.5,
generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
Here are some sample outputs:
<table>
<tr>
<th align=center>Source Video</th>
<th align=center>Output Video</th>
</tr>
<tr>
<td align=center>
raccoon playing a guitar
<br />
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-input-1.gif"
alt="racoon playing a guitar"
style="width: 300px;" />
</td>
<td align=center>
panda playing a guitar
<br/>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-output-1.gif"
alt="panda playing a guitar"
style="width: 300px;" />
</td>
</tr>
<tr>
<td align=center>
closeup of margot robbie, fireworks in the background, high quality
<br />
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-input-2.gif"
alt="closeup of margot robbie, fireworks in the background, high quality"
style="width: 300px;" />
</td>
<td align=center>
closeup of tony stark, robert downey jr, fireworks
<br/>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-output-2.gif"
alt="closeup of tony stark, robert downey jr, fireworks"
style="width: 300px;" />
</td>
</tr>
</table>
## Using Motion LoRAs
Motion LoRAs are a collection of LoRAs that work with the `guoyww/animatediff-motion-adapter-v1-5-2` checkpoint. These LoRAs are responsible for adding specific types of motion to the animations.
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif
# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16)
pipe.load_lora_weights(
"guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out"
)
scheduler = DDIMScheduler.from_pretrained(
model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
model_id,
subfolder="scheduler",
clip_sample=False,
beta_schedule="linear",
timestep_spacing="linspace",
steps_offset=1,
)
pipe.scheduler = scheduler
@@ -132,6 +256,7 @@ output = pipe(
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
<table>
@@ -160,21 +285,30 @@ Then you can use the following code to combine Motion LoRAs.
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif
# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16)
pipe.load_lora_weights("diffusers/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")
pipe.load_lora_weights("diffusers/animatediff-motion-lora-pan-left", adapter_name="pan-left")
pipe.load_lora_weights(
"diffusers/animatediff-motion-lora-zoom-out", adapter_name="zoom-out",
)
pipe.load_lora_weights(
"diffusers/animatediff-motion-lora-pan-left", adapter_name="pan-left",
)
pipe.set_adapters(["zoom-out", "pan-left"], adapter_weights=[1.0, 1.0])
scheduler = DDIMScheduler.from_pretrained(
model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
model_id,
subfolder="scheduler",
clip_sample=False,
timestep_spacing="linspace",
beta_schedule="linear",
steps_offset=1,
)
pipe.scheduler = scheduler
@@ -197,6 +331,7 @@ output = pipe(
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
<table>
@@ -211,6 +346,62 @@ export_to_gif(frames, "animation.gif")
</tr>
</table>
## Using FreeInit
[FreeInit: Bridging Initialization Gap in Video Diffusion Models](https://arxiv.org/abs/2312.07537) by Tianxing Wu, Chenyang Si, Yuming Jiang, Ziqi Huang, Ziwei Liu.
FreeInit is an effective method that improves temporal consistency and overall quality of videos generated using video-diffusion-models without any addition training. It can be applied to AnimateDiff, ModelScope, VideoCrafter and various other video generation models seamlessly at inference time, and works by iteratively refining the latent-initialization noise. More details can be found it the paper.
The following example demonstrates the usage of FreeInit.
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16).to("cuda")
pipe.scheduler = DDIMScheduler.from_pretrained(
model_id,
subfolder="scheduler",
beta_schedule="linear",
clip_sample=False,
timestep_spacing="linspace",
steps_offset=1
)
# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()
# enable FreeInit
# Refer to the enable_free_init documentation for a full list of configurable parameters
pipe.enable_free_init(method="butterworth", use_fast_sampling=True)
# run inference
output = pipe(
prompt="a panda playing a guitar, on a boat, in the ocean, high quality",
negative_prompt="bad quality, worse quality",
num_frames=16,
guidance_scale=7.5,
num_inference_steps=20,
generator=torch.Generator("cpu").manual_seed(666),
)
# disable FreeInit
pipe.disable_free_init()
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
<Tip warning={true}>
FreeInit is not really free - the improved quality comes at the cost of extra computation. It requires sampling a few extra times depending on the `num_iters` parameter that is set when enabling it. Setting the `use_fast_sampling` parameter to `True` can improve the overall performance (at the cost of lower quality compared to when `use_fast_sampling=False` but still better results than vanilla video generation models).
</Tip>
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
@@ -220,14 +411,14 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
## AnimateDiffPipeline
[[autodoc]] AnimateDiffPipeline
- all
- __call__
- enable_freeu
- disable_freeu
- enable_vae_slicing
- disable_vae_slicing
- enable_vae_tiling
- disable_vae_tiling
- all
- __call__
## AnimateDiffVideoToVideoPipeline
[[autodoc]] AnimateDiffVideoToVideoPipeline
- all
- __call__
## AnimateDiffPipelineOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,35 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Audio Diffusion
[Audio Diffusion](https://github.com/teticio/audio-diffusion) is by Robert Dargavel Smith, and it leverages the recent advances in image generation from diffusion models by converting audio samples to and from Mel spectrogram images.
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## AudioDiffusionPipeline
[[autodoc]] AudioDiffusionPipeline
- all
- __call__
## AudioPipelineOutput
[[autodoc]] pipelines.AudioPipelineOutput
## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput
## Mel
[[autodoc]] Mel

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,33 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Cycle Diffusion
Cycle Diffusion is a text guided image-to-image generation model proposed in [Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance](https://huggingface.co/papers/2210.05559) by Chen Henry Wu, Fernando De la Torre.
The abstract from the paper is:
*Diffusion models have achieved unprecedented performance in generative modeling. The commonly-adopted formulation of the latent code of diffusion models is a sequence of gradually denoised samples, as opposed to the simpler (e.g., Gaussian) latent space of GANs, VAEs, and normalizing flows. This paper provides an alternative, Gaussian formulation of the latent space of various diffusion models, as well as an invertible DPM-Encoder that maps images into the latent space. While our formulation is purely based on the definition of diffusion models, we demonstrate several intriguing consequences. (1) Empirically, we observe that a common latent space emerges from two diffusion models trained independently on related domains. In light of this finding, we propose CycleDiffusion, which uses DPM-Encoder for unpaired image-to-image translation. Furthermore, applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors. (2) One can guide pre-trained diffusion models and GANs by controlling the latent codes in a unified, plug-and-play formulation based on energy-based models. Using the CLIP model and a face recognition model as guidance, we demonstrate that diffusion models have better coverage of low-density sub-populations and individuals than GANs. The code is publicly available at [this https URL](https://github.com/ChenWu98/cycle-diffusion).*
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## CycleDiffusionPipeline
[[autodoc]] CycleDiffusionPipeline
- all
- __call__
## StableDiffusionPiplineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,57 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# I2VGen-XL
[I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models](https://hf.co/papers/2311.04145.pdf) by Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, and Jingren Zhou.
The abstract from the paper is:
*Video synthesis has recently made remarkable strides benefiting from the rapid development of diffusion models. However, it still encounters challenges in terms of semantic accuracy, clarity and spatio-temporal continuity. They primarily arise from the scarcity of well-aligned text-video data and the complex inherent structure of videos, making it difficult for the model to simultaneously ensure semantic and qualitative excellence. In this report, we propose a cascaded I2VGen-XL approach that enhances model performance by decoupling these two factors and ensures the alignment of the input data by utilizing static images as a form of crucial guidance. I2VGen-XL consists of two stages: i) the base stage guarantees coherent semantics and preserves content from input images by using two hierarchical encoders, and ii) the refinement stage enhances the video's details by incorporating an additional brief text and improves the resolution to 1280×720. To improve the diversity, we collect around 35 million single-shot text-video pairs and 6 billion text-image pairs to optimize the model. By this means, I2VGen-XL can simultaneously enhance the semantic accuracy, continuity of details and clarity of generated videos. Through extensive experiments, we have investigated the underlying principles of I2VGen-XL and compared it with current top methods, which can demonstrate its effectiveness on diverse data. The source code and models will be publicly available at [this https URL](https://i2vgen-xl.github.io/).*
The original codebase can be found [here](https://github.com/ali-vilab/i2vgen-xl/). The model checkpoints can be found [here](https://huggingface.co/ali-vilab/).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines. Also, to know more about reducing the memory usage of this pipeline, refer to the ["Reduce memory usage"] section [here](../../using-diffusers/svd#reduce-memory-usage).
</Tip>
Sample output with I2VGenXL:
<table>
<tr>
<td><center>
library.
<br>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/i2vgen-xl-example.gif"
alt="library"
style="width: 300px;" />
</center></td>
</tr>
</table>
## Notes
* I2VGenXL always uses a `clip_skip` value of 1. This means it leverages the penultimate layer representations from the text encoder of CLIP.
* It can generate videos of quality that is often on par with [Stable Video Diffusion](../../using-diffusers/svd) (SVD).
* Unlike SVD, it additionally accepts text prompts as inputs.
* It can generate higher resolution videos.
* When using the [`DDIMScheduler`] (which is default for this pipeline), less than 50 steps for inference leads to bad results.
## I2VGenXLPipeline
[[autodoc]] I2VGenXLPipeline
- all
- __call__
## I2VGenXLPipelineOutput
[[autodoc]] pipelines.i2vgen_xl.pipeline_i2vgen_xl.I2VGenXLPipelineOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,35 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Unconditional Latent Diffusion
Unconditional Latent Diffusion was proposed in [High-Resolution Image Synthesis with Latent Diffusion Models](https://huggingface.co/papers/2112.10752) by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer.
The abstract from the paper is:
*By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.*
The original codebase can be found at [CompVis/latent-diffusion](https://github.com/CompVis/latent-diffusion).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## LDMPipeline
[[autodoc]] LDMPipeline
- all
- __call__
## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput

View File

@@ -1,35 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Text-to-image model editing
[Editing Implicit Assumptions in Text-to-Image Diffusion Models](https://huggingface.co/papers/2303.08084) is by Hadas Orgad, Bahjat Kawar, and Yonatan Belinkov. This pipeline enables editing diffusion model weights, such that its assumptions of a given concept are changed. The resulting change is expected to take effect in all prompt generations related to the edited concept.
The abstract from the paper is:
*Text-to-image diffusion models often make implicit assumptions about the world when generating images. While some assumptions are useful (e.g., the sky is blue), they can also be outdated, incorrect, or reflective of social biases present in the training data. Thus, there is a need to control these assumptions without requiring explicit user input or costly re-training. In this work, we aim to edit a given implicit assumption in a pre-trained diffusion model. Our Text-to-Image Model Editing method, TIME for short, receives a pair of inputs: a "source" under-specified prompt for which the model makes an implicit assumption (e.g., "a pack of roses"), and a "destination" prompt that describes the same setting, but with a specified desired attribute (e.g., "a pack of blue roses"). TIME then updates the model's cross-attention layers, as these layers assign visual meaning to textual tokens. We edit the projection matrices in these layers such that the source prompt is projected close to the destination prompt. Our method is highly efficient, as it modifies a mere 2.2% of the model's parameters in under one second. To evaluate model editing approaches, we introduce TIMED (TIME Dataset), containing 147 source and destination prompt pairs from various domains. Our experiments (using Stable Diffusion) show that TIME is successful in model editing, generalizes well for related prompts unseen during editing, and imposes minimal effect on unrelated generations.*
You can find additional information about model editing on the [project page](https://time-diffusion.github.io/), [original codebase](https://github.com/bahjat-kawar/time-diffusion), and try it out in a [demo](https://huggingface.co/spaces/bahjat-kawar/time-diffusion).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## StableDiffusionModelEditingPipeline
[[autodoc]] StableDiffusionModelEditingPipeline
- __call__
- all
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,51 +0,0 @@
<!--Copyright 2023 ParaDiGMS authors and The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Parallel Sampling of Diffusion Models
[Parallel Sampling of Diffusion Models](https://huggingface.co/papers/2305.16317) is by Andy Shih, Suneel Belkhale, Stefano Ermon, Dorsa Sadigh, Nima Anari.
The abstract from the paper is:
*Diffusion models are powerful generative models but suffer from slow sampling, often taking 1000 sequential denoising steps for one sample. As a result, considerable efforts have been directed toward reducing the number of denoising steps, but these methods hurt sample quality. Instead of reducing the number of denoising steps (trading quality for speed), in this paper we explore an orthogonal approach: can we run the denoising steps in parallel (trading compute for speed)? In spite of the sequential nature of the denoising steps, we show that surprisingly it is possible to parallelize sampling via Picard iterations, by guessing the solution of future denoising steps and iteratively refining until convergence. With this insight, we present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel. ParaDiGMS is the first diffusion sampling method that enables trading compute for speed and is even compatible with existing fast sampling techniques such as DDIM and DPMSolver. Using ParaDiGMS, we improve sampling speed by 2-4x across a range of robotics and image generation models, giving state-of-the-art sampling speeds of 0.2s on 100-step DiffusionPolicy and 14.6s on 1000-step StableDiffusion-v2 with no measurable degradation of task reward, FID score, or CLIP score.*
The original codebase can be found at [AndyShih12/paradigms](https://github.com/AndyShih12/paradigms), and the pipeline was contributed by [AndyShih12](https://github.com/AndyShih12). ❤️
## Tips
This pipeline improves sampling speed by running denoising steps in parallel, at the cost of increased total FLOPs.
Therefore, it is better to call this pipeline when running on multiple GPUs. Otherwise, without enough GPU bandwidth
sampling may be even slower than sequential sampling.
The two parameters to play with are `parallel` (batch size) and `tolerance`.
- If it fits in memory, for a 1000-step DDPM you can aim for a batch size of around 100 (for example, 8 GPUs and `batch_per_device=12` to get `parallel=96`). A higher batch size may not fit in memory, and lower batch size gives less parallelism.
- For tolerance, using a higher tolerance may get better speedups but can risk sample quality degradation. If there is quality degradation with the default tolerance, then use a lower tolerance like `0.001`.
For a 1000-step DDPM on 8 A100 GPUs, you can expect around a 3x speedup from [`StableDiffusionParadigmsPipeline`] compared to the [`StableDiffusionPipeline`]
by setting `parallel=80` and `tolerance=0.1`.
🤗 Diffusers offers [distributed inference support](../../training/distributed_inference) for generating multiple prompts
in parallel on multiple GPUs. But [`StableDiffusionParadigmsPipeline`] is designed for speeding up sampling of a single prompt by using multiple GPUs.
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## StableDiffusionParadigmsPipeline
[[autodoc]] StableDiffusionParadigmsPipeline
- __call__
- all
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

View File

@@ -0,0 +1,167 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Image-to-Video Generation with PIA (Personalized Image Animator)
## Overview
[PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models](https://arxiv.org/abs/2312.13964) by Yiming Zhang, Zhening Xing, Yanhong Zeng, Youqing Fang, Kai Chen
Recent advancements in personalized text-to-image (T2I) models have revolutionized content creation, empowering non-experts to generate stunning images with unique styles. While promising, adding realistic motions into these personalized images by text poses significant challenges in preserving distinct styles, high-fidelity details, and achieving motion controllability by text. In this paper, we present PIA, a Personalized Image Animator that excels in aligning with condition images, achieving motion controllability by text, and the compatibility with various personalized T2I models without specific tuning. To achieve these goals, PIA builds upon a base T2I model with well-trained temporal alignment layers, allowing for the seamless transformation of any personalized T2I model into an image animation model. A key component of PIA is the introduction of the condition module, which utilizes the condition frame and inter-frame affinity as input to transfer appearance information guided by the affinity hint for individual frame synthesis in the latent space. This design mitigates the challenges of appearance-related image alignment within and allows for a stronger focus on aligning with motion-related guidance.
[Project page](https://pi-animator.github.io/)
## Available Pipelines
| Pipeline | Tasks | Demo
|---|---|:---:|
| [PIAPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pia/pipeline_pia.py) | *Image-to-Video Generation with PIA* |
## Available checkpoints
Motion Adapter checkpoints for PIA can be found under the [OpenMMLab org](https://huggingface.co/openmmlab/PIA-condition-adapter). These checkpoints are meant to work with any model based on Stable Diffusion 1.5
## Usage example
PIA works with a MotionAdapter checkpoint and a Stable Diffusion 1.5 model checkpoint. The MotionAdapter is a collection of Motion Modules that are responsible for adding coherent motion across image frames. These modules are applied after the Resnet and Attention blocks in the Stable Diffusion UNet. In addition to the motion modules, PIA also replaces the input convolution layer of the SD 1.5 UNet model with a 9 channel input convolution layer.
The following example demonstrates how to use PIA to generate a video from a single image.
```python
import torch
from diffusers import (
EulerDiscreteScheduler,
MotionAdapter,
PIAPipeline,
)
from diffusers.utils import export_to_gif, load_image
adapter = MotionAdapter.from_pretrained("openmmlab/PIA-condition-adapter")
pipe = PIAPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()
image = load_image(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/pix2pix/cat_6.png?download=true"
)
image = image.resize((512, 512))
prompt = "cat in a field"
negative_prompt = "wrong white balance, dark, sketches,worst quality,low quality"
generator = torch.Generator("cpu").manual_seed(0)
output = pipe(image=image, prompt=prompt, generator=generator)
frames = output.frames[0]
export_to_gif(frames, "pia-animation.gif")
```
Here are some sample outputs:
<table>
<tr>
<td><center>
cat in a field.
<br>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/pia-default-output.gif"
alt="cat in a field"
style="width: 300px;" />
</center></td>
</tr>
</table>
<Tip>
If you plan on using a scheduler that can clip samples, make sure to disable it by setting `clip_sample=False` in the scheduler as this can also have an adverse effect on generated samples. Additionally, the PIA checkpoints can be sensitive to the beta schedule of the scheduler. We recommend setting this to `linear`.
</Tip>
## Using FreeInit
[FreeInit: Bridging Initialization Gap in Video Diffusion Models](https://arxiv.org/abs/2312.07537) by Tianxing Wu, Chenyang Si, Yuming Jiang, Ziqi Huang, Ziwei Liu.
FreeInit is an effective method that improves temporal consistency and overall quality of videos generated using video-diffusion-models without any addition training. It can be applied to PIA, AnimateDiff, ModelScope, VideoCrafter and various other video generation models seamlessly at inference time, and works by iteratively refining the latent-initialization noise. More details can be found it the paper.
The following example demonstrates the usage of FreeInit.
```python
import torch
from diffusers import (
DDIMScheduler,
MotionAdapter,
PIAPipeline,
)
from diffusers.utils import export_to_gif, load_image
adapter = MotionAdapter.from_pretrained("openmmlab/PIA-condition-adapter")
pipe = PIAPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter)
# enable FreeInit
# Refer to the enable_free_init documentation for a full list of configurable parameters
pipe.enable_free_init(method="butterworth", use_fast_sampling=True)
# Memory saving options
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
image = load_image(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/pix2pix/cat_6.png?download=true"
)
image = image.resize((512, 512))
prompt = "cat in a field"
negative_prompt = "wrong white balance, dark, sketches,worst quality,low quality"
generator = torch.Generator("cpu").manual_seed(0)
output = pipe(image=image, prompt=prompt, generator=generator)
frames = output.frames[0]
export_to_gif(frames, "pia-freeinit-animation.gif")
```
<table>
<tr>
<td><center>
cat in a field.
<br>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/pia-freeinit-output-cat.gif"
alt="cat in a field"
style="width: 300px;" />
</center></td>
</tr>
</table>
<Tip warning={true}>
FreeInit is not really free - the improved quality comes at the cost of extra computation. It requires sampling a few extra times depending on the `num_iters` parameter that is set when enabling it. Setting the `use_fast_sampling` parameter to `True` can improve the overall performance (at the cost of lower quality compared to when `use_fast_sampling=False` but still better results than vanilla video generation models).
</Tip>
## PIAPipeline
[[autodoc]] PIAPipeline
- all
- __call__
- enable_freeu
- disable_freeu
- enable_free_init
- disable_free_init
- enable_vae_slicing
- disable_vae_slicing
- enable_vae_tiling
- disable_vae_tiling
## PIAPipelineOutput
[[autodoc]] pipelines.pia.PIAPipelineOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,289 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Pix2Pix Zero
[Zero-shot Image-to-Image Translation](https://huggingface.co/papers/2302.03027) is by Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, Yijun Li, Jingwan Lu, and Jun-Yan Zhu.
The abstract from the paper is:
*Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse and high-quality images. However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is hard for users to come up with a perfect text prompt that accurately describes every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting. We first automatically discover editing directions that reflect desired edits in the text embedding space. To preserve the general content structure after editing, we further propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. In addition, our method does not need additional training for these edits and can directly use the existing pre-trained text-to-image diffusion model. We conduct extensive experiments and show that our method outperforms existing and concurrent works for both real and synthetic image editing.*
You can find additional information about Pix2Pix Zero on the [project page](https://pix2pixzero.github.io/), [original codebase](https://github.com/pix2pixzero/pix2pix-zero), and try it out in a [demo](https://huggingface.co/spaces/pix2pix-zero-library/pix2pix-zero-demo).
## Tips
* The pipeline can be conditioned on real input images. Check out the code examples below to know more.
* The pipeline exposes two arguments namely `source_embeds` and `target_embeds`
that let you control the direction of the semantic edits in the final image to be generated. Let's say,
you wanted to translate from "cat" to "dog". In this case, the edit direction will be "cat -> dog". To reflect
this in the pipeline, you simply have to set the embeddings related to the phrases including "cat" to
`source_embeds` and "dog" to `target_embeds`. Refer to the code example below for more details.
* When you're using this pipeline from a prompt, specify the _source_ concept in the prompt. Taking
the above example, a valid input prompt would be: "a high resolution painting of a **cat** in the style of van gogh".
* If you wanted to reverse the direction in the example above, i.e., "dog -> cat", then it's recommended to:
* Swap the `source_embeds` and `target_embeds`.
* Change the input prompt to include "dog".
* To learn more about how the source and target embeddings are generated, refer to the [original paper](https://arxiv.org/abs/2302.03027). Below, we also provide some directions on how to generate the embeddings.
* Note that the quality of the outputs generated with this pipeline is dependent on how good the `source_embeds` and `target_embeds` are. Please, refer to [this discussion](#generating-source-and-target-embeddings) for some suggestions on the topic.
## Available Pipelines:
| Pipeline | Tasks | Demo
|---|---|:---:|
| [StableDiffusionPix2PixZeroPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_pix2pix_zero.py) | *Text-Based Image Editing* | [🤗 Space](https://huggingface.co/spaces/pix2pix-zero-library/pix2pix-zero-demo) |
<!-- TODO: add Colab -->
## Usage example
### Based on an image generated with the input prompt
```python
import requests
import torch
from diffusers import DDIMScheduler, StableDiffusionPix2PixZeroPipeline
def download(embedding_url, local_filepath):
r = requests.get(embedding_url)
with open(local_filepath, "wb") as f:
f.write(r.content)
model_ckpt = "CompVis/stable-diffusion-v1-4"
pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained(
model_ckpt, conditions_input_image=False, torch_dtype=torch.float16
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.to("cuda")
prompt = "a high resolution painting of a cat in the style of van gogh"
src_embs_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/embeddings_sd_1.4/cat.pt"
target_embs_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/embeddings_sd_1.4/dog.pt"
for url in [src_embs_url, target_embs_url]:
download(url, url.split("/")[-1])
src_embeds = torch.load(src_embs_url.split("/")[-1])
target_embeds = torch.load(target_embs_url.split("/")[-1])
image = pipeline(
prompt,
source_embeds=src_embeds,
target_embeds=target_embeds,
num_inference_steps=50,
cross_attention_guidance_amount=0.15,
).images[0]
image
```
### Based on an input image
When the pipeline is conditioned on an input image, we first obtain an inverted
noise from it using a `DDIMInverseScheduler` with the help of a generated caption. Then the inverted noise is used to start the generation process.
First, let's load our pipeline:
```py
import torch
from transformers import BlipForConditionalGeneration, BlipProcessor
from diffusers import DDIMScheduler, DDIMInverseScheduler, StableDiffusionPix2PixZeroPipeline
captioner_id = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(captioner_id)
model = BlipForConditionalGeneration.from_pretrained(captioner_id, torch_dtype=torch.float16, low_cpu_mem_usage=True)
sd_model_ckpt = "CompVis/stable-diffusion-v1-4"
pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained(
sd_model_ckpt,
caption_generator=model,
caption_processor=processor,
torch_dtype=torch.float16,
safety_checker=None,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config)
pipeline.enable_model_cpu_offload()
```
Then, we load an input image for conditioning and obtain a suitable caption for it:
```py
from diffusers.utils import load_image
img_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/test_images/cats/cat_6.png"
raw_image = load_image(url).resize((512, 512))
caption = pipeline.generate_caption(raw_image)
caption
```
Then we employ the generated caption and the input image to get the inverted noise:
```py
generator = torch.manual_seed(0)
inv_latents = pipeline.invert(caption, image=raw_image, generator=generator).latents
```
Now, generate the image with edit directions:
```py
# See the "Generating source and target embeddings" section below to
# automate the generation of these captions with a pre-trained model like Flan-T5 as explained below.
source_prompts = ["a cat sitting on the street", "a cat playing in the field", "a face of a cat"]
target_prompts = ["a dog sitting on the street", "a dog playing in the field", "a face of a dog"]
source_embeds = pipeline.get_embeds(source_prompts, batch_size=2)
target_embeds = pipeline.get_embeds(target_prompts, batch_size=2)
image = pipeline(
caption,
source_embeds=source_embeds,
target_embeds=target_embeds,
num_inference_steps=50,
cross_attention_guidance_amount=0.15,
generator=generator,
latents=inv_latents,
negative_prompt=caption,
).images[0]
image
```
## Generating source and target embeddings
The authors originally used the [GPT-3 API](https://openai.com/api/) to generate the source and target captions for discovering
edit directions. However, we can also leverage open source and public models for the same purpose.
Below, we provide an end-to-end example with the [Flan-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) model
for generating captions and [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) for
computing embeddings on the generated captions.
**1. Load the generation model**:
```py
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl", device_map="auto", torch_dtype=torch.float16)
```
**2. Construct a starting prompt**:
```py
source_concept = "cat"
target_concept = "dog"
source_text = f"Provide a caption for images containing a {source_concept}. "
"The captions should be in English and should be no longer than 150 characters."
target_text = f"Provide a caption for images containing a {target_concept}. "
"The captions should be in English and should be no longer than 150 characters."
```
Here, we're interested in the "cat -> dog" direction.
**3. Generate captions**:
We can use a utility like so for this purpose.
```py
def generate_captions(input_prompt):
input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(
input_ids, temperature=0.8, num_return_sequences=16, do_sample=True, max_new_tokens=128, top_k=10
)
return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```
And then we just call it to generate our captions:
```py
source_captions = generate_captions(source_text)
target_captions = generate_captions(target_concept)
print(source_captions, target_captions, sep='\n')
```
We encourage you to play around with the different parameters supported by the
`generate()` method ([documentation](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.generation_tf_utils.TFGenerationMixin.generate)) for the generation quality you are looking for.
**4. Load the embedding model**:
Here, we need to use the same text encoder model used by the subsequent Stable Diffusion model.
```py
from diffusers import StableDiffusionPix2PixZeroPipeline
pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")
tokenizer = pipeline.tokenizer
text_encoder = pipeline.text_encoder
```
**5. Compute embeddings**:
```py
import torch
def embed_captions(sentences, tokenizer, text_encoder, device="cuda"):
with torch.no_grad():
embeddings = []
for sent in sentences:
text_inputs = tokenizer(
sent,
padding="max_length",
max_length=tokenizer.model_max_length,
truncation=True,
return_tensors="pt",
)
text_input_ids = text_inputs.input_ids
prompt_embeds = text_encoder(text_input_ids.to(device), attention_mask=None)[0]
embeddings.append(prompt_embeds)
return torch.concatenate(embeddings, dim=0).mean(dim=0).unsqueeze(0)
source_embeddings = embed_captions(source_captions, tokenizer, text_encoder)
target_embeddings = embed_captions(target_captions, tokenizer, text_encoder)
```
And you're done! [Here](https://colab.research.google.com/drive/1tz2C1EdfZYAPlzXXbTnf-5PRBiR8_R1F?usp=sharing) is a Colab Notebook that you can use to interact with the entire process.
Now, you can use these embeddings directly while calling the pipeline:
```py
from diffusers import DDIMScheduler
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
image = pipeline(
prompt,
source_embeds=source_embeddings,
target_embeds=target_embeddings,
num_inference_steps=50,
cross_attention_guidance_amount=0.15,
).images[0]
image
```
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## StableDiffusionPix2PixZeroPipeline
[[autodoc]] StableDiffusionPix2PixZeroPipeline
- __call__
- all

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,35 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# PNDM
[Pseudo Numerical Methods for Diffusion Models on Manifolds](https://huggingface.co/papers/2202.09778) (PNDM) is by Luping Liu, Yi Ren, Zhijie Lin and Zhou Zhao.
The abstract from the paper is:
*Denoising Diffusion Probabilistic Models (DDPMs) can generate high-quality samples such as image and audio samples. However, DDPMs require hundreds to thousands of iterations to produce final samples. Several prior works have successfully accelerated DDPMs through adjusting the variance schedule (e.g., Improved Denoising Diffusion Probabilistic Models) or the denoising equation (e.g., Denoising Diffusion Implicit Models (DDIMs)). However, these acceleration methods cannot maintain the quality of samples and even introduce new noise at a high speedup rate, which limit their practicability. To accelerate the inference process while keeping the sample quality, we provide a fresh perspective that DDPMs should be treated as solving differential equations on manifolds. Under such a perspective, we propose pseudo numerical methods for diffusion models (PNDMs). Specifically, we figure out how to solve differential equations on manifolds and show that DDIMs are simple cases of pseudo numerical methods. We change several classical numerical methods to corresponding pseudo numerical methods and find that the pseudo linear multi-step method is the best in most situations. According to our experiments, by directly using pre-trained models on Cifar10, CelebA and LSUN, PNDMs can generate higher quality synthetic images with only 50 steps compared with 1000-step DDIMs (20x speedup), significantly outperform DDIMs with 250 steps (by around 0.4 in FID) and have good generalization on different variance schedules.*
The original codebase can be found at [luping-liu/PNDM](https://github.com/luping-liu/PNDM).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## PNDMPipeline
[[autodoc]] PNDMPipeline
- all
- __call__
## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput

View File

@@ -1,37 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# RePaint
[RePaint: Inpainting using Denoising Diffusion Probabilistic Models](https://huggingface.co/papers/2201.09865) is by Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, Luc Van Gool.
The abstract from the paper is:
*Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask. Most existing approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of semantically meaningful generation. In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. We employ a pretrained unconditional DDPM as the generative prior. To condition the generation process, we only alter the reverse diffusion iterations by sampling the unmasked regions using the given image information. Since this technique does not modify or condition the original DDPM network itself, the model produces high-quality and diverse output images for any inpainting form. We validate our method for both faces and general-purpose image inpainting using standard and extreme masks.
RePaint outperforms state-of-the-art Autoregressive, and GAN approaches for at least five out of six mask distributions.*
The original codebase can be found at [andreas128/RePaint](https://github.com/andreas128/RePaint).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## RePaintPipeline
[[autodoc]] RePaintPipeline
- all
- __call__
## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput

View File

@@ -1,35 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Score SDE VE
[Score-Based Generative Modeling through Stochastic Differential Equations](https://huggingface.co/papers/2011.13456) (Score SDE) is by Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon and Ben Poole. This pipeline implements the variance expanding (VE) variant of the stochastic differential equation method.
The abstract from the paper is:
*Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (\aka, score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.*
The original codebase can be found at [yang-song/score_sde_pytorch](https://github.com/yang-song/score_sde_pytorch).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## ScoreSdeVePipeline
[[autodoc]] ScoreSdeVePipeline
- all
- __call__
## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

Some files were not shown because too many files have changed in this diff Show More