Compare commits

...

511 Commits

Author SHA1 Message Date
Sayak Paul
dbd413dfd9 Merge branch 'main' into call-pipeline-not-model 2024-07-26 15:53:25 +05:30
Dhruv Nair
1168eaaadd [CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix (#8986)
* update

* update

* update
2024-07-26 13:20:35 +05:30
Dhruv Nair
bce9105ac7 [CI] Fix parallelism in nightly tests (#8983)
update
2024-07-26 10:04:01 +05:30
RandomGamingDev
2afb2e0aac Added accelerator based gradient accumulation for basic_example (#8966)
added accelerator based gradient accumulation for basic_example

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-26 09:35:52 +05:30
Sayak Paul
d87fe95f90 [Chore] add LoraLoaderMixin to the inits (#8981)
* introduce  to promote reusability.

* up

* add more tests

* up

* remove comments.

* fix fuse_nan test

* clarify the scope of fuse_lora and unfuse_lora

* remove space

* rewrite fuse_lora a bit.

* feedback

* copy over load_lora_into_text_encoder.

* address dhruv's feedback.

* fix-copies

* fix issubclass.

* num_fused_loras

* fix

* fix

* remove mapping

* up

* fix

* style

* fix-copies

* change to SD3TransformerLoRALoadersMixin

* Apply suggestions from code review

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* up

* handle wuerstchen

* up

* move lora to lora_pipeline.py

* up

* fix-copies

* fix documentation.

* comment set_adapters().

* fix-copies

* fix set_adapters() at the model level.

* fix?

* fix

* loraloadermixin.

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-26 08:59:33 +05:30
Sayak Paul
50e66f2f95 [Chore] remove all is from auraflow. (#8980)
remove all is from auraflow.
2024-07-26 07:31:06 +05:30
efwfe
9b8c8605d1 fix guidance_scale value not equal to the value in comments (#8941)
fix guidance_scale value not equal with the value in comments
2024-07-25 12:31:37 -10:00
YiYi Xu
62863bb1ea Revert "[LoRA] introduce LoraBaseMixin to promote reusability." (#8976)
Revert "[LoRA] introduce LoraBaseMixin to promote reusability. (#8774)"

This reverts commit 527430d0a4.
2024-07-25 09:10:35 -10:00
mazharosama
1fd647f2a0 Enable CivitAI SDXL Inpainting Models Conversion (#8795)
modify in_channels in network_config params

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-25 07:44:57 -10:00
asfiyab-nvidia
0bda1d7b89 Update TensorRT img2img community pipeline (#8899)
* Update TensorRT img2img pipeline

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

* Update TensorRT version installed

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

* make style and quality

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

* Update examples/community/stable_diffusion_tensorrt_img2img.py

Co-authored-by: Tolga Cangöz <46008593+tolgacangoz@users.noreply.github.com>

* Update examples/community/README.md

Co-authored-by: Tolga Cangöz <46008593+tolgacangoz@users.noreply.github.com>

* Apply style and quality using ruff 0.1.5

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

---------

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
Co-authored-by: Tolga Cangöz <46008593+tolgacangoz@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-25 21:58:21 +05:30
Sayak Paul
527430d0a4 [LoRA] introduce LoraBaseMixin to promote reusability. (#8774)
* introduce  to promote reusability.

* up

* add more tests

* up

* remove comments.

* fix fuse_nan test

* clarify the scope of fuse_lora and unfuse_lora

* remove space

* rewrite fuse_lora a bit.

* feedback

* copy over load_lora_into_text_encoder.

* address dhruv's feedback.

* fix-copies

* fix issubclass.

* num_fused_loras

* fix

* fix

* remove mapping

* up

* fix

* style

* fix-copies

* change to SD3TransformerLoRALoadersMixin

* Apply suggestions from code review

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* up

* handle wuerstchen

* up

* move lora to lora_pipeline.py

* up

* fix-copies

* fix documentation.

* comment set_adapters().

* fix-copies

* fix set_adapters() at the model level.

* fix?

* fix

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-25 21:40:58 +05:30
Aryan
3ae0ee88d3 [tests] speed up animatediff tests (#8846)
* speed up animatediff tests

* fix pia test_ip_adapter_single

* fix tests/pipelines/pia/test_pia.py::PIAPipelineFastTests::test_dict_tuple_outputs_equivalent

* update

* fix ip adapter tests

* skip test_from_pipe_consistent_config tests

* fix prompt_embeds test

* update test_from_pipe_consistent_config tests

* fix expected_slice values

* remove temporal_norm_num_groups from UpBlockMotion

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-25 17:35:43 +05:30
Dhruv Nair
5fbb4d32d5 [CI] Slow Test Updates (#8870)
* update

* update

* update
2024-07-25 16:00:43 +05:30
Sayak Paul
d8bcb33f4b [Tests] fix slices of 26 tests (first half) (#8959)
* check for assertions.

* update with correct slices.

* okay

* style

* get it ready

* update

* update

* update

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-25 14:56:49 +05:30
sayakpaul
744533ad59 model -> pipeline 2024-07-25 10:00:24 +05:30
Sanchit Gandhi
4a782f462a [AudioLDM2] Fix cache pos for GPT-2 generation (#8964)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-25 09:21:49 +05:30
RandomGamingDev
cdd12bde17 Added Code for Gradient Accumulation to work for basic_training (#8961)
added line allowing gradient accumulation to work for basic_training example
2024-07-25 08:40:53 +05:30
Sayak Paul
2c25b98c8e [AuraFlow] fix long prompt handling (#8937)
fix
2024-07-24 11:19:30 +05:30
Dhruv Nair
93983b6780 [CI] Skip flaky download tests in PR CI (#8945)
update
2024-07-24 09:25:06 +05:30
Sayak Paul
41b705f42d remove residual i from auraflow. (#8949)
* remove residual i.

* rename to aura_flow in pipeline test
2024-07-24 07:31:54 +05:30
Sayak Paul
50d21f7c6a [Core] fix QKV fusion for attention (#8829)
* start debugging the problem,

* start

* fix

* fix

* fix imports.

* handle hunyuan

* remove residuals.

* add a check for making sure there's appropriate procs.

* add more rigor to the tests.

* fix test

* remove redundant check

* fix-copies

* move check_qkv_fusion_matches_attn_procs_length and check_qkv_fusion_processors_exist.
2024-07-24 06:52:19 +05:30
Dhruv Nair
3bb1fd6fc0 Fix name when saving text inversion embeddings in dreambooth advanced scripts (#8927)
update
2024-07-23 19:51:20 +05:30
Tolga Cangöz
cf55dcf0ff Fix Colab and Notebook checks for diffusers-cli env (#8408)
* chore: Update is_google_colab check to use environment variable

* Check Colab with all possible COLAB_* env variables

* Remove unnecessary word

* Make `_is_google_colab` more inclusive

* Revert "Make `_is_google_colab` more inclusive"

This reverts commit 6406db21ac.

* Make `_is_google_colab` more inclusive.

* chore: Update import_utils.py with notebook check improvement

* Refactor import_utils.py to improve notebook detection for VS Code's notebook

* chore: Remove `is_notebook()` function and related code

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-23 18:04:20 +05:30
Vinh H. Pham
7a95f8d9d8 [Tests] Improve transformers model test suite coverage - Temporal Transformer (#8932)
* add test for temporal transformer

* remove unused variable

* fix code quality

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-23 15:36:30 +05:30
akbaig
7710415baf fix: checkpoint save issue in advanced dreambooth lora sdxl script (#8926)
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-07-23 14:44:56 +05:30
Aritra Roy Gosthipaty
8b21feed42 [Tests] reduce the model size in the audioldm2 fast test (#7846)
* chore: initial model size reduction

* chore: fixing expected values for failing tests

* requested edits

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-23 14:34:07 +05:30
Dhruv Nair
f57b27d2ad Update pipeline test fetcher (#8931)
update
2024-07-23 10:02:22 +05:30
Sayak Paul
c5fdf33a10 [Benchmarking] check if runner helps to restore benchmarking (#8929)
* check if runner helps.

* remove caching

* gpus

* update runner group
2024-07-23 06:38:13 +05:30
Vishnu V Jaddipal
77c5de2e05 Add attentionless VAE support (#8769)
* Add attentionless VAE support

* make style and quality, fix-copies

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-22 14:16:58 -10:00
Sayak Paul
af400040f5 [Tests] proper skipping of request caching test (#8908)
proper skipping of request caching test
2024-07-22 12:52:57 -10:00
Jiwook Han
5802c2e3f2 Reflect few contributions on ethical_guidelines.md that were not reflected on #8294 (#8914)
fix_ethical_guidelines.md
2024-07-22 08:48:23 -07:00
Sayak Paul
f4af03b350 [Docs] small fixes to pag guide. (#8920)
small fixes to pag guide.
2024-07-22 08:35:01 -07:00
Seongsu Park
267bf65707 🌐 [i18n-KO] Translated docs to Korean (added 7 docs and etc) (#8804)
* remove unused docs

* add ko-18n docs

* docs typo, edit etc

* reorder list, add `in translation` in toctree

* fix minor translation

* fix docs minor tone, etc
2024-07-22 08:08:44 -07:00
Sayak Paul
1a8b3c2ee8 [Training] SD3 training fixes (#8917)
* SD3 training fixes

Co-authored-by: bghira <59658056+bghira@users.noreply.github.com>

* rewrite noise addition part to respect the eqn.

* styler

* Update examples/dreambooth/README_sd3.md

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

---------

Co-authored-by: bghira <59658056+bghira@users.noreply.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-07-21 16:24:04 +05:30
Lucain
56e772ab7e Use model_info.id instead of model_info.modelId (#8912)
Mention model_info.id instead of model_info.modelId
2024-07-20 20:01:21 +05:30
Pierre Chapuis
fe7948941d allow tensors in several schedulers step() call (#8905) 2024-07-19 18:58:06 -10:00
王奇勋
461efc57c5 [fix code annotation] Adjust the dimensions of the rotary positional embedding. (#8890)
* 2d rotary pos emb dim

* make style

---------

Co-authored-by: haofanwang <haofanwang.ai@gmail.com>
2024-07-19 18:57:36 -10:00
shinetzh
3b04cdc816 fix loop bug in SlicedAttnProcessor (#8836)
* fix loop bug in SlicedAttnProcessor


---------

Co-authored-by: neoshang <neoshang@tencent.com>
2024-07-19 18:14:29 -10:00
Álvaro Somoza
c009c203be [SDXL] Fix uncaught error with image to image (#8856)
* initial commit

* apply suggestion to sdxl pipelines

* apply fix to sd pipelines
2024-07-19 12:06:36 -10:00
Dhruv Nair
3f1411767b SSH into cpu runner additional fix (#8893)
* update

* update

* update
2024-07-18 16:18:45 +05:30
Dhruv Nair
588fb5c105 SSH into cpu runner fix (#8888)
* update

* update
2024-07-18 11:00:05 +05:30
Dhruv Nair
eb24e4bdb2 Add option to SSH into CPU runner. (#8884)
update
2024-07-18 10:20:24 +05:30
Sayak Paul
e02ec27e51 [Core] remove resume_download from Hub related stuff (#8648)
* remove resume_download

* fix: _fetch_index_file call.

* remove resume_download from docs.
2024-07-18 09:48:42 +05:30
Sayak Paul
a41e4c506b [Chore] add disable forward chunking to SD3 transformer. (#8838)
add disable forward chunking to SD3 transformer.
2024-07-18 09:30:18 +05:30
Aryan
12625c1c9c [docs] pipeline docs for latte (#8844)
* add pipeline docs for latte

* add inference time to latte docs

* apply review suggestions
2024-07-18 09:27:48 +05:30
Tolga Cangöz
c1dc2ae619 Fix multi-gpu case for train_cm_ct_unconditional.py (#8653)
* Fix multi-gpu case

* Prefer previously created `unwrap_model()` function

For `torch.compile()` generalizability

* `chore: update unwrap_model() function to use accelerator.unwrap_model()`
2024-07-17 19:03:12 +05:30
Beinsezii
e15a8e7f17 Add AuraFlowPipeline and KolorsPipeline to auto map (#8849)
* Add AuraFlowPipeline and KolorsPipeline to auto map

Just T2I. Validated using `quickdif`

* Add Kolors I2I and SD3 Inpaint auto maps

* style

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-07-16 17:13:28 -10:00
Sayak Paul
c2fbf8da02 [Chore] allow auraflow latest to be torch compile compatible. (#8859)
* allow auraflow latest to be torch compile compatible.

* default to 1024 1024.
2024-07-17 08:26:36 +05:30
Sayak Paul
0f09b01ab3 [Core] fix: shard loading and saving when variant is provided. (#8869)
fix: shard loading and saving when variant is provided.
2024-07-17 08:26:28 +05:30
Sayak Paul
f6cfe0a1e5 modify pocs. (#8867) 2024-07-17 08:26:13 +05:30
Tolga Cangöz
e87bf62940 [Cont'd] Add the SDE variant of ~~DPM-Solver~~ and DPM-Solver++ to DPM Single Step (#8269)
* Add the SDE variant of DPM-Solver and DPM-Solver++ to DPM Single Step


---------

Co-authored-by: cmdr2 <secondary.cmdr2@gmail.com>
2024-07-16 15:40:02 -10:00
Sayak Paul
3b37fefee9 [Docker] include python3.10 dev and solve header missing problem (#8865)
include python3.10 dev and solve header missing problem
2024-07-16 16:02:39 +05:30
Aryan
bbd2f9d4e9 [tests] fix typo in pag tests (#8845)
* fix typo in pag tests

* fix typo
2024-07-12 17:41:34 +05:30
Nguyễn Công Tú Anh
d704b3bf8c add PAG support sd15 controlnet (#8820)
* add pag support sd15 controlnet

* fix quality import

* remove unecessary import

* remove if state

* fix tests

* remove useless function

* add sd1.5 controlnet pag docs

---------

Co-authored-by: anhnct8 <anhnct8@fpt.com>
2024-07-12 15:42:56 +05:30
ustcuna
9f963e7349 [Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU (#8643)
* add animatediff_ipex community pipeline

* address the 1st round review comments
2024-07-12 14:31:15 +05:30
Sayak Paul
973a62d408 [Docs] add AuraFlow docs (#8851)
* add pipeline documentation.

* add api spec for pipeline

* model documentation

* model spec
2024-07-12 09:52:18 +02:00
Dhruv Nair
11d18f3217 Add single file loading support for AnimateDiff (#8819)
* update

* update

* update

* update
2024-07-12 09:51:57 +05:30
Dhruv Nair
d2df40c6f3 Add VAE tiling option for SD3 (#8791)
update
2024-07-11 09:49:39 -10:00
Sayak Paul
2261510bbc [Core] Add AuraFlow (#8796)
* add lavender flow transformer

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-11 08:50:19 -10:00
Álvaro Somoza
87b9db644b [Core] Add Kolors (#8812)
* initial draft
2024-07-11 06:09:17 -10:00
Xin Ma
b8cf84a3f9 Latte: Latent Diffusion Transformer for Video Generation (#8404)
* add Latte to diffusers

* remove print

* remove print

* remove print

* remove unuse codes

* remove layer_norm_latte and add a flag

* remove layer_norm_latte and add a flag

* update latte_pipeline

* update latte_pipeline

* remove unuse squeeze

* add norm_hidden_states.ndim == 2: # for Latte

* fixed test latte pipeline bugs

* fixed test latte pipeline bugs

* delete sh

* add doc for latte

* add licensing

* Move Transformer3DModelOutput to modeling_outputs

* give a default value to sample_size

* remove the einops dependency

* change norm2 for latte

* modify pipeline of latte

* update test for Latte

* modify some codes for latte

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* video_length -> num_frames; update prepare_latents copied from

* make fix-copies

* make style

* typo: videe -> video

* update

* modify for Latte pipeline

* modify latte pipeline

* modify latte pipeline

* modify latte pipeline

* modify latte pipeline

* modify for Latte pipeline

* Delete .vscode directory

* make style

* make fix-copies

* add latte transformer 3d to docs _toctree.yml

* update example

* reduce frames for test

* fixed bug of _text_preprocessing

* set num frame to 1 for testing

* remove unuse print

* add text = self._clean_caption(text) again

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2024-07-11 15:06:22 +05:30
Alan Du
673eb60f1c Reformat docstring for get_timestep_embedding (#8811)
* Reformat docstring for `get_timestep_embedding`


---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-10 15:54:44 -10:00
Sayak Paul
a785992c1d [Tests] fix more sharding tests (#8797)
* fix

* fix

* ugly

* okay

* fix more

* fix oops
2024-07-09 13:09:36 +05:30
Xu Cao
35cc66dc4c Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference (#8709)
* Add pipeline_stable_diffusion_3_inpaint


---------

Co-authored-by: Xu Cao <xucao2@jrehg-work-01.cs.illinois.edu>
Co-authored-by: IrohXu <irohcao@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-08 15:53:02 -10:00
Tolga Cangöz
57084dacc5 Remove unnecessary lines (#8569)
* Remove unused line


---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-08 10:42:02 -10:00
Zhuoqun(Jack) Chen
70611a1068 Fix static typing and doc typos (#8807)
* Fix static typing and doc typos

* Fix more same type hint typos with make fix-copies
2024-07-08 09:09:33 -10:00
PommesPeter
98388670d2 [Alpha-VLLM Team] Add Lumina-T2X to diffusers (#8652)
---------

Co-authored-by: zhuole1025 <zhuole1025@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-07 17:12:09 -10:00
YiYi Xu
9e9ed353a2 fix loading sharded checkpoints from subfolder (#8798)
* fix load sharded checkpoints from subfolder{

* style

* os.path.join

* add a small test

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2024-07-06 11:32:04 -10:00
apolinário
7833ed957b Improve model card for push_to_hub trainers (#8697)
* Improve trainer model cards

* Update train_dreambooth_sd3.py

* Update train_dreambooth_lora_sd3.py

* add link to adapters loading doc

* Update train_dreambooth_lora_sd3.py

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-07-05 12:18:41 +05:30
Dhruv Nair
85c4a326e0 Fix saving text encoder weights and kohya weights in advanced dreambooth lora script (#8766)
* update

* update

* update
2024-07-05 11:28:50 +05:30
Dhruv Nair
0bab9d6be7 [Single File] Allow loading T5 encoder in mixed precision (#8778)
* update

* update

* update

* update
2024-07-05 10:29:38 +05:30
Thomas Eding
2e2684f014 Add vae_roundtrip.py example (#7104)
* Add vae_roundtrip.py example

* Add cuda support to vae_roundtrip

* Move vae_roundtrip.py into research_projects/vae

* Fix channel scaling in vae roundrip and also support taesd.

* Apply ruff --fix for CI gatekeep check

---------

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
2024-07-04 01:53:09 -04:00
Sayak Paul
31adeb41cd [Tests] fix sharding tests (#8764)
fix sharding tests
2024-07-04 08:50:59 +05:30
Aryan
a7b9634e95 Fix minor bug in SD3 img2img test (#8779)
fix minor bug in sd3 img2img
2024-07-03 07:45:37 -10:00
XCL
6b6b4bcffe [Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet (#8783)
* add conversion files; changed controlnet for hunyuandit

* style

---------

Co-authored-by: xingchaoliu <xingchaoliu@tencent.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-07-03 07:45:18 -10:00
Linoy Tsaban
beb1c017ad [advanced dreambooth lora] add clip_skip arg (#8715)
* add clip_skip

* style

* smol fix

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-03 12:15:16 -05:00
Sayak Paul
06ee4db3e7 [Chore] add dummy lora attention processors to prevent failures in other libs (#8777)
add dummy lora attention processors to prevent failures in other libs
2024-07-03 13:11:00 +05:30
Sayak Paul
84bbd2f4ce Update README.md to include Colab link (#8775) 2024-07-03 07:46:38 +05:30
Sayak Paul
600ef8a4dc Allow SD3 DreamBooth LoRA fine-tuning on a free-tier Colab (#8762)
* add experimental scripts to train SD3 transformer lora on colab

* add readme

* add colab

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix link in the notebook.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-07-03 07:07:47 +05:30
Sayak Paul
984d340534 Revert "[LoRA] introduce LoraBaseMixin to promote reusability." (#8773)
Revert "[LoRA] introduce `LoraBaseMixin` to promote reusability. (#8670)"

This reverts commit a2071a1837.
2024-07-03 07:05:01 +05:30
Sayak Paul
a2071a1837 [LoRA] introduce LoraBaseMixin to promote reusability. (#8670)
* introduce  to promote reusability.

* up

* add more tests

* up

* remove comments.

* fix fuse_nan test

* clarify the scope of fuse_lora and unfuse_lora

* remove space
2024-07-03 07:04:37 +05:30
YiYi Xu
d9f71ab3c3 correct attention_head_dim for JointTransformerBlock (#8608)
* add

* update sd3 controlnet

* Update src/diffusers/models/controlnet_sd3.py

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-02 07:42:25 -10:00
Jiwook Han
dd4b731e68 Reflect few contributions on philosophy.md that were not reflected on #8294 (#8690)
* Update philosophy.md 

Some contributions were not reflected previously, so I am resubmitting them.

* Update docs/source/ko/conceptual/philosophy.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/conceptual/philosophy.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-07-02 08:43:56 -07:00
Dhruv Nair
31b211bfe3 Fix mistake in Single File Docs page (#8765)
update
2024-07-02 12:45:49 +05:30
Dhruv Nair
610a71d7d4 Fix indent in dreambooth lora advanced SD 15 script (#8753)
update
2024-07-02 11:07:34 +05:30
Dhruv Nair
c104482b9c Fix warning in UNetMotionModel (#8756)
* update

* Update src/diffusers/models/unets/unet_motion_model.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-02 11:07:13 +05:30
Dhruv Nair
c7a84ba2f4 Enforce ordering when running Pipeline slow tests (#8763)
update
2024-07-02 10:55:50 +05:30
YiYi Xu
8b1e3ec93e [hunyuan-dit] refactor HunyuanCombinedTimestepTextSizeStyleEmbedding (#8761)
up

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-02 10:11:04 +05:30
Sayak Paul
4e57aeff1f [Tests] add test suite for SD3 DreamBooth (#8650)
* add a test suite for SD3 DreamBooth

* lora suite

* style

* add checkpointing tests for LoRA

* add test to cover train_text_encoder.
2024-07-02 07:00:22 +05:30
Álvaro Somoza
af92869d9b [SD3 LoRA Training] Fix errors when not training text encoders (#8743)
* fix

* fix things.

Co-authored-by: Linoy Tsaban <linoy.tsaban@gmail.com>

* remove patch

* apply suggestions

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: Linoy Tsaban <linoy.tsaban@gmail.com>
2024-07-02 06:21:16 +05:30
Haofan Wang
0bae6e447c Allow from_transformer in SD3ControlNetModel (#8749)
* Update controlnet_sd3.py

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-01 07:38:38 -10:00
Dhruv Nair
0368483b61 Remove legacy single file model loading mixins (#8754)
update
2024-07-01 07:20:19 -10:00
YiYi Xu
ddb9d8548c [doc] add a tip about using SDXL refiner with hunyuan-dit and pixart (#8735)
* up

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-07-01 06:30:09 -10:00
Lucain
49979753e1 Always raise from previous error (#8751) 2024-07-01 14:22:30 +05:30
XCL
a3904d7e34 [Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support (#8747)
* add v1.2 support

---------

Co-authored-by: xingchaoliu <xingchaoliu@tencent.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-06-30 21:33:38 -10:00
WenheLI
7bfc1ee1b2 fix the LR schedulers for dreambooth_lora (#8510)
* update training

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-07-01 08:14:57 +05:30
Bhavay Malhotra
71c046102b [train_controlnet_sdxl.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env (#8476)
* Create diffusers.yml

* num_train_epochs

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-01 07:21:40 +05:30
Sayak Paul
83b112a145 shift cache in benchmarking. (#8740)
* shift cache.

* comment
2024-07-01 07:14:05 +05:30
Shauray Singh
8690e8b9d6 add PAG support for SD architecture (#8725)
* add pag to sd pipelines
2024-06-29 09:26:11 -10:00
Sayak Paul
7db8c3ec40 Benchmarking workflow fix (#8389)
* fix

* fixes

* add back the deadsnakes

* better messaging

* disable IP adapter tests for the moment.

* style

* up

* empty
2024-06-29 09:06:32 +05:30
Álvaro Somoza
9b7acc7cf2 [Community pipeline] SD3 Differential Diffusion Img2Img Pipeline (#8679)
* new pipeline
2024-06-28 17:12:39 -10:00
Luo Chaofan
a216b0bb7f fix: ValueError when using FromOriginalModelMixin in subclasses #8440 (#8454)
* fix: ValueError when using FromOriginalModelMixin in subclasses #8440

(cherry picked from commit 9285997843)

* Update src/diffusers/loaders/single_file_model.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Update single_file_model.py

* Update single_file_model.py

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-28 17:15:46 +05:30
Dhruv Nair
150142c537 [Tests] Fix precision related issues in slow pipeline tests (#8720)
update
2024-06-28 08:13:46 +05:30
Linoy Tsaban
35f45ecd71 [Advanced dreambooth lora] adjustments to align with canonical script (#8406)
* minor changes

* minor changes

* minor changes

* minor changes

* minor changes

* minor changes

* minor changes

* fix

* fix

* aligning with blora script

* aligning with blora script

* aligning with blora script

* aligning with blora script

* aligning with blora script

* remove prints

* style

* default val

* license

* move save_model_card to outside push_to_hub

* Update train_dreambooth_lora_sdxl_advanced.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-27 13:27:37 +05:30
Sayak Paul
d5dd8df3b4 [Chore] perform better deprecation for vqmodeloutput (#8719)
perform better deprecation for vqmodeloutput
2024-06-27 12:16:37 +05:30
Mathis Koroglu
3e0d128da7 Motion Model / Adapter versatility (#8301)
* Motion Model / Adapter versatility

- allow to use a different number of layers per block
- allow to use a different number of transformer per layers per block
- allow a different number of motion attention head per block
- use dropout argument in get_down/up_block in 3d blocks

* Motion Model added arguments renamed & refactoring

* Add test for asymmetric UNetMotionModel
2024-06-27 11:11:29 +05:30
vincedovy
a536e775fb Fix json WindowsPath crash (#8662)
* Add check for WindowsPath in to_json_string

On Windows, os.path.join returns a WindowsPath. to_json_string does not convert this from a WindowsPath to a string. Added check for WindowsPath to to_json_saveable.

* Remove extraneous convert to string in test_check_path_types (tests/others/test_config.py)

* Fix style issues in tests/others/test_config.py

* Add unit test to test_config.py to verify that PosixPath and WindowsPath (depending on system) both work when converted to JSON

* Remove distinction between PosixPath and WindowsPath in ConfigMixIn.to_json_string(). Conditional now tests for Path, and uses Path.as_posix() to convert to string.

---------

Co-authored-by: Vincent Dovydaitis <vincedovy@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-27 10:30:55 +05:30
Álvaro Somoza
3b01d72a64 Modify FlowMatch Scale Noise (#8678)
* initial fix

* apply suggestion

* delete step_index line
2024-06-27 00:36:33 -04:00
Sayak Paul
e2a4a46e99 [Release notification] add some info when there is an error. (#8718)
add some info when there is an error.
2024-06-27 09:49:15 +05:30
Sayak Paul
eda560d34c modify PR and issue templates (#8687)
* modify PR and issue templates

* add single file poc.
2024-06-27 09:01:47 +05:30
Sayak Paul
adbb04864d [LoRA] fix conversion utility so that lora dora loads correctly (#8688)
fix conversion utility so that lora dora loads correctly
2024-06-27 08:58:32 +05:30
Dhruv Nair
effe4b9784 Update xformers SD3 test (#8712)
update
2024-06-26 10:24:27 -10:00
Sayak Paul
5b51ad0052 [LoRA] fix vanilla fine-tuned lora loading. (#8691)
fix vanilla fine-tuned lora loading.
2024-06-26 07:38:57 -10:00
Sayak Paul
10b4e354b6 [Chore] remove deprecation from transformer2d regarding the output class. (#8698)
* remove deprecation from transformer2d regarding the output class.

* up

* deprecate more
2024-06-26 07:35:36 -10:00
Donald.Lee
ea6938aea5 Fix: unet save_attn_procs at UNet2DconditionLoadersMixin (#8699)
* fix: unet save_attn_procs at custom diffusion

* style: recover unchanaged parts(max line length 119) / mod: add condition

* style: recover unchanaged parts(max line length 119)

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-26 22:30:49 +05:30
Sayak Paul
8ef0d9deff [Observability] add reporting mechanism when mirroring community pipelines. (#8676)
* add reporting mechanism when mirroring community pipelines.

* remove unneeded argument

* get the actual PATH_IN_REPO

* don't need tag
2024-06-26 22:11:33 +05:30
XCL
fa2abfdb03 [Tencent Hunyuan Team] Add Hunyuan-DiT ControlNet Inference (#8694)
* add controlnet support

---------

Co-authored-by: xingchaoliu <xingchaoliu@tencent.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-06-26 00:43:03 -10:00
YiYi Xu
1d3ef67b09 [doc] add more about from_pipe API for PAG doc (#8701)
* add more about from_pipe API

* Update docs/source/en/using-diffusers/pag.md

* Update docs/source/en/using-diffusers/pag.md

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-06-25 22:26:12 -10:00
Dhruv Nair
0f0b531827 Add decorator for compile tests (#8703)
* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-26 11:26:47 +05:30
Sayak Paul
e8284281c1 add docs on model sharding (#8658)
* add docs on model sharding

* add entry to _toctree.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* simplify wording

* add a note on transformer library handling

* move device placement section

* Update docs/source/en/training/distributed_inference.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-06-26 07:35:11 +05:30
YiYi Xu
715a7da1b2 add sd3 conversion script (#8702)
add conversion script
2024-06-25 14:24:58 -10:00
Álvaro Somoza
14d224d4e6 [Docs] SD3 T5 Token limit doc (#8654)
* doc for max_sequence_length

* better position and changed note to tip

* apply suggestions

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-25 14:41:27 -04:00
YiYi Xu
540399f540 add PAG support (#7944)
* first draft


---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Junhwa Song <ethan9867@gmail.com>
Co-authored-by: Ahn Donghoon (안동훈 / suno) <suno.vivid@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-06-25 08:40:02 -10:00
Sayak Paul
f088027e93 [Marigold tests] add is_flaky decorator to some Marigold tests (#8696)
okay
2024-06-25 06:27:28 -10:00
Linoy Tsaban
c6e08ecd46 [Sd3 Dreambooth LoRA] Add text encoder training for the clip encoders (#8630)
* add clip text-encoder training

* no dora

* text encoder traing fixes

* text encoder traing fixes

* text encoder training fixes

* text encoder training fixes

* text encoder training fixes

* text encoder training fixes

* add text_encoder layers to save_lora

* style

* fix imports

* style

* fix text encoder

* review changes

* review changes

* review changes

* minor change

* add lora tag

* style

* add readme notes

* add tests for clip encoders

* style

* typo

* fixes

* style

* Update tests/lora/test_lora_layers_sd3.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update examples/dreambooth/README_sd3.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* minor readme change

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-25 18:00:19 +05:30
Sayak Paul
4ad7a1f5fd [Chore] create a utility for calculating the expected number of shards. (#8692)
create a utility for calculating the expected number of shards.
2024-06-25 17:05:39 +05:30
Hammond Liu
1f81fbe274 Fix redundant pipe init in sd3 lora (#8680)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-25 07:31:20 +05:30
Tolga Cangöz
589931ca79 Errata - Update class method convention to use cls (#8574)
* Class methods are supposed to use `cls` conventionally

* `make style && make quality`

* An Empty commit

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 10:35:45 -07:00
Steven Liu
675be88f00 [docs] Add note for float8 (#8685)
add note
2024-06-24 10:13:34 -07:00
Steven Liu
df4ad6f4ac [docs] Fix Pillow import (#8684)
fix import error
2024-06-24 10:13:15 -07:00
Sayak Paul
bc90c28bc9 [Docs] add note on caching in fast diffusion (#8675)
* add note on caching in fast diffusion

* formatting

* Update docs/source/en/tutorials/fast_diffusion.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-06-24 10:10:45 -07:00
Tolga Cangöz
f040c27d4c Errata - Fix typos and improve style (#8571)
* Fix typos

* Fix typos & up style

* chore: Update numbers

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 10:07:22 -07:00
Tolga Cangöz
138fac703a Discourage using deprecated revision parameter (#8573)
* Discourage using `revision`

* `make style && make quality`

* Refactor code to use 'variant' instead of 'revision'

* `revision="bf16"` -> `variant="bf16"`

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 10:06:49 -07:00
Tolga Cangöz
468ae09ed8 Errata - Trim trailing white space in the whole repo (#8575)
* Trim all the trailing white space in the whole repo

* Remove unnecessary empty places

* make style && make quality

* Trim trailing white space

* trim

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 18:39:15 +05:30
Dong
3fca52022f 🎨 fix xl playground device (#8550)
* 🎨 fix xl playground device

* 🎨 run `make fix-copies`

* 🎨 run `make fix-copies`

* edit xl_controlnet_img2img file

* edit playground img2img test slow

* Update tests/pipelines/stable_diffusion_xl/test_stable_diffusion_xl_img2img.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 16:49:55 +05:30
Tolga Cangöz
c375903db5 Errata - Fix typos & improve contributing page (#8572)
* Fix typos & improve contributing page

* `make style && make quality`

* fix typos

* Fix typo

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 14:13:03 +05:30
Vinh H. Pham
b9d52fca1d [train_lcm_distill_lora_sdxl.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env (#8446)
fix num_train_epochs

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 14:09:28 +05:30
drhead
2ada094bff Add extra performance features for EMAModel, torch._foreach operations and better support for non-blocking CPU offloading (#7685)
* Add support for _foreach operations and non-blocking to EMAModel

* default foreach to false

* add non-blocking EMA offloading to SD1.5 T2I example script

* fix whitespace

* move foreach to cli argument

* linting

* Update README.md re: EMA weight training

* correct args.foreach_ema

* add tests for foreach ema

* code quality

* add foreach to from_pretrained

* default foreach false

* fix linting

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: drhead <a@a.a>
2024-06-24 14:03:47 +05:30
Haofan Wang
f1f542bdd4 Update pipeline_stable_diffusion_3_controlnet.py (#8660)
Co-authored-by: YiYi Xu <yixu310@gmail,com>
2024-06-23 15:27:59 +05:30
Sayak Paul
a9c403c001 [LoRA] refactor lora conversion utility. (#8295)
* refactor lora conversion utility.

* remove error raises.

* add onetrainer support too.
2024-06-22 08:29:12 +05:30
Álvaro Somoza
e7b9a0762b [SD3 LoRA] Fix list index out of range (#8584)
* fix

* add check

* key present is checked before

* test case draft

* aply suggestions

* changed testing repo, back to old class

* forgot docstring

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-21 21:17:34 +05:30
Sayak Paul
8eb17315c8 [LoRA] get rid of the legacy lora remnants and make our codebase lighter (#8623)
* get rid of the legacy lora remnants and make our codebase lighter

* fix depcrecated lora argument

* fix

* empty commit to trigger ci

* remove print

* empty
2024-06-21 16:36:05 +05:30
YiYi Xu
c71c19c5e6 a few fix for shard checkpoints (#8656)
fix

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-06-21 12:50:58 +05:30
Steaunk
adc31940a9 Fix Typo in StableDiffusion3 (#8642)
* fix typo in __call__ of pipeline_stable_diffusion_3.py

* fix typo in __call__ of pipeline_stable_diffusion_3_img2img.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-21 08:45:48 +05:30
satani99
963ee05d16 Update train_dreambooth_lora_sd3.py (#8600)
* Update train_dreambooth_lora_sd3.py

* Update train_dreambooth_lora_sd3.py

* Update train_dreambooth_sd3.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-20 17:42:24 +05:30
Sayak Paul
668e34c6e0 [LoRA SD3] add support for lora fusion in sd3 (#8616)
* add support for lora fusion in sd3

* add test to ensure fused lora and effective lora produce same outpouts
2024-06-20 14:25:51 +05:30
Sayak Paul
25d7bb3ea6 [Flax tests] reduce tolerance for a flax test (#8640)
reduce tolerance for a flax test
2024-06-20 00:48:08 +04:00
YiYi Xu
394b8fb996 fix from_single_file for checkpoints with t5 (#8631)
fix single file
2024-06-19 08:23:35 -10:00
Sayak Paul
a1d55e14ba Change the default weighting_scheme in the SD3 scripts (#8639)
* change to logit_normal as the weighting scheme

* sensible default mote
2024-06-19 13:05:26 +01:00
王奇勋
e5564d45bf Support SD3 ControlNet and Multi-ControlNet. (#8566)
* sd3 controlnet



---------

Co-authored-by: haofanwang <haofanwang.ai@gmail.com>
2024-06-18 14:59:22 -10:00
Nan
2921a20194 [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None) (#8558)
* fix shape mismatch when num_images_per_prompt > 1 and text_encoder_3=None

* style

* fix copies

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-06-18 12:41:18 -10:00
Carolinabanana
3376252d71 Fix gradient checkpointing issue for Stable Diffusion 3 (#8542)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-06-18 11:36:23 -10:00
Yongsen Mao
16170c69ae add sd1.5 compatibility to controlnet-xs and fix unused_parameters error during training (#8606)
* add sd1.5 compatibility to controlnet-xs

* set use_linear_projection by base_block

* refine code style
2024-06-18 11:35:34 -10:00
kkj15dk
4408047ac5 self.upsample = Upsample1D (#8580)
Making self.upsample actually be Upsample1D
2024-06-18 11:34:07 -10:00
Vasco Ramos
34fab8b511 [SD3 Docs] Corrected title about loading model with T5 "without" -> "with" (#8602)
[SD3 Docs] Corrected title about loading model with T5

Corrected the documentation title to "Loading the single file checkpoint with T5" Previously, it incorrectly stated "Loading the single file checkpoint without T5" which contradicted the code snippet showing how to load the SD3 checkpoint with the T5 model
2024-06-18 11:33:43 -10:00
Gæros
298ce67999 [LoRA] text encoder: read the ranks for all the attn modules (#8324)
* [LoRA] text encoder: read the ranks for all the attn modules

 * In addition to out_proj, read the ranks of adapters for q_proj, k_proj, and  v_proj

 * Allow missing adapters (UNet already supports this)

* ruff format loaders.lora

* [LoRA] add tests for partial text encoders LoRAs

* [LoRA] update test_simple_inference_with_partial_text_lora to be deterministic

* [LoRA] comment justifying test_simple_inference_with_partial_text_lora

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-18 21:10:50 +01:00
Andrew Hong
d2e7a19fd5 Remove underlines between badges (#8484)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-18 10:40:12 -07:00
Sayak Paul
cd3082008e [Core] Add shift_factor to SD3 tiny autoencoder (#8618)
* shift factor argument to tiny

* remove shift factor rejigging from the sd3 docs
2024-06-18 18:28:02 +01:00
Álvaro Somoza
f3209b5b55 [SD3 Inference] T5 Token limit (#8506)
* max_sequence_length for the T5

* updated img2img

* apply suggestions

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-06-18 06:46:38 -10:00
Marc Sun
96399c3ec6 Fix sharding when no device_map is passed (#8531)
* Fix sharding when no device_map is passed

* style

* add tests

* align

* add docstring

* format

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-18 05:47:23 -10:00
MaoXianXin
10d3220abe A backslash is missing from the run command (#8471) 2024-06-18 16:44:34 +01:00
Dhruv Nair
f69511ecc6 [Single File Loading] Handle unexpected keys in CLIP models when accelerate isn't installed. (#8462)
* update

* update

* update

* update

* update

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-18 16:39:30 +01:00
Álvaro Somoza
d2b10b1f4f [SD3] TAESD3 docs (#8607)
* tased3 docs

* apply suggestion

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-18 15:56:38 +01:00
Sayak Paul
23a2cd3337 [LoRA] training fix the position of param casting when loading them (#8460)
fix the position of param casting when loading them
2024-06-18 14:57:34 +01:00
Sayak Paul
4edde134f6 [SD3 training] refactor the density and weighting utilities. (#8591)
refactor the density and weighting utilities.
2024-06-18 14:44:38 +01:00
Bagheera
074a7cc3c5 SD3: update default training timestep / loss weighting distribution to logit_normal (#8592)
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-06-18 14:15:19 +01:00
Álvaro Somoza
6bfd13f07a [SD3 Training] T5 token limit (#8564)
* initial commit

* default back to 77

* better text

* text correction

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-17 16:32:56 -04:00
AmosDinh
eeb70033a6 Syntax error in readme example "pipe" -> "pipeline" (#8601)
Update controlnet.md

Syntax error pipe -> pipeline
2024-06-17 11:02:07 -07:00
Dhruv Nair
c4a4750cb3 Temporarily pin Numpy in the CI (#8603)
temp pin numpy
2024-06-17 19:32:38 +05:30
YiYi Xu
a6375d4101 Image processor latent (#8513)
* fix

* up

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-06-16 22:34:55 -10:00
spacepxl
8e1b7a084a Fix the deletion of SD3 text encoders for Dreambooth/LoRA training if the text encoders are not being trained (#8536)
* Update train_dreambooth_sd3.py to fix TE garbage collection

* Update train_dreambooth_lora_sd3.py to fix TE garbage collection

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-16 20:52:33 +01:00
Rafie Walker
6946facf69 Implement SD3 loss weighting (#8528)
* Add lognorm and cosmap weighting

* Implement mode sampling

* Update examples/dreambooth/train_dreambooth_lora_sd3.py

* Update examples/dreambooth/train_dreambooth_lora_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_lora_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_lora_sd3.py

* keep timestamp sampling fully on cpu

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-16 20:15:50 +01:00
Sayak Paul
130dd936bb pin accelerate to 0.31.0 (#8563)
* pin accelerate to 0.31.0

* update dep table

* empty
2024-06-16 08:37:00 -10:00
Jonathan Rahn
a899e42fc7 add sentencepiece to requirements.txt for SD3 dreambooth (#8538)
* add `sentencepiece` requirement for SD3

add `sentencepiece` requirement

* Empty-Commit

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-14 22:48:36 +01:00
Sayak Paul
f96e4a16ad pin transformers to the latest (#8522)
thanks!
2024-06-13 07:39:24 -10:00
Tolga Cangöz
9c6e9684a2 Refactor StableDiffusion3Img2ImgPipeline to remove redundant code (#8533) 2024-06-13 07:36:46 -10:00
Sayak Paul
2e4841ef1e post release 0.29.0 (#8492)
post release
2024-06-13 06:14:20 -10:00
Haofan Wang
8bea943714 Update requirements_sd3.txt (#8521) 2024-06-13 17:02:17 +01:00
YiYi Xu
614d0c64e9 remove the deprecated prepare_mask_and_masked_image function (#8512)
remove prepare mask fn

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-13 14:59:21 +01:00
Dhruv Nair
b1a2c0d577 Expand Single File support in SD3 Pipeline (#8517)
* update

* update
2024-06-13 18:29:19 +05:30
Lucain
06ee907b73 Fix PATH_IN_REPO on new release in mirror_community_pipeline.yaml (#8519)
Fix PATH_IN_REPO in mirror workflow
2024-06-13 10:25:24 +02:00
ちくわぶ
896fb6d8d7 Fix duplicate variable assignments in SD3's JointAttnProcessor (#8516)
* Fix duplicate variable assignments.

* Fix duplicate variable assignments.
2024-06-12 21:52:35 -10:00
Beinsezii
7f51f286a5 Add Hunyuan AutoPipe mapping (#8505) 2024-06-12 16:11:55 -10:00
kkj15dk
829f6defa4 Fix spelling in scheduling_flow_match_euler_discrete.py (#8497)
Update scheduling_flow_match_euler_discrete.py

Spelling:
Foward -> Forward

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-06-12 12:37:47 -10:00
Beinsezii
24bdf4b215 Add SD3 AutoPipeline mappings (#8489) 2024-06-12 12:31:36 -10:00
Radamés Ajna
95e0c3757d Fix small typo (#8498) 2024-06-12 15:30:58 -07:00
Sayak Paul
6cf0be5d3d fix warning log for Transformer SD3 (#8496)
fix warning log
2024-06-12 12:25:18 -10:00
Sayak Paul
ec068f9b5b fix dual transformer2d import (#8491)
fix
2024-06-12 21:10:27 +01:00
Ameer Azam
0240d4191a Update README_sd3.md (#8490)
becasue in  Readme  it was not correct
train_dreambooth_sd3.py to train_dreambooth_lora_sd3
2024-06-12 21:08:36 +01:00
Dhruv Nair
04717fd861 Add Stable Diffusion 3 (#8483)
* up

* add sd3

* update

* update

* add tests

* fix copies

* fix docs

* update

* add dreambooth lora

* add LoRA

* update

* update

* update

* update

* import fix

* update

* Update src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* import fix 2

* update

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* update

* update

* update

* fix ckpt id

* fix more ids

* update

* missing doc

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update'

* fix

* update

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

* note on gated access.

* requirements

* licensing

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-06-12 20:44:00 +01:00
Jiwook Han
6fd458e99d 🌐 [i18n-KO] Translated conceptual/philosophy.md and 3 other documents to Korean (#8294)
* translation about 3 documents into Korean

* evaluation doc korean translation

* _toctree.yml modify

* doc title fix : philosopy->philosophy

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update philosophy.md (from jungnerd)

---------

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
2024-06-12 09:40:37 -07:00
Greg Hunkins
1066fe4cbc 🤫 Quiet IP Adapter Mask Warning (#8475)
* quiet attn parameters

* fix lint

* make style && make quality

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-12 16:50:13 +01:00
Sayak Paul
d38f69ea25 change max_shard_size to 10GB (#8445)
* change max_shard_size to 10GB

* add notes to the documentation

* Update src/diffusers/models/modeling_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* change to abs limit

---------

Co-authored-by: Lucain <lucainp@gmail.com>
2024-06-12 13:49:13 +01:00
Patrick
0a1c13af79 image_processor.py: Fixed an error in ValueError's message (#8447)
* image_processor.py: Fixed an error in ValueError's message , as the string's join method tried to join types, instead of strings

Bug that occurred:

f"Input is in incorrect format. Currently, we only support {', '.join(supported_formats)}"
TypeError: sequence item 0: expected str instance, type found

* Fixed: C417 Unnecessary `map` usage (rewrite using a generator expression)

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-11 08:09:24 -10:00
YiYi Xu
0028c34432 fix SEGA pipeline (#8467)
* fix

* style

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-11 06:37:49 -10:00
Sayak Paul
d457beed92 Update README.md to update the MaPO project (#8470)
Update README.md
2024-06-11 10:10:45 +01:00
Jianqi Pan
1d9a6a81b9 🔧 chore: use modeling_outputs.Transformer2DModelOutput (#8436)
* 🔧 chore: use modeling_outputs.Transformer2DModelOutput

* 🔧 chore: isort

* 🔧 chore: isort

* style

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2024-06-10 12:11:41 +01:00
Luc Georges
4e0984db6c fix(ci): remove unnecessary permissions (#8457) 2024-06-10 10:49:29 +01:00
Luc Georges
83bc6c94ea feat(ci): add trufflehog secrets detection (#8430) 2024-06-08 07:56:47 +05:30
Lucain
0d68ddf327 Move away from cached_download (#8419)
* Move away from

* unused constant

* Add custom error
2024-06-07 15:43:00 +05:30
Sayak Paul
7d887118b9 [Core] support saving and loading of sharded checkpoints (#7830)
* feat: support saving a model in sharded checkpoints.

* feat: make loading of sharded checkpoints work.

* add tests

* cleanse the loading logic a bit more.

* more resilience while loading from the Hub.

* parallelize shard downloads by using snapshot_download()/

* default to a shard size.

* more fix

* Empty-Commit

* debug

* fix

* uality

* more debugging

* fix more

* initial comments from Benjamin

* move certain methods to loading_utils

* add test to check if the correct number of shards are present.

* add a test to check if loading of sharded checkpoints from the Hub is okay

* clarify the unit when passed as an int.

* use hf_hub for sharding.

* remove unnecessary code

* remove unnecessary function

* lucain's comments.

* fixes

* address high-level comments.

* fix test

* subfolder shenanigans./

* Update src/diffusers/utils/hub_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

* remove _huggingface_hub_version as not needed.

* address more feedback.

* add a test for local_files_only=True/

* need hf hub to be at least 0.23.2

* style

* final comment.

* clean up subfolder.

* deal with suffixes in code.

* _add_variant default.

* use weights_name_pattern

* remove add_suffix_keyword

* clean up downloading of sharded ckpts.

* don't return something special when using index.json

* fix more

* don't use bare except

* remove comments and catch the errors better

* fix a couple of things when using is_file()

* empty

---------

Co-authored-by: Lucain <lucainp@gmail.com>
2024-06-07 14:49:10 +05:30
Lucain
b63c956860 Final fix for mirror community pipeline (#8427) 2024-06-07 11:08:33 +02:00
Lucain
716b2062bf Fix mirror community pipeline (#8426) 2024-06-07 11:03:48 +02:00
Lucain
5fd6825d25 Fix mirror_community_pipeline.yml name (#8425) 2024-06-07 11:00:05 +02:00
Lucain
e0fae6fd73 Mirror ./examples/community folder on HF (#8417)
* first draft

* secret

* tiktok

* capital matters

* dataset matter

* don't be a prick

* refact

* only on main or tag

* document with an example

* Update destination dataset

* link

* allow manual trigger

* better

* lin

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-07 10:56:05 +02:00
Tolga Cangöz
ec1aded12e Optimize test files by fixing CPU-offloading usage (#8409)
* Refactor code to remove unnecessary calls to `to(torch_device)`

* Refactor code to remove unnecessary calls to `to("cuda")`

* Update pipeline_stable_diffusion_diffedit.py
2024-06-06 09:51:26 -10:00
Steven Liu
151a56b80e [docs] Single file usage (#8412)
* single file usage

* edit
2024-06-06 12:40:34 -07:00
Sayak Paul
a3faf3f260 [Core] fix: legacy model mapping (#8416)
* fix: legacy model mapping

* remove print
2024-06-06 20:35:05 +05:30
Sayak Paul
867a2b0cf9 [Hunyuan] add optimization related sections to the hunyuan dit docs. (#8402)
* optimizations to the hunyuan dit docs.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/hunyuandit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-06-06 05:41:38 +05:30
Tolga Cangöz
98730c5dd7 Errata (#8322)
* Fix typos

* Trim trailing whitespaces

* Remove a trailing whitespace

* chore: Update MarigoldDepthPipeline checkpoint to prs-eth/marigold-lcm-v1-0

* Revert "chore: Update MarigoldDepthPipeline checkpoint to prs-eth/marigold-lcm-v1-0"

This reverts commit fd742b30b4.

* pokemon -> naruto

* `DPMSolverMultistep` -> `DPMSolverMultistepScheduler`

* Improve Markdown stylization

* Improve style

* Improve style

* Refactor pipeline variable names for consistency

* up style
2024-06-05 13:59:09 -07:00
Guillaume LEGENDRE
7ebd359446 Update tailscale action to main (#8403) 2024-06-05 18:53:33 +05:30
Hzzone
d3881f35b7 Gligen training (#7906)
* add training code of gligen

* fix code quality tests.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-05 16:26:42 +04:00
Sayak Paul
48207d6689 [Scheduler] fix: EDM schedulers when using the exp sigma schedule. (#8385)
* fix: euledm when using the exp sigma schedule.

* fix-copies

* remove print.

* reduce friction

* yiyi's suggestioms
2024-06-04 19:31:43 -10:00
Sayak Paul
2f6f426f66 [Hunyuan] allow Hunyuan DiT to run under 6GB for GPU VRAM (#8399)
* allow hunyuan dit to run under 6GB for GPU VRAM

* add section in the docs/
2024-06-05 08:24:19 +04:00
Sayak Paul
a0542c1917 [LoRA] Remove legacy LoRA code and related adjustments (#8316)
* remove legacy code from load_attn_procs.

* finish first draft

* fix more.

* fix more

* add test

* add serialization support.

* fix-copies

* require peft backend for lora tests

* style

* fix test

* fix loading.

* empty

* address benjamin's feedback.
2024-06-05 08:15:30 +04:00
Sayak Paul
a8ad6664c2 [Hunyuan] feat: support chunked ff. (#8397)
feat: support chunked ff.
2024-06-05 08:12:18 +04:00
Sayak Paul
14f7b545bd [Hunyuan DiT] feat: enable fusing qkv projections when doing attention (#8396)
* feat: introduce qkv fusion for Hunyuan

* fix copies
2024-06-05 07:58:03 +04:00
leaps
07cd20041c Update code example in pipeline_stable_unclip_img2img.py EXAMPLE_DOC_STRING (#8401)
Update code example in pipeline_stable_unclip_img2img.py

Previous code caused an error when run
2024-06-04 17:22:46 -10:00
Sayak Paul
6ddbf6222c [Transformer2DModel] Handle norm_type safely while remapping (#8370)
* handle norm_type of transformer2d_model safely.

* log an info when old model class is being returned.

* Apply suggestions from code review

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* remove extra stuff

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-06-04 13:39:19 +04:00
Sayak Paul
3ff39e8e86 [HunyuanDiT] minor docs changes in hunyuandit (#8395)
minor docs changes in hunyuandit
2024-06-04 12:18:53 +04:00
townwish4git
6be43bd855 Fix AsymmetricAutoencoderKL forward (#8378) 2024-06-03 17:25:11 -10:00
Marçal Comajoan Cara
dc89434bdc Update transformer2d.md title (#8375)
* Update transformer2d.md title

For the other classes (e.g., UNet2DModel) the title of the documentation coincides with the name of the class, but that was not the case for Transformer2DModel.

* Update model docs titles for consistency with class names
2024-06-03 17:01:21 -07:00
Dhruv Nair
4d633bfe9a Update slow test actions (#8381)
* update

* update

* update

* update
2024-06-03 18:32:34 +05:30
XCL
174cf868ea Tencent Hunyuan Team - Updated Doc for HunyuanDiT (#8383)
* add hunyuandit doc

* update hunyuandit doc

* update hunyuandit 2d model

* update toctree.yml for hunyuandit
2024-06-03 14:02:46 +04:00
XCL
413604405f Tencent Hunyuan Team: add HunyuanDiT related updates (#8240)
* Hunyuan Team: add HunyuanDiT related updates


---------

Co-authored-by: XCLiu <liuxc1996@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-06-01 12:41:21 -10:00
39th president of the United States, probably
bc108e1533 Fix DREAM training (#8302)
Co-authored-by: Jimmy <39@🇺🇸.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-06-01 11:27:57 +04:00
Anton Obukhov
86555c9f59 Fix marigold documentation (#8372)
* rename prs-eth/marigold-lcm-v1-0 into prs-eth/marigold-depth-lcm-v1-0

* update image paths in https://huggingface.co/datasets/huggingface/documentation-images to use main branch

* fix relative paths to other diffusers pages

* Update docs/source/en/using-diffusers/marigold_usage.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-05-31 12:10:05 -10:00
Sayak Paul
983dec3bf7 [Core] Introduce class variants for Transformer2DModel (#7647)
* init for patches

* finish patched model.

* continuous transformer

* vectorized transformer2d.

* style.

* inits.

* fix-copies.

* introduce DiTTransformer2DModel.

* fixes

* use REMAPPING as suggested by @DN6

* better logging.

* add pixart transformer model.

* inits.

* caption_channels.

* attention masking.

* fix use_additional_conditions.

* remove print.

* debug

* flatten

* fix: assertion for sigma

* handle remapping for modeling_utils

* add tests for dit transformer2d

* quality

* placeholder for pixart tests

* pixart tests

* add _no_split_modules

* add docs.

* check

* check

* check

* check

* fix tests

* fix tests

* move Transformer output to modeling_output

* move errors better and bring back use_additional_conditions attribute.

* add unnecessary things from DiT.

* clean up pixart

* fix remapping

* fix device_map things in pixart2d.

* replace Transformer2DModel with appropriate classes in dit, pixart tests

* empty

* legacy mixin classes./

* use a remapping dict for fetching class names.

* change to specifc model types in the pipeline implementations.

* move _fetch_remapped_cls_from_config to modeling_loading_utils.py

* fix dependency problems.

* add deprecation note.
2024-05-31 13:40:27 +05:30
Dhruv Nair
f9fa8a868c Change checkpoint key used to identify CLIP models in single file checkpoints (#8319)
update
2024-05-31 11:20:31 +05:30
Jonah
05be622b1c Fix depth pipeline "input/weight type should be the same" error at fp16 (#8321)
Fix "input/weight type should be the same"

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-30 13:59:49 -10:00
satani99
352d96eb82 Modularize train_text_to_image_lora_sdxl inferencing during and after training in example (#8335)
* Modularized the train_lora_sdxl file

* Modularized the train_lora_sdxl file

* Modularized the train_lora_sdxl file

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-31 04:52:22 +05:30
Genius Patrick
3511a9623f fix(training): lr scheduler doesn't work properly in distributed scenarios (#8312) 2024-05-30 15:23:19 +05:30
Dhruv Nair
42cae93b94 Fix StableDiffusionPipeline when text_encoder=None (#8297)
* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-29 09:00:51 -10:00
Tolga Cangöz
a2ecce26bc Fix Copying Mechanism typo/bug (#8232)
* Fix copying mechanism typos

* fix copying mecha

* Revert, since they are in TODO

* Fix copying mechanism
2024-05-29 09:37:18 -07:00
Steven Liu
9e00b727ad [docs] Files and formats (#7874)
* files and formats

* fix callout

* feedback

* code sample

* feedback
2024-05-29 09:31:32 -07:00
Steven Liu
f7a4626f4b [docs] DeepFloyd training (#8224)
deepfloyd training

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-29 09:27:37 -07:00
Tolga Cangöz
f4a44b7707 Simplify platform_info assignment in diffusers-cli env (#8298)
chore: Simplify `platform_info` assignment
2024-05-29 17:57:42 +05:30
satani99
3bc3b48c10 Modularize train_text_to_image_lora SD inferencing during and after training in example (#8283)
* Modularized the train_lora file

* Modularized the train_lora file

* Modularized the train_lora file

* Modularized the train_lora file

* Modularized the train_lora file

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-29 10:08:02 +05:30
Sayak Paul
581d8aacf7 post release v0.28.0 (#8286)
* post release v0.28.0

* style
2024-05-29 07:13:22 +05:30
Sayak Paul
ba1bfac20b [Core] Refactor IPAdapterPlusImageProjection a bit (#7994)
* use IPAdapterPlusImageProjectionBlock in IPAdapterPlusImageProjection

* reposition IPAdapterPlusImageProjection

* refactor complete?

* fix heads param retrieval.

* update test dict creation method.
2024-05-29 06:30:47 +05:30
Sayak Paul
5edd0b34fa move vqmodel to models.autoencoders. (#8292)
move vqmodel to models.autoencoders.
2024-05-29 06:30:35 +05:30
Sayak Paul
3a28e36aa1 [Post release 0.28.0] remove deprecated blocks. (#8291)
* remove deprecated blocks.

* update the location paths.
2024-05-29 06:29:43 +05:30
Vladimir Mandic
3393c01c9d fix pixart-sigma negative prompt handling (#8299)
* fix negative prompt

* fix

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-28 13:10:35 -10:00
Steven Liu
1fa8dbc63a [docs] Outpaint (#7964)
* first draft

* edits
2024-05-28 14:42:03 -07:00
Steven Liu
0ab6dc0f23 [docs] Scheduler features (#7990)
* noise schedule

* sigmas and zero snr

* feedback

* feedback
2024-05-28 14:41:22 -07:00
Álvaro Somoza
b2030a249c Fix object has no attribute 'flush' when using without a console (#8271)
fix
2024-05-28 11:19:01 -10:00
Sajad Norouzi
67bef2027c Add Kohya fix to SD pipeline for high resolution generation (#7633)
add kohya high resolution fix.
2024-05-28 10:00:04 -10:00
Sayak Paul
aa676c641f change to yiyi's address. (#7981)
* change to yiyi's address.

* update to diffusers@huggingface.co
2024-05-28 08:28:55 -10:00
Sayak Paul
e6df8edadc [LoRA] attempt at fixing onetrainer lora. (#8242)
* attempt at fixing onetrainer lora.

* fix
2024-05-28 08:25:54 -10:00
Jiwook Han
80cfaebaa1 Fix typo in philosophy.md (#8303)
fix typo in philosophy.md
2024-05-28 10:38:48 -07:00
Álvaro Somoza
ba82414106 [docs] Add controlnet example to marigold (#8289)
* initial doc

* fix wrong LCM sentence

* implement binary colormap without requiring matplotlib
update section about Marigold for ControlNet
update formatting of marigold_usage.md

* fix indentation

---------

Co-authored-by: anton <anton.obukhov@gmail.com>
2024-05-28 11:58:06 -04:00
Sayak Paul
fe5f035f79 install wget. (#8285) 2024-05-27 18:06:07 +05:30
Anton Obukhov
b3d10d6d65 [Pipeline] Marigold depth and normals estimation (#7847)
* implement marigold depth and normals pipelines in diffusers core

* remove bibtex

* remove deprecations

* remove save_memory argument

* remove validate_vae

* remove config output

* remove batch_size autodetection

* remove presets logic
move default denoising_steps and processing_resolution into the model config
make default ensemble_size 1

* remove no_grad

* add fp16 to the example usage

* implement is_matplotlib_available
use is_matplotlib_available, is_scipy_available for conditional imports in the marigold depth pipeline

* move colormap, visualize_depth, and visualize_normals into export_utils.py

* make the denoising loop more lucid
fix the outputs to always be 4d tensors or lists of pil images
support a 4d input_image case
attempt to support model_cpu_offload_seq
move check_inputs into a separate function
change default batch_size to 1, remove any logic to make it bigger implicitly

* style

* rename denoising_steps into num_inference_steps

* rename input_image into image

* rename input_latent into latents

* remove decode_image
change decode_prediction to use the AutoencoderKL.decode method

* move clean_latent outside of progress_bar

* refactor marigold-reusable image processing bits into MarigoldImageProcessor class

* clean up the usage example docstring

* make ensemble functions members of the pipelines

* add early checks in check_inputs
rename E into ensemble_size in depth ensembling

* fix vae_scale_factor computation

* better compatibility with torch.compile
better variable naming

* move export_depth_to_png to export_utils

* remove encode_prediction

* improve visualize_depth and visualize_normals to accept multi-dimensional data and lists
remove visualization functions from the pipelines
move exporting depth as 16-bit PNGs functionality from the depth pipeline
update example docstrings

* do not shortcut vae.config variables

* change all asserts to raise ValueError

* rename output_prediction_type to output_type

* better variable names
clean up variable deletion code

* better variable names

* pass desc and leave kwargs into the diffusers progress_bar
implement nested progress bar for images and steps loops

* implement scale_invariant and shift_invariant flags in the ensemble_depth function
add scale_invariant and shift_invariant flags readout from the model config
further refactor ensemble_depth
support ensembling without alignment
add ensemble_depth docstring

* fix generator device placement checks

* move encode_empty_text body into the pipeline call

* minor empty text encoding simplifications

* adjust pipelines' class docstrings to explain the added construction arguments

* improve the scipy failure condition
add comments
improve docstrings
change the default use_full_z_range to True

* make input image values range check configurable in the preprocessor
refactor load_image_canonical in preprocessor to reject unknown types and return the image in the expected 4D format of tensor and on right device
support a list of everything as inputs to the pipeline, change type to PipelineImageInput
implement a check that all input list elements have the same dimensions
improve docstrings of pipeline outputs
remove check_input pipeline argument

* remove forgotten print

* add prediction_type model config

* add uncertainty visualization into export utils
fix NaN values in normals uncertainties

* change default of output_uncertainty to False
better handle the case of an attempt to export or visualize none

* fix `output_uncertainty=False`

* remove kwargs
fix check_inputs according to the new inputs of the pipeline

* rename prepare_latent into prepare_latents as in other pipelines
annotate prepare_latents in normals pipeline with "Copied from"
annotate encode_image in normals pipeline with "Copied from"

* move nested-capable `progress_bar` method into the pipelines
revert the original `progress_bar` method in pipeline_utils

* minor message improvement

* fix cpu offloading

* move colormap, visualize_depth, export_depth_to_16bit_png, visualize_normals, visualize_uncertainty to marigold_image_processing.py
update example docstrings

* fix missing comma

* change torch.FloatTensor to torch.Tensor

* fix importing of MarigoldImageProcessor

* fix vae offloading
fix batched image encoding
remove separate encode_image function and use vae.encode instead

* implement marigold's intial tests
relax generator checks in line with other pipelines
implement return_dict __call__ argument in line with other pipelines

* fix num_images computation

* remove MarigoldImageProcessor and outputs from import structure
update tests

* update docstrings

* update init

* update

* style

* fix

* fix

* up

* up

* up

* add simple test

* up

* update expected np input/output to be channel last

* move expand_tensor_or_array into the MarigoldImageProcessor

* rewrite tests to follow conventions - hardcoded slices instead of image artifacts
write more smoke tests

* add basic docs.

* add anton's contribution statement

* remove todos.

* fix assertion values for marigold depth slow tests

* fix assertion values for depth normals.

* remove print

* support AutoencoderTiny in the pipelines

* update documentation page
add Available Pipelines section
add Available Checkpoints section
add warning about num_inference_steps

* fix missing import in docstring
fix wrong value in visualize_depth docstring

* [doc] add marigold to pipelines overview

* [doc] add section "usage examples"

* fix an issue with latents check in the pipelines

* add "Frame-by-frame Video Processing with Consistency" section

* grammarly

* replace tables with images with css-styled images (blindly)

* style

* print

* fix the assertions.

* take from the github runner.

* take the slices from action artifacts

* style.

* update with the slices from the runner.

* remove unnecessary code blocks.

* Revert "[doc] add marigold to pipelines overview"

This reverts commit a505165150afd8dab23c474d1a054ea505a56a5f.

* remove invitation for new modalities

* split out marigold usage examples

* doc cleanup

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2024-05-27 17:21:49 +05:30
Dhruv Nair
b82f9f5666 Add zip package to doc builder image (#8284)
update
2024-05-27 15:50:00 +05:30
Sayak Paul
6a5ba1b719 [Workflows] add a more secure way to run tests from a PR. (#7969)
* add a more secure way to run tests from a PR.

* make pytest more secure.

* address dhruv's comments.

* improve validation check.

* Update .github/workflows/run_tests_from_a_pr.yml

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-27 13:47:50 +05:30
Dhaivat Bhatt
4d40c9140c Add details about 1-stage implementation in I2VGen-XL docs (#8282)
* Add details about 1-stage implementation

* Add details about 1-stage implementation
2024-05-27 09:56:32 +05:30
Tolga Cangöz
0ab63ff647 Fix CPU Offloading Usage & Typos (#8230)
* Fix typos

* Fix `pipe.enable_model_cpu_offload()` usage

* Fix cpu offloading

* Update numbers
2024-05-24 11:25:29 -07:00
Tolga Cangöz
db33af065b Fix a grammatical error in the raise messages (#8272)
Fix grammatical error
2024-05-24 11:15:00 -07:00
Yue Wu
1096f88e2b sampling bug fix in diffusers tutorial "basic_training.md" (#8223)
sampling bug fix in basic_training.md

In the diffusers basic training tutorial, setting the manual seed argument (generator=torch.manual_seed(config.seed)) in the pipeline call inside evaluate() function rewinds the dataloader shuffling, leading to overfitting due to the model seeing same sequence of training examples after every evaluation call. Using generator=torch.Generator(device='cpu').manual_seed(config.seed) avoids this.
2024-05-24 11:14:32 -07:00
Dhruv Nair
cef4a51223 Clean up from_single_file docs (#8268)
* update

* update
2024-05-24 17:43:51 +05:30
Lucain
edf5ba6a17 Respect resume_download deprecation V2 (#8267)
* Fix resume_downoad FutureWarning

* only resume download
2024-05-24 12:11:03 +02:00
Sayak Paul
9941f1f61b [Chore] run the documentation workflow in a custom container. (#8266)
run the documentation workflow in a custom container.
2024-05-24 15:10:02 +05:30
Yifan Zhou
46a9db0336 [Community Pipeline] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation (#8239)
* code and doc

* update paper link

* remove redundant codes

* add example video

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-24 14:44:20 +05:30
Dhruv Nair
370146e4e0 Use freedesktop_os_release() in diffusers cli for Python >=3.10 (#8235)
* update

* update
2024-05-24 13:30:40 +05:30
Dhruv Nair
5cd45c24bf Create custom container for doc builder (#8263)
* update

* update
2024-05-24 12:53:48 +05:30
Dhruv Nair
67b3fe0aae Fix resize issue in SVD pipeline with VideoProcessor (#8229)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-23 11:57:34 +05:30
Dhruv Nair
baab065679 Remove unnecessary single file tests for SD Cascade UNet (#7996)
update
2024-05-22 12:29:59 +05:30
BootesVoid
509741aea7 fix: Attribute error in Logger object (logger.warning) (#8183) 2024-05-22 12:29:11 +05:30
Lucain
e1df77ee1e Use HF_TOKEN env var in CI (#7993) 2024-05-21 14:58:10 +05:30
Steven Liu
fdb1baa05c [docs] VideoProcessor (#7965)
* fix?

* fix?

* fix
2024-05-21 08:18:21 +05:30
Vinh H. Pham
6529ee67ec Make VAE compatible to torch.compile() (#7984)
make VAE compatible to torch.compile()

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-20 13:43:59 -04:00
Sai-Suraj-27
df2bc5ef28 fix: Fixed few docstrings according to the Google Style Guide (#7717)
Fixed few docstrings according to the Google Style Guide.
2024-05-20 10:26:05 -07:00
Aleksei Zhuravlev
a7bf77fc28 Passing cross_attention_kwargs to StableDiffusionInstructPix2PixPipeline (#7961)
* Update pipeline_stable_diffusion_instruct_pix2pix.py

Add `cross_attention_kwargs` to `__call__` method of `StableDiffusionInstructPix2PixPipeline`, which are passed to UNet.

* Update documentation for pipeline_stable_diffusion_instruct_pix2pix.py

* Update docstring

* Update docstring

* Fix typing import
2024-05-20 13:14:34 -04:00
Junsong Chen
0f0defdb65 [docs] add doc for PixArtSigmaPipeline (#7857)
* 1. add doc for PixArtSigmaPipeline;

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
Co-authored-by: Bagheera <59658056+bghira@users.noreply.github.com>
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Hyoungwon Cho <jhw9811@korea.ac.kr>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Tolga Cangöz <46008593+standardAI@users.noreply.github.com>
Co-authored-by: Philip Pham <phillypham@google.com>
2024-05-20 12:40:57 -04:00
Nikita
19df9f3ec0 Update pipeline_controlnet_inpaint_sd_xl.py (#7983) 2024-05-20 12:24:49 -04:00
Jacob Marks
d6ca120987 Fix typo in "attention" (#7977) 2024-05-20 11:54:29 -04:00
Sayak Paul
fb7ae0184f [tests] fix Pixart Sigma tests (#7966)
* checking tests

* checking ii.

* remove prints.

* test_pixart_1024

* fix 1024.
2024-05-19 20:56:31 +05:30
Sayak Paul
70f8d4b488 remove unsafe workflow. (#7967) 2024-05-17 13:46:24 +05:30
Álvaro Somoza
6c60e430ee Consistent SDXL Controlnet callback tensor inputs (#7958)
* make _callback_tensor_inputs consistent between sdxl pipelines

* forgot this one

* fix failing test

* fix test_components_function

* fix controlnet inpaint tests
2024-05-16 07:15:10 -10:00
Alphin Jain
1221b28eac Fix AttributeError in train_lcm_distill_lora_sdxl_wds.py (#7923)
Fix conditional teacher model check in train_lcm_distill_lora_sdxl_wds.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-16 15:49:54 +05:30
Liang Hou
746f603b20 Fix the text tokenizer name in logger warning of PixArt pipelines (#7912)
Fix CLIP to T5 in logger warning
2024-05-15 18:49:29 -10:00
Sai-Suraj-27
2afea72d29 refactor: Refactored code by Merging isinstance calls (#7710)
* Merged isinstance calls to make the code simpler.

* Corrected formatting errors using ruff.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-15 18:33:19 -10:00
Sayak Paul
0f111ab794 [Workflows] add a workflow that can be manually triggered on a PR. (#7942)
* add a workflow that can be manually triggered on a PR.

* remove sudo

* add command

* small fixes.
2024-05-15 17:18:56 +05:30
Guillaume LEGENDRE
4dd7aaa06f move to GH hosted M1 runner (#7949) 2024-05-15 13:47:36 +05:30
Isamu Isozaki
d27e996ccd Adding VQGAN Training script (#5483)
* Init commit

* Removed einops

* Added default movq config for training

* Update explanation of prompts

* Fixed inheritance of discriminator and init_tracker

* Fixed incompatible api between muse and here

* Fixed output

* Setup init training

* Basic structure done

* Removed attention for quick tests

* Style fixes

* Fixed vae/vqgan styles

* Removed redefinition of wandb

* Fixed log_validation and tqdm

* Nothing commit

* Added commit loss to lookup_from_codebook

* Update src/diffusers/models/vq_model.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Adding perliminary README

* Fixed one typo

* Local changes

* Fixed main issues

* Merging

* Update src/diffusers/models/vq_model.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Testing+Fixed bugs in training script

* Some style fixes

* Added wandb to docs

* Fixed timm test

* get testing suite ready.

* remove return loss

* remove return_loss

* Remove diffs

* Remove diffs

* fix ruff format

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-15 08:47:12 +05:30
Sayak Paul
72780ff5b1 [tests] decorate StableDiffusion21PipelineSingleFileSlowTests with slow. (#7941)
decorate StableDiffusion21PipelineSingleFileSlowTests with slow.
2024-05-14 14:26:21 -10:00
Jingyang Zhang
69fdb8720f [Pipeline] Adding BoxDiff to community examples (#7947)
add boxdiff to community examples
2024-05-14 11:18:29 -10:00
Nikita
b2140a895b Fix added_cond_kwargs when using IP-Adapter in StableDiffusionXLControlNetInpaintPipeline (#7924)
Fix `added_cond_kwargs` when using IP-Adapter

Fix error when using IP-Adapter in pipeline and passing `ip_adapter_image_embeds` instead of `ip_adapter_image`

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-14 10:32:08 -10:00
Sayak Paul
e0e8c58f64 [Core] separate the loading utilities in modeling similar to pipelines. (#7943)
separate the loading utilities in modeling similar to pipelines.
2024-05-14 22:33:43 +05:30
Sayak Paul
cbea5d1725 update to use hf-workflows for reporting the Docker build statuses (#7938)
update to use hf-workflows for reporting
2024-05-14 09:25:13 +05:30
Tolga Cangöz
a1245c2c61 Expansion proposal of diffusers-cli env (#7403)
* Expand `diffusers-cli env`

* SafeTensors -> Safetensors

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Move `safetensors_version = "not installed"` to `else`

* Update `safetensors_version` checking

* Add GPU detection for Linux, Mac OS, and Windows

* Add accelerator detection to environment command

* Add is_peft_version to import_utils

* Update env.py

* Add `huggingface_hub` reference

* Add `transformers` reference

* Add reference for `huggingface_hub`

* Fix print statement in env.py for unusual OS

* Up

* Fix platform information in env.py

* up

* Fix import order in env.py

* ruff

* make style

* Fix platform system check in env.py

* Fix run method return type in env.py

* 🤗

* No need f-string

* Remove location info

* Remove accelerate config

* Refactor env.py to remove accelerate config

* feat: Add support for `bitsandbytes` library in environment command

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-14 08:20:24 +05:30
bssrdf
cdda94f412 fix VAE loading issue in train_dreambooth (#7632)
* fixed vae loading issue #7619

* rerun make style && make quality

* bring back model_has_vae and add change \ to / in config_file_name on windows os to make match work

* add missing import platform

* bring back import model_info

* make config_file_name OS independent

* switch to using Path.as_posix() to resolve OS dependence

* improve style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: bssrdf <bssrdf@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-14 08:19:53 +05:30
dependabot[bot]
5b830aa356 Bump transformers from 4.36.0 to 4.38.0 in /examples/research_projects/realfill (#7635)
Bump transformers in /examples/research_projects/realfill

Bumps [transformers](https://github.com/huggingface/transformers) from 4.36.0 to 4.38.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.36.0...v4.38.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-14 08:17:06 +05:30
Kohei
9e7bae9881 Update requirements.txt for text_to_image (#7892)
Update requirements.txt

If the datasets library is old, it will not read the metadata.jsonl and the label will default to an integer of type int.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-14 08:09:12 +05:30
rebel-kblee
b41ce1e090 fix multicontrolnet save_pretrained logic for compatibility (#7821)
fix multicontrolnet save_pretrained logic for compatibility

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-13 09:32:06 -10:00
Sayak Paul
95d3748453 [LoRA] Fix LoRA tests (side effects of RGB ordering) part ii (#7932)
* check

* check 2.

* update slices
2024-05-13 09:23:48 -10:00
Fabio Rigano
44aa9e566d fix AnimateDiff creation with a unet loaded with IP Adapter (#7791)
* Fix loading from_pipe

* Fix style

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-13 08:15:01 -10:00
Álvaro Somoza
fdb05f54ef Official callbacks (#7761) 2024-05-12 17:10:29 -10:00
HelloWorldBeginner
98ba18ba55 Add Ascend NPU support for SDXL. (#7916)
Co-authored-by: mhh001 <mahonghao1@huawei.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-12 13:34:23 +02:00
Sayak Paul
5bb38586a9 [Core] fix offload behaviour when device_map is enabled. (#7919)
fix offload behaviour when device_map is enabled.
2024-05-12 13:29:43 +02:00
Sai-Suraj-27
ec9e88139a fix: Fixed a wrong link to supported python versions in contributing.md file (#7638)
* Fixed a wrong link to python versions in contributing.md file.

* Updated the link to a permalink, so that it will permanently point to the specific line.
2024-05-12 13:21:18 +02:00
momo
e4f8dca9a0 add custom sigmas and timesteps for StableDiffusionXLControlNet pipeline (#7913)
add custom sigmas and timesteps
2024-05-11 23:33:19 -10:00
HelloWorldBeginner
0267c5233a fix bugs when using deepspeed in sdxl (#7917)
fix bugs when using deepspeed

Co-authored-by: mhh001 <mahonghao1@huawei.com>
2024-05-11 20:49:09 +02:00
Mark Van Aken
be4afa0bb4 #7535 Update FloatTensor type hints to Tensor (#7883)
* find & replace all FloatTensors to Tensor

* apply formatting

* Update torch.FloatTensor to torch.Tensor in the remaining files

* formatting

* Fix the rest of the places where FloatTensor is used as well as in documentation

* formatting

* Update new file from FloatTensor to Tensor
2024-05-10 09:53:31 -10:00
Sayak Paul
04f4bd54ea [Core] introduce videoprocessor. (#7776)
* introduce videoprocessor.

* fix quality

* address yiyi's feedback

* fix preprocess_video call.

* video_processor -> image_processor

* fix

* fix more.

* quality

* image_processor -> video_processor

* support List[List[PIL.Image.Image]]

* change to video_processor.

* documentation

* Apply suggestions from code review

* changes

* remove print.

* refactor video processor (part # 7776) (#7861)

* update

* update remove deprecate

* Update src/diffusers/video_processor.py

* update

* Apply suggestions from code review

* deprecate list of 5d for video and list of 4d for image + apply other feedbacks

* up

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* add doc.

* tensor2vid -> postprocess_video.

* refactor preprocess with preprocess_video

* set default values.

* empty commit

* more refactoring of prepare_latents in animatediff vid2vid

* checking documentation

* remove documentation for now.

* fix animatediff sdxl

* fix test failure [part of video processor PR] (#7905)

up

* remove preceed_with_frames.

* doc

* fix

* fix

* remove video input as a single-frame video.

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-10 21:02:36 +02:00
Sayak Paul
82be58c512 add missing image processors to the docs (#7910)
add missing processors.
2024-05-10 14:53:57 +02:00
Sayak Paul
6695635696 upgrade to python 3.10 in the Dockerfiles (#7893)
* upgrade to python 3.10

* fix

* try https://askubuntu.com/questions/1459694/can-not-find-python3-10-after-apt-get-installation

* fix

* up

* yes

* okay

* up

* up

* up

* up

* up

* check

* okay

* up

* i[

* fix
2024-05-10 14:29:08 +02:00
YiYi Xu
b934215d4c [scheduler] support custom timesteps and sigmas (#7817)
* support custom sigmas and timesteps, dpm euler

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-05-09 11:07:43 -10:00
YiYi Xu
5ed3abd371 fix _optional_components in StableCascadeCombinedPipeline (#7894)
* fix

* up
2024-05-09 06:32:55 -10:00
Dhruv Nair
1087a510b5 Set max parallel jobs on slow test runners (#7878)
* set max parallel

* update

* update

* update
2024-05-09 19:42:18 +05:30
Sayak Paul
305f2b4498 [Tests] fix things after #7013 (#7899)
* debugging

* save the resulting image

* check if order reversing works.

* checking values.

* up

* okay

* checking

* fix

* remove print
2024-05-09 16:05:35 +02:00
Dhruv Nair
cb0f3b49cb [Refactor] Better align from_single_file logic with from_pretrained (#7496)
* refactor unet single file loading a bit.

* retrieve the unet from create_diffusers_unet_model_from_ldm

* update

* update

* updae

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* tests

* update

* update

* update

* Update docs/source/en/api/single_file.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/single_file.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update docs/source/en/api/loaders/single_file.md

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/loaders/single_file.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update docs/source/en/api/loaders/single_file.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/loaders/single_file.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/loaders/single_file.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/loaders/single_file.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-09 19:00:19 +05:30
Tolga Cangöz
caf9e985df Fix several imports (#7712)
Fix imports
2024-05-09 07:34:44 +02:00
Tolga Cangöz
c1c42698c9 Remove dead code and fix f-string issue (#7720)
* Remove dead code

* PylancereportGeneralTypeIssues: Strings nested within an f-string cannot use the same quote character as the f-string prior to Python 3.12.

* Remove dead code
2024-05-08 13:15:28 -10:00
Pierre Dulac
75aab34675 Allow users to save SDXL LoRA weights for only one text encoder (#7607)
SDXL LoRA weights for text encoders should be decoupled on save

The method checks if at least one of unet, text_encoder and
text_encoder_2 lora weights are passed, which was not reflected in the
implentation.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-08 10:41:58 -10:00
YiYi Xu
35358a2dec fix offload test (#7868)
fix

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-08 07:59:08 -10:00
Aryan
818f760732 [Pipeline] AnimateDiff SDXL (#6721)
* update conversion script to handle motion adapter sdxl checkpoint

* add animatediff xl

* handle addition_embed_type

* fix output

* update

* add imports

* make fix-copies

* add decode latents

* update docstrings

* add animatediff sdxl to docs

* remove unnecessary lines

* update example

* add test

* revert conv_in conv_out kernel param

* remove unused param addition_embed_type_num_heads

* latest IPAdapter impl

* make fix-copies

* fix return

* add IPAdapterTesterMixin to tests

* fix return

* revert based on suggestion

* add freeinit

* fix test_to_dtype test

* use StableDiffusionMixin instead of different helper methods

* fix progress bar iterations

* apply suggestions from review

* hardcode flip_sin_to_cos and freq_shift

* make fix-copies

* fix ip adapter implementation

* fix last failing test

* make style

* Update docs/source/en/api/pipelines/animatediff.md

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* remove todo

* fix doc-builder errors

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-08 21:27:14 +05:30
Philip Pham
f29b93488d Check shape and remove deprecated APIs in scheduling_ddpm_flax.py (#7703)
`model_output.shape` may only have rank 1.

There are warnings related to use of random keys.

```
tests/schedulers/test_scheduler_flax.py: 13 warnings
  /Users/phillypham/diffusers/src/diffusers/schedulers/scheduling_ddpm_flax.py:268: FutureWarning: normal accepts a single key, but was given a key array of shape (1, 2) != (). Use jax.vmap for batching. In a future JAX version, this will be an error.
    noise = jax.random.normal(split_key, shape=model_output.shape, dtype=self.dtype)

tests/schedulers/test_scheduler_flax.py::FlaxDDPMSchedulerTest::test_betas
  /Users/phillypham/virtualenv/diffusers/lib/python3.9/site-packages/jax/_src/random.py:731: FutureWarning: uniform accepts a single key, but was given a key array of shape (1,) != (). Use jax.vmap for batching. In a future JAX version, this will be an error.
    u = uniform(key, shape, dtype, lo, hi)  # type: ignore[arg-type]
```
2024-05-08 13:57:19 +02:00
Tolga Cangöz
d50baf0c63 Fix image upcasting (#7858)
Fix image's upcasting before `vae.encode()` when using `fp16`

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-07 16:45:02 -10:00
Hyoungwon Cho
c2217142bd Modification on the PAG community pipeline (re) (#7876)
* edited_pag_implementation

* update

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-05-07 16:35:15 -10:00
Bagheera
8edaf3b79c 7879 - adjust documentation to use naruto dataset, since pokemon is now gated (#7880)
* 7879 - adjust documentation to use naruto dataset, since pokemon is now gated

* replace references to pokemon in docs

* more references to pokemon replaced

* Japanese translation update

---------

Co-authored-by: bghira <bghira@users.github.com>
2024-05-07 09:36:39 -07:00
Álvaro Somoza
23e091564f Fix for "no lora weight found module" with some loras (#7875)
* return layer weight if not found

* better system and test

* key example and typo
2024-05-07 13:54:57 +02:00
Steven Liu
0d23645bd1 [docs] Distilled inference (#7834)
* combine

* edits
2024-05-06 15:07:25 -07:00
Guillaume LEGENDRE
7fa3e5b0f6 Ci - change cache folder (#7867) 2024-05-06 17:55:24 +05:30
Steven Liu
49b959b540 [docs] LCM (#7829)
* lcm

* lcm lora

* fix

* fix hfoption

* edits
2024-05-03 16:08:27 -07:00
HelloWorldBeginner
58237364b1 Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed. (#7816)
* Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed.

* fix check code quality

* Decouple the NPU flash attention and make it an independent module.

* add doc and unit tests for npu flash attention.

---------

Co-authored-by: mhh001 <mahonghao1@huawei.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-03 08:14:34 -10:00
Dhruv Nair
3e35628873 Remove installing python again in container (#7852)
update
2024-05-03 15:09:15 +05:30
Lucain
6a479588db Respect resume_download deprecation (#7843)
* Deprecate resume_download

* align docstring with transformers

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-03 08:42:57 +02:00
Aritra Roy Gosthipaty
fa489eaed6 [Tests] reduce the model size in the blipdiffusion fast test (#7849)
reducing model size
2024-05-03 07:46:48 +05:30
Dhruv Nair
0d7c479023 Update deps for pipe test fetcher (#7838)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-02 20:36:47 +05:30
Guillaume LEGENDRE
ce97d7e19b Change GPU Runners (#7840)
* Move to new GPU Runners for slow tests

* Move to new GPU Runners for nightly tests
2024-05-02 18:48:46 +05:30
Guillaume LEGENDRE
44ba90caff move to new runners (#7839) 2024-05-02 14:53:38 +02:00
Dhruv Nair
3c85a57297 Update CI cache (#7832)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-02 14:03:35 +05:30
Dhruv Nair
03ca11318e Update download diff format tests (#7831)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-02 13:15:38 +05:30
Dhruv Nair
3ffa7b46e5 Fix hanging pipeline fetching (#7837)
update
2024-05-02 13:08:57 +05:30
yunseong Cho
c1b2a89e34 Fix key error for dictionary with randomized order in convert_ldm_unet_checkpoint (#7680)
fix key error for different order

Co-authored-by: yunseong <yunseong.cho@superlabs.us>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-02 10:29:55 +05:30
Aritra Roy Gosthipaty
435d37ce5a [Tests] reduce the model size in the audioldm fast test (#7833)
chore: initial size reduction of models
2024-05-02 06:03:52 +05:30
YiYi Xu
5915c2985d [ip-adapter] fix ip-adapter for StableDiffusionInstructPix2PixPipeline (#7820)
update prepare_ip_adapter_ for pix2pix
2024-05-01 06:27:43 -10:00
YiYi Xu
21a7ff12a7 update the logic of is_sequential_cpu_offload (#7788)
* up

* add comment to the tests + fix dit

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-01 06:25:57 -10:00
Sayak Paul
8909ab4b19 [Tests] fix: device map tests for models (#7825)
* fix: device module tests

* remove patch file

* Empty-Commit
2024-05-01 18:45:47 +05:30
Dhruv Nair
c1edb03c37 Fix for pipeline slow test fetcher (#7824)
* update

* update
2024-05-01 17:36:54 +05:30
Steven Liu
0d08370263 [docs] Community pipelines (#7819)
* community pipelines

* feedback

* consolidate
2024-04-30 14:10:14 -07:00
Tolga Cangöz
b8ccb46259 Fix CPU offload in docstring (#7827)
Fix cpu offload
2024-04-30 10:53:27 -07:00
Dhruv Nair
725ead2f5e SSH Runner Workflow Update (#7822)
* add debug workflow

* update
2024-04-30 20:14:18 +05:30
Linoy Tsaban
26a7851e1e Add B-Lora training option to the advanced dreambooth lora script (#7741)
* add blora

* add blora

* add blora

* add blora

* little changes

* little changes

* remove redundancies

* fixes

* add B LoRA to readme

* style

* inference

* defaults + path to loras+ generation

* minor changes

* style

* minor changes

* minor changes

* blora arg

* added --lora_unet_blocks

* style

* Update examples/advanced_diffusion_training/README.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* add commit hash to B-LoRA repo cloneing

* change inference, remove cloning

* change inference, remove cloning
add section about configureable unet blocks

* change inference, remove cloning
add section about configureable unet blocks

* Apply suggestions from code review

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-30 09:46:30 +05:30
Sayak Paul
3fd31eef51 [Core] introduce _no_split_modules to ModelMixin (#6396)
* introduce _no_split_modules.

* unnecessary spaces.

* remove unnecessary kwargs and style

* fix: accelerate imports.

* change to _determine_device_map

* add the blocks that have residual connections.

* add: CrossAttnUpBlock2D

* add: testin

* style

* line-spaces

* quality

* add disk offload test without safetensors.

* checking disk offloading percentages.

* change model split

* add: utility for checking multi-gpu requirement.

* model parallelism test

* splits.

* splits.

* splits

* splits.

* splits.

* splits.

* offload folder to test_disk_offload_with_safetensors

* add _no_split_modules

* fix-copies
2024-04-30 08:46:51 +05:30
Aritra Roy Gosthipaty
b02e2113ff [Tests] reduce the model size in the amused fast test (#7804)
* chore: reducing model sizes

* chore: shrinks further

* chore: shrinks further

* chore: shrinking model for img2img pipeline

* chore: reducing size of model for inpaint pipeline

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-30 08:11:26 +05:30
Aritra Roy Gosthipaty
21f023ec1a [Tests] reduce the model size in the ddpm fast test (#7797)
* chore: reducing unet size for faster tests

* review suggestions

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-30 08:11:03 +05:30
Aritra Roy Gosthipaty
31d9f9ea77 [Tests] reduce the model size in the ddim fast test (#7803)
chore: reducing model size for ddim fast pipeline

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-30 07:54:38 +05:30
Clint Adams
f53352f750 Set main_input_name in StableDiffusionSafetyChecker to "clip_input" (#7500)
FlaxStableDiffusionSafetyChecker sets main_input_name to "clip_input".
This makes StableDiffusionSafetyChecker consistent.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-29 11:39:59 -10:00
RuiningLi
83ae24ce2d Added get_velocity function to EulerDiscreteScheduler. (#7733)
* Added get_velocity function to EulerDiscreteScheduler.

* Fix white space on blank lines

* Added copied from statement

* back to the original.

---------

Co-authored-by: Ruining Li <ruining@robots.ox.ac.uk>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-29 10:32:13 -10:00
jschoormans
8af793b2d4 Adding TextualInversionLoaderMixin for the controlnet_inpaint_sd_xl pipeline (#7288)
* added TextualInversionMixIn to controlnet_inpaint_sd_xl pipeline


---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-29 09:00:53 -10:00
Dhruv Nair
eb96ff0d59 Safetensor loading in AnimateDiff conversion scripts (#7764)
* update

* update
2024-04-29 17:36:50 +05:30
Yushu
a38dd79512 [Pipeline] Fix error of SVD pipeline when num_videos_per_prompt > 1 (#7786)
swap the order for do_classifier_free_guidance concat with repeat

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-29 16:24:16 +05:30
Dhruv Nair
b1c5817a89 Add debugging workflow (#7778)
add debug workflow

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-29 13:44:39 +05:30
Nilesh
235d34cf56 Check for latents, before calling prepare_latents - sdxlImg2Img (#7582)
* Check for latents, before calling prepare_latents - sdxlImg2Img

* Added latents check for all the img2img pipeline

* Fixed silly mistake while checking latents as None
2024-04-28 14:53:29 -10:00
Jenyuan-Huang
5029673987 Update InstantStyle usage in IP-Adapter documentation (#7806)
* enable control ip-adapter per-transformer block on-the-fly


---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: ResearcherXman <xhs.research@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-28 10:34:57 -10:00
Sayak Paul
56bd7e67c2 [Scheduler] introduce sigma schedule. (#7649)
* introduce sigma schedule.

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* address yiyi

* update docstrings.

* implement the schedule for EDMDPMSolverMultistepScheduler

---------

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2024-04-27 07:40:35 +05:30
39th president of the United States, probably
9d16daaf64 Add DREAM training (#6381)
A new function compute_dream_and_update_latents has been added to the
training utilities that allows you to do DREAM rectified training in line
with the paper https://arxiv.org/abs/2312.00210. The method can be used
with an extra argument in the train_text_to_image.py script.

Co-authored-by: Jimmy <39@🇺🇸.com>
2024-04-27 07:19:15 +05:30
Fabio Rigano
8e4ca1b6b2 [Docs] Update image masking and face id example (#7780)
* [Docs] Update image masking and face id example

* Update docs

* Fix docs
2024-04-26 12:51:11 -10:00
Beinsezii
0d2d424fbe Add PixArtSigmaPipeline to AutoPipeline mapping (#7783) 2024-04-26 09:10:20 -10:00
Steven Liu
e24e54fdfa [docs] Fix AutoPipeline docstring (#7779)
fix

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-26 10:09:36 -07:00
btlorch
ebc99a77aa Convert RGB to BGR for the SDXL watermark encoder (#7013)
* Convert channel order to BGR for the watermark encoder. Convert the watermarked BGR images back to RGB. Fixes #6292

* Revert channel order before stacking images to overcome limitations that negative strides are currently not supported

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-25 14:44:53 -10:00
Steven Liu
fa750a15bd [docs] Refactor image quality docs (#7758)
* refactor

* code snippets

* fix path

* fix path in guide

* code outputs

* align toctree title

* title

* fix title
2024-04-25 16:55:35 -07:00
Steven Liu
181688012a [docs] Reproducible pipelines (#7769)
* reproducibility

* feedback

* feedback

* fix path

* github link
2024-04-25 16:15:12 -07:00
Sayak Paul
142f353e1c Fix lora device test (#7738)
* fix lora device test

* fix more.

* fix more/

* quality

* empty

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-25 18:05:27 +05:30
Sayak Paul
b833d0fc80 [Tests] mark UNetControlNetXSModelTests::test_forward_no_control to be flaky (#7771)
decorate UNetControlNetXSModelTests::test_forward_no_control with is_flaky
2024-04-25 07:29:04 +05:30
Sayak Paul
e963621649 [PixArt] fix small nits in pixart sigma (#7767)
fix small nits in pixart sigma
2024-04-25 06:37:35 +05:30
Junsong Chen
39215aa30e PixArt-Sigma Implementation (#7654)
* support PixArt-DMD

---------

Co-authored-by: jschen <chenjunsong4@h-partners.com>
Co-authored-by: badayvedat <badayvedat@gmail.com>
Co-authored-by: Vedat Baday <54285744+badayvedat@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-04-23 22:33:08 -10:00
Dhruv Nair
9ef43f38d4 Fix test for consistency decoder. (#7746)
update
2024-04-24 12:28:11 +05:30
Dhruv Nair
88018fcf20 Fix failing VAE tiling test (#7747)
update
2024-04-24 12:27:45 +05:30
Steven Liu
7404f1e9dc [docs] Clean up toctree (#7715)
* toctree

* optim

* feedback

* improve overview
2024-04-23 09:30:33 -07:00
Sayak Paul
5a69227863 [Metadat utils] fix: json lines ordering. (#7744)
fix: json lines ordering.
2024-04-23 14:32:30 +05:30
Sai-Suraj-27
fc9fecc217 fix: Fixed a wrong decorator by modifying it to @classmethod (#7653)
* Fixed wrong decorator by modifying it to @classmethod.

* Updated the method and it's argument.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-22 14:41:35 -10:00
Fabio Rigano
065f251766 Restore AttnProcessor2_0 in unload_ip_adapter (#7727)
* Restore AttnProcessor2_0 in unload_ip_adapter

* Fix style

* Update test

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-22 13:59:03 -10:00
Jenyuan-Huang
21c747fa0f Support InstantStyle (#7668)
* enable control ip-adapter per-transformer block on-the-fly

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: ResearcherXman <xhs.research@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-22 13:20:19 -10:00
Phil Butler
09129842e7 Remove redundant lines (#7396)
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-22 09:32:16 -10:00
Steven Liu
33b363edfa [docs] AutoPipeline (#7714)
* autopipeline

* edits

* feedback

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-22 10:15:07 -07:00
Dhruv Nair
a9dd86029e Fix Kandinksy V22 tests (#7699)
update
2024-04-22 15:41:59 +05:30
Dhruv Nair
9100652494 Update Wuerschten Test (#7700)
update
2024-04-22 15:41:39 +05:30
Abhinav Gopal
d1e3f489e9 Animatediff Controlnet Community Pipeline IP Adapter Fix (#7413)
* fixed encode_image function signature in controlnet animatediff

* copied encode_image from stable diffusion pipeline

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-19 15:35:07 -10:00
Guillaume LEGENDRE
ae05050db9 fix/add tailscale key in case of failure (#7719)
add tailscale key in case of failure
2024-04-19 10:56:40 +02:00
Sai-Suraj-27
db969cc16d fix: Fixed type annotations for compatability with python 3.8 (#7648)
* Fixed type annotations for compatability with python 3.8

* Add required imports.
2024-04-18 19:34:09 -10:00
Dhruv Nair
3cfe187dc7 Cleanup ControlnetXS (#7701)
* update

* update
2024-04-18 19:32:00 -10:00
Dhruv Nair
90250d9e48 Cast height, width to int inside prepare latents (#7691)
update
2024-04-18 19:30:39 -10:00
YiYi Xu
e5674015f3 adding back test_conversion_when_using_device_map (#7704)
* style


* Fix device map nits (#7705)


---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-18 19:21:32 -10:00
Fabio Rigano
b5c8b555d7 Move IP Adapter Face ID to core (#7186)
* Switch to peft and multi proj layers

* Move Face ID loading and inference to core

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-18 14:13:27 -10:00
Guillaume LEGENDRE
e23c27e905 Add tailscale action to push_test (#7709) 2024-04-18 18:48:39 +05:30
Steven Liu
7635d3d37f [docs] Pipeline loading (#7684)
* pipelines

* schedulers and models

* community pipelines

* feedback
2024-04-17 15:42:27 -07:00
Wentian
9132ce7c58 [Docs] Update TGATE in section optimization. (#7698)
Update tgate.md
2024-04-17 09:37:24 -07:00
Sayak Paul
30c977d1f5 [Workflows] remove installation of redundant modules from flax PR tests (#7662)
remove installation of redundant modules from flax PR tests
2024-04-17 15:16:04 +05:30
Dhruv Nair
f0fa17dd8e Don't install PEFT with UV in slow tests (#7697)
* update

* update
2024-04-17 15:10:38 +05:30
Sai-Suraj-27
c726d02beb fix: Updated ruff configuration to avoid deprecated configuration warning (#7637)
Updated ruff configuration to avoid depreceated config.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-16 15:02:55 -10:00
Wentian
a68503f221 [Docs] Add TGATE in section optimization (#7639)
* Create tgate.md

* Update _toctree.yml

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update tgate.md

* Update tgate.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-04-16 17:58:27 -07:00
Sayak Paul
9d50f7eec1 [Core] is_cosxl_edit arg in SDXL ip2p. (#7650)
* is_cosxl_edit arg in SDXL ip2p.

* Empty-Commit

Co-authored-by: Yiyi Xu <yixu310@gmail.com>

* doc

* remove redundant logic.

* reflect drhuv's comments.

---------

Co-authored-by: Yiyi Xu <yixu310@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-16 22:15:55 +05:30
UmerHA
fda1531d8a Fixing implementation of ControlNet-XS (#6772)
* CheckIn - created DownSubBlocks

* Added extra channels, implemented subblock fwd

* Fixed connection sizes

* checkin

* Removed iter, next in forward

* Models for SD21 & SDXL run through

* Added back pipelines, cleared up connections

* Cleaned up connection creation

* added debug logs

* updated logs

* logs: added input loading

* Update umer_debug_logger.py

* log: Loading hint

* Update umer_debug_logger.py

* added logs

* Changed debug logging

* debug: added more logs

* Fixed num_norm_groups

* Debug: Logging all of SDXL input

* Update umer_debug_logger.py

* debug: updated logs

* checkim

* Readded tests

* Removed debug logs

* Fixed Slow Tests

* Added value ckecks | Updated model_cpu_offload_seq

* accelerate-offloading works ; fast tests work

* Made unet & addon explicit in controlnet

* Updated slow tests

* Added dtype/device to ControlNetXS

* Filled in test model paths

* Added image_encoder/feature_extractor to XL pipe

* Fixed fast tests

* Added comments and docstrings

* Fixed copies

* Added docs ; Updates slow tests

* Moved changes to UNetMidBlock2DCrossAttn

* tiny cleanups

* Removed stray prints

* Removed ip adapters + freeU

- Removed ip adapters + freeU as they don't make sense for ControlNet-XS
- Fixed imports of UNet components

* Fixed test_save_load_float16

* Make style, quality, fix-copies

* Changed loading/saving API for ControlNetXS

- Changed loading/saving API for ControlNetXS
- other small fixes

* Removed ControlNet-XS from research examples

* Make style, quality, fix-copies

* Small fixes

- deleted ControlNetXSModel.init_original
- added time_embedding_mix to StableDiffusionControlNetXSPipeline .from_pretrained / StableDiffusionXLControlNetXSPipeline.from_pretrained
- fixed copy hints

* checkin May 11 '23

* CheckIn Mar 12 '24

* Fixed tests for SD

* Added tests for UNetControlNetXSModel

* Fixed SDXL tests

* cleanup

* Delete Pipfile

* CheckIn Mar 20

Started replacing sub blocks  by `ControlNetXSCrossAttnDownBlock2D` and `ControlNetXSCrossAttnUplock2D`

* check-in Mar 23

* checkin 24 Mar

* Created init for UNetCnxs and CnxsAddon

* CheckIn

* Made from_modules, from_unet and no_control work

* make style,quality,fix-copies & small changes

* Fixed freezing

* Added gradient ckpt'ing; fixed tests

* Fix slow tests(+compile) ; clear naming confusion

* Don't create UNet in init ; removed class_emb

* Incorporated review feedback

- Deleted get_base_pipeline /  get_controlnet_addon for pipes
- Pipes inherit from StableDiffusionXLPipeline
- Made module dicts for cnxs-addon's down/mid/up classes
- Added support for qkv fusion and freeU

* Make style, quality, fix-copies

* Implemented review feedback

* Removed compatibility check for vae/ctrl embedding

* make style, quality, fix-copies

* Delete Pipfile

* Integrated review feedback

- Importing ControlNetConditioningEmbedding now
- get_down/mid/up_block_addon now outside class
- renamed `do_control` to `apply_control`

* Reduced size of test tensors

For this, added `norm_num_groups` as parameter everywhere

* Renamed cnxs-`Addon` to cnxs-`Adapter`

- `ControlNetXSAddon` -> `ControlNetXSAdapter`
- `ControlNetXSAddonDownBlockComponents` -> `DownBlockControlNetXSAdapter`, and similarly for mid/up
- `get_mid_block_addon` -> `get_mid_block_adapter`, and similarly for mid/up

* Fixed save_pretrained/from_pretrained bug

* Removed redundant code

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-16 21:56:20 +05:30
Sayak Paul
cf6e0407e0 don't install peft from the source with uv for now. (#7679) 2024-04-15 09:33:02 +05:30
Sayak Paul
1c000d46e1 fix: metadata token (#7631) 2024-04-15 08:32:27 +05:30
Sayak Paul
08bf754507 make docker-buildx mandatory. (#7652) 2024-04-13 07:26:34 +05:30
kabachuha
2f23437618 Add (Scheduled) Pseudo-Huber Loss training scripts to research projects (#7527)
* add scheduled pseudo-huber loss training scripts

See #7488

* add reduction modes to huber loss

* [DB Lora] *2 multiplier to huber loss cause of 1/2 a^2 conv.

pairing of c6495def1f

* [DB Lora] add option for smooth l1 (huber / delta)

Pairing of dd22958caa

* [DB Lora] unify huber scheduling

Pairing of 19a834c3ab

* [DB Lora] add snr huber scheduler

Pairing of 47fb1a6854

* fixup examples link

* use snr schedule by default in DB

* update all huber scripts with snr

* code quality

* huber: make style && make quality

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-13 07:26:08 +05:30
Benjamin Bossan
2523390c26 FIX Setting device for DoRA parameters (#7655)
Fix a bug that causes the the call to set_lora_device to ignore the DoRA
parameters.
2024-04-12 13:55:46 +02:00
Sai-Suraj-27
279de3c3ff fix: Replaced deprecated logger.warn with logger.warning (#7643)
Fixed deprecated logger.warn with logger.warning.
2024-04-11 09:43:01 -10:00
Yiqin Zhao
8e14535708 Fixed YAML loading. (#7579) 2024-04-11 09:08:42 -10:00
dg845
0bee4d336b LCM Distill Scripts Fix Bug when Initializing Target U-Net (#6848)
* Initialize target_unet from unet rather than teacher_unet so that we correctly add time_embedding.cond_proj if necessary.

* Use UNet2DConditionModel.from_config to initialize target_unet from unet's config.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-11 07:52:12 -10:00
Steven Munn
42f25d601a Skip PEFT LoRA Scaling if the scale is 1.0 (#7576)
* Skip scaling if scale is identity

* move check for weight one to scale and unscale lora

* fix code style/quality

* Empty-Commit

---------

Co-authored-by: Steven Munn <stevenjmunn@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Munn <5297082+stevenjlm@users.noreply.github.com>
2024-04-11 11:02:31 +05:30
Sayak Paul
33c5d125cb [Core] fix img2img pipeline for Playground (#7627)
* playground vae encoding should use std and mean of the vae.

* style.

* fix-copies.
2024-04-11 09:07:38 +05:30
YiYi Xu
aa1f00fd01 Fix cpu offload related slow tests (#7618)
* fix

* up

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-04-10 14:53:45 -10:00
Steven Liu
d95b993427 [docs] T2I (#7623)
* refactor t2i

* add code snippets
2024-04-10 17:10:41 -07:00
Steven Liu
1d480298c1 [docs] Prompt enhancer (#7565)
* prompt enhance

* edits

* align titles

* feedback

* feedback

* feedback

* link to style
2024-04-10 16:09:06 -07:00
Sayak Paul
b2323aa2b7 [Tests] reduce the model sizes in the SD fast tests (#7580)
* give it a shot.

* print.

* correct assertion.

* gather results from the rest of the tests.

* change the assertion values where needed.

* remove print statements.
2024-04-10 11:36:28 -10:00
satani99
37e9d695af Modularize instruct_pix2pix SD inferencing during and after training in examples (#7603)
* Modularize instruct_pix2pix code

* quality check

* quality check

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-10 11:19:16 +05:30
Sayak Paul
a402431de0 [docs] remove duplicate tip block. (#7625)
remove duplicate tip block.
2024-04-10 10:31:11 +05:30
IDKiro
b99b1617cf add the option of upsample function for tiny vae (#7604)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-10 09:27:39 +05:30
Sayak Paul
3e4a6bd2d4 [Core] add "balanced" device_map support to pipelines (#6857)
* get device <-> component mapping when using multiple gpus.

* condition the device_map bits.

* relax condition

* device_map progress.

* device_map enhancement

* some cleaning up and debugging

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* incorporate suggestions from PR.

* remove multi-gpu condition for now.

* guard check the component -> device mapping

* fix: device_memory variable

* dispatching transformers model to have force_hooks=True

* better guarding for transformers device_map

* introduce support balanced_low_memory and balanced_ultra_low_memory.

* remove device_map patch.

* fix: intermediate variable scoping.

* fix: condition in cpu offload.

* fix: flax class restrictions.

* remove modifications from cpu_offload and model_offload

* incorporate changes.

* add a simple forward pass test

* add: torch_device in get_inputs()

* add: tests

* remove print

* safe-guard to(), model offloading and cpu offloading when balanced is used as a device_map.

* style

* remove .

* safeguard device_map with more checks and remove invalid device_mapping strategues.

* make  a class attribute and adjust tests accordingly.

* fix device_map check

* fix test

* adjust comment

* fix: device_map attribute

* fix: dispatching.

* max_memory test for pipeline

* version guard the tests

* fix guard.

* address review feedback.

* reset_device_map method.

* add: test for reset_hf_device_map

* fix a couple things.

* add reset_device_map() in the error message.

* add tests for checking reset_device_map doesn't have unintended consequences.

* fix reset_device_map and offloading tests.

* create _get_final_device_map utility.

* hf_device_map -> _hf_device_map

* add documentation

* add notes suggested by Marc.

* styling.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* move updates within gpu condition.

* other docs related things

* note on ignore a device not specified in .

* provide a suggestion if device mapping errors out.

* fix: typo.

* _hf_device_map -> hf_device_map

* Empty-Commit

* add: example hf_device_map.

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-04-10 08:59:05 +05:30
Sayak Paul
c827e94da0 [Workflows] remove installation of libsndfile1-dev and libgl1 from workflows (#7543)
* remove libsndfile1-dev and libgl1 from workflows and ensure that re present in the respective dockerfiles.

* change to self-hosted runner; let's see 🤞

* add libsndfile1-dev libgl1 for now

* use self-hosted runners for building and push too.
2024-04-10 08:34:56 +05:30
Sayak Paul
44f6b859bf [Core] refactor transformer_2d forward logic into meaningful conditions. (#7489)
* refactor transformer_2d forward logic into meaningful conditions.

* Empty-Commit

* fix: _operate_on_patched_inputs

* fix: _operate_on_patched_inputs

* check

* fix: patch output computation block.

* fix: _operate_on_patched_inputs.

* remove print.

* move operations to blocks.

* more readability neats.

* empty commit

* Apply suggestions from code review

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Revert "Apply suggestions from code review"

This reverts commit 12178b1aa0.

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-10 08:33:19 +05:30
Sayak Paul
ac7ff7d4a3 add utilities for updating diffusers pipeline metadata. (#7573)
* add utilities for updating diffusers pipeline metadata.

* style

* remove first empty line
2024-04-10 08:28:49 +05:30
Fabio Rigano
a0cf607667 Multi-image masking for single IP Adapter (#7499)
* Support multiimage masking

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-09 09:20:57 -10:00
YiYi Xu
a341b536a8 disable test_conversion_when_using_device_map (#7620)
* disable test

* update

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-04-09 09:01:19 -10:00
Christopher Beckham
8e46d97cd8 Add missing restore() EMA call in train SDXL script (#7599)
* Restore unet params back to normal from EMA when validation call is finished

* empty commit

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-09 18:07:55 +05:30
Junjie
7e808e768a [Docs] fix bugs in callback docs (#7594) 2024-04-08 08:46:30 -10:00
w4ffl35
7e39516627 Allow more arguments to be passed to convert_from_ckpt (#7222)
Allow safety and feature extractor arguments to be passed to convert_from_ckpt

Allows management of safety checker and feature extractor
from outside of the convert ckpt class.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-08 10:13:48 +05:30
Nguyễn Công Tú Anh
56a76082ed Add AudioLDM2 TTS (#5381)
* add audioldm2 tts

* change gpt2 max new tokens

* remove unnecessary pipeline and class

* add TTS to AudioLDM2Pipeline

* add TTS docs

* delete unnecessary file

* remove unnecessary import

* add audioldm2 slow testcase

* fix code quality

* remove AudioLDMLearnablePositionalEmbedding

* add variable check vits encoder

* add use_learned_position_embedding

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-08 10:11:24 +05:30
YiYi Xu
6133d98ff7 [IF| add set_begin_index for all IF pipelines (#7577)
add set_begin_index for all if pipelines
2024-04-05 06:54:07 -10:00
Sayak Paul
1c60e094de [Tests] reduce block sizes of UNet and VAE tests (#7560)
* reduce block sizes for unet1d.

* reduce blocks for unet_2d.

* reduce block size for unet_motion

* increase channels.

* correctly increase channels.

* reduce number of layers in unet2dconditionmodel tests.

* reduce block sizes for unet2dconditionmodel tests

* reduce block sizes for unet3dconditionmodel.

* fix: test_feed_forward_chunking

* fix: test_forward_with_norm_groups

* skip spatiotemporal tests on MPS.

* reduce block size in AutoencoderKL.

* reduce block sizes for vqmodel.

* further reduce block size.

* make style.

* Empty-Commit

* reduce sizes for ConsistencyDecoderVAETests

* further reduction.

* further block reductions in AutoencoderKL and AssymetricAutoencoderKL.

* massively reduce the block size in unet2dcontionmodel.

* reduce sizes for unet3d

* fix tests in unet3d.

* reduce blocks further in motion unet.

* fix: output shape

* add attention_head_dim to the test configuration.

* remove unexpected keyword arg

* up a bit.

* groups.

* up again

* fix
2024-04-05 10:08:32 +05:30
UmerHA
71f49a5d2a Skip test_freeu_enabled on MPS (#7570)
* Skip `test_freeu_enabled ` on MPS

* Small fixes

- import skip_mps correctly
- disable all instances of test_freeu_enabled

* Empty commit to trigger tests

* Empty commit to trigger CI
2024-04-04 12:16:04 +02:00
Abhinav Gopal
35db2fdea9 Update pipeline_animatediff_video2video.py (#7457)
* Update pipeline_animatediff_video2video.py

* commit with test for whether latent input can be passed into animatediffvid2vid
2024-04-03 19:34:28 +05:30
Sayak Paul
ad55ce6100 [Chore] increase number of workers for the tests. (#7558)
* increase number of workers for the tests.

* move to beefier runner.

* improve the fast push tests too.

* use a beefy machine for pytorch pipeline tests

* up the number of workers further.
2024-04-03 17:11:42 +05:30
Sayak Paul
a9a5b14f35 [Core] refactor transformers 2d into multiple init variants. (#7491)
* refactor transformers 2d into multiple legacy variants.

* fix: init.

* fix recursive init.

* add inits.

* make transformer block creation more modular.

* complete refactor.

* remove forward

* debug

* remove legacy blocks and refactor within the module itself.

* remove print

* guard caption projection

* remove fetcher.

* reduce the number of args.

* fix: norm_type

* group variables that are shared.

* remove _get_transformer_blocks

* harmonize the init function signatures.

* transformer_blocks to common

* repeat .
2024-04-03 12:56:17 +05:30
Beinsezii
aa19025989 UniPC Multistep add rescale_betas_zero_snr (#7531)
* UniPC Multistep add `rescale_betas_zero_snr`

Same patch as DPM and Euler with the patched final alpha cumprod

BF16 doesn't seem to break down, I think cause UniPC upcasts during some
phases already? We could still force an upcast since it only
loses ≈ 0.005 it/s for me but the difference in output is very small. A
better endeavor might upcasting in step() and removing all the other
upcasts elsewhere?

* UniPC ZSNR UT

* Re-add `rescale_betas_zsnr` doc oops
2024-04-02 17:23:55 -10:00
Beinsezii
19ab04ff56 UniPC Multistep fix tensor dtype/device on order=3 (#7532)
* UniPC UTs iterate solvers on FP16

It wasn't catching errs on order==3. Might be excessive?

* UniPC Multistep fix tensor dtype/device on order=3

* UniPC UTs Add v_pred to fp16 test iter

For completions sake. Probably overkill?
2024-04-02 15:41:29 -10:00
Sayak Paul
4a34307702 add: utility to format our docs too 📜 (#7314)
* add: utility to format our docs too 📜

* debugging saga

* fix: message

* checking

* should be fixed.

* revert pipeline_fixture

* remove empty line

* make style

* fix: setup.py

* style.
2024-04-02 20:49:43 +05:30
Bagheera
8e963d1c2a 7529 do not disable autocast for cuda devices (#7530)
* 7529 do not disable autocast for cuda devices

* Remove typecasting error check for non-mps platforms, as a correct autocast implementation makes it a non-issue

* add autocast fix to other training examples

* disable native_amp for dreambooth (sdxl)

* disable native_amp for pix2pix (sdxl)

* remove tests from remaining files

* disable native_amp on huggingface accelerator for every training example that uses it

* convert more usages of autocast to nullcontext, make style fixes

* make style fixes

* style.

* Empty-Commit

---------

Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-02 20:15:06 +05:30
Sayak Paul
2b04ec2ff7 [Tests] Speed up fast pipelines part II (#7521)
* start printing the tensors.

* print full throttle

* set static slices for 7 tests.

* remove printing.

* flatten

* disable test for controlnet

* what happens when things are seeded properly?

* set the right value

* style./

* make pia test fail to check things

* print.

* fix pia.

* checking for animatediff.

* fix: animatediff.

* video synthesis

* final piece.

* style.

* print guess.

* fix: assertion for control guess.

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-02 13:24:56 +05:30
Sayak Paul
000fa82a1e [Chore] remove class assignments for linear and conv. (#7553)
* remove class assignments for linear and conv.

* fix: self.nn
2024-04-02 13:01:04 +05:30
Sayak Paul
5d83f50c23 [Release tests] make nightly workflow dispatchable. (#7541)
* make nightly workflow dispatchable.

* add a note about running the release tests to setup.py
2024-04-02 12:21:17 +05:30
Dhruv Nair
5d21d4a204 Fix FreeU tests (#7540)
update
2024-04-02 11:05:50 +05:30
Álvaro Somoza
73ba81090e [Community pipeline] SDXL Differential Diffusion Img2Img Pipeline (#7550)
* initial-commit pipeline created

* updated README.md
2024-04-01 18:15:30 -10:00
YiYi Xu
7956c36aaa add a from_pipe method to DiffusionPipeline (#7241)
* add from_pipe



---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-01 13:02:00 -10:00
haikmanukyan
5266ab7935 add HD-Painter pipeline (#7520)
* add HD-Painter pipeline

* style fixing

* refactor, change doc, fix ruff

* fix docs

* used correct ruff version

---------

Co-authored-by: Hayk Manukyan <youremail@yourdomain.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-01 15:10:44 +05:30
YiYi Xu
7f724a930e fix the cpu offload tests (#7544)
fix
2024-04-01 14:27:14 +05:30
Jianbing Wu
9bef9f4be7 Fix SVD bug (shape of time_context) (#7268)
* Fix SVD bug (shape of `time_context`)

* Formatting code

* Formatting src/diffusers/models/transformers/transformer_temporal.py by `make style && make quality`

---------

Co-authored-by: kevinkhwu <kevinkhwu@tencent.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-01 14:05:52 +05:30
Dhruv Nair
7aa4514260 Fix typo in CPU offload test (#7542)
update
2024-03-31 22:07:17 -10:00
Bingxin Ke
c2e87869be [Community pipeline] Marigold depth estimation update -- align with marigold v0.1.5 (#7524)
* add resample option; check denoise_step; update ckpt path

* Add seeding in pipeline to increase reproducibility

* fix typo

* fix typo
2024-03-30 07:09:02 -10:00
Stephen
ca61287daa Fix IP Adapter Support for SAG Pipeline (#7260)
* fix ip adapter support

* Update sag pipelines tests, adjust sag pipeline to pass tests

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-03-30 06:15:29 -10:00
Beinsezii
f0c81562a4 Add final_sigma_zero to UniPCMultistep (#7517)
* Add `final_sigma_zero` to UniPCMultistep

Effectively the same trick as DDIM's `set_alpha_to_one` and
DPM's `final_sigma_type='zero'`.
Currently False by default but maybe this should be True?

* `final_sigma_zero: bool` -> `final_sigmas_type: str`

Should 1:1 match DPM Multistep now.

* Set `final_sigmas_type='sigma_min'` in UniPC UTs
2024-03-29 22:23:45 -10:00
Hyoungwon Cho
9d20ed37a2 Perturbed-Attention Guidance (#7512)
* pag_initial

* pag_docs

* edit_docs

* custom

* typo

* delete_docs

* whitespace

* make style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-30 10:52:51 +05:30
Linoy Tsaban
bda1d4faf8 add Instant id sdxl image2image pipeline (#7507)
* initial commit - instantid img2img

* adapting to img2img

* change add_time_ids

* change add_time_ids

* WIP changes

* add strength to timesteps

* check insightface import

* style

* check insightface import changed to warning

* check insightface import changed to warning

* style

---------

Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
2024-03-30 10:25:21 +05:30
UmerHA
77103d71ca Quick-Fix for #7352 block-lora (#7523)
Fixed important typo
2024-03-30 06:42:28 +05:30
UmerHA
0302446819 Implements Blockwise lora (#7352)
* Initial commit

* Implemented block lora

- implemented block lora
- updated docs
- added tests

* Finishing up

* Reverted unrelated changes made by make style

* Fixed typo

* Fixed bug + Made text_encoder_2 scalable

* Integrated some review feedback

* Incorporated review feedback

* Fix tests

* Made every module configurable

* Adapter to new lora test structure

* Final cleanup

* Some more final fixes

- Included examples in `using_peft_for_inference.md`
- Added hint that only attns are scaled
- Removed NoneTypes
- Added test to check mismatching lens of adapter names / weights raise error

* Update using_peft_for_inference.md

* Update using_peft_for_inference.md

* Make style, quality, fix-copies

* Updated tutorial;Warning if scale/adapter mismatch

* floats are forwarded as-is; changed tutorial scale

* make style, quality, fix-copies

* Fixed typo in tutorial

* Moved some warnings into `lora_loader_utils.py`

* Moved scale/lora mismatch warnings back

* Integrated final review suggestions

* Empty commit to trigger CI

* Reverted emoty commit to trigger CI

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-29 21:15:57 +05:30
Dhruv Nair
4d39b7483d Memory clean up on all Slow Tests (#7514)
* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-29 14:23:28 +05:30
Sayak Paul
fac761694a [Tests] Speed up some fast pipeline tests (#7477)
* speed up test_vae_slicing in animatediff

* speed up test_karras_schedulers_shape for attend and excite.

* style.

* get the static slices out.

* specify torch print options.

* modify

* test run with controlnet

* specify kwarg

* fix: things

* not None

* flatten

* controlnet img2img

* complete controlet sd

* finish more

* finish more

* finish more

* finish more

* finish the final batch

* add cpu check for expected_pipe_slice.

* finish the rest

* remove print

* style

* fix ssd1b controlnet test

* checking ssd1b

* disable the test.

* make the test_ip_adapter_single controlnet test more robust

* fix: simple inpaint

* multi

* disable panorama

* enable again

* panorama is shaky so leave it for now

* remove print

* raise tolerance.
2024-03-29 14:11:38 +05:30
YiYi Xu
34c90dbb31 fix OOM for test_vae_tiling (#7510)
use float16 and add torch.no_grad()
2024-03-29 08:22:39 +05:30
Lvkesheng Shen
e49c04d5d6 Bug fix for controlnetpipeline check_image (#7103)
* Bug fix for controlnetpipeline check_image

Bug fix for controlnetpipeline check_image when using multicontrolnet and prompt list

* Update test_inference_multiple_prompt_input function

* Update test_controlnet.py

add test for multiple prompts and multiple image conditioning

* Update test_controlnet.py

Fix format error

---------

Co-authored-by: Lvkesheng Shen <45848260+Fantast416@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-28 08:25:18 -10:00
YiYi Xu
f238cb0736 cpu_offload: remove all hooks before offload (#7448)
* add remove_all_hooks

* a few more fix and tests

* up

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* split tests

* add

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-03-28 08:23:02 -10:00
Bagheera
d78acdedc1 apple mps: training support for SDXL (ControlNet, LoRA, Dreambooth, T2I) (#7447)
* apple mps: training support for SDXL LoRA

* sdxl: support training lora, dreambooth, t2i, pix2pix, and controlnet on apple mps

---------

Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-28 14:26:18 +05:30
Sayak Paul
6df103deba add: a helpful message when quality and repo consistency checks fail. (#7475) 2024-03-28 13:51:56 +05:30
Sayak Paul
73f28708be Improve nightly tests (#7385)
* flesh out the nightly tests

* address feedback.
2024-03-28 13:26:34 +05:30
Sayak Paul
0cbc78f04c [Modeling utils chore] import load_model_dict_into_meta only once (#7437)
import load_model_dict_into_meta only once
2024-03-28 13:01:53 +05:30
Thomas Liang
0cc5630945 [Chore] Fix Colab notebook links in README.md (#7495) 2024-03-27 12:36:36 -10:00
UmerHA
0b8e29289d Skip test_lora_fuse_nan on mps (#7481)
Skipping test_lora_fuse_nan on mps

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-27 14:35:59 +05:30
Sayak Paul
ab38ddf64f [chore] make the istructions on fetching all commits clearer. (#7474)
* make the istructions on fetching all commits clearer.

* Update setup.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-03-27 08:16:46 +05:30
YiYi Xu
ead82fedea fix torch.compile for multi-controlnet of sdxl inpaint (#7476)
fix

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-27 08:08:32 +05:30
Disty0
45b42d1203 Add device arg to offloading with combined pipelines (#7471)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-26 13:45:16 -10:00
Long(Tony) Lian
5199ee4f7b Fix missing raise statements in check_inputs (#7473)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-26 13:34:28 -10:00
Bagheera
544710ef0f diffusers#7426 fix stable diffusion xl inference on MPS when dtypes shift unexpectedly due to pytorch bugs (#7446)
* mps: fix XL pipeline inference at training time due to upstream pytorch bug

* Update src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* apply the safe-guarding logic elsewhere.

---------

Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-26 20:05:49 +05:30
M. Tolga Cangöz
443aa14e41 Fix Tiling in ConsistencyDecoderVAE (#7290)
* Fix typos

* Add docstring to `decode` method in `ConsistencyDecoderVAE`

* Fix tiling

* Enable tiled VAE decoding with customizable tile sample size and overlap factor

* Revert "Enable tiled VAE decoding with customizable tile sample size and overlap factor"

This reverts commit 181049675e.

* Add VAE tiling test for `ConsistencyDecoderVAE`

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-26 17:59:08 +05:30
Sayak Paul
288632adf6 [Training utils] add kohya conversion dict. (#7435)
* add kohya conversion dict.

* update readme

* typo

* add filename
2024-03-26 17:31:22 +05:30
Ernie Chu
5ce79cbded Update train_dreambooth_lora_sd15_advanced.py (#7433)
you cannot specify `type="bool"` and `action="store_true"` at the same time.
remove excessive and buggy `type=bool`.

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-03-26 12:53:02 +02:00
Marçal Comajoan Cara
d52f3e30f8 Fix broken link (#7472)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-26 10:29:08 +05:30
Sayak Paul
699dfb084c feat: support DoRA LoRA from community (#7371)
* feat: support dora loras from community

* safe-guard dora operations under peft version.

* pop use_dora when False

* make dora lora from kohya work.

* fix: kohya conversion utils.

* add a fast test for DoRA compatibility..

* add a nightly test.
2024-03-26 09:37:33 +05:30
Sayak Paul
484c8ef399 [tests] skip dynamo tests when python is 3.12. (#7458)
skip dynamo tests when python is 3.12.
2024-03-26 08:39:48 +05:30
estelleafl
0dd0528851 Small ldm3d fix (#7464)
* fixed typo

* updated doc to be consistent in naming

* make style/quality

* preprocessing for 4 channels and not 6

* make style

* test for 4c

* make style/quality

* fixed test on cpu

* fixed doc typo

* changed default ckpt to 4c

* Update pipeline_stable_diffusion_ldm3d.py

* fix bug

---------

Co-authored-by: Aflalo <estellea@isl-iam1.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu33.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu38.rr.intel.com>
2024-03-25 15:33:43 -10:00
UmerHA
1cd4732e7f Fixed minor error in test_lora_layers_peft.py (#7394)
* Update test_lora_layers_peft.py

* Update utils.py
2024-03-25 11:35:27 -10:00
M. Tolga Cangöz
a51b6cc86a [Docs] Fix typos (#7451)
* Fix typos

* Fix typos

* Fix typos

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-25 11:48:02 -07:00
Dhruv Nair
3bce0f3da1 Fix for str_to_bool definition in testing utils (#7461)
update
2024-03-25 13:33:09 +05:30
Dhruv Nair
9a34953823 Additional Memory clean up for slow tests (#7436)
* update

* update

* update
2024-03-25 12:19:21 +05:30
Sayak Paul
e29f16cfaa [Research Projects] ORPO diffusion for alignment (#7423)
* barebones orpo

* remove reference model.

* full implementation

* change default of beta_orpo

* add a training command.

* fix: dataloading issues.

* interpreting the formulation.

* revert styling

* add: wds full blown version

* fix: per_gpu_batch_siz

* start debuggin

* debugging

* remove print

* fix

* remove filter keys.

* turn on non-blocking calls.

* device_placement

* let's see.

* add bigger training run command

* reinitialize generator for fair repro

* add: detailed readme and requirements

---------

Co-authored-by: Sayak Paul <sayakpaul@Sayaks-MacBook-Pro-2.local>
2024-03-25 08:37:41 +05:30
M. Tolga Cangöz
f7dfcfd971 [IP-Adapter] Fix IP-Adapter Support and Refactor Callback for StableDiffusionPanoramaPipeline (#7262)
* Add properties and `IPAdapterTesterMixin` tests for `StableDiffusionPanoramaPipeline`

* Update torch manual seed to use `torch.Generator(device=device)`

* Refactor 📞🔙 to support `callback_on_step_end`

* make fix-copies
2024-03-24 16:07:02 -10:00
Sayak Paul
3c67864c5a Remove distutils (#7455)
* strtobool

* replace Command from setuptools.
2024-03-25 06:44:53 +05:30
Aryan
363699044e [refactor] Fix FreeInit behaviour (#7410)
* fix freeinit impl

* fix progress bar

* fix progress bar and remove old code

* fix num_inference_steps==1 case for freeinit by atleast running 1 step when fast sampling enabled
2024-03-22 19:20:00 +05:30
Sayak Paul
9613576191 add: space for calculating memory usagee. (#7414)
* add: space for calculating memory usahe.

* Update docs/source/en/using-diffusers/loading.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-03-22 08:43:21 +05:30
YiYi Xu
e4356d6488 add a "Community Scripts" section (#7358)
* add

* add tiling

* fix

* fix

* fix

* give community script its own readme

* Update examples/community/README_community_scripts.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update examples/community/README_community_scripts.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update examples/community/README_community_scripts.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update examples/community/README_community_scripts.md

---------

Co-authored-by: Alexis Rolland <alexis.rolland@ubisoft.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-03-21 10:05:07 -10:00
Sayak Paul
82441460ef [Docs] add missing output image (#7425)
add missing output image
2024-03-21 09:22:06 -07:00
sayakpaul
3e1097cb63 Revert "add: space within docs to calculate mememory usage."
This reverts commit 78990dd960.
2024-03-21 08:33:02 +05:30
sayakpaul
78990dd960 add: space within docs to calculate mememory usage. 2024-03-21 08:32:37 +05:30
Yuanhao Zhai
405a1facd2 fix: enable unet_3d_condition to support time_cond_proj_dim (#7364)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-21 07:46:32 +05:30
M. Tolga Cangöz
3028089e5e Fix typos (#7411)
* Fix typos

* Fix typo in SVD.md
2024-03-20 18:46:47 -07:00
Sayak Paul
b536f39818 [Custom Pipelines with Custom Components] fix multiple things (#7304)
* checking to improve pipelines.

* more fixes.

* add: tip to encourage the usage of revision

* Apply suggestions from code review

* retrigger ci

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-03-20 18:49:00 +05:30
Sayak Paul
e25e525fde [LoRA test suite] refactor the test suite and cleanse it (#7316)
* cleanse and refactor lora testing suite.

* more cleanup.

* make check_if_lora_correctly_set a utility function

* fix: typo

* retrigger ci

* style
2024-03-20 17:13:52 +05:30
Sayak Paul
de9adb907c clean dep installation step in push_tests (#7382)
* clean dep installation step in push_tests

* fix: deps
2024-03-20 07:30:43 +05:30
Sayak Paul
bf861e65dc [Chore] add: fives names to citations. (#7395)
* add: four names to citations.

* add: steven
2024-03-20 06:37:57 +05:30
Dhruv Nair
4da810b943 Remove insecure torch.load calls (#7393)
update
2024-03-19 12:41:50 -10:00
Stephen
161c6e14b6 Change path to posix (modeling_utils.py) (#6781)
* Change path to posix

* running isort

* run style and quality checks

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-19 11:50:34 -10:00
laksjdjf
a6c9015c4e Fix ControlNetModel.from_unet do not load add_embedding (#7269)
* Fix ControlNetModel.from_unet do not load add_embedding

* delete white space in blank line

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-19 09:45:08 -10:00
PJC
e6a5f99e5c Update pipeline_controlnet_sd_xl_img2img.py (#7353)
* Update pipeline_controlnet_sd_xl_img2img.py

fix: safetensors load error

* fix for pass test

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-03-19 09:29:39 -10:00
Dhruv Nair
80ff4ba63e Fix issue with prompt embeds and latents in SD Cascade Decoder with multiple image embeddings for a single prompt. (#7381)
* fix

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-19 07:40:14 -10:00
Sayak Paul
b09a2aa308 [LoRA] fix cross_attention_kwargs problems and tighten tests (#7388)
* debugging

* let's see the numbers

* let's see the numbers

* let's see the numbers

* restrict tolerance.

* increase inference steps.

* shallow copy of cross_attentionkwargs

* remove print
2024-03-19 17:53:38 +05:30
YiYi Xu
63b6846849 [scheduler] fix a bug in add_noise (#7386)
* fix

* fix

* add a tests

* fix

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-03-19 00:50:58 -10:00
lawfordp2017
139f707e6e Correction for non-integral image resolutions with quantizations other than float32 (#7356)
* Correction for non-integral image resolutions with quantizations other than float32.

* Support for training, and use of diffusers-style casting.
2024-03-19 16:17:44 +05:30
Aryan
e4546fd5bb [docs] Add missing copied from statements in TCD Scheduler (#7360)
* add missing copied from statements in tcd scheduler

* update docstring

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-19 00:45:36 -10:00
Dhruv Nair
d44e31aec2 Add FreeInit Outputs to Docs Page (#7384)
* update

* fix
2024-03-19 14:13:41 +05:30
Sayak Paul
ce9825b56b [LoRA] pop the LoRA scale so that it doesn't get propagated to the weeds (#7338)
* pop scale from the top-level unet instead of getting it.

* improve readability.

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* fix a little bit.

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-03-19 09:12:05 +05:30
M. Tolga Cangöz
85f9d92883 Fix conditional statement in test_schedulers.py (#7323)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-19 08:28:47 +05:30
M. Tolga Cangöz
916d9812a8 Update loading of config from a file in test_config.py (#7344)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-18 11:47:36 -10:00
M. Tolga Cangöz
e6a8492242 Use PyTorch's conventional inplace functions (#7332)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-18 09:12:15 -10:00
Beinsezii
ad0308b3f1 Add Cascade to Auto T2I + Decoder mappings (#7362)
* Add Cascade to Auto T2I + Decoder mappings

* ruff autofix

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-18 08:58:20 -10:00
M. Tolga Cangöz
e97a633b63 Update access of configuration attributes (#7343)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-18 08:53:29 -10:00
Sayak Paul
01ac37b331 [LoRA] Clean Kohya conversion utils (#7374)
* clean up the kohya_conversion utility

* state dict assignment
2024-03-18 06:53:37 -10:00
M. Tolga Cangöz
6a05b274cc Fix Typos (#7325)
* Fix PyTorch's convention for inplace functions

* Fix import structure in __init__.py and update config loading logic in test_config.py

* Update configuration access

* Fix typos

* Trim trailing white spaces

* Fix typo in logger name

* Revert "Fix PyTorch's convention for inplace functions"

This reverts commit f65dc4afcb.

* Fix typo in step_index property description

* Revert "Update configuration access"

This reverts commit 8d44e870b8.

* Revert "Fix import structure in __init__.py and update config loading logic in test_config.py"

This reverts commit 2ad5e8bca2.

* Fix typos

* Fix typos

* Fix typos

* Fix a typo: tranform -> transform
2024-03-18 09:48:40 -07:00
Anatoly Belikov
98d46a3f08 delete vae and text encoders after use in SDXL training script (#6693)
delete vae and text encoders after use

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-18 20:03:53 +05:30
970 changed files with 122682 additions and 24148 deletions

View File

@@ -63,23 +63,27 @@ body:
Please tag a maximum of 2 people.
Questions on DiffusionPipeline (Saving, Loading, From pretrained, ...):
Questions on DiffusionPipeline (Saving, Loading, From pretrained, ...): @sayakpaul @DN6
Questions on pipelines:
- Stable Diffusion @yiyixuxu @DN6 @sayakpaul
- Stable Diffusion @yiyixuxu @asomoza
- Stable Diffusion XL @yiyixuxu @sayakpaul @DN6
- Stable Diffusion 3: @yiyixuxu @sayakpaul @DN6 @asomoza
- Kandinsky @yiyixuxu
- ControlNet @sayakpaul @yiyixuxu @DN6
- T2I Adapter @sayakpaul @yiyixuxu @DN6
- IF @DN6
- Text-to-Video / Video-to-Video @DN6 @sayakpaul
- Text-to-Video / Video-to-Video @DN6 @a-r-r-o-w
- Wuerstchen @DN6
- Other: @yiyixuxu @DN6
- Improving generation quality: @asomoza
Questions on models:
- UNet @DN6 @yiyixuxu @sayakpaul
- VAE @sayakpaul @DN6 @yiyixuxu
- Transformers/Attention @DN6 @yiyixuxu @sayakpaul @DN6
- Transformers/Attention @DN6 @yiyixuxu @sayakpaul
Questions on single file checkpoints: @DN6
Questions on Schedulers: @yiyixuxu
@@ -99,7 +103,7 @@ body:
Questions on JAX- and MPS-related things: @pcuenca
Questions on audio pipelines: @DN6
Questions on audio pipelines: @sanchit-gandhi

View File

@@ -39,7 +39,7 @@ members/contributors who may be interested in your PR.
Core library:
- Schedulers: @yiyixuxu
- Pipelines: @sayakpaul @yiyixuxu @DN6
- Pipelines and pipeline callbacks: @yiyixuxu and @asomoza
- Training examples: @sayakpaul
- Docs: @stevhliu and @sayakpaul
- JAX and MPS: @pcuenca
@@ -48,7 +48,8 @@ Core library:
Integrations:
- deepspeed: HF Trainer/Accelerate: @pacman100
- deepspeed: HF Trainer/Accelerate: @SunMarc
- PEFT: @sayakpaul @BenjaminBossan
HF projects:

View File

@@ -13,14 +13,17 @@ env:
jobs:
torch_pipelines_cuda_benchmark_tests:
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL_BENCHMARK }}
name: Torch Core Pipelines CUDA Benchmarking Tests
strategy:
fail-fast: false
max-parallel: 1
runs-on: [single-gpu, nvidia-gpu, a10, ci]
runs-on:
group: aws-g6-4xlarge-plus
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
image: diffusers/diffusers-pytorch-compile-cuda
options: --shm-size "16gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
@@ -31,7 +34,6 @@ jobs:
nvidia-smi
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install pandas peft
@@ -40,7 +42,7 @@ jobs:
python utils/print_env.py
- name: Diffusers Benchmarking
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.DIFFUSERS_BOT_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_BOT_TOKEN }}
BASE_PATH: benchmark_outputs
run: |
export TOTAL_GPU_MEMORY=$(python -c "import torch; print(torch.cuda.get_device_properties(0).total_memory / (1024**3))")
@@ -52,3 +54,13 @@ jobs:
with:
name: benchmark_test_reports
path: benchmarks/benchmark_outputs
- name: Report success status
if: ${{ success() }}
run: |
pip install requests && python utils/notify_benchmarking_status.py --status=success
- name: Report failure status
if: ${{ failure() }}
run: |
pip install requests && python utils/notify_benchmarking_status.py --status=failure

View File

@@ -20,7 +20,7 @@ env:
jobs:
test-build-docker-images:
runs-on: ubuntu-latest
runs-on: [ self-hosted, intel-cpu, 8-cpu, ci ]
if: github.event_name == 'pull_request'
steps:
- name: Set up Docker Buildx
@@ -50,7 +50,7 @@ jobs:
if: steps.file_changes.outputs.all != ''
build-and-push-docker-images:
runs-on: ubuntu-latest
runs-on: [ self-hosted, intel-cpu, 8-cpu, ci ]
if: github.event_name != 'pull_request'
permissions:
@@ -69,17 +69,18 @@ jobs:
- diffusers-flax-tpu
- diffusers-onnxruntime-cpu
- diffusers-onnxruntime-cuda
- diffusers-doc-builder
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Login to Docker Hub
uses: docker/login-action@v2
with:
username: ${{ env.REGISTRY }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Build and push
uses: docker/build-push-action@v3
with:
@@ -90,24 +91,11 @@ jobs:
- name: Post to a Slack channel
id: slack
uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
# Slack channel id, channel name, or user id to post message.
# See also: https://api.slack.com/methods/chat.postMessage#channels
channel-id: ${{ env.CI_SLACK_CHANNEL }}
# For posting a rich message using Block Kit
payload: |
{
"text": "${{ matrix.image-name }} Docker Image build result: ${{ job.status }}\n${{ github.event.head_commit.url }}",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "${{ matrix.image-name }} Docker Image build result: ${{ job.status }}\n${{ github.event.head_commit.url }}"
}
}
]
}
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: "🤗 Results of the ${{ matrix.image-name }} Docker Image build"
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

View File

@@ -21,7 +21,7 @@ jobs:
package: diffusers
notebook_folder: diffusers_doc
languages: en ko zh ja pt
custom_container: diffusers/diffusers-doc-builder
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}

View File

@@ -20,3 +20,4 @@ jobs:
install_libgl1: true
package: diffusers
languages: en ko zh ja pt
custom_container: diffusers/diffusers-doc-builder

View File

@@ -0,0 +1,102 @@
name: Mirror Community Pipeline
on:
# Push changes on the main branch
push:
branches:
- main
paths:
- 'examples/community/**.py'
# And on tag creation (e.g. `v0.28.1`)
tags:
- '*'
# Manual trigger with ref input
workflow_dispatch:
inputs:
ref:
description: "Either 'main' or a tag ref"
required: true
default: 'main'
jobs:
mirror_community_pipeline:
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL_COMMUNITY_MIRROR }}
runs-on: ubuntu-latest
steps:
# Checkout to correct ref
# If workflow dispatch
# If ref is 'main', set:
# CHECKOUT_REF=refs/heads/main
# PATH_IN_REPO=main
# Else it must be a tag. Set:
# CHECKOUT_REF=refs/tags/{tag}
# PATH_IN_REPO={tag}
# If not workflow dispatch
# If ref is 'refs/heads/main' => set 'main'
# Else it must be a tag => set {tag}
- name: Set checkout_ref and path_in_repo
run: |
if [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
if [ -z "${{ github.event.inputs.ref }}" ]; then
echo "Error: Missing ref input"
exit 1
elif [ "${{ github.event.inputs.ref }}" == "main" ]; then
echo "CHECKOUT_REF=refs/heads/main" >> $GITHUB_ENV
echo "PATH_IN_REPO=main" >> $GITHUB_ENV
else
echo "CHECKOUT_REF=refs/tags/${{ github.event.inputs.ref }}" >> $GITHUB_ENV
echo "PATH_IN_REPO=${{ github.event.inputs.ref }}" >> $GITHUB_ENV
fi
elif [ "${{ github.ref }}" == "refs/heads/main" ]; then
echo "CHECKOUT_REF=${{ github.ref }}" >> $GITHUB_ENV
echo "PATH_IN_REPO=main" >> $GITHUB_ENV
else
# e.g. refs/tags/v0.28.1 -> v0.28.1
echo "CHECKOUT_REF=${{ github.ref }}" >> $GITHUB_ENV
echo "PATH_IN_REPO=$(echo ${{ github.ref }} | sed 's/^refs\/tags\///')" >> $GITHUB_ENV
fi
- name: Print env vars
run: |
echo "CHECKOUT_REF: ${{ env.CHECKOUT_REF }}"
echo "PATH_IN_REPO: ${{ env.PATH_IN_REPO }}"
- uses: actions/checkout@v3
with:
ref: ${{ env.CHECKOUT_REF }}
# Setup + install dependencies
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install --upgrade huggingface_hub
# Check secret is set
- name: whoami
run: huggingface-cli whoami
env:
HF_TOKEN: ${{ secrets.HF_TOKEN_MIRROR_COMMUNITY_PIPELINES }}
# Push to HF! (under subfolder based on checkout ref)
# https://huggingface.co/datasets/diffusers/community-pipelines-mirror
- name: Mirror community pipeline to HF
run: huggingface-cli upload diffusers/community-pipelines-mirror ./examples/community ${PATH_IN_REPO} --repo-type dataset
env:
PATH_IN_REPO: ${{ env.PATH_IN_REPO }}
HF_TOKEN: ${{ secrets.HF_TOKEN_MIRROR_COMMUNITY_PIPELINES }}
- name: Report success status
if: ${{ success() }}
run: |
pip install requests && python utils/notify_community_pipelines_mirror.py --status=success
- name: Report failure status
if: ${{ failure() }}
run: |
pip install requests && python utils/notify_community_pipelines_mirror.py --status=failure

View File

@@ -1,52 +1,243 @@
name: Nightly tests on main
name: Nightly and release tests on main/release branch
on:
workflow_dispatch:
schedule:
- cron: "0 0 * * *" # every day at midnight
env:
DIFFUSERS_IS_CI: yes
HF_HOME: /mnt/cache
HF_HUB_ENABLE_HF_TRANSFER: 1
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
PYTEST_TIMEOUT: 600
RUN_SLOW: yes
RUN_NIGHTLY: yes
PIPELINE_USAGE_CUTOFF: 5000
SLACK_API_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
jobs:
run_nightly_tests:
setup_torch_cuda_pipeline_matrix:
name: Setup Torch Pipelines CUDA Slow Tests Matrix
runs-on: [ self-hosted, intel-cpu, 8-cpu, ci ]
container:
image: diffusers/diffusers-pytorch-cpu
outputs:
pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Install dependencies
run: |
pip install -e .
pip install huggingface_hub
- name: Fetch Pipeline Matrix
id: fetch_pipeline_matrix
run: |
matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
echo $matrix
echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
- name: Pipeline Tests Artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: test-pipelines.json
path: reports
run_nightly_tests_for_torch_pipelines:
name: Nightly Torch Pipelines CUDA Tests
needs: setup_torch_cuda_pipeline_matrix
strategy:
fail-fast: false
max-parallel: 8
matrix:
config:
- name: Nightly PyTorch CUDA tests on Ubuntu
framework: pytorch
runner: docker-gpu
image: diffusers/diffusers-pytorch-cuda
report: torch_cuda
- name: Nightly Flax TPU tests on Ubuntu
framework: flax
runner: docker-tpu
image: diffusers/diffusers-flax-tpu
report: flax_tpu
- name: Nightly ONNXRuntime CUDA tests on Ubuntu
framework: onnxruntime
runner: docker-gpu
image: diffusers/diffusers-onnxruntime-cuda
report: onnx_cuda
name: ${{ matrix.config.name }}
runs-on: ${{ matrix.config.runner }}
module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
runs-on: [single-gpu, nvidia-gpu, t4, ci]
container:
image: ${{ matrix.config.image }}
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ ${{ matrix.config.runner == 'docker-tpu' && '--privileged' || '--gpus 0'}}
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: nvidia-smi
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
python -m uv pip install pytest-reportlog
- name: Environment
run: |
python utils/print_env.py
- name: Pipeline CUDA Test
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
--make-reports=tests_pipeline_${{ matrix.module }}_cuda \
--report-log=tests_pipeline_${{ matrix.module }}_cuda.log \
tests/pipelines/${{ matrix.module }}
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: pipeline_${{ matrix.module }}_test_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
run_nightly_tests_for_other_torch_modules:
name: Nightly Torch CUDA Tests
runs-on: [single-gpu, nvidia-gpu, t4, ci]
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus 0
defaults:
run:
shell: bash
strategy:
max-parallel: 2
matrix:
module: [models, schedulers, lora, others, single_file, examples]
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
python -m uv pip install peft@git+https://github.com/huggingface/peft.git
python -m uv pip install pytest-reportlog
- name: Environment
run: python utils/print_env.py
- name: Run nightly PyTorch CUDA tests for non-pipeline modules
if: ${{ matrix.module != 'examples'}}
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
--make-reports=tests_torch_${{ matrix.module }}_cuda \
--report-log=tests_torch_${{ matrix.module }}_cuda.log \
tests/${{ matrix.module }}
- name: Run nightly example tests with Torch
if: ${{ matrix.module == 'examples' }}
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v --make-reports=examples_torch_cuda \
--report-log=examples_torch_cuda.log \
examples/
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_torch_${{ matrix.module }}_cuda_stats.txt
cat reports/tests_torch_${{ matrix.module }}_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: torch_${{ matrix.module }}_cuda_test_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
run_flax_tpu_tests:
name: Nightly Flax TPU Tests
runs-on: docker-tpu
if: github.event_name == 'schedule'
container:
image: diffusers/diffusers-flax-tpu
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --privileged
defaults:
run:
shell: bash
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
python -m uv pip install pytest-reportlog
- name: Environment
run: python utils/print_env.py
- name: Run nightly Flax TPU tests
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
python -m pytest -n 0 \
-s -v -k "Flax" \
--make-reports=tests_flax_tpu \
--report-log=tests_flax_tpu.log \
tests/
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_flax_tpu_stats.txt
cat reports/tests_flax_tpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: flax_tpu_test_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
run_nightly_onnx_tests:
name: Nightly ONNXRuntime CUDA tests on Ubuntu
runs-on: [single-gpu, nvidia-gpu, t4, ci]
container:
image: diffusers/diffusers-onnxruntime-cuda
options: --gpus 0 --shm-size "16gb" --ipc host
steps:
- name: Checkout diffusers
@@ -55,61 +246,32 @@ jobs:
fetch-depth: 2
- name: NVIDIA-SMI
if: ${{ matrix.config.runner == 'docker-gpu' }}
run: |
nvidia-smi
run: nvidia-smi
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers
python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
python -m uv pip install pytest-reportlog
- name: Environment
run: |
python utils/print_env.py
run: python utils/print_env.py
- name: Run nightly PyTorch CUDA tests
if: ${{ matrix.config.framework == 'pytorch' }}
- name: Run Nightly ONNXRuntime CUDA tests
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
--make-reports=tests_${{ matrix.config.report }} \
--report-log=${{ matrix.config.report }}.log \
tests/
- name: Run nightly Flax TPU tests
if: ${{ matrix.config.framework == 'flax' }}
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pytest -n 0 \
-s -v -k "Flax" \
--make-reports=tests_${{ matrix.config.report }} \
--report-log=${{ matrix.config.report }}.log \
tests/
- name: Run nightly ONNXRuntime CUDA tests
if: ${{ matrix.config.framework == 'onnxruntime' }}
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "Onnx" \
--make-reports=tests_${{ matrix.config.report }} \
--report-log=${{ matrix.config.report }}.log \
--make-reports=tests_onnx_cuda \
--report-log=tests_onnx_cuda.log \
tests/
- name: Failure short reports
if: ${{ failure() }}
run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt
run: |
cat reports/tests_onnx_cuda_stats.txt
cat reports/tests_onnx_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
@@ -122,11 +284,12 @@ jobs:
if: always()
run: |
pip install slack_sdk tabulate
python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
run_nightly_tests_apple_m1:
name: Nightly PyTorch MPS tests on MacOS
runs-on: [ self-hosted, apple-m1 ]
if: github.event_name == 'schedule'
steps:
- name: Checkout diffusers
@@ -162,7 +325,7 @@ jobs:
shell: arch -arch arm64 bash {0}
env:
HF_HOME: /System/Volumes/Data/mnt/cache
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
--report-log=tests_torch_mps.log \
@@ -183,4 +346,4 @@ jobs:
if: always()
run: |
pip install slack_sdk tabulate
python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

View File

@@ -33,4 +33,3 @@ jobs:
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
pytest tests/others/test_dependencies.py

View File

@@ -15,7 +15,7 @@ concurrency:
jobs:
setup_pr_tests:
name: Setup PR Tests
runs-on: docker-cpu
runs-on: [ self-hosted, intel-cpu, 8-cpu, ci ]
container:
image: diffusers/diffusers-pytorch-cpu
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
@@ -32,7 +32,6 @@ jobs:
fetch-depth: 0
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
- name: Environment
@@ -74,7 +73,7 @@ jobs:
max-parallel: 2
matrix:
modules: ${{ fromJson(needs.setup_pr_tests.outputs.matrix) }}
runs-on: docker-cpu
runs-on: [ self-hosted, intel-cpu, 8-cpu, ci ]
container:
image: diffusers/diffusers-pytorch-cpu
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
@@ -89,7 +88,6 @@ jobs:
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pip install -e [quality,test]
python -m pip install accelerate
@@ -125,7 +123,7 @@ jobs:
config:
- name: Hub tests for models, schedulers, and pipelines
framework: hub_tests_pytorch
runner: docker-cpu
runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
image: diffusers/diffusers-pytorch-cpu
report: torch_hub
@@ -147,7 +145,6 @@ jobs:
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pip install -e [quality,test]

View File

@@ -32,9 +32,11 @@ jobs:
python -m pip install --upgrade pip
pip install .[quality]
- name: Check quality
run: make quality
- name: Check if failure
if: ${{ failure() }}
run: |
ruff check examples tests src utils scripts
ruff format examples tests src utils scripts --check
echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
check_repository_consistency:
needs: check_code_quality
@@ -49,11 +51,15 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install .[quality]
- name: Check quality
- name: Check repo consistency
run: |
python utils/check_copies.py
python utils/check_dummies.py
make deps_table_check_updated
- name: Check if failure
if: ${{ failure() }}
run: |
echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
run_fast_tests:
needs: [check_code_quality, check_repository_consistency]
@@ -65,7 +71,7 @@ jobs:
name: LoRA - ${{ matrix.lib-versions }}
runs-on: docker-cpu
runs-on: [ self-hosted, intel-cpu, 8-cpu, ci ]
container:
image: diffusers/diffusers-pytorch-cpu
@@ -83,11 +89,10 @@ jobs:
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
if [ "${{ matrix.lib-versions }}" == "main" ]; then
python -m uv pip install -U peft@git+https://github.com/huggingface/peft.git
python -m pip install -U peft@git+https://github.com/huggingface/peft.git
python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git
python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
else
@@ -102,7 +107,25 @@ jobs:
- name: Run fast PyTorch LoRA CPU tests with PEFT backend
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
-s -v \
--make-reports=tests_${{ matrix.config.report }} \
tests/lora/test_lora_layers_peft.py
tests/lora/
python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
-s -v \
--make-reports=tests_models_lora_${{ matrix.config.report }} \
tests/models/ -k "lora"
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_${{ matrix.config.report }}_failures_short.txt
cat reports/tests_models_lora_${{ matrix.config.report }}_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: pr_${{ matrix.config.report }}_test_reports
path: reports

View File

@@ -40,9 +40,11 @@ jobs:
python -m pip install --upgrade pip
pip install .[quality]
- name: Check quality
run: make quality
- name: Check if failure
if: ${{ failure() }}
run: |
ruff check examples tests src utils scripts
ruff format examples tests src utils scripts --check
echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
check_repository_consistency:
needs: check_code_quality
@@ -57,11 +59,15 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install .[quality]
- name: Check quality
- name: Check repo consistency
run: |
python utils/check_copies.py
python utils/check_dummies.py
make deps_table_check_updated
- name: Check if failure
if: ${{ failure() }}
run: |
echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
run_fast_tests:
needs: [check_code_quality, check_repository_consistency]
@@ -71,22 +77,22 @@ jobs:
config:
- name: Fast PyTorch Pipeline CPU tests
framework: pytorch_pipelines
runner: docker-cpu
runner: [ self-hosted, intel-cpu, 32-cpu, 256-ram, ci ]
image: diffusers/diffusers-pytorch-cpu
report: torch_cpu_pipelines
- name: Fast PyTorch Models & Schedulers CPU tests
framework: pytorch_models
runner: docker-cpu
runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
image: diffusers/diffusers-pytorch-cpu
report: torch_cpu_models_schedulers
- name: Fast Flax CPU tests
framework: flax
runner: docker-cpu
runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
image: diffusers/diffusers-flax-cpu
report: flax_cpu
- name: PyTorch Example CPU tests
framework: pytorch_examples
runner: docker-cpu
runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
image: diffusers/diffusers-pytorch-cpu
report: torch_example_cpu
@@ -110,7 +116,6 @@ jobs:
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install accelerate
@@ -124,7 +129,7 @@ jobs:
if: ${{ matrix.config.framework == 'pytorch_pipelines' }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
python -m pytest -n 8 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
--make-reports=tests_${{ matrix.config.report }} \
tests/pipelines
@@ -133,7 +138,7 @@ jobs:
if: ${{ matrix.config.framework == 'pytorch_models' }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx and not Dependency" \
--make-reports=tests_${{ matrix.config.report }} \
tests/models tests/schedulers tests/others
@@ -142,7 +147,7 @@ jobs:
if: ${{ matrix.config.framework == 'flax' }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
-s -v -k "Flax" \
--make-reports=tests_${{ matrix.config.report }} \
tests
@@ -151,8 +156,8 @@ jobs:
if: ${{ matrix.config.framework == 'pytorch_examples' }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install peft
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
python -m uv pip install peft timm
python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
--make-reports=tests_${{ matrix.config.report }} \
examples
@@ -175,7 +180,7 @@ jobs:
config:
- name: Hub tests for models, schedulers, and pipelines
framework: hub_tests_pytorch
runner: docker-cpu
runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
image: diffusers/diffusers-pytorch-cpu
report: torch_hub
@@ -199,7 +204,6 @@ jobs:
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]

View File

@@ -11,20 +11,17 @@ on:
env:
DIFFUSERS_IS_CI: yes
HF_HOME: /mnt/cache
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
PYTEST_TIMEOUT: 600
RUN_SLOW: yes
PIPELINE_USAGE_CUTOFF: 50000
jobs:
setup_torch_cuda_pipeline_matrix:
name: Setup Torch Pipelines CUDA Slow Tests Matrix
runs-on: [single-gpu, nvidia-gpu, t4, ci]
runs-on: [ self-hosted, intel-cpu, 8-cpu, ci ]
container:
image: diffusers/diffusers-pytorch-cpu # this is a CPU image, but we need it to fetch the matrix
options: --shm-size "16gb" --ipc host
image: diffusers/diffusers-pytorch-cpu
outputs:
pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
steps:
@@ -34,22 +31,17 @@ jobs:
fetch-depth: 2
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
- name: Environment
run: |
python utils/print_env.py
- name: Fetch Pipeline Matrix
id: fetch_pipeline_matrix
run: |
matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
echo $matrix
echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
- name: Pipeline Tests Artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
@@ -58,16 +50,17 @@ jobs:
path: reports
torch_pipelines_cuda_tests:
name: Torch Pipelines CUDA Slow Tests
name: Torch Pipelines CUDA Tests
needs: setup_torch_cuda_pipeline_matrix
strategy:
fail-fast: false
max-parallel: 8
matrix:
module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
runs-on: [single-gpu, nvidia-gpu, t4, ci]
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
options: --shm-size "16gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
@@ -78,7 +71,6 @@ jobs:
nvidia-smi
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
@@ -87,7 +79,7 @@ jobs:
python utils/print_env.py
- name: Slow PyTorch CUDA checkpoint tests on Ubuntu
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
@@ -100,7 +92,6 @@ jobs:
run: |
cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
@@ -110,16 +101,16 @@ jobs:
torch_cuda_tests:
name: Torch CUDA Tests
runs-on: docker-gpu
runs-on: [single-gpu, nvidia-gpu, t4, ci]
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
options: --shm-size "16gb" --ipc host --gpus 0
defaults:
run:
shell: bash
strategy:
matrix:
module: [models, schedulers, lora, others]
module: [models, schedulers, lora, others, single_file]
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
@@ -128,18 +119,18 @@ jobs:
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
python -m uv pip install peft@git+https://github.com/huggingface/peft.git
- name: Environment
run: |
python utils/print_env.py
- name: Run slow PyTorch CUDA tests
- name: Run PyTorch CUDA tests
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
@@ -161,63 +152,12 @@ jobs:
name: torch_cuda_test_reports
path: reports
peft_cuda_tests:
name: PEFT CUDA Tests
runs-on: docker-gpu
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
defaults:
run:
shell: bash
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
python -m uv pip install peft@git+https://github.com/huggingface/peft.git
- name: Environment
run: |
python utils/print_env.py
- name: Run slow PEFT CUDA tests
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx and not PEFTLoRALoading" \
--make-reports=tests_peft_cuda \
tests/lora/
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_peft_cuda_stats.txt
cat reports/tests_peft_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: torch_peft_test_reports
path: reports
flax_tpu_tests:
name: Flax TPU Tests
runs-on: docker-tpu
container:
image: diffusers/diffusers-flax-tpu
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --privileged
options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ --privileged
defaults:
run:
shell: bash
@@ -229,7 +169,6 @@ jobs:
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
@@ -240,7 +179,7 @@ jobs:
- name: Run slow Flax TPU tests
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
python -m pytest -n 0 \
-s -v -k "Flax" \
@@ -262,10 +201,10 @@ jobs:
onnx_cuda_tests:
name: ONNX CUDA Tests
runs-on: docker-gpu
runs-on: [single-gpu, nvidia-gpu, t4, ci]
container:
image: diffusers/diffusers-onnxruntime-cuda
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ --gpus 0
defaults:
run:
shell: bash
@@ -277,7 +216,6 @@ jobs:
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
@@ -288,7 +226,7 @@ jobs:
- name: Run slow ONNXRuntime CUDA tests
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "Onnx" \
@@ -311,11 +249,11 @@ jobs:
run_torch_compile_tests:
name: PyTorch Compile CUDA tests
runs-on: docker-gpu
runs-on: [single-gpu, nvidia-gpu, t4, ci]
container:
image: diffusers/diffusers-pytorch-compile-cuda
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
options: --gpus 0 --shm-size "16gb" --ipc host
steps:
- name: Checkout diffusers
@@ -335,7 +273,8 @@ jobs:
python utils/print_env.py
- name: Run example tests on GPU
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}
RUN_COMPILE: yes
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
- name: Failure short reports
@@ -352,11 +291,11 @@ jobs:
run_xformers_tests:
name: PyTorch xformers CUDA tests
runs-on: docker-gpu
runs-on: [single-gpu, nvidia-gpu, t4, ci]
container:
image: diffusers/diffusers-pytorch-xformers-cuda
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
options: --gpus 0 --shm-size "16gb" --ipc host
steps:
- name: Checkout diffusers
@@ -376,7 +315,7 @@ jobs:
python utils/print_env.py
- name: Run example tests on GPU
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
- name: Failure short reports
@@ -393,11 +332,11 @@ jobs:
run_examples_tests:
name: Examples PyTorch CUDA tests on Ubuntu
runs-on: docker-gpu
runs-on: [single-gpu, nvidia-gpu, t4, ci]
container:
image: diffusers/diffusers-pytorch-cuda
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
options: --gpus 0 --shm-size "16gb" --ipc host
steps:
- name: Checkout diffusers
@@ -421,9 +360,10 @@ jobs:
- name: Run example tests on GPU
env:
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install timm
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_cuda examples/
- name: Failure short reports

View File

@@ -29,22 +29,22 @@ jobs:
config:
- name: Fast PyTorch CPU tests on Ubuntu
framework: pytorch
runner: docker-cpu
runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
image: diffusers/diffusers-pytorch-cpu
report: torch_cpu
- name: Fast Flax CPU tests on Ubuntu
framework: flax
runner: docker-cpu
runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
image: diffusers/diffusers-flax-cpu
report: flax_cpu
- name: Fast ONNXRuntime CPU tests on Ubuntu
framework: onnxruntime
runner: docker-cpu
runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
image: diffusers/diffusers-onnxruntime-cpu
report: onnx_cpu
- name: PyTorch Example CPU tests on Ubuntu
framework: pytorch_examples
runner: docker-cpu
runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
image: diffusers/diffusers-pytorch-cpu
report: torch_example_cpu
@@ -68,7 +68,6 @@ jobs:
- name: Install dependencies
run: |
apt-get update && apt-get install libsndfile1-dev libgl1 -y
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
@@ -81,7 +80,7 @@ jobs:
if: ${{ matrix.config.framework == 'pytorch' }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
--make-reports=tests_${{ matrix.config.report }} \
tests/
@@ -90,7 +89,7 @@ jobs:
if: ${{ matrix.config.framework == 'flax' }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
-s -v -k "Flax" \
--make-reports=tests_${{ matrix.config.report }} \
tests/
@@ -99,7 +98,7 @@ jobs:
if: ${{ matrix.config.framework == 'onnxruntime' }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
-s -v -k "Onnx" \
--make-reports=tests_${{ matrix.config.report }} \
tests/
@@ -108,8 +107,8 @@ jobs:
if: ${{ matrix.config.framework == 'pytorch_examples' }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install peft
python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
python -m uv pip install peft timm
python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
--make-reports=tests_${{ matrix.config.report }} \
examples

View File

@@ -23,7 +23,7 @@ concurrency:
jobs:
run_fast_tests_apple_m1:
name: Fast PyTorch MPS tests on MacOS
runs-on: [ self-hosted, apple-m1 ]
runs-on: macos-13-xlarge
steps:
- name: Checkout diffusers
@@ -59,7 +59,7 @@ jobs:
shell: arch -arch arm64 bash {0}
env:
HF_HOME: /System/Volumes/Data/mnt/cache
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
${CONDA_RUN} python -m pytest -n 0 -s -v --make-reports=tests_torch_mps tests/

View File

@@ -0,0 +1,73 @@
name: Check running SLOW tests from a PR (only GPU)
on:
workflow_dispatch:
inputs:
docker_image:
default: 'diffusers/diffusers-pytorch-cuda'
description: 'Name of the Docker image'
required: true
branch:
description: 'PR Branch to test on'
required: true
test:
description: 'Tests to run (e.g.: `tests/models`).'
required: true
env:
DIFFUSERS_IS_CI: yes
IS_GITHUB_CI: "1"
HF_HOME: /mnt/cache
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
PYTEST_TIMEOUT: 600
RUN_SLOW: yes
jobs:
run_tests:
name: "Run a test on our runner from a PR"
runs-on: [single-gpu, nvidia-gpu, t4, ci]
container:
image: ${{ github.event.inputs.docker_image }}
options: --gpus 0 --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Validate test files input
id: validate_test_files
env:
PY_TEST: ${{ github.event.inputs.test }}
run: |
if [[ ! "$PY_TEST" =~ ^tests/ ]]; then
echo "Error: The input string must start with 'tests/'."
exit 1
fi
if [[ ! "$PY_TEST" =~ ^tests/(models|pipelines) ]]; then
echo "Error: The input string must contain either 'models' or 'pipelines' after 'tests/'."
exit 1
fi
if [[ "$PY_TEST" == *";"* ]]; then
echo "Error: The input string must not contain ';'."
exit 1
fi
echo "$PY_TEST"
- name: Checkout PR branch
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch }}
repository: ${{ github.event.pull_request.head.repo.full_name }}
- name: Install pytest
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install peft
- name: Run tests
env:
PY_TEST: ${{ github.event.inputs.test }}
run: |
pytest "$PY_TEST"

39
.github/workflows/ssh-pr-runner.yml vendored Normal file
View File

@@ -0,0 +1,39 @@
name: SSH into PR runners
on:
workflow_dispatch:
inputs:
docker_image:
description: 'Name of the Docker image'
required: true
env:
IS_GITHUB_CI: "1"
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
HF_HOME: /mnt/cache
DIFFUSERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
jobs:
ssh_runner:
name: "SSH"
runs-on: [self-hosted, intel-cpu, 32-cpu, 256-ram, ci]
container:
image: ${{ github.event.inputs.docker_image }}
options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface/diffusers:/mnt/cache/ --privileged
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Tailscale # In order to be able to SSH when a test fails
uses: huggingface/tailscale-action@main
with:
authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
waitForSSH: true

46
.github/workflows/ssh-runner.yml vendored Normal file
View File

@@ -0,0 +1,46 @@
name: SSH into GPU runners
on:
workflow_dispatch:
inputs:
runner_type:
description: 'Type of runner to test (a10 or t4)'
required: true
docker_image:
description: 'Name of the Docker image'
required: true
env:
IS_GITHUB_CI: "1"
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
HF_HOME: /mnt/cache
DIFFUSERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
jobs:
ssh_runner:
name: "SSH"
runs-on: [single-gpu, nvidia-gpu, "${{ github.event.inputs.runner_type }}", ci]
container:
image: ${{ github.event.inputs.docker_image }}
options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface/diffusers:/mnt/cache/ --gpus 0 --privileged
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Tailscale # In order to be able to SSH when a test fails
uses: huggingface/tailscale-action@main
with:
authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
waitForSSH: true

15
.github/workflows/trufflehog.yml vendored Normal file
View File

@@ -0,0 +1,15 @@
on:
push:
name: Secret Leaks
jobs:
trufflehog:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Secret Scanning
uses: trufflesecurity/trufflehog@main

30
.github/workflows/update_metadata.yml vendored Normal file
View File

@@ -0,0 +1,30 @@
name: Update Diffusers metadata
on:
workflow_dispatch:
push:
branches:
- main
- update_diffusers_metadata*
jobs:
update_metadata:
runs-on: ubuntu-22.04
defaults:
run:
shell: bash -l {0}
steps:
- uses: actions/checkout@v3
- name: Setup environment
run: |
pip install --upgrade pip
pip install datasets pandas
pip install .[torch]
- name: Update metadata
env:
HF_TOKEN: ${{ secrets.SAYAK_HF_TOKEN }}
run: |
python utils/update_metadata.py --commit_sha ${{ github.sha }}

View File

@@ -19,6 +19,16 @@ authors:
family-names: Rasul
- given-names: Mishig
family-names: Davaadorj
- given-names: Dhruv
family-names: Nair
- given-names: Sayak
family-names: Paul
- given-names: Steven
family-names: Liu
- given-names: William
family-names: Berman
- given-names: Yiyi
family-names: Xu
- given-names: Thomas
family-names: Wolf
repository-code: 'https://github.com/huggingface/diffusers'

View File

@@ -245,7 +245,7 @@ The official training examples are maintained by the Diffusers' core maintainers
This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author.
Both official training and research examples consist of a directory that contains one or more training scripts, a requirements.txt file, and a README.md file. In order for the user to make use of the
Both official training and research examples consist of a directory that contains one or more training scripts, a `requirements.txt` file, and a `README.md` file. In order for the user to make use of the
training examples, it is required to clone the repository:
```bash
@@ -255,7 +255,8 @@ git clone https://github.com/huggingface/diffusers
as well as to install all additional dependencies required for training:
```bash
pip install -r /examples/<your-example-folder>/requirements.txt
cd diffusers
pip install -r examples/<your-example-folder>/requirements.txt
```
Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt).
@@ -355,7 +356,7 @@ You will need basic `git` proficiency to be able to contribute to
manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.
Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/main/setup.py#L265)):
Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/42f25d601a910dceadaee6c44345896b4cfa9928/setup.py#L270)):
1. Fork the [repository](https://github.com/huggingface/diffusers) by
clicking on the 'Fork' button on the repository's page. This creates a copy of the code

View File

@@ -42,6 +42,7 @@ repo-consistency:
quality:
ruff check $(check_dirs) setup.py
ruff format --check $(check_dirs) setup.py
doc-builder style src/diffusers docs/source --max_len 119 --check_only
python utils/check_doc_toc.py
# Format source code automatically and check is there are any problems left that need manual fixing
@@ -55,6 +56,7 @@ extra_style_checks:
style:
ruff check $(check_dirs) setup.py --fix
ruff format $(check_dirs) setup.py
doc-builder style src/diffusers docs/source --max_len 119
${MAKE} autogenerate_code
${MAKE} extra_style_checks

View File

@@ -70,7 +70,7 @@ The following design principles are followed:
- Pipelines should be used **only** for inference.
- Pipelines should be very readable, self-explanatory, and easy to tweak.
- Pipelines should be designed to build on top of each other and be easy to integrate into higher-level APIs.
- Pipelines are **not** intended to be feature-complete user interfaces. For future complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
- Pipelines are **not** intended to be feature-complete user interfaces. For feature-complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
- Every pipeline should have one and only one way to run it via a `__call__` method. The naming of the `__call__` arguments should be shared across all pipelines.
- Pipelines should be named after the task they are intended to solve.
- In almost all cases, novel diffusion pipelines shall be implemented in a new pipeline folder/file.
@@ -81,7 +81,7 @@ Models are designed as configurable toolboxes that are natural extensions of [Py
The following design principles are followed:
- Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context.
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_condition.py), [`transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformer_2d.py), etc...
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unets/unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_condition.py), [`transformers/transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_2d.py), etc...
- Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc... **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy.
- Models intend to expose complexity, just like PyTorch's `Module` class, and give clear error messages.
- Models all inherit from `ModelMixin` and `ConfigMixin`.
@@ -90,7 +90,7 @@ The following design principles are followed:
- To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
- Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
- The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and
readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
### Schedulers

View File

@@ -20,21 +20,11 @@ limitations under the License.
<br>
<p>
<p align="center">
<a href="https://github.com/huggingface/diffusers/blob/main/LICENSE">
<img alt="GitHub" src="https://img.shields.io/github/license/huggingface/datasets.svg?color=blue">
</a>
<a href="https://github.com/huggingface/diffusers/releases">
<img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/diffusers.svg">
</a>
<a href="https://pepy.tech/project/diffusers">
<img alt="GitHub release" src="https://static.pepy.tech/badge/diffusers/month">
</a>
<a href="CODE_OF_CONDUCT.md">
<img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg">
</a>
<a href="https://twitter.com/diffuserslib">
<img alt="X account" src="https://img.shields.io/twitter/url/https/twitter.com/diffuserslib.svg?style=social&label=Follow%20%40diffuserslib">
</a>
<a href="https://github.com/huggingface/diffusers/blob/main/LICENSE"><img alt="GitHub" src="https://img.shields.io/github/license/huggingface/datasets.svg?color=blue"></a>
<a href="https://github.com/huggingface/diffusers/releases"><img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/diffusers.svg"></a>
<a href="https://pepy.tech/project/diffusers"><img alt="GitHub release" src="https://static.pepy.tech/badge/diffusers/month"></a>
<a href="CODE_OF_CONDUCT.md"><img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg"></a>
<a href="https://twitter.com/diffuserslib"><img alt="X account" src="https://img.shields.io/twitter/url/https/twitter.com/diffuserslib.svg?style=social&label=Follow%20%40diffuserslib"></a>
</p>
🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, 🤗 Diffusers is a modular toolbox that supports both. Our library is designed with a focus on [usability over performance](https://huggingface.co/docs/diffusers/conceptual/philosophy#usability-over-performance), [simple over easy](https://huggingface.co/docs/diffusers/conceptual/philosophy#simple-over-easy), and [customizability over abstractions](https://huggingface.co/docs/diffusers/conceptual/philosophy#tweakable-contributorfriendly-over-abstraction).
@@ -77,7 +67,7 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi
## Quickstart
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 22000+ checkpoints):
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 27.000+ checkpoints):
```python
from diffusers import DiffusionPipeline
@@ -219,7 +209,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
- https://github.com/deep-floyd/IF
- https://github.com/bentoml/BentoML
- https://github.com/bmaltais/kohya_ss
- +9000 other amazing GitHub repositories 💪
- +12.000 other amazing GitHub repositories 💪
Thank you for using us ❤️.
@@ -238,7 +228,7 @@ We also want to thank @heejkoo for the very helpful overview of papers, code and
```bibtex
@misc{von-platen-etal-2022-diffusers,
author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},
author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Dhruv Nair and Sayak Paul and William Berman and Yiyi Xu and Steven Liu and Thomas Wolf},
title = {Diffusers: State-of-the-art diffusion models},
year = {2022},
publisher = {GitHub},

View File

@@ -40,7 +40,7 @@ def main():
print(f"****** Running file: {file} ******")
# Run with canonical settings.
if file != "benchmark_text_to_image.py":
if file != "benchmark_text_to_image.py" and file != "benchmark_ip_adapters.py":
command = f"python {file}"
run_command(command.split())
@@ -49,6 +49,10 @@ def main():
# Run variants.
for file in python_files:
# See: https://github.com/pytorch/pytorch/issues/129637
if file == "benchmark_ip_adapters.py":
continue
if file == "benchmark_text_to_image.py":
for ckpt in ALL_T2I_CKPTS:
command = f"python {file} --ckpt {ckpt}"

View File

@@ -0,0 +1,52 @@
FROM ubuntu:20.04
LABEL maintainer="Hugging Face"
LABEL repository="diffusers"
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get -y update \
&& apt-get install -y software-properties-common \
&& add-apt-repository ppa:deadsnakes/ppa
RUN apt install -y bash \
build-essential \
git \
git-lfs \
curl \
ca-certificates \
libsndfile1-dev \
python3.10 \
python3-pip \
libgl1 \
zip \
wget \
python3.10-venv && \
rm -rf /var/lib/apt/lists
# make sure to use venv
RUN python3.10 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3.10 -m uv pip install --no-cache-dir \
torch \
torchvision \
torchaudio \
invisible_watermark \
--extra-index-url https://download.pytorch.org/whl/cpu && \
python3.10 -m uv pip install --no-cache-dir \
accelerate \
datasets \
hf-doc-builder \
huggingface-hub \
Jinja2 \
librosa \
numpy==1.26.4 \
scipy \
tensorboard \
transformers \
matplotlib \
setuptools==69.5.1
CMD ["/bin/bash"]

View File

@@ -4,21 +4,25 @@ LABEL repository="diffusers"
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && \
apt install -y bash \
RUN apt-get -y update \
&& apt-get install -y software-properties-common \
&& add-apt-repository ppa:deadsnakes/ppa
RUN apt install -y bash \
build-essential \
git \
git-lfs \
curl \
ca-certificates \
libsndfile1-dev \
python3.8 \
libgl1 \
python3.10 \
python3-pip \
python3.8-venv && \
python3.10-venv && \
rm -rf /var/lib/apt/lists
# make sure to use venv
RUN python3 -m venv /opt/venv
RUN python3.10 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
@@ -36,7 +40,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
huggingface-hub \
Jinja2 \
librosa \
numpy \
numpy==1.26.4 \
scipy \
tensorboard \
transformers

View File

@@ -4,21 +4,25 @@ LABEL repository="diffusers"
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && \
apt install -y bash \
RUN apt-get -y update \
&& apt-get install -y software-properties-common \
&& add-apt-repository ppa:deadsnakes/ppa
RUN apt install -y bash \
build-essential \
git \
git-lfs \
curl \
ca-certificates \
libsndfile1-dev \
python3.8 \
libgl1 \
python3.10 \
python3-pip \
python3.8-venv && \
python3.10-venv && \
rm -rf /var/lib/apt/lists
# make sure to use venv
RUN python3 -m venv /opt/venv
RUN python3.10 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
@@ -38,7 +42,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
huggingface-hub \
Jinja2 \
librosa \
numpy \
numpy==1.26.4 \
scipy \
tensorboard \
transformers

View File

@@ -4,21 +4,25 @@ LABEL repository="diffusers"
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && \
apt install -y bash \
RUN apt-get -y update \
&& apt-get install -y software-properties-common \
&& add-apt-repository ppa:deadsnakes/ppa
RUN apt install -y bash \
build-essential \
git \
git-lfs \
curl \
ca-certificates \
libsndfile1-dev \
python3.8 \
libgl1 \
python3.10 \
python3-pip \
python3.8-venv && \
python3.10-venv && \
rm -rf /var/lib/apt/lists
# make sure to use venv
RUN python3 -m venv /opt/venv
RUN python3.10 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
@@ -36,7 +40,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
huggingface-hub \
Jinja2 \
librosa \
numpy \
numpy==1.26.4 \
scipy \
tensorboard \
transformers

View File

@@ -4,39 +4,44 @@ LABEL repository="diffusers"
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && \
apt install -y bash \
RUN apt-get -y update \
&& apt-get install -y software-properties-common \
&& add-apt-repository ppa:deadsnakes/ppa
RUN apt install -y bash \
build-essential \
git \
git-lfs \
curl \
ca-certificates \
libsndfile1-dev \
python3.8 \
libgl1 \
python3.10 \
python3-pip \
python3.8-venv && \
python3.10-venv && \
rm -rf /var/lib/apt/lists
# make sure to use venv
RUN python3 -m venv /opt/venv
RUN python3.10 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3 -m uv pip install --no-cache-dir \
RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3.10 -m uv pip install --no-cache-dir \
torch \
torchvision \
torchaudio \
"onnxruntime-gpu>=1.13.1" \
--extra-index-url https://download.pytorch.org/whl/cu117 && \
python3 -m uv pip install --no-cache-dir \
python3.10 -m uv pip install --no-cache-dir \
accelerate \
datasets \
hf-doc-builder \
huggingface-hub \
hf_transfer \
Jinja2 \
librosa \
numpy \
numpy==1.26.4 \
scipy \
tensorboard \
transformers

View File

@@ -4,8 +4,11 @@ LABEL repository="diffusers"
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && \
apt install -y bash \
RUN apt-get -y update \
&& apt-get install -y software-properties-common \
&& add-apt-repository ppa:deadsnakes/ppa
RUN apt install -y bash \
build-essential \
git \
git-lfs \
@@ -13,31 +16,32 @@ RUN apt update && \
ca-certificates \
libsndfile1-dev \
libgl1 \
python3.9 \
python3.9-dev \
python3.10 \
python3.10-dev \
python3-pip \
python3.9-venv && \
python3.10-venv && \
rm -rf /var/lib/apt/lists
# make sure to use venv
RUN python3.9 -m venv /opt/venv
RUN python3.10 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3.9 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3.9 -m uv pip install --no-cache-dir \
RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3.10 -m uv pip install --no-cache-dir \
torch \
torchvision \
torchaudio \
invisible_watermark && \
python3.9 -m pip install --no-cache-dir \
python3.10 -m pip install --no-cache-dir \
accelerate \
datasets \
hf-doc-builder \
huggingface-hub \
hf_transfer \
Jinja2 \
librosa \
numpy \
numpy==1.26.4 \
scipy \
tensorboard \
transformers

View File

@@ -4,40 +4,44 @@ LABEL repository="diffusers"
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && \
apt install -y bash \
RUN apt-get -y update \
&& apt-get install -y software-properties-common \
&& add-apt-repository ppa:deadsnakes/ppa
RUN apt install -y bash \
build-essential \
git \
git-lfs \
curl \
ca-certificates \
libsndfile1-dev \
python3.8 \
python3.10 \
python3.10-dev \
python3-pip \
libgl1 \
python3.8-venv && \
python3.10-venv && \
rm -rf /var/lib/apt/lists
# make sure to use venv
RUN python3 -m venv /opt/venv
RUN python3.10 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3 -m uv pip install --no-cache-dir \
RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3.10 -m uv pip install --no-cache-dir \
torch \
torchvision \
torchaudio \
invisible_watermark \
--extra-index-url https://download.pytorch.org/whl/cpu && \
python3 -m uv pip install --no-cache-dir \
python3.10 -m uv pip install --no-cache-dir \
accelerate \
datasets \
hf-doc-builder \
huggingface-hub \
Jinja2 \
librosa \
numpy \
numpy==1.26.4 \
scipy \
tensorboard \
transformers matplotlib

View File

@@ -4,8 +4,11 @@ LABEL repository="diffusers"
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && \
apt install -y bash \
RUN apt-get -y update \
&& apt-get install -y software-properties-common \
&& add-apt-repository ppa:deadsnakes/ppa
RUN apt install -y bash \
build-essential \
git \
git-lfs \
@@ -13,30 +16,32 @@ RUN apt update && \
ca-certificates \
libsndfile1-dev \
libgl1 \
python3.8 \
python3.10 \
python3.10-dev \
python3-pip \
python3.8-venv && \
python3.10-venv && \
rm -rf /var/lib/apt/lists
# make sure to use venv
RUN python3 -m venv /opt/venv
RUN python3.10 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3 -m uv pip install --no-cache-dir \
RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3.10 -m uv pip install --no-cache-dir \
torch \
torchvision \
torchaudio \
invisible_watermark && \
python3 -m pip install --no-cache-dir \
python3.10 -m pip install --no-cache-dir \
accelerate \
datasets \
hf-doc-builder \
huggingface-hub \
hf_transfer \
Jinja2 \
librosa \
numpy \
numpy==1.26.4 \
scipy \
tensorboard \
transformers \

View File

@@ -4,8 +4,11 @@ LABEL repository="diffusers"
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && \
apt install -y bash \
RUN apt-get -y update \
&& apt-get install -y software-properties-common \
&& add-apt-repository ppa:deadsnakes/ppa
RUN apt install -y bash \
build-essential \
git \
git-lfs \
@@ -13,30 +16,32 @@ RUN apt update && \
ca-certificates \
libsndfile1-dev \
libgl1 \
python3.8 \
python3.10 \
python3.10-dev \
python3-pip \
python3.8-venv && \
python3.10-venv && \
rm -rf /var/lib/apt/lists
# make sure to use venv
RUN python3 -m venv /opt/venv
RUN python3.10 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3 -m pip install --no-cache-dir \
RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3.10 -m pip install --no-cache-dir \
torch \
torchvision \
torchaudio \
invisible_watermark && \
python3 -m uv pip install --no-cache-dir \
python3.10 -m uv pip install --no-cache-dir \
accelerate \
datasets \
hf-doc-builder \
huggingface-hub \
hf_transfer \
Jinja2 \
librosa \
numpy \
numpy==1.26.4 \
scipy \
tensorboard \
transformers \

View File

@@ -242,10 +242,10 @@ Here's an example of a tuple return, comprising several objects:
```
Returns:
`tuple(torch.FloatTensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
- ** loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.FloatTensor` of shape `(1,)` --
`tuple(torch.Tensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
- ** loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.Tensor` of shape `(1,)` --
Total loss is the sum of the masked language modeling loss and the next sequence prediction (classification) loss.
- **prediction_scores** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) --
- **prediction_scores** (`torch.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) --
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
```

View File

@@ -21,29 +21,24 @@
title: Load LoRAs for inference
- local: tutorials/fast_diffusion
title: Accelerate inference of text-to-image diffusion models
- local: tutorials/inference_with_big_models
title: Working with big models
title: Tutorials
- sections:
- sections:
- local: using-diffusers/loading_overview
title: Overview
- local: using-diffusers/loading
title: Load pipelines, models, and schedulers
- local: using-diffusers/schedulers
title: Load and compare different schedulers
title: Load pipelines
- local: using-diffusers/custom_pipeline_overview
title: Load community pipelines and components
- local: using-diffusers/using_safetensors
title: Load safetensors
- local: using-diffusers/schedulers
title: Load schedulers and models
- local: using-diffusers/other-formats
title: Load different Stable Diffusion formats
title: Model files and layouts
- local: using-diffusers/loading_adapters
title: Load adapters
- local: using-diffusers/push_to_hub
title: Push files to the Hub
title: Loading & Hub
title: Load pipelines and adapters
- sections:
- local: using-diffusers/pipeline_overview
title: Overview
- local: using-diffusers/unconditional_image_generation
title: Unconditional image generation
- local: using-diffusers/conditional_image_generation
@@ -56,58 +51,58 @@
title: Text or image-to-video
- local: using-diffusers/depth2img
title: Depth-to-image
title: Tasks
title: Generative tasks
- sections:
- local: using-diffusers/textual_inversion_inference
title: Textual inversion
- local: using-diffusers/ip_adapter
title: IP-Adapter
- local: using-diffusers/merge_loras
title: Merge LoRAs
- local: using-diffusers/overview_techniques
title: Overview
- local: training/distributed_inference
title: Distributed inference with multiple GPUs
- local: using-diffusers/merge_loras
title: Merge LoRAs
- local: using-diffusers/scheduler_features
title: Scheduler features
- local: using-diffusers/callback
title: Pipeline callbacks
- local: using-diffusers/reusing_seeds
title: Improve image quality with deterministic generation
- local: using-diffusers/control_brightness
title: Control image brightness
title: Reproducible pipelines
- local: using-diffusers/image_quality
title: Controlling image quality
- local: using-diffusers/weighted_prompts
title: Prompt weighting
- local: using-diffusers/freeu
title: Improve generation quality with FreeU
title: Techniques
title: Prompt techniques
title: Inference techniques
- sections:
- local: advanced_inference/outpaint
title: Outpainting
title: Advanced inference
- sections:
- local: using-diffusers/pipeline_overview
title: Overview
- local: using-diffusers/sdxl
title: Stable Diffusion XL
- local: using-diffusers/sdxl_turbo
title: SDXL Turbo
- local: using-diffusers/kandinsky
title: Kandinsky
- local: using-diffusers/ip_adapter
title: IP-Adapter
- local: using-diffusers/pag
title: PAG
- local: using-diffusers/controlnet
title: ControlNet
- local: using-diffusers/t2i_adapter
title: T2I-Adapter
- local: using-diffusers/inference_with_lcm
title: Latent Consistency Model
- local: using-diffusers/textual_inversion_inference
title: Textual inversion
- local: using-diffusers/shap-e
title: Shap-E
- local: using-diffusers/diffedit
title: DiffEdit
- local: using-diffusers/distilled_sd
title: Distilled Stable Diffusion inference
- local: using-diffusers/callback
title: Pipeline callbacks
- local: using-diffusers/reproducibility
title: Create reproducible pipelines
- local: using-diffusers/custom_pipeline_examples
title: Community pipelines
- local: using-diffusers/contribute_pipeline
title: Contribute a community pipeline
- local: using-diffusers/inference_with_lcm_lora
title: Latent Consistency Model-LoRA
- local: using-diffusers/inference_with_lcm
title: Latent Consistency Model
- local: using-diffusers/inference_with_tcd_lora
title: Trajectory Consistency Distillation-LoRA
- local: using-diffusers/svd
title: Stable Video Diffusion
- local: using-diffusers/marigold_usage
title: Marigold Computer Vision
title: Specific pipeline examples
- sections:
- local: training/overview
@@ -116,7 +111,8 @@
title: Create a dataset for training
- local: training/adapt_a_model
title: Adapt a model to a new task
- sections:
- isExpanded: false
sections:
- local: training/unconditional_training
title: Unconditional image generation
- local: training/text2image
@@ -134,7 +130,8 @@
- local: training/instructpix2pix
title: InstructPix2Pix
title: Models
- sections:
- isExpanded: false
sections:
- local: training/text_inversion
title: Textual Inversion
- local: training/dreambooth
@@ -149,14 +146,6 @@
title: Reinforcement learning training with DDPO
title: Methods
title: Training
- sections:
- local: using-diffusers/other-modalities
title: Other Modalities
title: Taking Diffusers Beyond Images
title: Using Diffusers
- sections:
- local: optimization/opt_overview
title: Overview
- sections:
- local: optimization/fp16
title: Speed up inference
@@ -170,7 +159,8 @@
title: Token merging
- local: optimization/deepcache
title: DeepCache
title: General optimizations
- local: optimization/tgate
title: TGATE
- sections:
- local: using-diffusers/stable_diffusion_jax_how_to
title: JAX/Flax
@@ -180,14 +170,14 @@
title: OpenVINO
- local: optimization/coreml
title: Core ML
title: Optimized model types
title: Optimized model formats
- sections:
- local: optimization/mps
title: Metal Performance Shaders (MPS)
- local: optimization/habana
title: Habana Gaudi
title: Optimized hardware
title: Optimization
title: Accelerate inference and reduce memory
- sections:
- local: conceptual/philosophy
title: Philosophy
@@ -201,7 +191,8 @@
title: Evaluating Diffusion Models
title: Conceptual Guides
- sections:
- sections:
- isExpanded: false
sections:
- local: api/configuration
title: Configuration
- local: api/logging
@@ -209,7 +200,8 @@
- local: api/outputs
title: Outputs
title: Main Classes
- sections:
- isExpanded: false
sections:
- local: api/loaders/ip_adapter
title: IP-Adapter
- local: api/loaders/lora
@@ -223,7 +215,8 @@
- local: api/loaders/peft
title: PEFT
title: Loaders
- sections:
- isExpanded: false
sections:
- local: api/models/overview
title: Overview
- local: api/models/unet
@@ -249,15 +242,34 @@
- local: api/models/consistency_decoder_vae
title: ConsistencyDecoderVAE
- local: api/models/transformer2d
title: Transformer2D
title: Transformer2DModel
- local: api/models/pixart_transformer2d
title: PixArtTransformer2DModel
- local: api/models/dit_transformer2d
title: DiTTransformer2DModel
- local: api/models/hunyuan_transformer2d
title: HunyuanDiT2DModel
- local: api/models/aura_flow_transformer2d
title: AuraFlowTransformer2DModel
- local: api/models/latte_transformer3d
title: LatteTransformer3DModel
- local: api/models/lumina_nextdit2d
title: LuminaNextDiT2DModel
- local: api/models/transformer_temporal
title: Transformer Temporal
title: TransformerTemporalModel
- local: api/models/sd3_transformer2d
title: SD3Transformer2DModel
- local: api/models/prior_transformer
title: PriorTransformer
- local: api/models/controlnet
title: ControlNet
title: ControlNetModel
- local: api/models/controlnet_hunyuandit
title: HunyuanDiT2DControlNetModel
- local: api/models/controlnet_sd3
title: SD3ControlNetModel
title: Models
- sections:
- isExpanded: false
sections:
- local: api/pipelines/overview
title: Overview
- local: api/pipelines/amused
@@ -270,6 +282,8 @@
title: AudioLDM
- local: api/pipelines/audioldm2
title: AudioLDM 2
- local: api/pipelines/aura_flow
title: AuraFlow
- local: api/pipelines/auto_pipeline
title: AutoPipeline
- local: api/pipelines/blip_diffusion
@@ -278,8 +292,16 @@
title: Consistency Models
- local: api/pipelines/controlnet
title: ControlNet
- local: api/pipelines/controlnet_hunyuandit
title: ControlNet with Hunyuan-DiT
- local: api/pipelines/controlnet_sd3
title: ControlNet with Stable Diffusion 3
- local: api/pipelines/controlnet_sdxl
title: ControlNet with Stable Diffusion XL
- local: api/pipelines/controlnetxs
title: ControlNet-XS
- local: api/pipelines/controlnetxs_sdxl
title: ControlNet-XS with Stable Diffusion XL
- local: api/pipelines/dance_diffusion
title: Dance Diffusion
- local: api/pipelines/ddim
@@ -292,6 +314,8 @@
title: DiffEdit
- local: api/pipelines/dit
title: DiT
- local: api/pipelines/hunyuandit
title: Hunyuan-DiT
- local: api/pipelines/i2vgenxl
title: I2VGen-XL
- local: api/pipelines/pix2pix
@@ -302,22 +326,34 @@
title: Kandinsky 2.2
- local: api/pipelines/kandinsky3
title: Kandinsky 3
- local: api/pipelines/kolors
title: Kolors
- local: api/pipelines/latent_consistency_models
title: Latent Consistency Models
- local: api/pipelines/latent_diffusion
title: Latent Diffusion
- local: api/pipelines/latte
title: Latte
- local: api/pipelines/ledits_pp
title: LEDITS++
- local: api/pipelines/lumina
title: Lumina-T2X
- local: api/pipelines/marigold
title: Marigold
- local: api/pipelines/panorama
title: MultiDiffusion
- local: api/pipelines/musicldm
title: MusicLDM
- local: api/pipelines/pag
title: PAG
- local: api/pipelines/paint_by_example
title: Paint by Example
- local: api/pipelines/pia
title: Personalized Image Animator (PIA)
- local: api/pipelines/pixart
title: PixArt-α
- local: api/pipelines/pixart_sigma
title: PixArt-Σ
- local: api/pipelines/self_attention_guidance
title: Self-Attention Guidance
- local: api/pipelines/semantic_stable_diffusion
@@ -345,6 +381,8 @@
title: Safe Stable Diffusion
- local: api/pipelines/stable_diffusion/stable_diffusion_2
title: Stable Diffusion 2
- local: api/pipelines/stable_diffusion/stable_diffusion_3
title: Stable Diffusion 3
- local: api/pipelines/stable_diffusion/stable_diffusion_xl
title: Stable Diffusion XL
- local: api/pipelines/stable_diffusion/sdxl_turbo
@@ -358,7 +396,7 @@
- local: api/pipelines/stable_diffusion/ldm3d_diffusion
title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
- local: api/pipelines/stable_diffusion/adapter
title: Stable Diffusion T2I-Adapter
title: T2I-Adapter
- local: api/pipelines/stable_diffusion/gligen
title: GLIGEN (Grounded Language-to-Image Generation)
title: Stable Diffusion
@@ -377,7 +415,8 @@
- local: api/pipelines/wuerstchen
title: Wuerstchen
title: Pipelines
- sections:
- isExpanded: false
sections:
- local: api/schedulers/overview
title: Overview
- local: api/schedulers/cm_stochastic_iterative
@@ -408,6 +447,10 @@
title: EulerAncestralDiscreteScheduler
- local: api/schedulers/euler
title: EulerDiscreteScheduler
- local: api/schedulers/flow_match_euler_discrete
title: FlowMatchEulerDiscreteScheduler
- local: api/schedulers/flow_match_heun_discrete
title: FlowMatchHeunDiscreteScheduler
- local: api/schedulers/heun
title: HeunDiscreteScheduler
- local: api/schedulers/ipndm
@@ -437,7 +480,8 @@
- local: api/schedulers/vq_diffusion
title: VQDiffusionScheduler
title: Schedulers
- sections:
- isExpanded: false
sections:
- local: api/internal_classes_overview
title: Overview
- local: api/attnprocessor
@@ -450,5 +494,7 @@
title: Utilities
- local: api/image_processor
title: VAE Image Processor
- local: api/video_processor
title: Video Processor
title: Internal classes
title: API

View File

@@ -0,0 +1,231 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Outpainting
Outpainting extends an image beyond its original boundaries, allowing you to add, replace, or modify visual elements in an image while preserving the original image. Like [inpainting](../using-diffusers/inpaint), you want to fill the white area (in this case, the area outside of the original image) with new visual elements while keeping the original image (represented by a mask of black pixels). There are a couple of ways to outpaint, such as with a [ControlNet](https://hf.co/blog/OzzyGT/outpainting-controlnet) or with [Differential Diffusion](https://hf.co/blog/OzzyGT/outpainting-differential-diffusion).
This guide will show you how to outpaint with an inpainting model, ControlNet, and a ZoeDepth estimator.
Before you begin, make sure you have the [controlnet_aux](https://github.com/huggingface/controlnet_aux) library installed so you can use the ZoeDepth estimator.
```py
!pip install -q controlnet_aux
```
## Image preparation
Start by picking an image to outpaint with and remove the background with a Space like [BRIA-RMBG-1.4](https://hf.co/spaces/briaai/BRIA-RMBG-1.4).
<iframe
src="https://briaai-bria-rmbg-1-4.hf.space"
frameborder="0"
width="850"
height="450"
></iframe>
For example, remove the background from this image of a pair of shoes.
<div class="flex flex-row gap-4">
<div class="flex-1">
<img class="rounded-xl" src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/original-jordan.png"/>
<figcaption class="mt-2 text-center text-sm text-gray-500">original image</figcaption>
</div>
<div class="flex-1">
<img class="rounded-xl" src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/no-background-jordan.png"/>
<figcaption class="mt-2 text-center text-sm text-gray-500">background removed</figcaption>
</div>
</div>
[Stable Diffusion XL (SDXL)](../using-diffusers/sdxl) models work best with 1024x1024 images, but you can resize the image to any size as long as your hardware has enough memory to support it. The transparent background in the image should also be replaced with a white background. Create a function (like the one below) that scales and pastes the image onto a white background.
```py
import random
import requests
import torch
from controlnet_aux import ZoeDetector
from PIL import Image, ImageOps
from diffusers import (
AutoencoderKL,
ControlNetModel,
StableDiffusionXLControlNetPipeline,
StableDiffusionXLInpaintPipeline,
)
def scale_and_paste(original_image):
aspect_ratio = original_image.width / original_image.height
if original_image.width > original_image.height:
new_width = 1024
new_height = round(new_width / aspect_ratio)
else:
new_height = 1024
new_width = round(new_height * aspect_ratio)
resized_original = original_image.resize((new_width, new_height), Image.LANCZOS)
white_background = Image.new("RGBA", (1024, 1024), "white")
x = (1024 - new_width) // 2
y = (1024 - new_height) // 2
white_background.paste(resized_original, (x, y), resized_original)
return resized_original, white_background
original_image = Image.open(
requests.get(
"https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/no-background-jordan.png",
stream=True,
).raw
).convert("RGBA")
resized_img, white_bg_image = scale_and_paste(original_image)
```
To avoid adding unwanted extra details, use the ZoeDepth estimator to provide additional guidance during generation and to ensure the shoes remain consistent with the original image.
```py
zoe = ZoeDetector.from_pretrained("lllyasviel/Annotators")
image_zoe = zoe(white_bg_image, detect_resolution=512, image_resolution=1024)
image_zoe
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/zoedepth-jordan.png"/>
</div>
## Outpaint
Once your image is ready, you can generate content in the white area around the shoes with [controlnet-inpaint-dreamer-sdxl](https://hf.co/destitech/controlnet-inpaint-dreamer-sdxl), a SDXL ControlNet trained for inpainting.
Load the inpainting ControlNet, ZoeDepth model, VAE and pass them to the [`StableDiffusionXLControlNetPipeline`]. Then you can create an optional `generate_image` function (for convenience) to outpaint an initial image.
```py
controlnets = [
ControlNetModel.from_pretrained(
"destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16, variant="fp16"
),
ControlNetModel.from_pretrained(
"diffusers/controlnet-zoe-depth-sdxl-1.0", torch_dtype=torch.float16
),
]
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
"SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", controlnet=controlnets, vae=vae
).to("cuda")
def generate_image(prompt, negative_prompt, inpaint_image, zoe_image, seed: int = None):
if seed is None:
seed = random.randint(0, 2**32 - 1)
generator = torch.Generator(device="cpu").manual_seed(seed)
image = pipeline(
prompt,
negative_prompt=negative_prompt,
image=[inpaint_image, zoe_image],
guidance_scale=6.5,
num_inference_steps=25,
generator=generator,
controlnet_conditioning_scale=[0.5, 0.8],
control_guidance_end=[0.9, 0.6],
).images[0]
return image
prompt = "nike air jordans on a basketball court"
negative_prompt = ""
temp_image = generate_image(prompt, negative_prompt, white_bg_image, image_zoe, 908097)
```
Paste the original image over the initial outpainted image. You'll improve the outpainted background in a later step.
```py
x = (1024 - resized_img.width) // 2
y = (1024 - resized_img.height) // 2
temp_image.paste(resized_img, (x, y), resized_img)
temp_image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/initial-outpaint.png"/>
</div>
> [!TIP]
> Now is a good time to free up some memory if you're running low!
>
> ```py
> pipeline=None
> torch.cuda.empty_cache()
> ```
Now that you have an initial outpainted image, load the [`StableDiffusionXLInpaintPipeline`] with the [RealVisXL](https://hf.co/SG161222/RealVisXL_V4.0) model to generate the final outpainted image with better quality.
```py
pipeline = StableDiffusionXLInpaintPipeline.from_pretrained(
"OzzyGT/RealVisXL_V4.0_inpainting",
torch_dtype=torch.float16,
variant="fp16",
vae=vae,
).to("cuda")
```
Prepare a mask for the final outpainted image. To create a more natural transition between the original image and the outpainted background, blur the mask to help it blend better.
```py
mask = Image.new("L", temp_image.size)
mask.paste(resized_img.split()[3], (x, y))
mask = ImageOps.invert(mask)
final_mask = mask.point(lambda p: p > 128 and 255)
mask_blurred = pipeline.mask_processor.blur(final_mask, blur_factor=20)
mask_blurred
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/blurred-mask.png"/>
</div>
Create a better prompt and pass it to the `generate_outpaint` function to generate the final outpainted image. Again, paste the original image over the final outpainted background.
```py
def generate_outpaint(prompt, negative_prompt, image, mask, seed: int = None):
if seed is None:
seed = random.randint(0, 2**32 - 1)
generator = torch.Generator(device="cpu").manual_seed(seed)
image = pipeline(
prompt,
negative_prompt=negative_prompt,
image=image,
mask_image=mask,
guidance_scale=10.0,
strength=0.8,
num_inference_steps=30,
generator=generator,
).images[0]
return image
prompt = "high quality photo of nike air jordans on a basketball court, highly detailed"
negative_prompt = ""
final_image = generate_outpaint(prompt, negative_prompt, temp_image, mask_blurred, 7688778)
x = (1024 - resized_img.width) // 2
y = (1024 - resized_img.height) // 2
final_image.paste(resized_img, (x, y), resized_img)
final_image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/final-outpaint.png"/>
</div>

View File

@@ -41,12 +41,6 @@ An attention processor is a class for applying different types of attention mech
## FusedAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
## LoRAAttnAddedKVProcessor
[[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor
## LoRAXFormersAttnProcessor
[[autodoc]] models.attention_processor.LoRAXFormersAttnProcessor
## SlicedAttnProcessor
[[autodoc]] models.attention_processor.SlicedAttnProcessor
@@ -55,3 +49,6 @@ An attention processor is a class for applying different types of attention mech
## XFormersAttnProcessor
[[autodoc]] models.attention_processor.XFormersAttnProcessor
## AttnProcessorNPU
[[autodoc]] models.attention_processor.AttnProcessorNPU

View File

@@ -25,3 +25,11 @@ All pipelines with [`VaeImageProcessor`] accept PIL Image, PyTorch tensor, or Nu
The [`VaeImageProcessorLDM3D`] accepts RGB and depth inputs and returns RGB and depth outputs.
[[autodoc]] image_processor.VaeImageProcessorLDM3D
## PixArtImageProcessor
[[autodoc]] image_processor.PixArtImageProcessor
## IPAdapterMaskProcessor
[[autodoc]] image_processor.IPAdapterMaskProcessor

View File

@@ -12,10 +12,13 @@ specific language governing permissions and limitations under the License.
# LoRA
LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. This produces a smaller file (~100 MBs) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the UNet, text encoder or both. There are two classes for loading LoRA weights:
LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. This produces a smaller file (~100 MBs) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the denoiser, text encoder or both. The denoiser usually corresponds to a UNet ([`UNet2DConditionModel`], for example) or a Transformer ([`SD3Transformer2DModel`], for example). There are several classes for loading LoRA weights:
- [`LoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`LoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
- [`StableDiffusionLoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`StableDiffusionLoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
- [`SD3LoraLoaderMixin`] provides similar functions for [Stable Diffusion 3](https://huggingface.co/blog/sd3).
- [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
- [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.
<Tip>
@@ -23,10 +26,22 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
</Tip>
## LoraLoaderMixin
## StableDiffusionLoraLoaderMixin
[[autodoc]] loaders.lora.LoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.StableDiffusionLoraLoaderMixin
## StableDiffusionXLLoraLoaderMixin
[[autodoc]] loaders.lora.StableDiffusionXLLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.StableDiffusionXLLoraLoaderMixin
## SD3LoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.SD3LoraLoaderMixin
## AmusedLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin
## LoraBaseMixin
[[autodoc]] loaders.lora_base.LoraBaseMixin

View File

@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# PEFT
Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`] to load an adapter.
Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`], [`SD3Transformer2DModel`] to operate with an adapter.
<Tip>

View File

@@ -12,26 +12,50 @@ specific language governing permissions and limitations under the License.
# Single files
Diffusers supports loading pretrained pipeline (or model) weights stored in a single file, such as a `ckpt` or `safetensors` file. These single file types are typically produced from community trained models. There are three classes for loading single file weights:
The [`~loaders.FromSingleFileMixin.from_single_file`] method allows you to load:
- [`FromSingleFileMixin`] supports loading pretrained pipeline weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
- [`FromOriginalVAEMixin`] supports loading a pretrained [`AutoencoderKL`] from pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
- [`FromOriginalControlnetMixin`] supports loading pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
* a model stored in a single file, which is useful if you're working with models from the diffusion ecosystem, like Automatic1111, and commonly rely on a single-file layout to store and share models
* a model stored in their originally distributed layout, which is useful if you're working with models finetuned with other services, and want to load it directly into Diffusers model objects and pipelines
<Tip>
> [!TIP]
> Read the [Model files and layouts](../../using-diffusers/other-formats) guide to learn more about the Diffusers-multifolder layout versus the single-file layout, and how to load models stored in these different layouts.
To learn more about how to load single file weights, see the [Load different Stable Diffusion formats](../../using-diffusers/other-formats) loading guide.
## Supported pipelines
</Tip>
- [`StableDiffusionPipeline`]
- [`StableDiffusionImg2ImgPipeline`]
- [`StableDiffusionInpaintPipeline`]
- [`StableDiffusionControlNetPipeline`]
- [`StableDiffusionControlNetImg2ImgPipeline`]
- [`StableDiffusionControlNetInpaintPipeline`]
- [`StableDiffusionUpscalePipeline`]
- [`StableDiffusionXLPipeline`]
- [`StableDiffusionXLImg2ImgPipeline`]
- [`StableDiffusionXLInpaintPipeline`]
- [`StableDiffusionXLInstructPix2PixPipeline`]
- [`StableDiffusionXLControlNetPipeline`]
- [`StableDiffusionXLKDiffusionPipeline`]
- [`StableDiffusion3Pipeline`]
- [`LatentConsistencyModelPipeline`]
- [`LatentConsistencyModelImg2ImgPipeline`]
- [`StableDiffusionControlNetXSPipeline`]
- [`StableDiffusionXLControlNetXSPipeline`]
- [`LEditsPPPipelineStableDiffusion`]
- [`LEditsPPPipelineStableDiffusionXL`]
- [`PIAPipeline`]
## Supported models
- [`UNet2DConditionModel`]
- [`StableCascadeUNet`]
- [`AutoencoderKL`]
- [`ControlNetModel`]
- [`SD3Transformer2DModel`]
## FromSingleFileMixin
[[autodoc]] loaders.single_file.FromSingleFileMixin
## FromOriginalVAEMixin
## FromOriginalModelMixin
[[autodoc]] loaders.autoencoder.FromOriginalVAEMixin
## FromOriginalControlnetMixin
[[autodoc]] loaders.controlnet.FromOriginalControlNetMixin
[[autodoc]] loaders.single_file_model.FromOriginalModelMixin

View File

@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# UNet
Some training methods - like LoRA and Custom Diffusion - typically target the UNet's attention layers, but these training methods can also target other non-attention layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.LoraLoaderMixin.load_lora_weights`] function instead.
Some training methods - like LoRA and Custom Diffusion - typically target the UNet's attention layers, but these training methods can also target other non-attention layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`] function instead.
The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters.

View File

@@ -0,0 +1,19 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# AuraFlowTransformer2DModel
A Transformer model for image-like data from [AuraFlow](https://blog.fal.ai/auraflow/).
## AuraFlowTransformer2DModel
[[autodoc]] AuraFlowTransformer2DModel

View File

@@ -21,7 +21,7 @@ The abstract from the paper is:
## Loading from the original format
By default the [`AutoencoderKL`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
from the original format using [`FromOriginalVAEMixin.from_single_file`] as follows:
from the original format using [`FromOriginalModelMixin.from_single_file`] as follows:
```py
from diffusers import AutoencoderKL

View File

@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# ControlNet
# ControlNetModel
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
@@ -21,7 +21,7 @@ The abstract from the paper is:
## Loading from the original format
By default the [`ControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
from the original format using [`FromOriginalControlnetMixin.from_single_file`] as follows:
from the original format using [`FromOriginalModelMixin.from_single_file`] as follows:
```py
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

View File

@@ -0,0 +1,37 @@
<!--Copyright 2024 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# HunyuanDiT2DControlNetModel
HunyuanDiT2DControlNetModel is an implementation of ControlNet for [Hunyuan-DiT](https://arxiv.org/abs/2405.08748).
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
With a ControlNet model, you can provide an additional control image to condition and control Hunyuan-DiT generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
The abstract from the paper is:
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
This code is implemented by Tencent Hunyuan Team. You can find pre-trained checkpoints for Hunyuan-DiT ControlNets on [Tencent Hunyuan](https://huggingface.co/Tencent-Hunyuan).
## Example For Loading HunyuanDiT2DControlNetModel
```py
from diffusers import HunyuanDiT2DControlNetModel
import torch
controlnet = HunyuanDiT2DControlNetModel.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.1-ControlNet-Diffusers-Pose", torch_dtype=torch.float16)
```
## HunyuanDiT2DControlNetModel
[[autodoc]] HunyuanDiT2DControlNetModel

View File

@@ -0,0 +1,42 @@
<!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# SD3ControlNetModel
SD3ControlNetModel is an implementation of ControlNet for Stable Diffusion 3.
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
The abstract from the paper is:
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
## Loading from the original format
By default the [`SD3ControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`].
```py
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny")
pipe = StableDiffusion3ControlNetPipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", controlnet=controlnet)
```
## SD3ControlNetModel
[[autodoc]] SD3ControlNetModel
## SD3ControlNetOutput
[[autodoc]] models.controlnet_sd3.SD3ControlNetOutput

View File

@@ -0,0 +1,19 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# DiTTransformer2DModel
A Transformer model for image-like data from [DiT](https://huggingface.co/papers/2212.09748).
## DiTTransformer2DModel
[[autodoc]] DiTTransformer2DModel

View File

@@ -0,0 +1,20 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# HunyuanDiT2DModel
A Diffusion Transformer model for 2D data from [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT).
## HunyuanDiT2DModel
[[autodoc]] HunyuanDiT2DModel

View File

@@ -0,0 +1,19 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
## LatteTransformer3DModel
A Diffusion Transformer model for 3D data from [Latte](https://github.com/Vchitect/Latte).
## LatteTransformer3DModel
[[autodoc]] LatteTransformer3DModel

View File

@@ -0,0 +1,20 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# LuminaNextDiT2DModel
A Next Version of Diffusion Transformer model for 2D data from [Lumina-T2X](https://github.com/Alpha-VLLM/Lumina-T2X).
## LuminaNextDiT2DModel
[[autodoc]] LuminaNextDiT2DModel

View File

@@ -0,0 +1,19 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# PixArtTransformer2DModel
A Transformer model for image-like data from [PixArt-Alpha](https://huggingface.co/papers/2310.00426) and [PixArt-Sigma](https://huggingface.co/papers/2403.04692).
## PixArtTransformer2DModel
[[autodoc]] PixArtTransformer2DModel

View File

@@ -0,0 +1,19 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# SD3 Transformer Model
The Transformer model introduced in [Stable Diffusion 3](https://hf.co/papers/2403.03206). Its novelty lies in the MMDiT transformer block.
## SD3Transformer2DModel
[[autodoc]] SD3Transformer2DModel

View File

@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# Transformer2D
# Transformer2DModel
A Transformer model for image-like data from [CompVis](https://huggingface.co/CompVis) that is based on the [Vision Transformer](https://huggingface.co/papers/2010.11929) introduced by Dosovitskiy et al. The [`Transformer2DModel`] accepts discrete (classes of vector embeddings) or continuous (actual embeddings) inputs.
@@ -38,4 +38,4 @@ It is assumed one of the input classes is the masked latent pixel. The predicted
## Transformer2DModelOutput
[[autodoc]] models.transformers.transformer_2d.Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

View File

@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# Transformer Temporal
# TransformerTemporalModel
A Transformer model for video-like data.

View File

@@ -24,4 +24,4 @@ The abstract from the paper is:
## VQEncoderOutput
[[autodoc]] models.vq_model.VQEncoderOutput
[[autodoc]] models.autoencoders.vq_model.VQEncoderOutput

View File

@@ -78,7 +78,6 @@ output = pipe(
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
Here are some sample outputs:
@@ -101,6 +100,53 @@ AnimateDiff tends to work better with finetuned Stable Diffusion models. If you
</Tip>
### AnimateDiffSDXLPipeline
AnimateDiff can also be used with SDXL models. This is currently an experimental feature as only a beta release of the motion adapter checkpoint is available.
```python
import torch
from diffusers.models import MotionAdapter
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
from diffusers.utils import export_to_gif
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16)
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
scheduler = DDIMScheduler.from_pretrained(
model_id,
subfolder="scheduler",
clip_sample=False,
timestep_spacing="linspace",
beta_schedule="linear",
steps_offset=1,
)
pipe = AnimateDiffSDXLPipeline.from_pretrained(
model_id,
motion_adapter=adapter,
scheduler=scheduler,
torch_dtype=torch.float16,
variant="fp16",
).to("cuda")
# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()
output = pipe(
prompt="a panda surfing in the ocean, realistic, high quality",
negative_prompt="low quality, worst quality",
num_inference_steps=20,
guidance_scale=8,
width=1024,
height=1024,
num_frames=16,
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
### AnimateDiffVideoToVideoPipeline
AnimateDiff can also be used to generate visually similar videos or enable style/character/background or other edits starting from an initial video, allowing you to seamlessly explore creative possibilities.
@@ -118,7 +164,7 @@ from PIL import Image
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffVideoToVideoPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16).to("cuda")
pipe = AnimateDiffVideoToVideoPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16)
scheduler = DDIMScheduler.from_pretrained(
model_id,
subfolder="scheduler",
@@ -256,7 +302,6 @@ output = pipe(
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
<table>
@@ -331,7 +376,6 @@ output = pipe(
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
<table>
@@ -408,6 +452,29 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers)
</Tip>
<table>
<tr>
<th align=center>Without FreeInit enabled</th>
<th align=center>With FreeInit enabled</th>
</tr>
<tr>
<td align=center>
panda playing a guitar
<br />
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-no-freeinit.gif"
alt="panda playing a guitar"
style="width: 300px;" />
</td>
<td align=center>
panda playing a guitar
<br/>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-freeinit.gif"
alt="panda playing a guitar"
style="width: 300px;" />
</td>
</tr>
</table>
## Using AnimateLCM
[AnimateLCM](https://animatelcm.github.io/) is a motion module checkpoint and an [LCM LoRA](https://huggingface.co/docs/diffusers/using-diffusers/inference_with_lcm_lora) that have been created using a consistency learning strategy that decouples the distillation of the image generation priors and the motion generation priors.
@@ -493,12 +560,32 @@ export_to_gif(frames, "animatelcm-motion-lora.gif")
</table>
## Using `from_single_file` with the MotionAdapter
`diffusers>=0.30.0` supports loading the AnimateDiff checkpoints into the `MotionAdapter` in their original format via `from_single_file`
```python
from diffusers import MotionAdapter
ckpt_path = "https://huggingface.co/Lightricks/LongAnimateDiff/blob/main/lt_long_mm_32_frames.ckpt"
adapter = MotionAdapter.from_single_file(ckpt_path, torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained("emilianJR/epiCRealism", motion_adapter=adapter)
```
## AnimateDiffPipeline
[[autodoc]] AnimateDiffPipeline
- all
- __call__
## AnimateDiffSDXLPipeline
[[autodoc]] AnimateDiffSDXLPipeline
- all
- __call__
## AnimateDiffVideoToVideoPipeline
[[autodoc]] AnimateDiffVideoToVideoPipeline

View File

@@ -20,7 +20,8 @@ The abstract of the paper is the following:
*Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific objectives and biases that can significantly differ from those of other types. To bring us closer to a unified perspective of audio generation, this paper proposes a framework that utilizes the same learning method for speech, music, and sound effect generation. Our framework introduces a general representation of audio, called "language of audio" (LOA). Any audio can be translated into LOA based on AudioMAE, a self-supervised pre-trained representation learning model. In the generation process, we translate any modalities into LOA by using a GPT-2 model, and we perform self-supervised audio generation learning with a latent diffusion model conditioned on LOA. The proposed framework naturally brings advantages such as in-context learning abilities and reusable self-supervised pretrained AudioMAE and latent diffusion models. Experiments on the major benchmarks of text-to-audio, text-to-music, and text-to-speech demonstrate state-of-the-art or competitive performance against previous approaches. Our code, pretrained model, and demo are available at [this https URL](https://audioldm.github.io/audioldm2).*
This pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi). The original codebase can be found at [haoheliu/audioldm2](https://github.com/haoheliu/audioldm2).
This pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi) and [Nguyễn Công Tú Anh](https://github.com/tuanh123789). The original codebase can be
found at [haoheliu/audioldm2](https://github.com/haoheliu/audioldm2).
## Tips
@@ -36,6 +37,8 @@ See table below for details on the three checkpoints:
| [audioldm2](https://huggingface.co/cvssp/audioldm2) | Text-to-audio | 350M | 1.1B | 1150k |
| [audioldm2-large](https://huggingface.co/cvssp/audioldm2-large) | Text-to-audio | 750M | 1.5B | 1150k |
| [audioldm2-music](https://huggingface.co/cvssp/audioldm2-music) | Text-to-music | 350M | 1.1B | 665k |
| [audioldm2-gigaspeech](https://huggingface.co/anhnct/audioldm2_gigaspeech) | Text-to-speech | 350M | 1.1B |10k |
| [audioldm2-ljspeech](https://huggingface.co/anhnct/audioldm2_ljspeech) | Text-to-speech | 350M | 1.1B | |
### Constructing a prompt
@@ -53,7 +56,7 @@ See table below for details on the three checkpoints:
* The quality of the generated waveforms can vary significantly based on the seed. Try generating with different seeds until you find a satisfactory generation.
* Multiple waveforms can be generated in one go: set `num_waveforms_per_prompt` to a value greater than 1. Automatic scoring will be performed between the generated waveforms and prompt text, and the audios ranked from best to worst accordingly.
The following example demonstrates how to construct good music generation using the aforementioned tips: [example](https://huggingface.co/docs/diffusers/main/en/api/pipelines/audioldm2#diffusers.AudioLDM2Pipeline.__call__.example).
The following example demonstrates how to construct good music and speech generation using the aforementioned tips: [example](https://huggingface.co/docs/diffusers/main/en/api/pipelines/audioldm2#diffusers.AudioLDM2Pipeline.__call__.example).
<Tip>

View File

@@ -0,0 +1,29 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# AuraFlow
AuraFlow is inspired by [Stable Diffusion 3](../pipelines/stable_diffusion/stable_diffusion_3.md) and is by far the largest text-to-image generation model that comes with an Apache 2.0 license. This model achieves state-of-the-art results on the [GenEval](https://github.com/djghosh13/geneval) benchmark.
It was developed by the Fal team and more details about it can be found in [this blog post](https://blog.fal.ai/auraflow/).
<Tip>
AuraFlow can be quite expensive to run on consumer hardware devices. However, you can perform a suite of optimizations to run it faster and in a more memory-friendly manner. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details.
</Tip>
## AuraFlowPipeline
[[autodoc]] AuraFlowPipeline
- all
- __call__

View File

@@ -12,42 +12,10 @@ specific language governing permissions and limitations under the License.
# AutoPipeline
`AutoPipeline` is designed to:
1. make it easy for you to load a checkpoint for a task without knowing the specific pipeline class to use
2. use multiple pipelines in your workflow
Based on the task, the `AutoPipeline` class automatically retrieves the relevant pipeline given the name or path to the pretrained weights with the `from_pretrained()` method.
To seamlessly switch between tasks with the same checkpoint without reallocating additional memory, use the `from_pipe()` method to transfer the components from the original pipeline to the new one.
```py
from diffusers import AutoPipelineForText2Image
import torch
pipeline = AutoPipelineForText2Image.from_pretrained(
"runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipeline(prompt, num_inference_steps=25).images[0]
```
<Tip>
Check out the [AutoPipeline](../../tutorials/autopipeline) tutorial to learn how to use this API!
</Tip>
`AutoPipeline` supports text-to-image, image-to-image, and inpainting for the following diffusion models:
- [Stable Diffusion](./stable_diffusion/overview)
- [ControlNet](./controlnet)
- [Stable Diffusion XL (SDXL)](./stable_diffusion/stable_diffusion_xl)
- [DeepFloyd IF](./deepfloyd_if)
- [Kandinsky 2.1](./kandinsky)
- [Kandinsky 2.2](./kandinsky_v22)
The `AutoPipeline` is designed to make it easy to load a checkpoint for a task without needing to know the specific pipeline class. Based on the task, the `AutoPipeline` automatically retrieves the correct pipeline class from the checkpoint `model_index.json` file.
> [!TIP]
> Check out the [AutoPipeline](../../tutorials/autopipeline) tutorial to learn how to use this API!
## AutoPipelineForText2Image

View File

@@ -0,0 +1,36 @@
<!--Copyright 2024 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ControlNet with Hunyuan-DiT
HunyuanDiTControlNetPipeline is an implementation of ControlNet for [Hunyuan-DiT](https://arxiv.org/abs/2405.08748).
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
With a ControlNet model, you can provide an additional control image to condition and control Hunyuan-DiT generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
The abstract from the paper is:
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
This code is implemented by Tencent Hunyuan Team. You can find pre-trained checkpoints for Hunyuan-DiT ControlNets on [Tencent Hunyuan](https://huggingface.co/Tencent-Hunyuan).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## HunyuanDiTControlNetPipeline
[[autodoc]] HunyuanDiTControlNetPipeline
- all
- __call__

View File

@@ -0,0 +1,39 @@
<!--Copyright 2023 The HuggingFace Team and The InstantX Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ControlNet with Stable Diffusion 3
StableDiffusion3ControlNetPipeline is an implementation of ControlNet for Stable Diffusion 3.
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
The abstract from the paper is:
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
This code is implemented by [The InstantX Team](https://huggingface.co/InstantX). You can find pre-trained checkpoints for SD3-ControlNet on [The InstantX Team](https://huggingface.co/InstantX) Hub profile.
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## StableDiffusion3ControlNetPipeline
[[autodoc]] StableDiffusion3ControlNetPipeline
- all
- __call__
## StableDiffusion3PipelineOutput
[[autodoc]] pipelines.stable_diffusion_3.pipeline_output.StableDiffusion3PipelineOutput

View File

@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ControlNet-XS
ControlNet-XS was introduced in [ControlNet-XS](https://vislearn.github.io/ControlNet-XS/) by Denis Zavadski and Carsten Rother. It is based on the observation that the control model in the [original ControlNet](https://huggingface.co/papers/2302.05543) can be made much smaller and still produce good results.
@@ -12,5 +24,16 @@ Here's the overview from the [project page](https://vislearn.github.io/ControlNe
This model was contributed by [UmerHA](https://twitter.com/UmerHAdil). ❤️
<Tip>
> 🧠 Make sure to check out the Schedulers [guide](https://huggingface.co/docs/diffusers/main/en/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## StableDiffusionControlNetXSPipeline
[[autodoc]] StableDiffusionControlNetXSPipeline
- all
- __call__
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

View File

@@ -1,3 +1,15 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ControlNet-XS with Stable Diffusion XL
ControlNet-XS was introduced in [ControlNet-XS](https://vislearn.github.io/ControlNet-XS/) by Denis Zavadski and Carsten Rother. It is based on the observation that the control model in the [original ControlNet](https://huggingface.co/papers/2302.05543) can be made much smaller and still produce good results.
@@ -12,4 +24,22 @@ Here's the overview from the [project page](https://vislearn.github.io/ControlNe
This model was contributed by [UmerHA](https://twitter.com/UmerHAdil). ❤️
> 🧠 Make sure to check out the Schedulers [guide](https://huggingface.co/docs/diffusers/main/en/using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
<Tip warning={true}>
🧪 Many of the SDXL ControlNet checkpoints are experimental, and there is a lot of room for improvement. Feel free to open an [Issue](https://github.com/huggingface/diffusers/issues/new/choose) and leave us feedback on how we can improve!
</Tip>
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## StableDiffusionXLControlNetXSPipeline
[[autodoc]] StableDiffusionXLControlNetXSPipeline
- all
- __call__
## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

View File

@@ -0,0 +1,101 @@
<!--Copyright 2024 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Hunyuan-DiT
![chinese elements understanding](https://github.com/gnobitab/diffusers-hunyuan/assets/1157982/39b99036-c3cb-4f16-bb1a-40ec25eda573)
[Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding](https://arxiv.org/abs/2405.08748) from Tencent Hunyuan.
The abstract from the paper is:
*We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images. Finally, Hunyuan-DiT can perform multi-turn multimodal dialogue with users, generating and refining images according to the context. Through our holistic human evaluation protocol with more than 50 professional human evaluators, Hunyuan-DiT sets a new state-of-the-art in Chinese-to-image generation compared with other open-source models.*
You can find the original codebase at [Tencent/HunyuanDiT](https://github.com/Tencent/HunyuanDiT) and all the available checkpoints at [Tencent-Hunyuan](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT).
**Highlights**: HunyuanDiT supports Chinese/English-to-image, multi-resolution generation.
HunyuanDiT has the following components:
* It uses a diffusion transformer as the backbone
* It combines two text encoders, a bilingual CLIP and a multilingual T5 encoder
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
<Tip>
You can further improve generation quality by passing the generated image from [`HungyuanDiTPipeline`] to the [SDXL refiner](../../using-diffusers/sdxl#base-to-refiner-model) model.
</Tip>
## Optimization
You can optimize the pipeline's runtime and memory consumption with torch.compile and feed-forward chunking. To learn about other optimization methods, check out the [Speed up inference](../../optimization/fp16) and [Reduce memory usage](../../optimization/memory) guides.
### Inference
Use [`torch.compile`](https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion#torchcompile) to reduce the inference latency.
First, load the pipeline:
```python
from diffusers import HunyuanDiTPipeline
import torch
pipeline = HunyuanDiTPipeline.from_pretrained(
"Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
).to("cuda")
```
Then change the memory layout of the pipelines `transformer` and `vae` components to `torch.channels-last`:
```python
pipeline.transformer.to(memory_format=torch.channels_last)
pipeline.vae.to(memory_format=torch.channels_last)
```
Finally, compile the components and run inference:
```python
pipeline.transformer = torch.compile(pipeline.transformer, mode="max-autotune", fullgraph=True)
pipeline.vae.decode = torch.compile(pipeline.vae.decode, mode="max-autotune", fullgraph=True)
image = pipeline(prompt="一个宇航员在骑马").images[0]
```
The [benchmark](https://gist.github.com/sayakpaul/29d3a14905cfcbf611fe71ebd22e9b23) results on a 80GB A100 machine are:
```bash
With torch.compile(): Average inference time: 12.470 seconds.
Without torch.compile(): Average inference time: 20.570 seconds.
```
### Memory optimization
By loading the T5 text encoder in 8 bits, you can run the pipeline in just under 6 GBs of GPU VRAM. Refer to [this script](https://gist.github.com/sayakpaul/3154605f6af05b98a41081aaba5ca43e) for details.
Furthermore, you can use the [`~HunyuanDiT2DModel.enable_forward_chunking`] method to reduce memory usage. Feed-forward chunking runs the feed-forward layers in a transformer block in a loop instead of all at once. This gives you a trade-off between memory consumption and inference runtime.
```diff
+ pipeline.transformer.enable_forward_chunking(chunk_size=1, dim=1)
```
## HunyuanDiTPipeline
[[autodoc]] HunyuanDiTPipeline
- all
- __call__

View File

@@ -47,6 +47,7 @@ Sample output with I2VGenXL:
* Unlike SVD, it additionally accepts text prompts as inputs.
* It can generate higher resolution videos.
* When using the [`DDIMScheduler`] (which is default for this pipeline), less than 50 steps for inference leads to bad results.
* This implementation is 1-stage variant of I2VGenXL. The main figure in the [I2VGen-XL](https://arxiv.org/abs/2311.04145) paper shows a 2-stage variant, however, 1-stage variant works well. See [this discussion](https://github.com/huggingface/diffusers/discussions/7952) for more details.
## I2VGenXLPipeline
[[autodoc]] I2VGenXLPipeline

View File

@@ -11,7 +11,7 @@ specific language governing permissions and limitations under the License.
Kandinsky 3 is created by [Vladimir Arkhipkin](https://github.com/oriBetelgeuse),[Anastasia Maltseva](https://github.com/NastyaMittseva),[Igor Pavlov](https://github.com/boomb0om),[Andrei Filatov](https://github.com/anvilarth),[Arseniy Shakhmatov](https://github.com/cene555),[Andrey Kuznetsov](https://github.com/kuznetsoffandrey),[Denis Dimitrov](https://github.com/denndimitrov), [Zein Shaheen](https://github.com/zeinsh)
The description from it's Github page:
The description from it's GitHub page:
*Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*

View File

@@ -0,0 +1,49 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis
![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/kolors/kolors_header_collage.png)
Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by [the Kuaishou Kolors team](kwai-kolors@kuaishou.com). Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content. For more details, please refer to this [technical report](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf).
The abstract from the technical report is:
*We present Kolors, a latent diffusion model for text-to-image synthesis, characterized by its profound understanding of both English and Chinese, as well as an impressive degree of photorealism. There are three key insights contributing to the development of Kolors. Firstly, unlike large language model T5 used in Imagen and Stable Diffusion 3, Kolors is built upon the General Language Model (GLM), which enhances its comprehension capabilities in both English and Chinese. Moreover, we employ a multimodal large language model to recaption the extensive training dataset for fine-grained text understanding. These strategies significantly improve Kolors ability to comprehend intricate semantics, particularly those involving multiple entities, and enable its advanced text rendering capabilities. Secondly, we divide the training of Kolors into two phases: the concept learning phase with broad knowledge and the quality improvement phase with specifically curated high-aesthetic data. Furthermore, we investigate the critical role of the noise schedule and introduce a novel schedule to optimize high-resolution image generation. These strategies collectively enhance the visual appeal of the generated high-resolution images. Lastly, we propose a category-balanced benchmark KolorsPrompts, which serves as a guide for the training and evaluation of Kolors. Consequently, even when employing the commonly used U-Net backbone, Kolors has demonstrated remarkable performance in human evaluations, surpassing the existing open-source models and achieving Midjourney-v6 level performance, especially in terms of visual appeal. We will release the code and weights of Kolors at <https://github.com/Kwai-Kolors/Kolors>, and hope that it will benefit future research and applications in the visual generation community.*
## Usage Example
```python
import torch
from diffusers import DPMSolverMultistepScheduler, KolorsPipeline
pipe = KolorsPipeline.from_pretrained("Kwai-Kolors/Kolors-diffusers", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)
image = pipe(
prompt='一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着"可图"',
negative_prompt="",
guidance_scale=6.5,
num_inference_steps=25,
).images[0]
image.save("kolors_sample.png")
```
## KolorsPipeline
[[autodoc]] KolorsPipeline
- all
- __call__

View File

@@ -0,0 +1,75 @@
<!-- # Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->
# Latte
![latte text-to-video](https://github.com/Vchitect/Latte/blob/52bc0029899babbd6e9250384c83d8ed2670ff7a/visuals/latte.gif?raw=true)
[Latte: Latent Diffusion Transformer for Video Generation](https://arxiv.org/abs/2401.03048) from Monash University, Shanghai AI Lab, Nanjing University, and Nanyang Technological University.
The abstract from the paper is:
*We propose a novel Latent Diffusion Transformer, namely Latte, for video generation. Latte first extracts spatio-temporal tokens from input videos and then adopts a series of Transformer blocks to model video distribution in the latent space. In order to model a substantial number of tokens extracted from videos, four efficient variants are introduced from the perspective of decomposing the spatial and temporal dimensions of input videos. To improve the quality of generated videos, we determine the best practices of Latte through rigorous experimental analysis, including video clip patch embedding, model variants, timestep-class information injection, temporal positional embedding, and learning strategies. Our comprehensive evaluation demonstrates that Latte achieves state-of-the-art performance across four standard video generation datasets, i.e., FaceForensics, SkyTimelapse, UCF101, and Taichi-HD. In addition, we extend Latte to text-to-video generation (T2V) task, where Latte achieves comparable results compared to recent T2V models. We strongly believe that Latte provides valuable insights for future research on incorporating Transformers into diffusion models for video generation.*
**Highlights**: Latte is a latent diffusion transformer proposed as a backbone for modeling different modalities (trained for text-to-video generation here). It achieves state-of-the-art performance across four standard video benchmarks - [FaceForensics](https://arxiv.org/abs/1803.09179), [SkyTimelapse](https://arxiv.org/abs/1709.07592), [UCF101](https://arxiv.org/abs/1212.0402) and [Taichi-HD](https://arxiv.org/abs/2003.00196). To prepare and download the datasets for evaluation, please refer to [this https URL](https://github.com/Vchitect/Latte/blob/main/docs/datasets_evaluation.md).
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
### Inference
Use [`torch.compile`](https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion#torchcompile) to reduce the inference latency.
First, load the pipeline:
```python
import torch
from diffusers import LattePipeline
pipeline = LattePipeline.from_pretrained(
"maxin-cn/Latte-1", torch_dtype=torch.float16
).to("cuda")
```
Then change the memory layout of the pipelines `transformer` and `vae` components to `torch.channels-last`:
```python
pipeline.transformer.to(memory_format=torch.channels_last)
pipeline.vae.to(memory_format=torch.channels_last)
```
Finally, compile the components and run inference:
```python
pipeline.transformer = torch.compile(pipeline.transformer)
pipeline.vae.decode = torch.compile(pipeline.vae.decode)
video = pipeline(prompt="A dog wearing sunglasses floating in space, surreal, nebulae in background").frames[0]
```
The [benchmark](https://gist.github.com/a-r-r-o-w/4e1694ca46374793c0361d740a99ff19) results on an 80GB A100 machine are:
```
Without torch.compile(): Average inference time: 16.246 seconds.
With torch.compile(): Average inference time: 14.573 seconds.
```
## LattePipeline
[[autodoc]] LattePipeline
- all
- __call__

View File

@@ -0,0 +1,88 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Lumina-T2X
![concepts](https://github.com/Alpha-VLLM/Lumina-T2X/assets/54879512/9f52eabb-07dc-4881-8257-6d8a5f2a0a5a)
[Lumina-Next : Making Lumina-T2X Stronger and Faster with Next-DiT](https://github.com/Alpha-VLLM/Lumina-T2X/blob/main/assets/lumina-next.pdf) from Alpha-VLLM, OpenGVLab, Shanghai AI Laboratory.
The abstract from the paper is:
*Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers (Flag-DiT) that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lumina-Next, an improved version of Lumina-T2X, showcasing stronger generation performance with increased training and inference efficiency. We begin with a comprehensive analysis of the Flag-DiT architecture and identify several suboptimal components, which we address by introducing the Next-DiT architecture with 3D RoPE and sandwich normalizations. To enable better resolution extrapolation, we thoroughly compare different context extrapolation methods applied to text-to-image generation with 3D RoPE, and propose Frequency- and Time-Aware Scaled RoPE tailored for diffusion transformers. Additionally, we introduce a sigmoid time discretization schedule to reduce sampling steps in solving the Flow ODE and the Context Drop method to merge redundant visual tokens for faster network evaluation, effectively boosting the overall sampling speed. Thanks to these improvements, Lumina-Next not only improves the quality and efficiency of basic text-to-image generation but also demonstrates superior resolution extrapolation capabilities and multilingual generation using decoder-based LLMs as the text encoder, all in a zero-shot manner. To further validate Lumina-Next as a versatile generative framework, we instantiate it on diverse tasks including visual recognition, multi-view, audio, music, and point cloud generation, showcasing strong performance across these domains. By releasing all codes and model weights at https://github.com/Alpha-VLLM/Lumina-T2X, we aim to advance the development of next-generation generative AI capable of universal modeling.*
**Highlights**: Lumina-Next is a next-generation Diffusion Transformer that significantly enhances text-to-image generation, multilingual generation, and multitask performance by introducing the Next-DiT architecture, 3D RoPE, and frequency- and time-aware RoPE, among other improvements.
Lumina-Next has the following components:
* It improves sampling efficiency with fewer and faster Steps.
* It uses a Next-DiT as a transformer backbone with Sandwichnorm 3D RoPE, and Grouped-Query Attention.
* It uses a Frequency- and Time-Aware Scaled RoPE.
---
[Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers](https://arxiv.org/abs/2405.05945) from Alpha-VLLM, OpenGVLab, Shanghai AI Laboratory.
The abstract from the paper is:
*Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified framework designed to transform noise into images, videos, multi-view 3D objects, and audio clips conditioned on text instructions. By tokenizing the latent spatial-temporal space and incorporating learnable placeholders such as [nextline] and [nextframe] tokens, Lumina-T2X seamlessly unifies the representations of different modalities across various spatial-temporal resolutions. This unified approach enables training within a single framework for different modalities and allows for flexible generation of multimodal data at any resolution, aspect ratio, and length during inference. Advanced techniques like RoPE, RMSNorm, and flow matching enhance the stability, flexibility, and scalability of Flag-DiT, enabling models of Lumina-T2X to scale up to 7 billion parameters and extend the context window to 128K tokens. This is particularly beneficial for creating ultra-high-definition images with our Lumina-T2I model and long 720p videos with our Lumina-T2V model. Remarkably, Lumina-T2I, powered by a 5-billion-parameter Flag-DiT, requires only 35% of the training computational costs of a 600-million-parameter naive DiT. Our further comprehensive analysis underscores Lumina-T2X's preliminary capability in resolution extrapolation, high-resolution editing, generating consistent 3D views, and synthesizing videos with seamless transitions. We expect that the open-sourcing of Lumina-T2X will further foster creativity, transparency, and diversity in the generative AI community.*
You can find the original codebase at [Alpha-VLLM](https://github.com/Alpha-VLLM/Lumina-T2X) and all the available checkpoints at [Alpha-VLLM Lumina Family](https://huggingface.co/collections/Alpha-VLLM/lumina-family-66423205bedb81171fd0644b).
**Highlights**: Lumina-T2X supports Any Modality, Resolution, and Duration.
Lumina-T2X has the following components:
* It uses a Flow-based Large Diffusion Transformer as the backbone
* It supports different any modalities with one backbone and corresponding encoder, decoder.
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
### Inference (Text-to-Image)
Use [`torch.compile`](https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion#torchcompile) to reduce the inference latency.
First, load the pipeline:
```python
from diffusers import LuminaText2ImgPipeline
import torch
pipeline = LuminaText2ImgPipeline.from_pretrained(
"Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16
).to("cuda")
```
Then change the memory layout of the pipelines `transformer` and `vae` components to `torch.channels-last`:
```python
pipeline.transformer.to(memory_format=torch.channels_last)
pipeline.vae.to(memory_format=torch.channels_last)
```
Finally, compile the components and run inference:
```python
pipeline.transformer = torch.compile(pipeline.transformer, mode="max-autotune", fullgraph=True)
pipeline.vae.decode = torch.compile(pipeline.vae.decode, mode="max-autotune", fullgraph=True)
image = pipeline(prompt="Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps. Background shows an industrial revolution cityscape with smoky skies and tall, metal structures").images[0]
```
## LuminaText2ImgPipeline
[[autodoc]] LuminaText2ImgPipeline
- all
- __call__

View File

@@ -0,0 +1,76 @@
<!--Copyright 2024 Marigold authors and The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Marigold Pipelines for Computer Vision Tasks
![marigold](https://marigoldmonodepth.github.io/images/teaser_collage_compressed.jpg)
Marigold was proposed in [Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation](https://huggingface.co/papers/2312.02145), a CVPR 2024 Oral paper by [Bingxin Ke](http://www.kebingxin.com/), [Anton Obukhov](https://www.obukhov.ai/), [Shengyu Huang](https://shengyuh.github.io/), [Nando Metzger](https://nandometzger.github.io/), [Rodrigo Caye Daudt](https://rcdaudt.github.io/), and [Konrad Schindler](https://scholar.google.com/citations?user=FZuNgqIAAAAJ&hl=en).
The idea is to repurpose the rich generative prior of Text-to-Image Latent Diffusion Models (LDMs) for traditional computer vision tasks.
Initially, this idea was explored to fine-tune Stable Diffusion for Monocular Depth Estimation, as shown in the teaser above.
Later,
- [Tianfu Wang](https://tianfwang.github.io/) trained the first Latent Consistency Model (LCM) of Marigold, which unlocked fast single-step inference;
- [Kevin Qu](https://www.linkedin.com/in/kevin-qu-b3417621b/?locale=en_US) extended the approach to Surface Normals Estimation;
- [Anton Obukhov](https://www.obukhov.ai/) contributed the pipelines and documentation into diffusers (enabled and supported by [YiYi Xu](https://yiyixuxu.github.io/) and [Sayak Paul](https://sayak.dev/)).
The abstract from the paper is:
*Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth from a single image is geometrically ill-posed and requires scene understanding, so it is not surprising that the rise of deep learning has led to a breakthrough. The impressive progress of monocular depth estimators has mirrored the growth in model capacity, from relatively modest CNNs to large Transformer architectures. Still, monocular depth estimators tend to struggle when presented with images with unfamiliar content and layout, since their knowledge of the visual world is restricted by the data seen during training, and challenged by zero-shot generalization to new domains. This motivates us to explore whether the extensive priors captured in recent generative diffusion models can enable better, more generalizable depth estimation. We introduce Marigold, a method for affine-invariant monocular depth estimation that is derived from Stable Diffusion and retains its rich prior knowledge. The estimator can be fine-tuned in a couple of days on a single GPU using only synthetic training data. It delivers state-of-the-art performance across a wide range of datasets, including over 20% performance gains in specific cases. Project page: https://marigoldmonodepth.github.io.*
## Available Pipelines
Each pipeline supports one Computer Vision task, which takes an input RGB image as input and produces a *prediction* of the modality of interest, such as a depth map of the input image.
Currently, the following tasks are implemented:
| Pipeline | Predicted Modalities | Demos |
|---------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------:|
| [MarigoldDepthPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/marigold/pipeline_marigold_depth.py) | [Depth](https://en.wikipedia.org/wiki/Depth_map), [Disparity](https://en.wikipedia.org/wiki/Binocular_disparity) | [Fast Demo (LCM)](https://huggingface.co/spaces/prs-eth/marigold-lcm), [Slow Original Demo (DDIM)](https://huggingface.co/spaces/prs-eth/marigold) |
| [MarigoldNormalsPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/marigold/pipeline_marigold_normals.py) | [Surface normals](https://en.wikipedia.org/wiki/Normal_mapping) | [Fast Demo (LCM)](https://huggingface.co/spaces/prs-eth/marigold-normals-lcm) |
## Available Checkpoints
The original checkpoints can be found under the [PRS-ETH](https://huggingface.co/prs-eth/) Hugging Face organization.
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines. Also, to know more about reducing the memory usage of this pipeline, refer to the ["Reduce memory usage"] section [here](../../using-diffusers/svd#reduce-memory-usage).
</Tip>
<Tip warning={true}>
Marigold pipelines were designed and tested only with `DDIMScheduler` and `LCMScheduler`.
Depending on the scheduler, the number of inference steps required to get reliable predictions varies, and there is no universal value that works best across schedulers.
Because of that, the default value of `num_inference_steps` in the `__call__` method of the pipeline is set to `None` (see the API reference).
Unless set explicitly, its value will be taken from the checkpoint configuration `model_index.json`.
This is done to ensure high-quality predictions when calling the pipeline with just the `image` argument.
</Tip>
See also Marigold [usage examples](marigold_usage).
## MarigoldDepthPipeline
[[autodoc]] MarigoldDepthPipeline
- all
- __call__
## MarigoldNormalsPipeline
[[autodoc]] MarigoldNormalsPipeline
- all
- __call__
## MarigoldDepthOutput
[[autodoc]] pipelines.marigold.pipeline_marigold_depth.MarigoldDepthOutput
## MarigoldNormalsOutput
[[autodoc]] pipelines.marigold.pipeline_marigold_normals.MarigoldNormalsOutput

View File

@@ -97,6 +97,11 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
- to
- components
[[autodoc]] pipelines.StableDiffusionMixin.enable_freeu
[[autodoc]] pipelines.StableDiffusionMixin.disable_freeu
## FlaxDiffusionPipeline
[[autodoc]] pipelines.pipeline_flax_utils.FlaxDiffusionPipeline

View File

@@ -0,0 +1,51 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Perturbed-Attention Guidance
[Perturbed-Attention Guidance (PAG)](https://ku-cvlab.github.io/Perturbed-Attention-Guidance/) is a new diffusion sampling guidance that improves sample quality across both unconditional and conditional settings, achieving this without requiring further training or the integration of external modules.
PAG was introduced in [Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance](https://huggingface.co/papers/2403.17377) by Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin and Seungryong Kim.
The abstract from the paper is:
*Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel sampling guidance, called Perturbed-Attention Guidance (PAG), which improves diffusion sample quality across both unconditional and conditional settings, achieving this without requiring additional training or the integration of external modules. PAG is designed to progressively enhance the structure of samples throughout the denoising process. It involves generating intermediate samples with degraded structure by substituting selected self-attention maps in diffusion U-Net with an identity matrix, by considering the self-attention mechanisms' ability to capture structural information, and guiding the denoising process away from these degraded samples. In both ADM and Stable Diffusion, PAG surprisingly improves sample quality in conditional and even unconditional scenarios. Moreover, PAG significantly improves the baseline performance in various downstream tasks where existing guidances such as CG or CFG cannot be fully utilized, including ControlNet with empty prompts and image restoration such as inpainting and deblurring.*
## StableDiffusionPAGPipeline
[[autodoc]] StableDiffusionPAGPipeline
- all
- __call__
## StableDiffusionControlNetPAGPipeline
[[autodoc]] StableDiffusionControlNetPAGPipeline
- all
- __call__
## StableDiffusionXLPAGPipeline
[[autodoc]] StableDiffusionXLPAGPipeline
- all
- __call__
## StableDiffusionXLPAGImg2ImgPipeline
[[autodoc]] StableDiffusionXLPAGImg2ImgPipeline
- all
- __call__
## StableDiffusionXLPAGInpaintPipeline
[[autodoc]] StableDiffusionXLPAGInpaintPipeline
- all
- __call__
## StableDiffusionXLControlNetPAGPipeline
[[autodoc]] StableDiffusionXLControlNetPAGPipeline
- all
- __call__

View File

@@ -31,7 +31,7 @@ Some notes about this pipeline:
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
@@ -75,7 +75,7 @@ with torch.no_grad():
prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt)
```
Since text embeddings have been computed, remove the `text_encoder` and `pipe` from the memory, and free up som GPU VRAM:
Since text embeddings have been computed, remove the `text_encoder` and `pipe` from the memory, and free up some GPU VRAM:
```python
import gc
@@ -146,4 +146,3 @@ While loading the `text_encoder`, you set `load_in_8bit` to `True`. You could al
[[autodoc]] PixArtAlphaPipeline
- all
- __call__

View File

@@ -0,0 +1,155 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# PixArt-Σ
![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/pixart/header_collage_sigma.jpg)
[PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation](https://huggingface.co/papers/2403.04692) is Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li.
The abstract from the paper is:
*In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. PixArt-Σ represents a significant advancement over its predecessor, PixArt-α, offering images of markedly higher fidelity and improved alignment with text prompts. A key feature of PixArt-Σ is its training efficiency. Leveraging the foundational pre-training of PixArt-α, it evolves from the weaker baseline to a stronger model via incorporating higher quality data, a process we term “weak-to-strong training”. The advancements in PixArt-Σ are twofold: (1) High-Quality Training Data: PixArt-Σ incorporates superior-quality image data, paired with more precise and detailed image captions. (2) Efficient Token Compression: we propose a novel attention module within the DiT framework that compresses both keys and values, significantly improving efficiency and facilitating ultra-high-resolution image generation. Thanks to these improvements, PixArt-Σ achieves superior image quality and user prompt adherence capabilities with significantly smaller model size (0.6B parameters) than existing text-to-image diffusion models, such as SDXL (2.6B parameters) and SD Cascade (5.1B parameters). Moreover, PixArt-Σs capability to generate 4K images supports the creation of high-resolution posters and wallpapers, efficiently bolstering the production of highquality visual content in industries such as film and gaming.*
You can find the original codebase at [PixArt-alpha/PixArt-sigma](https://github.com/PixArt-alpha/PixArt-sigma) and all the available checkpoints at [PixArt-alpha](https://huggingface.co/PixArt-alpha).
Some notes about this pipeline:
* It uses a Transformer backbone (instead of a UNet) for denoising. As such it has a similar architecture as [DiT](https://hf.co/docs/transformers/model_doc/dit).
* It was trained using text conditions computed from T5. This aspect makes the pipeline better at following complex text prompts with intricate details.
* It is good at producing high-resolution images at different aspect ratios. To get the best results, the authors recommend some size brackets which can be found [here](https://github.com/PixArt-alpha/PixArt-sigma/blob/master/diffusion/data/datasets/utils.py).
* It rivals the quality of state-of-the-art text-to-image generation systems (as of this writing) such as PixArt-α, Stable Diffusion XL, Playground V2.0 and DALL-E 3, while being more efficient than them.
* It shows the ability of generating super high resolution images, such as 2048px or even 4K.
* It shows that text-to-image models can grow from a weak model to a stronger one through several improvements (VAEs, datasets, and so on.)
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
<Tip>
You can further improve generation quality by passing the generated image from [`PixArtSigmaPipeline`] to the [SDXL refiner](../../using-diffusers/sdxl#base-to-refiner-model) model.
</Tip>
## Inference with under 8GB GPU VRAM
Run the [`PixArtSigmaPipeline`] with under 8GB GPU VRAM by loading the text encoder in 8-bit precision. Let's walk through a full-fledged example.
First, install the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) library:
```bash
pip install -U bitsandbytes
```
Then load the text encoder in 8-bit:
```python
from transformers import T5EncoderModel
from diffusers import PixArtSigmaPipeline
import torch
text_encoder = T5EncoderModel.from_pretrained(
"PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
subfolder="text_encoder",
load_in_8bit=True,
device_map="auto",
)
pipe = PixArtSigmaPipeline.from_pretrained(
"PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
text_encoder=text_encoder,
transformer=None,
device_map="balanced"
)
```
Now, use the `pipe` to encode a prompt:
```python
with torch.no_grad():
prompt = "cute cat"
prompt_embeds, prompt_attention_mask, negative_embeds, negative_prompt_attention_mask = pipe.encode_prompt(prompt)
```
Since text embeddings have been computed, remove the `text_encoder` and `pipe` from the memory, and free up some GPU VRAM:
```python
import gc
def flush():
gc.collect()
torch.cuda.empty_cache()
del text_encoder
del pipe
flush()
```
Then compute the latents with the prompt embeddings as inputs:
```python
pipe = PixArtSigmaPipeline.from_pretrained(
"PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
text_encoder=None,
torch_dtype=torch.float16,
).to("cuda")
latents = pipe(
negative_prompt=None,
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_embeds,
prompt_attention_mask=prompt_attention_mask,
negative_prompt_attention_mask=negative_prompt_attention_mask,
num_images_per_prompt=1,
output_type="latent",
).images
del pipe.transformer
flush()
```
<Tip>
Notice that while initializing `pipe`, you're setting `text_encoder` to `None` so that it's not loaded.
</Tip>
Once the latents are computed, pass it off to the VAE to decode into a real image:
```python
with torch.no_grad():
image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor, return_dict=False)[0]
image = pipe.image_processor.postprocess(image, output_type="pil")[0]
image.save("cat.png")
```
By deleting components you aren't using and flushing the GPU VRAM, you should be able to run [`PixArtSigmaPipeline`] with under 8GB GPU VRAM.
![](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/pixart/8bits_cat.png)
If you want a report of your memory-usage, run this [script](https://gist.github.com/sayakpaul/3ae0f847001d342af27018a96f467e4e).
<Tip warning={true}>
Text embeddings computed in 8-bit can impact the quality of the generated images because of the information loss in the representation space caused by the reduced precision. It's recommended to compare the outputs with and without 8-bit.
</Tip>
While loading the `text_encoder`, you set `load_in_8bit` to `True`. You could also specify `load_in_4bit` to bring your memory requirements down even further to under 7GB.
## PixArtSigmaPipeline
[[autodoc]] PixArtSigmaPipeline
- all
- __call__

View File

@@ -10,9 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->
# Text-to-Image Generation with Adapter Conditioning
## Overview
# T2I-Adapter
[T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.08453) by Chong Mou, Xintao Wang, Liangbin Xie, Jian Zhang, Zhongang Qi, Ying Shan, Xiaohu Qie.
@@ -24,219 +22,8 @@ The abstract of the paper is the following:
This model was contributed by the community contributor [HimariO](https://github.com/HimariO) ❤️ .
## Available Pipelines:
| Pipeline | Tasks | Demo
|---|---|:---:|
| [StableDiffusionAdapterPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/t2i_adapter/pipeline_stable_diffusion_adapter.py) | *Text-to-Image Generation with T2I-Adapter Conditioning* | -
| [StableDiffusionXLAdapterPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/t2i_adapter/pipeline_stable_diffusion_xl_adapter.py) | *Text-to-Image Generation with T2I-Adapter Conditioning on StableDiffusion-XL* | -
## Usage example with the base model of StableDiffusion-1.4/1.5
In the following we give a simple example of how to use a *T2I-Adapter* checkpoint with Diffusers for inference based on StableDiffusion-1.4/1.5.
All adapters use the same pipeline.
1. Images are first converted into the appropriate *control image* format.
2. The *control image* and *prompt* are passed to the [`StableDiffusionAdapterPipeline`].
Let's have a look at a simple example using the [Color Adapter](https://huggingface.co/TencentARC/t2iadapter_color_sd14v1).
```python
from diffusers.utils import load_image, make_image_grid
image = load_image("https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/color_ref.png")
```
![img](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/color_ref.png)
Then we can create our color palette by simply resizing it to 8 by 8 pixels and then scaling it back to original size.
```python
from PIL import Image
color_palette = image.resize((8, 8))
color_palette = color_palette.resize((512, 512), resample=Image.Resampling.NEAREST)
```
Let's take a look at the processed image.
![img](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/color_palette.png)
Next, create the adapter pipeline
```py
import torch
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter
adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_color_sd14v1", torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
adapter=adapter,
torch_dtype=torch.float16,
)
pipe.to("cuda")
```
Finally, pass the prompt and control image to the pipeline
```py
# fix the random seed, so you will get the same result as the example
generator = torch.Generator("cuda").manual_seed(7)
out_image = pipe(
"At night, glowing cubes in front of the beach",
image=color_palette,
generator=generator,
).images[0]
make_image_grid([image, color_palette, out_image], rows=1, cols=3)
```
![img](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/color_output.png)
## Usage example with the base model of StableDiffusion-XL
In the following we give a simple example of how to use a *T2I-Adapter* checkpoint with Diffusers for inference based on StableDiffusion-XL.
All adapters use the same pipeline.
1. Images are first downloaded into the appropriate *control image* format.
2. The *control image* and *prompt* are passed to the [`StableDiffusionXLAdapterPipeline`].
Let's have a look at a simple example using the [Sketch Adapter](https://huggingface.co/Adapter/t2iadapter/tree/main/sketch_sdxl_1.0).
```python
from diffusers.utils import load_image, make_image_grid
sketch_image = load_image("https://huggingface.co/Adapter/t2iadapter/resolve/main/sketch.png").convert("L")
```
![img](https://huggingface.co/Adapter/t2iadapter/resolve/main/sketch.png)
Then, create the adapter pipeline
```py
import torch
from diffusers import (
T2IAdapter,
StableDiffusionXLAdapterPipeline,
DDPMScheduler
)
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
adapter = T2IAdapter.from_pretrained("Adapter/t2iadapter", subfolder="sketch_sdxl_1.0", torch_dtype=torch.float16, adapter_type="full_adapter_xl")
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
model_id, adapter=adapter, safety_checker=None, torch_dtype=torch.float16, variant="fp16", scheduler=scheduler
)
pipe.to("cuda")
```
Finally, pass the prompt and control image to the pipeline
```py
# fix the random seed, so you will get the same result as the example
generator = torch.Generator().manual_seed(42)
sketch_image_out = pipe(
prompt="a photo of a dog in real world, high quality",
negative_prompt="extra digit, fewer digits, cropped, worst quality, low quality",
image=sketch_image,
generator=generator,
guidance_scale=7.5
).images[0]
make_image_grid([sketch_image, sketch_image_out], rows=1, cols=2)
```
![img](https://huggingface.co/Adapter/t2iadapter/resolve/main/sketch_output.png)
## Available checkpoints
Non-diffusers checkpoints can be found under [TencentARC/T2I-Adapter](https://huggingface.co/TencentARC/T2I-Adapter/tree/main/models).
### T2I-Adapter with Stable Diffusion 1.4
| Model Name | Control Image Overview| Control Image Example | Generated Image Example |
|---|---|---|---|
|[TencentARC/t2iadapter_color_sd14v1](https://huggingface.co/TencentARC/t2iadapter_color_sd14v1)<br/> *Trained with spatial color palette* | An image with 8x8 color palette.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/color_sample_input.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/color_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/color_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/color_sample_output.png"/></a>|
|[TencentARC/t2iadapter_canny_sd14v1](https://huggingface.co/TencentARC/t2iadapter_canny_sd14v1)<br/> *Trained with canny edge detection* | A monochrome image with white edges on a black background.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/canny_sample_input.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/canny_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/canny_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/canny_sample_output.png"/></a>|
|[TencentARC/t2iadapter_sketch_sd14v1](https://huggingface.co/TencentARC/t2iadapter_sketch_sd14v1)<br/> *Trained with [PidiNet](https://github.com/zhuoinoulu/pidinet) edge detection* | A hand-drawn monochrome image with white outlines on a black background.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/sketch_sample_input.png"><img width="64" style="margin:0;padding:0;" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/sketch_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/sketch_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/sketch_sample_output.png"/></a>|
|[TencentARC/t2iadapter_depth_sd14v1](https://huggingface.co/TencentARC/t2iadapter_depth_sd14v1)<br/> *Trained with Midas depth estimation* | A grayscale image with black representing deep areas and white representing shallow areas.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/depth_sample_input.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/depth_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/depth_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/depth_sample_output.png"/></a>|
|[TencentARC/t2iadapter_openpose_sd14v1](https://huggingface.co/TencentARC/t2iadapter_openpose_sd14v1)<br/> *Trained with OpenPose bone image* | A [OpenPose bone](https://github.com/CMU-Perceptual-Computing-Lab/openpose) image.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/openpose_sample_input.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/openpose_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/openpose_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/openpose_sample_output.png"/></a>|
|[TencentARC/t2iadapter_keypose_sd14v1](https://huggingface.co/TencentARC/t2iadapter_keypose_sd14v1)<br/> *Trained with mmpose skeleton image* | A [mmpose skeleton](https://github.com/open-mmlab/mmpose) image.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/keypose_sample_input.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/keypose_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/keypose_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/keypose_sample_output.png"/></a>|
|[TencentARC/t2iadapter_seg_sd14v1](https://huggingface.co/TencentARC/t2iadapter_seg_sd14v1)<br/>*Trained with semantic segmentation* | An [custom](https://github.com/TencentARC/T2I-Adapter/discussions/25) segmentation protocol image.|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/seg_sample_input.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/seg_sample_input.png"/></a>|<a href="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/seg_sample_output.png"><img width="64" src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/seg_sample_output.png"/></a> |
|[TencentARC/t2iadapter_canny_sd15v2](https://huggingface.co/TencentARC/t2iadapter_canny_sd15v2)||
|[TencentARC/t2iadapter_depth_sd15v2](https://huggingface.co/TencentARC/t2iadapter_depth_sd15v2)||
|[TencentARC/t2iadapter_sketch_sd15v2](https://huggingface.co/TencentARC/t2iadapter_sketch_sd15v2)||
|[TencentARC/t2iadapter_zoedepth_sd15v1](https://huggingface.co/TencentARC/t2iadapter_zoedepth_sd15v1)||
|[Adapter/t2iadapter, subfolder='sketch_sdxl_1.0'](https://huggingface.co/Adapter/t2iadapter/tree/main/sketch_sdxl_1.0)||
|[Adapter/t2iadapter, subfolder='canny_sdxl_1.0'](https://huggingface.co/Adapter/t2iadapter/tree/main/canny_sdxl_1.0)||
|[Adapter/t2iadapter, subfolder='openpose_sdxl_1.0'](https://huggingface.co/Adapter/t2iadapter/tree/main/openpose_sdxl_1.0)||
## Combining multiple adapters
[`MultiAdapter`] can be used for applying multiple conditionings at once.
Here we use the keypose adapter for the character posture and the depth adapter for creating the scene.
```py
from diffusers.utils import load_image, make_image_grid
cond_keypose = load_image(
"https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/keypose_sample_input.png"
)
cond_depth = load_image(
"https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/depth_sample_input.png"
)
cond = [cond_keypose, cond_depth]
prompt = ["A man walking in an office room with a nice view"]
```
The two control images look as such:
![img](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/keypose_sample_input.png)
![img](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/depth_sample_input.png)
`MultiAdapter` combines keypose and depth adapters.
`adapter_conditioning_scale` balances the relative influence of the different adapters.
```py
import torch
from diffusers import StableDiffusionAdapterPipeline, MultiAdapter, T2IAdapter
adapters = MultiAdapter(
[
T2IAdapter.from_pretrained("TencentARC/t2iadapter_keypose_sd14v1"),
T2IAdapter.from_pretrained("TencentARC/t2iadapter_depth_sd14v1"),
]
)
adapters = adapters.to(torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
torch_dtype=torch.float16,
adapter=adapters,
).to("cuda")
image = pipe(prompt, cond, adapter_conditioning_scale=[0.8, 0.8]).images[0]
make_image_grid([cond_keypose, cond_depth, image], rows=1, cols=3)
```
![img](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/t2i-adapter/keypose_depth_sample_output.png)
## T2I-Adapter vs ControlNet
T2I-Adapter is similar to [ControlNet](https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet).
T2I-Adapter uses a smaller auxiliary network which is only run once for the entire diffusion process.
However, T2I-Adapter performs slightly worse than ControlNet.
## StableDiffusionAdapterPipeline
[[autodoc]] StableDiffusionAdapterPipeline
- all
- __call__
@@ -248,6 +35,7 @@ However, T2I-Adapter performs slightly worse than ControlNet.
- disable_xformers_memory_efficient_attention
## StableDiffusionXLAdapterPipeline
[[autodoc]] StableDiffusionXLAdapterPipeline
- all
- __call__

View File

@@ -177,7 +177,7 @@ inpaint = StableDiffusionInpaintPipeline(**text2img.components)
The Stable Diffusion pipelines are automatically supported in [Gradio](https://github.com/gradio-app/gradio/), a library that makes creating beautiful and user-friendly machine learning apps on the web a breeze. First, make sure you have Gradio installed:
```
```sh
pip install -U gradio
```

View File

@@ -48,7 +48,7 @@ from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
import torch
repo_id = "stabilityai/stable-diffusion-2-base"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
@@ -72,7 +72,7 @@ init_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))
repo_id = "stabilityai/stable-diffusion-2-inpainting"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

View File

@@ -0,0 +1,315 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Stable Diffusion 3
Stable Diffusion 3 (SD3) was proposed in [Scaling Rectified Flow Transformers for High-Resolution Image Synthesis](https://arxiv.org/pdf/2403.03206.pdf) by Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Muller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach.
The abstract from the paper is:
*Diffusion models create data from noise by inverting the forward paths of data towards noise and have emerged as a powerful generative modeling technique for high-dimensional, perceptual data such as images and videos. Rectified flow is a recent generative model formulation that connects data and noise in a straight line. Despite its better theoretical properties and conceptual simplicity, it is not yet decisively established as standard practice. In this work, we improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales. Through a large-scale study, we demonstrate the superior performance of this approach compared to established diffusion formulations for high-resolution text-to-image synthesis. Additionally, we present a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens, improving text comprehension typography, and human preference ratings. We demonstrate that this architecture follows predictable scaling trends and correlates lower validation loss to improved text-to-image synthesis as measured by various metrics and human evaluations.*
## Usage Example
_As the model is gated, before using it with diffusers you first need to go to the [Stable Diffusion 3 Medium Hugging Face page](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers), fill in the form and accept the gate. Once you are in, you need to login so that your system knows youve accepted the gate._
Use the command below to log in:
```bash
huggingface-cli login
```
<Tip>
The SD3 pipeline uses three text encoders to generate an image. Model offloading is necessary in order for it to run on most commodity hardware. Please use the `torch.float16` data type for additional memory savings.
</Tip>
```python
import torch
from diffusers import StableDiffusion3Pipeline
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe.to("cuda")
image = pipe(
prompt="a photo of a cat holding a sign that says hello world",
negative_prompt="",
num_inference_steps=28,
height=1024,
width=1024,
guidance_scale=7.0,
).images[0]
image.save("sd3_hello_world.png")
```
## Memory Optimisations for SD3
SD3 uses three text encoders, one if which is the very large T5-XXL model. This makes it challenging to run the model on GPUs with less than 24GB of VRAM, even when using `fp16` precision. The following section outlines a few memory optimizations in Diffusers that make it easier to run SD3 on low resource hardware.
### Running Inference with Model Offloading
The most basic memory optimization available in Diffusers allows you to offload the components of the model to CPU during inference in order to save memory, while seeing a slight increase in inference latency. Model offloading will only move a model component onto the GPU when it needs to be executed, while keeping the remaining components on the CPU.
```python
import torch
from diffusers import StableDiffusion3Pipeline
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
image = pipe(
prompt="a photo of a cat holding a sign that says hello world",
negative_prompt="",
num_inference_steps=28,
height=1024,
width=1024,
guidance_scale=7.0,
).images[0]
image.save("sd3_hello_world.png")
```
### Dropping the T5 Text Encoder during Inference
Removing the memory-intensive 4.7B parameter T5-XXL text encoder during inference can significantly decrease the memory requirements for SD3 with only a slight loss in performance.
```python
import torch
from diffusers import StableDiffusion3Pipeline
pipe = StableDiffusion3Pipeline.from_pretrained(
"stabilityai/stable-diffusion-3-medium-diffusers",
text_encoder_3=None,
tokenizer_3=None,
torch_dtype=torch.float16
)
pipe.to("cuda")
image = pipe(
prompt="a photo of a cat holding a sign that says hello world",
negative_prompt="",
num_inference_steps=28,
height=1024,
width=1024,
guidance_scale=7.0,
).images[0]
image.save("sd3_hello_world-no-T5.png")
```
### Using a Quantized Version of the T5 Text Encoder
We can leverage the `bitsandbytes` library to load and quantize the T5-XXL text encoder to 8-bit precision. This allows you to keep using all three text encoders while only slightly impacting performance.
First install the `bitsandbytes` library.
```shell
pip install bitsandbytes
```
Then load the T5-XXL model using the `BitsAndBytesConfig`.
```python
import torch
from diffusers import StableDiffusion3Pipeline
from transformers import T5EncoderModel, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model_id = "stabilityai/stable-diffusion-3-medium-diffusers"
text_encoder = T5EncoderModel.from_pretrained(
model_id,
subfolder="text_encoder_3",
quantization_config=quantization_config,
)
pipe = StableDiffusion3Pipeline.from_pretrained(
model_id,
text_encoder_3=text_encoder,
device_map="balanced",
torch_dtype=torch.float16
)
image = pipe(
prompt="a photo of a cat holding a sign that says hello world",
negative_prompt="",
num_inference_steps=28,
height=1024,
width=1024,
guidance_scale=7.0,
).images[0]
image.save("sd3_hello_world-8bit-T5.png")
```
You can find the end-to-end script [here](https://gist.github.com/sayakpaul/82acb5976509851f2db1a83456e504f1).
## Performance Optimizations for SD3
### Using Torch Compile to Speed Up Inference
Using compiled components in the SD3 pipeline can speed up inference by as much as 4X. The following code snippet demonstrates how to compile the Transformer and VAE components of the SD3 pipeline.
```python
import torch
from diffusers import StableDiffusion3Pipeline
torch.set_float32_matmul_precision("high")
torch._inductor.config.conv_1x1_as_mm = True
torch._inductor.config.coordinate_descent_tuning = True
torch._inductor.config.epilogue_fusion = False
torch._inductor.config.coordinate_descent_check_all_directions = True
pipe = StableDiffusion3Pipeline.from_pretrained(
"stabilityai/stable-diffusion-3-medium-diffusers",
torch_dtype=torch.float16
).to("cuda")
pipe.set_progress_bar_config(disable=True)
pipe.transformer.to(memory_format=torch.channels_last)
pipe.vae.to(memory_format=torch.channels_last)
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)
pipe.vae.decode = torch.compile(pipe.vae.decode, mode="max-autotune", fullgraph=True)
# Warm Up
prompt = "a photo of a cat holding a sign that says hello world"
for _ in range(3):
_ = pipe(prompt=prompt, generator=torch.manual_seed(1))
# Run Inference
image = pipe(prompt=prompt, generator=torch.manual_seed(1)).images[0]
image.save("sd3_hello_world.png")
```
Check out the full script [here](https://gist.github.com/sayakpaul/508d89d7aad4f454900813da5d42ca97).
## Using Long Prompts with the T5 Text Encoder
By default, the T5 Text Encoder prompt uses a maximum sequence length of `256`. This can be adjusted by setting the `max_sequence_length` to accept fewer or more tokens. Keep in mind that longer sequences require additional resources and result in longer generation times, such as during batch inference.
```python
prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creatures body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"
image = pipe(
prompt=prompt,
negative_prompt="",
num_inference_steps=28,
guidance_scale=4.5,
max_sequence_length=512,
).images[0]
```
### Sending a different prompt to the T5 Text Encoder
You can send a different prompt to the CLIP Text Encoders and the T5 Text Encoder to prevent the prompt from being truncated by the CLIP Text Encoders and to improve generation.
<Tip>
The prompt with the CLIP Text Encoders is still truncated to the 77 token limit.
</Tip>
```python
prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. A river of warm, melted butter, pancake-like foliage in the background, a towering pepper mill standing in for a tree."
prompt_3 = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creatures body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"
image = pipe(
prompt=prompt,
prompt_3=prompt_3,
negative_prompt="",
num_inference_steps=28,
guidance_scale=4.5,
max_sequence_length=512,
).images[0]
```
## Tiny AutoEncoder for Stable Diffusion 3
Tiny AutoEncoder for Stable Diffusion (TAESD3) is a tiny distilled version of Stable Diffusion 3's VAE by [Ollin Boer Bohan](https://github.com/madebyollin/taesd) that can decode [`StableDiffusion3Pipeline`] latents almost instantly.
To use with Stable Diffusion 3:
```python
import torch
from diffusers import StableDiffusion3Pipeline, AutoencoderTiny
pipe = StableDiffusion3Pipeline.from_pretrained(
"stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
)
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd3", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("cheesecake.png")
```
## Loading the original checkpoints via `from_single_file`
The `SD3Transformer2DModel` and `StableDiffusion3Pipeline` classes support loading the original checkpoints via the `from_single_file` method. This method allows you to load the original checkpoint files that were used to train the models.
## Loading the original checkpoints for the `SD3Transformer2DModel`
```python
from diffusers import SD3Transformer2DModel
model = SD3Transformer2DModel.from_single_file("https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium.safetensors")
```
## Loading the single checkpoint for the `StableDiffusion3Pipeline`
### Loading the single file checkpoint without T5
```python
import torch
from diffusers import StableDiffusion3Pipeline
pipe = StableDiffusion3Pipeline.from_single_file(
"https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips.safetensors",
torch_dtype=torch.float16,
text_encoder_3=None
)
pipe.enable_model_cpu_offload()
image = pipe("a picture of a cat holding a sign that says hello world").images[0]
image.save('sd3-single-file.png')
```
### Loading the single file checkpoint with T5
> [!TIP]
> The following example loads a checkpoint stored in a 8-bit floating point format which requires PyTorch 2.3 or later.
```python
import torch
from diffusers import StableDiffusion3Pipeline
pipe = StableDiffusion3Pipeline.from_single_file(
"https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips_t5xxlfp8.safetensors",
torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
image = pipe("a picture of a cat holding a sign that says hello world").images[0]
image.save('sd3-single-file-t5-fp8.png')
```
## StableDiffusion3Pipeline
[[autodoc]] StableDiffusion3Pipeline
- all
- __call__

View File

@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# EDMDPMSolverMultistepScheduler
`EDMDPMSolverMultistepScheduler` is a [Karras formulation](https://huggingface.co/papers/2206.00364) of `DPMSolverMultistep`, a multistep scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.
`EDMDPMSolverMultistepScheduler` is a [Karras formulation](https://huggingface.co/papers/2206.00364) of `DPMSolverMultistepScheduler`, a multistep scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.
DPMSolver (and the improved version DPMSolver++) is a fast dedicated high-order solver for diffusion ODEs with convergence order guarantee. Empirically, DPMSolver sampling with only 20 steps can generate high-quality
samples, and it can generate quite good samples even in 10 steps.

View File

@@ -0,0 +1,18 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# FlowMatchEulerDiscreteScheduler
`FlowMatchEulerDiscreteScheduler` is based on the flow-matching sampling introduced in [Stable Diffusion 3](https://arxiv.org/abs/2403.03206).
## FlowMatchEulerDiscreteScheduler
[[autodoc]] FlowMatchEulerDiscreteScheduler

View File

@@ -0,0 +1,18 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# FlowMatchHeunDiscreteScheduler
`FlowMatchHeunDiscreteScheduler` is based on the flow-matching sampling introduced in [EDM](https://arxiv.org/abs/2403.03206).
## FlowMatchHeunDiscreteScheduler
[[autodoc]] FlowMatchHeunDiscreteScheduler

View File

@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# DPMSolverMultistepScheduler
`DPMSolverMultistep` is a multistep scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.
`DPMSolverMultistepScheduler` is a multistep scheduler from [DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps](https://huggingface.co/papers/2206.00927) and [DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models](https://huggingface.co/papers/2211.01095) by Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu.
DPMSolver (and the improved version DPMSolver++) is a fast dedicated high-order solver for diffusion ODEs with convergence order guarantee. Empirically, DPMSolver sampling with only 20 steps can generate high-quality
samples, and it can generate quite good samples even in 10 steps.

View File

@@ -37,3 +37,7 @@ Utility and helper functions for working with 🤗 Diffusers.
## make_image_grid
[[autodoc]] utils.make_image_grid
## randn_tensor
[[autodoc]] utils.torch_utils.randn_tensor

View File

@@ -0,0 +1,21 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Video Processor
The [`VideoProcessor`] provides a unified API for video pipelines to prepare inputs for VAE encoding and post-processing outputs once they're decoded. The class inherits [`VaeImageProcessor`] so it includes transformations such as resizing, normalization, and conversion between PIL Image, PyTorch, and NumPy arrays.
## VideoProcessor
[[autodoc]] video_processor.VideoProcessor.preprocess_video
[[autodoc]] video_processor.VideoProcessor.postprocess_video

View File

@@ -22,14 +22,13 @@ We enormously value feedback from the community, so please do not be afraid to s
## Overview
You can contribute in many ways ranging from answering questions on issues to adding new diffusion models to
the core library.
You can contribute in many ways ranging from answering questions on issues and discussions to adding new diffusion models to the core library.
In the following, we give an overview of different ways to contribute, ranked by difficulty in ascending order. All of them are valuable to the community.
* 1. Asking and answering questions on [the Diffusers discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers) or on [Discord](https://discord.gg/G7tWnz98XR).
* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose).
* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues).
* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose) or new discussions on [the GitHub Discussions tab](https://github.com/huggingface/diffusers/discussions/new/choose).
* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues) or discussions on [the GitHub Discussions tab](https://github.com/huggingface/diffusers/discussions).
* 4. Fix a simple issue, marked by the "Good first issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
* 5. Contribute to the [documentation](https://github.com/huggingface/diffusers/tree/main/docs/source).
* 6. Contribute a [Community Pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3Acommunity-examples).
@@ -63,7 +62,7 @@ In the same spirit, you are of immense help to the community by answering such q
**Please** keep in mind that the more effort you put into asking or answering a question, the higher
the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database.
In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formatted/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
**NOTE about channels**:
[*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically. Hence, it's easier to look up questions and answers that we posted some time ago.
@@ -99,7 +98,7 @@ This means in more detail:
- Format your code.
- Do not include any external libraries except for Diffusers depending on them.
- **Always** provide all necessary information about your environment; for this, you can run: `diffusers-cli env` in your shell and copy-paste the displayed information to the issue.
- Explain the issue. If the reader doesn't know what the issue is and why it is an issue, she cannot solve it.
- Explain the issue. If the reader doesn't know what the issue is and why it is an issue, (s)he cannot solve it.
- **Always** make sure the reader can reproduce your issue with as little effort as possible. If your code snippet cannot be run because of missing libraries or undefined variables, the reader cannot help you. Make sure your reproducible code snippet is as minimal as possible and can be copy-pasted into a simple Python shell.
- If in order to reproduce your issue a model and/or dataset is required, make sure the reader has access to that model or dataset. You can always upload your model or dataset to the [Hub](https://huggingface.co) to make it easily downloadable. Try to keep your model and dataset as small as possible, to make the reproduction of your issue as effortless as possible.
@@ -198,38 +197,81 @@ Anything displayed on [the official Diffusers doc page](https://huggingface.co/d
Please have a look at [this page](https://github.com/huggingface/diffusers/tree/main/docs) on how to verify changes made to the documentation locally.
### 6. Contribute a community pipeline
[Pipelines](https://huggingface.co/docs/diffusers/api/pipelines/overview) are usually the first point of contact between the Diffusers library and the user.
Pipelines are examples of how to use Diffusers [models](https://huggingface.co/docs/diffusers/api/models/overview) and [schedulers](https://huggingface.co/docs/diffusers/api/schedulers/overview).
We support two types of pipelines:
> [!TIP]
> Read the [Community pipelines](../using-diffusers/custom_pipeline_overview#community-pipelines) guide to learn more about the difference between a GitHub and Hugging Face Hub community pipeline. If you're interested in why we have community pipelines, take a look at GitHub Issue [#841](https://github.com/huggingface/diffusers/issues/841) (basically, we can't maintain all the possible ways diffusion models can be used for inference but we also don't want to prevent the community from building them).
- Official Pipelines
- Community Pipelines
Contributing a community pipeline is a great way to share your creativity and work with the community. It lets you build on top of the [`DiffusionPipeline`] so that anyone can load and use it by setting the `custom_pipeline` parameter. This section will walk you through how to create a simple pipeline where the UNet only does a single forward pass and calls the scheduler once (a "one-step" pipeline).
Both official and community pipelines follow the same design and consist of the same type of components.
1. Create a one_step_unet.py file for your community pipeline. This file can contain whatever package you want to use as long as it's installed by the user. Make sure you only have one pipeline class that inherits from [`DiffusionPipeline`] to load model weights and the scheduler configuration from the Hub. Add a UNet and scheduler to the `__init__` function.
Official pipelines are tested and maintained by the core maintainers of Diffusers. Their code
resides in [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines).
In contrast, community pipelines are contributed and maintained purely by the **community** and are **not** tested.
They reside in [examples/community](https://github.com/huggingface/diffusers/tree/main/examples/community) and while they can be accessed via the [PyPI diffusers package](https://pypi.org/project/diffusers/), their code is not part of the PyPI distribution.
You should also add the `register_modules` function to ensure your pipeline and its components can be saved with [`~DiffusionPipeline.save_pretrained`].
The reason for the distinction is that the core maintainers of the Diffusers library cannot maintain and test all
possible ways diffusion models can be used for inference, but some of them may be of interest to the community.
Officially released diffusion pipelines,
such as Stable Diffusion are added to the core src/diffusers/pipelines package which ensures
high quality of maintenance, no backward-breaking code changes, and testing.
More bleeding edge pipelines should be added as community pipelines. If usage for a community pipeline is high, the pipeline can be moved to the official pipelines upon request from the community. This is one of the ways we strive to be a community-driven library.
```py
from diffusers import DiffusionPipeline
import torch
To add a community pipeline, one should add a <name-of-the-community>.py file to [examples/community](https://github.com/huggingface/diffusers/tree/main/examples/community) and adapt the [examples/community/README.md](https://github.com/huggingface/diffusers/tree/main/examples/community/README.md) to include an example of the new pipeline.
class UnetSchedulerOneForwardPipeline(DiffusionPipeline):
def __init__(self, unet, scheduler):
super().__init__()
An example can be seen [here](https://github.com/huggingface/diffusers/pull/2400).
self.register_modules(unet=unet, scheduler=scheduler)
```
Community pipeline PRs are only checked at a superficial level and ideally they should be maintained by their original authors.
1. In the forward pass (which we recommend defining as `__call__`), you can add any feature you'd like. For the "one-step" pipeline, create a random image and call the UNet and scheduler once by setting `timestep=1`.
Contributing a community pipeline is a great way to understand how Diffusers models and schedulers work. Having contributed a community pipeline is usually the first stepping stone to contributing an official pipeline to the
core package.
```py
from diffusers import DiffusionPipeline
import torch
class UnetSchedulerOneForwardPipeline(DiffusionPipeline):
def __init__(self, unet, scheduler):
super().__init__()
self.register_modules(unet=unet, scheduler=scheduler)
def __call__(self):
image = torch.randn(
(1, self.unet.config.in_channels, self.unet.config.sample_size, self.unet.config.sample_size),
)
timestep = 1
model_output = self.unet(image, timestep).sample
scheduler_output = self.scheduler.step(model_output, timestep, image).prev_sample
return scheduler_output
```
Now you can run the pipeline by passing a UNet and scheduler to it or load pretrained weights if the pipeline structure is identical.
```py
from diffusers import DDPMScheduler, UNet2DModel
scheduler = DDPMScheduler()
unet = UNet2DModel()
pipeline = UnetSchedulerOneForwardPipeline(unet=unet, scheduler=scheduler)
output = pipeline()
# load pretrained weights
pipeline = UnetSchedulerOneForwardPipeline.from_pretrained("google/ddpm-cifar10-32", use_safetensors=True)
output = pipeline()
```
You can either share your pipeline as a GitHub community pipeline or Hub community pipeline.
<hfoptions id="pipeline type">
<hfoption id="GitHub pipeline">
Share your GitHub pipeline by opening a pull request on the Diffusers [repository](https://github.com/huggingface/diffusers) and add the one_step_unet.py file to the [examples/community](https://github.com/huggingface/diffusers/tree/main/examples/community) subfolder.
</hfoption>
<hfoption id="Hub pipeline">
Share your Hub pipeline by creating a model repository on the Hub and uploading the one_step_unet.py file to it.
</hfoption>
</hfoptions>
### 7. Contribute to training examples
@@ -245,7 +287,7 @@ The official training examples are maintained by the Diffusers' core maintainers
This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author.
Both official training and research examples consist of a directory that contains one or more training scripts, a requirements.txt file, and a README.md file. In order for the user to make use of the
Both official training and research examples consist of a directory that contains one or more training scripts, a `requirements.txt` file, and a `README.md` file. In order for the user to make use of the
training examples, it is required to clone the repository:
```bash
@@ -255,7 +297,8 @@ git clone https://github.com/huggingface/diffusers
as well as to install all additional dependencies required for training:
```bash
pip install -r /examples/<your-example-folder>/requirements.txt
cd diffusers
pip install -r examples/<your-example-folder>/requirements.txt
```
Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt).
@@ -273,7 +316,7 @@ Once an example script works, please make sure to add a comprehensive `README.md
- A link to some training results (logs, models, etc.) that show what the user can expect as shown [here](https://api.wandb.ai/report/patrickvonplaten/xm6cd5q5).
- If you are adding a non-official/research training example, **please don't forget** to add a sentence that you are maintaining this training example which includes your git handle as shown [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/intel_opts#diffusers-examples-with-intel-optimizations).
If you are contributing to the official training examples, please also make sure to add a test to [examples/test_examples.py](https://github.com/huggingface/diffusers/blob/main/examples/test_examples.py). This is not necessary for non-official training examples.
If you are contributing to the official training examples, please also make sure to add a test to its folder such as [examples/dreambooth/test_dreambooth.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/test_dreambooth.py). This is not necessary for non-official training examples.
### 8. Fixing a "Good second issue"
@@ -375,7 +418,7 @@ You will need basic `git` proficiency to be able to contribute to
manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.
Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/main/setup.py#L244)):
Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/83bc6c94eaeb6f7704a2a428931cf2d9ad973ae9/setup.py#L270)):
1. Fork the [repository](https://github.com/huggingface/diffusers) by
clicking on the 'Fork' button on the repository's page. This creates a copy of the code

View File

@@ -70,7 +70,7 @@ The following design principles are followed:
- Pipelines should be used **only** for inference.
- Pipelines should be very readable, self-explanatory, and easy to tweak.
- Pipelines should be designed to build on top of each other and be easy to integrate into higher-level APIs.
- Pipelines are **not** intended to be feature-complete user interfaces. For future complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
- Pipelines are **not** intended to be feature-complete user interfaces. For feature-complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
- Every pipeline should have one and only one way to run it via a `__call__` method. The naming of the `__call__` arguments should be shared across all pipelines.
- Pipelines should be named after the task they are intended to solve.
- In almost all cases, novel diffusion pipelines shall be implemented in a new pipeline folder/file.
@@ -81,7 +81,7 @@ Models are designed as configurable toolboxes that are natural extensions of [Py
The following design principles are followed:
- Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context.
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_condition.py), [`transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformer_2d.py), etc...
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unets/unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_condition.py), [`transformers/transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_2d.py), etc...
- Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc... **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy.
- Models intend to expose complexity, just like PyTorch's `Module` class, and give clear error messages.
- Models all inherit from `ModelMixin` and `ConfigMixin`.
@@ -90,7 +90,7 @@ The following design principles are followed:
- To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
- Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
- The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and
readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
### Schedulers
@@ -102,7 +102,7 @@ The following design principles are followed:
- One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper).
- If schedulers share similar functionalities, we can make use of the `# Copied from` mechanism.
- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](../using-diffusers/schedulers.md).
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](../using-diffusers/schedulers).
- Every scheduler has to have a `set_num_inference_steps`, and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called.
- Every scheduler exposes the timesteps to be "looped over" via a `timesteps` attribute, which is an array of timesteps the model will be called upon.
- The `step(...)` function takes a predicted model output and the "current" sample (x_t) and returns the "previous", slightly more denoised sample (x_t-1).

View File

@@ -112,7 +112,7 @@ pip install -e ".[flax]"
These commands will link the folder you cloned the repository to and your Python library paths.
Python will now look inside the folder you cloned to in addition to the normal library paths.
For example, if your Python packages are typically installed in `~/anaconda3/envs/main/lib/python3.8/site-packages/`, Python will also search the `~/diffusers/` folder you cloned to.
For example, if your Python packages are typically installed in `~/anaconda3/envs/main/lib/python3.10/site-packages/`, Python will also search the `~/diffusers/` folder you cloned to.
<Tip warning={true}>

View File

@@ -12,27 +12,23 @@ specific language governing permissions and limitations under the License.
# Speed up inference
There are several ways to optimize 🤗 Diffusers for inference speed. As a general rule of thumb, we recommend using either [xFormers](xformers) or `torch.nn.functional.scaled_dot_product_attention` in PyTorch 2.0 for their memory-efficient attention.
There are several ways to optimize Diffusers for inference speed, such as reducing the computational burden by lowering the data precision or using a lightweight distilled model. There are also memory-efficient attention implementations, [xFormers](xformers) and [scaled dot product attention](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html) in PyTorch 2.0, that reduce memory usage which also indirectly speeds up inference. Different speed optimizations can be stacked together to get the fastest inference times.
<Tip>
> [!TIP]
> Optimizing for inference speed or reduced memory usage can lead to improved performance in the other category, so you should try to optimize for both whenever you can. This guide focuses on inference speed, but you can learn more about lowering memory usage in the [Reduce memory usage](memory) guide.
In many cases, optimizing for speed or memory leads to improved performance in the other, so you should try to optimize for both whenever you can. This guide focuses on inference speed, but you can learn more about preserving memory in the [Reduce memory usage](memory) guide.
The inference times below are obtained from generating a single 512x512 image from the prompt "a photo of an astronaut riding a horse on mars" with 50 DDIM steps on a NVIDIA A100.
</Tip>
| setup | latency | speed-up |
|----------|---------|----------|
| baseline | 5.27s | x1 |
| tf32 | 4.14s | x1.27 |
| fp16 | 3.51s | x1.50 |
| combined | 3.41s | x1.54 |
The results below are obtained from generating a single 512x512 image from the prompt `a photo of an astronaut riding a horse on mars` with 50 DDIM steps on a Nvidia Titan RTX, demonstrating the speed-up you can expect.
## TensorFloat-32
| | latency | speed-up |
| ---------------- | ------- | ------- |
| original | 9.50s | x1 |
| fp16 | 3.61s | x2.63 |
| channels last | 3.30s | x2.88 |
| traced UNet | 3.21s | x2.96 |
| memory efficient attention | 2.63s | x3.61 |
## Use TensorFloat-32
On Ampere and later CUDA devices, matrix multiplications and convolutions can use the [TensorFloat-32 (TF32)](https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/) mode for faster, but slightly less accurate computations. By default, PyTorch enables TF32 mode for convolutions but not matrix multiplications. Unless your network requires full float32 precision, we recommend enabling TF32 for matrix multiplications. It can significantly speeds up computations with typically negligible loss in numerical accuracy.
On Ampere and later CUDA devices, matrix multiplications and convolutions can use the [TensorFloat-32 (tf32)](https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/) mode for faster, but slightly less accurate computations. By default, PyTorch enables tf32 mode for convolutions but not matrix multiplications. Unless your network requires full float32 precision, we recommend enabling tf32 for matrix multiplications. It can significantly speed up computations with typically negligible loss in numerical accuracy.
```python
import torch
@@ -40,11 +36,11 @@ import torch
torch.backends.cuda.matmul.allow_tf32 = True
```
You can learn more about TF32 in the [Mixed precision training](https://huggingface.co/docs/transformers/en/perf_train_gpu_one#tf32) guide.
Learn more about tf32 in the [Mixed precision training](https://huggingface.co/docs/transformers/en/perf_train_gpu_one#tf32) guide.
## Half-precision weights
To save GPU memory and get more speed, try loading and running the model weights directly in half-precision or float16:
To save GPU memory and get more speed, set `torch_dtype=torch.float16` to load and run the model weights directly with half-precision weights.
```Python
import torch
@@ -56,19 +52,76 @@ pipe = DiffusionPipeline.from_pretrained(
use_safetensors=True,
)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
<Tip warning={true}>
Don't use [`torch.autocast`](https://pytorch.org/docs/stable/amp.html#torch.autocast) in any of the pipelines as it can lead to black images and is always slower than pure float16 precision.
</Tip>
> [!WARNING]
> Don't use [torch.autocast](https://pytorch.org/docs/stable/amp.html#torch.autocast) in any of the pipelines as it can lead to black images and is always slower than pure float16 precision.
## Distilled model
You could also use a distilled Stable Diffusion model and autoencoder to speed up inference. During distillation, many of the UNet's residual and attention blocks are shed to reduce the model size. The distilled model is faster and uses less memory while generating images of comparable quality to the full Stable Diffusion model.
You could also use a distilled Stable Diffusion model and autoencoder to speed up inference. During distillation, many of the UNet's residual and attention blocks are shed to reduce the model size by 51% and improve latency on CPU/GPU by 43%. The distilled model is faster and uses less memory while generating images of comparable quality to the full Stable Diffusion model.
Learn more about in the [Distilled Stable Diffusion inference](../using-diffusers/distilled_sd) guide!
> [!TIP]
> Read the [Open-sourcing Knowledge Distillation Code and Weights of SD-Small and SD-Tiny](https://huggingface.co/blog/sd_distillation) blog post to learn more about how knowledge distillation training works to produce a faster, smaller, and cheaper generative model.
The inference times below are obtained from generating 4 images from the prompt "a photo of an astronaut riding a horse on mars" with 25 PNDM steps on a NVIDIA A100. Each generation is repeated 3 times with the distilled Stable Diffusion v1.4 model by [Nota AI](https://hf.co/nota-ai).
| setup | latency | speed-up |
|------------------------------|---------|----------|
| baseline | 6.37s | x1 |
| distilled | 4.18s | x1.52 |
| distilled + tiny autoencoder | 3.83s | x1.66 |
Let's load the distilled Stable Diffusion model and compare it against the original Stable Diffusion model.
```py
from diffusers import StableDiffusionPipeline
import torch
distilled = StableDiffusionPipeline.from_pretrained(
"nota-ai/bk-sdm-small", torch_dtype=torch.float16, use_safetensors=True,
).to("cuda")
prompt = "a golden vase with different flowers"
generator = torch.manual_seed(2023)
image = distilled("a golden vase with different flowers", num_inference_steps=25, generator=generator).images[0]
image
```
<div class="flex gap-4">
<div>
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/original_sd.png"/>
<figcaption class="mt-2 text-center text-sm text-gray-500">original Stable Diffusion</figcaption>
</div>
<div>
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/distilled_sd.png"/>
<figcaption class="mt-2 text-center text-sm text-gray-500">distilled Stable Diffusion</figcaption>
</div>
</div>
### Tiny AutoEncoder
To speed inference up even more, replace the autoencoder with a [distilled version](https://huggingface.co/sayakpaul/taesdxl-diffusers) of it.
```py
import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline
distilled = StableDiffusionPipeline.from_pretrained(
"nota-ai/bk-sdm-small", torch_dtype=torch.float16, use_safetensors=True,
).to("cuda")
distilled.vae = AutoencoderTiny.from_pretrained(
"sayakpaul/taesd-diffusers", torch_dtype=torch.float16, use_safetensors=True,
).to("cuda")
prompt = "a golden vase with different flowers"
generator = torch.manual_seed(2023)
image = distilled("a golden vase with different flowers", num_inference_steps=25, generator=generator).images[0]
image
```
<div class="flex justify-center">
<div>
<img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/distilled_sd_vae.png" />
<figcaption class="mt-2 text-center text-sm text-gray-500">distilled Stable Diffusion + Tiny AutoEncoder</figcaption>
</div>
</div>

View File

@@ -261,7 +261,7 @@ from dataclasses import dataclass
@dataclass
class UNet2DConditionOutput:
sample: torch.FloatTensor
sample: torch.Tensor
pipe = StableDiffusionPipeline.from_pretrained(

View File

@@ -1,17 +0,0 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Overview
Generating high-quality outputs is computationally intensive, especially during each iterative step where you go from a noisy output to a less noisy output. One of 🤗 Diffuser's goals is to make this technology widely accessible to everyone, which includes enabling fast inference on consumer and specialized hardware.
This section will cover tips and tricks - like half-precision weights and sliced attention - for optimizing inference speed and reducing memory-consumption. You'll also learn how to speed up your PyTorch code with [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) or [ONNX Runtime](https://onnxruntime.ai/docs/), and enable memory-efficient attention with [xFormers](https://facebookresearch.github.io/xformers/). There are also guides for running inference on specific hardware like Apple Silicon, and Intel or Habana processors.

View File

@@ -0,0 +1,182 @@
# T-GATE
[T-GATE](https://github.com/HaozheLiu-ST/T-GATE/tree/main) accelerates inference for [Stable Diffusion](../api/pipelines/stable_diffusion/overview), [PixArt](../api/pipelines/pixart), and [Latency Consistency Model](../api/pipelines/latent_consistency_models.md) pipelines by skipping the cross-attention calculation once it converges. This method doesn't require any additional training and it can speed up inference from 10-50%. T-GATE is also compatible with other optimization methods like [DeepCache](./deepcache).
Before you begin, make sure you install T-GATE.
```bash
pip install tgate
pip install -U torch diffusers transformers accelerate DeepCache
```
To use T-GATE with a pipeline, you need to use its corresponding loader.
| Pipeline | T-GATE Loader |
|---|---|
| PixArt | TgatePixArtLoader |
| Stable Diffusion XL | TgateSDXLLoader |
| Stable Diffusion XL + DeepCache | TgateSDXLDeepCacheLoader |
| Stable Diffusion | TgateSDLoader |
| Stable Diffusion + DeepCache | TgateSDDeepCacheLoader |
Next, create a `TgateLoader` with a pipeline, the gate step (the time step to stop calculating the cross attention), and the number of inference steps. Then call the `tgate` method on the pipeline with a prompt, gate step, and the number of inference steps.
Let's see how to enable this for several different pipelines.
<hfoptions id="pipelines">
<hfoption id="PixArt">
Accelerate `PixArtAlphaPipeline` with T-GATE:
```py
import torch
from diffusers import PixArtAlphaPipeline
from tgate import TgatePixArtLoader
pipe = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16)
gate_step = 8
inference_step = 25
pipe = TgatePixArtLoader(
pipe,
gate_step=gate_step,
num_inference_steps=inference_step,
).to("cuda")
image = pipe.tgate(
"An alpaca made of colorful building blocks, cyberpunk.",
gate_step=gate_step,
num_inference_steps=inference_step,
).images[0]
```
</hfoption>
<hfoption id="Stable Diffusion XL">
Accelerate `StableDiffusionXLPipeline` with T-GATE:
```py
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers import DPMSolverMultistepScheduler
from tgate import TgateSDXLLoader
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16,
variant="fp16",
use_safetensors=True,
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
gate_step = 10
inference_step = 25
pipe = TgateSDXLLoader(
pipe,
gate_step=gate_step,
num_inference_steps=inference_step,
).to("cuda")
image = pipe.tgate(
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k.",
gate_step=gate_step,
num_inference_steps=inference_step
).images[0]
```
</hfoption>
<hfoption id="StableDiffusionXL with DeepCache">
Accelerate `StableDiffusionXLPipeline` with [DeepCache](https://github.com/horseee/DeepCache) and T-GATE:
```py
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers import DPMSolverMultistepScheduler
from tgate import TgateSDXLDeepCacheLoader
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16,
variant="fp16",
use_safetensors=True,
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
gate_step = 10
inference_step = 25
pipe = TgateSDXLDeepCacheLoader(
pipe,
cache_interval=3,
cache_branch_id=0,
).to("cuda")
image = pipe.tgate(
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k.",
gate_step=gate_step,
num_inference_steps=inference_step
).images[0]
```
</hfoption>
<hfoption id="Latent Consistency Model">
Accelerate `latent-consistency/lcm-sdxl` with T-GATE:
```py
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers import UNet2DConditionModel, LCMScheduler
from diffusers import DPMSolverMultistepScheduler
from tgate import TgateSDXLLoader
unet = UNet2DConditionModel.from_pretrained(
"latent-consistency/lcm-sdxl",
torch_dtype=torch.float16,
variant="fp16",
)
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
unet=unet,
torch_dtype=torch.float16,
variant="fp16",
)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
gate_step = 1
inference_step = 4
pipe = TgateSDXLLoader(
pipe,
gate_step=gate_step,
num_inference_steps=inference_step,
lcm=True
).to("cuda")
image = pipe.tgate(
"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k.",
gate_step=gate_step,
num_inference_steps=inference_step
).images[0]
```
</hfoption>
</hfoptions>
T-GATE also supports [`StableDiffusionPipeline`] and [PixArt-alpha/PixArt-LCM-XL-2-1024-MS](https://hf.co/PixArt-alpha/PixArt-LCM-XL-2-1024-MS).
## Benchmarks
| Model | MACs | Param | Latency | Zero-shot 10K-FID on MS-COCO |
|-----------------------|----------|-----------|---------|---------------------------|
| SD-1.5 | 16.938T | 859.520M | 7.032s | 23.927 |
| SD-1.5 w/ T-GATE | 9.875T | 815.557M | 4.313s | 20.789 |
| SD-2.1 | 38.041T | 865.785M | 16.121s | 22.609 |
| SD-2.1 w/ T-GATE | 22.208T | 815.433 M | 9.878s | 19.940 |
| SD-XL | 149.438T | 2.570B | 53.187s | 24.628 |
| SD-XL w/ T-GATE | 84.438T | 2.024B | 27.932s | 22.738 |
| Pixart-Alpha | 107.031T | 611.350M | 61.502s | 38.669 |
| Pixart-Alpha w/ T-GATE | 65.318T | 462.585M | 37.867s | 35.825 |
| DeepCache (SD-XL) | 57.888T | - | 19.931s | 23.755 |
| DeepCache w/ T-GATE | 43.868T | - | 14.666s | 23.999 |
| LCM (SD-XL) | 11.955T | 2.570B | 3.805s | 25.044 |
| LCM w/ T-GATE | 11.171T | 2.024B | 3.533s | 25.028 |
| LCM (Pixart-Alpha) | 8.563T | 611.350M | 4.733s | 36.086 |
| LCM w/ T-GATE | 7.623T | 462.585M | 4.543s | 37.048 |
The latency is tested on an NVIDIA 1080TI, MACs and Params are calculated with [calflops](https://github.com/MrYxJ/calculate-flops.pytorch), and the FID is calculated with [PytorchFID](https://github.com/mseitzer/pytorch-fid).

View File

@@ -49,7 +49,7 @@ One of the simplest ways to speed up inference is to place the pipeline on a GPU
pipeline = pipeline.to("cuda")
```
To make sure you can use the same image and improve on it, use a [`Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) and set a seed for [reproducibility](./using-diffusers/reproducibility):
To make sure you can use the same image and improve on it, use a [`Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) and set a seed for [reproducibility](./using-diffusers/reusing_seeds):
```python
import torch

View File

@@ -88,7 +88,7 @@ accelerate config default
Or if your environment doesn't support an interactive shell, like a notebook, you can use:
```bash
```py
from accelerate.utils import write_basic_config
write_basic_config()
@@ -349,7 +349,7 @@ control_image = load_image("./conditioning_image_1.png")
prompt = "pale golden rod circle with old lace background"
generator = torch.manual_seed(0)
image = pipe(prompt, num_inference_steps=20, generator=generator, image=control_image).images[0]
image = pipeline(prompt, num_inference_steps=20, generator=generator, image=control_image).images[0]
image.save("./output.png")
```

View File

@@ -54,7 +54,7 @@ accelerate config default
Or if your environment doesn't support an interactive shell, like a notebook, you can use:
```bash
```py
from accelerate.utils import write_basic_config
write_basic_config()
@@ -84,7 +84,7 @@ Many of the basic parameters are described in the [DreamBooth](dreambooth#script
- `--freeze_model`: freezes the key and value parameters in the cross-attention layer; the default is `crossattn_kv`, but you can set it to `crossattn` to train all the parameters in the cross-attention layer
- `--concepts_list`: to learn multiple concepts, provide a path to a JSON file containing the concepts
- `--modifier_token`: a special word used to represent the learned concept
- `--initializer_token`:
- `--initializer_token`: a special word used to initialize the embeddings of the `modifier_token`
### Prior preservation loss

View File

@@ -106,3 +106,6 @@ Once you've completed the inference script, use the `--nproc_per_node` argument
```bash
torchrun run_distributed.py --nproc_per_node=2
```
> [!TIP]
> You can use `device_map` within a [`DiffusionPipeline`] to distribute its model-level components on multiple devices. Refer to the [Device placement](../tutorials/inference_with_big_models#device-placement) guide to learn more.

View File

@@ -67,7 +67,7 @@ accelerate config default
Or if your environment doesn't support an interactive shell, like a notebook, you can use:
```bash
```py
from accelerate.utils import write_basic_config
write_basic_config()
@@ -440,6 +440,198 @@ Stable Diffusion XL (SDXL) is a powerful text-to-image model that generates high
The SDXL training script is discussed in more detail in the [SDXL training](sdxl) guide.
## DeepFloyd IF
DeepFloyd IF is a cascading pixel diffusion model with three stages. The first stage generates a base image and the second and third stages progressively upscales the base image into a high-resolution 1024x1024 image. Use the [train_dreambooth_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py) or [train_dreambooth.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) scripts to train a DeepFloyd IF model with LoRA or the full model.
DeepFloyd IF uses predicted variance, but the Diffusers training scripts uses predicted error so the trained DeepFloyd IF models are switched to a fixed variance schedule. The training scripts will update the scheduler config of the fully trained model for you. However, when you load the saved LoRA weights you must also update the pipeline's scheduler config.
```py
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", use_safetensors=True)
pipe.load_lora_weights("<lora weights path>")
# Update scheduler config to fixed variance schedule
pipe.scheduler = pipe.scheduler.__class__.from_config(pipe.scheduler.config, variance_type="fixed_small")
```
The stage 2 model requires additional validation images to upscale. You can download and use a downsized version of the training images for this.
```py
from huggingface_hub import snapshot_download
local_dir = "./dog_downsized"
snapshot_download(
"diffusers/dog-example-downsized",
local_dir=local_dir,
repo_type="dataset",
ignore_patterns=".gitattributes",
)
```
The code samples below provide a brief overview of how to train a DeepFloyd IF model with a combination of DreamBooth and LoRA. Some important parameters to note are:
* `--resolution=64`, a much smaller resolution is required because DeepFloyd IF is a pixel diffusion model and to work on uncompressed pixels, the input images must be smaller
* `--pre_compute_text_embeddings`, compute the text embeddings ahead of time to save memory because the [`~transformers.T5Model`] can take up a lot of memory
* `--tokenizer_max_length=77`, you can use a longer default text length with T5 as the text encoder but the default model encoding procedure uses a shorter text length
* `--text_encoder_use_attention_mask`, to pass the attention mask to the text encoder
<hfoptions id="IF-DreamBooth">
<hfoption id="Stage 1 LoRA DreamBooth">
Training stage 1 of DeepFloyd IF with LoRA and DreamBooth requires ~28GB of memory.
```bash
export MODEL_NAME="DeepFloyd/IF-I-XL-v1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_dog_lora"
accelerate launch train_dreambooth_lora.py \
--report_to wandb \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a sks dog" \
--resolution=64 \
--train_batch_size=4 \
--gradient_accumulation_steps=1 \
--learning_rate=5e-6 \
--scale_lr \
--max_train_steps=1200 \
--validation_prompt="a sks dog" \
--validation_epochs=25 \
--checkpointing_steps=100 \
--pre_compute_text_embeddings \
--tokenizer_max_length=77 \
--text_encoder_use_attention_mask
```
</hfoption>
<hfoption id="Stage 2 LoRA DreamBooth">
For stage 2 of DeepFloyd IF with LoRA and DreamBooth, pay attention to these parameters:
* `--validation_images`, the images to upscale during validation
* `--class_labels_conditioning=timesteps`, to additionally conditional the UNet as needed in stage 2
* `--learning_rate=1e-6`, a lower learning rate is used compared to stage 1
* `--resolution=256`, the expected resolution for the upscaler
```bash
export MODEL_NAME="DeepFloyd/IF-II-L-v1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_dog_upscale"
export VALIDATION_IMAGES="dog_downsized/image_1.png dog_downsized/image_2.png dog_downsized/image_3.png dog_downsized/image_4.png"
python train_dreambooth_lora.py \
--report_to wandb \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a sks dog" \
--resolution=256 \
--train_batch_size=4 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-6 \
--max_train_steps=2000 \
--validation_prompt="a sks dog" \
--validation_epochs=100 \
--checkpointing_steps=500 \
--pre_compute_text_embeddings \
--tokenizer_max_length=77 \
--text_encoder_use_attention_mask \
--validation_images $VALIDATION_IMAGES \
--class_labels_conditioning=timesteps
```
</hfoption>
<hfoption id="Stage 1 DreamBooth">
For stage 1 of DeepFloyd IF with DreamBooth, pay attention to these parameters:
* `--skip_save_text_encoder`, to skip saving the full T5 text encoder with the finetuned model
* `--use_8bit_adam`, to use 8-bit Adam optimizer to save memory due to the size of the optimizer state when training the full model
* `--learning_rate=1e-7`, a really low learning rate should be used for full model training otherwise the model quality is degraded (you can use a higher learning rate with a larger batch size)
Training with 8-bit Adam and a batch size of 4, the full model can be trained with ~48GB of memory.
```bash
export MODEL_NAME="DeepFloyd/IF-I-XL-v1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_if"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=64 \
--train_batch_size=4 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-7 \
--max_train_steps=150 \
--validation_prompt "a photo of sks dog" \
--validation_steps 25 \
--text_encoder_use_attention_mask \
--tokenizer_max_length 77 \
--pre_compute_text_embeddings \
--use_8bit_adam \
--set_grads_to_none \
--skip_save_text_encoder \
--push_to_hub
```
</hfoption>
<hfoption id="Stage 2 DreamBooth">
For stage 2 of DeepFloyd IF with DreamBooth, pay attention to these parameters:
* `--learning_rate=5e-6`, use a lower learning rate with a smaller effective batch size
* `--resolution=256`, the expected resolution for the upscaler
* `--train_batch_size=2` and `--gradient_accumulation_steps=6`, to effectively train on images wiht faces requires larger batch sizes
```bash
export MODEL_NAME="DeepFloyd/IF-II-L-v1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_dog_upscale"
export VALIDATION_IMAGES="dog_downsized/image_1.png dog_downsized/image_2.png dog_downsized/image_3.png dog_downsized/image_4.png"
accelerate launch train_dreambooth.py \
--report_to wandb \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a sks dog" \
--resolution=256 \
--train_batch_size=2 \
--gradient_accumulation_steps=6 \
--learning_rate=5e-6 \
--max_train_steps=2000 \
--validation_prompt="a sks dog" \
--validation_steps=150 \
--checkpointing_steps=500 \
--pre_compute_text_embeddings \
--tokenizer_max_length=77 \
--text_encoder_use_attention_mask \
--validation_images $VALIDATION_IMAGES \
--class_labels_conditioning timesteps \
--push_to_hub
```
</hfoption>
</hfoptions>
### Training tips
Training the DeepFloyd IF model can be challenging, but here are some tips that we've found helpful:
- LoRA is sufficient for training the stage 1 model because the model's low resolution makes representing finer details difficult regardless.
- For common or simple objects, you don't necessarily need to finetune the upscaler. Make sure the prompt passed to the upscaler is adjusted to remove the new token from the instance prompt. For example, if your stage 1 prompt is "a sks dog" then your stage 2 prompt should be "a dog".
- For finer details like faces, fully training the stage 2 upscaler is better than training the stage 2 model with LoRA. It also helps to use lower learning rates with larger batch sizes.
- Lower learning rates should be used to train the stage 2 model.
- The [`DDPMScheduler`] works better than the DPMSolver used in the training scripts.
## Next steps
Congratulations on training your DreamBooth model! To learn more about how to use your new model, the following guide may be helpful:

View File

@@ -51,7 +51,7 @@ accelerate config default
Or if your environment doesn't support an interactive shell, like a notebook, you can use:
```bash
```py
from accelerate.utils import write_basic_config
write_basic_config()
@@ -89,7 +89,7 @@ The dataset preprocessing code and training loop are found in the [`main()`](htt
As with the script parameters, a walkthrough of the training script is provided in the [Text-to-image](text2image#training-script) training guide. Instead, this guide takes a look at the InstructPix2Pix relevant parts of the script.
The script begins by modifing the [number of input channels](https://github.com/huggingface/diffusers/blob/64603389da01082055a901f2883c4810d1144edb/examples/instruct_pix2pix/train_instruct_pix2pix.py#L445) in the first convolutional layer of the UNet to account for InstructPix2Pix's additional conditioning image:
The script begins by modifying the [number of input channels](https://github.com/huggingface/diffusers/blob/64603389da01082055a901f2883c4810d1144edb/examples/instruct_pix2pix/train_instruct_pix2pix.py#L445) in the first convolutional layer of the UNet to account for InstructPix2Pix's additional conditioning image:
```py
in_channels = 8

Some files were not shown because too many files have changed in this diff Show More