Compare commits

...

769 Commits

Author SHA1 Message Date
DN6
289c385d73 update 2025-06-19 21:57:21 +05:30
DN6
a2dd339a4d Merge branch 'main' into group-offload-refactor 2025-06-19 21:55:44 +05:30
DN6
ebde934317 update 2025-06-19 21:46:03 +05:30
DN6
4854309f51 update 2025-06-19 21:44:30 +05:30
DN6
a9e9ef5be2 update 2025-06-19 21:39:18 +05:30
DN6
b60bf2dacc update 2025-06-19 21:31:44 +05:30
DN6
22e6fc180c update 2025-06-19 21:26:48 +05:30
DN6
1d38a30001 update 2025-06-19 21:24:25 +05:30
DN6
135934893e update 2025-06-19 19:22:06 +05:30
DN6
ace698aa96 update 2025-06-19 18:26:10 +05:30
Sayak Paul
85a916bb8b make group offloading work with disk/nvme transfers (#11682)
* start implementing disk offloading in group.

* delete diff file.

* updates.patch

* offload_to_disk_path

* check if safetensors already exist.

* add test and clarify.

* updates

* update todos.

* update more docs.

* update docs
2025-06-19 18:09:30 +05:30
Dhruv Nair
3287ce2890 Fix HiDream pipeline test module (#11754)
update
2025-06-19 17:06:14 +05:30
Dhruv Nair
0c11c8c1ac [CI] Fix SANA tests (#11756)
update
2025-06-19 17:06:02 +05:30
Dhruv Nair
fc51583c8a [CI] Fix WAN VACE tests (#11757)
update
2025-06-19 17:03:12 +05:30
sayakpaul
90e546ada1 update more docs. 2025-06-19 16:16:16 +05:30
sayakpaul
7d2295567f update todos. 2025-06-19 16:10:39 +05:30
sayakpaul
da11656af4 updates 2025-06-19 15:59:48 +05:30
sayakpaul
68f7580656 Merge branch 'main' into group-offloading-with-disk 2025-06-19 15:54:54 +05:30
Sayak Paul
fb57c76aa1 [LoRA] refactor lora loading at the model-level (#11719)
* factor out stuff from load_lora_adapter().

* simplifying text encoder lora loading.

* fix peft.py

* fix logging locations.

* formatting

* fix

* update

* update

* update
2025-06-19 13:06:25 +05:30
dependabot[bot]
7251bb4fd0 Bump urllib3 from 2.2.3 to 2.5.0 in /examples/server (#11748)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.2.3 to 2.5.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.2.3...2.5.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.5.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-19 11:09:33 +05:30
Sayak Paul
357226327f Merge branch 'main' into group-offloading-with-disk 2025-06-19 09:29:44 +05:30
Aryan
3fba74e153 Add missing HiDream license (#11747)
update
2025-06-19 08:07:47 +05:30
Aryan
a4df8dbc40 Update more licenses to 2025 (#11746)
update
2025-06-19 07:46:01 +05:30
Sayak Paul
48eae6f420 [Quantizers] add is_compileable property to quantizers. (#11736)
add is_compileable property to quantizers.
2025-06-19 07:45:06 +05:30
Dhruv Nair
66394bf6c7 Chroma Follow Up (#11725)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* updte

* update

* update

* update
2025-06-18 22:24:41 +05:30
Sayak Paul
62cce3045d [chore] change to 2025 licensing for remaining (#11741)
change to 2025 licensing for remaining
2025-06-18 20:56:00 +05:30
Sayak Paul
05e867784d [tests] device_map tests for all models. (#11708)
* device_map tests for all models.

* updates

* Update tests/models/test_modeling_common.py

Co-authored-by: Aryan <aryan@huggingface.co>

* fix device_map in test

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2025-06-18 10:52:06 +05:30
Sayak Paul
3c69c5ecc0 Merge branch 'main' into group-offloading-with-disk 2025-06-18 09:43:35 +05:30
Leo Jiang
d72184eba3 [training] add ds support to lora hidream (#11737)
* [training] add ds support to lora hidream

* Apply style fixes

---------

Co-authored-by: J石页 <jiangshuo9@h-partners.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-18 09:26:02 +05:30
Saurabh Misra
5ce4814af1 ️ Speed up method AutoencoderKLWan.clear_cache by 886% (#11665)
* ️ Speed up method `AutoencoderKLWan.clear_cache` by 886%

**Key optimizations:**
- Compute the number of `WanCausalConv3d` modules in each model (`encoder`/`decoder`) **only once during initialization**, store in `self._cached_conv_counts`. This removes unnecessary repeated tree traversals at every `clear_cache` call, which was the main bottleneck (from profiling).
- The internal helper `_count_conv3d_fast` is optimized via a generator expression with `sum` for efficiency.

All comments from the original code are preserved, except for updated or removed local docstrings/comments relevant to changed lines.  
**Function signatures and outputs remain unchanged.**

* Apply style fixes

* Apply suggestions from code review

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Apply style fixes

---------

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aseem Saxena <aseem.bits@gmail.com>
2025-06-18 08:46:03 +05:30
Linoy Tsaban
1bc6f3dc0f [LoRA training] update metadata use for lora alpha + README (#11723)
* lora alpha

* Apply style fixes

* Update examples/advanced_diffusion_training/README_flux.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* fix readme format

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-06-17 12:19:27 +03:00
Aryan
79bd7ecc78 Support more Wan loras (VACE) (#11726)
update
2025-06-17 10:39:18 +05:30
David Berenstein
9b834f8710 Add Pruna optimization framework documentation (#11688)
* Add Pruna optimization framework documentation

- Introduced a new section for Pruna in the table of contents.
- Added comprehensive documentation for Pruna, detailing its optimization techniques, installation instructions, and examples for optimizing and evaluating models

* Enhance Pruna documentation with image alt text and code block formatting

- Added alt text to images for better accessibility and context.
- Changed code block syntax from diff to python for improved clarity.

* Add installation section to Pruna documentation

- Introduced a new installation section in the Pruna documentation to guide users on how to install the framework.
- Enhanced the overall clarity and usability of the documentation for new users.

* Update pruna.md

* Update pruna.md

* Update Pruna documentation for model optimization and evaluation

- Changed section titles for consistency and clarity, from "Optimizing models" to "Optimize models" and "Evaluating and benchmarking optimized models" to "Evaluate and benchmark models".
- Enhanced descriptions to clarify the use of `diffusers` models and the evaluation process.
- Added a new example for evaluating standalone `diffusers` models.
- Updated references and links for better navigation within the documentation.

* Refactor Pruna documentation for clarity and consistency

- Removed outdated references to FLUX-juiced and streamlined the explanation of benchmarking.
- Enhanced the description of evaluating standalone `diffusers` models.
- Cleaned up code examples by removing unnecessary imports and comments for better readability.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Enhance Pruna documentation with new examples and clarifications

- Added an image to illustrate the optimization process.
- Updated the explanation for sharing and loading optimized models on the Hugging Face Hub.
- Clarified the evaluation process for optimized models using the EvaluationAgent.
- Improved descriptions for defining metrics and evaluating standalone diffusers models.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-16 12:25:05 -07:00
Carl Thomé
81426b0f19 Fix misleading comment (#11722) 2025-06-16 08:47:00 -10:00
Sayak Paul
f0dba33d82 [training] show how metadata stuff should be incorporated in training scripts. (#11707)
* show how metadata stuff should be incorporated in training scripts.

* typing

* fix

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-06-16 16:42:34 +05:30
Sayak Paul
d1db4f853a [LoRA ]fix flux lora loader when return_metadata is true for non-diffusers (#11716)
* fix flux lora loader when return_metadata is true for non-diffusers

* remove annotation
2025-06-16 14:26:35 +05:30
Sayak Paul
2d3056180a Merge branch 'main' into group-offloading-with-disk 2025-06-14 07:31:38 +05:30
Edna
8adc6003ba Chroma Pipeline (#11698)
* working state from hameerabbasi and iddl

* working state form hameerabbasi and iddl (transformer)

* working state (normalization)

* working state (embeddings)

* add chroma loader

* add chroma to mappings

* add chroma to transformer init

* take out variant stuff

* get decently far in changing variant stuff

* add chroma init

* make chroma output class

* add chroma transformer to dummy tp

* add chroma to init

* add chroma to init

* fix single file

* update

* update

* add chroma to auto pipeline

* add chroma to pipeline init

* change to chroma transformer

* take out variant from blocks

* swap embedder location

* remove prompt_2

* work on swapping text encoders

* remove mask function

* dont modify mask (for now)

* wrap attn mask

* no attn mask (can't get it to work)

* remove pooled prompt embeds

* change to my own unpooled embeddeer

* fix load

* take pooled projections out of transformer

* ensure correct dtype for chroma embeddings

* update

* use dn6 attn mask + fix true_cfg_scale

* use chroma pipeline output

* use DN6 embeddings

* remove guidance

* remove guidance embed (pipeline)

* remove guidance from embeddings

* don't return length

* dont change dtype

* remove unused stuff, fix up docs

* add chroma autodoc

* add .md (oops)

* initial chroma docs

* undo don't change dtype

* undo arxiv change

unsure why that happened

* fix hf papers regression in more places

* Update docs/source/en/api/pipelines/chroma.md

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* do_cfg -> self.do_classifier_free_guidance

* Update docs/source/en/api/models/chroma_transformer.md

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Update chroma.md

* Move chroma layers into transformer

* Remove pruned AdaLayerNorms

* Add chroma fast tests

* (untested) batch cond and uncond

* Add # Copied from for shift

* Update # Copied from statements

* update norm imports

* Revert cond + uncond batching

* Add transformer tests

* move chroma test (oops)

* chroma init

* fix chroma pipeline fast tests

* Update src/diffusers/models/transformers/transformer_chroma.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Move Approximator and Embeddings

* Fix auto pipeline + make style, quality

* make style

* Apply style fixes

* switch to new input ids

* fix # Copied from error

* remove # Copied from on protected members

* try to fix import

* fix import

* make fix-copes

* revert style fix

* update chroma transformer params

* update chroma transformer approximator init params

* update to pad tokens

* fix batch inference

* Make more pipeline tests work

* Make most transformer tests work

* fix docs

* make style, make quality

* skip batch tests

* fix test skipping

* fix test skipping again

* fix for tests

* Fix all pipeline test

* update

* push local changes, fix docs

* add encoder test, remove pooled dim

* default proj dim

* fix tests

* fix equal size list input

* update

* push local changes, fix docs

* add encoder test, remove pooled dim

* default proj dim

* fix tests

* fix equal size list input

* Revert "fix equal size list input"

This reverts commit 3fe4ad67d5.

* update

* update

* update

* update

* update

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-14 06:52:56 +05:30
Aryan
9f91305f85 Cosmos Predict2 (#11695)
* support text-to-image

* update example

* make fix-copies

* support use_flow_sigmas in EDM scheduler instead of maintain cosmos-specific scheduler

* support video-to-world

* update

* rename text2image pipeline

* make fix-copies

* add t2i test

* add test for v2w pipeline

* support edm dpmsolver multistep

* update

* update

* update

* update tests

* fix tests

* safety checker

* make conversion script work without guardrail
2025-06-14 01:51:29 +05:30
Sayak Paul
368958df6f [LoRA] parse metadata from LoRA and save metadata (#11324)
* feat: parse metadata from lora state dicts.

* tests

* fix tests

* key renaming

* fix

* smol update

* smol updates

* load metadata.

* automatically save metadata in save_lora_adapter.

* propagate changes.

* changes

* add test to models too.

* tigher tests.

* updates

* fixes

* rename tests.

* sorted.

* Update src/diffusers/loaders/lora_base.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* review suggestions.

* removeprefix.

* propagate changes.

* fix-copies

* sd

* docs.

* fixes

* get review ready.

* one more test to catch error.

* change to a different approach.

* fix-copies.

* todo

* sd3

* update

* revert changes in get_peft_kwargs.

* update

* fixes

* fixes

* simplify _load_sft_state_dict_metadata

* update

* style fix

* uipdate

* update

* update

* empty commit

* _pack_dict_with_prefix

* update

* TODO 1.

* todo: 2.

* todo: 3.

* update

* update

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* reraise.

* move argument.

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-06-13 14:37:49 +05:30
Aryan
e52ceae375 Support Wan AccVideo lora (#11704)
* update

* make style

* Update src/diffusers/loaders/lora_conversion_utils.py

* add note explaining threshold
2025-06-13 11:55:08 +05:30
Sayak Paul
62cbde8d41 [docs] mention fp8 benefits on supported hardware. (#11699)
* mention fp8 benefits on supported hardware.

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-13 07:17:03 +05:30
Sayak Paul
648e8955cf swap out token for style bot. (#11701) 2025-06-13 06:51:19 +05:30
Sayak Paul
a018ee1e70 Merge branch 'main' into group-offloading-with-disk 2025-06-12 11:08:56 +05:30
sayakpaul
8029cd7ef0 add test and clarify. 2025-06-12 11:08:19 +05:30
sayakpaul
4e4842fb0b check if safetensors already exist. 2025-06-12 10:48:57 +05:30
sayakpaul
d8179b10d3 offload_to_disk_path 2025-06-12 10:27:48 +05:30
Sayak Paul
00b179fb1a [docs] add compilation bits to the bitsandbytes docs. (#11693)
* add compilation bits to the bitsandbytes docs.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* finish

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-12 08:49:24 +05:30
Tolga Cangöz
47ef79464f Apply Occam's Razor in position embedding calculation (#11562)
* fix: remove redundant indexing

* style
2025-06-11 13:47:37 -10:00
Joel Schlosser
b272807bc8 Avoid DtoH sync from access of nonzero() item in scheduler (#11696) 2025-06-11 12:03:40 -10:00
rasmi
447ccd0679 Set _torch_version to N/A if torch is disabled. (#11645) 2025-06-11 11:59:54 -10:00
Aryan
f3e09114f2 Improve Wan docstrings (#11689)
* improve docstrings for wan

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* make style

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-12 01:18:40 +05:30
Sayak Paul
91545666e0 [tests] model-level device_map clarifications (#11681)
* add clarity in documentation for device_map

* docs

* fix how compiler tester mixins are used.

* propagate

* more

* typo.

* fix tests

* fix order of decroators.

* clarify more.

* more test cases.

* fix doc

* fix device_map docstring in pipeline_utils.

* more examples

* more

* update

* remove code for stuff that is already supported.

* fix stuff.
2025-06-11 22:41:59 +05:30
Sayak Paul
b6f7933044 [tests] tests for compilation + quantization (bnb) (#11672)
* start adding compilation tests for quantization.

* fixes

* make common utility.

* modularize.

* add group offloading+compile

* xfail

* update

* Update tests/quantization/test_torch_compile_utils.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* fixes

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-06-11 21:14:24 +05:30
Yao Matrix
33e636cea5 enable torchao test cases on XPU and switch to device agnostic APIs for test cases (#11654)
* enable torchao cases on XPU

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* device agnostic APIs

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* more

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* enable test_torch_compile_recompilation_and_graph_break on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* resolve comments

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-11 15:17:06 +05:30
Tolga Cangöz
e27142ac64 [Wan] Fix VAE sampling mode in WanVideoToVideoPipeline (#11639)
* fix: vae sampling mode

* fix a typo
2025-06-11 14:19:23 +05:30
Sayak Paul
8e88495da2 [LoRA] support Flux Control LoRA with bnb 8bit. (#11655)
support Flux Control LoRA with bnb 8bit.
2025-06-11 08:32:47 +05:30
Akash Haridas
b79803fe08 Allow remote code repo names to contain "." (#11652)
* allow loading from repo with dot in name

* put new arg at the end to avoid breaking compatibility

* add test for loading repo with dot in name

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-06-10 13:38:54 -10:00
Meatfucker
b0f7036d9a Update pipeline_flux_inpaint.py to fix padding_mask_crop returning only the inpainted area (#11658)
* Update pipeline_flux_inpaint.py to fix padding_mask_crop returning only the inpainted area and not the entire image.

* Apply style fixes

* Update src/diffusers/pipelines/flux/pipeline_flux_inpaint.py
2025-06-10 13:07:22 -04:00
Sayak Paul
0bf55a99d3 Merge branch 'main' into group-offloading-with-disk 2025-06-10 10:29:54 +05:30
Philip Brown
6c7fad7ec8 Add community class StableDiffusionXL_T5Pipeline (#11626)
* Add community class StableDiffusionXL_T5Pipeline
Will be used with base model opendiffusionai/stablediffusionxl_t5

* Changed pooled_embeds to use projection instead of slice

* "make style" tweaks

* Added comments to top of code

* Apply style fixes
2025-06-09 15:57:51 -04:00
Sayak Paul
d32a2c6879 Merge branch 'main' into group-offloading-with-disk 2025-06-09 14:36:43 +05:30
Dhruv Nair
5b0dab1253 Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process (#11596)
* update

* update

* update

* update

* update

* update

* update
2025-06-09 13:03:40 +05:30
sayakpaul
278cbc2e47 updates.patch 2025-06-09 12:03:01 +05:30
sayakpaul
49ac665460 delete diff file. 2025-06-09 10:26:00 +05:30
sayakpaul
e0d5079f9c start implementing disk offloading in group. 2025-06-09 10:25:45 +05:30
Sayak Paul
7c6e9ef425 [tests] Fix how compiler mixin classes are used (#11680)
* fix how compiler tester mixins are used.

* propagate

* more
2025-06-09 09:24:45 +05:30
Valeriy Sofin
f46abfe4ce fixed axes_dims_rope init (huggingface#11641) (#11678) 2025-06-09 01:16:30 +05:30
Aryan
73a9d5856f Wan VACE (#11582)
* initial support

* make fix-copies

* fix no split modules

* add conversion script

* refactor

* add pipeline test

* refactor

* fix bug with mask

* fix for reference images

* remove print

* update docs

* update slices

* update

* update

* update example
2025-06-06 17:53:10 +05:30
Sayak Paul
16c955c5fd [tests] add test for torch.compile + group offloading (#11670)
* add a test for group offloading + compilation.

* tests
2025-06-06 11:34:44 +05:30
jiqing-feng
0f91f2f6fc use deterministic to get stable result (#11663)
* use deterministic to get stable result

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add deterministic for int8 test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-06-06 09:14:00 +05:30
Markus Pobitzer
745199a869 [examples] flux-control: use num_training_steps_for_scheduler (#11662)
[examples] flux-control: use num_training_steps_for_scheduler in get_scheduler instead of args.max_train_steps * accelerator.num_processes

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-06-05 14:56:25 +05:30
Sayak Paul
0142f6f35a [chore] bring PipelineQuantizationConfig at the top of the import chain. (#11656)
bring PipelineQuantizationConfig at the top of the import chain.
2025-06-05 14:17:03 +05:30
Dhruv Nair
d04cd95012 [CI] Some improvements to Nightly reports summaries (#11166)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* updatee

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update
2025-06-05 13:55:01 +05:30
Steven Liu
c934720629 [docs] Model cards (#11112)
* initial

* update

* hunyuanvideo

* ltx

* fix

* wan

* gen guide

* feedback

* feedback

* pipeline-level quant config

* feedback

* ltx
2025-06-02 16:55:14 -07:00
Steven Liu
9f48394bf7 [docs] Caching methods (#11625)
* cache

* feedback
2025-06-02 10:58:47 -07:00
Sayak Paul
20273e5503 [tests] chore: rename lora model-level tests. (#11481)
chore: rename lora model-level tests.
2025-06-02 09:21:40 -07:00
Sayak Paul
d4dc4d7654 [chore] misc changes in the bnb tests for consistency. (#11355)
misc changes in the bnb tests for consistency.
2025-06-02 08:41:10 -07:00
Roy Hvaara
3a31b291f1 Use float32 RoPE freqs in Wan with MPS backends (#11643)
Use float32 for RoPE on MPS in Wan
2025-06-02 09:30:09 +05:30
Sayak Paul
b975bceff3 [docs] update torchao doc link (#11634)
update torchao doc link
2025-05-30 08:30:36 -07:00
co63oc
8183d0f16e Fix typos in strings and comments (#11476)
* Fix typos in strings and comments

Signed-off-by: co63oc <co63oc@users.noreply.github.com>

* Update src/diffusers/hooks/hooks.py

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Update src/diffusers/hooks/hooks.py

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Update layerwise_casting.py

* Apply style fixes

* update

---------

Signed-off-by: co63oc <co63oc@users.noreply.github.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-30 18:49:00 +05:30
Yaniv Galron
6508da6f06 typo fix in pipeline_flux.py (#11623) 2025-05-30 11:32:13 +05:30
VLT Media
d0ec6601df Bug: Fixed Image 2 Image example (#11619)
Bug: Fixed Image 2 Image example where a PIL.Image was improperly being asked for an item via index.
2025-05-30 11:30:52 +05:30
Yao Matrix
a7aa8bf28a enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed (#11620)
* enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU,
all passed

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

---------

Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2025-05-30 11:30:37 +05:30
Yaniv Galron
3651bdb766 removing unnecessary else statement (#11624)
Co-authored-by: Aryan <aryan@huggingface.co>
2025-05-30 11:29:24 +05:30
Justin Ruan
df55f05358 Fix wrong indent for examples of controlnet script (#11632)
fix wrong indent for training controlnet
2025-05-29 15:26:39 -07:00
Yuanzhou Cai
89ddb6c0a4 [textual_inversion_sdxl.py] fix lr scheduler steps count (#11557)
fix lr scheduler steps count

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-05-29 15:25:45 +03:00
Steven Liu
be2fb77dc1 [docs] PyTorch 2.0 (#11618)
* combine

* Update docs/source/en/optimization/fp16.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-05-28 09:42:41 -07:00
Sayak Paul
54cddc1e12 [CI] fix the filename for displaying failures in lora ci. (#11600)
fix the filename for displaying failures in lora ci.
2025-05-27 22:27:27 -07:00
Linoy Tsaban
28ef0165b9 [Sana Sprint] add image-to-image pipeline (#11602)
* sana sprint img2img

* fix import

* fix name

* fix image encoding

* fix image encoding

* fix image encoding

* fix image encoding

* fix image encoding

* fix image encoding

* try w/o strength

* try scaling differently

* try with strength

* revert unnecessary changes to scheduler

* revert unnecessary changes to scheduler

* Apply style fixes

* remove comment

* add copy statements

* add copy statements

* add to doc

* add to doc

* add to doc

* add to doc

* Apply style fixes

* empty commit

* fix copies

* fix copies

* fix copies

* fix copies

* fix copies

* docs

* make fix-copies.

* fix doc building error.

* initial commit - add img2img test

* initial commit - add img2img test

* fix import

* fix imports

* Apply style fixes

* empty commit

* remove

* empty commit

* test vocab size

* fix

* fix prompt missing from last commits

* small changes

* fix image processing when input is tensor

* fix order

* Apply style fixes

* empty commit

* fix shape

* remove comment

* image processing

* remove comment

* skip vae tiling test for now

* Apply style fixes

* empty commit

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2025-05-27 22:09:51 +03:00
Sayak Paul
a4da216125 [LoRA] improve LoRA fusion tests (#11274)
* improve lora fusion tests

* more improvements.

* remove comment

* update

* relax tolerance.

* num_fused_loras as a property

Co-authored-by: BenjaminBossan <benjamin.bossan@gmail.com>

* updates

* update

* fix

* fix

Co-authored-by: BenjaminBossan <benjamin.bossan@gmail.com>

* Update src/diffusers/loaders/lora_base.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: BenjaminBossan <benjamin.bossan@gmail.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-05-27 09:02:12 -07:00
Leo Jiang
5939ace91b Adding NPU for get device function (#11617)
* Adding device choice for npu

* Adding device choice for npu

---------

Co-authored-by: J石页 <jiangshuo9@h-partners.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-05-27 06:59:15 -07:00
Linoy Tsaban
cc59505e26 [training docs] smol update to README files (#11616)
add comment to install prodigy
2025-05-27 06:26:54 -07:00
Sayak Paul
5f5d02fbf1 Make group offloading compatible with torch.compile() (#11605)
wip: check if we can make go compile compat
2025-05-26 20:15:29 -07:00
Sayak Paul
53748217e6 fix security issue in build docker ci (#11614)
* fix security issue in build docker ci

* better

* update
2025-05-26 22:43:01 +05:30
Dhruv Nair
826f43505d Fix mixed variant downloading (#11611)
* update

* update
2025-05-26 21:43:48 +05:30
Sayak Paul
4af76d0d7d [tests] Changes to the torch.compile() CI and tests (#11508)
* remove compile cuda docker.

* replace compile cuda docker path.

* better manage compilation cache.

* propagate similar to the pipeline tests.

* remove unneeded compile test.

* small.

* don't check for deleted files.
2025-05-26 08:31:04 -07:00
kaixuanliu
b5c2050a16 Fix bug when variant and safetensor file does not match (#11587)
* Apply style fixes

* init test

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* adjust

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* add the variant check when there are no component folders

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update related test cases

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update related unit test cases

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* adjust

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* Apply style fixes

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-26 14:18:41 +05:30
Steven Liu
7ae546f8d1 [docs] Pipeline-level quantization (#11604)
refactor
2025-05-26 14:12:57 +05:30
Ishan Modi
f64fa9492d [Feature] AutoModel can load components using model_index.json (#11401)
* update

* update

* update

* update

* addressed PR comments

* update

* addressed PR comments

* added tests

* addressed PR comments

* updates

* update

* addressed PR comments

* update

* fix style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-05-26 14:06:36 +05:30
Yao Matrix
049082e013 enable pipeline test cases on xpu (#11527)
* enable several pipeline integration tests on xpu

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* update per comments

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-26 12:49:58 +05:30
regisss
f161e277d0 Update Intel Gaudi doc (#11479)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-23 10:41:49 +02:00
Sayak Paul
a5f4cc7f84 [LoRA] minor fix for load_lora_weights() for Flux and a test (#11595)
* fix peft delete adapters for flux.

* add test

* empty commit
2025-05-22 15:44:45 +05:30
Dhruv Nair
c36f8487df Type annotation fix (#11597)
* update

* update
2025-05-21 11:50:33 -10:00
Sayak Paul
54af3ca7fd [chore] allow string device to be passed to randn_tensor. (#11559)
allow string device to be passed to randn_tensor.
2025-05-21 14:55:54 +05:30
Sai Shreyas Bhavanasi
ba8dc7dc49 RegionalPrompting: Inherit from Stable Diffusion (#11525)
* Refactoring Regional Prompting pipeline to use Diffusion Pipeline instead of Stable Diffusion Pipeline

* Apply style fixes
2025-05-20 15:03:16 -04:00
Steven Liu
23a4ff8488 [docs] Remove fast diffusion tutorial (#11583)
remove tutorial

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-05-20 08:56:12 -07:00
osrm
8705af0914 docs: fix invalid links (#11505)
* fix invalid link lora.md

* fix invalid link controlnet_sdxl.md

The Hugging Face models page now uses the tags parameter instead of the other parameter for tag-based filtering. Therefore, to simultaneously apply both the "Stable Diffusion XL" and "ControlNet" tags, the following URL should be used: https://huggingface.co/models?tags=stable-diffusion-xl,controlnet

* fix invalid link cosine_dpm.md

"https://github.com/Stability-AI/stable-audio-tool" -> "https://github.com/Stability-AI/stable-audio-tools"

* Update controlnet_sdxl.md

* Update cosine_dpm.md

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-20 08:55:41 -07:00
Linoy Tsaban
5d4f723b57 [LoRA] kijai wan lora support for I2V (#11588)
* testing

* testing

* testing

* testing

* testing

* i2v

* i2v

* device fix

* testing

* fix

* fix

* fix

* fix

* fix

* Apply style fixes

* empty commit

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-20 17:11:55 +03:00
Aryan
05c8b42b75 LTX 0.9.7-distilled; documentation improvements (#11571)
* add guidance rescale

* update docs

* support adaptive instance norm filter

* fix custom timesteps support

* add custom timestep example to docs

* add a note about best generation settings being available only in the original repository

* use original org hub ids instead of personal

* make fix-copies

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-05-20 02:29:16 +05:30
Quentin Gallouédec
c8bb1ff53e Use HF Papers (#11567)
* Use HF Papers

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-19 06:22:33 -10:00
Dhruv Nair
799adf4a10 [Single File] Fix loading for LTX 0.9.7 transformer (#11578)
update
2025-05-19 06:22:13 -10:00
Sayak Paul
00f9273da2 [WIP][LoRA] start supporting kijai wan lora. (#11579)
* start supporting kijai wan lora.

* diff_b keys.

* Apply suggestions from code review

Co-authored-by: Aryan <aryan@huggingface.co>

* merge ready

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2025-05-19 20:47:44 +05:30
Linoy Tsaban
ceb7af277c [LoRA] support non-diffusers LTX-Video loras (#11572)
* support non diffusers loras for ltxv

* Update src/diffusers/loaders/lora_conversion_utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/loaders/lora_pipeline.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Apply style fixes

* empty commit

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-19 12:59:55 +03:00
Sayak Paul
6918f6d19a [docs] tip for group offloding + quantization (#11576)
* tip for group offloding + quantization

Co-authored-by: Aryan VS <contact.aryanvs@gmail.com>

* Apply suggestions from code review

Co-authored-by: Aryan <aryan@huggingface.co>

---------

Co-authored-by: Aryan VS <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2025-05-19 14:49:15 +05:30
apolinário
915c537891 Revert error to warning when loading LoRA from repo with multiple weights (#11568) 2025-05-19 13:33:43 +05:30
space_samurai
8270fa58e4 Doc update (#11531)
Update docs/source/en/using-diffusers/inpaint.md
2025-05-19 13:32:08 +05:30
Yao Matrix
1a10fa0c82 enhance value guard of _device_agnostic_dispatch (#11553)
enhance value guard

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-19 06:05:32 +05:30
Sayak Paul
9836f0e000 [docs] Regional compilation docs (#11556)
* add regional compilation docs.

* minor.

* reviwer feedback.

* Update docs/source/en/optimization/torch2.0.md

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

---------

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
2025-05-15 19:11:24 +05:30
Sayak Paul
20379d9d13 [tests] add tests for combining layerwise upcasting and groupoffloading. (#11558)
* add tests for combining layerwise upcasting and groupoffloading.

* feedback
2025-05-15 17:16:44 +05:30
Animesh Jain
3a6caba8e4 [gguf] Refactor __torch_function__ to avoid unnecessary computation (#11551)
* [gguf] Refactor __torch_function__ to avoid unnecessary computation

This helps with torch.compile compilation lantency. Avoiding unnecessary
computation should also lead to a slightly improved eager latency.

* Apply style fixes

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-15 14:38:18 +05:30
Dhruv Nair
4267d8f4eb [Single File] GGUF/Single File Support for HiDream (#11550)
* update

* update

* update

* update

* update

* update

* update
2025-05-15 12:25:18 +05:30
Seokhyeon Jeong
f4fa3beee7 [tests] Add torch.compile test for UNet2DConditionModel (#11537)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-05-14 11:26:12 +05:30
Anwesha Chowdhury
7e3353196c Fix deprecation warnings in test_ltx_image2video.py (#11538)
Fixed 2 warnings that were raised during running LTXImageToVideoPipelineFastTests

Co-authored-by: achowdhury1211@gmail.com <anwesha@LAPTOP-E5QGFMOQ>
2025-05-13 08:05:52 -10:00
Meatfucker
8c249d1401 Update pipeline_flux_img2img.py to add missing vae_slicing and vae_tiling calls. (#11545)
* Update pipeline_flux_img2img.py

Adds missing vae_slicing and vae_tiling calls to FluxImage2ImagePipeline

* Update src/diffusers/pipelines/flux/pipeline_flux_img2img.py

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

* Update src/diffusers/pipelines/flux/pipeline_flux_img2img.py

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

* Update src/diffusers/pipelines/flux/pipeline_flux_img2img.py

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

* Update src/diffusers/pipelines/flux/pipeline_flux_img2img.py

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

---------

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
2025-05-13 12:31:45 -04:00
Sayak Paul
b555a03723 [tests] Enable testing for HiDream transformer (#11478)
* add tests for hidream transformer model.

* fix

* Update tests/models/transformers/test_models_transformer_hidream.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-05-13 21:01:20 +05:30
Aryan
06fee551e9 LTX Video 0.9.7 (#11516)
* add upsampling pipeline

* ltx upsample pipeline conversion; pipeline fixes

* make fix-copies

* remove print

* add vae convenience methods

* update

* add tests

* support denoising strength for upscaling & video-to-video

* update docs

* update doc checkpoints

* update docs

* fix

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-05-13 14:57:03 +05:30
Linoy Tsaban
8b99f7e157 [LoRA] small change to support Hunyuan LoRA Loading for FramePack (#11546)
init
2025-05-13 10:15:06 +03:00
Kenneth Gerald Hamilton
07dd6f8c0e [train_dreambooth.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env (#11239)
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-05-13 07:34:01 +05:30
johannaSommer
f8d4a1e283 fix: remove torch_dtype="auto" option from docstrings (#11513)
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-05-12 15:42:35 -10:00
Abdellah Oumida
ddd0cfb497 Fix typo in train_diffusion_orpo_sdxl_lora_wds.py (#11541) 2025-05-12 15:28:29 -10:00
Zhong-Yu Li
4f438de35a Add VisualCloze (#11377)
* VisualCloze

* style quality

* add docs

* add docs

* typo

* Update docs/source/en/api/pipelines/visualcloze.md

* delete einops

* style quality

* Update src/diffusers/pipelines/visualcloze/pipeline_visualcloze.py

* reorg

* refine doc

* style quality

* typo

* typo

* Update src/diffusers/image_processor.py

* add comment

* test

* style

* Modified based on review

* style

* restore image_processor

* update example url

* style

* fix-copies

* VisualClozeGenerationPipeline

* combine

* tests docs

* remove VisualClozeUpsamplingPipeline

* style

* quality

* test examples

* quality style

* typo

* make fix-copies

* fix test_callback_cfg and test_save_load_dduf in VisualClozePipelineFastTests

* add EXAMPLE_DOC_STRING to VisualClozeGenerationPipeline

* delete maybe_free_model_hooks from pipeline_visualcloze_combined

* Apply suggestions from code review

* fix test_save_load_local test; add reason for skipping cfg test

* more save_load test fixes

* fix tests in generation pipeline tests
2025-05-13 02:46:51 +05:30
Evan Han
98cc6d05e4 [test_models_transformer_ltx.py] help us test torch.compile() for impactful models (#11512)
* Update test_models_transformer_ltx.py

* Update test_models_transformer_ltx.py

* Update test_models_transformer_ltx.py

* Update test_models_transformer_ltx.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-05-12 19:26:15 +05:30
Yao Matrix
c3726153fd enable several pipeline integration tests on XPU (#11526)
* enable kandinsky2_2 integration test cases on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* enable latent_diffusion, dance_diffusion, musicldm, shap_e integration
uts on xpu

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2025-05-12 16:51:37 +05:30
Aryan
e48f6aeeb4 Hunyuan Video Framepack F1 (#11534)
* support framepack f1

* update docs

* update toctree

* remove typo
2025-05-12 16:11:10 +05:30
Sayak Paul
01abfc8736 [tests] add tests for framepack transformer model. (#11520)
* start.

* add tests for framepack transformer model.

* merge conflicts.

* make to square.

* fixes
2025-05-11 09:50:06 +05:30
Aryan
92fe689f06 Change Framepack transformer layer initialization order (#11535)
update
2025-05-09 23:09:50 +05:30
Yao Matrix
0ba1f76d4d enable print_env on xpu (#11507)
* detect xpu in print_env

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* enhance code, test passed on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-05-09 16:30:51 +05:30
Yao Matrix
d6bf268a4a enable dit integration cases on xpu (#11523)
* enable dit integration test on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-05-09 16:06:50 +05:30
James Xu
3c0a0129fe [LTXPipeline] Update latents dtype to match VAE dtype (#11533)
fix: update latents dtype to match vae
2025-05-09 16:05:21 +05:30
Yao Matrix
2d380895e5 enable 7 cases on XPU (#11503)
* enable 7 cases on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* calibrate A100 expectations

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-05-09 15:52:08 +05:30
Sayak Paul
0c47c954f3 [LoRA] support non-diffusers hidream loras (#11532)
* support non-diffusers hidream loras

* make fix-copies
2025-05-09 14:42:39 +05:30
Sayak Paul
7acf8345f6 [Tests] Enable more general testing for torch.compile() with LoRA hotswapping (#11322)
* refactor hotswap tester.

* fix seeds..

* add to nightly ci.

* move comment.

* move to nightly
2025-05-09 11:29:06 +05:30
Sayak Paul
599c887164 feat: pipeline-level quantization config (#11130)
* feat: pipeline-level quant config.

Co-authored-by: SunMarc <marc.sun@hotmail.fr>

condition better.

support mapping.

improvements.

[Quantization] Add Quanto backend (#10756)

* update

* updaet

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update docs/source/en/quantization/quanto.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update src/diffusers/quantizers/quanto/utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

[Single File] Add single file loading for SANA Transformer (#10947)

* added support for from_single_file

* added diffusers mapping script

* added testcase

* bug fix

* updated tests

* corrected code quality

* corrected code quality

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

[LoRA] Improve warning messages when LoRA loading becomes a no-op (#10187)

* updates

* updates

* updates

* updates

* notebooks revert

* fix-copies.

* seeing

* fix

* revert

* fixes

* fixes

* fixes

* remove print

* fix

* conflicts ii.

* updates

* fixes

* better filtering of prefix.

---------

Co-authored-by: hlky <hlky@hlky.ac>

[LoRA] CogView4 (#10981)

* update

* make fix-copies

* update

[Tests] improve quantization tests by additionally measuring the inference memory savings (#11021)

* memory usage tests

* fixes

* gguf

[`Research Project`] Add AnyText: Multilingual Visual Text Generation And Editing (#8998)

* Add initial template

* Second template

* feat: Add TextEmbeddingModule to AnyTextPipeline

* feat: Add AuxiliaryLatentModule template to AnyTextPipeline

* Add bert tokenizer from the anytext repo for now

* feat: Update AnyTextPipeline's modify_prompt method

This commit adds improvements to the modify_prompt method in the AnyTextPipeline class. The method now handles special characters and replaces selected string prompts with a placeholder. Additionally, it includes a check for Chinese text and translation using the trans_pipe.

* Fill in the `forward` pass of `AuxiliaryLatentModule`

* `make style && make quality`

* `chore: Update bert_tokenizer.py with a TODO comment suggesting the use of the transformers library`

* Update error handling to raise and logging

* Add `create_glyph_lines` function into `TextEmbeddingModule`

* make style

* Up

* Up

* Up

* Up

* Remove several comments

* refactor: Remove ControlNetConditioningEmbedding and update code accordingly

* Up

* Up

* up

* refactor: Update AnyTextPipeline to include new optional parameters

* up

* feat: Add OCR model and its components

* chore: Update `TextEmbeddingModule` to include OCR model components and dependencies

* chore: Update `AuxiliaryLatentModule` to include VAE model and its dependencies for masked image in the editing task

* `make style`

* refactor: Update `AnyTextPipeline`'s docstring

* Update `AuxiliaryLatentModule` to include info dictionary so that text processing is done once

* simplify

* `make style`

* Converting `TextEmbeddingModule` to ordinary `encode_prompt()` function

* Simplify for now

* `make style`

* Up

* feat: Add scripts to convert AnyText controlnet to diffusers

* `make style`

* Fix: Move glyph rendering to `TextEmbeddingModule` from `AuxiliaryLatentModule`

* make style

* Up

* Simplify

* Up

* feat: Add safetensors module for loading model file

* Fix device issues

* Up

* Up

* refactor: Simplify

* refactor: Simplify code for loading models and handling data types

* `make style`

* refactor: Update to() method in FrozenCLIPEmbedderT3 and TextEmbeddingModule

* refactor: Update dtype in embedding_manager.py to match proj.weight

* Up

* Add attribution and adaptation information to pipeline_anytext.py

* Update usage example

* Will refactor `controlnet_cond_embedding` initialization

* Add `AnyTextControlNetConditioningEmbedding` template

* Refactor organization

* style

* style

* Move custom blocks from `AuxiliaryLatentModule` to `AnyTextControlNetConditioningEmbedding`

* Follow one-file policy

* style

* [Docs] Update README and pipeline_anytext.py to use AnyTextControlNetModel

* [Docs] Update import statement for AnyTextControlNetModel in pipeline_anytext.py

* [Fix] Update import path for ControlNetModel, ControlNetOutput in anytext_controlnet.py

* Refactor AnyTextControlNet to use configurable conditioning embedding channels

* Complete control net conditioning embedding in AnyTextControlNetModel

* up

* [FIX] Ensure embeddings use correct device in AnyTextControlNetModel

* up

* up

* style

* [UPDATE] Revise README and example code for AnyTextPipeline integration with DiffusionPipeline

* [UPDATE] Update example code in anytext.py to use correct font file and improve clarity

* down

* [UPDATE] Refactor BasicTokenizer usage to a new Checker class for text processing

* update pillow

* [UPDATE] Remove commented-out code and unnecessary docstring in anytext.py and anytext_controlnet.py for improved clarity

* [REMOVE] Delete frozen_clip_embedder_t3.py as it is in the anytext.py file

* [UPDATE] Replace edict with dict for configuration in anytext.py and RecModel.py for consistency

* 🆙

* style

* [UPDATE] Revise README.md for clarity, remove unused imports in anytext.py, and add author credits in anytext_controlnet.py

* style

* Update examples/research_projects/anytext/README.md

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Remove commented-out image preparation code in AnyTextPipeline

* Remove unnecessary blank line in README.md

[Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6  (#11018)

* update

* update

* update

* update

* update

* update

* update

* update

* update

fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings  (#11012)

small fix on generating time_ids & embeddings

[LoRA] support wan i2v loras from the world. (#11025)

* support wan i2v loras from the world.

* remove copied from.

* upates

* add lora.

Fix SD3 IPAdapter feature extractor (#11027)

chore: fix help messages in advanced diffusion examples (#10923)

Fix missing **kwargs in lora_pipeline.py (#11011)

* Update lora_pipeline.py

* Apply style fixes

* fix-copies

---------

Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Fix for multi-GPU WAN inference (#10997)

Ensure that hidden_state and shift/scale are on the same device when running with multiple GPUs

Co-authored-by: Jimmy <39@🇺🇸.com>

[Refactor] Clean up import utils boilerplate (#11026)

* update

* update

* update

Use `output_size` in `repeat_interleave` (#11030)

[hybrid inference 🍯🐝] Add VAE encode (#11017)

* [hybrid inference 🍯🐝] Add VAE encode

* _toctree: add vae encode

* Add endpoints, tests

* vae_encode docs

* vae encode benchmarks

* api reference

* changelog

* Update docs/source/en/hybrid_inference/overview.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

Wan Pipeline scaling fix, type hint warning, multi generator fix (#11007)

* Wan Pipeline scaling fix, type hint warning, multi generator fix

* Apply suggestions from code review

[LoRA] change to warning from info when notifying the users about a LoRA no-op (#11044)

* move to warning.

* test related changes.

Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline (#10827)

* Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>

making ```formatted_images``` initialization compact (#10801)

compact writing

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

Fix aclnnRepeatInterleaveIntWithDim error on NPU for get_1d_rotary_pos_embed (#10820)

* get_1d_rotary_pos_embed support npu

* Update src/diffusers/models/embeddings.py

---------

Co-authored-by: Kai zheng <kaizheng@KaideMacBook-Pro.local>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

[Tests] restrict memory tests for quanto for certain schemes. (#11052)

* restrict memory tests for quanto for certain schemes.

* Apply suggestions from code review

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* fixes

* style

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

[LoRA] feat: support non-diffusers wan t2v loras. (#11059)

feat: support non-diffusers wan t2v loras.

[examples/controlnet/train_controlnet_sd3.py] Fixes #11050 - Cast prompt_embeds and pooled_prompt_embeds to weight_dtype to prevent dtype mismatch (#11051)

Fix: dtype mismatch of prompt embeddings in sd3 controlnet training

Co-authored-by: Andreas Jörg <andreasjoerg@MacBook-Pro-von-Andreas-2.fritz.box>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

reverts accidental change that removes attn_mask in attn. Improves fl… (#11065)

reverts accidental change that removes attn_mask in attn. Improves flux ptxla by using flash block sizes. Moves encoding outside the for loop.

Co-authored-by: Juan Acevedo <jfacevedo@google.com>

Fix deterministic issue when getting pipeline dtype and device (#10696)

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

[Tests] add requires peft decorator. (#11037)

* add requires peft decorator.

* install peft conditionally.

* conditional deps.

Co-authored-by: DN6 <dhruv.nair@gmail.com>

---------

Co-authored-by: DN6 <dhruv.nair@gmail.com>

CogView4 Control Block (#10809)

* cogview4 control training

---------

Co-authored-by: OleehyO <leehy0357@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>

[CI] pin transformers version for benchmarking. (#11067)

pin transformers version for benchmarking.

updates

Fix Wan I2V Quality (#11087)

* fix_wan_i2v_quality

* Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update pipeline_wan_i2v.py

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>

LTX 0.9.5 (#10968)

* update

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>

make PR GPU tests conditioned on styling. (#11099)

Group offloading improvements (#11094)

update

Fix pipeline_flux_controlnet.py (#11095)

* Fix pipeline_flux_controlnet.py

* Fix style

update readme instructions. (#11096)

Co-authored-by: Juan Acevedo <jfacevedo@google.com>

Resolve stride mismatch in UNet's ResNet to support Torch DDP (#11098)

Modify UNet's ResNet implementation to resolve stride mismatch in Torch's DDP

Fix Group offloading behaviour when using streams (#11097)

* update

* update

Quality options in `export_to_video` (#11090)

* Quality options in `export_to_video`

* make style

improve more.

add placeholders for docstrings.

formatting.

smol fix.

solidify validation and annotation

* Revert "feat: pipeline-level quant config."

This reverts commit 316ff46b76.

* feat: implement pipeline-level quantization config

Co-authored-by: SunMarc <marc@huggingface.co>

* update

* fixes

* fix validation.

* add tests and other improvements.

* add tests

* import quality

* remove prints.

* add docs.

* fixes to docs.

* doc fixes.

* doc fixes.

* add validation to the input quantization_config.

* clarify recommendations.

* docs

* add to ci.

* todo.

---------

Co-authored-by: SunMarc <marc@huggingface.co>
2025-05-09 10:04:44 +05:30
Sayak Paul
393aefcdc7 [tests] fix audioldm2 for transformers main. (#11522)
fix audioldm2 for transformers main.
2025-05-08 21:13:42 +05:30
Aryan
6674a5157f Conditionally import torchvision in Cosmos transformer (#11524)
fix
2025-05-08 19:37:47 +05:30
scxue
784db0eaab Add cross attention type for Sana-Sprint training in diffusers. (#11514)
* test permission

* Add cross attention type for Sana-Sprint.

* Add Sana-Sprint training script in diffusers.

* make style && make quality;

* modify the attention processor with `set_attn_processor` and change `SanaAttnProcessor3_0` to `SanaVanillaAttnProcessor`

* Add import for SanaVanillaAttnProcessor

* Add README file.

* Apply suggestions from code review

* style

* Update examples/research_projects/sana/README.md

---------

Co-authored-by: lawrence-cj <cjs1020440147@icloud.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-05-08 18:55:29 +05:30
Linoy Tsaban
66e50d4e24 [LoRA] make lora alpha and dropout configurable (#11467)
* add lora_alpha and lora_dropout

* Apply style fixes

* add lora_alpha and lora_dropout

* Apply style fixes

* revert lora_alpha until #11324 is merged

* Apply style fixes

* empty commit

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-08 11:54:50 +03:00
sayakpaul
c5c34a4591 Revert "fix audioldm"
This reverts commit 87e508f11f.
2025-05-08 11:30:29 +05:30
sayakpaul
87e508f11f fix audioldm 2025-05-08 11:30:11 +05:30
YiYi Xu
53bd367b03 clean up the __Init__ for stable_diffusion (#11500)
up
2025-05-07 07:01:17 -10:00
Aryan
7b904941bc Cosmos (#10660)
* begin transformer conversion

* refactor

* refactor

* refactor

* refactor

* refactor

* refactor

* update

* add conversion script

* add pipeline

* make fix-copies

* remove einops

* update docs

* gradient checkpointing

* add transformer test

* update

* debug

* remove prints

* match sigmas

* add vae pt. 1

* finish CV* vae

* update

* update

* update

* update

* update

* update

* make fix-copies

* update

* make fix-copies

* fix

* update

* update

* make fix-copies

* update

* update tests

* handle device and dtype for safety checker; required in latest diffusers

* remove enable_gqa and use repeat_interleave instead

* enforce safety checker; use dummy checker in fast tests

* add review suggestion for ONNX export

Co-Authored-By: Asfiya Baig <asfiyab@nvidia.com>

* fix safety_checker issues when not passed explicitly

We could either do what's done in this commit, or update the Cosmos examples to explicitly pass the safety checker

* use cosmos guardrail package

* auto format docs

* update conversion script to support 14B models

* update name CosmosPipeline -> CosmosTextToWorldPipeline

* update docs

* fix docs

* fix group offload test failing for vae

---------

Co-authored-by: Asfiya Baig <asfiyab@nvidia.com>
2025-05-07 20:59:09 +05:30
Sayak Paul
fb29132b98 [docs] minor updates to bitsandbytes docs. (#11509)
* minor updates to bitsandbytes docs.

* Apply suggestions from code review
2025-05-06 18:52:18 +05:30
Valeriy Selitskiy
79371661d1 [lora_conversion] Enhance key handling for OneTrainer components in LORA conversion utility (#11441) (#11487)
* [lora_conversion] Enhance key handling for OneTrainer components in LORA conversion utility (#11441)

* Update src/diffusers/loaders/lora_conversion_utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-05-06 18:44:58 +05:30
Yao Matrix
8c661ea586 enable lora cases on XPU (#11506)
* enable lora cases on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* remove hunyuanvideo xpu expectation

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-05-06 14:59:50 +05:30
Aryan
d7ffe60166 Hunyuan Video Framepack (#11428)
* add transformer

* add pipeline

* fixes

* make fix-copies

* update

* add flux mu shift

* update example snippet

* debug

* cleanup

* batch_size=1 optimization

* add pipeline test

* fix for model cpu offloading'

* add last_image support; credits: https://github.com/lllyasviel/FramePack/pull/167

* update example with flf2v

* update penguin url

* fix test

* address review comment: https://github.com/huggingface/diffusers/pull/11428#discussion_r2071032371

* address review comment: https://github.com/huggingface/diffusers/pull/11428#discussion_r2071087689

* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video_framepack.py

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-05-06 14:59:38 +05:30
Sayak Paul
10bee525e7 [LoRA] use removeprefix to preserve sanity. (#11493)
* use removeprefix to preserve sanity.

* f-string.
2025-05-06 12:17:57 +05:30
Sayak Paul
d88ae1f52a update dep table. (#11504)
* update dep table.

* fix
2025-05-06 11:14:07 +05:30
Sayak Paul
53f1043cbb Update setup.py to pin min version of peft (#11502) 2025-05-06 10:23:16 +05:30
Aryan
1fa5639438 Fix torchao docs typo for fp8 granular quantization (#11473)
update
2025-05-06 07:54:28 +05:30
RogerSinghChugh
ed4efbd63d Update training script for txt to img sdxl with lora supp with new interpolation. (#11496)
* Update training script for txt to img sdxl with lora supp with new interpolation.

* ran make style and make quality.
2025-05-05 12:33:28 -04:00
Yijun Lee
9c29e938d7 Set LANCZOS as the default interpolation method for image resizing. (#11492)
* Set LANCZOS as the default interpolation method for image resizing.

* style: run make style and quality checks
2025-05-05 12:18:40 -04:00
Sayak Paul
071807c853 [training] feat: enable quantization for hidream lora training. (#11494)
* feat: enable quantization for hidream lora training.

* better handle compute dtype.

* finalize.

* fix dtype.

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-05-05 20:44:35 +05:30
Evan Han
ee1516e5c7 [train_dreambooth_lora_lumina2] Add LANCZOS as the default interpolation mode for image resizing (#11491)
[ADD] interpolation
2025-05-05 10:41:33 -04:00
MinJu-Ha
ec9323996b [train_dreambooth_lora_sdxl] Add --image_interpolation_mode option for image resizing (default to lanczos) (#11490)
feat(train_dreambooth_lora_sdxl): support --image_interpolation_mode with default to lanczos
2025-05-05 10:19:30 -04:00
Parag Ekbote
fc5e906689 [train_text_to_image_sdxl]Add LANCZOS as default interpolation mode for image resizing (#11455)
* Add LANCZOS as default interplotation mode.

* update script

* Update as per code review.

* make style.
2025-05-05 09:52:19 -04:00
Connector Switch
8520d496f0 [Feature] Implement tiled VAE encoding/decoding for Wan model. (#11414)
* implement tiled encode/decode

* address review comments
2025-05-05 16:07:14 +05:30
Yao Matrix
a674914fd5 enable semantic diffusion and stable diffusion panorama cases on XPU (#11459)
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-05-05 15:28:07 +05:30
Yash
ec3d58286d [train_dreambooth_lora_flux_advanced] Add LANCZOS as the default interpolation mode for image resizing (#11472)
* [train_controlnet_sdxl] Add LANCZOS as the default interpolation mode for image resizing

* [train_dreambooth_lora_flux_advanced] Add LANCZOS as the default interpolation mode for image resizing
2025-05-02 18:14:41 -04:00
Yuanzhou
ed6cf52572 [train_dreambooth_lora_sdxl_advanced] Add LANCZOS as the default interpolation mode for image resizing (#11471) 2025-05-02 16:46:01 -04:00
Steven Liu
e23705e557 [docs] Adapters (#11331)
* refactor adapter docs

* ip-adapter

* ip adapter

* fix toctree

* fix toctree

* lora

* images

* controlnet

* feedback

* controlnet

* t2i

* fix typo

* feedback

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-05-02 08:08:33 +05:30
Steven Liu
b848d479b1 [docs] Memory optims (#11385)
* reformat

* initial

* fin

* review

* inference

* feedback

* feedback

* feedback
2025-05-01 11:22:00 -07:00
Vladimir Mandic
d0c02398b9 cache packages_distributions (#11453)
* cache packages_distributions

* remove unused exception reference

* make style

Signed-off-by: Vladimir Mandic <mandic00@live.com>

* change name to _package_map

---------

Signed-off-by: Vladimir Mandic <mandic00@live.com>
Co-authored-by: DN6 <dhruv.nair@gmail.com>
2025-05-01 21:47:52 +05:30
Sayak Paul
5dcdf4ac9a [tests] xfail recent pipeline tests for specific methods. (#11469)
xfail recent pipeline tests for specific methods.
2025-05-01 18:33:52 +05:30
co63oc
86294d3c7f Fix typos in docs and comments (#11416)
* Fix typos in docs and comments

* Apply style fixes

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-30 20:30:53 -10:00
Sayak Paul
d70f8ee18b [WAN] fix recompilation issues (#11475)
* [tests] Add torch.compile() test for WanTransformer3DModel

* fix wan recompilation issues.

* style

---------

Co-authored-by: tongyu0924 <winnie920924@gmail.com>
2025-04-30 20:29:08 -10:00
Yao Matrix
06beecafc5 make autoencoders. controlnet_flux and wan_transformer3d_single_file pass on xpu (#11461)
* make autoencoders. controlnet_flux and wan_transformer3d_single_file
pass on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* Apply style fixes

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2025-05-01 02:43:31 +05:30
Vaibhav Kumawat
daf0a23958 Add LANCZOS as default interplotation mode. (#11463)
* Add LANCZOS as default interplotation mode.

* LANCZOS as default interplotation

* LANCZOS as default interplotation mode

* Added LANCZOS as default interplotation mode
2025-04-30 14:22:38 -04:00
tongyu
38ced7ee59 [test_models_transformer_hunyuan_video] help us test torch.compile() for impactful models (#11431)
* Update test_models_transformer_hunyuan_video.py

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-30 19:11:42 +08:00
Yao Matrix
23c98025b3 make safe diffusion test cases pass on XPU and A100 (#11458)
* make safe diffusion test cases pass on XPU and A100

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* calibrate A100 expected values

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-30 16:05:28 +05:30
captainzz
8cd7426e56 Add StableDiffusion3InstructPix2PixPipeline (#11378)
* upload StableDiffusion3InstructPix2PixPipeline

* Move to community

* Add readme

* Fix images

* remove images

* Change image url

* fix

* Apply style fixes
2025-04-30 06:13:12 -04:00
Daniel Socek
fbce7aeb32 Add generic support for Intel Gaudi accelerator (hpu device) (#11328)
* Add generic support for Intel Gaudi accelerator (hpu device)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Libin Tang <libin.tang@intel.com>

* Add loggers for generic HPU support

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

* Refactor hpu support with is_hpu_available() logic

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

* Fix style for hpu support update

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

* Decouple soft HPU check from hard device validation to support HPU migration

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

---------

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Libin Tang <libin.tang@intel.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-30 14:45:02 +05:30
Yao Matrix
35fada4169 enable unidiffuser test cases on xpu (#11444)
* enable unidiffuser cases on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix a typo

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-04-30 13:58:00 +05:30
Yao Matrix
fbe2fe5578 enable consistency test cases on XPU, all passed (#11446)
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-04-30 12:41:29 +05:30
Aryan
c86511586f torch.compile fullgraph compatibility for Hunyuan Video (#11457)
udpate
2025-04-30 11:21:17 +05:30
Yao Matrix
60892c55a4 enable marigold_intrinsics cases on XPU (#11445)
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-04-30 11:07:37 +05:30
Aryan
8fe5a14d9b Raise warning instead of error for block offloading with streams (#11425)
raise warning instead of error
2025-04-30 08:26:16 +05:30
Youlun Peng
58431f102c Set LANCZOS as the default interpolation for image resizing in ControlNet training (#11449)
Set LANCZOS as the default interpolation for image resizing
2025-04-29 08:47:02 -04:00
urpetkov-amd
4a9ab650aa Fixing missing provider options argument (#11397)
* Fixing missing provider options argument

* Adding if else for provider options

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Apply style fixes

* Update src/diffusers/pipelines/onnx_utils.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/onnx_utils.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

---------

Co-authored-by: Uros Petkovic <urpektov@amd.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-28 10:23:05 -10:00
Linoy Tsaban
0ac1d5b482 [Hi-Dream LoRA] fix bug in validation (#11439)
remove unnecessary pipeline moving to cpu in validation

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-28 06:22:32 -10:00
Yao Matrix
7567adfc45 enable 28 GGUF test cases on XPU (#11404)
* enable gguf test cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* make SD35LargeGGUFSingleFileTests::test_pipeline_inference pas

Signed-off-by: root <root@a4bf01945cfe.jf.intel.com>

* make FluxControlLoRAGGUFTests::test_lora_loading pass

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* polish code

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* Apply style fixes

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: root <root@a4bf01945cfe.jf.intel.com>
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: root <root@a4bf01945cfe.jf.intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-28 21:32:04 +05:30
tongyu
3da98e7ee3 [train_text_to_image_lora] Better image interpolation in training scripts follow up (#11427)
* Update train_text_to_image_lora.py

* update_train_text_to_image_lora
2025-04-28 11:23:24 -04:00
tongyu
b3b04fefde [train_text_to_image] Better image interpolation in training scripts follow up (#11426)
* Update train_text_to_image.py

* update
2025-04-28 10:50:33 -04:00
Sayak Paul
0e3f2713c2 [tests] fix import. (#11434)
fix import.
2025-04-28 13:32:28 +08:00
Yao Matrix
a7e9f85e21 enable test_layerwise_casting_memory cases on XPU (#11406)
* enable test_layerwise_casting_memory cases on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-04-28 06:38:39 +05:30
Yao Matrix
9ce89e2efa enable group_offload cases and quanto cases on XPU (#11405)
* enable group_offload cases and quanto cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* use backend APIs

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-04-28 06:37:16 +05:30
Sayak Paul
aa5f5d41d6 [tests] add tests to check for graph breaks, recompilation, cuda syncs in pipelines during torch.compile() (#11085)
* test for better torch.compile stuff.

* fixes

* recompilation and graph break.

* clear compilation cache.

* change to modeling level test.

* allow running compilation tests during nightlies.
2025-04-28 08:36:33 +08:00
Mert Erbak
bd96a084d3 [train_dreambooth_lora.py] Set LANCZOS as default interpolation mode for resizing (#11421)
* Set LANCZOS as default interpolation mode for resizing

* [train_dreambooth_lora.py] Set LANCZOS as default interpolation mode for resizing
2025-04-26 01:58:41 -04:00
co63oc
f00a995753 Fix typos in strings and comments (#11407) 2025-04-24 08:53:47 -10:00
Ishan Modi
e8312e7ca9 [BUG] fixed WAN docstring (#11226)
update
2025-04-24 08:49:37 -10:00
Emiliano
7986834572 Fix Flux IP adapter argument in the pipeline example (#11402)
Fix Flux IP adapter argument in the example

IP-Adapter example had a wrong argument. Fix `true_cfg` -> `true_cfg_scale`
2025-04-24 08:41:12 -10:00
Linoy Tsaban
edd7880418 [HiDream LoRA] optimizations + small updates (#11381)
* 1. add pre-computation of prompt embeddings when custom prompts are used as well
2. save model card even if model is not pushed to hub
3. remove scheduler initialization from code example - not necessary anymore (it's now if the base model's config)
4. add skip_final_inference - to allow to run with validation, but skip the final loading of the pipeline with the lora weights to reduce memory reqs

* pre encode validation prompt as well

* Update examples/dreambooth/train_dreambooth_lora_hidream.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update examples/dreambooth/train_dreambooth_lora_hidream.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update examples/dreambooth/train_dreambooth_lora_hidream.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* pre encode validation prompt as well

* Apply style fixes

* empty commit

* change default trained modules

* empty commit

* address comments + change encoding of validation prompt (before it was only pre-encoded if custom prompts are provided, but should be pre-encoded either way)

* Apply style fixes

* empty commit

* fix validation_embeddings definition

* fix final inference condition

* fix pipeline deletion in last inference

* Apply style fixes

* empty commit

* layers

* remove readme remarks on only pre-computing when instance prompt is provided and change example to 3d icons

* smol fix

* empty commit

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-24 07:48:19 +03:00
Teriks
b4be42282d Kolors additional pipelines, community contrib (#11372)
* Kolors additional pipelines, community contrib

---------

Co-authored-by: Teriks <Teriks@users.noreply.github.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-04-23 11:07:27 -10:00
Ishan Modi
a4f9c3cbc3 [Feature] Added Xlab Controlnet support (#11249)
update
2025-04-23 10:43:50 -10:00
Ishan Dutta
4b60f4b602 [train_dreambooth_flux] Add LANCZOS as the default interpolation mode for image resizing (#11395) 2025-04-23 10:47:05 -04:00
Aryan
6cef71de3a Fix group offloading with block_level and use_stream=True (#11375)
* fix

* add tests

* add message check
2025-04-23 18:17:53 +05:30
Ameer Azam
026507c06c Update README_hidream.md (#11386)
Small change
requirements_sana.txt to 
requirements_hidream.txt
2025-04-22 20:08:26 -04:00
YiYi Xu
448c72a230 [HiDream] move deprecation to 0.35.0 (#11384)
up
2025-04-22 08:08:08 -10:00
Aryan
f108ad8888 Update modeling imports (#11129)
update
2025-04-22 06:59:25 -10:00
Linoy Tsaban
e30d3bf544 [LoRA] add LoRA support to HiDream and fine-tuning script (#11281)
* initial commit

* initial commit

* initial commit

* initial commit

* initial commit

* initial commit

* Update examples/dreambooth/train_dreambooth_lora_hidream.py

Co-authored-by: Bagheera <59658056+bghira@users.noreply.github.com>

* move prompt embeds, pooled embeds outside

* Update examples/dreambooth/train_dreambooth_lora_hidream.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update examples/dreambooth/train_dreambooth_lora_hidream.py

Co-authored-by: hlky <hlky@hlky.ac>

* fix import

* fix import and tokenizer 4, text encoder 4 loading

* te

* prompt embeds

* fix naming

* shapes

* initial commit to add HiDreamImageLoraLoaderMixin

* fix init

* add tests

* loader

* fix model input

* add code example to readme

* fix default max length of text encoders

* prints

* nullify training cond in unpatchify for temp fix to incompatible shaping of transformer output during training

* smol fix

* unpatchify

* unpatchify

* fix validation

* flip pred and loss

* fix shift!!!

* revert unpatchify changes (for now)

* smol fix

* Apply style fixes

* workaround moe training

* workaround moe training

* remove prints

* to reduce some memory, keep vae in `weight_dtype` same as we have for flux (as it's the same vae)
bbd0c161b5/examples/dreambooth/train_dreambooth_lora_flux.py (L1207)

* refactor to align with HiDream refactor

* refactor to align with HiDream refactor

* refactor to align with HiDream refactor

* add support for cpu offloading of text encoders

* Apply style fixes

* adjust lr and rank for train example

* fix copies

* Apply style fixes

* update README

* update README

* update README

* fix license

* keep prompt2,3,4 as None in validation

* remove reverse ode comment

* Update examples/dreambooth/train_dreambooth_lora_hidream.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update examples/dreambooth/train_dreambooth_lora_hidream.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* vae offload change

* fix text encoder offloading

* Apply style fixes

* cleaner to_kwargs

* fix module name in copied from

* add requirements

* fix offloading

* fix offloading

* fix offloading

* update transformers version in reqs

* try AutoTokenizer

* try AutoTokenizer

* Apply style fixes

* empty commit

* Delete tests/lora/test_lora_layers_hidream.py

* change tokenizer_4 to load with AutoTokenizer as well

* make text_encoder_four and tokenizer_four configurable

* save model card

* save model card

* revert T5

* fix test

* remove non diffusers lumina2 conversion

---------

Co-authored-by: Bagheera <59658056+bghira@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-22 11:44:02 +03:00
apolinário
6ab62c7431 Add stochastic sampling to FlowMatchEulerDiscreteScheduler (#11369)
* Add stochastic sampling to FlowMatchEulerDiscreteScheduler

This PR adds stochastic sampling to FlowMatchEulerDiscreteScheduler based on b1aeddd7cc  ltx_video/schedulers/rf.py

* Apply style fixes

* Use config value directly

* Apply style fixes

* Swap order

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-04-21 17:18:30 -10:00
Ishan Modi
f59df3bb8b [Refactor] Minor Improvement for import utils (#11161)
* update

* update

* addressed PR comments

* update

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-04-21 09:56:55 -10:00
josephrocca
a00c73a5e1 Support different-length pos/neg prompts for FLUX.1-schnell variants like Chroma (#11120)
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-04-21 09:28:19 -10:00
OleehyO
0434db9a99 [cogview4][feat] Support attention mechanism with variable-length support and batch packing (#11349)
* [cogview4] Enhance attention mechanism with variable-length support and batch packing

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-21 09:27:55 -10:00
Aamir Nazir
aff574fb29 Add Serialized Type Name kwarg in Model Output (#10502)
* Update outputs.py
2025-04-21 08:45:28 -10:00
Ishan Modi
79ea8eb258 [BUG] fixes in kadinsky pipeline (#11080)
* bug fix kadinsky pipeline
2025-04-21 08:41:09 -10:00
Aryan
e7f3a73786 Fix Wan I2V prepare_latents dtype (#11371)
update
2025-04-21 08:18:50 -10:00
PromeAI
7a4a126db8 fix issue that training flux controlnet was unstable and validation r… (#11373)
* fix issue that training flux controlnet was unstable and validation results were unstable

* del unused code pieces, fix grammar

---------

Co-authored-by: Your Name <you@example.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-21 08:16:05 -10:00
Kenneth Gerald Hamilton
0dec414d5b [train_dreambooth_lora_sdxl.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env (#11240)
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-04-21 12:51:03 +05:30
Linoy Tsaban
44eeba07b2 [Flux LoRAs] fix lr scheduler bug in distributed scenarios (#11242)
* add fix

* add fix

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-21 10:08:45 +03:00
YiYi Xu
5873377a66 [Wan2.1-FLF2V] update conversion script (#11365)
update scheuler config in conversion sript
2025-04-18 14:08:44 -10:00
YiYi Xu
5a2e0f715c update output for Hidream transformer (#11366)
up
2025-04-18 14:07:21 -10:00
Kazuki Yoda
ef47726e2d Fix: StableDiffusionXLControlNetAdapterInpaintPipeline incorrectly inherited StableDiffusionLoraLoaderMixin (#11357)
Fix: Inherit `StableDiffusionXLLoraLoaderMixin`

`StableDiffusionXLControlNetAdapterInpaintPipeline`
used to incorrectly inherit
`StableDiffusionLoraLoaderMixin`
instead of `StableDiffusionXLLoraLoaderMixin`
2025-04-18 12:46:06 -10:00
YiYi Xu
0021bfa1e1 support Wan-FLF2V (#11353)
* update transformer

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2025-04-18 10:27:50 -10:00
Marc Sun
bbd0c161b5 [BNB] Fix test_moving_to_cpu_throws_warning (#11356)
fix

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-18 09:44:51 +05:30
Yao Matrix
eef3d65954 enable 2 test cases on XPU (#11332)
* enable 2 test cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Apply style fixes

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-04-17 13:27:41 -10:00
Frank (Haofan) Wang
ee6ad51d96 Update controlnet_flux.py (#11350) 2025-04-17 10:05:01 -10:00
Sayak Paul
4397f59a37 [bitsandbytes] improve dtype mismatch handling for bnb + lora. (#11270)
* improve dtype mismatch handling for bnb + lora.

* add a test

* fix and updates

* update
2025-04-17 19:51:49 +05:30
YiYi Xu
056793295c [Hi Dream] follow-up (#11296)
* add
2025-04-17 01:17:44 -10:00
Sayak Paul
29d2afbfe2 [LoRA] Propagate hotswap better (#11333)
* propagate hotswap to other load_lora_weights() methods.

* simplify documentations.

* updates

* propagate to load_lora_into_text_encoder.

* empty commit
2025-04-17 10:35:38 +05:30
Sayak Paul
b00a564dac [docs] add note about use_duck_shape in auraflow docs. (#11348)
add note about use_duck_shape in auraflow docs.
2025-04-17 10:25:39 +05:30
Sayak Paul
efc9d68b15 [chore] fix lora docs utils (#11338)
fix lora docs utils
2025-04-17 09:25:53 +05:30
nPeppon
3e59d531d1 Fix wrong dtype argument name as torch_dtype (#11346) 2025-04-16 16:00:25 -04:00
Ishan Modi
d63e6fccb1 [BUG] fixed _toctree.yml alphabetical ordering (#11277)
update
2025-04-16 09:04:22 -07:00
Dhruv Nair
59f1b7b1c8 Hunyuan I2V fast tests fix (#11341)
* update

* update
2025-04-16 18:40:33 +05:30
Sayak Paul
ce1063acfa [docs] add a snippet for compilation in the auraflow docs. (#11327)
* add a snippet for compilation in the auraflow docs.

* include speedups.
2025-04-16 11:12:09 +05:30
Sayak Paul
7212f35de2 [single file] enable telemetry for single file loading when using GGUF. (#11284)
* enable telemetry for single file loading when using GGUF.

* quality
2025-04-16 08:33:52 +05:30
Sayak Paul
3252d7ad11 unpin torch versions for onnx Dockerfile (#11290)
unpin torch versions for onnx
2025-04-16 08:16:38 +05:30
Dhruv Nair
b316104ddd Fix Hunyuan I2V for transformers>4.47.1 (#11293)
* update

* update
2025-04-16 07:53:32 +05:30
Álvaro Somoza
d3b2699a7f another fix for FlowMatchLCMScheduler forgotten import (#11330)
fix
2025-04-15 13:53:16 -10:00
Sayak Paul
4b868f14c1 post release 0.33.0 (#11255)
* post release

* update

* fix deprecations

* remaining

* update

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-04-15 06:50:08 -10:00
AstraliteHeart
b6156aafe9 Rewrite AuraFlowPatchEmbed.pe_selection_index_based_on_dim to be torch.compile compatible (#11297)
* Update pe_selection_index_based_on_dim

* Make pe_selection_index_based_on_dim work with torh.compile

* Fix AuraFlowTransformer2DModel's dpcstring default values

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-15 19:30:30 +05:30
Sayak Paul
7ecfe29160 [docs] fix hidream docstrings. (#11325)
* fix hidream docstrings.

* fix

* empty commit
2025-04-15 16:26:21 +05:30
Yao Matrix
7edace9a05 fix CPU offloading related fail cases on XPU (#11288)
* fix CPU offloading related fail cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Apply style fixes

* trigger tests

* test_pipe_same_device_id_offload

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-04-15 09:06:56 +01:00
hlky
6e80d240d3 Fix vae.Decoder prev_output_channel (#11280) 2025-04-15 08:03:50 +01:00
Hameer Abbasi
9352a5ca56 [LoRA] Add LoRA support to AuraFlow (#10216)
* Add AuraFlowLoraLoaderMixin

* Add comments, remove qkv fusion

* Add Tests

* Add AuraFlowLoraLoaderMixin to documentation

* Add Suggested changes

* Change attention_kwargs->joint_attention_kwargs

* Rebasing derp.

* fix

* fix

* Quality fixes.

* make style

* `make fix-copies`

* `ruff check --fix`

* Attept 1 to fix tests.

* Attept 2 to fix tests.

* Attept 3 to fix tests.

* Address review comments.

* Rebasing derp.

* Get more tests passing by copying from Flux. Address review comments.

* `joint_attention_kwargs`->`attention_kwargs`

* Add `lora_scale` property for te LoRAs.

* Make test better.

* Remove useless property.

* Skip TE-only tests for AuraFlow.

* Support LoRA for non-CLIP TEs.

* Restore LoRA tests.

* Undo adding LoRA support for non-CLIP TEs.

* Undo support for TE in AuraFlow LoRA.

* `make fix-copies`

* Sync with upstream changes.

* Remove unneeded stuff.

* Mirror `Lumina2`.

* Skip for MPS.

* Address review comments.

* Remove duplicated code.

* Remove unnecessary code.

* Remove repeated docs.

* Propagate attention.

* Fix TE target modules.

* MPS fix for LoRA tests.

* Unrelated TE LoRA tests fix.

* Fix AuraFlow LoRA tests by applying to the right denoiser layers.

Co-authored-by: AstraliteHeart <81396681+AstraliteHeart@users.noreply.github.com>

* Apply style fixes

* empty commit

* Fix the repo consistency issues.

* Remove unrelated changes.

* Style.

* Fix `test_lora_fuse_nan`.

* fix quality issues.

* `pytest.xfail` -> `ValueError`.

* Add back `skip_mps`.

* Apply style fixes

* `make fix-copies`

---------

Co-authored-by: Warlord-K <warlordk28@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: AstraliteHeart <81396681+AstraliteHeart@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-15 10:41:28 +05:30
Sayak Paul
cefa28f449 [docs] Promote AutoModel usage (#11300)
* docs: promote the usage of automodel.

* bitsandbytes

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-15 09:25:40 +05:30
Beinsezii
8819cda6c0 Add skrample section to community_projects.md (#11319)
Update community_projects.md

https://github.com/huggingface/diffusers/discussions/11158#discussioncomment-12681691
2025-04-14 12:12:59 -10:00
hlky
dcf836cf47 Use float32 on mps or npu in transformer_hidream_image's rope (#11316) 2025-04-14 20:19:21 +01:00
Álvaro Somoza
1cb73cb19f import for FlowMatchLCMScheduler (#11318)
* add

* fix-copies
2025-04-14 06:28:57 -10:00
Linoy Tsaban
ba6008abfe [HiDream] code example (#11317) 2025-04-14 16:19:30 +01:00
Sayak Paul
a8f5134c11 [LoRA] support more SDXL loras. (#11292)
* support more SDXL loras.

* update

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-04-14 17:09:59 +05:30
Fanli Lin
c7f2d239fe make KolorsPipelineFastTests::test_inference_batch_single_identical pass on XPU (#11313)
adjust diff
2025-04-14 11:02:02 +01:00
Yao Matrix
fa1ac50a66 make test_stable_diffusion_karras_sigmas pass on XPU (#11310)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-14 08:15:38 +01:00
Yao Matrix
aa541b9fab make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU (#11308)
loose expected_max_diff from 5e-1 to 8e-1 to make
KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference
pass on XPU

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-14 07:49:20 +01:00
Ishan Modi
f1f38ffbee [ControlNet] Adds controlnet for SanaTransformer (#11040)
* added controlnet for sana transformer

* improve code quality

* addressed PR comments

* bug fixes

* added test cases

* update

* added dummy objects

* addressed PR comments

* update

* Forcing update

* add to docs

* code quality

* addressed PR comments

* addressed PR comments

* update

* addressed PR comments

* added proper styling

* update

* Revert "added proper styling"

This reverts commit 344ee8a701.

* manually ordered

* Apply suggestions from code review

---------

Co-authored-by: Aryan <contact.aryanvs@gmail.com>
2025-04-13 19:19:39 +05:30
Tuna Tuncer
36538e1135 Fix incorrect tile_latent_min_width calculations (#11305) 2025-04-13 18:21:50 +05:30
Aryan
97e0ef4db4 Hidream refactoring follow ups (#11299)
* HiDream Image

* update

* -einops

* py3.8

* fix -einops

* mixins, offload_seq, option_components

* docs

* Apply style fixes

* trigger tests

* Apply suggestions from code review

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* joint_attention_kwargs -> attention_kwargs, fixes

* fast tests

* -_init_weights

* style tests

* move reshape logic

* update slice 😴

* supports_dduf

* 🤷🏻‍♂️

* Update src/diffusers/models/transformers/transformer_hidream_image.py

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* address review comments

* update tests

* doc updates

* update

* Update src/diffusers/models/transformers/transformer_hidream_image.py

* Apply style fixes

---------

Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-13 09:45:19 +05:30
Adrien B
ed41db8525 Update autoencoderkl_allegro.md (#11303)
Correction typo
2025-04-13 09:41:30 +05:30
Nikita Starodubcev
ec0b2b3947 flow matching lcm scheduler (#11170)
* add flow matching lcm scheduler
* stochastic sampling
* upscaling for scale-wise generation

* Apply style fixes

* Apply suggestions from code review

Co-authored-by: hlky <hlky@hlky.ac>

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-04-12 11:14:57 -10:00
hlky
0ef29355c9 HiDream Image (#11231)
* HiDream Image


---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2025-04-11 06:31:34 -10:00
Tuna Tuncer
bc261058ee Fix incorrect tile_latent_min_width calculation in AutoencoderKLMochi (#11294) 2025-04-11 19:11:08 +05:30
Sayak Paul
7054a34978 do not use DIFFUSERS_REQUEST_TIMEOUT for notification bot (#11273)
fix to a constant
2025-04-11 14:19:46 +05:30
Sayak Paul
511d738121 [CI] relax tolerance for unclip further (#11268)
relax tolerance for unclip further.
2025-04-11 14:06:52 +05:30
Sayak Paul
ea5a6a8b7c [Tests] Cleanup lora tests utils (#11276)
* start cleaning up lora test utils for reusability

* update

* updates

* updates
2025-04-10 15:50:34 +05:30
hlky
b8093e6665 Fix LTX 0.9.5 single file (#11271) 2025-04-10 07:06:13 +01:00
Yuqian Hong
e121d0ef67 [BUG] Fix convert_vae_pt_to_diffusers bug (#11078)
* fix attention

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-10 06:59:45 +01:00
Yao Matrix
31c4f24fc1 make test_instant_style_multiple_masks pass on XPU (#11266)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-10 06:23:00 +01:00
xieofxie
0efdf411fb add onnxruntime-qnn & onnxruntime-cann (#11269)
Co-authored-by: hualxie <hualxie@microsoft.com>
2025-04-10 06:00:23 +01:00
Yao Matrix
450dc48a2c make test_dict_tuple_outputs_equivalent pass on XPU (#11265)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-10 05:56:28 +01:00
Yao Matrix
77b4f66b9e make test_stable_diffusion_inpaint_fp16 pass on XPU (#11264)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-10 05:55:54 +01:00
Yao Matrix
68663f8a17 fix test_vanilla_funetuning failure on XPU and A100 (#11263)
* fix test_vanilla_funetuning failure on XPU and A100

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* change back to 5e-2

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-10 05:55:07 +01:00
Sayak Paul
ffda8735be [LoRA] support musubi wan loras. (#11243)
* support musubi wan loras.

* Update src/diffusers/loaders/lora_conversion_utils.py

Co-authored-by: hlky <hlky@hlky.ac>

* support i2v loras from musubi too.

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-04-10 09:50:22 +05:30
YiYi Xu
0706786e53 fix wan ftfy import (#11262) 2025-04-09 09:08:34 -10:00
Sayak Paul
5b27f8aba8 fix consisid imports (#11254)
* fix consisid imports

* fix opencv import

* fix
2025-04-09 18:49:32 +05:30
Sayak Paul
d1387ecee5 fix timeout constant (#11252)
* fix timeout constant

* style

* fix
2025-04-09 17:48:52 +05:30
Ilya Drobyshevskiy
6a7c2d0afa fix flux controlnet bug (#11152)
Before this if txt_ids was 3d tensor, line with txt_ids[:1] concat txt_ids by batch dim. Now we first check that txt_ids is 2d tensor (or take first batch element) and then concat by token dim
2025-04-09 13:01:07 +01:00
Dhruv Nair
edc154da09 Update Ruff to latest Version (#10919)
* update

* update

* update

* update
2025-04-09 16:51:34 +05:30
hlky
552cd32058 [docs] AutoModel (#11250)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-09 16:42:23 +05:30
Yao Matrix
c36c745ceb fix FluxReduxSlowTests::test_flux_redux_inference case failure on XPU (#11245)
* loose test_float16_inference's tolerance from 5e-2 to 6e-2, so XPU can
pass UT

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix test_pipeline_flux_redux fail on XPU

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-04-09 11:41:15 +01:00
hlky
437cb36c65 AutoModel (#11115)
* AutoModel

* ...

* lol

* ...

* add test

* update

* make fix-copies

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-04-09 15:20:07 +05:30
hlky
9ee3dd3862 AudioLDM2 Fixes (#11244) 2025-04-09 14:12:00 +05:30
Sayak Paul
fd02aad402 fix: SD3 ControlNet validation so that it runs on a A100. (#11238)
* fix: SD3 ControlNet validation so that it runs on a A100.

* use backend-agnostic cache and pass devide.
2025-04-09 12:12:53 +05:30
Sayak Paul
6bfacf0418 [LoRA] support more comyui loras for Flux 🚨 (#10985)
* support more comyui loras.

* fix

* fixes

* revert changes in LoRA base.

* no position_embedding

* 🚨 introduce a breaking change to let peft handle module ambiguity

* styling

* remove position embeddings.

* improvements.

* style

* make info instead of NotImplementedError

* Update src/diffusers/loaders/peft.py

Co-authored-by: hlky <hlky@hlky.ac>

* add example.

* robust checks

* updates

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-04-09 09:17:05 +05:30
Sayak Paul
f685981ed0 [docs] minor updates to dtype map docs. (#11237)
minor updates to dtype map docs.
2025-04-09 08:38:17 +05:30
Sayak Paul
b924251dd8 minor update to sana sprint docs. (#11236) 2025-04-09 08:17:45 +05:30
Sayak Paul
1a04812439 [bistandbytes] improve replacement warnings for bnb (#11132)
* improve replacement warnings for bnb

* updates to docs.
2025-04-08 21:18:34 +05:30
Sayak Paul
4b27c4a494 [feat] implement record_stream when using CUDA streams during group offloading (#11081)
* implement record_stream for better performance.

* fix

* style.

* merge #11097

* Update src/diffusers/hooks/group_offloading.py

Co-authored-by: Aryan <aryan@huggingface.co>

* fixes

* docstring.

* remaining todos in low_cpu_mem_usage

* tests

* updates to docs.

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2025-04-08 21:17:49 +05:30
hlky
5d49b3e83b Flux quantized with lora (#10990)
* Flux quantized with lora

* fix

* changes

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Apply style fixes

* enable model cpu offload()

* Update src/diffusers/loaders/lora_pipeline.py

Co-authored-by: hlky <hlky@hlky.ac>

* update

* Apply suggestions from code review

* update

* add peft as an additional dependency for gguf

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-04-08 21:17:03 +05:30
Linoy Tsaban
71f34fc5a4 [Flux LoRA] fix issues in flux lora scripts (#11111)
* remove custom scheduler

* update requirements.txt

* log_validation with mixed precision

* add intermediate embeddings saving when checkpointing is enabled

* remove comment

* fix validation

* add unwrap_model for accelerator, torch.no_grad context for validation, fix accelerator.accumulate call in advanced script

* revert unwrap_model change temp

* add .module to address distributed training bug + replace accelerator.unwrap_model with unwrap model

* changes to align advanced script with canonical script

* make changes for distributed training + unify unwrap_model calls in advanced script

* add module.dtype fix to dreambooth script

* unify unwrap_model calls in dreambooth script

* fix condition in validation run

* mixed precision

* Update examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* smol style change

* change autocast

* Apply style fixes

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-08 17:40:30 +03:00
Yao Matrix
c51b6bd837 introduce compute arch specific expectations and fix test_sd3_img2img_inference failure (#11227)
* add arch specfic expectations support, to support different arch's numerical characteristics

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix typo

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Apply suggestions from code review

* Apply style fixes

* Update src/diffusers/utils/testing_utils.py

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-08 14:57:49 +01:00
Benjamin Bossan
fb54499614 [LoRA] Implement hot-swapping of LoRA (#9453)
* [WIP][LoRA] Implement hot-swapping of LoRA

This PR adds the possibility to hot-swap LoRA adapters. It is WIP.

Description

As of now, users can already load multiple LoRA adapters. They can
offload existing adapters or they can unload them (i.e. delete them).
However, they cannot "hotswap" adapters yet, i.e. substitute the weights
from one LoRA adapter with the weights of another, without the need to
create a separate LoRA adapter.

Generally, hot-swapping may not appear not super useful but when the
model is compiled, it is necessary to prevent recompilation. See #9279
for more context.

Caveats

To hot-swap a LoRA adapter for another, these two adapters should target
exactly the same layers and the "hyper-parameters" of the two adapters
should be identical. For instance, the LoRA alpha has to be the same:
Given that we keep the alpha from the first adapter, the LoRA scaling
would be incorrect for the second adapter otherwise.

Theoretically, we could override the scaling dict with the alpha values
derived from the second adapter's config, but changing the dict will
trigger a guard for recompilation, defeating the main purpose of the
feature.

I also found that compilation flags can have an impact on whether this
works or not. E.g. when passing "reduce-overhead", there will be errors
of the type:

> input name: arg861_1. data pointer changed from 139647332027392 to
139647331054592

I don't know enough about compilation to determine whether this is
problematic or not.

Current state

This is obviously WIP right now to collect feedback and discuss which
direction to take this. If this PR turns out to be useful, the
hot-swapping functions will be added to PEFT itself and can be imported
here (or there is a separate copy in diffusers to avoid the need for a
min PEFT version to use this feature).

Moreover, more tests need to be added to better cover this feature,
although we don't necessarily need tests for the hot-swapping
functionality itself, since those tests will be added to PEFT.

Furthermore, as of now, this is only implemented for the unet. Other
pipeline components have yet to implement this feature.

Finally, it should be properly documented.

I would like to collect feedback on the current state of the PR before
putting more time into finalizing it.

* Reviewer feedback

* Reviewer feedback, adjust test

* Fix, doc

* Make fix

* Fix for possible g++ error

* Add test for recompilation w/o hotswapping

* Make hotswap work

Requires https://github.com/huggingface/peft/pull/2366

More changes to make hotswapping work. Together with the mentioned PEFT
PR, the tests pass for me locally.

List of changes:

- docstring for hotswap
- remove code copied from PEFT, import from PEFT now
- adjustments to PeftAdapterMixin.load_lora_adapter (unfortunately, some
  state dict renaming was necessary, LMK if there is a better solution)
- adjustments to UNet2DConditionLoadersMixin._process_lora: LMK if this
  is even necessary or not, I'm unsure what the overall relationship is
  between this and PeftAdapterMixin.load_lora_adapter
- also in UNet2DConditionLoadersMixin._process_lora, I saw that there is
  no LoRA unloading when loading the adapter fails, so I added it
  there (in line with what happens in PeftAdapterMixin.load_lora_adapter)
- rewritten tests to avoid shelling out, make the test more precise by
  making sure that the outputs align, parametrize it
- also checked the pipeline code mentioned in this comment:
  https://github.com/huggingface/diffusers/pull/9453#issuecomment-2418508871;
  when running this inside the with
  torch._dynamo.config.patch(error_on_recompile=True) context, there is
  no error, so I think hotswapping is now working with pipelines.

* Address reviewer feedback:

- Revert deprecated method
- Fix PEFT doc link to main
- Don't use private function
- Clarify magic numbers
- Add pipeline test

Moreover:
- Extend docstrings
- Extend existing test for outputs != 0
- Extend existing test for wrong adapter name

* Change order of test decorators

parameterized.expand seems to ignore skip decorators if added in last
place (i.e. innermost decorator).

* Split model and pipeline tests

Also increase test coverage by also targeting conv2d layers (support of
which was added recently on the PEFT PR).

* Reviewer feedback: Move decorator to test classes

... instead of having them on each test method.

* Apply suggestions from code review

Co-authored-by: hlky <hlky@hlky.ac>

* Reviewer feedback: version check, TODO comment

* Add enable_lora_hotswap method

* Reviewer feedback: check _lora_loadable_modules

* Revert changes in unet.py

* Add possibility to ignore enabled at wrong time

* Fix docstrings

* Log possible PEFT error, test

* Raise helpful error if hotswap not supported

I.e. for the text encoder

* Formatting

* More linter

* More ruff

* Doc-builder complaint

* Update docstring:

- mention no text encoder support yet
- make it clear that LoRA is meant
- mention that same adapter name should be passed

* Fix error in docstring

* Update more methods with hotswap argument

- SDXL
- SD3
- Flux

No changes were made to load_lora_into_transformer.

* Add hotswap argument to load_lora_into_transformer

For SD3 and Flux. Use shorter docstring for brevity.

* Extend docstrings

* Add version guards to tests

* Formatting

* Fix LoRA loading call to add prefix=None

See:
https://github.com/huggingface/diffusers/pull/10187#issuecomment-2717571064

* Run make fix-copies

* Add hot swap documentation to the docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-08 17:05:31 +05:30
Álvaro Somoza
723dbdd363 [Training] Better image interpolation in training scripts (#11206)
* initial

* Update examples/dreambooth/train_dreambooth_lora_sdxl.py

Co-authored-by: hlky <hlky@hlky.ac>

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-04-08 12:26:07 +05:30
Bhavay Malhotra
fbf61f465b [train_controlnet.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env (#8461)
* Create diffusers.yml

* fix num_train_epochs

* Delete diffusers.yml

* Fixed Changes

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-04-08 12:10:09 +05:30
Inigo Goiri
841504bb1a Add support to pass image embeddings to the WAN I2V pipeline. (#11175)
* Add support to pass image embeddings to the pipeline.



---------

Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-04-07 15:47:06 -10:00
Steven Liu
fc7a867ae5 [docs] MPS update (#11212)
mps
2025-04-07 14:32:27 -10:00
alex choi
5ded26cdc7 ensure dtype match between diffused latents and vae weights (#8391) 2025-04-07 12:59:10 -10:00
Yao Matrix
506f39af3a enable 1 case on XPU (#11219)
enable case on XPU: 1. tests/quantization/bnb/test_mixed_int8.py::BnB8bitTrainingTests::test_training

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-07 08:24:21 +01:00
Mikko Tukiainen
8ad68c1393 Add missing MochiEncoder3D.gradient_checkpointing attribute (#11146)
* Add missing 'gradient_checkpointing = False' attr

* Add (limited) tests for Mochi autoencoder

* Apply style fixes

* pass 'conv_cache' as arg instead of kwarg

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-06 02:46:45 +05:30
Edna
41afb6690c Add Wan with STG as a community pipeline (#11184)
* Add stg wan to community pipelines

* remove debug prints

* remove unused comment

* Update doc

* Add credit + fix typo

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-05 04:00:40 +02:00
Tolga Cangöz
13e48492f0 [LTX0.9.5] Refactor LTXConditionPipeline for text-only conditioning (#11174)
* Refactor `LTXConditionPipeline` to add text-only conditioning

* style

* up

* Refactor `LTXConditionPipeline` to streamline condition handling and improve clarity

* Improve condition checks

* Simplify latents handling based on conditioning type

* Refactor rope_interpolation_scale preparation for clarity and efficiency

* Update LTXConditionPipeline docstring to clarify supported input types

* Add LTX Video 0.9.5 model to documentation

* Clarify documentation to indicate support for text-only conditioning without passing `conditions`

* refactor: comment out unused parameters in LTXConditionPipeline

* fix: restore previously commented parameters in LTXConditionPipeline

* fix: remove unused parameters from LTXConditionPipeline

* refactor: remove unnecessary lines in LTXConditionPipeline
2025-04-04 16:43:15 +02:00
Suprhimp
94f2c48d58 [feat]Add strength in flux_fill pipeline (denoising strength for fluxfill) (#10603)
* [feat]add strength in flux_fill pipeline

* Update src/diffusers/pipelines/flux/pipeline_flux_fill.py

* Update src/diffusers/pipelines/flux/pipeline_flux_fill.py

* Update src/diffusers/pipelines/flux/pipeline_flux_fill.py

* [refactor] refactor after review

* [fix] change comment

* Apply style fixes

* empty

* fix

* update prepare_latents from flux.img2img pipeline

* style

* Update src/diffusers/pipelines/flux/pipeline_flux_fill.py

---------
2025-04-04 11:23:30 -03:00
Dhruv Nair
aabf8ce20b Fix Single File loading for LTX VAE (#11200)
update
2025-04-04 18:02:39 +05:30
Kenneth Gerald Hamilton
f10775b1b5 Fixed requests.get function call by adding timeout parameter. (#11156)
* Fixed requests.get function call by adding timeout parameter.

* declare DIFFUSERS_REQUEST_TIMEOUT in constants and import when needed

* remove unneeded os import

* Apply style fixes

---------

Co-authored-by: Sai-Suraj-27 <sai.suraj.27.729@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-04 07:23:14 +01:00
célina
6edb774b5e Update Style Bot workflow (#11202)
update style bot workflow
2025-04-03 19:31:49 +02:00
Basile Lewandowski
480510ada9 Change KolorsPipeline LoRA Loader to StableDiffusion (#11198)
Change LoRA Loader to StableDiffusion

Replace the SDXL LoRA Loader Mixin inheritance with the StableDiffusion one
2025-04-03 11:21:11 -03:00
Abhipsha Das
d9023a671a [Model Card] standardize advanced diffusion training sdxl lora (#7615)
* model card gen code

* push modelcard creation

* remove optional from params

* add import

* add use_dora check

* correct lora var use in tags

* make style && make quality

---------

Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-03 07:43:01 +05:30
Eliseu Silva
c4646a3931 feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline (#11188)
* feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline for Image SR.

* added pipeline
2025-04-02 11:33:19 -10:00
Dhruv Nair
c97b709afa Add CacheMixin to Wan and LTX Transformers (#11187)
* update

* update

* update
2025-04-02 10:16:31 -10:00
lakshay sharma
b0ff822ed3 Update import_utils.py (#10329)
added onnxruntime-vitisai for custom build onnxruntime pkg
2025-04-02 20:47:10 +01:00
hlky
78c2fdc52e SchedulerMixin from_pretrained and ConfigMixin Self type annotation (#11192) 2025-04-02 08:24:02 -10:00
hlky
54dac3a87c Fix enable_sequential_cpu_offload in CogView4Pipeline (#11195)
* Fix enable_sequential_cpu_offload in CogView4Pipeline

* make fix-copies
2025-04-02 16:51:23 +01:00
hlky
e5c6027ef8 [docs] torch_dtype map (#11194) 2025-04-02 12:46:28 +01:00
hlky
da857bebb6 Revert save_model in ModelMixin save_pretrained and use safe_serialization=False in test (#11196) 2025-04-02 12:45:36 +01:00
Fanli Lin
52b460feb9 [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU (#11197)
* add xpu part

* fix more cases

* remove some cases

* no canny

* format fix
2025-04-02 12:45:02 +01:00
hlky
d8c617ccb0 allow models to run with a user-provided dtype map instead of a single dtype (#10301)
* allow models to run with a user-provided dtype map instead of a single dtype

* make style

* Add warning, change `_` to `default`

* make style

* add test

* handle shared tensors

* remove warning

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-04-02 09:05:46 +01:00
Bruno Magalhaes
fe2b397426 remove unnecessary call to F.pad (#10620)
* rewrite memory count without implicitly using dimensions by @ic-synth

* replace F.pad by built-in padding in Conv3D

* in-place sums to reduce memory allocations

* fixed trailing whitespace

* file reformatted

* in-place sums

* simpler in-place expressions

* removed in-place sum, may affect backward propagation logic

* removed in-place sum, may affect backward propagation logic

* removed in-place sum, may affect backward propagation logic

* reverted change
2025-04-02 08:19:51 +01:00
Eliseu Silva
be0b7f55cc fix: for checking mandatory and optional pipeline components (#11189)
fix: optional componentes verification on load
2025-04-02 08:07:24 +01:00
jiqing-feng
4d5a96e40a fix autocast (#11190)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-02 07:26:27 +01:00
Yao Matrix
a7f07c1ef5 map BACKEND_RESET_MAX_MEMORY_ALLOCATED to reset_peak_memory_stats on XPU (#11191)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-02 07:25:48 +01:00
Dhruv Nair
df1d7b01f1 [WIP] Add Wan Video2Video (#11053)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update
2025-04-01 17:22:11 +05:30
Fanli Lin
5a6edac087 [tests] no hard-coded cuda (#11186)
no cuda only
2025-04-01 12:14:31 +01:00
kakukakujirori
e8fc8b1f81 Bug fix in LTXImageToVideoPipeline.prepare_latents() when latents is already set (#10918)
* Bug fix in ltx

* Assume packed latents.

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-03-31 12:15:43 -10:00
hlky
d6f4774c1c Add latents_mean and latents_std to SDXLLongPromptWeightingPipeline (#11034) 2025-03-31 11:32:29 -10:00
Mark
eb50defff2 [Docs] Fix environment variables in installation.md (#11179) 2025-03-31 09:15:25 -07:00
Aryan
2c59af7222 Raise warning and round down if Wan num_frames is not 4k + 1 (#11167)
* update

* raise warning and round to nearest multiple of scale factor
2025-03-31 13:33:28 +05:30
hlky
75d7e5cc45 Fix LatteTransformer3DModel dtype mismatch with enable_temporal_attentions (#11139) 2025-03-29 15:52:56 +01:00
Dhruv Nair
617c208bb4 [Docs] Update Wan Docs with memory optimizations (#11089)
* update

* update
2025-03-28 19:05:56 +05:30
hlky
5d970a4aa9 WanI2V encode_image (#11164)
* WanI2V encode_image
2025-03-28 18:05:34 +05:30
kentdan3msu
de6a88c2d7 Set self._hf_peft_config_loaded to True when LoRA is loaded using load_lora_adapter in PeftAdapterMixin class (#11155)
set self._hf_peft_config_loaded to True on successful lora load

Sets the `_hf_peft_config_loaded` flag if a LoRA is successfully loaded in `load_lora_adapter`. Fixes bug huggingface/diffusers/issues/11148

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-03-26 18:31:18 +01:00
Dhruv Nair
7dc52ea769 [Quantization] dtype fix for GGUF + fix BnB tests (#11159)
* update

* update

* update

* update
2025-03-26 22:22:16 +05:30
Junsong Chen
739d6ec731 add a timestep scale for sana-sprint teacher model (#11150) 2025-03-25 08:47:39 -10:00
Aryan
1ddf3f3a19 Improve information about group offloading and layerwise casting (#11101)
* update

* Update docs/source/en/optimization/memory.md

* Apply suggestions from code review

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* apply review suggestions

* update

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-03-24 23:25:59 +05:30
Jun Yeop Na
7aac77affa [doc] Fix Korean Controlnet Train doc (#11141)
* remove typo from korean controlnet train doc

* removed more paragraphs to remain in sync with the english document
2025-03-24 09:38:21 -07:00
Aryan
8907a70a36 New HunyuanVideo-I2V (#11066)
* update

* update

* update

* add tests

* update docs

* raise value error

* warning for true cfg and guidance scale

* fix test
2025-03-24 21:18:40 +05:30
Junsong Chen
5dbe4f5de6 [fix SANA-Sprint] (#11142)
* fix bug in sana conversion script;

* add more model paths;

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-03-23 23:38:14 -10:00
Yuxuan Zhang
1d37f42055 Modify the implementation of retrieve_timesteps in CogView4-Control. (#11125)
* 1

* change to channel 1

* cogview4 control training

* add CacheMixin

* 1

* remove initial_input_channels change for val

* 1

* update

* use 3.5

* new loss

* 1

* use imagetoken

* for megatron convert

* 1

* train con and uc

* 2

* remove guidance_scale

* Update pipeline_cogview4_control.py

* fix

* use cogview4 pipeline with timestep

* update shift_factor

* remove the uncond

* add max length

* change convert and use GLMModel instead of GLMForCasualLM

* fix

* [cogview4] Add attention mask support to transformer model

* [fix] Add attention mask for padded token

* update

* remove padding type

* Update train_control_cogview4.py

* resolve conflicts with #10981

* add control convert

* use control format

* fix

* add missing import

* update with cogview4 formate

* make style

* Update pipeline_cogview4_control.py

* Update pipeline_cogview4_control.py

* remove

* Update pipeline_cogview4_control.py

* put back

* Apply style fixes

---------

Co-authored-by: OleehyO <leehy0357@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-03-23 21:17:14 +05:30
Tolga Cangöz
0213179ba8 Update README and example code for AnyText usage (#11028)
* [Documentation] Update README and example code with additional usage instructions for AnyText

* [Documentation] Update README for AnyTextPipeline and improve logging in code

* Remove wget command for font file from example docstring in anytext.py
2025-03-23 21:15:57 +05:30
hlky
a7d53a5939 Don't override torch_dtype and don't use when quantization_config is set (#11039)
* Don't use `torch_dtype` when `quantization_config` is set

* up

* djkajka

* Apply suggestions from code review

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-03-21 21:58:38 +05:30
YiYi Xu
8a63aa5e4f add sana-sprint (#11074)
* add sana-sprint




---------

Co-authored-by: Junsong Chen <cjs1020440147@icloud.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2025-03-21 06:21:18 -10:00
Aryan
844221ae4e [core] FasterCache (#10163)
* init

* update

* update

* update

* make style

* update

* fix

* make it work with guidance distilled models

* update

* make fix-copies

* add tests

* update

* apply_faster_cache -> apply_fastercache

* fix

* reorder

* update

* refactor

* update docs

* add fastercache to CacheMixin

* update tests

* Apply suggestions from code review

* make style

* try to fix partial import error

* Apply style fixes

* raise warning

* update

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-03-21 09:35:04 +05:30
CyberVy
9b2c0a7dbe fix _callback_tensor_inputs of sd controlnet inpaint pipeline missing some elements (#11073)
* Update pipeline_controlnet_inpaint.py

* Apply style fixes
2025-03-20 23:56:12 -03:00
Parag Ekbote
f424b1b062 Notebooks for Community Scripts-8 (#11128)
Add 4 Notebooks and update the missing links for the
example README.
2025-03-20 12:24:46 -07:00
YiYi Xu
e9fda3924f remove F.rms_norm for now (#11126)
up
2025-03-20 07:55:01 -10:00
Dhruv Nair
2c1ed50fc5 Provide option to reduce CPU RAM usage in Group Offload (#11106)
* update

* update

* clean up
2025-03-20 17:01:09 +05:30
Fanli Lin
15ad97f782 [tests] make cuda only tests device-agnostic (#11058)
* enable bnb on xpu

* add 2 more cases

* add missing change

* add missing change

* add one more

* enable cuda only tests on xpu

* enable big gpu cases
2025-03-20 10:12:35 +00:00
hlky
9f2d5c9ee9 Flux with Remote Encode (#11091)
* Flux img2img remote encode

* Flux inpaint

* -copied from
2025-03-20 09:44:08 +00:00
Junsong Chen
dc62e6931e [fix bug] PixArt inference_steps=1 (#11079)
* fix bug when pixart-dmd inference with `num_inference_steps=1`

* use return_dict=False and return [1] element for 1-step pixart model, which works for both lcm and dmd
2025-03-20 07:44:30 +00:00
Fanli Lin
56f740051d [tests] enable bnb tests on xpu (#11001)
* enable bnb on xpu

* add 2 more cases

* add missing change

* add missing change

* add one more
2025-03-19 16:33:11 +00:00
Linoy Tsaban
a34d97cef0 [Wan LoRAs] make T2V LoRAs compatible with Wan I2V (#11107)
* @hlky t2v->i2v

* Apply style fixes

* try with ones to not nullify layers

* fix method name

* revert to zeros

* add check to state_dict keys

* add comment

* copies fix

* Revert "copies fix"

This reverts commit 051f534d18.

* remove copied from

* Update src/diffusers/loaders/lora_pipeline.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/loaders/lora_pipeline.py

Co-authored-by: hlky <hlky@hlky.ac>

* update

* update

* Update src/diffusers/loaders/lora_pipeline.py

Co-authored-by: hlky <hlky@hlky.ac>

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Linoy <linoy@hf.co>
Co-authored-by: hlky <hlky@hlky.ac>
2025-03-19 21:44:19 +05:30
Yuqian Hong
fc28791fc8 [BUG] Fix Autoencoderkl train script (#11113)
* add disc_optimizer step (not fix)

* support syncbatchnorm in discriminator
2025-03-19 16:49:02 +05:30
Sayak Paul
ae14612673 [CI] uninstall deps properly from pr gpu tests. (#11102)
uninstall deps properly from pr gpu tests.
2025-03-19 08:58:36 +05:30
hlky
0ab8fe49bf Quality options in export_to_video (#11090)
* Quality options in `export_to_video`

* make style
2025-03-18 10:32:33 -10:00
Aryan
3be6706018 Fix Group offloading behaviour when using streams (#11097)
* update

* update
2025-03-18 14:44:10 +05:30
Cheng Jin
cb1b8b21b8 Resolve stride mismatch in UNet's ResNet to support Torch DDP (#11098)
Modify UNet's ResNet implementation to resolve stride mismatch in Torch's DDP
2025-03-18 07:38:13 +00:00
Juan Acevedo
27916822b2 update readme instructions. (#11096)
Co-authored-by: Juan Acevedo <jfacevedo@google.com>
2025-03-17 20:07:48 -10:00
co63oc
3fe3bc0642 Fix pipeline_flux_controlnet.py (#11095)
* Fix pipeline_flux_controlnet.py

* Fix style
2025-03-17 19:52:15 -10:00
Aryan
813d42cc96 Group offloading improvements (#11094)
update
2025-03-18 11:18:00 +05:30
Sayak Paul
b4d7e9c632 make PR GPU tests conditioned on styling. (#11099) 2025-03-18 11:15:35 +05:30
Aryan
2e83cbbb6d LTX 0.9.5 (#10968)
* update


---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-03-17 16:43:36 -10:00
C
33d10af28f Fix Wan I2V Quality (#11087)
* fix_wan_i2v_quality

* Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update pipeline_wan_i2v.py

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-03-17 06:24:57 -10:00
Sayak Paul
100142586f [CI] pin transformers version for benchmarking. (#11067)
pin transformers version for benchmarking.
2025-03-16 10:27:35 +05:30
Yuxuan Zhang
82188cef04 CogView4 Control Block (#10809)
* cogview4 control training


---------

Co-authored-by: OleehyO <leehy0357@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
2025-03-15 07:15:56 -10:00
Sayak Paul
cc19726f3d [Tests] add requires peft decorator. (#11037)
* add requires peft decorator.

* install peft conditionally.

* conditional deps.

Co-authored-by: DN6 <dhruv.nair@gmail.com>

---------

Co-authored-by: DN6 <dhruv.nair@gmail.com>
2025-03-15 12:56:41 +05:30
Dimitri Barbot
be54a95b93 Fix deterministic issue when getting pipeline dtype and device (#10696)
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-03-15 07:50:58 +05:30
Juan Acevedo
6b9a3334db reverts accidental change that removes attn_mask in attn. Improves fl… (#11065)
reverts accidental change that removes attn_mask in attn. Improves flux ptxla by using flash block sizes. Moves encoding outside the for loop.

Co-authored-by: Juan Acevedo <jfacevedo@google.com>
2025-03-14 12:47:01 -10:00
Andreas Jörg
8ead643bb7 [examples/controlnet/train_controlnet_sd3.py] Fixes #11050 - Cast prompt_embeds and pooled_prompt_embeds to weight_dtype to prevent dtype mismatch (#11051)
Fix: dtype mismatch of prompt embeddings in sd3 controlnet training

Co-authored-by: Andreas Jörg <andreasjoerg@MacBook-Pro-von-Andreas-2.fritz.box>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-03-14 17:33:15 +05:30
Sayak Paul
124ac3e81f [LoRA] feat: support non-diffusers wan t2v loras. (#11059)
feat: support non-diffusers wan t2v loras.
2025-03-14 16:01:25 +05:30
Sayak Paul
2f0f281b0d [Tests] restrict memory tests for quanto for certain schemes. (#11052)
* restrict memory tests for quanto for certain schemes.

* Apply suggestions from code review

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* fixes

* style

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-03-14 10:35:19 +05:30
ZhengKai91
ccc8321651 Fix aclnnRepeatInterleaveIntWithDim error on NPU for get_1d_rotary_pos_embed (#10820)
* get_1d_rotary_pos_embed support npu

* Update src/diffusers/models/embeddings.py

---------

Co-authored-by: Kai zheng <kaizheng@KaideMacBook-Pro.local>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-03-13 09:58:03 -10:00
Yaniv Galron
5e48cd27d4 making ``formatted_images`` initialization compact (#10801)
compact writing

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-03-13 09:27:14 -10:00
hlky
5551506b29 Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline (#10827)
* Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline


---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-03-13 09:24:21 -10:00
Sayak Paul
20e4b6a628 [LoRA] change to warning from info when notifying the users about a LoRA no-op (#11044)
* move to warning.

* test related changes.
2025-03-12 21:20:48 +05:30
hlky
4ea9f89b8e Wan Pipeline scaling fix, type hint warning, multi generator fix (#11007)
* Wan Pipeline scaling fix, type hint warning, multi generator fix

* Apply suggestions from code review
2025-03-12 12:05:52 +00:00
hlky
733b44ac82 [hybrid inference 🍯🐝] Add VAE encode (#11017)
* [hybrid inference 🍯🐝] Add VAE encode

* _toctree: add vae encode

* Add endpoints, tests

* vae_encode docs

* vae encode benchmarks

* api reference

* changelog

* Update docs/source/en/hybrid_inference/overview.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-03-12 11:23:41 +00:00
hlky
8b4f8ba764 Use output_size in repeat_interleave (#11030) 2025-03-12 07:30:21 +00:00
Dhruv Nair
5428046437 [Refactor] Clean up import utils boilerplate (#11026)
* update

* update

* update
2025-03-12 07:48:34 +05:30
39th president of the United States, probably
e7ffeae0a1 Fix for multi-GPU WAN inference (#10997)
Ensure that hidden_state and shift/scale are on the same device when running with multiple GPUs

Co-authored-by: Jimmy <39@🇺🇸.com>
2025-03-11 07:42:12 -10:00
CyberVy
d87ce2cefc Fix missing **kwargs in lora_pipeline.py (#11011)
* Update lora_pipeline.py

* Apply style fixes

* fix-copies

---------

Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-03-11 07:34:27 -10:00
wonderfan
36d0553af2 chore: fix help messages in advanced diffusion examples (#10923) 2025-03-11 07:33:55 -10:00
hlky
7e0db46f73 Fix SD3 IPAdapter feature extractor (#11027) 2025-03-11 16:29:27 +00:00
Sayak Paul
e4b056fe65 [LoRA] support wan i2v loras from the world. (#11025)
* support wan i2v loras from the world.

* remove copied from.

* upates

* add lora.
2025-03-11 20:43:29 +05:30
Eliseu Silva
4e3ddd5afa fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings (#11012)
small fix on generating time_ids & embeddings
2025-03-11 04:20:18 -03:00
Dhruv Nair
9add071592 [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 (#11018)
* update

* update

* update

* update

* update

* update

* update

* update

* update
2025-03-11 10:52:01 +05:30
Tolga Cangöz
b88fef4785 [Research Project] Add AnyText: Multilingual Visual Text Generation And Editing (#8998)
* Add initial template

* Second template

* feat: Add TextEmbeddingModule to AnyTextPipeline

* feat: Add AuxiliaryLatentModule template to AnyTextPipeline

* Add bert tokenizer from the anytext repo for now

* feat: Update AnyTextPipeline's modify_prompt method

This commit adds improvements to the modify_prompt method in the AnyTextPipeline class. The method now handles special characters and replaces selected string prompts with a placeholder. Additionally, it includes a check for Chinese text and translation using the trans_pipe.

* Fill in the `forward` pass of `AuxiliaryLatentModule`

* `make style && make quality`

* `chore: Update bert_tokenizer.py with a TODO comment suggesting the use of the transformers library`

* Update error handling to raise and logging

* Add `create_glyph_lines` function into `TextEmbeddingModule`

* make style

* Up

* Up

* Up

* Up

* Remove several comments

* refactor: Remove ControlNetConditioningEmbedding and update code accordingly

* Up

* Up

* up

* refactor: Update AnyTextPipeline to include new optional parameters

* up

* feat: Add OCR model and its components

* chore: Update `TextEmbeddingModule` to include OCR model components and dependencies

* chore: Update `AuxiliaryLatentModule` to include VAE model and its dependencies for masked image in the editing task

* `make style`

* refactor: Update `AnyTextPipeline`'s docstring

* Update `AuxiliaryLatentModule` to include info dictionary so that text processing is done once

* simplify

* `make style`

* Converting `TextEmbeddingModule` to ordinary `encode_prompt()` function

* Simplify for now

* `make style`

* Up

* feat: Add scripts to convert AnyText controlnet to diffusers

* `make style`

* Fix: Move glyph rendering to `TextEmbeddingModule` from `AuxiliaryLatentModule`

* make style

* Up

* Simplify

* Up

* feat: Add safetensors module for loading model file

* Fix device issues

* Up

* Up

* refactor: Simplify

* refactor: Simplify code for loading models and handling data types

* `make style`

* refactor: Update to() method in FrozenCLIPEmbedderT3 and TextEmbeddingModule

* refactor: Update dtype in embedding_manager.py to match proj.weight

* Up

* Add attribution and adaptation information to pipeline_anytext.py

* Update usage example

* Will refactor `controlnet_cond_embedding` initialization

* Add `AnyTextControlNetConditioningEmbedding` template

* Refactor organization

* style

* style

* Move custom blocks from `AuxiliaryLatentModule` to `AnyTextControlNetConditioningEmbedding`

* Follow one-file policy

* style

* [Docs] Update README and pipeline_anytext.py to use AnyTextControlNetModel

* [Docs] Update import statement for AnyTextControlNetModel in pipeline_anytext.py

* [Fix] Update import path for ControlNetModel, ControlNetOutput in anytext_controlnet.py

* Refactor AnyTextControlNet to use configurable conditioning embedding channels

* Complete control net conditioning embedding in AnyTextControlNetModel

* up

* [FIX] Ensure embeddings use correct device in AnyTextControlNetModel

* up

* up

* style

* [UPDATE] Revise README and example code for AnyTextPipeline integration with DiffusionPipeline

* [UPDATE] Update example code in anytext.py to use correct font file and improve clarity

* down

* [UPDATE] Refactor BasicTokenizer usage to a new Checker class for text processing

* update pillow

* [UPDATE] Remove commented-out code and unnecessary docstring in anytext.py and anytext_controlnet.py for improved clarity

* [REMOVE] Delete frozen_clip_embedder_t3.py as it is in the anytext.py file

* [UPDATE] Replace edict with dict for configuration in anytext.py and RecModel.py for consistency

* 🆙

* style

* [UPDATE] Revise README.md for clarity, remove unused imports in anytext.py, and add author credits in anytext_controlnet.py

* style

* Update examples/research_projects/anytext/README.md

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Remove commented-out image preparation code in AnyTextPipeline

* Remove unnecessary blank line in README.md
2025-03-11 01:49:37 +05:30
Sayak Paul
e7e6d85282 [Tests] improve quantization tests by additionally measuring the inference memory savings (#11021)
* memory usage tests

* fixes

* gguf
2025-03-10 21:42:24 +05:30
Aryan
8eefed65bd [LoRA] CogView4 (#10981)
* update

* make fix-copies

* update
2025-03-10 20:24:05 +05:30
Sayak Paul
26149c0ecd [LoRA] Improve warning messages when LoRA loading becomes a no-op (#10187)
* updates

* updates

* updates

* updates

* notebooks revert

* fix-copies.

* seeing

* fix

* revert

* fixes

* fixes

* fixes

* remove print

* fix

* conflicts ii.

* updates

* fixes

* better filtering of prefix.

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-03-10 09:28:32 +05:30
Ishan Modi
0703ce8800 [Single File] Add single file loading for SANA Transformer (#10947)
* added support for from_single_file

* added diffusers mapping script

* added testcase

* bug fix

* updated tests

* corrected code quality

* corrected code quality

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-03-10 08:38:30 +05:30
Dhruv Nair
f5edaa7894 [Quantization] Add Quanto backend (#10756)
* update

* updaet

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update docs/source/en/quantization/quanto.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update src/diffusers/quantizers/quanto/utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-03-10 08:33:05 +05:30
Dhruv Nair
9a1810f0de Fix for fetching variants only (#10646)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update
2025-03-10 07:45:44 +05:30
Sayak Paul
1fddee211e [LoRA] Improve copied from comments in the LoRA loader classes (#10995)
* more sanity of mind with copied from ...

* better

* better
2025-03-08 19:59:21 +05:30
Kinam Kim
b38450d5d2 Add STG to community pipelines (#10960)
* Support STG for video pipelines

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update pipeline_stg_cogvideox.py

* Update pipeline_stg_hunyuan_video.py

* Update pipeline_stg_ltx.py

* Update pipeline_stg_ltx_image2video.py

* Update pipeline_stg_mochi.py

* Update pipeline_stg_hunyuan_video.py

* Update pipeline_stg_ltx.py

* Update pipeline_stg_ltx_image2video.py

* Update pipeline_stg_mochi.py

* update

* remove rescaling

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-03-08 00:28:24 +05:30
Dhruv Nair
1357931d74 [Single File] Add single file support for Wan T2V/I2V (#10991)
* update

* update

* update

* update

* update

* update

* update
2025-03-07 22:13:25 +05:30
Sayak Paul
a2d3d6af44 [LoRA] remove full key prefix from peft. (#11004)
remove full key prefix from peft.
2025-03-07 21:51:59 +05:30
hlky
363d1ab7e2 Wan VAE move scaling to pipeline (#10998) 2025-03-07 10:42:17 +00:00
C
6a0137eb3b Fix Graph Breaks When Compiling CogView4 (#10959)
* Fix Graph Breaks When Compiling CogView4

Eliminate this:

```
t]V0304 10:24:23.421000 3131076 torch/_dynamo/guards.py:2813] [0/4] [__recompiles] Recompiling function forward in /home/zeyi/repos/diffusers/src/diffusers/models/transformers/transformer_cogview4.py:374
V0304 10:24:23.421000 3131076 torch/_dynamo/guards.py:2813] [0/4] [__recompiles]     triggered by the following guard failure(s):
V0304 10:24:23.421000 3131076 torch/_dynamo/guards.py:2813] [0/4] [__recompiles]     - 0/3: ___check_obj_id(L['self'].rope.freqs_h, 139976127328032)    
V0304 10:24:23.421000 3131076 torch/_dynamo/guards.py:2813] [0/4] [__recompiles]     - 0/2: ___check_obj_id(L['self'].rope.freqs_h, 139976107780960)    
V0304 10:24:23.421000 3131076 torch/_dynamo/guards.py:2813] [0/4] [__recompiles]     - 0/1: ___check_obj_id(L['self'].rope.freqs_h, 140022511848960)    
V0304 10:24:23.421000 3131076 torch/_dynamo/guards.py:2813] [0/4] [__recompiles]     - 0/0: ___check_obj_id(L['self'].rope.freqs_h, 140024081342416)   
```

* Update transformer_cogview4.py

* fix cogview4 rotary pos embed

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-03-06 22:57:17 -10:00
Aryan
2e5203be04 Hunyuan I2V (#10983)
* update

* update

* update

* add tests

* update

* add model tests

* update docs

* update

* update example

* fix defaults

* update
2025-03-07 12:52:48 +05:30
yupeng1111
d55f41102a fix wan i2v pipeline bugs (#10975)
* fix wan i2v pipeline bugs

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-03-06 18:57:41 -10:00
LittleNyima
748cb0fab6 Add CogVideoX DDIM Inversion to Community Pipelines (#10956)
* add cogvideox ddim inversion script

* implement as a pipeline, and add documentation

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-03-06 10:46:38 -10:00
Dhruv Nair
790a909b54 [Single File] Add user agent to SF download requests. (#10979)
update
2025-03-06 10:45:20 -10:00
CyberVy
54ab475391 Fix Flux Controlnet Pipeline _callback_tensor_inputs Missing Some Elements (#10974)
* Update pipeline_flux_controlnet.py

* Update pipeline_flux_controlnet_image_to_image.py

* Update pipeline_flux_controlnet_inpainting.py

* Update pipeline_flux_controlnet_inpainting.py

* Update pipeline_flux_controlnet_inpainting.py
2025-03-06 14:26:20 -03:00
dependabot[bot]
f103993094 Bump jinja2 from 3.1.5 to 3.1.6 in /examples/research_projects/realfill (#10984)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.5 to 3.1.6.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.5...3.1.6)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-06 11:59:51 +00:00
Sayak Paul
1be0202502 [CI] remove synchornized. (#10980)
removed synchornized.
2025-03-06 17:03:19 +05:30
Pierre Chapuis
ea81a4228d fix default values of Flux guidance_scale in docstrings (#10982) 2025-03-06 16:37:45 +05:30
hlky
b15027636a Fix loading OneTrainer Flux LoRA (#10978)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-03-06 13:53:36 +05:30
Sayak Paul
6e2a93de70 [tests] fix tests for save load components (#10977)
fix tests
2025-03-06 12:30:37 +05:30
Jun Yeop Na
37b8edfb86 [train_dreambooth_lora.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env (#10973)
* updated train_dreambooth_lora to fix the LR schedulers for `num_train_epochs` in distributed training env

* fixed formatting

* remove trailing newlines

* fixed style error
2025-03-06 10:06:24 +05:30
Célina
fbf6b856cc use style bot GH Action from huggingface_hub (#10970)
use style bot GH action from hfh

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-03-05 23:39:50 +05:30
Linoy Tsaban
e031caf4ea [flux lora training] fix t5 training bug (#10845)
* fix t5 training bug

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-03-05 13:47:01 +02:00
hlky
08f74a8b92 Add VAE Decode endpoint slow test (#10946) 2025-03-05 11:28:06 +00:00
YiYi Xu
24c062aaa1 update check_input for cogview4 (#10966)
fix
2025-03-04 12:12:54 -10:00
Yuxuan Zhang
a74f02fb40 [Docs] CogView4 comment fix (#10957)
* Update pipeline_cogview4.py

* Use GLM instead of T5 in doc
2025-03-04 11:25:43 -10:00
Eliseu Silva
66bf7ea5be feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL (#10951)
* feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL

* make style make quality
2025-03-04 17:17:36 -03:00
Alexey Zolotenkov
b8215b1c06 Fix incorrect seed initialization when args.seed is 0 (#10964)
* Fix seed initialization to handle args.seed = 0 correctly

* Apply style fixes

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-03-04 10:09:52 -10:00
Aryan
3ee899fa0c [LoRA] Support Wan (#10943)
* update

* refactor image-to-video pipeline

* update

* fix copied from

* use FP32LayerNorm
2025-03-05 01:27:34 +05:30
CyberVy
dcd77ce222 Fix the missing parentheses when calling is_torchao_available in quantization_config.py. (#10961)
Update quantization_config.py
2025-03-04 09:52:41 -03:00
a120092009
11d8e3ce2c [Quantization] support pass MappingType for TorchAoConfig (#10927)
* [Quantization] support pass MappingType for TorchAoConfig

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-03-04 16:40:50 +05:30
Sayak Paul
97fda1b75c [LoRA] feat: support non-diffusers lumina2 LoRAs. (#10909)
* feat: support non-diffusers lumina2 LoRAs.

* revert ipynb changes (but I don't know why this is required ☹️)

* empty

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-03-04 14:40:55 +05:30
Sayak Paul
cc22058324 Update evaluation.md (#10938)
* Update evaluation.md

* Update docs/source/en/conceptual/evaluation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-03-04 13:58:16 +05:30
Fanli Lin
7855ac597e [tests] make tests device-agnostic (part 4) (#10508)
* initial comit

* fix empty cache

* fix one more

* fix style

* update device functions

* update

* update

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/controlnet/test_controlnet.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/controlnet/test_controlnet.py

Co-authored-by: hlky <hlky@hlky.ac>

* with gc.collect

* update

* make style

* check_torch_dependencies

* add mps empty cache

* add changes

* bug fix

* enable on xpu

* update more cases

* revert

* revert back

* Update test_stable_diffusion_xl.py

* Update tests/pipelines/stable_diffusion/test_stable_diffusion.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/stable_diffusion/test_stable_diffusion.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/stable_diffusion/test_stable_diffusion_img2img.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/stable_diffusion/test_stable_diffusion_img2img.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/stable_diffusion/test_stable_diffusion_img2img.py

Co-authored-by: hlky <hlky@hlky.ac>

* Apply suggestions from code review

Co-authored-by: hlky <hlky@hlky.ac>

* add test marker

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-03-04 08:26:06 +00:00
CyberVy
30cef6bff3 Improve load_ip_adapter RAM Usage (#10948)
* Update ip_adapter.py

* Update ip_adapter.py

* Update ip_adapter.py

* Update ip_adapter.py

* Update ip_adapter.py

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-03-04 07:21:23 +00:00
Ahmed Belgacem
8f15be169f Fix redundant prev_output_channel assignment in UNet2DModel (#10945) 2025-03-03 11:43:15 -10:00
Yuxuan Zhang
f92e599c70 Update pipeline_cogview4.py (#10944) 2025-03-03 09:42:01 -10:00
Parag Ekbote
982f9b38d6 Add Example of IPAdapterScaleCutoffCallback to Docs (#10934)
* Add example of Ip-Adapter-Callback.

* Add image links from HF Hub.
2025-03-03 08:32:45 -08:00
fancydaddy
c9a219b323 add from_single_file to animatediff (#10924)
* Update pipeline_animatediff.py

* Update pipeline_animatediff_controlnet.py

* Update pipeline_animatediff_sparsectrl.py

* Update pipeline_animatediff_video2video.py

* Update pipeline_animatediff_video2video_controlnet.py

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-03-03 19:11:54 +05:30
Teriks
9e910c4633 Fix SD2.X clip single file load projection_dim (#10770)
* Fix SD2.X clip single file load projection_dim

Infer projection_dim from the checkpoint before loading
from pretrained, override any incorrect hub config.

Hub configuration for SD2.X specifies projection_dim=512
which is incorrect for SD2.X checkpoints loaded from civitai
and similar.

Exception was previously thrown upon attempting to
load_model_dict_into_meta for SD2.X single file checkpoints.

Such LDM models usually require projection_dim=1024

* convert_open_clip_checkpoint use hidden_size for text_proj_dim

* convert_open_clip_checkpoint, revert checkpoint[text_proj_key].shape[1] -> [0]

values are identical

---------

Co-authored-by: Teriks <Teriks@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-03-03 19:00:39 +05:30
Bubbliiiing
5e3b7d2d8a Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model (#10626)
* Update EasyAnimate V5.1

* Add docs && add tests && Fix comments problems in transformer3d and vae

* delete comments and remove useless import

* delete process

* Update EXAMPLE_DOC_STRING

* rename transformer file

* make fix-copies

* make style

* refactor pt. 1

* update toctree.yml

* add model tests

* Update layer_norm for norm_added_q and norm_added_k in Attention

* Fix processor problem

* refactor vae

* Fix problem in comments

* refactor tiling; remove einops dependency

* fix docs path

* make fix-copies

* Update src/diffusers/pipelines/easyanimate/pipeline_easyanimate_control.py

* update _toctree.yml

* fix test

* update

* update

* update

* make fix-copies

* fix tests

---------

Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-03-03 18:37:19 +05:30
Sayak Paul
7513162b8b [Tests] Remove more encode prompts tests (#10942)
* fix-copies went uncaught it seems.

* remove more unneeded encode_prompt() tests

* Revert "fix-copies went uncaught it seems."

This reverts commit eefb302791.

* empty
2025-03-03 16:55:01 +05:30
Sayak Paul
4aaa0d21ba [chore] fix-copies to flux pipelines (#10941)
fix-copies went uncaught it seems.
2025-03-03 11:21:57 +05:30
hlky
54043c3e2e Update VAE Decode endpoints (#10939) 2025-03-02 18:29:53 +00:00
hlky
fc4229a0c3 Add remote_decode to remote_utils (#10898)
* Add `remote_decode` to `remote_utils`

* test dependency

* test dependency

* dependency

* dependency

* dependency

* docstrings

* changes

* make style

* apply

* revert, add new options

* Apply style fixes

* deprecate base64, headers not needed

* address comments

* add license header

* init test_remote_decode

* more

* more test

* more test

* skeleton for xl, flux

* more test

* flux test

* flux packed

* no scaling

* -save

* hunyuanvideo test

* Apply style fixes

* init docs

* Update src/diffusers/utils/remote_utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* comments

* Apply style fixes

* comments

* hybrid_inference/vae_decode

* fix

* tip?

* tip

* api reference autodoc

* install tip

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-03-02 17:10:01 +00:00
hlky
694f9658c1 Support IPAdapter for more Flux pipelines (#10708)
* Support IPAdapter for more Flux pipelines

* -copied from

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-03-02 15:04:12 +00:00
YiYi Xu
2d8a41cae8 [Alibaba Wan Team] continue on #10921 Wan2.1 (#10922)
* Add wanx pipeline, model and example

* wanx_merged_v1

* change WanX into Wan

* fix i2v fp32 oom error

Link: https://code.alibaba-inc.com/open_wanx2/diffusers/codereview/20607813

* support t2v load fp32 ckpt

* add example

* final merge v1

* Update autoencoder_kl_wan.py

* up

* update middle, test up_block

* up up

* one less nn.sequential

* up more

* up

* more

* [refactor] [wip] Wan transformer/pipeline (#10926)

* update

* update

* refactor rope

* refactor pipeline

* make fix-copies

* add transformer test

* update

* update

* make style

* update tests

* tests

* conversion script

* conversion script

* update

* docs

* remove unused code

* fix _toctree.yml

* update dtype

* fix test

* fix tests: scale

* up

* more

* Apply suggestions from code review

* Apply suggestions from code review

* style

* Update scripts/convert_wan_to_diffusers.py

* update docs

* fix

---------

Co-authored-by: Yitong Huang <huangyitong.hyt@alibaba-inc.com>
Co-authored-by: 亚森 <wangjiayu.wjy@alibaba-inc.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2025-03-02 17:24:26 +05:30
Dhruv Nair
7007febae5 [CI] Update Stylebot Permissions (#10931)
update
2025-03-01 09:43:05 +05:30
Sayak Paul
d230ecc570 [style bot] improve security for the stylebot. (#10908)
* improve security for the stylebot.

* 
2025-02-28 22:01:31 +05:30
hlky
37a5f1b3b6 Experimental per control type scale for ControlNet Union (#10723)
* ControlNet Union scale

* fix

* universal interface

* from_multi

* from_multi
2025-02-27 10:23:38 +00:00
Dhruv Nair
501d9de701 [CI] Fix for failing IP Adapter test in Fast GPU PR tests (#10915)
* update

* update

* update

* update
2025-02-27 14:22:28 +05:30
Dhruv Nair
e5c43b8af7 [CI] Fix Fast GPU tests on PR (#10912)
* update

* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-02-27 14:21:50 +05:30
CyberVy
9a8e8db79f Fix Callback Tensor Inputs of the SD Controlnet Pipelines are missing some elements. (#10907)
* Update pipeline_controlnet_img2img.py

* Update pipeline_controlnet_inpaint.py

* Update pipeline_controlnet.py

---------
2025-02-26 15:36:47 -03:00
Sayak Paul
764d7ed49a [Tests] fix: lumina2 lora fuse_nan test (#10911)
fix: lumina2 lora fuse_nan test
2025-02-26 22:44:49 +05:30
Anton Obukhov
3fab6624fd Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation (#10884)
* minor documentation fixes of the depth and normals pipelines

* update license headers

* update model checkpoints in examples
fix missing prediction_type in register_to_config in the normals pipeline

* add initial marigold intrinsics pipeline
update comments about num_inference_steps and ensemble_size
minor fixes in comments of marigold normals and depth pipelines

* update uncertainty visualization to work with intrinsics

* integrate iid


---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-02-25 14:13:02 -10:00
Yih-Dar
f0ac7aaafc Security fix (#10905)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-25 23:25:37 +05:30
CyberVy
613e77f8be Fix Callback Tensor Inputs of the SDXL Controlnet Inpaint and Img2img Pipelines are missing "controlnet_image". (#10880)
* Update pipeline_controlnet_inpaint_sd_xl.py

* Update pipeline_controlnet_sd_xl_img2img.py

* Update pipeline_controlnet_union_inpaint_sd_xl.py

* Update pipeline_controlnet_union_sd_xl_img2img.py

* Update pipeline_controlnet_inpaint_sd_xl.py

* Update pipeline_controlnet_sd_xl_img2img.py

* Update pipeline_controlnet_union_inpaint_sd_xl.py

* Update pipeline_controlnet_union_sd_xl_img2img.py

* Apply make style and make fix-copies fixes

* Update geodiff_molecule_conformation.ipynb

* Delete examples/research_projects/geodiff/geodiff_molecule_conformation.ipynb

* Delete examples/research_projects/gligen/demo.ipynb

* Create geodiff_molecule_conformation.ipynb

* Create demo.ipynb

* Update geodiff_molecule_conformation.ipynb

* Update geodiff_molecule_conformation.ipynb

* Delete examples/research_projects/geodiff/geodiff_molecule_conformation.ipynb

* Add files via upload

* Delete src/diffusers/pipelines/controlnet/pipeline_controlnet_inpaint.py

* Add files via upload
2025-02-25 12:53:03 -03:00
Daniel Regado
1450c2ac4f Multi IP-Adapter for Flux pipelines (#10867)
* Initial implementation of Flux multi IP-Adapter

* Update src/diffusers/pipelines/flux/pipeline_flux.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/flux/pipeline_flux.py

Co-authored-by: hlky <hlky@hlky.ac>

* Changes for ipa image embeds

* Update src/diffusers/pipelines/flux/pipeline_flux.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/flux/pipeline_flux.py

Co-authored-by: hlky <hlky@hlky.ac>

* make style && make quality

* Updated ip_adapter test

* Created typing_utils.py

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-02-25 09:51:15 +00:00
Dhruv Nair
cc7b5b873a [CI] Improvements to conditional GPU PR tests (#10859)
* update

* update

* update

* update

* update

* update

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* test

* update
2025-02-25 09:49:29 +05:30
Aryan
0404703237 [refactor] Remove additional Flux code (#10881)
* update

* apply review suggestions

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-02-24 14:56:30 -10:00
Aryan
13f20c7fe8 [refactor] SD3 docs & remove additional code (#10882)
* update

* update

* update
2025-02-25 03:08:47 +05:30
Dhruv Nair
87599691b9 [Docs] Fix toctree sorting (#10894)
update
2025-02-24 10:05:32 -10:00
Sayak Paul
36517f6124 [chore] correct qk norm list. (#10876)
correct qk norm list.
2025-02-24 07:49:14 -10:00
Aryan
64af74fc58 [docs] Add CogVideoX Schedulers (#10885)
update
2025-02-24 07:02:59 -10:00
SahilCarterr
170833c22a [Fix] fp16 unscaling in train_dreambooth_lora_sdxl (#10889)
Fix fp16 bug

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-02-24 06:49:23 -10:00
Steven Liu
db21c97043 [docs] Flux group offload (#10847)
* flux group-offload

* feedback
2025-02-24 08:47:08 -08:00
Steven Liu
3fdf173084 [docs] Update prompt weighting docs (#10843)
* sd_embed

* feedback
2025-02-24 08:46:26 -08:00
hlky
aba4a5799a Add SD3 ControlNet to AutoPipeline (#10888)
Co-authored-by: puhuk <wetr235@gmail.com>
2025-02-24 06:21:02 -10:00
Sayak Paul
b0550a66cc [LoRA] restrict certain keys to be checked for peft config update. (#10808)
* restruct certain keys to be checked for peft config update.

* updates

* finish./

* finish 2.

* updates
2025-02-24 16:54:38 +05:30
hlky
6f74ef550d Fix torch_dtype in Kolors text encoder with transformers v4.49 (#10816)
* Fix `torch_dtype` in Kolors text encoder with `transformers` v4.49

* Default torch_dtype and warning
2025-02-24 13:37:54 +05:30
Daniel Regado
9c7e205176 Comprehensive type checking for from_pretrained kwargs (#10758)
* More robust from_pretrained init_kwargs type checking

* Corrected for Python 3.10

* Type checks subclasses and fixed type warnings

* More type corrections and skip tokenizer type checking

* make style && make quality

* Updated docs and types for Lumina pipelines

* Fixed check for empty signature

* changed location of helper functions

* make style

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-02-22 13:15:19 +00:00
Steven Liu
64dec70e56 [docs] LoRA support (#10844)
* lora

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-02-22 08:53:02 +05:30
Marc Sun
ffb6777ace remove format check for safetensors file (#10864)
remove check
2025-02-21 19:56:16 +01:00
SahilCarterr
85fcbaf314 [Fix] Docs overview.md (#10858)
Fix docs
2025-02-21 08:03:22 -08:00
hlky
d75ea3c772 device_map in load_model_dict_into_meta (#10851)
* `device_map` in `load_model_dict_into_meta`

* _LOW_CPU_MEM_USAGE_DEFAULT

* fix is_peft_version is_bitsandbytes_version
2025-02-21 12:16:30 +00:00
Dhruv Nair
b27d4edbe1 [CI] Update always test Pipelines list in Pipeline fetcher (#10856)
* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-02-21 16:24:20 +05:30
Dhruv Nair
2b2d04299c [CI] Fix incorrectly named test module for Hunyuan DiT (#10854)
update
2025-02-21 13:35:40 +05:30
Sayak Paul
6cef7d2366 fix remote vae template (#10852)
fix
2025-02-21 12:00:02 +05:30
Sayak Paul
9055ccb382 [chore] template for remote vae. (#10849)
template for remote vae.
2025-02-21 11:43:36 +05:30
Sayak Paul
1871a69ecb fix: run tests from a pr workflow. (#9696)
* fix: run tests from a pr workflow.

* correct

* update

* checking.
2025-02-21 08:50:37 +05:30
Aryan
e3bc4aab2e SkyReels Hunyuan T2V & I2V (#10837)
* update

* make fix-copies

* update

* tests

* update

* update

* add co-author

Co-Authored-By: Langdx <82783347+Langdx@users.noreply.github.com>

* add co-author

Co-Authored-By: howe <howezhang2018@gmail.com>

* update

---------

Co-authored-by: Langdx <82783347+Langdx@users.noreply.github.com>
Co-authored-by: howe <howezhang2018@gmail.com>
2025-02-21 06:48:15 +05:30
Aryan
f0707751ef Some consistency-related fixes for HunyuanVideo (#10835)
* update

* update
2025-02-21 03:37:07 +05:30
Daniel Regado
d9ee3879b0 SD3 IP-Adapter runtime checkpoint conversion (#10718)
* Added runtime checkpoint conversion

* Updated docs

* Fix for quantized model
2025-02-20 10:35:57 -10:00
Sayak Paul
454f82e6fc [CI] run fast gpu tests conditionally on pull requests. (#10310)
* run fast gpu tests conditionally on pull requests.

* revert unneeded changes.

* simplify PR.
2025-02-20 23:06:59 +05:30
Sayak Paul
1f853504da [CI] install accelerate transformers from main (#10289)
install accelerate transformers from .
2025-02-20 23:06:40 +05:30
Parag Ekbote
51941387dc Notebooks for Community Scripts-7 (#10846)
Add 5 Notebooks, improve their example
scripts and update the missing links for the
example README.
2025-02-20 09:02:09 -08:00
Haoyun Qin
c7a8c4395a fix: support transformer models' generation_config in pipeline (#10779) 2025-02-20 21:49:33 +05:30
Marc Sun
a4c1aac3ae store activation cls instead of function (#10832)
* store cls instead of an obj

* style
2025-02-20 10:38:15 +01:00
Sayak Paul
b2ca39c8ac [tests] test encode_prompt() in isolation (#10438)
* poc encode_prompt() tests

* fix

* updates.

* fixes

* fixes

* updates

* updates

* updates

* revert

* updates

* updates

* updates

* updates

* remove SDXLOptionalComponentsTesterMixin.

* remove tests that directly leveraged encode_prompt() in some way or the other.

* fix imports.

* remove _save_load

* fixes

* fixes

* fixes

* fixes
2025-02-20 13:21:43 +05:30
AstraliteHeart
532171266b Add missing isinstance for arg checks in GGUFParameter (#10834) 2025-02-20 12:49:51 +05:30
Sayak Paul
f550745a2b [Utils] add utilities for checking if certain utilities are properly documented (#7763)
* add; utility to check if attn_procs,norms,acts are properly documented.

* add support listing to the workflows.

* change to 2024.

* small fixes.

* does adding detailed docstrings help?

* uncomment image processor check

* quality

* fix, thanks to @mishig.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* style

* JointAttnProcessor2_0

* fixes

* fixes

* fixes

* fixes

* fixes

* fixes

* Update docs/source/en/api/normalization.md

Co-authored-by: hlky <hlky@hlky.ac>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-02-20 12:37:00 +05:30
Sayak Paul
f10d3c6d04 [LoRA] add LoRA support to Lumina2 and fine-tuning script (#10818)
* feat: lora support for Lumina2.

* fix-copies.

* updates

* updates

* docs.

* fix

* add: training script.

* tests

* updates

* updates

* major updates.

* updates

* fixes

* docs.

* updates

* updates
2025-02-20 09:41:51 +05:30
Sayak Paul
0fb7068364 [tests] use proper gemma class and config in lumina2 tests. (#10828)
use proper gemma class and config in lumina2 tests.
2025-02-20 09:27:07 +05:30
Aryan
f8b54cf037 Remove print statements (#10836)
remove prints
2025-02-19 17:21:07 -10:00
Sayak Paul
680a8ed855 [misc] feat: introduce a style bot. (#10274)
* feat: introduce a style bot.

* updates

* Apply suggestions from code review

Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>

* apply suggestion

* fixes

* updates

---------

Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
2025-02-19 20:49:10 +05:30
Marc Sun
f5929e0306 [FEAT] Model loading refactor (#10604)
* first draft model loading refactor

* revert name change

* fix bnb

* revert name

* fix dduf

* fix huanyan

* style

* Update src/diffusers/models/model_loading_utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* suggestions from reviews

* Update src/diffusers/models/modeling_utils.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove safetensors check

* fix default value

* more fix from suggestions

* revert logic for single file

* style

* typing + fix couple of issues

* improve speed

* Update src/diffusers/models/modeling_utils.py

Co-authored-by: Aryan <aryan@huggingface.co>

* fp8 dtype

* add tests

* rename resolved_archive_file to resolved_model_file

* format

* map_location default cpu

* add utility function

* switch to smaller model + test inference

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* rm comment

* add log

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* add decorator

* cosine sim instead

* fix use_keep_in_fp32_modules

* comm

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2025-02-19 17:34:53 +05:30
Sayak Paul
6fe05b9b93 [LoRA] make set_adapters() robust on silent failures. (#9618)
* make set_adapters() robust on silent failures.

* fixes to tests

* flaky decorator.

* fix

* flaky to sd3.

* remove warning.

* sort

* quality

* skip test_simple_inference_with_text_denoiser_multi_adapter_block_lora

* skip testing unsupported features.

* raise warning instead of error.
2025-02-19 14:33:57 +05:30
hlky
2bc82d6381 DiffusionPipeline mixin to+FromOriginalModelMixin/FromSingleFileMixin from_single_file type hint (#10811)
* DiffusionPipeline mixin `to` type hint

* FromOriginalModelMixin from_single_file

* FromSingleFileMixin from_single_file
2025-02-19 07:23:40 +00:00
Sayak Paul
924f880d4d [docs] add missing entries to the lora docs. (#10819)
add missing entries to the lora docs.
2025-02-18 09:10:18 -08:00
puhuk
b75b204a58 Fix max_shift value in flux and related functions to 1.15 (issue #10675) (#10807)
This PR updates the max_shift value in flux to 1.15 for consistency across the codebase. In addition to modifying max_shift in flux, all related functions that copy and use this logic, such as calculate_shift in `src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3_img2img.py`, have also been updated to ensure uniform behavior.
2025-02-18 06:54:56 +00:00
Sayak Paul
c14057c8db [LoRA] improve lora support for flux. (#10810)
update lora support for flux.
2025-02-17 19:04:48 +05:30
Sayak Paul
3579cd2bb7 [chore] update notes generation spaces (#10592)
fix
2025-02-17 09:26:15 +05:30
Parag Ekbote
3e99b5677e Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines (#10746)
* Add support for callback_on_step_end for
AuraFlowPipeline and LuminaText2ImgPipeline.

* Apply the suggestions from code review for lumina and auraflow

Co-authored-by: hlky <hlky@hlky.ac>

* Update missing inputs and imports.

* Add input field.

* Apply suggestions from code review-2

Co-authored-by: hlky <hlky@hlky.ac>

* Apply the suggestions from review for unused imports.

Co-authored-by: hlky <hlky@hlky.ac>

* make style.

* Update pipeline_aura_flow.py

* Update pipeline_lumina.py

* Update pipeline_lumina.py

* Update pipeline_aura_flow.py

* Update pipeline_lumina.py

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-02-16 17:28:57 +00:00
Yaniv Galron
952b9131a2 typo fix (#10802) 2025-02-16 20:56:54 +05:30
Yuxuan Zhang
d90cd3621d CogView4 (supports different length c and uc) (#10649)
* init

* encode with glm

* draft schedule

* feat(scheduler): Add CogView scheduler implementation

* feat(embeddings): add CogView 2D rotary positional embedding

* 1

* Update pipeline_cogview4.py

* fix the timestep init and sigma

* update latent

* draft patch(not work)

* fix

* [WIP][cogview4]: implement initial CogView4 pipeline

Implement the basic CogView4 pipeline structure with the following changes:
- Add CogView4 pipeline implementation
- Implement DDIM scheduler for CogView4
- Add CogView3Plus transformer architecture
- Update embedding models

Current limitations:
- CFG implementation uses padding for sequence length alignment
- Need to verify transformer inference alignment with Megatron

TODO:
- Consider separate forward passes for condition/uncondition
  instead of padding approach

* [WIP][cogview4][refactor]: Split condition/uncondition forward pass in CogView4 pipeline

Split the forward pass for conditional and unconditional predictions in the CogView4 pipeline to match the original implementation. The noise prediction is now done separately for each case before combining them for guidance. However, the results still need improvement.

This is a work in progress as the generated images are not yet matching expected quality.

* use with -2 hidden state

* remove text_projector

* 1

* [WIP] Add tensor-reload to align input from transformer block

* [WIP] for older glm

* use with cogview4 transformers forward twice of u and uc

* Update convert_cogview4_to_diffusers.py

* remove this

* use main example

* change back

* reset

* setback

* back

* back 4

* Fix qkv conversion logic for CogView4 to Diffusers format

* back5

* revert to sat to cogview4 version

* update a new convert from megatron

* [WIP][cogview4]: implement CogView4 attention processor

Add CogView4AttnProcessor class for implementing scaled dot-product attention
with rotary embeddings for the CogVideoX model. This processor concatenates
encoder and hidden states, applies QKV projections and RoPE, but does not
include spatial normalization.

TODO:
- Fix incorrect QKV projection weights
- Resolve ~25% error in RoPE implementation compared to Megatron

* [cogview4] implement CogView4 transformer block

Implement CogView4 transformer block following the Megatron architecture:
- Add multi-modulate and multi-gate mechanisms for adaptive layer normalization
- Implement dual-stream attention with encoder-decoder structure
- Add feed-forward network with GELU activation
- Support rotary position embeddings for image tokens

The implementation follows the original CogView4 architecture while adapting
it to work within the diffusers framework.

* with new attn

* [bugfix] fix dimension mismatch in CogView4 attention

* [cogview4][WIP]: update final normalization in CogView4 transformer

Refactored the final normalization layer in CogView4 transformer to use separate layernorm and AdaLN operations instead of combined AdaLayerNormContinuous. This matches the original implementation but needs validation.

Needs verification against reference implementation.

* 1

* put back

* Update transformer_cogview4.py

* change time_shift

* Update pipeline_cogview4.py

* change timesteps

* fix

* change text_encoder_id

* [cogview4][rope] align RoPE implementation with Megatron

- Implement apply_rope method in attention processor to match Megatron's implementation
- Update position embeddings to ensure compatibility with Megatron-style rotary embeddings
- Ensure consistent rotary position encoding across attention layers

This change improves compatibility with Megatron-based models and provides
better alignment with the original implementation's positional encoding approach.

* [cogview4][bugfix] apply silu activation to time embeddings in CogView4

Applied silu activation to time embeddings before splitting into conditional
and unconditional parts in CogView4Transformer2DModel. This matches the
original implementation and helps ensure correct time conditioning behavior.

* [cogview4][chore] clean up pipeline code

- Remove commented out code and debug statements
- Remove unused retrieve_timesteps function
- Clean up code formatting and documentation

This commit focuses on code cleanup in the CogView4 pipeline implementation, removing unnecessary commented code and improving readability without changing functionality.

* [cogview4][scheduler] Implement CogView4 scheduler and pipeline

* now It work

* add timestep

* batch

* change convert scipt

* refactor pt. 1; make style

* refactor pt. 2

* refactor pt. 3

* add tests

* make fix-copies

* update toctree.yml

* use flow match scheduler instead of custom

* remove scheduling_cogview.py

* add tiktoken to test dependencies

* Update src/diffusers/models/embeddings.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* apply suggestions from review

* use diffusers apply_rotary_emb

* update flow match scheduler to accept timesteps

* fix comment

* apply review sugestions

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

---------

Co-authored-by: 三洋三洋 <1258009915@qq.com>
Co-authored-by: OleehyO <leehy0357@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-02-15 21:46:48 +05:30
YiYi Xu
69f919d8b5 follow-up refactor on lumina2 (#10776)
* up
2025-02-14 14:57:27 -10:00
SahilCarterr
a6b843a797 [FIX] check_inputs function in lumina2 (#10784) 2025-02-14 10:55:11 -10:00
puhuk
27b90235e4 Update Custom Diffusion Documentation for Multiple Concept Inference to resolve issue #10791 (#10792)
Update Custom Diffusion Documentation for Multiple Concept Inference

This PR updates the Custom Diffusion documentation to correctly demonstrate multiple concept inference by:

- Initializing the pipeline from a proper foundation model (e.g., "CompVis/stable-diffusion-v1-4") instead of a fine-tuned model.
- Defining model_id explicitly to avoid NameError.
- Correcting method calls for loading attention processors and textual inversion embeddings.
2025-02-14 08:19:11 -08:00
Aryan
9a147b82f7 Module Group Offloading (#10503)
* update

* fix

* non_blocking; handle parameters and buffers

* update

* Group offloading with cuda stream prefetching (#10516)

* cuda stream prefetch

* remove breakpoints

* update

* copy model hook implementation from pab

* update; ~very workaround based implementation but it seems to work as expected; needs cleanup and rewrite

* more workarounds to make it actually work

* cleanup

* rewrite

* update

* make sure to sync current stream before overwriting with pinned params

not doing so will lead to erroneous computations on the GPU and cause bad results

* better check

* update

* remove hook implementation to not deal with merge conflict

* re-add hook changes

* why use more memory when less memory do trick

* why still use slightly more memory when less memory do trick

* optimise

* add model tests

* add pipeline tests

* update docs

* add layernorm and groupnorm

* address review comments

* improve tests; add docs

* improve docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions from code review

* update tests

* apply suggestions from review

* enable_group_offloading -> enable_group_offload for naming consistency

* raise errors if multiple offloading strategies used; add relevant tests

* handle .to() when group offload applied

* refactor some repeated code

* remove unintentional change from merge conflict

* handle .cuda()

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-02-14 12:59:45 +05:30
Aryan
ab428207a7 Refactor CogVideoX transformer forward (#10789)
update
2025-02-13 12:11:25 -10:00
Aryan
8d081de844 Update FlowMatch docstrings to mention correct output classes (#10788)
update
2025-02-14 02:29:16 +05:30
Aryan
a0c22997fd Disable PEFT input autocast when using fp8 layerwise casting (#10685)
* disable peft input autocast

* use new peft method name; only disable peft input autocast if submodule layerwise casting active

* add test; reference PeftInputAutocastDisableHook in peft docs

* add load_lora_weights test

* casted -> cast

* Update tests/lora/utils.py
2025-02-13 23:12:54 +05:30
Fanli Lin
97abdd2210 make tensors contiguous before passing to safetensors (#10761)
fix contiguous bug
2025-02-13 06:27:53 +00:00
Eliseu Silva
051ebc3c8d fix: [Community pipeline] Fix flattened elements on image (#10774)
* feat: new community mixture_tiling_sdxl pipeline for SDXL mixture-of-diffusers support

* fix use of variable latents to tile_latents

* removed references to modules that are not being used in this pipeline

* make style, make quality

* fixfeat: added _get_crops_coords_list function to pipeline to automatically define ctop,cleft coord to focus on image generation, helps to better harmonize the image and corrects the problem of flattened elements.
2025-02-12 19:50:41 -03:00
Daniel Regado
5105b5a83d MultiControlNetUnionModel on SDXL (#10747)
* SDXL with MultiControlNetUnionModel



---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-02-12 10:48:09 -10:00
hlky
ca6330dc53 Fix use_lu_lambdas and use_karras_sigmas with beta_schedule=squaredcos_cap_v2 in DPMSolverMultistepScheduler (#10740) 2025-02-12 20:33:56 +00:00
Dhruv Nair
28f48f4051 [Single File] Add Single File support for Lumina Image 2.0 Transformer (#10781)
* update

* update
2025-02-12 18:53:49 +05:30
Thanh Le
067eab1b3a Faster set_adapters (#10777)
* Update peft_utils.py

* Update peft_utils.py

* Update peft_utils.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-02-12 16:30:09 +05:30
Aryan
57ac673802 Refactor OmniGen (#10771)
* OmniGen model.py

* update OmniGenTransformerModel

* omnigen pipeline

* omnigen pipeline

* update omnigen_pipeline

* test case for omnigen

* update omnigenpipeline

* update docs

* update docs

* offload_transformer

* enable_transformer_block_cpu_offload

* update docs

* reformat

* reformat

* reformat

* update docs

* update docs

* make style

* make style

* Update docs/source/en/api/models/omnigen_transformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update docs

* revert changes to examples/

* update OmniGen2DModel

* make style

* update test cases

* Update docs/source/en/api/pipelines/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update docs

* typo

* Update src/diffusers/models/embeddings.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/models/attention.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/models/transformers/transformer_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/models/transformers/transformer_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/models/transformers/transformer_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/omnigen/pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/omnigen/pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/omnigen/pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/omnigen/test_pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/omnigen/test_pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/omnigen/pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/omnigen/pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/omnigen/pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* consistent attention processor

* updata

* update

* check_inputs

* make style

* update testpipeline

* update testpipeline

* refactor omnigen

* more updates

* apply review suggestion

---------

Co-authored-by: shitao <2906698981@qq.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-02-12 14:06:14 +05:30
Le Zhuo
81440fd474 Add support for lumina2 (#10642)
* Add support for lumina2


---------

Co-authored-by: csuhan <hanjiaming@whu.edu.cn>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: hlky <hlky@hlky.ac>
2025-02-11 11:38:33 -10:00
Eliseu Silva
c470274865 feat: new community mixture_tiling_sdxl pipeline for SDXL (#10759)
* feat: new community mixture_tiling_sdxl pipeline for SDXL mixture-of-diffusers support

* fix use of variable latents to tile_latents

* removed references to modules that are not being used in this pipeline

* make style, make quality
2025-02-11 18:01:42 -03:00
Shitao Xiao
798e17187d Add OmniGen (#10148)
* OmniGen model.py

* update OmniGenTransformerModel

* omnigen pipeline

* omnigen pipeline

* update omnigen_pipeline

* test case for omnigen

* update omnigenpipeline

* update docs

* update docs

* offload_transformer

* enable_transformer_block_cpu_offload

* update docs

* reformat

* reformat

* reformat

* update docs

* update docs

* make style

* make style

* Update docs/source/en/api/models/omnigen_transformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update docs

* revert changes to examples/

* update OmniGen2DModel

* make style

* update test cases

* Update docs/source/en/api/pipelines/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/omnigen.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update docs

* typo

* Update src/diffusers/models/embeddings.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/models/attention.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/models/transformers/transformer_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/models/transformers/transformer_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/models/transformers/transformer_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/omnigen/pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/omnigen/pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/omnigen/pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/omnigen/test_pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/omnigen/test_pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/omnigen/pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/omnigen/pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/omnigen/pipeline_omnigen.py

Co-authored-by: hlky <hlky@hlky.ac>

* consistent attention processor

* updata

* update

* check_inputs

* make style

* update testpipeline

* update testpipeline

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: Aryan <aryan@huggingface.co>
2025-02-12 02:16:38 +05:30
Dhruv Nair
ed4b75229f [CI] Fix Truffle Hog failure (#10769)
* update

* update
2025-02-11 22:41:03 +05:30
Mathias Parger
8ae8008b0d speedup hunyuan encoder causal mask generation (#10764)
* speedup causal mask generation

* fixing hunyuan attn mask test case
2025-02-11 16:03:15 +05:30
Sayak Paul
c80eda9d3e [Tests] Test layerwise casting with training (#10765)
* add a test to check if we can train with layerwise casting.

* updates

* updates

* style
2025-02-11 16:02:28 +05:30
hlky
7fb481f840 Add Self type hint to ModelMixin's from_pretrained (#10742) 2025-02-10 09:17:57 -10:00
Sayak Paul
9f5ad1db41 [LoRA] fix peft state dict parsing (#10532)
* fix peft state dict parsing

* updates
2025-02-10 18:47:20 +05:30
hlky
464374fb87 EDMEulerScheduler accept sigmas, add final_sigmas_type (#10734) 2025-02-07 06:53:52 +00:00
hlky
d43ce14e2d Quantized Flux with IP-Adapter (#10728) 2025-02-06 07:02:36 -10:00
Leo Jiang
cd0a4a82cf [bugfix] NPU Adaption for Sana (#10724)
* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* NPU Adaption for Sanna

* [bugfix]NPU Adaption for Sanna

---------

Co-authored-by: J石页 <jiangshuo9@h-partners.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-02-06 19:29:58 +05:30
suzukimain
145522cbb7 [Community] Enhanced Model Search (#10417)
* Added `auto_load_textual_inversion` and `auto_load_lora_weights`

* update README.md

* fix

* make quality

* Fix and `make style`
2025-02-05 14:43:53 -10:00
xieofxie
23bc56a02d add provider_options in from_pretrained (#10719)
Co-authored-by: hualxie <hualxie@microsoft.com>
2025-02-05 09:41:41 -10:00
SahilCarterr
5b1dcd1584 [Fix] Type Hint in from_pretrained() to Ensure Correct Type Inference (#10714)
* Update pipeline_utils.py

Added Self in from_pretrained method so  inference will correctly recognize pipeline

* Use typing_extensions

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-02-04 08:59:31 -10:00
Parag Ekbote
dbe0094e86 Notebooks for Community Scripts-6 (#10713)
* Fix Doc Tutorial.

* Add 4 Notebooks and improve their example
scripts.
2025-02-04 10:12:17 -08:00
Nicolas
f63d32233f Fix train_text_to_image.py --help (#10711) 2025-02-04 11:26:23 +05:30
Sayak Paul
5e8e6cb44f [bitsandbytes] Simplify bnb int8 dequant (#10401)
* fix dequantization for latest bnb.

* smol fixes.

* fix type annotation

* update peft link

* updates
2025-02-04 11:17:14 +05:30
Parag Ekbote
3e35f56b00 Fix Documentation about Image-to-Image Pipeline (#10704)
Fix Doc Tutorial.
2025-02-03 09:54:00 -08:00
Ikpreet S Babra
537891e693 Fixed grammar in "write_own_pipeline" readme (#10706) 2025-02-03 09:53:30 -08:00
Vedat Baday
9f28f1abba feat(training-utils): support device and dtype params in compute_density_for_timestep_sampling (#10699)
* feat(training-utils): support device and dtype params in compute_density_for_timestep_sampling

* chore: update type hint

* refactor: use union for type hint

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-02-01 23:04:05 +05:30
Thanh Le
5d2d23986e Fix inconsistent random transform in instruct pix2pix (#10698)
* Update train_instruct_pix2pix.py

Fix inconsistent random transform in instruct_pix2pix

* Update train_instruct_pix2pix_sdxl.py
2025-01-31 08:29:29 -10:00
Max Podkorytov
1ae9b0595f Fix enable memory efficient attention on ROCm (#10564)
* fix enable memory efficient attention on ROCm

while calling CK implementation

* Update attention_processor.py

refactor of picking a set element
2025-01-31 17:15:49 +05:30
SahilCarterr
aad69ac2f3 [FIX] check_inputs function in Auraflow Pipeline (#10678)
fix_shape_error
2025-01-29 13:11:54 -10:00
Vedat Baday
ea76880bd7 fix(hunyuan-video): typo in height and width input check (#10684) 2025-01-30 04:16:05 +05:30
Teriks
33f936154d support StableDiffusionAdapterPipeline.from_single_file (#10552)
* support StableDiffusionAdapterPipeline.from_single_file

* make style

---------

Co-authored-by: Teriks <Teriks@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-01-29 07:18:47 -10:00
Sayak Paul
e6037e8275 [tests] update llamatokenizer in hunyuanvideo tests (#10681)
update llamatokenizer in hunyuanvideo tests
2025-01-29 21:12:57 +05:30
Dimitri Barbot
196aef5a6f Fix pipeline dtype unexpected change when using SDXL reference community pipelines in float16 mode (#10670)
Fix pipeline dtype unexpected change when using SDXL reference community pipelines
2025-01-28 10:46:41 -03:00
Sayak Paul
7b100ce589 [Tests] conditionally check fp8_e4m3_bf16_max_memory < fp8_e4m3_fp32_max_memory (#10669)
* conditionally check if compute capability is met.

* log info.

* fix condition.

* updates

* updates

* updates

* updates
2025-01-28 12:00:14 +05:30
Aryan
c4d4ac21e7 Refactor gradient checkpointing (#10611)
* update

* remove unused fn

* apply suggestions based on review

* update + cleanup 🧹

* more cleanup 🧹

* make fix-copies

* update test
2025-01-28 06:51:46 +05:30
Hanch Han
f295e2eefc [fix] refer use_framewise_encoding on AutoencoderKLHunyuanVideo._encode (#10600)
* fix: refer to use_framewise_encoding on AutoencoderKLHunyuanVideo._encode

* fix: comment about tile_sample_min_num_frames

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2025-01-28 06:51:27 +05:30
Aryan
658e24e86c [core] Pyramid Attention Broadcast (#9562)
* start pyramid attention broadcast

* add coauthor

Co-Authored-By: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>

* update

* make style

* update

* make style

* add docs

* add tests

* update

* Update docs/source/en/api/pipelines/cogvideox.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/cogvideox.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Pyramid Attention Broadcast rewrite + introduce hooks (#9826)

* rewrite implementation with hooks

* make style

* update

* merge pyramid-attention-rewrite-2

* make style

* remove changes from latte transformer

* revert docs changes

* better debug message

* add todos for future

* update tests

* make style

* cleanup

* fix

* improve log message; fix latte test

* refactor

* update

* update

* update

* revert changes to tests

* update docs

* update tests

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update

* fix flux test

* reorder

* refactor

* make fix-copies

* update docs

* fixes

* more fixes

* make style

* update tests

* update code example

* make fix-copies

* refactor based on reviews

* use maybe_free_model_hooks

* CacheMixin

* make style

* update

* add current_timestep property; update docs

* make fix-copies

* update

* improve tests

* try circular import fix

* apply suggestions from review

* address review comments

* Apply suggestions from code review

* refactor hook implementation

* add test suite for hooks

* PAB Refactor (#10667)

* update

* update

* update

---------

Co-authored-by: DN6 <dhruv.nair@gmail.com>

* update

* fix remove hook behaviour

---------

Co-authored-by: Xuanlei Zhao <43881818+oahzxl@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: DN6 <dhruv.nair@gmail.com>
2025-01-28 05:09:04 +05:30
Giuseppe Catalano
fb42066489 Revert RePaint scheduler 'fix' (#10644)
Co-authored-by: Giuseppe Catalano <giuseppelorenzo.catalano@unito.it>
2025-01-27 11:16:45 -10:00
Teriks
e89ab5bc26 SDXL ControlNet Union pipelines, make control_image argument immutible (#10663)
controlnet union XL, make control_image immutible

when this argument is passed a list, __call__
modifies its content, since it is pass by reference
the list passed by the caller gets its content
modified unexpectedly

make a copy at method intro so this does not happen

Co-authored-by: Teriks <Teriks@users.noreply.github.com>
2025-01-27 10:53:30 -10:00
victolee0
8ceec90d76 fix check_inputs func in LuminaText2ImgPipeline (#10651) 2025-01-27 09:47:01 -10:00
hlky
158c5c4d08 Add provider_options to OnnxRuntimeModel (#10661) 2025-01-27 09:46:17 -10:00
hlky
41571773d9 [training] Convert to ImageFolder script (#10664)
* [training] Convert to ImageFolder script

* make
2025-01-27 09:43:51 -10:00
hlky
18f7d1d937 ControlNet Union controlnet_conditioning_scale for multiple control inputs (#10666) 2025-01-27 08:15:25 -10:00
Marlon May
f7f36c7d3d Add community pipeline for semantic guidance for FLUX (#10610)
* add community pipeline for semantic guidance for flux

* fix imports in community pipeline for semantic guidance for flux

* Update examples/community/pipeline_flux_semantic_guidance.py

Co-authored-by: hlky <hlky@hlky.ac>

* fix community pipeline for semantic guidance for flux

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-01-27 16:19:46 +02:00
Yuqian Hong
4fa24591a3 create a script to train autoencoderkl (#10605)
* create a script to train vae

* update main.py

* update train_autoencoderkl.py

* update train_autoencoderkl.py

* add a check of --pretrained_model_name_or_path and --model_config_name_or_path

* remove the comment, remove diffusers in requiremnets.txt, add validation_image ote

* update autoencoderkl.py

* quality

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-27 16:41:34 +05:30
Jacob Helwig
4f3ec5364e Add sigmoid scheduler in scheduling_ddpm.py docs (#10648)
Sigmoid scheduler in scheduling_ddpm.py docs
2025-01-26 15:37:20 -08:00
Leo Jiang
07860f9916 NPU Adaption for Sanna (#10409)
* NPU Adaption for Sanna


---------

Co-authored-by: J石页 <jiangshuo9@h-partners.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-24 09:08:52 -10:00
Wenhao Sun
87252d80c3 Add pipeline_stable_diffusion_xl_attentive_eraser (#10579)
* add pipeline_stable_diffusion_xl_attentive_eraser

* add pipeline_stable_diffusion_xl_attentive_eraser_make_style

* make style and add example output

* update Docs

Co-authored-by: Other Contributor <a457435687@126.com>

* add Oral

Co-authored-by: Other Contributor <a457435687@126.com>

* update_review

Co-authored-by: Other Contributor <a457435687@126.com>

* update_review_ms

Co-authored-by: Other Contributor <a457435687@126.com>

---------

Co-authored-by: Other Contributor <a457435687@126.com>
2025-01-24 13:52:45 +00:00
Sayak Paul
5897137397 [chore] add a script to extract loras from full fine-tuned models (#10631)
* feat: add a lora extraction script.

* updates
2025-01-24 11:50:36 +05:30
Yaniv Galron
a451c0ed14 removing redundant requires_grad = False (#10628)
We already set the unet to requires grad false at line 506

Co-authored-by: Aryan <aryan@huggingface.co>
2025-01-24 03:25:33 +05:30
hlky
37c9697f5b Add IP-Adapter example to Flux docs (#10633)
* Add IP-Adapter example to Flux docs

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-23 22:15:33 +05:30
Raul Ciotescu
9684c52adf width and height are mixed-up (#10629)
vars mixed-up
2025-01-23 06:40:22 -10:00
Steven Liu
5483162d12 [docs] uv installation (#10622)
* uv

* feedback
2025-01-23 08:34:51 -08:00
Sayak Paul
d77c53b6d2 [docs] fix image path in para attention docs (#10632)
fix image path in para attention docs
2025-01-23 08:22:42 -08:00
Sayak Paul
78bc824729 [Tests] modify the test slices for the failing flax test (#10630)
* fixes

* fixes

* fixes

* updates
2025-01-23 12:10:24 +05:30
kahmed10
04d40920a7 add onnxruntime-migraphx as part of check for onnxruntime in import_utils.py (#10624)
add onnxruntime-migraphx to import_utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-23 07:49:51 +05:30
Dhruv Nair
8d6f6d6b66 [CI] Update HF_TOKEN in all workflows (#10613)
update
2025-01-22 20:03:41 +05:30
Aryan
ca60ad8e55 Improve TorchAO error message (#10627)
improve error message
2025-01-22 19:50:02 +05:30
Aryan
beacaa5528 [core] Layerwise Upcasting (#10347)
* update

* update

* make style

* remove dynamo disable

* add coauthor

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* update

* update

* update

* update mixin

* add some basic tests

* update

* update

* non_blocking

* improvements

* update

* norm.* -> norm

* apply suggestions from review

* add example

* update hook implementation to the latest changes from pyramid attention broadcast

* deinitialize should raise an error

* update doc page

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update docs

* update

* refactor

* fix _always_upcast_modules for asym ae and vq_model

* fix lumina embedding forward to not depend on weight dtype

* refactor tests

* add simple lora inference tests

* _always_upcast_modules -> _precision_sensitive_module_patterns

* remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case

* check layer dtypes in lora test

* fix UNet1DModelTests::test_layerwise_upcasting_inference

* _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback

* skip test in NCSNppModelTests

* skip tests for AutoencoderTinyTests

* skip tests for AutoencoderOobleckTests

* skip tests for UNet1DModelTests - unsupported pytorch operations

* layerwise_upcasting -> layerwise_casting

* skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support

* add layerwise fp8 pipeline test

* use xfail

* Apply suggestions from code review

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' test to pass)

* add note about memory consumption on tesla CI runner for failing test

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-22 19:49:37 +05:30
Lucain
a647682224 Remove cache migration script (#10619) 2025-01-21 07:22:59 -10:00
YiYi Xu
a1f9a71238 fix offload gpu tests etc (#10366)
* add

* style
2025-01-21 18:52:36 +05:30
Fanli Lin
ec37e20972 [tests] make tests device-agnostic (part 3) (#10437)
* initial comit

* fix empty cache

* fix one more

* fix style

* update device functions

* update

* update

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/controlnet/test_controlnet.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update tests/pipelines/controlnet/test_controlnet.py

Co-authored-by: hlky <hlky@hlky.ac>

* with gc.collect

* update

* make style

* check_torch_dependencies

* add mps empty cache

* bug fix

* Apply suggestions from code review

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-01-21 12:15:45 +00:00
Muyang Li
158a5a87fb Remove the FP32 Wrapper when evaluating (#10617)
Remove the FP32 Wrapper

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-01-21 16:16:54 +05:30
jiqing-feng
012d08b1bc Enable dreambooth lora finetune example on other devices (#10602)
* enable dreambooth_lora on other devices

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check cuda device before empty cache

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* import free_memory

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-01-21 14:09:45 +05:30
Sayak Paul
4ace7d0483 [chore] change licensing to 2025 from 2024. (#10615)
change licensing to 2025 from 2024.
2025-01-20 16:57:27 -10:00
baymax591
75a636da48 bugfix for npu not support float64 (#10123)
* bugfix for npu not support float64

* is_mps is_npu

---------

Co-authored-by: 白超 <baichao19@huawei.com>
Co-authored-by: hlky <hlky@hlky.ac>
2025-01-20 09:35:24 -10:00
sunxunle
4842f5d8de chore: remove redundant words (#10609)
Signed-off-by: sunxunle <sunxunle@ampere.tech>
2025-01-20 08:15:26 -10:00
Sayak Paul
328e0d20a7 [training] set rest of the blocks with requires_grad False. (#10607)
set rest of the blocks with requires_grad False.
2025-01-19 19:34:53 +05:30
Shenghai Yuan
23b467c79c [core] ConsisID (#10140)
* Update __init__.py

* add consisid

* update consisid

* update consisid

* make style

* make_style

* Update src/diffusers/pipelines/consisid/pipeline_consisid.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/consisid/pipeline_consisid.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/consisid/pipeline_consisid.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/consisid/pipeline_consisid.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/consisid/pipeline_consisid.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/consisid/pipeline_consisid.py

Co-authored-by: hlky <hlky@hlky.ac>

* add doc

* make style

* Rename consisid .md to consisid.md

* Update geodiff_molecule_conformation.ipynb

* Update geodiff_molecule_conformation.ipynb

* Update geodiff_molecule_conformation.ipynb

* Update demo.ipynb

* Update pipeline_consisid.py

* make fix-copies

* Update docs/source/en/using-diffusers/consisid.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/pipelines/consisid/pipeline_consisid.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/pipelines/consisid/pipeline_consisid.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/consisid.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/consisid.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update doc & pipeline code

* fix typo

* make style

* update example

* Update docs/source/en/using-diffusers/consisid.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update example

* update example

* Update src/diffusers/pipelines/consisid/pipeline_consisid.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/consisid/pipeline_consisid.py

Co-authored-by: hlky <hlky@hlky.ac>

* update

* add test and update

* remove some changes from docs

* refactor

* fix

* undo changes to examples

* remove save/load and fuse methods

* update

* link hf-doc-img & make test extremely small

* update

* add lora

* fix test

* update

* update

* change expected_diff_max to 0.4

* fix typo

* fix link

* fix typo

* update docs

* update

* remove consisid lora tests

---------

Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2025-01-19 13:10:08 +05:30
Juan Acevedo
aeac0a00f8 implementing flux on TPUs with ptxla (#10515)
* implementing flux on TPUs with ptxla

* add xla flux attention class

* run make style/quality

* Update src/diffusers/models/attention_processor.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/attention_processor.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* run style and quality

---------

Co-authored-by: Juan Acevedo <jfacevedo@google.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-01-16 08:46:02 -10:00
Leo Jiang
cecada5280 NPU adaption for RMSNorm (#10534)
* NPU adaption for RMSNorm

* NPU adaption for RMSNorm

---------

Co-authored-by: J石页 <jiangshuo9@h-partners.com>
2025-01-16 08:45:29 -10:00
C
17d99c4d22 [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo (#10544)
* add para_attn_flux.md and para_attn_hunyuan_video.md

* add enable_sequential_cpu_offload in para_attn_hunyuan_video.md

* add comment

* refactor

* fix

* fix

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix

* update links

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/para_attn.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-16 10:05:13 -08:00
hlky
08e62fe0c2 Scheduling fixes on MPS (#10549)
* use np.int32 in scheduling

* test_add_noise_device

* -np.int32, fixes
2025-01-16 07:45:03 -10:00
Daniel Regado
9e1b8a0017 [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint (#10597)
Update to diffusers ip_adapter ckpt
2025-01-16 07:43:29 -10:00
hlky
0b065c099a Move buffers to device (#10523)
* Move buffers to device

* add test

* named_buffers
2025-01-16 07:42:56 -10:00
Junyu Chen
b785ddb654 [DC-AE, SANA] fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16 (#10595)
* autoencoder_dc tiling

* add tiling and slicing support in SANA pipelines

* create variables for padding length because the line becomes too long

* add tiling and slicing support in pag SANA pipelines

* revert changes to tile size

* make style

* add vae tiling test

* fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2025-01-16 16:49:02 +05:30
Daniel Regado
e8114bd068 IP-Adapter for StableDiffusion3Img2ImgPipeline (#10589)
Added support for IP-Adapter
2025-01-16 09:46:22 +00:00
Leo Jiang
b0c8973834 [Sana 4K] Add vae tiling option to avoid OOM (#10583)
Co-authored-by: J石页 <jiangshuo9@h-partners.com>
2025-01-16 02:06:07 +05:30
Sayak Paul
c944f0651f [Chore] fix vae annotation in mochi pipeline (#10585)
fix vae annotation in mochi pipeline
2025-01-15 15:19:51 +05:30
Sayak Paul
bba59fb88b [Tests] add: test to check 8bit bnb quantized models work with lora loading. (#10576)
* add: test to check 8bit bnb quantized models work with lora loading.

* Update tests/quantization/bnb/test_mixed_int8.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-01-15 13:05:26 +05:30
Sayak Paul
2432f80ca3 [LoRA] feat: support loading loras into 4bit quantized Flux models. (#10578)
* feat: support loading loras into 4bit quantized models.

* updates

* update

* remove weight check.
2025-01-15 12:40:40 +05:30
Aryan
f9e957f011 Fix offload tests for CogVideoX and CogView3 (#10547)
* update

* update
2025-01-15 12:24:46 +05:30
Daniel Regado
4dec63c18e IP-Adapter for StableDiffusion3InpaintPipeline (#10581)
* Added support for IP-Adapter

* Added joint_attention_kwargs property
2025-01-15 06:52:23 +00:00
Junsong Chen
3d70777379 [Sana-4K] (#10537)
* [Sana 4K]
add 4K support for Sana

* [Sana-4K] fix SanaPAGPipeline

* add VAE automatically tiling function;

* set clean_caption to False;

* add warnings for VAE OOM.

* style

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
2025-01-14 11:48:56 -10:00
Teriks
6b727842d7 allow passing hf_token to load_textual_inversion (#10546)
Co-authored-by: Teriks <Teriks@users.noreply.github.com>
2025-01-14 11:48:34 -10:00
Dhruv Nair
be62c85cd9 [CI] Update HF Token on Fast GPU Model Tests (#10570)
update
2025-01-14 17:00:32 +05:30
Marc Sun
fbff43acc9 [FEAT] DDUF format (#10037)
* load and save dduf archive

* style

* switch to zip uncompressed

* updates

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* first draft

* remove print

* switch to dduf_file for consistency

* switch to huggingface hub api

* fix log

* add a basic test

* Update src/diffusers/configuration_utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* fix

* fix variant

* change saving logic

* DDUF - Load transformers components manually (#10171)

* update hfh version

* Load transformers components manually

* load encoder from_pretrained with state_dict

* working version with transformers and tokenizer !

* add generation_config case

* fix tests

* remove saving for now

* typing

* need next version from transformers

* Update src/diffusers/configuration_utils.py

Co-authored-by: Lucain <lucain@huggingface.co>

* check path corectly

* Apply suggestions from code review

Co-authored-by: Lucain <lucain@huggingface.co>

* udapte

* typing

* remove check for subfolder

* quality

* revert setup changes

* oups

* more readable condition

* add loading from the hub test

* add basic docs.

* Apply suggestions from code review

Co-authored-by: Lucain <lucain@huggingface.co>

* add example

* add

* make functions private

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* minor.

* fixes

* fix

* change the precdence of parameterized.

* error out when custom pipeline is passed with dduf_file.

* updates

* fix

* updates

* fixes

* updates

* fix xfail condition.

* fix xfail

* fixes

* sharded checkpoint compat

* add test for sharded checkpoint

* add suggestions

* Update src/diffusers/models/model_loading_utils.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* from suggestions

* add class attributes to flag dduf tests

* last one

* fix logic

* remove comment

* revert changes

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-01-14 13:21:42 +05:30
Dhruv Nair
3279751bf9 [CI] Update HF Token in Fast GPU Tests (#10568)
update
2025-01-14 13:04:26 +05:30
hlky
4a4afd5ece Fix batch > 1 in HunyuanVideo (#10548) 2025-01-14 10:25:06 +05:30
Aryan
aa79d7da46 Test sequential cpu offload for torchao quantization (#10506)
test sequential cpu offload
2025-01-14 09:54:06 +05:30
Sayak Paul
74b67524b5 [Docs] Update hunyuan_video.md to rectify the checkpoint id (#10524)
* Update hunyuan_video.md to rectify the checkpoint id

* bfloat16

* more fixes

* don't update the checkpoint ids.

* update

* t -> T

* Apply suggestions from code review

* fix

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-01-13 10:59:13 -10:00
Vinh H. Pham
794f7e49a9 Implement framewise encoding/decoding in LTX Video VAE (#10488)
* add framewise decode

* add framewise encode, refactor tiled encode/decode

* add sanity test tiling for ltx

* run make style

* Update src/diffusers/models/autoencoders/autoencoder_kl_ltx.py

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

---------

Co-authored-by: Pham Hong Vinh <vinhph3@vng.com.vn>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
2025-01-13 10:58:32 -10:00
Daniel Regado
9fc9c6dd71 Added IP-Adapter for StableDiffusion3ControlNetInpaintingPipeline (#10561)
* Added support for IP-Adapter

* Fixed Copied inconsistency
2025-01-13 10:15:36 -10:00
Omar Awile
df355ea2c6 Fix documentation for FluxPipeline (#10563)
Fix argument name in 8bit quantized example

Found a tiny mistake in the documentation where the text encoder model was passed to the wrong argument in the FluxPipeline.from_pretrained function.
2025-01-13 11:56:32 -08:00
Junsong Chen
ae019da9e3 [Sana] add Sana to auto-text2image-pipeline; (#10538)
add Sana to auto-text2image-pipeline;
2025-01-13 09:54:37 -10:00
Sayak Paul
329771e542 [LoRA] improve failure handling for peft. (#10551)
* improve failure handling for peft.

* emppty

* Update src/diffusers/loaders/peft.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-01-13 09:20:49 -10:00
Dhruv Nair
f7cb595428 [Single File] Fix loading Flux Dev finetunes with Comfy Prefix (#10545)
* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-13 21:25:07 +05:30
hlky
c3478a42b9 Fix Nightly AudioLDM2PipelineFastTests (#10556)
* Fix Nightly AudioLDM2PipelineFastTests

* add phonemizer to setup extras test

* fix

* make style
2025-01-13 13:54:06 +00:00
hlky
980736b792 Fix train_dreambooth_lora_sd3_miniature (#10554) 2025-01-13 13:47:27 +00:00
hlky
50c81df4e7 Fix StableDiffusionInstructPix2PixPipelineSingleFileSlowTests (#10557) 2025-01-13 13:47:10 +00:00
Aryan
e1c7269720 Fix Latte output_type (#10558)
update
2025-01-13 19:15:59 +05:30
Sayak Paul
edb8c1bce6 [Flux] Improve true cfg condition (#10539)
* improve flux true cfg condition

* add test
2025-01-12 18:33:34 +05:30
Sayak Paul
0785dba4df [Docs] Add negative prompt docs to FluxPipeline (#10531)
* add negative_prompt documentation.

* add proper docs for negative prompts

* fix-copies

* remove comment.

* Apply suggestions from code review

Co-authored-by: hlky <hlky@hlky.ac>

* fix-copies

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-01-12 18:02:46 +05:30
Muyang Li
5cda8ea521 Use randn_tensor to replace torch.randn (#10535)
`torch.randn` requires `generator` and `latents` on the same device, while the wrapped function `randn_tensor` does not have this issue.
2025-01-12 11:41:41 +05:30
Sayak Paul
36acdd7517 [Tests] skip tests properly with unittest.skip() (#10527)
* skip tests properly.

* more

* more
2025-01-11 08:46:22 +05:30
Junyu Chen
e7db062e10 [DC-AE] support tiling for DC-AE (#10510)
* autoencoder_dc tiling

* add tiling and slicing support in SANA pipelines

* create variables for padding length because the line becomes too long

* add tiling and slicing support in pag SANA pipelines

* revert changes to tile size

* make style

* add vae tiling test

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2025-01-11 07:15:26 +05:30
andreabosisio
1b0fe63656 Typo fix in the table number of a referenced paper (#10528)
Correcting a typo in the table number of a referenced paper (in scheduling_ddim_inverse.py)

Changed the number of the referenced table from 1 to 2 in a comment of the set_timesteps() method of the DDIMInverseScheduler class (also according to the description of the 'timestep_spacing' attribute of its __init__ method).
2025-01-10 17:15:25 -08:00
chaowenguo
d6c030fd37 add the xm.mark_step for the first denosing loop (#10530)
* Update rerender_a_video.py

* Update rerender_a_video.py

* Update examples/community/rerender_a_video.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update rerender_a_video.py

* make style

---------

Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-01-10 21:03:41 +00:00
Sayak Paul
9f06a0d1a4 [CI] Match remaining assertions from big runner (#10521)
* print

* remove print.

* print

* update slice.

* empty
2025-01-10 16:37:36 +05:30
Daniel Hipke
52c05bd4cd Add a disable_mmap option to the from_single_file loader to improve load performance on network mounts (#10305)
* Add no_mmap arg.

* Fix arg parsing.

* Update another method to force no mmap.

* logging

* logging2

* propagate no_mmap

* logging3

* propagate no_mmap

* logging4

* fix open call

* clean up logging

* cleanup

* fix missing arg

* update logging and comments

* Rename to disable_mmap and update other references.

* [Docs] Update ltx_video.md to remove generator from `from_pretrained()` (#10316)

Update ltx_video.md to remove generator from `from_pretrained()`

* docs: fix a mistake in docstring (#10319)

Update pipeline_hunyuan_video.py

docs: fix a mistake

* [BUG FIX] [Stable Audio Pipeline] Resolve torch.Tensor.new_zeros() TypeError in function prepare_latents caused by audio_vae_length (#10306)

[BUG FIX] [Stable Audio Pipeline] TypeError: new_zeros(): argument 'size' failed to unpack the object at pos 3 with error "type must be tuple of ints,but got float"

torch.Tensor.new_zeros() takes a single argument size (int...) – a list, tuple, or torch.Size of integers defining the shape of the output tensor.

in function prepare_latents:
audio_vae_length = self.transformer.config.sample_size * self.vae.hop_length
audio_shape = (batch_size // num_waveforms_per_prompt, audio_channels, audio_vae_length)
...
audio = initial_audio_waveforms.new_zeros(audio_shape)

audio_vae_length evaluates to float because self.transformer.config.sample_size returns a float

Co-authored-by: hlky <hlky@hlky.ac>

* [docs] Fix quantization links (#10323)

Update overview.md

* [Sana]add 2K related model for Sana (#10322)

add 2K related model for Sana

* Update src/diffusers/loaders/single_file_model.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Update src/diffusers/loaders/single_file.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* make style

---------

Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Leojc <liao_junchao@outlook.com>
Co-authored-by: Aditya Raj <syntaxticsugr@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Junsong Chen <cjs1020440147@icloud.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-01-10 15:41:04 +05:30
Sayak Paul
a6f043a80f [LoRA] allow big CUDA tests to run properly for LoRA (and others) (#9845)
* allow big lora tests to run on the CI.

* print

* print.

* print

* print

* print

* print

* more

* print

* remove print.

* remove print

* directly place on cuda.

* remove pipeline.

* remove

* fix

* fix

* spaces

* quality

* updates

* directly place flux controlnet pipeline on cuda.

* torch_device instead of cuda.

* style

* device placement.

* fixes

* add big gpu marker for mochi; rename test correctly

* address feedback

* fix

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2025-01-10 12:50:24 +05:30
hlky
12fbe3f7dc Use Pipelines without unet (#10440)
* Use Pipelines without unet

* unet.config.in_channels

* default_sample_size

* is_unet_version_less_0_9_0

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-10 04:45:42 +00:00
Linoy Tsaban
83ba01a38d small readme changes for advanced training examples (#10473)
add to readme about hf login and wandb installation to address https://github.com/huggingface/diffusers/issues/10142#issuecomment-2571655570

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-10 07:35:19 +05:30
Zehuan Huang
7116fd24e5 Support pass kwargs to cogvideox custom attention processor (#10456)
* Support pass kwargs to cogvideox custom attention processor

* remove args in cogvideox attn processor

* remove unused kwargs
2025-01-09 11:57:03 -10:00
Sayak Paul
553b13845f [LoRA] clean up load_lora_into_text_encoder() and fuse_lora() copied from (#10495)
* factor out text encoder loading.

* make fix-copies

* remove copied from fuse_lora and unfuse_lora as needed.

* remove unused imports
2025-01-09 11:29:16 -10:00
chaowenguo
7bc8b92384 add callable object to convert frame into control_frame to reduce cpu memory usage. (#10501)
* Update rerender_a_video.py

* Update rerender_a_video.py

* Update examples/community/rerender_a_video.py

Co-authored-by: hlky <hlky@hlky.ac>

---------

Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-01-09 11:25:53 -10:00
Vladimir Mandic
f0c6d9784b flux: make scheduler config params optional (#10384)
* dont assume scheduler has optional config params

* make style, make fix-copies

* calculate_shift

* fix-copies, usage in pipelines

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-01-09 10:44:26 -10:00
Steven Liu
d006f0769b [docs] Fix missing parameters in docstrings (#10419)
* fix docstrings

* add
2025-01-09 10:54:39 -08:00
geronimi73
a26d57097a AutoModel instead of AutoModelForCausalLM (#10507) 2025-01-09 16:28:04 +05:30
Sayak Paul
daf9d0f119 [chore] remove prints from tests. (#10505)
remove prints from tests.
2025-01-09 14:19:43 +05:30
hlky
95c5ce4e6f PyTorch/XLA support (#10498)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-08 12:31:27 -10:00
Junsong Chen
c0964571fc [Sana 4K] (#10493)
add 4K support for Sana
2025-01-08 11:58:11 -10:00
hlky
b13cdbb294 UNet2DModel mid_block_type (#10469) 2025-01-08 10:50:29 -10:00
Bagheera
a0acbdc989 fix for #7365, prevent pipelines from overriding provided prompt embeds (#7926)
* fix for #7365, prevent pipelines from overriding provided prompt embeds

* fix-copies

* fix implementation

* update

---------

Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2025-01-08 10:12:12 -10:00
Parag Ekbote
5655b22ead Notebooks for Community Scripts-5 (#10499)
Add 5 Notebooks for Diffusers Community
Pipelines.
2025-01-08 08:56:17 -08:00
hlky
4df9d49218 Fix tokenizers install from main in LoRA tests (#10494)
* Fix tokenizers install from main in LoRA tests

* @

* rust

* -e

* uv

* just update tokenizers
2025-01-08 16:14:25 +00:00
Dhruv Nair
9731773d39 [CI] Torch Min Version Test Fix (#10491)
update
2025-01-08 19:43:38 +05:30
Marc Sun
e2deb82e69 Fix compatibility with pipeline when loading model with device_map on single gpu (#10390)
* fix device issue in single gpu case

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-08 11:35:00 +01:00
hlky
1288c8560a Update tokenizers in pr_test_peft_backend (#10132)
Update tokenizers
2025-01-08 10:09:32 +00:00
AstraliteHeart
cb342b745a Add AuraFlow GGUF support (#10463)
* Add support for loading AuraFlow models from GGUF

https://huggingface.co/city96/AuraFlow-v0.3-gguf

* Update AuraFlow documentation for GGUF, add GGUF tests and model detection.

* Address code review comments.

* Remove unused config.

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-01-08 13:23:12 +05:30
Junsong Chen
80fd9260bb [Sana][bug fix]change clean_caption from True to False. (#10481)
change clean_caption from True to False.
2025-01-07 15:31:23 -10:00
Aryan
71ad16b463 Add _no_split_modules to some models (#10308)
* set supports gradient checkpointing to true where necessary; add missing no split modules

* fix cogvideox tests

* update

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-01-08 06:34:19 +05:30
hlky
ee7e141d80 Use pipelines without vae (#10441)
* Use pipelines without vae

* getattr

* vqvae

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-07 13:26:51 -10:00
hlky
01bd79649e Fix HunyuanVideo produces NaN on PyTorch<2.5 (#10482)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-07 13:13:55 -10:00
Teriks
03bcf5aefe RFInversionFluxPipeline, small fix for enable_model_cpu_offload & enable_sequential_cpu_offload compatibility (#10480)
RFInversionFluxPipeline.encode_image, device fix

Use self._execution_device instead of self.device when selecting
a device for the input image tensor.

This allows for compatibility with enable_model_cpu_offload &
enable_sequential_cpu_offload

Co-authored-by: Teriks <Teriks@users.noreply.github.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-01-07 15:47:28 +01:00
dependabot[bot]
e0b96ba7b0 Bump jinja2 from 3.1.4 to 3.1.5 in /examples/research_projects/realfill (#10377)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.4 to 3.1.5.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.4...3.1.5)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-07 19:59:41 +05:30
Dhruv Nair
854a04659c [CI] Add minimal testing for legacy Torch versions (#10479)
* update

* update
2025-01-07 18:51:41 +05:30
hlky
628f2c544a Use Pipelines without scheduler (#10439)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-07 12:07:08 +00:00
Aryan
811560b1d7 [LoRA] Support original format loras for HunyuanVideo (#10376)
* update

* fix make copies

* update

* add relevant markers to the integration test suite.

* add copied.

* fox-copies

* temporarily add print.

* directly place on CUDA as CPU isn't that big on the CIO.

* fixes to fuse_lora, aryan was right.

* fixes

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-07 13:18:57 +05:30
Rahul Raman
f1e0c7ce4a Refactor instructpix2pix lora to support peft (#10205)
* make base code changes referred from train_instructpix2pix script in examples

* change code to use PEFT as discussed in issue 10062

* update README training command

* update README training command

* refactor variable name and freezing unet

* Update examples/research_projects/instructpix2pix_lora/train_instruct_pix2pix_lora.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update README installation instructions.

* cleanup code using make style and quality

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-07 12:00:45 +05:30
Sayak Paul
b94cfd7937 [Training] QoL improvements in the Flux Control training scripts (#10461)
* qol improvements to the Flux script.

* propagate the dataloader changes.
2025-01-07 11:56:17 +05:30
Aryan
661bde0ff2 Fix style (#10478)
fix
2025-01-07 11:06:36 +05:30
Ameer Azam
4f5e3e35d2 Regarding the RunwayML path for V1.5 did change to stable-diffusion-v1-5/[stable-diffusion-v1-5/ stable-diffusion-inpainting] (#10476)
* Update pipeline_controlnet.py

* Update pipeline_controlnet_img2img.py

runwayml Take-down so change all from to this
stable-diffusion-v1-5/stable-diffusion-v1-5

* Update pipeline_controlnet_inpaint.py

* runwayml take-down make change to sd-legacy

* runwayml take-down make change to sd-legacy

* runwayml take-down make change to sd-legacy

* runwayml take-down make change to sd-legacy

* Update convert_blipdiffusion_to_diffusers.py

style change
2025-01-06 15:01:52 -08:00
hlky
8f2253c58c Add torch_xla and from_single_file to instruct-pix2pix (#10444)
* Add torch_xla and from_single_file to instruct-pix2pix

* StableDiffusionInstructPix2PixPipelineSingleFileSlowTests

* StableDiffusionInstructPix2PixPipelineSingleFileSlowTests

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-01-06 10:11:16 -10:00
Aryan
7747b588e2 Fix hunyuan video attention mask dim (#10454)
* fix

* add coauthor

Co-Authored-By: Nerogar <nerogar@arcor.de>

---------

Co-authored-by: Nerogar <nerogar@arcor.de>
2025-01-06 10:07:54 -10:00
Sayak Paul
d9d94e12f3 [LoRA] fix: lora unloading when using expanded Flux LoRAs. (#10397)
* fix: lora unloading when using expanded Flux LoRAs.

* fix argument name.

Co-authored-by: a-r-r-o-w <contact.aryanvs@gmail.com>

* docs.

---------

Co-authored-by: a-r-r-o-w <contact.aryanvs@gmail.com>
2025-01-06 08:35:05 -10:00
hlky
2f25156c14 LEditsPP - examples, check height/width, add tiling/slicing (#10471)
* LEditsPP - examples, check height/width, add tiling/slicing

* make style
2025-01-06 08:19:53 -10:00
SahilCarterr
6da6406529 [Fix] broken links in docs (#10434)
* Fix broken links in docs

* fix parenthesis
2025-01-06 10:07:38 -08:00
Aryan
04e783cd9e Update variable names correctly in docs (#10435)
fix
2025-01-06 08:56:43 -08:00
hlky
1896b1f7c1 lora_bias PEFT version check in unet.load_attn_procs (#10474)
`lora_bias` PEFT version check in `unet.load_attn_procs` path
2025-01-06 21:27:56 +05:30
Sayak Paul
b5726358cf [Tests] add slow and nightly markers to sd3 lora integation. (#10458)
add slow and nightly markers to sd3 lora integation.
2025-01-06 07:29:04 +05:30
hlky
fdcbbdf0bb Add torch_xla and from_single_file support to TextToVideoZeroPipeline (#10445)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-05 05:24:28 +00:00
chaowenguo
4e44534845 Update rerender_a_video.py fix dtype error (#10451)
Update rerender_a_video.py
2025-01-04 14:52:50 +00:00
chaowenguo
a17832b2d9 add pythor_xla support for render a video (#10443)
* Update rerender_a_video.py

* Update rerender_a_video.py

* make style

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-01-03 16:00:02 +00:00
hlky
c28db0aa5b Fix AutoPipeline from_pipe where source pipeline is missing target pipeline's optional components (#10400)
* Optional components in AutoPipeline

* missing_modules

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-01-02 11:06:51 -10:00
Doug J
f7822ae4bf Update train_text_to_image_sdxl.py (#8830)
Enable VAE hash to be able to change with args change. If not, train_dataset_with_embeddiings may have row number inconsistency with train_dataset_with_vae.

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2025-01-02 10:41:18 -10:00
Steven Liu
d81cc6f1da [docs] Fix internal links (#10418)
fix links

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-02 10:11:16 -10:00
Aryan
476795c5c3 Update Flux docstrings (#10423)
update
2025-01-02 10:06:18 -10:00
Sayak Paul
3cb66865f7 [LTX-Video] fix attribute adjustment for ltx. (#10426)
fix attribute adjustment for ltx.
2025-01-02 10:05:41 -10:00
Daniel Regado
68bd6934b1 IP-Adapter support for StableDiffusion3ControlNetPipeline (#10363)
* IP-Adapter support for `StableDiffusion3ControlNetPipeline`

* Update src/diffusers/pipelines/controlnet_sd3/pipeline_stable_diffusion_3_controlnet.py

Co-authored-by: hlky <hlky@hlky.ac>

---------

Co-authored-by: hlky <hlky@hlky.ac>
2025-01-02 10:02:32 -10:00
G.O.D
f4fdb3a0ab fix bug for ascend npu (#10429) 2025-01-02 09:52:53 -10:00
Junsong Chen
7ab7c12173 [Sana] 1k PE bug fixed (#10431)
fix pe bug for Sana

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-01-02 09:50:51 -10:00
maxs-kan
44640c8358 Fix Flux multiple Lora loading bug (#10388)
* check for base_layer key in transformer state dict

* test_lora_expansion_works_for_absent_keys

* check

* Update tests/lora/test_lora_layers_flux.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* check

* test_lora_expansion_works_for_absent_keys/test_lora_expansion_works_for_extra_keys

* absent->extra

---------

Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-01-02 08:34:48 -10:00
Dev Rajput
4b9f1c7d8c Add correct number of channels when resuming from checkpoint for Flux Control LoRa training (#10422)
* Add correct number of channels when resuming from checkpoint

* Fix Formatting
2025-01-02 15:51:44 +05:30
Steven Liu
91008aabc4 [docs] Video generation update (#10272)
* update

* update

* feedback

* fix videos

* use previous checkpoint

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-12-31 12:44:57 -08:00
Steven Liu
0744378dc0 [docs] Quantization tip (#10249)
* quantization

* add other vid models

* typo

* more pipelines

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-12-31 08:52:11 -08:00
Luchao Qi
3f591ef975 [Typo] Update md files (#10404)
* Update pix2pix.md

fix hyperlink error

* fix md link typos

* fix md typo - remove ".md" at the end of links

* [Fix] Broken links in hunyuan docs (#10402)

* fix-hunyuan-broken-links

* [Fix] docs broken links hunyuan

* [training] add ds support to lora sd3. (#10378)

* add ds support to lora sd3.

Co-authored-by: leisuzz <jiangshuonb@gmail.com>

* style.

---------

Co-authored-by: leisuzz <jiangshuonb@gmail.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>

* fix md typo - remove ".md" at the end of links

* fix md link typos

* fix md typo - remove ".md" at the end of links

---------

Co-authored-by: SahilCarterr <110806554+SahilCarterr@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: leisuzz <jiangshuonb@gmail.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-12-31 08:37:00 -08:00
Sayak Paul
5f72473543 [training] add ds support to lora sd3. (#10378)
* add ds support to lora sd3.

Co-authored-by: leisuzz <jiangshuonb@gmail.com>

* style.

---------

Co-authored-by: leisuzz <jiangshuonb@gmail.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-12-30 19:31:05 +05:30
SahilCarterr
01780c3c9c [Fix] Broken links in hunyuan docs (#10402)
* fix-hunyuan-broken-links

* [Fix] docs broken links hunyuan
2024-12-28 10:01:26 -10:00
hlky
55ac1dbdf2 Default values in SD3 pipelines when submodules are not loaded (#10393)
SD3 pipelines hasattr
2024-12-27 07:58:49 -10:00
SahilCarterr
83da817f73 [Add] torch_xla support to pipeline_sana.py (#10364)
[Add] torch_xla support in pipeline_sana.py
2024-12-27 08:33:11 +00:00
Alan Ponnachan
f430a0cf32 Add torch_xla support to pipeline_aura_flow.py (#10365)
* Add torch_xla support to pipeline_aura_flow.py

* make style

---------

Co-authored-by: hlky <hlky@hlky.ac>
2024-12-27 07:53:04 +00:00
Sayak Paul
1b202c5730 [LoRA] feat: support unload_lora_weights() for Flux Control. (#10206)
* feat: support unload_lora_weights() for Flux Control.

* tighten test

* minor

* updates

* meta device fixes.
2024-12-25 17:27:16 +05:30
Aryan
cd991d1e1a Fix TorchAO related bugs; revert device_map changes (#10371)
* Revert "Add support for sharded models when TorchAO quantization is enabled (#10256)"

This reverts commit 41ba8c0bf6.

* update tests

* udpate

* update

* update

* update device map tests

* apply review suggestions

* update

* make style

* fix

* update docs

* update tests

* update workflow

* update

* improve tests

* allclose tolerance

* Update src/diffusers/models/modeling_utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update tests/quantization/torchao/test_torchao.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* improve tests

* fix

* update correct slices

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-12-25 15:37:49 +05:30
Sayak Paul
825979ddc3 [training] fix: registration of out_channels in the control flux scripts. (#10367)
* fix: registration of out_channels in the control flux scripts.

* free memory.
2024-12-24 21:44:44 +05:30
Fanli Lin
023b0e0d55 [tests] fix AssertionError: Torch not compiled with CUDA enabled (#10356)
fix bug on xpu
2024-12-24 15:28:50 +00:00
Eliseu Silva
c0c11683f3 Make passing the IP Adapter mask to the attention mechanism optional (#10346)
Make passing the IP Adapter mask to the attention mechanism optional if there is no need to apply it to a given IP Adapter.
2024-12-24 15:28:42 +00:00
YiYi Xu
6dfaec3487 make style for https://github.com/huggingface/diffusers/pull/10368 (#10370)
* fix bug for torch.uint1-7 not support in torch<2.6

* up

---------

Co-authored-by: baymax591 <cbai@mail.nwpu.edu.cn>
2024-12-23 19:52:21 -10:00
suzukimain
c1e7fd5b34 [Docs] Added model search to community_projects.md (#10358)
Update community_projects.md
2024-12-23 17:14:26 -10:00
Sayak Paul
9d2c8d8859 fix test pypi installation in the release workflow (#10360)
fix
2024-12-24 07:48:18 +05:30
Sayak Paul
92933ec36a [chore] post release 0.32.0 (#10361)
* post release 0.32.0

* stylew
2024-12-23 10:03:34 -10:00
Aryan
4b557132ce [core] LTX Video 0.9.1 (#10330)
* update

* make style

* update

* update

* update

* make style

* single file related changes

* update

* fix

* update single file urls and docs

* update

* fix
2024-12-23 19:51:33 +05:30
Sayak Paul
851dfa30ae [Tests] Fix more tests sayak (#10359)
* fixes to tests

* fixture

* fixes
2024-12-23 19:11:21 +05:30
Sayak Paul
ea1ba0ba53 [LoRA] test fix (#10351)
updates
2024-12-23 15:45:45 +05:30
Aryan
9d27df8071 Rename LTX blocks and docs title (#10213)
* rename blocks and docs

* fix docs

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-12-23 15:29:10 +05:30
Aryan
055d95543a Fix failing CogVideoX LoRA fuse test (#10352)
fix
2024-12-23 14:22:09 +05:30
hlky
71cc2013fe Fix FluxIPAdapterTesterMixin (#10354) 2024-12-23 14:20:06 +05:30
Sayak Paul
c34fc34563 [Tests] QoL improvements to the LoRA test suite (#10304)
* misc lora test improvements.

* updates

* fixes to tests
2024-12-23 13:59:55 +05:30
Dhruv Nair
5fcee4a447 [Single File] Fix loading (#10349)
update
2024-12-23 13:12:23 +05:30
Sayak Paul
76e2727b5c [SANA LoRA] sana lora training tests and misc. (#10296)
* sana lora training tests and misc.

* remove push to hub

* Update examples/dreambooth/train_dreambooth_lora_sana.py

Co-authored-by: Aryan <aryan@huggingface.co>

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-12-23 12:35:13 +05:30
Aryan
02c777c065 [tests] Refactor TorchAO serialization fast tests (#10271)
refactor
2024-12-23 11:04:57 +05:30
Sayak Paul
6a970a45c5 [docs] fix: torchao example. (#10278)
fix: torchao example.
2024-12-23 11:03:50 +05:30
Aryan
ffc0eaab6d Bump minimum TorchAO version to 0.7.0 (#10293)
* bump min torchao version to 0.7.0

* update
2024-12-23 11:03:04 +05:30
Thien Tran
3c2e2aa8a9 .from_single_file() - Add missing .shape (#10332)
Add missing `.shape`
2024-12-23 08:57:25 +05:30
Junsong Chen
b58868e6f4 [Sana bug] bug fix for 2K model config (#10340)
* fix the Positinoal Embedding bug in 2K model;

* Change the default model to the BF16 one for more stable training and output

* make style

* substract buffer size

* add compute_module_persistent_sizes

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-12-23 08:56:25 +05:30
Dhruv Nair
da21d590b5 [Single File] Add Single File support for HunYuan video (#10320)
* update

* Update src/diffusers/loaders/single_file_utils.py

Co-authored-by: Aryan <aryan@huggingface.co>

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-12-23 08:44:58 +05:30
YiYi Xu
7c2f0afb1c update get_parameter_dtype (#10342)
add:
q
2024-12-23 08:14:13 +05:30
hlky
f615f00f58 Fix enable_sequential_cpu_offload in test_kandinsky_combined (#10324)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-12-22 15:28:28 -10:00
Aryan
6aaa0518e3 Community hosted weights for diffusers format HunyuanVideo weights (#10344)
update docs and example to use community weights
2024-12-22 15:26:28 -10:00
Mehmet Yiğit Özgenç
233dffdc3f flux controlnet inpaint config bug (#10291)
* flux controlnet inpaint config bug

* Update src/diffusers/pipelines/flux/pipeline_flux_controlnet_inpainting.py

---------

Co-authored-by: yigitozgenc <yigit@quantuslabs.ai>
Co-authored-by: hlky <hlky@hlky.ac>
2024-12-21 18:44:43 +00:00
hlky
be2070991f Support Flux IP Adapter (#10261)
* Flux IP-Adapter

* test cfg

* make style

* temp remove copied from

* fix test

* fix test

* v2

* fix

* make style

* temp remove copied from

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Move encoder_hid_proj to inside FluxTransformer2DModel

* merge

* separate encode_prompt, add copied from, image_encoder offload

* make

* fix test

* fix

* Update src/diffusers/pipelines/flux/pipeline_flux.py

* test_flux_prompt_embeds change not needed

* true_cfg -> true_cfg_scale

* fix merge conflict

* test_flux_ip_adapter_inference

* add fast test

* FluxIPAdapterMixin not test mixin

* Update pipeline_flux.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-12-21 17:49:58 +00:00
hlky
bf9a641f1a Fix EMAModel test_from_pretrained (#10325) 2024-12-21 14:10:44 +00:00
hlky
a756694bf0 Fix push_tests_mps.yml (#10326) 2024-12-21 14:10:32 +00:00
Sayak Paul
d41388145e [Docs] Update gguf.md to remove generator from the pipeline from_pretrained (#10299)
Update gguf.md to remove generator from the pipeline from_pretrained
2024-12-21 07:15:03 +05:30
Junsong Chen
a6288a5571 [Sana]add 2K related model for Sana (#10322)
add 2K related model for Sana
2024-12-20 07:21:34 -10:00
Steven Liu
7d4db57037 [docs] Fix quantization links (#10323)
Update overview.md
2024-12-20 08:30:21 -08:00
Aditya Raj
902008608a [BUG FIX] [Stable Audio Pipeline] Resolve torch.Tensor.new_zeros() TypeError in function prepare_latents caused by audio_vae_length (#10306)
[BUG FIX] [Stable Audio Pipeline] TypeError: new_zeros(): argument 'size' failed to unpack the object at pos 3 with error "type must be tuple of ints,but got float"

torch.Tensor.new_zeros() takes a single argument size (int...) – a list, tuple, or torch.Size of integers defining the shape of the output tensor.

in function prepare_latents:
audio_vae_length = self.transformer.config.sample_size * self.vae.hop_length
audio_shape = (batch_size // num_waveforms_per_prompt, audio_channels, audio_vae_length)
...
audio = initial_audio_waveforms.new_zeros(audio_shape)

audio_vae_length evaluates to float because self.transformer.config.sample_size returns a float

Co-authored-by: hlky <hlky@hlky.ac>
2024-12-20 15:29:58 +00:00
Leojc
c8ee4af228 docs: fix a mistake in docstring (#10319)
Update pipeline_hunyuan_video.py

docs: fix a mistake
2024-12-20 15:22:32 +00:00
Sayak Paul
b64ca6c11c [Docs] Update ltx_video.md to remove generator from from_pretrained() (#10316)
Update ltx_video.md to remove generator from `from_pretrained()`
2024-12-20 18:32:22 +05:30
Dhruv Nair
e12d610faa Mochi docs (#9934)
* update

* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-12-20 16:27:38 +05:30
Sayak Paul
bf6eaa8aec [Tests] add integration tests for lora expansion stuff in Flux. (#10318)
add integration tests for lora expansion stuff in Flux.
2024-12-20 16:14:58 +05:30
Sayak Paul
17128c42a4 [LoRA] feat: support loading regular Flux LoRAs into Flux Control, and Fill (#10259)
* lora expansion with dummy zeros.

* updates

* fix working 🥳

* working.

* use torch.device meta for state dict expansion.

* tests

Co-authored-by: a-r-r-o-w <contact.aryanvs@gmail.com>

* fixes

* fixes

* switch to debug

* fix

* Apply suggestions from code review

Co-authored-by: Aryan <aryan@huggingface.co>

* fix stuff

* docs

---------

Co-authored-by: a-r-r-o-w <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2024-12-20 14:30:32 +05:30
Dhruv Nair
dbc1d505f0 [Single File] Add GGUF support for LTX (#10298)
* update

* add docs.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-12-20 11:52:29 +05:30
Aryan
151b74cd77 Make tensors in ResNet contiguous for Hunyuan VAE (#10309)
contiguous tensors in resnet

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-12-20 11:45:37 +05:30
Aryan
41ba8c0bf6 Add support for sharded models when TorchAO quantization is enabled (#10256)
* add sharded + device_map check
2024-12-19 15:42:20 -10:00
Daniel Regado
3191248472 [WIP] SD3.5 IP-Adapter Pipeline Integration (#9987)
* Added support for single IPAdapter on SD3.5 pipeline



---------

Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-12-19 14:48:18 -10:00
dg845
648d968cfc Enable Gradient Checkpointing for UNet2DModel (New) (#7201)
* Port UNet2DModel gradient checkpointing code from #6718.


---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Vincent Neemie <92559302+VincentNeemie@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
2024-12-19 14:45:45 -10:00
djm
b756ec6e80 unet's sample_size attribute is to accept tuple(h, w) in StableDiffusionPipeline (#10181) 2024-12-19 22:24:18 +00:00
Aryan
d8825e7697 Fix failing lora tests after HunyuanVideo lora (#10307)
fix
2024-12-20 02:35:41 +05:30
hlky
074798b299 Fix local_files_only for checkpoints with shards (#10294) 2024-12-19 07:04:57 -10:00
Dhruv Nair
3ee966950b Allow Mochi Transformer to be split across multiple GPUs (#10300)
update
2024-12-19 22:34:44 +05:30
Dhruv Nair
9764f229d4 [Single File] Add single file support for Mochi Transformer (#10268)
update
2024-12-19 22:20:40 +05:30
Shenghai Yuan
1826a1e7d3 [LoRA] Support HunyuanVideo (#10254)
* 1217

* 1217

* 1217

* update

* reverse

* add test

* update test

* make style

* update

* make style

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-12-19 16:22:20 +05:30
hlky
0ed09a17bb Check correct model type is passed to from_pretrained (#10189)
* Check correct model type is passed to `from_pretrained`

* Flax, skip scheduler

* test_wrong_model

* Fix for scheduler

* Update tests/pipelines/test_pipelines.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* EnumMeta

* Flax

* scheduler in expected types

* make

* type object 'CLIPTokenizer' has no attribute '_PipelineFastTests__name'

* support union

* fix typing in kandinsky

* make

* add LCMScheduler

* 'LCMScheduler' object has no attribute 'sigmas'

* tests for wrong scheduler

* make

* update

* warning

* tests

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* import FlaxSchedulerMixin

* skip scheduler

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-12-19 09:24:52 +00:00
赵三石
2f7a417d1f Update lora_conversion_utils.py (#9980)
x-flux single-blocks lora load

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-12-18 23:07:50 -10:00
hlky
4450d26b63 Add Flux Control to AutoPipeline (#10292) 2024-12-18 22:28:56 -10:00
Aryan
f781b8c30c Hunyuan VAE tiling fixes and transformer docs (#10295)
* update

* udpate

* fix test
2024-12-19 10:28:10 +05:30
Sayak Paul
9c0e20de61 [chore] Update README_sana.md to update the default model (#10285)
Update README_sana.md to update the default model
2024-12-19 10:24:57 +05:30
Aryan
f35a38725b [tests] remove nullop import checks from lora tests (#10273)
remove nullop imports
2024-12-19 01:19:08 +05:30
Aryan
f66bd3261c Rename Mochi integration test correctly (#10220)
rename integration test
2024-12-18 22:41:23 +05:30
Aryan
c4c99c3907 [tests] Fix broken cuda, nightly and lora tests on main for CogVideoX (#10270)
fix joint pos embedding device
2024-12-18 22:36:08 +05:30
Dhruv Nair
862a7d5038 [Single File] Add single file support for Flux Canny, Depth and Fill (#10288)
update
2024-12-18 19:19:47 +05:30
Dhruv Nair
8304adce2a Make zeroing prompt embeds for Mochi Pipeline configurable (#10284)
update
2024-12-18 18:32:53 +05:30
Dhruv Nair
b389f339ec Fix Doc links in GGUF and Quantization overview docs (#10279)
* update

* Update docs/source/en/quantization/gguf.md

Co-authored-by: Aryan <aryan@huggingface.co>

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-12-18 18:32:36 +05:30
hlky
e222246b4e Fix sigma_last with use_flow_sigmas (#10267) 2024-12-18 12:22:10 +00:00
Andrés Romero
83709d5a06 Flux Control(Depth/Canny) + Inpaint (#10192)
* flux_control_inpaint - failing test_flux_different_prompts

* removing test_flux_different_prompts?

* fix style

* fix from PR comments

* fix style

* reducing guidance_scale in demo

* Update src/diffusers/pipelines/flux/pipeline_flux_control_inpaint.py

Co-authored-by: hlky <hlky@hlky.ac>

* make

* prepare_latents is not copied from

* update docs

* typos

---------

Co-authored-by: affromero <ubuntu@ip-172-31-17-146.ec2.internal>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
2024-12-18 09:14:16 +00:00
Qin Zhou
8eb73c872a Support pass kwargs to sd3 custom attention processor (#9818)
* Support pass kwargs to sd3 custom attention processor


---------

Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-12-17 21:58:33 -10:00
Xinyuan Zhao
88b015dc9f Make time_embed_dim of UNet2DModel changeable (#10262) 2024-12-17 21:55:18 -10:00
Sayak Paul
63cdf9c0ba [chore] fix: reamde -> readme (#10276)
fix: reamde -> readme
2024-12-18 10:56:08 +05:30
hlky
0ac52d6f09 Use torch in get_2d_rotary_pos_embed (#10155)
* Use `torch` in `get_2d_rotary_pos_embed`

* Add deprecation
2024-12-17 18:26:52 -10:00
Sayak Paul
ba6fd6eb30 [chore] fix: licensing headers in mochi and ltx (#10275)
fix: licensing header.
2024-12-18 08:43:57 +05:30
Sayak Paul
9408aa2dfc [LoRA] feat: lora support for SANA. (#10234)
* feat: lora support for SANA.

* make fix-copies

* rename test class.

* attention_kwargs -> cross_attention_kwargs.

* Revert "attention_kwargs -> cross_attention_kwargs."

This reverts commit 23433bf9bc.

* exhaust 119 max line limit

* sana lora fine-tuning script.

* readme

* add a note about the supported models.

* Apply suggestions from code review

Co-authored-by: Aryan <aryan@huggingface.co>

* style

* docs for attention_kwargs.

* remove lora_scale from pag pipeline.

* copy fix

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-12-18 08:22:31 +05:30
hlky
ec1c7a793f Add set_shift to FlowMatchEulerDiscreteScheduler (#10269) 2024-12-17 21:40:09 +00:00
cjkangme
9c68c945e9 [Community Pipeline] Fix typo that cause error on regional prompting pipeline (#10251)
fix: fix typo that cause error
2024-12-17 21:09:50 +00:00
Steven Liu
2739241ad1 [docs] delete_adapters() (#10245)
delete_adapters

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-12-17 09:26:45 -08:00
Aryan
1524781b88 [tests] Remove/rename unsupported quantization torchao type (#10263)
update
2024-12-17 21:43:15 +05:30
Dhruv Nair
128b96f369 Fix Mochi Quality Issues (#10033)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update src/diffusers/models/transformers/transformer_mochi.py

Co-authored-by: Aryan <aryan@huggingface.co>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2024-12-17 19:40:00 +05:30
Dhruv Nair
e24941b2a7 [Single File] Add GGUF support (#9964)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update src/diffusers/quantizers/gguf/utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update docs/source/en/quantization/gguf.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-12-17 16:09:37 +05:30
Aryan
f9d5a9324d [docs] Clarify dtypes for Sana (#10248)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-12-17 13:43:24 +05:30
Aryan
ac86393487 [LoRA] Support LTX Video (#10228)
* add lora support for ltx

* add tests

* fix copied from comments

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-12-17 12:05:05 +05:30
Aryan
0d96a894a7 Fix copied from comment in Mochi lora loader (#10255)
update
2024-12-17 11:09:57 +05:30
Sayak Paul
6fb94d51cb [chore] add contribution note for lawrence. (#10253)
add contribution note for lawrence.
2024-12-17 09:17:40 +05:30
Steven Liu
7667cfcb41 [docs] Add missing AttnProcessors (#10246)
* attnprocessors

* lora

* make style

* fix

* fix

* sana

* typo
2024-12-16 15:36:26 -08:00
Aryan
9f00c617a0 [core] TorchAO Quantizer (#10009)
* torchao quantizer


---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-12-16 13:35:40 -10:00
Kaiwen Sheng
aafed3f8dd fix downsample bug in MidResTemporalBlock1D (#10250) 2024-12-17 04:55:16 +05:30
hlky
5ed761a6f2 Add ControlNetUnion to AutoPipeline from_pretrained (#10219) 2024-12-16 10:25:08 -10:00
hlky
2f023d7b84 Fix RePaint Scheduler (#10185)
Fix repaint scheduler
2024-12-16 09:38:13 -10:00
hlky
e9a3911b67 Fix checkpoint in CogView3PlusPipeline example (#10211) 2024-12-16 09:31:22 -10:00
hlky
7186bb45f0 Add enable_vae_tiling to AllegroPipeline, fix example (#10212) 2024-12-16 09:31:02 -10:00
hlky
438bd60549 Use non-human subject in StableDiffusion3ControlNetPipeline example (#10214)
* Use non-human subject in StableDiffusion3ControlNetPipeline example

* make style
2024-12-16 09:30:26 -10:00
hlky
87e8157437 Fix ControlNetUnion _callback_tensor_inputs (#10218) 2024-12-16 09:29:12 -10:00
hlky
3f421fe09f Fix use_flow_sigmas (#10242)
use_flow_sigmas copy
2024-12-16 09:27:22 -10:00
hlky
a7d50524dd Add dynamic_shifting to SD3 (#10236)
* Add `dynamic_shifting` to SD3

* calculate_shift

* FlowMatchHeunDiscreteScheduler doesn't support mu

* Inpaint/img2img
2024-12-16 09:25:21 -10:00
hlky
672bd49573 Use t instead of timestep in _apply_perturbed_attention_guidance (#10243) 2024-12-16 09:24:16 -10:00
Sayak Paul
ea893a9ae7 [Docs] add rest of the lora loader mixins to the docs. (#10230)
add rest of the lora loader mixins to the docs.
2024-12-16 08:50:27 -08:00
fancy45daddy
5fb3a98517 Update pipeline_controlnet.py add support for pytorch_xla (#10222)
* Update pipeline_controlnet.py

* make style

---------

Co-authored-by: hlky <hlky@hlky.ac>
2024-12-16 09:05:50 +00:00
Aryan
aace1f412b [core] Hunyuan Video (#10136)
* copy transformer

* copy vae

* copy pipeline

* make fix-copies

* refactor; make original code work with diffusers; test latents for comparison generated with this commit

* move rope into pipeline; remove flash attention; refactor

* begin conversion script

* make style

* refactor attention

* refactor

* refactor final layer

* their mlp -> our feedforward

* make style

* add docs

* refactor layer names

* refactor modulation

* cleanup

* refactor norms

* refactor activations

* refactor single blocks attention

* refactor attention processor

* make style

* cleanup a bit

* refactor double transformer block attention

* update mochi attn proc

* use diffusers attention implementation in all modules; checkpoint for all values matching original

* remove helper functions in vae

* refactor upsample

* refactor causal conv

* refactor resnet

* refactor

* refactor

* refactor

* grad checkpointing

* autoencoder test

* fix scaling factor

* refactor clip

* refactor llama text encoding

* add coauthor

Co-Authored-By: "Gregory D. Hunkins" <greg@ollano.com>

* refactor rope; diff: 0.14990234375; reason and fix: create rope grid on cpu and move to device

Note: The following line diverges from original behaviour. We create the grid on the device, whereas
original implementation creates it on CPU and then moves it to device. This results in numerical
differences in layerwise debugging outputs, but visually it is the same.

* use diffusers timesteps embedding; diff: 0.10205078125

* rename

* convert

* update

* add tests for transformer

* add pipeline tests; text encoder 2 is not optional

* fix attention implementation for torch

* add example

* update docs

* update docs

* apply suggestions from review

* refactor vae

* update

* Apply suggestions from code review

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py

Co-authored-by: hlky <hlky@hlky.ac>

* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py

Co-authored-by: hlky <hlky@hlky.ac>

* make fix-copies

* update

---------

Co-authored-by: "Gregory D. Hunkins" <greg@ollano.com>
Co-authored-by: hlky <hlky@hlky.ac>
2024-12-16 13:56:18 +05:30
Dhruv Nair
8957324363 Fix format issue in push_test yml (#10235)
update
2024-12-16 12:28:36 +05:30
Sayak Paul
e68092a471 [docs] minor stuff to ltx video docs. (#10229)
minor stuff to ltx video docs.
2024-12-16 12:24:14 +05:30
1605 changed files with 155635 additions and 22006 deletions

View File

@@ -0,0 +1,38 @@
name: "\U0001F31F Remote VAE"
description: Feedback for remote VAE pilot
labels: [ "Remote VAE" ]
body:
- type: textarea
id: positive
validations:
required: true
attributes:
label: Did you like the remote VAE solution?
description: |
If you liked it, we would appreciate it if you could elaborate what you liked.
- type: textarea
id: feedback
validations:
required: true
attributes:
label: What can be improved about the current solution?
description: |
Let us know the things you would like to see improved. Note that we will work optimizing the solution once the pilot is over and we have usage.
- type: textarea
id: others
validations:
required: true
attributes:
label: What other VAEs you would like to see if the pilot goes well?
description: |
Provide a list of the VAEs you would like to see in the future if the pilot goes well.
- type: textarea
id: additional-info
attributes:
label: Notify the members of the team
description: |
Tag the following folks when submitting this feedback: @hlky @sayakpaul

View File

@@ -23,7 +23,7 @@ jobs:
runs-on:
group: aws-g6-4xlarge-plus
container:
image: diffusers/diffusers-pytorch-compile-cuda
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
@@ -38,6 +38,7 @@ jobs:
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install pandas peft
python -m uv pip uninstall transformers && python -m uv pip install transformers==4.48.0
- name: Environment
run: |
python utils/print_env.py

View File

@@ -34,13 +34,20 @@ jobs:
id: file_changes
uses: jitterbit/get-changed-files@v1
with:
format: 'space-delimited'
format: "space-delimited"
token: ${{ secrets.GITHUB_TOKEN }}
- name: Build Changed Docker Images
env:
CHANGED_FILES: ${{ steps.file_changes.outputs.all }}
run: |
CHANGED_FILES="${{ steps.file_changes.outputs.all }}"
for FILE in $CHANGED_FILES; do
echo "$CHANGED_FILES"
for FILE in $CHANGED_FILES; do
# skip anything that isn't still on disk
if [[ ! -f "$FILE" ]]; then
echo "Skipping removed file $FILE"
continue
fi
if [[ "$FILE" == docker/*Dockerfile ]]; then
DOCKER_PATH="${FILE%/Dockerfile}"
DOCKER_TAG=$(basename "$DOCKER_PATH")
@@ -65,8 +72,9 @@ jobs:
image-name:
- diffusers-pytorch-cpu
- diffusers-pytorch-cuda
- diffusers-pytorch-compile-cuda
- diffusers-pytorch-cuda
- diffusers-pytorch-xformers-cuda
- diffusers-pytorch-minimum-cuda
- diffusers-flax-cpu
- diffusers-flax-tpu
- diffusers-onnxruntime-cpu

View File

@@ -13,8 +13,9 @@ env:
PYTEST_TIMEOUT: 600
RUN_SLOW: yes
RUN_NIGHTLY: yes
PIPELINE_USAGE_CUTOFF: 5000
PIPELINE_USAGE_CUTOFF: 0
SLACK_API_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
CONSOLIDATED_REPORT_PATH: consolidated_test_report.md
jobs:
setup_torch_cuda_pipeline_matrix:
@@ -99,11 +100,6 @@ jobs:
with:
name: pipeline_${{ matrix.module }}_test_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
run_nightly_tests_for_other_torch_modules:
name: Nightly Torch CUDA Tests
@@ -174,11 +170,48 @@ jobs:
name: torch_${{ matrix.module }}_cuda_test_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
run_torch_compile_tests:
name: PyTorch Compile CUDA tests
runs-on:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-cuda
options: --gpus 0 --shm-size "16gb" --ipc host
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
nvidia-smi
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test,training]
- name: Environment
run: |
python utils/print_env.py
- name: Run torch compile tests on GPU
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
RUN_COMPILE: yes
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
- name: Failure short reports
if: ${{ failure() }}
run: cat reports/tests_torch_compile_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: torch_compile_test_reports
path: reports
run_big_gpu_torch_tests:
name: Torch tests on big GPU
@@ -230,11 +263,227 @@ jobs:
with:
name: torch_cuda_big_gpu_test_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
torch_minimum_version_cuda_tests:
name: Torch Minimum Version CUDA Tests
runs-on:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-minimum-cuda
options: --shm-size "16gb" --ipc host --gpus 0
defaults:
run:
shell: bash
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Install dependencies
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install peft@git+https://github.com/huggingface/peft.git
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
- name: Environment
run: |
python utils/print_env.py
- name: Run PyTorch CUDA tests
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
--make-reports=tests_torch_minimum_version_cuda \
tests/models/test_modeling_common.py \
tests/pipelines/test_pipelines_common.py \
tests/pipelines/test_pipeline_utils.py \
tests/pipelines/test_pipelines.py \
tests/pipelines/test_pipelines_auto.py \
tests/schedulers/test_schedulers.py \
tests/others
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_torch_minimum_version_cuda_stats.txt
cat reports/tests_torch_minimum_version_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: torch_minimum_version_cuda_test_reports
path: reports
run_nightly_onnx_tests:
name: Nightly ONNXRuntime CUDA tests on Ubuntu
runs-on:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-onnxruntime-cuda
options: --gpus 0 --shm-size "16gb" --ipc host
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: nvidia-smi
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
python -m uv pip install pytest-reportlog
- name: Environment
run: python utils/print_env.py
- name: Run Nightly ONNXRuntime CUDA tests
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "Onnx" \
--make-reports=tests_onnx_cuda \
--report-log=tests_onnx_cuda.log \
tests/
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_onnx_cuda_stats.txt
cat reports/tests_onnx_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: tests_onnx_cuda_reports
path: reports
run_nightly_quantization_tests:
name: Torch quantization nightly tests
strategy:
fail-fast: false
max-parallel: 2
matrix:
config:
- backend: "bitsandbytes"
test_location: "bnb"
additional_deps: ["peft"]
- backend: "gguf"
test_location: "gguf"
additional_deps: ["peft"]
- backend: "torchao"
test_location: "torchao"
additional_deps: []
- backend: "optimum_quanto"
test_location: "quanto"
additional_deps: []
runs-on:
group: aws-g6e-xlarge-plus
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "20gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: nvidia-smi
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install -U ${{ matrix.config.backend }}
if [ "${{ join(matrix.config.additional_deps, ' ') }}" != "" ]; then
python -m uv pip install ${{ join(matrix.config.additional_deps, ' ') }}
fi
python -m uv pip install pytest-reportlog
- name: Environment
run: |
python utils/print_env.py
- name: ${{ matrix.config.backend }} quantization tests on GPU
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
BIG_GPU_MEMORY: 40
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
--make-reports=tests_${{ matrix.config.backend }}_torch_cuda \
--report-log=tests_${{ matrix.config.backend }}_torch_cuda.log \
tests/quantization/${{ matrix.config.test_location }}
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_${{ matrix.config.backend }}_torch_cuda_stats.txt
cat reports/tests_${{ matrix.config.backend }}_torch_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: torch_cuda_${{ matrix.config.backend }}_reports
path: reports
run_nightly_pipeline_level_quantization_tests:
name: Torch quantization nightly tests
strategy:
fail-fast: false
max-parallel: 2
runs-on:
group: aws-g6e-xlarge-plus
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "20gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: nvidia-smi
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install -U bitsandbytes optimum_quanto
python -m uv pip install pytest-reportlog
- name: Environment
run: |
python utils/print_env.py
- name: Pipeline-level quantization tests on GPU
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
BIG_GPU_MEMORY: 40
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
--make-reports=tests_pipeline_level_quant_torch_cuda \
--report-log=tests_pipeline_level_quant_torch_cuda.log \
tests/quantization/test_pipeline_level_quantization.py
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_pipeline_level_quant_torch_cuda_stats.txt
cat reports/tests_pipeline_level_quant_torch_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: torch_cuda_pipeline_level_quant_reports
path: reports
run_flax_tpu_tests:
name: Nightly Flax TPU Tests
@@ -287,124 +536,64 @@ jobs:
name: flax_tpu_test_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
run_nightly_onnx_tests:
name: Nightly ONNXRuntime CUDA tests on Ubuntu
generate_consolidated_report:
name: Generate Consolidated Test Report
needs: [
run_nightly_tests_for_torch_pipelines,
run_nightly_tests_for_other_torch_modules,
run_torch_compile_tests,
run_big_gpu_torch_tests,
run_nightly_quantization_tests,
run_nightly_pipeline_level_quantization_tests,
run_nightly_onnx_tests,
torch_minimum_version_cuda_tests,
run_flax_tpu_tests
]
if: always()
runs-on:
group: aws-g4dn-2xlarge
group: aws-general-8-plus
container:
image: diffusers/diffusers-onnxruntime-cuda
options: --gpus 0 --shm-size "16gb" --ipc host
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: nvidia-smi
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
python -m uv pip install pytest-reportlog
- name: Environment
run: python utils/print_env.py
- name: Run Nightly ONNXRuntime CUDA tests
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "Onnx" \
--make-reports=tests_onnx_cuda \
--report-log=tests_onnx_cuda.log \
tests/
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_onnx_cuda_stats.txt
cat reports/tests_onnx_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: tests_onnx_cuda_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
run_nightly_quantization_tests:
name: Torch quantization nightly tests
strategy:
fail-fast: false
max-parallel: 2
matrix:
config:
- backend: "bitsandbytes"
test_location: "bnb"
runs-on:
group: aws-g6e-xlarge-plus
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "20gb" --ipc host --gpus 0
image: diffusers/diffusers-pytorch-cpu
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: nvidia-smi
- name: Create reports directory
run: mkdir -p combined_reports
- name: Download all test reports
uses: actions/download-artifact@v4
with:
path: artifacts
- name: Prepare reports
run: |
# Move all report files to a single directory for processing
find artifacts -name "*.txt" -exec cp {} combined_reports/ \;
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install -U ${{ matrix.config.backend }}
python -m uv pip install pytest-reportlog
- name: Environment
pip install -e .[test]
pip install slack_sdk tabulate
- name: Generate consolidated report
run: |
python utils/print_env.py
- name: ${{ matrix.config.backend }} quantization tests on GPU
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
BIG_GPU_MEMORY: 40
python utils/consolidated_test_report.py \
--reports_dir combined_reports \
--output_file $CONSOLIDATED_REPORT_PATH \
--slack_channel_name diffusers-ci-nightly
- name: Show consolidated report
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
--make-reports=tests_${{ matrix.config.backend }}_torch_cuda \
--report-log=tests_${{ matrix.config.backend }}_torch_cuda.log \
tests/quantization/${{ matrix.config.test_location }}
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_${{ matrix.config.backend }}_torch_cuda_stats.txt
cat reports/tests_${{ matrix.config.backend }}_torch_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
cat $CONSOLIDATED_REPORT_PATH >> $GITHUB_STEP_SUMMARY
- name: Upload consolidated report
uses: actions/upload-artifact@v4
with:
name: torch_cuda_${{ matrix.config.backend }}_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
name: consolidated_test_report
path: ${{ env.CONSOLIDATED_REPORT_PATH }}
# M1 runner currently not well supported
# TODO: (Dhruv) add these back when we setup better testing for Apple Silicon
@@ -444,7 +633,7 @@ jobs:
# shell: arch -arch arm64 bash {0}
# env:
# HF_HOME: /System/Volumes/Data/mnt/cache
# HF_TOKEN: ${{ secrets.HF_TOKEN }}
# HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# run: |
# ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
# --report-log=tests_torch_mps.log \
@@ -500,7 +689,7 @@ jobs:
# shell: arch -arch arm64 bash {0}
# env:
# HF_HOME: /System/Volumes/Data/mnt/cache
# HF_TOKEN: ${{ secrets.HF_TOKEN }}
# HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# run: |
# ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
# --report-log=tests_torch_mps.log \

17
.github/workflows/pr_style_bot.yml vendored Normal file
View File

@@ -0,0 +1,17 @@
name: PR Style Bot
on:
issue_comment:
types: [created]
permissions:
contents: write
pull-requests: write
jobs:
style:
uses: huggingface/huggingface_hub/.github/workflows/style-bot-action.yml@main
with:
python_quality_dependencies: "[quality]"
secrets:
bot_token: ${{ secrets.HF_STYLE_BOT_ACTION }}

View File

@@ -2,8 +2,7 @@ name: Fast tests for PRs
on:
pull_request:
branches:
- main
branches: [main]
paths:
- "src/diffusers/**.py"
- "benchmarks/**.py"
@@ -12,6 +11,7 @@ on:
- "tests/**.py"
- ".github/**.yml"
- "utils/**.py"
- "setup.py"
push:
branches:
- ci-*
@@ -64,6 +64,7 @@ jobs:
run: |
python utils/check_copies.py
python utils/check_dummies.py
python utils/check_support_list.py
make deps_table_check_updated
- name: Check if failure
if: ${{ failure() }}
@@ -120,7 +121,8 @@ jobs:
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install accelerate
pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps
- name: Environment
run: |
@@ -266,6 +268,7 @@ jobs:
# TODO (sayakpaul, DN6): revisit `--no-deps`
python -m pip install -U peft@git+https://github.com/huggingface/peft.git --no-deps
python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
python -m uv pip install -U tokenizers
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps
- name: Environment
@@ -288,8 +291,8 @@ jobs:
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_lora_failures_short.txt
cat reports/tests_models_lora_failures_short.txt
cat reports/tests_peft_main_failures_short.txt
cat reports/tests_models_lora_peft_main_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}

296
.github/workflows/pr_tests_gpu.yml vendored Normal file
View File

@@ -0,0 +1,296 @@
name: Fast GPU Tests on PR
on:
pull_request:
branches: main
paths:
- "src/diffusers/models/modeling_utils.py"
- "src/diffusers/models/model_loading_utils.py"
- "src/diffusers/pipelines/pipeline_utils.py"
- "src/diffusers/pipeline_loading_utils.py"
- "src/diffusers/loaders/lora_base.py"
- "src/diffusers/loaders/lora_pipeline.py"
- "src/diffusers/loaders/peft.py"
- "tests/pipelines/test_pipelines_common.py"
- "tests/models/test_modeling_common.py"
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
env:
DIFFUSERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
HF_HUB_ENABLE_HF_TRANSFER: 1
PYTEST_TIMEOUT: 600
PIPELINE_USAGE_CUTOFF: 1000000000 # set high cutoff so that only always-test pipelines run
jobs:
check_code_quality:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.8"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install .[quality]
- name: Check quality
run: make quality
- name: Check if failure
if: ${{ failure() }}
run: |
echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
check_repository_consistency:
needs: check_code_quality
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.8"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install .[quality]
- name: Check repo consistency
run: |
python utils/check_copies.py
python utils/check_dummies.py
python utils/check_support_list.py
make deps_table_check_updated
- name: Check if failure
if: ${{ failure() }}
run: |
echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
setup_torch_cuda_pipeline_matrix:
needs: [check_code_quality, check_repository_consistency]
name: Setup Torch Pipelines CUDA Slow Tests Matrix
runs-on:
group: aws-general-8-plus
container:
image: diffusers/diffusers-pytorch-cpu
outputs:
pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
- name: Environment
run: |
python utils/print_env.py
- name: Fetch Pipeline Matrix
id: fetch_pipeline_matrix
run: |
matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
echo $matrix
echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
- name: Pipeline Tests Artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: test-pipelines.json
path: reports
torch_pipelines_cuda_tests:
name: Torch Pipelines CUDA Tests
needs: setup_torch_cuda_pipeline_matrix
strategy:
fail-fast: false
max-parallel: 8
matrix:
module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
runs-on:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
- name: Environment
run: |
python utils/print_env.py
- name: Extract tests
id: extract_tests
run: |
pattern=$(python utils/extract_tests_from_mixin.py --type pipeline)
echo "$pattern" > /tmp/test_pattern.txt
echo "pattern_file=/tmp/test_pattern.txt" >> $GITHUB_OUTPUT
- name: PyTorch CUDA checkpoint tests on Ubuntu
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
if [ "${{ matrix.module }}" = "ip_adapters" ]; then
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
--make-reports=tests_pipeline_${{ matrix.module }}_cuda \
tests/pipelines/${{ matrix.module }}
else
pattern=$(cat ${{ steps.extract_tests.outputs.pattern_file }})
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx and $pattern" \
--make-reports=tests_pipeline_${{ matrix.module }}_cuda \
tests/pipelines/${{ matrix.module }}
fi
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: pipeline_${{ matrix.module }}_test_reports
path: reports
torch_cuda_tests:
name: Torch CUDA Tests
needs: [check_code_quality, check_repository_consistency]
runs-on:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "16gb" --ipc host --gpus 0
defaults:
run:
shell: bash
strategy:
fail-fast: false
max-parallel: 2
matrix:
module: [models, schedulers, lora, others]
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install peft@git+https://github.com/huggingface/peft.git
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
- name: Environment
run: |
python utils/print_env.py
- name: Extract tests
id: extract_tests
run: |
pattern=$(python utils/extract_tests_from_mixin.py --type ${{ matrix.module }})
echo "$pattern" > /tmp/test_pattern.txt
echo "pattern_file=/tmp/test_pattern.txt" >> $GITHUB_OUTPUT
- name: Run PyTorch CUDA tests
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
pattern=$(cat ${{ steps.extract_tests.outputs.pattern_file }})
if [ -z "$pattern" ]; then
python -m pytest -n 1 -sv --max-worker-restart=0 --dist=loadfile -k "not Flax and not Onnx" tests/${{ matrix.module }} \
--make-reports=tests_torch_cuda_${{ matrix.module }}
else
python -m pytest -n 1 -sv --max-worker-restart=0 --dist=loadfile -k "not Flax and not Onnx and $pattern" tests/${{ matrix.module }} \
--make-reports=tests_torch_cuda_${{ matrix.module }}
fi
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_torch_cuda_${{ matrix.module }}_stats.txt
cat reports/tests_torch_cuda_${{ matrix.module }}_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: torch_cuda_test_reports_${{ matrix.module }}
path: reports
run_examples_tests:
name: Examples PyTorch CUDA tests on Ubuntu
needs: [check_code_quality, check_repository_consistency]
runs-on:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-cuda
options: --gpus 0 --shm-size "16gb" --ipc host
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
python -m uv pip install -e [quality,test,training]
- name: Environment
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python utils/print_env.py
- name: Run example tests on GPU
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install timm
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_cuda examples/
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/examples_torch_cuda_stats.txt
cat reports/examples_torch_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: examples_test_reports
path: reports

View File

@@ -83,7 +83,7 @@ jobs:
python utils/print_env.py
- name: PyTorch CUDA checkpoint tests on Ubuntu
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
@@ -137,7 +137,7 @@ jobs:
- name: Run PyTorch CUDA tests
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
@@ -165,7 +165,8 @@ jobs:
group: gcp-ct5lp-hightpu-8t
container:
image: diffusers/diffusers-flax-tpu
options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV}} -v /mnt/hf_cache:/mnt/hf_cache defaults:
options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV}} -v /mnt/hf_cache:/mnt/hf_cache
defaults:
run:
shell: bash
steps:
@@ -186,7 +187,7 @@ jobs:
- name: Run Flax TPU tests
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m pytest -n 0 \
-s -v -k "Flax" \
@@ -234,7 +235,7 @@ jobs:
- name: Run ONNXRuntime CUDA tests
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "Onnx" \
@@ -261,7 +262,7 @@ jobs:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-compile-cuda
image: diffusers/diffusers-pytorch-cuda
options: --gpus 0 --shm-size "16gb" --ipc host
steps:
@@ -282,7 +283,7 @@ jobs:
python utils/print_env.py
- name: Run example tests on GPU
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
RUN_COMPILE: yes
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
@@ -325,7 +326,7 @@ jobs:
python utils/print_env.py
- name: Run example tests on GPU
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
- name: Failure short reports
@@ -348,7 +349,6 @@ jobs:
container:
image: diffusers/diffusers-pytorch-cuda
options: --gpus 0 --shm-size "16gb" --ipc host
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
@@ -358,7 +358,6 @@ jobs:
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
@@ -371,7 +370,7 @@ jobs:
- name: Run example tests on GPU
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install timm

View File

@@ -46,7 +46,7 @@ jobs:
shell: arch -arch arm64 bash {0}
run: |
${CONDA_RUN} python -m pip install --upgrade pip uv
${CONDA_RUN} python -m uv pip install -e [quality,test]
${CONDA_RUN} python -m uv pip install -e ".[quality,test]"
${CONDA_RUN} python -m uv pip install torch torchvision torchaudio
${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
${CONDA_RUN} python -m uv pip install transformers --upgrade

View File

@@ -68,7 +68,7 @@ jobs:
- name: Test installing diffusers and importing
run: |
pip install diffusers && pip uninstall diffusers -y
pip install -i https://testpypi.python.org/pypi diffusers
pip install -i https://test.pypi.org/simple/ diffusers
python -c "from diffusers import __version__; print(__version__)"
python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('fusing/unet-ldm-dummy-update'); pipe()"
python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('hf-internal-testing/tiny-stable-diffusion-pipe', safety_checker=None); pipe('ah suh du')"

View File

@@ -81,7 +81,7 @@ jobs:
python utils/print_env.py
- name: Slow PyTorch CUDA checkpoint tests on Ubuntu
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
@@ -135,7 +135,7 @@ jobs:
- name: Run PyTorch CUDA tests
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
@@ -157,6 +157,63 @@ jobs:
name: torch_cuda_${{ matrix.module }}_test_reports
path: reports
torch_minimum_version_cuda_tests:
name: Torch Minimum Version CUDA Tests
runs-on:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-minimum-cuda
options: --shm-size "16gb" --ipc host --gpus 0
defaults:
run:
shell: bash
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install peft@git+https://github.com/huggingface/peft.git
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
- name: Environment
run: |
python utils/print_env.py
- name: Run PyTorch CUDA tests
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "not Flax and not Onnx" \
--make-reports=tests_torch_minimum_cuda \
tests/models/test_modeling_common.py \
tests/pipelines/test_pipelines_common.py \
tests/pipelines/test_pipeline_utils.py \
tests/pipelines/test_pipelines.py \
tests/pipelines/test_pipelines_auto.py \
tests/schedulers/test_schedulers.py \
tests/others
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_torch_minimum_version_cuda_stats.txt
cat reports/tests_torch_minimum_version_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: torch_minimum_version_cuda_test_reports
path: reports
flax_tpu_tests:
name: Flax TPU Tests
runs-on: docker-tpu
@@ -184,7 +241,7 @@ jobs:
- name: Run slow Flax TPU tests
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m pytest -n 0 \
-s -v -k "Flax" \
@@ -232,7 +289,7 @@ jobs:
- name: Run slow ONNXRuntime CUDA tests
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-s -v -k "Onnx" \
@@ -259,7 +316,7 @@ jobs:
group: aws-g4dn-2xlarge
container:
image: diffusers/diffusers-pytorch-compile-cuda
image: diffusers/diffusers-pytorch-cuda
options: --gpus 0 --shm-size "16gb" --ipc host
steps:
@@ -278,9 +335,9 @@ jobs:
- name: Environment
run: |
python utils/print_env.py
- name: Run example tests on GPU
- name: Run torch compile tests on GPU
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
RUN_COMPILE: yes
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
@@ -323,7 +380,7 @@ jobs:
python utils/print_env.py
- name: Run example tests on GPU
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
- name: Failure short reports
@@ -369,7 +426,7 @@ jobs:
- name: Run example tests on GPU
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install timm

View File

@@ -7,8 +7,8 @@ on:
default: 'diffusers/diffusers-pytorch-cuda'
description: 'Name of the Docker image'
required: true
branch:
description: 'PR Branch to test on'
pr_number:
description: 'PR number to test on'
required: true
test:
description: 'Tests to run (e.g.: `tests/models`).'
@@ -43,8 +43,8 @@ jobs:
exit 1
fi
if [[ ! "$PY_TEST" =~ ^tests/(models|pipelines) ]]; then
echo "Error: The input string must contain either 'models' or 'pipelines' after 'tests/'."
if [[ ! "$PY_TEST" =~ ^tests/(models|pipelines|lora) ]]; then
echo "Error: The input string must contain either 'models', 'pipelines', or 'lora' after 'tests/'."
exit 1
fi
@@ -53,13 +53,13 @@ jobs:
exit 1
fi
echo "$PY_TEST"
shell: bash -e {0}
- name: Checkout PR branch
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch }}
repository: ${{ github.event.pull_request.head.repo.full_name }}
ref: refs/pull/${{ inputs.pr_number }}/head
- name: Install pytest
run: |

View File

@@ -13,3 +13,6 @@ jobs:
fetch-depth: 0
- name: Secret Scanning
uses: trufflesecurity/trufflehog@main
with:
extra_args: --results=verified,unknown

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -28,9 +28,9 @@ ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3 -m uv pip install --no-cache-dir \
torch==2.1.2 \
torchvision==0.16.2 \
torchaudio==2.1.2 \
torch \
torchvision \
torchaudio\
onnxruntime \
--extra-index-url https://download.pytorch.org/whl/cpu && \
python3 -m uv pip install --no-cache-dir \

View File

@@ -0,0 +1,53 @@
FROM nvidia/cuda:12.1.0-runtime-ubuntu20.04
LABEL maintainer="Hugging Face"
LABEL repository="diffusers"
ENV DEBIAN_FRONTEND=noninteractive
ENV MINIMUM_SUPPORTED_TORCH_VERSION="2.1.0"
ENV MINIMUM_SUPPORTED_TORCHVISION_VERSION="0.16.0"
ENV MINIMUM_SUPPORTED_TORCHAUDIO_VERSION="2.1.0"
RUN apt-get -y update \
&& apt-get install -y software-properties-common \
&& add-apt-repository ppa:deadsnakes/ppa
RUN apt install -y bash \
build-essential \
git \
git-lfs \
curl \
ca-certificates \
libsndfile1-dev \
libgl1 \
python3.10 \
python3.10-dev \
python3-pip \
python3.10-venv && \
rm -rf /var/lib/apt/lists
# make sure to use venv
RUN python3.10 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3.10 -m uv pip install --no-cache-dir \
torch==$MINIMUM_SUPPORTED_TORCH_VERSION \
torchvision==$MINIMUM_SUPPORTED_TORCHVISION_VERSION \
torchaudio==$MINIMUM_SUPPORTED_TORCHAUDIO_VERSION \
invisible_watermark && \
python3.10 -m pip install --no-cache-dir \
accelerate \
datasets \
hf-doc-builder \
huggingface-hub \
hf_transfer \
Jinja2 \
librosa \
numpy==1.26.4 \
scipy \
tensorboard \
transformers \
hf_transfer
CMD ["/bin/bash"]

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -17,12 +17,6 @@
title: AutoPipeline
- local: tutorials/basic_training
title: Train a diffusion model
- local: tutorials/using_peft_for_inference
title: Load LoRAs for inference
- local: tutorials/fast_diffusion
title: Accelerate inference of text-to-image diffusion models
- local: tutorials/inference_with_big_models
title: Working with big models
title: Tutorials
- sections:
- local: using-diffusers/loading
@@ -33,11 +27,24 @@
title: Load schedulers and models
- local: using-diffusers/other-formats
title: Model files and layouts
- local: using-diffusers/loading_adapters
title: Load adapters
- local: using-diffusers/push_to_hub
title: Push files to the Hub
title: Load pipelines and adapters
- sections:
- local: tutorials/using_peft_for_inference
title: LoRA
- local: using-diffusers/ip_adapter
title: IP-Adapter
- local: using-diffusers/controlnet
title: ControlNet
- local: using-diffusers/t2i_adapter
title: T2I-Adapter
- local: using-diffusers/dreambooth
title: DreamBooth
- local: using-diffusers/textual_inversion_inference
title: Textual inversion
title: Adapters
isExpanded: false
- sections:
- local: using-diffusers/unconditional_image_generation
title: Unconditional image generation
@@ -48,7 +55,7 @@
- local: using-diffusers/inpaint
title: Inpainting
- local: using-diffusers/text-img2vid
title: Text or image-to-video
title: Video generation
- local: using-diffusers/depth2img
title: Depth-to-image
title: Generative tasks
@@ -59,8 +66,6 @@
title: Create a server
- local: training/distributed_inference
title: Distributed inference
- local: using-diffusers/merge_loras
title: Merge LoRAs
- local: using-diffusers/scheduler_features
title: Scheduler features
- local: using-diffusers/callback
@@ -77,26 +82,30 @@
title: Outpainting
title: Advanced inference
- sections:
- local: using-diffusers/cogvideox
title: CogVideoX
- local: hybrid_inference/overview
title: Overview
- local: hybrid_inference/vae_decode
title: VAE Decode
- local: hybrid_inference/vae_encode
title: VAE Encode
- local: hybrid_inference/api_reference
title: API Reference
title: Hybrid Inference
- sections:
- local: using-diffusers/consisid
title: ConsisID
- local: using-diffusers/sdxl
title: Stable Diffusion XL
- local: using-diffusers/sdxl_turbo
title: SDXL Turbo
- local: using-diffusers/kandinsky
title: Kandinsky
- local: using-diffusers/ip_adapter
title: IP-Adapter
- local: using-diffusers/omnigen
title: OmniGen
- local: using-diffusers/pag
title: PAG
- local: using-diffusers/controlnet
title: ControlNet
- local: using-diffusers/t2i_adapter
title: T2I-Adapter
- local: using-diffusers/inference_with_lcm
title: Latent Consistency Model
- local: using-diffusers/textual_inversion_inference
title: Textual inversion
- local: using-diffusers/shap-e
title: Shap-E
- local: using-diffusers/diffedit
@@ -157,14 +166,22 @@
title: Getting Started
- local: quantization/bitsandbytes
title: bitsandbytes
- local: quantization/gguf
title: gguf
- local: quantization/torchao
title: torchao
- local: quantization/quanto
title: quanto
title: Quantization Methods
- sections:
- local: optimization/fp16
title: Speed up inference
title: Accelerate inference
- local: optimization/cache
title: Caching
- local: optimization/memory
title: Reduce memory usage
- local: optimization/torch2.0
title: PyTorch 2.0
- local: optimization/pruna
title: Pruna
- local: optimization/xformers
title: xFormers
- local: optimization/tome
@@ -175,6 +192,8 @@
title: TGATE
- local: optimization/xdit
title: xDiT
- local: optimization/para_attn
title: ParaAttention
- sections:
- local: using-diffusers/stable_diffusion_jax_how_to
title: JAX/Flax
@@ -189,7 +208,7 @@
- local: optimization/mps
title: Metal Performance Shaders (MPS)
- local: optimization/habana
title: Habana Gaudi
title: Intel Gaudi
- local: optimization/neuron
title: AWS Neuron
title: Optimized hardware
@@ -234,6 +253,8 @@
title: Textual Inversion
- local: api/loaders/unet
title: UNet
- local: api/loaders/transformer_sd3
title: SD3Transformer2D
- local: api/loaders/peft
title: PEFT
title: Loaders
@@ -241,67 +262,91 @@
sections:
- local: api/models/overview
title: Overview
- local: api/models/auto_model
title: AutoModel
- sections:
- local: api/models/controlnet
title: ControlNetModel
- local: api/models/controlnet_union
title: ControlNetUnionModel
- local: api/models/controlnet_flux
title: FluxControlNetModel
- local: api/models/controlnet_hunyuandit
title: HunyuanDiT2DControlNetModel
- local: api/models/controlnet_sana
title: SanaControlNetModel
- local: api/models/controlnet_sd3
title: SD3ControlNetModel
- local: api/models/controlnet_sparsectrl
title: SparseControlNetModel
- local: api/models/controlnet_union
title: ControlNetUnionModel
title: ControlNets
- sections:
- local: api/models/allegro_transformer3d
title: AllegroTransformer3DModel
- local: api/models/aura_flow_transformer2d
title: AuraFlowTransformer2DModel
- local: api/models/chroma_transformer
title: ChromaTransformer2DModel
- local: api/models/cogvideox_transformer3d
title: CogVideoXTransformer3DModel
- local: api/models/cogview3plus_transformer2d
title: CogView3PlusTransformer2DModel
- local: api/models/cogview4_transformer2d
title: CogView4Transformer2DModel
- local: api/models/consisid_transformer3d
title: ConsisIDTransformer3DModel
- local: api/models/cosmos_transformer3d
title: CosmosTransformer3DModel
- local: api/models/dit_transformer2d
title: DiTTransformer2DModel
- local: api/models/easyanimate_transformer3d
title: EasyAnimateTransformer3DModel
- local: api/models/flux_transformer
title: FluxTransformer2DModel
- local: api/models/hidream_image_transformer
title: HiDreamImageTransformer2DModel
- local: api/models/hunyuan_transformer2d
title: HunyuanDiT2DModel
- local: api/models/hunyuan_video_transformer_3d
title: HunyuanVideoTransformer3DModel
- local: api/models/latte_transformer3d
title: LatteTransformer3DModel
- local: api/models/lumina_nextdit2d
title: LuminaNextDiT2DModel
- local: api/models/ltx_video_transformer3d
title: LTXVideoTransformer3DModel
- local: api/models/lumina2_transformer2d
title: Lumina2Transformer2DModel
- local: api/models/lumina_nextdit2d
title: LuminaNextDiT2DModel
- local: api/models/mochi_transformer3d
title: MochiTransformer3DModel
- local: api/models/omnigen_transformer
title: OmniGenTransformer2DModel
- local: api/models/pixart_transformer2d
title: PixArtTransformer2DModel
- local: api/models/prior_transformer
title: PriorTransformer
- local: api/models/sd3_transformer2d
title: SD3Transformer2DModel
- local: api/models/sana_transformer2d
title: SanaTransformer2DModel
- local: api/models/sd3_transformer2d
title: SD3Transformer2DModel
- local: api/models/stable_audio_transformer
title: StableAudioDiTModel
- local: api/models/transformer2d
title: Transformer2DModel
- local: api/models/transformer_temporal
title: TransformerTemporalModel
- local: api/models/wan_transformer_3d
title: WanTransformer3DModel
title: Transformers
- sections:
- local: api/models/stable_cascade_unet
title: StableCascadeUNet
- local: api/models/unet
title: UNet1DModel
- local: api/models/unet2d
title: UNet2DModel
- local: api/models/unet2d-cond
title: UNet2DConditionModel
- local: api/models/unet2d
title: UNet2DModel
- local: api/models/unet3d-cond
title: UNet3DConditionModel
- local: api/models/unet-motion
@@ -310,20 +355,28 @@
title: UViT2DModel
title: UNets
- sections:
- local: api/models/asymmetricautoencoderkl
title: AsymmetricAutoencoderKL
- local: api/models/autoencoder_dc
title: AutoencoderDC
- local: api/models/autoencoderkl
title: AutoencoderKL
- local: api/models/autoencoderkl_allegro
title: AutoencoderKLAllegro
- local: api/models/autoencoderkl_cogvideox
title: AutoencoderKLCogVideoX
- local: api/models/autoencoderkl_cosmos
title: AutoencoderKLCosmos
- local: api/models/autoencoder_kl_hunyuan_video
title: AutoencoderKLHunyuanVideo
- local: api/models/autoencoderkl_ltx_video
title: AutoencoderKLLTXVideo
- local: api/models/autoencoderkl_magvit
title: AutoencoderKLMagvit
- local: api/models/autoencoderkl_mochi
title: AutoencoderKLMochi
- local: api/models/asymmetricautoencoderkl
title: AsymmetricAutoencoderKL
- local: api/models/autoencoder_dc
title: AutoencoderDC
- local: api/models/autoencoder_kl_wan
title: AutoencoderKLWan
- local: api/models/consistency_decoder_vae
title: ConsistencyDecoderVAE
- local: api/models/autoencoder_oobleck
@@ -356,10 +409,16 @@
title: AutoPipeline
- local: api/pipelines/blip_diffusion
title: BLIP-Diffusion
- local: api/pipelines/chroma
title: Chroma
- local: api/pipelines/cogvideox
title: CogVideoX
- local: api/pipelines/cogview3
title: CogView3
- local: api/pipelines/cogview4
title: CogView4
- local: api/pipelines/consisid
title: ConsisID
- local: api/pipelines/consistency_models
title: Consistency Models
- local: api/pipelines/controlnet
@@ -372,12 +431,16 @@
title: ControlNet with Stable Diffusion 3
- local: api/pipelines/controlnet_sdxl
title: ControlNet with Stable Diffusion XL
- local: api/pipelines/controlnet_sana
title: ControlNet-Sana
- local: api/pipelines/controlnetxs
title: ControlNet-XS
- local: api/pipelines/controlnetxs_sdxl
title: ControlNet-XS with Stable Diffusion XL
- local: api/pipelines/controlnet_union
title: ControlNetUnion
- local: api/pipelines/cosmos
title: Cosmos
- local: api/pipelines/dance_diffusion
title: Dance Diffusion
- local: api/pipelines/ddim
@@ -390,10 +453,20 @@
title: DiffEdit
- local: api/pipelines/dit
title: DiT
- local: api/pipelines/easyanimate
title: EasyAnimate
- local: api/pipelines/flux
title: Flux
- local: api/pipelines/control_flux_inpaint
title: FluxControlInpaint
- local: api/pipelines/framepack
title: Framepack
- local: api/pipelines/hidream
title: HiDream-I1
- local: api/pipelines/hunyuandit
title: Hunyuan-DiT
- local: api/pipelines/hunyuan_video
title: HunyuanVideo
- local: api/pipelines/i2vgenxl
title: I2VGen-XL
- local: api/pipelines/pix2pix
@@ -415,7 +488,9 @@
- local: api/pipelines/ledits_pp
title: LEDITS++
- local: api/pipelines/ltx_video
title: LTX
title: LTXVideo
- local: api/pipelines/lumina2
title: Lumina 2.0
- local: api/pipelines/lumina
title: Lumina-T2X
- local: api/pipelines/marigold
@@ -426,6 +501,8 @@
title: MultiDiffusion
- local: api/pipelines/musicldm
title: MusicLDM
- local: api/pipelines/omnigen
title: OmniGen
- local: api/pipelines/pag
title: PAG
- local: api/pipelines/paint_by_example
@@ -438,6 +515,8 @@
title: PixArt-Σ
- local: api/pipelines/sana
title: Sana
- local: api/pipelines/sana_sprint
title: Sana Sprint
- local: api/pipelines/self_attention_guidance
title: Self-Attention Guidance
- local: api/pipelines/semantic_stable_diffusion
@@ -451,40 +530,40 @@
- sections:
- local: api/pipelines/stable_diffusion/overview
title: Overview
- local: api/pipelines/stable_diffusion/text2img
title: Text-to-image
- local: api/pipelines/stable_diffusion/depth2img
title: Depth-to-image
- local: api/pipelines/stable_diffusion/gligen
title: GLIGEN (Grounded Language-to-Image Generation)
- local: api/pipelines/stable_diffusion/image_variation
title: Image variation
- local: api/pipelines/stable_diffusion/img2img
title: Image-to-image
- local: api/pipelines/stable_diffusion/svd
title: Image-to-video
- local: api/pipelines/stable_diffusion/inpaint
title: Inpainting
- local: api/pipelines/stable_diffusion/depth2img
title: Depth-to-image
- local: api/pipelines/stable_diffusion/image_variation
title: Image variation
- local: api/pipelines/stable_diffusion/k_diffusion
title: K-Diffusion
- local: api/pipelines/stable_diffusion/latent_upscale
title: Latent upscaler
- local: api/pipelines/stable_diffusion/ldm3d_diffusion
title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
- local: api/pipelines/stable_diffusion/stable_diffusion_safe
title: Safe Stable Diffusion
- local: api/pipelines/stable_diffusion/sdxl_turbo
title: SDXL Turbo
- local: api/pipelines/stable_diffusion/stable_diffusion_2
title: Stable Diffusion 2
- local: api/pipelines/stable_diffusion/stable_diffusion_3
title: Stable Diffusion 3
- local: api/pipelines/stable_diffusion/stable_diffusion_xl
title: Stable Diffusion XL
- local: api/pipelines/stable_diffusion/sdxl_turbo
title: SDXL Turbo
- local: api/pipelines/stable_diffusion/latent_upscale
title: Latent upscaler
- local: api/pipelines/stable_diffusion/upscale
title: Super-resolution
- local: api/pipelines/stable_diffusion/k_diffusion
title: K-Diffusion
- local: api/pipelines/stable_diffusion/ldm3d_diffusion
title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
- local: api/pipelines/stable_diffusion/adapter
title: T2I-Adapter
- local: api/pipelines/stable_diffusion/gligen
title: GLIGEN (Grounded Language-to-Image Generation)
- local: api/pipelines/stable_diffusion/text2img
title: Text-to-image
title: Stable Diffusion
- local: api/pipelines/stable_unclip
title: Stable unCLIP
@@ -498,6 +577,10 @@
title: UniDiffuser
- local: api/pipelines/value_guided_sampling
title: Value-guided sampling
- local: api/pipelines/visualcloze
title: VisualCloze
- local: api/pipelines/wan
title: Wan
- local: api/pipelines/wuerstchen
title: Wuerstchen
title: Pipelines
@@ -507,6 +590,10 @@
title: Overview
- local: api/schedulers/cm_stochastic_iterative
title: CMStochasticIterativeScheduler
- local: api/schedulers/ddim_cogvideox
title: CogVideoXDDIMScheduler
- local: api/schedulers/multistep_dpm_solver_cogvideox
title: CogVideoXDPMScheduler
- local: api/schedulers/consistency_decoder
title: ConsistencyDecoderScheduler
- local: api/schedulers/cosine_dpm
@@ -576,6 +663,8 @@
title: Attention Processor
- local: api/activations
title: Custom activation functions
- local: api/cache
title: Caching methods
- local: api/normalization
title: Custom normalization layers
- local: api/utilities

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -25,3 +25,16 @@ Customized activation functions for supporting various models in 🤗 Diffusers.
## ApproximateGELU
[[autodoc]] models.activations.ApproximateGELU
## SwiGLU
[[autodoc]] models.activations.SwiGLU
## FP32SiLU
[[autodoc]] models.activations.FP32SiLU
## LinearActivation
[[autodoc]] models.activations.LinearActivation

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -15,40 +15,152 @@ specific language governing permissions and limitations under the License.
An attention processor is a class for applying different types of attention mechanisms.
## AttnProcessor
[[autodoc]] models.attention_processor.AttnProcessor
## AttnProcessor2_0
[[autodoc]] models.attention_processor.AttnProcessor2_0
## AttnAddedKVProcessor
[[autodoc]] models.attention_processor.AttnAddedKVProcessor
## AttnAddedKVProcessor2_0
[[autodoc]] models.attention_processor.AttnAddedKVProcessor2_0
## CrossFrameAttnProcessor
[[autodoc]] pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor
[[autodoc]] models.attention_processor.AttnProcessorNPU
## CustomDiffusionAttnProcessor
[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor
## CustomDiffusionAttnProcessor2_0
[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor2_0
## CustomDiffusionXFormersAttnProcessor
[[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor
## FusedAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
## Allegro
[[autodoc]] models.attention_processor.AllegroAttnProcessor2_0
## AuraFlow
[[autodoc]] models.attention_processor.AuraFlowAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedAuraFlowAttnProcessor2_0
## CogVideoX
[[autodoc]] models.attention_processor.CogVideoXAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedCogVideoXAttnProcessor2_0
## CrossFrameAttnProcessor
[[autodoc]] pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor
## Custom Diffusion
[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor
[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor2_0
[[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor
## Flux
[[autodoc]] models.attention_processor.FluxAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedFluxAttnProcessor2_0
[[autodoc]] models.attention_processor.FluxSingleAttnProcessor2_0
## Hunyuan
[[autodoc]] models.attention_processor.HunyuanAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedHunyuanAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGHunyuanAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGCFGHunyuanAttnProcessor2_0
## IdentitySelfAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGIdentitySelfAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGCFGIdentitySelfAttnProcessor2_0
## IP-Adapter
[[autodoc]] models.attention_processor.IPAdapterAttnProcessor
[[autodoc]] models.attention_processor.IPAdapterAttnProcessor2_0
[[autodoc]] models.attention_processor.SD3IPAdapterJointAttnProcessor2_0
## JointAttnProcessor2_0
[[autodoc]] models.attention_processor.JointAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGJointAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGCFGJointAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedJointAttnProcessor2_0
## LoRA
[[autodoc]] models.attention_processor.LoRAAttnProcessor
[[autodoc]] models.attention_processor.LoRAAttnProcessor2_0
[[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor
[[autodoc]] models.attention_processor.LoRAXFormersAttnProcessor
## Lumina-T2X
[[autodoc]] models.attention_processor.LuminaAttnProcessor2_0
## Mochi
[[autodoc]] models.attention_processor.MochiAttnProcessor2_0
[[autodoc]] models.attention_processor.MochiVaeAttnProcessor2_0
## Sana
[[autodoc]] models.attention_processor.SanaLinearAttnProcessor2_0
[[autodoc]] models.attention_processor.SanaMultiscaleAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGCFGSanaLinearAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGIdentitySanaLinearAttnProcessor2_0
## Stable Audio
[[autodoc]] models.attention_processor.StableAudioAttnProcessor2_0
## SlicedAttnProcessor
[[autodoc]] models.attention_processor.SlicedAttnProcessor
## SlicedAttnAddedKVProcessor
[[autodoc]] models.attention_processor.SlicedAttnAddedKVProcessor
## XFormersAttnProcessor
[[autodoc]] models.attention_processor.XFormersAttnProcessor
## AttnProcessorNPU
[[autodoc]] models.attention_processor.AttnProcessorNPU
[[autodoc]] models.attention_processor.XFormersAttnAddedKVProcessor
## XLAFlashAttnProcessor2_0
[[autodoc]] models.attention_processor.XLAFlashAttnProcessor2_0
## XFormersJointAttnProcessor
[[autodoc]] models.attention_processor.XFormersJointAttnProcessor
## IPAdapterXFormersAttnProcessor
[[autodoc]] models.attention_processor.IPAdapterXFormersAttnProcessor
## FluxIPAdapterJointAttnProcessor2_0
[[autodoc]] models.attention_processor.FluxIPAdapterJointAttnProcessor2_0
## XLAFluxFlashAttnProcessor2_0
[[autodoc]] models.attention_processor.XLAFluxFlashAttnProcessor2_0

View File

@@ -0,0 +1,30 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# Caching methods
Cache methods speedup diffusion transformers by storing and reusing intermediate outputs of specific layers, such as attention and feedforward layers, instead of recalculating them at each inference step.
## CacheMixin
[[autodoc]] CacheMixin
## PyramidAttentionBroadcastConfig
[[autodoc]] PyramidAttentionBroadcastConfig
[[autodoc]] apply_pyramid_attention_broadcast
## FasterCacheConfig
[[autodoc]] FasterCacheConfig
[[autodoc]] apply_faster_cache

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -24,6 +24,12 @@ Learn how to load an IP-Adapter checkpoint and image in the IP-Adapter [loading]
[[autodoc]] loaders.ip_adapter.IPAdapterMixin
## SD3IPAdapterMixin
[[autodoc]] loaders.ip_adapter.SD3IPAdapterMixin
- all
- is_ip_adapter_active
## IPAdapterMaskProcessor
[[autodoc]] image_processor.IPAdapterMaskProcessor

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -17,7 +17,18 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
- [`StableDiffusionLoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`StableDiffusionLoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
- [`SD3LoraLoaderMixin`] provides similar functions for [Stable Diffusion 3](https://huggingface.co/blog/sd3).
- [`FluxLoraLoaderMixin`] provides similar functions for [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux).
- [`CogVideoXLoraLoaderMixin`] provides similar functions for [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox).
- [`Mochi1LoraLoaderMixin`] provides similar functions for [Mochi](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi).
- [`AuraFlowLoraLoaderMixin`] provides similar functions for [AuraFlow](https://huggingface.co/fal/AuraFlow).
- [`LTXVideoLoraLoaderMixin`] provides similar functions for [LTX-Video](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video).
- [`SanaLoraLoaderMixin`] provides similar functions for [Sana](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana).
- [`HunyuanVideoLoraLoaderMixin`] provides similar functions for [HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).
- [`Lumina2LoraLoaderMixin`] provides similar functions for [Lumina2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/lumina2).
- [`WanLoraLoaderMixin`] provides similar functions for [Wan](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan).
- [`CogView4LoraLoaderMixin`] provides similar functions for [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4).
- [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
- [`HiDreamImageLoraLoaderMixin`] provides similar functions for [HiDream Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream)
- [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.
<Tip>
@@ -38,10 +49,57 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
[[autodoc]] loaders.lora_pipeline.SD3LoraLoaderMixin
## FluxLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.FluxLoraLoaderMixin
## CogVideoXLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.CogVideoXLoraLoaderMixin
## Mochi1LoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.Mochi1LoraLoaderMixin
## AuraFlowLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.AuraFlowLoraLoaderMixin
## LTXVideoLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.LTXVideoLoraLoaderMixin
## SanaLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.SanaLoraLoaderMixin
## HunyuanVideoLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.HunyuanVideoLoraLoaderMixin
## Lumina2LoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.Lumina2LoraLoaderMixin
## CogView4LoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.CogView4LoraLoaderMixin
## WanLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.WanLoraLoaderMixin
## AmusedLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin
## HiDreamImageLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.HiDreamImageLoraLoaderMixin
## LoraBaseMixin
[[autodoc]] loaders.lora_base.LoraBaseMixin
[[autodoc]] loaders.lora_base.LoraBaseMixin
## WanLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.WanLoraLoaderMixin

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,29 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# SD3Transformer2D
This class is useful when *only* loading weights into a [`SD3Transformer2DModel`]. If you need to load weights into the text encoder or a text encoder and SD3Transformer2DModel, check [`SD3LoraLoaderMixin`](lora#diffusers.loaders.SD3LoraLoaderMixin) class instead.
The [`SD3Transformer2DLoadersMixin`] class currently only loads IP-Adapter weights, but will be used in the future to save weights and load LoRAs.
<Tip>
To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
</Tip>
## SD3Transformer2DLoadersMixin
[[autodoc]] loaders.transformer_sd3.SD3Transformer2DLoadersMixin
- all
- _load_ip_adapter_weights

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
```python
from diffusers import AllegroTransformer3DModel
vae = AllegroTransformer3DModel.from_pretrained("rhymes-ai/Allegro", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
transformer = AllegroTransformer3DModel.from_pretrained("rhymes-ai/Allegro", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
```
## AllegroTransformer3DModel

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# AsymmetricAutoencoderKL
Improved larger variational autoencoder (VAE) model with KL loss for inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://arxiv.org/abs/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.
Improved larger variational autoencoder (VAE) model with KL loss for inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://huggingface.co/papers/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.
The abstract from the paper is:

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,29 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# AutoModel
The `AutoModel` is designed to make it easy to load a checkpoint without needing to know the specific model class. `AutoModel` automatically retrieves the correct model class from the checkpoint `config.json` file.
```python
from diffusers import AutoModel, AutoPipelineForText2Image
unet = AutoModel.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet")
pipe = AutoPipelineForText2Image.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", unet=unet)
```
## AutoModel
[[autodoc]] AutoModel
- all
- from_pretrained

View File

@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -29,6 +29,8 @@ The following DCAE models are released and supported in Diffusers.
| [`mit-han-lab/dc-ae-f128c512-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0)
| [`mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0)
This model was contributed by [lawrence-cj](https://github.com/lawrence-cj).
Load a model in Diffusers format with [`~ModelMixin.from_pretrained`].
```python

View File

@@ -0,0 +1,32 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderKLHunyuanVideo
The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanVideo](https://github.com/Tencent/HunyuanVideo/), which was introduced in [HunyuanVideo: A Systematic Framework For Large Video Generative Models](https://huggingface.co/papers/2412.03603) by Tencent.
The model can be loaded with the following code snippet.
```python
from diffusers import AutoencoderKLHunyuanVideo
vae = AutoencoderKLHunyuanVideo.from_pretrained("hunyuanvideo-community/HunyuanVideo", subfolder="vae", torch_dtype=torch.float16)
```
## AutoencoderKLHunyuanVideo
[[autodoc]] AutoencoderKLHunyuanVideo
- decode
- all
## DecoderOutput
[[autodoc]] models.autoencoders.vae.DecoderOutput

View File

@@ -0,0 +1,32 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderKLWan
The 3D variational autoencoder (VAE) model with KL loss used in [Wan 2.1](https://github.com/Wan-Video/Wan2.1) by the Alibaba Wan Team.
The model can be loaded with the following code snippet.
```python
from diffusers import AutoencoderKLWan
vae = AutoencoderKLWan.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", subfolder="vae", torch_dtype=torch.float32)
```
## AutoencoderKLWan
[[autodoc]] AutoencoderKLWan
- decode
- all
## DecoderOutput
[[autodoc]] models.autoencoders.vae.DecoderOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# AutoencoderKL
The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.
The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://huggingface.co/papers/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.
The abstract from the paper is:

View File

@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
```python
from diffusers import AutoencoderKLAllegro
vae = AutoencoderKLCogVideoX.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda")
vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda")
```
## AutoencoderKLAllegro

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,40 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderKLCosmos
[Cosmos Tokenizers](https://github.com/NVIDIA/Cosmos-Tokenizer).
Supported models:
- [nvidia/Cosmos-1.0-Tokenizer-CV8x8x8](https://huggingface.co/nvidia/Cosmos-1.0-Tokenizer-CV8x8x8)
The model can be loaded with the following code snippet.
```python
from diffusers import AutoencoderKLCosmos
vae = AutoencoderKLCosmos.from_pretrained("nvidia/Cosmos-1.0-Tokenizer-CV8x8x8", subfolder="vae")
```
## AutoencoderKLCosmos
[[autodoc]] AutoencoderKLCosmos
- decode
- encode
- all
## AutoencoderKLOutput
[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
## DecoderOutput
[[autodoc]] models.autoencoders.vae.DecoderOutput

View File

@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
```python
from diffusers import AutoencoderKLLTXVideo
vae = AutoencoderKLLTXVideo.from_pretrained("TODO/TODO", subfolder="vae", torch_dtype=torch.float32).to("cuda")
vae = AutoencoderKLLTXVideo.from_pretrained("Lightricks/LTX-Video", subfolder="vae", torch_dtype=torch.float32).to("cuda")
```
## AutoencoderKLLTXVideo

View File

@@ -0,0 +1,37 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderKLMagvit
The 3D variational autoencoder (VAE) model with KL loss used in [EasyAnimate](https://github.com/aigc-apps/EasyAnimate) was introduced by Alibaba PAI.
The model can be loaded with the following code snippet.
```python
from diffusers import AutoencoderKLMagvit
vae = AutoencoderKLMagvit.from_pretrained("alibaba-pai/EasyAnimateV5.1-12b-zh", subfolder="vae", torch_dtype=torch.float16).to("cuda")
```
## AutoencoderKLMagvit
[[autodoc]] AutoencoderKLMagvit
- decode
- encode
- all
## AutoencoderKLOutput
[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
## DecoderOutput
[[autodoc]] models.autoencoders.vae.DecoderOutput

View File

@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,19 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ChromaTransformer2DModel
A modified flux Transformer model from [Chroma](https://huggingface.co/lodestones/Chroma)
## ChromaTransformer2DModel
[[autodoc]] ChromaTransformer2DModel

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
```python
from diffusers import CogVideoXTransformer3DModel
vae = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-2b", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-2b", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
```
## CogVideoXTransformer3DModel

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
```python
from diffusers import CogView3PlusTransformer2DModel
vae = CogView3PlusTransformer2DModel.from_pretrained("THUDM/CogView3Plus-3b", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
transformer = CogView3PlusTransformer2DModel.from_pretrained("THUDM/CogView3Plus-3b", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
```
## CogView3PlusTransformer2DModel

View File

@@ -0,0 +1,30 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# CogView4Transformer2DModel
A Diffusion Transformer model for 2D data from [CogView4]()
The model can be loaded with the following code snippet.
```python
from diffusers import CogView4Transformer2DModel
transformer = CogView4Transformer2DModel.from_pretrained("THUDM/CogView4-6B", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
```
## CogView4Transformer2DModel
[[autodoc]] CogView4Transformer2DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

View File

@@ -0,0 +1,30 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# ConsisIDTransformer3DModel
A Diffusion Transformer model for 3D data from [ConsisID](https://github.com/PKU-YuanGroup/ConsisID) was introduced in [Identity-Preserving Text-to-Video Generation by Frequency Decomposition](https://huggingface.co/papers/2411.17440) by Peking University & University of Rochester & etc.
The model can be loaded with the following code snippet.
```python
from diffusers import ConsisIDTransformer3DModel
transformer = ConsisIDTransformer3DModel.from_pretrained("BestWishYsh/ConsisID-preview", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
```
## ConsisIDTransformer3DModel
[[autodoc]] ConsisIDTransformer3DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team and The InstantX Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# HunyuanDiT2DControlNetModel
HunyuanDiT2DControlNetModel is an implementation of ControlNet for [Hunyuan-DiT](https://arxiv.org/abs/2405.08748).
HunyuanDiT2DControlNetModel is an implementation of ControlNet for [Hunyuan-DiT](https://huggingface.co/papers/2405.08748).
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.

View File

@@ -0,0 +1,29 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# SanaControlNetModel
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
The abstract from the paper is:
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
This model was contributed by [ishan24](https://huggingface.co/ishan24). ❤️
The original codebase can be found at [NVlabs/Sana](https://github.com/NVlabs/Sana), and you can find official ControlNet checkpoints on [Efficient-Large-Model's](https://huggingface.co/Efficient-Large-Model) Hub profile.
## SanaControlNetModel
[[autodoc]] SanaControlNetModel
## SanaControlNetOutput
[[autodoc]] models.controlnets.controlnet_sana.SanaControlNetOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team and The InstantX Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -11,11 +11,11 @@ specific language governing permissions and limitations under the License. -->
# SparseControlNetModel
SparseControlNetModel is an implementation of ControlNet for [AnimateDiff](https://arxiv.org/abs/2307.04725).
SparseControlNetModel is an implementation of ControlNet for [AnimateDiff](https://huggingface.co/papers/2307.04725).
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
The SparseCtrl version of ControlNet was introduced in [SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models](https://arxiv.org/abs/2311.16933) for achieving controlled generation in text-to-video diffusion models by Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai.
The SparseCtrl version of ControlNet was introduced in [SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models](https://huggingface.co/papers/2311.16933) for achieving controlled generation in text-to-video diffusion models by Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai.
The abstract from the paper is:

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team and The InstantX Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,30 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# CosmosTransformer3DModel
A Diffusion Transformer model for 3D video-like data was introduced in [Cosmos World Foundation Model Platform for Physical AI](https://huggingface.co/papers/2501.03575) by NVIDIA.
The model can be loaded with the following code snippet.
```python
from diffusers import CosmosTransformer3DModel
transformer = CosmosTransformer3DModel.from_pretrained("nvidia/Cosmos-1.0-Diffusion-7B-Text2World", subfolder="transformer", torch_dtype=torch.bfloat16)
```
## CosmosTransformer3DModel
[[autodoc]] CosmosTransformer3DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,30 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# EasyAnimateTransformer3DModel
A Diffusion Transformer model for 3D data from [EasyAnimate](https://github.com/aigc-apps/EasyAnimate) was introduced by Alibaba PAI.
The model can be loaded with the following code snippet.
```python
from diffusers import EasyAnimateTransformer3DModel
transformer = EasyAnimateTransformer3DModel.from_pretrained("alibaba-pai/EasyAnimateV5.1-12b-zh", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
```
## EasyAnimateTransformer3DModel
[[autodoc]] EasyAnimateTransformer3DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,46 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# HiDreamImageTransformer2DModel
A Transformer model for image-like data from [HiDream-I1](https://huggingface.co/HiDream-ai).
The model can be loaded with the following code snippet.
```python
from diffusers import HiDreamImageTransformer2DModel
transformer = HiDreamImageTransformer2DModel.from_pretrained("HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16)
```
## Loading GGUF quantized checkpoints for HiDream-I1
GGUF checkpoints for the `HiDreamImageTransformer2DModel` can be loaded using `~FromOriginalModelMixin.from_single_file`
```python
import torch
from diffusers import GGUFQuantizationConfig, HiDreamImageTransformer2DModel
ckpt_path = "https://huggingface.co/city96/HiDream-I1-Dev-gguf/blob/main/hidream-i1-dev-Q2_K.gguf"
transformer = HiDreamImageTransformer2DModel.from_single_file(
ckpt_path,
quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
torch_dtype=torch.bfloat16
)
```
## HiDreamImageTransformer2DModel
[[autodoc]] HiDreamImageTransformer2DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,30 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# HunyuanVideoTransformer3DModel
A Diffusion Transformer model for 3D video-like data was introduced in [HunyuanVideo: A Systematic Framework For Large Video Generative Models](https://huggingface.co/papers/2412.03603) by Tencent.
The model can be loaded with the following code snippet.
```python
from diffusers import HunyuanVideoTransformer3DModel
transformer = HunyuanVideoTransformer3DModel.from_pretrained("hunyuanvideo-community/HunyuanVideo", subfolder="transformer", torch_dtype=torch.bfloat16)
```
## HunyuanVideoTransformer3DModel
[[autodoc]] HunyuanVideoTransformer3DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
```python
from diffusers import LTXVideoTransformer3DModel
transformer = LTXVideoTransformer3DModel.from_pretrained("TODO/TODO", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
transformer = LTXVideoTransformer3DModel.from_pretrained("Lightricks/LTX-Video", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
```
## LTXVideoTransformer3DModel

View File

@@ -0,0 +1,30 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# Lumina2Transformer2DModel
A Diffusion Transformer model for 3D video-like data was introduced in [Lumina Image 2.0](https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0) by Alpha-VLLM.
The model can be loaded with the following code snippet.
```python
from diffusers import Lumina2Transformer2DModel
transformer = Lumina2Transformer2DModel.from_pretrained("Alpha-VLLM/Lumina-Image-2.0", subfolder="transformer", torch_dtype=torch.bfloat16)
```
## Lumina2Transformer2DModel
[[autodoc]] Lumina2Transformer2DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
```python
from diffusers import MochiTransformer3DModel
vae = MochiTransformer3DModel.from_pretrained("genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
transformer = MochiTransformer3DModel.from_pretrained("genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
```
## MochiTransformer3DModel

View File

@@ -0,0 +1,30 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# OmniGenTransformer2DModel
A Transformer model that accepts multimodal instructions to generate images for [OmniGen](https://github.com/VectorSpaceLab/OmniGen/).
The abstract from the paper is:
*The emergence of Large Language Models (LLMs) has unified language generation tasks and revolutionized human-machine interaction. However, in the realm of image generation, a unified model capable of handling various tasks within a single framework remains largely unexplored. In this work, we introduce OmniGen, a new diffusion model for unified image generation. OmniGen is characterized by the following features: 1) Unification: OmniGen not only demonstrates text-to-image generation capabilities but also inherently supports various downstream tasks, such as image editing, subject-driven generation, and visual conditional generation. 2) Simplicity: The architecture of OmniGen is highly simplified, eliminating the need for additional plugins. Moreover, compared to existing diffusion models, it is more user-friendly and can complete complex tasks end-to-end through instructions without the need for extra intermediate steps, greatly simplifying the image generation workflow. 3) Knowledge Transfer: Benefit from learning in a unified format, OmniGen effectively transfers knowledge across different tasks, manages unseen tasks and domains, and exhibits novel capabilities. We also explore the models reasoning capabilities and potential applications of the chain-of-thought mechanism. This work represents the first attempt at a general-purpose image generation model, and we will release our resources at https://github.com/VectorSpaceLab/OmniGen to foster future advancements.*
```python
import torch
from diffusers import OmniGenTransformer2DModel
transformer = OmniGenTransformer2DModel.from_pretrained("Shitao/OmniGen-v1-diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
```
## OmniGenTransformer2DModel
[[autodoc]] OmniGenTransformer2DModel

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -22,7 +22,7 @@ The model can be loaded with the following code snippet.
```python
from diffusers import SanaTransformer2DModel
transformer = SanaTransformer2DModel.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_diffusers", subfolder="transformer", torch_dtype=torch.float16)
transformer = SanaTransformer2DModel.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
```
## SanaTransformer2DModel

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -0,0 +1,30 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# WanTransformer3DModel
A Diffusion Transformer model for 3D video-like data was introduced in [Wan 2.1](https://github.com/Wan-Video/Wan2.1) by the Alibaba Wan Team.
The model can be loaded with the following code snippet.
```python
from diffusers import WanTransformer3DModel
transformer = WanTransformer3DModel.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
```
## WanTransformer3DModel
[[autodoc]] WanTransformer3DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -29,3 +29,43 @@ Customized normalization layers for supporting various models in 🤗 Diffusers.
## AdaGroupNorm
[[autodoc]] models.normalization.AdaGroupNorm
## AdaLayerNormContinuous
[[autodoc]] models.normalization.AdaLayerNormContinuous
## RMSNorm
[[autodoc]] models.normalization.RMSNorm
## GlobalResponseNorm
[[autodoc]] models.normalization.GlobalResponseNorm
## LuminaLayerNormContinuous
[[autodoc]] models.normalization.LuminaLayerNormContinuous
## SD35AdaLayerNormZeroX
[[autodoc]] models.normalization.SD35AdaLayerNormZeroX
## AdaLayerNormZeroSingle
[[autodoc]] models.normalization.AdaLayerNormZeroSingle
## LuminaRMSNormZero
[[autodoc]] models.normalization.LuminaRMSNormZero
## LpNorm
[[autodoc]] models.normalization.LpNorm
## CogView3PlusAdaLayerNormZeroTextImage
[[autodoc]] models.normalization.CogView3PlusAdaLayerNormZeroTextImage
## CogVideoXLayerNormZero
[[autodoc]] models.normalization.CogVideoXLayerNormZero
## MochiRMSNormZero
[[autodoc]] models.transformers.transformer_mochi.MochiRMSNormZero
## MochiRMSNorm
[[autodoc]] models.normalization.MochiRMSNorm

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

View File

@@ -1,4 +1,4 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -19,10 +19,55 @@ The abstract from the paper is:
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## Quantization
Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have varying impact on video quality depending on the video model.
Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`AllegroPipeline`] for inference with bitsandbytes.
```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AllegroTransformer3DModel, AllegroPipeline
from diffusers.utils import export_to_video
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel
quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
"rhymes-ai/Allegro",
subfolder="text_encoder",
quantization_config=quant_config,
torch_dtype=torch.float16,
)
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = AllegroTransformer3DModel.from_pretrained(
"rhymes-ai/Allegro",
subfolder="transformer",
quantization_config=quant_config,
torch_dtype=torch.float16,
)
pipeline = AllegroPipeline.from_pretrained(
"rhymes-ai/Allegro",
text_encoder=text_encoder_8bit,
transformer=transformer_8bit,
torch_dtype=torch.float16,
device_map="balanced",
)
prompt = (
"A seaside harbor with bright sunlight and sparkling seawater, with many boats in the water. From an aerial view, "
"the boats vary in size and color, some moving and some stationary. Fishing boats in the water suggest that this "
"location might be a popular spot for docking fishing boats."
)
video = pipeline(prompt, guidance_scale=7.5, max_sequence_length=512).frames[0]
export_to_video(video, "harbor.mp4", fps=15)
```
## AllegroPipeline
[[autodoc]] AllegroPipeline

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.
aMUSEd was introduced in [aMUSEd: An Open MUSE Reproduction](https://huggingface.co/papers/2401.01808) by Suraj Patil, William Berman, Robin Rombach, and Patrick von Platen.
Amused is a lightweight text to image model based off of the [MUSE](https://arxiv.org/abs/2301.00704) architecture. Amused is particularly useful in applications that require a lightweight and fast model such as generating many images quickly at once.
Amused is a lightweight text to image model based off of the [MUSE](https://huggingface.co/papers/2301.00704) architecture. Amused is particularly useful in applications that require a lightweight and fast model such as generating many images quickly at once.
Amused is a vqvae token based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with muse, it uses the smaller text encoder CLIP-L/14 instead of t5-xxl. Due to its small parameter count and few forward pass generation process, amused can generate many images quickly. This benefit is seen particularly at larger batch sizes.

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,9 +12,13 @@ specific language governing permissions and limitations under the License.
# Text-to-Video Generation with AnimateDiff
<div class="flex flex-wrap space-x-1">
<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>
## Overview
[AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning](https://arxiv.org/abs/2307.04725) by Yuwei Guo, Ceyuan Yang, Anyi Rao, Yaohui Wang, Yu Qiao, Dahua Lin, Bo Dai.
[AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning](https://huggingface.co/papers/2307.04725) by Yuwei Guo, Ceyuan Yang, Anyi Rao, Yaohui Wang, Yu Qiao, Dahua Lin, Bo Dai.
The abstract of the paper is the following:
@@ -183,7 +187,7 @@ Here are some sample outputs:
### AnimateDiffSparseControlNetPipeline
[SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models](https://arxiv.org/abs/2311.16933) for achieving controlled generation in text-to-video diffusion models by Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai.
[SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models](https://huggingface.co/papers/2311.16933) for achieving controlled generation in text-to-video diffusion models by Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai.
The abstract from the paper is:
@@ -747,7 +751,7 @@ export_to_gif(frames, "animation.gif")
## Using FreeInit
[FreeInit: Bridging Initialization Gap in Video Diffusion Models](https://arxiv.org/abs/2312.07537) by Tianxing Wu, Chenyang Si, Yuming Jiang, Ziqi Huang, Ziwei Liu.
[FreeInit: Bridging Initialization Gap in Video Diffusion Models](https://huggingface.co/papers/2312.07537) by Tianxing Wu, Chenyang Si, Yuming Jiang, Ziqi Huang, Ziwei Liu.
FreeInit is an effective method that improves temporal consistency and overall quality of videos generated using video-diffusion-models without any addition training. It can be applied to AnimateDiff, ModelScope, VideoCrafter and various other video generation models seamlessly at inference time, and works by iteratively refining the latent-initialization noise. More details can be found it the paper.
@@ -803,7 +807,7 @@ FreeInit is not really free - the improved quality comes at the cost of extra co
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
@@ -916,7 +920,7 @@ export_to_gif(frames, "animatelcm-motion-lora.gif")
## Using FreeNoise
[FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling](https://arxiv.org/abs/2310.15169) by Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu.
[FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling](https://huggingface.co/papers/2310.15169) by Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu.
FreeNoise is a sampling mechanism that can generate longer videos with short-video generation models by employing noise-rescheduling, temporal attention over sliding windows, and weighted averaging of latent frames. It also can be used with multiple prompts to allow for interpolated video generations. More details are available in the paper.
@@ -962,7 +966,7 @@ pipe.to("cuda")
prompt = {
0: "A caterpillar on a leaf, high quality, photorealistic",
40: "A caterpillar transforming into a cocoon, on a leaf, near flowers, photorealistic",
80: "A cocoon on a leaf, flowers in the backgrond, photorealistic",
80: "A cocoon on a leaf, flowers in the background, photorealistic",
120: "A cocoon maturing and a butterfly being born, flowers and leaves visible in the background, photorealistic",
160: "A beautiful butterfly, vibrant colors, sitting on a leaf, flowers in the background, photorealistic",
200: "A beautiful butterfly, flying away in a forest, photorealistic",

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -22,7 +22,7 @@ You can find additional information about Attend-and-Excite on the [project page
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>

View File

@@ -1,4 +1,4 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -37,7 +37,7 @@ During inference:
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>

Some files were not shown because too many files have changed in this diff Show More