* gracefully error out when attn-backend x cp combo isn't supported.
* Revert "gracefully error out when attn-backend x cp combo isn't supported."
This reverts commit c8abb5d7c0.
* gracefully error out when attn-backend x cp combo isn't supported.
* up
* address PR feedback.
* up
* Update src/diffusers/models/modeling_utils.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* dot.
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* LTX 2 transformer single file support
* LTX 2 video VAE single file support
* LTX 2 audio VAE single file support
* Make it easier to distinguish LTX 1 and 2 models
* Improve incorrect LoRA format error message
* Add flag in PeftLoraLoaderMixinTests to disable text encoder LoRA tests
* Apply changes to LTX2LoraTests
* Further improve incorrect LoRA format error msg following review
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fix QwenImage txt_seq_lens handling
* formatting
* formatting
* remove txt_seq_lens and use bool mask
* use compute_text_seq_len_from_mask
* add seq_lens to dispatch_attention_fn
* use joint_seq_lens
* remove unused index_block
* WIP: Remove seq_lens parameter and use mask-based approach
- Remove seq_lens parameter from dispatch_attention_fn
- Update varlen backends to extract seqlens from masks
- Update QwenImage to pass 2D joint_attention_mask
- Fix native backend to handle 2D boolean masks
- Fix sage_varlen seqlens_q to match seqlens_k for self-attention
Note: sage_varlen still producing black images, needs further investigation
* fix formatting
* undo sage changes
* xformers support
* hub fix
* fix torch compile issues
* fix tests
* use _prepare_attn_mask_native
* proper deprecation notice
* add deprecate to txt_seq_lens
* Update src/diffusers/models/transformers/transformer_qwenimage.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Update src/diffusers/models/transformers/transformer_qwenimage.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Only create the mask if there's actual padding
* fix order of docstrings
* Adds performance benchmarks and optimization details for QwenImage
Enhances documentation with comprehensive performance insights for QwenImage pipeline:
* rope_text_seq_len = text_seq_len
* rename to max_txt_seq_len
* removed deprecated args
* undo unrelated change
* Updates QwenImage performance documentation
Removes detailed attention backend benchmarks and simplifies torch.compile performance description
Focuses on key performance improvement with torch.compile, highlighting the specific speedup from 4.70s to 1.93s on an A100 GPU
Streamlines the documentation to provide more concise and actionable performance insights
* Updates deprecation warnings for txt_seq_lens parameter
Extends deprecation timeline for txt_seq_lens from version 0.37.0 to 0.39.0 across multiple Qwen image-related models
Adds a new unit test to verify the deprecation warning behavior for the txt_seq_lens parameter
* fix compile
* formatting
* fix compile tests
* rename helper
* remove duplicate
* smaller values
* removed
* use torch.cond for torch compile
* Construct joint attention mask once
* test different backends
* construct joint attention mask once to avoid reconstructing in every block
* Update src/diffusers/models/attention_dispatch.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* formatting
* raising an error from the EditPlus pipeline when batch_size > 1
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: cdutr <dutra_carlos@hotmail.com>
* Basic implementation of request scheduling
* Basic editing in SD and Flux Pipelines
* Small Fix
* Fix
* Update for more pipelines
* Add examples/server-async
* Add examples/server-async
* Updated RequestScopedPipeline to handle a single tokenizer lock to avoid race conditions
* Fix
* Fix _TokenizerLockWrapper
* Fix _TokenizerLockWrapper
* Delete _TokenizerLockWrapper
* Fix tokenizer
* Update examples/server-async
* Fix server-async
* Optimizations in examples/server-async
* We keep the implementation simple in examples/server-async
* Update examples/server-async/README.md
* Update examples/server-async/README.md for changes to tokenizer locks and backward-compatible retrieve_timesteps
* The changes to the diffusers core have been undone and all logic is being moved to exmaples/server-async
* Update examples/server-async/utils/*
* Fix BaseAsyncScheduler
* Rollback in the core of the diffusers
* Update examples/server-async/README.md
* Complete rollback of diffusers core files
* Simple implementation of an asynchronous server compatible with SD3-3.5 and Flux Pipelines
* Update examples/server-async/README.md
* Fixed import errors in 'examples/server-async/serverasync.py'
* Flux Pipeline Discard
* Update examples/server-async/README.md
* Apply style fixes
* Add thread-safe wrappers for components in pipeline
Refactor requestscopedpipeline.py to add thread-safe wrappers for tokenizer, VAE, and image processor. Introduce locking mechanisms to ensure thread safety during concurrent access.
* Add wrappers.py
* Apply style fixes
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Store vae.config.scaling_factor to prevent missing attr reference
In sdxl advanced dreambooth training script
vae.config.scaling_factor becomes inaccessible after: del vae
when: --cache_latents, and no --validation_prompt
Co-authored-by: Teriks <Teriks@users.noreply.github.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* 3 files
* add conditoinal pipeline
* refactor qwen modular
* add layered
* up up
* u p
* add to import
* more refacotr, make layer work
* clean up a bit git add src
* more
* style
* style
* Adding torchao version guard for floatx usage
Summary: TorchAO removing floatx support, added version guard in quantization_config.py
* Adding torchao version guard for floatx usage
Summary: TorchAO removing floatx support, added version guard in quantization_config.py
Altered tests in test_torchao.py to version guard floatx
Created new test to verify version guard of floatx support
* Adding torchao version guard for floatx usage
Summary: TorchAO removing floatx support, added version guard in quantization_config.py
Altered tests in test_torchao.py to version guard floatx
Created new test to verify version guard of floatx support
* Adding torchao version guard for floatx usage
Summary: TorchAO removing floatx support, added version guard in quantization_config.py
Altered tests in test_torchao.py to version guard floatx
Created new test to verify version guard of floatx support
* Adding torchao version guard for floatx usage
Summary: TorchAO removing floatx support, added version guard in quantization_config.py
Altered tests in test_torchao.py to version guard floatx
Created new test to verify version guard of floatx support
* Adding torchao version guard for floatx usage
Summary: TorchAO removing floatx support, added version guard in quantization_config.py
Altered tests in test_torchao.py to version guard floatx
Created new test to verify version guard of floatx support
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Initial LTX 2.0 transformer implementation
* Add tests for LTX 2 transformer model
* Get LTX 2 transformer tests working
* Rename LTX 2 compile test class to have LTX2
* Remove RoPE debug print statements
* Get LTX 2 transformer compile tests passing
* Fix LTX 2 transformer shape errors
* Initial script to convert LTX 2 transformer to diffusers
* Add more LTX 2 transformer audio arguments
* Allow LTX 2 transformer to be loaded from local path for conversion
* Improve dummy inputs and add test for LTX 2 transformer consistency
* Fix LTX 2 transformer bugs so consistency test passes
* Initial implementation of LTX 2.0 video VAE
* Explicitly specify temporal and spatial VAE scale factors when converting
* Add initial LTX 2.0 video VAE tests
* Add initial LTX 2.0 video VAE tests (part 2)
* Get diffusers implementation on par with official LTX 2.0 video VAE implementation
* Initial LTX 2.0 vocoder implementation
* Use RMSNorm implementation closer to original for LTX 2.0 video VAE
* start audio decoder.
* init registration.
* up
* simplify and clean up
* up
* Initial LTX 2.0 text encoder implementation
* Rough initial LTX 2.0 pipeline implementation
* up
* up
* up
* up
* Add imports for LTX 2.0 Audio VAE
* Conversion script for LTX 2.0 Audio VAE Decoder
* Add Audio VAE logic to T2V pipeline
* Duplicate scheduler for audio latents
* Support num_videos_per_prompt for prompt embeddings
* LTX 2.0 scheduler and full pipeline conversion
* Add script to test full LTX2Pipeline T2V inference
* Fix pipeline return bugs
* Add LTX 2 text encoder and vocoder to ltx2 subdirectory __init__
* Fix more bugs in LTX2Pipeline.__call__
* Improve CPU offload support
* Fix pipeline audio VAE decoding dtype bug
* Fix video shape error in full pipeline test script
* Get LTX 2 T2V pipeline to produce reasonable outputs
* Make LTX 2.0 scheduler more consistent with original code
* Fix typo when applying scheduler fix in T2V inference script
* Refactor Audio VAE to be simpler and remove helpers (#7)
* remove resolve causality axes stuff.
* remove a bunch of helpers.
* remove adjust output shape helper.
* remove the use of audiolatentshape.
* move normalization and patchify out of pipeline.
* fix
* up
* up
* Remove unpatchify and patchify ops before audio latents denormalization (#9)
---------
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Add support for I2V (#8)
* start i2v.
* up
* up
* up
* up
* up
* remove uniform strategy code.
* remove unneeded code.
* Denormalize audio latents in I2V pipeline (analogous to T2V change) (#11)
* test i2v.
* Move Video and Audio Text Encoder Connectors to Transformer (#12)
* Denormalize audio latents in I2V pipeline (analogous to T2V change)
* Initial refactor to put video and audio text encoder connectors in transformer
* Get LTX 2 transformer tests working after connector refactor
* precompute run_connectors,.
* fixes
* Address review comments
* Calculate RoPE double precisions freqs using torch instead of np
* Further simplify LTX 2 RoPE freq calc
* Make connectors a separate module (#18)
* remove text_encoder.py
* address yiyi's comments.
* up
* up
* up
* up
---------
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
* up (#19)
* address initial feedback from lightricks team (#16)
* cross_attn_timestep_scale_multiplier to 1000
* implement split rope type.
* up
* propagate rope_type to rope embed classes as well.
* up
* When using split RoPE, make sure that the output dtype is same as input dtype
* Fix apply split RoPE shape error when reshaping x to 4D
* Add export_utils file for exporting LTX 2.0 videos with audio
* Tests for T2V and I2V (#6)
* add ltx2 pipeline tests.
* up
* up
* up
* up
* remove content
* style
* Denormalize audio latents in I2V pipeline (analogous to T2V change)
* Initial refactor to put video and audio text encoder connectors in transformer
* Get LTX 2 transformer tests working after connector refactor
* up
* up
* i2v tests.
* up
* Address review comments
* Calculate RoPE double precisions freqs using torch instead of np
* Further simplify LTX 2 RoPE freq calc
* revert unneded changes.
* up
* up
* update to split style rope.
* up
---------
Co-authored-by: Daniel Gu <dgu8957@gmail.com>
* up
* use export util funcs.
* Point original checkpoint to LTX 2.0 official checkpoint
* Allow the I2V pipeline to accept image URLs
* make style and make quality
* remove function map.
* remove args.
* update docs.
* update doc entries.
* disable ltx2_consistency test
* Simplify LTX 2 RoPE forward by removing coords is None logic
* make style and make quality
* Support LTX 2.0 audio VAE encoder
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Remove print statement in audio VAE
* up
* Fix bug when calculating audio RoPE coords
* Ltx 2 latent upsample pipeline (#12922)
* Initial implementation of LTX 2.0 latent upsampling pipeline
* Add new LTX 2.0 spatial latent upsampler logic
* Add test script for LTX 2.0 latent upsampling
* Add option to enable VAE tiling in upsampling test script
* Get latent upsampler working with video latents
* Fix typo in BlurDownsample
* Add latent upsample pipeline docstring and example
* Remove deprecated pipeline VAE slicing/tiling methods
* make style and make quality
* When returning latents, return unpacked and denormalized latents for T2V and I2V
* Add model_cpu_offload_seq for latent upsampling pipeline
---------
Co-authored-by: Daniel Gu <dgu8957@gmail.com>
* Fix latent upsampler filename in LTX 2 conversion script
* Add latent upsample pipeline to LTX 2 docs
* Add dummy objects for LTX 2 latent upsample pipeline
* Set default FPS to official LTX 2 ckpt default of 24.0
* Set default CFG scale to official LTX 2 ckpt default of 4.0
* Update LTX 2 pipeline example docstrings
* make style and make quality
* Remove LTX 2 test scripts
* Fix LTX 2 upsample pipeline example docstring
* Add logic to convert and save a LTX 2 upsampling pipeline
* Document LTX2VideoTransformer3DModel forward pass
---------
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
* docs: add comprehensive docstrings and refine type hints for EDM scheduler methods and config parameters.
* refactor: Add type hints to DPM-Solver scheduler methods.
* feat: Add transformer cache context for conditional and unconditional predictions for skyreels-v2 pipes.
* docs: Remove SkyReels-V2 FLF2V model link and add contributor attribution.
* LTX Video 0.9.8 long multi prompt
* Further align comfyui
- Added the “LTXEulerAncestralRFScheduler” scheduler, aligned with [sample_euler_ancestral_RF](7d6103325e/comfy/k_diffusion/sampling.py (L234))
- Updated the LTXI2VLongMultiPromptPipeline.from_pretrained() method:
- Now uses LTXEulerAncestralRFScheduler by default, for better compatibility with the ComfyUI LTXV workflow.
- Changed the default value of cond_strength from 1.0 to 0.5, aligning with ComfyUI’s default.
- Optimized cross-window overlap blending: moved the latent-space guidance injection to before the UNet and after each step, aligned with[KSamplerX0Inpaint]([ComfyUI/comfy/samplers.py at master · comfyanonymous/ComfyUI](https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/samplers.py#L391))
- Adjusted the default value of skip_steps_sigma_threshold to 1.
* align with diffusers contribute rule
* Add new pipelines and update imports
* Enhance LTXI2VLongMultiPromptPipeline with noise rescaling
Refactor LTXI2VLongMultiPromptPipeline to improve documentation and add noise rescaling functionality.
* Clean up comments in scheduling_ltx_euler_ancestral_rf.py
Removed design notes and limitations from the implementation.
* Enhance video generation example with scheduler
Updated LTXI2VLongMultiPromptPipeline example to include LTXEulerAncestralRFScheduler for ComfyUI parity.
* clean up
* style
* copies
* import ltx scheduler
* copies
* fix
* fix more
* up up
* up up up
* up upup
* Apply suggestions from code review
* Update docs/source/en/api/pipelines/ltx_video.md
* Update docs/source/en/api/pipelines/ltx_video.md
---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>
* [Flux.1] improve pos embed for ascend npu by setting it back to npu computation.
* [Flux.2] improve pos embed for ascend npu by setting it back to npu computation.
* [LongCat-Image] improve pos embed for ascend npu by setting it back to npu computation.
* [Ovis-Image] improve pos embed for ascend npu by setting it back to npu computation.
* Remove unused import of is_torch_npu_available
---------
Co-authored-by: zhangtao <zhangtao529@huawei.com>
* Community Pipeline: Add z-image differential img2img
* add pipeline for z-image differential img2img diffusion examples : run make style , make quality, and fix white spaces in example doc string.
---------
Co-authored-by: r4inm4ker <jefri.yeh@gmail.com>
* fix torchao quantizer for new torchao versions
Summary:
`torchao==0.16.0` (not yet released) has some bc-breaking changes, this
PR fixes the diffusers repo with those changes. Specifics on the
changes:
1. `UInt4Tensor` is removed: https://github.com/pytorch/ao/pull/3536
2. old float8 tensors v1 are removed: https://github.com/pytorch/ao/pull/3510
In this PR:
1. move the logger variable up (not sure why it was in the middle of the
file before) to get better error messages
2. gate the old torchao objects by torchao version
Test Plan:
import diffusers objects with new versions of torchao works:
```bash
> python -c "import torchao; print(torchao.__version__); from diffusers import StableDiffusionPipeline"
0.16.0.dev20251229+cu129
```
Reviewers:
Subscribers:
Tasks:
Tags:
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add z-image-omni-base implementation
* Merged into one transformer for Z-Image.
* Fix bugs for controlnet after merging the main branch new feature.
* Fix for auto_pipeline, Add Styling.
* Refactor noise handling and modulation
- Add select_per_token function for per-token value selection
- Separate adaptive modulation logic
- Cleanify t_noisy/clean variable naming
- Move image_noise_mask handler from forward to pipeline
* Styling & Formatting.
* Rewrite code with more non-forward func & clean forward.
1.Change to one forward with shorter code with omni code (None).
2.Split out non-forward funcs: _build_unified_sequence, _prepare_sequence, patchify, pad.
* Styling & Formatting.
* Manual check fix-copies in controlnet, Add select_per_token, _patchify_image, _pad_with_ids; Styling.
* Add Import in pipeline __init__.py.
---------
Co-authored-by: Jerry Qilong Wu <xinglong.wql@alibaba-inc.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Use `T5Tokenizer` instead of `MT5Tokenizer`
Given that the `MT5Tokenizer` in `transformers` is just a "re-export" of
`T5Tokenizer` as per
https://github.com/huggingface/transformers/blob/v4.57.3/src/transformers/models/mt5/tokenization_mt5.py
)on latest available stable Transformers i.e., v4.57.3), this commit
updates the imports to point to `T5Tokenizer` instead, so that those
still work with Transformers v5.0.0rc0 onwards.
* move node registry to mellon
* up
* fix
* modula rpipeline update: filter out none for input_names, fix default blocks for pipe.init() and allow user pass additional kwargs_type in a dict
* qwen modular refactor, unpack before decode
* update mellon node config, adding* to required_inputs and required_model_inputs
* modularpipeline.from_pretrained: error out if no config found
* add a component_names property to modular blocks to be consistent!
* flux image_encoder -> vae_encoder
* controlnet_bundle
* refator MellonNodeConfig MellonPipelineConfig
* refactor & simplify mellon utils
* vae_image_encoder -> vae_encoder
* mellon config save keep key order
* style + copies
* add kwargs input for zimage
* cosmos predict2.5 base: convert chkpt & pipeline
- New scheduler: scheduling_flow_unipc_multistep.py
- Changes to TransformerCosmos for text embeddings via crossattn_proj
* scheduler cleanup
* simplify inference pipeline
* cleanup scheduler + tests
* Basic tests for flow unipc
* working b2b inference
* Rename everything
* Tests for pipeline present, but not working (predict2 also not working)
* docstring update
* wrapper pipelines + make style
* remove unnecessary files
* UniPCMultistep: support use_karras_sigmas=True and use_flow_sigmas=True
* use UniPCMultistepScheduler + fix tests for pipeline
* Remove FlowUniPCMultistepScheduler
* UniPCMultistepScheduler for use_flow_sigmas=True & use_karras_sigmas=True
* num_inference_steps=36 due to bug in scheduler used by predict2.5
* Address comments
* make style + make fix-copies
* fix tests + remove references to old pipelines
* address comments
* add revision in from_pretrained call
* fix tests
* extend TorchAoTest::test_model_memory_usage to other platform
Signe-off-by: Wang, Yi <yi.a.wang@inel.com>
* add some comments
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
fix pytest tests/pipelines/pixart_sigma/test_pixart.py::PixArtSigmaPipelineIntegrationTests::test_pixart_512 in xpu
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add ZImageImg2ImgPipeline
Updated the pipeline structure to include ZImageImg2ImgPipeline
alongside ZImagePipeline.
Implemented the ZImageImg2ImgPipeline class for image-to-image
transformations, including necessary methods for
encoding prompts, preparing latents, and denoising.
Enhanced the auto_pipeline to map the new ZImageImg2ImgPipeline
for image generation tasks.
Added unit tests for ZImageImg2ImgPipeline to ensure
functionality and performance.
Updated dummy objects to include ZImageImg2ImgPipeline for
testing purposes.
* Address review comments for ZImageImg2ImgPipeline
- Add `# Copied from` annotations to encode_prompt and _encode_prompt
- Add ZImagePipeline to auto_pipeline.py for AutoPipeline support
* Add ZImage pipeline documentation
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
* feat: Add `flow_prediction` to `prediction_type`, introduce `use_flow_sigmas`, `flow_shift`, `use_dynamic_shifting`, and `time_shift_type` parameters, and refine type hints for various arguments.
* style: reformat argument wrapping in `_convert_to_beta` and `index_for_timestep` method signatures.
* fix: group offloading to support standalone computational layers in block-level offloading
* test: for models with standalone and deeply nested layers in block-level offloading
* feat: support for block-level offloading in group offloading config
* fix: group offload block modules to AutoencoderKL and AutoencoderKLWan
* fix: update group offloading tests to use AutoencoderKL and adjust input dimensions
* refactor: streamline block offloading logic
* Apply style fixes
* update tests
* update
* fix for failing tests
* clean up
* revert to use skip_keys
* clean up
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* start zimage model tests.
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* Revert "up"
This reverts commit bca3e27c96.
* expand upon compilation failure reason.
* Update tests/models/transformers/test_models_transformer_z_image.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* reinitialize the padding tokens to ones to prevent NaN problems.
* updates
* up
* skipping ZImage DiT tests
* up
* up
---------
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Fix(peft): Re-apply group offloading after deleting adapters
* Test: Add regression test for group offloading + delete_adapters
* Test: Add assertions to verify output changes after deletion
* Test: Add try/finally to clean up group offloading hooks
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fixes#12673.
Wrong default_stream is used. leading to wrong execution order when record_steram is enabled.
* update
* Update test
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Refactor image padding logic to pervent zero tensor in transformer_z_image.py
* Apply style fixes
* Add more support to fix repeat bug on tpu devices.
* Fix for dynamo compile error for multi if-branches.
---------
Co-authored-by: Mingjia Li <mingjiali@tju.edu.cn>
Co-authored-by: Mingjia Li <mail@mingjia.li>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add ZImage LoRA support and integrate into ZImagePipeline
* Add LoRA test for Z-Image
* Move the LoRA test
* Fix ZImage LoRA scale support and test configuration
* Add ZImage LoRA test overrides for architecture differences
- Override test_lora_fuse_nan to use ZImage's 'layers' attribute
instead of 'transformer_blocks'
- Skip block-level LoRA scaling test (not supported in ZImage)
- Add required imports: numpy, torch_device, check_if_lora_correctly_set
* Add ZImageLoraLoaderMixin to LoRA documentation
* Use conditional import for peft.LoraConfig in ZImage tests
* Override test_correct_lora_configs_with_different_ranks for ZImage
ZImage uses 'attention.to_k' naming convention instead of 'attn.to_k',
so the base test's module name search loop never finds a match. This
override uses the correct naming pattern for ZImage architecture.
* Add is_flaky decorator to ZImage LoRA tests initialise padding tokens
* Skip ZImage LoRA test class entirely
Skip the entire ZImageLoRATests class due to non-deterministic behavior
from complex64 RoPE operations and torch.empty padding tokens.
LoRA functionality works correctly with real models.
Clean up removed:
- Individual @unittest.skip decorators
- @is_flaky decorator overrides for inherited methods
- Custom test method overrides
- Global torch deterministic settings
- Unused imports (numpy, is_flaky, check_if_lora_correctly_set)
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
* Fix examples not loading LoRA adapter weights from checkpoint
* Updated lora saving logic with accelerate save_model_hook and load_model_hook
* Formatted the changes using ruff
* import and upcasting changed
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add Support for Z-Image.
* Reformatting with make style, black & isort.
* Remove init, Modify import utils, Merge forward in transformers block, Remove once func in pipeline.
* modified main model forward, freqs_cis left
* refactored to add B dim
* fixed stack issue
* fixed modulation bug
* fixed modulation bug
* fix bug
* remove value_from_time_aware_config
* styling
* Fix neg embed and devide / bug; Reuse pad zero tensor; Turn cat -> repeat; Add hint for attn processor.
* Replace padding with pad_sequence; Add gradient checkpointing.
* Fix flash_attn3 in dispatch attn backend by _flash_attn_forward, replace its origin implement; Add DocString in pipeline for that.
* Fix Docstring and Make Style.
* Revert "Fix flash_attn3 in dispatch attn backend by _flash_attn_forward, replace its origin implement; Add DocString in pipeline for that."
This reverts commit fbf26b7ed1.
* update z-image docstring
* Revert attention dispatcher
* update z-image docstring
* styling
* Recover attention_dispatch.py with its origin impl, later would special commit for fa3 compatibility.
* Fix prev bug, and support for prompt_embeds pass in args after prompt pre-encode as List of torch Tensor.
* Remove einop dependency.
* remove redundant imports & make fix-copies
* fix import
* Support for num_images_per_prompt>1; Remove redundant unquote variables.
* Fix bugs for num_images_per_prompt with actual batch.
* Add unit tests for Z-Image.
* Refine unitest and skip for cases needed separate test env; Fix compatibility with unitest in model, mostly precision formating.
* Add clean env for test_save_load_float16 separ test; Add Note; Styling.
* Update dtype mentioned by yiyi.
---------
Co-authored-by: liudongyang <liudongyang0114@gmail.com>
* add vae
* Initial commit for Flux 2 Transformer implementation
* add pipeline part
* small edits to the pipeline and conversion
* update conversion script
* fix
* up up
* finish pipeline
* Remove Flux IP Adapter logic for now
* Remove deprecated 3D id logic
* Remove ControlNet logic for now
* Add link to ViT-22B paper as reference for parallel transformer blocks such as the Flux 2 single stream block
* update pipeline
* Don't use biases for input projs and output AdaNorm
* up
* Remove bias for double stream block text QKV projections
* Add script to convert Flux 2 transformer to diffusers
* make style and make quality
* fix a few things.
* allow sft files to go.
* fix image processor
* fix batch
* style a bit
* Fix some bugs in Flux 2 transformer implementation
* Fix dummy input preparation and fix some test bugs
* fix dtype casting in timestep guidance module.
* resolve conflicts.,
* remove ip adapter stuff.
* Fix Flux 2 transformer consistency test
* Fix bug in Flux2TransformerBlock (double stream block)
* Get remaining Flux 2 transformer tests passing
* make style; make quality; make fix-copies
* remove stuff.
* fix type annotaton.
* remove unneeded stuff from tests
* tests
* up
* up
* add sf support
* Remove unused IP Adapter and ControlNet logic from transformer (#9)
* copied from
* Apply suggestions from code review
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
* up
* up
* up
* up
* up
* Refactor Flux2Attention into separate classes for double stream and single stream attention
* Add _supports_qkv_fusion to AttentionModuleMixin to allow subclasses to disable QKV fusion
* Have Flux2ParallelSelfAttention inherit from AttentionModuleMixin with _supports_qkv_fusion=False
* Log debug message when calling fuse_projections on a AttentionModuleMixin subclass that does not support QKV fusion
* Address review comments
* Update src/diffusers/pipelines/flux2/pipeline_flux2.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* up
* Remove maybe_allow_in_graph decorators for Flux 2 transformer blocks (#12)
* up
* support ostris loras. (#13)
* up
* update schdule
* up
* up (#17)
* add training scripts (#16)
* add training scripts
Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com>
* model cpu offload in validation.
* add flux.2 readme
* add img2img and tests
* cpu offload in log validation
* Apply suggestions from code review
* fix
* up
* fixes
* remove i2i training tests for now.
---------
Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com>
Co-authored-by: linoytsaban <linoy@huggingface.co>
* up
---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Daniel Gu <dgu8957@gmail.com>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-10-53-87-203.ec2.internal>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com>
Co-authored-by: linoytsaban <linoy@huggingface.co>
* Add Support for Z-Image.
* Reformatting with make style, black & isort.
* Remove init, Modify import utils, Merge forward in transformers block, Remove once func in pipeline.
* modified main model forward, freqs_cis left
* refactored to add B dim
* fixed stack issue
* fixed modulation bug
* fixed modulation bug
* fix bug
* remove value_from_time_aware_config
* styling
* Fix neg embed and devide / bug; Reuse pad zero tensor; Turn cat -> repeat; Add hint for attn processor.
* Replace padding with pad_sequence; Add gradient checkpointing.
* Fix flash_attn3 in dispatch attn backend by _flash_attn_forward, replace its origin implement; Add DocString in pipeline for that.
* Fix Docstring and Make Style.
* Revert "Fix flash_attn3 in dispatch attn backend by _flash_attn_forward, replace its origin implement; Add DocString in pipeline for that."
This reverts commit fbf26b7ed1.
* update z-image docstring
* Revert attention dispatcher
* update z-image docstring
* styling
* Recover attention_dispatch.py with its origin impl, later would special commit for fa3 compatibility.
* Fix prev bug, and support for prompt_embeds pass in args after prompt pre-encode as List of torch Tensor.
* Remove einop dependency.
* remove redundant imports & make fix-copies
* fix import
---------
Co-authored-by: liudongyang <liudongyang0114@gmail.com>
* Updates Portuguese documentation for Diffusers library
Enhances the Portuguese documentation with:
- Restructured table of contents for improved navigation
- Added placeholder page for in-translation content
- Refined language and improved readability in existing pages
- Introduced a new page on basic Stable Diffusion performance guidance
Improves overall documentation structure and user experience for Portuguese-speaking users
* Removes untranslated sections from Portuguese documentation
Cleans up the Portuguese documentation table of contents by removing placeholder sections marked as "Em tradução" (In translation)
Removes the in_translation.md file and associated table of contents entries for sections that are not yet translated, improving documentation clarity
* Enhance type hints and docstrings in LMSDiscreteScheduler class
Updated type hints for function parameters and return types to improve code clarity and maintainability. Enhanced docstrings for several methods, providing clearer descriptions of their functionality and expected arguments. Notable changes include specifying Literal types for certain parameters and ensuring consistent return type annotations across the class.
* docs: Add specific paper reference to `_convert_to_karras` docstring.
* Refactor `_convert_to_karras` docstring in DPMSolverSDEScheduler to include detailed descriptions and a specific paper reference, enhancing clarity and documentation consistency.
* Enhance docstrings and type hints in PNDMScheduler class
- Updated parameter descriptions to include default values and specific types using Literal for better clarity.
- Improved docstring formatting and consistency across methods, including detailed explanations for the `_get_prev_sample` method.
- Added type hints for method return types to enhance code readability and maintainability.
* Refactor docstring in PNDMScheduler class to enhance clarity
- Simplified the explanation of the method for computing the previous sample from the current sample.
- Updated the reference to the PNDM paper for better accessibility.
- Removed redundant notation explanations to streamline the documentation.
* Update the Wan Animate docs to reflect the most recent code
* Further explain input preprocessing and link to original Wan Animate preprocessing scripts
* refactor: enhance type hints and documentation in EulerDiscreteScheduler
Updated type hints for function parameters and return types in the EulerDiscreteScheduler class to improve code clarity and maintainability. Enhanced docstrings for several methods to provide clearer descriptions of their functionality and expected arguments. This includes specifying Literal types for certain parameters and ensuring consistent return type annotations across the class.
* refactor: enhance type hints and documentation across multiple schedulers
Updated type hints and improved docstrings in various scheduler classes, including CMStochasticIterativeScheduler, CosineDPMSolverMultistepScheduler, and others. This includes specifying parameter types, return types, and providing clearer descriptions of method functionalities. Notable changes include the addition of default values in the begin_index argument and enhanced explanations for noise addition methods. These improvements aim to enhance code clarity and maintainability across the scheduling module.
* refactor: update docstrings to clarify noise schedule construction
Revised docstrings across multiple scheduler classes to enhance clarity regarding the construction of noise schedules. Updated references to relevant papers, ensuring accurate citations for the methodologies used. This includes changes in DEISMultistepScheduler, DPMSolverMultistepInverseScheduler, and others, improving documentation consistency and readability.
* Enhance type hints and docstrings in scheduling_ddpm.py
- Added type hints for function parameters and return types across the DDPMScheduler class and related functions.
- Improved docstrings for clarity, including detailed descriptions of parameters and return values.
- Updated the alpha_transform_type and beta_schedule parameters to use Literal types for better type safety.
- Refined the _get_variance and previous_timestep methods with comprehensive documentation.
* Refactor docstrings and type hints in scheduling_ddpm.py
- Cleaned up whitespace in the rescale_zero_terminal_snr function.
- Enhanced the variance_type parameter in the DDPMScheduler class with improved formatting for better readability.
- Updated the docstring for the compute_variance method to maintain consistency and clarity in parameter descriptions and return values.
* Apply `make fix-copies`
* Refactor type hints across multiple scheduler files
- Updated type hints to include `Literal` for improved type safety in various scheduling files.
- Ensured consistency in type hinting for parameters and return types across the affected modules.
- This change enhances code clarity and maintainability.
* Improve docstrings and type hints in scheduling_ddim.py
- Add complete type hints for all function parameters
- Enhance docstrings to follow project conventions
- Add missing parameter descriptions
Fixes#9567
* Enhance docstrings and type hints in scheduling_ddim.py
- Update parameter types and descriptions for clarity
- Improve explanations in method docstrings to align with project standards
- Add optional annotations for parameters where applicable
* Refine type hints and docstrings in scheduling_ddim.py
- Update parameter types to use Literal for specific string options
- Enhance docstring descriptions for clarity and consistency
- Ensure all parameters have appropriate type annotations and defaults
* Apply review feedback on scheduling_ddim.py
- Replace "prevent singularities" with "avoid numerical instability" for better clarity
- Add backticks around `alpha_bar` variable name for consistent formatting
- Convert Imagen Video paper URLs to Hugging Face papers references
* Propagate changes using 'make fix-copies'
* Add missing Literal
* Improve docstrings and type hints in scheduling_amused.py
- Add complete type hints for helper functions (gumbel_noise, mask_by_random_topk)
- Enhance AmusedSchedulerOutput with proper Optional typing
- Add comprehensive docstrings for AmusedScheduler class
- Improve __init__, set_timesteps, step, and add_noise methods
- Fix type hints to match documentation conventions
- All changes follow project standards from issue #9567
* Enhance type hints and docstrings in scheduling_amused.py
- Update type hints for `prev_sample` and `pred_original_sample` in `AmusedSchedulerOutput` to reflect their tensor types.
- Improve docstring for `gumbel_noise` to specify the output tensor's dtype and device.
- Refine `AmusedScheduler` class documentation, including detailed descriptions of the masking schedule and temperature parameters.
- Adjust type hints in `set_timesteps` and `step` methods for better clarity and consistency.
* Apply review feedback on scheduling_amused.py
- Replace generic [Amused] reference with specific [`AmusedPipeline`] reference for consistency with project documentation conventions
* add tests for qwenimage modular.
* qwenimage edit.
* qwenimage edit plus.
* empty
* align with the latest structure
* up
* up
* reason
* up
* fix multiple issues.
* up
* up
* fix
* up
* make it similar to the original pipeline.
* Fix rotary positional embedding dimension mismatch in Wan and SkyReels V2 transformers
- Store t_dim, h_dim, w_dim as instance variables in WanRotaryPosEmbed and SkyReelsV2RotaryPosEmbed __init__
- Use stored dimensions in forward() instead of recalculating with different formula
- Fixes inconsistency between init (using // 6) and forward (using // 3)
- Ensures split_sizes matches the dimensions used to create rotary embeddings
* quality fix
---------
Co-authored-by: Charchit Sharma <charchitsharma@A-267.local>
* Fix: update type hints for Tuple parameters across multiple files to support variable-length tuples
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
The function load_model_dict_into_meta was moved from modeling_utils.py to
model_loading_utils.py but the imports in the conversion scripts were not
updated, causing ImportError when running these scripts.
This fixes the import in 6 conversion scripts:
- scripts/convert_sd3_to_diffusers.py
- scripts/convert_stable_cascade_lite.py
- scripts/convert_stable_cascade.py
- scripts/convert_stable_audio.py
- scripts/convert_sana_to_diffusers.py
- scripts/convert_sana_controlnet_to_diffusers.py
Fixes#12606
* Fix overflow in rgblike_to_depthmap by safe dtype casting (torch & NumPy)
* Fix: store original dtype and cast back after safe computation
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Changing the way we infer dtype to avoid force evaluation of lazy tensors
* changing way to infer dtype to ensure type consistency
* more robust infering of dtype
* removing the upscale dtype entirely
* Bria FIBO pipeline
* style fixs
* fix CR
* Refactor BriaFibo classes and update pipeline parameters
- Updated BriaFiboAttnProcessor and BriaFiboAttention classes to reflect changes from Flux equivalents.
- Modified the _unpack_latents method in BriaFiboPipeline to improve clarity.
- Increased the default max_sequence_length to 3000 and added a new optional parameter do_patching.
- Cleaned up test_pipeline_bria_fibo.py by removing unused imports and skipping unsupported tests.
* edit the docs of FIBO
* Remove unused BriaFibo imports and update CPU offload method in BriaFiboPipeline
* Refactor FIBO classes to BriaFibo naming convention
- Updated class names from FIBO to BriaFibo for consistency across the module.
- Modified instances of FIBOEmbedND, FIBOTimesteps, TextProjection, and TimestepProjEmbeddings to reflect the new naming.
- Ensured all references in the BriaFiboTransformer2DModel are updated accordingly.
* Add BriaFiboTransformer2DModel import to transformers module
* Remove unused BriaFibo imports from modular pipelines and add BriaFiboTransformer2DModel and BriaFiboPipeline classes to dummy objects for enhanced compatibility with torch and transformers.
* Update BriaFibo classes with copied documentation and fix import typo in pipeline module
- Added documentation comments indicating the source of copied code in BriaFiboTransformerBlock and _pack_latents methods.
- Corrected the import statement for BriaFiboPipeline in the pipelines module.
* Remove unused BriaFibo imports from __init__.py to streamline modular pipelines.
* Refactor documentation comments in BriaFibo classes to indicate inspiration from existing implementations
- Updated comments in BriaFiboAttnProcessor, BriaFiboAttention, and BriaFiboPipeline to reflect that the code is inspired by other modules rather than copied.
- Enhanced clarity on the origins of the methods to maintain proper attribution.
* change Inspired by to Based on
* add reference link and fix trailing whitespace
* Add BriaFiboTransformer2DModel documentation and update comments in BriaFibo classes
- Introduced a new documentation file for BriaFiboTransformer2DModel.
- Updated comments in BriaFiboAttnProcessor, BriaFiboAttention, and BriaFiboPipeline to clarify the origins of the code, indicating copied sources for better attribution.
---------
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
* rename photon to prx
* rename photon into prx
* Revert .gitignore to state before commit b7fb0fe9d6
* rename photon to prx
* rename photon into prx
* Revert .gitignore to state before commit b7fb0fe9d6
* make fix-copies
* purge HF_HUB_ENABLE_HF_TRANSFER; promote Xet
* purge HF_HUB_ENABLE_HF_TRANSFER; promote Xet x2
* restrict docker build test to the ones we actually use in CI.
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add Photon model and pipeline support
This commit adds support for the Photon image generation model:
- PhotonTransformer2DModel: Core transformer architecture
- PhotonPipeline: Text-to-image generation pipeline
- Attention processor updates for Photon-specific attention mechanism
- Conversion script for loading Photon checkpoints
- Documentation and tests
* just store the T5Gemma encoder
* enhance_vae_properties if vae is provided only
* remove autocast for text encoder forwad
* BF16 example
* conditioned CFG
* remove enhance vae and use vae.config directly when possible
* move PhotonAttnProcessor2_0 in transformer_photon
* remove einops dependency and now inherits from AttentionMixin
* unify the structure of the forward block
* update doc
* update doc
* fix T5Gemma loading from hub
* fix timestep shift
* remove lora support from doc
* Rename EmbedND for PhotoEmbedND
* remove modulation dataclass
* put _attn_forward and _ffn_forward logic in PhotonBlock's forward
* renam LastLayer for FinalLayer
* remove lora related code
* rename vae_spatial_compression_ratio for vae_scale_factor
* support prompt_embeds in call
* move xattention conditionning out computation out of the denoising loop
* add negative prompts
* Use _import_structure for lazy loading
* make quality + style
* add pipeline test + corresponding fixes
* utility function that determines the default resolution given the VAE
* Refactor PhotonAttention to match Flux pattern
* built-in RMSNorm
* Revert accidental .gitignore change
* parameter names match the standard diffusers conventions
* renaming and remove unecessary attributes setting
* Update docs/source/en/api/pipelines/photon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* quantization example
* added doc to toctree
* Update docs/source/en/api/pipelines/photon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/api/pipelines/photon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/api/pipelines/photon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* use dispatch_attention_fn for multiple attention backend support
* naming changes
* make fix copy
* Update docs/source/en/api/pipelines/photon.md
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Add PhotonTransformer2DModel to TYPE_CHECKING imports
* make fix-copies
* Use Tuple instead of tuple
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* restrict the version of transformers
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Update tests/pipelines/photon/test_pipeline_photon.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Update tests/pipelines/photon/test_pipeline_photon.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* change | for Optional
* fix nits.
* use typing Dict
---------
Co-authored-by: davidb <davidb@worker-10.soperator-worker-svc.soperator.svc.cluster.local>
Co-authored-by: David Briand <david@photoroom.com>
Co-authored-by: davidb <davidb@worker-8.soperator-worker-svc.soperator.svc.cluster.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Fix: Use incorrect temporary variable key when replacing adapter name in state dict within load_lora_adapter function
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* fix dockerfile definitions.
* python 3.10 slim.
* up
* up
* up
* up
* up
* revert pr_tests.yml changes
* up
* up
* reduce python version for torch 2.1.0
* fix bug when offload and cache_latents both enabled
* fix bug when offload and cache_latents both enabled
* fix bug when offload and cache_latents both enabled
* fix bug when offload and cache_latents both enabled
* fix bug when offload and cache_latents both enabled
* fix bug when offload and cache_latents both enabled
* fix bug when offload and cache_latents both enabled
* fix bug when offload and cache_latents both enabled
* fix bug when offload and cache_latents both enabled
I noticed that the test should be for the option check_compiled="ignore"
but it was using check_compiled="warn". This has been fixed, now the
correct argument is passed.
However, the fact that the test passed means that it was incorrect to
begin with. The way that logs are collected does not collect the
logger.warning call here (not sure why). To amend this, I'm now using
assertNoLogs. With this change, the test correctly fails when the wrong
argument is passed.
* cache non lora pipeline outputs.
* up
* up
* up
* up
* Revert "up"
This reverts commit 772c32e433.
* up
* Revert "up"
This reverts commit cca03df7fc.
* up
* up
* add .
* up
* up
* up
* up
* up
* up
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* misc: update examples link
* misc: update examples link
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* Refine documentation for CacheDiT features
Updated the wording for clarity and consistency in the documentation. Adjusted sections on cache acceleration, automatic block adapter, patch functor, and hybrid cache configuration.
* Upgrade huggingface-hub to version 0.35.0
Updated huggingface-hub version from 0.26.1 to 0.35.0.
* Add uvicorn and accelerate to requirements
* Fix install instructions for server
* Convert alphas for embedders for sd-scripts to ai toolkit conversion
* Add kohya embedders conversion test
* Apply style fixes
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Basic implementation of request scheduling
* Basic editing in SD and Flux Pipelines
* Small Fix
* Fix
* Update for more pipelines
* Add examples/server-async
* Add examples/server-async
* Updated RequestScopedPipeline to handle a single tokenizer lock to avoid race conditions
* Fix
* Fix _TokenizerLockWrapper
* Fix _TokenizerLockWrapper
* Delete _TokenizerLockWrapper
* Fix tokenizer
* Update examples/server-async
* Fix server-async
* Optimizations in examples/server-async
* We keep the implementation simple in examples/server-async
* Update examples/server-async/README.md
* Update examples/server-async/README.md for changes to tokenizer locks and backward-compatible retrieve_timesteps
* The changes to the diffusers core have been undone and all logic is being moved to exmaples/server-async
* Update examples/server-async/utils/*
* Fix BaseAsyncScheduler
* Rollback in the core of the diffusers
* Update examples/server-async/README.md
* Complete rollback of diffusers core files
* Simple implementation of an asynchronous server compatible with SD3-3.5 and Flux Pipelines
* Update examples/server-async/README.md
* Fixed import errors in 'examples/server-async/serverasync.py'
* Flux Pipeline Discard
* Update examples/server-async/README.md
* Apply style fixes
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* fix hidream type hint
* fix hunyuan-video type hint
* fix many type hint
* fix many type hint errors
* fix many type hint errors
* fix many type hint errors
* make stype & make quality
* Update autoencoder_kl_wan.py
When using the Wan2.2 VAE, the spatial compression ratio calculated here is incorrect. It should be 16 instead of 8. Pass it in directly via the config to ensure it’s correct here.
* Update autoencoder_kl_wan.py
* support Wan2.2-VACE-Fun-A14B
* support Wan2.2-VACE-Fun-A14B
* support Wan2.2-VACE-Fun-A14B
* Apply style fixes
* test
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Use SDP on BF16 in GPU/HPU migration
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
* Formatting fix for enabling SDP with BF16 precision on HPU
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
---------
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
* Add AttentionMixin to WanVACETransformer3DModel
to enable methods like `set_attn_processor()`.
* Import AttentionMixin in transformer_wan_vace.py
Special thanks to @tolgacangoz 🙇♂️
* make modular pipeline work with model_index.json
* up
* style
* up
* up
* style
* up more
* Fix MultiControlNet import (#12118)
fix
---------
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* fix: update SkyReels-V2 documentation and moving into attn dispatcher
* Refactors SkyReelsV2's attention implementation
* style
* up
* Fixes formatting in SkyReels-V2 documentation
Wraps the visual demonstration section in a Markdown code block.
This change corrects the rendering of ASCII diagrams and examples, improving the overall readability of the document.
* Docs: Condense example arrays in skyreels_v2 guide
Improves the readability of the `step_matrix` examples by replacing long sequences of repeated numbers with a more compact `value×count` notation.
This change makes the underlying data patterns in the examples easier to understand at a glance.
* Add _repeated_blocks attribute to SkyReelsV2Transformer3DModel
* Refactor rotary embedding calculations in SkyReelsV2 to separate cosine and sine frequencies
* Enhance SkyReels-V2 documentation: update model loading for GPU support and remove outdated notes
* up
* up
* Update model_id in SkyReels-V2 documentation
* up
* refactor: remove device_map parameter for model loading and add pipeline.to("cuda") for GPU allocation
* fix: update copyright year to 2025 in skyreels_v2.md
* docs: enhance parameter examples and formatting in skyreels_v2.md
* docs: update example formatting and add notes on LoRA support in skyreels_v2.md
* refactor: remove copied comments from transformer_wan in SkyReelsV2 classes
* Clean up comments in skyreels_v2.md
Removed comments about acceleration helpers and Flash Attention installation.
* Add deprecation warning for `SkyReelsV2AttnProcessor2_0` class
* Fix PyTorch 2.3.1 compatibility: add version guard for torch.library.custom_op
- Add hasattr() check for torch.library.custom_op and register_fake
- These functions were added in PyTorch 2.4, causing import failures in 2.3.1
- Both decorators and functions are now properly guarded with version checks
- Maintains backward compatibility while preserving functionality
Fixes#12195
* Use dummy decorators approach for PyTorch version compatibility
- Replace hasattr check with version string comparison
- Add no-op decorator functions for PyTorch < 2.4.0
- Follows pattern from #11941 as suggested by reviewer
- Maintains cleaner code structure without indentation changes
* Update src/diffusers/models/attention_dispatch.py
Update all the decorator usages
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Update src/diffusers/models/attention_dispatch.py
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Update src/diffusers/models/attention_dispatch.py
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Update src/diffusers/models/attention_dispatch.py
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Move version check to top of file and use private naming as requested
* Apply style fixes
---------
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add Bria model and pipeline to diffusers
- Introduced `BriaTransformer2DModel` and `BriaPipeline` for enhanced image generation capabilities.
- Updated import structures across various modules to include the new Bria components.
- Added utility functions and output classes specific to the Bria pipeline.
- Implemented tests for the Bria pipeline to ensure functionality and output integrity.
* with working tests
* style and quality pass
* adding docs
* add to overview
* fixes from "make fix-copies"
* Refactor transformer_bria.py and pipeline_bria.py: Introduce new EmbedND class for rotary position embedding, and enhance Timestep and TimestepProjEmbeddings classes. Add utility functions for handling negative prompts and generating original sigmas in pipeline_bria.py.
* remove redundent and duplicates tests and fix bf16
slow test
* style fixes
* small doc update
* Enhance Bria 3.2 documentation and implementation
- Updated the GitHub repository link for Bria 3.2.
- Added usage instructions for the gated model access.
- Introduced the BriaTransformerBlock and BriaAttention classes to the model architecture.
- Refactored existing classes to integrate Bria-specific components, including BriaEmbedND and BriaPipeline.
- Updated the pipeline output class to reflect Bria-specific functionality.
- Adjusted test cases to align with the new Bria model structure.
* Refactor Bria model components and update documentation
- Removed outdated inference example from Bria 3.2 documentation.
- Introduced the BriaTransformerBlock class to enhance model architecture.
- Updated attention handling to use `attention_kwargs` instead of `joint_attention_kwargs`.
- Improved import structure in the Bria pipeline to handle optional dependencies.
- Adjusted test cases to reflect changes in model dtype assertions.
* Update Bria model reference in documentation to reflect new file naming convention
* Update docs/source/en/_toctree.yml
* Refactor BriaPipeline to inherit from DiffusionPipeline instead of FluxPipeline, updating imports accordingly.
* move the __call__ func to the end of file
* Update BriaPipeline example to use bfloat16 for precision sensitivity for better result
* make style && make quality && make fix-copiessource
---------
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
- Modify offload_models function to handle DiffusionPipeline correctly
- Ensure compatibility with both single and multiple module inputs
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* CogView4: remove SiLU in final AdaLN (match Megatron); add switch to AdaLayerNormContinuous; split temb_raw/temb_blocks
* CogView4: remove SiLU in final AdaLN (match Megatron); add switch to AdaLayerNormContinuous; split temb_raw/temb_blocks
* CogView4: remove SiLU in final AdaLN (match Megatron); add switch to AdaLayerNormContinuous; split temb_raw/temb_blocks
* CogView4: use local final AdaLN (no SiLU) per review; keep generic AdaLN unchanged
* re-add configs as normal files (no LFS)
* Apply suggestions from code review
* Apply style fixes
---------
Co-authored-by: 武嘉涵 <lambert@wujiahandeMacBook-Pro.local>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* try to use deepseek with an agent to auto i18n to zh
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* add two more docs
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* fix, updated some prompt for better translation
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* Try to passs CI check
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* fix up for human review process
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* fix up
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* fix review comments
Signed-off-by: SamYuan1990 <yy19902439@126.com>
---------
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* Initial commit implementing frequency-decoupled guidance (FDG) as a guider
* Update FrequencyDecoupledGuidance docstring to describe FDG
* Update project so that it accepts any number of non-batch dims
* Change guidance_scale and other params to accept a list of params for each freq level
* Add comment with Laplacian pyramid shapes
* Add function to import_utils to check if the kornia package is available
* Only import from kornia if package is available
* Fix bug: use pred_cond/uncond in freq space rather than data space
* Allow guidance rescaling to be done in data space or frequency space (speculative)
* Add kornia install instructions to kornia import error message
* Add config to control whether operations are upcast to fp64
* Add parallel_weights recommended values to docstring
* Apply style fixes
* make fix-copies
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
* feat: support lora in qwen image and training script
* up
* up
* up
* up
* up
* up
* add lora tests
* fix
* add tests
* fix
* reviewer feedback
* up[
* Apply suggestions from code review
Co-authored-by: Aryan <aryan@huggingface.co>
---------
Co-authored-by: Aryan <aryan@huggingface.co>
[Examples] uniform naming notations
since the in parameter `size` represents `args.resolution`, I thus replace the `args.resolution` inside DreamBoothData with `size`. And revise some notations such as `center_crop`.
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
* style
* Fix class name casing for SkyReelsV2 components in multiple files to ensure consistency and correct functionality.
* cleaning
* cleansing
* Refactor `get_timestep_embedding` to move modifications into `SkyReelsV2TimeTextImageEmbedding`.
* Remove unnecessary line break in `get_timestep_embedding` function for cleaner code.
* Remove `skyreels_v2` entry from `_import_structure` and update its initialization to directly assign the list of SkyReelsV2 components.
* cleansing
* Refactor attention processing in `SkyReelsV2AttnProcessor2_0` to always convert query, key, and value to `torch.bfloat16`, simplifying the code and improving clarity.
* Enhance example usage in `pipeline_skyreels_v2_diffusion_forcing.py` by adding VAE initialization and detailed prompt for video generation, improving clarity and usability of the documentation.
* Refactor import structure in `__init__.py` for SkyReelsV2 components and improve formatting in `pipeline_skyreels_v2_diffusion_forcing.py` to enhance code readability and maintainability.
* Update `guidance_scale` parameter in `SkyReelsV2DiffusionForcingPipeline` from 5.0 to 6.0 to enhance video generation quality.
* Update `guidance_scale` parameter in example documentation and class definition of `SkyReelsV2DiffusionForcingPipeline` to ensure consistency and improve video generation quality.
* Update `causal_block_size` parameter in `SkyReelsV2DiffusionForcingPipeline` to default to `None`.
* up
* Fix dtype conversion for `timestep_proj` in `SkyReelsV2Transformer3DModel` to *ensure* correct tensor operations.
* Optimize causal mask generation by replacing repeated tensor with `repeat_interleave` for improved efficiency in `SkyReelsV2Transformer3DModel`.
* style
* Enhance example documentation in `SkyReelsV2DiffusionForcingPipeline` with guidance scale and shift parameters for T2V and I2V. Remove unused `retrieve_latents` function to streamline the code.
* Refactor sample scheduler creation in `SkyReelsV2DiffusionForcingPipeline` to use `deepcopy` for improved state management during inference steps.
* Enhance error handling and documentation in `SkyReelsV2DiffusionForcingPipeline` for `overlap_history` and `addnoise_condition` parameters to improve long video generation guidance.
* Update documentation and progress bar handling in `SkyReelsV2DiffusionForcingPipeline` to clarify asynchronous inference settings and improve progress tracking during denoising steps.
* Refine progress bar calculation in `SkyReelsV2DiffusionForcingPipeline` by rounding the step size to one decimal place for improved readability during denoising steps.
* Update import statements in `SkyReelsV2DiffusionForcingPipeline` documentation for improved clarity and organization.
* Refactor progress bar handling in `SkyReelsV2DiffusionForcingPipeline` to use total steps instead of calculated step size.
* update templates for i2v, v2v
* Add `retrieve_latents` function to streamline latent retrieval in `SkyReelsV2DiffusionForcingPipeline`. Update video latent processing to utilize this new function for improved clarity and maintainability.
* Add `retrieve_latents` function to both i2v and v2v pipelines for consistent latent retrieval. Update video latent processing to utilize this function, enhancing clarity and maintainability across the SkyReelsV2DiffusionForcingPipeline implementations.
* Remove redundant ValueError for `overlap_history` in `SkyReelsV2DiffusionForcingPipeline` to streamline error handling and improve user guidance for long video generation.
* Update default video dimensions and flow matching scheduler parameter in `SkyReelsV2DiffusionForcingPipeline` to enhance video generation capabilities.
* Refactor `SkyReelsV2DiffusionForcingPipeline` to support Image-to-Video (i2v) generation. Update class name, add image encoding functionality, and adjust parameters for improved video generation. Enhance error handling for image inputs and update documentation accordingly.
* Improve organization for image-last_image condition.
* Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` to improve latent preparation and video condition handling integration.
* style
* style
* Add example usage of PIL for image input in `SkyReelsV2DiffusionForcingImageToVideoPipeline` documentation.
* Refactor `SkyReelsV2DiffusionForcingPipeline` to `SkyReelsV2DiffusionForcingVideoToVideoPipeline`, enhancing support for Video-to-Video (v2v) generation. Introduce video input handling, update latent preparation logic, and improve error handling for input parameters.
* Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` by removing the `image_encoder` and `image_processor` dependencies. Update the CPU offload sequence accordingly.
* Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` to enhance latent preparation logic and condition handling. Update image input type to `Optional`, streamline video condition processing, and improve handling of `last_image` during latent generation.
* Enhance `SkyReelsV2DiffusionForcingPipeline` by refining latent preparation for long video generation. Introduce new parameters for video handling, overlap history, and causal block size. Update logic to accommodate both short and long video scenarios, ensuring compatibility and improved processing.
* refactor
* fix num_frames
* fix prefix_video_latents
* up
* refactor
* Fix typo in scheduler method call within `SkyReelsV2DiffusionForcingVideoToVideoPipeline` to ensure proper noise scaling during latent generation.
* up
* Enhance `SkyReelsV2DiffusionForcingImageToVideoPipeline` by adding support for `last_image` parameter and refining latent frame calculations. Update preprocessing logic.
* add statistics
* Refine latent frame handling in `SkyReelsV2DiffusionForcingImageToVideoPipeline` by correcting variable names and reintroducing latent mean and standard deviation calculations. Update logic for frame preparation and sampling to ensure accurate video generation.
* up
* refactor
* up
* Refactor `SkyReelsV2DiffusionForcingVideoToVideoPipeline` to improve latent handling by enforcing tensor input for video, updating frame preparation logic, and adjusting default frame count. Enhance preprocessing and postprocessing steps for better integration.
* style
* fix vae output indexing
* upup
* up
* Fix tensor concatenation and repetition logic in `SkyReelsV2DiffusionForcingImageToVideoPipeline` to ensure correct dimensionality for video conditions and latent conditions.
* Refactor latent retrieval logic in `SkyReelsV2DiffusionForcingVideoToVideoPipeline` to handle tensor dimensions more robustly, ensuring compatibility with both 3D and 4D video inputs.
* Enhance logging in `SkyReelsV2DiffusionForcing` pipelines by adding iteration print statements for better debugging. Clean up unused code related to prefix video latents length calculation in `SkyReelsV2DiffusionForcingImageToVideoPipeline`.
* Update latent handling in `SkyReelsV2DiffusionForcingImageToVideoPipeline` to conditionally set latents based on video iteration state, improving flexibility for video input processing.
* Refactor `SkyReelsV2TimeTextImageEmbedding` to utilize `get_1d_sincos_pos_embed_from_grid` for timestep projection.
* Enhance `get_1d_sincos_pos_embed_from_grid` function to include an optional parameter `flip_sin_to_cos` for flipping sine and cosine embeddings, improving flexibility in positional embedding generation.
* Update timestep projection in `SkyReelsV2TimeTextImageEmbedding` to include `flip_sin_to_cos` parameter, enhancing the flexibility of time embedding generation.
* Refactor tensor type handling in `SkyReelsV2AttnProcessor2_0` and `SkyReelsV2TransformerBlock` to ensure consistent use of `torch.float32` and `torch.bfloat16`, improving integration.
* Update tensor type in `SkyReelsV2RotaryPosEmbed` to use `torch.float32` for frequency calculations, ensuring consistency in data types across the model.
* Refactor `SkyReelsV2TimeTextImageEmbedding` to utilize automatic mixed precision for timestep projection.
* down
* down
* style
* Add debug tensor tracking to `SkyReelsV2Transformer3DModel` for enhanced debugging and output analysis; update `Transformer2DModelOutput` to include debug tensors.
* up
* Refactor indentation in `SkyReelsV2AttnProcessor2_0` to improve code readability and maintain consistency in style.
* Convert query, key, and value tensors to bfloat16 in `SkyReelsV2AttnProcessor2_0` for improved performance.
* Add debug print statements in `SkyReelsV2TransformerBlock` to track tensor shapes and values for improved debugging and analysis.
* debug
* debug
* Remove commented-out debug tensor tracking from `SkyReelsV2TransformerBlock`
* Add functionality to save processed video latents as a Safetensors file in `SkyReelsV2DiffusionForcingPipeline`.
* up
* Add functionality to save output latents as a Safetensors file in `SkyReelsV2DiffusionForcingPipeline`.
* up
* Remove additional commented-out debug tensor tracking from `SkyReelsV2TransformerBlock` and `SkyReelsV2Transformer3DModel` for cleaner code.
* style
* cleansing
* Update example documentation and parameters in `SkyReelsV2Pipeline`. Adjusted example code for loading models, modified default values for height, width, num_frames, and guidance_scale, and improved output video quality settings.
* Update shift parameter in example documentation and default values across SkyReels V2 pipelines. Adjusted shift values for I2V from 3.0 to 5.0 and updated related example code for consistency.
* Update example documentation in SkyReels V2 pipelines to include available model options and update model references for loading. Adjusted model names to reflect the latest versions across I2V, V2V, and T2V pipelines.
* Add test templates
* style
* Add docs template
* Add SkyReels V2 Diffusion Forcing Video-to-Video Pipeline to imports
* style
* fix-copies
* convert i2v 1.3b
* Update transformer configuration to include `image_dim` for SkyReels V2 models and refactor imports to use `SkyReelsV2Transformer3DModel`.
* Refactor transformer import in SkyReels V2 pipeline to use `SkyReelsV2Transformer3DModel` for consistency.
* Update transformer configuration in SkyReels V2 to increase `in_channels` from 16 to 36 for i2v conf.
* Update transformer configuration in SkyReels V2 to set `added_kv_proj_dim` values for different model types.
* up
* up
* up
* Add SkyReelsV2Pipeline support for T2V model type in conversion script
* upp
* Refactor model type checks in conversion script to use substring matching for improved flexibility
* upp
* Fix shard path formatting in conversion script to accommodate varying model types by dynamically adjusting zero padding.
* Update sharded safetensors loading logic in conversion script to use substring matching for model directory checks
* Update scheduler parameters in SkyReels V2 test files for consistency across image and video pipelines
* Refactor conversion script to initialize text encoder, tokenizer, and scheduler for SkyReels pipelines, enhancing model integration
* style
* Update documentation for SkyReels-V2, introducing the Infinite-length Film Generative model, enhancing text-to-video generation examples, and updating model references throughout the API documentation.
* Add SkyReelsV2Transformer3DModel and FlowMatchUniPCMultistepScheduler documentation, updating TOC and introducing new model and scheduler files.
* style
* Update documentation for SkyReelsV2DiffusionForcingPipeline to correct flow matching scheduler parameter for I2V from 3.0 to 5.0, ensuring clarity in usage examples.
* Add documentation for causal_block_size parameter in SkyReelsV2DF pipelines, clarifying its role in asynchronous inference.
* Simplify min_ar_step calculation in SkyReelsV2DiffusionForcingPipeline to improve clarity.
* style and fix-copies
* style
* Add documentation for SkyReelsV2Transformer3DModel
Introduced a new markdown file detailing the SkyReelsV2Transformer3DModel, including usage instructions and model output specifications.
* Update test configurations for SkyReelsV2 pipelines
- Adjusted `in_channels` from 36 to 16 in `test_skyreels_v2_df_image_to_video.py`.
- Added new parameters: `overlap_history`, `num_frames`, and `base_num_frames` in `test_skyreels_v2_df_video_to_video.py`.
- Updated expected output shape in video tests from (17, 3, 16, 16) to (41, 3, 16, 16).
* Refines SkyReelsV2DF test parameters
* Update src/diffusers/models/modeling_outputs.py
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Refactor `grid_sizes` processing by using already-calculated post-patch parameters to simplify
* Update docs/source/en/api/pipelines/skyreels_v2.md
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Refactor parameter naming for diffusion forcing in SkyReelsV2 pipelines
- Changed `flag_df` to `enable_diffusion_forcing` for clarity in the SkyReelsV2Transformer3DModel and associated pipelines.
- Updated all relevant method calls to reflect the new parameter name.
* Revert _toctree.yml to adjust section expansion states
* style
* Update docs/source/en/api/models/skyreels_v2_transformer_3d.md
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Add copying label to SkyReelsV2ImageEmbedding from WanImageEmbedding.
* Refactor transformer block processing in SkyReelsV2Transformer3DModel
- Ensured proper handling of hidden states during both gradient checkpointing and standard processing.
* Update SkyReels V2 documentation to remove VRAM requirement and streamline imports
- Removed the mention of ~13GB VRAM requirement for the SkyReels-V2 model.
- Simplified import statements by removing unused `load_image` import.
* Add SkyReelsV2LoraLoaderMixin for loading and managing LoRA layers in SkyReelsV2Transformer3DModel
- Introduced SkyReelsV2LoraLoaderMixin class to handle loading, saving, and fusing of LoRA weights specific to the SkyReelsV2 model.
- Implemented methods for state dict management, including compatibility checks for various LoRA formats.
- Enhanced functionality for loading weights with options for low CPU memory usage and hotswapping.
- Added detailed docstrings for clarity on parameters and usage.
* Update SkyReelsV2 documentation and loader mixin references
- Corrected the documentation to reference the new `SkyReelsV2LoraLoaderMixin` for loading LoRA weights.
- Updated comments in the `SkyReelsV2LoraLoaderMixin` class to reflect changes in model references from `WanTransformer3DModel` to `SkyReelsV2Transformer3DModel`.
* Enhance SkyReelsV2 integration by adding SkyReelsV2LoraLoaderMixin references
- Added `SkyReelsV2LoraLoaderMixin` to the documentation and loader imports for improved LoRA weight management.
- Updated multiple pipeline classes to inherit from `SkyReelsV2LoraLoaderMixin` instead of `WanLoraLoaderMixin`.
* Update SkyReelsV2 model references in documentation
- Replaced placeholder model paths with actual paths for SkyReels-V2 models in multiple pipeline files.
- Ensured consistency across the documentation for loading models in the SkyReelsV2 pipelines.
* style
* fix-copies
* Refactor `fps_projection` in `SkyReelsV2Transformer3DModel`
- Replaced the sequential linear layers for `fps_projection` with a `FeedForward` layer using `SiLU` activation for better integration.
* Update docs
* Refactor video processing in SkyReelsV2DiffusionForcingPipeline
- Renamed parameters for clarity: `video` to `video_latents` and `overlap_history` to `overlap_history_latent_frames`.
- Updated logic for handling long video generation, including adjustments to latent frame calculations and accumulation.
- Consolidated handling of latents for both long and short video generation scenarios.
- Final decoding step now consistently converts latents to pixels, ensuring proper output format.
* Update activation function in `fps_projection` of `SkyReelsV2Transformer3DModel`
- Changed activation function from `silu` to `linear-silu` in the `fps_projection` layer for improved performance and integration.
* Add fps_projection layer renaming in convert_skyreelsv2_to_diffusers.py
- Updated key mappings for the `fps_projection` layer to align with new naming conventions, ensuring consistency in model integration.
* Fix fps_projection assignment in SkyReelsV2Transformer3DModel
- Corrected the assignment of the `fps_projection` layer to ensure it is properly cast to the appropriate data type, enhancing model functionality.
* Update _keep_in_fp32_modules in SkyReelsV2Transformer3DModel
- Added `fps_projection` to the list of modules that should remain in FP32 precision, ensuring proper handling of data types during model operations.
* Remove integration test classes from SkyReelsV2 test files
- Deleted the `SkyReelsV2DiffusionForcingPipelineIntegrationTests` and `SkyReelsV2PipelineIntegrationTests` classes along with their associated setup, teardown, and test methods, as they were not implemented and not needed for current testing.
* style
* Refactor: Remove hardcoded `torch.bfloat16` cast in attention
* Refactor: Simplify data type handling in transformer model
Removes unnecessary data type conversions for the FPS embedding and timestep projection.
This change simplifies the forward pass by relying on the inherent data types of the tensors.
* Refactor: Remove `fps_projection` from `_keep_in_fp32_modules` in `SkyReelsV2Transformer3DModel`
* Update src/diffusers/models/transformers/transformer_skyreels_v2.py
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Refactor: Remove unused flags and simplify attention mask handling in SkyReelsV2AttnProcessor2_0 and SkyReelsV2Transformer3DModel
Refactor: Simplify causal attention logic in SkyReelsV2
Removes the `flag_causal_attention` and `_flag_ar_attention` flags to simplify the implementation.
The decision to apply a causal attention mask is now based directly on the `num_frame_per_block` configuration, eliminating redundant flags and conditional checks. This streamlines the attention mechanism and simplifies the `set_ar_attention` methods.
* Refactor: Clarify variable names for latent frames
Renames `base_num_frames` to `base_latent_num_frames` to make it explicit that the variable refers to the number of frames in the latent space.
This change improves code readability and reduces potential confusion between latent frames and decoded video frames.
The `num_frames` parameter in `generate_timestep_matrix` is also renamed to `num_latent_frames` for consistency.
* Enhance documentation: Add detailed docstring for timestep matrix generation in SkyReelsV2DiffusionForcingPipeline
* Docs: Clarify long video chunking in pipeline docstring
Improves the explanation of long video processing within the pipeline's docstring.
The update replaces the abstract description with a concrete example, illustrating how the sliding window mechanism works with overlapping chunks. This makes the roles of `base_num_frames` and `overlap_history` clearer for users.
* Docs: Move visual demonstration and processing details for SkyReelsV2DiffusionForcingPipeline to docs page from the code
* Docs: Update asynchronous processing timeline and examples for long video handling in SkyReels-V2 documentation
* Enhance timestep matrix generation documentation and logic for synchronous/asynchronous video processing
* Update timestep matrix documentation and enhance analysis for clarity in SkyReelsV2DiffusionForcingPipeline
* Docs: Update visual demonstration section and add detailed step matrix construction example for asynchronous processing in SkyReelsV2DiffusionForcingPipeline
* style
* fix-copies
* Refactor parameter names for clarity in SkyReelsV2DiffusionForcingImageToVideoPipeline and SkyReelsV2DiffusionForcingVideoToVideoPipeline
* Refactor: Avoid VAE roundtrip in long video generation
Improves performance and quality for long video generation by operating entirely in latent space during the iterative generation process.
Instead of decoding latents to video and then re-encoding the overlapping section for the next chunk, this change passes the generated latents directly between iterations.
This avoids a computationally expensive and potentially lossy VAE decode/encode cycle within the loop. The full video is now decoded only once from the accumulated latents at the end of the process.
* Refactor: Rename prefix_video_latents_length to prefix_video_latents_frames for clarity
* Refactor: Rename num_latent_frames to current_num_latent_frames for clarity in SkyReelsV2DiffusionForcingImageToVideoPipeline
* Refactor: Enhance long video generation logic and improve latent handling in SkyReelsV2DiffusionForcingImageToVideoPipeline
Refactor: Unify video generation and pass latents directly
Unifies the separate code paths for short and long video generation into a single, streamlined loop.
This change eliminates the inefficient decode-encode cycle during long video generation. Instead of converting latents to pixel-space video between chunks, the pipeline now passes the generated latents directly to the next iteration.
This improves performance, avoids potential quality loss from intermediate VAE steps, and enhances code maintainability by removing significant duplication.
* style
* Refactor: Remove overlap_history parameter and streamline long video generation logic in SkyReelsV2DiffusionForcingImageToVideoPipeline
Refactor: Streamline long video generation logic
Removes the `overlap_history` parameter and simplifies the conditioning process for long video generation.
This change avoids a redundant VAE encoding step by directly using latent frames from the previous chunk for conditioning. It also moves image preprocessing outside the main generation loop to prevent repeated computations and clarifies the handling of prefix latents.
* style
* Refactor latent handling in i2v diffusion forcing pipeline
Improves the latent conditioning and accumulation logic within the image-to-video diffusion forcing loop.
- Corrects the splitting of the initial conditioning tensor to robustly handle both even and odd lengths.
- Simplifies how latents are accumulated across iterations for long video generation.
- Ensures the final latents are trimmed correctly before decoding only when a `last_image` is provided.
* Refactor: Remove overlap_history parameter from SkyReelsV2DiffusionForcingImageToVideoPipeline
* Refactor: Adjust video_latents parameter handling in prepare_latents method
* style
* Refactor: Update long video iteration print statements for clarity
* Fix: Update transformer config with dynamic causal block size
Updates the SkyReelsV2 pipelines to correctly set the `causal_block_size` in the transformer's configuration when it's provided during a pipeline call.
This ensures the model configuration reflects the user's specified setting for the inference run. The `set_ar_attention` method is also renamed to `_set_ar_attention` to mark it as an internal helper.
* style
* Refactor: Adjust video input size and expected output shape in inference test
* Refactor: Rename video variables for clarity in SkyReelsV2DiffusionForcingVideoToVideoPipeline
* Docs: Clarify time embedding logic in SkyReelsV2
Adds comments to explain the handling of different time embedding tensor dimensions.
A 2D tensor is used for standard models with a single time embedding per batch, while a 3D tensor is used for Diffusion Forcing models where each frame has its own time embedding. This clarifies the expected input for different model variations.
* Docs: Update SkyReels V2 pipeline examples
Updates the docstring examples for the SkyReels V2 pipelines to reflect current best practices and API changes.
- Removes the `shift` parameter from pipeline call examples, as it is now configured directly on the scheduler.
- Replaces the `set_ar_attention` method call with the `causal_block_size` argument in the pipeline call for diffusion forcing examples.
- Adjusts recommended parameters for I2V and V2V examples, including inference steps, guidance scale, and `ar_step`.
* Refactor: Remove `shift` parameter from SkyReelsV2 pipelines
Removes the `shift` parameter from the call signature of all SkyReelsV2 pipelines.
This parameter is a scheduler-specific configuration and should be set directly on the scheduler during its initialization, rather than being passed at runtime through the pipeline. This change simplifies the pipeline API.
Usage examples are updated to reflect that the `shift` value should now be passed when creating the `FlowMatchUniPCMultistepScheduler`.
* Refactors SkyReelsV2 image-to-video tests and adds last image case
Simplifies the test suite by removing a duplicated test class and streamlining the dummy component and input generation.
Adds a new test to verify the pipeline's behavior when a `last_image` is provided as input for conditioning.
* test: Add image components to SkyReelsV2 pipeline test
Adds the `image_encoder` and `image_processor` to the test components for the image-to-video pipeline.
Also replaces a hardcoded value for the positional embedding sequence length with a more descriptive calculation, improving clarity.
* test: Add callback configuration test for SkyReelsV2DiffusionForcingVideoToVideoPipeline
test: Add callback test for SkyReelsV2DFV2V pipeline
Adds a test to validate the callback functionality for the `SkyReelsV2DiffusionForcingVideoToVideoPipeline`.
This test confirms that `callback_on_step_end` is invoked correctly and can modify the pipeline's state during inference. It uses a callback to dynamically increase the `guidance_scale` and asserts that the final value is as expected.
The implementation correctly accounts for the nested denoising loops present in diffusion forcing pipelines.
* style
* fix: Update image_encoder type to CLIPVisionModelWithProjection in SkyReelsV2ImageToVideoPipeline
* UP
* Add conversion support for SkyReels-V2-FLF2V models
Adds configurations for three new FLF2V model variants (1.3B-540P, 14B-540P, and 14B-720P) to the conversion script.
This change also introduces specific handling to zero out the image positional embeddings for these models and updates the main script to correctly initialize the image-to-video pipeline.
* Docs: Update and simplify SkyReels V2 usage examples
Simplifies the text-to-video example by removing the manual group offloading configuration, making it more straightforward.
Adds comments to pipeline parameters to clarify their purpose and provides guidance for different resolutions and long video generation.
Introduces a new section with a code example for the video-to-video pipeline.
* style
* docs: Add SkyReels-V2 FLF2V 1.3B model to supported models list
* docs: Update SkyReels-V2 documentation
* Move the initialization of the `gradient_checkpointing` attribute to its suggested location.
* Refactor: Use logger for long video progress messages
Replaces `print()` calls with `logger.debug()` for reporting progress during long video generation in SkyReelsV2DF pipelines.
This change reduces console output verbosity for standard runs while allowing developers to view progress by enabling debug-level logging.
* Refactor SkyReelsV2 timestep embedding into a module
Extract the sinusoidal timestep embedding logic into a new `SkyReelsV2Timesteps` `nn.Module`.
This change encapsulates the embedding generation, which simplifies the `SkyReelsV2TimeTextImageEmbedding` class and improves code modularity.
* Fix: Preserve original shape in timestep embeddings
Reshapes the timestep embedding tensor to match the original input shape.
This ensures that batched timestep inputs retain their batch dimension after embedding, preventing potential shape mismatches.
* style
* Refactor: Move SkyReelsV2Timesteps to model file
Colocates the `SkyReelsV2Timesteps` class with the SkyReelsV2 transformer model.
This change moves model-specific timestep embedding logic from the general embeddings module to the transformer's own file, improving modularity and making the model more self-contained.
* Refactor parameter dtype retrieval to use utility function
Replaces manual parameter iteration with the `get_parameter_dtype` helper to determine the time embedder's data type.
This change improves code readability and centralizes the logic.
* Add comments to track the tensor shape transformations
* Add copied froms
* style
* fix-copies
* up
* Remove FlowMatchUniPCMultistepScheduler
Deletes the `FlowMatchUniPCMultistepScheduler` as it is no longer being used.
* Refactor: Replace FlowMatchUniPC scheduler with UniPC
Removes the `FlowMatchUniPCMultistepScheduler` and integrates its functionality into the existing `UniPCMultistepScheduler`.
This consolidation is achieved by using the `use_flow_sigmas=True` parameter in `UniPCMultistepScheduler`, simplifying the scheduler API and reducing code duplication. All usages, documentation, and tests are updated accordingly.
* style
* Remove text_encoder parameter from SkyReelsV2DiffusionForcingPipeline initialization
* Docs: Rename `pipe` to `pipeline` in SkyReels examples
Updates the variable name from `pipe` to `pipeline` across all SkyReels V2 documentation examples. This change improves clarity and consistency.
* Fix: Rename shift parameter to flow_shift in SkyReels-V2 examples
* Fix: Rename shift parameter to flow_shift in example documentation across SkyReels-V2 files
* Fix: Rename shift parameter to flow_shift in UniPCMultistepScheduler initialization across SkyReels test files
* Removes unused generator argument from scheduler step
The `generator` parameter is not used by the scheduler's `step` method within the SkyReelsV2 diffusion forcing pipelines. This change removes the unnecessary argument from the method call for code clarity and consistency.
* Fix: Update time_embedder_dtype assignment to use the first parameter's dtype in SkyReelsV2TimeTextImageEmbedding
* style
* Refactor: Use get_parameter_dtype utility function
Replaces manual parameter iteration with the `get_parameter_dtype` helper.
* Fix: Prevent (potential) error in parameter dtype check
Adds a check to ensure the `_keep_in_fp32_modules` attribute exists on a parameter before it is accessed.
This prevents a potential `AttributeError`, making the utility function more robust when used with models that do not define this attribute.
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Update pipeline_onnx_stable_diffusion.py to remove float64
init_noise_sigma was being set as float64 before multiplying with latents, which changed latents into float64 too, which caused errors with onnxruntime since the latter wanted float16.
* Update pipeline_onnx_stable_diffusion_inpaint.py to remove float64
init_noise_sigma was being set as float64 before multiplying with latents, which changed latents into float64 too, which caused errors with onnxruntime since the latter wanted float16.
* Update pipeline_onnx_stable_diffusion_upscale.py to remove float64
init_noise_sigma was being set as float64 before multiplying with latents, which changed latents into float64 too, which caused errors with onnxruntime since the latter wanted float16.
* Update pipeline_onnx_stable_diffusion.py with comment for previous commit
Added comment on purpose of init_noise_sigma. This comment exists in related scripts that use the same line of code, but it was missing here.
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* remove k-diffusion as we don't use it from the core.
* Revert "remove k-diffusion as we don't use it from the core."
This reverts commit 8bc86925a0.
* pin k-diffusion
* FIX set_lora_device when target layers differ
Resolves#11833
Fixes a bug that occurs after calling set_lora_device when multiple LoRA
adapters are loaded that target different layers.
Note: Technically, the accompanying test does not require a GPU because
the bug is triggered even if the parameters are already on the
corresponding device, i.e. loading on CPU and then changing the device
to CPU is sufficient to cause the bug. However, this may be optimized
away in the future, so I decided to test with GPU.
* Update docstring to warn about device mismatch
* Extend docstring with an example
* Fix docstring
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* ENH Improve speed of expanding LoRA scales
Resolves#11816
The following call proved to be a bottleneck when setting a lot of LoRA
adapters in diffusers:
cdaf84a708/src/diffusers/loaders/peft.py (L482)
This is because we would repeatedly call unet.state_dict(), even though
in the standard case, it is not necessary:
cdaf84a708/src/diffusers/loaders/unet_loader_utils.py (L55)
This PR fixes this by deferring this call, so that it is only run when
it's necessary, not earlier.
* Small fix
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* feat: use exclude modules to loraconfig.
* version-guard.
* tests and version guard.
* remove print.
* describe the test
* more detailed warning message + shift to debug
* update
* update
* update
* remove test
* ⚡️ Speed up method `AutoencoderKLWan.clear_cache` by 886%
**Key optimizations:**
- Compute the number of `WanCausalConv3d` modules in each model (`encoder`/`decoder`) **only once during initialization**, store in `self._cached_conv_counts`. This removes unnecessary repeated tree traversals at every `clear_cache` call, which was the main bottleneck (from profiling).
- The internal helper `_count_conv3d_fast` is optimized via a generator expression with `sum` for efficiency.
All comments from the original code are preserved, except for updated or removed local docstrings/comments relevant to changed lines.
**Function signatures and outputs remain unchanged.**
* Apply style fixes
* Apply suggestions from code review
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Apply style fixes
---------
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aseem Saxena <aseem.bits@gmail.com>
* Add Pruna optimization framework documentation
- Introduced a new section for Pruna in the table of contents.
- Added comprehensive documentation for Pruna, detailing its optimization techniques, installation instructions, and examples for optimizing and evaluating models
* Enhance Pruna documentation with image alt text and code block formatting
- Added alt text to images for better accessibility and context.
- Changed code block syntax from diff to python for improved clarity.
* Add installation section to Pruna documentation
- Introduced a new installation section in the Pruna documentation to guide users on how to install the framework.
- Enhanced the overall clarity and usability of the documentation for new users.
* Update pruna.md
* Update pruna.md
* Update Pruna documentation for model optimization and evaluation
- Changed section titles for consistency and clarity, from "Optimizing models" to "Optimize models" and "Evaluating and benchmarking optimized models" to "Evaluate and benchmark models".
- Enhanced descriptions to clarify the use of `diffusers` models and the evaluation process.
- Added a new example for evaluating standalone `diffusers` models.
- Updated references and links for better navigation within the documentation.
* Refactor Pruna documentation for clarity and consistency
- Removed outdated references to FLUX-juiced and streamlined the explanation of benchmarking.
- Enhanced the description of evaluating standalone `diffusers` models.
- Cleaned up code examples by removing unnecessary imports and comments for better readability.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Enhance Pruna documentation with new examples and clarifications
- Added an image to illustrate the optimization process.
- Updated the explanation for sharing and loading optimized models on the Hugging Face Hub.
- Clarified the evaluation process for optimized models using the EvaluationAgent.
- Improved descriptions for defining metrics and evaluating standalone diffusers models.
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* support text-to-image
* update example
* make fix-copies
* support use_flow_sigmas in EDM scheduler instead of maintain cosmos-specific scheduler
* support video-to-world
* update
* rename text2image pipeline
* make fix-copies
* add t2i test
* add test for v2w pipeline
* support edm dpmsolver multistep
* update
* update
* update
* update tests
* fix tests
* safety checker
* make conversion script work without guardrail
* add clarity in documentation for device_map
* docs
* fix how compiler tester mixins are used.
* propagate
* more
* typo.
* fix tests
* fix order of decroators.
* clarify more.
* more test cases.
* fix doc
* fix device_map docstring in pipeline_utils.
* more examples
* more
* update
* remove code for stuff that is already supported.
* fix stuff.
* allow loading from repo with dot in name
* put new arg at the end to avoid breaking compatibility
* add test for loading repo with dot in name
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update pipeline_flux_inpaint.py to fix padding_mask_crop returning only the inpainted area and not the entire image.
* Apply style fixes
* Update src/diffusers/pipelines/flux/pipeline_flux_inpaint.py
* Add community class StableDiffusionXL_T5Pipeline
Will be used with base model opendiffusionai/stablediffusionxl_t5
* Changed pooled_embeds to use projection instead of slice
* "make style" tweaks
* Added comments to top of code
* Apply style fixes
[examples] flux-control: use num_training_steps_for_scheduler in get_scheduler instead of args.max_train_steps * accelerator.num_processes
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add guidance rescale
* update docs
* support adaptive instance norm filter
* fix custom timesteps support
* add custom timestep example to docs
* add a note about best generation settings being available only in the original repository
* use original org hub ids instead of personal
* make fix-copies
---------
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
* [gguf] Refactor __torch_function__ to avoid unnecessary computation
This helps with torch.compile compilation lantency. Avoiding unnecessary
computation should also lead to a slightly improved eager latency.
* Apply style fixes
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
check_repository_consistency:
needs:check_code_quality
runs-on:ubuntu-22.04
steps:
- uses:actions/checkout@v6
- name:Set up Python
uses:actions/setup-python@v6
with:
python-version:"3.10"
- name:Install dependencies
run:|
pip install --upgrade pip
pip install .[quality]
- name:Check repo consistency
run:|
python utils/check_copies.py
python utils/check_dummies.py
python utils/check_support_list.py
make deps_table_check_updated
- name:Check if failure
if:${{ failure() }}
run:|
echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
PIPELINE_USAGE_CUTOFF:1000000000# set high cutoff so that only always-test pipelines run
@@ -31,14 +32,14 @@ jobs:
check_code_quality:
runs-on:ubuntu-22.04
steps:
- uses:actions/checkout@v3
- uses:actions/checkout@v6
- name:Set up Python
uses:actions/setup-python@v4
uses:actions/setup-python@v6
with:
python-version:"3.8"
- name:Install dependencies
run:|
python -m pip install --upgrade pip
pip install --upgrade pip
pip install .[quality]
- name:Check quality
run:make quality
@@ -51,14 +52,14 @@ jobs:
needs:check_code_quality
runs-on:ubuntu-22.04
steps:
- uses:actions/checkout@v3
- uses:actions/checkout@v6
- name:Set up Python
uses:actions/setup-python@v4
uses:actions/setup-python@v6
with:
python-version:"3.8"
- name:Install dependencies
run:|
python -m pip install --upgrade pip
pip install --upgrade pip
pip install .[quality]
- name:Check repo consistency
run:|
@@ -70,7 +71,7 @@ jobs:
if:${{ failure() }}
run:|
echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# How to contribute to Diffusers 🧨
We ❤️ contributions from the open-source community! Everyone is welcome, and all types of participation –not just code– are valued and appreciated. Answering questions, helping others, reaching out, and improving the documentation are all immensely valuable to the community, so don't be afraid and get involved if you're up for it!
Everyone is encouraged to start by saying 👋 in our public Discord channel. We discuss the latest trends in diffusion models, ask questions, show off personal projects, help each other with contributions, or just hang out ☕. <a href="https://discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=Discord&logoColor=white"></a>
Whichever way you choose to contribute, we strive to be part of an open, welcoming, and kind community. Please, read our [code of conduct](https://github.com/huggingface/diffusers/blob/main/CODE_OF_CONDUCT.md) and be mindful to respect it during your interactions. We also recommend you become familiar with the [ethical guidelines](https://huggingface.co/docs/diffusers/conceptual/ethical_guidelines) that guide our project and ask you to adhere to the same principles of transparency and responsibility.
We enormously value feedback from the community, so please do not be afraid to speak up if you believe you have valuable feedback that can help improve the library - every message, comment, issue, and pull request (PR) is read and considered.
## Overview
You can contribute in many ways ranging from answering questions on issues to adding new diffusion models to
the core library.
In the following, we give an overview of different ways to contribute, ranked by difficulty in ascending order. All of them are valuable to the community.
* 1. Asking and answering questions on [the Diffusers discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers) or on [Discord](https://discord.gg/G7tWnz98XR).
* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose).
* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues).
* 4. Fix a simple issue, marked by the "Good first issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
* 5. Contribute to the [documentation](https://github.com/huggingface/diffusers/tree/main/docs/source).
* 6. Contribute a [Community Pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3Acommunity-examples).
* 7. Contribute to the [examples](https://github.com/huggingface/diffusers/tree/main/examples).
* 8. Fix a more difficult issue, marked by the "Good second issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22).
* 9. Add a new pipeline, model, or scheduler, see ["New Pipeline/Model"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) and ["New scheduler"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22) issues. For this contribution, please have a look at [Design Philosophy](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md).
As said before, **all contributions are valuable to the community**.
In the following, we will explain each contribution a bit more in detail.
For all contributions 4-9, you will need to open a PR. It is explained in detail how to do so in [Opening a pull request](#how-to-open-a-pr).
### 1. Asking and answering questions on the Diffusers discussion forum or on the Diffusers Discord
Any question or comment related to the Diffusers library can be asked on the [discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/) or on [Discord](https://discord.gg/G7tWnz98XR). Such questions and comments include (but are not limited to):
- Reports of training or inference experiments in an attempt to share knowledge
- Presentation of personal projects
- Questions to non-official training examples
- Project proposals
- General feedback
- Paper summaries
- Asking for help on personal projects that build on top of the Diffusers library
- General questions
- Ethical questions regarding diffusion models
- ...
Every question that is asked on the forum or on Discord actively encourages the community to publicly
share knowledge and might very well help a beginner in the future who has the same question you're
having. Please do pose any questions you might have.
In the same spirit, you are of immense help to the community by answering such questions because this way you are publicly documenting knowledge for everybody to learn from.
**Please** keep in mind that the more effort you put into asking or answering a question, the higher
the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database.
In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formatted/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
**NOTE about channels**:
[*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically. Hence, it's easier to look up questions and answers that we posted some time ago.
In addition, questions and answers posted in the forum can easily be linked to.
In contrast, *Discord* has a chat-like format that invites fast back-and-forth communication.
While it will most likely take less time for you to get an answer to your question on Discord, your
question won't be visible anymore over time. Also, it's much harder to find information that was posted a while back on Discord. We therefore strongly recommend using the forum for high-quality questions and answers in an attempt to create long-lasting knowledge for the community. If discussions on Discord lead to very interesting answers and conclusions, we recommend posting the results on the forum to make the information more available for future readers.
### 2. Opening new issues on the GitHub issues tab
The 🧨 Diffusers library is robust and reliable thanks to the users who notify us of
the problems they encounter. So thank you for reporting an issue.
Remember, GitHub issues are reserved for technical questions directly related to the Diffusers library, bug reports, feature requests, or feedback on the library design.
In a nutshell, this means that everything that is **not** related to the **code of the Diffusers library** (including the documentation) should **not** be asked on GitHub, but rather on either the [forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR).
**Please consider the following guidelines when opening a new issue**:
- Make sure you have searched whether your issue has already been asked before (use the search bar on GitHub under Issues).
- Please never report a new issue on another (related) issue. If another issue is highly related, please
open a new issue nevertheless and link to the related issue.
- Make sure your issue is written in English. Please use one of the great, free online translation services, such as [DeepL](https://www.deepl.com/translator) to translate from your native language to English if you are not comfortable in English.
- Check whether your issue might be solved by updating to the newest Diffusers version. Before posting your issue, please make sure that `python -c "import diffusers; print(diffusers.__version__)"` is higher or matches the latest Diffusers version.
- Remember that the more effort you put into opening a new issue, the higher the quality of your answer will be and the better the overall quality of the Diffusers issues.
New issues usually include the following.
#### 2.1. Reproducible, minimal bug reports
A bug report should always have a reproducible code snippet and be as minimal and concise as possible.
This means in more detail:
- Narrow the bug down as much as you can, **do not just dump your whole code file**.
- Format your code.
- Do not include any external libraries except for Diffusers depending on them.
- **Always** provide all necessary information about your environment; for this, you can run: `diffusers-cli env` in your shell and copy-paste the displayed information to the issue.
- Explain the issue. If the reader doesn't know what the issue is and why it is an issue, she cannot solve it.
- **Always** make sure the reader can reproduce your issue with as little effort as possible. If your code snippet cannot be run because of missing libraries or undefined variables, the reader cannot help you. Make sure your reproducible code snippet is as minimal as possible and can be copy-pasted into a simple Python shell.
- If in order to reproduce your issue a model and/or dataset is required, make sure the reader has access to that model or dataset. You can always upload your model or dataset to the [Hub](https://huggingface.co) to make it easily downloadable. Try to keep your model and dataset as small as possible, to make the reproduction of your issue as effortless as possible.
For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
You can open a bug report [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=bug&projects=&template=bug-report.yml).
#### 2.2. Feature requests
A world-class feature request addresses the following points:
1. Motivation first:
* Is it related to a problem/frustration with the library? If so, please explain
why. Providing a code snippet that demonstrates the problem is best.
* Is it related to something you would need for a project? We'd love to hear
about it!
* Is it something you worked on and think could benefit the community?
Awesome! Tell us what problem it solved for you.
2. Write a *full paragraph* describing the feature;
3. Provide a **code snippet** that demonstrates its future use;
4. In case this is related to a paper, please attach a link;
5. Attach any additional information (drawings, screenshots, etc.) you think may help.
You can open a feature request [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=).
#### 2.3 Feedback
Feedback about the library design and why it is good or not good helps the core maintainers immensely to build a user-friendly library. To understand the philosophy behind the current design philosophy, please have a look [here](https://huggingface.co/docs/diffusers/conceptual/philosophy). If you feel like a certain design choice does not fit with the current design philosophy, please explain why and how it should be changed. If a certain design choice follows the design philosophy too much, hence restricting use cases, explain why and how it should be changed.
If a certain design choice is very useful for you, please also leave a note as this is great feedback for future design decisions.
You can open an issue about feedback [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).
#### 2.4 Technical questions
Technical questions are mainly about why certain code of the library was written in a certain way, or what a certain part of the code does. Please make sure to link to the code in question and please provide detail on
why this part of the code is difficult to understand.
You can open an issue about a technical question [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=bug&template=bug-report.yml).
#### 2.5 Proposal to add a new model, scheduler, or pipeline
If the diffusion model community released a new model, pipeline, or scheduler that you would like to see in the Diffusers library, please provide the following information:
* Short description of the diffusion pipeline, model, or scheduler and link to the paper or public release.
* Link to any of its open-source implementation.
* Link to the model weights if they are available.
If you are willing to contribute to the model yourself, let us know so we can best guide you. Also, don't forget
to tag the original author of the component (model, scheduler, pipeline, etc.) by GitHub handle if you can find it.
You can open a request for a model/pipeline/scheduler [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=New+model%2Fpipeline%2Fscheduler&template=new-model-addition.yml).
### 3. Answering issues on the GitHub issues tab
Answering issues on GitHub might require some technical knowledge of Diffusers, but we encourage everybody to give it a try even if you are not 100% certain that your answer is correct.
Some tips to give a high-quality answer to an issue:
- Be as concise and minimal as possible.
- Stay on topic. An answer to the issue should concern the issue and only the issue.
- Provide links to code, papers, or other sources that prove or encourage your point.
- Answer in code. If a simple code snippet is the answer to the issue or shows how the issue can be solved, please provide a fully reproducible code snippet.
Also, many issues tend to be simply off-topic, duplicates of other issues, or irrelevant. It is of great
help to the maintainers if you can answer such issues, encouraging the author of the issue to be
more precise, provide the link to a duplicated issue or redirect them to [the forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR).
If you have verified that the issued bug report is correct and requires a correction in the source code,
please have a look at the next sections.
For all of the following contributions, you will need to open a PR. It is explained in detail how to do so in the [Opening a pull request](#how-to-open-a-pr) section.
### 4. Fixing a "Good first issue"
*Good first issues* are marked by the [Good first issue](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) label. Usually, the issue already
explains how a potential solution should look so that it is easier to fix.
If the issue hasn't been closed and you would like to try to fix this issue, you can just leave a message "I would like to try this issue.". There are usually three scenarios:
- a.) The issue description already proposes a fix. In this case and if the solution makes sense to you, you can open a PR or draft PR to fix it.
- b.) The issue description does not propose a fix. In this case, you can ask what a proposed fix could look like and someone from the Diffusers team should answer shortly. If you have a good idea of how to fix it, feel free to directly open a PR.
- c.) There is already an open PR to fix the issue, but the issue hasn't been closed yet. If the PR has gone stale, you can simply open a new PR and link to the stale PR. PRs often go stale if the original contributor who wanted to fix the issue suddenly cannot find the time anymore to proceed. This often happens in open-source and is very normal. In this case, the community will be very happy if you give it a new try and leverage the knowledge of the existing PR. If there is already a PR and it is active, you can help the author by giving suggestions, reviewing the PR or even asking whether you can contribute to the PR.
### 5. Contribute to the documentation
A good library **always** has good documentation! The official documentation is often one of the first points of contact for new users of the library, and therefore contributing to the documentation is a **highly
valuable contribution**.
Contributing to the library can have many forms:
- Correcting spelling or grammatical errors.
- Correct incorrect formatting of the docstring. If you see that the official documentation is weirdly displayed or a link is broken, we are very happy if you take some time to correct it.
- Correct the shape or dimensions of a docstring input or output tensor.
- Clarify documentation that is hard to understand or incorrect.
- Update outdated code examples.
- Translating the documentation to another language.
Anything displayed on [the official Diffusers doc page](https://huggingface.co/docs/diffusers/index) is part of the official documentation and can be corrected, adjusted in the respective [documentation source](https://github.com/huggingface/diffusers/tree/main/docs/source).
Please have a look at [this page](https://github.com/huggingface/diffusers/tree/main/docs) on how to verify changes made to the documentation locally.
### 6. Contribute a community pipeline
[Pipelines](https://huggingface.co/docs/diffusers/api/pipelines/overview) are usually the first point of contact between the Diffusers library and the user.
Pipelines are examples of how to use Diffusers [models](https://huggingface.co/docs/diffusers/api/models/overview) and [schedulers](https://huggingface.co/docs/diffusers/api/schedulers/overview).
We support two types of pipelines:
- Official Pipelines
- Community Pipelines
Both official and community pipelines follow the same design and consist of the same type of components.
Official pipelines are tested and maintained by the core maintainers of Diffusers. Their code
resides in [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines).
In contrast, community pipelines are contributed and maintained purely by the **community** and are **not** tested.
They reside in [examples/community](https://github.com/huggingface/diffusers/tree/main/examples/community) and while they can be accessed via the [PyPI diffusers package](https://pypi.org/project/diffusers/), their code is not part of the PyPI distribution.
The reason for the distinction is that the core maintainers of the Diffusers library cannot maintain and test all
possible ways diffusion models can be used for inference, but some of them may be of interest to the community.
Officially released diffusion pipelines,
such as Stable Diffusion are added to the core src/diffusers/pipelines package which ensures
high quality of maintenance, no backward-breaking code changes, and testing.
More bleeding edge pipelines should be added as community pipelines. If usage for a community pipeline is high, the pipeline can be moved to the official pipelines upon request from the community. This is one of the ways we strive to be a community-driven library.
To add a community pipeline, one should add a <name-of-the-community>.py file to [examples/community](https://github.com/huggingface/diffusers/tree/main/examples/community) and adapt the [examples/community/README.md](https://github.com/huggingface/diffusers/tree/main/examples/community/README.md) to include an example of the new pipeline.
An example can be seen [here](https://github.com/huggingface/diffusers/pull/2400).
Community pipeline PRs are only checked at a superficial level and ideally they should be maintained by their original authors.
Contributing a community pipeline is a great way to understand how Diffusers models and schedulers work. Having contributed a community pipeline is usually the first stepping stone to contributing an official pipeline to the
core package.
### 7. Contribute to training examples
Diffusers examples are a collection of training scripts that reside in [examples](https://github.com/huggingface/diffusers/tree/main/examples).
We support two types of training examples:
- Official training examples
- Research training examples
Research training examples are located in [examples/research_projects](https://github.com/huggingface/diffusers/tree/main/examples/research_projects) whereas official training examples include all folders under [examples](https://github.com/huggingface/diffusers/tree/main/examples) except the `research_projects` and `community` folders.
The official training examples are maintained by the Diffusers' core maintainers whereas the research training examples are maintained by the community.
This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author.
Both official training and research examples consist of a directory that contains one or more training scripts, a `requirements.txt` file, and a `README.md` file. In order for the user to make use of the
training examples, it is required to clone the repository:
Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt).
Training examples of the Diffusers library should adhere to the following philosophy:
- All the code necessary to run the examples should be found in a single Python file.
- One should be able to run the example from the command line with `python <your-example>.py --args`.
- Examples should be kept simple and serve as **an example** on how to use Diffusers for training. The purpose of example scripts is **not** to create state-of-the-art diffusion models, but rather to reproduce known training schemes without adding too much custom logic. As a byproduct of this point, our examples also strive to serve as good educational materials.
To contribute an example, it is highly recommended to look at already existing examples such as [dreambooth](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) to get an idea of how they should look like.
We strongly advise contributors to make use of the [Accelerate library](https://github.com/huggingface/accelerate) as it's tightly integrated
with Diffusers.
Once an example script works, please make sure to add a comprehensive `README.md` that states how to use the example exactly. This README should include:
- An example command on how to run the example script as shown [here e.g.](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#running-locally-with-pytorch).
- A link to some training results (logs, models, ...) that show what the user can expect as shown [here e.g.](https://api.wandb.ai/report/patrickvonplaten/xm6cd5q5).
- If you are adding a non-official/research training example, **please don't forget** to add a sentence that you are maintaining this training example which includes your git handle as shown [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/intel_opts#diffusers-examples-with-intel-optimizations).
If you are contributing to the official training examples, please also make sure to add a test to [examples/test_examples.py](https://github.com/huggingface/diffusers/blob/main/examples/test_examples.py). This is not necessary for non-official training examples.
### 8. Fixing a "Good second issue"
*Good second issues* are marked by the [Good second issue](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22) label. Good second issues are
usually more complicated to solve than [Good first issues](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
The issue description usually gives less guidance on how to fix the issue and requires
a decent understanding of the library by the interested contributor.
If you are interested in tackling a good second issue, feel free to open a PR to fix it and link the PR to the issue. If you see that a PR has already been opened for this issue but did not get merged, have a look to understand why it wasn't merged and try to open an improved PR.
Good second issues are usually more difficult to get merged compared to good first issues, so don't hesitate to ask for help from the core maintainers. If your PR is almost finished the core maintainers can also jump into your PR and commit to it in order to get it merged.
### 9. Adding pipelines, models, schedulers
Pipelines, models, and schedulers are the most important pieces of the Diffusers library.
They provide easy access to state-of-the-art diffusion technologies and thus allow the community to
build powerful generative AI applications.
By adding a new model, pipeline, or scheduler you might enable a new powerful use case for any of the user interfaces relying on Diffusers which can be of immense value for the whole generative AI ecosystem.
Diffusers has a couple of open feature requests for all three components - feel free to gloss over them
if you don't know yet what specific component you would like to add:
- [Model or pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22)
Before adding any of the three components, it is strongly recommended that you give the [Philosophy guide](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md) a read to better understand the design of any of the three components. Please be aware that
we cannot merge model, scheduler, or pipeline additions that strongly diverge from our design philosophy
as it will lead to API inconsistencies. If you fundamentally disagree with a design choice, please
open a [Feedback issue](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=) instead so that it can be discussed whether a certain design
pattern/design choice shall be changed everywhere in the library and whether we shall update our design philosophy. Consistency across the library is very important for us.
Please make sure to add links to the original codebase/paper to the PR and ideally also ping the
original author directly on the PR so that they can follow the progress and potentially help with questions.
If you are unsure or stuck in the PR, don't hesitate to leave a message to ask for a first review or help.
## How to write a good issue
**The better your issue is written, the higher the chances that it will be quickly resolved.**
1. Make sure that you've used the correct template for your issue. You can pick between *Bug Report*, *Feature Request*, *Feedback about API Design*, *New model/pipeline/scheduler addition*, *Forum*, or a blank issue. Make sure to pick the correct one when opening [a new issue](https://github.com/huggingface/diffusers/issues/new/choose).
2.**Be precise**: Give your issue a fitting title. Try to formulate your issue description as simple as possible. The more precise you are when submitting an issue, the less time it takes to understand the issue and potentially solve it. Make sure to open an issue for one issue only and not for multiple issues. If you found multiple issues, simply open multiple issues. If your issue is a bug, try to be as precise as possible about what bug it is - you should not just write "Error in diffusers".
3.**Reproducibility**: No reproducible code snippet == no solution. If you encounter a bug, maintainers **have to be able to reproduce** it. Make sure that you include a code snippet that can be copy-pasted into a Python interpreter to reproduce the issue. Make sure that your code snippet works, *i.e.* that there are no missing imports or missing links to images, ... Your issue should contain an error message **and** a code snippet that can be copy-pasted without any changes to reproduce the exact same error message. If your issue is using local model weights or local data that cannot be accessed by the reader, the issue cannot be solved. If you cannot share your data or model, try to make a dummy model or dummy data.
4.**Minimalistic**: Try to help the reader as much as you can to understand the issue as quickly as possible by staying as concise as possible. Remove all code / all information that is irrelevant to the issue. If you have found a bug, try to create the easiest code example you can to demonstrate your issue, do not just dump your whole workflow into the issue as soon as you have found a bug. E.g., if you train a model and get an error at some point during the training, you should first try to understand what part of the training code is responsible for the error and try to reproduce it with a couple of lines. Try to use dummy data instead of full datasets.
5. Add links. If you are referring to a certain naming, method, or model make sure to provide a link so that the reader can better understand what you mean. If you are referring to a specific PR or issue, make sure to link it to your issue. Do not assume that the reader knows what you are talking about. The more links you add to your issue the better.
6. Formatting. Make sure to nicely format your issue by formatting code into Python code syntax, and error messages into normal code syntax. See the [official GitHub formatting docs](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) for more information.
7. Think of your issue not as a ticket to be solved, but rather as a beautiful entry to a well-written encyclopedia. Every added issue is a contribution to publicly available knowledge. By adding a nicely written issue you not only make it easier for maintainers to solve your issue, but you are helping the whole community to better understand a certain aspect of the library.
## How to write a good PR
1. Be a chameleon. Understand existing design patterns and syntax and make sure your code additions flow seamlessly into the existing code base. Pull requests that significantly diverge from existing design patterns or user interfaces will not be merged.
2. Be laser focused. A pull request should solve one problem and one problem only. Make sure to not fall into the trap of "also fixing another problem while we're adding it". It is much more difficult to review pull requests that solve multiple, unrelated problems at once.
3. If helpful, try to add a code snippet that displays an example of how your addition can be used.
4. The title of your pull request should be a summary of its contribution.
5. If your pull request addresses an issue, please mention the issue number in
the pull request description to make sure they are linked (and people
consulting the issue know you are working on it);
6. To indicate a work in progress please prefix the title with `[WIP]`. These
are useful to avoid duplicated work, and to differentiate it from PRs ready
to be merged;
7. Try to formulate and format your text as explained in [How to write a good issue](#how-to-write-a-good-issue).
8. Make sure existing tests pass;
9. Add high-coverage tests. No quality testing = no merge.
- If you are adding new `@slow` tests, make sure they pass using
CircleCI does not run the slow tests, but GitHub Actions does every night!
10. All public methods must have informative docstrings that work nicely with markdown. See [`pipeline_latent_diffusion.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py) for an example.
11. Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos, and other non-text files. We prefer to leverage a hf.co hosted `dataset` like
[`hf-internal-testing`](https://huggingface.co/hf-internal-testing) or [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images) to place these files.
If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate your images
to this dataset.
## How to open a PR
Before writing code, we strongly advise you to search through the existing PRs or
issues to make sure that nobody is already working on the same thing. If you are
unsure, it is always a good idea to open an issue to get some feedback.
You will need basic `git` proficiency to be able to contribute to
🧨 Diffusers. `git` is not the easiest tool to use but it has the greatest
manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.
Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/42f25d601a910dceadaee6c44345896b4cfa9928/setup.py#L270)):
1. Fork the [repository](https://github.com/huggingface/diffusers) by
clicking on the 'Fork' button on the repository's page. This creates a copy of the code
under your GitHub user account.
2. Clone your fork to your local disk, and add the base repository as a remote:
### Syncing forked main with upstream (HuggingFace) main
To avoid pinging the upstream repository which adds reference notes to each upstream PR and sends unnecessary notifications to the developers involved in these PRs,
when syncing the main branch of a forked repository, please, follow these steps:
1. When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead, merge directly into the forked main.
2. If a PR is absolutely necessary, use the following steps after checking out your branch:
```bash
$ git checkout -b your-branch-for-syncing
$ git pull --squash --no-commit upstream main
$ git commit -m '<your message without GitHub references>'
We recommend installing 🤗 Diffusers in a virtual environment from PyPI or Conda. For more details about installing [PyTorch](https://pytorch.org/get-started/locally/) and [Flax](https://flax.readthedocs.io/en/latest/#installation), please refer to their official documentation.
We recommend installing 🤗 Diffusers in a virtual environment from PyPI or Conda. For more details about installing [PyTorch](https://pytorch.org/get-started/locally/), please refer to their official documentation.
### PyTorch
@@ -53,14 +53,6 @@ With `conda` (maintained by the community):
conda install -c conda-forge diffusers
```
### Flax
With `pip` (official package):
```bash
pip install --upgrade diffusers[flax]
```
### Apple Silicon (M1/M2) support
Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggingface.co/docs/diffusers/optimization/mps) guide.
@@ -179,7 +171,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
Welcome to Diffusers Benchmarks. These benchmarks are use to obtain latency and memory information of the most popular models across different scenarios such as:
* Base case i.e., when using `torch.bfloat16` and `torch.nn.functional.scaled_dot_product_attention`.
* Base + `torch.compile()`
* NF4 quantization
* Layerwise upcasting
Instead of full diffusion pipelines, only the forward pass of the respective model classes (such as `FluxTransformer2DModel`) is tested with the real checkpoints (such as `"black-forest-labs/FLUX.1-dev"`).
The entrypoint to running all the currently available benchmarks is in `run_all.py`. However, one can run the individual benchmarks, too, e.g., `python benchmarking_flux.py`. It should produce a CSV file containing various information about the benchmarks run.
The benchmarks are run on a weekly basis and the CI is defined in [benchmark.yml](../.github/workflows/benchmark.yml).
## Running the benchmarks manually
First set up `torch` and install `diffusers` from the root of the directory:
```py
pipinstall-e".[quality,test]"
```
Then make sure the other dependencies are installed:
```sh
cd benchmarks/
pip install -r requirements.txt
```
We need to be authenticated to access some of the checkpoints used during benchmarking:
```sh
hf auth login
```
We use an L40 GPU with 128GB RAM to run the benchmark CI. As such, the benchmarks are configured to run on NVIDIA GPUs. So, make sure you have access to a similar machine (or modify the benchmarking scripts accordingly).
Then you can either launch the entire benchmarking suite by running:
```sh
python run_all.py
```
Or, you can run the individual benchmarks.
## Customizing the benchmarks
We define "scenarios" to cover the most common ways in which these models are used. You can
define a new scenario, modifying an existing benchmark file:
You can also configure a new model-level benchmark and add it to the existing suite. To do so, just defining a valid benchmarking file like `benchmarking_flux.py` should be enough.
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -11,72 +11,32 @@ specific language governing permissions and limitations under the License. -->
# Caching methods
## Pyramid Attention Broadcast
Cache methods speedup diffusion transformers by storing and reusing intermediate outputs of specific layers, such as attention and feedforward layers, instead of recalculating them at each inference step.
[Pyramid Attention Broadcast](https://huggingface.co/papers/2408.12588) from Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You.
Pyramid Attention Broadcast (PAB) is a method that speeds up inference in diffusion models by systematically skipping attention computations between successive inference steps and reusing cached attention states. The attention states are not very different between successive inference steps. The most prominent difference is in the spatial attention blocks, not as much in the temporal attention blocks, and finally the least in the cross attention blocks. Therefore, many cross attention computation blocks can be skipped, followed by the temporal and spatial attention blocks. By combining other techniques like sequence parallelism and classifier-free guidance parallelism, PAB achieves near real-time video generation.
Enable PAB with [`~PyramidAttentionBroadcastConfig`] on any pipeline. For some benchmarks, refer to [this](https://github.com/huggingface/diffusers/pull/9562) pull request.
[FasterCache](https://huggingface.co/papers/2410.19355) from Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong.
FasterCache is a method that speeds up inference in diffusion transformers by:
- Reusing attention states between successive inference steps, due to high similarity between them
- Skipping unconditional branch prediction used in classifier-free guidance by revealing redundancies between unconditional and conditional branch outputs for the same timestep, and therefore approximating the unconditional branch output using the conditional branch output
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -14,11 +14,8 @@ specific language governing permissions and limitations under the License.
Schedulers from [`~schedulers.scheduling_utils.SchedulerMixin`] and models from [`ModelMixin`] inherit from [`ConfigMixin`] which stores all the parameters that are passed to their respective `__init__` methods in a JSON-configuration file.
<Tip>
To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`.
</Tip>
> [!TIP]
> To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `hf auth login`.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -20,6 +20,12 @@ All pipelines with [`VaeImageProcessor`] accept PIL Image, PyTorch tensor, or Nu
[[autodoc]] image_processor.VaeImageProcessor
## InpaintProcessor
The [`InpaintProcessor`] accepts `mask` and `image` inputs and process them together. Optionally, it can accept padding_mask_crop and apply mask overlay.
[[autodoc]] image_processor.InpaintProcessor
## VaeImageProcessorLDM3D
The [`VaeImageProcessorLDM3D`] accepts RGB and depth inputs and returns RGB and depth outputs.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -14,11 +14,8 @@ specific language governing permissions and limitations under the License.
[IP-Adapter](https://hf.co/papers/2308.06721) is a lightweight adapter that enables prompting a diffusion model with an image. This method decouples the cross-attention layers of the image and text features. The image features are generated from an image encoder.
<Tip>
Learn how to load an IP-Adapter checkpoint and image in the IP-Adapter [loading](../../using-diffusers/loading_adapters#ip-adapter) guide, and you can see how to use it in the [usage](../../using-diffusers/ip_adapter) guide.
</Tip>
> [!TIP]
> Learn how to load and use an IP-Adapter checkpoint and image in the [IP-Adapter](../../using-diffusers/ip_adapter) guide,.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -26,16 +26,22 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
- [`HunyuanVideoLoraLoaderMixin`] provides similar functions for [HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).
- [`Lumina2LoraLoaderMixin`] provides similar functions for [Lumina2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/lumina2).
- [`WanLoraLoaderMixin`] provides similar functions for [Wan](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan).
- [`SkyReelsV2LoraLoaderMixin`] provides similar functions for [SkyReels-V2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/skyreels_v2).
- [`CogView4LoraLoaderMixin`] provides similar functions for [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4).
- [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
- [`HiDreamImageLoraLoaderMixin`] provides similar functions for [HiDream Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream)
- [`QwenImageLoraLoaderMixin`] provides similar functions for [Qwen Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwen).
- [`ZImageLoraLoaderMixin`] provides similar functions for [Z-Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/zimage).
- [`Flux2LoraLoaderMixin`] provides similar functions for [Flux2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux2).
- [`LTX2LoraLoaderMixin`] provides similar functions for [Flux2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2).
- [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.
<Tip>
> [!TIP]
> To learn more about how to load LoRA weights, see the [LoRA](../../tutorials/using_peft_for_inference) loading guide.
To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
## LoraBaseMixin
</Tip>
[[autodoc]] loaders.lora_base.LoraBaseMixin
## StableDiffusionLoraLoaderMixin
@@ -53,6 +59,14 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,13 +12,10 @@ specific language governing permissions and limitations under the License.
# PEFT
Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`], [`SD3Transformer2DModel`] to operate with an adapter.
Diffusers supports loading adapters such as [LoRA](../../tutorials/using_peft_for_inference) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`], [`SD3Transformer2DModel`] to operate with an adapter.
<Tip>
Refer to the [Inference with PEFT](../../tutorials/using_peft_for_inference.md) tutorial for an overview of how to use PEFT in Diffusers for inference.
</Tip>
> [!TIP]
> Refer to the [Inference with PEFT](../../tutorials/using_peft_for_inference.md) tutorial for an overview of how to use PEFT in Diffusers for inference.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -16,11 +16,8 @@ Textual Inversion is a training method for personalizing models by learning new
[`TextualInversionLoaderMixin`] provides a function for loading Textual Inversion embeddings from Diffusers and Automatic1111 into the text encoder and loading a special token to activate the embeddings.
<Tip>
To learn more about how to load Textual Inversion embeddings, see the [Textual Inversion](../../using-diffusers/loading_adapters#textual-inversion) loading guide.
</Tip>
> [!TIP]
> To learn more about how to load Textual Inversion embeddings, see the [Textual Inversion](../../using-diffusers/textual_inversion_inference) loading guide.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -16,11 +16,8 @@ Some training methods - like LoRA and Custom Diffusion - typically target the UN
The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters.
<Tip>
To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
</Tip>
> [!TIP]
> To learn more about how to load LoRA weights, see the [LoRA](../../tutorials/using_peft_for_inference) guide.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# AsymmetricAutoencoderKL
Improved larger variational autoencoder (VAE) model with KL loss for inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://arxiv.org/abs/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.
Improved larger variational autoencoder (VAE) model with KL loss for inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://huggingface.co/papers/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,15 +12,7 @@ specific language governing permissions and limitations under the License.
# AutoModel
The `AutoModel` is designed to make it easy to load a checkpoint without needing to know the specific model class. `AutoModel` automatically retrieves the correct model class from the checkpoint `config.json` file.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderKLHunyuanImageRefiner
The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanImage2.1](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1) for its refiner pipeline.
The model can be loaded with the following code snippet.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# AutoencoderKL
The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.
The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://huggingface.co/papers/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.
The abstract from the paper is:
@@ -44,15 +44,3 @@ model = AutoencoderKL.from_single_file(url)
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderKLLTX2Audio
The 3D variational autoencoder (VAE) model with KL loss used in [LTX-2](https://huggingface.co/Lightricks/LTX-2) was introduced by Lightricks. This is for encoding and decoding audio latent representations.
The model can be loaded with the following code snippet.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.