* remove str option for quantization config in torchao
* Apply style fixes
* minor fixes
* Added AOBaseConfig docs to torchao.md
* minor fixes for removing str option torchao
* minor change to add back int and uint check
* minor fixes
* minor fixes to tests
* Update tests/quantization/torchao/test_torchao.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update tests/quantization/torchao/test_torchao.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* version=2 update to test_torchao.py
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* feat: implement three RAE encoders(dinov2, siglip2, mae)
* feat: finish first version of autoencoder_rae
* fix formatting
* make fix-copies
* initial doc
* fix latent_mean / latent_var init types to accept config-friendly inputs
* use mean and std convention
* cleanup
* add rae to diffusers script
* use imports
* use attention
* remove unneeded class
* example training script
* input and ground truth sizes have to be the same
* fix argument
* move loss to training script
* cleanup
* simplify mixins
* fix training script
* fix entrypoint for instantiating the AutoencoderRAE
* added encoder_image_size config
* undo last change
* fixes from pretrained weights
* cleanups
* address reviews
* fix train script to use pretrained
* fix conversion script review
* latent normalization buffers are now always registered with no-op defaults
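The no-op-default idea in miniature (a hypothetical class with plain lists standing in for registered torch buffers; names are illustrative, not the actual AutoencoderRAE API):

```python
class LatentNorm:
    # Buffers always exist with identity defaults (mean 0, std 1), so
    # state-dict loading and normalization need no "buffer missing" case.
    def __init__(self, dim):
        self.latents_mean = [0.0] * dim
        self.latents_std = [1.0] * dim

    def normalize(self, z):
        # With the defaults this is an exact no-op until real stats load.
        return [(x - m) / s for x, m, s in zip(z, self.latents_mean, self.latents_std)]
```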
* Update examples/research_projects/autoencoder_rae/README.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/autoencoders/autoencoder_rae.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* use image url
* Encoder is frozen
* fix slow test
* remove config
* use ModelTesterMixin and AutoencoderTesterMixin
* make quality
* strip final layernorm when converting
* _strip_final_layernorm_affine for training script
* fix test
* add dispatch forward and update conversion script
* update training script
* error out as soon as possible and add comments
* Update src/diffusers/models/autoencoders/autoencoder_rae.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* use buffer
* inline
* Update src/diffusers/models/autoencoders/autoencoder_rae.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* remove optional
* _noising takes a generator
* Update src/diffusers/models/autoencoders/autoencoder_rae.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* fix api
* rename
* remove unittest
* use randn_tensor
* fix device map on multigpu
* check if the key is missing in the original state dict and only then add to the allow_missing set
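The guard described above can be sketched as follows (function and argument names are assumed, not the actual conversion-script API):

```python
def collect_allow_missing(original_state_dict, candidate_keys):
    # A key goes into allow_missing only when the original checkpoint
    # genuinely lacks it; keys that are present must still be converted.
    allow_missing = set()
    for key in candidate_keys:
        if key not in original_state_dict:
            allow_missing.add(key)
    return allow_missing
```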
* remove initialize_weights
---------
Co-authored-by: wangyuqi <wangyuqi@MBP-FJDQNJTWYN-0208.local>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Fix Helios paper link in documentation
Updated the link to the Helios paper for accuracy.
* Fix reference link in HeliosTransformer3DModel documentation
Updated the reference link for the Helios Transformer model paper.
* Update Helios research paper link in documentation
* Update Helios research paper link in documentation
* LTX2 condition pipeline initial commit
* Fix pipeline import error
* Implement LTX-2-style general image conditioning
* Blend denoising output and clean latents in sample space instead of velocity space
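A minimal sketch of sample-space blending, assuming the flow-matching convention `x0 = x_t - sigma * v` (scalars in lists stand in for latent tensors; this is an illustration of the idea, not the pipeline code):

```python
def blend_in_sample_space(noisy, velocity, clean, mask, sigma):
    # Predict the clean sample from the velocity, blend with the known
    # clean latent where mask == 1, then convert back to a velocity.
    x0_pred = [x - sigma * v for x, v in zip(noisy, velocity)]
    blended = [m * c + (1.0 - m) * p for m, c, p in zip(mask, clean, x0_pred)]
    return [(x - b) / sigma for x, b in zip(noisy, blended)]
```

Blending in sample space keeps conditioning frames pixel-consistent; blending raw velocities would mix quantities that only agree after integration.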
* make style and make quality
* make fix-copies
* Rename LTX2VideoCondition image to frames
* Update LTX2ConditionPipeline example
* Remove support for image and video in __call__
* Put latent_idx_from_index logic inline
* Improve comment on using the conditioning mask in denoising loop
* Apply suggestions from code review
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
* make fix-copies
* Migrate to Python 3.9+ style type annotations without explicit typing imports
* Forward kwargs from preprocess/postprocess_video to preprocess/postprocess resp.
* Center crop LTX-2 conditions following original code
* Duplicate video and audio position ids if using CFG
* make style and make quality
* Remove unused index_type arg to preprocess_conditions
* Add # Copied from for _normalize_latents
* Fix _normalize_latents # Copied from statement
* Add LTX-2 condition pipeline docs
* Remove TODOs
* Support only unpacked latents (5D for video, 4D for audio)
* Remove # Copied from for prepare_audio_latents
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
* up
* up up
* update outputs
* style
* add modular_auto_docstring!
* more auto docstring
* style
* up up up
* more more
* up
* address feedback
* add TODO in the description for empty docstring
* refactor based on dhruv's feedback: remove the class method
* add template method
* up
* up up up
* apply auto docstring
* make style
* remove space in make docstring
* Apply suggestions from code review
* revert change in z
* fix
* Apply style fixes
* include auto-docstring check in the modular ci. (#13004)
* initial support: workflow
* up up
* treat loop sequential pipeline blocks as leaves
* update qwen image docstring note
* add workflow support for sdxl
* add a test suite
* add test for qwen-image
* refactor flux a bit, separate modular_blocks into modular_blocks_flux and modular_blocks_flux_kontext + support workflow
* refactor flux2: separate blocks for klein_base + workflow
* qwen: remove import support for stuff other than the default blocks
* add workflow support for wan
* sdxl: remove some imports
* refactor z
* update flux2 auto core denoise
* add workflow test for z and flux2
* Apply suggestions from code review
* Apply suggestions from code review
* add test for flux
* add workflow test for flux
* add test for flux-klein
* sdxl: modular_blocks.py -> modular_blocks_stable_diffusion_xl.py
* style
* up
* add auto docstring
* workflow_names -> available_workflows
* fix workflow test for klein base
* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* fix workflow tests
* qwen: edit -> image_conditioned to be consistent with flux kontext/2 such
* remove Optional
* update type hints
* update guider update_components
* fix more
* update docstring auto again
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-161-123.ec2.internal>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* Support different pipeline outputs for LTX 2 encode_video
* Update examples to use improved encode_video function
* Fix comment
* Address review comments
* make style and make quality
* Have non-iterator video inputs respect video_chunks_number
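One way the list path can mirror the iterator path (a sketch assuming a non-empty frame list; the real `encode_video` signature differs):

```python
def iter_video_chunks(frames, video_chunks_number):
    # Ceil-divide so a materialized frame list yields exactly the
    # requested number of chunks (the last one may be shorter).
    chunk = -(-len(frames) // video_chunks_number)
    for start in range(0, len(frames), chunk):
        yield frames[start : start + chunk]
```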
* make style and make quality
* Add warning when encode_video receives a non-denormalized np.ndarray
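The heuristic behind such a warning might look like this (assumed logic: float pixels entirely inside [0, 1] were probably never denormalized; flat lists stand in for the ndarray):

```python
import warnings

def warn_if_not_denormalized(flat_pixels):
    # Values all inside [0, 1] suggest normalized frames that should be
    # scaled to pixel range before encoding.
    if flat_pixels and all(0.0 <= p <= 1.0 for p in flat_pixels):
        warnings.warn(
            "encode_video expects denormalized pixel values; "
            "received values in [0, 1]."
        )
        return True
    return False
```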
* make style and make quality
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add ZImageInpaintPipeline
Updated the pipeline structure to include ZImageInpaintPipeline
alongside ZImagePipeline and ZImageImg2ImgPipeline.
Implemented the ZImageInpaintPipeline class for inpainting
tasks, including necessary methods for encoding prompts,
preparing masked latents, and denoising.
Enhanced the auto_pipeline to map the new ZImageInpaintPipeline
for inpainting generation tasks.
Added unit tests for ZImageInpaintPipeline to ensure
functionality and performance.
Updated dummy objects to include ZImageInpaintPipeline for
testing purposes.
* Add documentation and improve test stability for ZImageInpaintPipeline
- Add torch.empty fix for x_pad_token and cap_pad_token in test
- Add # Copied from annotations for encode_prompt methods
- Add documentation with usage example and autodoc directive
* Address PR review feedback for ZImageInpaintPipeline
Add batch size validation and callback handling fixes per review,
using diffusers conventions rather than suggested code verbatim.
* Update src/diffusers/pipelines/z_image/pipeline_z_image_inpaint.py
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
* Update src/diffusers/pipelines/z_image/pipeline_z_image_inpaint.py
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
* Add input validation and fix XLA support for ZImageInpaintPipeline
- Add missing is_torch_xla_available import for TPU support
- Add xm.mark_step() in denoising loop for proper XLA execution
- Add check_inputs() method for comprehensive input validation
- Call check_inputs() at the start of __call__
Addresses PR review feedback from @asomoza.
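The shape of such a validation method, with hypothetical constraints (the real `check_inputs` validates more, and the divisibility factor is an assumption):

```python
def check_inputs(height, width, strength):
    # Hypothetical constraints: spatial dims divisible by the patch/VAE
    # factor, inpainting strength inside [0, 1]. Fail before any compute.
    if height % 16 != 0 or width % 16 != 0:
        raise ValueError(f"height and width must be divisible by 16, got {height}x{width}")
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
```

Erroring out at the top of `__call__` surfaces bad inputs immediately instead of deep inside the denoising loop.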
* Cleanup
---------
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
* add metadata field to input/output param
* refactor mellonparam: move the template outside, add metaclass, define some generic template for custom node
* add from_custom_block
* style
* up up fix
* add mellon guide
* add to toctree
* style
* add mellon_types
* style
* mellon_type -> input_types + output_types
* update doc
* add quant info to components manager
* fix more
* up up
* fix components manager
* update custom block guide
* update
* style
* add a warn for mellon and add new guides to overview
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/modular_diffusers/mellon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* more update on custom block guide
* Update docs/source/en/modular_diffusers/mellon.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* a few manual fixes
* apply suggestion: turn into bullets
* support defining mellon meta with MellonParam directly, and update doc
* add the video
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
* add a real quick start guide
* Update docs/source/en/modular_diffusers/quickstart.md
* update a bit more
* fix
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/modular_diffusers/quickstart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/modular_diffusers/quickstart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update more
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* address more feedback: move components manager earlier, explain blocks vs sub-blocks etc
* more
* remove the link to mellon guide, not exist in this PR yet
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Feature: Add BriaFiboEditPipeline to diffusers
* Introduced BriaFiboEditPipeline class with necessary backend requirements.
* Updated import structures in relevant modules to include BriaFiboEditPipeline.
* Ensured compatibility with existing pipelines and type checking.
* Feature: Introduce Bria Fibo Edit Pipeline
* Added BriaFiboEditPipeline class for structured JSON-native image editing.
* Created documentation for the new pipeline in bria_fibo_edit.md.
* Updated import structures to include the new pipeline and its components.
* Added unit tests for the BriaFiboEditPipeline to ensure functionality and correctness.
* Enhancement: Update Bria Fibo Edit Pipeline and Documentation
* Refined the Bria Fibo Edit model description for clarity and detail.
* Added usage instructions for model authentication and login.
* Implemented mask handling functions in the BriaFiboEditPipeline for improved image editing capabilities.
* Updated unit tests to cover new mask functionalities and ensure input validation.
* Adjusted example code in documentation to reflect changes in the pipeline's usage.
* Update Bria Fibo Edit documentation with corrected Hugging Face page link
* add dreambooth training script
* style and quality
* Delete temp.py
* Enhancement: Improve JSON caption validation in DreamBoothDataset
* Updated the clean_json_caption function to handle both string and dictionary inputs for captions.
* Added error handling to raise a ValueError for invalid caption types, ensuring better input validation.
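A sketch of the validation described above (assumed behavior inferred from the commit message; the real `clean_json_caption` in the DreamBooth script may differ):

```python
import json

def clean_json_caption(caption):
    # Accept a JSON string or an already-parsed dict; anything else is a
    # hard error rather than a silent mis-parse.
    if isinstance(caption, dict):
        return json.dumps(caption)
    if isinstance(caption, str):
        return caption.strip()
    raise ValueError(f"caption must be str or dict, got {type(caption).__name__}")
```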
* Add datasets dependency to requirements_fibo_edit.txt
* Add bria_fibo_edit to docs table of contents
* Fix dummy objects ordering
* Fix BriaFiboEditPipeline to use passed generator parameter
The pipeline was ignoring the generator parameter and only using
the seed parameter. This caused non-deterministic outputs in tests
that pass a seeded generator.
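The fixed precedence in miniature (using `random.Random` as a stand-in for `torch.Generator`; names are illustrative):

```python
import random

def resolve_generator(generator=None, seed=None):
    # An explicitly passed generator wins; seed is only a fallback for
    # constructing a fresh one. The bug was effectively the reverse.
    if generator is not None:
        return generator
    return random.Random(seed)
```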
* Remove fibo_edit training script and related files
---------
Co-authored-by: kfirbria <kfir@bria.ai>
* Fix QwenImage txt_seq_lens handling
* formatting
* formatting
* remove txt_seq_lens and use bool mask
* use compute_text_seq_len_from_mask
* add seq_lens to dispatch_attention_fn
* use joint_seq_lens
* remove unused index_block
* WIP: Remove seq_lens parameter and use mask-based approach
- Remove seq_lens parameter from dispatch_attention_fn
- Update varlen backends to extract seqlens from masks
- Update QwenImage to pass 2D joint_attention_mask
- Fix native backend to handle 2D boolean masks
- Fix sage_varlen seqlens_q to match seqlens_k for self-attention
Note: sage_varlen still producing black images, needs further investigation
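The mask-to-seqlens step above amounts to the following (the real backends use tensor ops like `mask.sum(-1)`; this list-based sketch shows the same idea):

```python
def seqlens_from_mask(bool_mask_2d):
    # Recover per-sample valid lengths from a (batch, seq_len) boolean
    # mask, replacing an explicit seq_lens argument.
    return [sum(1 for keep in row if keep) for row in bool_mask_2d]
```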
* fix formatting
* undo sage changes
* xformers support
* hub fix
* fix torch compile issues
* fix tests
* use _prepare_attn_mask_native
* proper deprecation notice
* add deprecate to txt_seq_lens
* Update src/diffusers/models/transformers/transformer_qwenimage.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Update src/diffusers/models/transformers/transformer_qwenimage.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Only create the mask if there's actual padding
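The fast-path check can be sketched as (hypothetical helper; nested lists stand in for a boolean mask tensor):

```python
def maybe_build_mask(seq_lens, max_len):
    # No padding anywhere means no mask at all; a None mask lets the
    # attention backend take its unmasked fast path.
    if all(n == max_len for n in seq_lens):
        return None
    return [[i < n for i in range(max_len)] for n in seq_lens]
```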
* fix order of docstrings
* Adds performance benchmarks and optimization details for QwenImage
Enhances documentation with comprehensive performance insights for the QwenImage pipeline.
* rope_text_seq_len = text_seq_len
* rename to max_txt_seq_len
* removed deprecated args
* undo unrelated change
* Updates QwenImage performance documentation
Removes detailed attention backend benchmarks and simplifies torch.compile performance description
Focuses on key performance improvement with torch.compile, highlighting the specific speedup from 4.70s to 1.93s on an A100 GPU
Streamlines the documentation to provide more concise and actionable performance insights
* Updates deprecation warnings for txt_seq_lens parameter
Extends deprecation timeline for txt_seq_lens from version 0.37.0 to 0.39.0 across multiple Qwen image-related models
Adds a new unit test to verify the deprecation warning behavior for the txt_seq_lens parameter
* fix compile
* formatting
* fix compile tests
* rename helper
* remove duplicate
* smaller values
* removed
* use torch.cond for torch compile
* Construct joint attention mask once
* test different backends
* construct joint attention mask once to avoid reconstructing in every block
* Update src/diffusers/models/attention_dispatch.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* formatting
* raising an error from the EditPlus pipeline when batch_size > 1
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: cdutr <dutra_carlos@hotmail.com>