* remove str option for quantization config in torchao
* Apply style fixes
* minor fixes
* Added AOBaseConfig docs to torchao.md
* minor fixes for removing str option torchao
* minor change to add back int and uint check
* minor fixes
* minor fixes to tests
* Update tests/quantization/torchao/test_torchao.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update docs/source/en/quantization/torchao.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update tests/quantization/torchao/test_torchao.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* version=2 update to test_torchao.py
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* fix mask in SP
* change the modification to qwen specific
* drop xfail since qwen-image mask is fixed
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* expand UT to batch inputs
* update according to suggestion
* update according to suggestion 2
* fix CI
* update according to suggestion 3
* clean line
* feat: implement three RAE encoders (dinov2, siglip2, mae)
* feat: finish first version of autoencoder_rae
* fix formatting
* make fix-copies
* initial doc
* fix latent_mean / latent_var init types to accept config-friendly inputs
* use mean and std convention
* cleanup
* add rae to diffusers script
* use imports
* use attention
* remove unneeded class
* example training script
* input and ground truth sizes have to be the same
* fix argument
* move loss to training script
* cleanup
* simplify mixins
* fix training script
* fix entrypoint for instantiating the AutoencoderRAE
* added encoder_image_size config
* undo last change
* fixes from pretrained weights
* cleanups
* address reviews
* fix train script to use pretrained
* fix conversion script review
* latent normalization buffers are now always registered with no-op defaults
* Update examples/research_projects/autoencoder_rae/README.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/autoencoders/autoencoder_rae.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* use image url
* Encoder is frozen
* fix slow test
* remove config
* use ModelTesterMixin and AutoencoderTesterMixin
* make quality
* strip final layernorm when converting
* _strip_final_layernorm_affine for training script
* fix test
* add dispatch forward and update conversion script
* update training script
* error out as soon as possible and add comments
* Update src/diffusers/models/autoencoders/autoencoder_rae.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* use buffer
* inline
* Update src/diffusers/models/autoencoders/autoencoder_rae.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* remove optional
* _noising takes a generator
* Update src/diffusers/models/autoencoders/autoencoder_rae.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* fix api
* rename
* remove unittest
* use randn_tensor
* fix device map on multigpu
* check if the key is missing in the original state dict and only then add to the allow_missing set
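The allow-missing check above can be sketched as follows. This is an illustrative sketch only: the names `original_state_dict`, `expected_keys`, and `collect_allow_missing` are hypothetical, not the actual diffusers internals.

```python
def collect_allow_missing(original_state_dict: dict, expected_keys: list[str]) -> set[str]:
    """Only tolerate a missing key if the original checkpoint genuinely
    never contained it; keys present in the original must still load."""
    allow_missing = set()
    for key in expected_keys:
        if key not in original_state_dict:
            allow_missing.add(key)
    return allow_missing
```

This way, a key that exists in the source checkpoint but fails to load still surfaces as a real error instead of being silently allowed.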
* remove initialize_weights
---------
Co-authored-by: wangyuqi <wangyuqi@MBP-FJDQNJTWYN-0208.local>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* switch to transformers main again.
* more
* up
* up
* fix group offloading.
* attributes
* up
* up
* tie embedding issue.
* fix t5 stuff for more.
* matrix configuration to see differences between 4.57.3 and main failures.
* change qwen expected slice because of how init is handled in v5.
* same stuff.
* up
* up
* Revert "up"
This reverts commit 515dd06db5.
* Revert "up"
This reverts commit 5274ffdd7f.
* up
* up
* fix with peft_format.
* just keep main for easier debugging.
* remove torchvision.
* empty
* up
* up with skyreelsv2 fixes.
* fix skyreels type annotation.
* up
* up
* fix variant loading issues.
* more fixes.
* fix dduf
* fix
* fix
* fix
* more fixes
* fixes
* up
* up
* fix dduf test
* up
* more
* update
* hopefully, final?
* one last breath
* always install from main
* up
* audioldm tests
* up
* fix PRX tests.
* up
* kandinsky fixes
* qwen fixes.
* prx
* hidream
* support device-type device_maps to work with offloading.
* add tests.
* fix tests
* skip tests where it's not supported.
* empty
* up
* up
* fix allegro.
* drop python 3.8
* remove list, tuple, dict from typing
* fold Unions into |
* up
* fix a bunch and please me.
* up
* up
* up
* up
* up
* up
* enforce 3.10.0.
* up
* up
* up
* up
* up
* up
* up
* up
* Update setup.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* up.
* python 3.10.
* fix
* up
* up
* up
* up
* final
* up
* fix typing utils.
* up
* up
* up
* up
* up
* up
* fix
* up
* up
* up
* up
* up
* up
* handle modern types.
* up
* up
* fix ip adapter type checking.
* up
* up
* up
* up
* up
* up
* up
* revert docstring changes.
* keep deleted files deleted.
* keep deleted files deleted.
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* initial conversion script
* cosmos control net block
* CosmosAttention
* base model conversion
* wip
* pipeline updates
* convert controlnet
* pipeline: working without controls
* wip
* debugging
* Almost working
* temp
* control working
* cleanup + detail on neg_encoder_hidden_states
* convert edge
* pos emb for control latents
* convert all chkpts
* resolve TODOs
* remove prints
* Docs
* add siglip image reference encoder
* Add unit tests
* controlnet: add duplicate layers
* Additional tests
* skip less
* skip less
* remove image_ref
* minor
* docs
* remove skipped test in transfer
* Don't crash process
* formatting
* revert some changes
* remove skipped test
* make style
* Address comment + fix example
* CosmosAttnProcessor2_0 revert + CosmosAttnProcessor2_5 changes
* make style
* make fix-copies
* avoid creating attention masks when there is no padding
* make fix-copies
* torch compile tests
* set all ones mask to none
* fix positional encoding from becoming > 4096
* fix from review
* slice freqs_cis to match the input sequence length
* keep only attention masking change
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fix QwenImage txt_seq_lens handling
* formatting
* formatting
* remove txt_seq_lens and use bool mask
* use compute_text_seq_len_from_mask
* add seq_lens to dispatch_attention_fn
* use joint_seq_lens
* remove unused index_block
* WIP: Remove seq_lens parameter and use mask-based approach
- Remove seq_lens parameter from dispatch_attention_fn
- Update varlen backends to extract seqlens from masks
- Update QwenImage to pass 2D joint_attention_mask
- Fix native backend to handle 2D boolean masks
- Fix sage_varlen seqlens_q to match seqlens_k for self-attention
Note: sage_varlen still producing black images, needs further investigation
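The "extract seqlens from masks" step described above can be sketched as follows. Varlen attention kernels (e.g. flash-attn's varlen API) take per-sample sequence lengths and cumulative offsets instead of a 2D padding mask; this sketch (with an illustrative function name, assuming torch) shows how they can be derived from a 2D boolean mask:

```python
import torch

def seqlens_from_mask(mask: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """mask: (batch, seq_len) boolean, True for real tokens.
    Returns per-sample token counts and the cumulative offsets
    (cu_seqlens) that varlen kernels expect."""
    seqlens = mask.sum(dim=-1, dtype=torch.int32)        # tokens per sample
    cu_seqlens = torch.zeros(mask.shape[0] + 1, dtype=torch.int32)
    cu_seqlens[1:] = torch.cumsum(seqlens, dim=0)        # prefix offsets
    return seqlens, cu_seqlens
```

For self-attention, the same offsets are used for queries and keys, which matches the `seqlens_q == seqlens_k` fix mentioned above.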
* fix formatting
* undo sage changes
* xformers support
* hub fix
* fix torch compile issues
* fix tests
* use _prepare_attn_mask_native
* proper deprecation notice
* add deprecate to txt_seq_lens
* Update src/diffusers/models/transformers/transformer_qwenimage.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Update src/diffusers/models/transformers/transformer_qwenimage.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Only create the mask if there's actual padding
* fix order of docstrings
* Adds performance benchmarks and optimization details for QwenImage
Enhances documentation with comprehensive performance insights for QwenImage pipeline:
* rope_text_seq_len = text_seq_len
* rename to max_txt_seq_len
* removed deprecated args
* undo unrelated change
* Updates QwenImage performance documentation
Removes detailed attention backend benchmarks and simplifies torch.compile performance description
Focuses on key performance improvement with torch.compile, highlighting the specific speedup from 4.70s to 1.93s on an A100 GPU
Streamlines the documentation to provide more concise and actionable performance insights
* Updates deprecation warnings for txt_seq_lens parameter
Extends deprecation timeline for txt_seq_lens from version 0.37.0 to 0.39.0 across multiple Qwen image-related models
Adds a new unit test to verify the deprecation warning behavior for the txt_seq_lens parameter
* fix compile
* formatting
* fix compile tests
* rename helper
* remove duplicate
* smaller values
* removed
* use torch.cond for torch compile
* Construct joint attention mask once
* test different backends
* construct joint attention mask once to avoid reconstructing in every block
* Update src/diffusers/models/attention_dispatch.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* formatting
* raising an error from the EditPlus pipeline when batch_size > 1
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: cdutr <dutra_carlos@hotmail.com>
* Initial LTX 2.0 transformer implementation
* Add tests for LTX 2 transformer model
* Get LTX 2 transformer tests working
* Rename LTX 2 compile test class to have LTX2
* Remove RoPE debug print statements
* Get LTX 2 transformer compile tests passing
* Fix LTX 2 transformer shape errors
* Initial script to convert LTX 2 transformer to diffusers
* Add more LTX 2 transformer audio arguments
* Allow LTX 2 transformer to be loaded from local path for conversion
* Improve dummy inputs and add test for LTX 2 transformer consistency
* Fix LTX 2 transformer bugs so consistency test passes
* Initial implementation of LTX 2.0 video VAE
* Explicitly specify temporal and spatial VAE scale factors when converting
* Add initial LTX 2.0 video VAE tests
* Add initial LTX 2.0 video VAE tests (part 2)
* Get diffusers implementation on par with official LTX 2.0 video VAE implementation
* Initial LTX 2.0 vocoder implementation
* Use RMSNorm implementation closer to original for LTX 2.0 video VAE
* start audio decoder.
* init registration.
* up
* simplify and clean up
* up
* Initial LTX 2.0 text encoder implementation
* Rough initial LTX 2.0 pipeline implementation
* up
* up
* up
* up
* Add imports for LTX 2.0 Audio VAE
* Conversion script for LTX 2.0 Audio VAE Decoder
* Add Audio VAE logic to T2V pipeline
* Duplicate scheduler for audio latents
* Support num_videos_per_prompt for prompt embeddings
* LTX 2.0 scheduler and full pipeline conversion
* Add script to test full LTX2Pipeline T2V inference
* Fix pipeline return bugs
* Add LTX 2 text encoder and vocoder to ltx2 subdirectory __init__
* Fix more bugs in LTX2Pipeline.__call__
* Improve CPU offload support
* Fix pipeline audio VAE decoding dtype bug
* Fix video shape error in full pipeline test script
* Get LTX 2 T2V pipeline to produce reasonable outputs
* Make LTX 2.0 scheduler more consistent with original code
* Fix typo when applying scheduler fix in T2V inference script
* Refactor Audio VAE to be simpler and remove helpers (#7)
* remove resolve causality axes stuff.
* remove a bunch of helpers.
* remove adjust output shape helper.
* remove the use of audiolatentshape.
* move normalization and patchify out of pipeline.
* fix
* up
* up
* Remove unpatchify and patchify ops before audio latents denormalization (#9)
---------
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Add support for I2V (#8)
* start i2v.
* up
* up
* up
* up
* up
* remove uniform strategy code.
* remove unneeded code.
* Denormalize audio latents in I2V pipeline (analogous to T2V change) (#11)
* test i2v.
* Move Video and Audio Text Encoder Connectors to Transformer (#12)
* Denormalize audio latents in I2V pipeline (analogous to T2V change)
* Initial refactor to put video and audio text encoder connectors in transformer
* Get LTX 2 transformer tests working after connector refactor
* precompute run_connectors.
* fixes
* Address review comments
* Calculate RoPE double-precision freqs using torch instead of np
* Further simplify LTX 2 RoPE freq calc
* Make connectors a separate module (#18)
* remove text_encoder.py
* address yiyi's comments.
* up
* up
* up
* up
---------
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
* up (#19)
* address initial feedback from lightricks team (#16)
* cross_attn_timestep_scale_multiplier to 1000
* implement split rope type.
* up
* propagate rope_type to rope embed classes as well.
* up
* When using split RoPE, make sure that the output dtype is same as input dtype
* Fix apply split RoPE shape error when reshaping x to 4D
* Add export_utils file for exporting LTX 2.0 videos with audio
* Tests for T2V and I2V (#6)
* add ltx2 pipeline tests.
* up
* up
* up
* up
* remove content
* style
* Denormalize audio latents in I2V pipeline (analogous to T2V change)
* Initial refactor to put video and audio text encoder connectors in transformer
* Get LTX 2 transformer tests working after connector refactor
* up
* up
* i2v tests.
* up
* Address review comments
* Calculate RoPE double-precision freqs using torch instead of np
* Further simplify LTX 2 RoPE freq calc
* revert unneded changes.
* up
* up
* update to split style rope.
* up
---------
Co-authored-by: Daniel Gu <dgu8957@gmail.com>
* up
* use export util funcs.
* Point original checkpoint to LTX 2.0 official checkpoint
* Allow the I2V pipeline to accept image URLs
* make style and make quality
* remove function map.
* remove args.
* update docs.
* update doc entries.
* disable ltx2_consistency test
* Simplify LTX 2 RoPE forward by removing coords is None logic
* make style and make quality
* Support LTX 2.0 audio VAE encoder
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Remove print statement in audio VAE
* up
* Fix bug when calculating audio RoPE coords
* Ltx 2 latent upsample pipeline (#12922)
* Initial implementation of LTX 2.0 latent upsampling pipeline
* Add new LTX 2.0 spatial latent upsampler logic
* Add test script for LTX 2.0 latent upsampling
* Add option to enable VAE tiling in upsampling test script
* Get latent upsampler working with video latents
* Fix typo in BlurDownsample
* Add latent upsample pipeline docstring and example
* Remove deprecated pipeline VAE slicing/tiling methods
* make style and make quality
* When returning latents, return unpacked and denormalized latents for T2V and I2V
* Add model_cpu_offload_seq for latent upsampling pipeline
---------
Co-authored-by: Daniel Gu <dgu8957@gmail.com>
* Fix latent upsampler filename in LTX 2 conversion script
* Add latent upsample pipeline to LTX 2 docs
* Add dummy objects for LTX 2 latent upsample pipeline
* Set default FPS to official LTX 2 ckpt default of 24.0
* Set default CFG scale to official LTX 2 ckpt default of 4.0
* Update LTX 2 pipeline example docstrings
* make style and make quality
* Remove LTX 2 test scripts
* Fix LTX 2 upsample pipeline example docstring
* Add logic to convert and save a LTX 2 upsampling pipeline
* Document LTX2VideoTransformer3DModel forward pass
---------
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
* fix: group offloading to support standalone computational layers in block-level offloading
* test: for models with standalone and deeply nested layers in block-level offloading
* feat: support for block-level offloading in group offloading config
* fix: group offload block modules to AutoencoderKL and AutoencoderKLWan
* fix: update group offloading tests to use AutoencoderKL and adjust input dimensions
* refactor: streamline block offloading logic
* Apply style fixes
* update tests
* update
* fix for failing tests
* clean up
* revert to use skip_keys
* clean up
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* start zimage model tests.
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* Revert "up"
This reverts commit bca3e27c96.
* expand upon compilation failure reason.
* Update tests/models/transformers/test_models_transformer_z_image.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* reinitialize the padding tokens to ones to prevent NaN problems.
* updates
* up
* skipping ZImage DiT tests
* up
* up
---------
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Fixes #12673.
The wrong default_stream is used, leading to wrong execution order when record_stream is enabled.
* update
* Update test
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>