* fix mask in SP
* change the modification to qwen specific
* drop xfail since qwen-image mask is fixed
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* UT expands to batch inputs
* update according to suggestion
* update according to suggestion 2
* fix CI
* update according to suggestion 3
* clean line
* Initial implementation of perturbed attn processor for LTX 2.3
* Update DiT block for LTX 2.3 + add self_attention_mask
* Add flag to control using perturbed attn processor for now
* Add support for new video upsampling blocks used by LTX-2.3
* Support LTX-2.3 Big-VGAN V2-style vocoder
* Initial implementation of LTX-2.3 vocoder with bandwidth extender
* Initial support for LTX-2.3 per-modality feature extractor
* Refactor so that text connectors own all text encoder hidden_states normalization logic
* Fix some bugs for inference
* Fix LTX-2.X DiT block forward pass
* Support prompt timestep embeds and prompt cross attn modulation
* Add LTX-2.3 configs to conversion script
* Support converting LTX-2.3 DiT checkpoints
* Support converting LTX-2.3 Video VAE checkpoints
* Support converting LTX-2.3 Vocoder with bandwidth extender
* Support converting LTX-2.3 text connectors
* Don't convert any upsamplers for now
* Support self attention mask for LTX2Pipeline
* Fix some inference bugs
* Support self attn mask and sigmas for LTX-2.3 I2V, Cond pipelines
* Support STG and modality isolation guidance for LTX-2.3
* make style and make quality
* Make audio guidance values default to the video guidance values
* Update to LTX-2.3 style guidance rescaling
* Support cross timesteps for LTX-2.3 cross attention modulation
* Fix RMS norm bug for LTX-2.3 text connectors
* Perform guidance rescale in sample (x0) space following original code
* Support LTX-2.3 Latent Spatial Upsampler model
* Support LTX-2.3 distilled LoRA
* Support LTX-2.3 Distilled checkpoint
* Support LTX-2.3 prompt enhancement
* Make LTX-2.X processor non-required so that tests pass
* Fix test_components_function tests for LTX2 T2V and I2V
* Fix LTX-2.3 Video VAE configuration bug causing pixel jitter
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Refactor LTX-2.X Video VAE upsampler block init logic
* Refactor LTX-2.X guidance rescaling to use rescale_noise_cfg
* Use generator initial seed to control prompt enhancement if available
* Remove self attention mask logic as it is not used in any current pipelines
* Commit fixes suggested by claude code (guidance in sample (x0) space, denormalize after timestep conditioning)
* Use constant shift following original code
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add a test to check modular index consistency
* check for compulsory keys.
* use fixture for tmp_path in modular tests.
* remove unneeded test.
* fix code quality.
* up
* up
* feat: implement three RAE encoders (dinov2, siglip2, mae)
* feat: finish first version of autoencoder_rae
* fix formatting
* make fix-copies
* initial doc
* fix latent_mean / latent_var init types to accept config-friendly inputs
* use mean and std convention
* cleanup
* add rae to diffusers script
* use imports
* use attention
* remove unneeded class
* example training script
* input and ground truth sizes have to be the same
* fix argument
* move loss to training script
* cleanup
* simplify mixins
* fix training script
* fix entrypoint for instantiating the AutoencoderRAE
* added encoder_image_size config
* undo last change
* fixes from pretrained weights
* cleanups
* address reviews
* fix train script to use pretrained
* fix conversion script review
* latent normalization buffers are now always registered with no-op defaults
* Update examples/research_projects/autoencoder_rae/README.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/autoencoders/autoencoder_rae.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* use image url
* Encoder is frozen
* fix slow test
* remove config
* use ModelTesterMixin and AutoencoderTesterMixin
* make quality
* strip final layernorm when converting
* _strip_final_layernorm_affine for training script
* fix test
* add dispatch forward and update conversion script
* update training script
* error out as soon as possible and add comments
* Update src/diffusers/models/autoencoders/autoencoder_rae.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* use buffer
* inline
* Update src/diffusers/models/autoencoders/autoencoder_rae.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* remove optional
* _noising takes a generator
* Update src/diffusers/models/autoencoders/autoencoder_rae.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* fix api
* rename
* remove unittest
* use randn_tensor
* fix device map on multigpu
* check if the key is missing in the original state dict and only then add to the allow_missing set
* remove initialize_weights
---------
Co-authored-by: wangyuqi <wangyuqi@MBP-FJDQNJTWYN-0208.local>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Implement synchronous onload for offloaded parameters
Add fallback synchronous onload for conditionally-executed modules.
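The fallback described above can be sketched as follows. Names like `OffloadGroup` and the `prefetched` flag are hypothetical stand-ins for the real group-offloading hook, and device movement is simulated with a plain attribute rather than actual tensor copies:

```python
class Param:
    """Stand-in for a parameter tensor; only tracks which device it is on."""
    def __init__(self, device="cpu"):
        self.device = device


class OffloadGroup:
    """A group of parameters kept on CPU until their module is about to run."""
    def __init__(self, params, compute_device="cuda"):
        self.params = params
        self.compute_device = compute_device
        self.prefetched = False  # set True by the async prefetch path

    def onload(self):
        # Blocking path: move every parameter to the compute device now.
        for p in self.params:
            p.device = self.compute_device

    def pre_forward(self):
        # Fallback: if the async prefetch never ran (e.g. this module is
        # executed conditionally, so the preceding module never scheduled
        # it), onload synchronously instead of running on CPU weights.
        if not self.prefetched or any(
            p.device != self.compute_device for p in self.params
        ):
            self.onload()
```

The point of the sketch is only the control flow: prefer the asynchronous prefetch when it happened, and fall back to a blocking onload when it did not.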
* add test for new code path about group-offloading
* Update tests/hooks/test_group_offloading.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* use unittest.skipIf and update the comment
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fix LTX-2 image-to-video generation failure in two stages generation
In LTX-2's two-stage image-to-video generation task, specifically after
the upsampling step, a shape mismatch occurs between the `latents` and
the `conditioning_mask`, which causes an error in function
`_create_noised_state`.
Fix it by creating the `conditioning_mask` based on the shape of the
`latents`.
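The fix described above, deriving the mask from the latents' own shape so that upsampling cannot desynchronize the two, can be sketched like this (the helper name and the 5-D `(batch, channels, frames, height, width)` layout are illustrative assumptions, not the pipeline's actual code):

```python
def make_conditioning_mask(latents_shape, num_cond_frames):
    """Build a per-frame conditioning mask whose temporal length is read
    from the latents themselves. Because the mask is created *after* any
    upsampling, it matches the latents' shape by construction."""
    batch, _, frames, _, _ = latents_shape
    return [
        [1.0 if f < num_cond_frames else 0.0 for f in range(frames)]
        for _ in range(batch)
    ]
```

Built this way, a second-stage call with upsampled latents simply produces a longer mask instead of reusing a stale first-stage one.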
* Add unit test for LTX-2 i2v two stages inference with upsampler
* Downscaling the upsampler in LTX-2 image-to-video unit test
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* switch to transformers main again.
* more
* up
* up
* fix group offloading.
* attributes
* up
* up
* tie embedding issue.
* fix t5 stuff for more.
* matrix configuration to see differences between 4.57.3 and main failures.
* change qwen expected slice because of how init is handled in v5.
* same stuff.
* up
* up
* Revert "up"
This reverts commit 515dd06db5.
* Revert "up"
This reverts commit 5274ffdd7f.
* up
* up
* fix with peft_format.
* just keep main for easier debugging.
* remove torchvision.
* empty
* up
* up with skyreelsv2 fixes.
* fix skyreels type annotation.
* up
* up
* fix variant loading issues.
* more fixes.
* fix dduf
* fix
* fix
* fix
* more fixes
* fixes
* up
* up
* fix dduf test
* up
* more
* update
* hopefully, final?
* one last breath
* always install from main
* up
* audioldm tests
* up
* fix PRX tests.
* up
* kandinsky fixes
* qwen fixes.
* prx
* hidream
* Guard ftfy import with is_ftfy_available
* Remove xfail for PRX pipeline tests as they appear to work on transformers>4.57.1
* make style and make quality
* support device type device_maps to work with offloading.
* add tests.
* fix tests
* skip tests where it's not supported.
* empty
* up
* up
* fix allegro.
* up
* up up
* update outputs
* style
* add modular_auto_docstring!
* more auto docstring
* style
* up up up
* more more
* up
* address feedbacks
* add TODO in the description for empty docstring
* refactor based on dhruv's feedback: remove the class method
* add template method
* up
* up up up
* apply auto docstring
* make style
* remove space in make docstring
* Apply suggestions from code review
* revert change in z
* fix
* Apply style fixes
* include auto-docstring check in the modular ci. (#13004)
* initial support: workflow
* up up
* treat loop sequential pipeline blocks as leaves
* update qwen image docstring note
* add workflow support for sdxl
* add a test suite
* add test for qwen-image
* refactor flux a bit, separate modular_blocks into modular_blocks_flux and modular_blocks_flux_kontext + support workflow
* refactor flux2: separate blocks for klein_base + workflow
* qwen: remove import support for stuff other than the default blocks
* add workflow support for wan
* sdxl: remove some imports
* refactor z
* update flux2 auto core denoise
* add workflow test for z and flux2
* Apply suggestions from code review
* Apply suggestions from code review
* add test for flux
* add workflow test for flux
* add test for flux-klein
* sdxl: modular_blocks.py -> modular_blocks_stable_diffusion_xl.py
* style
* up
* add auto docstring
* workflow_names -> available_workflows
* fix workflow test for klein base
* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* fix workflow tests
* qwen: edit -> image_conditioned to be consistent with flux kontext / flux2
* remove Optional
* update type hints
* update guider update_components
* fix more
* update docstring auto again
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-161-123.ec2.internal>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* drop python 3.8
* remove list, tuple, dict from typing
* fold Unions into |
* up
* fix a bunch and please me.
* up
* up
* up
* up
* up
* up
* enforce 3.10.0.
* up
* up
* up
* up
* up
* up
* up
* up
* Update setup.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* up.
* python 3.10.
* fix
* up
* up
* up
* up
* final
* up
* fix typing utils.
* up
* up
* up
* up
* up
* up
* fix
* up
* up
* up
* up
* up
* up
* handle modern types.
* up
* up
* fix ip adapter type checking.
* up
* up
* up
* up
* up
* up
* up
* revert docstring changes.
* keep deleted files deleted.
* keep deleted files deleted.
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* initial conversion script
* cosmos control net block
* CosmosAttention
* base model conversion
* wip
* pipeline updates
* convert controlnet
* pipeline: working without controls
* wip
* debugging
* Almost working
* temp
* control working
* cleanup + detail on neg_encoder_hidden_states
* convert edge
* pos emb for control latents
* convert all checkpoints
* resolve TODOs
* remove prints
* Docs
* add siglip image reference encoder
* Add unit tests
* controlnet: add duplicate layers
* Additional tests
* skip less
* skip less
* remove image_ref
* minor
* docs
* remove skipped test in transfer
* Don't crash process
* formatting
* revert some changes
* remove skipped test
* make style
* Address comment + fix example
* CosmosAttnProcessor2_0 revert + CosmosAttnProcessor2_5 changes
* make style
* make fix-copies