Commit Graph

353 Commits

Author SHA1 Message Date
Cheung Ka Wai
e1e7d58a4a Fix Ulysses SP backward with SDPA (#13328)
* add UT for backward

* fix SDPA attention backward
2026-03-30 15:15:27 +05:30
Sayak Paul
10ec3040a2 [ci] move to assert instead of self.Assert* (#13366)
move to assert instead of self.Assert*
2026-03-30 11:09:14 +05:30
Howard Zhang
1fe2125802 remove str option for quantization config in torchao (#13291)
* remove str option for quantization config in torchao

* Apply style fixes

* minor fixes

* Added AOBaseConfig docs to torchao.md

* minor fixes for removing str option torchao

* minor change to add back int and uint check

* minor fixes

* minor fixes to tests

* Update tests/quantization/torchao/test_torchao.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update tests/quantization/torchao/test_torchao.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* version=2 update to test_torchao.py

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-03-27 08:52:37 +05:30
Sayak Paul
f1fd515257 [tests] fix lora logging tests for models. (#13318)
* fix lora logging tests for models.

* make style
2026-03-24 15:48:03 +05:30
Cheung Ka Wai
afdda57f61 Fix the attention mask in ulysses SP for QwenImage (#13278)
* fix mask in SP

* change the modification to qwen specific

* drop xfail since qwen-image mask is fixed

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-03-24 02:12:50 -07:00
Cheung Ka Wai
9d4c9dcf21 change QwenImageTransformer UT to batch inputs (#13312)
* UT expands to batch inputs

* update according to suggestion

* update according to suggestion 2

* fix CI

* update according to suggestion 3

* clean line
2026-03-24 08:56:40 +05:30
ddavidchick
ef309a1bb0 Add KVAE 1.0 (#13033)
* add kvae2d

* add kvae3d video

* add docs for kvae2d and kvae3d video

* style fixes

* fix kvae3d docs

* fix normalzation

* fix kvae video for code style

* fix kvae video

* kvae minor fixes

* add gradient ckpting for kvaes

* get rid of inplace ops kvae video

* add tests for KVAEs

* kvae2d normalization style change

* kvaes fix style

* update dummy_pt_objects test for kvaes

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2026-03-23 12:56:49 -10:00
Dhruv Nair
52558b45d8 [CI] Flux2 Model Test Refactor (#13071)
* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-03-23 16:56:08 +05:30
Dhruv Nair
11a3284cee [CI] Qwen Image Model Test Refactor (#13069)
* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-03-17 16:44:04 +05:30
Wang, Yi
9677859ebf fix parallelism case failure in xpu (#13270)
* fix parallelism case failure in xpu

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* updated

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-03-17 08:52:15 +05:30
Dhruv Nair
07c5ba8eee [Context Parallel] Add support for custom device mesh (#13064)
* add custom mesh support

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-03-11 16:42:11 +05:30
annitang1997
7f92d81320 Add VidTok AutoEncoders (#11261)
* add_autoencoder_vidtok

* format standardization

* remove small functions

* making the code style more diffusers-like

* Apply style fixes

* Add dummy objects for AutoencoderVidTok

* Fix AutoencoderVidTok avg_pool3d BFloat16 CPU compatibility

* skip test_layerwise_casting_training test

* Apply style fixes

---------

Co-authored-by: annitang1997 <memory97@sjtu.edu.cn>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
2026-03-08 20:23:49 -07:00
Ando
8ec0a5ccad feat: implement rae autoencoder. (#13046)
* feat: implement three RAE encoders(dinov2, siglip2, mae)

* feat: finish first version of autoencoder_rae

* fix formatting

* make fix-copies

* initial doc

* fix latent_mean / latent_var init types to accept config-friendly inputs

* use mean and std convention

* cleanup

* add rae to diffusers script

* use imports

* use attention

* remove unneeded class

* example traiing script

* input and ground truth sizes have to be the same

* fix argument

* move loss to training script

* cleanup

* simplify mixins

* fix training script

* fix entrypoint for instantiating the AutoencoderRAE

* added encoder_image_size config

* undo last change

* fixes from pretrained weights

* cleanups

* address reviews

* fix train script to use pretrained

* fix conversion script review

* latebt normalization buffers are now always registered with no-op defaults

* Update examples/research_projects/autoencoder_rae/README.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_rae.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* use image url

* Encoder is frozen

* fix slow test

* remove config

* use ModelTesterMixin and AutoencoderTesterMixin

* make quality

* strip final layernorm when converting

* _strip_final_layernorm_affine for training script

* fix test

* add dispatch forward and update conversion script

* update training script

* error out as soon as possible and add comments

* Update src/diffusers/models/autoencoders/autoencoder_rae.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* use buffer

* inline

* Update src/diffusers/models/autoencoders/autoencoder_rae.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* remove optional

* _noising takes a generator

* Update src/diffusers/models/autoencoders/autoencoder_rae.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* fix api

* rename

* remove unittest

* use randn_tensor

* fix device map on multigpu

* check if the key is missing in the original state dict and only then add to the allow_missing set

* remove initialize_weights

---------

Co-authored-by: wangyuqi <wangyuqi@MBP-FJDQNJTWYN-0208.local>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
2026-03-05 20:17:14 +05:30
dg845
33f785b444 Add Helios-14B Video Generation Pipelines (#13208)
* [1/N] add helios

* fix test

* make fix-copies

* change script path

* fix cus script

* update docs

* fix documented check

* update links for docs and examples

* change default config

* small refactor

* add test

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove register_buffer for _scale_cache

* fix non-cuda devices error

* remove "handle the case when timestep is 2D"

* refactor HeliosMultiTermMemoryPatch and process_input_hidden_states

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* fix calculate_shift

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* rewritten `einops` in pure `torch`

* fix: pass patch_size to apply_schedule_shift instead of hardcoding

* remove the logics of 'vae_decode_type'

* move some validation into check_inputs()

* rename helios scheduler & merge all into one step()

* add some details to doc

* move dmd  step() logics from pipeline to scheduler

* change to Python 3.9+ style type

* fix NoneType error

* refactor DMD scheduler's set_timestep

* change rope related vars name

* fix stage2 sample

* fix dmd sample

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove redundant & refactor norm_out

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* change "is_keep_x0" to "keep_first_frame"

* use a more intuitive name

* refactor dynamic_time_shifting

* remove use_dynamic_shifting args

* remove usage of UniPCMultistepScheduler

* separate stage2 sample to HeliosPyramidPipeline

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* fix transformer

* use a more intuitive name

* update example script

* fix requirements

* remove redudant attention mask

* fix

* optimize pipelines

* make style .

* update TYPE_CHECKING

* change to use torch.split

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* derive memory patch sizes from patch_size multiples

* remove some hardcoding

* move some checks into check_inputs

* refactor sample_block_noise

* optimize encoding chunks logits for v2v

* use num_history_latent_frames = sum(history_sizes)

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* remove redudant optimized_scale

* Update src/diffusers/pipelines/helios/pipeline_helios_pyramid.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* use more descriptive name

* optimize history_latents

* remove not used "num_inference_steps"

* removed redudant "pyramid_num_stages"

* add "is_cfg_zero_star" and "is_distilled" to HeliosPyramidPipeline

* remove redudant

* change example scripts name

* change example scripts name

* correct docs

* update example

* update docs

* Update tests/models/transformers/test_models_transformer_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/models/transformers/test_models_transformer_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* separate HeliosDMDScheduler

* fix numerical stability issue:

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* remove redudant

* small refactor

* remove use_interpolate_prompt logits

* simplified model test

* fallbackt to BaseModelTesterConfig

* remove _maybe_expand_t2v_lora_for_i2v

* fix HeliosLoraLoaderMixin

* update docs

* use randn_tensor for test

* fix doc typo

* optimize code

* mark torch.compile xfail

* change paper name

* Make get_dummy_inputs deterministic using self.generator

* Set less strict threshold for test_save_load_float16 test for Helios pipeline

* make style and make quality

* Preparation for merging

* add torch.Generator

* Fix HeliosPipelineOutput doc path

* Fix Helios related (optimize docs & remove redudant) (#13210)

* fix docs

* remove redudant

* remove redudant

* fix group offload

* Removed fixes for group offload

---------

Co-authored-by: yuanshenghai <yuanshenghai@bytedance.com>
Co-authored-by: Shenghai Yuan <140951558+SHYuanBest@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: SHYuanBest <shyuan-cs@hotmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-03-04 21:31:43 +05:30
Dhruv Nair
3fd14f1acf [AutoModel] Allow registering auto_map to model config (#13186)
* update

* update
2026-03-02 22:13:25 +05:30
Dhruv Nair
e7fe4ce92f [AutoModel] Fix bug with subfolders and local model paths when loading custom code (#13197)
* update

* update
2026-03-02 17:44:25 +05:30
Miguel Martin
212db7b999 Cosmos Transfer2.5 Auto-Regressive Inference Pipeline (#13114)
* AR

* address comments

* address comments 2
2026-02-25 14:42:29 -10:00
Sayak Paul
5e94d62eb4 migrate to transformers v5 (#12976)
* switch to transformers main again./

* more

* up

* up

* fix group offloading.

* attributes

* up

* up

* tie embedding issue.

* fix t5 stuff for more.

* matrix configuration to see differences between 4.57.3 and main failures.

* change qwen expected slice because of how init is handled in v5.

* same stuff.

* up

* up

* Revert "up"

This reverts commit 515dd06db5.

* Revert "up"

This reverts commit 5274ffdd7f.

* up

* up

* fix with peft_format.

* just keep main for easier debugging.

* remove torchvision.

* empty

* up

* up with skyreelsv2 fixes.

* fix skyreels type annotation.

* up

* up

* fix variant loading issues.

* more fixes.

* fix dduf

* fix

* fix

* fix

* more fixes

* fixes

* up

* up

* fix dduf test

* up

* more

* update

* hopefully ,final?

* one last breath

* always install from main

* up

* audioldm tests

* up

* fix PRX tests.

* up

* kandinsky fixes

* qwen fixes.

* prx

* hidream
2026-02-24 10:53:56 +05:30
Dhruv Nair
4890e9bf70 Allow Automodel to use from_config with custom code. (#13123)
* update

* update
2026-02-23 21:55:59 +05:30
Dhruv Nair
db2d7e7bc4 [CI] Fix new LoRAHotswap tests (#13163)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-02-20 09:01:20 +05:30
Sayak Paul
35086ac06a [core] support device type device_maps to work with offloading. (#12811)
* support device type device_maps to work with offloading.

* add tests.

* fix tests

* skip tests where it's not supported.

* empty

* up

* up

* fix allegro.
2026-02-16 16:31:45 +05:30
Sayak Paul
e390646f25 [tests] accept recompile_limit from the user in tests (#13150)
accept recompile_limit from the user in tests
2026-02-16 14:48:21 +05:30
Sayak Paul
2843b3d37a Sunset Python 3.8 & get rid of explicit typing exports where possible (#12524)
* drop python 3.8

* remove list, tuple, dict from typing

* fold Unions into |

* up

* fix a bunch and please me.

* up

* up

* up

* up

* up

* up

* enforce 3.10.0.

* up

* up

* up

* up

* up

* up

* up

* up

* Update setup.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* up.

* python 3.10.

* ifx

* up

* up

* up

* up

* final

* up

* fix typing utils.

* up

* up

* up

* up

* up

* up

* fix

* up

* up

* up

* up

* up

* up

* handle modern types.

* up

* up

* fix ip adapter type checking.

* up

* up

* up

* up

* up

* up

* up

* revert docstring changes.

* keep deleted files deleted.

* keep deleted files deleted.

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2026-02-13 18:16:51 +05:30
Miguel Martin
a1816166a5 Cosmos Transfer2.5 inference pipeline: general/{seg, depth, blur, edge} (#13066)
* initial conversion script

* cosmos control net block

* CosmosAttention

* base model conversion

* wip

* pipeline updates

* convert controlnet

* pipeline: working without controls

* wip

* debugging

* Almost working

* temp

* control working

* cleanup + detail on neg_encoder_hidden_states

* convert edge

* pos emb for control latents

* convert all chkpts

* resolve TODOs

* remove prints

* Docs

* add siglip image reference encoder

* Add unit tests

* controlnet: add duplicate layers

* Additional tests

* skip less

* skip less

* remove image_ref

* minor

* docs

* remove skipped test in transfer

* Don't crash process

* formatting

* revert some changes

* remove skipped test

* make style

* Address comment + fix example

* CosmosAttnProcessor2_0 revert + CosmosAttnProcessor2_5 changes

* make style

* make fix-copies
2026-02-11 18:33:09 -10:00
Dhruv Nair
c3a4cd14b8 [CI] Refactor Wan Model Tests (#13082)
* update

* update

* update

* update

* update

* update

* update

* update
2026-02-11 14:42:58 +05:30
Dhruv Nair
0b76728e27 Refactor Model Tests (#12822)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-02-02 18:51:44 +05:30
Kashif Rasul
d54669a73e [Qwen] avoid creating attention masks when there is no padding (#12987)
* avoid creating attention masks when there is no padding

* make fix-copies

* torch compile tests

* set all ones mask to none

* fix positional encoding from becoming > 4096

* fix from review

* slice freqs_cis to match the input sequence length

* keep only attenton masking change

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-01-27 12:42:48 -10:00
Kashif Rasul
dad5cb55e6 Fix QwenImage txt_seq_lens handling (#12702)
* Fix QwenImage txt_seq_lens handling

* formatting

* formatting

* remove txt_seq_lens and use bool  mask

* use compute_text_seq_len_from_mask

* add seq_lens to dispatch_attention_fn

* use joint_seq_lens

* remove unused index_block

* WIP: Remove seq_lens parameter and use mask-based approach

- Remove seq_lens parameter from dispatch_attention_fn
- Update varlen backends to extract seqlens from masks
- Update QwenImage to pass 2D joint_attention_mask
- Fix native backend to handle 2D boolean masks
- Fix sage_varlen seqlens_q to match seqlens_k for self-attention

Note: sage_varlen still producing black images, needs further investigation

* fix formatting

* undo sage changes

* xformers support

* hub fix

* fix torch compile issues

* fix tests

* use _prepare_attn_mask_native

* proper deprecation notice

* add deprecate to txt_seq_lens

* Update src/diffusers/models/transformers/transformer_qwenimage.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_qwenimage.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Only create the mask if there's actual padding

* fix order of docstrings

* Adds performance benchmarks and optimization details for QwenImage

Enhances documentation with comprehensive performance insights for QwenImage pipeline:

* rope_text_seq_len = text_seq_len

* rename to max_txt_seq_len

* removed deprecated args

* undo unrelated change

* Updates QwenImage performance documentation

Removes detailed attention backend benchmarks and simplifies torch.compile performance description

Focuses on key performance improvement with torch.compile, highlighting the specific speedup from 4.70s to 1.93s on an A100 GPU

Streamlines the documentation to provide more concise and actionable performance insights

* Updates deprecation warnings for txt_seq_lens parameter

Extends deprecation timeline for txt_seq_lens from version 0.37.0 to 0.39.0 across multiple Qwen image-related models

Adds a new unit test to verify the deprecation warning behavior for the txt_seq_lens parameter

* fix compile

* formatting

* fix compile tests

* rename helper

* remove duplicate

* smaller values

* removed

* use torch.cond for torch compile

* Construct joint attention mask once

* test different backends

* construct joint attention mask once to avoid reconstructing in every block

* Update src/diffusers/models/attention_dispatch.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* formatting

* raising an error from the EditPlus pipeline when batch_size > 1

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: cdutr <dutra_carlos@hotmail.com>
2026-01-12 13:45:09 +05:30
dg845
c10bdd9b73 Add LTX 2.0 Video Pipelines (#12915)
* Initial LTX 2.0 transformer implementation

* Add tests for LTX 2 transformer model

* Get LTX 2 transformer tests working

* Rename LTX 2 compile test class to have LTX2

* Remove RoPE debug print statements

* Get LTX 2 transformer compile tests passing

* Fix LTX 2 transformer shape errors

* Initial script to convert LTX 2 transformer to diffusers

* Add more LTX 2 transformer audio arguments

* Allow LTX 2 transformer to be loaded from local path for conversion

* Improve dummy inputs and add test for LTX 2 transformer consistency

* Fix LTX 2 transformer bugs so consistency test passes

* Initial implementation of LTX 2.0 video VAE

* Explicitly specify temporal and spatial VAE scale factors when converting

* Add initial LTX 2.0 video VAE tests

* Add initial LTX 2.0 video VAE tests (part 2)

* Get diffusers implementation on par with official LTX 2.0 video VAE implementation

* Initial LTX 2.0 vocoder implementation

* Use RMSNorm implementation closer to original for LTX 2.0 video VAE

* start audio decoder.

* init registration.

* up

* simplify and clean up

* up

* Initial LTX 2.0 text encoder implementation

* Rough initial LTX 2.0 pipeline implementation

* up

* up

* up

* up

* Add imports for LTX 2.0 Audio VAE

* Conversion script for LTX 2.0 Audio VAE Decoder

* Add Audio VAE logic to T2V pipeline

* Duplicate scheduler for audio latents

* Support num_videos_per_prompt for prompt embeddings

* LTX 2.0 scheduler and full pipeline conversion

* Add script to test full LTX2Pipeline T2V inference

* Fix pipeline return bugs

* Add LTX 2 text encoder and vocoder to ltx2 subdirectory __init__

* Fix more bugs in LTX2Pipeline.__call__

* Improve CPU offload support

* Fix pipeline audio VAE decoding dtype bug

* Fix video shape error in full pipeline test script

* Get LTX 2 T2V pipeline to produce reasonable outputs

* Make LTX 2.0 scheduler more consistent with original code

* Fix typo when applying scheduler fix in T2V inference script

* Refactor Audio VAE to be simpler and remove helpers (#7)

* remove resolve causality axes stuff.

* remove a bunch of helpers.

* remove adjust output shape helper.

* remove the use of audiolatentshape.

* move normalization and patchify out of pipeline.

* fix

* up

* up

* Remove unpatchify and patchify ops before audio latents denormalization (#9)

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Add support for I2V (#8)

* start i2v.

* up

* up

* up

* up

* up

* remove uniform strategy code.

* remove unneeded code.

* Denormalize audio latents in I2V pipeline (analogous to T2V change) (#11)

* test i2v.

* Move Video and Audio Text Encoder Connectors to Transformer (#12)

* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* precompute run_connectors,.

* fixes

* Address review comments

* Calculate RoPE double precisions freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* Make connectors a separate module (#18)

* remove text_encoder.py

* address yiyi's comments.

* up

* up

* up

* up

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>

* up (#19)

* address initial feedback from lightricks team (#16)

* cross_attn_timestep_scale_multiplier to 1000

* implement split rope type.

* up

* propagate rope_type to rope embed classes as well.

* up

* When using split RoPE, make sure that the output dtype is same as input dtype

* Fix apply split RoPE shape error when reshaping x to 4D

* Add export_utils file for exporting LTX 2.0 videos with audio

* Tests for T2V and I2V (#6)

* add ltx2 pipeline tests.

* up

* up

* up

* up

* remove content

* style

* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* up

* up

* i2v tests.

* up

* Address review comments

* Calculate RoPE double precisions freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* revert unneded changes.

* up

* up

* update to split style rope.

* up

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* up

* use export util funcs.

* Point original checkpoint to LTX 2.0 official checkpoint

* Allow the I2V pipeline to accept image URLs

* make style and make quality

* remove function map.

* remove args.

* update docs.

* update doc entries.

* disable ltx2_consistency test

* Simplify LTX 2 RoPE forward by removing coords is None logic

* make style and make quality

* Support LTX 2.0 audio VAE encoder

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Remove print statement in audio VAE

* up

* Fix bug when calculating audio RoPE coords

* Ltx 2 latent upsample pipeline (#12922)

* Initial implementation of LTX 2.0 latent upsampling pipeline

* Add new LTX 2.0 spatial latent upsampler logic

* Add test script for LTX 2.0 latent upsampling

* Add option to enable VAE tiling in upsampling test script

* Get latent upsampler working with video latents

* Fix typo in BlurDownsample

* Add latent upsample pipeline docstring and example

* Remove deprecated pipeline VAE slicing/tiling methods

* make style and make quality

* When returning latents, return unpacked and denormalized latents for T2V and I2V

* Add model_cpu_offload_seq for latent upsampling pipeline

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* Fix latent upsampler filename in LTX 2 conversion script

* Add latent upsample pipeline to LTX 2 docs

* Add dummy objects for LTX 2 latent upsample pipeline

* Set default FPS to official LTX 2 ckpt default of 24.0

* Set default CFG scale to official LTX 2 ckpt default of 4.0

* Update LTX 2 pipeline example docstrings

* make style and make quality

* Remove LTX 2 test scripts

* Fix LTX 2 upsample pipeline example docstring

* Add logic to convert and save a LTX 2 upsampling pipeline

* Document LTX2VideoTransformer3DModel forward pass

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2026-01-07 21:24:27 -08:00
swappy
f12d161d67 Fix broken group offloading with block_level for models with standalone layers (#12692)
* fix: group offloading to support standalone computational layers in block-level offloading

* test: for models with standalone and deeply nested layers in block-level offloading

* feat: support for block-level offloading in group offloading config

* fix: group offload block modules to AutoencoderKL and AutoencoderKLWan

* fix: update group offloading tests to use AutoencoderKL and adjust input dimensions

* refactor: streamline block offloading logic

* Apply style fixes

* update tests

* update

* fix for failing tests

* clean up

* revert to use skip_keys

* clean up

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2025-12-05 18:54:05 +05:30
Sayak Paul
a1f36ee3ef [Z-Image] various small changes, Z-Image transformer tests, etc. (#12741)
* start zimage model tests.

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* Revert "up"

This reverts commit bca3e27c96.

* expand upon compilation failure reason.

* Update tests/models/transformers/test_models_transformer_z_image.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* reinitialize the padding tokens to ones to prevent NaN problems.

* updates

* up

* skipping ZImage DiT tests

* up

* up

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
2025-12-03 19:35:46 +05:30
Sayak Paul
d96cbacacd [tests] fix hunuyanvideo 1.5 offloading tests. (#12782)
fix hunuyanvideo 1.5 offloading tests.
2025-12-03 18:07:59 +05:30
Kimbing Ng
3c05b9f71c Fixes #12673. record_stream in group offloading is not working properly (#12721)
* Fixes #12673.

    Wrong default_stream is used. leading to wrong execution order when record_steram is enabled.

* update

* Update test

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-12-03 11:37:11 +05:30
YiYi Xu
6156cf8f22 Hunyuanvideo15 (#12696)
* add


---------

Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-161-123.ec2.internal>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-30 20:27:59 -10:00
Sayak Paul
5ffb73d4ae let's go Flux2 🚀 (#12711)
* add vae

* Initial commit for Flux 2 Transformer implementation

* add pipeline part

* small edits to the pipeline and conversion

* update conversion script

* fix

* up up

* finish pipeline

* Remove Flux IP Adapter logic for now

* Remove deprecated 3D id logic

* Remove ControlNet logic for now

* Add link to ViT-22B paper as reference for parallel transformer blocks such as the Flux 2 single stream block

* update pipeline

* Don't use biases for input projs and output AdaNorm

* up

* Remove bias for double stream block text QKV projections

* Add script to convert Flux 2 transformer to diffusers

* make style and make quality

* fix a few things.

* allow sft files to go.

* fix image processor

* fix batch

* style a bit

* Fix some bugs in Flux 2 transformer implementation

* Fix dummy input preparation and fix some test bugs

* fix dtype casting in timestep guidance module.

* resolve conflicts.,

* remove ip adapter stuff.

* Fix Flux 2 transformer consistency test

* Fix bug in Flux2TransformerBlock (double stream block)

* Get remaining Flux 2 transformer tests passing

* make style; make quality; make fix-copies

* remove stuff.

* fix type annotaton.

* remove unneeded stuff from tests

* tests

* up

* up

* add sf support

* Remove unused IP Adapter and ControlNet logic from transformer (#9)

* copied from

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>

* up

* up

* up

* up

* up

* Refactor Flux2Attention into separate classes for double stream and single stream attention

* Add _supports_qkv_fusion to AttentionModuleMixin to allow subclasses to disable QKV fusion

* Have Flux2ParallelSelfAttention inherit from AttentionModuleMixin with _supports_qkv_fusion=False

* Log debug message when calling fuse_projections on a AttentionModuleMixin subclass that does not support QKV fusion

* Address review comments

* Update src/diffusers/pipelines/flux2/pipeline_flux2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* up

* Remove maybe_allow_in_graph decorators for Flux 2 transformer blocks (#12)

* up

* support ostris loras. (#13)

* up

* update schdule

* up

* up (#17)

* add training scripts (#16)

* add training scripts

Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com>

* model cpu offload in validation.

* add flux.2 readme

* add img2img and tests

* cpu offload in log validation

* Apply suggestions from code review

* fix

* up

* fixes

* remove i2i training tests for now.

---------

Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com>
Co-authored-by: linoytsaban <linoy@huggingface.co>

* up

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Daniel Gu <dgu8957@gmail.com>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-10-53-87-203.ec2.internal>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
Co-authored-by: Linoy Tsaban <linoytsaban@gmail.com>
Co-authored-by: linoytsaban <linoy@huggingface.co>
2025-11-25 21:49:04 +05:30
Sayak Paul
cd3bbe2910 skip autoencoderdl layerwise casting memory (#12647) 2025-11-13 12:56:22 +05:30
dg845
d8e4805816 [WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) (#12526)
---------

Co-authored-by: Tolga Cangöz <mtcangoz@gmail.com>
Co-authored-by: Tolga Cangöz <46008593+tolgacangoz@users.noreply.github.com>
2025-11-12 16:52:31 -10:00
Junsong Chen
b3e9dfced7 [SANA-Video] Adding 5s pre-trained 480p SANA-Video inference (#12584)
* 1. add `SanaVideoTransformer3DModel` in transformer_sana_video.py
2. add `SanaVideoPipeline` in pipeline_sana_video.py
3. add all code we need for import `SanaVideoPipeline`

* add a sample about how to use sana-video;

* code update;

* update hf model path;

* update code;

* sana-video can run now;

* 1. add aspect ratio in sana-video-pipeline;
2. add reshape function in sana-video-processor;
3. fix convert pth to safetensor bugs;

* default to use `use_resolution_binning`;

* make style;

* remove unused code;

* Update src/diffusers/models/transformers/transformer_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/models/transformers/transformer_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/models/transformers/transformer_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/sana/pipeline_sana_video.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/models/transformers/transformer_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/models/transformers/transformer_sana_video.py

* Update src/diffusers/pipelines/sana/pipeline_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/models/transformers/transformer_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/sana/pipeline_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* support `dispatch_attention_fn`

* 1. add sana-video markdown;
2. fix typos;

* add two test case for sana-video (need check)

* fix text-encoder in test-sana-video;

* Update tests/pipelines/sana/test_sana_video.py

* Update tests/pipelines/sana/test_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/sana/test_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/sana/test_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/sana/test_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/sana/test_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/sana/pipeline_sana_video.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/video_processor.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* make style
make quality
make fix-copies

* toctree yaml update;

* add sana-video-transformer3d markdown;

* Apply style fixes

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-05 21:08:47 -08:00
galbria
84e16575e4 Bria fibo (#12545)
* Bria FIBO pipeline

* style fixs

* fix CR

* Refactor BriaFibo classes and update pipeline parameters

- Updated BriaFiboAttnProcessor and BriaFiboAttention classes to reflect changes from Flux equivalents.
- Modified the _unpack_latents method in BriaFiboPipeline to improve clarity.
- Increased the default max_sequence_length to 3000 and added a new optional parameter do_patching.
- Cleaned up test_pipeline_bria_fibo.py by removing unused imports and skipping unsupported tests.

* edit the docs of FIBO

* Remove unused BriaFibo imports and update CPU offload method in BriaFiboPipeline

* Refactor FIBO classes to BriaFibo naming convention

- Updated class names from FIBO to BriaFibo for consistency across the module.
- Modified instances of FIBOEmbedND, FIBOTimesteps, TextProjection, and TimestepProjEmbeddings to reflect the new naming.
- Ensured all references in the BriaFiboTransformer2DModel are updated accordingly.

* Add BriaFiboTransformer2DModel import to transformers module

* Remove unused BriaFibo imports from modular pipelines and add BriaFiboTransformer2DModel and BriaFiboPipeline classes to dummy objects for enhanced compatibility with torch and transformers.

* Update BriaFibo classes with copied documentation and fix import typo in pipeline module

- Added documentation comments indicating the source of copied code in BriaFiboTransformerBlock and _pack_latents methods.
- Corrected the import statement for BriaFiboPipeline in the pipelines module.

* Remove unused BriaFibo imports from __init__.py to streamline modular pipelines.

* Refactor documentation comments in BriaFibo classes to indicate inspiration from existing implementations

- Updated comments in BriaFiboAttnProcessor, BriaFiboAttention, and BriaFiboPipeline to reflect that the code is inspired by other modules rather than copied.
- Enhanced clarity on the origins of the methods to maintain proper attribution.

* change Inspired by to Based on

* add reference link and fix trailing whitespace

* Add BriaFiboTransformer2DModel documentation and update comments in BriaFibo classes

- Introduced a new documentation file for BriaFiboTransformer2DModel.
- Updated comments in BriaFiboAttnProcessor, BriaFiboAttention, and BriaFiboPipeline to clarify the origins of the code, indicating copied sources for better attribution.

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2025-10-28 16:27:48 +05:30
Sayak Paul
55d49d4379 [ci] don't run sana layerwise casting tests in CI. (#12551)
* don't run sana layerwise casting tests in CI.

* up
2025-10-28 13:29:51 +05:30
Sayak Paul
a5a0ccf86a [core] AutoencoderMixin to abstract common methods (#12473)
* up

* correct wording.

* up

* up

* up
2025-10-22 08:52:06 +05:30
David Bertoin
dd07b19e27 Prx (#12525)
* rename photon to prx

* rename photon into prx

* Revert .gitignore to state before commit b7fb0fe9d6

* rename photon to prx

* rename photon into prx

* Revert .gitignore to state before commit b7fb0fe9d6

* make fix-copies
2025-10-21 17:09:22 -07:00
David Bertoin
cefc2cf82d Add Photon model and pipeline support (#12456)
* Add Photon model and pipeline support

This commit adds support for the Photon image generation model:
- PhotonTransformer2DModel: Core transformer architecture
- PhotonPipeline: Text-to-image generation pipeline
- Attention processor updates for Photon-specific attention mechanism
- Conversion script for loading Photon checkpoints
- Documentation and tests

* just store the T5Gemma encoder

* enhance_vae_properties if vae is provided only

* remove autocast for text encoder forwad

* BF16 example

* conditioned CFG

* remove enhance vae and use vae.config directly when possible

* move PhotonAttnProcessor2_0 in transformer_photon

* remove einops dependency and now inherits from AttentionMixin

* unify the structure of the forward block

* update doc

* update doc

* fix T5Gemma loading from hub

* fix timestep shift

* remove lora support from doc

* Rename EmbedND for PhotoEmbedND

* remove modulation dataclass

* put _attn_forward and _ffn_forward logic in PhotonBlock's forward

* renam LastLayer for FinalLayer

* remove lora related code

* rename vae_spatial_compression_ratio for vae_scale_factor

* support prompt_embeds in call

* move xattention conditionning out computation out of the denoising loop

* add negative prompts

* Use _import_structure for lazy loading

* make quality + style

* add pipeline test + corresponding fixes

* utility function that determines the default resolution given the VAE

* Refactor PhotonAttention to match Flux pattern

* built-in RMSNorm

* Revert accidental .gitignore change

* parameter names match the standard diffusers conventions

* renaming and remove unecessary attributes setting

* Update docs/source/en/api/pipelines/photon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* quantization example

* added doc to toctree

* Update docs/source/en/api/pipelines/photon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/photon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/photon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* use dispatch_attention_fn for multiple attention backend support

* naming changes

* make fix copy

* Update docs/source/en/api/pipelines/photon.md

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Add PhotonTransformer2DModel to TYPE_CHECKING imports

* make fix-copies

* Use Tuple instead of tuple

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* restrict the version of transformers

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/photon/test_pipeline_photon.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/pipelines/photon/test_pipeline_photon.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* change | for Optional

* fix nits.

* use typing Dict

---------

Co-authored-by: davidb <davidb@worker-10.soperator-worker-svc.soperator.svc.cluster.local>
Co-authored-by: David Briand <david@photoroom.com>
Co-authored-by: davidb <davidb@worker-8.soperator-worker-svc.soperator.svc.cluster.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2025-10-21 20:55:55 +05:30
Sayak Paul
af769881d3 [tests] introduce VAETesterMixin to consolidate tests for slicing and tiling (#12374)
* up

* up

* up

* up

* up

* u[

* up

* up

* up
2025-10-17 12:02:29 +05:30
Benjamin Bossan
7242b5ff62 FIX Test to ignore warning for enable_lora_hotswap (#12421)
I noticed that the test should be for the option check_compiled="ignore"
but it was using check_compiled="warn". This has been fixed, now the
correct argument is passed.

However, the fact that the test passed means that it was incorrect to
begin with. The way that logs are collected does not collect the
logger.warning call here (not sure why). To amend this, I'm now using
assertNoLogs. With this change, the test correctly fails when the wrong
argument is passed.
2025-10-02 20:57:11 +02:00
Lucain
b59654544b Install latest prerelease from huggingface_hub when installing transformers from main (#12395)
* Allow prerelease when installing transformers from main

* maybe better

* maybe better

* and now?

* just bored

* should be better

* works now
2025-09-30 17:02:33 +05:30
Sayak Paul
19085ac8f4 Don't skip Qwen model tests for group offloading with disk (#12382)
u[
2025-09-29 13:08:05 +05:30
Lucain
ec5449f3a1 Support both huggingface_hub v0.x and v1.x (#12389)
* Support huggingface_hub 0.x and 1.x

* httpx
2025-09-25 18:28:54 +02:00
Sayak Paul
ffc8c0c1e1 [tests] feat: add AoT compilation tests (#12203)
* feat: add a test for aot.

* up
2025-09-03 11:15:27 +05:30
Dhruv Nair
7aa6af1138 [Refactor] Move testing utils out of src (#12238)
* update

* update

* update

* update

* update

* merge main

* Revert "merge main"

This reverts commit 65efbcead5.
2025-08-28 19:53:02 +05:30