Commit Graph

593 Commits

Author SHA1 Message Date
Howard Zhang
1fe2125802 remove str option for quantization config in torchao (#13291)
* remove str option for quantization config in torchao

* Apply style fixes

* minor fixes

* Added AOBaseConfig docs to torchao.md

* minor fixes for removing str option torchao

* minor change to add back int and uint check

* minor fixes

* minor fixes to tests

* Update tests/quantization/torchao/test_torchao.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update tests/quantization/torchao/test_torchao.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* version=2 update to test_torchao.py

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-03-27 08:52:37 +05:30
dg845
7298f5be93 Update LTX-2 Docs to Cover LTX-2.3 Models (#13337)
* Update LTX-2 docs to cover multimodal guidance and prompt enhancement

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply reviewer feedback

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2026-03-26 17:51:29 -07:00
Kashif Rasul
762ae059fa [LLADA2] documentation fixes (#13333)
documentation fixes
2026-03-25 17:49:31 +05:30
Kashif Rasul
5d207e756e [Discrete Diffusion] Add LLaDA2 pipeline (#13226)
* feat: add LLaDA2 and BlockRefinement pipelines for discrete text diffusion

Add support for LLaDA2/LLaDA2.1 discrete diffusion text generation:
- BlockRefinementPipeline: block-wise iterative refinement with confidence-based
  token commitment, supporting editing threshold for LLaDA2.1 models
- LLaDA2Pipeline: convenience wrapper with LLaDA2-specific defaults
- DiscreteDiffusionPipelineMixin: shared SAR sampling utilities (top-k, top-p,
  temperature) and prompt/prefix helpers
- compute_confidence_aware_loss: CAP-style training loss
- Examples: sampling scripts for LLaDA2 and block refinement, training scripts
  with Qwen causal LM
- Docs and tests included

* feat: add BlockRefinementScheduler for commit-by-confidence scheduling

Extract the confidence-based token commit logic from BlockRefinementPipeline
into a dedicated BlockRefinementScheduler, following diffusers conventions.

The scheduler owns:
- Transfer schedule computation (get_num_transfer_tokens)
- Timestep management (set_timesteps)
- Step logic: confidence-based mask-filling and optional token editing

The pipeline now delegates scheduling to self.scheduler.step() and accepts
a scheduler parameter in __init__.

* test: add unit tests for BlockRefinementScheduler

12 tests covering set_timesteps, get_num_transfer_tokens, step logic
(confidence-based commits, threshold behavior, editing, prompt masking,
batched inputs, tuple output).

* docs: add toctree entries and standalone scheduler doc page

- Add BlockRefinement and LLaDA2 to docs sidebar navigation
- Add BlockRefinementScheduler to schedulers sidebar navigation
- Move scheduler autodoc to its own page under api/schedulers/

* feat: add --revision flag and fix dtype deprecation in sample_llada2.py

- Add --revision argument for loading model revisions from the Hub
- Replace deprecated torch_dtype with dtype for transformers 5.x compat

* fix: use 1/0 attention mask instead of 0/-inf for LLaDA2 compat

LLaDA2 models expect a boolean-style (1/0) attention mask, not an
additive (0/-inf) mask. The model internally converts to additive,
so passing 0/-inf caused double-masking and gibberish output.

* refactor: consolidate training scripts into single train_block_refinement.py

- Remove toy train_block_refinement_cap.py (self-contained demo with tiny model)
- Rename train_block_refinement_qwen_cap.py to train_block_refinement.py
  (already works with any causal LM via AutoModelForCausalLM)
- Fix torch_dtype deprecation and update README with correct script names

* fix formatting

* docs: improve LLaDA2 and BlockRefinement documentation

- Add usage examples with real model IDs and working code
- Add recommended parameters table for LLaDA2.1 quality/speed modes
- Note that editing is LLaDA2.1-only (not for LLaDA2.0 models)
- Remove misleading config defaults section from BlockRefinement docs

* feat: set LLaDA2Pipeline defaults to recommended model parameters

- threshold: 0.95 -> 0.7 (quality mode)
- max_post_steps: 0 -> 16 (recommended for LLaDA2.1, harmless for 2.0)
- eos_early_stop: False -> True (stop at EOS token)

block_length=32, steps=32, temperature=0.0 were already correct.
editing_threshold remains None (users enable for LLaDA2.1 models).

* feat: default editing_threshold=0.5 for LLaDA2.1 quality mode

LLaDA2.1 is the current generation. Users with LLaDA2.0 models can
disable editing by passing editing_threshold=None.

* fix: align sampling utilities with official LLaDA2 implementation

- top_p filtering: add shift-right to preserve at least one token above
  threshold (matches official code line 1210)
- temperature ordering: apply scaling before top-k/top-p filtering so
  filtering operates on scaled logits (matches official code lines 1232-1235)
- greedy branch: return argmax directly when temperature=0 without
  filtering (matches official code lines 1226-1230)

* refactor: remove duplicate prompt encoding, reuse mixin's _prepare_input_ids

LLaDA2Pipeline._prepare_prompt_ids was a near-copy of
DiscreteDiffusionPipelineMixin._prepare_input_ids. Remove the duplicate
and call the mixin method directly. Also simplify _extract_input_ids
since we always pass return_dict=True.

* formatting

* fix: replace deprecated torch_dtype with dtype in examples and docstrings

- Update EXAMPLE_DOC_STRING to use dtype= and LLaDA2.1-mini model ID
- Fix sample_block_refinement.py to use dtype=

* remove BlockRefinementPipeline

* cleanup

* fix readme

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* removed DiscreteDiffusionPipelineMixin

* add support for 2d masks for flash attn

* Update src/diffusers/training_utils.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/training_utils.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* fix issues from review

* added tests

* formatting

* add check_eos_finished to scheduler

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_block_refinement.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_block_refinement.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* fix renaming issues and types

* remove duplicate check

* Update docs/source/en/api/pipelines/llada2.md

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/llada2/pipeline_llada2.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
2026-03-25 16:17:50 +05:30
ddavidchick
ef309a1bb0 Add KVAE 1.0 (#13033)
* add kvae2d

* add kvae3d video

* add docs for kvae2d and kvae3d video

* style fixes

* fix kvae3d docs

* fix normalzation

* fix kvae video for code style

* fix kvae video

* kvae minor fixes

* add gradient ckpting for kvaes

* get rid of inplace ops kvae video

* add tests for KVAEs

* kvae2d normalization style change

* kvaes fix style

* update dummy_pt_objects test for kvaes

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2026-03-23 12:56:49 -10:00
Steven Liu
ed31974c3e [docs] updates (#13248)
* fixes

* few more links

* update zh

* fix
2026-03-16 13:24:57 -07:00
Sayak Paul
764f7ede33 [core] Flux2 klein kv followups (#13264)
* implement Flux2Transformer2DModelOutput.

* add output class to docs.

* add Flux2KleinKV to docs.

* add pipeline tests for klein kv.
2026-03-13 10:05:11 +05:30
Miguel Martin
0a2c26d0a4 Update Documentation for NVIDIA Cosmos (#13251)
* fix docs

* update main example
2026-03-11 09:14:56 -07:00
Shenghai Yuan
bd7a7a0b95 Optimize Helios docs (#13222)
optimize helios docs
2026-03-08 13:54:16 -10:00
Ando
8ec0a5ccad feat: implement rae autoencoder. (#13046)
* feat: implement three RAE encoders(dinov2, siglip2, mae)

* feat: finish first version of autoencoder_rae

* fix formatting

* make fix-copies

* initial doc

* fix latent_mean / latent_var init types to accept config-friendly inputs

* use mean and std convention

* cleanup

* add rae to diffusers script

* use imports

* use attention

* remove unneeded class

* example traiing script

* input and ground truth sizes have to be the same

* fix argument

* move loss to training script

* cleanup

* simplify mixins

* fix training script

* fix entrypoint for instantiating the AutoencoderRAE

* added encoder_image_size config

* undo last change

* fixes from pretrained weights

* cleanups

* address reviews

* fix train script to use pretrained

* fix conversion script review

* latebt normalization buffers are now always registered with no-op defaults

* Update examples/research_projects/autoencoder_rae/README.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_rae.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* use image url

* Encoder is frozen

* fix slow test

* remove config

* use ModelTesterMixin and AutoencoderTesterMixin

* make quality

* strip final layernorm when converting

* _strip_final_layernorm_affine for training script

* fix test

* add dispatch forward and update conversion script

* update training script

* error out as soon as possible and add comments

* Update src/diffusers/models/autoencoders/autoencoder_rae.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* use buffer

* inline

* Update src/diffusers/models/autoencoders/autoencoder_rae.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* remove optional

* _noising takes a generator

* Update src/diffusers/models/autoencoders/autoencoder_rae.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* fix api

* rename

* remove unittest

* use randn_tensor

* fix device map on multigpu

* check if the key is missing in the original state dict and only then add to the allow_missing set

* remove initialize_weights

---------

Co-authored-by: wangyuqi <wangyuqi@MBP-FJDQNJTWYN-0208.local>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
2026-03-05 20:17:14 +05:30
Shenghai Yuan
ae5881ba77 Fix Helios paper link in documentation (#13213)
* Fix Helios paper link in documentation

Updated the link to the Helios paper for accuracy.

* Fix reference link in HeliosTransformer3DModel documentation

Updated the reference link for the Helios Transformer model paper.

* Update Helios research paper link in documentation

* Update Helios research paper link in documentation
2026-03-05 18:58:13 +05:30
dg845
ab6040ab2d Add LTX2 Condition Pipeline (#13058)
* LTX2 condition pipeline initial commit

* Fix pipeline import error

* Implement LTX-2-style general image conditioning

* Blend denoising output and clean latents in sample space instead of velocity space

* make style and make quality

* make fix-copies

* Rename LTX2VideoCondition image to frames

* Update LTX2ConditionPipeline example

* Remove support for image and video in __call__

* Put latent_idx_from_index logic inline

* Improve comment on using the conditioning mask in denoising loop

* Apply suggestions from code review

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

* make fix-copies

* Migrate to Python 3.9+ style type annotations without explicit typing imports

* Forward kwargs from preprocess/postprocess_video to preprocess/postprocess resp.

* Center crop LTX-2 conditions following original code

* Duplicate video and audio position ids if using CFG

* make style and make quality

* Remove unused index_type arg to preprocess_conditions

* Add # Copied from for _normalize_latents

* Fix _normalize_latents # Copied from statement

* Add LTX-2 condition pipeline docs

* Remove TODOs

* Support only unpacked latents (5D for video, 4D for audio)

* Remove # Copied from for prepare_audio_latents

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
2026-03-05 00:42:55 -08:00
dg845
33f785b444 Add Helios-14B Video Generation Pipelines (#13208)
* [1/N] add helios

* fix test

* make fix-copies

* change script path

* fix cus script

* update docs

* fix documented check

* update links for docs and examples

* change default config

* small refactor

* add test

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove register_buffer for _scale_cache

* fix non-cuda devices error

* remove "handle the case when timestep is 2D"

* refactor HeliosMultiTermMemoryPatch and process_input_hidden_states

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* fix calculate_shift

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* rewritten `einops` in pure `torch`

* fix: pass patch_size to apply_schedule_shift instead of hardcoding

* remove the logics of 'vae_decode_type'

* move some validation into check_inputs()

* rename helios scheduler & merge all into one step()

* add some details to doc

* move dmd  step() logics from pipeline to scheduler

* change to Python 3.9+ style type

* fix NoneType error

* refactor DMD scheduler's set_timestep

* change rope related vars name

* fix stage2 sample

* fix dmd sample

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove redundant & refactor norm_out

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* change "is_keep_x0" to "keep_first_frame"

* use a more intuitive name

* refactor dynamic_time_shifting

* remove use_dynamic_shifting args

* remove usage of UniPCMultistepScheduler

* separate stage2 sample to HeliosPyramidPipeline

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* fix transformer

* use a more intuitive name

* update example script

* fix requirements

* remove redudant attention mask

* fix

* optimize pipelines

* make style .

* update TYPE_CHECKING

* change to use torch.split

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* derive memory patch sizes from patch_size multiples

* remove some hardcoding

* move some checks into check_inputs

* refactor sample_block_noise

* optimize encoding chunks logits for v2v

* use num_history_latent_frames = sum(history_sizes)

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* remove redudant optimized_scale

* Update src/diffusers/pipelines/helios/pipeline_helios_pyramid.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* use more descriptive name

* optimize history_latents

* remove not used "num_inference_steps"

* removed redudant "pyramid_num_stages"

* add "is_cfg_zero_star" and "is_distilled" to HeliosPyramidPipeline

* remove redudant

* change example scripts name

* change example scripts name

* correct docs

* update example

* update docs

* Update tests/models/transformers/test_models_transformer_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/models/transformers/test_models_transformer_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* separate HeliosDMDScheduler

* fix numerical stability issue:

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* remove redudant

* small refactor

* remove use_interpolate_prompt logits

* simplified model test

* fallbackt to BaseModelTesterConfig

* remove _maybe_expand_t2v_lora_for_i2v

* fix HeliosLoraLoaderMixin

* update docs

* use randn_tensor for test

* fix doc typo

* optimize code

* mark torch.compile xfail

* change paper name

* Make get_dummy_inputs deterministic using self.generator

* Set less strict threshold for test_save_load_float16 test for Helios pipeline

* make style and make quality

* Preparation for merging

* add torch.Generator

* Fix HeliosPipelineOutput doc path

* Fix Helios related (optimize docs & remove redudant) (#13210)

* fix docs

* remove redudant

* remove redudant

* fix group offload

* Removed fixes for group offload

---------

Co-authored-by: yuanshenghai <yuanshenghai@bytedance.com>
Co-authored-by: Shenghai Yuan <140951558+SHYuanBest@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: SHYuanBest <shyuan-cs@hotmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-03-04 21:31:43 +05:30
YiYi Xu
680076fcc0 [Modular] update the auto pipeline blocks doc (#13148)
* update

* Apply suggestion from @yiyixuxu

* Update docs/source/en/modular_diffusers/auto_pipeline_blocks.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/modular_diffusers/auto_pipeline_blocks.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/modular_diffusers/auto_pipeline_blocks.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/modular_diffusers/auto_pipeline_blocks.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add to api

---------

Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-161-123.ec2.internal>
2026-02-27 10:50:35 -10:00
Miguel Martin
212db7b999 Cosmos Transfer2.5 Auto-Regressive Inference Pipeline (#13114)
* AR

* address comments

* address comments 2
2026-02-25 14:42:29 -10:00
Steven Liu
6875490c3b [docs] add docs for qwenimagelayered (#13158)
* add example

* feedback
2026-02-18 11:02:15 -08:00
Dhruv Nair
59e7a46928 [Pipelines] Remove k-diffusion (#13152)
* remove k-diffusion

* fix copies
2026-02-16 13:54:24 +05:30
dg845
baaa8d040b LTX 2 Improve encode_video by Accepting More Input Types (#13057)
* Support different pipeline outputs for LTX 2 encode_video

* Update examples to use improved encode_video function

* Fix comment

* Address review comments

* make style and make quality

* Have non-iterator video inputs respect video_chunks_number

* make style and make quality

* Add warning when encode_video receives a non-denormalized np.ndarray

* make style and make quality

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2026-02-08 19:40:34 -08:00
CalamitousFelicitousness
99e2cfff27 Feature/zimage inpaint pipeline (#13006)
* Add ZImageInpaintPipeline

Updated the pipeline structure to include ZImageInpaintPipeline
    alongside ZImagePipeline and ZImageImg2ImgPipeline.
Implemented the ZImageInpaintPipeline class for inpainting
    tasks, including necessary methods for encoding prompts,
    preparing masked latents, and denoising.
Enhanced the auto_pipeline to map the new ZImageInpaintPipeline
    for inpainting generation tasks.
Added unit tests for ZImageInpaintPipeline to ensure
    functionality and performance.
Updated dummy objects to include ZImageInpaintPipeline for
    testing purposes.

* Add documentation and improve test stability for ZImageInpaintPipeline

- Add torch.empty fix for x_pad_token and cap_pad_token in test
- Add # Copied from annotations for encode_prompt methods
- Add documentation with usage example and autodoc directive

* Address PR review feedback for ZImageInpaintPipeline

Add batch size validation and callback handling fixes per review,
using diffusers conventions rather than suggested code verbatim.

* Update src/diffusers/pipelines/z_image/pipeline_z_image_inpaint.py

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

* Update src/diffusers/pipelines/z_image/pipeline_z_image_inpaint.py

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

* Add input validation and fix XLA support for ZImageInpaintPipeline

- Add missing is_torch_xla_available import for TPU support
- Add xm.mark_step() in denoising loop for proper XLA execution
- Add check_inputs() method for comprehensive input validation
- Call check_inputs() at the start of __call__

Addresses PR review feedback from @asomoza.

* Cleanup

---------

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
2026-02-05 11:48:25 -03:00
Vinh H. Pham
a2ea45a5da LTX2 distilled checkpoint support (#12934)
* add constants for distill sigmas values and allow ltx pipeline to pass in sigmas

* add time conditioning conversion and token packing for latents

* make style & quality

* remove prenorm

* add sigma param to ltx2 i2v

* fix copies and add pack latents to i2v

* Apply suggestions from code review

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Infer latent dims if latents/audio_latents is supplied

* add note for predefined sigmas

* run make style and quality

* revert distill timesteps & set original_state_dict_repo_idd to default None

* add latent normalize

* add create noised state, delete last sigmas

* remove normalize step in latent upsample pipeline and move it to ltx2 pipeline

* add create noise latent to i2v pipeline

* fix copies

* parse none value in weight conversion script

* explicit shape handling

* Apply suggestions from code review

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* make style

* add two stage inference tests

* add ltx2 documentation

* update i2v expected_audio_slice

* Apply suggestions from code review

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Apply suggestion from @dg845

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update ltx2.md to remove one-stage example

Removed one-stage generation example code and added comments for noise scale in two-stage generation.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: Daniel Gu <dgu8957@gmail.com>
2026-01-28 19:35:43 -08:00
Gal Davidi
d7a1c31f4f Fibo edit pipeline (#12930)
* Feature: Add BriaFiboEditPipeline to diffusers

* Introduced BriaFiboEditPipeline class with necessary backend requirements.
* Updated import structures in relevant modules to include BriaFiboEditPipeline.
* Ensured compatibility with existing pipelines and type checking.

* Feature: Introduce Bria Fibo Edit Pipeline

* Added BriaFiboEditPipeline class for structured JSON-native image editing.
* Created documentation for the new pipeline in bria_fibo_edit.md.
* Updated import structures to include the new pipeline and its components.
* Added unit tests for the BriaFiboEditPipeline to ensure functionality and correctness.

* Enhancement: Update Bria Fibo Edit Pipeline and Documentation

* Refined the Bria Fibo Edit model description for clarity and detail.
* Added usage instructions for model authentication and login.
* Implemented mask handling functions in the BriaFiboEditPipeline for improved image editing capabilities.
* Updated unit tests to cover new mask functionalities and ensure input validation.
* Adjusted example code in documentation to reflect changes in the pipeline's usage.

* Update Bria Fibo Edit documentation with corrected Hugging Face page link

* add dreambooth training script

* style and quality

* Delete temp.py

* Enhancement: Improve JSON caption validation in DreamBoothDataset

* Updated the clean_json_caption function to handle both string and dictionary inputs for captions.
* Added error handling to raise a ValueError for invalid caption types, ensuring better input validation.

* Add datasets dependency to requirements_fibo_edit.txt

* Add bria_fibo_edit to docs table of contents

* Fix dummy objects ordering

* Fix BriaFiboEditPipeline to use passed generator parameter

The pipeline was ignoring the generator parameter and only using
the seed parameter. This caused non-deterministic outputs in tests
that pass a seeded generator.

* Remove fibo_edit training script and related files

---------

Co-authored-by: kfirbria <kfir@bria.ai>
2026-01-19 22:09:53 +05:30
Sayak Paul
74654df203 add klein docs. (#12984) 2026-01-16 10:12:42 +05:30
Hameer Abbasi
5efb81fa71 Add ChromaInpaintPipeline (#12848)
* Add `ChromaInpaintPipeline`

* Set `attention_mask` to `dtype=torch.bool` for `ChromaInpaintPipeline`.

* Revert `.gitignore`.
2026-01-15 12:58:50 +05:30
Sayak Paul
7299121413 Z rz rz rz rz rz rz r cogview (#12973)
* init

* add

* add 1

* Update __init__.py

* rename

* 2

* update

* init with encoder

* merge2pipeline

* Update pipeline_glm_image.py

* remove sop

* remove useless func

* Update pipeline_glm_image.py

* up

(cherry picked from commit cfe19a31b9)

* review for work only

* change place

* Update pipeline_glm_image.py

* update

* Update transformer_glm_image.py

* 1

* no  negative_prompt for GLM-Image

* remove CogView4LoraLoaderMixin

* refactor attention processor.

* update

* fix

* use staticmethod

* update

* up

* up

* update

* Update glm_image.md

* 1

* Update pipeline_glm_image.py

* Update transformer_glm_image.py

* using new transformers impl

* support

* resolution change

* fix-copies

* Update src/diffusers/pipelines/glm_image/pipeline_glm_image.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update pipeline_glm_image.py

* use cogview4

* Update pipeline_glm_image.py

* Update pipeline_glm_image.py

* revert

* update

* batch support

* update

* version guard glm image pipeline

* validate prompt_embeds and prior_token_ids

* try docs.

* 4

* up

* up

* skip properly

* fix tests

* up

* up

---------

Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
2026-01-13 06:39:22 -10:00
Kashif Rasul
dad5cb55e6 Fix QwenImage txt_seq_lens handling (#12702)
* Fix QwenImage txt_seq_lens handling

* formatting

* formatting

* remove txt_seq_lens and use bool  mask

* use compute_text_seq_len_from_mask

* add seq_lens to dispatch_attention_fn

* use joint_seq_lens

* remove unused index_block

* WIP: Remove seq_lens parameter and use mask-based approach

- Remove seq_lens parameter from dispatch_attention_fn
- Update varlen backends to extract seqlens from masks
- Update QwenImage to pass 2D joint_attention_mask
- Fix native backend to handle 2D boolean masks
- Fix sage_varlen seqlens_q to match seqlens_k for self-attention

Note: sage_varlen still producing black images, needs further investigation

* fix formatting

* undo sage changes

* xformers support

* hub fix

* fix torch compile issues

* fix tests

* use _prepare_attn_mask_native

* proper deprecation notice

* add deprecate to txt_seq_lens

* Update src/diffusers/models/transformers/transformer_qwenimage.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_qwenimage.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Only create the mask if there's actual padding

* fix order of docstrings

* Adds performance benchmarks and optimization details for QwenImage

Enhances documentation with comprehensive performance insights for QwenImage pipeline:

* rope_text_seq_len = text_seq_len

* rename to max_txt_seq_len

* removed deprecated args

* undo unrelated change

* Updates QwenImage performance documentation

Removes detailed attention backend benchmarks and simplifies torch.compile performance description

Focuses on key performance improvement with torch.compile, highlighting the specific speedup from 4.70s to 1.93s on an A100 GPU

Streamlines the documentation to provide more concise and actionable performance insights

* Updates deprecation warnings for txt_seq_lens parameter

Extends deprecation timeline for txt_seq_lens from version 0.37.0 to 0.39.0 across multiple Qwen image-related models

Adds a new unit test to verify the deprecation warning behavior for the txt_seq_lens parameter

* fix compile

* formatting

* fix compile tests

* rename helper

* remove duplicate

* smaller values

* removed

* use torch.cond for torch compile

* Construct joint attention mask once

* test different backends

* construct joint attention mask once to avoid reconstructing in every block

* Update src/diffusers/models/attention_dispatch.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* formatting

* raising an error from the EditPlus pipeline when batch_size > 1

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: cdutr <dutra_carlos@hotmail.com>
2026-01-12 13:45:09 +05:30
Francisco Kurucz
b86bd99eac Fix link to diffedit implementation reference (#12708) 2026-01-10 11:13:23 -08:00
Sayak Paul
4ac2b4a521 [docs] polish caching docs. (#12684)
* polish caching docs.

* Update docs/source/en/optimization/cache.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/cache.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* up

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2026-01-10 10:09:05 -08:00
Sayak Paul
ed6e5ecf67 [LoRA] add LoRA support to LTX-2 (#12933)
* up

* fixes

* tests

* docs.

* fix

* change loading info.

* up

* up
2026-01-10 11:27:22 +05:30
Jay Wu
02c7adc356 [ChronoEdit] support multiple loras (#12679)
Co-authored-by: wjay <wjay@nvidia.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2026-01-09 15:50:16 -10:00
SahilCarterr
1791306739 [Fix] syntax in QwenImageEditPlusPipeline (#12371)
* Fixes syntax for consistency among pipelines

* Update test_qwenimage_edit_plus.py
2026-01-09 13:55:42 -10:00
Sayak Paul
d568c9773f [chore] remove controlnet implementations outside controlnet module. (#12152)
* remove controlnet implementations outside controlnet module.

* fix

* fix

* fix
2026-01-09 21:22:45 +05:30
dg845
c10bdd9b73 Add LTX 2.0 Video Pipelines (#12915)
* Initial LTX 2.0 transformer implementation

* Add tests for LTX 2 transformer model

* Get LTX 2 transformer tests working

* Rename LTX 2 compile test class to have LTX2

* Remove RoPE debug print statements

* Get LTX 2 transformer compile tests passing

* Fix LTX 2 transformer shape errors

* Initial script to convert LTX 2 transformer to diffusers

* Add more LTX 2 transformer audio arguments

* Allow LTX 2 transformer to be loaded from local path for conversion

* Improve dummy inputs and add test for LTX 2 transformer consistency

* Fix LTX 2 transformer bugs so consistency test passes

* Initial implementation of LTX 2.0 video VAE

* Explicitly specify temporal and spatial VAE scale factors when converting

* Add initial LTX 2.0 video VAE tests

* Add initial LTX 2.0 video VAE tests (part 2)

* Get diffusers implementation on par with official LTX 2.0 video VAE implementation

* Initial LTX 2.0 vocoder implementation

* Use RMSNorm implementation closer to original for LTX 2.0 video VAE

* start audio decoder.

* init registration.

* up

* simplify and clean up

* up

* Initial LTX 2.0 text encoder implementation

* Rough initial LTX 2.0 pipeline implementation

* up

* up

* up

* up

* Add imports for LTX 2.0 Audio VAE

* Conversion script for LTX 2.0 Audio VAE Decoder

* Add Audio VAE logic to T2V pipeline

* Duplicate scheduler for audio latents

* Support num_videos_per_prompt for prompt embeddings

* LTX 2.0 scheduler and full pipeline conversion

* Add script to test full LTX2Pipeline T2V inference

* Fix pipeline return bugs

* Add LTX 2 text encoder and vocoder to ltx2 subdirectory __init__

* Fix more bugs in LTX2Pipeline.__call__

* Improve CPU offload support

* Fix pipeline audio VAE decoding dtype bug

* Fix video shape error in full pipeline test script

* Get LTX 2 T2V pipeline to produce reasonable outputs

* Make LTX 2.0 scheduler more consistent with original code

* Fix typo when applying scheduler fix in T2V inference script

* Refactor Audio VAE to be simpler and remove helpers (#7)

* remove resolve causality axes stuff.

* remove a bunch of helpers.

* remove adjust output shape helper.

* remove the use of audiolatentshape.

* move normalization and patchify out of pipeline.

* fix

* up

* up

* Remove unpatchify and patchify ops before audio latents denormalization (#9)

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Add support for I2V (#8)

* start i2v.

* up

* up

* up

* up

* up

* remove uniform strategy code.

* remove unneeded code.

* Denormalize audio latents in I2V pipeline (analogous to T2V change) (#11)

* test i2v.

* Move Video and Audio Text Encoder Connectors to Transformer (#12)

* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* precompute run_connectors,.

* fixes

* Address review comments

* Calculate RoPE double precisions freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* Make connectors a separate module (#18)

* remove text_encoder.py

* address yiyi's comments.

* up

* up

* up

* up

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>

* up (#19)

* address initial feedback from lightricks team (#16)

* cross_attn_timestep_scale_multiplier to 1000

* implement split rope type.

* up

* propagate rope_type to rope embed classes as well.

* up

* When using split RoPE, make sure that the output dtype is same as input dtype

* Fix apply split RoPE shape error when reshaping x to 4D

* Add export_utils file for exporting LTX 2.0 videos with audio

* Tests for T2V and I2V (#6)

* add ltx2 pipeline tests.

* up

* up

* up

* up

* remove content

* style

* Denormalize audio latents in I2V pipeline (analogous to T2V change)

* Initial refactor to put video and audio text encoder connectors in transformer

* Get LTX 2 transformer tests working after connector refactor

* up

* up

* i2v tests.

* up

* Address review comments

* Calculate RoPE double precisions freqs using torch instead of np

* Further simplify LTX 2 RoPE freq calc

* revert unneded changes.

* up

* up

* update to split style rope.

* up

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* up

* use export util funcs.

* Point original checkpoint to LTX 2.0 official checkpoint

* Allow the I2V pipeline to accept image URLs

* make style and make quality

* remove function map.

* remove args.

* update docs.

* update doc entries.

* disable ltx2_consistency test

* Simplify LTX 2 RoPE forward by removing coords is None logic

* make style and make quality

* Support LTX 2.0 audio VAE encoder

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Remove print statement in audio VAE

* up

* Fix bug when calculating audio RoPE coords

* Ltx 2 latent upsample pipeline (#12922)

* Initial implementation of LTX 2.0 latent upsampling pipeline

* Add new LTX 2.0 spatial latent upsampler logic

* Add test script for LTX 2.0 latent upsampling

* Add option to enable VAE tiling in upsampling test script

* Get latent upsampler working with video latents

* Fix typo in BlurDownsample

* Add latent upsample pipeline docstring and example

* Remove deprecated pipeline VAE slicing/tiling methods

* make style and make quality

* When returning latents, return unpacked and denormalized latents for T2V and I2V

* Add model_cpu_offload_seq for latent upsampling pipeline

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

* Fix latent upsampler filename in LTX 2 conversion script

* Add latent upsample pipeline to LTX 2 docs

* Add dummy objects for LTX 2 latent upsample pipeline

* Set default FPS to official LTX 2 ckpt default of 24.0

* Set default CFG scale to official LTX 2 ckpt default of 4.0

* Update LTX 2 pipeline example docstrings

* make style and make quality

* Remove LTX 2 test scripts

* Fix LTX 2 upsample pipeline example docstring

* Add logic to convert and save a LTX 2 upsampling pipeline

* Document LTX2VideoTransformer3DModel forward pass

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2026-01-07 21:24:27 -08:00
Sayak Paul
6fb4c99f5a Update wan.md to remove unneeded hfoptions (#12890) 2026-01-07 09:47:19 -08:00
Tolga Cangöz
8f30bfff1f Add transformer cache context for SkyReels-V2 pipelines & Update docs (#12837)
* feat: Add transformer cache context for conditional and unconditional predictions for skyreels-v2 pipes.

* docs: Remove SkyReels-V2 FLF2V model link and add contributor attribution.
2026-01-06 22:30:30 -10:00
Hu Yaoqi
98479a94c2 LTX Video 0.9.8 long multi prompt (#12614)
* LTX Video 0.9.8  long multi prompt

* Further align comfyui

- Added the “LTXEulerAncestralRFScheduler” scheduler, aligned with [sample_euler_ancestral_RF](7d6103325e/comfy/k_diffusion/sampling.py (L234))

- Updated the LTXI2VLongMultiPromptPipeline.from_pretrained() method:
  - Now uses LTXEulerAncestralRFScheduler by default, for better compatibility with the ComfyUI LTXV workflow.

- Changed the default value of cond_strength from 1.0 to 0.5, aligning with ComfyUI’s default.

- Optimized cross-window overlap blending: moved the latent-space guidance injection to before the UNet and after each step, aligned with[KSamplerX0Inpaint]([ComfyUI/comfy/samplers.py at master · comfyanonymous/ComfyUI](https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/samplers.py#L391))

- Adjusted the default value of skip_steps_sigma_threshold to 1.

* align with diffusers contribute rule

* Add new pipelines and update imports

* Enhance LTXI2VLongMultiPromptPipeline with noise rescaling

Refactor LTXI2VLongMultiPromptPipeline to improve documentation and add noise rescaling functionality.

* Clean up comments in scheduling_ltx_euler_ancestral_rf.py

Removed design notes and limitations from the implementation.

* Enhance video generation example with scheduler

Updated LTXI2VLongMultiPromptPipeline example to include LTXEulerAncestralRFScheduler for ComfyUI parity.

* clean up

* style

* copies

* import ltx scheduler

* copies

* fix

* fix more

* up up

* up up up

* up upup

* Apply suggestions from code review

* Update docs/source/en/api/pipelines/ltx_video.md

* Update docs/source/en/api/pipelines/ltx_video.md

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
2026-01-06 18:18:04 -10:00
Miguel Martin
b5309683cb Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion (#12852)
* cosmos predict2.5 base: convert chkpt & pipeline
- New scheduler: scheduling_flow_unipc_multistep.py
- Changes to TransformerCosmos for text embeddings via crossattn_proj

* scheduler cleanup

* simplify inference pipeline

* cleanup scheduler + tests

* Basic tests for flow unipc

* working b2b inference

* Rename everything

* Tests for pipeline present, but not working (predict2 also not working)

* docstring update

* wrapper pipelines + make style

* remove unnecessary files

* UniPCMultistep: support use_karras_sigmas=True and use_flow_sigmas=True

* use UniPCMultistepScheduler + fix tests for pipeline

* Remove FlowUniPCMultistepScheduler

* UniPCMultistepScheduler for use_flow_sigmas=True & use_karras_sigmas=True

* num_inference_steps=36 due to bug in scheduler used by predict2.5

* Address comments

* make style + make fix-copies

* fix tests + remove references to old pipelines

* address comments

* add revision in from_pretrained call

* fix tests
2025-12-19 05:38:18 +05:30
junqiangwu
a748a839ad Add support for LongCat-Image (#12828)
* Add  LongCat-Image

* Update src/diffusers/models/transformers/transformer_longcat_image.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_longcat_image.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_longcat_image.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_longcat_image.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* fix code

* add doc

* Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image_edit.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image_edit.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* fix code & mask style & fix-copies

* Apply style fixes

* fix single input rewrite error

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: hadoop-imagen <hadoop-imagen@psxfb7pxrbvmh3oq-worker-0.psxfb7pxrbvmh3oq.hadoop-aipnlp.svc.cluster.local>
2025-12-15 07:45:17 -10:00
Yuqian Hong
58519283e7 Support for control-lora (#10686)
* run control-lora on diffusers

* cannot load lora adapter

* test

* 1

* add control-lora

* 1

* 1

* 1

* fix PeftAdapterMixin

* fix module_to_save bug

* delete json print

* resolve conflits

* merged but bug

* change peft.py

* 1

* delete state_dict print

* fix alpha

* Create control_lora.py

* Add files via upload

* rename

* no need modify as peft updated

* add doc

* fix code style

* styling isn't that hard 😉

* empty

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-12-15 15:52:42 +05:30
CalamitousFelicitousness
2246d2c7c4 Add ZImageImg2ImgPipeline (#12751)
* Add ZImageImg2ImgPipeline

Updated the pipeline structure to include ZImageImg2ImgPipeline
    alongside ZImagePipeline.
Implemented the ZImageImg2ImgPipeline class for image-to-image
    transformations, including necessary methods for
    encoding prompts, preparing latents, and denoising.
Enhanced the auto_pipeline to map the new ZImageImg2ImgPipeline
    for image generation tasks.
Added unit tests for ZImageImg2ImgPipeline to ensure
    functionality and performance.
Updated dummy objects to include ZImageImg2ImgPipeline for
    testing purposes.

* Address review comments for ZImageImg2ImgPipeline

- Add `# Copied from` annotations to encode_prompt and _encode_prompt
- Add ZImagePipeline to auto_pipeline.py for AutoPipeline support

* Add ZImage pipeline documentation

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
2025-12-07 22:06:23 -10:00
Tran Thanh Luan
6290fdfda4 [Feat] TaylorSeer Cache (#12648)
* init taylor_seer cache

* make compatible with any tuple size returned

* use logger for printing, add warmup feature

* still update in warmup steps

* refractor, add docs

* add configurable cache, skip compute module

* allow special cache ids only

* add stop_predicts (cooldown)

* update docs

* apply ruff

* update to handle multple calls per timestep

* refractor to use state manager

* fix format & doc

* chores: naming, remove redundancy

* add docs

* quality & style

* fix taylor precision

* Apply style fixes

* add tests

* Apply style fixes

* Remove TaylorSeerCacheTesterMixin from flux2 tests

* rename identifiers, use more expressive taylor predict loop

* torch compile compatible

* Apply style fixes

* Update src/diffusers/hooks/taylorseer_cache.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* update docs

* make fix-copies

* fix example usage.

* remove tests on flux kontext

---------

Co-authored-by: toilaluan <toilaluan@github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2025-12-06 05:39:54 +05:30
Sayak Paul
8430ac2a2f [docs] minor fixes to kandinsky docs (#12797)
up
2025-12-05 08:33:05 -08:00
Álvaro Somoza
c98c157a9e [Docs] Add Z-Image docs (#12775)
* initial

* toctree

* fix

* apply review and fix

* Update docs/source/en/api/pipelines/z_image.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/z_image.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/z_image.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-12-05 11:05:47 -03:00
Lev Novitskiy
d0c54e5563 Kandinsky 5.0 Video Pro and Image Lite (#12664)
* add transformer pipeline first version


---------

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Charles <charles@huggingface.co>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: dmitrienkoae <dmitrienko.ae@phystech.edu>
Co-authored-by: nvvaulin <nvvaulin@gmail.com>
2025-12-03 00:46:37 -10:00
Sayak Paul
f48f9c250f [core] start varlen variants for attn backend kernels. (#12765)
* start varlen variants for attn backend kernels.

* maybe unflatten heads.

* updates

* remove unused function.

* doc

* up
2025-12-03 13:34:52 +05:30
Guo-Hua Wang
4f136f842c Add support for Ovis-Image (#12740)
* add ovis_image

* fix code quality

* optimize pipeline_ovis_image.py according to the feedbacks

* optimize imports

* add docs

* make style

* make style

* add ovis to toctree

* oops

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2025-12-02 11:48:07 -10:00
CalamitousFelicitousness
edf36f5128 Add ZImage LoRA support and integrate into ZImagePipeline (#12750)
* Add ZImage LoRA support and integrate into ZImagePipeline

* Add LoRA test for Z-Image

* Move the LoRA test

* Fix ZImage LoRA scale support and test configuration

* Add ZImage LoRA test overrides for architecture differences

- Override test_lora_fuse_nan to use ZImage's 'layers' attribute
  instead of 'transformer_blocks'
- Skip block-level LoRA scaling test (not supported in ZImage)
- Add required imports: numpy, torch_device, check_if_lora_correctly_set

* Add ZImageLoraLoaderMixin to LoRA documentation

* Use conditional import for peft.LoraConfig in ZImage tests

* Override test_correct_lora_configs_with_different_ranks for ZImage

ZImage uses 'attention.to_k' naming convention instead of 'attn.to_k',
so the base test's module name search loop never finds a match. This
override uses the correct naming pattern for ZImage architecture.

* Add is_flaky decorator to ZImage LoRA tests initialise padding tokens

* Skip ZImage LoRA test class entirely

Skip the entire ZImageLoRATests class due to non-deterministic behavior
from complex64 RoPE operations and torch.empty padding tokens.
LoRA functionality works correctly with real models.

Clean up removed:
- Individual @unittest.skip decorators
- @is_flaky decorator overrides for inherited methods
- Custom test method overrides
- Global torch deterministic settings
- Unused imports (numpy, is_flaky, check_if_lora_correctly_set)

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
2025-12-02 02:16:30 -03:00
Sayak Paul
564079f295 [feat]: implement "local" caption upsampling for Flux.2 (#12718)
* feat: implement caption upsampling for flux.2.

* doc

* up

* fix

* up

* fix system prompts 🤷‍

* up

* up

* up
2025-12-02 04:27:24 +05:30
Sayak Paul
394a48d169 Update bria_fibo.md with minor fixes (#12731)
* Update bria_fibo.md with minor fixes

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-12-02 04:24:19 +05:30
Gal Davidi
99784ae0d2 Rename BriaPipeline to BriaFiboPipeline in documentation (#12758) 2025-12-01 09:34:47 -10:00
YiYi Xu
6156cf8f22 Hunyuanvideo15 (#12696)
* add


---------

Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-161-123.ec2.internal>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-30 20:27:59 -10:00