# Pipelines
Pipelines provide a simple way to run state-of-the-art diffusion models in inference by bundling all of the necessary components (multiple independently trained models, schedulers, and processors) into a single end-to-end class. Pipelines are flexible and can be adapted to use different schedulers or even model components.

All pipelines are built from the base [`DiffusionPipeline`] class, which provides basic functionality for downloading, loading, and saving all the components. Specific pipeline types (for example, [`StableDiffusionPipeline`]) loaded with [`~DiffusionPipeline.from_pretrained`] are automatically detected, and the pipeline components are loaded and passed to the `__init__` function of the pipeline.
Warning

You shouldn't use the [`DiffusionPipeline`] class for training. Individual components of diffusion pipelines (for example, [`UNet2DModel`] and [`UNet2DConditionModel`]) are usually trained individually, so we suggest working with them directly instead.

Pipelines do not offer any training functionality. You'll notice PyTorch's autograd is disabled by decorating the [`~DiffusionPipeline.__call__`] method with a `torch.no_grad` decorator because pipelines should not be used for training. If you're interested in training, please take a look at the Training guides instead!
The table below lists all the pipelines currently available in 🤗 Diffusers and the tasks they support. Click on a pipeline to view its abstract and published paper.
| Pipeline | Tasks |
|---|---|
| aMUSEd | text2image |
| AnimateDiff | text2video |
| Attend-and-Excite | text2image |
| AudioLDM | text2audio |
| AudioLDM2 | text2audio |
| AuraFlow | text2image |
| BLIP Diffusion | text2image |
| Bria 3.2 | text2image |
| CogVideoX | text2video |
| Consistency Models | unconditional image generation |
| ControlNet | text2image, image2image, inpainting |
| ControlNet with Flux.1 | text2image |
| ControlNet with Hunyuan-DiT | text2image |
| ControlNet with Stable Diffusion 3 | text2image |
| ControlNet with Stable Diffusion XL | text2image |
| ControlNet-XS | text2image |
| ControlNet-XS with Stable Diffusion XL | text2image |
| Cosmos | text2video, video2video |
| Dance Diffusion | unconditional audio generation |
| DDIM | unconditional image generation |
| DDPM | unconditional image generation |
| DeepFloyd IF | text2image, image2image, inpainting, super-resolution |
| DiffEdit | inpainting |
| DiT | text2image |
| Flux | text2image |
| Hunyuan-DiT | text2image |
| I2VGen-XL | image2video |
| InstructPix2Pix | image editing |
| Kandinsky 2.1 | text2image, image2image, inpainting, interpolation |
| Kandinsky 2.2 | text2image, image2image, inpainting |
| Kandinsky 3 | text2image, image2image |
| Kolors | text2image |
| Latent Consistency Models | text2image |
| Latent Diffusion | text2image, super-resolution |
| Latte | text2image |
| LEDITS++ | image editing |
| LLaDA2 | text2text |
| Lumina-T2X | text2image |
| Marigold | depth-estimation, normals-estimation, intrinsic-decomposition |
| MultiDiffusion | text2image |
| MusicLDM | text2audio |
| PAG | text2image |
| Paint by Example | inpainting |
| PIA | image2video |
| PixArt-α | text2image |
| PixArt-Σ | text2image |
| Self-Attention Guidance | text2image |
| Semantic Guidance | text2image |
| Shap-E | text-to-3D, image-to-3D |
| Stable Audio | text2audio |
| Stable Cascade | text2image |
| Stable Diffusion | text2image, image2image, depth2image, inpainting, image variation, latent upscaler, super-resolution |
| Stable Diffusion XL | text2image, image2image, inpainting |
| Stable Diffusion XL Turbo | text2image, image2image, inpainting |
| Stable unCLIP | text2image, image variation |
| T2I-Adapter | text2image |
| Text2Video | text2video, video2video |
| Text2Video-Zero | text2video |
| unCLIP | text2image, image variation |
| UniDiffuser | text2image, image2text, image variation, text variation, unconditional image generation, unconditional audio generation |
| Value-guided planning | value guided sampling |
| Wuerstchen | text2image |
| VisualCloze | text2image, image2image, subject driven generation, inpainting, style transfer, image restoration, image editing, [depth,normal,edge,pose]2image, [depth,normal,edge,pose]-estimation, virtual try-on, image relighting |
## DiffusionPipeline

[[autodoc]] DiffusionPipeline
    - all
    - __call__
    - device
    - to
    - components
[[autodoc]] pipelines.StableDiffusionMixin.enable_freeu

[[autodoc]] pipelines.StableDiffusionMixin.disable_freeu
## PushToHubMixin

[[autodoc]] utils.PushToHubMixin
## Callbacks

[[autodoc]] callbacks.PipelineCallback

[[autodoc]] callbacks.SDCFGCutoffCallback

[[autodoc]] callbacks.SDXLCFGCutoffCallback

[[autodoc]] callbacks.SDXLControlnetCFGCutoffCallback

[[autodoc]] callbacks.IPAdapterScaleCutoffCallback

[[autodoc]] callbacks.SD3CFGCutoffCallback