mirror of
https://github.com/huggingface/diffusers.git
synced 2026-03-15 13:08:00 +08:00
Compare commits: main...flash-3-hu (1 commit)

| Author | SHA1 | Date |
|---|---|---|
| | 0f8a83fa69 | |
@@ -1,77 +0,0 @@
# Diffusers — Agent Guide

## Coding style

Strive to write code as simple and explicit as possible.

- Minimize small helper/utility functions — inline the logic instead. A reader should be able to follow the full flow without jumping between functions.
- No defensive code or unused code paths — do not add fallback paths, safety checks, or configuration options "just in case". When porting from a research repo, delete training-time code paths, experimental flags, and ablation branches entirely — only keep the inference path you are actually integrating.
- Do not guess user intent and silently correct behavior. Make the expected inputs clear in the docstring, and raise a concise error for unsupported cases rather than adding complex fallback logic, as in the sketch below.
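A minimal sketch of this fail-fast style (the `encode_image` helper and its error message are hypothetical, not diffusers API):

```python
import PIL.Image


def encode_image(image: PIL.Image.Image):
    """Encode a single RGB `PIL.Image.Image`; other modes are unsupported."""
    if image.mode != "RGB":
        # Raise a concise error instead of silently converting the input.
        raise ValueError(f"Expected an RGB image, got mode {image.mode!r}.")
    ...
```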

---

### Dependencies

- No new mandatory dependency without discussion (e.g. `einops`)
- Optional deps guarded with `is_X_available()` and a dummy in `utils/dummy_*.py` (see the sketch after this list)
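For illustration, the guard pattern looks roughly like this (`is_scipy_available` is one of the guards in `diffusers.utils`; treat the exact import path as an assumption):

```python
from diffusers.utils import is_scipy_available

if is_scipy_available():
    import scipy.stats

# If the dependency is missing, the public API should instead expose a dummy
# object from `utils/dummy_*.py` that raises an informative error on use.
```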

## Code formatting

- `make style` and `make fix-copies` should be run as the final step before opening a PR

### Copied Code

- Many classes are kept in sync with a source via a `# Copied from ...` header comment (illustrated below)
- Do not edit a `# Copied from` block directly — run `make fix-copies` to propagate changes from the source
- Remove the header to intentionally break the link
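An illustrative header (the source path and rename mapping are hypothetical):

```python
# Copied from diffusers.models.transformers.transformer_flux.FluxAttnProcessor with Flux->MyModel
class MyModelAttnProcessor:
    ...
```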

### Models

- All layer calls should be visible directly in `forward` — avoid helper functions that hide `nn.Module` calls.
- Avoid introducing graph breaks as much as possible for better compatibility with `torch.compile`. For example, DO NOT arbitrarily insert NumPy operations in the forward implementations (a sketch follows the attention example below).
- Attention must follow the diffusers pattern: both the `Attention` class and its processor are defined in the model file. The processor's `__call__` handles the actual compute and must use `dispatch_attention_fn` rather than calling `F.scaled_dot_product_attention` directly. The attention class inherits `AttentionModuleMixin` and declares `_default_processor_cls` and `_available_processors`.

```python
# transformer_mymodel.py


class MyModelAttnProcessor:
    _attention_backend = None
    _parallel_config = None

    def __call__(self, attn, hidden_states, attention_mask=None, ...):
        query = attn.to_q(hidden_states)
        key = attn.to_k(hidden_states)
        value = attn.to_v(hidden_states)
        # reshape, apply rope, etc.
        hidden_states = dispatch_attention_fn(
            query, key, value,
            attn_mask=attention_mask,
            backend=self._attention_backend,
            parallel_config=self._parallel_config,
        )
        hidden_states = hidden_states.flatten(2, 3)
        return attn.to_out[0](hidden_states)


class MyModelAttention(nn.Module, AttentionModuleMixin):
    _default_processor_cls = MyModelAttnProcessor
    _available_processors = [MyModelAttnProcessor]

    def __init__(self, query_dim, heads=8, dim_head=64, ...):
        super().__init__()
        self.to_q = nn.Linear(query_dim, heads * dim_head, bias=False)
        self.to_k = nn.Linear(query_dim, heads * dim_head, bias=False)
        self.to_v = nn.Linear(query_dim, heads * dim_head, bias=False)
        self.to_out = nn.ModuleList([nn.Linear(heads * dim_head, query_dim), nn.Dropout(0.0)])
        self.set_processor(MyModelAttnProcessor())

    def forward(self, hidden_states, attention_mask=None, **kwargs):
        return self.processor(self, hidden_states, attention_mask, **kwargs)
```

Consult the implementations in `src/diffusers/models/transformers/` if you need further references.
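As a sketch of the graph-break point above (a hypothetical rotary-frequency computation, not diffusers API):

```python
import numpy as np
import torch


def rope_freqs_bad(dim: int) -> torch.Tensor:
    # Routing the computation through NumPy can force a graph break
    # under torch.compile.
    inv_freq = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
    return torch.from_numpy(inv_freq)


def rope_freqs_good(dim: int) -> torch.Tensor:
    # The same computation expressed purely in torch compiles cleanly.
    return 1.0 / (10000.0 ** (torch.arange(0, dim, 2, dtype=torch.float64) / dim))
```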

### Pipeline

- All pipelines must inherit from `DiffusionPipeline`. Consult implementations in `src/diffusers/pipelines` in case you need references; a minimal skeleton follows this list.
- DO NOT use an existing pipeline class (e.g., `FluxPipeline`) to override another pipeline (e.g., `FluxImg2ImgPipeline`) that will be part of the core codebase (`src`).
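A minimal, hypothetical skeleton (component names are illustrative; `register_modules` is the standard `DiffusionPipeline` mechanism):

```python
import torch
from diffusers import DiffusionPipeline


class MyModelPipeline(DiffusionPipeline):
    def __init__(self, transformer, scheduler, vae, text_encoder, tokenizer):
        super().__init__()
        # register_modules exposes the components to save/load utilities.
        self.register_modules(
            transformer=transformer,
            scheduler=scheduler,
            vae=vae,
            text_encoder=text_encoder,
            tokenizer=tokenizer,
        )

    @torch.no_grad()
    def __call__(self, prompt: str, num_inference_steps: int = 28):
        # Encode the prompt, prepare latents, denoise, decode; see
        # `src/diffusers/pipelines` for complete reference implementations.
        ...
```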

### Tests

- Slow tests gated with `@slow` and `RUN_SLOW=1`
- All model-level tests must initially be written with the `BaseModelTesterConfig`, `ModelTesterMixin`, `MemoryTesterMixin`, `AttentionTesterMixin`, `LoraTesterMixin`, and `TrainingTesterMixin` classes. Any additional tests should be added after discussion with the maintainers. Use `tests/models/transformers/test_models_transformer_flux.py` as a reference; a hypothetical outline follows this list.
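A hypothetical outline only; everything except the mixin names listed above is illustrative:

```python
import unittest


class MyModelTransformerTests(
    ModelTesterMixin,
    MemoryTesterMixin,
    AttentionTesterMixin,
    LoraTesterMixin,
    TrainingTesterMixin,
    unittest.TestCase,
):
    # Configuration fields depend on the model under test; mirror
    # `tests/models/transformers/test_models_transformer_flux.py`.
    model_class = MyModelTransformer2DModel
```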

.gitignore (vendored, 6 changed lines)
@@ -178,8 +178,4 @@ tags
 .ruff_cache

-# wandb
-wandb
-
-# AI agent generated symlinks
-/AGENTS.md
-/CLAUDE.md
+wandb

Makefile (13 changed lines)
@@ -1,4 +1,4 @@
-.PHONY: deps_table_update modified_only_fixup extra_style_checks quality style fixup fix-copies test test-examples codex claude clean-ai
+.PHONY: deps_table_update modified_only_fixup extra_style_checks quality style fixup fix-copies test test-examples

 # make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!)
 export PYTHONPATH = src
@@ -98,14 +98,3 @@ post-release:

 post-patch:
 	python utils/release.py --post_release --patch
-
-# AI agent symlinks
-
-codex:
-	ln -snf .ai/AGENTS.md AGENTS.md
-
-claude:
-	ln -snf .ai/AGENTS.md CLAUDE.md
-
-clean-ai:
-	rm -f AGENTS.md CLAUDE.md
@@ -565,16 +565,4 @@ $ git push --set-upstream origin your-branch-for-syncing

 ### Style guide

-For documentation strings, 🧨 Diffusers follows the [Google style](https://google.github.io/styleguide/pyguide.html).
-
-
-## Coding with AI agents
-
-The repository keeps AI-agent configuration in `.ai/` and exposes local agent files via symlinks.
-
-- **Source of truth** — edit `.ai/AGENTS.md` (and any future `.ai/skills/`)
-- **Don't edit** generated root-level `AGENTS.md` or `CLAUDE.md` — they are symlinks
-- Setup commands:
-  - `make codex` — symlink for OpenAI Codex
-  - `make claude` — symlink for Claude Code
-  - `make clean-ai` — remove generated symlinks
-
+For documentation strings, 🧨 Diffusers follows the [Google style](https://google.github.io/styleguide/pyguide.html).
+
@@ -2559,7 +2559,9 @@ def _flash_attention_3_hub(
     _parallel_config: "ParallelConfig" | None = None,
 ) -> torch.Tensor:
     if attn_mask is not None:
-        raise ValueError("`attn_mask` is not supported for flash-attn 3.")
+        raise ValueError(
+            "`attn_mask` is not supported for flash-attn 3. Please use the `_flash_3_varlen_hub` backend instead."
+        )

     func = _HUB_KERNELS_REGISTRY[AttentionBackendName._FLASH_3_HUB].kernel_fn
     if _parallel_config is None:
@@ -2641,6 +2643,8 @@ def _flash_attention_3_varlen_hub(
     _, seq_len_kv, _, _ = key.shape

     if attn_mask is not None:
+        if attn_mask.dtype != torch.bool:
+            attn_mask = attn_mask > -1
         attn_mask = _normalize_attn_mask(attn_mask, batch_size, seq_len_kv)

     (_, seqlens_k), (cu_seqlens_q, cu_seqlens_k), (max_seqlen_q, max_seqlen_k) = (
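The added conversion assumes additive float masks, which hold 0.0 at kept positions and large negative values at masked ones, so thresholding at -1 recovers a boolean keep-mask. A standalone sketch of the same idea:

```python
import torch

float_mask = torch.tensor([0.0, 0.0, -1e9, float("-inf")])
bool_mask = float_mask > -1
assert bool_mask.tolist() == [True, True, False, False]
```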
@@ -2660,7 +2664,7 @@ def _flash_attention_3_varlen_hub(
     value_packed = torch.cat(value_valid, dim=0)

     func = _HUB_KERNELS_REGISTRY[AttentionBackendName._FLASH_3_VARLEN_HUB].kernel_fn
-    out, lse, *_ = func(
+    result = func(
         q=query_packed,
         k=key_packed,
         v=value_packed,
@@ -2671,6 +2675,11 @@ def _flash_attention_3_varlen_hub(
         softmax_scale=scale,
         causal=is_causal,
     )
+    if isinstance(result, tuple):
+        out, lse, *_ = result
+    else:
+        out = result
+        lse = None
     out = out.unflatten(0, (batch_size, -1))

     return (out, lse) if return_lse else out
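The added branch tolerates kernel builds that return either `(out, lse, ...)` or a bare tensor. The same normalization in isolation:

```python
def normalize_kernel_output(result):
    # Some kernel builds return (out, lse, ...); others return just `out`.
    if isinstance(result, tuple):
        out, lse, *_ = result
    else:
        out, lse = result, None
    return out, lse
```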
@@ -744,7 +744,7 @@ class Flux2Pipeline(DiffusionPipeline, Flux2LoraLoaderMixin):
     @replace_example_docstring(EXAMPLE_DOC_STRING)
     def __call__(
         self,
-        image: PIL.Image.Image | list[PIL.Image.Image] | None = None,
+        image: list[PIL.Image.Image, PIL.Image.Image] | None = None,
         prompt: str | list[str] = None,
         height: int | None = None,
         width: int | None = None,