add more tests

fix update_component with custom model
up
2026-03-09 10:11:43 +08:00 · 2026-03-03 09:05:33 +00:00 · 2026-03-03 09:05:22 +00:00 · 2026-02-27 09:59:39 +00:00
135 changed files with 851 additions and 3443 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -1,100 +0,0 @@
-# CLAUDE.md
-
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
-## Build, Lint, and Test Commands
-
-```bash
-# Install in development mode
-pip install -e ".[dev]"
-
-# Run full test suite (requires beefy machine)
-make test
-# Or directly:
-python -m pytest -n auto --dist=loadfile -s -v ./tests/
-
-# Run a single test file
-python -m pytest tests/<TEST_FILE>.py
-
-# Run slow tests (downloads many GBs of models)
-RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/
-
-# Format code (ruff + doc-builder)
-make style
-
-# Check code quality without modifying
-make quality
-
-# Fast fixup for modified files only (recommended before commits)
-make fixup
-
-# Fix copied code snippets and dummy objects
-make fix-copies
-
-# Check repository consistency (dummies, inits, repo structure)
-make repo-consistency
-```
-
-## Code Architecture
-
-Diffusers is built on three core component types that work together:
-
-### Pipelines (`src/diffusers/pipelines/`)
- End-to-end inference workflows combining models and schedulers
- Base class: `DiffusionPipeline` (in `pipeline_utils.py`)
- Follow **single-file policy**: each pipeline in its own directory
- Loaded via `DiffusionPipeline.from_pretrained()` which reads `model_index.json`
- Components registered via `register_modules()` become pipeline attributes
- ~99 pipeline implementations (Stable Diffusion, SDXL, Flux, etc.)
-
-### Models (`src/diffusers/models/`)
- Configurable neural network architectures extending PyTorch's Module
- Base classes: `ModelMixin` + `ConfigMixin` (in `modeling_utils.py`)
- **Do NOT follow single-file policy**: use shared building blocks (`attention.py`, `embeddings.py`, `resnet.py`)
- Key subdirectories:
-  - `autoencoders/`: VAEs for latent space compression
-  - `unets/`: Diffusion model architectures (UNet2DConditionModel, etc.)
-  - `transformers/`: Transformer-based models (Flux, SD3, etc.)
-  - `controlnets/`: ControlNet variants
-
-### Schedulers (`src/diffusers/schedulers/`)
- Guide denoising process during inference
- Base class: `SchedulerMixin` + `ConfigMixin` (in `scheduling_utils.py`)
- Follow **single-file policy**: one scheduler per file
- Key methods: `set_num_inference_steps()`, `step()`, `timesteps` property
- Easily swappable via `ConfigMixin.from_config()`
- ~55 scheduler algorithms (DDPM, DDIM, Euler, DPM-Solver, etc.)
-
-### Supporting Systems
-
- **Loaders** (`src/diffusers/loaders/`): Mixins for LoRA, IP-Adapter, textual inversion, single-file loading
- **Quantizers** (`src/diffusers/quantizers/`): BitsAndBytes, GGUF, TorchAO, Quanto support
- **Hooks** (`src/diffusers/hooks/`): Runtime optimizations (offloading, layer skipping, caching)
- **Guiders** (`src/diffusers/guiders/`): Guidance algorithms (CFG, PAG, etc.)
-
-## Configuration System
-
-All components use `ConfigMixin` for serialization:
- Constructor arguments stored via `register_to_config(**kwargs)`
- Instantiate from config: `Component.from_config(config_dict)`
- Save/load as JSON files
-
-## Key Design Principles
-
-1. **Usability over Performance**: Models load at float32/CPU by default
-2. **Simple over Easy**: Explicit > implicit; expose complexity rather than hide it
-3. **Single-file policy**: Pipelines and schedulers are self-contained; models share building blocks
-4. **Copy-paste over abstraction**: Prefer duplicated code over hasty abstractions for contributor-friendliness
-
-## Code Style
-
- Uses `ruff` for linting and formatting (line length: 119)
- Documentation follows [Google style](https://google.github.io/styleguide/pyguide.html)
- Use `# Copied from` mechanism for sharing code between similar files
- Avoid lambda functions and advanced PyTorch operators for readability
-
-## Testing
-
- Tests use `pytest` with `pytest-xdist` for parallelization
- Slow tests gated by `RUN_SLOW=yes` environment variable
- Test dependencies: `pip install -e ".[test]"`
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -62,6 +62,20 @@ jobs:
        with:
          name: benchmark_test_reports
          path: benchmarks/${{ env.BASE_PATH }}
+      
+      # TODO: enable this once the connection problem has been resolved.
+      - name: Update benchmarking results to DB
+        env:
+          PGDATABASE: metrics
+          PGHOST: ${{ secrets.DIFFUSERS_BENCHMARKS_PGHOST }}
+          PGUSER: transformers_benchmarks
+          PGPASSWORD: ${{ secrets.DIFFUSERS_BENCHMARKS_PGPASSWORD }}
+          BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
+        run: |
+          git config --global --add safe.directory /__w/diffusers/diffusers
+          commit_id=$GITHUB_SHA
+          commit_msg=$(git show -s --format=%s "$commit_id" | cut -c1-70)
+          cd benchmarks && python populate_into_db.py "$BRANCH_NAME" "$commit_id" "$commit_msg"

      - name: Report success status
        if: ${{ success() }}
--- a/.github/workflows/pr_tests.yml
+++ b/.github/workflows/pr_tests.yml
@@ -92,6 +92,7 @@ jobs:
            runner: aws-general-8-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_example_cpu
+
    name: ${{ matrix.config.name }}

    runs-on:
@@ -114,7 +115,8 @@ jobs:
    - name: Install dependencies
      run: |
        uv pip install -e ".[quality]"
-        uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+        #uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+        uv pip uninstall transformers huggingface_hub && uv pip install transformers==4.57.1
        uv pip uninstall accelerate && uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps

    - name: Environment
@@ -216,6 +218,8 @@ jobs:

  run_lora_tests:
    needs: [check_code_quality, check_repository_consistency]
+    strategy:
+      fail-fast: false

    name: LoRA tests with PEFT main

@@ -243,8 +247,9 @@ jobs:
        uv pip install -U peft@git+https://github.com/huggingface/peft.git --no-deps
        uv pip install -U tokenizers
        uv pip uninstall accelerate && uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps
-        uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
-        
+        #uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+        uv pip uninstall transformers huggingface_hub && uv pip install transformers==4.57.1
+
    - name: Environment
      run: |
        python utils/print_env.py
@@ -270,6 +275,6 @@ jobs:
      if: ${{ always() }}
      uses: actions/upload-artifact@v6
      with:
-        name: pr_lora_test_reports
+        name: pr_main_test_reports
        path: reports

--- a/.github/workflows/pr_tests_gpu.yml
+++ b/.github/workflows/pr_tests_gpu.yml
@@ -131,7 +131,8 @@ jobs:
        run: |
          uv pip install -e ".[quality]"
          uv pip uninstall accelerate && uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
-          uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+          #uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+          uv pip uninstall transformers huggingface_hub && uv pip install transformers==4.57.1

      - name: Environment
        run: |
@@ -201,7 +202,8 @@ jobs:
        uv pip install -e ".[quality]"
        uv pip install peft@git+https://github.com/huggingface/peft.git
        uv pip uninstall accelerate && uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
-        uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+        #uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+        uv pip uninstall transformers huggingface_hub && uv pip install transformers==4.57.1

    - name: Environment
      run: |
@@ -262,7 +264,8 @@ jobs:
        nvidia-smi
    - name: Install dependencies
      run: |
-        uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+        #uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+        uv pip uninstall transformers huggingface_hub && uv pip install transformers==4.57.1
        uv pip install -e ".[quality,training]"

    - name: Environment
--- a/.github/workflows/push_tests.yml
+++ b/.github/workflows/push_tests.yml
@@ -76,7 +76,8 @@ jobs:
        run: |
          uv pip install -e ".[quality]"
          uv pip uninstall accelerate && uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
-          uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+          #uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+          uv pip uninstall transformers huggingface_hub && uv pip install transformers==4.57.1
      - name: Environment
        run: |
          python utils/print_env.py
@@ -128,7 +129,8 @@ jobs:
        uv pip install -e ".[quality]"
        uv pip install peft@git+https://github.com/huggingface/peft.git
        uv pip uninstall accelerate && uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
-        uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+        #uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+        uv pip uninstall transformers huggingface_hub && uv pip install transformers==4.57.1

    - name: Environment
      run: |
@@ -180,7 +182,8 @@ jobs:
    - name: Install dependencies
      run: |
        uv pip install -e ".[quality,training]"
-        uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+        #uv pip uninstall transformers huggingface_hub && uv pip install --prerelease allow -U transformers@git+https://github.com/huggingface/transformers.git
+        uv pip uninstall transformers huggingface_hub && uv pip install transformers==4.57.1
    - name: Environment
      run: |
        python utils/print_env.py
--- a/.github/workflows/pypi_publish.yaml
+++ b/.github/workflows/pypi_publish.yaml
@@ -54,6 +54,7 @@ jobs:
          python -m pip install --upgrade pip
          pip install -U setuptools wheel twine
          pip install -U torch --index-url https://download.pytorch.org/whl/cpu
+          pip install -U transformers

      - name: Build the dist files
        run: python setup.py bdist_wheel && python setup.py sdist
@@ -68,8 +69,6 @@ jobs:
        run: |
          pip install diffusers && pip uninstall diffusers -y
          pip install -i https://test.pypi.org/simple/ diffusers
-          pip install -U transformers
-          python utils/print_env.py
          python -c "from diffusers import __version__; print(__version__)"
          python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('fusing/unet-ldm-dummy-update'); pipe()"
          python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('hf-internal-testing/tiny-stable-diffusion-pipe', safety_checker=None); pipe('ah suh du')"
--- a/_modular_model_index.json
+++ b/_modular_model_index.json
@@ -1,75 +0,0 @@
-{
-    "_blocks_class_name": "SequentialPipelineBlocks",
-    "_class_name": "Flux2ModularPipeline",
-    "_diffusers_version": "0.36.0.dev0",
-    "scheduler": [
-        "diffusers",
-        "FlowMatchEulerDiscreteScheduler",
-        {
-            "repo": "hf-internal-testing/tiny-flux2",
-            "revision": null,
-            "subfolder": "scheduler",
-            "type_hint": [
-                "diffusers",
-                "FlowMatchEulerDiscreteScheduler"
-            ],
-            "variant": null
-        }
-    ],
-    "text_encoder": [
-        "transformers",
-        "Mistral3ForConditionalGeneration",
-        {
-            "repo": "hf-internal-testing/tiny-flux2",
-            "revision": null,
-            "subfolder": "text_encoder",
-            "type_hint": [
-                "transformers",
-                "Mistral3ForConditionalGeneration"
-            ],
-            "variant": null
-        }
-    ],
-    "tokenizer": [
-        "transformers",
-        "AutoProcessor",
-        {
-            "repo": "hf-internal-testing/Mistral-Small-3.1-24B-Instruct-2503-only-processor",
-            "revision": null,
-            "subfolder": "",
-            "type_hint": [
-                "transformers",
-                "AutoProcessor"
-            ],
-            "variant": null
-        }
-    ],
-    "transformer": [
-        "diffusers",
-        "Flux2Transformer2DModel",
-        {
-            "repo": "hf-internal-testing/tiny-flux2",
-            "revision": null,
-            "subfolder": "transformer",
-            "type_hint": [
-                "diffusers",
-                "Flux2Transformer2DModel"
-            ],
-            "variant": null
-        }
-    ],
-    "vae": [
-        "diffusers",
-        "AutoencoderKLFlux2",
-        {
-            "repo": "hf-internal-testing/tiny-flux2",
-            "revision": null,
-            "subfolder": "vae",
-            "type_hint": [
-                "diffusers",
-                "AutoencoderKLFlux2"
-            ],
-            "variant": null
-        }
-    ]
-}
--- a/benchmarks/populate_into_db.py
+++ b/benchmarks/populate_into_db.py
@@ -0,0 +1,166 @@
+import argparse
+import os
+import sys
+
+import gpustat
+import pandas as pd
+import psycopg2
+import psycopg2.extras
+from psycopg2.extensions import register_adapter
+from psycopg2.extras import Json
+
+
+register_adapter(dict, Json)
+
+FINAL_CSV_FILENAME = "collated_results.csv"
+# https://github.com/huggingface/transformers/blob/593e29c5e2a9b17baec010e8dc7c1431fed6e841/benchmark/init_db.sql#L27
+BENCHMARKS_TABLE_NAME = "benchmarks"
+MEASUREMENTS_TABLE_NAME = "model_measurements"
+
+
+def _init_benchmark(conn, branch, commit_id, commit_msg):
+    gpu_stats = gpustat.GPUStatCollection.new_query()
+    metadata = {"gpu_name": gpu_stats[0]["name"]}
+    repository = "huggingface/diffusers"
+    with conn.cursor() as cur:
+        cur.execute(
+            f"INSERT INTO {BENCHMARKS_TABLE_NAME} (repository, branch, commit_id, commit_message, metadata) VALUES (%s, %s, %s, %s, %s) RETURNING benchmark_id",
+            (repository, branch, commit_id, commit_msg, metadata),
+        )
+        benchmark_id = cur.fetchone()[0]
+        print(f"Initialised benchmark #{benchmark_id}")
+        return benchmark_id
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "branch",
+        type=str,
+        help="The branch name on which the benchmarking is performed.",
+    )
+
+    parser.add_argument(
+        "commit_id",
+        type=str,
+        help="The commit hash on which the benchmarking is performed.",
+    )
+
+    parser.add_argument(
+        "commit_msg",
+        type=str,
+        help="The commit message associated with the commit, truncated to 70 characters.",
+    )
+    args = parser.parse_args()
+    return args
+
+
+if __name__ == "__main__":
+    args = parse_args()
+    try:
+        conn = psycopg2.connect(
+            host=os.getenv("PGHOST"),
+            database=os.getenv("PGDATABASE"),
+            user=os.getenv("PGUSER"),
+            password=os.getenv("PGPASSWORD"),
+        )
+        print("DB connection established successfully.")
+    except Exception as e:
+        print(f"Problem during DB init: {e}")
+        sys.exit(1)
+
+    try:
+        benchmark_id = _init_benchmark(
+            conn=conn,
+            branch=args.branch,
+            commit_id=args.commit_id,
+            commit_msg=args.commit_msg,
+        )
+    except Exception as e:
+        print(f"Problem during initializing benchmark: {e}")
+        sys.exit(1)
+
+    cur = conn.cursor()
+
+    df = pd.read_csv(FINAL_CSV_FILENAME)
+
+    # Helper to cast values (or None) given a dtype
+    def _cast_value(val, dtype: str):
+        if pd.isna(val):
+            return None
+
+        if dtype == "text":
+            return str(val).strip()
+
+        if dtype == "float":
+            try:
+                return float(val)
+            except ValueError:
+                return None
+
+        if dtype == "bool":
+            s = str(val).strip().lower()
+            if s in ("true", "t", "yes", "1"):
+                return True
+            if s in ("false", "f", "no", "0"):
+                return False
+            if val in (1, 1.0):
+                return True
+            if val in (0, 0.0):
+                return False
+            return None
+
+        return val
+
+    try:
+        rows_to_insert = []
+        for _, row in df.iterrows():
+            scenario = _cast_value(row.get("scenario"), "text")
+            model_cls = _cast_value(row.get("model_cls"), "text")
+            num_params_B = _cast_value(row.get("num_params_B"), "float")
+            flops_G = _cast_value(row.get("flops_G"), "float")
+            time_plain_s = _cast_value(row.get("time_plain_s"), "float")
+            mem_plain_GB = _cast_value(row.get("mem_plain_GB"), "float")
+            time_compile_s = _cast_value(row.get("time_compile_s"), "float")
+            mem_compile_GB = _cast_value(row.get("mem_compile_GB"), "float")
+            fullgraph = _cast_value(row.get("fullgraph"), "bool")
+            mode = _cast_value(row.get("mode"), "text")
+
+            # If "github_sha" column exists in the CSV, cast it; else default to None
+            if "github_sha" in df.columns:
+                github_sha = _cast_value(row.get("github_sha"), "text")
+            else:
+                github_sha = None
+
+            measurements = {
+                "scenario": scenario,
+                "model_cls": model_cls,
+                "num_params_B": num_params_B,
+                "flops_G": flops_G,
+                "time_plain_s": time_plain_s,
+                "mem_plain_GB": mem_plain_GB,
+                "time_compile_s": time_compile_s,
+                "mem_compile_GB": mem_compile_GB,
+                "fullgraph": fullgraph,
+                "mode": mode,
+                "github_sha": github_sha,
+            }
+            rows_to_insert.append((benchmark_id, measurements))
+
+        # Batch-insert all rows
+        insert_sql = f"""
+        INSERT INTO {MEASUREMENTS_TABLE_NAME} (
+            benchmark_id,
+            measurements
+        )
+        VALUES (%s, %s);
+        """
+
+        psycopg2.extras.execute_batch(cur, insert_sql, rows_to_insert)
+        conn.commit()
+
+        cur.close()
+        conn.close()
+    except Exception as e:
+        print(f"Exception: {e}")
+        sys.exit(1)
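
Note on `_cast_value` in `benchmarks/populate_into_db.py`: the helper is defined inside the `__main__` block, so it cannot be imported and tested on its own. Below is a minimal sketch of a module-level version, assuming the same casting rules. The name `cast_value` is hypothetical, and `math.isnan` stands in for `pd.isna` so the sketch runs without pandas; it is not the code in this diff.

```python
import math


def cast_value(val, dtype: str):
    """Cast a raw CSV cell to "text", "float", or "bool"; return None if it cannot be cast."""
    # Treat None and float NaN as missing (assumed stand-in for pd.isna).
    if val is None or (isinstance(val, float) and math.isnan(val)):
        return None

    if dtype == "text":
        return str(val).strip()

    if dtype == "float":
        try:
            return float(val)
        except (TypeError, ValueError):
            return None

    if dtype == "bool":
        s = str(val).strip().lower()
        if s in ("true", "t", "yes", "1"):
            return True
        if s in ("false", "f", "no", "0"):
            return False
        # Numeric cells such as 1.0 / 0.0 do not match the strings above.
        if val in (1, 1.0):
            return True
        if val in (0, 0.0):
            return False
        return None

    # Unknown dtype: pass the value through unchanged.
    return val


text_cell = cast_value("  UNet2DConditionModel ", "text")
bad_float_cell = cast_value("n/a", "float")
bool_cell = cast_value("Yes", "bool")
```

With the helper at module level, the row-building loop could call it unchanged, and unrecognised cells still end up as `None` in the `measurements` JSON.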
--- a/custom_model_automodel_guide.md
+++ b/custom_model_automodel_guide.md
@@ -1,239 +0,0 @@
-# Loading Custom Models with `AutoModel` and `trust_remote_code`
-
-This guide shows how to create a custom model class that lives outside the `diffusers` library and load it via `AutoModel` with `trust_remote_code=True`.
-
-## How It Works
-
-When `AutoModel.from_pretrained()` (or `from_config()`) is called with `trust_remote_code=True`, it:
-
-1. Loads the `config.json` from the model repository.
-2. Checks for an `"auto_map"` key in the config that maps `"AutoModel"` to a `"<module_file>.<ClassName>"` reference.
-3. Downloads the referenced Python module from the repository.
-4. Dynamically imports and instantiates the class from that module.
-
-This allows anyone to define and share completely custom model architectures without requiring changes to the `diffusers` library itself.
-
-## Step 1: Define Your Custom Model
-
-Create a Python file (e.g., `modeling_my_model.py`) that defines your model class. The class must inherit from `ModelMixin` and `ConfigMixin`, and use the `@register_to_config` decorator on `__init__`.
-
-```python
-# modeling_my_model.py
-
-import torch
-from torch import nn
-from diffusers import ModelMixin, ConfigMixin
-from diffusers.configuration_utils import register_to_config
-
-
-class MyCustomModel(ModelMixin, ConfigMixin):
-    @register_to_config
-    def __init__(self, in_channels: int = 3, hidden_dim: int = 64, out_channels: int = 3):
-        super().__init__()
-        self.net = nn.Sequential(
-            nn.Conv2d(in_channels, hidden_dim, kernel_size=3, padding=1),
-            nn.SiLU(),
-            nn.Conv2d(hidden_dim, hidden_dim, kernel_size=3, padding=1),
-            nn.SiLU(),
-            nn.Conv2d(hidden_dim, out_channels, kernel_size=3, padding=1),
-        )
-
-    def forward(self, x: torch.Tensor) -> torch.Tensor:
-        return self.net(x)
-```
-
-Key requirements:
-
- **`ModelMixin`** provides `save_pretrained()` / `from_pretrained()` for weight serialization.
- **`ConfigMixin`** provides `save_config()` / `from_config()` and the `config.json` machinery.
- **`@register_to_config`** automatically captures all `__init__` parameters into `config.json` so the model can be reconstructed from config alone.
-
-## Step 2: Save the Model Locally
-
-```python
-from modeling_my_model import MyCustomModel
-
-model = MyCustomModel(in_channels=3, hidden_dim=128, out_channels=3)
-model.save_pretrained("./my-custom-model")
-```
-
-This creates a directory with:
-
-```
-my-custom-model/
-├── config.json
-└── diffusion_pytorch_model.safetensors
-```
-
-The generated `config.json` will look like:
-
-```json
-{
-  "_class_name": "MyCustomModel",
-  "_diffusers_version": "0.32.0",
-  "in_channels": 3,
-  "hidden_dim": 128,
-  "out_channels": 3
-}
-```
-
-## Step 3: Add the `auto_map` and Model File to the Repository
-
-To make `AutoModel` aware of your custom class, you need to:
-
-1. **Copy `modeling_my_model.py` into the saved model directory.**
-2. **Add an `"auto_map"` entry to `config.json`** that points `AutoModel` to your class.
-
-The `auto_map` value format is `"<module_name_without_.py>.<ClassName>"`:
-
-```json
-{
-  "_class_name": "MyCustomModel",
-  "_diffusers_version": "0.32.0",
-  "in_channels": 3,
-  "hidden_dim": 128,
-  "out_channels": 3,
-  "auto_map": {
-    "AutoModel": "modeling_my_model.MyCustomModel"
-  }
-}
-```
-
-Your final directory structure should be:
-
-```
-my-custom-model/
-├── config.json                          # with auto_map added
-├── diffusion_pytorch_model.safetensors
-└── modeling_my_model.py                 # your custom model code
-```
-
-## Step 4: Load with `AutoModel`
-
-### From a Local Directory
-
-```python
-from diffusers import AutoModel
-
-model = AutoModel.from_pretrained("./my-custom-model", trust_remote_code=True)
-print(model)
-```
-
-### From the Hugging Face Hub
-
-First, push the model directory to a Hub repository:
-
-```python
-from huggingface_hub import HfApi
-
-api = HfApi()
-api.create_repo("your-username/my-custom-model", exist_ok=True)
-api.upload_folder(
-    folder_path="./my-custom-model",
-    repo_id="your-username/my-custom-model",
-)
-```
-
-Then load it:
-
-```python
-from diffusers import AutoModel
-
-model = AutoModel.from_pretrained(
-    "your-username/my-custom-model",
-    trust_remote_code=True,
-)
-```
-
-### Initializing from Config (Random Weights)
-
-```python
-from diffusers import AutoModel
-
-model = AutoModel.from_config("./my-custom-model", trust_remote_code=True)
-```
-
-## Complete Example
-
-```python
-import torch
-from torch import nn
-from diffusers import ModelMixin, ConfigMixin, AutoModel
-from diffusers.configuration_utils import register_to_config
-
-
-# 1. Define
-class MyCustomModel(ModelMixin, ConfigMixin):
-    @register_to_config
-    def __init__(self, in_channels: int = 3, hidden_dim: int = 64, out_channels: int = 3):
-        super().__init__()
-        self.net = nn.Sequential(
-            nn.Conv2d(in_channels, hidden_dim, kernel_size=3, padding=1),
-            nn.SiLU(),
-            nn.Conv2d(hidden_dim, hidden_dim, kernel_size=3, padding=1),
-            nn.SiLU(),
-            nn.Conv2d(hidden_dim, out_channels, kernel_size=3, padding=1),
-        )
-
-    def forward(self, x: torch.Tensor) -> torch.Tensor:
-        return self.net(x)
-
-
-# 2. Save
-model = MyCustomModel(in_channels=3, hidden_dim=128, out_channels=3)
-model.save_pretrained("./my-custom-model")
-
-# 3. Manually add auto_map to config.json and copy modeling file
-import json, shutil
-
-config_path = "./my-custom-model/config.json"
-with open(config_path) as f:
-    config = json.load(f)
-
-config["auto_map"] = {"AutoModel": "modeling_my_model.MyCustomModel"}
-
-with open(config_path, "w") as f:
-    json.dump(config, f, indent=2)
-
-shutil.copy("modeling_my_model.py", "./my-custom-model/modeling_my_model.py")
-
-# 4. Load via AutoModel
-loaded_model = AutoModel.from_pretrained("./my-custom-model", trust_remote_code=True)
-
-# 5. Verify
-x = torch.randn(1, 3, 32, 32)
-with torch.no_grad():
-    out_original = model(x)
-    out_loaded = loaded_model(x)
-
-assert torch.allclose(out_original, out_loaded)
-print("Models produce identical outputs!")
-```
-
-## Using Relative Imports in Custom Code
-
-If your custom model depends on additional modules, you can use relative imports. For example, if your model uses a custom attention layer defined in a separate file:
-
-```
-my-custom-model/
-├── config.json
-├── diffusion_pytorch_model.safetensors
-├── modeling_my_model.py      # imports from .my_attention
-└── my_attention.py            # custom attention implementation
-```
-
-In `modeling_my_model.py`:
-
-```python
-from .my_attention import MyAttention
-```
-
-The dynamic module loader will automatically resolve and download all relatively imported files.
-
-## Security Note
-
-`trust_remote_code=True` executes arbitrary Python code from the model repository. Only use it with repositories you trust. You can globally disable remote code execution by setting the environment variable:
-
-```bash
-export DIFFUSERS_DISABLE_REMOTE_CODE=1
-```
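
Note on the removed guide's "How It Works" steps: the `auto_map` lookup can be illustrated with the standard library alone. This is a rough sketch under stated assumptions, not the loader diffusers actually uses. The function name `resolve_auto_map_class` is hypothetical, it only handles a local directory, and it skips downloading, relative imports, and any `trust_remote_code` check.

```python
import importlib.util
import json
import os
import tempfile


def resolve_auto_map_class(model_dir: str, auto_key: str = "AutoModel"):
    """Return the class named by config.json's auto_map entry, imported from a file in model_dir."""
    with open(os.path.join(model_dir, "config.json")) as f:
        config = json.load(f)

    # "modeling_my_model.MyCustomModel" -> ("modeling_my_model", "MyCustomModel")
    module_name, class_name = config["auto_map"][auto_key].rsplit(".", 1)

    # Import the module from its file path. This executes the file's code,
    # which is why the guide warns to load only from trusted repositories.
    module_path = os.path.join(model_dir, module_name + ".py")
    spec = importlib.util.spec_from_file_location(module_name, module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)


# Usage: a throwaway model directory with a stand-in class (no torch required).
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "modeling_my_model.py"), "w") as f:
        f.write("class MyCustomModel:\n    hidden_dim = 128\n")
    with open(os.path.join(tmp, "config.json"), "w") as f:
        json.dump({"auto_map": {"AutoModel": "modeling_my_model.MyCustomModel"}}, f)

    cls = resolve_auto_map_class(tmp)
    resolved_name = cls.__name__
    resolved_hidden_dim = cls.hidden_dim
```

The split on the last dot mirrors the `"<module_name_without_.py>.<ClassName>"` format described in Step 3 of the guide.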
--- a/docs/source/en/api/modular_diffusers/pipeline_blocks.md
+++ b/docs/source/en/api/modular_diffusers/pipeline_blocks.md
@@ -14,8 +14,4 @@

 ## AutoPipelineBlocks

-[[autodoc]] diffusers.modular_pipelines.modular_pipeline.AutoPipelineBlocks
-
-## ConditionalPipelineBlocks
-
-[[autodoc]] diffusers.modular_pipelines.modular_pipeline.ConditionalPipelineBlocks
+[[autodoc]] diffusers.modular_pipelines.modular_pipeline.AutoPipelineBlocks
--- a/docs/source/en/api/pipelines/cosmos.md
+++ b/docs/source/en/api/pipelines/cosmos.md
@@ -46,20 +46,6 @@ output = pipe(
 output.save("output.png")
 ```

-## Cosmos2_5_TransferPipeline
-
-[[autodoc]] Cosmos2_5_TransferPipeline
-  - all
-  - __call__
-
-
-## Cosmos2_5_PredictBasePipeline
-
-[[autodoc]] Cosmos2_5_PredictBasePipeline
-  - all
-  - __call__
-
-
 ## CosmosTextToWorldPipeline

 [[autodoc]] CosmosTextToWorldPipeline
@@ -84,6 +70,12 @@ output.save("output.png")
  - all
  - __call__

+## Cosmos2_5_PredictBasePipeline
+
+[[autodoc]] Cosmos2_5_PredictBasePipeline
+  - all
+  - __call__
+
 ## CosmosPipelineOutput

 [[autodoc]] pipelines.cosmos.pipeline_output.CosmosPipelineOutput
--- a/docs/source/en/modular_diffusers/auto_pipeline_blocks.md
+++ b/docs/source/en/modular_diffusers/auto_pipeline_blocks.md
@@ -121,7 +121,7 @@ from diffusers.modular_pipelines import AutoPipelineBlocks

 class AutoImageBlocks(AutoPipelineBlocks):
    # List of sub-block classes to choose from
-    block_classes = [InpaintBlock, ImageToImageBlock, TextToImageBlock]
+    block_classes = [block_inpaint_cls, block_i2i_cls, block_t2i_cls]
    # Names for each block in the same order
    block_names = ["inpaint", "img2img", "text2img"]
    # Trigger inputs that determine which block to run
@@ -129,8 +129,8 @@ class AutoImageBlocks(AutoPipelineBlocks):
    # - "image" triggers img2img workflow (but only if mask is not provided)
    # - if none of above, runs the text2img workflow (default)
    block_trigger_inputs = ["mask", "image", None]
+    # Description is extremely important for AutoPipelineBlocks

-    @property
    def description(self):
        return (
            "Pipeline generates images given different types of conditions!\n"
@@ -141,7 +141,7 @@ class AutoImageBlocks(AutoPipelineBlocks):
        )
 ```

-It is **very** important to include a `description` to avoid any confusion over how to run a block and what inputs are required. While [`~modular_pipelines.AutoPipelineBlocks`] are convenient, its conditional logic may be difficult to figure out if it isn't properly explained.
+It is **very** important to include a `description` to avoid any confusion over how to run a block and what inputs are required. While [`~modular_pipelines.AutoPipelineBlocks`] are convenient, its conditional logic may be difficult to figure out if it isn't properly explained.

 Create an instance of `AutoImageBlocks`.

@@ -152,74 +152,5 @@ auto_blocks = AutoImageBlocks()
 For more complex compositions, such as nested [`~modular_pipelines.AutoPipelineBlocks`] blocks when they're used as sub-blocks in larger pipelines, use the [`~modular_pipelines.SequentialPipelineBlocks.get_execution_blocks`] method to extract the block that is actually run based on your input.

 ```py
-auto_blocks.get_execution_blocks(mask=True)
-```
-
-## ConditionalPipelineBlocks
-
-[`~modular_pipelines.AutoPipelineBlocks`] is a special case of [`~modular_pipelines.ConditionalPipelineBlocks`]. While [`~modular_pipelines.AutoPipelineBlocks`] selects blocks based on whether a trigger input is provided or not, [`~modular_pipelines.ConditionalPipelineBlocks`] is able to select a block based on custom selection logic provided in the `select_block` method.
-
-Here is the same example written using [`~modular_pipelines.ConditionalPipelineBlocks`] directly:
-
-```py
-from diffusers.modular_pipelines import ConditionalPipelineBlocks
-
-class AutoImageBlocks(ConditionalPipelineBlocks):
-    block_classes = [InpaintBlock, ImageToImageBlock, TextToImageBlock]
-    block_names = ["inpaint", "img2img", "text2img"]
-    block_trigger_inputs = ["mask", "image"]
-    default_block_name = "text2img"
-
-    @property
-    def description(self):
-        return (
-            "Pipeline generates images given different types of conditions!\n"
-            + "This is an auto pipeline block that works for text2img, img2img and inpainting tasks.\n"
-            + " - inpaint workflow is run when `mask` is provided.\n"
-            + " - img2img workflow is run when `image` is provided (but only when `mask` is not provided).\n"
-            + " - text2img workflow is run when neither `image` nor `mask` is provided.\n"
-        )
-
-    def select_block(self, mask=None, image=None) -> str | None:
-        if mask is not None:
-            return "inpaint"
-        if image is not None:
-            return "img2img"
-        return None  # falls back to default_block_name ("text2img")
-```
-
-The inputs listed in `block_trigger_inputs` are passed as keyword arguments to `select_block()`. When `select_block` returns `None`, it falls back to `default_block_name`. If `default_block_name` is also `None`, the entire conditional block is skipped — this is useful for optional processing steps that should only run when specific inputs are provided.
-
-## Workflows
-
-Pipelines that contain conditional blocks ([`~modular_pipelines.AutoPipelineBlocks`] or [`~modular_pipelines.ConditionalPipelineBlocks`]) can support multiple workflows. For example, our SDXL modular pipeline supports a dozen workflows all in one pipeline. But this also means it can be confusing for users to know what workflows are supported and how to run them. For pipeline builders, it's useful to be able to extract only the blocks relevant to a specific workflow.
-
-We recommend defining a `_workflow_map` to give each workflow a name and explicitly list the inputs it requires.
-
-```py
-from diffusers.modular_pipelines import SequentialPipelineBlocks
-
-class MyPipelineBlocks(SequentialPipelineBlocks):
-    block_classes = [TextEncoderBlock, AutoImageBlocks, DecodeBlock]
-    block_names = ["text_encoder", "auto_image", "decode"]
-
-    _workflow_map = {
-        "text2image": {"prompt": True},
-        "image2image": {"image": True, "prompt": True},
-        "inpaint": {"mask": True, "image": True, "prompt": True},
-    }
-```
-
-All of our built-in modular pipelines come with pre-defined workflows. The `available_workflows` property lists all supported workflows:
-
-```py
-pipeline_blocks = MyPipelineBlocks()
-pipeline_blocks.available_workflows
-# ['text2image', 'image2image', 'inpaint']
-```
-
-Retrieve a specific workflow with `get_workflow` to inspect and debug a specific block that executes the workflow.
-
-```py
-pipeline_blocks.get_workflow("inpaint")
+auto_blocks.get_execution_blocks("mask")
 ```
--- a/docs/source/en/training/distributed_inference.md
+++ b/docs/source/en/training/distributed_inference.md
@@ -111,7 +111,7 @@ if __name__ == "__main__":
 Call `torchrun` to run the inference script and use the `--nproc_per_node` argument to set the number of GPUs to use.

 ```bash
-torchrun --nproc_per_node=2 run_distributed.py
+torchrun --nproc_per_node=2 run_distributed.py
 ```

 ## device_map
--- a/example.py
+++ b/example.py
@@ -1,120 +0,0 @@
-# coding=utf-8
-# Copyright 2025 HuggingFace Inc.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import torch
-
-from diffusers import QwenImageTransformer2DModel
-from diffusers.utils.torch_utils import randn_tensor
-
-from ...testing_utils import enable_full_determinism, torch_device
-from ..test_modeling_common import LoraHotSwappingForModelTesterMixin
-from ..testing_utils import (
-    AttentionTesterMixin,
-    ContextParallelTesterMixin,
-    LoraTesterMixin,
-    MemoryTesterMixin,
-    ModelTesterMixin,
-    TorchCompileTesterMixin,
-    TrainingTesterMixin,
-)
-
-
-enable_full_determinism()
-
-
-class QwenImageTransformerTesterConfig:
-    model_class = QwenImageTransformer2DModel
-    pretrained_model_name_or_path = ""
-    pretrained_model_kwargs = {"subfolder": "transformer"}
-
-    @property
-    def generator(self):
-        return torch.Generator("cpu").manual_seed(0)
-
-    def get_init_dict(self) -> dict[str, int | list[int]]:
-        # __init__ parameters:
-        #   patch_size: int = 2
-        #   in_channels: int = 64
-        #   out_channels: Optional[int] = 16
-        #   num_layers: int = 60
-        #   attention_head_dim: int = 128
-        #   num_attention_heads: int = 24
-        #   joint_attention_dim: int = 3584
-        #   guidance_embeds: bool = False
-        #   axes_dims_rope: Tuple[int, int, int] = <complex>
-        return {}
-
-    def get_dummy_inputs(self) -> dict[str, torch.Tensor]:
-        # forward() parameters:
-        #   hidden_states: torch.Tensor
-        #   encoder_hidden_states: torch.Tensor
-        #   encoder_hidden_states_mask: torch.Tensor
-        #   timestep: torch.LongTensor
-        #   img_shapes: Optional[List[Tuple[int, int, int]]]
-        #   txt_seq_lens: Optional[List[int]]
-        #   guidance: torch.Tensor
-        #   attention_kwargs: Optional[Dict[str, Any]]
-        #   controlnet_block_samples
-        #   return_dict: bool = True
-        # TODO: Fill in dummy inputs
-        return {}
-
-    @property
-    def input_shape(self) -> tuple[int, ...]:
-        return (1, 1)
-
-    @property
-    def output_shape(self) -> tuple[int, ...]:
-        return (1, 1)
-
-
-class TestQwenImageTransformerModel(QwenImageTransformerTesterConfig, ModelTesterMixin):
-    pass
-
-
-class TestQwenImageTransformerMemory(QwenImageTransformerTesterConfig, MemoryTesterMixin):
-    pass
-
-
-class TestQwenImageTransformerAttention(QwenImageTransformerTesterConfig, AttentionTesterMixin):
-    pass
-
-
-class TestQwenImageTransformerTorchCompile(QwenImageTransformerTesterConfig, TorchCompileTesterMixin):
-    different_shapes_for_compilation = [(4, 4), (4, 8), (8, 8)]
-
-    def get_dummy_inputs(self, height: int = 4, width: int = 4) -> dict[str, torch.Tensor]:
-        # TODO: Implement dynamic input generation
-        return {}
-
-
-class TestQwenImageTransformerLora(QwenImageTransformerTesterConfig, LoraTesterMixin):
-    pass
-
-
-class TestQwenImageTransformerContextParallel(QwenImageTransformerTesterConfig, ContextParallelTesterMixin):
-    pass
-
-
-class TestQwenImageTransformerTraining(QwenImageTransformerTesterConfig, TrainingTesterMixin):
-    pass
-
-
-class TestQwenImageTransformerLoraHotSwappingForModel(QwenImageTransformerTesterConfig, LoraHotSwappingForModelTesterMixin):
-    different_shapes_for_compilation = [(4, 4), (4, 8), (8, 8)]
-
-    def get_dummy_inputs(self, height: int = 4, width: int = 4) -> dict[str, torch.Tensor]:
-        # TODO: Implement dynamic input generation
-        return {}
--- a/examples/custom_diffusion/test_custom_diffusion.py
+++ b/examples/custom_diffusion/test_custom_diffusion.py
@@ -17,9 +17,6 @@ import logging
 import os
 import sys
 import tempfile
-import unittest
-
-from diffusers.utils import is_transformers_version


 sys.path.append("..")
@@ -33,7 +30,6 @@ stream_handler = logging.StreamHandler(sys.stdout)
 logger.addHandler(stream_handler)


-@unittest.skipIf(is_transformers_version(">=", "4.57.5"), "Size mismatch")
 class CustomDiffusion(ExamplesTestsAccelerate):
    def test_custom_diffusion(self):
        with tempfile.TemporaryDirectory() as tmpdir:
--- a/modular_model_index.json
+++ b/modular_model_index.json
@@ -1,73 +0,0 @@
-{
-  "_blocks_class_name": "SequentialPipelineBlocks",
-  "_class_name": "Flux2ModularPipeline",
-  "_diffusers_version": "0.36.0.dev0",
-  "scheduler": [
-    null,
-    null,
-    {
-      "pretrained_model_name_or_path": "black-forest-labs/FLUX.2-dev",
-      "revision": null,
-      "subfolder": "scheduler",
-      "type_hint": [
-        "diffusers",
-        "FlowMatchEulerDiscreteScheduler"
-      ],
-      "variant": null
-    }
-  ],
-  "text_encoder": [
-    null,
-    null,
-    {
-      "revision": null,
-      "subfolder": "text_encoder",
-      "type_hint": [
-        "transformers",
-        "Mistral3ForConditionalGeneration"
-      ],
-      "variant": null
-    }
-  ],
-  "tokenizer": [
-    null,
-    null,
-    {
-      "revision": null,
-      "subfolder": "tokenizer",
-      "type_hint": [
-        "transformers",
-        "AutoProcessor"
-      ],
-      "variant": null
-    }
-  ],
-  "transformer": [
-    null,
-    null,
-    {
-      "pretrained_model_name_or_path": "diffusers/FLUX.2-dev-bnb-4bit",
-      "revision": null,
-      "subfolder": "transformer",
-      "type_hint": [
-        "diffusers",
-        "Flux2Transformer2DModel"
-      ],
-      "variant": null
-    }
-  ],
-  "vae": [
-    null,
-    null,
-    {
-      "pretrained_model_name_or_path": "black-forest-labs/FLUX.2-dev",
-      "revision": null,
-      "subfolder": "vae",
-      "type_hint": [
-        "diffusers",
-        "AutoencoderKLFlux2"
-      ],
-      "variant": null
-    }
-  ]
-}
--- a/pr_review/12498.md
+++ b/pr_review/12498.md
@@ -1,56 +0,0 @@
-Now let me look at what `dequantize_gguf_tensor` does to understand the fix better:
-
-## Code Review: GGUF fix for unquantized types
-
-### 1. Summary of Changes
-
-The PR fixes a bug in the `_fused_mul_mat_gguf` function (lines 79-105) where unquantized GGUF tensor types (F32, F16, BF16) were incorrectly handled.
-
-**Before:** When `qweight_type` was an unquantized type, the code directly performed matrix multiplication: `x @ qweight.T`
-
-**After:** It now calls `dequantize_gguf_tensor(qweight)` first, then performs the matrix multiplication: `x @ weight.T`
-
-The issue was that even "unquantized" GGUF tensors are stored in an 8-bit tensor format and need to be converted to their proper data type representation before use.
-
-### 2. Potential Issues or Bugs
-
-**None identified.** The fix is correct and addresses a real bug:
-
- The `dequantize_gguf_tensor` function (lines 509-527) checks if the tensor has a `quant_type` attribute and handles the appropriate conversion
- For BF16 specifically, there's a dedicated `dequantize_blocks_BF16` function (lines 428-429) that properly converts the 8-bit storage format
- The fix aligns with how the native path already works in `forward_native` (lines 593-599), which always calls `dequantize_gguf_tensor`
-
-### 3. Code Quality Observations
-
-**Strengths:**
- The fix is minimal and surgical - only changes what's necessary
- Maintains consistency with the `forward_native` path which already uses `dequantize_gguf_tensor`
- The variable naming (`weight` instead of reusing `qweight`) makes it clear a transformation occurred
-
-**Minor observation:**
- The comment on line 80 "there is no need to call any kernel for fp16/bf16" is now slightly misleading since we DO need to call dequantization logic. Consider updating it to something like: "no need to call specialized GGUF kernel for fp16/bf16, but still need to dequantize from 8-bit storage"
-
-### 4. Security Considerations
-
-**No security concerns.** The change:
- Doesn't introduce any external input handling
- Doesn't modify control flow in a way that could bypass security checks
- Only fixes a data type conversion issue
-
-### 5. Suggestions for Improvement
-
-1. **Update the comment** at `src/diffusers/quantizers/gguf/utils.py:80`:
-   ```python
-   # unquantized types still need dequantization from 8-bit storage, but don't need specialized kernels
-   if qweight_type in UNQUANTIZED_TYPES:
-       weight = dequantize_gguf_tensor(qweight)
-       return x @ weight.T
-   ```
-
-2. **Consider adding a test** to prevent regression of this issue. A test should verify that unquantized GGUF tensors produce correct output shapes and values.
-
-3. **Documentation:** The PR description mentions torch 2.8/2.9 build availability. This might be worth tracking in a GitHub issue if not already done.
-
-### Verdict
-
-**Approve with minor comment update suggestion.** The fix correctly addresses a real shape mismatch bug where GGUF's 8-bit storage format wasn't being properly converted for unquantized types. The logic is sound and aligns with the existing native implementation path.
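The review's point that even unquantized GGUF tensors sit in 8-bit storage can be illustrated without any GGUF tooling. The sketch below is not the `dequantize_gguf_tensor` implementation; it is a hypothetical pure-Python helper (`bf16_bytes_to_floats`) that assumes little-endian BF16 values packed two bytes each, and it only shows why the raw byte buffer has a different length from the logical weight.

```python
import struct


def bf16_bytes_to_floats(raw: bytes) -> list[float]:
    """Decode little-endian bfloat16 values stored as raw bytes (hypothetical helper)."""
    if len(raw) % 2 != 0:
        raise ValueError("BF16 data must contain an even number of bytes")
    values = []
    for i in range(0, len(raw), 2):
        # Read one 16-bit pattern; BF16 is the upper half of an IEEE float32.
        (bits16,) = struct.unpack_from("<H", raw, i)
        # Shift into the high 16 bits and reinterpret the result as float32.
        (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
        values.append(value)
    return values


# 1.0 is 0x3F80 and -2.0 is 0xC000 in BF16, stored here as little-endian bytes.
raw = bytes([0x80, 0x3F, 0x00, 0xC0])
decoded = bf16_bytes_to_floats(raw)
```

Here the byte buffer has four elements while the decoded weight has two, which is the kind of shape mismatch a direct `x @ qweight.T` on the raw storage would run into.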
--- a/pr_review/12744.md
+++ b/pr_review/12744.md
@@ -1,186 +0,0 @@
-I'll provide a comprehensive code review of this MagCache PR.
-
-## Summary of Changes
-
-This PR implements MagCache (Magnitude-aware Cache), a training-free inference acceleration technique for diffusion transformers. The implementation:
-
- Adds a `MagCacheConfig` class for configuration
- Implements `MagCacheHeadHook` and `MagCacheBlockHook` following the existing ModelHook pattern
- Includes calibration mode to compute magnitude ratios for any transformer model
- Provides pre-computed `FLUX_MAG_RATIOS` for Flux models
- Adds comprehensive documentation and tests
-
-## Potential Issues and Bugs
-
-### 1. **Critical: Missing Hook Removal in `disable_cache()`**
-```python
-# In cache_utils.py, line ~127
-elif isinstance(self._cache_config, MagCacheConfig):
-    registry.remove_hook(_MAG_CACHE_LEADER_BLOCK_HOOK, recurse=True)
-```
-
-**Issue**: The code only removes the leader/head hook but not the block hooks (`_MAG_CACHE_BLOCK_HOOK`). This will leave hooks attached when disabling the cache.
-
-**Fix**: Add removal of block hooks:
-```python
-elif isinstance(self._cache_config, MagCacheConfig):
-    registry.remove_hook(_MAG_CACHE_LEADER_BLOCK_HOOK, recurse=True)
-    registry.remove_hook(_MAG_CACHE_BLOCK_HOOK, recurse=True)
-```
-
-### 2. **Shape Mismatch Handling Logic Issue**
-In `mag_cache.py` lines 224-248, the shape mismatch handling has a potential issue:
-
-```python
-elif (
-    output.ndim == 3
-    and res.ndim == 3
-    and output.shape[0] == res.shape[0]
-    and output.shape[2] == res.shape[2]
-):
-    diff = output.shape[1] - res.shape[1]
-    if diff > 0:
-        output = output.clone()
-        output[:, diff:, :] = output[:, diff:, :] + res
-```
-
-**Issue**: This assumes text tokens come first and image tokens come last. This may not be universal across all models (e.g., some models interleave tokens differently).
-
-**Suggestion**: Add a comment explaining this assumption or add configuration to specify the concatenation strategy.
-
-### 3. **Residual Calculation Fallback is Unsafe**
-In `mag_cache.py` line 343:
-
-```python
-else:
-    # Fallback for completely mismatched shapes
-    residual = out_hidden
-```
-
-**Issue**: This fallback doesn't compute a residual at all—it just uses the output. This will cause incorrect behavior in subsequent steps.
-
-**Suggestion**: Either raise an error or add a warning that calibration is required for this model architecture.
-
-### 4. **Device Mismatch Handling is Incomplete**
-```python
-if res.device != output.device:
-    res = res.to(output.device)
-```
-
-**Issue**: This only handles device mismatch for the residual, but doesn't handle dtype mismatches which could occur with mixed precision training.
-
-**Suggestion**: Add dtype handling:
-```python
-if res.device != output.device or res.dtype != output.dtype:
-    res = res.to(device=output.device, dtype=output.dtype)
-```
-
-### 5. **Calibration Logging Could Be Missed**
-The calibration results are printed to stdout (line 380) and logged. However, if the user has logging disabled or redirected, they might miss this critical information.
-
-**Suggestion**: Consider returning calibration results from the pipeline or raising a more visible notification.
-
-### 6. **Test Suite is Skipped**
-```python
-@unittest.skip("MagCache unit tests are skipped.")
-class MagCacheTests(unittest.TestCase):
-```
-
-**Issue**: All unit tests are skipped, which means the core logic isn't being validated in CI.
-
-**Action Required**: Remove the skip decorator before merging or add a comment explaining why it's temporarily skipped.
-
-## Code Quality Observations
-
-### Strengths:
-1. **Well-structured**: Follows existing patterns (ModelHook, StateManager) consistently
-2. **Good documentation**: Comprehensive docstrings and inline comments
-3. **Calibration mode**: Clever design allowing model-agnostic usage
-4. **Error handling**: Validates configuration upfront
-5. **Interpolation logic**: Smart handling of different step counts via `nearest_interp()`
-
-### Areas for Improvement:
-
-1. **Magic Numbers**: Several hardcoded values could be constants:
-   ```python
-   eps = 1e-8  # Line 335 in _perform_calibration_step
-   expected_atol = 0.1  # Line 2989 in test
-   ```
-
-2. **Code Duplication**: The logic for handling tuple returns appears multiple times. Consider extracting to a helper method.
-
-3. **Type Hints**: Some methods lack return type hints (e.g., `nearest_interp`)
-
-4. **Compiler Disable Decorator**: The `@torch.compiler.disable` decorator is used but not explained. Add a comment about why compilation is disabled.
-
-## Security Considerations
-
-### Low Risk:
- No external network calls
- No file system access beyond logging
- No execution of arbitrary code
- Tensor operations are standard PyTorch
-
-### Observations:
-1. **Device Transfer**: The `.to(device)` calls are safe but could consume unexpected memory if tensors are large
-2. **State Management**: The state is properly isolated and reset between inference runs
-
-## Suggestions for Improvement
-
-### 1. Add Configuration Validation
-```python
-def __post_init__(self):
-    # Existing checks...
-    
-    # Add bounds checking
-    if not 0.0 <= self.retention_ratio <= 1.0:
-        raise ValueError(f"retention_ratio must be in [0, 1], got {self.retention_ratio}")
-    if self.max_skip_steps < 1:
-        raise ValueError(f"max_skip_steps must be >= 1, got {self.max_skip_steps}")
-    if self.threshold <= 0:
-        raise ValueError(f"threshold must be positive, got {self.threshold}")
-```
-
-### 2. Add Metrics/Statistics
-Consider adding optional statistics collection:
- How many blocks were skipped per step
- Average accumulated error
- Total compute savings
-
-This would help users optimize their thresholds.
-
-### 3. Improve Documentation Example
-The documentation example could show expected speedup or quality metrics to set user expectations.
-
-### 4. Add Gradient Mode Check
-```python
-if torch.is_grad_enabled():
-    logger.warning("MagCache is designed for inference only. Gradients are enabled but will not flow correctly through cached blocks.")
-```
-
-### 5. Consider Memory Cleanup
-The `previous_residual` is held in state indefinitely. Consider adding explicit cleanup:
-```python
-def cleanup(self):
-    if self.previous_residual is not None:
-        del self.previous_residual
-        self.previous_residual = None
-```
-
-## Minor Issues
-
-1. **Line 26**: Unused import or should be used in logger initialization
-2. **Line 332**: Comment says "Fallback to matching tail" but logic is unclear
-3. **Documentation**: The TIP about batched CFG could include more detail about why this works
-
-## Conclusion
-
-This is a **well-implemented feature** with good design patterns and documentation. The main concerns are:
-
-1. **Critical**: Fix the missing block hook removal in `disable_cache()` (Line 127)
-2. **Important**: Unskip and fix the unit tests
-3. **Recommended**: Improve shape mismatch handling with better error messages
-
-The implementation is production-ready once these issues are addressed. The calibration mode is particularly clever and makes this genuinely model-agnostic.
-
-**Recommendation**: Request changes for items #1 and #2, then approve once fixed.
--- a/pr_review/13028.md
+++ b/pr_review/13028.md
@@ -1,99 +0,0 @@
-# PR #13028: [Modular] add explicit workflow support
-
-**Author:** @yiyixuxu
-**Branch:** `modular-workflow` -> `main`
-**Files changed:** `modular_pipeline.py`, `modular_pipeline_utils.py`, `qwenimage/modular_blocks_qwenimage.py`
-**+298 / -165**
-
---
-
-## Summary
-
-This PR adds a `_workflow_map` class attribute to `SequentialPipelineBlocks` that maps named workflows (e.g., `"text2image"`, `"inpainting"`) to their trigger inputs. Users can then call `get_workflow("text2image")` to get the execution blocks for that workflow. The PR also refactors `get_execution_blocks` into `ConditionalPipelineBlocks` and `SequentialPipelineBlocks`, moves `combine_inputs`/`combine_outputs` to module-level functions, and improves docstrings.
-
-## Main Concern: "Workflow" as a New Concept
-
-Modular Diffusers already requires users to learn: **Pipelines**, **Blocks** (Sequential, Conditional, Auto, Loop), **Steps**, **Components**, **Inputs/Outputs**, **Trigger Inputs**, **Execution Blocks**, **PipelineState**, and **BlockState**. Adding "workflow" as yet another term increases cognitive overhead.
-
-The underlying feature is useful — named presets for trigger inputs are genuinely helpful for discoverability. But "workflow" may not be the right label:
-
-1. **Overloaded term**: "Workflow" is heavily used in the AI/ML ecosystem (ComfyUI workflows, orchestration workflows, CI/CD workflows). Users may expect something more complex than what this is.
-
-2. **It's really a task/mode, not a workflow**: `"text2image"`, `"inpainting"`, `"image2image"` are *tasks* or *modes*. The rest of diffusers already uses "task" terminology — `AutoPipelineForText2Image`, `AutoPipelineForInpainting`, etc. Calling the same concept "workflow" in Modular Diffusers creates inconsistency.
-
-3. **It's a thin wrapper**: `get_workflow("text2image")` is just `get_execution_blocks(prompt=True)`. Users still need to understand `get_execution_blocks` and trigger inputs to do anything beyond the predefined workflows. The abstraction doesn't save much complexity.
-
-**Suggestion**: Consider `_task_map` / `get_task()` / `task_names` to align with existing diffusers terminology, or `_mode_map` / `get_mode()` / `mode_names` for something more neutral. The existing `auto_pipeline.py` already uses "task" internally — `_get_task_class()` maps pipeline class names to task-specific variants (text2image, image2image, inpainting), and the public API follows the `AutoPipelineFor<Task>` naming pattern. These are the exact same concepts this PR calls "workflows." Alternatively, this could simply be better documentation on `get_execution_blocks` with named examples, rather than a new API surface.
-
-## Code Issues
-
-### Behavioral change: `outputs` -> `intermediate_outputs` in traversal
-
-`modular_pipeline.py` — In `SequentialPipelineBlocks.get_execution_blocks`, the old `_traverse_trigger_blocks` tracked `block.outputs` to propagate available values to downstream blocks. The new code tracks `block.intermediate_outputs` instead:
-
-```python
-# Old
-if hasattr(block, "outputs"):
-    for out in block.outputs:
-        active_inputs[out.name] = True
-
-# New
-if hasattr(block, "intermediate_outputs"):
-    for out in block.intermediate_outputs:
-        active_inputs[out.name] = True
-```
-
-`intermediate_outputs` and `outputs` can differ — `intermediate_outputs` includes values passed between blocks in the pipeline state, while `outputs` are the final outputs. This could change which downstream conditional blocks get triggered. If this is intentional, it should be called out explicitly in the PR description since it affects existing behavior.
-
-### `_workflow_map` on base class, implementations only on `SequentialPipelineBlocks`
-
-`_workflow_map = None` is defined on `ModularPipelineBlocks` (the base class), but `workflow_names` and `get_workflow()` are only implemented on `SequentialPipelineBlocks`. The base class stubs raise `NotImplementedError`. This is misleading — it suggests workflows *could* be implemented for other block types. If workflows are intentionally only for `SequentialPipelineBlocks`, define `_workflow_map` there and don't add stubs to the base class.
-
-### `get_execution_blocks` no longer filters None values
-
-Old code:
-```python
-active_inputs = {k: v for k, v in kwargs.items() if v is not None}
-```
-
-New code:
-```python
-active_inputs = dict(kwargs)
-```
-
-This is a behavioral change to the public `get_execution_blocks` API. The old code explicitly stripped `None` values so users could write `get_execution_blocks(prompt="a cat", image=None)` and `image` wouldn't trigger anything. The new code passes `None` through. It happens to still work because `select_block` checks `is not None` internally, but callers can no longer rely on the documented filtering behavior. This should be noted.
-
-### `default_block_name` changed from property to instance attribute
-
-In `AutoPipelineBlocks`, `default_block_name` was a `@property` that derived the default from `block_trigger_inputs` on every access. It's now set as an instance attribute in `__init__`. This is mostly fine, but the new code also adds a validation that raises an error if `default_block_name` is already not `None` before `__init__` sets it, so subclasses that accidentally set `default_block_name` as a class attribute will now break. This is a stricter contract that should be documented.
-
-### Typo
-
-`modular_pipeline.py` — `# currentlyonly ConditionalPipelineBlocks` should be `# currently only`.
-
-### `_get_trigger_inputs()` called multiple times in `__repr__`
-
-In `SequentialPipelineBlocks.__repr__`, `self._get_trigger_inputs()` is called 3 times (condition check, trigger inputs display, example input). This recursively traverses all blocks each time. Should be computed once and reused.
-
-### Duplicate `format_workflow` calls in `__repr__` and `doc`
-
-Both `SequentialPipelineBlocks.__repr__` and `SequentialPipelineBlocks.doc` build the description + workflow string independently with identical logic:
-
-```python
-description = self.description
-if self._workflow_map is not None:
-    workflow_str = format_workflow(self._workflow_map)
-    description = f"{self.description}\n\n{workflow_str}"
-```
-
-This should be extracted into a property or helper.
-
-### No tests
-
-The PR description mentions "I will add a test suite for this too!" but there are no tests included. Workflow resolution, edge cases (empty workflow map, missing workflow name, workflows with overlapping triggers), and the `get_execution_blocks` refactoring should all be tested before merge.
-
-## Refactoring Quality
-
-The refactoring of `get_execution_blocks` from a monolithic method on `SequentialPipelineBlocks` into separate implementations on `ConditionalPipelineBlocks` and `SequentialPipelineBlocks` is a good separation of concerns. Moving `combine_inputs`/`combine_outputs` to module-level functions is also reasonable since they don't depend on instance state.
-
-The improved `AutoPipelineBlocks` docstring with the example is a significant documentation improvement.
--- a/pr_review/13075.md
+++ b/pr_review/13075.md
@@ -1,97 +0,0 @@
-I'll review this PR that addresses PyTorch version compatibility for distributed operations.
-
-## Summary of Changes
-
-The PR refactors the `gather_size_by_comm` function in `_modeling_parallel.py` to handle PyTorch versions prior to 2.6 that don't have the `torch.accelerator` API. The changes replace a single ternary expression with a multi-level conditional that:
-
-1. First checks if "cpu" is in the backend string
-2. Then checks if `torch.accelerator` exists (PyTorch >= 2.6)
-3. Falls back to CUDA as a default device
-
-## Potential Issues or Bugs
-
-**1. Device Type Inconsistency**
-The original code returns a string `"cpu"` but the new code returns `torch.device("cuda")` objects. This inconsistency could cause issues:
-
-```python
-gather_device = "cpu"  # str
-# vs
-gather_device = torch.device("cuda")  # torch.device object
-```
-
-**Recommendation:** Use `torch.device()` consistently:
-```python
-if "cpu" in comm_backends:
-    gather_device = torch.device("cpu")
-elif hasattr(torch, "accelerator"):
-    acc = torch.accelerator.current_accelerator()
-    gather_device = torch.device(acc if acc is not None else "cuda")
-else:
-    gather_device = torch.device("cuda")
-```
-
-**2. Unclear Accelerator Return Behavior**
-The comment states "Fall back to CUDA when no accelerator is returned" but it's unclear when `torch.accelerator.current_accelerator()` would return `None`. This should be verified or documented.
-
-**3. Missing Type Information**
-What type does `torch.accelerator.current_accelerator()` return? If it returns a string like `"cuda"` or `"mps"`, the code should handle it consistently. If it returns a device object, the logic might need adjustment.
-
-## Code Quality Observations
-
-**Positive:**
- Clear comments explaining the fallback logic
- Proper use of `hasattr()` for backward compatibility
- Addresses the reported issue #13074
-
-**Areas for Improvement:**
-
-1. **Device type consistency** (mentioned above)
-
-2. **Consider alternative hardware accelerators:** The fallback to CUDA might not be appropriate for all systems (e.g., MPS on macOS, XPU on Intel). Consider:
-   ```python
-   else:
-       # Fallback for PyTorch < 2.6
-       if torch.cuda.is_available():
-           gather_device = torch.device("cuda")
-       else:
-           gather_device = torch.device("cpu")
-   ```
-
-3. **Code style:** The expanded conditional is more readable but could benefit from extracting into a helper function if this pattern appears elsewhere:
-   ```python
-   def _get_gather_device(comm_backends: str) -> torch.device:
-       """Determine device for distributed gather operations."""
-       # ... implementation
-   ```
-
-## Security Considerations
-
-No significant security issues identified. This is primarily a compatibility fix for internal device selection logic.
-
-## Suggestions for Improvement
-
-1. **Add a test case** to verify behavior on PyTorch < 2.6 (if not already covered)
-
-2. **Document the behavior** more explicitly:
-   ```python
-   # Determine gather device based on backend and PyTorch version
-   # Priority: CPU backend > torch.accelerator (>= 2.6) > CUDA fallback (< 2.6)
-   ```
-
-3. **Consider this more defensive approach:**
-   ```python
-   if "cpu" in comm_backends:
-       gather_device = torch.device("cpu")
-   elif hasattr(torch, "accelerator"):
-       acc = torch.accelerator.current_accelerator()
-       gather_device = torch.device(acc if acc else "cuda")
-   elif torch.cuda.is_available():
-       gather_device = torch.device("cuda")
-   else:
-       # Fallback to CPU if no GPU available
-       gather_device = torch.device("cpu")
-   ```
-
-## Verdict
-
-The PR addresses the compatibility issue but has a **type inconsistency bug** that should be fixed before merging. The string vs `torch.device` object mismatch could cause runtime errors. Once that's addressed, the change is sound for backward compatibility.
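The priority order the review argues for (CPU backend, then `torch.accelerator`, then a CUDA availability check, then CPU) can be sketched as a pure function. To keep it runnable without PyTorch, the hypothetical `pick_gather_device` below takes the runtime facts as plain arguments and always returns a device-type string. In the real code these facts would come from `torch`, and the result would presumably be wrapped in `torch.device`.

```python
from typing import Optional


def pick_gather_device(
    comm_backends: str,
    has_accelerator_api: bool,
    current_accelerator: Optional[str],
    cuda_available: bool,
) -> str:
    """Return a device-type string following the review's suggested priority (hypothetical helper)."""
    if "cpu" in comm_backends:
        return "cpu"
    if has_accelerator_api:
        # Assumed stand-in for torch.accelerator.current_accelerator(); None means no accelerator.
        return current_accelerator if current_accelerator is not None else "cuda"
    if cuda_available:
        return "cuda"
    # Last resort when no accelerator API and no CUDA device are available.
    return "cpu"


cpu_backend = pick_gather_device("cpu:gloo,cuda:nccl", True, "cuda", True)
new_torch_no_acc = pick_gather_device("nccl", True, None, True)
old_torch_gpu = pick_gather_device("nccl", False, None, True)
old_torch_no_gpu = pick_gather_device("nccl", False, None, False)
```

Returning one type from every branch sidesteps the string versus `torch.device` inconsistency the review flags, and the final branch covers older PyTorch builds on machines without CUDA.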
--- a/pr_review/13116.md
+++ b/pr_review/13116.md
@@ -1,66 +0,0 @@
-# PR #13116: [tests] tests for `modules_to_not_convert`
-
-**Author:** @sayakpaul
-**Branch:** `fix-modules-no-convert-torchao` -> `main`
-**Files changed:** `tests/models/testing_utils/quantization.py`, `tests/models/transformers/test_models_transformer_flux.py`
-
---
-
-## Summary
-
-This PR fixes the `modules_to_not_convert` tests that were effectively dead code. They existed in the base `QuantizationTesterMixin` but never ran because no test class defined `modules_to_not_convert_for_test`. The PR activates these tests for Flux and fixes several underlying bugs that would have caused them to fail.
-
-## Key Changes
-
-1. **BnB config key fix**: `BitsAndBytesConfig` uses `llm_int8_skip_modules`, not `modules_to_not_convert`. The base test was setting the wrong key, so modules were never actually excluded.
-
-2. **TorchAO `_verify_if_layer_quantized` fix**: Previously only checked `isinstance(module, torch.nn.Linear)`, which is always true for TorchAO (it doesn't replace the module class). Now properly checks weight tensor types (`AffineQuantizedTensor`, `LinearActivationQuantizedTensor`).
-
-3. **`_is_module_quantized` fix**: Now passes `quant_config_kwargs` to `_verify_if_layer_quantized`. Previously it passed `{}`, which caused BnB to always check for `Int8Params` even on 4-bit models.
-
-4. **Cleanup**: Removes unused guard blocks (`is_gguf_available`, `is_torchao_available`) that only contained `pass`.
-
-5. **Activates tests**: Adds `modules_to_not_convert_for_test` returning `["norm_out.linear"]` to BnB, Quanto, TorchAo, and ModelOpt Flux test classes.
-
-## Issues
-
-### `to_not_convert_key` parameter pollutes the base class interface
-
-`quantization.py:271-273` — The new `to_not_convert_key` parameter on `_test_quantization_modules_to_not_convert` exists solely for BnB's naming quirk (`llm_int8_skip_modules` vs `modules_to_not_convert`). Every other backend uses the default. This leaks a BnB-specific detail into the shared base method.
-
-BnB already has its own `test_bnb_modules_to_not_convert` that could handle the key translation locally — either by building the correct `config_kwargs` with `llm_int8_skip_modules` before calling `_create_quantized_model` directly, or by overriding the test. This keeps the base method clean and isolates BnB's naming quirk in `BitsAndBytesTesterMixin` where it belongs.
-
-### Code duplication in TorchAO `test_torchao_modules_to_not_convert`
-
-`quantization.py:915-950` — The TorchAO test inlines ~30 lines from `_test_quantization_modules_to_not_convert` to skip the memory footprint comparison. If the base method is updated in the future, this copy won't get the fix. Consider parameterizing the base method instead:
-
-```python
-def _test_quantization_modules_to_not_convert(
-    self, config_kwargs, modules_to_not_convert, check_memory_footprint=True,
-):
-    # ... existing module-walking logic ...
-
-    if check_memory_footprint:
-        # Compare memory footprint with fully quantized model
-        ...
-```
-
-Then TorchAO could simply call:
-```python
-self._test_quantization_modules_to_not_convert(
-    TorchAoConfigMixin.TORCHAO_QUANT_TYPES["int8wo"], modules_to_exclude,
-    check_memory_footprint=False,
-)
-```
-
-### TorchAO imports inside method body
-
-`quantization.py:822-823` — The `torchao` imports are placed inside `_verify_if_layer_quantized`. While functional (avoids import errors when torchao isn't installed), these could be placed at module level under the existing `is_torchao_available()` guard for consistency with how `bnb` and `QLinear` imports are handled. Minor style point.
-
-### `_is_module_quantized` callers not updated
-
-`quantization.py:368` — The `_test_dequantize` method still calls `self._is_module_quantized(module)` without `quant_config_kwargs`. This happens to work correctly (for BnB, checking `Int8Params` after dequantization correctly returns False; for TorchAO, the weight won't be an `AffineQuantizedTensor`), but it means BnB dequantize for 4-bit models asserts the weight is not `Int8Params` rather than asserting it's not `Params4bit`. Consider updating for correctness.
-
-### Missing GGUF test coverage
-
-GGUF's `GGUFTesterMixin` doesn't have a `test_gguf_modules_to_not_convert` method. If GGUF is expected to support `modules_to_not_convert`, a test should be added. If not, a comment explaining why would be helpful.
--- a/pr_review/pr_12700_flashpack.md
+++ b/pr_review/pr_12700_flashpack.md
@@ -1,144 +0,0 @@
-# PR #12700 — FlashPack Integration Review
-
-**URL**: https://github.com/huggingface/diffusers/pull/12700
-**State**: OPEN
-**Branch**: `flashpack` → `main`
-
-## Summary
-
-Adds FlashPack as a new weight serialization format for faster model loading. FlashPack packs model weights into a single contiguous file (`model.flashpack`) that can be loaded efficiently, especially for larger models. The PR integrates it across `ModelMixin` (save/load), `DiffusionPipeline` (save/load/download), and supporting utilities.
-
-## Files Changed
-
- `setup.py` / `dependency_versions_table.py` — add `flashpack` dependency
- `src/diffusers/utils/constants.py` — `FLASHPACK_WEIGHTS_NAME`, `FLASHPACK_FILE_EXTENSION`
- `src/diffusers/utils/import_utils.py` — `is_flashpack_available()`
- `src/diffusers/utils/__init__.py` — re-exports
- `src/diffusers/models/model_loading_utils.py` — `load_flashpack_checkpoint()`, dispatch in `load_state_dict()`
- `src/diffusers/models/modeling_utils.py` — `save_pretrained(use_flashpack=...)`, `from_pretrained(use_flashpack=..., flashpack_kwargs=...)`
- `src/diffusers/pipelines/pipeline_utils.py` — pipeline-level `save_pretrained`, `from_pretrained`, `download` with `use_flashpack`
- `src/diffusers/pipelines/pipeline_loading_utils.py` — `load_sub_model`, `_get_ignore_patterns`, `get_class_obj_and_candidates`, `filter_model_files`
-
---
-
-## Issues
-
-### 1. `use_flashpack=True` default in `DiffusionPipeline.download()`
-
-```python
-# pipeline_utils.py, in download()
-use_flashpack = kwargs.pop("use_flashpack", True)
-```
-
-This defaults to `True`, so `download()` tries to fetch FlashPack files unless the caller opts out. Every other call site defaults to `False`. This looks like a bug: it would change download behavior for all users even if they never asked for FlashPack. The default should be `False`.
-
-### 2. `load_flashpack_checkpoint` is unused in the `from_pretrained` path
-
-`load_flashpack_checkpoint()` is added to `model_loading_utils.py` and wired into `load_state_dict()`. However, in `ModelMixin.from_pretrained`, when `use_flashpack=True`, the code **early-returns** after calling `flashpack.mixin.assign_from_file()` directly — it never goes through `load_state_dict()`. So `load_flashpack_checkpoint` is dead code in the `from_pretrained` flow. Either:
- Remove it if FlashPack always uses its own assign path, or
- Use it consistently (load state dict → assign to model, like safetensors/pickle)
-
-### 3. `resolved_model_file` may be undefined when `use_flashpack=True` and file fetch fails
-
-```python
-# modeling_utils.py, from_pretrained
-elif use_flashpack:
-    try:
-        resolved_model_file = _get_model_file(...)
-    except IOError as e:
-        logger.error(...)
-        if not allow_pickle:
-            raise
-        logger.warning("Defaulting to unsafe serialization...")
-```
-
-If the `IOError` is caught and `allow_pickle` is truthy, `resolved_model_file` is never set but is used later at `flashpack.mixin.assign_from_file(model=model, path=resolved_model_file[0], ...)`. This would crash with `UnboundLocalError` (a subclass of `NameError`). The fallback logic (copied from the safetensors block) doesn't make sense for FlashPack because there is no pickle fallback for FlashPack. The `except` block should just re-raise unconditionally.
-
-### 4. `resolved_model_file[0]` assumes a list, but `_get_model_file` returns a string
-
-```python
-flashpack.mixin.assign_from_file(
-    model=model,
-    path=resolved_model_file[0],  # indexing into a string
-    ...
-)
-```
-
-`_get_model_file` returns a single file path (string), not a list. `resolved_model_file[0]` would give the first character of the path. Should be just `resolved_model_file`.
-
-### 5. `device_map` handling assumes `device_map[""]` exists
-
-```python
-flashpack_device = device_map[""]
-```
-
-`device_map` can be a dict with arbitrary keys (layer names, module names), not just `{"": device}`. This would raise `KeyError` for any non-trivial device map. Should handle the general case or document the constraint.
-
-### 6. `FlashPack` prefix stripping in `get_class_obj_and_candidates` is unexplained
-
-```python
-if class_name.startswith("FlashPack"):
-    class_name = class_name.removeprefix("FlashPack")
-```
-
-This is injected into a general-purpose utility function with no explanation of when/why a class name would have a `FlashPack` prefix. This seems like it handles a specific config format but there's no corresponding code that writes `FlashPack`-prefixed class names. If this is for some external convention, it should be documented. If not needed, remove it.
-
-### 7. Duplicated availability check pattern
-
-The `is_flashpack_available()` check + import + error message pattern is repeated 3 times:
- `load_flashpack_checkpoint()` in `model_loading_utils.py`
- `save_pretrained()` in `modeling_utils.py`
- `from_pretrained()` in `modeling_utils.py`
-
-Each has slightly different wording. These should be consolidated into a single helper (e.g., a `require_flashpack()` function), consistent with how other optional deps are handled.
-
-### 8. `save_pretrained` error message says "load" instead of "save"
-
-```python
-# modeling_utils.py, save_pretrained, use_flashpack=True branch
-raise ImportError("Please install torch and flashpack to load a FlashPack checkpoint in PyTorch.")
-```
-
-This is in the **save** path, but the message says "load". Should say "save".
-
-### 9. No `config.json` saved alongside FlashPack weights in `save_pretrained`
-
-When `use_flashpack=True` in `ModelMixin.save_pretrained`, the model config is saved normally at the top of the method, but the FlashPack branch calls `flashpack.serialization.pack_to_file()` with `target_dtype=self.dtype`. It's not clear if FlashPack's own `config.json` (mentioned in the benchmark script as `flashpack_config.json`) is the same as diffusers' `config.json`. If they're different files, loading back with `from_pretrained(use_flashpack=True)` might fail to reconstruct the model architecture since `from_config` needs the diffusers config.
-
-### 10. `output_loading_info` warning placement
-
-```python
-if output_loading_info:
-    logger.warning("`output_loading_info` is not supported with FlashPack.")
-    return model, {}
-```
-
-This logs a warning and returns an empty dict. The warning is fine, but returning `{}` instead of a proper `loading_info` structure (with `missing_keys`, `unexpected_keys`, etc.) could break code that looks up those keys in the result.
-
-### 11. No tests included
-
-The PR has no test files. At minimum there should be:
- Unit tests for `load_flashpack_checkpoint` (mocking `flashpack`)
- Unit tests for save/load roundtrip with `use_flashpack=True`
- Integration test for pipeline save/load
-
-### 12. FlashPack doesn't support sharding
-
-The `save_pretrained` FlashPack branch ignores `max_shard_size` entirely and always saves a single file. This is fine for the format, but the PR should either:
- Log a warning if `max_shard_size` is explicitly set alongside `use_flashpack=True`
- Document this limitation
-
---
-
-## Minor Issues
-
- The benchmark in the PR description shows FlashPack is actually **slower** for fp16 SD v1.5 (0.95x). The claimed benefit is only for bf16. This should be prominently noted.
- `FLASHPACK_WEIGHTS_NAME = "model.flashpack"` breaks the diffusers naming convention (`diffusion_pytorch_model.*` for other formats).
- The PR modifies `_get_ignore_patterns` but doesn't handle the case where both `use_safetensors` and `use_flashpack` are True.
- `filter_model_files` adds `FLASHPACK_WEIGHTS_NAME` to the known weights list but there are no corresponding tests for this filtering.
-
---
-
-## Verdict
-
-The PR needs significant work before it's mergeable. The critical issues are the `use_flashpack=True` default in `download()`, the `resolved_model_file[0]` indexing bug, the dead code path with `load_flashpack_checkpoint`, and the lack of tests. The integration pattern also doesn't feel consistent with how other formats (safetensors, GGUF) are integrated — FlashPack bypasses the standard state dict loading path entirely via its own `assign_from_file`, making it a special case that's harder to maintain.
--- a/pr_review/teacache_pr_12652_review.md
+++ b/pr_review/teacache_pr_12652_review.md
@@ -1,286 +0,0 @@
-# TeaCache PR #12652 Review Notes
-
-## PR Overview
-
- **PR**: https://github.com/huggingface/diffusers/pull/12652
- **Title**: Implement TeaCache
- **Author**: LawJarp-A (Prajwal A)
- **Status**: Open
- **Changes**: +1335 / -22 lines across 6 files
-
-### What is TeaCache?
-
-[TeaCache](https://huggingface.co/papers/2411.19108) (Timestep Embedding Aware Cache) is a training-free caching technique that speeds up diffusion model inference by **1.5x-2.6x** by reusing transformer block computations when consecutive timestep embeddings are similar.
-
-### Algorithm
-
-1. Extract modulated input from first transformer block (after norm1 + timestep embedding)
-2. Compute relative L1 distance vs previous timestep
-3. Apply model-specific polynomial rescaling: `c[0]*x^4 + c[1]*x^3 + c[2]*x^2 + c[3]*x + c[4]`
-4. Accumulate rescaled distance across timesteps
-5. If accumulated < threshold → Reuse cached residual (FAST)
-6. If accumulated >= threshold → Full transformer pass (SLOW, update cache)
-
---
-
-## The Mid-Forward Intercept Problem
-
-### Why TeaCache is Model-Specific
-
-TeaCache needs to intercept **within** a model's forward method, not just at module boundaries:
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│  Model Forward                                              │
-│                                                             │
-│  PREPROCESSING (must always run)                            │
-│  ├── x_embedder(hidden_states)                              │
-│  ├── time_text_embed(timestep, ...)                         │
-│  └── context_embedder(encoder_hidden_states)                │
-│                                                             │
-│  ═══════════════════════════════════════════════════════════│
-│  DECISION POINT ◄── TeaCache needs to intercept HERE        │
-│  └── Extract: transformer_blocks[0].norm1(hs, temb)[0]      │
-│  ═══════════════════════════════════════════════════════════│
-│                                                             │
-│  CACHEABLE REGION (can be skipped if cached)                │
-│  ├── for block in transformer_blocks: ...                   │
-│  └── for block in single_transformer_blocks: ...            │
-│                                                             │
-│  POSTPROCESSING (must always run)                           │
-│  ├── norm_out(hidden_states, temb)                          │
-│  └── proj_out(hidden_states)                                │
-└─────────────────────────────────────────────────────────────┘
-```
-
-PyTorch hooks only intercept at **module boundaries** (before/after `forward()`), not within a forward method. The `for` loop over blocks is Python control flow - there's no hook point to skip it.
-
-### Workaround: Custom Forward Replacement
-
-The PR replaces the entire model forward with a custom implementation that has cache logic inserted at the right point. This works but requires maintaining separate forward functions for each model.
-
---
-
-## Comparison of Caching Approaches
-
-### TeaCache vs FirstBlockCache vs FasterCache
-
-| Aspect | TeaCache | FirstBlockCache | FasterCache |
-|--------|----------|-----------------|-------------|
-| **Hook target** | Model forward | Transformer blocks | Attention layers |
-| **Decision signal** | Modulated input (norm1 output) | Block output residual | Iteration count |
-| **Where signal is** | Inside first block | Block boundary | Attention output |
-| **Model-specific needs** | norm1 structure | Block output format | Attention class type |
-| **Model-agnostic?** | ❌ No | ✅ Yes | ✅ Yes |
-
-### Why FirstBlockCache is Model-Agnostic
-
-FirstBlockCache uses the **first block's output residual** as its signal:
-
-```python
-# FirstBlockCache: hooks individual blocks
-def new_forward(self, module, *args, **kwargs):
-    original_hidden_states = args[0]
-    output = self.fn_ref.original_forward(*args, **kwargs)  # Run block fully
-    residual = output - original_hidden_states  # Signal from OUTPUT
-    should_compute = self._compare_residual(residual)
-    ...
-```
-
-It doesn't need to understand block internals - just input and output.
-
-### Why FasterCache is Model-Agnostic
-
-FasterCache hooks **attention layers** (not blocks) using class type checking:
-
-```python
-_ATTENTION_CLASSES = (Attention, MochiAttention, AttentionModuleMixin)
-
-for name, submodule in module.named_modules():
-    if isinstance(submodule, _ATTENTION_CLASSES):
-        # Hook this attention module
-```
-
-All transformer models use standardized attention classes.
-
---
-
-## Model Architecture Analysis
-
-### Models That Fit TeaCache Pattern
-
-Models with `norm1(hidden_states, temb)` returning modulated input:
-
-| Model | norm1 Signature | Modulation Location | Single Residual |
-|-------|----------------|---------------------|-----------------|
-| FLUX 1 | `norm1(hs, emb=temb) → (tensor, gate)` | Inside norm1 | ✅ |
-| FLUX Kontext | `norm1(hs, emb=temb) → (tensor, gate)` | Inside norm1 | ✅ |
-| Mochi | `norm1(hs, temb) → (tensor, g, s, g)` | Inside norm1 | ✅ |
-| Lumina2 | `norm1(hs, temb) → (tensor, gate)` | Inside norm1 | ✅ |
-
-### Models That DON'T Fit Pattern
-
-| Model | norm1 Signature | Modulation Location | Issue |
-|-------|----------------|---------------------|-------|
-| **FLUX 2** | `norm1(hs) → tensor` | Outside norm1 | Plain LayerNorm |
-| **Wan** | `norm1(hs) → tensor` | Outside norm1 | Plain LayerNorm |
-| **ZImage** | `attention_norm1(x) → tensor` | Outside norm1 | Plain LayerNorm |
-| **CogVideoX** | N/A (uses `emb` directly) | N/A | Dual residual needed |
-
-### FLUX 1 vs FLUX 2 Architecture Difference
-
-**FLUX 1** (AdaLayerNorm - modulation inside):
-```python
-class FluxTransformerBlock:
-    self.norm1 = AdaLayerNormZero(dim)  # Takes temb!
-
-    def forward(self, hidden_states, temb, ...):
-        norm_hs, gate = self.norm1(hidden_states, emb=temb)  # Modulation inside
-```
-
-**FLUX 2** (Plain LayerNorm - modulation outside):
-```python
-class Flux2TransformerBlock:
-    self.norm1 = nn.LayerNorm(dim)  # NO temb!
-
-    def forward(self, hidden_states, temb_mod_params_img, ...):
-        (shift_msa, scale_msa, gate_msa), ... = temb_mod_params_img
-        norm_hs = self.norm1(hidden_states)  # Plain norm
-        norm_hs = (1 + scale_msa) * norm_hs + shift_msa  # Modulation outside
-```
-
-FLUX 2 follows the Wan/ZImage pattern and would need a separate custom forward.
-
---
-
-## CogVideoX: The Architectural Outlier
-
-CogVideoX has two unique requirements that don't fit the pattern:
-
-### 1. Different Modulated Input Source
-
-```python
-# Other models: extract from norm1
-modulated_inp = block.norm1(hidden_states, temb)[0]
-
-# CogVideoX: uses timestep embedding directly
-modulated_inp = emb  # Just the embedding, computed before blocks!
-```
-
-### 2. Dual Residual Caching
-
-CogVideoX blocks return and modify TWO tensors:
-```python
-def forward(self, hidden_states, encoder_hidden_states, temb, ...):
-    # Both are modified!
-    return hidden_states, encoder_hidden_states
-```
-
-Requires caching two residuals:
-```python
-state.previous_residual = hs_output - hs_input
-state.previous_residual_encoder = enc_output - enc_input  # Extra!
-```
-
---
-
-## Recommendations
-
-### Simplification: FLUX-Only Support
-
-Given the architectural diversity, we recommend supporting only FLUX 1 and FLUX Kontext initially:
-
-```python
-_MODEL_CONFIG = {
-    "FluxKontext": {
-        "forward_func": _flux_teacache_forward,
-        "coefficients": [-1.04655119e03, 3.12563399e02, -1.69500694e01, 4.10995971e-01, 3.74537863e-02],
-    },
-    "Flux": {
-        "forward_func": _flux_teacache_forward,
-        "coefficients": [4.98651651e02, -2.83781631e02, 5.58554382e01, -3.82021401e00, 2.64230861e-01],
-    },
-}
-```
-
-### What to Remove from PR
-
-1. **CogVideoX support** - Dual residual architecture doesn't fit
-2. **Mochi support** - Can be added later if needed
-3. **Lumina2 support** - Can be added later if needed
-4. **FLUX 2 support** - Different architecture (plain LayerNorm)
-
-### Estimated Code Reduction
-
-| Component | Original (PR) | FLUX-Only |
-|-----------|---------------|-----------|
-| Forward functions | 4 (~400 lines) | 1 (~100 lines) |
-| Model configs | 10 entries | 2 entries |
-| State fields | 8 | 5 |
-| Utility functions | 6 | 3 |
-| **Total teacache.py** | ~900 lines | ~350 lines |
-
-### Simplified State
-
-```python
-class TeaCacheState(BaseState):
-    def __init__(self):
-        self.cnt = 0
-        self.num_steps = 0
-        self.accumulated_rel_l1_distance = 0.0
-        self.previous_modulated_input = None
-        self.previous_residual = None
-        # Removed: previous_residual_encoder (CogVideoX)
-        # Removed: cache_dict (Lumina2)
-        # Removed: uncond_seq_len (Lumina2)
-```
-
---
-
-## Why Custom Forwards Are Necessary
-
-Despite the maintenance burden, custom forwards are the pragmatic approach for TeaCache because:
-
-1. **Mid-forward intercept required** - Need to access `norm1` output before blocks run
-2. **Architectural diversity** - Models differ in where/how modulation happens
-3. **Block-level hooks insufficient** - Can't extract modulated input from block hooks
-4. **Algorithm requirements** - TeaCache paper specifically uses modulated input as signal
-
-### Alternative Approaches Considered
-
-| Approach | Works? | Issue |
-|----------|--------|-------|
-| Block-level hooks (like FirstBlockCache) | ❌ | Can't access modulated input inside block |
-| Attention-level hooks (like FasterCache) | ❌ | Different algorithm, not TeaCache |
-| Hook norm1 directly | ⚠️ | norm1 interface varies per model |
-| Hybrid (FirstBlockCache signal + TeaCache algorithm) | ⚠️ | Loses "optimal" signal per paper |
-
---
-
-## PR Code Quality Issues (From Review)
-
-1. **torch.compile incompatibility** - `.item()` calls in `_compute_rel_l1_distance` create graph breaks
-2. **Boundary check bug** - when `num_steps=0`, the right-hand side of `state.cnt == state.num_steps - 1` evaluates to `-1`, so the check can never match
-3. **Incomplete Lumina2 state reset** - `cache_dict` and `uncond_seq_len` not reset
-4. **Model auto-detection fragility** - Substring matching relies on iteration order
-
---
-
-## Extension Path
-
-If support for additional models is needed later:
-
-1. **Mochi** - Same pattern as FLUX, just add coefficients and reuse `_flux_teacache_forward` or create similar
-2. **Lumina2** - Same pattern but needs per-sequence-length caching for CFG
-3. **FLUX 2 / Wan / ZImage** - Need separate forwards that extract modulated input differently
-4. **CogVideoX** - Needs dual residual support, significant additional complexity
-
---
-
-## Summary
-
- **TeaCache requires custom forwards** due to mid-forward intercept requirement
- **FLUX 1 + FLUX Kontext only** is the recommended scope for initial implementation
- **~60% code reduction** possible by removing unsupported models
- **Clear extension path** for adding models later as needed
- **Maintenance burden** is acceptable given the architectural constraints
--- a/release_notes/v0.37.0.md
+++ b/release_notes/v0.37.0.md
@@ -1,129 +0,0 @@
-# Diffusers v0.37.0 Release Notes
-
-*Release based on 191 commits since v0.36.0*
-
---
-
-## Highlights
-
- **Modular Pipelines overhaul**: Major investment in the modular pipeline system with explicit workflow support, improved loaders, documentation, and modular implementations for Wan, Flux2, Z-Image, Qwen, and Mellon pipelines.
- **New pipelines and models**: Cosmos Predict2.5, LTX 2.0 Video, LongCat-Image, Fibo Edit, Z-Image Omni Base, and more.
- **Distributed inference improvements**: Unified Sequence Parallel attention, Ulysses Anything Attention, and context parallel support in native flash attention.
- **Python 3.8 dropped**: Sunset Python 3.8 and cleaned up explicit `typing` exports.
-
---
-
-## New Pipelines and Models
-
- **Cosmos Predict2.5**: Base inference pipeline, scheduler, and checkpoint conversion; 14B model support (#12852, #12863)
- **Cosmos Transfer2.5**: General transfer pipelines for segmentation, depth, blur, and edge (#13066)
- **LTX 2.0 Video Pipelines**: New video generation pipelines (#12915), distilled checkpoint support (#12934), single-file loading (#12983), LoRA support (#12933), long multi-prompt (#12614)
- **LongCat-Image**: New pipeline with offloading/quantization support and regional compile acceleration (#12828, #12963, #12699, #13019, #13021)
- **Fibo Edit Pipeline**: New editing pipeline (#12930)
- **Z-Image Omni Base**: New implementation (#12857)
- **Z-Image Turbo ControlNet**: ControlNet support for Z-Image Turbo (#12792)
- **Z-Image Inpaint Pipeline**: Inpainting support (#13006)
- **Z-Image ControlNet CFG**: CFG support for Z-Image ControlNet (#13080)
- **Chroma Inpaint Pipeline**: New inpainting pipeline for Chroma (#12848)
- **Flux2 Klein**: New model variant (#12982)
- **Qwen Image Edit 2511**: New editing support (#12839)
- **Qwen Image Layered Support** (#12853)
-
-## Modular Pipelines
-
- Explicit workflow support for modular pipelines (#13028)
- Modular implementations for: Wan (#13063), Flux2 (#12763), Z-Image (#12808), Qwen (#12872), Mellon (#12978, #12924, #13051)
- Improved loader support (#13025)
- Custom block tests (#12557)
- Auto-docstring generation and documentation refactors (#12958)
- Quick start guide (#13029)
- Guard `ModularPipeline.blocks` attribute (#13014)
- Better docstrings and template pipeline card (#13072, #12932)
-
-## Core Improvements
-
- **Device-type device maps with offloading support** (#12811)
- **`disable_mmap` in pipeline `from_pretrained`** (#12854)
- **`apply_lora_scale` helper** to remove boilerplate (#12994)
- **MagCache support**: Caching mechanism for faster inference (#12744)
- **Mambo-G Guidance**: New guider implementation (#12862)
- **Laplace Scheduler for DDPM** (#11320)
- **Custom sigmas in UniPCMultistepScheduler** (#12109)
- **Control-LoRA support** (#10686)
- **Latent Perceptual Loss (LPL) for SDXL** (#11573)
- **MultiControlNet support for SD3 Inpainting** (#11251)
- Remove 8-bit device restriction (#12972)
- Graceful error for unsupported attn-backend / context-parallel combos (#12832)
- Handle progress bar and logging in distributed environments (#12806)
- Remove unneeded autoencoder methods from `AutoencoderMixin` subclasses (#12873)
- Remove k-diffusion support (#13152)
- Flag Flax schedulers as deprecated (#13031)
-
-## Distributed Inference
-
- **Unified Sequence Parallel attention** (#12693)
- **Ulysses Anything Attention** (#12996)
- **Context parallel in native flash attention** (#12829)
- NPU Ulysses attention support (#12919)
- Fix Wan 2.1 I2V context parallel (#12909)
- Fix Qwen-Image context parallel (#12970)
-
-## LoRA
-
- Z-Image LoRA training (#13056)
- Fix non-diffusers LoRA key handling for Flux2 (#13119)
- Fix LoRA loading for Flux2 Klein with adaptive block enumeration (#13030)
- Fix wrong LTX2 LoRA mixin (#13144)
-
-## Bug Fixes
-
- Fix QwenImageEditPlus on NPU (#13017)
- Fix MT5Tokenizer → use `T5Tokenizer` for Transformers v5.0+ compatibility (#12877)
- Fix Wan/WanI2V patchification (#13038)
- Fix LTX-2 inference with `num_videos_per_prompt > 1` and CFG (#13121)
- Fix Flux2 img2img prediction (#12855)
- Fix QwenImage `txt_seq_lens` handling (#12702)
- Fix `prefix_token_len` bug (#12845)
- Fix ftfy imports in Wan and SkyReels-V2 (#12314, #13113)
- Fix `is_fsdp` determination (#12960)
- Fix GLM-Image `get_image_features` API (#13052)
- Fix Wan 2.2 when either transformer isn't present (#13055)
- Fix guider issue (#13147)
- Fix torchao quantizer for new versions (#12901)
- Fix GGUF for unquantized types with unquantize kernels (#12498)
- Make Qwen hidden states contiguous for torchao (#13081)
- Make Flux hidden states contiguous (#13068)
- Fix Kandinsky 5 hardcoded CUDA autocast (#12814)
- Fix `aiter` availability check (#13059)
- Fix attention mask check for unsupported backends (#12892)
- Allow `prompt` and `prior_token_ids` simultaneously in `GlmImagePipeline` (#13092)
- GLM-Image batch support (#13007)
- Cosmos 2.5 Video2World frame extraction fix (#13018)
- ResNet: only use contiguous in training mode (#12977)
-
-## Testing and CI
-
- Refactor model tests (#12822)
- Refactor Wan model tests (#13082)
- Accept `recompile_limit` from user in tests (#13150)
- CodeQL workflow for security analysis (#12917)
- Upgrade GitHub Actions for Node 24 compatibility (#12865, #12866)
- Fix `setuptools` / `pkg_resources` CI bugs (#13129, #13132)
- CUDA 12.9 upgrade (#13045)
- FSDP option for Flux2 (#12860)
-
-## Documentation
-
- Custom code AutoModel guide (#13099)
- Remote inference docs (#12372)
- Improved distributed inference docs (#12810, #12827, #12971)
- Improved caching docs (#12684)
- Numerous scheduler docstring improvements (#12798, #12871, #12928, #12931, #12936, #12992, #13010, #13020, #13023, #13024, #13027, #13044, #13083, #13085, #13122, #13127, #13130)
- Various typo and syntax fixes
-
-## Breaking Changes
-
- **Python 3.8 support removed** (#12524)
- **k-diffusion removed** (#13152)
- **Flax schedulers flagged as deprecated** (#13031)
- ControlNet implementations outside the controlnet module removed (#12152)
--- a/scripts/compare_test_coverage.py
+++ b/scripts/compare_test_coverage.py
@@ -1,183 +0,0 @@
-#!/usr/bin/env python3
-"""
-Compare test coverage between main and model-test-refactor branches
-for the Flux transformer tests.
-
-Usage:
-    python scripts/compare_test_coverage.py
-"""
-
-import subprocess
-
-
-TEST_FILE = "tests/models/transformers/test_models_transformer_flux.py"
-BRANCHES = ["main", "model-test-refactor"]
-
-
-def run_command(cmd, capture=True):
-    """Run a shell command and return output."""
-    result = subprocess.run(cmd, shell=True, capture_output=capture, text=True)
-    return result.stdout, result.stderr, result.returncode
-
-
-def get_current_branch():
-    """Get the current git branch name."""
-    stdout, _, _ = run_command("git branch --show-current")
-    return stdout.strip()
-
-
-def stash_changes():
-    """Stash any uncommitted changes."""
-    run_command("git stash")
-
-
-def pop_stash():
-    """Pop stashed changes."""
-    run_command("git stash pop")
-
-
-def checkout_branch(branch):
-    """Checkout a git branch."""
-    _, stderr, code = run_command(f"git checkout {branch}")
-    if code != 0:
-        print(f"Failed to checkout {branch}: {stderr}")
-        return False
-    return True
-
-
-def collect_tests(test_file):
-    """Collect tests from a test file and return test info."""
-    cmd = f"python -m pytest {test_file} --collect-only -q 2>/dev/null"
-    stdout, stderr, code = run_command(cmd)
-
-    tests = []
-    for line in stdout.strip().split("\n"):
-        if "::" in line and not line.startswith("="):
-            tests.append(line.strip())
-
-    return tests
-
-
-def run_tests_verbose(test_file):
-    """Run tests and capture pass/skip/fail status."""
-    cmd = f"python -m pytest {test_file} -v --tb=no 2>&1"
-    stdout, _, _ = run_command(cmd)
-
-    results = {"passed": [], "skipped": [], "failed": [], "errors": []}
-
-    for line in stdout.split("\n"):
-        if " PASSED" in line:
-            test_name = line.split(" PASSED")[0].strip()
-            results["passed"].append(test_name)
-        elif " SKIPPED" in line:
-            test_name = line.split(" SKIPPED")[0].strip()
-            reason = ""
-            if "SKIPPED" in line and "[" in line:
-                reason = line.split("[")[-1].rstrip("]") if "[" in line else ""
-            results["skipped"].append((test_name, reason))
-        elif " FAILED" in line:
-            test_name = line.split(" FAILED")[0].strip()
-            results["failed"].append(test_name)
-        elif " ERROR" in line:
-            test_name = line.split(" ERROR")[0].strip()
-            results["errors"].append(test_name)
-
-    return results
-
-
-def compare_results(main_results, pr_results):
-    """Compare test results between branches."""
-    print("\n" + "=" * 70)
-    print("COVERAGE COMPARISON REPORT")
-    print("=" * 70)
-
-    print("\n## Test Counts")
-    print(f"{'Category':<20} {'main':<15} {'PR':<15} {'Diff':<10}")
-    print("-" * 60)
-
-    for category in ["passed", "skipped", "failed", "errors"]:
-        main_count = len(main_results[category])
-        pr_count = len(pr_results[category])
-        diff = pr_count - main_count
-        diff_str = f"+{diff}" if diff > 0 else str(diff)
-        print(f"{category:<20} {main_count:<15} {pr_count:<15} {diff_str:<10}")
-
-    main_tests = set(main_results["passed"] + [t[0] for t in main_results["skipped"]])
-    pr_tests = set(pr_results["passed"] + [t[0] for t in pr_results["skipped"]])
-
-    missing_in_pr = main_tests - pr_tests
-    new_in_pr = pr_tests - main_tests
-
-    if missing_in_pr:
-        print("\n## Tests in main but MISSING in PR:")
-        for test in sorted(missing_in_pr):
-            print(f"  - {test}")
-
-    if new_in_pr:
-        print("\n## NEW tests in PR (not in main):")
-        for test in sorted(new_in_pr):
-            print(f"  + {test}")
-
-    print("\n## Skipped Tests Comparison")
-    main_skipped = {t[0]: t[1] for t in main_results["skipped"]}
-    pr_skipped = {t[0]: t[1] for t in pr_results["skipped"]}
-
-    newly_skipped = set(pr_skipped.keys()) - set(main_skipped.keys())
-    no_longer_skipped = set(main_skipped.keys()) - set(pr_skipped.keys())
-
-    if newly_skipped:
-        print("\nNewly skipped in PR:")
-        for test in sorted(newly_skipped):
-            print(f"  - {test}: {pr_skipped.get(test, 'unknown reason')}")
-
-    if no_longer_skipped:
-        print("\nNo longer skipped in PR (now running):")
-        for test in sorted(no_longer_skipped):
-            print(f"  + {test}")
-
-    if not newly_skipped and not no_longer_skipped:
-        print("\nNo changes in skipped tests.")
-
-    print("\n" + "=" * 70)
-
-
-def main():
-    original_branch = get_current_branch()
-    print(f"Current branch: {original_branch}")
-
-    results = {}
-
-    print("Stashing uncommitted changes...")
-    stash_changes()
-
-    try:
-        for branch in BRANCHES:
-            print(f"\n--- Analyzing branch: {branch} ---")
-
-            if not checkout_branch(branch):
-                print(f"Skipping {branch}")
-                continue
-
-            print(f"Collecting and running tests from {TEST_FILE}...")
-            results[branch] = run_tests_verbose(TEST_FILE)
-
-            print(f"  Passed: {len(results[branch]['passed'])}")
-            print(f"  Skipped: {len(results[branch]['skipped'])}")
-            print(f"  Failed: {len(results[branch]['failed'])}")
-
-        checkout_branch(original_branch)
-
-        if "main" in results and "model-test-refactor" in results:
-            compare_results(results["main"], results["model-test-refactor"])
-        else:
-            print("Could not compare - missing results from one or both branches")
-
-    finally:
-        print("\nRestoring stashed changes...")
-        pop_stash()
-
-        checkout_branch(original_branch)
-
-
-if __name__ == "__main__":
-    main()
--- a/scripts/convert_cosmos_to_diffusers.py
+++ b/scripts/convert_cosmos_to_diffusers.py
@@ -94,15 +94,9 @@ python scripts/convert_cosmos_to_diffusers.py \
    --transformer_type Cosmos-2.5-Transfer-General-2B \
    --transformer_ckpt_path $transformer_ckpt_path \
    --vae_type wan2.1 \
-    --output_path converted/transfer/2b/general/depth/pipeline \
+    --output_path converted/transfer/2b/general/depth \
    --save_pipeline

-python scripts/convert_cosmos_to_diffusers.py \
-    --transformer_type Cosmos-2.5-Transfer-General-2B \
-    --transformer_ckpt_path $transformer_ckpt_path \
-    --vae_type wan2.1 \
-    --output_path converted/transfer/2b/general/depth/models
-
 # edge
 transformer_ckpt_path=~/.cache/huggingface/hub/models--nvidia--Cosmos-Transfer2.5-2B/snapshots/eb5325b77d358944da58a690157dd2b8071bbf85/general/edge/61f5694b-0ad5-4ecd-8ad7-c8545627d125_ema_bf16.pt

@@ -126,15 +120,9 @@ python scripts/convert_cosmos_to_diffusers.py \
    --transformer_type Cosmos-2.5-Transfer-General-2B \
    --transformer_ckpt_path $transformer_ckpt_path \
    --vae_type wan2.1 \
-    --output_path converted/transfer/2b/general/blur/pipeline \
+    --output_path converted/transfer/2b/general/blur \
    --save_pipeline

-python scripts/convert_cosmos_to_diffusers.py \
-    --transformer_type Cosmos-2.5-Transfer-General-2B \
-    --transformer_ckpt_path $transformer_ckpt_path \
-    --vae_type wan2.1 \
-    --output_path converted/transfer/2b/general/blur/models
-
 # seg
 transformer_ckpt_path=~/.cache/huggingface/hub/models--nvidia--Cosmos-Transfer2.5-2B/snapshots/eb5325b77d358944da58a690157dd2b8071bbf85/general/seg/5136ef49-6d8d-42e8-8abf-7dac722a304a_ema_bf16.pt

@@ -142,14 +130,8 @@ python scripts/convert_cosmos_to_diffusers.py \
    --transformer_type Cosmos-2.5-Transfer-General-2B \
    --transformer_ckpt_path $transformer_ckpt_path \
    --vae_type wan2.1 \
-    --output_path converted/transfer/2b/general/seg/pipeline \
+    --output_path converted/transfer/2b/general/seg \
    --save_pipeline
-
-python scripts/convert_cosmos_to_diffusers.py \
-    --transformer_type Cosmos-2.5-Transfer-General-2B \
-    --transformer_ckpt_path $transformer_ckpt_path \
-    --vae_type wan2.1 \
-    --output_path converted/transfer/2b/general/seg/models
 ```
 """

--- a/src/diffusers/hooks/_common.py
+++ b/src/diffusers/hooks/_common.py
@@ -48,7 +48,6 @@ _GO_LC_SUPPORTED_PYTORCH_LAYERS = (
    torch.nn.ConvTranspose2d,
    torch.nn.ConvTranspose3d,
    torch.nn.Linear,
-    torch.nn.Embedding,
    # TODO(aryan): look into torch.nn.LayerNorm, torch.nn.GroupNorm later, seems to be causing some issues with CogVideoX
    # because of double invocation of the same norm layer in CogVideoXLayerNorm
 )
--- a/src/diffusers/loaders/lora_conversion_utils.py
+++ b/src/diffusers/loaders/lora_conversion_utils.py
@@ -856,7 +856,7 @@ def _convert_kohya_flux_lora_to_diffusers(state_dict):
                )
            state_dict = {k: v for k, v in state_dict.items() if not k.startswith("text_encoders.t5xxl.transformer.")}

-        has_diffb = any("diff_b" in k and k.startswith(("lora_unet_", "lora_te_", "lora_te1_")) for k in state_dict)
+        has_diffb = any("diff_b" in k and k.startswith(("lora_unet_", "lora_te_")) for k in state_dict)
        if has_diffb:
            zero_status_diff_b = state_dict_all_zero(state_dict, ".diff_b")
            if zero_status_diff_b:
@@ -895,7 +895,7 @@ def _convert_kohya_flux_lora_to_diffusers(state_dict):
        state_dict = {
            _custom_replace(k, limit_substrings): v
            for k, v in state_dict.items()
-            if k.startswith(("lora_unet_", "lora_te_", "lora_te1_"))
+            if k.startswith(("lora_unet_", "lora_te_"))
        }

        if any("text_projection" in k for k in state_dict):
--- a/src/diffusers/loaders/lora_pipeline.py
+++ b/src/diffusers/loaders/lora_pipeline.py
@@ -5472,10 +5472,6 @@ class Flux2LoraLoaderMixin(LoraBaseMixin):
            logger.warning(warn_msg)
            state_dict = {k: v for k, v in state_dict.items() if "dora_scale" not in k}

-        is_peft_format = any(k.startswith("base_model.model.") for k in state_dict)
-        if is_peft_format:
-            state_dict = {k.replace("base_model.model.", "diffusion_model."): v for k, v in state_dict.items()}
-
        is_ai_toolkit = any(k.startswith("diffusion_model.") for k in state_dict)
        if is_ai_toolkit:
            state_dict = _convert_non_diffusers_flux2_lora_to_diffusers(state_dict)
--- a/src/diffusers/loaders/textual_inversion.py
+++ b/src/diffusers/loaders/textual_inversion.py
@@ -22,12 +22,7 @@ from tokenizers import Tokenizer as TokenizerFast
 from torch import nn

 from ..models.modeling_utils import load_state_dict
-from ..utils import (
-    _get_model_file,
-    is_accelerate_available,
-    is_transformers_available,
-    logging,
-)
+from ..utils import _get_model_file, is_accelerate_available, is_transformers_available, logging


 if is_transformers_available():
--- a/src/diffusers/models/attention_dispatch.py
+++ b/src/diffusers/models/attention_dispatch.py
@@ -62,8 +62,6 @@ _REQUIRED_FLEX_VERSION = "2.5.0"
 _REQUIRED_XLA_VERSION = "2.2"
 _REQUIRED_XFORMERS_VERSION = "0.0.29"

-logger = get_logger(__name__)  # pylint: disable=invalid-name
-
 _CAN_USE_FLASH_ATTN = is_flash_attn_available() and is_flash_attn_version(">=", _REQUIRED_FLASH_VERSION)
 _CAN_USE_FLASH_ATTN_3 = is_flash_attn_3_available()
 _CAN_USE_AITER_ATTN = is_aiter_available() and is_aiter_version(">=", _REQUIRED_AITER_VERSION)
@@ -75,18 +73,8 @@ _CAN_USE_XFORMERS_ATTN = is_xformers_available() and is_xformers_version(">=", _


 if _CAN_USE_FLASH_ATTN:
-    try:
-        from flash_attn import flash_attn_func, flash_attn_varlen_func
-        from flash_attn.flash_attn_interface import _wrapped_flash_attn_backward, _wrapped_flash_attn_forward
-    except (ImportError, OSError, RuntimeError) as e:
-        # Handle ABI mismatch or other import failures gracefully.
-        # This can happen when flash_attn was compiled against a different PyTorch version.
-        logger.warning(f"flash_attn is installed but failed to import: {e}. Falling back to native PyTorch attention.")
-        _CAN_USE_FLASH_ATTN = False
-        flash_attn_func = None
-        flash_attn_varlen_func = None
-        _wrapped_flash_attn_backward = None
-        _wrapped_flash_attn_forward = None
+    from flash_attn import flash_attn_func, flash_attn_varlen_func
+    from flash_attn.flash_attn_interface import _wrapped_flash_attn_backward, _wrapped_flash_attn_forward
 else:
    flash_attn_func = None
    flash_attn_varlen_func = None
@@ -95,47 +83,26 @@ else:


 if _CAN_USE_FLASH_ATTN_3:
-    try:
-        from flash_attn_interface import flash_attn_func as flash_attn_3_func
-        from flash_attn_interface import flash_attn_varlen_func as flash_attn_3_varlen_func
-    except (ImportError, OSError, RuntimeError) as e:
-        logger.warning(f"flash_attn_3 failed to import: {e}. Falling back to native attention.")
-        _CAN_USE_FLASH_ATTN_3 = False
-        flash_attn_3_func = None
-        flash_attn_3_varlen_func = None
+    from flash_attn_interface import flash_attn_func as flash_attn_3_func
+    from flash_attn_interface import flash_attn_varlen_func as flash_attn_3_varlen_func
 else:
    flash_attn_3_func = None
    flash_attn_3_varlen_func = None

 if _CAN_USE_AITER_ATTN:
-    try:
-        from aiter import flash_attn_func as aiter_flash_attn_func
-    except (ImportError, OSError, RuntimeError) as e:
-        logger.warning(f"aiter failed to import: {e}. Falling back to native attention.")
-        _CAN_USE_AITER_ATTN = False
-        aiter_flash_attn_func = None
+    from aiter import flash_attn_func as aiter_flash_attn_func
 else:
    aiter_flash_attn_func = None

 if _CAN_USE_SAGE_ATTN:
-    try:
-        from sageattention import (
-            sageattn,
-            sageattn_qk_int8_pv_fp8_cuda,
-            sageattn_qk_int8_pv_fp8_cuda_sm90,
-            sageattn_qk_int8_pv_fp16_cuda,
-            sageattn_qk_int8_pv_fp16_triton,
-            sageattn_varlen,
-        )
-    except (ImportError, OSError, RuntimeError) as e:
-        logger.warning(f"sageattention failed to import: {e}. Falling back to native attention.")
-        _CAN_USE_SAGE_ATTN = False
-        sageattn = None
-        sageattn_qk_int8_pv_fp8_cuda = None
-        sageattn_qk_int8_pv_fp8_cuda_sm90 = None
-        sageattn_qk_int8_pv_fp16_cuda = None
-        sageattn_qk_int8_pv_fp16_triton = None
-        sageattn_varlen = None
+    from sageattention import (
+        sageattn,
+        sageattn_qk_int8_pv_fp8_cuda,
+        sageattn_qk_int8_pv_fp8_cuda_sm90,
+        sageattn_qk_int8_pv_fp16_cuda,
+        sageattn_qk_int8_pv_fp16_triton,
+        sageattn_varlen,
+    )
 else:
    sageattn = None
    sageattn_qk_int8_pv_fp16_cuda = None
@@ -146,48 +113,26 @@ else:


 if _CAN_USE_FLEX_ATTN:
-    try:
-        # We cannot import the flex_attention function from the package directly because it is expected (from the
-        # pytorch documentation) that the user may compile it. If we import directly, we will not have access to the
-        # compiled function.
-        import torch.nn.attention.flex_attention as flex_attention
-    except (ImportError, OSError, RuntimeError) as e:
-        logger.warning(f"flex_attention failed to import: {e}. Falling back to native attention.")
-        _CAN_USE_FLEX_ATTN = False
-        flex_attention = None
-else:
-    flex_attention = None
+    # Import the module rather than the `flex_attention` function itself. Per the PyTorch documentation, users
+    # may compile the function, and importing it directly here would not give us access to the compiled
+    # version.
+    import torch.nn.attention.flex_attention as flex_attention


 if _CAN_USE_NPU_ATTN:
-    try:
-        from torch_npu import npu_fusion_attention
-    except (ImportError, OSError, RuntimeError) as e:
-        logger.warning(f"torch_npu failed to import: {e}. Falling back to native attention.")
-        _CAN_USE_NPU_ATTN = False
-        npu_fusion_attention = None
+    from torch_npu import npu_fusion_attention
 else:
    npu_fusion_attention = None


 if _CAN_USE_XLA_ATTN:
-    try:
-        from torch_xla.experimental.custom_kernel import flash_attention as xla_flash_attention
-    except (ImportError, OSError, RuntimeError) as e:
-        logger.warning(f"torch_xla failed to import: {e}. Falling back to native attention.")
-        _CAN_USE_XLA_ATTN = False
-        xla_flash_attention = None
+    from torch_xla.experimental.custom_kernel import flash_attention as xla_flash_attention
 else:
    xla_flash_attention = None


 if _CAN_USE_XFORMERS_ATTN:
-    try:
-        import xformers.ops as xops
-    except (ImportError, OSError, RuntimeError) as e:
-        logger.warning(f"xformers failed to import: {e}. Falling back to native attention.")
-        _CAN_USE_XFORMERS_ATTN = False
-        xops = None
+    import xformers.ops as xops
 else:
    xops = None

@@ -213,6 +158,8 @@ else:
    _register_fake = register_fake_no_op


+logger = get_logger(__name__)  # pylint: disable=invalid-name
+
 # TODO(aryan): Add support for the following:
 # - Sage Attention++
 # - block sparse, radial and other attention methods
@@ -329,11 +276,7 @@ class _HubKernelConfig:
 _HUB_KERNELS_REGISTRY: dict["AttentionBackendName", _HubKernelConfig] = {
    # TODO: temporary revision for now. Remove when merged upstream into `main`.
    AttentionBackendName._FLASH_3_HUB: _HubKernelConfig(
-        repo_id="kernels-community/flash-attn3",
-        function_attr="flash_attn_func",
-        revision="fake-ops-return-probs",
-        wrapped_forward_attr="flash_attn_interface._flash_attn_forward",
-        wrapped_backward_attr="flash_attn_interface._flash_attn_backward",
+        repo_id="kernels-community/flash-attn3", function_attr="flash_attn_func", revision="fake-ops-return-probs"
    ),
    AttentionBackendName._FLASH_3_VARLEN_HUB: _HubKernelConfig(
        repo_id="kernels-community/flash-attn3",
@@ -733,7 +676,7 @@ def _wrapped_flash_attn_3(
 ) -> tuple[torch.Tensor, torch.Tensor]:
    # Hardcoded for now because pytorch does not support tuple/int type hints
    window_size = (-1, -1)
-    result = flash_attn_3_func(
+    out, lse, *_ = flash_attn_3_func(
        q=q,
        k=k,
        v=v,
@@ -750,9 +693,7 @@ def _wrapped_flash_attn_3(
        pack_gqa=pack_gqa,
        deterministic=deterministic,
        sm_margin=sm_margin,
-        return_attn_probs=True,
    )
-    out, lse, *_ = result
    lse = lse.permute(0, 2, 1)
    return out, lse

@@ -1296,62 +1237,36 @@ def _flash_attention_3_hub_forward_op(
    if enable_gqa:
        raise ValueError("`enable_gqa` is not yet supported for flash-attn 3 hub kernels.")

-    config = _HUB_KERNELS_REGISTRY[AttentionBackendName._FLASH_3_HUB]
-    wrapped_forward_fn = config.wrapped_forward_fn
-    if wrapped_forward_fn is None:
-        raise RuntimeError(
-            "Flash attention 3 hub kernels must expose `flash_attn_interface._flash_attn_forward` "
-            "for context parallel execution."
-        )
-
-    if scale is None:
-        scale = query.shape[-1] ** (-0.5)
-
-    out, softmax_lse, *_ = wrapped_forward_fn(
-        query,
-        key,
-        value,
-        None,
-        None,  # k_new, v_new
-        None,  # qv
-        None,  # out
-        None,
-        None,
-        None,  # cu_seqlens_q/k/k_new
-        None,
-        None,  # seqused_q/k
-        None,
-        None,  # max_seqlen_q/k
-        None,
-        None,
-        None,  # page_table, kv_batch_idx, leftpad_k
-        None,
-        None,
-        None,  # rotary_cos/sin, seqlens_rotary
-        None,
-        None,
-        None,  # q_descale, k_descale, v_descale
-        scale,
+    func = _HUB_KERNELS_REGISTRY[AttentionBackendName._FLASH_3_HUB].kernel_fn
+    out = func(
+        q=query,
+        k=key,
+        v=value,
+        softmax_scale=scale,
        causal=is_causal,
-        window_size_left=window_size[0],
-        window_size_right=window_size[1],
-        attention_chunk=0,
+        qv=None,
+        q_descale=None,
+        k_descale=None,
+        v_descale=None,
+        window_size=window_size,
        softcap=softcap,
        num_splits=num_splits,
        pack_gqa=pack_gqa,
+        deterministic=deterministic,
        sm_margin=sm_margin,
+        return_attn_probs=return_lse,
    )

-    lse = softmax_lse.permute(0, 2, 1).contiguous() if return_lse else None
+    lse = None
+    if return_lse:
+        out, lse = out
+        lse = lse.permute(0, 2, 1).contiguous()

    if _save_ctx:
-        ctx.save_for_backward(query, key, value, out, softmax_lse)
+        ctx.save_for_backward(query, key, value)
        ctx.scale = scale
        ctx.is_causal = is_causal
-        ctx.window_size = window_size
-        ctx.softcap = softcap
-        ctx.deterministic = deterministic
-        ctx.sm_margin = sm_margin
+        ctx._hub_kernel = func

    return (out, lse) if return_lse else out

@@ -1360,49 +1275,54 @@ def _flash_attention_3_hub_backward_op(
    ctx: torch.autograd.function.FunctionCtx,
    grad_out: torch.Tensor,
    *args,
-    **kwargs,
+    window_size: tuple[int, int] = (-1, -1),
+    softcap: float = 0.0,
+    num_splits: int = 1,
+    pack_gqa: bool | None = None,
+    deterministic: bool = False,
+    sm_margin: int = 0,
 ):
-    config = _HUB_KERNELS_REGISTRY[AttentionBackendName._FLASH_3_HUB]
-    wrapped_backward_fn = config.wrapped_backward_fn
-    if wrapped_backward_fn is None:
-        raise RuntimeError(
-            "Flash attention 3 hub kernels must expose `flash_attn_interface._flash_attn_backward` "
-            "for context parallel execution."
+    query, key, value = ctx.saved_tensors
+    kernel_fn = ctx._hub_kernel
+    # NOTE: Unlike the FA2 hub kernel, the FA3 hub kernel does not expose separate wrapped forward/backward
+    # primitives (no `wrapped_forward_attr`/`wrapped_backward_attr` in its `_HubKernelConfig`). We
+    # therefore rerun the forward pass under `torch.enable_grad()` and differentiate through it with
+    # `torch.autograd.grad()`. This is a second forward pass during backward; it can be avoided once
+    # the FA3 hub exposes a dedicated fused backward kernel (analogous to `_wrapped_flash_attn_backward`
+    # in the FA2 hub), at which point this can be refactored to match `_flash_attention_hub_backward_op`.
+    with torch.enable_grad():
+        query_r = query.detach().requires_grad_(True)
+        key_r = key.detach().requires_grad_(True)
+        value_r = value.detach().requires_grad_(True)
+
+        out = kernel_fn(
+            q=query_r,
+            k=key_r,
+            v=value_r,
+            softmax_scale=ctx.scale,
+            causal=ctx.is_causal,
+            qv=None,
+            q_descale=None,
+            k_descale=None,
+            v_descale=None,
+            window_size=window_size,
+            softcap=softcap,
+            num_splits=num_splits,
+            pack_gqa=pack_gqa,
+            deterministic=deterministic,
+            sm_margin=sm_margin,
+            return_attn_probs=False,
        )
+        if isinstance(out, tuple):
+            out = out[0]

-    query, key, value, out, softmax_lse = ctx.saved_tensors
-    grad_query = torch.empty_like(query)
-    grad_key = torch.empty_like(key)
-    grad_value = torch.empty_like(value)
-
-    wrapped_backward_fn(
-        grad_out,
-        query,
-        key,
-        value,
-        out,
-        softmax_lse,
-        None,
-        None,  # cu_seqlens_q, cu_seqlens_k
-        None,
-        None,  # seqused_q, seqused_k
-        None,
-        None,  # max_seqlen_q, max_seqlen_k
-        grad_query,
-        grad_key,
-        grad_value,
-        ctx.scale,
-        ctx.is_causal,
-        ctx.window_size[0],
-        ctx.window_size[1],
-        ctx.softcap,
-        ctx.deterministic,
-        ctx.sm_margin,
-    )
-
-    grad_query = grad_query[..., : grad_out.shape[-1]]
-    grad_key = grad_key[..., : grad_out.shape[-1]]
-    grad_value = grad_value[..., : grad_out.shape[-1]]
+        grad_query, grad_key, grad_value = torch.autograd.grad(
+            out,
+            (query_r, key_r, value_r),
+            grad_out,
+            retain_graph=False,
+            allow_unused=False,
+        )

    return grad_query, grad_key, grad_value

@@ -2703,7 +2623,7 @@ def _flash_varlen_attention_3(
    key_packed = torch.cat(key_valid, dim=0)
    value_packed = torch.cat(value_valid, dim=0)

-    result = flash_attn_3_varlen_func(
+    out, lse, *_ = flash_attn_3_varlen_func(
        q=query_packed,
        k=key_packed,
        v=value_packed,
@@ -2713,13 +2633,7 @@ def _flash_varlen_attention_3(
        max_seqlen_k=max_seqlen_k,
        softmax_scale=scale,
        causal=is_causal,
-        return_attn_probs=return_lse,
    )
-    if isinstance(result, tuple):
-        out, lse, *_ = result
-    else:
-        out = result
-        lse = None
    out = out.unflatten(0, (batch_size, -1))

    return (out, lse) if return_lse else out
--- a/src/diffusers/models/auto_model.py
+++ b/src/diffusers/models/auto_model.py
@@ -30,126 +30,10 @@ class AutoModel(ConfigMixin):
    def __init__(self, *args, **kwargs):
        raise EnvironmentError(
            f"{self.__class__.__name__} is designed to be instantiated "
-            f"using the `{self.__class__.__name__}.from_pretrained(pretrained_model_name_or_path)`, "
-            f"`{self.__class__.__name__}.from_config(config)`, or "
+            f"using the `{self.__class__.__name__}.from_pretrained(pretrained_model_name_or_path)` or "
            f"`{self.__class__.__name__}.from_pipe(pipeline)` methods."
        )

-    @classmethod
-    def from_config(cls, pretrained_model_name_or_path_or_dict: str | os.PathLike | dict | None = None, **kwargs):
-        r"""
-        Instantiate a model from a config dictionary or a pretrained model configuration file with random weights (no
-        pretrained weights are loaded).
-
-        Parameters:
-            pretrained_model_name_or_path_or_dict (`str`, `os.PathLike`, or `dict`):
-                Can be either:
-
-                    - A string, the *model id* (for example `google/ddpm-celebahq-256`) of a pretrained model
-                      configuration hosted on the Hub.
-                    - A path to a *directory* (for example `./my_model_directory`) containing a model configuration
-                      file.
-                    - A config dictionary.
-
-            cache_dir (`Union[str, os.PathLike]`, *optional*):
-                Path to a directory where a downloaded pretrained model configuration is cached if the standard cache
-                is not used.
-            force_download (`bool`, *optional*, defaults to `False`):
-                Whether or not to force the (re-)download of the model configuration, overriding the cached version if
-                it exists.
-            proxies (`Dict[str, str]`, *optional*):
-                A dictionary of proxy servers to use by protocol or endpoint.
-            local_files_only(`bool`, *optional*, defaults to `False`):
-                Whether to only load local model configuration files or not.
-            token (`str` or *bool*, *optional*):
-                The token to use as HTTP bearer authorization for remote files.
-            revision (`str`, *optional*, defaults to `"main"`):
-                The specific model version to use.
-            trust_remote_code (`bool`, *optional*, defaults to `False`):
-                Whether to trust remote code.
-            subfolder (`str`, *optional*, defaults to `""`):
-                The subfolder location of a model file within a larger model repository on the Hub or locally.
-
-        Returns:
-            A model object instantiated from the config with random weights.
-
-        Example:
-
-        ```py
-        from diffusers import AutoModel
-
-        model = AutoModel.from_config("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet")
-        ```
-        """
-        subfolder = kwargs.pop("subfolder", None)
-        trust_remote_code = kwargs.pop("trust_remote_code", False)
-
-        hub_kwargs_names = [
-            "cache_dir",
-            "force_download",
-            "local_files_only",
-            "proxies",
-            "revision",
-            "token",
-        ]
-        hub_kwargs = {name: kwargs.pop(name, None) for name in hub_kwargs_names}
-
-        if pretrained_model_name_or_path_or_dict is None:
-            raise ValueError(
-                "Please provide a `pretrained_model_name_or_path_or_dict` as the first positional argument."
-            )
-
-        if isinstance(pretrained_model_name_or_path_or_dict, (str, os.PathLike)):
-            pretrained_model_name_or_path = pretrained_model_name_or_path_or_dict
-            config = cls.load_config(pretrained_model_name_or_path, subfolder=subfolder, **hub_kwargs)
-        else:
-            config = pretrained_model_name_or_path_or_dict
-            pretrained_model_name_or_path = config.get("_name_or_path", None)
-
-        has_remote_code = "auto_map" in config and cls.__name__ in config["auto_map"]
-        trust_remote_code = resolve_trust_remote_code(
-            trust_remote_code, pretrained_model_name_or_path, has_remote_code
-        )
-
-        if has_remote_code and trust_remote_code:
-            class_ref = config["auto_map"][cls.__name__]
-            module_file, class_name = class_ref.split(".")
-            module_file = module_file + ".py"
-            model_cls = get_class_from_dynamic_module(
-                pretrained_model_name_or_path,
-                subfolder=subfolder,
-                module_file=module_file,
-                class_name=class_name,
-                **hub_kwargs,
-            )
-        else:
-            if "_class_name" in config:
-                class_name = config["_class_name"]
-                library = "diffusers"
-            elif "model_type" in config:
-                class_name = "AutoModel"
-                library = "transformers"
-            else:
-                raise ValueError(
-                    f"Couldn't find a model class associated with the config: {config}. Make sure the config "
-                    "contains a `_class_name` or `model_type` key."
-                )
-
-            from ..pipelines.pipeline_loading_utils import ALL_IMPORTABLE_CLASSES, get_class_obj_and_candidates
-
-            model_cls, _ = get_class_obj_and_candidates(
-                library_name=library,
-                class_name=class_name,
-                importable_classes=ALL_IMPORTABLE_CLASSES,
-                pipelines=None,
-                is_pipeline_module=False,
-            )
-
-        if model_cls is None:
-            raise ValueError(f"AutoModel can't find a model linked to {class_name}.")
-
-        return model_cls.from_config(config, **kwargs)
-
    @classmethod
    @validate_hf_hub_args
    def from_pretrained(cls, pretrained_model_or_path: str | os.PathLike | None = None, **kwargs):
--- a/src/diffusers/models/controlnets/controlnet_cosmos.py
+++ b/src/diffusers/models/controlnets/controlnet_cosmos.py
@@ -191,12 +191,7 @@ class CosmosControlNetModel(ModelMixin, ConfigMixin, FromOriginalModelMixin):
                dim=1,
            )

-        if condition_mask is not None:
-            control_hidden_states = torch.cat([control_hidden_states, condition_mask], dim=1)
-        else:
-            control_hidden_states = torch.cat(
-                [control_hidden_states, torch.zeros_like(controls_latents[:, :1])], dim=1
-            )
+        control_hidden_states = torch.cat([control_hidden_states, torch.zeros_like(controls_latents[:, :1])], dim=1)

        padding_mask_resized = transforms.functional.resize(
            padding_mask, list(control_hidden_states.shape[-2:]), interpolation=transforms.InterpolationMode.NEAREST
--- a/src/diffusers/modular_pipelines/modular_pipeline.py
+++ b/src/diffusers/modular_pipelines/modular_pipeline.py
@@ -2252,6 +2252,10 @@ class ModularPipeline(ConfigMixin, PushToHubMixin):
                new_component_spec = current_component_spec
                if hasattr(self, name) and getattr(self, name) is not None:
                    logger.warning(f"ModularPipeline.update_components: setting {name} to None (spec unchanged)")
+            elif current_component_spec.default_creation_method == "from_pretrained" and not (
+                hasattr(component, "_diffusers_load_id") and component._diffusers_load_id is not None
+            ):
+                new_component_spec = ComponentSpec(name=name, type_hint=type(component))
            else:
                new_component_spec = ComponentSpec.from_component(name, component)

--- a/src/diffusers/pipelines/audioldm2/pipeline_audioldm2.py
+++ b/src/diffusers/pipelines/audioldm2/pipeline_audioldm2.py
@@ -502,10 +502,6 @@ class AudioLDM2Pipeline(DiffusionPipeline):
                        text_input_ids,
                        attention_mask=attention_mask,
                    )
-                    # Extract the pooler output if it's a BaseModelOutputWithPooling (Transformers v5+)
-                    # otherwise use it directly (Transformers v4)
-                    if hasattr(prompt_embeds, "pooler_output"):
-                        prompt_embeds = prompt_embeds.pooler_output
                    # append the seq-len dim: (bs, hidden_size) -> (bs, seq_len, hidden_size)
                    prompt_embeds = prompt_embeds[:, None, :]
                    # make sure that we attend to this single hidden-state
@@ -614,10 +610,6 @@ class AudioLDM2Pipeline(DiffusionPipeline):
                        uncond_input_ids,
                        attention_mask=negative_attention_mask,
                    )
-                    # Extract the pooler output if it's a BaseModelOutputWithPooling (Transformers v5+)
-                    # otherwise use it directly (Transformers v4)
-                    if hasattr(negative_prompt_embeds, "pooler_output"):
-                        negative_prompt_embeds = negative_prompt_embeds.pooler_output
                    # append the seq-len dim: (bs, hidden_size) -> (bs, seq_len, hidden_size)
                    negative_prompt_embeds = negative_prompt_embeds[:, None, :]
                    # make sure that we attend to this single hidden-state
--- a/src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py
+++ b/src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_predict.py
@@ -287,9 +287,6 @@ class Cosmos2_5_PredictBasePipeline(DiffusionPipeline):
                truncation=True,
                padding="max_length",
            )
-            input_ids = (
-                input_ids["input_ids"] if not isinstance(input_ids, list) and "input_ids" in input_ids else input_ids
-            )
            input_ids = torch.LongTensor(input_ids)
            input_ids_batch.append(input_ids)

--- a/src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_transfer.py
+++ b/src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_transfer.py
@@ -17,6 +17,9 @@ from typing import Callable, Dict, List, Optional, Union
 import numpy as np
 import PIL.Image
 import torch
+import torchvision
+import torchvision.transforms
+import torchvision.transforms.functional
 from transformers import AutoTokenizer, Qwen2_5_VLForConditionalGeneration

 from ...callbacks import MultiPipelineCallbacks, PipelineCallback
@@ -51,13 +54,11 @@ else:
 logger = logging.get_logger(__name__)  # pylint: disable=invalid-name


-def _maybe_pad_or_trim_video(video: torch.Tensor, num_frames: int):
+def _maybe_pad_video(video: torch.Tensor, num_frames: int):
    n_pad_frames = num_frames - video.shape[2]
    if n_pad_frames > 0:
        last_frame = video[:, :, -1:, :, :]
        video = torch.cat((video, last_frame.repeat(1, 1, n_pad_frames, 1, 1)), dim=2)
-    elif num_frames < video.shape[2]:
-        video = video[:, :, :num_frames, :, :]
    return video


@@ -133,8 +134,8 @@ EXAMPLE_DOC_STRING = """
        >>> controls = [Image.fromarray(x.numpy()) for x in controls.permute(1, 2, 3, 0)]
        >>> export_to_video(controls, "edge_controlled_video_edge.mp4", fps=30)

-        >>> # Transfer inference with controls.
        >>> video = pipe(
+        ...     video=input_video[:num_frames],
        ...     controls=controls,
        ...     controls_conditioning_scale=1.0,
        ...     prompt=prompt,
@@ -148,7 +149,7 @@ EXAMPLE_DOC_STRING = """

 class Cosmos2_5_TransferPipeline(DiffusionPipeline):
    r"""
-    Pipeline for Cosmos Transfer2.5, supporting auto-regressive inference.
+    Pipeline for Cosmos Transfer2.5 base model.

    This model inherits from [`DiffusionPipeline`]. Check the superclass documentation for the generic methods
    implemented for all pipelines (downloading, saving, running on a particular device, etc.).
@@ -165,14 +166,12 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
            A scheduler to be used in combination with `transformer` to denoise the encoded image latents.
        vae ([`AutoencoderKLWan`]):
            Variational Auto-Encoder (VAE) Model to encode and decode videos to and from latent representations.
-        controlnet ([`CosmosControlNetModel`]):
-            ControlNet used to condition generation on control inputs.
    """

    model_cpu_offload_seq = "text_encoder->transformer->controlnet->vae"
    _callback_tensor_inputs = ["latents", "prompt_embeds", "negative_prompt_embeds"]
    # We mark safety_checker as optional here to get around some test failures, but it is not really optional
-    _optional_components = ["safety_checker"]
+    _optional_components = ["safety_checker", "controlnet"]
    _exclude_from_cpu_offload = ["safety_checker"]

    def __init__(
@@ -182,8 +181,8 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
        transformer: CosmosTransformer3DModel,
        vae: AutoencoderKLWan,
        scheduler: UniPCMultistepScheduler,
-        controlnet: CosmosControlNetModel,
-        safety_checker: Optional[CosmosSafetyChecker] = None,
+        controlnet: Optional[CosmosControlNetModel],
+        safety_checker: CosmosSafetyChecker = None,
    ):
        super().__init__()

@@ -263,9 +262,6 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
                truncation=True,
                padding="max_length",
            )
-            input_ids = (
-                input_ids["input_ids"] if not isinstance(input_ids, list) and "input_ids" in input_ids else input_ids
-            )
            input_ids = torch.LongTensor(input_ids)
            input_ids_batch.append(input_ids)

@@ -385,11 +381,10 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
        num_frames_in: int = 93,
        num_frames_out: int = 93,
        do_classifier_free_guidance: bool = True,
-        dtype: Optional[torch.dtype] = None,
-        device: Optional[torch.device] = None,
-        generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
-        latents: Optional[torch.Tensor] = None,
-        num_cond_latent_frames: int = 0,
+        dtype: torch.dtype | None = None,
+        device: torch.device | None = None,
+        generator: torch.Generator | list[torch.Generator] | None = None,
+        latents: torch.Tensor | None = None,
    ) -> torch.Tensor:
        if isinstance(generator, list) and len(generator) != batch_size:
            raise ValueError(
@@ -404,14 +399,10 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
        W = width // self.vae_scale_factor_spatial
        shape = (B, C, T, H, W)

-        if latents is not None:
-            if latents.shape[1:] != shape[1:]:
-                raise ValueError(f"Unexpected `latents` shape, got {latents.shape}, expected {shape}.")
-            latents = latents.to(device=device, dtype=dtype)
-        else:
-            latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
-
        if num_frames_in == 0:
+            if latents is None:
+                latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
+
            cond_mask = torch.zeros((B, 1, T, H, W), dtype=latents.dtype, device=latents.device)
            cond_indicator = torch.zeros((B, 1, T, 1, 1), dtype=latents.dtype, device=latents.device)

@@ -441,12 +432,16 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
            latents_std = self.latents_std.to(device=device, dtype=dtype)
            cond_latents = (cond_latents - latents_mean) / latents_std

+            if latents is None:
+                latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
+            else:
+                latents = latents.to(device=device, dtype=dtype)
+
            padding_shape = (B, 1, T, H, W)
            ones_padding = latents.new_ones(padding_shape)
            zeros_padding = latents.new_zeros(padding_shape)

-            cond_indicator = latents.new_zeros(B, 1, latents.size(2), 1, 1)
-            cond_indicator[:, :, 0:num_cond_latent_frames, :, :] = 1.0
+            cond_indicator = latents.new_zeros(1, 1, latents.size(2), 1, 1)
            cond_mask = cond_indicator * ones_padding + (1 - cond_indicator) * zeros_padding

            return (
@@ -456,7 +451,34 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
                cond_indicator,
            )

-    # Modified from diffusers.pipelines.cosmos.pipeline_cosmos_text2world.CosmosTextToWorldPipeline.check_inputs
+    def _encode_controls(
+        self,
+        controls: Optional[torch.Tensor],
+        height: int,
+        width: int,
+        num_frames: int,
+        dtype: torch.dtype,
+        device: torch.device,
+        generator: torch.Generator | list[torch.Generator] | None,
+    ) -> Optional[torch.Tensor]:
+        if controls is None:
+            return None
+
+        control_video = self.video_processor.preprocess_video(controls, height, width)
+        control_video = _maybe_pad_video(control_video, num_frames)
+
+        control_video = control_video.to(device=device, dtype=self.vae.dtype)
+        control_latents = [
+            retrieve_latents(self.vae.encode(vid.unsqueeze(0)), generator=generator) for vid in control_video
+        ]
+        control_latents = torch.cat(control_latents, dim=0).to(dtype)
+
+        latents_mean = self.latents_mean.to(device=device, dtype=dtype)
+        latents_std = self.latents_std.to(device=device, dtype=dtype)
+        control_latents = (control_latents - latents_mean) / latents_std
+        return control_latents
+
+    # Copied from diffusers.pipelines.cosmos.pipeline_cosmos_text2world.CosmosTextToWorldPipeline.check_inputs
    def check_inputs(
        self,
        prompt,
@@ -464,25 +486,9 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
        width,
        prompt_embeds=None,
        callback_on_step_end_tensor_inputs=None,
-        num_ar_conditional_frames=None,
-        num_ar_latent_conditional_frames=None,
-        num_frames_per_chunk=None,
-        num_frames=None,
-        conditional_frame_timestep=0.1,
    ):
-        if width <= 0 or height <= 0 or height % 16 != 0 or width % 16 != 0:
-            raise ValueError(
-                f"`height` and `width` have to be divisible by 16 (& positive) but are {height} and {width}."
-            )
-
-        if num_frames is not None and num_frames <= 0:
-            raise ValueError(f"`num_frames` has to be a positive integer when provided but is {num_frames}.")
-
-        if conditional_frame_timestep < 0 or conditional_frame_timestep > 1:
-            raise ValueError(
-                "`conditional_frame_timestep` has to be a float in the [0, 1] interval but is "
-                f"{conditional_frame_timestep}."
-            )
+        if height % 16 != 0 or width % 16 != 0:
+            raise ValueError(f"`height` and `width` have to be divisible by 16 but are {height} and {width}.")

        if callback_on_step_end_tensor_inputs is not None and not all(
            k in self._callback_tensor_inputs for k in callback_on_step_end_tensor_inputs
@@ -503,46 +509,6 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
        elif prompt is not None and (not isinstance(prompt, str) and not isinstance(prompt, list)):
            raise ValueError(f"`prompt` has to be of type `str` or `list` but is {type(prompt)}")

-        if num_ar_latent_conditional_frames is not None and num_ar_conditional_frames is not None:
-            raise ValueError(
-                "Provide only one of `num_ar_conditional_frames` or `num_ar_latent_conditional_frames`, not both."
-            )
-        if num_ar_latent_conditional_frames is None and num_ar_conditional_frames is None:
-            raise ValueError("Provide either `num_ar_conditional_frames` or `num_ar_latent_conditional_frames`.")
-        if num_ar_latent_conditional_frames is not None and num_ar_latent_conditional_frames < 0:
-            raise ValueError("`num_ar_latent_conditional_frames` must be >= 0.")
-        if num_ar_conditional_frames is not None and num_ar_conditional_frames < 0:
-            raise ValueError("`num_ar_conditional_frames` must be >= 0.")
-
-        if num_ar_latent_conditional_frames is not None:
-            num_ar_conditional_frames = max(
-                0, (num_ar_latent_conditional_frames - 1) * self.vae_scale_factor_temporal + 1
-            )
-
-        min_chunk_len = self.vae_scale_factor_temporal + 1
-        if num_frames_per_chunk < min_chunk_len:
-            logger.warning(f"{num_frames_per_chunk=} must be larger than {min_chunk_len=}, setting to min_chunk_len")
-            num_frames_per_chunk = min_chunk_len
-
-        max_frames_by_rope = None
-        if getattr(self.transformer.config, "max_size", None) is not None:
-            max_frames_by_rope = max(
-                size // patch
-                for size, patch in zip(self.transformer.config.max_size, self.transformer.config.patch_size)
-            )
-            if num_frames_per_chunk > max_frames_by_rope:
-                raise ValueError(
-                    f"{num_frames_per_chunk=} is too large for RoPE setting ({max_frames_by_rope=}). "
-                    "Please reduce `num_frames_per_chunk`."
-                )
-
-        if num_ar_conditional_frames >= num_frames_per_chunk:
-            raise ValueError(
-                f"{num_ar_conditional_frames=} must be smaller than {num_frames_per_chunk=} for chunked generation."
-            )
-
-        return num_frames_per_chunk
-
    @property
    def guidance_scale(self):
        return self._guidance_scale
@@ -567,22 +533,23 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
    @replace_example_docstring(EXAMPLE_DOC_STRING)
    def __call__(
        self,
-        controls: PipelineImageInput | List[PipelineImageInput],
-        controls_conditioning_scale: Union[float, List[float]] = 1.0,
+        image: PipelineImageInput | None = None,
+        video: List[PipelineImageInput] | None = None,
        prompt: Union[str, List[str]] | None = None,
        negative_prompt: Union[str, List[str]] = DEFAULT_NEGATIVE_PROMPT,
        height: int = 704,
-        width: Optional[int] = None,
-        num_frames: Optional[int] = None,
-        num_frames_per_chunk: int = 93,
+        width: int | None = None,
+        num_frames: int = 93,
        num_inference_steps: int = 36,
        guidance_scale: float = 3.0,
-        num_videos_per_prompt: int = 1,
-        generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
-        latents: Optional[torch.Tensor] = None,
-        prompt_embeds: Optional[torch.Tensor] = None,
-        negative_prompt_embeds: Optional[torch.Tensor] = None,
-        output_type: Optional[str] = "pil",
+        num_videos_per_prompt: Optional[int] = 1,
+        generator: torch.Generator | list[torch.Generator] | None = None,
+        latents: torch.Tensor | None = None,
+        controls: Optional[PipelineImageInput | List[PipelineImageInput]] = None,
+        controls_conditioning_scale: float | list[float] = 1.0,
+        prompt_embeds: torch.Tensor | None = None,
+        negative_prompt_embeds: torch.Tensor | None = None,
+        output_type: str = "pil",
        return_dict: bool = True,
        callback_on_step_end: Optional[
            Union[Callable[[int, int, Dict], None], PipelineCallback, MultiPipelineCallbacks]
@@ -590,26 +557,24 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
        callback_on_step_end_tensor_inputs: List[str] = ["latents"],
        max_sequence_length: int = 512,
        conditional_frame_timestep: float = 0.1,
-        num_ar_conditional_frames: Optional[int] = 1,
-        num_ar_latent_conditional_frames: Optional[int] = None,
    ):
        r"""
-        `controls` drive the conditioning through ControlNet. Controls are assumed to be pre-processed, e.g. edge maps
-        are pre-computed.
+        The call function to the pipeline for generation. Supports three modes:

-        Setting `num_frames` will restrict the total number of frames output, if not provided or assigned to None
-        (default) then the number of output frames will match the input `controls`.
+        - **Text2World**: `image=None`, `video=None`, `prompt` provided. Generates a world clip.
+        - **Image2World**: `image` provided, `video=None`, `prompt` provided. Conditions on a single frame.
+        - **Video2World**: `video` provided, `image=None`, `prompt` provided. Conditions on an input clip.

-        Auto-regressive inference is supported and thus a sliding window of `num_frames_per_chunk` frames are used per
-        denoising loop. In addition, when auto-regressive inference is performed, the previous
-        `num_ar_latent_conditional_frames` or `num_ar_conditional_frames` are used to condition the following denoising
-        inference loops.
+        Set `num_frames=93` (default) to produce a world video, or `num_frames=1` to produce a single image frame, in
+        which case the three modes above act as their "*2Image" counterparts.
+
+        Outputs follow `output_type` (e.g., `"pil"` returns a list of `num_frames` PIL images per prompt).

        Args:
-            controls (`PipelineImageInput`, `List[PipelineImageInput]`):
-                Control image or video input used by the ControlNet.
-            controls_conditioning_scale (`float` or `List[float]`, *optional*, defaults to `1.0`):
-                The scale factor(s) for the ControlNet outputs. A single float is broadcast to all control blocks.
+            image (`PIL.Image.Image`, `np.ndarray`, `torch.Tensor`, *optional*):
+                Optional single image for Image2World conditioning. Must be `None` when `video` is provided.
+            video (`List[PIL.Image.Image]`, `np.ndarray`, `torch.Tensor`, *optional*):
+                Optional input video for Video2World conditioning. Must be `None` when `image` is provided.
            prompt (`str` or `List[str]`, *optional*):
                The prompt or prompts to guide generation. Required unless `prompt_embeds` is supplied.
            height (`int`, defaults to `704`):
@@ -617,10 +582,9 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
            width (`int`, *optional*):
                The width in pixels of the generated image. If not provided, this will be determined based on the
                aspect ratio of the input and the provided height.
-            num_frames (`int`, *optional*):
-                Number of output frames. Defaults to `None` to output the same number of frames as the input
-                `controls`.
-            num_inference_steps (`int`, defaults to `36`):
+            num_frames (`int`, defaults to `93`):
+                Number of output frames. Use `93` for world (video) generation; set to `1` to return a single frame.
+            num_inference_steps (`int`, defaults to `36`):
                The number of denoising steps. More denoising steps usually lead to a higher quality image at the
                expense of slower inference.
            guidance_scale (`float`, defaults to `3.0`):
@@ -634,9 +598,13 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
                A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
                generation deterministic.
            latents (`torch.Tensor`, *optional*):
-                Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs. Can be used to
-                tweak the same generation with different prompts. If not provided, a latents tensor is generated by
-                sampling using the supplied random `generator`.
+                Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image
+                generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
+                tensor is generated by sampling using the supplied random `generator`.
+            controls (`PipelineImageInput`, `List[PipelineImageInput]`, *optional*):
+                Control image or video input used by the ControlNet. If `None`, ControlNet is skipped.
+            controls_conditioning_scale (`float` or `List[float]`, *optional*, defaults to `1.0`):
+                The scale factor(s) for the ControlNet outputs. A single float is broadcast to all control blocks.
            prompt_embeds (`torch.Tensor`, *optional*):
                Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
                provided, text embeddings will be generated from `prompt` input argument.
@@ -659,18 +627,7 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
            max_sequence_length (`int`, defaults to `512`):
                The maximum number of tokens in the prompt. If the prompt exceeds this length, it will be truncated. If
                the prompt is shorter than this length, it will be padded.
-            num_ar_conditional_frames (`int`, *optional*, defaults to `1`):
-                Number of frames to condition on subsequent inference loops in auto-regressive inference, i.e. for the
-                second chunk and onwards. Only used if `num_ar_latent_conditional_frames` is `None`.

-                This is only used when auto-regressive inference is performed, i.e. when the number of frames in
-                controls is > num_frames_per_chunk
-            num_ar_latent_conditional_frames (`int`, *optional*):
-                Number of latent frames to condition on subsequent inference loops in auto-regressive inference, i.e.
-                for the second chunk and onwards. Only used if `num_ar_conditional_frames` is `None`.
-
-                This is only used when auto-regressive inference is performed, i.e. when the number of frames in
-                controls is > num_frames_per_chunk
        Examples:

        Returns:
@@ -690,40 +647,21 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
            callback_on_step_end_tensor_inputs = callback_on_step_end.tensor_inputs

        if width is None:
-            frame = controls[0] if isinstance(controls, list) else controls
-            if isinstance(frame, list):
-                frame = frame[0]
-            if isinstance(frame, (torch.Tensor, np.ndarray)):
-                if frame.ndim == 5:
-                    frame = frame[0, 0]
-                elif frame.ndim == 4:
-                    frame = frame[0]
+            frame = image if image is not None else (video[0] if video is not None else None)
+            if frame is None and controls is not None:
+                frame = controls[0] if isinstance(controls, list) else controls
+                if isinstance(frame, (torch.Tensor, np.ndarray)) and len(frame.shape) == 4:
+                    frame = frame[0]

-            if isinstance(frame, PIL.Image.Image):
+            if frame is None:
+                width = int((height + 16) * (1280 / 720))
+            elif isinstance(frame, PIL.Image.Image):
                width = int((height + 16) * (frame.width / frame.height))
            else:
-                if frame.ndim != 3:
-                    raise ValueError("`controls` must contain 3D frames in CHW format.")
                width = int((height + 16) * (frame.shape[2] / frame.shape[1]))  # NOTE: assuming C H W

-        num_frames_per_chunk = self.check_inputs(
-            prompt,
-            height,
-            width,
-            prompt_embeds,
-            callback_on_step_end_tensor_inputs,
-            num_ar_conditional_frames,
-            num_ar_latent_conditional_frames,
-            num_frames_per_chunk,
-            num_frames,
-            conditional_frame_timestep,
-        )
-
-        if num_ar_latent_conditional_frames is not None:
-            num_cond_latent_frames = num_ar_latent_conditional_frames
-            num_ar_conditional_frames = max(0, (num_cond_latent_frames - 1) * self.vae_scale_factor_temporal + 1)
-        else:
-            num_cond_latent_frames = max(0, (num_ar_conditional_frames - 1) // self.vae_scale_factor_temporal + 1)
+        # Check inputs. Raise error if not correct
+        self.check_inputs(prompt, height, width, prompt_embeds, callback_on_step_end_tensor_inputs)

        self._guidance_scale = guidance_scale
        self._current_timestep = None
@@ -768,137 +706,102 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
        vae_dtype = self.vae.dtype
        transformer_dtype = self.transformer.dtype

-        if getattr(self.transformer.config, "img_context_dim_in", None):
-            img_context = torch.zeros(
-                batch_size,
-                self.transformer.config.img_context_num_tokens,
-                self.transformer.config.img_context_dim_in,
-                device=prompt_embeds.device,
+        img_context = torch.zeros(
+            batch_size,
+            self.transformer.config.img_context_num_tokens,
+            self.transformer.config.img_context_dim_in,
+            device=prompt_embeds.device,
+            dtype=transformer_dtype,
+        )
+        encoder_hidden_states = (prompt_embeds, img_context)
+        neg_encoder_hidden_states = (negative_prompt_embeds, img_context)
+
+        num_frames_in = None
+        if image is not None:
+            if batch_size != 1:
+                raise ValueError(f"batch_size must be 1 for image input (given {batch_size})")
+
+            image = torchvision.transforms.functional.to_tensor(image).unsqueeze(0)
+            video = torch.cat([image, torch.zeros_like(image).repeat(num_frames - 1, 1, 1, 1)], dim=0)
+            video = video.unsqueeze(0)
+            num_frames_in = 1
+        elif video is None:
+            video = torch.zeros(batch_size, num_frames, 3, height, width, dtype=torch.uint8)
+            num_frames_in = 0
+        else:
+            num_frames_in = len(video)
+
+            if batch_size != 1:
+                raise ValueError(f"batch_size must be 1 for video input (given {batch_size})")
+
+        assert video is not None
+        video = self.video_processor.preprocess_video(video, height, width)
+
+        # pad with last frame (for video2world)
+        num_frames_out = num_frames
+        video = _maybe_pad_video(video, num_frames_out)
+        assert num_frames_in <= num_frames_out, f"expected ({num_frames_in=}) <= ({num_frames_out=})"
+
+        video = video.to(device=device, dtype=vae_dtype)
+
+        num_channels_latents = self.transformer.config.in_channels - 1
+        latents, cond_latent, cond_mask, cond_indicator = self.prepare_latents(
+            video=video,
+            batch_size=batch_size * num_videos_per_prompt,
+            num_channels_latents=num_channels_latents,
+            height=height,
+            width=width,
+            num_frames_in=num_frames_in,
+            num_frames_out=num_frames,
+            do_classifier_free_guidance=self.do_classifier_free_guidance,
+            dtype=torch.float32,
+            device=device,
+            generator=generator,
+            latents=latents,
+        )
+        cond_timestep = torch.ones_like(cond_indicator) * conditional_frame_timestep
+        cond_mask = cond_mask.to(transformer_dtype)
+
+        controls_latents = None
+        if controls is not None:
+            controls_latents = self._encode_controls(
+                controls,
+                height=height,
+                width=width,
+                num_frames=num_frames,
                dtype=transformer_dtype,
+                device=device,
+                generator=generator,
            )

-            if num_videos_per_prompt > 1:
-                img_context = img_context.repeat_interleave(num_videos_per_prompt, dim=0)
+        padding_mask = latents.new_zeros(1, 1, height, width, dtype=transformer_dtype)

-            encoder_hidden_states = (prompt_embeds, img_context)
-            neg_encoder_hidden_states = (negative_prompt_embeds, img_context)
-        else:
-            encoder_hidden_states = prompt_embeds
-            neg_encoder_hidden_states = negative_prompt_embeds
+        # Denoising loop
+        self.scheduler.set_timesteps(num_inference_steps, device=device)
+        timesteps = self.scheduler.timesteps
+        self._num_timesteps = len(timesteps)
+        num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order

-        control_video = self.video_processor.preprocess_video(controls, height, width)
-        if control_video.shape[0] != batch_size:
-            if control_video.shape[0] == 1:
-                control_video = control_video.repeat(batch_size, 1, 1, 1, 1)
-            else:
-                raise ValueError(
-                    f"Expected controls batch size {batch_size} to match prompt batch size, but got {control_video.shape[0]}."
+        gt_velocity = (latents - cond_latent) * cond_mask
+        with self.progress_bar(total=num_inference_steps) as progress_bar:
+            for i, t in enumerate(timesteps):
+                if self.interrupt:
+                    continue
+
+                self._current_timestep = t.cpu().item()
+
+                # NOTE: assumes sigma(t) \in [0, 1]
+                sigma_t = (
+                    torch.tensor(self.scheduler.sigmas[i].item())
+                    .unsqueeze(0)
+                    .to(device=device, dtype=transformer_dtype)
                )

-        num_frames_out = control_video.shape[2]
-        if num_frames is not None:
-            num_frames_out = min(num_frames_out, num_frames)
-
-        control_video = _maybe_pad_or_trim_video(control_video, num_frames_out)
-
-        # chunk information
-        num_latent_frames_per_chunk = (num_frames_per_chunk - 1) // self.vae_scale_factor_temporal + 1
-        chunk_stride = num_frames_per_chunk - num_ar_conditional_frames
-        chunk_idxs = [
-            (start_idx, min(start_idx + num_frames_per_chunk, num_frames_out))
-            for start_idx in range(0, num_frames_out - num_ar_conditional_frames, chunk_stride)
-        ]
-
-        video_chunks = []
-        latents_mean = self.latents_mean.to(dtype=vae_dtype, device=device)
-        latents_std = self.latents_std.to(dtype=vae_dtype, device=device)
-
-        def decode_latents(latents):
-            latents = latents * latents_std + latents_mean
-            video = self.vae.decode(latents.to(dtype=self.vae.dtype, device=device), return_dict=False)[0]
-            return video
-
-        latents_arg = latents
-        initial_num_cond_latent_frames = 0
-        latent_chunks = []
-        num_chunks = len(chunk_idxs)
-        total_steps = num_inference_steps * num_chunks
-        with self.progress_bar(total=total_steps) as progress_bar:
-            for chunk_idx, (start_idx, end_idx) in enumerate(chunk_idxs):
-                if chunk_idx == 0:
-                    prev_output = torch.zeros((batch_size, num_frames_per_chunk, 3, height, width), dtype=vae_dtype)
-                    prev_output = self.video_processor.preprocess_video(prev_output, height, width)
-                else:
-                    prev_output = video_chunks[-1].clone()
-                    if num_ar_conditional_frames > 0:
-                        prev_output[:, :, :num_ar_conditional_frames] = prev_output[:, :, -num_ar_conditional_frames:]
-                        prev_output[:, :, num_ar_conditional_frames:] = -1  # -1 == 0 in processed video space
-                    else:
-                        prev_output.fill_(-1)
-
-                chunk_video = prev_output.to(device=device, dtype=vae_dtype)
-                chunk_video = _maybe_pad_or_trim_video(chunk_video, num_frames_per_chunk)
-                latents, cond_latent, cond_mask, cond_indicator = self.prepare_latents(
-                    video=chunk_video,
-                    batch_size=batch_size * num_videos_per_prompt,
-                    num_channels_latents=self.transformer.config.in_channels - 1,
-                    height=height,
-                    width=width,
-                    num_frames_in=chunk_video.shape[2],
-                    num_frames_out=num_frames_per_chunk,
-                    do_classifier_free_guidance=self.do_classifier_free_guidance,
-                    dtype=torch.float32,
-                    device=device,
-                    generator=generator,
-                    num_cond_latent_frames=initial_num_cond_latent_frames
-                    if chunk_idx == 0
-                    else num_cond_latent_frames,
-                    latents=latents_arg,
-                )
-                cond_mask = cond_mask.to(transformer_dtype)
-                cond_timestep = torch.ones_like(cond_indicator) * conditional_frame_timestep
-                padding_mask = latents.new_zeros(1, 1, height, width, dtype=transformer_dtype)
-
-                chunk_control_video = control_video[:, :, start_idx:end_idx, ...].to(
-                    device=device, dtype=self.vae.dtype
-                )
-                chunk_control_video = _maybe_pad_or_trim_video(chunk_control_video, num_frames_per_chunk)
-                if isinstance(generator, list):
-                    controls_latents = [
-                        retrieve_latents(self.vae.encode(chunk_control_video[i].unsqueeze(0)), generator=generator[i])
-                        for i in range(chunk_control_video.shape[0])
-                    ]
-                else:
-                    controls_latents = [
-                        retrieve_latents(self.vae.encode(vid.unsqueeze(0)), generator=generator)
-                        for vid in chunk_control_video
-                    ]
-                controls_latents = torch.cat(controls_latents, dim=0).to(transformer_dtype)
-
-                controls_latents = (controls_latents - latents_mean) / latents_std
-
-                # Denoising loop
-                self.scheduler.set_timesteps(num_inference_steps, device=device)
-                timesteps = self.scheduler.timesteps
-                self._num_timesteps = len(timesteps)
-
-                gt_velocity = (latents - cond_latent) * cond_mask
-                for i, t in enumerate(timesteps):
-                    if self.interrupt:
-                        continue
-
-                    self._current_timestep = t.cpu().item()
-
-                    # NOTE: assumes sigma(t) \in [0, 1]
-                    sigma_t = (
-                        torch.tensor(self.scheduler.sigmas[i].item())
-                        .unsqueeze(0)
-                        .to(device=device, dtype=transformer_dtype)
-                    )
-
-                    in_latents = cond_mask * cond_latent + (1 - cond_mask) * latents
-                    in_latents = in_latents.to(transformer_dtype)
-                    in_timestep = cond_indicator * cond_timestep + (1 - cond_indicator) * sigma_t
+                in_latents = cond_mask * cond_latent + (1 - cond_mask) * latents
+                in_latents = in_latents.to(transformer_dtype)
+                in_timestep = cond_indicator * cond_timestep + (1 - cond_indicator) * sigma_t
+                control_blocks = None
+                if controls_latents is not None and self.controlnet is not None:
                    control_output = self.controlnet(
                        controls_latents=controls_latents,
                        latents=in_latents,
@@ -911,18 +814,20 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
                    )
                    control_blocks = control_output[0]

-                    noise_pred = self.transformer(
-                        hidden_states=in_latents,
-                        timestep=in_timestep,
-                        encoder_hidden_states=encoder_hidden_states,
-                        block_controlnet_hidden_states=control_blocks,
-                        condition_mask=cond_mask,
-                        padding_mask=padding_mask,
-                        return_dict=False,
-                    )[0]
-                    noise_pred = gt_velocity + noise_pred * (1 - cond_mask)
+                noise_pred = self.transformer(
+                    hidden_states=in_latents,
+                    timestep=in_timestep,
+                    encoder_hidden_states=encoder_hidden_states,
+                    block_controlnet_hidden_states=control_blocks,
+                    condition_mask=cond_mask,
+                    padding_mask=padding_mask,
+                    return_dict=False,
+                )[0]
+                noise_pred = gt_velocity + noise_pred * (1 - cond_mask)

-                    if self.do_classifier_free_guidance:
+                if self.do_classifier_free_guidance:
+                    control_blocks = None
+                    if controls_latents is not None and self.controlnet is not None:
                        control_output = self.controlnet(
                            controls_latents=controls_latents,
                            latents=in_latents,
@@ -935,50 +840,46 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
                        )
                        control_blocks = control_output[0]

-                        noise_pred_neg = self.transformer(
-                            hidden_states=in_latents,
-                            timestep=in_timestep,
-                            encoder_hidden_states=neg_encoder_hidden_states,  # NOTE: negative prompt
-                            block_controlnet_hidden_states=control_blocks,
-                            condition_mask=cond_mask,
-                            padding_mask=padding_mask,
-                            return_dict=False,
-                        )[0]
-                        # NOTE: replace velocity (noise_pred_neg) with gt_velocity for conditioning inputs only
-                        noise_pred_neg = gt_velocity + noise_pred_neg * (1 - cond_mask)
-                        noise_pred = noise_pred + self.guidance_scale * (noise_pred - noise_pred_neg)
+                    noise_pred_neg = self.transformer(
+                        hidden_states=in_latents,
+                        timestep=in_timestep,
+                        encoder_hidden_states=neg_encoder_hidden_states,  # NOTE: negative prompt
+                        block_controlnet_hidden_states=control_blocks,
+                        condition_mask=cond_mask,
+                        padding_mask=padding_mask,
+                        return_dict=False,
+                    )[0]
+                    # NOTE: replace velocity (noise_pred_neg) with gt_velocity for conditioning inputs only
+                    noise_pred_neg = gt_velocity + noise_pred_neg * (1 - cond_mask)
+                    noise_pred = noise_pred + self.guidance_scale * (noise_pred - noise_pred_neg)

-                    latents = self.scheduler.step(noise_pred, t, latents, return_dict=False)[0]
+                latents = self.scheduler.step(noise_pred, t, latents, return_dict=False)[0]

-                    # call the callback, if provided
-                    if callback_on_step_end is not None:
-                        callback_kwargs = {}
-                        for k in callback_on_step_end_tensor_inputs:
-                            callback_kwargs[k] = locals()[k]
-                        callback_outputs = callback_on_step_end(self, i, t, callback_kwargs)
+                if callback_on_step_end is not None:
+                    callback_kwargs = {}
+                    for k in callback_on_step_end_tensor_inputs:
+                        callback_kwargs[k] = locals()[k]
+                    callback_outputs = callback_on_step_end(self, i, t, callback_kwargs)

-                        latents = callback_outputs.pop("latents", latents)
-                        prompt_embeds = callback_outputs.pop("prompt_embeds", prompt_embeds)
-                        negative_prompt_embeds = callback_outputs.pop("negative_prompt_embeds", negative_prompt_embeds)
+                    latents = callback_outputs.pop("latents", latents)
+                    prompt_embeds = callback_outputs.pop("prompt_embeds", prompt_embeds)
+                    negative_prompt_embeds = callback_outputs.pop("negative_prompt_embeds", negative_prompt_embeds)

-                    if i == total_steps - 1 or ((i + 1) % self.scheduler.order == 0):
-                        progress_bar.update()
+                # update the progress bar on the last step or after warmup, once per scheduler order
+                if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):
+                    progress_bar.update()

-                    if XLA_AVAILABLE:
-                        xm.mark_step()
-
-                video_chunks.append(decode_latents(latents).detach().cpu())
-                latent_chunks.append(latents.detach().cpu())
+                if XLA_AVAILABLE:
+                    xm.mark_step()

        self._current_timestep = None

        if not output_type == "latent":
-            video_chunks = [
-                chunk[:, :, num_ar_conditional_frames:, ...] if chunk_idx != 0 else chunk
-                for chunk_idx, chunk in enumerate(video_chunks)
-            ]
-            video = torch.cat(video_chunks, dim=2)
-            video = video[:, :, :num_frames_out, ...]
+            latents_mean = self.latents_mean.to(latents.device, latents.dtype)
+            latents_std = self.latents_std.to(latents.device, latents.dtype)
+            latents = latents * latents_std + latents_mean
+            video = self.vae.decode(latents.to(self.vae.dtype), return_dict=False)[0]
+            video = self._match_num_frames(video, num_frames)

            assert self.safety_checker is not None
            self.safety_checker.to(device)
@@ -995,13 +896,7 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
            video = torch.from_numpy(video).permute(0, 4, 1, 2, 3)
            video = self.video_processor.postprocess_video(video, output_type=output_type)
        else:
-            latent_T = (num_frames_out - 1) // self.vae_scale_factor_temporal + 1
-            latent_chunks = [
-                chunk[:, :, num_cond_latent_frames:, ...] if chunk_idx != 0 else chunk
-                for chunk_idx, chunk in enumerate(latent_chunks)
-            ]
-            video = torch.cat(latent_chunks, dim=2)
-            video = video[:, :, :latent_T, ...]
+            video = latents

        # Offload all models
        self.maybe_free_model_hooks()
@@ -1010,3 +905,19 @@ class Cosmos2_5_TransferPipeline(DiffusionPipeline):
            return (video,)

        return CosmosPipelineOutput(frames=video)
+
+    def _match_num_frames(self, video: torch.Tensor, target_num_frames: int) -> torch.Tensor:
+        if target_num_frames <= 0 or video.shape[2] == target_num_frames:
+            return video
+
+        frames_per_latent = max(self.vae_scale_factor_temporal, 1)
+        video = torch.repeat_interleave(video, repeats=frames_per_latent, dim=2)
+
+        current_frames = video.shape[2]
+        if current_frames < target_num_frames:
+            pad = video[:, :, -1:, :, :].repeat(1, 1, target_num_frames - current_frames, 1, 1)
+            video = torch.cat([video, pad], dim=2)
+        elif current_frames > target_num_frames:
+            video = video[:, :, :target_num_frames]
+
+        return video
--- a/src/diffusers/pipelines/kandinsky/text_encoder.py
+++ b/src/diffusers/pipelines/kandinsky/text_encoder.py
@@ -20,8 +20,6 @@ class MultilingualCLIP(PreTrainedModel):
        self.LinearTransformation = torch.nn.Linear(
            in_features=config.transformerDimensions, out_features=config.numDims
        )
-        if hasattr(self, "post_init"):
-            self.post_init()

    def forward(self, input_ids, attention_mask):
        embs = self.transformer(input_ids=input_ids, attention_mask=attention_mask)[0]
--- a/src/diffusers/pipelines/kolors/text_encoder.py
+++ b/src/diffusers/pipelines/kolors/text_encoder.py
@@ -781,9 +781,6 @@ class ChatGLMModel(ChatGLMPreTrainedModel):
            self.prefix_encoder = PrefixEncoder(config)
            self.dropout = torch.nn.Dropout(0.1)

-        if hasattr(self, "post_init"):
-            self.post_init()
-
    def get_input_embeddings(self):
        return self.embedding.word_embeddings

@@ -813,7 +810,7 @@ class ChatGLMModel(ChatGLMPreTrainedModel):
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
-        use_cache = use_cache if use_cache is not None else getattr(self.config, "use_cache", None)
+        use_cache = use_cache if use_cache is not None else self.config.use_cache
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        batch_size, seq_length = input_ids.shape
--- a/src/diffusers/pipelines/ltx2/pipeline_ltx2_image2video.py
+++ b/src/diffusers/pipelines/ltx2/pipeline_ltx2_image2video.py
@@ -699,13 +699,9 @@ class LTX2ImageToVideoPipeline(DiffusionPipeline, FromSingleFileMixin, LTX2LoraL
        mask_shape = (batch_size, 1, num_frames, height, width)

        if latents is not None:
+            conditioning_mask = latents.new_zeros(mask_shape)
+            conditioning_mask[:, :, 0] = 1.0
            if latents.ndim == 5:
-                # conditioning_mask needs to the same shape as latents in two stages generation.
-                batch_size, _, num_frames, height, width = latents.shape
-                mask_shape = (batch_size, 1, num_frames, height, width)
-                conditioning_mask = latents.new_zeros(mask_shape)
-                conditioning_mask[:, :, 0] = 1.0
-
                latents = self._normalize_latents(
                    latents, self.vae.latents_mean, self.vae.latents_std, self.vae.config.scaling_factor
                )
@@ -714,9 +710,6 @@ class LTX2ImageToVideoPipeline(DiffusionPipeline, FromSingleFileMixin, LTX2LoraL
                latents = self._pack_latents(
                    latents, self.transformer_spatial_patch_size, self.transformer_temporal_patch_size
                )
-            else:
-                conditioning_mask = latents.new_zeros(mask_shape)
-                conditioning_mask[:, :, 0] = 1.0
            conditioning_mask = self._pack_latents(
                conditioning_mask, self.transformer_spatial_patch_size, self.transformer_temporal_patch_size
            ).squeeze(-1)
--- a/src/diffusers/pipelines/pipeline_utils.py
+++ b/src/diffusers/pipelines/pipeline_utils.py
@@ -341,7 +341,6 @@ class DiffusionPipeline(ConfigMixin, PushToHubMixin):
            save_method_accept_safe = "safe_serialization" in save_method_signature.parameters
            save_method_accept_variant = "variant" in save_method_signature.parameters
            save_method_accept_max_shard_size = "max_shard_size" in save_method_signature.parameters
-            save_method_accept_peft_format = "save_peft_format" in save_method_signature.parameters

            save_kwargs = {}
            if save_method_accept_safe:
@@ -351,11 +350,6 @@ class DiffusionPipeline(ConfigMixin, PushToHubMixin):
            if save_method_accept_max_shard_size and max_shard_size is not None:
                # max_shard_size is expected to not be None in ModelMixin
                save_kwargs["max_shard_size"] = max_shard_size
-            if save_method_accept_peft_format:
-                # Set save_peft_format=False for transformers>=5.0.0 compatibility
-                # In transformers 5.0.0+, the default save_peft_format=True adds "base_model.model" prefix
-                # to adapter keys, but from_pretrained expects keys without this prefix
-                save_kwargs["save_peft_format"] = False

            save_method(os.path.join(save_directory, pipeline_component_name), **save_kwargs)

--- a/src/diffusers/pipelines/prx/__init__.py
+++ b/src/diffusers/pipelines/prx/__init__.py
@@ -24,25 +24,14 @@ except OptionalDependencyNotAvailable:
 else:
    _import_structure["pipeline_prx"] = ["PRXPipeline"]

-# Wrap T5GemmaEncoder to pass config.encoder (T5GemmaModuleConfig) instead of the
-# composite T5GemmaConfig, which lacks flat attributes expected by T5GemmaEncoder.__init__.
+# Import T5GemmaEncoder for pipeline loading compatibility
 try:
    if is_transformers_available():
        import transformers
-        from transformers.models.t5gemma.modeling_t5gemma import T5GemmaEncoder as _T5GemmaEncoder
-
-        class T5GemmaEncoder(_T5GemmaEncoder):
-            @classmethod
-            def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
-                if "config" not in kwargs:
-                    from transformers.models.t5gemma.configuration_t5gemma import T5GemmaConfig
-
-                    config = T5GemmaConfig.from_pretrained(pretrained_model_name_or_path)
-                    if hasattr(config, "encoder"):
-                        kwargs["config"] = config.encoder
-                return super().from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
+        from transformers.models.t5gemma.modeling_t5gemma import T5GemmaEncoder

        _additional_imports["T5GemmaEncoder"] = T5GemmaEncoder
+        # Patch transformers module directly for serialization
        if not hasattr(transformers, "T5GemmaEncoder"):
            transformers.T5GemmaEncoder = T5GemmaEncoder
 except ImportError:
--- a/src/diffusers/pipelines/skyreels_v2/pipeline_skyreels_v2.py
+++ b/src/diffusers/pipelines/skyreels_v2/pipeline_skyreels_v2.py
@@ -17,7 +17,7 @@ from typing import Any, Callable

 import regex as re
 import torch
-from transformers import AutoTokenizer, T5EncoderModel, UMT5EncoderModel
+from transformers import AutoTokenizer, UMT5EncoderModel

 from ...callbacks import MultiPipelineCallbacks, PipelineCallback
 from ...loaders import SkyReelsV2LoraLoaderMixin
@@ -132,7 +132,7 @@ class SkyReelsV2Pipeline(DiffusionPipeline, SkyReelsV2LoraLoaderMixin):
    def __init__(
        self,
        tokenizer: AutoTokenizer,
-        text_encoder: T5EncoderModel | UMT5EncoderModel,
+        text_encoder: UMT5EncoderModel,
        transformer: SkyReelsV2Transformer3DModel,
        vae: AutoencoderKLWan,
        scheduler: UniPCMultistepScheduler,
--- a/src/diffusers/pipelines/skyreels_v2/pipeline_skyreels_v2_diffusion_forcing.py
+++ b/src/diffusers/pipelines/skyreels_v2/pipeline_skyreels_v2_diffusion_forcing.py
@@ -19,7 +19,7 @@ from copy import deepcopy
 from typing import Any, Callable

 import torch
-from transformers import AutoTokenizer, T5EncoderModel, UMT5EncoderModel
+from transformers import AutoTokenizer, UMT5EncoderModel

 from ...callbacks import MultiPipelineCallbacks, PipelineCallback
 from ...loaders import SkyReelsV2LoraLoaderMixin
@@ -153,7 +153,7 @@ class SkyReelsV2DiffusionForcingPipeline(DiffusionPipeline, SkyReelsV2LoraLoader
    def __init__(
        self,
        tokenizer: AutoTokenizer,
-        text_encoder: T5EncoderModel | UMT5EncoderModel,
+        text_encoder: UMT5EncoderModel,
        transformer: SkyReelsV2Transformer3DModel,
        vae: AutoencoderKLWan,
        scheduler: UniPCMultistepScheduler,
--- a/src/diffusers/pipelines/skyreels_v2/pipeline_skyreels_v2_diffusion_forcing_i2v.py
+++ b/src/diffusers/pipelines/skyreels_v2/pipeline_skyreels_v2_diffusion_forcing_i2v.py
@@ -20,7 +20,7 @@ from typing import Any, Callable

 import PIL
 import torch
-from transformers import AutoTokenizer, T5EncoderModel, UMT5EncoderModel
+from transformers import AutoTokenizer, UMT5EncoderModel

 from diffusers.image_processor import PipelineImageInput
 from diffusers.utils.torch_utils import randn_tensor
@@ -158,7 +158,7 @@ class SkyReelsV2DiffusionForcingImageToVideoPipeline(DiffusionPipeline, SkyReels
    def __init__(
        self,
        tokenizer: AutoTokenizer,
-        text_encoder: T5EncoderModel | UMT5EncoderModel,
+        text_encoder: UMT5EncoderModel,
        transformer: SkyReelsV2Transformer3DModel,
        vae: AutoencoderKLWan,
        scheduler: UniPCMultistepScheduler,
--- a/src/diffusers/pipelines/skyreels_v2/pipeline_skyreels_v2_diffusion_forcing_v2v.py
+++ b/src/diffusers/pipelines/skyreels_v2/pipeline_skyreels_v2_diffusion_forcing_v2v.py
@@ -21,7 +21,7 @@ from typing import Any, Callable

 import torch
 from PIL import Image
-from transformers import AutoTokenizer, T5EncoderModel, UMT5EncoderModel
+from transformers import AutoTokenizer, UMT5EncoderModel

 from ...callbacks import MultiPipelineCallbacks, PipelineCallback
 from ...loaders import SkyReelsV2LoraLoaderMixin
@@ -214,7 +214,7 @@ class SkyReelsV2DiffusionForcingVideoToVideoPipeline(DiffusionPipeline, SkyReels
    def __init__(
        self,
        tokenizer: AutoTokenizer,
-        text_encoder: T5EncoderModel | UMT5EncoderModel,
+        text_encoder: UMT5EncoderModel,
        transformer: SkyReelsV2Transformer3DModel,
        vae: AutoencoderKLWan,
        scheduler: UniPCMultistepScheduler,
--- a/src/diffusers/pipelines/skyreels_v2/pipeline_skyreels_v2_i2v.py
+++ b/src/diffusers/pipelines/skyreels_v2/pipeline_skyreels_v2_i2v.py
@@ -18,7 +18,7 @@ from typing import Any, Callable
 import PIL
 import regex as re
 import torch
-from transformers import AutoTokenizer, CLIPProcessor, CLIPVisionModelWithProjection, T5EncoderModel, UMT5EncoderModel
+from transformers import AutoTokenizer, CLIPProcessor, CLIPVisionModelWithProjection, UMT5EncoderModel

 from ...callbacks import MultiPipelineCallbacks, PipelineCallback
 from ...image_processor import PipelineImageInput
@@ -157,7 +157,7 @@ class SkyReelsV2ImageToVideoPipeline(DiffusionPipeline, SkyReelsV2LoraLoaderMixi
    def __init__(
        self,
        tokenizer: AutoTokenizer,
-        text_encoder: T5EncoderModel | UMT5EncoderModel,
+        text_encoder: UMT5EncoderModel,
        image_encoder: CLIPVisionModelWithProjection,
        image_processor: CLIPProcessor,
        transformer: SkyReelsV2Transformer3DModel,
--- a/src/diffusers/pipelines/transformers_loading_utils.py
+++ b/src/diffusers/pipelines/transformers_loading_utils.py
@@ -112,8 +112,6 @@ def _load_transformers_model_from_dduf(
                tensors = safetensors.torch.load(mmap)
                # Update the state dictionary with tensors
                state_dict.update(tensors)
-            # `from_pretrained` sets the model to eval mode by default, which is the
-            # correct behavior for inference. Do not call `model.train()` here.
            return cls.from_pretrained(
                pretrained_model_name_or_path=None,
                config=config,
--- a/src/diffusers/pipelines/z_image/pipeline_z_image.py
+++ b/src/diffusers/pipelines/z_image/pipeline_z_image.py
@@ -276,7 +276,7 @@ class ZImagePipeline(DiffusionPipeline, ZImageLoraLoaderMixin, FromSingleFileMix

    @property
    def do_classifier_free_guidance(self):
-        return self._guidance_scale > 0
+        return self._guidance_scale > 1

    @property
    def joint_attention_kwargs(self):
--- a/src/diffusers/utils/dynamic_modules_utils.py
+++ b/src/diffusers/utils/dynamic_modules_utils.py
@@ -299,10 +299,7 @@ def get_cached_module_file(
    # Download and cache module_file from the repo `pretrained_model_name_or_path` or grab it if it's a local file.
    pretrained_model_name_or_path = str(pretrained_model_name_or_path)

-    if subfolder is not None:
-        module_file_or_url = os.path.join(pretrained_model_name_or_path, subfolder, module_file)
-    else:
-        module_file_or_url = os.path.join(pretrained_model_name_or_path, module_file)
+    module_file_or_url = os.path.join(pretrained_model_name_or_path, module_file)

    if os.path.isfile(module_file_or_url):
        resolved_module_file = module_file_or_url
@@ -387,11 +384,7 @@ def get_cached_module_file(
                if not os.path.exists(submodule_path / module_folder):
                    os.makedirs(submodule_path / module_folder)
            module_needed = f"{module_needed}.py"
-            if subfolder is not None:
-                source_path = os.path.join(pretrained_model_name_or_path, subfolder, module_needed)
-            else:
-                source_path = os.path.join(pretrained_model_name_or_path, module_needed)
-            shutil.copyfile(source_path, submodule_path / module_needed)
+            shutil.copyfile(os.path.join(pretrained_model_name_or_path, module_needed), submodule_path / module_needed)
    else:
        # Get the commit hash
        # TODO: we will get this info in the etag soon, so retrieve it from there and not here.
--- a/test_automodel_meta.py
+++ b/test_automodel_meta.py
@@ -1,14 +0,0 @@
-import torch
-from diffusers import AutoModel
-
-repo = "meituan-longcat/LongCat-Image"
-subfolder = "transformer"
-
-config = AutoModel.load_config(repo, subfolder=subfolder)
-
-with torch.device("meta"):
-    model = AutoModel.from_config(config)
-print(f"model.config:")
-for k, v in dict(model.config).items():
-    if not k.startswith("_"):
-        print(f"  {k}: {v}")
--- a/test_dataclass_config.py
+++ b/test_dataclass_config.py
@@ -1,11 +0,0 @@
-import dataclasses
-from diffusers import AutoModel, LongCatImageTransformer2DModel
-
-config_dict = AutoModel.load_config(
-    "meituan-longcat/LongCat-Image",
-    subfolder="transformer",
-)
-# import DiT based on _class_name
-typed_config = LongCatImageTransformer2DModel._get_dataclass_from_config(config_dict)
-for f in dataclasses.fields(typed_config):
-    print(f"{f.name}: {f.type}")
--- a/test_pretrained_config.py
+++ b/test_pretrained_config.py
@@ -1,29 +0,0 @@
-import dataclasses
-import torch
-from diffusers import FluxTransformer2DModel
-from diffusers.models import AutoModel
-
-repo = "black-forest-labs/FLUX.2-dev"
-subfolder = "transformer"
-
-print("=== From load_config (no model instantiation) ===")
-config_dict = FluxTransformer2DModel.load_config(repo, subfolder=subfolder)
-tc = FluxTransformer2DModel._get_dataclass_from_config(config_dict)
-print(f"Type: {type(tc).__name__}")
-for k, v in dataclasses.asdict(tc).items():
-    print(f"  {k}: {v}")
-
-print()
-print("=== From AutoModel.from_config on meta device ===")
-with torch.device("meta"):
-    model = AutoModel.from_config(repo, subfolder=subfolder)
-print(f"model.config:")
-for k, v in dict(model.config).items():
-    if not k.startswith("_"):
-        print(f"  {k}: {v}")
-
-print()
-print("=== Comparison ===")
-dc_dict = dataclasses.asdict(tc)
-config = {k: v for k, v in dict(model.config).items() if not k.startswith("_")}
-print(f"Match: {dc_dict == config}")
--- a/tests/models/controlnets/test_models_controlnet_cosmos.py
+++ b/tests/models/controlnets/test_models_controlnet_cosmos.py
@@ -131,26 +131,6 @@ class CosmosControlNetModelTests(ModelTesterMixin, unittest.TestCase):
        self.assertIsInstance(output[0], list)
        self.assertEqual(len(output[0]), init_dict["n_controlnet_blocks"])

-    def test_condition_mask_changes_output(self):
-        """Test that condition mask affects control outputs."""
-        init_dict, inputs_dict = self.prepare_init_args_and_inputs_for_common()
-        model = self.model_class(**init_dict)
-        model.to(torch_device)
-        model.eval()
-
-        inputs_no_mask = dict(inputs_dict)
-        inputs_no_mask["condition_mask"] = torch.zeros_like(inputs_dict["condition_mask"])
-
-        with torch.no_grad():
-            output_no_mask = model(**inputs_no_mask)
-            output_with_mask = model(**inputs_dict)
-
-        self.assertEqual(len(output_no_mask.control_block_samples), len(output_with_mask.control_block_samples))
-        for no_mask_tensor, with_mask_tensor in zip(
-            output_no_mask.control_block_samples, output_with_mask.control_block_samples
-        ):
-            self.assertFalse(torch.allclose(no_mask_tensor, with_mask_tensor))
-
    def test_conditioning_scale_single(self):
        """Test that a single conditioning scale is broadcast to all blocks."""
        init_dict, inputs_dict = self.prepare_init_args_and_inputs_for_common()
--- a/tests/models/test_models_auto.py
+++ b/tests/models/test_models_auto.py
@@ -1,10 +1,6 @@
-import json
-import os
-import tempfile
 import unittest
-from unittest.mock import MagicMock, patch
+from unittest.mock import patch

-import torch
 from transformers import CLIPTextModel, LongformerModel

 from diffusers.models import AutoModel, UNet2DConditionModel
@@ -24,9 +20,7 @@ class TestAutoModel(unittest.TestCase):
        side_effect=[EnvironmentError("File not found"), {"model_type": "clip_text_model"}],
    )
    def test_load_from_config_transformers_with_subfolder(self, mock_load_config):
-        model = AutoModel.from_pretrained(
-            "hf-internal-testing/tiny-stable-diffusion-torch", subfolder="text_encoder", use_safetensors=False
-        )
+        model = AutoModel.from_pretrained("hf-internal-testing/tiny-stable-diffusion-torch", subfolder="text_encoder")
        assert isinstance(model, CLIPTextModel)

    def test_load_from_config_without_subfolder(self):
@@ -34,112 +28,5 @@ class TestAutoModel(unittest.TestCase):
        assert isinstance(model, LongformerModel)

    def test_load_from_model_index(self):
-        model = AutoModel.from_pretrained(
-            "hf-internal-testing/tiny-stable-diffusion-torch", subfolder="text_encoder", use_safetensors=False
-        )
+        model = AutoModel.from_pretrained("hf-internal-testing/tiny-stable-diffusion-torch", subfolder="text_encoder")
        assert isinstance(model, CLIPTextModel)
-
-    def test_load_dynamic_module_from_local_path_with_subfolder(self):
-        CUSTOM_MODEL_CODE = (
-            "import torch\n"
-            "from diffusers import ModelMixin, ConfigMixin\n"
-            "from diffusers.configuration_utils import register_to_config\n"
-            "\n"
-            "class CustomModel(ModelMixin, ConfigMixin):\n"
-            "    @register_to_config\n"
-            "    def __init__(self, hidden_size=8):\n"
-            "        super().__init__()\n"
-            "        self.linear = torch.nn.Linear(hidden_size, hidden_size)\n"
-            "\n"
-            "    def forward(self, x):\n"
-            "        return self.linear(x)\n"
-        )
-
-        with tempfile.TemporaryDirectory() as tmpdir:
-            subfolder = "custom_model"
-            model_dir = os.path.join(tmpdir, subfolder)
-            os.makedirs(model_dir)
-
-            with open(os.path.join(model_dir, "modeling.py"), "w") as f:
-                f.write(CUSTOM_MODEL_CODE)
-
-            config = {
-                "_class_name": "CustomModel",
-                "_diffusers_version": "0.0.0",
-                "auto_map": {"AutoModel": "modeling.CustomModel"},
-                "hidden_size": 8,
-            }
-            with open(os.path.join(model_dir, "config.json"), "w") as f:
-                json.dump(config, f)
-
-            torch.save({}, os.path.join(model_dir, "diffusion_pytorch_model.bin"))
-
-            model = AutoModel.from_pretrained(tmpdir, subfolder=subfolder, trust_remote_code=True)
-            assert model.__class__.__name__ == "CustomModel"
-            assert model.config["hidden_size"] == 8
-
-
-class TestAutoModelFromConfig(unittest.TestCase):
-    @patch(
-        "diffusers.pipelines.pipeline_loading_utils.get_class_obj_and_candidates",
-        return_value=(MagicMock(), None),
-    )
-    def test_from_config_with_dict_diffusers_class(self, mock_get_class):
-        config = {"_class_name": "UNet2DConditionModel", "sample_size": 64}
-        mock_model = MagicMock()
-        mock_get_class.return_value[0].from_config.return_value = mock_model
-
-        result = AutoModel.from_config(config)
-
-        mock_get_class.assert_called_once_with(
-            library_name="diffusers",
-            class_name="UNet2DConditionModel",
-            importable_classes=unittest.mock.ANY,
-            pipelines=None,
-            is_pipeline_module=False,
-        )
-        mock_get_class.return_value[0].from_config.assert_called_once_with(config)
-        assert result is mock_model
-
-    @patch(
-        "diffusers.pipelines.pipeline_loading_utils.get_class_obj_and_candidates",
-        return_value=(MagicMock(), None),
-    )
-    @patch("diffusers.models.AutoModel.load_config", return_value={"_class_name": "UNet2DConditionModel"})
-    def test_from_config_with_string_path(self, mock_load_config, mock_get_class):
-        mock_model = MagicMock()
-        mock_get_class.return_value[0].from_config.return_value = mock_model
-
-        result = AutoModel.from_config("hf-internal-testing/tiny-stable-diffusion-torch", subfolder="unet")
-
-        mock_load_config.assert_called_once()
-        assert result is mock_model
-
-    def test_from_config_raises_on_missing_class_info(self):
-        config = {"some_key": "some_value"}
-        with self.assertRaises(ValueError, msg="Couldn't find a model class"):
-            AutoModel.from_config(config)
-
-    @patch(
-        "diffusers.pipelines.pipeline_loading_utils.get_class_obj_and_candidates",
-        return_value=(MagicMock(), None),
-    )
-    def test_from_config_with_model_type_routes_to_transformers(self, mock_get_class):
-        config = {"model_type": "clip_text_model"}
-        mock_model = MagicMock()
-        mock_get_class.return_value[0].from_config.return_value = mock_model
-
-        result = AutoModel.from_config(config)
-
-        mock_get_class.assert_called_once_with(
-            library_name="transformers",
-            class_name="AutoModel",
-            importable_classes=unittest.mock.ANY,
-            pipelines=None,
-            is_pipeline_module=False,
-        )
-        assert result is mock_model
-
-    def test_from_config_raises_on_none(self):
-        with self.assertRaises(ValueError, msg="Please provide a `pretrained_model_name_or_path_or_dict`"):
-            AutoModel.from_config(None)
--- a/tests/modular_pipelines/test_modular_pipelines_common.py
+++ b/tests/modular_pipelines/test_modular_pipelines_common.py
@@ -1,5 +1,4 @@
 import gc
-import json
 import os
 import tempfile
 from typing import Callable
@@ -351,33 +350,6 @@ class ModularPipelineTesterMixin:

        assert torch.abs(image_slices[0] - image_slices[1]).max() < 1e-3

-    def test_modular_index_consistency(self):
-        pipe = self.get_pipeline()
-        components_spec = pipe._component_specs
-        components = sorted(components_spec.keys())
-
-        with tempfile.TemporaryDirectory() as tmpdir:
-            pipe.save_pretrained(tmpdir)
-            index_file = os.path.join(tmpdir, "modular_model_index.json")
-            assert os.path.exists(index_file)
-
-            with open(index_file) as f:
-                index_contents = json.load(f)
-
-            compulsory_keys = {"_blocks_class_name", "_class_name", "_diffusers_version"}
-            for k in compulsory_keys:
-                assert k in index_contents
-
-            to_check_attrs = {"pretrained_model_name_or_path", "revision", "subfolder"}
-            for component in components:
-                spec = components_spec[component]
-                for attr in to_check_attrs:
-                    if getattr(spec, "pretrained_model_name_or_path", None) is not None:
-                        for attr in to_check_attrs:
-                            assert component in index_contents, f"{component} should be present in index but isn't."
-                            attr_value_from_index = index_contents[component][2][attr]
-                            assert getattr(spec, attr) == attr_value_from_index
-
    def test_workflow_map(self):
        blocks = self.pipeline_blocks_class()
        if blocks._workflow_map is None:
@@ -715,6 +687,18 @@ class TestLoadComponentsSkipBehavior:
        assert pipe.unet is not None
        assert getattr(pipe, "vae", None) is None

+    def test_load_components_selective_loading_incremental(self):
+        """Loading a subset of components should not affect already-loaded components."""
+        pipe = ModularPipeline.from_pretrained("hf-internal-testing/tiny-stable-diffusion-xl-pipe")
+
+        pipe.load_components(names="unet", torch_dtype=torch.float32)
+        pipe.load_components(names="text_encoder", torch_dtype=torch.float32)
+
+        assert hasattr(pipe, "unet")
+        assert pipe.unet is not None
+        assert hasattr(pipe, "text_encoder")
+        assert pipe.text_encoder is not None
+
    def test_load_components_skips_invalid_pretrained_path(self):
        pipe = ModularPipeline.from_pretrained("hf-internal-testing/tiny-stable-diffusion-xl-pipe")

@@ -777,6 +761,36 @@ class TestCustomModelSavePretrained:
        for key in original_state_dict:
            assert torch.equal(original_state_dict[key], loaded_state_dict[key]), f"Mismatch in {key}"

+    def test_save_pretrained_updates_index_for_model_with_no_load_id(self, tmp_path):
+        """When a component without _diffusers_load_id (custom/local model) is saved,
+        modular_model_index.json should point to the save directory."""
+        import json
+
+        from diffusers import UNet2DConditionModel
+
+        pipe = ModularPipeline.from_pretrained("hf-internal-testing/tiny-stable-diffusion-xl-pipe")
+        pipe.load_components(torch_dtype=torch.float32)
+
+        unet = UNet2DConditionModel.from_pretrained(
+            "hf-internal-testing/tiny-stable-diffusion-xl-pipe", subfolder="unet"
+        )
+        assert not hasattr(unet, "_diffusers_load_id")
+
+        pipe.update_components(unet=unet)
+
+        save_dir = str(tmp_path / "my-pipeline")
+        pipe.save_pretrained(save_dir)
+
+        with open(os.path.join(save_dir, "modular_model_index.json")) as f:
+            index = json.load(f)
+
+        _library, _cls, unet_spec = index["unet"]
+        assert unet_spec["pretrained_model_name_or_path"] == save_dir
+        assert unet_spec["subfolder"] == "unet"
+
+        _library, _cls, vae_spec = index["vae"]
+        assert vae_spec["pretrained_model_name_or_path"] == "hf-internal-testing/tiny-stable-diffusion-xl-pipe"
+
    def test_save_pretrained_overwrite_modular_index(self, tmp_path):
        """With overwrite_modular_index=True, all component references should point to the save directory."""
        import json
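The new test above unpacks each entry of `modular_model_index.json` as a three-element list whose last element is a spec dictionary. A minimal standard-library sketch of that lookup follows; the helper names and the `example-org/tiny-pipe` repo id are hypothetical, and the index contents are hand-written for illustration rather than produced by `save_pretrained`.

```python
import json
import os


def example_index(save_dir):
    # Hypothetical contents: a locally saved unet and a vae that still
    # points at its original (made-up) Hub repo.
    return {
        "unet": [
            "diffusers",
            "UNet2DConditionModel",
            {"pretrained_model_name_or_path": save_dir, "subfolder": "unet"},
        ],
        "vae": [
            "diffusers",
            "AutoencoderKL",
            {"pretrained_model_name_or_path": "example-org/tiny-pipe", "subfolder": "vae"},
        ],
    }


def write_index(save_dir, index):
    # Write the index under the file name the tests read back.
    path = os.path.join(save_dir, "modular_model_index.json")
    with open(path, "w") as f:
        json.dump(index, f)
    return path


def component_source(index_path, name):
    # Each entry is [library, class_name, spec]; return where it loads from.
    with open(index_path) as f:
        index = json.load(f)
    _library, _cls, spec = index[name]
    return spec["pretrained_model_name_or_path"], spec.get("subfolder")
```

With this layout, a component saved locally resolves to the save directory, while an untouched component keeps its original source, which is the distinction the test asserts.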
--- a/tests/pipelines/audioldm2/test_audioldm2.py
+++ b/tests/pipelines/audioldm2/test_audioldm2.py
@@ -282,8 +282,6 @@ class AudioLDM2PipelineFastTests(PipelineTesterMixin, unittest.TestCase):
        text_inputs = text_inputs["input_ids"].to(torch_device)

        clap_prompt_embeds = audioldm_pipe.text_encoder.get_text_features(text_inputs)
-        if hasattr(clap_prompt_embeds, "pooler_output"):
-            clap_prompt_embeds = clap_prompt_embeds.pooler_output
        clap_prompt_embeds = clap_prompt_embeds[:, None, :]

        text_inputs = audioldm_pipe.tokenizer_2(
@@ -343,8 +341,6 @@ class AudioLDM2PipelineFastTests(PipelineTesterMixin, unittest.TestCase):
            text_inputs = text_inputs["input_ids"].to(torch_device)

            clap_prompt_embeds = audioldm_pipe.text_encoder.get_text_features(text_inputs)
-            if hasattr(clap_prompt_embeds, "pooler_output"):
-                clap_prompt_embeds = clap_prompt_embeds.pooler_output
            clap_prompt_embeds = clap_prompt_embeds[:, None, :]

            text_inputs = audioldm_pipe.tokenizer_2(
--- a/tests/pipelines/bria/test_pipeline_bria.py
+++ b/tests/pipelines/bria/test_pipeline_bria.py
@@ -19,7 +19,7 @@ import unittest
 import numpy as np
 import torch
 from huggingface_hub import hf_hub_download
-from transformers import AutoConfig, T5EncoderModel, T5TokenizerFast
+from transformers import T5EncoderModel, T5TokenizerFast

 from diffusers import (
    AutoencoderKL,
@@ -89,8 +89,7 @@ class BriaPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
        scheduler = FlowMatchEulerDiscreteScheduler()

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = T5TokenizerFast.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
--- a/tests/pipelines/chroma/test_pipeline_chroma.py
+++ b/tests/pipelines/chroma/test_pipeline_chroma.py
@@ -2,7 +2,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import AutoencoderKL, ChromaPipeline, ChromaTransformer2DModel, FlowMatchEulerDiscreteScheduler

@@ -41,8 +41,7 @@ class ChromaPipelineFastTests(
        )

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

--- a/tests/pipelines/chroma/test_pipeline_chroma_img2img.py
+++ b/tests/pipelines/chroma/test_pipeline_chroma_img2img.py
@@ -3,7 +3,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import AutoencoderKL, ChromaImg2ImgPipeline, ChromaTransformer2DModel, FlowMatchEulerDiscreteScheduler

@@ -42,8 +42,7 @@ class ChromaImg2ImgPipelineFastTests(
        )

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

--- a/tests/pipelines/chronoedit/test_chronoedit.py
+++ b/tests/pipelines/chronoedit/test_chronoedit.py
@@ -17,7 +17,6 @@ import unittest
 import torch
 from PIL import Image
 from transformers import (
-    AutoConfig,
    AutoTokenizer,
    CLIPImageProcessor,
    CLIPVisionConfig,
@@ -72,8 +71,7 @@ class ChronoEditPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
        torch.manual_seed(0)
        # TODO: impl FlowDPMSolverMultistepScheduler
        scheduler = FlowMatchEulerDiscreteScheduler(shift=7.0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        torch.manual_seed(0)
--- a/tests/pipelines/cogvideo/test_cogvideox.py
+++ b/tests/pipelines/cogvideo/test_cogvideox.py
@@ -18,7 +18,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import AutoencoderKLCogVideoX, CogVideoXPipeline, CogVideoXTransformer3DModel, DDIMScheduler

@@ -117,8 +117,7 @@ class CogVideoXPipelineFastTests(

        torch.manual_seed(0)
        scheduler = DDIMScheduler()
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
@@ -236,9 +235,6 @@ class CogVideoXPipelineFastTests(
            return

        components = self.get_dummy_components()
-        for key in components:
-            if "text_encoder" in key and hasattr(components[key], "eval"):
-                components[key].eval()
        pipe = self.pipeline_class(**components)
        for component in pipe.components.values():
            if hasattr(component, "set_default_attn_processor"):
--- a/tests/pipelines/cogvideo/test_cogvideox_fun_control.py
+++ b/tests/pipelines/cogvideo/test_cogvideox_fun_control.py
@@ -18,7 +18,7 @@ import unittest
 import numpy as np
 import torch
 from PIL import Image
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import AutoencoderKLCogVideoX, CogVideoXFunControlPipeline, CogVideoXTransformer3DModel, DDIMScheduler

@@ -104,8 +104,7 @@ class CogVideoXFunControlPipelineFastTests(PipelineTesterMixin, unittest.TestCas

        torch.manual_seed(0)
        scheduler = DDIMScheduler()
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
@@ -229,9 +228,6 @@ class CogVideoXFunControlPipelineFastTests(PipelineTesterMixin, unittest.TestCas
            return

        components = self.get_dummy_components()
-        for key in components:
-            if "text_encoder" in key and hasattr(components[key], "eval"):
-                components[key].eval()
        pipe = self.pipeline_class(**components)
        for component in pipe.components.values():
            if hasattr(component, "set_default_attn_processor"):
--- a/tests/pipelines/cogvideo/test_cogvideox_image2video.py
+++ b/tests/pipelines/cogvideo/test_cogvideox_image2video.py
@@ -19,7 +19,7 @@ import unittest
 import numpy as np
 import torch
 from PIL import Image
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import AutoencoderKLCogVideoX, CogVideoXImageToVideoPipeline, CogVideoXTransformer3DModel, DDIMScheduler
 from diffusers.utils import load_image
@@ -113,8 +113,7 @@ class CogVideoXImageToVideoPipelineFastTests(PipelineTesterMixin, unittest.TestC

        torch.manual_seed(0)
        scheduler = DDIMScheduler()
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
@@ -238,9 +237,6 @@ class CogVideoXImageToVideoPipelineFastTests(PipelineTesterMixin, unittest.TestC
            return

        components = self.get_dummy_components()
-        for key in components:
-            if "text_encoder" in key and hasattr(components[key], "eval"):
-                components[key].eval()
        pipe = self.pipeline_class(**components)
        for component in pipe.components.values():
            if hasattr(component, "set_default_attn_processor"):
--- a/tests/pipelines/cogvideo/test_cogvideox_video2video.py
+++ b/tests/pipelines/cogvideo/test_cogvideox_video2video.py
@@ -18,7 +18,7 @@ import unittest
 import numpy as np
 import torch
 from PIL import Image
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import AutoencoderKLCogVideoX, CogVideoXTransformer3DModel, CogVideoXVideoToVideoPipeline, DDIMScheduler

@@ -99,8 +99,7 @@ class CogVideoXVideoToVideoPipelineFastTests(PipelineTesterMixin, unittest.TestC

        torch.manual_seed(0)
        scheduler = DDIMScheduler()
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
--- a/tests/pipelines/cogview3/test_cogview3plus.py
+++ b/tests/pipelines/cogview3/test_cogview3plus.py
@@ -18,7 +18,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import AutoencoderKL, CogVideoXDDIMScheduler, CogView3PlusPipeline, CogView3PlusTransformer2DModel

@@ -89,8 +89,7 @@ class CogView3PlusPipelineFastTests(PipelineTesterMixin, unittest.TestCase):

        torch.manual_seed(0)
        scheduler = CogVideoXDDIMScheduler()
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
--- a/tests/pipelines/cogview4/test_cogview4.py
+++ b/tests/pipelines/cogview4/test_cogview4.py
@@ -108,7 +108,7 @@ class CogView4PipelineFastTests(PipelineTesterMixin, unittest.TestCase):
            generator = torch.Generator(device=device).manual_seed(seed)
        inputs = {
            "prompt": "dance monkey",
-            "negative_prompt": "bad",
+            "negative_prompt": "",
            "generator": generator,
            "num_inference_steps": 2,
            "guidance_scale": 6.0,
--- a/tests/pipelines/consisid/test_consisid.py
+++ b/tests/pipelines/consisid/test_consisid.py
@@ -19,7 +19,7 @@ import unittest
 import numpy as np
 import torch
 from PIL import Image
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import AutoencoderKLCogVideoX, ConsisIDPipeline, ConsisIDTransformer3DModel, DDIMScheduler
 from diffusers.utils import load_image
@@ -122,8 +122,7 @@ class ConsisIDPipelineFastTests(PipelineTesterMixin, unittest.TestCase):

        torch.manual_seed(0)
        scheduler = DDIMScheduler()
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
@@ -249,9 +248,6 @@ class ConsisIDPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
            return

        components = self.get_dummy_components()
-        for key in components:
-            if "text_encoder" in key and hasattr(components[key], "eval"):
-                components[key].eval()
        pipe = self.pipeline_class(**components)
        for component in pipe.components.values():
            if hasattr(component, "set_default_attn_processor"):
--- a/tests/pipelines/controlnet_flux/test_controlnet_flux.py
+++ b/tests/pipelines/controlnet_flux/test_controlnet_flux.py
@@ -19,7 +19,7 @@ import unittest
 import numpy as np
 import torch
 from huggingface_hub import hf_hub_download
-from transformers import AutoConfig, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5TokenizerFast
+from transformers import CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5TokenizerFast

 from diffusers import (
    AutoencoderKL,
@@ -97,8 +97,7 @@ class FluxControlNetPipelineFastTests(unittest.TestCase, PipelineTesterMixin, Fl
        text_encoder = CLIPTextModel(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = T5TokenizerFast.from_pretrained("hf-internal-testing/tiny-random-t5")
--- a/tests/pipelines/controlnet_flux/test_controlnet_flux_img2img.py
+++ b/tests/pipelines/controlnet_flux/test_controlnet_flux_img2img.py
@@ -2,7 +2,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel

 from diffusers import (
    AutoencoderKL,
@@ -13,7 +13,9 @@ from diffusers import (
 )
 from diffusers.utils.torch_utils import randn_tensor

-from ...testing_utils import torch_device
+from ...testing_utils import (
+    torch_device,
+)
 from ..test_pipelines_common import PipelineTesterMixin, check_qkv_fused_layers_exist


@@ -68,8 +70,7 @@ class FluxControlNetImg2ImgPipelineFastTests(unittest.TestCase, PipelineTesterMi
        text_encoder = CLIPTextModel(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
--- a/tests/pipelines/controlnet_flux/test_controlnet_flux_inpaint.py
+++ b/tests/pipelines/controlnet_flux/test_controlnet_flux_inpaint.py
@@ -3,7 +3,15 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel
+
+# torch_device,  # {{ edit_1 }} Removed unused import
+from transformers import (
+    AutoTokenizer,
+    CLIPTextConfig,
+    CLIPTextModel,
+    CLIPTokenizer,
+    T5EncoderModel,
+)

 from diffusers import (
    AutoencoderKL,
@@ -14,7 +22,11 @@ from diffusers import (
 )
 from diffusers.utils.torch_utils import randn_tensor

-from ...testing_utils import enable_full_determinism, floats_tensor, torch_device
+from ...testing_utils import (
+    enable_full_determinism,
+    floats_tensor,
+    torch_device,
+)
 from ..test_pipelines_common import PipelineTesterMixin


@@ -73,8 +85,7 @@ class FluxControlNetInpaintPipelineTests(unittest.TestCase, PipelineTesterMixin)
        text_encoder = CLIPTextModel(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
--- a/tests/pipelines/controlnet_hunyuandit/test_controlnet_hunyuandit.py
+++ b/tests/pipelines/controlnet_hunyuandit/test_controlnet_hunyuandit.py
@@ -18,7 +18,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, BertModel, T5EncoderModel
+from transformers import AutoTokenizer, BertModel, T5EncoderModel

 from diffusers import (
    AutoencoderKL,
@@ -96,10 +96,7 @@ class HunyuanDiTControlNetPipelineFastTests(unittest.TestCase, PipelineTesterMix
        scheduler = DDPMScheduler()
        text_encoder = BertModel.from_pretrained("hf-internal-testing/tiny-random-BertModel")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-BertModel")
-
-        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer_2 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
--- a/tests/pipelines/controlnet_sd3/test_controlnet_inpaint_sd3.py
+++ b/tests/pipelines/controlnet_sd3/test_controlnet_inpaint_sd3.py
@@ -17,14 +17,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import (
-    AutoConfig,
-    AutoTokenizer,
-    CLIPTextConfig,
-    CLIPTextModelWithProjection,
-    CLIPTokenizer,
-    T5EncoderModel,
-)
+from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModelWithProjection, CLIPTokenizer, T5EncoderModel

 from diffusers import (
    AutoencoderKL,
@@ -35,7 +28,10 @@ from diffusers import (
 from diffusers.models import SD3ControlNetModel
 from diffusers.utils.torch_utils import randn_tensor

-from ...testing_utils import enable_full_determinism, torch_device
+from ...testing_utils import (
+    enable_full_determinism,
+    torch_device,
+)
 from ..test_pipelines_common import PipelineTesterMixin


@@ -107,8 +103,7 @@ class StableDiffusion3ControlInpaintNetPipelineFastTests(unittest.TestCase, Pipe
        text_encoder_2 = CLIPTextModelWithProjection(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_3 = T5EncoderModel(config)
+        text_encoder_3 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
--- a/tests/pipelines/controlnet_sd3/test_controlnet_sd3.py
+++ b/tests/pipelines/controlnet_sd3/test_controlnet_sd3.py
@@ -18,14 +18,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import (
-    AutoConfig,
-    AutoTokenizer,
-    CLIPTextConfig,
-    CLIPTextModelWithProjection,
-    CLIPTokenizer,
-    T5EncoderModel,
-)
+from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModelWithProjection, CLIPTokenizer, T5EncoderModel

 from diffusers import (
    AutoencoderKL,
@@ -124,8 +117,7 @@ class StableDiffusion3ControlNetPipelineFastTests(unittest.TestCase, PipelineTes
        text_encoder_2 = CLIPTextModelWithProjection(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_3 = T5EncoderModel(config)
+        text_encoder_3 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
--- a/tests/pipelines/cosmos/test_cosmos.py
+++ b/tests/pipelines/cosmos/test_cosmos.py
@@ -20,7 +20,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import AutoencoderKLCosmos, CosmosTextToWorldPipeline, CosmosTransformer3DModel, EDMEulerScheduler

@@ -107,8 +107,7 @@ class CosmosTextToWorldPipelineFastTests(PipelineTesterMixin, unittest.TestCase)
            rho=7.0,
            final_sigmas_type="sigma_min",
        )
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
@@ -233,9 +232,6 @@ class CosmosTextToWorldPipelineFastTests(PipelineTesterMixin, unittest.TestCase)
            return

        components = self.get_dummy_components()
-        for key in components:
-            if "text_encoder" in key and hasattr(components[key], "eval"):
-                components[key].eval()
        pipe = self.pipeline_class(**components)
        for component in pipe.components.values():
            if hasattr(component, "set_default_attn_processor"):
--- a/tests/pipelines/cosmos/test_cosmos2_5_transfer.py
+++ b/tests/pipelines/cosmos/test_cosmos2_5_transfer.py
@@ -55,7 +55,7 @@ class Cosmos2_5_TransferWrapper(Cosmos2_5_TransferPipeline):
 class Cosmos2_5_TransferPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
    pipeline_class = Cosmos2_5_TransferWrapper
    params = TEXT_TO_IMAGE_PARAMS - {"cross_attention_kwargs"}
-    batch_params = TEXT_TO_IMAGE_BATCH_PARAMS.union({"controls"})
+    batch_params = TEXT_TO_IMAGE_BATCH_PARAMS
    image_params = TEXT_TO_IMAGE_IMAGE_PARAMS
    image_latents_params = TEXT_TO_IMAGE_IMAGE_PARAMS
    required_optional_params = frozenset(
@@ -176,19 +176,15 @@ class Cosmos2_5_TransferPipelineFastTests(PipelineTesterMixin, unittest.TestCase
        else:
            generator = torch.Generator(device=device).manual_seed(seed)

-        controls_generator = torch.Generator(device="cpu").manual_seed(seed)
-
        inputs = {
            "prompt": "dance monkey",
            "negative_prompt": "bad quality",
-            "controls": [torch.randn(3, 32, 32, generator=controls_generator) for _ in range(5)],
            "generator": generator,
            "num_inference_steps": 2,
            "guidance_scale": 3.0,
            "height": 32,
            "width": 32,
            "num_frames": 3,
-            "num_frames_per_chunk": 16,
            "max_sequence_length": 16,
            "output_type": "pt",
        }
@@ -216,56 +212,6 @@ class Cosmos2_5_TransferPipelineFastTests(PipelineTesterMixin, unittest.TestCase
        self.assertEqual(generated_video.shape, (3, 3, 32, 32))
        self.assertTrue(torch.isfinite(generated_video).all())

-    def test_inference_autoregressive_multi_chunk(self):
-        device = "cpu"
-
-        components = self.get_dummy_components()
-        pipe = self.pipeline_class(**components)
-        pipe.to(device)
-        pipe.set_progress_bar_config(disable=None)
-
-        inputs = self.get_dummy_inputs(device)
-        inputs["num_frames"] = 5
-        inputs["num_frames_per_chunk"] = 3
-        inputs["num_ar_conditional_frames"] = 1
-
-        video = pipe(**inputs).frames
-        generated_video = video[0]
-        self.assertEqual(generated_video.shape, (5, 3, 32, 32))
-        self.assertTrue(torch.isfinite(generated_video).all())
-
-    def test_inference_autoregressive_multi_chunk_no_condition_frames(self):
-        device = "cpu"
-
-        components = self.get_dummy_components()
-        pipe = self.pipeline_class(**components)
-        pipe.to(device)
-        pipe.set_progress_bar_config(disable=None)
-
-        inputs = self.get_dummy_inputs(device)
-        inputs["num_frames"] = 5
-        inputs["num_frames_per_chunk"] = 3
-        inputs["num_ar_conditional_frames"] = 0
-
-        video = pipe(**inputs).frames
-        generated_video = video[0]
-        self.assertEqual(generated_video.shape, (5, 3, 32, 32))
-        self.assertTrue(torch.isfinite(generated_video).all())
-
-    def test_num_frames_per_chunk_above_rope_raises(self):
-        device = "cpu"
-
-        components = self.get_dummy_components()
-        pipe = self.pipeline_class(**components)
-        pipe.to(device)
-        pipe.set_progress_bar_config(disable=None)
-
-        inputs = self.get_dummy_inputs(device)
-        inputs["num_frames_per_chunk"] = 17
-
-        with self.assertRaisesRegex(ValueError, "too large for RoPE setting"):
-            pipe(**inputs)
-
    def test_inference_with_controls(self):
        """Test inference with control inputs (ControlNet)."""
        device = "cpu"
@@ -276,13 +222,13 @@ class Cosmos2_5_TransferPipelineFastTests(PipelineTesterMixin, unittest.TestCase
        pipe.set_progress_bar_config(disable=None)

        inputs = self.get_dummy_inputs(device)
-        inputs["controls"] = [torch.randn(3, 32, 32) for _ in range(5)]  # list of 5 frames (C, H, W)
+        # Add control video input - should be a video tensor
+        inputs["controls"] = [torch.randn(3, 3, 32, 32)]  # num_frames, channels, height, width
        inputs["controls_conditioning_scale"] = 1.0
-        inputs["num_frames"] = None

        video = pipe(**inputs).frames
        generated_video = video[0]
-        self.assertEqual(generated_video.shape, (5, 3, 32, 32))
+        self.assertEqual(generated_video.shape, (3, 3, 32, 32))
        self.assertTrue(torch.isfinite(generated_video).all())

    def test_callback_inputs(self):
--- a/tests/pipelines/cosmos/test_cosmos2_text2image.py
+++ b/tests/pipelines/cosmos/test_cosmos2_text2image.py
@@ -20,7 +20,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import (
    AutoencoderKLWan,
@@ -95,8 +95,7 @@ class Cosmos2TextToImagePipelineFastTests(PipelineTesterMixin, unittest.TestCase

        torch.manual_seed(0)
        scheduler = FlowMatchEulerDiscreteScheduler(use_karras_sigmas=True)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
--- a/tests/pipelines/cosmos/test_cosmos2_video2world.py
+++ b/tests/pipelines/cosmos/test_cosmos2_video2world.py
@@ -21,7 +21,7 @@ import unittest
 import numpy as np
 import PIL.Image
 import torch
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import (
    AutoencoderKLWan,
@@ -96,8 +96,7 @@ class Cosmos2VideoToWorldPipelineFastTests(PipelineTesterMixin, unittest.TestCas

        torch.manual_seed(0)
        scheduler = FlowMatchEulerDiscreteScheduler(use_karras_sigmas=True)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
--- a/tests/pipelines/cosmos/test_cosmos_video2world.py
+++ b/tests/pipelines/cosmos/test_cosmos_video2world.py
@@ -21,7 +21,7 @@ import unittest
 import numpy as np
 import PIL.Image
 import torch
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import AutoencoderKLCosmos, CosmosTransformer3DModel, CosmosVideoToWorldPipeline, EDMEulerScheduler

@@ -108,8 +108,7 @@ class CosmosVideoToWorldPipelineFastTests(PipelineTesterMixin, unittest.TestCase
            rho=7.0,
            final_sigmas_type="sigma_min",
        )
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
@@ -246,9 +245,6 @@ class CosmosVideoToWorldPipelineFastTests(PipelineTesterMixin, unittest.TestCase
            return

        components = self.get_dummy_components()
-        for key in components:
-            if "text_encoder" in key and hasattr(components[key], "eval"):
-                components[key].eval()
        pipe = self.pipeline_class(**components)
        for component in pipe.components.values():
            if hasattr(component, "set_default_attn_processor"):
--- a/tests/pipelines/deepfloyd_if/init.py
+++ b/tests/pipelines/deepfloyd_if/init.py
@@ -2,7 +2,7 @@ import tempfile

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import DDPMScheduler, UNet2DConditionModel
 from diffusers.models.attention_processor import AttnAddedKVProcessor
@@ -18,8 +18,7 @@ from ..test_pipelines_common import to_np
 class IFPipelineTesterMixin:
    def _get_dummy_components(self):
        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        torch.manual_seed(0)
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
@@ -76,8 +75,7 @@ class IFPipelineTesterMixin:

    def _get_superresolution_dummy_components(self):
        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        torch.manual_seed(0)
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
@@ -252,9 +250,6 @@ class IFPipelineTesterMixin:
    # This should be handled in the base test and then this method can be removed.
    def _test_save_load_local(self):
        components = self.get_dummy_components()
-        for key in components:
-            if "text_encoder" in key and hasattr(components[key], "eval"):
-                components[key].eval()
        pipe = self.pipeline_class(**components)
        pipe.to(torch_device)
        pipe.set_progress_bar_config(disable=None)
--- a/tests/pipelines/deepfloyd_if/test_if.py
+++ b/tests/pipelines/deepfloyd_if/test_if.py
@@ -18,7 +18,9 @@ import unittest

 import torch

-from diffusers import IFPipeline
+from diffusers import (
+    IFPipeline,
+)
 from diffusers.models.attention_processor import AttnAddedKVProcessor
 from diffusers.utils.import_utils import is_xformers_available

--- a/tests/pipelines/flux/test_pipeline_flux.py
+++ b/tests/pipelines/flux/test_pipeline_flux.py
@@ -4,7 +4,7 @@ import unittest
 import numpy as np
 import torch
 from huggingface_hub import hf_hub_download
-from transformers import AutoConfig, AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel

 from diffusers import (
    AutoencoderKL,
@@ -93,8 +93,7 @@ class FluxPipelineFastTests(
        text_encoder = CLIPTextModel(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
--- a/tests/pipelines/flux/test_pipeline_flux_control.py
+++ b/tests/pipelines/flux/test_pipeline_flux_control.py
@@ -3,7 +3,7 @@ import unittest
 import numpy as np
 import torch
 from PIL import Image
-from transformers import AutoConfig, AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel

 from diffusers import AutoencoderKL, FlowMatchEulerDiscreteScheduler, FluxControlPipeline, FluxTransformer2DModel

@@ -53,8 +53,7 @@ class FluxControlPipelineFastTests(unittest.TestCase, PipelineTesterMixin):
        text_encoder = CLIPTextModel(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
--- a/tests/pipelines/flux/test_pipeline_flux_control_img2img.py
+++ b/tests/pipelines/flux/test_pipeline_flux_control_img2img.py
@@ -3,7 +3,7 @@ import unittest
 import numpy as np
 import torch
 from PIL import Image
-from transformers import AutoConfig, AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel

 from diffusers import (
    AutoencoderKL,
@@ -57,8 +57,7 @@ class FluxControlImg2ImgPipelineFastTests(unittest.TestCase, PipelineTesterMixin
        text_encoder = CLIPTextModel(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
--- a/tests/pipelines/flux/test_pipeline_flux_control_inpaint.py
+++ b/tests/pipelines/flux/test_pipeline_flux_control_inpaint.py
@@ -3,7 +3,7 @@ import unittest
 import numpy as np
 import torch
 from PIL import Image
-from transformers import AutoConfig, AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel

 from diffusers import (
    AutoencoderKL,
@@ -58,8 +58,7 @@ class FluxControlInpaintPipelineFastTests(unittest.TestCase, PipelineTesterMixin
        text_encoder = CLIPTextModel(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
--- a/tests/pipelines/flux/test_pipeline_flux_fill.py
+++ b/tests/pipelines/flux/test_pipeline_flux_fill.py
@@ -3,7 +3,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel

 from diffusers import AutoencoderKL, FlowMatchEulerDiscreteScheduler, FluxFillPipeline, FluxTransformer2DModel

@@ -58,8 +58,7 @@ class FluxFillPipelineFastTests(unittest.TestCase, PipelineTesterMixin):
        text_encoder = CLIPTextModel(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
--- a/tests/pipelines/flux/test_pipeline_flux_img2img.py
+++ b/tests/pipelines/flux/test_pipeline_flux_img2img.py
@@ -3,7 +3,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel

 from diffusers import AutoencoderKL, FlowMatchEulerDiscreteScheduler, FluxImg2ImgPipeline, FluxTransformer2DModel

@@ -55,8 +55,7 @@ class FluxImg2ImgPipelineFastTests(unittest.TestCase, PipelineTesterMixin, FluxI
        text_encoder = CLIPTextModel(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
--- a/tests/pipelines/flux/test_pipeline_flux_inpaint.py
+++ b/tests/pipelines/flux/test_pipeline_flux_inpaint.py
@@ -3,7 +3,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel

 from diffusers import AutoencoderKL, FlowMatchEulerDiscreteScheduler, FluxInpaintPipeline, FluxTransformer2DModel

@@ -55,8 +55,7 @@ class FluxInpaintPipelineFastTests(unittest.TestCase, PipelineTesterMixin, FluxI
        text_encoder = CLIPTextModel(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
--- a/tests/pipelines/flux/test_pipeline_flux_kontext.py
+++ b/tests/pipelines/flux/test_pipeline_flux_kontext.py
@@ -3,7 +3,7 @@ import unittest
 import numpy as np
 import PIL.Image
 import torch
-from transformers import AutoConfig, AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel

 from diffusers import (
    AutoencoderKL,
@@ -79,8 +79,7 @@ class FluxKontextPipelineFastTests(
        text_encoder = CLIPTextModel(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
--- a/tests/pipelines/flux/test_pipeline_flux_kontext_inpaint.py
+++ b/tests/pipelines/flux/test_pipeline_flux_kontext_inpaint.py
@@ -3,7 +3,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel

 from diffusers import (
    AutoencoderKL,
@@ -79,8 +79,7 @@ class FluxKontextInpaintPipelineFastTests(
        text_encoder = CLIPTextModel(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")
        tokenizer_2 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
--- a/tests/pipelines/glm_image/test_glm_image.py
+++ b/tests/pipelines/glm_image/test_glm_image.py
@@ -16,7 +16,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import AutoencoderKL, FlowMatchEulerDiscreteScheduler, GlmImagePipeline, GlmImageTransformer2DModel
 from diffusers.utils import is_transformers_version
@@ -57,8 +57,7 @@ class GlmImagePipelineFastTests(PipelineTesterMixin, unittest.TestCase):

    def get_dummy_components(self):
        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        glm_config = GlmImageConfig(
--- a/tests/pipelines/hidream_image/test_pipeline_hidream.py
+++ b/tests/pipelines/hidream_image/test_pipeline_hidream.py
@@ -18,7 +18,6 @@ import unittest
 import numpy as np
 import torch
 from transformers import (
-    AutoConfig,
    AutoTokenizer,
    CLIPTextConfig,
    CLIPTextModelWithProjection,
@@ -95,8 +94,7 @@ class HiDreamImagePipelineFastTests(PipelineTesterMixin, unittest.TestCase):
        text_encoder_2 = CLIPTextModelWithProjection(clip_text_encoder_config)

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_3 = T5EncoderModel(config)
+        text_encoder_3 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        torch.manual_seed(0)
        text_encoder_4 = LlamaForCausalLM.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM")
@@ -151,12 +149,12 @@ class HiDreamImagePipelineFastTests(PipelineTesterMixin, unittest.TestCase):
        self.assertEqual(generated_image.shape, (128, 128, 3))

        # fmt: off
-        expected_slice = np.array([0.4501, 0.5256, 0.4207, 0.5783, 0.4842, 0.4833, 0.4441, 0.5112, 0.6587, 0.3169, 0.7308, 0.5927, 0.6251, 0.5509, 0.5355, 0.5969])
+        expected_slice = np.array([0.4507, 0.5256, 0.4205, 0.5791, 0.4848, 0.4831, 0.4443, 0.5107, 0.6586, 0.3163, 0.7318, 0.5933, 0.6252, 0.5512, 0.5357, 0.5983])
        # fmt: on

        generated_slice = generated_image.flatten()
        generated_slice = np.concatenate([generated_slice[:8], generated_slice[-8:]])
-        self.assertTrue(np.allclose(generated_slice, expected_slice, atol=5e-3))
+        self.assertTrue(np.allclose(generated_slice, expected_slice, atol=1e-3))

    def test_inference_batch_single_identical(self):
        super().test_inference_batch_single_identical(expected_max_diff=3e-4)
--- a/tests/pipelines/hunyuan_image_21/test_hunyuanimage.py
+++ b/tests/pipelines/hunyuan_image_21/test_hunyuanimage.py
@@ -223,7 +223,7 @@ class HunyuanImagePipelineFastTests(
        self.assertEqual(generated_image.shape, (3, 16, 16))

        expected_slice_np = np.array(
-            [0.6068114, 0.48716035, 0.5984431, 0.60241306, 0.48849544, 0.5624479, 0.53696984, 0.58964247, 0.54248774]
+            [0.61494756, 0.49616697, 0.60327923, 0.6115793, 0.49047345, 0.56977504, 0.53066164, 0.58880305, 0.5570612]
        )
        output_slice = generated_image[0, -3:, -3:].flatten().cpu().numpy()

--- a/tests/pipelines/hunyuan_video/test_hunyuan_image2video.py
+++ b/tests/pipelines/hunyuan_video/test_hunyuan_image2video.py
@@ -233,7 +233,7 @@ class HunyuanVideoImageToVideoPipelineFastTests(
        self.assertEqual(generated_video.shape, (5, 3, 16, 16))

        # fmt: off
-        expected_slice = torch.tensor([0.4441, 0.4790, 0.4485, 0.5748, 0.3539, 0.1553, 0.2707, 0.3594, 0.5331, 0.6645, 0.6799, 0.5257, 0.5092, 0.3450, 0.4276, 0.4127])
+        expected_slice = torch.tensor([0.444, 0.479, 0.4485, 0.5752, 0.3539, 0.1548, 0.2706, 0.3593, 0.5323, 0.6635, 0.6795, 0.5255, 0.5091, 0.345, 0.4276, 0.4128])
        # fmt: on

        generated_slice = generated_video.flatten()
--- a/tests/pipelines/hunyuan_video1_5/test_hunyuan_1_5.py
+++ b/tests/pipelines/hunyuan_video1_5/test_hunyuan_1_5.py
@@ -15,14 +15,7 @@
 import unittest

 import torch
-from transformers import (
-    AutoConfig,
-    ByT5Tokenizer,
-    Qwen2_5_VLTextConfig,
-    Qwen2_5_VLTextModel,
-    Qwen2Tokenizer,
-    T5EncoderModel,
-)
+from transformers import ByT5Tokenizer, Qwen2_5_VLTextConfig, Qwen2_5_VLTextModel, Qwen2Tokenizer, T5EncoderModel

 from diffusers import (
    AutoencoderKLHunyuanVideo15,
@@ -121,8 +114,7 @@ class HunyuanVideo15PipelineFastTests(PipelineTesterMixin, unittest.TestCase):
        tokenizer = Qwen2Tokenizer.from_pretrained("hf-internal-testing/tiny-random-Qwen2VLForConditionalGeneration")

        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer_2 = ByT5Tokenizer()

        guider = ClassifierFreeGuidance(guidance_scale=1.0)
--- a/tests/pipelines/hunyuandit/test_hunyuan_dit.py
+++ b/tests/pipelines/hunyuandit/test_hunyuan_dit.py
@@ -19,7 +19,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, BertModel, T5EncoderModel
+from transformers import AutoTokenizer, BertModel, T5EncoderModel

 from diffusers import AutoencoderKL, DDPMScheduler, HunyuanDiT2DModel, HunyuanDiTPipeline

@@ -74,9 +74,7 @@ class HunyuanDiTPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
        scheduler = DDPMScheduler()
        text_encoder = BertModel.from_pretrained("hf-internal-testing/tiny-random-BertModel")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-BertModel")
-        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder_2 = T5EncoderModel(config)
+        text_encoder_2 = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer_2 = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
--- a/tests/pipelines/kandinsky3/test_kandinsky3.py
+++ b/tests/pipelines/kandinsky3/test_kandinsky3.py
@@ -19,7 +19,7 @@ import unittest
 import numpy as np
 import torch
 from PIL import Image
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import (
    AutoPipelineForImage2Image,
@@ -108,8 +108,7 @@ class Kandinsky3PipelineFastTests(PipelineTesterMixin, unittest.TestCase):
        torch.manual_seed(0)
        movq = self.dummy_movq
        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config).eval()
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        torch.manual_seed(0)
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
@@ -156,9 +155,9 @@ class Kandinsky3PipelineFastTests(PipelineTesterMixin, unittest.TestCase):

        assert image.shape == (1, 16, 16, 3)

-        expected_slice = np.array([0.3944, 0.3680, 0.4842, 0.5333, 0.4412, 0.4812, 0.5089, 0.5381, 0.5578])
+        expected_slice = np.array([0.3768, 0.4373, 0.4865, 0.4890, 0.4299, 0.5122, 0.4921, 0.4924, 0.5599])

-        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-1, (
+        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2, (
            f" expected_slice {expected_slice}, but got {image_slice.flatten()}"
        )

--- a/tests/pipelines/kandinsky3/test_kandinsky3_img2img.py
+++ b/tests/pipelines/kandinsky3/test_kandinsky3_img2img.py
@@ -20,7 +20,7 @@ import unittest
 import numpy as np
 import torch
 from PIL import Image
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import (
    AutoPipelineForImage2Image,
@@ -119,8 +119,7 @@ class Kandinsky3Img2ImgPipelineFastTests(PipelineTesterMixin, unittest.TestCase)
        torch.manual_seed(0)
        movq = self.dummy_movq
        torch.manual_seed(0)
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config).eval()
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        torch.manual_seed(0)
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
@@ -156,7 +155,10 @@ class Kandinsky3Img2ImgPipelineFastTests(PipelineTesterMixin, unittest.TestCase)
        return inputs

    def test_dict_tuple_outputs_equivalent(self):
-        super().test_dict_tuple_outputs_equivalent()
+        expected_slice = None
+        if torch_device == "cpu":
+            expected_slice = np.array([0.5762, 0.6112, 0.4150, 0.6018, 0.6167, 0.4626, 0.5426, 0.5641, 0.6536])
+        super().test_dict_tuple_outputs_equivalent(expected_slice=expected_slice)

    def test_kandinsky3_img2img(self):
        device = "cpu"
@@ -175,9 +177,11 @@ class Kandinsky3Img2ImgPipelineFastTests(PipelineTesterMixin, unittest.TestCase)

        assert image.shape == (1, 64, 64, 3)

-        expected_slice = np.array([0.5725, 0.6248, 0.4355, 0.5732, 0.6105, 0.5267, 0.5470, 0.5512, 0.6618])
+        expected_slice = np.array(
+            [0.576259, 0.6132097, 0.41703486, 0.603196, 0.62062526, 0.4655338, 0.5434324, 0.5660727, 0.65433365]
+        )

-        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-1, (
+        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2, (
            f" expected_slice {expected_slice}, but got {image_slice.flatten()}"
        )

--- a/tests/pipelines/latte/test_latte.py
+++ b/tests/pipelines/latte/test_latte.py
@@ -20,7 +20,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import (
    AutoencoderKL,
@@ -109,8 +109,7 @@ class LattePipelineFastTests(
        vae = AutoencoderKL()

        scheduler = DDIMScheduler()
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")

        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

--- a/tests/pipelines/ltx/test_ltx.py
+++ b/tests/pipelines/ltx/test_ltx.py
@@ -17,7 +17,7 @@ import unittest

 import numpy as np
 import torch
-from transformers import AutoConfig, AutoTokenizer, T5EncoderModel
+from transformers import AutoTokenizer, T5EncoderModel

 from diffusers import AutoencoderKLLTXVideo, FlowMatchEulerDiscreteScheduler, LTXPipeline, LTXVideoTransformer3DModel

@@ -88,8 +88,7 @@ class LTXPipelineFastTests(PipelineTesterMixin, FirstBlockCacheTesterMixin, unit

        torch.manual_seed(0)
        scheduler = FlowMatchEulerDiscreteScheduler()
-        config = AutoConfig.from_pretrained("hf-internal-testing/tiny-random-t5")
-        text_encoder = T5EncoderModel(config)
+        text_encoder = T5EncoderModel.from_pretrained("hf-internal-testing/tiny-random-t5")
        tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")

        components = {
Author	SHA1	Message	Date
yiyi@huggingface.co	cd93862fd1	add more tests	2026-03-03 09:05:33 +00:00
yiyi@huggingface.co	3a7c5cb330	fix update_componenet with custom model	2026-03-03 09:05:22 +00:00
yiyi@huggingface.co	2d20c6f740	up	2026-02-27 09:59:39 +00:00