Test hfh v0.29.0.rc5

Fix max_shift value in flux and related functions to 1.15 (issue #10675) (#10807)
This PR sets `max_shift` in Flux to 1.15 for consistency across the codebase. Every function that copies this logic has been updated as well, so all pipelines behave the same way. One example is `calculate_shift` in `src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3_img2img.py`.
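As we understand it, `calculate_shift` derives the timestep shift `mu` from the image token sequence length. It does this by linear interpolation, so `max_shift` is the value reached at the longest sequence length. The sketch below reproduces that interpolation. Only `max_shift=1.15` comes from this PR. The other defaults (`base_seq_len=256`, `max_seq_len=4096`, `base_shift=0.5`) are assumptions, so check the pipeline source for the exact signature.

```python
def calculate_shift(
    image_seq_len: int,
    base_seq_len: int = 256,   # assumed default
    max_seq_len: int = 4096,   # assumed default
    base_shift: float = 0.5,   # assumed default
    max_shift: float = 1.15,   # value set by this PR
) -> float:
    """Linearly interpolate the shift `mu` from the image sequence length (sketch)."""
    # Slope of the line through (base_seq_len, base_shift) and (max_seq_len, max_shift)
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    # Intercept chosen so the line passes through (base_seq_len, base_shift)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b


if __name__ == "__main__":
    print(calculate_shift(256))   # the shortest assumed length maps to base_shift
    print(calculate_shift(4096))  # the longest assumed length maps to max_shift
```

This PR changes only the upper endpoint of the line. Keeping every copy of this function on the same `max_shift` means a given sequence length yields the same `mu` in every pipeline.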
2025-12-06 12:34:13 +08:00 · 2025-02-18 16:39:51 +00:00 · 2025-02-18 06:54:56 +00:00 · 2025-02-17 19:04:48 +05:30 · 2025-02-17 09:26:15 +05:30 · 2025-02-16 17:28:57 +00:00
1506 changed files with 331078 additions and 38020 deletions
--- a/.github/ISSUE_TEMPLATE/bug-report.yml
+++ b/.github/ISSUE_TEMPLATE/bug-report.yml
@@ -57,50 +57,54 @@ body:
      description: |
        Your issue will be replied to more quickly if you can figure out the right person to tag with @.
        If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
-        
+
        All issues are read by one of the core maintainers, so if you don't know who to tag, just leave this blank and
        a core maintainer will ping the right person.
-        
+
        Please tag a maximum of 2 people.

-        Questions on DiffusionPipeline (Saving, Loading, From pretrained, ...):
+        Questions on DiffusionPipeline (Saving, Loading, From pretrained, ...): @sayakpaul @DN6

        Questions on pipelines:
-        - Stable Diffusion @yiyixuxu @DN6 @sayakpaul @patrickvonplaten
-        - Stable Diffusion XL @yiyixuxu @sayakpaul @DN6 @patrickvonplaten
-        - Kandinsky @yiyixuxu @patrickvonplaten
-        - ControlNet @sayakpaul @yiyixuxu @DN6 @patrickvonplaten
-        - T2I Adapter @sayakpaul @yiyixuxu @DN6 @patrickvonplaten
-        - IF @DN6 @patrickvonplaten
-        - Text-to-Video / Video-to-Video @DN6 @sayakpaul @patrickvonplaten
-        - Wuerstchen @DN6 @patrickvonplaten
+        - Stable Diffusion @yiyixuxu @asomoza
+        - Stable Diffusion XL @yiyixuxu @sayakpaul @DN6
+        - Stable Diffusion 3: @yiyixuxu @sayakpaul @DN6 @asomoza
+        - Kandinsky @yiyixuxu
+        - ControlNet @sayakpaul @yiyixuxu @DN6
+        - T2I Adapter @sayakpaul @yiyixuxu @DN6
+        - IF @DN6
+        - Text-to-Video / Video-to-Video @DN6 @a-r-r-o-w
+        - Wuerstchen @DN6
        - Other: @yiyixuxu @DN6
+        - Improving generation quality: @asomoza

        Questions on models:
-        - UNet @DN6 @yiyixuxu @sayakpaul @patrickvonplaten
-        - VAE @sayakpaul @DN6 @yiyixuxu @patrickvonplaten
-        - Transformers/Attention @DN6 @yiyixuxu @sayakpaul @DN6 @patrickvonplaten
+        - UNet @DN6 @yiyixuxu @sayakpaul
+        - VAE @sayakpaul @DN6 @yiyixuxu
+        - Transformers/Attention @DN6 @yiyixuxu @sayakpaul

-        Questions on Schedulers: @yiyixuxu @patrickvonplaten
+        Questions on single file checkpoints: @DN6

-        Questions on LoRA: @sayakpaul @patrickvonplaten
+        Questions on Schedulers: @yiyixuxu

-        Questions on Textual Inversion: @sayakpaul @patrickvonplaten
+        Questions on LoRA: @sayakpaul

-        Questions on Training: 
-        - DreamBooth @sayakpaul @patrickvonplaten
-        - Text-to-Image Fine-tuning @sayakpaul @patrickvonplaten
-        - Textual Inversion @sayakpaul @patrickvonplaten
-        - ControlNet @sayakpaul @patrickvonplaten
+        Questions on Textual Inversion: @sayakpaul

-        Questions on Tests: @DN6 @sayakpaul @yiyixuxu 
+        Questions on Training:
+        - DreamBooth @sayakpaul
+        - Text-to-Image Fine-tuning @sayakpaul
+        - Textual Inversion @sayakpaul
+        - ControlNet @sayakpaul
+
+        Questions on Tests: @DN6 @sayakpaul @yiyixuxu

        Questions on Documentation: @stevhliu

        Questions on JAX- and MPS-related things: @pcuenca

-        Questions on audio pipelines: @DN6 @patrickvonplaten
-        
+        Questions on audio pipelines: @sanchit-gandhi
+
+

-        
      placeholder: "@Username ..."
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -38,17 +38,18 @@ members/contributors who may be interested in your PR.

 Core library:

- Schedulers: @yiyixuxu and @patrickvonplaten
- Pipelines:  @patrickvonplaten and @sayakpaul
- Training examples: @sayakpaul and @patrickvonplaten
- Docs: @stevhliu and @yiyixuxu
+- Schedulers: @yiyixuxu
+- Pipelines and pipeline callbacks: @yiyixuxu and @asomoza
+- Training examples: @sayakpaul
+- Docs: @stevhliu and @sayakpaul
 - JAX and MPS: @pcuenca
 - Audio: @sanchit-gandhi
- General functionalities: @patrickvonplaten and @sayakpaul
+- General functionalities: @sayakpaul @yiyixuxu @DN6

 Integrations:

- deepspeed: HF Trainer/Accelerate: @pacman100
+- deepspeed: HF Trainer/Accelerate: @SunMarc
+- PEFT: @sayakpaul @BenjaminBossan

 HF projects:

--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -1,25 +1,30 @@
 name: Benchmarking tests

 on:
+  workflow_dispatch:
  schedule:
    - cron: "30 1 1,15 * *" # every 2 weeks on the 1st and the 15th of every month at 1:30 AM

 env:
  DIFFUSERS_IS_CI: yes
+  HF_HUB_ENABLE_HF_TRANSFER: 1
  HF_HOME: /mnt/cache
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8

 jobs:
  torch_pipelines_cuda_benchmark_tests:
+    env:
+      SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL_BENCHMARK }}
    name: Torch Core Pipelines CUDA Benchmarking Tests
    strategy:
      fail-fast: false
      max-parallel: 1
-    runs-on: [single-gpu, nvidia-gpu, a10, ci]
+    runs-on:
+      group: aws-g6-4xlarge-plus
    container:
-      image: diffusers/diffusers-pytorch-cuda
-      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
+      image: diffusers/diffusers-pytorch-compile-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
@@ -30,15 +35,15 @@ jobs:
          nvidia-smi
      - name: Install dependencies
        run: |
-          apt-get update && apt-get install libsndfile1-dev libgl1 -y
-          python -m pip install -e .[quality,test]
-          python -m pip install pandas peft
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install pandas peft
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Diffusers Benchmarking
        env:
-            HUGGING_FACE_HUB_TOKEN: ${{ secrets.DIFFUSERS_BOT_TOKEN }}
+            HF_TOKEN: ${{ secrets.DIFFUSERS_BOT_TOKEN }}
            BASE_PATH: benchmark_outputs
        run: |
          export TOTAL_GPU_MEMORY=$(python -c "import torch; print(torch.cuda.get_device_properties(0).total_memory / (1024**3))")
@@ -46,7 +51,17 @@ jobs:

      - name: Test suite reports artifacts
        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
        with:
          name: benchmark_test_reports
-          path: benchmarks/benchmark_outputs
+          path: benchmarks/benchmark_outputs
+
+      - name: Report success status
+        if: ${{ success() }}
+        run: |
+          pip install requests && python utils/notify_benchmarking_status.py --status=success
+
+      - name: Report failure status
+        if: ${{ failure() }}
+        run: |
+          pip install requests && python utils/notify_benchmarking_status.py --status=failure
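The benchmarking step exports `TOTAL_GPU_MEMORY` as `torch.cuda.get_device_properties(0).total_memory / (1024**3)`. `total_memory` is a byte count, so the exported value is in GiB (2**30 bytes), not decimal GB. The following is a torch-free sketch of that arithmetic. The helper name `bytes_to_gib` is hypothetical.

```python
def bytes_to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB, mirroring the TOTAL_GPU_MEMORY export."""
    return num_bytes / (1024**3)


if __name__ == "__main__":
    # Illustrative byte counts only, not measurements from any runner
    print(bytes_to_gib(24 * 1024**3))  # exactly 24 GiB
    print(bytes_to_gib(24 * 10**9))    # a decimal "24 GB" is about 22.35 GiB
```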
--- a/.github/workflows/build_docker_images.yml
+++ b/.github/workflows/build_docker_images.yml
@@ -1,20 +1,59 @@
-name: Build Docker images (nightly)
+name: Test, build, and push Docker images

 on:
+  pull_request: # During PRs, we just check if the changes Dockerfiles can be successfully built
+    branches:
+      - main
+    paths:
+      - "docker/**"
  workflow_dispatch:
  schedule:
    - cron: "0 0 * * *" # every day at midnight

 concurrency:
-  group: docker-image-builds
-  cancel-in-progress: false
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true

 env:
  REGISTRY: diffusers
+  CI_SLACK_CHANNEL: ${{ secrets.CI_DOCKER_CHANNEL }}

 jobs:
-  build-docker-images:
-    runs-on: ubuntu-latest
+  test-build-docker-images:
+    runs-on:
+      group: aws-general-8-plus
+    if: github.event_name == 'pull_request'
+    steps:
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v1
+
+      - name: Check out code
+        uses: actions/checkout@v3
+
+      - name: Find Changed Dockerfiles
+        id: file_changes
+        uses: jitterbit/get-changed-files@v1
+        with:
+          format: "space-delimited"
+          token: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Build Changed Docker Images
+        run: |
+          CHANGED_FILES="${{ steps.file_changes.outputs.all }}"
+          for FILE in $CHANGED_FILES; do
+            if [[ "$FILE" == docker/*Dockerfile ]]; then
+              DOCKER_PATH="${FILE%/Dockerfile}"
+              DOCKER_TAG=$(basename "$DOCKER_PATH")
+              echo "Building Docker image for $DOCKER_TAG"
+              docker build -t "$DOCKER_TAG" "$DOCKER_PATH"
+            fi
+          done
+        if: steps.file_changes.outputs.all != ''
+
+  build-and-push-docker-images:
+    runs-on:
+      group: aws-general-8-plus
+    if: github.event_name != 'pull_request'

    permissions:
      contents: read
@@ -28,21 +67,23 @@ jobs:
          - diffusers-pytorch-cuda
          - diffusers-pytorch-compile-cuda
          - diffusers-pytorch-xformers-cuda
+          - diffusers-pytorch-minimum-cuda
          - diffusers-flax-cpu
          - diffusers-flax-tpu
          - diffusers-onnxruntime-cpu
          - diffusers-onnxruntime-cuda
+          - diffusers-doc-builder

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
-
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v1
      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          username: ${{ env.REGISTRY }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
-
      - name: Build and push
        uses: docker/build-push-action@v3
        with:
@@ -50,3 +91,14 @@ jobs:
          context: ./docker/${{ matrix.image-name }}
          push: true
          tags: ${{ env.REGISTRY }}/${{ matrix.image-name }}:latest
+
+      - name: Post to a Slack channel
+        id: slack
+        uses: huggingface/hf-workflows/.github/actions/post-slack@main
+        with:
+          # Slack channel id, channel name, or user id to post message.
+          # See also: https://api.slack.com/methods/chat.postMessage#channels
+          slack_channel: ${{ env.CI_SLACK_CHANNEL }}
+          title: "🤗 Results of the ${{ matrix.image-name }} Docker Image build"
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
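On pull requests, the `test-build-docker-images` job builds only the Dockerfiles that changed. The Python sketch below models its shell loop. It keeps paths that match `docker/*Dockerfile`, where `*` can also match `/` inside `[[ ... ]]`. It then strips a trailing `/Dockerfile` to get the build context and uses that directory's basename as the tag. The function name is hypothetical. The input is assumed to be the space-delimited list from `jitterbit/get-changed-files`.

```python
import fnmatch
import posixpath
from typing import List, Tuple


def changed_docker_builds(changed_files: str) -> List[Tuple[str, str]]:
    """Return (tag, build_context) pairs for changed Dockerfiles (sketch)."""
    builds = []
    for path in changed_files.split():
        # Like bash `[[ "$FILE" == docker/*Dockerfile ]]`: fnmatch's `*` also crosses "/"
        if not fnmatch.fnmatchcase(path, "docker/*Dockerfile"):
            continue
        # Like `${FILE%/Dockerfile}`: drop a trailing "/Dockerfile" if present
        suffix = "/Dockerfile"
        context = path[: -len(suffix)] if path.endswith(suffix) else path
        # Like `basename "$DOCKER_PATH"`: the directory name becomes the image tag
        builds.append((posixpath.basename(context), context))
    return builds


if __name__ == "__main__":
    files = "docker/diffusers-pytorch-cuda/Dockerfile src/diffusers/__init__.py"
    for tag, context in changed_docker_builds(files):
        print(f"docker build -t {tag} {context}")
```

An empty input yields no builds. In the workflow, the step is skipped entirely by `if: steps.file_changes.outputs.all != ''`.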
--- a/.github/workflows/build_documentation.yml
+++ b/.github/workflows/build_documentation.yml
@@ -7,6 +7,10 @@ on:
      - doc-builder*
      - v*-release
      - v*-patch
+    paths:
+      - "src/diffusers/**.py"
+      - "examples/**"
+      - "docs/**"

 jobs:
  build:
@@ -17,7 +21,7 @@ jobs:
      package: diffusers
      notebook_folder: diffusers_doc
      languages: en ko zh ja pt
-
+      custom_container: diffusers/diffusers-doc-builder
    secrets:
      token: ${{ secrets.HUGGINGFACE_PUSH }}
      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
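With the new `paths` filter, the documentation build runs only when a changed file matches `src/diffusers/**.py`, `examples/**`, or `docs/**`. The sketch below is a simplified model of GitHub's filter syntax. It handles only `**`, which matches any characters including `/`, and `*`, which does not cross `/`. It is not GitHub's actual matcher, and the function names are hypothetical.

```python
import re
from typing import Iterable


def filter_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified path filter into a compiled regex (sketch)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # `**`: any characters, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # `*`: any characters except "/"
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # everything else is literal
            i += 1
    return re.compile("".join(parts))


def triggers_docs_build(changed_path: str, patterns: Iterable[str]) -> bool:
    """True if the changed path fully matches any filter pattern."""
    return any(filter_pattern_to_regex(p).fullmatch(changed_path) for p in patterns)


DOC_PATHS = ["src/diffusers/**.py", "examples/**", "docs/**"]

if __name__ == "__main__":
    print(triggers_docs_build("src/diffusers/models/attention.py", DOC_PATHS))
    print(triggers_docs_build(".github/workflows/benchmark.yml", DOC_PATHS))
```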
--- a/.github/workflows/build_pr_documentation.yml
+++ b/.github/workflows/build_pr_documentation.yml
@@ -2,6 +2,10 @@ name: Build PR Documentation

 on:
  pull_request:
+    paths:
+      - "src/diffusers/**.py"
+      - "examples/**"
+      - "docs/**"

 concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@@ -16,3 +20,4 @@ jobs:
      install_libgl1: true
      package: diffusers
      languages: en ko zh ja pt
+      custom_container: diffusers/diffusers-doc-builder
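The concurrency key `${{ github.workflow }}-${{ github.head_ref || github.run_id }}` appears here and in the Docker workflow. `github.head_ref` is set only for pull request events. The `||` falls back to the run id, which is unique per run. As a result, repeated runs on the same PR branch share one group, while other runs never collide. A minimal Python sketch of the key computation follows. The function name is hypothetical.

```python
from typing import Optional


def concurrency_group(workflow: str, head_ref: Optional[str], run_id: int) -> str:
    """Mimic `${{ github.workflow }}-${{ github.head_ref || github.run_id }}`."""
    # An empty or missing head_ref is falsy, so the unique run id is used instead
    return f"{workflow}-{head_ref or run_id}"


if __name__ == "__main__":
    print(concurrency_group("Build PR Documentation", "fix-typo", 101))
    print(concurrency_group("Build PR Documentation", "", 102))
```

Runs that share a key are limited to one at a time. When `cancel-in-progress: true` is set, as in the Docker workflow, a new push to the same PR branch cancels the older run.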
--- a/.github/workflows/mirror_community_pipeline.yml
+++ b/.github/workflows/mirror_community_pipeline.yml
@@ -0,0 +1,102 @@
+name: Mirror Community Pipeline
+
+on:
+  # Push changes on the main branch
+  push:
+    branches:
+      - main
+    paths:
+      - 'examples/community/**.py'
+
+    # And on tag creation (e.g. `v0.28.1`)
+    tags:
+      - '*'
+
+  # Manual trigger with ref input
+  workflow_dispatch:
+    inputs:
+      ref:
+        description: "Either 'main' or a tag ref"
+        required: true
+        default: 'main'
+
+jobs:
+  mirror_community_pipeline:
+    env:
+      SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL_COMMUNITY_MIRROR }}
+
+    runs-on: ubuntu-22.04
+    steps:
+      # Checkout to correct ref
+      #   If workflow dispatch
+      #     If ref is 'main', set:
+      #       CHECKOUT_REF=refs/heads/main
+      #       PATH_IN_REPO=main
+      #     Else it must be a tag. Set:
+      #       CHECKOUT_REF=refs/tags/{tag}
+      #       PATH_IN_REPO={tag}
+      #   If not workflow dispatch
+      #     If ref is 'refs/heads/main' => set 'main'
+      #     Else it must be a tag => set {tag}
+      - name: Set checkout_ref and path_in_repo
+        run: |
+          if [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
+            if [ -z "${{ github.event.inputs.ref }}" ]; then
+              echo "Error: Missing ref input"
+              exit 1
+            elif [ "${{ github.event.inputs.ref }}" == "main" ]; then
+              echo "CHECKOUT_REF=refs/heads/main" >> $GITHUB_ENV
+              echo "PATH_IN_REPO=main" >> $GITHUB_ENV
+            else
+              echo "CHECKOUT_REF=refs/tags/${{ github.event.inputs.ref }}" >> $GITHUB_ENV
+              echo "PATH_IN_REPO=${{ github.event.inputs.ref }}" >> $GITHUB_ENV
+            fi
+          elif [ "${{ github.ref }}" == "refs/heads/main" ]; then
+            echo "CHECKOUT_REF=${{ github.ref }}" >> $GITHUB_ENV
+            echo "PATH_IN_REPO=main" >> $GITHUB_ENV
+          else
+            # e.g. refs/tags/v0.28.1 -> v0.28.1
+            echo "CHECKOUT_REF=${{ github.ref }}" >> $GITHUB_ENV
+            echo "PATH_IN_REPO=$(echo ${{ github.ref }} | sed 's/^refs\/tags\///')" >> $GITHUB_ENV
+          fi
+      - name: Print env vars
+        run: |
+          echo "CHECKOUT_REF: ${{ env.CHECKOUT_REF }}"
+          echo "PATH_IN_REPO: ${{ env.PATH_IN_REPO }}"
+      - uses: actions/checkout@v3
+        with:
+          ref: ${{ env.CHECKOUT_REF }}
+
+      # Setup + install dependencies
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.10"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install --upgrade huggingface_hub
+
+      # Check secret is set
+      - name: whoami
+        run: huggingface-cli whoami
+        env:
+            HF_TOKEN: ${{ secrets.HF_TOKEN_MIRROR_COMMUNITY_PIPELINES }}
+
+      # Push to HF! (under subfolder based on checkout ref)
+      # https://huggingface.co/datasets/diffusers/community-pipelines-mirror
+      - name: Mirror community pipeline to HF
+        run: huggingface-cli upload diffusers/community-pipelines-mirror ./examples/community ${PATH_IN_REPO} --repo-type dataset
+        env:
+            PATH_IN_REPO: ${{ env.PATH_IN_REPO }}
+            HF_TOKEN: ${{ secrets.HF_TOKEN_MIRROR_COMMUNITY_PIPELINES }}
+
+      - name: Report success status
+        if: ${{ success() }}
+        run: |
+          pip install requests && python utils/notify_community_pipelines_mirror.py --status=success
+
+      - name: Report failure status
+        if: ${{ failure() }}
+        run: |
+          pip install requests && python utils/notify_community_pipelines_mirror.py --status=failure
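The comment block in the mirror workflow describes how `CHECKOUT_REF` and `PATH_IN_REPO` are chosen. The Python sketch below restates that shell logic as a pure function, so each case can be checked without running the workflow. The function name and signature are hypothetical.

```python
from typing import Optional, Tuple

TAG_PREFIX = "refs/tags/"


def resolve_mirror_target(
    event_name: str, github_ref: str, input_ref: Optional[str] = None
) -> Tuple[str, str]:
    """Return (CHECKOUT_REF, PATH_IN_REPO) following the workflow's rules (sketch)."""
    if event_name == "workflow_dispatch":
        if not input_ref:
            raise ValueError("Missing ref input")
        if input_ref == "main":
            return "refs/heads/main", "main"
        # Any other manual input is treated as a tag name
        return TAG_PREFIX + input_ref, input_ref
    if github_ref == "refs/heads/main":
        return github_ref, "main"
    # e.g. refs/tags/v0.28.1 -> v0.28.1, like the sed prefix strip
    path = github_ref[len(TAG_PREFIX):] if github_ref.startswith(TAG_PREFIX) else github_ref
    return github_ref, path


if __name__ == "__main__":
    print(resolve_mirror_target("push", "refs/tags/v0.28.1"))
    print(resolve_mirror_target("workflow_dispatch", "refs/heads/main", "main"))
```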
--- a/.github/workflows/nightly_tests.yml
+++ b/.github/workflows/nightly_tests.yml
@@ -1,162 +1,584 @@
-name: Nightly tests on main
+name: Nightly and release tests on main/release branch

 on:
+  workflow_dispatch:
  schedule:
    - cron: "0 0 * * *" # every day at midnight

 env:
  DIFFUSERS_IS_CI: yes
-  HF_HOME: /mnt/cache
+  HF_HUB_ENABLE_HF_TRANSFER: 1
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
  PYTEST_TIMEOUT: 600
  RUN_SLOW: yes
  RUN_NIGHTLY: yes
+  PIPELINE_USAGE_CUTOFF: 5000
+  SLACK_API_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

 jobs:
-  run_nightly_tests:
+  setup_torch_cuda_pipeline_matrix:
+    name: Setup Torch Pipelines CUDA Slow Tests Matrix
+    runs-on:
+      group: aws-general-8-plus
+    container:
+      image: diffusers/diffusers-pytorch-cpu
+    outputs:
+      pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: Install dependencies
+        run: |
+          pip install -e .[test]
+          pip install huggingface_hub
+      - name: Fetch Pipeline Matrix
+        id: fetch_pipeline_matrix
+        run: |
+          matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
+          echo $matrix
+          echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
+
+      - name: Pipeline Tests Artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: test-pipelines.json
+          path: reports
+
+  run_nightly_tests_for_torch_pipelines:
+    name: Nightly Torch Pipelines CUDA Tests
+    needs: setup_torch_cuda_pipeline_matrix
    strategy:
      fail-fast: false
+      max-parallel: 8
      matrix:
-        config:
-          - name: Nightly PyTorch CUDA tests on Ubuntu
-            framework: pytorch
-            runner: docker-gpu
-            image: diffusers/diffusers-pytorch-cuda
-            report: torch_cuda
-          - name: Nightly Flax TPU tests on Ubuntu
-            framework: flax
-            runner: docker-tpu
-            image: diffusers/diffusers-flax-tpu
-            report: flax_tpu
-          - name: Nightly ONNXRuntime CUDA tests on Ubuntu
-            framework: onnxruntime
-            runner: docker-gpu
-            image: diffusers/diffusers-onnxruntime-cuda
-            report: onnx_cuda
-
-    name: ${{ matrix.config.name }}
-
-    runs-on: ${{ matrix.config.runner }}
-
+        module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
+    runs-on:
+      group: aws-g4dn-2xlarge
    container:
-      image: ${{ matrix.config.image }}
-      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ ${{ matrix.config.runner == 'docker-tpu' && '--privileged' || '--gpus 0'}}
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: NVIDIA-SMI
+        run: nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+          python -m uv pip install pytest-reportlog
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Pipeline CUDA Test
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -s -v -k "not Flax and not Onnx" \
+            --make-reports=tests_pipeline_${{ matrix.module }}_cuda \
+            --report-log=tests_pipeline_${{ matrix.module }}_cuda.log \
+            tests/pipelines/${{ matrix.module }}
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
+          cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: pipeline_${{ matrix.module }}_test_reports
+          path: reports
+      - name: Generate Report and Notify Channel
+        if: always()
+        run: |
+          pip install slack_sdk tabulate
+          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

+  run_nightly_tests_for_other_torch_modules:
+    name: Nightly Torch CUDA Tests
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
    defaults:
      run:
        shell: bash
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+      matrix:
+        module: [models, schedulers, lora, others, single_file, examples]
+    steps:
+    - name: Checkout diffusers
+      uses: actions/checkout@v3
+      with:
+        fetch-depth: 2

+    - name: Install dependencies
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]
+        python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+        pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+        python -m uv pip install pytest-reportlog
+    - name: Environment
+      run: python utils/print_env.py
+
+    - name: Run nightly PyTorch CUDA tests for non-pipeline modules
+      if: ${{ matrix.module != 'examples'}}
+      env:
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+        # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+        CUBLAS_WORKSPACE_CONFIG: :16:8
+      run: |
+        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+          -s -v -k "not Flax and not Onnx" \
+          --make-reports=tests_torch_${{ matrix.module }}_cuda \
+          --report-log=tests_torch_${{ matrix.module }}_cuda.log \
+          tests/${{ matrix.module }}
+
+    - name: Run nightly example tests with Torch
+      if: ${{ matrix.module == 'examples' }}
+      env:
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+        # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+        CUBLAS_WORKSPACE_CONFIG: :16:8
+      run: |
+        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+          -s -v --make-reports=examples_torch_cuda \
+          --report-log=examples_torch_cuda.log \
+          examples/
+
+    - name: Failure short reports
+      if: ${{ failure() }}
+      run: |
+        cat reports/tests_torch_${{ matrix.module }}_cuda_stats.txt
+        cat reports/tests_torch_${{ matrix.module }}_cuda_failures_short.txt
+
+    - name: Test suite reports artifacts
+      if: ${{ always() }}
+      uses: actions/upload-artifact@v4
+      with:
+        name: torch_${{ matrix.module }}_cuda_test_reports
+        path: reports
+
+    - name: Generate Report and Notify Channel
+      if: always()
+      run: |
+        pip install slack_sdk tabulate
+        python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+
+  run_big_gpu_torch_tests:
+    name: Torch tests on big GPU
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+    runs-on:
+      group: aws-g6e-xlarge-plus
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: NVIDIA-SMI
+        run: nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+          python -m uv pip install pytest-reportlog
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Selected Torch CUDA Test on big GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+          BIG_GPU_MEMORY: 40
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -m "big_gpu_with_torch_cuda" \
+            --make-reports=tests_big_gpu_torch_cuda \
+            --report-log=tests_big_gpu_torch_cuda.log \
+            tests/
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_big_gpu_torch_cuda_stats.txt
+          cat reports/tests_big_gpu_torch_cuda_failures_short.txt
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_cuda_big_gpu_test_reports
+          path: reports
+      - name: Generate Report and Notify Channel
+        if: always()
+        run: |
+          pip install slack_sdk tabulate
+          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+          
+  torch_minimum_version_cuda_tests:
+    name: Torch Minimum Version CUDA Tests
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: diffusers/diffusers-pytorch-minimum-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    defaults:
+      run:
+        shell: bash
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

-      - name: NVIDIA-SMI
-        if: ${{ matrix.config.runner == 'docker-gpu' }}
-        run: |
-          nvidia-smi
-
      - name: Install dependencies
        run: |
-          python -m pip install -e .[quality,test]
-          python -m pip install -U git+https://github.com/huggingface/transformers
-          python -m pip install git+https://github.com/huggingface/accelerate
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git

      - name: Environment
        run: |
          python utils/print_env.py

-      - name: Run nightly PyTorch CUDA tests
-        if: ${{ matrix.config.framework == 'pytorch' }}
+      - name: Run PyTorch CUDA tests
        env:
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx" \
-            --make-reports=tests_${{ matrix.config.report }} \
-            tests/
-
-      - name: Run nightly Flax TPU tests
-        if: ${{ matrix.config.framework == 'flax' }}
-        env:
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
-        run: |
-          python -m pytest -n 0 \
-            -s -v -k "Flax" \
-            --make-reports=tests_${{ matrix.config.report }} \
-            tests/
-
-      - name: Run nightly ONNXRuntime CUDA tests
-        if: ${{ matrix.config.framework == 'onnxruntime' }}
-        env:
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
-        run: |
-          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-            -s -v -k "Onnx" \
-            --make-reports=tests_${{ matrix.config.report }} \
-            tests/
+            --make-reports=tests_torch_minimum_version_cuda \
+            tests/models/test_modeling_common.py \
+            tests/pipelines/test_pipelines_common.py \
+            tests/pipelines/test_pipeline_utils.py \
+            tests/pipelines/test_pipelines.py \
+            tests/pipelines/test_pipelines_auto.py \
+            tests/schedulers/test_schedulers.py \
+            tests/others

      - name: Failure short reports
        if: ${{ failure() }}
-        run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt
+        run: |
+          cat reports/tests_torch_minimum_version_cuda_stats.txt
+          cat reports/tests_torch_minimum_version_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
        with:
-          name: ${{ matrix.config.report }}_test_reports
+          name: torch_minimum_version_cuda_test_reports
          path: reports
+ 
+  run_flax_tpu_tests:
+    name: Nightly Flax TPU Tests
+    runs-on:
+      group: gcp-ct5lp-hightpu-8t
+    if: github.event_name == 'schedule'

-  run_nightly_tests_apple_m1:
-    name: Nightly PyTorch MPS tests on MacOS
-    runs-on: [ self-hosted, apple-m1 ]
+    container:
+      image: diffusers/diffusers-flax-tpu
+      options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV}} -v /mnt/hf_cache:/mnt/hf_cache
+    defaults:
+      run:
+        shell: bash
+    steps:
+    - name: Checkout diffusers
+      uses: actions/checkout@v3
+      with:
+        fetch-depth: 2

+    - name: Install dependencies
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]
+        pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+        python -m uv pip install pytest-reportlog
+
+    - name: Environment
+      run: python utils/print_env.py
+
+    - name: Run nightly Flax TPU tests
+      env:
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+      run: |
+        python -m pytest -n 0 \
+          -s -v -k "Flax" \
+          --make-reports=tests_flax_tpu \
+          --report-log=tests_flax_tpu.log \
+          tests/
+
+    - name: Failure short reports
+      if: ${{ failure() }}
+      run: |
+        cat reports/tests_flax_tpu_stats.txt
+        cat reports/tests_flax_tpu_failures_short.txt
+
+    - name: Test suite reports artifacts
+      if: ${{ always() }}
+      uses: actions/upload-artifact@v4
+      with:
+        name: flax_tpu_test_reports
+        path: reports
+
+    - name: Generate Report and Notify Channel
+      if: always()
+      run: |
+        pip install slack_sdk tabulate
+        python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+
+  run_nightly_onnx_tests:
+    name: Nightly ONNXRuntime CUDA tests on Ubuntu
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: diffusers/diffusers-onnxruntime-cuda
+      options: --gpus 0 --shm-size "16gb" --ipc host
+
+    steps:
+    - name: Checkout diffusers
+      uses: actions/checkout@v3
+      with:
+        fetch-depth: 2
+
+    - name: NVIDIA-SMI
+      run: nvidia-smi
+
+    - name: Install dependencies
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]
+        pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+        python -m uv pip install pytest-reportlog
+    - name: Environment
+      run: python utils/print_env.py
+
+    - name: Run Nightly ONNXRuntime CUDA tests
+      env:
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+      run: |
+        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+          -s -v -k "Onnx" \
+          --make-reports=tests_onnx_cuda \
+          --report-log=tests_onnx_cuda.log \
+          tests/
+
+    - name: Failure short reports
+      if: ${{ failure() }}
+      run: |
+        cat reports/tests_onnx_cuda_stats.txt
+        cat reports/tests_onnx_cuda_failures_short.txt
+
+    - name: Test suite reports artifacts
+      if: ${{ always() }}
+      uses: actions/upload-artifact@v4
+      with:
+        name: tests_onnx_cuda_reports
+        path: reports
+
+    - name: Generate Report and Notify Channel
+      if: always()
+      run: |
+        pip install slack_sdk tabulate
+        python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+
+  run_nightly_quantization_tests:
+    name: Torch quantization nightly tests
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+      matrix:
+        config:
+          - backend: "bitsandbytes"
+            test_location: "bnb"
+          - backend: "gguf"
+            test_location: "gguf"
+          - backend: "torchao"
+            test_location: "torchao"
+    runs-on:
+      group: aws-g6e-xlarge-plus
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "20gb" --ipc host --gpus 0
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
-
-      - name: Clean checkout
-        shell: arch -arch arm64 bash {0}
-        run: |
-          git clean -fxd
-
-      - name: Setup miniconda
-        uses: ./.github/actions/setup-miniconda
-        with:
-          python-version: 3.9
-
+      - name: NVIDIA-SMI
+        run: nvidia-smi
      - name: Install dependencies
-        shell: arch -arch arm64 bash {0}
        run: |
-          ${CONDA_RUN} python -m pip install --upgrade pip
-          ${CONDA_RUN} python -m pip install -e .[quality,test]
-          ${CONDA_RUN} python -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
-          ${CONDA_RUN} python -m pip install git+https://github.com/huggingface/accelerate
-
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install -U ${{ matrix.config.backend }}
+          python -m uv pip install pytest-reportlog
      - name: Environment
-        shell: arch -arch arm64 bash {0}
        run: |
-          ${CONDA_RUN} python utils/print_env.py
-
-      - name: Run nightly PyTorch tests on M1 (MPS)
-        shell: arch -arch arm64 bash {0}
+          python utils/print_env.py
+      - name: ${{ matrix.config.backend }} quantization tests on GPU
        env:
-          HF_HOME: /System/Volumes/Data/mnt/cache
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+          BIG_GPU_MEMORY: 40
        run: |
-          ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps tests/
-
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            --make-reports=tests_${{ matrix.config.backend }}_torch_cuda \
+            --report-log=tests_${{ matrix.config.backend }}_torch_cuda.log \
+            tests/quantization/${{ matrix.config.test_location }}
      - name: Failure short reports
        if: ${{ failure() }}
-        run: cat reports/tests_torch_mps_failures_short.txt
-
+        run: |
+          cat reports/tests_${{ matrix.config.backend }}_torch_cuda_stats.txt
+          cat reports/tests_${{ matrix.config.backend }}_torch_cuda_failures_short.txt
      - name: Test suite reports artifacts
        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
        with:
-          name: torch_mps_test_reports
+          name: torch_cuda_${{ matrix.config.backend }}_reports
          path: reports
+      - name: Generate Report and Notify Channel
+        if: always()
+        run: |
+          pip install slack_sdk tabulate
+          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+
+# M1 runner currently not well supported
+# TODO: (Dhruv) add these back when we setup better testing for Apple Silicon
+#  run_nightly_tests_apple_m1:
+#    name: Nightly PyTorch MPS tests on MacOS
+#    runs-on: [ self-hosted, apple-m1 ]
+#    if: github.event_name == 'schedule'
+#
+#    steps:
+#      - name: Checkout diffusers
+#        uses: actions/checkout@v3
+#        with:
+#          fetch-depth: 2
+#
+#      - name: Clean checkout
+#        shell: arch -arch arm64 bash {0}
+#        run: |
+#          git clean -fxd
+#      - name: Setup miniconda
+#        uses: ./.github/actions/setup-miniconda
+#        with:
+#          python-version: 3.9
+#
+#      - name: Install dependencies
+#        shell: arch -arch arm64 bash {0}
+#        run: |
+#          ${CONDA_RUN} python -m pip install --upgrade pip uv
+#          ${CONDA_RUN} python -m uv pip install -e [quality,test]
+#          ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
+#          ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
+#          ${CONDA_RUN} python -m uv pip install pytest-reportlog
+#      - name: Environment
+#        shell: arch -arch arm64 bash {0}
+#        run: |
+#          ${CONDA_RUN} python utils/print_env.py
+#      - name: Run nightly PyTorch tests on M1 (MPS)
+#        shell: arch -arch arm64 bash {0}
+#        env:
+#          HF_HOME: /System/Volumes/Data/mnt/cache
+#          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+#        run: |
+#          ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
+#            --report-log=tests_torch_mps.log \
+#            tests/
+#      - name: Failure short reports
+#        if: ${{ failure() }}
+#        run: cat reports/tests_torch_mps_failures_short.txt
+#
+#      - name: Test suite reports artifacts
+#        if: ${{ always() }}
+#        uses: actions/upload-artifact@v4
+#        with:
+#          name: torch_mps_test_reports
+#          path: reports
+#
+#      - name: Generate Report and Notify Channel
+#        if: always()
+#        run: |
+#          pip install slack_sdk tabulate
+#          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
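The "Failure short reports" steps in these workflows appear to assume that `--make-reports=<id>` writes `reports/<id>_stats.txt` and `reports/<id>_failures_short.txt`. That naming is inferred from pairs in this diff such as `--make-reports=tests_onnx_cuda` with `cat reports/tests_onnx_cuda_stats.txt`. If the id in a `cat` line drifts from the one given to pytest, the mistake only shows up as a missing-file error after a test has already failed. Below is a minimal sketch of a lint for that, written under the naming assumption above. `unmatched_report_prefixes` is a hypothetical helper and is not part of the diffusers `utils/` scripts.

```python
from __future__ import annotations

import re

# GitHub expressions such as ${{ matrix.config.backend }} contain spaces; collapse
# each one to a single token so ids that embed them compare as plain strings.
_EXPR = re.compile(r"\$\{\{.*?\}\}")
# The id passed to pytest, e.g. --make-reports=tests_onnx_cuda
_MAKE_REPORTS = re.compile(r"--make-reports=(\S+)")
# The id prefix of a report read back, e.g. cat reports/tests_onnx_cuda_stats.txt
_CAT_REPORT = re.compile(r"cat reports/(\S+?)_(?:stats|failures_short)\.txt")


def unmatched_report_prefixes(workflow_text: str) -> list[str]:
    """Return report ids that are cat'ed but never produced by --make-reports."""
    text = _EXPR.sub("<expr>", workflow_text)
    produced = set(_MAKE_REPORTS.findall(text))
    read = _CAT_REPORT.findall(text)
    return sorted({prefix for prefix in read if prefix not in produced})


if __name__ == "__main__":
    # Hypothetical snippet: two pytest runs followed by `cat` lines with mismatched ids.
    snippet = """
    python -m pytest --make-reports=tests_peft_main tests/lora/
    python -m pytest --make-reports=tests_models_lora_peft_main tests/models/ -k "lora"
    cat reports/tests_lora_failures_short.txt
    cat reports/tests_models_lora_failures_short.txt
    """
    print(unmatched_report_prefixes(snippet))  # ['tests_lora', 'tests_models_lora']
```

Matrix-templated ids such as `tests_${{ matrix.config.backend }}_torch_cuda` are compared after the expression is collapsed. Because of that, the quantization job's `cat` lines are treated as matching their `--make-reports` id.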
--- a/.github/workflows/notify_slack_about_release.yml
+++ b/.github/workflows/notify_slack_about_release.yml
@@ -0,0 +1,23 @@
+name: Notify Slack about a release
+
+on:
+  workflow_dispatch:
+  release:
+    types: [published]
+
+jobs:
+  build:
+    runs-on: ubuntu-22.04
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Setup Python
+      uses: actions/setup-python@v4
+      with:
+        python-version: '3.8'
+
+    - name: Notify Slack about the release
+      env:
+        SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
+      run: pip install requests && python utils/notify_slack_about_release.py
--- a/.github/workflows/pr_dependency_test.yml
+++ b/.github/workflows/pr_dependency_test.yml
@@ -4,6 +4,8 @@ on:
  pull_request:
    branches:
      - main
+    paths:
+      - "src/diffusers/**.py"
  push:
    branches:
      - main
@@ -14,7 +16,7 @@ concurrency:

 jobs:
  check_dependencies:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
@@ -23,10 +25,11 @@ jobs:
          python-version: "3.8"
      - name: Install dependencies
        run: |
-          python -m pip install --upgrade pip
-          pip install -e .
-          pip install pytest
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m pip install --upgrade pip uv
+          python -m uv pip install -e .
+          python -m uv pip install pytest
      - name: Check for soft dependencies
        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          pytest tests/others/test_dependencies.py
-      
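GitHub Actions runs each `run:` step in a new shell process. An `export PATH=...` made in the "Install dependencies" step is therefore gone by the next step, which is likely why these workflows repeat `python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"` at the top of later steps. GitHub Actions also offers an alternative: appending the directory to the file named by `$GITHUB_PATH` prepends it to `PATH` for every later step in the job. The sketch below shows the behaviour by running each simulated step as a separate `bash -c` subprocess. It assumes `bash` is installed, and `/opt/example-venv/bin` is a hypothetical stand-in for `/opt/venv/bin`.

```python
import subprocess

# Hypothetical stand-in for /opt/venv/bin, so that a real venv already on PATH
# cannot affect the demo.
VENV_BIN = "/opt/example-venv/bin"


def venv_on_path_in_step(setup: str = "true") -> bool:
    """Run `setup` in a brand-new bash process (one simulated `run:` step) and
    report whether VENV_BIN is a component of PATH at the end of that step."""
    script = f'{setup}; case ":$PATH:" in *":{VENV_BIN}:"*) echo yes ;; *) echo no ;; esac'
    result = subprocess.run(["bash", "-c", script], capture_output=True, text=True, check=True)
    return result.stdout.strip() == "yes"


# "Install dependencies" step: the export is visible inside the same step.
print(venv_on_path_in_step(f'export PATH="{VENV_BIN}:$PATH"'))  # True
# A later step without the export: the earlier change did not carry over.
print(venv_on_path_in_step())  # False
```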
--- a/.github/workflows/pr_flax_dependency_test.yml
+++ b/.github/workflows/pr_flax_dependency_test.yml
@@ -4,6 +4,8 @@ on:
  pull_request:
    branches:
      - main
+    paths:
+      - "src/diffusers/**.py"
  push:
    branches:
      - main
@@ -14,7 +16,7 @@ concurrency:

 jobs:
  check_flax_dependencies:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
@@ -23,12 +25,14 @@ jobs:
          python-version: "3.8"
      - name: Install dependencies
        run: |
-          python -m pip install --upgrade pip
-          pip install -e .
-          pip install "jax[cpu]>=0.2.16,!=0.3.2"
-          pip install "flax>=0.4.1"
-          pip install "jaxlib>=0.1.65"
-          pip install pytest
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m pip install --upgrade pip uv
+          python -m uv pip install -e .
+          python -m uv pip install "jax[cpu]>=0.2.16,!=0.3.2"
+          python -m uv pip install "flax>=0.4.1"
+          python -m uv pip install "jaxlib>=0.1.65"
+          python -m uv pip install pytest
      - name: Check for soft dependencies
        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          pytest tests/others/test_dependencies.py
--- a/.github/workflows/pr_quality.yml
+++ b/.github/workflows/pr_quality.yml
@@ -1,49 +0,0 @@
-name: Run code quality checks
-
-on:
-  pull_request:
-    branches:
-      - main
-  push:
-    branches:
-      - main
-
-concurrency:
-  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
-  cancel-in-progress: true
-
-jobs:
-  check_code_quality:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v3
-      - name: Set up Python
-        uses: actions/setup-python@v4
-        with:
-          python-version: "3.8"
-      - name: Install dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install .[quality]
-      - name: Check quality
-        run: |
-          ruff check examples tests src utils scripts
-          ruff format examples tests src utils scripts --check
-
-  check_repository_consistency:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v3
-      - name: Set up Python
-        uses: actions/setup-python@v4
-        with:
-          python-version: "3.8"
-      - name: Install dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install .[quality]
-      - name: Check quality
-        run: |
-          python utils/check_copies.py
-          python utils/check_dummies.py
-          make deps_table_check_updated
--- a/.github/workflows/pr_test_fetcher.yml
+++ b/.github/workflows/pr_test_fetcher.yml
@@ -15,7 +15,8 @@ concurrency:
 jobs:
  setup_pr_tests:
    name: Setup PR Tests
-    runs-on: docker-cpu
+    runs-on:
+      group: aws-general-8-plus
    container:
      image: diffusers/diffusers-pytorch-cpu
      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
@@ -32,8 +33,8 @@ jobs:
        fetch-depth: 0
    - name: Install dependencies
      run: |
-        apt-get update && apt-get install libsndfile1-dev libgl1 -y
-        python -m pip install -e .[quality,test]
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]
    - name: Environment
      run: |
        python utils/print_env.py
@@ -73,7 +74,8 @@ jobs:
      max-parallel: 2
      matrix:
        modules: ${{ fromJson(needs.setup_pr_tests.outputs.matrix) }}
-    runs-on: docker-cpu
+    runs-on:
+      group: aws-general-8-plus
    container:
      image: diffusers/diffusers-pytorch-cpu
      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
@@ -88,16 +90,18 @@ jobs:

    - name: Install dependencies
      run: |
-        apt-get update && apt-get install libsndfile1-dev libgl1 -y
-        python -m pip install -e .[quality,test]
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m pip install -e ".[quality,test]"
        python -m pip install accelerate

    - name: Environment
      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
        python utils/print_env.py

    - name: Run all selected tests on CPU
      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
        python -m pytest -n 2 --dist=loadfile -v --make-reports=${{ matrix.modules }}_tests_cpu ${{ fromJson(needs.setup_pr_tests.outputs.test_map)[matrix.modules] }}

    - name: Failure short reports
@@ -121,12 +125,13 @@ jobs:
        config:
          - name: Hub tests for models, schedulers, and pipelines
            framework: hub_tests_pytorch
-            runner: docker-cpu
+            runner: aws-general-8-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_hub

    name: ${{ matrix.config.name }}
-    runs-on: ${{ matrix.config.runner }}
+    runs-on:
+      group: ${{ matrix.config.runner }}
    container:
      image: ${{ matrix.config.image }}
      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
@@ -143,16 +148,18 @@ jobs:

    - name: Install dependencies
      run: |
-        apt-get update && apt-get install libsndfile1-dev libgl1 -y
-        python -m pip install -e .[quality,test]
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m pip install -e ".[quality,test]"

    - name: Environment
      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
        python utils/print_env.py

    - name: Run Hub tests for models, schedulers, and pipelines on a staging env
      if: ${{ matrix.config.framework == 'hub_tests_pytorch' }}
      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
        HUGGINGFACE_CO_STAGING=true python -m pytest \
          -m "is_staging_test" \
          --make-reports=tests_${{ matrix.config.report }} \
@@ -164,7 +171,7 @@ jobs:

    - name: Test suite reports artifacts
      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
+      uses: actions/upload-artifact@v4
      with:
        name: pr_${{ matrix.config.report }}_test_reports
        path: reports
--- a/.github/workflows/pr_test_peft_backend.yml
+++ b/.github/workflows/pr_test_peft_backend.yml
@@ -1,65 +0,0 @@
-name: Fast tests for PRs - PEFT backend
-
-on:
-  pull_request:
-    branches:
-      - main
-
-concurrency:
-  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
-  cancel-in-progress: true
-
-env:
-  DIFFUSERS_IS_CI: yes
-  OMP_NUM_THREADS: 4
-  MKL_NUM_THREADS: 4
-  PYTEST_TIMEOUT: 60
-
-jobs:
-  run_fast_tests:
-    strategy:
-      fail-fast: false
-      matrix:
-        lib-versions: ["main", "latest"]
-
-
-    name: LoRA - ${{ matrix.lib-versions }}
-
-    runs-on: docker-cpu
-
-    container:
-      image: diffusers/diffusers-pytorch-cpu
-      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
-
-    defaults:
-      run:
-        shell: bash
-
-    steps:
-    - name: Checkout diffusers
-      uses: actions/checkout@v3
-      with:
-        fetch-depth: 2
-
-    - name: Install dependencies
-      run: |
-        apt-get update && apt-get install libsndfile1-dev libgl1 -y
-        python -m pip install -e .[quality,test]
-        if [ "${{ matrix.lib-versions }}" == "main" ]; then
-            python -m pip install -U git+https://github.com/huggingface/peft.git
-            python -m pip install -U git+https://github.com/huggingface/transformers.git
-            python -m pip install -U git+https://github.com/huggingface/accelerate.git
-        else
-            python -m pip install -U peft transformers accelerate
-        fi
-
-    - name: Environment
-      run: |
-        python utils/print_env.py
-
-    - name: Run fast PyTorch LoRA CPU tests with PEFT backend
-      run: |
-        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-          -s -v \
-          --make-reports=tests_${{ matrix.config.report }} \
-          tests/lora/test_lora_layers_peft.py
--- a/.github/workflows/pr_tests.yml
+++ b/.github/workflows/pr_tests.yml
@@ -4,6 +4,14 @@ on:
  pull_request:
    branches:
      - main
+    paths:
+      - "src/diffusers/**.py"
+      - "benchmarks/**.py"
+      - "examples/**.py"
+      - "scripts/**.py"
+      - "tests/**.py"
+      - ".github/**.yml"
+      - "utils/**.py"
  push:
    branches:
      - ci-*
@@ -14,40 +22,85 @@ concurrency:

 env:
  DIFFUSERS_IS_CI: yes
+  HF_HUB_ENABLE_HF_TRANSFER: 1
  OMP_NUM_THREADS: 4
  MKL_NUM_THREADS: 4
  PYTEST_TIMEOUT: 60

 jobs:
+  check_code_quality:
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install .[quality]
+      - name: Check quality
+        run: make quality
+      - name: Check if failure
+        if: ${{ failure() }}
+        run: |
+          echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
+
+  check_repository_consistency:
+    needs: check_code_quality
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install .[quality]
+      - name: Check repo consistency
+        run: |
+          python utils/check_copies.py
+          python utils/check_dummies.py
+          make deps_table_check_updated
+      - name: Check if failure
+        if: ${{ failure() }}
+        run: |
+          echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
+
  run_fast_tests:
+    needs: [check_code_quality, check_repository_consistency]
    strategy:
      fail-fast: false
      matrix:
        config:
          - name: Fast PyTorch Pipeline CPU tests
            framework: pytorch_pipelines
-            runner: docker-cpu
+            runner: aws-highmemory-32-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_cpu_pipelines
          - name: Fast PyTorch Models & Schedulers CPU tests
            framework: pytorch_models
-            runner: docker-cpu
+            runner: aws-general-8-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_cpu_models_schedulers
          - name: Fast Flax CPU tests
            framework: flax
-            runner: docker-cpu
+            runner: aws-general-8-plus
            image: diffusers/diffusers-flax-cpu
            report: flax_cpu
          - name: PyTorch Example CPU tests
            framework: pytorch_examples
-            runner: docker-cpu
+            runner: aws-general-8-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_example_cpu

    name: ${{ matrix.config.name }}

-    runs-on: ${{ matrix.config.runner }}
+    runs-on:
+      group: ${{ matrix.config.runner }}

    container:
      image: ${{ matrix.config.image }}
@@ -65,18 +118,20 @@ jobs:

    - name: Install dependencies
      run: |
-        apt-get update && apt-get install libsndfile1-dev libgl1 -y
-        python -m pip install -e .[quality,test]
-        python -m pip install accelerate
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]
+        python -m uv pip install accelerate

    - name: Environment
      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
        python utils/print_env.py

    - name: Run fast PyTorch Pipeline CPU tests
      if: ${{ matrix.config.framework == 'pytorch_pipelines' }}
      run: |
-        python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m pytest -n 8 --max-worker-restart=0 --dist=loadfile \
          -s -v -k "not Flax and not Onnx" \
          --make-reports=tests_${{ matrix.config.report }} \
          tests/pipelines
@@ -84,7 +139,8 @@ jobs:
    - name: Run fast PyTorch Model Scheduler CPU tests
      if: ${{ matrix.config.framework == 'pytorch_models' }}
      run: |
-        python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
          -s -v -k "not Flax and not Onnx and not Dependency" \
          --make-reports=tests_${{ matrix.config.report }} \
          tests/models tests/schedulers tests/others
@@ -92,7 +148,8 @@ jobs:
    - name: Run fast Flax TPU tests
      if: ${{ matrix.config.framework == 'flax' }}
      run: |
-        python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
          -s -v -k "Flax" \
          --make-reports=tests_${{ matrix.config.report }} \
          tests
@@ -100,8 +157,9 @@ jobs:
    - name: Run example PyTorch CPU tests
      if: ${{ matrix.config.framework == 'pytorch_examples' }}
      run: |
-        python -m pip install peft
-        python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install peft timm
+        python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
          --make-reports=tests_${{ matrix.config.report }} \
          examples

@@ -111,19 +169,21 @@ jobs:

    - name: Test suite reports artifacts
      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
+      uses: actions/upload-artifact@v4
      with:
-        name: pr_${{ matrix.config.report }}_test_reports
+        name: pr_${{ matrix.config.framework }}_${{ matrix.config.report }}_test_reports
        path: reports

  run_staging_tests:
+    needs: [check_code_quality, check_repository_consistency]
    strategy:
      fail-fast: false
      matrix:
        config:
          - name: Hub tests for models, schedulers, and pipelines
            framework: hub_tests_pytorch
-            runner: docker-cpu
+            runner:
+              group: aws-general-8-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_hub

@@ -147,16 +207,18 @@ jobs:

    - name: Install dependencies
      run: |
-        apt-get update && apt-get install libsndfile1-dev libgl1 -y
-        python -m pip install -e .[quality,test]
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]

    - name: Environment
      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
        python utils/print_env.py

    - name: Run Hub tests for models, schedulers, and pipelines on a staging env
      if: ${{ matrix.config.framework == 'hub_tests_pytorch' }}
      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
        HUGGINGFACE_CO_STAGING=true python -m pytest \
          -m "is_staging_test" \
          --make-reports=tests_${{ matrix.config.report }} \
@@ -168,7 +230,72 @@ jobs:

    - name: Test suite reports artifacts
      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
+      uses: actions/upload-artifact@v4
      with:
        name: pr_${{ matrix.config.report }}_test_reports
        path: reports
+
+  run_lora_tests:
+    needs: [check_code_quality, check_repository_consistency]
+    strategy:
+      fail-fast: false
+
+    name: LoRA tests with PEFT main
+
+    runs-on:
+      group: aws-general-8-plus
+
+    container:
+      image: diffusers/diffusers-pytorch-cpu
+      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+
+    defaults:
+      run:
+        shell: bash
+
+    steps:
+    - name: Checkout diffusers
+      uses: actions/checkout@v3
+      with:
+        fetch-depth: 2
+
+    - name: Install dependencies
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]
+        # TODO (sayakpaul, DN6): revisit `--no-deps`
+        python -m pip install -U peft@git+https://github.com/huggingface/peft.git --no-deps
+        python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
+        python -m uv pip install -U tokenizers
+        pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps
+
+    - name: Environment
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python utils/print_env.py
+
+    - name: Run fast PyTorch LoRA tests with PEFT
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
+          -s -v \
+          --make-reports=tests_peft_main \
+          tests/lora/
+        python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
+          -s -v \
+          --make-reports=tests_models_lora_peft_main \
+          tests/models/ -k "lora"
+
+    - name: Failure short reports
+      if: ${{ failure() }}
+      run: |
+        cat reports/tests_peft_main_failures_short.txt
+        cat reports/tests_models_lora_peft_main_failures_short.txt
+
+    - name: Test suite reports artifacts
+      if: ${{ always() }}
+      uses: actions/upload-artifact@v4
+      with:
+        name: pr_main_test_reports
+        path: reports
+
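The new `needs:` keys turn pr_tests.yml into a small dependency graph. `check_code_quality` runs first, `check_repository_consistency` waits for it, and the fast, staging and LoRA test jobs wait for both. By default, GitHub Actions skips a job when a job it needs has failed. The sketch below derives those stages with Python's standard-library `graphlib` (Python 3.9+). The job names are copied from the diff. The `stages` helper is illustrative only and does not reproduce how the Actions scheduler works internally.

```python
from graphlib import TopologicalSorter

# job -> jobs listed in its `needs:` key, transcribed from pr_tests.yml
NEEDS = {
    "check_code_quality": [],
    "check_repository_consistency": ["check_code_quality"],
    "run_fast_tests": ["check_code_quality", "check_repository_consistency"],
    "run_staging_tests": ["check_code_quality", "check_repository_consistency"],
    "run_lora_tests": ["check_code_quality", "check_repository_consistency"],
}


def stages(needs):
    """Group jobs into stages; every job in a stage depends only on earlier stages."""
    sorter = TopologicalSorter(needs)  # values are treated as predecessors
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


for i, stage in enumerate(stages(NEEDS), start=1):
    print(i, stage)
# 1 ['check_code_quality']
# 2 ['check_repository_consistency']
# 3 ['run_fast_tests', 'run_lora_tests', 'run_staging_tests']
```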
--- a/.github/workflows/pr_torch_dependency_test.yml
+++ b/.github/workflows/pr_torch_dependency_test.yml
@@ -4,6 +4,8 @@ on:
  pull_request:
    branches:
      - main
+    paths:
+      - "src/diffusers/**.py"
  push:
    branches:
      - main
@@ -14,7 +16,7 @@ concurrency:

 jobs:
  check_torch_dependencies:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
@@ -23,10 +25,12 @@ jobs:
          python-version: "3.8"
      - name: Install dependencies
        run: |
-          python -m pip install --upgrade pip
-          pip install -e .
-          pip install torch torchvision torchaudio
-          pip install pytest
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m pip install --upgrade pip uv
+          python -m uv pip install -e .
+          python -m uv pip install torch torchvision torchaudio
+          python -m uv pip install pytest
      - name: Check for soft dependencies
        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          pytest tests/others/test_dependencies.py
--- a/.github/workflows/push_tests.yml
+++ b/.github/workflows/push_tests.yml
@@ -1,27 +1,30 @@
-name: Slow Tests on main
+name: Fast GPU Tests on main

 on:
+  workflow_dispatch:
  push:
    branches:
      - main
-
+    paths:
+      - "src/diffusers/**.py"
+      - "examples/**.py"
+      - "tests/**.py"

 env:
  DIFFUSERS_IS_CI: yes
-  HF_HOME: /mnt/cache
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
+  HF_HUB_ENABLE_HF_TRANSFER: 1
  PYTEST_TIMEOUT: 600
-  RUN_SLOW: yes
  PIPELINE_USAGE_CUTOFF: 50000

 jobs:
  setup_torch_cuda_pipeline_matrix:
    name: Setup Torch Pipelines CUDA Slow Tests Matrix
-    runs-on: docker-gpu
+    runs-on:
+      group: aws-general-8-plus
    container:
-      image: diffusers/diffusers-pytorch-cpu # this is a CPU image, but we need it to fetch the matrix
-      options: --shm-size "16gb" --ipc host
+      image: diffusers/diffusers-pytorch-cpu
    outputs:
      pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
    steps:
@@ -31,40 +34,37 @@ jobs:
          fetch-depth: 2
      - name: Install dependencies
        run: |
-          apt-get update && apt-get install libsndfile1-dev libgl1 -y
-          python -m pip install -e .[quality,test]
-          python -m pip install git+https://github.com/huggingface/accelerate.git
-
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
      - name: Environment
        run: |
          python utils/print_env.py
-
      - name: Fetch Pipeline Matrix
        id: fetch_pipeline_matrix
        run: |
          matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
          echo $matrix
          echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
-
      - name: Pipeline Tests Artifacts
        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
        with:
          name: test-pipelines.json
          path: reports

  torch_pipelines_cuda_tests:
-    name: Torch Pipelines CUDA Slow Tests
+    name: Torch Pipelines CUDA Tests
    needs: setup_torch_cuda_pipeline_matrix
    strategy:
      fail-fast: false
-      max-parallel: 1
+      max-parallel: 8
      matrix:
        module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
-    runs-on: docker-gpu
+    runs-on:
+      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-cuda
-      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
+      options: --shm-size "16gb" --ipc host --gpus 0
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
@@ -75,15 +75,15 @@ jobs:
          nvidia-smi
      - name: Install dependencies
        run: |
-          apt-get update && apt-get install libsndfile1-dev libgl1 -y
-          python -m pip install -e .[quality,test]
-          python -m pip install git+https://github.com/huggingface/accelerate.git
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
      - name: Environment
        run: |
          python utils/print_env.py
-      - name: Slow PyTorch CUDA checkpoint tests on Ubuntu
+      - name: PyTorch CUDA checkpoint tests on Ubuntu
        env:
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
        run: |
@@ -96,26 +96,28 @@ jobs:
        run: |
          cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
          cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
-
      - name: Test suite reports artifacts
        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
        with:
          name: pipeline_${{ matrix.module }}_test_reports
          path: reports

  torch_cuda_tests:
    name: Torch CUDA Tests
-    runs-on: docker-gpu
+    runs-on:
+      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-cuda
-      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
+      options: --shm-size "16gb" --ipc host --gpus 0
    defaults:
      run:
        shell: bash
    strategy:
+      fail-fast: false
+      max-parallel: 2
      matrix:
-        module: [models, schedulers, lora, others]
+        module: [models, schedulers, lora, others, single_file]
    steps:
    - name: Checkout diffusers
      uses: actions/checkout@v3
@@ -124,94 +126,46 @@ jobs:

    - name: Install dependencies
      run: |
-        apt-get update && apt-get install libsndfile1-dev libgl1 -y
-        python -m pip install -e .[quality,test]
-        python -m pip install git+https://github.com/huggingface/accelerate.git
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]
+        python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+        pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git

    - name: Environment
      run: |
        python utils/print_env.py

-    - name: Run slow PyTorch CUDA tests
+    - name: Run PyTorch CUDA tests
      env:
-        HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
        # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
        CUBLAS_WORKSPACE_CONFIG: :16:8
      run: |
        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
          -s -v -k "not Flax and not Onnx" \
-          --make-reports=tests_torch_cuda \
+          --make-reports=tests_torch_cuda_${{ matrix.module }} \
          tests/${{ matrix.module }}

    - name: Failure short reports
      if: ${{ failure() }}
      run: |
-        cat reports/tests_torch_cuda_stats.txt
-        cat reports/tests_torch_cuda_failures_short.txt
+        cat reports/tests_torch_cuda_${{ matrix.module }}_stats.txt
+        cat reports/tests_torch_cuda_${{ matrix.module }}_failures_short.txt

    - name: Test suite reports artifacts
      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
+      uses: actions/upload-artifact@v4
      with:
-        name: torch_cuda_test_reports
-        path: reports
-
-  peft_cuda_tests:
-    name: PEFT CUDA Tests
-    runs-on: docker-gpu
-    container:
-      image: diffusers/diffusers-pytorch-cuda
-      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
-    defaults:
-      run:
-        shell: bash
-    steps:
-    - name: Checkout diffusers
-      uses: actions/checkout@v3
-      with:
-        fetch-depth: 2
-
-    - name: Install dependencies
-      run: |
-        apt-get update && apt-get install libsndfile1-dev libgl1 -y
-        python -m pip install -e .[quality,test]
-        python -m pip install git+https://github.com/huggingface/accelerate.git
-        python -m pip install git+https://github.com/huggingface/peft.git
-
-    - name: Environment
-      run: |
-        python utils/print_env.py
-
-    - name: Run slow PEFT CUDA tests
-      env:
-        HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
-        # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
-        CUBLAS_WORKSPACE_CONFIG: :16:8
-      run: |
-        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-          -s -v -k "not Flax and not Onnx and not PEFTLoRALoading" \
-          --make-reports=tests_peft_cuda \
-          tests/lora/
-
-    - name: Failure short reports
-      if: ${{ failure() }}
-      run: |
-        cat reports/tests_peft_cuda_stats.txt
-        cat reports/tests_peft_cuda_failures_short.txt
-
-    - name: Test suite reports artifacts
-      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
-      with:
-        name: torch_peft_test_reports
+        name: torch_cuda_test_reports_${{ matrix.module }}
        path: reports

  flax_tpu_tests:
    name: Flax TPU Tests
-    runs-on: docker-tpu
+    runs-on:
+      group: gcp-ct5lp-hightpu-8t
    container:
      image: diffusers/diffusers-flax-tpu
-      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --privileged
+      options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV }} -v /mnt/hf_cache:/mnt/hf_cache
    defaults:
      run:
        shell: bash
@@ -223,17 +177,17 @@ jobs:

    - name: Install dependencies
      run: |
-        apt-get update && apt-get install libsndfile1-dev libgl1 -y
-        python -m pip install -e .[quality,test]
-        python -m pip install git+https://github.com/huggingface/accelerate.git
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]
+        pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git

    - name: Environment
      run: |
        python utils/print_env.py

-    - name: Run slow Flax TPU tests
+    - name: Run Flax TPU tests
      env:
-        HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
      run: |
        python -m pytest -n 0 \
          -s -v -k "Flax" \
@@ -248,17 +202,18 @@ jobs:

    - name: Test suite reports artifacts
      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
+      uses: actions/upload-artifact@v4
      with:
        name: flax_tpu_test_reports
        path: reports

  onnx_cuda_tests:
    name: ONNX CUDA Tests
-    runs-on: docker-gpu
+    runs-on:
+      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-onnxruntime-cuda
-      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
+      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ --gpus 0
    defaults:
      run:
        shell: bash
@@ -270,17 +225,17 @@ jobs:

    - name: Install dependencies
      run: |
-        apt-get update && apt-get install libsndfile1-dev libgl1 -y
-        python -m pip install -e .[quality,test]
-        python -m pip install git+https://github.com/huggingface/accelerate.git
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]
+        pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git

    - name: Environment
      run: |
        python utils/print_env.py

-    - name: Run slow ONNXRuntime CUDA tests
+    - name: Run ONNXRuntime CUDA tests
      env:
-        HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
      run: |
        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
          -s -v -k "Onnx" \
@@ -295,7 +250,7 @@ jobs:

    - name: Test suite reports artifacts
      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
+      uses: actions/upload-artifact@v4
      with:
        name: onnx_cuda_test_reports
        path: reports
@@ -303,11 +258,12 @@ jobs:
  run_torch_compile_tests:
    name: PyTorch Compile CUDA tests

-    runs-on: docker-gpu
+    runs-on:
+      group: aws-g4dn-2xlarge

    container:
      image: diffusers/diffusers-pytorch-compile-cuda
-      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+      options: --gpus 0 --shm-size "16gb" --ipc host

    steps:
    - name: Checkout diffusers
@@ -320,13 +276,15 @@ jobs:
        nvidia-smi
    - name: Install dependencies
      run: |
-        python -m pip install -e .[quality,test,training]
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test,training]
    - name: Environment
      run: |
        python utils/print_env.py
    - name: Run example tests on GPU
      env:
-        HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+        RUN_COMPILE: yes
      run: |
        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
    - name: Failure short reports
@@ -335,7 +293,7 @@ jobs:

    - name: Test suite reports artifacts
      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
+      uses: actions/upload-artifact@v4
      with:
        name: torch_compile_test_reports
        path: reports
@@ -343,11 +301,12 @@ jobs:
  run_xformers_tests:
    name: PyTorch xformers CUDA tests

-    runs-on: docker-gpu
+    runs-on:
+      group: aws-g4dn-2xlarge

    container:
      image: diffusers/diffusers-pytorch-xformers-cuda
-      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+      options: --gpus 0 --shm-size "16gb" --ipc host

    steps:
    - name: Checkout diffusers
@@ -360,13 +319,14 @@ jobs:
        nvidia-smi
    - name: Install dependencies
      run: |
-        python -m pip install -e .[quality,test,training]
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test,training]
    - name: Environment
      run: |
        python utils/print_env.py
    - name: Run example tests on GPU
      env:
-        HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
      run: |
        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
    - name: Failure short reports
@@ -375,7 +335,7 @@ jobs:

    - name: Test suite reports artifacts
      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
+      uses: actions/upload-artifact@v4
      with:
        name: torch_xformers_test_reports
        path: reports
@@ -383,11 +343,12 @@ jobs:
  run_examples_tests:
    name: Examples PyTorch CUDA tests on Ubuntu

-    runs-on: docker-gpu
+    runs-on:
+      group: aws-g4dn-2xlarge

    container:
      image: diffusers/diffusers-pytorch-cuda
-      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+      options: --gpus 0 --shm-size "16gb" --ipc host

    steps:
    - name: Checkout diffusers
@@ -401,16 +362,20 @@ jobs:

    - name: Install dependencies
      run: |
-        python -m pip install -e .[quality,test,training]
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test,training]

    - name: Environment
      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
        python utils/print_env.py

    - name: Run example tests on GPU
      env:
-        HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install timm
        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_cuda examples/

    - name: Failure short reports
@@ -421,7 +386,7 @@ jobs:

    - name: Test suite reports artifacts
      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
+      uses: actions/upload-artifact@v4
      with:
        name: examples_test_reports
-        path: reports
+        path: reports
--- a/.github/workflows/push_tests_fast.yml
+++ b/.github/workflows/push_tests_fast.yml
@@ -4,6 +4,10 @@ on:
  push:
    branches:
      - main
+    paths:
+      - "src/diffusers/**.py"
+      - "examples/**.py"
+      - "tests/**.py"

 concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@@ -14,6 +18,7 @@ env:
  HF_HOME: /mnt/cache
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
+  HF_HUB_ENABLE_HF_TRANSFER: 1
  PYTEST_TIMEOUT: 600
  RUN_SLOW: no

@@ -25,28 +30,29 @@ jobs:
        config:
          - name: Fast PyTorch CPU tests on Ubuntu
            framework: pytorch
-            runner: docker-cpu
+            runner: aws-general-8-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_cpu
          - name: Fast Flax CPU tests on Ubuntu
            framework: flax
-            runner: docker-cpu
+            runner: aws-general-8-plus
            image: diffusers/diffusers-flax-cpu
            report: flax_cpu
          - name: Fast ONNXRuntime CPU tests on Ubuntu
            framework: onnxruntime
-            runner: docker-cpu
+            runner: aws-general-8-plus
            image: diffusers/diffusers-onnxruntime-cpu
            report: onnx_cpu
          - name: PyTorch Example CPU tests on Ubuntu
            framework: pytorch_examples
-            runner: docker-cpu
+            runner: aws-general-8-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_example_cpu

    name: ${{ matrix.config.name }}

-    runs-on: ${{ matrix.config.runner }}
+    runs-on:
+      group: ${{ matrix.config.runner }}

    container:
      image: ${{ matrix.config.image }}
@@ -64,17 +70,19 @@ jobs:

    - name: Install dependencies
      run: |
-        apt-get update && apt-get install libsndfile1-dev libgl1 -y
-        python -m pip install -e .[quality,test]
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]

    - name: Environment
      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
        python utils/print_env.py

    - name: Run fast PyTorch CPU tests
      if: ${{ matrix.config.framework == 'pytorch' }}
      run: |
-        python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
          -s -v -k "not Flax and not Onnx" \
          --make-reports=tests_${{ matrix.config.report }} \
          tests/
@@ -82,7 +90,8 @@ jobs:
    - name: Run fast Flax TPU tests
      if: ${{ matrix.config.framework == 'flax' }}
      run: |
-        python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
          -s -v -k "Flax" \
          --make-reports=tests_${{ matrix.config.report }} \
          tests/
@@ -90,7 +99,8 @@ jobs:
    - name: Run fast ONNXRuntime CPU tests
      if: ${{ matrix.config.framework == 'onnxruntime' }}
      run: |
-        python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
          -s -v -k "Onnx" \
          --make-reports=tests_${{ matrix.config.report }} \
          tests/
@@ -98,8 +108,9 @@ jobs:
    - name: Run example PyTorch CPU tests
      if: ${{ matrix.config.framework == 'pytorch_examples' }}
      run: |
-        python -m pip install peft
-        python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install peft timm
+        python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
          --make-reports=tests_${{ matrix.config.report }} \
          examples

@@ -109,7 +120,7 @@ jobs:

    - name: Test suite reports artifacts
      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
+      uses: actions/upload-artifact@v4
      with:
        name: pr_${{ matrix.config.report }}_test_reports
        path: reports
--- a/.github/workflows/push_tests_mps.yml
+++ b/.github/workflows/push_tests_mps.yml
@@ -4,12 +4,16 @@ on:
  push:
    branches:
      - main
+    paths:
+      - "src/diffusers/**.py"
+      - "tests/**.py"

 env:
  DIFFUSERS_IS_CI: yes
  HF_HOME: /mnt/cache
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
+  HF_HUB_ENABLE_HF_TRANSFER: 1
  PYTEST_TIMEOUT: 600
  RUN_SLOW: no

@@ -20,7 +24,7 @@ concurrency:
 jobs:
  run_fast_tests_apple_m1:
    name: Fast PyTorch MPS tests on MacOS
-    runs-on: [ self-hosted, apple-m1 ]
+    runs-on: macos-13-xlarge

    steps:
    - name: Checkout diffusers
@@ -41,11 +45,11 @@ jobs:
    - name: Install dependencies
      shell: arch -arch arm64 bash {0}
      run: |
-        ${CONDA_RUN} python -m pip install --upgrade pip
-        ${CONDA_RUN} python -m pip install -e .[quality,test]
-        ${CONDA_RUN} python -m pip install torch torchvision torchaudio
-        ${CONDA_RUN} python -m pip install git+https://github.com/huggingface/accelerate.git
-        ${CONDA_RUN} python -m pip install transformers --upgrade
+        ${CONDA_RUN} python -m pip install --upgrade pip uv
+        ${CONDA_RUN} python -m uv pip install -e ".[quality,test]"
+        ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio
+        ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
+        ${CONDA_RUN} python -m uv pip install transformers --upgrade

    - name: Environment
      shell: arch -arch arm64 bash {0}
@@ -56,7 +60,7 @@ jobs:
      shell: arch -arch arm64 bash {0}
      env:
        HF_HOME: /System/Volumes/Data/mnt/cache
-        HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+        HF_TOKEN: ${{ secrets.HF_TOKEN }}
      run: |
        ${CONDA_RUN} python -m pytest -n 0 -s -v --make-reports=tests_torch_mps tests/

@@ -66,7 +70,7 @@ jobs:

    - name: Test suite reports artifacts
      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
+      uses: actions/upload-artifact@v4
      with:
        name: pr_torch_mps_test_reports
        path: reports
--- a/.github/workflows/pypi_publish.yaml
+++ b/.github/workflows/pypi_publish.yaml
@@ -0,0 +1,81 @@
+# Adapted from https://blog.deepjyoti30.dev/pypi-release-github-action
+
+name: PyPI release
+
+on:
+  workflow_dispatch:
+  push:
+    tags:
+      - "*"
+
+jobs:
+  find-and-checkout-latest-branch:
+    runs-on: ubuntu-22.04
+    outputs:
+      latest_branch: ${{ steps.set_latest_branch.outputs.latest_branch }}
+    steps:
+      - name: Checkout Repo
+        uses: actions/checkout@v3
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.8'
+
+      - name: Fetch latest branch
+        id: fetch_latest_branch
+        run: |
+          pip install -U requests packaging
+          LATEST_BRANCH=$(python utils/fetch_latest_release_branch.py)
+          echo "Latest branch: $LATEST_BRANCH"
+          echo "latest_branch=$LATEST_BRANCH" >> $GITHUB_ENV
+
+      - name: Set latest branch output
+        id: set_latest_branch
+        run: echo "latest_branch=${{ env.latest_branch }}" >> $GITHUB_OUTPUT
+
+  release:
+    needs: find-and-checkout-latest-branch
+    runs-on: ubuntu-22.04
+
+    steps:
+      - name: Checkout Repo
+        uses: actions/checkout@v3
+        with:
+          ref: ${{ needs.find-and-checkout-latest-branch.outputs.latest_branch }}
+
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -U setuptools wheel twine
+          pip install -U torch --index-url https://download.pytorch.org/whl/cpu
+          pip install -U transformers
+
+      - name: Build the dist files
+        run: python setup.py bdist_wheel && python setup.py sdist
+
+      - name: Publish to the test PyPI
+        env:
+          TWINE_USERNAME: ${{ secrets.TEST_PYPI_USERNAME }}
+          TWINE_PASSWORD: ${{ secrets.TEST_PYPI_PASSWORD }}
+        run: twine upload dist/* -r pypitest --repository-url=https://test.pypi.org/legacy/
+
+      - name: Test installing diffusers and importing
+        run: |
+          pip install diffusers && pip uninstall diffusers -y
+          pip install -i https://test.pypi.org/simple/ diffusers
+          python -c "from diffusers import __version__; print(__version__)"
+          python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('fusing/unet-ldm-dummy-update'); pipe()"
+          python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('hf-internal-testing/tiny-stable-diffusion-pipe', safety_checker=None); pipe('ah suh du')"
+          python -c "from diffusers import *"
+
+      - name: Publish to PyPI
+        env:
+          TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
+          TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
+        run: twine upload dist/* -r pypi
--- a/.github/workflows/release_tests_fast.yml
+++ b/.github/workflows/release_tests_fast.yml
@@ -0,0 +1,446 @@
+# Duplicate of push_tests.yml that runs on release/patch branches as a final check.
+# Creating a duplicate workflow here is simpler than adding complex path/branch parsing logic to push_tests.yml.
+# Must be updated whenever push_tests.yml is updated.
+name: (Release) Fast GPU Tests on release branches
+
+on:
+  push:
+    branches:
+      - "v*.*.*-release"
+      - "v*.*.*-patch"
+
+env:
+  DIFFUSERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  PYTEST_TIMEOUT: 600
+  PIPELINE_USAGE_CUTOFF: 50000
+
+jobs:
+  setup_torch_cuda_pipeline_matrix:
+    name: Setup Torch Pipelines CUDA Slow Tests Matrix
+    runs-on:
+      group: aws-general-8-plus
+    container:
+      image: diffusers/diffusers-pytorch-cpu
+    outputs:
+      pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Fetch Pipeline Matrix
+        id: fetch_pipeline_matrix
+        run: |
+          matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
+          echo $matrix
+          echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
+      - name: Pipeline Tests Artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: test-pipelines.json
+          path: reports
+
+  torch_pipelines_cuda_tests:
+    name: Torch Pipelines CUDA Tests
+    needs: setup_torch_cuda_pipeline_matrix
+    strategy:
+      fail-fast: false
+      max-parallel: 8
+      matrix:
+        module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Slow PyTorch CUDA checkpoint tests on Ubuntu
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -s -v -k "not Flax and not Onnx" \
+            --make-reports=tests_pipeline_${{ matrix.module }}_cuda \
+            tests/pipelines/${{ matrix.module }}
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
+          cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: pipeline_${{ matrix.module }}_test_reports
+          path: reports
+
+  torch_cuda_tests:
+    name: Torch CUDA Tests
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    defaults:
+      run:
+        shell: bash
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+      matrix:
+        module: [models, schedulers, lora, others, single_file]
+    steps:
+    - name: Checkout diffusers
+      uses: actions/checkout@v3
+      with:
+        fetch-depth: 2
+
+    - name: Install dependencies
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]
+        python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+        pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+
+    - name: Environment
+      run: |
+        python utils/print_env.py
+
+    - name: Run PyTorch CUDA tests
+      env:
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+        # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+        CUBLAS_WORKSPACE_CONFIG: :16:8
+      run: |
+        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+          -s -v -k "not Flax and not Onnx" \
+          --make-reports=tests_torch_${{ matrix.module }}_cuda \
+          tests/${{ matrix.module }}
+
+    - name: Failure short reports
+      if: ${{ failure() }}
+      run: |
+        cat reports/tests_torch_${{ matrix.module }}_cuda_stats.txt
+        cat reports/tests_torch_${{ matrix.module }}_cuda_failures_short.txt
+
+    - name: Test suite reports artifacts
+      if: ${{ always() }}
+      uses: actions/upload-artifact@v4
+      with:
+        name: torch_cuda_${{ matrix.module }}_test_reports
+        path: reports
+
+  torch_minimum_version_cuda_tests:
+    name: Torch Minimum Version CUDA Tests
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: diffusers/diffusers-pytorch-minimum-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    defaults:
+      run:
+        shell: bash
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+
+      - name: Environment
+        run: |
+          python utils/print_env.py
+
+      - name: Run PyTorch CUDA tests
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -s -v -k "not Flax and not Onnx" \
+            --make-reports=tests_torch_minimum_version_cuda \
+            tests/models/test_modeling_common.py \
+            tests/pipelines/test_pipelines_common.py \
+            tests/pipelines/test_pipeline_utils.py \
+            tests/pipelines/test_pipelines.py \
+            tests/pipelines/test_pipelines_auto.py \
+            tests/schedulers/test_schedulers.py \
+            tests/others
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_torch_minimum_version_cuda_stats.txt
+          cat reports/tests_torch_minimum_version_cuda_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_minimum_version_cuda_test_reports
+          path: reports
+
+  flax_tpu_tests:
+    name: Flax TPU Tests
+    runs-on: docker-tpu
+    container:
+      image: diffusers/diffusers-flax-tpu
+      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ --privileged
+    defaults:
+      run:
+        shell: bash
+    steps:
+    - name: Checkout diffusers
+      uses: actions/checkout@v3
+      with:
+        fetch-depth: 2
+
+    - name: Install dependencies
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]
+        pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+
+    - name: Environment
+      run: |
+        python utils/print_env.py
+
+    - name: Run slow Flax TPU tests
+      env:
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+      run: |
+        python -m pytest -n 0 \
+          -s -v -k "Flax" \
+          --make-reports=tests_flax_tpu \
+          tests/
+
+    - name: Failure short reports
+      if: ${{ failure() }}
+      run: |
+        cat reports/tests_flax_tpu_stats.txt
+        cat reports/tests_flax_tpu_failures_short.txt
+
+    - name: Test suite reports artifacts
+      if: ${{ always() }}
+      uses: actions/upload-artifact@v4
+      with:
+        name: flax_tpu_test_reports
+        path: reports
+
+  onnx_cuda_tests:
+    name: ONNX CUDA Tests
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: diffusers/diffusers-onnxruntime-cuda
+      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ --gpus 0
+    defaults:
+      run:
+        shell: bash
+    steps:
+    - name: Checkout diffusers
+      uses: actions/checkout@v3
+      with:
+        fetch-depth: 2
+
+    - name: Install dependencies
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test]
+        pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+
+    - name: Environment
+      run: |
+        python utils/print_env.py
+
+    - name: Run slow ONNXRuntime CUDA tests
+      env:
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+      run: |
+        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+          -s -v -k "Onnx" \
+          --make-reports=tests_onnx_cuda \
+          tests/
+
+    - name: Failure short reports
+      if: ${{ failure() }}
+      run: |
+        cat reports/tests_onnx_cuda_stats.txt
+        cat reports/tests_onnx_cuda_failures_short.txt
+
+    - name: Test suite reports artifacts
+      if: ${{ always() }}
+      uses: actions/upload-artifact@v4
+      with:
+        name: onnx_cuda_test_reports
+        path: reports
+
+  run_torch_compile_tests:
+    name: PyTorch Compile CUDA tests
+
+    runs-on:
+      group: aws-g4dn-2xlarge
+
+    container:
+      image: diffusers/diffusers-pytorch-compile-cuda
+      options: --gpus 0 --shm-size "16gb" --ipc host
+
+    steps:
+    - name: Checkout diffusers
+      uses: actions/checkout@v3
+      with:
+        fetch-depth: 2
+
+    - name: NVIDIA-SMI
+      run: |
+        nvidia-smi
+    - name: Install dependencies
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test,training]
+    - name: Environment
+      run: |
+        python utils/print_env.py
+    - name: Run torch compile tests on GPU
+      env:
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+        RUN_COMPILE: yes
+      run: |
+        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
+    - name: Failure short reports
+      if: ${{ failure() }}
+      run: cat reports/tests_torch_compile_cuda_failures_short.txt
+
+    - name: Test suite reports artifacts
+      if: ${{ always() }}
+      uses: actions/upload-artifact@v4
+      with:
+        name: torch_compile_test_reports
+        path: reports
+
+  run_xformers_tests:
+    name: PyTorch xformers CUDA tests
+
+    runs-on:
+      group: aws-g4dn-2xlarge
+
+    container:
+      image: diffusers/diffusers-pytorch-xformers-cuda
+      options: --gpus 0 --shm-size "16gb" --ipc host
+
+    steps:
+    - name: Checkout diffusers
+      uses: actions/checkout@v3
+      with:
+        fetch-depth: 2
+
+    - name: NVIDIA-SMI
+      run: |
+        nvidia-smi
+    - name: Install dependencies
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test,training]
+    - name: Environment
+      run: |
+        python utils/print_env.py
+    - name: Run xformers tests on GPU
+      env:
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+      run: |
+        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
+    - name: Failure short reports
+      if: ${{ failure() }}
+      run: cat reports/tests_torch_xformers_cuda_failures_short.txt
+
+    - name: Test suite reports artifacts
+      if: ${{ always() }}
+      uses: actions/upload-artifact@v4
+      with:
+        name: torch_xformers_test_reports
+        path: reports
+
+  run_examples_tests:
+    name: Examples PyTorch CUDA tests on Ubuntu
+
+    runs-on:
+      group: aws-g4dn-2xlarge
+
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --gpus 0 --shm-size "16gb" --ipc host
+
+    steps:
+    - name: Checkout diffusers
+      uses: actions/checkout@v3
+      with:
+        fetch-depth: 2
+
+    - name: NVIDIA-SMI
+      run: |
+        nvidia-smi
+
+    - name: Install dependencies
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install -e [quality,test,training]
+
+    - name: Environment
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python utils/print_env.py
+
+    - name: Run example tests on GPU
+      env:
+        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+      run: |
+        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+        python -m uv pip install timm
+        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_cuda examples/
+
+    - name: Failure short reports
+      if: ${{ failure() }}
+      run: |
+        cat reports/examples_torch_cuda_stats.txt
+        cat reports/examples_torch_cuda_failures_short.txt
+
+    - name: Test suite reports artifacts
+      if: ${{ always() }}
+      uses: actions/upload-artifact@v4
+      with:
+        name: examples_test_reports
+        path: reports
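Every job above passes an identifier to `--make-reports=`, and its failure step then `cat`s `reports/<identifier>_stats.txt` and `reports/<identifier>_failures_short.txt`. The two strings have to agree, otherwise the `cat` fails on a missing file instead of printing the short report. A minimal Python sketch of that naming convention follows; the helper names are hypothetical, and the two suffixes are assumed from the `cat` lines in these workflows rather than taken from the report plugin itself.

```python
from typing import List

# Assumed suffixes, mirroring the `cat reports/<id>...` lines in the workflow.
REPORT_SUFFIXES = ("_stats.txt", "_failures_short.txt")


def expected_report_files(make_reports_name: str, report_dir: str = "reports") -> List[str]:
    """Files a failure step can safely `cat` for a given --make-reports value."""
    return [f"{report_dir}/{make_reports_name}{suffix}" for suffix in REPORT_SUFFIXES]


def mismatched_cat_paths(make_reports_name: str, cat_paths: List[str]) -> List[str]:
    """Return the cat'ed paths that the pytest run would not produce."""
    expected = set(expected_report_files(make_reports_name))
    return [path for path in cat_paths if path not in expected]


CAT_PATHS = [
    "reports/tests_torch_minimum_version_cuda_stats.txt",
    "reports/tests_torch_minimum_version_cuda_failures_short.txt",
]

print(mismatched_cat_paths("tests_torch_minimum_version_cuda", CAT_PATHS))  # []
print(mismatched_cat_paths("tests_torch_minimum_cuda", CAT_PATHS))  # both paths are flagged
```

A check like this could be run over each job's `run:` blocks to catch an identifier that was renamed in one place but not the other.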
--- a/.github/workflows/run_tests_from_a_pr.yml
+++ b/.github/workflows/run_tests_from_a_pr.yml
@@ -0,0 +1,74 @@
+name: Check running SLOW tests from a PR (only GPU)
+
+on:
+  workflow_dispatch:
+    inputs:
+      docker_image:
+        default: 'diffusers/diffusers-pytorch-cuda'
+        description: 'Name of the Docker image'
+        required: true
+      branch:
+        description: 'PR Branch to test on'
+        required: true
+      test:
+        description: 'Tests to run (e.g.: `tests/models`).'
+        required: true
+
+env:
+  DIFFUSERS_IS_CI: yes
+  IS_GITHUB_CI: "1"
+  HF_HOME: /mnt/cache
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  PYTEST_TIMEOUT: 600
+  RUN_SLOW: yes
+
+jobs:
+  run_tests:
+    name: "Run a test on our runner from a PR"
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: ${{ github.event.inputs.docker_image }}
+      options: --gpus 0 --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+
+    steps:
+      - name: Validate test files input
+        id: validate_test_files
+        env:
+          PY_TEST: ${{ github.event.inputs.test }}
+        run: |
+          if [[ ! "$PY_TEST" =~ ^tests/ ]]; then
+            echo "Error: The input string must start with 'tests/'."
+            exit 1
+          fi
+
+          if [[ ! "$PY_TEST" =~ ^tests/(models|pipelines) ]]; then
+            echo "Error: The input string must contain either 'models' or 'pipelines' after 'tests/'."
+            exit 1
+          fi
+
+          if [[ "$PY_TEST" == *";"* ]]; then
+            echo "Error: The input string must not contain ';'."
+            exit 1
+          fi
+          echo "$PY_TEST"
+
+      - name: Checkout PR branch
+        uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.inputs.branch }}
+          repository: ${{ github.event.pull_request.head.repo.full_name }}
+
+
+      - name: Install pytest
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install peft
+
+      - name: Run tests
+        env:
+            PY_TEST: ${{ github.event.inputs.test }}
+        run: |
+          pytest "$PY_TEST"
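The three checks in the `Validate test files input` step can be mirrored in Python to try inputs locally before dispatching the workflow. This is a hypothetical sketch, not code from the repository. `re.match` anchors at the start of the string like the `^` in the bash patterns, and, as in bash, neither pattern is anchored at the end, so the second check is a prefix match only.

```python
import re
from typing import Optional


def validate_test_input(py_test: str) -> Optional[str]:
    """Mirror the workflow's checks: return an error message, or None if the input is accepted."""
    # Bash: [[ ! "$PY_TEST" =~ ^tests/ ]]
    if not re.match(r"tests/", py_test):
        return "Error: The input string must start with 'tests/'."
    # Bash: [[ ! "$PY_TEST" =~ ^tests/(models|pipelines) ]]  (prefix match only)
    if not re.match(r"tests/(models|pipelines)", py_test):
        return "Error: The input string must contain either 'models' or 'pipelines' after 'tests/'."
    # Bash: [[ "$PY_TEST" == *";"* ]]
    if ";" in py_test:
        return "Error: The input string must not contain ';'."
    return None


for candidate in ["tests/models", "tests/others", "tests/pipelines; echo hi"]:
    print(candidate, "->", validate_test_input(candidate) or "ok")
```

Passing the input through the `PY_TEST` environment variable, rather than interpolating `${{ github.event.inputs.test }}` directly into the `run:` script, keeps the value from being spliced into the shell source; the `;` check is an extra guard on top of that.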
--- a/.github/workflows/ssh-pr-runner.yml
+++ b/.github/workflows/ssh-pr-runner.yml
@@ -0,0 +1,40 @@
+name: SSH into PR runners
+
+on:
+  workflow_dispatch:
+    inputs:
+      docker_image:
+        description: 'Name of the Docker image'
+        required: true
+
+env:
+  IS_GITHUB_CI: "1"
+  HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+  HF_HOME: /mnt/cache
+  DIFFUSERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  RUN_SLOW: yes
+
+jobs:
+  ssh_runner:
+    name: "SSH"
+    runs-on:
+      group: aws-highmemory-32-plus
+    container:
+      image: ${{ github.event.inputs.docker_image }}
+      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface/diffusers:/mnt/cache/ --privileged
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Tailscale # In order to be able to SSH when a test fails
+        uses: huggingface/tailscale-action@main
+        with:
+          authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
+          slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
+          slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+          waitForSSH: true
--- a/.github/workflows/ssh-runner.yml
+++ b/.github/workflows/ssh-runner.yml
@@ -0,0 +1,52 @@
+name: SSH into GPU runners
+
+on:
+  workflow_dispatch:
+    inputs:
+      runner_type:
+        description: 'Type of runner to test (aws-g6-4xlarge-plus: L4, aws-g4dn-2xlarge: T4, aws-g6e-xlarge-plus: L40S)'
+        type: choice
+        required: true
+        options:
+          - aws-g6-4xlarge-plus
+          - aws-g4dn-2xlarge
+          - aws-g6e-xlarge-plus
+      docker_image:
+        description: 'Name of the Docker image'
+        required: true
+
+env:
+  IS_GITHUB_CI: "1"
+  HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+  HF_HOME: /mnt/cache
+  DIFFUSERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  RUN_SLOW: yes
+
+jobs:
+  ssh_runner:
+    name: "SSH"
+    runs-on:
+      group: "${{ github.event.inputs.runner_type }}"
+    container:
+      image: ${{ github.event.inputs.docker_image }}
+      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface/diffusers:/mnt/cache/ --gpus 0 --privileged
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+
+      - name: Tailscale # In order to be able to SSH when a test fails
+        uses: huggingface/tailscale-action@main
+        with:
+          authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
+          slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
+          slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+          waitForSSH: true
--- a/.github/workflows/stale.yml
+++ b/.github/workflows/stale.yml
@@ -8,7 +8,10 @@ jobs:
  close_stale_issues:
    name: Close Stale Issues
    if: github.repository == 'huggingface/diffusers'
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
+    permissions:
+      issues: write
+      pull-requests: write
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    steps:
--- a/.github/workflows/trufflehog.yml
+++ b/.github/workflows/trufflehog.yml
@@ -0,0 +1,18 @@
+on:
+  push:
+
+name: Secret Leaks
+
+jobs:
+  trufflehog:
+    runs-on: ubuntu-22.04
+    steps:
+    - name: Checkout code
+      uses: actions/checkout@v4
+      with:
+        fetch-depth: 0
+    - name: Secret Scanning
+      uses: trufflesecurity/trufflehog@main
+      with:
+        extra_args: --results=verified,unknown
+
--- a/.github/workflows/typos.yml
+++ b/.github/workflows/typos.yml
@@ -5,7 +5,7 @@ on:

 jobs:
  build:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04

    steps:
      - uses: actions/checkout@v3
--- a/.github/workflows/update_metadata.yml
+++ b/.github/workflows/update_metadata.yml
@@ -0,0 +1,30 @@
+name: Update Diffusers metadata
+
+on:
+  workflow_dispatch:
+  push:
+    branches:
+      - main
+      - update_diffusers_metadata*
+
+jobs:
+  update_metadata:
+    runs-on: ubuntu-22.04
+    defaults:
+      run:
+        shell: bash -l {0}
+
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Setup environment
+        run: |
+          pip install --upgrade pip
+          pip install datasets pandas
+          pip install .[torch]
+
+      - name: Update metadata
+        env:
+          HF_TOKEN: ${{ secrets.SAYAK_HF_TOKEN }}
+        run: |
+          python utils/update_metadata.py --commit_sha ${{ github.sha }}
--- a/.gitignore
+++ b/.gitignore
@@ -175,4 +175,4 @@ tags
 .ruff_cache

 # wandb
-wandb
+wandb
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -19,6 +19,16 @@ authors:
    family-names: Rasul
  - given-names: Mishig
    family-names: Davaadorj
+  - given-names: Dhruv
+    family-names: Nair
+  - given-names: Sayak
+    family-names: Paul
+  - given-names: Steven
+    family-names: Liu
+  - given-names: William
+    family-names: Berman
+  - given-names: Yiyi
+    family-names: Xu
  - given-names: Thomas
    family-names: Wolf
 repository-code: 'https://github.com/huggingface/diffusers'
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -57,13 +57,13 @@ Any question or comment related to the Diffusers library can be asked on the [di
 - ...

 Every question that is asked on the forum or on Discord actively encourages the community to publicly
-share knowledge and might very well help a beginner in the future that has the same question you're
+share knowledge and might very well help a beginner in the future who has the same question you're
 having. Please do pose any questions you might have.
 In the same spirit, you are of immense help to the community by answering such questions because this way you are publicly documenting knowledge for everybody to learn from.

 **Please** keep in mind that the more effort you put into asking or answering a question, the higher
 the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database.
-In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
+In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formatted/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.

 **NOTE about channels**:
 [*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically. Hence, it's easier to look up questions and answers that we posted some time ago.
@@ -245,7 +245,7 @@ The official training examples are maintained by the Diffusers' core maintainers
 This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
 If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author.

-Both official training and research examples consist of a directory that contains one or more training scripts, a requirements.txt file, and a README.md file. In order for the user to make use of the
+Both official training and research examples consist of a directory that contains one or more training scripts, a `requirements.txt` file, and a `README.md` file. In order for the user to make use of the
 training examples, it is required to clone the repository:

 ```bash
@@ -255,7 +255,8 @@ git clone https://github.com/huggingface/diffusers
 as well as to install all additional dependencies required for training:

 ```bash
-pip install -r /examples/<your-example-folder>/requirements.txt
+cd diffusers
+pip install -r examples/<your-example-folder>/requirements.txt
 ```

 Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt).
@@ -355,7 +356,7 @@ You will need basic `git` proficiency to be able to contribute to
 manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
 Git](https://git-scm.com/book/en/v2) is a very good reference.

-Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/main/setup.py#L265)):
+Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/42f25d601a910dceadaee6c44345896b4cfa9928/setup.py#L270)):

 1. Fork the [repository](https://github.com/huggingface/diffusers) by
 clicking on the 'Fork' button on the repository's page. This creates a copy of the code
--- a/Makefile
+++ b/Makefile
@@ -42,6 +42,7 @@ repo-consistency:
 quality:
 	ruff check $(check_dirs) setup.py
 	ruff format --check $(check_dirs) setup.py
+	doc-builder style src/diffusers docs/source --max_len 119 --check_only
 	python utils/check_doc_toc.py

 # Format source code automatically and check is there are any problems left that need manual fixing
@@ -55,6 +56,7 @@ extra_style_checks:
 style:
 	ruff check $(check_dirs) setup.py --fix
 	ruff format $(check_dirs) setup.py
+	doc-builder style src/diffusers docs/source --max_len 119
 	${MAKE} autogenerate_code
 	${MAKE} extra_style_checks

--- a/PHILOSOPHY.md
+++ b/PHILOSOPHY.md
@@ -15,7 +15,7 @@ specific language governing permissions and limitations under the License.
 🧨 Diffusers provides **state-of-the-art** pretrained diffusion models across multiple modalities.
 Its purpose is to serve as a **modular toolbox** for both inference and training.

-We aim at building a library that stands the test of time and therefore take API design very seriously.
+We aim to build a library that stands the test of time and therefore take API design very seriously.

 In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefore, most of our design choices are based on [PyTorch's Design Principles](https://pytorch.org/docs/stable/community/design.html#pytorch-design-philosophy). Let's go over the most important ones:

@@ -63,14 +63,14 @@ Let's walk through more detailed design decisions for each class.
 Pipelines are designed to be easy to use (therefore do not follow [*Simple over easy*](#simple-over-easy) 100%), are not feature complete, and should loosely be seen as examples of how to use [models](#models) and [schedulers](#schedulers) for inference.

 The following design principles are followed:
- Pipelines follow the single-file policy. All pipelines can be found in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as it’s done for [`src/diffusers/pipelines/stable-diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [#Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251).
+- Pipelines follow the single-file policy. All pipelines can be found in individual directories under `src/diffusers/pipelines`. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as it’s done for [`src/diffusers/pipelines/stable_diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [# Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251).
 - Pipelines all inherit from [`DiffusionPipeline`].
- Every pipeline consists of different model and scheduler components, that are documented in the [`model_index.json` file](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline and can be shared between pipelines with [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function.
+- Every pipeline consists of different model and scheduler components that are documented in the [`model_index.json` file](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline, and can be shared between pipelines with the [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) property.
 - Every pipeline should be loadable via the [`DiffusionPipeline.from_pretrained`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained) function.
 - Pipelines should be used **only** for inference.
 - Pipelines should be very readable, self-explanatory, and easy to tweak.
 - Pipelines should be designed to build on top of each other and be easy to integrate into higher-level APIs.
- Pipelines are **not** intended to be feature-complete user interfaces. For future complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
+- Pipelines are **not** intended to be feature-complete user interfaces. For feature-complete user interfaces, have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
 - Every pipeline should have one and only one way to run it via a `__call__` method. The naming of the `__call__` arguments should be shared across all pipelines.
 - Pipelines should be named after the task they are intended to solve.
 - In almost all cases, novel diffusion pipelines shall be implemented in a new pipeline folder/file.
@@ -81,7 +81,7 @@ Models are designed as configurable toolboxes that are natural extensions of [Py

 The following design principles are followed:
 - Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context.
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_condition.py), [`transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformer_2d.py), etc...
+- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its own file, e.g. [`unets/unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_condition.py), [`transformers/transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_2d.py), etc.
 - Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc... **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy.
 - Models intend to expose complexity, just like PyTorch's `Module` class, and give clear error messages.
 - Models all inherit from `ModelMixin` and `ConfigMixin`.
@@ -90,7 +90,7 @@ The following design principles are followed:
 - To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
 - Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
 - The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and
-readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
+readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).

 ### Schedulers

@@ -100,7 +100,7 @@ The following design principles are followed:
 - All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
 - Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained.
 - One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper).
- If schedulers share similar functionalities, we can make use of the `#Copied from` mechanism.
+- If schedulers share similar functionalities, we can make use of the `# Copied from` mechanism.
 - Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
 - Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](./docs/source/en/using-diffusers/schedulers.md).
 - Every scheduler has to have a `set_num_inference_steps`, and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called.
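The `# Copied from` convention marks a function as a verbatim copy of another one so that tooling can keep the copies in sync. The sketch below is a simplified, hypothetical checker, not the repository's own consistency script: it collects the `# Copied from` targets in a snippet and compares a copy against its original after dropping the annotation. The `calculate_shift` body is a simplified stand-in modeled on the helper touched by this commit, and the drifted `max_shift=1.16` default is illustrative.

```python
import re
from typing import List

# Matches e.g. "# Copied from diffusers.pipelines.flux.pipeline_flux.calculate_shift"
COPIED_FROM_RE = re.compile(r"^\s*# Copied from (diffusers(?:\.\w+)+)")


def find_copied_from(source: str) -> List[str]:
    """Return the dotted paths named in '# Copied from' comments."""
    targets = []
    for line in source.splitlines():
        match = COPIED_FROM_RE.match(line)
        if match:
            targets.append(match.group(1))
    return targets


def strip_annotation(source: str) -> str:
    """Drop '# Copied from' lines and outer whitespace so two copies can be compared."""
    kept = [line for line in source.splitlines() if not COPIED_FROM_RE.match(line)]
    return "\n".join(kept).strip()


def is_faithful_copy(original: str, candidate: str) -> bool:
    return strip_annotation(original) == strip_annotation(candidate)


ORIGINAL = """
def calculate_shift(image_seq_len, base_seq_len=256, max_seq_len=4096, base_shift=0.5, max_shift=1.15):
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    return image_seq_len * m + b
"""

COPY = "# Copied from diffusers.pipelines.flux.pipeline_flux.calculate_shift\n" + ORIGINAL.strip()
DRIFTED = COPY.replace("max_shift=1.15", "max_shift=1.16")  # a copy whose default drifted

print(find_copied_from(COPY))  # ['diffusers.pipelines.flux.pipeline_flux.calculate_shift']
print(is_faithful_copy(ORIGINAL, COPY))  # True
print(is_faithful_copy(ORIGINAL, DRIFTED))  # False
```

Comparing whole function bodies, rather than individual constants, is what makes a drifted default such as `max_shift` show up as soon as one copy is edited without the others.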
--- a/README.md
+++ b/README.md
@@ -20,21 +20,11 @@ limitations under the License.
    <br>
 <p>
 <p align="center">
-    <a href="https://github.com/huggingface/diffusers/blob/main/LICENSE">
-        <img alt="GitHub" src="https://img.shields.io/github/license/huggingface/datasets.svg?color=blue">
-    </a>
-    <a href="https://github.com/huggingface/diffusers/releases">
-        <img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/diffusers.svg">
-    </a>
-    <a href="https://pepy.tech/project/diffusers">
-        <img alt="GitHub release" src="https://static.pepy.tech/badge/diffusers/month">
-    </a>
-    <a href="CODE_OF_CONDUCT.md">
-        <img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg">
-    </a>
-    <a href="https://twitter.com/diffuserslib">
-        <img alt="X account" src="https://img.shields.io/twitter/url/https/twitter.com/diffuserslib.svg?style=social&label=Follow%20%40diffuserslib">
-    </a>
+    <a href="https://github.com/huggingface/diffusers/blob/main/LICENSE"><img alt="GitHub" src="https://img.shields.io/github/license/huggingface/diffusers.svg?color=blue"></a>
+    <a href="https://github.com/huggingface/diffusers/releases"><img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/diffusers.svg"></a>
+    <a href="https://pepy.tech/project/diffusers"><img alt="Monthly downloads" src="https://static.pepy.tech/badge/diffusers/month"></a>
+    <a href="CODE_OF_CONDUCT.md"><img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg"></a>
+    <a href="https://twitter.com/diffuserslib"><img alt="X account" src="https://img.shields.io/twitter/url/https/twitter.com/diffuserslib.svg?style=social&label=Follow%20%40diffuserslib"></a>
 </p>

 🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, 🤗 Diffusers is a modular toolbox that supports both. Our library is designed with a focus on [usability over performance](https://huggingface.co/docs/diffusers/conceptual/philosophy#usability-over-performance), [simple over easy](https://huggingface.co/docs/diffusers/conceptual/philosophy#simple-over-easy), and [customizability over abstractions](https://huggingface.co/docs/diffusers/conceptual/philosophy#tweakable-contributorfriendly-over-abstraction).
@@ -77,13 +67,13 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi

 ## Quickstart

-Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 19000+ checkpoints):
+Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 30,000+ checkpoints):

 ```python
 from diffusers import DiffusionPipeline
 import torch

-pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
+pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16)
 pipeline.to("cuda")
 pipeline("An image of a squirrel in Picasso style").images[0]
 ```
@@ -122,9 +112,9 @@ Check out the [Quickstart](https://huggingface.co/docs/diffusers/quicktour) to l
 | **Documentation**                                                   | **What can I learn?**                                                                                                                                                                           |
 |---------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | [Tutorial](https://huggingface.co/docs/diffusers/tutorials/tutorial_overview)                                                            | A basic crash course for learning how to use the library's most important features like using models and schedulers to build your own diffusion system, and training your own diffusion model.  |
-| [Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading_overview)                                                             | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers.                                         |
-| [Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/pipeline_overview)                                             | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library.               |
-| [Optimization](https://huggingface.co/docs/diffusers/optimization/opt_overview)                                                        | Guides for how to optimize your diffusion model to run faster and consume less memory.                                                                                                          |
+| [Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading)                                                             | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers.                                         |
+| [Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/overview_techniques)                                             | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library.               |
+| [Optimization](https://huggingface.co/docs/diffusers/optimization/fp16)                                                        | Guides for how to optimize your diffusion model to run faster and consume less memory.                                                                                                          |
 | [Training](https://huggingface.co/docs/diffusers/training/overview) | Guides for how to train a diffusion model for different tasks with different training techniques.                                                                                               |
 ## Contribution

@@ -154,7 +144,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
  <tr style="border-top: 2px solid black">
    <td>Text-to-Image</td>
    <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img">Stable Diffusion Text-to-Image</a></td>
-      <td><a href="https://huggingface.co/runwayml/stable-diffusion-v1-5"> runwayml/stable-diffusion-v1-5 </a></td>
+      <td><a href="https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5"> stable-diffusion-v1-5/stable-diffusion-v1-5 </a></td>
  </tr>
  <tr>
    <td>Text-to-Image</td>
@@ -184,7 +174,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
  <tr>
    <td>Text-guided Image-to-Image</td>
    <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/img2img">Stable Diffusion Image-to-Image</a></td>
-      <td><a href="https://huggingface.co/runwayml/stable-diffusion-v1-5"> runwayml/stable-diffusion-v1-5 </a></td>
+      <td><a href="https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5"> stable-diffusion-v1-5/stable-diffusion-v1-5 </a></td>
  </tr>
  <tr style="border-top: 2px solid black">
    <td>Text-guided Image Inpainting</td>
@@ -212,6 +202,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9

 - https://github.com/microsoft/TaskMatrix
 - https://github.com/invoke-ai/InvokeAI
+- https://github.com/InstantID/InstantID
 - https://github.com/apple/ml-stable-diffusion
 - https://github.com/Sanster/lama-cleaner
 - https://github.com/IDEA-Research/Grounded-Segment-Anything
@@ -219,7 +210,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
 - https://github.com/deep-floyd/IF
 - https://github.com/bentoml/BentoML
 - https://github.com/bmaltais/kohya_ss
- +8000 other amazing GitHub repositories 💪
+- +14,000 other amazing GitHub repositories 💪

 Thank you for using us ❤️.

@@ -238,7 +229,7 @@ We also want to thank @heejkoo for the very helpful overview of papers, code and

 ```bibtex
@misc{von-platen-etal-2022-diffusers,
-  author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},
+  author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Dhruv Nair and Sayak Paul and William Berman and Yiyi Xu and Steven Liu and Thomas Wolf},
  title = {Diffusers: State-of-the-art diffusion models},
  year = {2022},
  publisher = {GitHub},
--- a/benchmarks/base_classes.py
+++ b/benchmarks/base_classes.py
@@ -34,7 +34,7 @@ from utils import (  # noqa: E402


 RESOLUTION_MAPPING = {
-    "runwayml/stable-diffusion-v1-5": (512, 512),
+    "Lykon/DreamShaper": (512, 512),
    "lllyasviel/sd-controlnet-canny": (512, 512),
    "diffusers/controlnet-canny-sdxl-1.0": (1024, 1024),
    "TencentARC/t2iadapter_canny_sd14v1": (512, 512),
@@ -141,6 +141,7 @@ class LCMLoRATextToImageBenchmark(TextToImageBenchmark):
        super().__init__(args)
        self.pipe.load_lora_weights(self.lora_id)
        self.pipe.fuse_lora()
+        self.pipe.unload_lora_weights()
        self.pipe.scheduler = LCMScheduler.from_config(self.pipe.scheduler.config)

    def get_result_filepath(self, args):
@@ -235,10 +236,39 @@ class InpaintingBenchmark(ImageToImageBenchmark):
        )


+class IPAdapterTextToImageBenchmark(TextToImageBenchmark):
+    url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png"
+    image = load_image(url)
+
+    def __init__(self, args):
+        pipe = self.pipeline_class.from_pretrained(args.ckpt, torch_dtype=torch.float16).to("cuda")
+        pipe.load_ip_adapter(
+            args.ip_adapter_id[0],
+            subfolder="models" if "sdxl" not in args.ip_adapter_id[1] else "sdxl_models",
+            weight_name=args.ip_adapter_id[1],
+        )
+
+        if args.run_compile:
+            pipe.unet.to(memory_format=torch.channels_last)
+            print("Run torch compile")
+            pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+
+        pipe.set_progress_bar_config(disable=True)
+        self.pipe = pipe
+
+    def run_inference(self, pipe, args):
+        _ = pipe(
+            prompt=PROMPT,
+            ip_adapter_image=self.image,
+            num_inference_steps=args.num_inference_steps,
+            num_images_per_prompt=args.batch_size,
+        )
+
+
 class ControlNetBenchmark(TextToImageBenchmark):
    pipeline_class = StableDiffusionControlNetPipeline
    aux_network_class = ControlNetModel
-    root_ckpt = "runwayml/stable-diffusion-v1-5"
+    root_ckpt = "Lykon/DreamShaper"

    url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/canny_image_condition.png"
    image = load_image(url).convert("RGB")
@@ -281,7 +311,7 @@ class ControlNetSDXLBenchmark(ControlNetBenchmark):
 class T2IAdapterBenchmark(ControlNetBenchmark):
    pipeline_class = StableDiffusionAdapterPipeline
    aux_network_class = T2IAdapter
-    root_ckpt = "CompVis/stable-diffusion-v1-4"
+    root_ckpt = "Lykon/DreamShaper"

    url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/canny_for_adapter.png"
    image = load_image(url).convert("L")
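In `IPAdapterTextToImageBenchmark.__init__`, the `subfolder` passed to `load_ip_adapter` is derived from the weight filename: SDXL weights are read from `sdxl_models` and all other weights from `models`. Below is a minimal standalone sketch of that rule. The helper `pick_ip_adapter_subfolder` is a hypothetical name and does not exist in the benchmark code.

```python
def pick_ip_adapter_subfolder(weight_name: str) -> str:
    """Hypothetical helper mirroring the benchmark's rule for choosing the weight subfolder."""
    # SDXL IP-Adapter weights live under "sdxl_models"; SD 1.x weights live under "models".
    return "models" if "sdxl" not in weight_name else "sdxl_models"


# Checkpoint -> (IP-Adapter repo id, weight file), copied from IP_ADAPTER_CKPTS.
ckpts = {
    "Lykon/DreamShaper": ("h94/IP-Adapter", "ip-adapter_sd15.bin"),
    "stabilityai/stable-diffusion-xl-base-1.0": ("h94/IP-Adapter", "ip-adapter_sdxl.bin"),
}

for ckpt, (repo_id, weight_name) in ckpts.items():
    # Print which repo, subfolder, and file would be loaded for each base checkpoint.
    print(ckpt, "->", repo_id, pick_ip_adapter_subfolder(weight_name), weight_name)
```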
--- a/benchmarks/benchmark_ip_adapters.py
+++ b/benchmarks/benchmark_ip_adapters.py
@@ -0,0 +1,33 @@
+import argparse
+import sys
+
+
+sys.path.append(".")
+from base_classes import IPAdapterTextToImageBenchmark  # noqa: E402
+
+
+IP_ADAPTER_CKPTS = {
+    # Lykon/DreamShaper stands in for SD v1.5 because the original checkpoint has been taken down.
+    "Lykon/DreamShaper": ("h94/IP-Adapter", "ip-adapter_sd15.bin"),
+    "stabilityai/stable-diffusion-xl-base-1.0": ("h94/IP-Adapter", "ip-adapter_sdxl.bin"),
+}
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--ckpt",
+        type=str,
+        default="stabilityai/stable-diffusion-xl-base-1.0",
+        choices=list(IP_ADAPTER_CKPTS.keys()),
+    )
+    parser.add_argument("--batch_size", type=int, default=1)
+    parser.add_argument("--num_inference_steps", type=int, default=50)
+    parser.add_argument("--model_cpu_offload", action="store_true")
+    parser.add_argument("--run_compile", action="store_true")
+    args = parser.parse_args()
+
+    args.ip_adapter_id = IP_ADAPTER_CKPTS[args.ckpt]
+    benchmark_pipe = IPAdapterTextToImageBenchmark(args)
+    args.ckpt = f"{args.ckpt} (IP-Adapter)"
+    benchmark_pipe.benchmark(args)
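`argparse` checks `choices` only against values supplied on the command line. A `default` is never validated against them. The `--ckpt` default therefore has to be an exact key of `IP_ADAPTER_CKPTS`; otherwise `IP_ADAPTER_CKPTS[args.ckpt]` raises a `KeyError` after parsing has already succeeded. The sketch below reproduces this behavior. `build_parser` is a hypothetical helper that mirrors the script's `--ckpt` argument.

```python
import argparse

IP_ADAPTER_CKPTS = {
    "Lykon/DreamShaper": ("h94/IP-Adapter", "ip-adapter_sd15.bin"),
    "stabilityai/stable-diffusion-xl-base-1.0": ("h94/IP-Adapter", "ip-adapter_sdxl.bin"),
}


def build_parser(default_ckpt: str) -> argparse.ArgumentParser:
    # Hypothetical helper mirroring the --ckpt argument of benchmark_ip_adapters.py.
    parser = argparse.ArgumentParser()
    parser.add_argument("--ckpt", type=str, default=default_ckpt, choices=list(IP_ADAPTER_CKPTS.keys()))
    return parser


# A misspelled default parses without error, because defaults bypass the choices check.
typo_args = build_parser("rstabilityai/stable-diffusion-xl-base-1.0").parse_args([])
print(typo_args.ckpt in IP_ADAPTER_CKPTS)  # the later dict lookup would raise KeyError

# A default that is a real key resolves to its IP-Adapter repo and weight file.
fixed_args = build_parser("stabilityai/stable-diffusion-xl-base-1.0").parse_args([])
print(IP_ADAPTER_CKPTS[fixed_args.ckpt])
```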
--- a/benchmarks/benchmark_sd_img.py
+++ b/benchmarks/benchmark_sd_img.py
@@ -11,9 +11,9 @@ if __name__ == "__main__":
    parser.add_argument(
        "--ckpt",
        type=str,
-        default="runwayml/stable-diffusion-v1-5",
+        default="Lykon/DreamShaper",
        choices=[
-            "runwayml/stable-diffusion-v1-5",
+            "Lykon/DreamShaper",
            "stabilityai/stable-diffusion-2-1",
            "stabilityai/stable-diffusion-xl-refiner-1.0",
            "stabilityai/sdxl-turbo",
--- a/benchmarks/benchmark_sd_inpainting.py
+++ b/benchmarks/benchmark_sd_inpainting.py
@@ -11,9 +11,9 @@ if __name__ == "__main__":
    parser.add_argument(
        "--ckpt",
        type=str,
-        default="runwayml/stable-diffusion-v1-5",
+        default="Lykon/DreamShaper",
        choices=[
-            "runwayml/stable-diffusion-v1-5",
+            "Lykon/DreamShaper",
            "stabilityai/stable-diffusion-2-1",
            "stabilityai/stable-diffusion-xl-base-1.0",
        ],
--- a/benchmarks/benchmark_text_to_image.py
+++ b/benchmarks/benchmark_text_to_image.py
@@ -7,7 +7,7 @@ from base_classes import TextToImageBenchmark, TurboTextToImageBenchmark  # noqa


 ALL_T2I_CKPTS = [
-    "runwayml/stable-diffusion-v1-5",
+    "Lykon/DreamShaper",
    "segmind/SSD-1B",
    "stabilityai/stable-diffusion-xl-base-1.0",
    "kandinsky-community/kandinsky-2-2-decoder",
@@ -21,7 +21,7 @@ if __name__ == "__main__":
    parser.add_argument(
        "--ckpt",
        type=str,
-        default="runwayml/stable-diffusion-v1-5",
+        default="Lykon/DreamShaper",
        choices=ALL_T2I_CKPTS,
    )
    parser.add_argument("--batch_size", type=int, default=1)
--- a/benchmarks/push_results.py
+++ b/benchmarks/push_results.py
@@ -3,7 +3,7 @@ import sys

 import pandas as pd
 from huggingface_hub import hf_hub_download, upload_file
-from huggingface_hub.utils._errors import EntryNotFoundError
+from huggingface_hub.utils import EntryNotFoundError


 sys.path.append(".")
--- a/benchmarks/run_all.py
+++ b/benchmarks/run_all.py
@@ -40,7 +40,7 @@ def main():
        print(f"****** Running file: {file} ******")

        # Run with canonical settings.
-        if file != "benchmark_text_to_image.py":
+        if file not in ["benchmark_text_to_image.py", "benchmark_ip_adapters.py"]:
            command = f"python {file}"
            run_command(command.split())

@@ -49,6 +49,10 @@ def main():

    # Run variants.
    for file in python_files:
+        # See: https://github.com/pytorch/pytorch/issues/129637
+        if file == "benchmark_ip_adapters.py":
+            continue
+
        if file == "benchmark_text_to_image.py":
            for ckpt in ALL_T2I_CKPTS:
                command = f"python {file} --ckpt {ckpt}"
@@ -72,7 +76,7 @@ def main():
                command += " --run_compile"
                run_command(command.split())

-        elif file == "benchmark_sd_inpainting.py":
+        elif file in ["benchmark_sd_inpainting.py", "benchmark_ip_adapters.py"]:
            sdxl_ckpt = "stabilityai/stable-diffusion-xl-base-1.0"
            command = f"python {file} --ckpt {sdxl_ckpt}"
            run_command(command.split())
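Taken together, the `run_all.py` changes keep `benchmark_ip_adapters.py` out of both passes. It is excluded from the canonical-settings pass alongside `benchmark_text_to_image.py`, and it is skipped at the top of the variants loop because of the linked PyTorch issue. Below is a rough sketch of that file selection. `SKIP_CANONICAL`, `SKIP_VARIANTS`, and both helpers are hypothetical names, not code from `run_all.py`.

```python
# Hypothetical names; these sets mirror the conditions in run_all.py's two loops.
SKIP_CANONICAL = {"benchmark_text_to_image.py", "benchmark_ip_adapters.py"}
SKIP_VARIANTS = {"benchmark_ip_adapters.py"}  # see https://github.com/pytorch/pytorch/issues/129637


def canonical_commands(python_files):
    """Commands run with default arguments in the canonical-settings pass."""
    return [f"python {file}" for file in python_files if file not in SKIP_CANONICAL]


def variant_files(python_files):
    """Files that reach the per-file variant branches."""
    return [file for file in python_files if file not in SKIP_VARIANTS]


files = ["benchmark_sd_img.py", "benchmark_text_to_image.py", "benchmark_ip_adapters.py"]
print(canonical_commands(files))
print(variant_files(files))
```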
--- a/docker/diffusers-doc-builder/Dockerfile
+++ b/docker/diffusers-doc-builder/Dockerfile
@@ -0,0 +1,52 @@
+FROM ubuntu:20.04
+LABEL maintainer="Hugging Face"
+LABEL repository="diffusers"
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get -y update \
+    && apt-get install -y software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
+                   build-essential \
+                   git \
+                   git-lfs \
+                   curl \
+                   ca-certificates \
+                   libsndfile1-dev \
+                   python3.10 \
+                   python3-pip \
+                   libgl1 \
+                   zip \
+                   wget \
+                   python3.10-venv && \
+    rm -rf /var/lib/apt/lists
+
+# make sure to use venv
+RUN python3.10 -m venv /opt/venv
+ENV PATH="/opt/venv/bin:$PATH"
+
+# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
+RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+    python3.10 -m uv pip install --no-cache-dir \
+        torch \
+        torchvision \
+        torchaudio \
+        invisible_watermark \
+        --extra-index-url https://download.pytorch.org/whl/cpu && \
+    python3.10 -m uv pip install --no-cache-dir \
+        accelerate \
+        datasets \
+        hf-doc-builder \
+        huggingface-hub \
+        Jinja2 \
+        librosa \
+        numpy==1.26.4 \
+        scipy \
+        tensorboard \
+        transformers \
+        matplotlib \
+        setuptools==69.5.1
+
+CMD ["/bin/bash"]
--- a/docker/diffusers-flax-cpu/Dockerfile
+++ b/docker/diffusers-flax-cpu/Dockerfile
@@ -4,41 +4,46 @@ LABEL repository="diffusers"

 ENV DEBIAN_FRONTEND=noninteractive

-RUN apt update && \
-    apt install -y bash \
-                   build-essential \
-                   git \
-                   git-lfs \
-                   curl \
-                   ca-certificates \
-                   libsndfile1-dev \
-                   python3.8 \
-                   python3-pip \
-                   python3.8-venv && \
+RUN apt-get -y update \
+    && apt-get install -y software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
+        build-essential \
+        git \
+        git-lfs \
+        curl \
+        ca-certificates \
+        libsndfile1-dev \
+        libgl1 \
+        python3.10 \
+        python3-pip \
+        python3.10-venv && \
    rm -rf /var/lib/apt/lists

 # make sure to use venv
-RUN python3 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"

 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
 # follow the instructions here: https://cloud.google.com/tpu/docs/run-in-container#train_a_jax_model_in_a_docker_container
-RUN python3 -m pip install --no-cache-dir --upgrade pip && \
-    python3 -m pip install --upgrade --no-cache-dir \
+RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+    python3 -m uv pip install --upgrade --no-cache-dir \
        clu \
        "jax[cpu]>=0.2.16,!=0.3.2" \
        "flax>=0.4.1" \
        "jaxlib>=0.1.65" && \
-    python3 -m pip install --no-cache-dir \
+    python3 -m uv pip install --no-cache-dir \
        accelerate \
        datasets \
        hf-doc-builder \
        huggingface-hub \
        Jinja2 \
        librosa \
-        numpy \
+        numpy==1.26.4 \
        scipy \
        tensorboard \
-        transformers
+        transformers \
+        hf_transfer

 CMD ["/bin/bash"]
--- a/docker/diffusers-flax-tpu/Dockerfile
+++ b/docker/diffusers-flax-tpu/Dockerfile
@@ -4,43 +4,48 @@ LABEL repository="diffusers"

 ENV DEBIAN_FRONTEND=noninteractive

-RUN apt update && \
-    apt install -y bash \
+RUN apt-get -y update \
+    && apt-get install -y software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
                   build-essential \
                   git \
                   git-lfs \
                   curl \
                   ca-certificates \
                   libsndfile1-dev \
-                   python3.8 \
+                   libgl1 \
+                   python3.10 \
                   python3-pip \
-                   python3.8-venv && \
+                   python3.10-venv && \
    rm -rf /var/lib/apt/lists

 # make sure to use venv
-RUN python3 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"

 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
 # follow the instructions here: https://cloud.google.com/tpu/docs/run-in-container#train_a_jax_model_in_a_docker_container
-RUN python3 -m pip install --no-cache-dir --upgrade pip && \
+RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
    python3 -m pip install --no-cache-dir \
        "jax[tpu]>=0.2.16,!=0.3.2" \
        -f https://storage.googleapis.com/jax-releases/libtpu_releases.html && \
-    python3 -m pip install --upgrade --no-cache-dir \
+    python3 -m uv pip install --upgrade --no-cache-dir \
        clu \
        "flax>=0.4.1" \
        "jaxlib>=0.1.65" && \
-    python3 -m pip install --no-cache-dir \
+    python3 -m uv pip install --no-cache-dir \
        accelerate \
        datasets \
        hf-doc-builder \
        huggingface-hub \
        Jinja2 \
-        librosa \        
-        numpy \
+        librosa \
+        numpy==1.26.4 \
        scipy \
        tensorboard \
-        transformers
+        transformers \
+        hf_transfer

 CMD ["/bin/bash"]
--- a/docker/diffusers-onnxruntime-cpu/Dockerfile
+++ b/docker/diffusers-onnxruntime-cpu/Dockerfile
@@ -4,41 +4,46 @@ LABEL repository="diffusers"

 ENV DEBIAN_FRONTEND=noninteractive

-RUN apt update && \
-    apt install -y bash \
+RUN apt-get -y update \
+    && apt-get install -y software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
                   build-essential \
                   git \
                   git-lfs \
                   curl \
                   ca-certificates \
                   libsndfile1-dev \
-                   python3.8 \
+                   libgl1 \
+                   python3.10 \
                   python3-pip \
-                   python3.8-venv && \
+                   python3.10-venv && \
    rm -rf /var/lib/apt/lists

 # make sure to use venv
-RUN python3 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"

 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
-RUN python3 -m pip install --no-cache-dir --upgrade pip && \
-    python3 -m pip install --no-cache-dir \
+RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+    python3 -m uv pip install --no-cache-dir \
        torch==2.1.2 \
        torchvision==0.16.2 \
        torchaudio==2.1.2 \
        onnxruntime \
        --extra-index-url https://download.pytorch.org/whl/cpu && \
-    python3 -m pip install --no-cache-dir \
+    python3 -m uv pip install --no-cache-dir \
        accelerate \
        datasets \
        hf-doc-builder \
        huggingface-hub \
        Jinja2 \
        librosa \
-        numpy \
+        numpy==1.26.4 \
        scipy \
        tensorboard \
-        transformers
+        transformers \
+        hf_transfer

 CMD ["/bin/bash"]
--- a/docker/diffusers-onnxruntime-cuda/Dockerfile
+++ b/docker/diffusers-onnxruntime-cuda/Dockerfile
@@ -1,44 +1,50 @@
-FROM nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04
+FROM nvidia/cuda:12.1.0-runtime-ubuntu20.04
 LABEL maintainer="Hugging Face"
 LABEL repository="diffusers"

 ENV DEBIAN_FRONTEND=noninteractive

-RUN apt update && \
-    apt install -y bash \
+RUN apt-get -y update \
+    && apt-get install -y software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
                   build-essential \
                   git \
                   git-lfs \
                   curl \
                   ca-certificates \
                   libsndfile1-dev \
-                   python3.8 \
+                   libgl1 \
+                   python3.10 \
                   python3-pip \
-                   python3.8-venv && \
+                   python3.10-venv && \
    rm -rf /var/lib/apt/lists

 # make sure to use venv
-RUN python3 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"

 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
-RUN python3 -m pip install --no-cache-dir --upgrade pip && \
-    python3 -m pip install --no-cache-dir \
-        torch==2.1.2 \
-        torchvision==0.16.2 \
-        torchaudio==2.1.2 \
+RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+    python3.10 -m uv pip install --no-cache-dir \
+        torch \
+        torchvision \
+        torchaudio \
        "onnxruntime-gpu>=1.13.1" \
        --extra-index-url https://download.pytorch.org/whl/cu117 && \
-    python3 -m pip install --no-cache-dir \
+    python3.10 -m uv pip install --no-cache-dir \
        accelerate \
        datasets \
        hf-doc-builder \
        huggingface-hub \
+        hf_transfer \
        Jinja2 \
        librosa \
-        numpy \
+        numpy==1.26.4 \
        scipy \
        tensorboard \
-        transformers
+        transformers \
+        hf_transfer

 CMD ["/bin/bash"]
--- a/docker/diffusers-pytorch-compile-cuda/Dockerfile
+++ b/docker/diffusers-pytorch-compile-cuda/Dockerfile
@@ -4,8 +4,11 @@ LABEL repository="diffusers"

 ENV DEBIAN_FRONTEND=noninteractive

-RUN apt update && \
-    apt install -y bash \
+RUN apt-get -y update \
+    && apt-get install -y software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
    build-essential \
    git \
    git-lfs \
@@ -13,33 +16,35 @@ RUN apt update && \
    ca-certificates \
    libsndfile1-dev \
    libgl1 \
-    python3.9 \
-    python3.9-dev \
+    python3.10 \
+    python3.10-dev \
    python3-pip \
-    python3.9-venv && \
+    python3.10-venv && \
    rm -rf /var/lib/apt/lists

 # make sure to use venv
-RUN python3.9 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"

 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
-RUN python3.9 -m pip install --no-cache-dir --upgrade pip && \
-    python3.9 -m pip install --no-cache-dir \
+RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+    python3.10 -m uv pip install --no-cache-dir \
    torch \
    torchvision \
    torchaudio \
    invisible_watermark && \
-    python3.9 -m pip install --no-cache-dir \
+    python3.10 -m pip install --no-cache-dir \
    accelerate \
    datasets \
    hf-doc-builder \
    huggingface-hub \
+    hf_transfer \
    Jinja2 \
    librosa \
-    numpy \
+    numpy==1.26.4 \
    scipy \
    tensorboard \
-    transformers
+    transformers \
+    hf_transfer

 CMD ["/bin/bash"]
--- a/docker/diffusers-pytorch-cpu/Dockerfile
+++ b/docker/diffusers-pytorch-cpu/Dockerfile
@@ -4,42 +4,47 @@ LABEL repository="diffusers"

 ENV DEBIAN_FRONTEND=noninteractive

-RUN apt update && \
-    apt install -y bash \
+RUN apt-get -y update \
+    && apt-get install -y software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
                   build-essential \
                   git \
                   git-lfs \
                   curl \
                   ca-certificates \
                   libsndfile1-dev \
-                   python3.8 \
+                   python3.10 \
+                   python3.10-dev \
                   python3-pip \
                   libgl1 \
-                   python3.8-venv && \
+                   python3.10-venv && \
    rm -rf /var/lib/apt/lists

 # make sure to use venv
-RUN python3 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"

 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
-RUN python3 -m pip install --no-cache-dir --upgrade pip && \
-    python3 -m pip install --no-cache-dir \
+RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+    python3.10 -m uv pip install --no-cache-dir \
        torch \
        torchvision \
        torchaudio \
        invisible_watermark \
        --extra-index-url https://download.pytorch.org/whl/cpu && \
-    python3 -m pip install --no-cache-dir \
+    python3.10 -m uv pip install --no-cache-dir \
        accelerate \
        datasets \
        hf-doc-builder \
        huggingface-hub \
        Jinja2 \
        librosa \
-        numpy \
+        numpy==1.26.4 \
        scipy \
        tensorboard \
-        transformers
+        transformers matplotlib \
+        hf_transfer

 CMD ["/bin/bash"]
--- a/docker/diffusers-pytorch-cuda/Dockerfile
+++ b/docker/diffusers-pytorch-cuda/Dockerfile
@@ -4,8 +4,11 @@ LABEL repository="diffusers"

 ENV DEBIAN_FRONTEND=noninteractive

-RUN apt update && \
-    apt install -y bash \
+RUN apt-get -y update \
+    && apt-get install -y software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
    build-essential \
    git \
    git-lfs \
@@ -13,33 +16,36 @@ RUN apt update && \
    ca-certificates \
    libsndfile1-dev \
    libgl1 \
-    python3.8 \
+    python3.10 \
+    python3.10-dev \
    python3-pip \
-    python3.8-venv && \
+    python3.10-venv && \
    rm -rf /var/lib/apt/lists

 # make sure to use venv
-RUN python3 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"

 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
-RUN python3 -m pip install --no-cache-dir --upgrade pip && \
-    python3 -m pip install --no-cache-dir \
+RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+    python3.10 -m uv pip install --no-cache-dir \
    torch \
    torchvision \
    torchaudio \
    invisible_watermark && \
-    python3 -m pip install --no-cache-dir \
+    python3.10 -m pip install --no-cache-dir \
    accelerate \
    datasets \
    hf-doc-builder \
    huggingface-hub \
+    hf_transfer \
    Jinja2 \
    librosa \
-    numpy \
+    numpy==1.26.4 \
    scipy \
    tensorboard \
    transformers \
-    pytorch-lightning
+    pytorch-lightning \
+    hf_transfer

 CMD ["/bin/bash"]
--- a/docker/diffusers-pytorch-minimum-cuda/Dockerfile
+++ b/docker/diffusers-pytorch-minimum-cuda/Dockerfile
@@ -0,0 +1,53 @@
+FROM nvidia/cuda:12.1.0-runtime-ubuntu20.04
+LABEL maintainer="Hugging Face"
+LABEL repository="diffusers"
+
+ENV DEBIAN_FRONTEND=noninteractive
+ENV MINIMUM_SUPPORTED_TORCH_VERSION="2.1.0"
+ENV MINIMUM_SUPPORTED_TORCHVISION_VERSION="0.16.0"
+ENV MINIMUM_SUPPORTED_TORCHAUDIO_VERSION="2.1.0"
+
+RUN apt-get -y update \
+    && apt-get install -y software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
+    build-essential \
+    git \
+    git-lfs \
+    curl \
+    ca-certificates \
+    libsndfile1-dev \
+    libgl1 \
+    python3.10 \
+    python3.10-dev \
+    python3-pip \
+    python3.10-venv && \
+    rm -rf /var/lib/apt/lists
+
+# make sure to use venv
+RUN python3.10 -m venv /opt/venv
+ENV PATH="/opt/venv/bin:$PATH"
+
+# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
+RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+    python3.10 -m uv pip install --no-cache-dir \
+    torch==$MINIMUM_SUPPORTED_TORCH_VERSION \
+    torchvision==$MINIMUM_SUPPORTED_TORCHVISION_VERSION \
+    torchaudio==$MINIMUM_SUPPORTED_TORCHAUDIO_VERSION \
+    invisible_watermark && \
+    python3.10 -m pip install --no-cache-dir \
+    accelerate \
+    datasets \
+    hf-doc-builder \
+    huggingface-hub \
+    hf_transfer \
+    Jinja2 \
+    librosa \
+    numpy==1.26.4 \
+    scipy \
+    tensorboard \
+    transformers \
+    hf_transfer
+
+CMD ["/bin/bash"]
--- a/docker/diffusers-pytorch-xformers-cuda/Dockerfile
+++ b/docker/diffusers-pytorch-xformers-cuda/Dockerfile
@@ -4,8 +4,11 @@ LABEL repository="diffusers"

 ENV DEBIAN_FRONTEND=noninteractive

-RUN apt update && \
-    apt install -y bash \
+RUN apt-get -y update \
+    && apt-get install -y software-properties-common \
+    && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
                   build-essential \
                   git \
                   git-lfs \
@@ -13,33 +16,36 @@ RUN apt update && \
                   ca-certificates \
                   libsndfile1-dev \
                   libgl1 \
-                   python3.8 \
+                   python3.10 \
+                   python3.10-dev \
                   python3-pip \
-                   python3.8-venv && \
+                   python3.10-venv && \
    rm -rf /var/lib/apt/lists

 # make sure to use venv
-RUN python3 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"

 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
-RUN python3 -m pip install --no-cache-dir --upgrade pip && \
-    python3 -m pip install --no-cache-dir \
+RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+    python3.10 -m pip install --no-cache-dir \
        torch \
        torchvision \
        torchaudio \
        invisible_watermark && \
-    python3 -m pip install --no-cache-dir \
+    python3.10 -m uv pip install --no-cache-dir \
        accelerate \
        datasets \
        hf-doc-builder \
        huggingface-hub \
+        hf_transfer \
        Jinja2 \
        librosa \
-        numpy \
+        numpy==1.26.4 \
        scipy \
        tensorboard \
        transformers \
-        xformers
+        xformers \
+        hf_transfer

 CMD ["/bin/bash"]
--- a/docs/README.md
+++ b/docs/README.md
@@ -242,10 +242,10 @@ Here's an example of a tuple return, comprising several objects:

 ```
    Returns:
-        `tuple(torch.FloatTensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
-        - ** loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.FloatTensor` of shape `(1,)` --
+        `tuple(torch.Tensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
+        - **loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.Tensor` of shape `(1,)` --
          Total loss is the sum of the masked language modeling loss and the next sequence prediction (classification) loss.
-        - **prediction_scores** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) --
+        - **prediction_scores** (`torch.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) --
          Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
 ```

--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -18,155 +18,173 @@
  - local: tutorials/basic_training
    title: Train a diffusion model
  - local: tutorials/using_peft_for_inference
-    title: Inference with PEFT
+    title: Load LoRAs for inference
  - local: tutorials/fast_diffusion
    title: Accelerate inference of text-to-image diffusion models
+  - local: tutorials/inference_with_big_models
+    title: Working with big models
  title: Tutorials
 - sections:
-  - sections:
-    - local: using-diffusers/loading_overview
-      title: Overview
-    - local: using-diffusers/loading
-      title: Load pipelines, models, and schedulers
-    - local: using-diffusers/schedulers
-      title: Load and compare different schedulers
-    - local: using-diffusers/custom_pipeline_overview
-      title: Load community pipelines and components
-    - local: using-diffusers/using_safetensors
-      title: Load safetensors
-    - local: using-diffusers/other-formats
-      title: Load different Stable Diffusion formats
-    - local: using-diffusers/loading_adapters
-      title: Load adapters
-    - local: using-diffusers/push_to_hub
-      title: Push files to the Hub
-    title: Loading & Hub
-  - sections:
-    - local: using-diffusers/pipeline_overview
-      title: Overview
-    - local: using-diffusers/unconditional_image_generation
-      title: Unconditional image generation
-    - local: using-diffusers/conditional_image_generation
-      title: Text-to-image
-    - local: using-diffusers/img2img
-      title: Image-to-image
-    - local: using-diffusers/inpaint
-      title: Inpainting
-    - local: using-diffusers/text-img2vid
-      title: Text or image-to-video
-    - local: using-diffusers/depth2img
-      title: Depth-to-image
-    title: Tasks
-  - sections:
-    - local: using-diffusers/textual_inversion_inference
-      title: Textual inversion
-    - local: using-diffusers/ip_adapter
-      title: IP-Adapter
-    - local: training/distributed_inference
-      title: Distributed inference with multiple GPUs
-    - local: using-diffusers/reusing_seeds
-      title: Improve image quality with deterministic generation
-    - local: using-diffusers/control_brightness
-      title: Control image brightness
-    - local: using-diffusers/weighted_prompts
-      title: Prompt weighting
-    - local: using-diffusers/freeu
-      title: Improve generation quality with FreeU
-    title: Techniques
-  - sections:
-    - local: using-diffusers/pipeline_overview
-      title: Overview
-    - local: using-diffusers/sdxl
-      title: Stable Diffusion XL
-    - local: using-diffusers/sdxl_turbo
-      title: SDXL Turbo
-    - local: using-diffusers/kandinsky
-      title: Kandinsky
-    - local: using-diffusers/controlnet
-      title: ControlNet
-    - local: using-diffusers/shap-e
-      title: Shap-E
-    - local: using-diffusers/diffedit
-      title: DiffEdit
-    - local: using-diffusers/distilled_sd
-      title: Distilled Stable Diffusion inference
-    - local: using-diffusers/callback
-      title: Pipeline callbacks
-    - local: using-diffusers/reproducibility
-      title: Create reproducible pipelines
-    - local: using-diffusers/custom_pipeline_examples
-      title: Community pipelines
-    - local: using-diffusers/contribute_pipeline
-      title: Contribute a community pipeline
-    - local: using-diffusers/inference_with_lcm_lora
-      title: Latent Consistency Model-LoRA
-    - local: using-diffusers/inference_with_lcm
-      title: Latent Consistency Model
-    - local: using-diffusers/svd
-      title: Stable Video Diffusion
-    title: Specific pipeline examples
-  - sections:
-    - local: training/overview
-      title: Overview
-    - local: training/create_dataset
-      title: Create a dataset for training
-    - local: training/adapt_a_model
-      title: Adapt a model to a new task
-    - sections:
-      - local: training/unconditional_training
-        title: Unconditional image generation
-      - local: training/text2image
-        title: Text-to-image
-      - local: training/sdxl
-        title: Stable Diffusion XL
-      - local: training/kandinsky
-        title: Kandinsky 2.2
-      - local: training/wuerstchen
-        title: Wuerstchen
-      - local: training/controlnet
-        title: ControlNet
-      - local: training/t2i_adapters
-        title: T2I-Adapters
-      - local: training/instructpix2pix
-        title: InstructPix2Pix
-      title: Models
-    - sections:
-      - local: training/text_inversion
-        title: Textual Inversion
-      - local: training/dreambooth
-        title: DreamBooth
-      - local: training/lora
-        title: LoRA
-      - local: training/custom_diffusion
-        title: Custom Diffusion
-      - local: training/lcm_distill
-        title: Latent Consistency Distillation
-      - local: training/ddpo
-        title: Reinforcement learning training with DDPO
-      title: Methods
-    title: Training
-  - sections:
-    - local: using-diffusers/other-modalities
-      title: Other Modalities
-    title: Taking Diffusers Beyond Images
-  title: Using Diffusers
+  - local: using-diffusers/loading
+    title: Load pipelines
+  - local: using-diffusers/custom_pipeline_overview
+    title: Load community pipelines and components
+  - local: using-diffusers/schedulers
+    title: Load schedulers and models
+  - local: using-diffusers/other-formats
+    title: Model files and layouts
+  - local: using-diffusers/loading_adapters
+    title: Load adapters
+  - local: using-diffusers/push_to_hub
+    title: Push files to the Hub
+  title: Load pipelines and adapters
 - sections:
-  - local: optimization/opt_overview
+  - local: using-diffusers/unconditional_image_generation
+    title: Unconditional image generation
+  - local: using-diffusers/conditional_image_generation
+    title: Text-to-image
+  - local: using-diffusers/img2img
+    title: Image-to-image
+  - local: using-diffusers/inpaint
+    title: Inpainting
+  - local: using-diffusers/text-img2vid
+    title: Video generation
+  - local: using-diffusers/depth2img
+    title: Depth-to-image
+  title: Generative tasks
+- sections:
+  - local: using-diffusers/overview_techniques
    title: Overview
-  - sections:
-    - local: optimization/fp16
-      title: Speed up inference
-    - local: optimization/memory
-      title: Reduce memory usage
-    - local: optimization/torch2.0
-      title: PyTorch 2.0
-    - local: optimization/xformers
-      title: xFormers
-    - local: optimization/tome
-      title: Token merging
-    - local: optimization/deepcache
-      title: DeepCache
-    title: General optimizations
+  - local: using-diffusers/create_a_server
+    title: Create a server
+  - local: training/distributed_inference
+    title: Distributed inference
+  - local: using-diffusers/merge_loras
+    title: Merge LoRAs
+  - local: using-diffusers/scheduler_features
+    title: Scheduler features
+  - local: using-diffusers/callback
+    title: Pipeline callbacks
+  - local: using-diffusers/reusing_seeds
+    title: Reproducible pipelines
+  - local: using-diffusers/image_quality
+    title: Controlling image quality
+  - local: using-diffusers/weighted_prompts
+    title: Prompt techniques
+  title: Inference techniques
+- sections:
+  - local: advanced_inference/outpaint
+    title: Outpainting
+  title: Advanced inference
+- sections:
+  - local: using-diffusers/cogvideox
+    title: CogVideoX
+  - local: using-diffusers/consisid
+    title: ConsisID
+  - local: using-diffusers/sdxl
+    title: Stable Diffusion XL
+  - local: using-diffusers/sdxl_turbo
+    title: SDXL Turbo
+  - local: using-diffusers/kandinsky
+    title: Kandinsky
+  - local: using-diffusers/ip_adapter
+    title: IP-Adapter
+  - local: using-diffusers/omnigen
+    title: OmniGen
+  - local: using-diffusers/pag
+    title: PAG
+  - local: using-diffusers/controlnet
+    title: ControlNet
+  - local: using-diffusers/t2i_adapter
+    title: T2I-Adapter
+  - local: using-diffusers/inference_with_lcm
+    title: Latent Consistency Model
+  - local: using-diffusers/textual_inversion_inference
+    title: Textual inversion
+  - local: using-diffusers/shap-e
+    title: Shap-E
+  - local: using-diffusers/diffedit
+    title: DiffEdit
+  - local: using-diffusers/inference_with_tcd_lora
+    title: Trajectory Consistency Distillation-LoRA
+  - local: using-diffusers/svd
+    title: Stable Video Diffusion
+  - local: using-diffusers/marigold_usage
+    title: Marigold Computer Vision
+  title: Specific pipeline examples
+- sections:
+  - local: training/overview
+    title: Overview
+  - local: training/create_dataset
+    title: Create a dataset for training
+  - local: training/adapt_a_model
+    title: Adapt a model to a new task
+  - isExpanded: false
+    sections:
+    - local: training/unconditional_training
+      title: Unconditional image generation
+    - local: training/text2image
+      title: Text-to-image
+    - local: training/sdxl
+      title: Stable Diffusion XL
+    - local: training/kandinsky
+      title: Kandinsky 2.2
+    - local: training/wuerstchen
+      title: Wuerstchen
+    - local: training/controlnet
+      title: ControlNet
+    - local: training/t2i_adapters
+      title: T2I-Adapters
+    - local: training/instructpix2pix
+      title: InstructPix2Pix
+    - local: training/cogvideox
+      title: CogVideoX
+    title: Models
+  - isExpanded: false
+    sections:
+    - local: training/text_inversion
+      title: Textual Inversion
+    - local: training/dreambooth
+      title: DreamBooth
+    - local: training/lora
+      title: LoRA
+    - local: training/custom_diffusion
+      title: Custom Diffusion
+    - local: training/lcm_distill
+      title: Latent Consistency Distillation
+    - local: training/ddpo
+      title: Reinforcement learning training with DDPO
+    title: Methods
+  title: Training
+- sections:
+  - local: quantization/overview
+    title: Getting Started
+  - local: quantization/bitsandbytes
+    title: bitsandbytes
+  - local: quantization/gguf
+    title: gguf
+  - local: quantization/torchao
+    title: torchao
+  title: Quantization Methods
+- sections:
+  - local: optimization/fp16
+    title: Speed up inference
+  - local: optimization/memory
+    title: Reduce memory usage
+  - local: optimization/torch2.0
+    title: PyTorch 2.0
+  - local: optimization/xformers
+    title: xFormers
+  - local: optimization/tome
+    title: Token merging
+  - local: optimization/deepcache
+    title: DeepCache
+  - local: optimization/tgate
+    title: TGATE
+  - local: optimization/xdit
+    title: xDiT
+  - local: optimization/para_attn
+    title: ParaAttention
  - sections:
    - local: using-diffusers/stable_diffusion_jax_how_to
      title: JAX/Flax
@@ -176,14 +194,16 @@
      title: OpenVINO
    - local: optimization/coreml
      title: Core ML
-    title: Optimized model types
+    title: Optimized model formats
  - sections:
    - local: optimization/mps
      title: Metal Performance Shaders (MPS)
    - local: optimization/habana
      title: Habana Gaudi
+    - local: optimization/neuron
+      title: AWS Neuron
    title: Optimized hardware
-  title: Optimization
+  title: Accelerate inference and reduce memory
 - sections:
  - local: conceptual/philosophy
    title: Philosophy
@@ -197,15 +217,23 @@
    title: Evaluating Diffusion Models
  title: Conceptual Guides
 - sections:
-  - sections:
+  - local: community_projects
+    title: Projects built with Diffusers
+  title: Community Projects
+- sections:
+  - isExpanded: false
+    sections:
    - local: api/configuration
      title: Configuration
    - local: api/logging
      title: Logging
    - local: api/outputs
      title: Outputs
+    - local: api/quantization
+      title: Quantization
    title: Main Classes
-  - sections:
+  - isExpanded: false
+    sections:
    - local: api/loaders/ip_adapter
      title: IP-Adapter
    - local: api/loaders/lora
@@ -216,46 +244,126 @@
      title: Textual Inversion
    - local: api/loaders/unet
      title: UNet
+    - local: api/loaders/transformer_sd3
+      title: SD3Transformer2D
    - local: api/loaders/peft
      title: PEFT
    title: Loaders
-  - sections:
+  - isExpanded: false
+    sections:
    - local: api/models/overview
      title: Overview
-    - local: api/models/unet
-      title: UNet1DModel
-    - local: api/models/unet2d
-      title: UNet2DModel
-    - local: api/models/unet2d-cond
-      title: UNet2DConditionModel
-    - local: api/models/unet3d-cond
-      title: UNet3DConditionModel
-    - local: api/models/unet-motion
-      title: UNetMotionModel
-    - local: api/models/uvit2d
-      title: UViT2DModel
-    - local: api/models/vq
-      title: VQModel
-    - local: api/models/autoencoderkl
-      title: AutoencoderKL
-    - local: api/models/asymmetricautoencoderkl
-      title: AsymmetricAutoencoderKL
-    - local: api/models/autoencoder_tiny
-      title: Tiny AutoEncoder
-    - local: api/models/consistency_decoder_vae
-      title: ConsistencyDecoderVAE
-    - local: api/models/transformer2d
-      title: Transformer2D
-    - local: api/models/transformer_temporal
-      title: Transformer Temporal
-    - local: api/models/prior_transformer
-      title: Prior Transformer
-    - local: api/models/controlnet
-      title: ControlNet
+    - sections:
+      - local: api/models/controlnet
+        title: ControlNetModel
+      - local: api/models/controlnet_flux
+        title: FluxControlNetModel
+      - local: api/models/controlnet_hunyuandit
+        title: HunyuanDiT2DControlNetModel
+      - local: api/models/controlnet_sd3
+        title: SD3ControlNetModel
+      - local: api/models/controlnet_sparsectrl
+        title: SparseControlNetModel
+      - local: api/models/controlnet_union
+        title: ControlNetUnionModel
+      title: ControlNets
+    - sections:
+      - local: api/models/allegro_transformer3d
+        title: AllegroTransformer3DModel
+      - local: api/models/aura_flow_transformer2d
+        title: AuraFlowTransformer2DModel
+      - local: api/models/cogvideox_transformer3d
+        title: CogVideoXTransformer3DModel
+      - local: api/models/consisid_transformer3d
+        title: ConsisIDTransformer3DModel
+      - local: api/models/cogview3plus_transformer2d
+        title: CogView3PlusTransformer2DModel
+      - local: api/models/cogview4_transformer2d
+        title: CogView4Transformer2DModel
+      - local: api/models/dit_transformer2d
+        title: DiTTransformer2DModel
+      - local: api/models/flux_transformer
+        title: FluxTransformer2DModel
+      - local: api/models/hunyuan_transformer2d
+        title: HunyuanDiT2DModel
+      - local: api/models/hunyuan_video_transformer_3d
+        title: HunyuanVideoTransformer3DModel
+      - local: api/models/latte_transformer3d
+        title: LatteTransformer3DModel
+      - local: api/models/lumina_nextdit2d
+        title: LuminaNextDiT2DModel
+      - local: api/models/lumina2_transformer2d
+        title: Lumina2Transformer2DModel
+      - local: api/models/ltx_video_transformer3d
+        title: LTXVideoTransformer3DModel
+      - local: api/models/mochi_transformer3d
+        title: MochiTransformer3DModel
+      - local: api/models/omnigen_transformer
+        title: OmniGenTransformer2DModel
+      - local: api/models/pixart_transformer2d
+        title: PixArtTransformer2DModel
+      - local: api/models/prior_transformer
+        title: PriorTransformer
+      - local: api/models/sd3_transformer2d
+        title: SD3Transformer2DModel
+      - local: api/models/sana_transformer2d
+        title: SanaTransformer2DModel
+      - local: api/models/stable_audio_transformer
+        title: StableAudioDiTModel
+      - local: api/models/transformer2d
+        title: Transformer2DModel
+      - local: api/models/transformer_temporal
+        title: TransformerTemporalModel
+      title: Transformers
+    - sections:
+      - local: api/models/stable_cascade_unet
+        title: StableCascadeUNet
+      - local: api/models/unet
+        title: UNet1DModel
+      - local: api/models/unet2d
+        title: UNet2DModel
+      - local: api/models/unet2d-cond
+        title: UNet2DConditionModel
+      - local: api/models/unet3d-cond
+        title: UNet3DConditionModel
+      - local: api/models/unet-motion
+        title: UNetMotionModel
+      - local: api/models/uvit2d
+        title: UViT2DModel
+      title: UNets
+    - sections:
+      - local: api/models/autoencoderkl
+        title: AutoencoderKL
+      - local: api/models/autoencoderkl_allegro
+        title: AutoencoderKLAllegro
+      - local: api/models/autoencoderkl_cogvideox
+        title: AutoencoderKLCogVideoX
+      - local: api/models/autoencoder_kl_hunyuan_video
+        title: AutoencoderKLHunyuanVideo
+      - local: api/models/autoencoderkl_ltx_video
+        title: AutoencoderKLLTXVideo
+      - local: api/models/autoencoderkl_mochi
+        title: AutoencoderKLMochi
+      - local: api/models/asymmetricautoencoderkl
+        title: AsymmetricAutoencoderKL
+      - local: api/models/autoencoder_dc
+        title: AutoencoderDC
+      - local: api/models/consistency_decoder_vae
+        title: ConsistencyDecoderVAE
+      - local: api/models/autoencoder_oobleck
+        title: Oobleck AutoEncoder
+      - local: api/models/autoencoder_tiny
+        title: Tiny AutoEncoder
+      - local: api/models/vq
+        title: VQModel
+      title: VAEs
    title: Models
-  - sections:
+  - isExpanded: false
+    sections:
    - local: api/pipelines/overview
      title: Overview
+    - local: api/pipelines/allegro
+      title: Allegro
    - local: api/pipelines/amused
      title: aMUSEd
    - local: api/pipelines/animatediff
@@ -266,16 +374,38 @@
      title: AudioLDM
    - local: api/pipelines/audioldm2
      title: AudioLDM 2
+    - local: api/pipelines/aura_flow
+      title: AuraFlow
    - local: api/pipelines/auto_pipeline
      title: AutoPipeline
    - local: api/pipelines/blip_diffusion
      title: BLIP-Diffusion
+    - local: api/pipelines/cogvideox
+      title: CogVideoX
+    - local: api/pipelines/cogview3
+      title: CogView3
+    - local: api/pipelines/cogview4
+      title: CogView4
+    - local: api/pipelines/consisid
+      title: ConsisID
    - local: api/pipelines/consistency_models
      title: Consistency Models
    - local: api/pipelines/controlnet
      title: ControlNet
+    - local: api/pipelines/controlnet_flux
+      title: ControlNet with Flux.1
+    - local: api/pipelines/controlnet_hunyuandit
+      title: ControlNet with Hunyuan-DiT
+    - local: api/pipelines/controlnet_sd3
+      title: ControlNet with Stable Diffusion 3
    - local: api/pipelines/controlnet_sdxl
      title: ControlNet with Stable Diffusion XL
+    - local: api/pipelines/controlnetxs
+      title: ControlNet-XS
+    - local: api/pipelines/controlnetxs_sdxl
+      title: ControlNet-XS with Stable Diffusion XL
+    - local: api/pipelines/controlnet_union
+      title: ControlNetUnion
    - local: api/pipelines/dance_diffusion
      title: Dance Diffusion
    - local: api/pipelines/ddim
@@ -288,6 +418,14 @@
      title: DiffEdit
    - local: api/pipelines/dit
      title: DiT
+    - local: api/pipelines/flux
+      title: Flux
+    - local: api/pipelines/control_flux_inpaint
+      title: FluxControlInpaint
+    - local: api/pipelines/hunyuandit
+      title: Hunyuan-DiT
+    - local: api/pipelines/hunyuan_video
+      title: HunyuanVideo
    - local: api/pipelines/i2vgenxl
      title: I2VGen-XL
    - local: api/pipelines/pix2pix
@@ -298,26 +436,54 @@
      title: Kandinsky 2.2
    - local: api/pipelines/kandinsky3
      title: Kandinsky 3
+    - local: api/pipelines/kolors
+      title: Kolors
    - local: api/pipelines/latent_consistency_models
      title: Latent Consistency Models
    - local: api/pipelines/latent_diffusion
      title: Latent Diffusion
+    - local: api/pipelines/latte
+      title: Latte
+    - local: api/pipelines/ledits_pp
+      title: LEDITS++
+    - local: api/pipelines/ltx_video
+      title: LTXVideo
+    - local: api/pipelines/lumina2
+      title: Lumina 2.0
+    - local: api/pipelines/lumina
+      title: Lumina-T2X
+    - local: api/pipelines/marigold
+      title: Marigold
+    - local: api/pipelines/mochi
+      title: Mochi
    - local: api/pipelines/panorama
      title: MultiDiffusion
    - local: api/pipelines/musicldm
      title: MusicLDM
+    - local: api/pipelines/omnigen
+      title: OmniGen
+    - local: api/pipelines/pag
+      title: PAG
    - local: api/pipelines/paint_by_example
      title: Paint by Example
    - local: api/pipelines/pia
      title: Personalized Image Animator (PIA)
    - local: api/pipelines/pixart
      title: PixArt-α
+    - local: api/pipelines/pixart_sigma
+      title: PixArt-Σ
+    - local: api/pipelines/sana
+      title: Sana
    - local: api/pipelines/self_attention_guidance
      title: Self-Attention Guidance
    - local: api/pipelines/semantic_stable_diffusion
      title: Semantic Guidance
    - local: api/pipelines/shap_e
      title: Shap-E
+    - local: api/pipelines/stable_audio
+      title: Stable Audio
+    - local: api/pipelines/stable_cascade
+      title: Stable Cascade
    - sections:
      - local: api/pipelines/stable_diffusion/overview
        title: Overview
@@ -337,6 +503,8 @@
        title: Safe Stable Diffusion
      - local: api/pipelines/stable_diffusion/stable_diffusion_2
        title: Stable Diffusion 2
+      - local: api/pipelines/stable_diffusion/stable_diffusion_3
+        title: Stable Diffusion 3
      - local: api/pipelines/stable_diffusion/stable_diffusion_xl
        title: Stable Diffusion XL
      - local: api/pipelines/stable_diffusion/sdxl_turbo
@@ -350,7 +518,7 @@
      - local: api/pipelines/stable_diffusion/ldm3d_diffusion
        title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
      - local: api/pipelines/stable_diffusion/adapter
-        title: Stable Diffusion T2I-Adapter
+        title: T2I-Adapter
      - local: api/pipelines/stable_diffusion/gligen
        title: GLIGEN (Grounded Language-to-Image Generation)
      title: Stable Diffusion
@@ -369,13 +537,16 @@
    - local: api/pipelines/wuerstchen
      title: Wuerstchen
    title: Pipelines
-  - sections:
+  - isExpanded: false
+    sections:
    - local: api/schedulers/overview
      title: Overview
    - local: api/schedulers/cm_stochastic_iterative
      title: CMStochasticIterativeScheduler
    - local: api/schedulers/consistency_decoder
      title: ConsistencyDecoderScheduler
+    - local: api/schedulers/cosine_dpm
+      title: CosineDPMSolverMultistepScheduler
    - local: api/schedulers/ddim_inverse
      title: DDIMInverseScheduler
    - local: api/schedulers/ddim
@@ -392,10 +563,18 @@
      title: DPMSolverSDEScheduler
    - local: api/schedulers/singlestep_dpm_solver
      title: DPMSolverSinglestepScheduler
+    - local: api/schedulers/edm_multistep_dpm_solver
+      title: EDMDPMSolverMultistepScheduler
+    - local: api/schedulers/edm_euler
+      title: EDMEulerScheduler
    - local: api/schedulers/euler_ancestral
      title: EulerAncestralDiscreteScheduler
    - local: api/schedulers/euler
      title: EulerDiscreteScheduler
+    - local: api/schedulers/flow_match_euler_discrete
+      title: FlowMatchEulerDiscreteScheduler
+    - local: api/schedulers/flow_match_heun_discrete
+      title: FlowMatchHeunDiscreteScheduler
    - local: api/schedulers/heun
      title: HeunDiscreteScheduler
    - local: api/schedulers/ipndm
@@ -418,23 +597,30 @@
      title: ScoreSdeVeScheduler
    - local: api/schedulers/score_sde_vp
      title: ScoreSdeVpScheduler
+    - local: api/schedulers/tcd
+      title: TCDScheduler
    - local: api/schedulers/unipc
      title: UniPCMultistepScheduler
    - local: api/schedulers/vq_diffusion
      title: VQDiffusionScheduler
    title: Schedulers
-  - sections:
+  - isExpanded: false
+    sections:
    - local: api/internal_classes_overview
      title: Overview
    - local: api/attnprocessor
      title: Attention Processor
    - local: api/activations
      title: Custom activation functions
+    - local: api/cache
+      title: Caching methods
    - local: api/normalization
      title: Custom normalization layers
    - local: api/utilities
      title: Utilities
    - local: api/image_processor
      title: VAE Image Processor
+    - local: api/video_processor
+      title: Video Processor
    title: Internal classes
  title: API
--- a/docs/source/en/advanced_inference/outpaint.md
+++ b/docs/source/en/advanced_inference/outpaint.md
@@ -0,0 +1,231 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Outpainting
+
+Outpainting extends an image beyond its original boundaries, allowing you to add, replace, or modify visual elements in an image while preserving the original image. Like [inpainting](../using-diffusers/inpaint), you want to fill the white area (in this case, the area outside of the original image) with new visual elements while keeping the original image (represented by a mask of black pixels). There are a couple of ways to outpaint, such as with a [ControlNet](https://hf.co/blog/OzzyGT/outpainting-controlnet) or with [Differential Diffusion](https://hf.co/blog/OzzyGT/outpainting-differential-diffusion).
+
+This guide will show you how to outpaint with an inpainting model, ControlNet, and a ZoeDepth estimator.
+
+Before you begin, make sure you have the [controlnet_aux](https://github.com/huggingface/controlnet_aux) library installed so you can use the ZoeDepth estimator.
+
+```py
+!pip install -q controlnet_aux
+```
+
+## Image preparation
+
+Start by picking an image to outpaint and removing its background with a Space like [BRIA-RMBG-1.4](https://hf.co/spaces/briaai/BRIA-RMBG-1.4).
+
+<iframe
+	src="https://briaai-bria-rmbg-1-4.hf.space"
+	frameborder="0"
+	width="850"
+	height="450"
+></iframe>
+
+For example, remove the background from this image of a pair of shoes.
+
+<div class="flex flex-row gap-4">
+  <div class="flex-1">
+    <img class="rounded-xl" src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/original-jordan.png"/>
+    <figcaption class="mt-2 text-center text-sm text-gray-500">original image</figcaption>
+  </div>
+  <div class="flex-1">
+    <img class="rounded-xl" src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/no-background-jordan.png"/>
+    <figcaption class="mt-2 text-center text-sm text-gray-500">background removed</figcaption>
+  </div>
+</div>
+
+[Stable Diffusion XL (SDXL)](../using-diffusers/sdxl) models work best with 1024x1024 images, but you can resize the image to any size as long as your hardware has enough memory to support it. The transparent background in the image should also be replaced with a white background. Create a function (like the one below) that resizes the image so its longer side is 1024 pixels and pastes it, centered, onto a 1024x1024 white background.
+
+```py
+import random
+
+import requests
+import torch
+from controlnet_aux import ZoeDetector
+from PIL import Image, ImageOps
+
+from diffusers import (
+    AutoencoderKL,
+    ControlNetModel,
+    StableDiffusionXLControlNetPipeline,
+    StableDiffusionXLInpaintPipeline,
+)
+
+def scale_and_paste(original_image):
+    aspect_ratio = original_image.width / original_image.height
+
+    if original_image.width > original_image.height:
+        new_width = 1024
+        new_height = round(new_width / aspect_ratio)
+    else:
+        new_height = 1024
+        new_width = round(new_height * aspect_ratio)
+
+    resized_original = original_image.resize((new_width, new_height), Image.LANCZOS)
+    white_background = Image.new("RGBA", (1024, 1024), "white")
+    x = (1024 - new_width) // 2
+    y = (1024 - new_height) // 2
+    white_background.paste(resized_original, (x, y), resized_original)
+
+    return resized_original, white_background
+
+original_image = Image.open(
+    requests.get(
+        "https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/no-background-jordan.png",
+        stream=True,
+    ).raw
+).convert("RGBA")
+resized_img, white_bg_image = scale_and_paste(original_image)
+```
+
+To avoid adding unwanted extra details, use the ZoeDepth estimator to provide additional guidance during generation and to ensure the shoes remain consistent with the original image.
+
+```py
+zoe = ZoeDetector.from_pretrained("lllyasviel/Annotators")
+image_zoe = zoe(white_bg_image, detect_resolution=512, image_resolution=1024)
+image_zoe
+```
+
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/zoedepth-jordan.png"/>
+</div>
+
+## Outpaint
+
+Once your image is ready, you can generate content in the white area around the shoes with [controlnet-inpaint-dreamer-sdxl](https://hf.co/destitech/controlnet-inpaint-dreamer-sdxl), an SDXL ControlNet trained for inpainting.
+
+Load the inpainting ControlNet, the ZoeDepth ControlNet, and the VAE, and pass them to the [`StableDiffusionXLControlNetPipeline`]. Then you can create an optional `generate_image` function (for convenience) to outpaint an initial image.
+
+```py
+controlnets = [
+    ControlNetModel.from_pretrained(
+        "destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16, variant="fp16"
+    ),
+    ControlNetModel.from_pretrained(
+        "diffusers/controlnet-zoe-depth-sdxl-1.0", torch_dtype=torch.float16
+    ),
+]
+vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")
+pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
+    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", controlnet=controlnets, vae=vae
+).to("cuda")
+
+def generate_image(prompt, negative_prompt, inpaint_image, zoe_image, seed: int = None):
+    if seed is None:
+        seed = random.randint(0, 2**32 - 1)
+
+    generator = torch.Generator(device="cpu").manual_seed(seed)
+
+    image = pipeline(
+        prompt,
+        negative_prompt=negative_prompt,
+        image=[inpaint_image, zoe_image],
+        guidance_scale=6.5,
+        num_inference_steps=25,
+        generator=generator,
+        controlnet_conditioning_scale=[0.5, 0.8],
+        control_guidance_end=[0.9, 0.6],
+    ).images[0]
+
+    return image
+
+prompt = "nike air jordans on a basketball court"
+negative_prompt = ""
+
+temp_image = generate_image(prompt, negative_prompt, white_bg_image, image_zoe, 908097)
+```
+
+Paste the original image over the initial outpainted image. You'll improve the outpainted background in a later step.
+
+```py
+x = (1024 - resized_img.width) // 2
+y = (1024 - resized_img.height) // 2
+temp_image.paste(resized_img, (x, y), resized_img)
+temp_image
+```
+
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/initial-outpaint.png"/>
+</div>
+
+> [!TIP]
+> Now is a good time to free up some memory if you're running low!
+>
+> ```py
+> del pipeline, controlnets
+> torch.cuda.empty_cache()
+> ```
+
+Now that you have an initial outpainted image, load the [`StableDiffusionXLInpaintPipeline`] with an inpainting checkpoint of the [RealVisXL](https://hf.co/SG161222/RealVisXL_V4.0) model (`OzzyGT/RealVisXL_V4.0_inpainting`) to generate a higher-quality final outpainted image.
+
+```py
+pipeline = StableDiffusionXLInpaintPipeline.from_pretrained(
+    "OzzyGT/RealVisXL_V4.0_inpainting",
+    torch_dtype=torch.float16,
+    variant="fp16",
+    vae=vae,
+).to("cuda")
+```
+
+Prepare a mask for the final outpainted image. Blurring the mask gives a smoother, more natural transition between the original image and the outpainted background.
+
+```py
+mask = Image.new("L", temp_image.size)
+mask.paste(resized_img.split()[3], (x, y))
+mask = ImageOps.invert(mask)
+final_mask = mask.point(lambda p: p > 128 and 255)
+mask_blurred = pipeline.mask_processor.blur(final_mask, blur_factor=20)
+mask_blurred
+```
+
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/blurred-mask.png"/>
+</div>
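The `mask.point(lambda p: p > 128 and 255)` line relies on Python short-circuiting. `p > 128 and 255` evaluates to `False` (equal to 0) for dark pixels and to `255` otherwise, which turns the inverted alpha channel into a hard black-and-white mask before it is blurred. In Diffusers inpainting masks, white marks the region to repaint and black the region to keep. The per-pixel logic of the invert and threshold steps can be checked in plain Python. This is a sketch that works on a list of 8-bit values instead of a PIL image, and the helper names are hypothetical.

```python
def invert(alpha: int) -> int:
    # Per-pixel equivalent of ImageOps.invert on an 8-bit "L" channel: 0 <-> 255.
    return 255 - alpha


def threshold(p: int) -> int:
    # Explicit form of `lambda p: p > 128 and 255`; the original yields False (== 0) or 255.
    return 255 if p > 128 else 0


def mask_values(alpha_values):
    # Transparent pixels (alpha 0) become 255 (white = repaint the background),
    # opaque pixels (alpha 255) become 0 (black = keep the original product).
    return [threshold(invert(a)) for a in alpha_values]


print(mask_values([0, 100, 127, 128, 200, 255]))
```

Because the threshold is strict (`> 128`), an inverted value of exactly 128 ends up black, so half-transparent edge pixels are kept rather than repainted.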
+
+Create a better prompt and pass it to the `generate_outpaint` function to generate the final outpainted image. Again, paste the original image over the final outpainted background.
+
+```py
+def generate_outpaint(prompt, negative_prompt, image, mask, seed: int = None):
+    if seed is None:
+        seed = random.randint(0, 2**32 - 1)
+
+    generator = torch.Generator(device="cpu").manual_seed(seed)
+
+    image = pipeline(
+        prompt,
+        negative_prompt=negative_prompt,
+        image=image,
+        mask_image=mask,
+        guidance_scale=10.0,
+        strength=0.8,
+        num_inference_steps=30,
+        generator=generator,
+    ).images[0]
+
+    return image
+
+prompt = "high quality photo of nike air jordans on a basketball court, highly detailed"
+negative_prompt = ""
+
+final_image = generate_outpaint(prompt, negative_prompt, temp_image, mask_blurred, 7688778)
+x = (1024 - resized_img.width) // 2
+y = (1024 - resized_img.height) // 2
+final_image.paste(resized_img, (x, y), resized_img)
+final_image
+```
+
+<div class="flex justify-center">
+    <img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/final-outpaint.png"/>
+</div>
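`generate_outpaint` passes `strength=0.8` together with `num_inference_steps=30`. In the Diffusers image-to-image and inpainting pipelines, `strength` decides how much of the denoising schedule actually runs: lower values skip the early, noisiest steps, so the result stays closer to `temp_image`. The sketch below reproduces that arithmetic, assuming a first-order scheduler and the `get_timesteps` convention these pipelines commonly use; check the pipeline source for your installed version.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    # Assumed convention: only the last int(num_inference_steps * strength)
    # timesteps of the schedule are denoised; the earlier ones are skipped.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


# Settings used in generate_outpaint above.
print(effective_steps(30, 0.8))
```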
--- a/docs/source/en/api/attnprocessor.md
+++ b/docs/source/en/api/attnprocessor.md
@@ -15,49 +15,135 @@ specific language governing permissions and limitations under the License.
 An attention processor is a class for applying different types of attention mechanisms.
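For orientation, the core computation behind standard processors such as `AttnProcessor` is scaled dot-product attention, `softmax(Q K^T / sqrt(d)) V`. The variants differ mainly in how they obtain Q, K, and V and which kernel they call (for example, `AttnProcessor2_0` uses PyTorch 2.0's `scaled_dot_product_attention` and `XFormersAttnProcessor` uses xFormers). The pure-Python sketch below shows only that core step for a single head, without projections, masking, or batching; it illustrates the math and is not the Diffusers code.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """q: [n_q][d], k: [n_k][d], v: [n_k][d_v] as nested lists of floats."""
    d = len(q[0])
    out = []
    for qi in q:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        # Output row is the attention-weighted average of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Two identical keys receive equal weight, so the output averages the two value rows.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]))
```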

 ## AttnProcessor
+
 [[autodoc]] models.attention_processor.AttnProcessor

-## AttnProcessor2_0
 [[autodoc]] models.attention_processor.AttnProcessor2_0

-## AttnAddedKVProcessor
 [[autodoc]] models.attention_processor.AttnAddedKVProcessor

-## AttnAddedKVProcessor2_0
 [[autodoc]] models.attention_processor.AttnAddedKVProcessor2_0

-## CrossFrameAttnProcessor
-[[autodoc]] pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor
+[[autodoc]] models.attention_processor.AttnProcessorNPU

-## CustomDiffusionAttnProcessor
-[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor
-
-## CustomDiffusionAttnProcessor2_0
-[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor2_0
-
-## CustomDiffusionXFormersAttnProcessor
-[[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor
-
-## FusedAttnProcessor2_0
 [[autodoc]] models.attention_processor.FusedAttnProcessor2_0

-## LoRAAttnProcessor
+## Allegro
+
+[[autodoc]] models.attention_processor.AllegroAttnProcessor2_0
+
+## AuraFlow
+
+[[autodoc]] models.attention_processor.AuraFlowAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.FusedAuraFlowAttnProcessor2_0
+
+## CogVideoX
+
+[[autodoc]] models.attention_processor.CogVideoXAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.FusedCogVideoXAttnProcessor2_0
+
+## CrossFrameAttnProcessor
+
+[[autodoc]] pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor
+
+## Custom Diffusion
+
+[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor
+
+[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor
+
+## Flux
+
+[[autodoc]] models.attention_processor.FluxAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.FusedFluxAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.FluxSingleAttnProcessor2_0
+
+## Hunyuan
+
+[[autodoc]] models.attention_processor.HunyuanAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.FusedHunyuanAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGHunyuanAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGCFGHunyuanAttnProcessor2_0
+
+## IdentitySelfAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGIdentitySelfAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGCFGIdentitySelfAttnProcessor2_0
+
+## IP-Adapter
+
+[[autodoc]] models.attention_processor.IPAdapterAttnProcessor
+
+[[autodoc]] models.attention_processor.IPAdapterAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.SD3IPAdapterJointAttnProcessor2_0
+
+## JointAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.JointAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGJointAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGCFGJointAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.FusedJointAttnProcessor2_0
+
+## LoRA
+
 [[autodoc]] models.attention_processor.LoRAAttnProcessor

-## LoRAAttnProcessor2_0
 [[autodoc]] models.attention_processor.LoRAAttnProcessor2_0

-## LoRAAttnAddedKVProcessor
 [[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor

-## LoRAXFormersAttnProcessor
 [[autodoc]] models.attention_processor.LoRAXFormersAttnProcessor

+## Lumina-T2X
+
+[[autodoc]] models.attention_processor.LuminaAttnProcessor2_0
+
+## Mochi
+
+[[autodoc]] models.attention_processor.MochiAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.MochiVaeAttnProcessor2_0
+
+## Sana
+
+[[autodoc]] models.attention_processor.SanaLinearAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.SanaMultiscaleAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGCFGSanaLinearAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.PAGIdentitySanaLinearAttnProcessor2_0
+
+## Stable Audio
+
+[[autodoc]] models.attention_processor.StableAudioAttnProcessor2_0
+
 ## SlicedAttnProcessor
+
 [[autodoc]] models.attention_processor.SlicedAttnProcessor

-## SlicedAttnAddedKVProcessor
 [[autodoc]] models.attention_processor.SlicedAttnAddedKVProcessor

 ## XFormersAttnProcessor
+
 [[autodoc]] models.attention_processor.XFormersAttnProcessor
+
+[[autodoc]] models.attention_processor.XFormersAttnAddedKVProcessor
+
+## XLAFlashAttnProcessor2_0
+
+[[autodoc]] models.attention_processor.XLAFlashAttnProcessor2_0
--- a/docs/source/en/api/cache.md
+++ b/docs/source/en/api/cache.md
@@ -0,0 +1,49 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# Caching methods
+
+## Pyramid Attention Broadcast
+
+[Pyramid Attention Broadcast](https://huggingface.co/papers/2408.12588) by Xuanlei Zhao, Xiaolong Jin, Kai Wang, and Yang You.
+
+Pyramid Attention Broadcast (PAB) is a method that speeds up inference in diffusion models by systematically skipping attention computations between successive inference steps and reusing cached attention states. This works because attention states change little from one inference step to the next. The difference is largest in the spatial attention blocks, smaller in the temporal attention blocks, and smallest in the cross-attention blocks, so cross-attention computations can be skipped most often, followed by temporal and then spatial attention. Combined with other techniques like sequence parallelism and classifier-free guidance parallelism, PAB achieves near real-time video generation.
+
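+Enable PAB by passing a [`~PyramidAttentionBroadcastConfig`] to the `enable_cache` method of a pipeline's transformer; transformers that support caching inherit from [`CacheMixin`]. For benchmarks, refer to [this](https://github.com/huggingface/diffusers/pull/9562) pull request.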
+
+```python
+import torch
+from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig
+
+pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
+pipe.to("cuda")
+
+# Increasing the value of `spatial_attention_timestep_skip_range[0]` or decreasing the value of
+# `spatial_attention_timestep_skip_range[1]` narrows the interval in which pyramid attention
+# broadcast is active, leading to slower inference. However, wider intervals can lead to
+# poorer quality in the generated videos.
+config = PyramidAttentionBroadcastConfig(
+    spatial_attention_block_skip_range=2,
+    spatial_attention_timestep_skip_range=(100, 800),
+    current_timestep_callback=lambda: pipe.current_timestep,
+)
+pipe.transformer.enable_cache(config)
+```
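The two `spatial_attention_*` settings can be read as a gating rule. The sketch below is a simplified, pure-Python model of that rule under stated assumptions, not the Diffusers hook implementation. Inside the timestep window, attention is recomputed only every `block_skip_range` iterations and the cached output is reused in between; outside the window, attention is always recomputed. The 50-step schedule is made up for illustration.

```python
def should_compute_attention(iteration, timestep, block_skip_range=2, timestep_skip_range=(100, 800)):
    # Simplified gating rule (an assumption for illustration, not the library source).
    lo, hi = timestep_skip_range
    within_range = lo < timestep < hi
    # Always compute on the first iteration (nothing cached yet) and outside the
    # window; inside the window, recompute only every `block_skip_range` iterations.
    return iteration == 0 or not within_range or iteration % block_skip_range == 0


def count_computed(timesteps, **kwargs):
    # Number of steps where attention is recomputed instead of reused from the cache.
    return sum(should_compute_attention(i, t, **kwargs) for i, t in enumerate(timesteps))


# Hypothetical 50-step schedule running from timestep 1000 down to 20.
timesteps = [1000 - 20 * i for i in range(50)]

print(count_computed(timesteps))                                  # 33 of 50 steps recompute
print(count_computed(timesteps, timestep_skip_range=(300, 700)))  # 41 of 50 steps recompute
```

Narrowing the window increases the number of recomputed steps, which is consistent with the comment in the configuration example about slower inference.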
+
+### CacheMixin
+
+[[autodoc]] CacheMixin
+
+### PyramidAttentionBroadcastConfig
+
+[[autodoc]] PyramidAttentionBroadcastConfig
+
+[[autodoc]] apply_pyramid_attention_broadcast
--- a/docs/source/en/api/image_processor.md
+++ b/docs/source/en/api/image_processor.md
@@ -25,3 +25,11 @@ All pipelines with [`VaeImageProcessor`] accept PIL Image, PyTorch tensor, or Nu
 The [`VaeImageProcessorLDM3D`] accepts RGB and depth inputs and returns RGB and depth outputs.

 [[autodoc]] image_processor.VaeImageProcessorLDM3D
+
+## PixArtImageProcessor
+
+[[autodoc]] image_processor.PixArtImageProcessor
+
+## IPAdapterMaskProcessor
+
+[[autodoc]] image_processor.IPAdapterMaskProcessor
--- a/docs/source/en/api/loaders/ip_adapter.md
+++ b/docs/source/en/api/loaders/ip_adapter.md
@@ -23,3 +23,13 @@ Learn how to load an IP-Adapter checkpoint and image in the IP-Adapter [loading]
 ## IPAdapterMixin

 [[autodoc]] loaders.ip_adapter.IPAdapterMixin
+
+## SD3IPAdapterMixin
+
+[[autodoc]] loaders.ip_adapter.SD3IPAdapterMixin
+    - all
+    - is_ip_adapter_active
+
+## IPAdapterMaskProcessor
+
+[[autodoc]] image_processor.IPAdapterMaskProcessor
--- a/docs/source/en/api/loaders/lora.md
+++ b/docs/source/en/api/loaders/lora.md
@@ -12,10 +12,16 @@ specific language governing permissions and limitations under the License.

 # LoRA

-LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. This produces a smaller file (~100 MBs) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the UNet, text encoder or both. There are two classes for loading LoRA weights:
+LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. This produces a smaller file (~100 MBs) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the denoiser, text encoder or both. The denoiser usually corresponds to a UNet ([`UNet2DConditionModel`], for example) or a Transformer ([`SD3Transformer2DModel`], for example). There are several classes for loading LoRA weights:

- [`LoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`LoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
+- [`StableDiffusionLoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
+- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion XL (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`StableDiffusionLoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
+- [`SD3LoraLoaderMixin`] provides similar functions for [Stable Diffusion 3](https://huggingface.co/blog/sd3).
+- [`FluxLoraLoaderMixin`] provides similar functions for [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux).
+- [`CogVideoXLoraLoaderMixin`] provides similar functions for [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox).
+- [`Mochi1LoraLoaderMixin`] provides similar functions for [Mochi](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi).
+- [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
+- [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, and unload LoRAs, and more.
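To make "a significantly smaller number of parameters" concrete: in the standard LoRA formulation, a frozen weight `W` of shape `d_out x d_in` is adapted as `W + B @ A`, where `B` is `d_out x r` and `A` is `r x d_in` for a small rank `r`, so only `r * (d_out + d_in)` values are trained. The arithmetic below uses an illustrative 768 x 768 projection; the size and ranks are example values, not taken from any particular checkpoint.

```python
def full_params(d_out: int, d_in: int) -> int:
    # Values updated when fine-tuning a dense d_out x d_in weight directly.
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    # LoRA trains B (d_out x rank) and A (rank x d_in) instead of W itself.
    return rank * (d_out + d_in)


d = 768  # illustrative hidden size chosen for this example
for r in (4, 16, 64):
    ratio = lora_params(d, d, r) / full_params(d, d)
    print(f"rank={r}: {lora_params(d, d, r)} vs {full_params(d, d)} params ({ratio:.1%})")
```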

 <Tip>

@@ -23,10 +29,34 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse

 </Tip>

-## LoraLoaderMixin
+## StableDiffusionLoraLoaderMixin

-[[autodoc]] loaders.lora.LoraLoaderMixin
+[[autodoc]] loaders.lora_pipeline.StableDiffusionLoraLoaderMixin

 ## StableDiffusionXLLoraLoaderMixin

-[[autodoc]] loaders.lora.StableDiffusionXLLoraLoaderMixin
+[[autodoc]] loaders.lora_pipeline.StableDiffusionXLLoraLoaderMixin
+
+## SD3LoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.SD3LoraLoaderMixin
+
+## FluxLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.FluxLoraLoaderMixin
+
+## CogVideoXLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.CogVideoXLoraLoaderMixin
+
+## Mochi1LoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.Mochi1LoraLoaderMixin
+
+## AmusedLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin
+
+## LoraBaseMixin
+
+[[autodoc]] loaders.lora_base.LoraBaseMixin
--- a/docs/source/en/api/loaders/peft.md
+++ b/docs/source/en/api/loaders/peft.md
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

 # PEFT

-Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`] to load an adapter.
+Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`] and [`SD3Transformer2DModel`] to operate with an adapter.

 <Tip>

--- a/docs/source/en/api/loaders/single_file.md
+++ b/docs/source/en/api/loaders/single_file.md
@@ -12,26 +12,51 @@ specific language governing permissions and limitations under the License.

 # Single files

-Diffusers supports loading pretrained pipeline (or model) weights stored in a single file, such as a `ckpt` or `safetensors` file. These single file types are typically produced from community trained models. There are three classes for loading single file weights:
+The [`~loaders.FromSingleFileMixin.from_single_file`] method allows you to load:

- [`FromSingleFileMixin`] supports loading pretrained pipeline weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
- [`FromOriginalVAEMixin`] supports loading a pretrained [`AutoencoderKL`] from pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
- [`FromOriginalControlnetMixin`] supports loading pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
+* a model stored in a single file, which is useful if you're working with models from the diffusion ecosystem, like Automatic1111, that commonly rely on a single-file layout to store and share models
+* a model stored in its originally distributed layout, which is useful if you're working with models fine-tuned with other services and want to load them directly into Diffusers model objects and pipelines

-<Tip>
+> [!TIP]
+> Read the [Model files and layouts](../../using-diffusers/other-formats) guide to learn more about the Diffusers-multifolder layout versus the single-file layout, and how to load models stored in these different layouts.

-To learn more about how to load single file weights, see the [Load different Stable Diffusion formats](../../using-diffusers/other-formats) loading guide.
+## Supported pipelines

-</Tip>
+- [`StableDiffusionPipeline`]
+- [`StableDiffusionImg2ImgPipeline`]
+- [`StableDiffusionInpaintPipeline`]
+- [`StableDiffusionControlNetPipeline`]
+- [`StableDiffusionControlNetImg2ImgPipeline`]
+- [`StableDiffusionControlNetInpaintPipeline`]
+- [`StableDiffusionUpscalePipeline`]
+- [`StableDiffusionXLPipeline`]
+- [`StableDiffusionXLImg2ImgPipeline`]
+- [`StableDiffusionXLInpaintPipeline`]
+- [`StableDiffusionXLInstructPix2PixPipeline`]
+- [`StableDiffusionXLControlNetPipeline`]
+- [`StableDiffusionXLKDiffusionPipeline`]
+- [`StableDiffusion3Pipeline`]
+- [`LatentConsistencyModelPipeline`]
+- [`LatentConsistencyModelImg2ImgPipeline`]
+- [`StableDiffusionControlNetXSPipeline`]
+- [`StableDiffusionXLControlNetXSPipeline`]
+- [`LEditsPPPipelineStableDiffusion`]
+- [`LEditsPPPipelineStableDiffusionXL`]
+- [`PIAPipeline`]
+
+## Supported models
+
+- [`UNet2DConditionModel`]
+- [`StableCascadeUNet`]
+- [`AutoencoderKL`]
+- [`ControlNetModel`]
+- [`SD3Transformer2DModel`]
+- [`FluxTransformer2DModel`]

 ## FromSingleFileMixin

 [[autodoc]] loaders.single_file.FromSingleFileMixin

-## FromOriginalVAEMixin
+## FromOriginalModelMixin

-[[autodoc]] loaders.autoencoder.FromOriginalVAEMixin
-
-## FromOriginalControlnetMixin
-
-[[autodoc]] loaders.controlnet.FromOriginalControlNetMixin
+[[autodoc]] loaders.single_file_model.FromOriginalModelMixin
--- a/docs/source/en/api/loaders/transformer_sd3.md
+++ b/docs/source/en/api/loaders/transformer_sd3.md
@@ -0,0 +1,29 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# SD3Transformer2D
+
+This class is useful when *only* loading weights into an [`SD3Transformer2DModel`]. If you need to load weights into the text encoder or into both a text encoder and an SD3Transformer2DModel, use the [`SD3LoraLoaderMixin`](lora#diffusers.loaders.SD3LoraLoaderMixin) class instead.
+
+The [`SD3Transformer2DLoadersMixin`] class currently only loads IP-Adapter weights, but will be used in the future to save weights and load LoRAs.
+
+<Tip>
+
+To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
+
+</Tip>
+
+## SD3Transformer2DLoadersMixin
+
+[[autodoc]] loaders.transformer_sd3.SD3Transformer2DLoadersMixin
+    - all
+    - _load_ip_adapter_weights
--- a/docs/source/en/api/loaders/unet.md
+++ b/docs/source/en/api/loaders/unet.md
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

 # UNet

-Some training methods - like LoRA and Custom Diffusion - typically target the UNet's attention layers, but these training methods can also target other non-attention layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.LoraLoaderMixin.load_lora_weights`] function instead.
+Some training methods - like LoRA and Custom Diffusion - typically target the UNet's attention layers, but these training methods can also target other non-attention layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`] function instead.

 The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters.

--- a/docs/source/en/api/models/allegro_transformer3d.md
+++ b/docs/source/en/api/models/allegro_transformer3d.md
@@ -0,0 +1,30 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AllegroTransformer3DModel
+
+A Diffusion Transformer model for 3D data from [Allegro](https://github.com/rhymes-ai/Allegro) was introduced in [Allegro: Open the Black Box of Commercial-Level Video Generation Model](https://huggingface.co/papers/2410.15458) by RhymesAI.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import AllegroTransformer3DModel
+
+transformer = AllegroTransformer3DModel.from_pretrained("rhymes-ai/Allegro", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
+```
+
+## AllegroTransformer3DModel
+
+[[autodoc]] AllegroTransformer3DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
--- a/docs/source/en/api/models/aura_flow_transformer2d.md
+++ b/docs/source/en/api/models/aura_flow_transformer2d.md
@@ -0,0 +1,19 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# AuraFlowTransformer2DModel
+
+A Transformer model for image-like data from [AuraFlow](https://blog.fal.ai/auraflow/).
+
+## AuraFlowTransformer2DModel
+
+[[autodoc]] AuraFlowTransformer2DModel
--- a/docs/source/en/api/models/autoencoder_dc.md
+++ b/docs/source/en/api/models/autoencoder_dc.md
@@ -0,0 +1,72 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AutoencoderDC
+
+The 2D Autoencoder model used in [SANA](https://huggingface.co/papers/2410.10629) and introduced in [DCAE](https://huggingface.co/papers/2410.10733) by authors Junyu Chen\*, Han Cai\*, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, Song Han from MIT HAN Lab.
+
+The abstract from the paper is:
+
+*We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. Existing autoencoder models have demonstrated impressive results at a moderate spatial compression ratio (e.g., 8x), but fail to maintain satisfactory reconstruction accuracy for high spatial compression ratios (e.g., 64x). We address this challenge by introducing two key techniques: (1) Residual Autoencoding, where we design our models to learn residuals based on the space-to-channel transformed features to alleviate the optimization difficulty of high spatial-compression autoencoders; (2) Decoupled High-Resolution Adaptation, an efficient decoupled three-phases training strategy for mitigating the generalization penalty of high spatial-compression autoencoders. With these designs, we improve the autoencoder's spatial compression ratio up to 128 while maintaining the reconstruction quality. Applying our DC-AE to latent diffusion models, we achieve significant speedup without accuracy drop. For example, on ImageNet 512x512, our DC-AE provides 19.1x inference speedup and 17.9x training speedup on H100 GPU for UViT-H while achieving a better FID, compared with the widely used SD-VAE-f8 autoencoder. Our code is available at [this https URL](https://github.com/mit-han-lab/efficientvit).*
+
+The following DCAE models are released and supported in Diffusers.
+
+| Diffusers format | Original format |
+|:----------------:|:---------------:|
+| [`mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-sana-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0)
+| [`mit-han-lab/dc-ae-f32c32-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-in-1.0)
+| [`mit-han-lab/dc-ae-f32c32-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-mix-1.0)
+| [`mit-han-lab/dc-ae-f64c128-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f64c128-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0)
+| [`mit-han-lab/dc-ae-f64c128-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f64c128-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-mix-1.0)
+| [`mit-han-lab/dc-ae-f128c512-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0)
+| [`mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0)
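The checkpoint names encode the compression settings. Assuming `fXcY` means a spatial compression factor of X with Y latent channels (the naming convention of the DC-AE models), the latent shape for a given image size follows directly. The helper below is a hypothetical sketch, not part of Diffusers.

```python
import re


def dcae_latent_shape(repo_id: str, height: int, width: int) -> tuple:
    """Hypothetical helper: (channels, latent_height, latent_width) for a DC-AE repo id."""
    # Assumes the "fXcY" tag means spatial compression factor X and Y latent channels.
    match = re.search(r"f(\d+)c(\d+)", repo_id)
    if match is None:
        raise ValueError(f"no fXcY tag found in {repo_id!r}")
    factor, channels = int(match.group(1)), int(match.group(2))
    if height % factor or width % factor:
        raise ValueError(f"{height}x{width} is not divisible by the compression factor {factor}")
    return (channels, height // factor, width // factor)


for repo in (
    "mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers",
    "mit-han-lab/dc-ae-f64c128-in-1.0-diffusers",
    "mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers",
):
    print(repo, dcae_latent_shape(repo, 1024, 1024))
```

Under this reading, all three variants hold 3 * f^2 / c = 96 times fewer values than a 3-channel RGB image; they trade spatial resolution for channels at the same overall ratio.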
+
+This model was contributed by [lawrence-cj](https://github.com/lawrence-cj).
+
+Load a model in Diffusers format with [`~ModelMixin.from_pretrained`].
+
+```python
+import torch
+from diffusers import AutoencoderDC
+
+ae = AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers", torch_dtype=torch.float32).to("cuda")
+```
+
+## Load a model in Diffusers via `from_single_file`
+
+```python
+from diffusers import AutoencoderDC
+
+ckpt_path = "https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0/blob/main/model.safetensors"
+model = AutoencoderDC.from_single_file(ckpt_path)
+```
+
+The `AutoencoderDC` model has `in` and `mix` single file checkpoint variants that have matching checkpoint keys but use different scaling factors. Diffusers cannot infer the correct config from the checkpoint alone, so it defaults to the `mix` variant config. To load an `in` variant checkpoint, override the automatically determined config with the `config` argument.
+
+```python
+from diffusers import AutoencoderDC
+
+ckpt_path = "https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0/blob/main/model.safetensors"
+model = AutoencoderDC.from_single_file(ckpt_path, config="mit-han-lab/dc-ae-f128c512-in-1.0-diffusers")
+```
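A small hypothetical helper can decide when the `config` override is needed: it flags `in` variant URLs and maps them to the matching Diffusers-format repo. It relies on the `<original repo>-diffusers` naming pattern visible in the table above and on a `-in-` substring check; both are assumptions drawn from the listed checkpoints, not a Diffusers API.

```python
def dcae_single_file_config(ckpt_url: str):
    """Hypothetical helper: return a `config` repo id for `in` variants, else None."""
    prefix = "https://huggingface.co/"
    if not ckpt_url.startswith(prefix):
        raise ValueError(f"expected a URL starting with {prefix!r}")
    # URL layout: https://huggingface.co/<org>/<repo>/blob/<revision>/<file>
    org, repo = ckpt_url[len(prefix):].split("/")[:2]
    if "-in-" in repo:
        # `in` checkpoints share keys with `mix` ones, so the config must be explicit.
        return f"{org}/{repo}-diffusers"
    # Only `in` variants need the override; other checkpoints use the inferred config.
    return None


url = "https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0/blob/main/model.safetensors"
print(dcae_single_file_config(url))
```

When the helper returns a repo id, pass it as the `config` argument of `AutoencoderDC.from_single_file`, as in the `in` variant example above.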
+
+
+## AutoencoderDC
+
+[[autodoc]] AutoencoderDC
+  - encode
+  - decode
+  - all
+
+## DecoderOutput
+
+[[autodoc]] models.autoencoders.vae.DecoderOutput
+
--- a/docs/source/en/api/models/autoencoder_kl_hunyuan_video.md
+++ b/docs/source/en/api/models/autoencoder_kl_hunyuan_video.md
@@ -0,0 +1,32 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AutoencoderKLHunyuanVideo
+
+The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanVideo](https://github.com/Tencent/HunyuanVideo/), which was introduced in [HunyuanVideo: A Systematic Framework For Large Video Generative Models](https://huggingface.co/papers/2412.03603) by Tencent.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import AutoencoderKLHunyuanVideo
+
+vae = AutoencoderKLHunyuanVideo.from_pretrained("hunyuanvideo-community/HunyuanVideo", subfolder="vae", torch_dtype=torch.float16)
+```
+
+## AutoencoderKLHunyuanVideo
+
+[[autodoc]] AutoencoderKLHunyuanVideo
+  - decode
+  - all
+
+## DecoderOutput
+
+[[autodoc]] models.autoencoders.vae.DecoderOutput
--- a/docs/source/en/api/models/autoencoder_oobleck.md
+++ b/docs/source/en/api/models/autoencoder_oobleck.md
@@ -0,0 +1,38 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# AutoencoderOobleck
+
+The Oobleck variational autoencoder (VAE) model with KL loss was introduced in [Stability-AI/stable-audio-tools](https://github.com/Stability-AI/stable-audio-tools) and [Stable Audio Open](https://huggingface.co/papers/2407.14358) by Stability AI. The model is used in 🤗 Diffusers to encode audio waveforms into latents and to decode latent representations into audio waveforms.
+
+The abstract from the paper is:
+
+*Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.*
+
+## AutoencoderOobleck
+
+[[autodoc]] AutoencoderOobleck
+    - decode
+    - encode
+    - all
+
+## OobleckDecoderOutput
+
+[[autodoc]] models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput
+
+## AutoencoderOobleckOutput
+
+[[autodoc]] models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput
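The Oobleck encoder shortens the time axis by the product of its strided downsampling ratios, often called the hop length. The sketch below works through that arithmetic under stated assumptions. It assumes ratios of `[2, 4, 4, 8, 8]` and a 44.1 kHz sampling rate, modeled on the Stable Audio Open VAE; read the real values from the checkpoint's `config.json`. It also assumes inputs are padded up to a whole number of hops.

```python
import math

# Assumed configuration, modeled on the Stable Audio Open VAE.
# Verify against the checkpoint's config.json before relying on these numbers.
DOWNSAMPLING_RATIOS = [2, 4, 4, 8, 8]
SAMPLING_RATE = 44_100  # samples per second


def hop_length(ratios: list[int]) -> int:
    """Waveform samples covered by one latent frame (product of the encoder strides)."""
    return math.prod(ratios)


def latent_frames(num_samples: int, ratios: list[int] = DOWNSAMPLING_RATIOS) -> int:
    """Latent frames for `num_samples` samples, assuming padding up to a multiple of the hop length."""
    return math.ceil(num_samples / hop_length(ratios))


hop = hop_length(DOWNSAMPLING_RATIOS)
print(f"hop length: {hop} samples")
print(f"latent rate: {SAMPLING_RATE / hop:.2f} frames/s")
print(f"10 s of audio -> {latent_frames(10 * SAMPLING_RATE)} latent frames")
```

With these assumed values, one latent frame covers 2048 samples, which works out to roughly 21.5 latent frames per second of audio.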
--- a/docs/source/en/api/models/autoencoderkl.md
+++ b/docs/source/en/api/models/autoencoderkl.md
@@ -21,7 +21,7 @@ The abstract from the paper is:
 ## Loading from the original format

 By default the [`AutoencoderKL`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
-from the original format using [`FromOriginalVAEMixin.from_single_file`] as follows:
+from the original format using [`FromOriginalModelMixin.from_single_file`] as follows:

 ```py
 from diffusers import AutoencoderKL
--- a/docs/source/en/api/models/autoencoderkl_allegro.md
+++ b/docs/source/en/api/models/autoencoderkl_allegro.md
@@ -0,0 +1,37 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AutoencoderKLAllegro
+
+The 3D variational autoencoder (VAE) model with KL loss used in [Allegro](https://github.com/rhymes-ai/Allegro) was introduced in [Allegro: Open the Black Box of Commercial-Level Video Generation Model](https://huggingface.co/papers/2410.15458) by RhymesAI.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import AutoencoderKLAllegro
+
+vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+```
+
+## AutoencoderKLAllegro
+
+[[autodoc]] AutoencoderKLAllegro
+    - decode
+    - encode
+    - all
+
+## AutoencoderKLOutput
+
+[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
+
+## DecoderOutput
+
+[[autodoc]] models.autoencoders.vae.DecoderOutput
--- a/docs/source/en/api/models/autoencoderkl_cogvideox.md
+++ b/docs/source/en/api/models/autoencoderkl_cogvideox.md
@@ -0,0 +1,37 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AutoencoderKLCogVideoX
+
+The 3D variational autoencoder (VAE) model with KL loss used in [CogVideoX](https://github.com/THUDM/CogVideo) was introduced in [CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer](https://github.com/THUDM/CogVideo/blob/main/resources/CogVideoX.pdf) by Tsinghua University & ZhipuAI.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import AutoencoderKLCogVideoX
+
+vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX-2b", subfolder="vae", torch_dtype=torch.float16).to("cuda")
+```
+
+## AutoencoderKLCogVideoX
+
+[[autodoc]] AutoencoderKLCogVideoX
+    - decode
+    - encode
+    - all
+
+## AutoencoderKLOutput
+
+[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
+
+## DecoderOutput
+
+[[autodoc]] models.autoencoders.vae.DecoderOutput
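The CogVideoX pipelines default to 49 frames, and the VAE compresses time by 4x while keeping the first frame out of the temporal compression. It also compresses each spatial side by 8x. The sketch below computes the resulting latent shape for one video. The factors (4x temporal, 8x spatial, 16 latent channels) are assumptions here; read the actual values from the VAE config, for example `temporal_compression_ratio` and `latent_channels`.

```python
def cogvideox_latent_shape(
    num_frames: int,
    height: int,
    width: int,
    temporal_ratio: int = 4,    # assumed temporal compression factor
    spatial_ratio: int = 8,     # assumed spatial compression factor per side
    latent_channels: int = 16,  # assumed number of latent channels
) -> tuple[int, int, int, int]:
    """Return (latent_frames, channels, latent_height, latent_width) for one video."""
    # This sketch only accepts 1 + k * temporal_ratio frames, e.g. 49 = 1 + 12 * 4.
    if (num_frames - 1) % temporal_ratio != 0:
        raise ValueError("num_frames should be 1 + a multiple of temporal_ratio")
    # The first frame is not compressed in time; the remaining frames are grouped by temporal_ratio.
    latent_frames = (num_frames - 1) // temporal_ratio + 1
    return latent_frames, latent_channels, height // spatial_ratio, width // spatial_ratio


print(cogvideox_latent_shape(49, 480, 720))
```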
--- a/docs/source/en/api/models/autoencoderkl_ltx_video.md
+++ b/docs/source/en/api/models/autoencoderkl_ltx_video.md
@@ -0,0 +1,37 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AutoencoderKLLTXVideo
+
+The 3D variational autoencoder (VAE) model with KL loss used in [LTX](https://huggingface.co/Lightricks/LTX-Video) was introduced by Lightricks.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import AutoencoderKLLTXVideo
+
+vae = AutoencoderKLLTXVideo.from_pretrained("Lightricks/LTX-Video", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+```
+
+## AutoencoderKLLTXVideo
+
+[[autodoc]] AutoencoderKLLTXVideo
+    - decode
+    - encode
+    - all
+
+## AutoencoderKLOutput
+
+[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
+
+## DecoderOutput
+
+[[autodoc]] models.autoencoders.vae.DecoderOutput
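Lightricks describes the LTX-Video VAE as mapping each 32 x 32 x 8 (height x width x frames) block of RGB pixels to a single 128-channel latent token. Assuming those figures, which are not read from the model config here, the overall compression ratio works out as follows in this back-of-the-envelope sketch:

```python
def compression_ratio(spatial: int, temporal: int, pixel_channels: int, latent_channels: int) -> float:
    """Ratio of input pixel values to latent values for one spatiotemporal block."""
    pixel_values = spatial * spatial * temporal * pixel_channels
    return pixel_values / latent_channels


# Assumed LTX-Video figures: 32x32 spatial, 8x temporal, RGB input, 128 latent channels.
ratio = compression_ratio(spatial=32, temporal=8, pixel_channels=3, latent_channels=128)
print(f"1:{ratio:.0f}")
```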
--- a/docs/source/en/api/models/autoencoderkl_mochi.md
+++ b/docs/source/en/api/models/autoencoderkl_mochi.md
@@ -0,0 +1,32 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# AutoencoderKLMochi
+
+The 3D variational autoencoder (VAE) model with KL loss used in [Mochi](https://github.com/genmoai/models) was introduced in [Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import AutoencoderKLMochi
+
+vae = AutoencoderKLMochi.from_pretrained("genmo/mochi-1-preview", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+```
+
+## AutoencoderKLMochi
+
+[[autodoc]] AutoencoderKLMochi
+    - decode
+    - all
+
+## DecoderOutput
+
+[[autodoc]] models.autoencoders.vae.DecoderOutput
--- a/docs/source/en/api/models/cogvideox_transformer3d.md
+++ b/docs/source/en/api/models/cogvideox_transformer3d.md
@@ -0,0 +1,30 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# CogVideoXTransformer3DModel
+
+A Diffusion Transformer model for 3D data from [CogVideoX](https://github.com/THUDM/CogVideo) was introduced in [CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer](https://github.com/THUDM/CogVideo/blob/main/resources/CogVideoX.pdf) by Tsinghua University & ZhipuAI.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import CogVideoXTransformer3DModel
+
+transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-2b", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
+```
+
+## CogVideoXTransformer3DModel
+
+[[autodoc]] CogVideoXTransformer3DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
--- a/docs/source/en/api/models/cogview3plus_transformer2d.md
+++ b/docs/source/en/api/models/cogview3plus_transformer2d.md
@@ -0,0 +1,30 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# CogView3PlusTransformer2DModel
+
+A Diffusion Transformer model for 2D data from [CogView3Plus](https://github.com/THUDM/CogView3) was introduced in [CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion](https://huggingface.co/papers/2403.05121) by Tsinghua University & ZhipuAI.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import CogView3PlusTransformer2DModel
+
+transformer = CogView3PlusTransformer2DModel.from_pretrained("THUDM/CogView3Plus-3b", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
+```
+
+## CogView3PlusTransformer2DModel
+
+[[autodoc]] CogView3PlusTransformer2DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
--- a/docs/source/en/api/models/cogview4_transformer2d.md
+++ b/docs/source/en/api/models/cogview4_transformer2d.md
@@ -0,0 +1,30 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# CogView4Transformer2DModel
+
+A Diffusion Transformer model for 2D data from CogView4.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import CogView4Transformer2DModel
+
+transformer = CogView4Transformer2DModel.from_pretrained("THUDM/CogView4-6B", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
+```
+
+## CogView4Transformer2DModel
+
+[[autodoc]] CogView4Transformer2DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
--- a/docs/source/en/api/models/consisid_transformer3d.md
+++ b/docs/source/en/api/models/consisid_transformer3d.md
@@ -0,0 +1,30 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# ConsisIDTransformer3DModel
+
+A Diffusion Transformer model for 3D data from [ConsisID](https://github.com/PKU-YuanGroup/ConsisID) was introduced in [Identity-Preserving Text-to-Video Generation by Frequency Decomposition](https://arxiv.org/pdf/2411.17440) by Peking University, the University of Rochester, and others.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import ConsisIDTransformer3DModel
+
+transformer = ConsisIDTransformer3DModel.from_pretrained("BestWishYsh/ConsisID-preview", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
+```
+
+## ConsisIDTransformer3DModel
+
+[[autodoc]] ConsisIDTransformer3DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
--- a/docs/source/en/api/models/consistency_decoder_vae.md
+++ b/docs/source/en/api/models/consistency_decoder_vae.md
@@ -1,6 +1,18 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # Consistency Decoder

-Consistency decoder can be used to decode the latents from the denoising UNet in the [`StableDiffusionPipeline`]. This decoder was introduced in the [DALL-E 3 technical report](https://openai.com/dall-e-3). 
+Consistency decoder can be used to decode the latents from the denoising UNet in the [`StableDiffusionPipeline`]. This decoder was introduced in the [DALL-E 3 technical report](https://openai.com/dall-e-3).

 The original codebase can be found at [openai/consistencydecoder](https://github.com/openai/consistencydecoder).

--- a/docs/source/en/api/models/controlnet.md
+++ b/docs/source/en/api/models/controlnet.md
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->

-# ControlNet
+# ControlNetModel

 The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.

@@ -21,7 +21,7 @@ The abstract from the paper is:
 ## Loading from the original format

 By default the [`ControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
-from the original format using [`FromOriginalControlnetMixin.from_single_file`] as follows:
+from the original format using [`FromOriginalModelMixin.from_single_file`] as follows:

 ```py
 from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
@@ -29,7 +29,7 @@ from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
 url = "https://huggingface.co/lllyasviel/ControlNet-v1-1/blob/main/control_v11p_sd15_canny.pth"  # can also be a local path
 controlnet = ControlNetModel.from_single_file(url)

-url = "https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned.safetensors"  # can also be a local path
+url = "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/v1-5-pruned.safetensors"  # can also be a local path
 pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=controlnet)
 ```

@@ -39,7 +39,7 @@ pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=contro

 ## ControlNetOutput

-[[autodoc]] models.controlnet.ControlNetOutput
+[[autodoc]] models.controlnets.controlnet.ControlNetOutput

 ## FlaxControlNetModel

@@ -47,4 +47,4 @@ pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=contro

 ## FlaxControlNetOutput

-[[autodoc]] models.controlnet_flax.FlaxControlNetOutput
+[[autodoc]] models.controlnets.controlnet_flax.FlaxControlNetOutput
--- a/docs/source/en/api/models/controlnet_flux.md
+++ b/docs/source/en/api/models/controlnet_flux.md
@@ -0,0 +1,45 @@
+<!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# FluxControlNetModel
+
+FluxControlNetModel is an implementation of ControlNet for Flux.1.
+
+The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
+
+The abstract from the paper is:
+
+*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
+
+## Loading
+
+By default the [`FluxControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`].
+
+```py
+from diffusers import FluxControlNetPipeline
+from diffusers.models import FluxControlNetModel, FluxMultiControlNetModel
+
+# Single ControlNet
+controlnet = FluxControlNetModel.from_pretrained("InstantX/FLUX.1-dev-Controlnet-Canny")
+pipe = FluxControlNetPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", controlnet=controlnet)
+
+# One or more ControlNets wrapped in FluxMultiControlNetModel
+controlnet = FluxControlNetModel.from_pretrained("InstantX/FLUX.1-dev-Controlnet-Canny")
+controlnet = FluxMultiControlNetModel([controlnet])
+pipe = FluxControlNetPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", controlnet=controlnet)
+```
+
+## FluxControlNetModel
+
+[[autodoc]] FluxControlNetModel
+
+## FluxControlNetOutput
+
+[[autodoc]] models.controlnets.controlnet_flux.FluxControlNetOutput
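When several ControlNets are wrapped in `FluxMultiControlNetModel`, their per-block residuals are combined before they are added to the transformer. The pure-Python sketch below shows that combination conceptually. It assumes each ControlNet's residuals are scaled by its own conditioning scale and then summed element-wise. Plain lists stand in for tensors, and `combine_residuals` is a hypothetical helper, not a Diffusers API.

```python
from typing import List, Optional

Block = List[float]  # stands in for one block's residual tensor


def combine_residuals(per_controlnet: List[List[Block]], scales: List[float]) -> List[Block]:
    """Hypothetical helper: scale each ControlNet's residuals, then sum them block by block."""
    if not per_controlnet or len(per_controlnet) != len(scales):
        raise ValueError("need at least one ControlNet and exactly one scale per ControlNet")
    combined: Optional[List[Block]] = None
    for blocks, scale in zip(per_controlnet, scales):
        scaled = [[scale * v for v in block] for block in blocks]
        if combined is None:
            combined = scaled
        else:
            # Element-wise sum of matching blocks from different ControlNets.
            combined = [[a + b for a, b in zip(acc, new)] for acc, new in zip(combined, scaled)]
    return combined


canny = [[1.0, 2.0], [0.5, 0.5]]   # two blocks from a hypothetical Canny ControlNet
depth = [[0.0, 4.0], [1.0, -1.0]]  # two blocks from a hypothetical depth ControlNet
print(combine_residuals([canny, depth], scales=[1.0, 0.5]))
```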
--- a/docs/source/en/api/models/controlnet_hunyuandit.md
+++ b/docs/source/en/api/models/controlnet_hunyuandit.md
@@ -0,0 +1,37 @@
+<!--Copyright 2024 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# HunyuanDiT2DControlNetModel
+
+HunyuanDiT2DControlNetModel is an implementation of ControlNet for [Hunyuan-DiT](https://arxiv.org/abs/2405.08748).
+
+ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
+
+With a ControlNet model, you can provide an additional control image to condition and control Hunyuan-DiT generation. For example, if you provide a depth map, the ControlNet model generates an image that preserves the spatial information from the depth map. This gives more flexible and precise control over the image generation process than a text prompt alone.
+
+The abstract from the paper is:
+
+*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
+
+This code was implemented by the Tencent Hunyuan Team. You can find pre-trained checkpoints for Hunyuan-DiT ControlNets on [Tencent Hunyuan](https://huggingface.co/Tencent-Hunyuan).
+
+## Example for loading HunyuanDiT2DControlNetModel
+
+```py
+import torch
+from diffusers import HunyuanDiT2DControlNetModel
+
+controlnet = HunyuanDiT2DControlNetModel.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.1-ControlNet-Diffusers-Pose", torch_dtype=torch.float16)
+```
+
+## HunyuanDiT2DControlNetModel
+
+[[autodoc]] HunyuanDiT2DControlNetModel
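The "zero convolutions" in the ControlNet abstract are what make a freshly initialized ControlNet harmless. A projection whose weights and bias are zero outputs zeros, so adding its output to the frozen model's features changes nothing until training moves the weights. The minimal pure-Python sketch below shows that property, writing a 1x1 convolution as the same linear map applied at every position. The names are illustrative, not Diffusers APIs.

```python
from typing import List, Tuple

Vector = List[float]


def conv1x1(features: List[Vector], weight: List[Vector], bias: Vector) -> List[Vector]:
    """1x1 convolution: apply the same linear map (weight @ x + bias) at every spatial position."""
    return [
        [sum(w * x for w, x in zip(row, feat)) + b for row, b in zip(weight, bias)]
        for feat in features
    ]


def zero_conv1x1(channels: int) -> Tuple[List[Vector], Vector]:
    """Zero-initialized weight and bias, in the spirit of ControlNet's zero convolutions."""
    return [[0.0] * channels for _ in range(channels)], [0.0] * channels


base_features = [[0.3, -1.2], [2.0, 0.5]]     # frozen model features at two positions
control_features = [[5.0, 7.0], [-3.0, 1.0]]  # trainable-copy features at the same positions

weight, bias = zero_conv1x1(channels=2)
residual = conv1x1(control_features, weight, bias)
# Adding the zero residual leaves the frozen model's features unchanged.
output = [[b + r for b, r in zip(bf, rf)] for bf, rf in zip(base_features, residual)]
print(output)
```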
--- a/docs/source/en/api/models/controlnet_sd3.md
+++ b/docs/source/en/api/models/controlnet_sd3.md
@@ -0,0 +1,42 @@
+<!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# SD3ControlNetModel
+
+SD3ControlNetModel is an implementation of ControlNet for Stable Diffusion 3.
+
+The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
+
+The abstract from the paper is:
+
+*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
+
+## Loading
+
+By default the [`SD3ControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`].
+
+```py
+from diffusers import StableDiffusion3ControlNetPipeline
+from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
+
+controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny")
+pipe = StableDiffusion3ControlNetPipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", controlnet=controlnet)
+```
+
+## SD3ControlNetModel
+
+[[autodoc]] SD3ControlNetModel
+
+## SD3ControlNetOutput
+
+[[autodoc]] models.controlnets.controlnet_sd3.SD3ControlNetOutput
+
--- a/docs/source/en/api/models/controlnet_sparsectrl.md
+++ b/docs/source/en/api/models/controlnet_sparsectrl.md
@@ -0,0 +1,46 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# SparseControlNetModel
+
+SparseControlNetModel is an implementation of ControlNet for [AnimateDiff](https://arxiv.org/abs/2307.04725).
+
+ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
+
+The SparseCtrl version of ControlNet was introduced by Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai in [SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models](https://arxiv.org/abs/2311.16933) to enable controlled generation in text-to-video diffusion models.
+
+The abstract from the paper is:
+
+*The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has been significantly advanced in recent years. However, relying solely on text prompts often results in ambiguous frame composition due to spatial uncertainty. The research community thus leverages the dense structure signals, e.g., per-frame depth/edge sequences, to enhance controllability, whose collection accordingly increases the burden of inference. In this work, we present SparseCtrl to enable flexible structure control with temporally sparse signals, requiring only one or a few inputs, as shown in Figure 1. It incorporates an additional condition encoder to process these sparse signals while leaving the pre-trained T2V model untouched. The proposed approach is compatible with various modalities, including sketches, depth maps, and RGB images, providing more practical control for video generation and promoting applications such as storyboarding, depth rendering, keyframe animation, and interpolation. Extensive experiments demonstrate the generalization of SparseCtrl on both original and personalized T2V generators. Codes and models will be publicly available at [this https URL](https://guoyww.github.io/projects/SparseCtrl).*
+
+## Example for loading SparseControlNetModel
+
+```python
+import torch
+from diffusers import SparseControlNetModel
+
+# Load the default fp32 weights and cast them to float16
+# 1. Scribble checkpoint
+controlnet = SparseControlNetModel.from_pretrained("guoyww/animatediff-sparsectrl-scribble", torch_dtype=torch.float16)
+
+# 2. RGB checkpoint
+controlnet = SparseControlNetModel.from_pretrained("guoyww/animatediff-sparsectrl-rgb", torch_dtype=torch.float16)
+
+# To load the fp16 variant instead, pass `variant="fp16"` as an additional argument
+```
+
+## SparseControlNetModel
+
+[[autodoc]] SparseControlNetModel
+
+## SparseControlNetOutput
+
+[[autodoc]] models.controlnets.controlnet_sparsectrl.SparseControlNetOutput
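SparseCtrl conditions only a few frames of a clip. One way to picture its input is a per-frame condition list plus a binary mask that marks which frames carry a signal. The pure-Python sketch below builds both. It assumes keyframe indices are given explicitly, similar in spirit to the `controlnet_frame_indices` argument of the AnimateDiff SparseCtrl pipeline; check the pipeline documentation for its exact behavior. `sparse_conditioning` is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple


def sparse_conditioning(
    num_frames: int, frame_indices: Sequence[int], conditions: Sequence[str]
) -> Tuple[List[Optional[str]], List[int]]:
    """Hypothetical helper: place each condition at its keyframe; return per-frame conditions and a 0/1 mask."""
    if len(frame_indices) != len(conditions):
        raise ValueError("need exactly one condition per frame index")
    per_frame: List[Optional[str]] = [None] * num_frames  # None stands in for an all-zero condition
    mask = [0] * num_frames
    for idx, cond in zip(frame_indices, conditions):
        if not 0 <= idx < num_frames:
            raise IndexError(f"frame index {idx} is outside 0..{num_frames - 1}")
        per_frame[idx] = cond
        mask[idx] = 1  # this frame has a real conditioning signal
    return per_frame, mask


frames, mask = sparse_conditioning(8, frame_indices=[0, 7], conditions=["scribble_a", "scribble_b"])
print(mask)
```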
--- a/docs/source/en/api/models/controlnet_union.md
+++ b/docs/source/en/api/models/controlnet_union.md
@@ -0,0 +1,35 @@
+<!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# ControlNetUnionModel
+
+ControlNetUnionModel is an implementation of ControlNet for Stable Diffusion XL.
+
+The ControlNet model was introduced in [ControlNetPlus](https://github.com/xinsir6/ControlNetPlus) by xinsir6. It supports multiple conditioning inputs without increasing computation.
+
+*We design a new architecture that can support 10+ control types in condition text-to-image generation and can generate high resolution images visually comparable with midjourney. The network is based on the original ControlNet architecture, we propose two new modules to: 1 Extend the original ControlNet to support different image conditions using the same network parameter. 2 Support multiple conditions input without increasing computation offload, which is especially important for designers who want to edit image in detail, different conditions use the same condition encoder, without adding extra computations or parameters.*
+
+## Loading
+
+By default the [`ControlNetUnionModel`] should be loaded with [`~ModelMixin.from_pretrained`].
+
+```py
+from diffusers import StableDiffusionXLControlNetUnionPipeline, ControlNetUnionModel
+
+controlnet = ControlNetUnionModel.from_pretrained("xinsir/controlnet-union-sdxl-1.0")
+pipe = StableDiffusionXLControlNetUnionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet)
+```
+
+## ControlNetUnionModel
+
+[[autodoc]] ControlNetUnionModel
+
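ControlNet Union reuses one network for several condition types, so each request has to indicate which types are active. The sketch below shows that bookkeeping hypothetically. It assumes a fixed list of control-type slots and a multi-hot vector with a 1 for every active slot. The slot names and their order are illustrative only; check the model card and the `ControlNetUnionModel` config for the real mapping.

```python
from typing import List, Sequence

# Illustrative slot names; the real order is defined by the checkpoint, not by this sketch.
CONTROL_SLOTS = ["openpose", "depth", "scribble", "canny", "normal", "segment"]


def control_type_vector(active: Sequence[str], slots: Sequence[str] = CONTROL_SLOTS) -> List[int]:
    """Multi-hot vector with a 1 at every active control type."""
    unknown = set(active) - set(slots)
    if unknown:
        raise ValueError(f"unknown control types: {sorted(unknown)}")
    return [1 if slot in active else 0 for slot in slots]


print(control_type_vector(["depth", "canny"]))
```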
--- a/docs/source/en/api/models/dit_transformer2d.md
+++ b/docs/source/en/api/models/dit_transformer2d.md
@@ -0,0 +1,19 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# DiTTransformer2DModel
+
+A Transformer model for image-like data from [DiT](https://huggingface.co/papers/2212.09748).
+
+## DiTTransformer2DModel
+
+[[autodoc]] DiTTransformer2DModel
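DiT operates on latent patches. The image is first encoded by a VAE (8x downsampling per side in the DiT paper's setup), and then the latent is cut into p x p patches, each of which becomes one token. The short sketch below computes the token count under those assumptions, for example the paper's 256 x 256 images with patch size 2.

```python
def dit_num_tokens(image_size: int, patch_size: int, vae_downsample: int = 8) -> int:
    """Tokens seen by DiT for a square image: (latent_size / patch_size) ** 2."""
    latent_size = image_size // vae_downsample  # assumed 8x VAE downsampling per side
    if latent_size % patch_size != 0:
        raise ValueError("latent size must be divisible by the patch size")
    return (latent_size // patch_size) ** 2


for p in (2, 4, 8):
    print(f"256x256 image, patch {p}: {dit_num_tokens(256, p)} tokens")
```

Smaller patches mean quadratically more tokens, which is why patch size is the main compute knob in DiT.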
--- a/docs/source/en/api/models/flux_transformer.md
+++ b/docs/source/en/api/models/flux_transformer.md
@@ -0,0 +1,19 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# FluxTransformer2DModel
+
+A Transformer model for image-like data from [Flux](https://blackforestlabs.ai/announcing-black-forest-labs/).
+
+## FluxTransformer2DModel
+
+[[autodoc]] FluxTransformer2DModel
--- a/docs/source/en/api/models/hunyuan_transformer2d.md
+++ b/docs/source/en/api/models/hunyuan_transformer2d.md
@@ -0,0 +1,20 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# HunyuanDiT2DModel
+
+A Diffusion Transformer model for 2D data from [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT).
+
+## HunyuanDiT2DModel
+
+[[autodoc]] HunyuanDiT2DModel
+
--- a/docs/source/en/api/models/hunyuan_video_transformer_3d.md
+++ b/docs/source/en/api/models/hunyuan_video_transformer_3d.md
@@ -0,0 +1,30 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# HunyuanVideoTransformer3DModel
+
+A Diffusion Transformer model for 3D video-like data, introduced in [HunyuanVideo: A Systematic Framework For Large Video Generative Models](https://huggingface.co/papers/2412.03603) by Tencent.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import HunyuanVideoTransformer3DModel
+
+transformer = HunyuanVideoTransformer3DModel.from_pretrained("hunyuanvideo-community/HunyuanVideo", subfolder="transformer", torch_dtype=torch.bfloat16)
+```
+
+## HunyuanVideoTransformer3DModel
+
+[[autodoc]] HunyuanVideoTransformer3DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
--- a/docs/source/en/api/models/latte_transformer3d.md
+++ b/docs/source/en/api/models/latte_transformer3d.md
@@ -0,0 +1,19 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# LatteTransformer3DModel
+
+A Diffusion Transformer model for 3D data from [Latte](https://github.com/Vchitect/Latte).
+
+## LatteTransformer3DModel
+
+[[autodoc]] LatteTransformer3DModel
--- a/docs/source/en/api/models/ltx_video_transformer3d.md
+++ b/docs/source/en/api/models/ltx_video_transformer3d.md
@@ -0,0 +1,30 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# LTXVideoTransformer3DModel
+
+A Diffusion Transformer model for 3D data from [LTX](https://huggingface.co/Lightricks/LTX-Video), introduced by Lightricks.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import LTXVideoTransformer3DModel
+
+transformer = LTXVideoTransformer3DModel.from_pretrained("Lightricks/LTX-Video", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
+```
+
+## LTXVideoTransformer3DModel
+
+[[autodoc]] LTXVideoTransformer3DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
--- a/docs/source/en/api/models/lumina2_transformer2d.md
+++ b/docs/source/en/api/models/lumina2_transformer2d.md
@@ -0,0 +1,30 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# Lumina2Transformer2DModel
+
+A Diffusion Transformer model for 2D image data, introduced in [Lumina Image 2.0](https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0) by Alpha-VLLM.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import Lumina2Transformer2DModel
+
+transformer = Lumina2Transformer2DModel.from_pretrained("Alpha-VLLM/Lumina-Image-2.0", subfolder="transformer", torch_dtype=torch.bfloat16)
+```
+
+## Lumina2Transformer2DModel
+
+[[autodoc]] Lumina2Transformer2DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
--- a/docs/source/en/api/models/lumina_nextdit2d.md
+++ b/docs/source/en/api/models/lumina_nextdit2d.md
@@ -0,0 +1,20 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# LuminaNextDiT2DModel
+
+A next-generation Diffusion Transformer (Next-DiT) model for 2D data from [Lumina-T2X](https://github.com/Alpha-VLLM/Lumina-T2X).
+
+## LuminaNextDiT2DModel
+
+[[autodoc]] LuminaNextDiT2DModel
+
--- a/docs/source/en/api/models/mochi_transformer3d.md
+++ b/docs/source/en/api/models/mochi_transformer3d.md
@@ -0,0 +1,30 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# MochiTransformer3DModel
+
+A Diffusion Transformer model for 3D video-like data, introduced in [Mochi-1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import MochiTransformer3DModel
+
+transformer = MochiTransformer3DModel.from_pretrained("genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
+```
+
+## MochiTransformer3DModel
+
+[[autodoc]] MochiTransformer3DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
--- a/docs/source/en/api/models/omnigen_transformer.md
+++ b/docs/source/en/api/models/omnigen_transformer.md
@@ -0,0 +1,30 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# OmniGenTransformer2DModel
+
+A Transformer model that accepts multimodal instructions to generate images for [OmniGen](https://github.com/VectorSpaceLab/OmniGen/).
+
+The abstract from the paper is:
+
+*The emergence of Large Language Models (LLMs) has unified language generation tasks and revolutionized human-machine interaction. However, in the realm of image generation, a unified model capable of handling various tasks within a single framework remains largely unexplored. In this work, we introduce OmniGen, a new diffusion model for unified image generation. OmniGen is characterized by the following features: 1) Unification: OmniGen not only demonstrates text-to-image generation capabilities but also inherently supports various downstream tasks, such as image editing, subject-driven generation, and visual conditional generation. 2) Simplicity: The architecture of OmniGen is highly simplified, eliminating the need for additional plugins. Moreover, compared to existing diffusion models, it is more user-friendly and can complete complex tasks end-to-end through instructions without the need for extra intermediate steps, greatly simplifying the image generation workflow. 3) Knowledge Transfer: Benefit from learning in a unified format, OmniGen effectively transfers knowledge across different tasks, manages unseen tasks and domains, and exhibits novel capabilities. We also explore the model’s reasoning capabilities and potential applications of the chain-of-thought mechanism. This work represents the first attempt at a general-purpose image generation model, and we will release our resources at https://github.com/VectorSpaceLab/OmniGen to foster future advancements.*
+
+```python
+import torch
+from diffusers import OmniGenTransformer2DModel
+
+transformer = OmniGenTransformer2DModel.from_pretrained("Shitao/OmniGen-v1-diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
+```
+
+## OmniGenTransformer2DModel
+
+[[autodoc]] OmniGenTransformer2DModel
--- a/docs/source/en/api/models/pixart_transformer2d.md
+++ b/docs/source/en/api/models/pixart_transformer2d.md
@@ -0,0 +1,19 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# PixArtTransformer2DModel
+
+A Transformer model for image-like data from [PixArt-Alpha](https://huggingface.co/papers/2310.00426) and [PixArt-Sigma](https://huggingface.co/papers/2403.04692).
+
+## PixArtTransformer2DModel
+
+[[autodoc]] PixArtTransformer2DModel
--- a/docs/source/en/api/models/prior_transformer.md
+++ b/docs/source/en/api/models/prior_transformer.md
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 specific language governing permissions and limitations under the License.
 -->

-# Prior Transformer
+# PriorTransformer

 The Prior Transformer was originally introduced in [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://huggingface.co/papers/2204.06125) by Ramesh et al. It is used to predict CLIP image embeddings from CLIP text embeddings; image embeddings are predicted through a denoising diffusion process.

--- a/docs/source/en/api/models/sana_transformer2d.md
+++ b/docs/source/en/api/models/sana_transformer2d.md
@@ -0,0 +1,34 @@
+<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License. -->
+
+# SanaTransformer2DModel
+
+A Diffusion Transformer model for 2D data from [SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers](https://huggingface.co/papers/2410.10629), introduced by NVIDIA and MIT HAN Lab (Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, and Song Han).
+
+The abstract from the paper is:
+
+*We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096×4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU. Core designs include: (1) Deep compression autoencoder: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens. (2) Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality. (3) Decoder-only text encoder: we replaced T5 with modern decoder-only small LLM as the text encoder and designed complex human instruction with in-context learning to enhance the image-text alignment. (4) Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence. As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024×1024 resolution image. Sana enables content creation at low cost. Code and model will be publicly released.*
+
+The model can be loaded with the following code snippet.
+
+```python
+import torch
+from diffusers import SanaTransformer2DModel
+
+transformer = SanaTransformer2DModel.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
+```
+
+## SanaTransformer2DModel
+
+[[autodoc]] SanaTransformer2DModel
+
+## Transformer2DModelOutput
+
+[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
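
The SANA abstract above says the model replaces vanilla attention in DiT with linear attention. The NumPy sketch below illustrates why that helps at high resolution: by reassociating the product as phi(Q)(phi(K)^T V), the N x N attention matrix is never formed. It assumes a ReLU feature map, a single head, and no learned projections, and it is not the diffusers implementation; the reference function computes the same result the quadratic way so the two can be compared.

```python
import numpy as np


def relu_linear_attention(q, k, v, eps=1e-6):
    """Single-head linear attention with a ReLU feature map.

    q, k: (N, d), v: (N, d_v). Cost grows linearly in N because the
    keys and values are summarized once into a (d, d_v) matrix.
    """
    q, k = np.maximum(q, 0.0), np.maximum(k, 0.0)  # phi(x) = ReLU(x)
    kv = k.T @ v              # (d, d_v): sum_j phi(k_j) v_j^T
    k_sum = k.sum(axis=0)     # (d,): sum_j phi(k_j), used to normalize
    num = q @ kv              # (N, d_v)
    den = q @ k_sum           # (N,)
    return num / (den[:, None] + eps)


def relu_quadratic_reference(q, k, v, eps=1e-6):
    """Same output computed with the explicit (N, N) similarity matrix."""
    a = np.maximum(q, 0.0) @ np.maximum(k, 0.0).T  # (N, N)
    return (a @ v) / (a.sum(axis=1, keepdims=True) + eps)


rng = np.random.default_rng(0)
n, d = 1024, 32
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = relu_linear_attention(q, k, v)
print(out.shape)  # (1024, 32)
print(np.allclose(out, relu_quadratic_reference(q, k, v)))  # True
```

The linear version only ever holds (N, d)-sized and (d, d_v)-sized arrays, while the reference materializes an (N, N) matrix, which is the part that grows quickly as resolution, and hence N, increases.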
--- a/docs/source/en/api/models/sd3_transformer2d.md
+++ b/docs/source/en/api/models/sd3_transformer2d.md
@@ -0,0 +1,19 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# SD3 Transformer Model
+
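+The Transformer model introduced in [Stable Diffusion 3](https://hf.co/papers/2403.03206). Its main novelty is the MMDiT (Multimodal Diffusion Transformer) block, which processes text and image tokens with separate weights but combines them in a single joint attention operation.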
+
+## SD3Transformer2DModel
+
+[[autodoc]] SD3Transformer2DModel
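
The SD3 page above credits the MMDiT block as the model's novelty. In the SD3 paper, text and image tokens keep separate weights but are joined for the attention operation. The NumPy sketch below shows only that joint-attention idea; it assumes a single head and omits the adaptive layer-norm modulation, MLPs, and residual connections of the real block, so it should not be read as the `SD3Transformer2DModel` implementation. `joint_attention` and `make_weights` are hypothetical helpers.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_attention(img, txt, w_img, w_txt):
    """Single-head joint attention over image and text tokens.

    img: (N_img, d), txt: (N_txt, d). w_img / w_txt map "q", "k", "v", "o"
    to (d, d) matrices, i.e. each modality has its own projection weights.
    """
    # Modality-specific projections, then one concatenated sequence.
    q = np.concatenate([img @ w_img["q"], txt @ w_txt["q"]])
    k = np.concatenate([img @ w_img["k"], txt @ w_txt["k"]])
    v = np.concatenate([img @ w_img["v"], txt @ w_txt["v"]])
    # A single attention op: every token attends to image and text tokens alike.
    out = softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v
    # Split the sequence back and apply each modality's output projection.
    n_img = img.shape[0]
    return out[:n_img] @ w_img["o"], out[n_img:] @ w_txt["o"]


def make_weights(rng, d):
    return {name: rng.standard_normal((d, d)) / np.sqrt(d) for name in ("q", "k", "v", "o")}


rng = np.random.default_rng(0)
d = 8
w_img, w_txt = make_weights(rng, d), make_weights(rng, d)
img = rng.standard_normal((16, d))
txt = rng.standard_normal((5, d))
img_out, txt_out = joint_attention(img, txt, w_img, w_txt)
print(img_out.shape, txt_out.shape)  # (16, 8) (5, 8)
```

Perturbing only `txt` changes `img_out`: information flows in both directions within one attention call, rather than through a separate cross-attention layer where only the image stream reads from the text.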
--- a/docs/source/en/api/models/stable_audio_transformer.md
+++ b/docs/source/en/api/models/stable_audio_transformer.md
@@ -0,0 +1,19 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# StableAudioDiTModel
+
+A Transformer model for audio waveforms from [Stable Audio Open](https://huggingface.co/papers/2407.14358).
+
+## StableAudioDiTModel
+
+[[autodoc]] StableAudioDiTModel
--- a/docs/source/en/api/models/stable_cascade_unet.md
+++ b/docs/source/en/api/models/stable_cascade_unet.md
@@ -0,0 +1,19 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# StableCascadeUNet
+
+A UNet model from the [Stable Cascade pipeline](../pipelines/stable_cascade.md).
+
+## StableCascadeUNet
+
+[[autodoc]] models.unets.unet_stable_cascade.StableCascadeUNet