Mirror of https://github.com/huggingface/diffusers.git (synced 2025-12-07 04:54:47 +08:00)

Compare commits: fix-schedu... against v0.23.0 (1 commit, fbb8b34716)

@@ -16,7 +16,7 @@ limitations under the License.

# Generating the documentation

To generate the documentation, you first have to build it. Several packages are necessary to build the doc;
you can install them with the following command, at the root of the code repository:

```bash

@@ -142,7 +142,7 @@ This will include every public method of the pipeline that is documented, as wel

  - __call__
  - enable_attention_slicing
  - disable_attention_slicing
  - enable_xformers_memory_efficient_attention
  - disable_xformers_memory_efficient_attention
```

@@ -154,7 +154,7 @@ Values that should be put in `code` should either be surrounded by backticks: \`

and objects like True, None, or any strings should usually be put in `code`.

When mentioning a class, function, or method, it is recommended to use our syntax for internal links so that our tool
adds a link to its documentation with this syntax: \[\`XXXClass\`\] or \[\`function\`\]. This requires the class or
function to be in the main package.

If you want to create a link to some internal class or function, you need to

@@ -38,7 +38,7 @@ Here, `LANG-ID` should be one of the ISO 639-1 or ISO 639-2 language codes -- se

The fun part comes - translating the text!

The first thing we recommend is translating the part of the `_toctree.yml` file that corresponds to your doc chapter. This file is used to render the table of contents on the website.

> 🙋 If the `_toctree.yml` file doesn't yet exist for your language, you can create one by copy-pasting from the English version and deleting the sections unrelated to your chapter. Just make sure it exists in the `docs/source/LANG-ID/` directory!

@@ -135,7 +135,7 @@
  - local: optimization/memory
    title: Reduce memory usage
  - local: optimization/torch2.0
    title: PyTorch 2.0
  - local: optimization/xformers
    title: xFormers
  - local: optimization/tome

@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.

## Preamble

[Diffusers](https://huggingface.co/docs/diffusers/index) provides pre-trained diffusion models and serves as a modular toolbox for inference and training.

Given its real-world applications and potential negative impacts on society, we think it is important to provide the project with ethical guidelines to guide the development, users’ contributions, and usage of the Diffusers library.

@@ -46,7 +46,7 @@ The following ethical guidelines apply generally, but we will primarily implemen

## Examples of implementations: Safety features and Mechanisms

The team works daily to make the technical and non-technical tools available to deal with the potential ethical and social risks associated with diffusion technology. Moreover, the community's input is invaluable in ensuring these features' implementation and raising awareness with us.

- [**Community tab**](https://huggingface.co/docs/hub/repositories-pull-requests-discussions): it enables the community to discuss and better collaborate on a project.

@@ -60,4 +60,4 @@ The team works daily to make the technical and non-technical tools available to

- **Staged releases on the Hub**: in particularly sensitive situations, access to some repositories should be restricted. This staged release is an intermediary step that allows the repository’s authors to have more control over its use.

- **Licensing**: [OpenRAILs](https://huggingface.co/blog/open_rail), a new type of licensing, allows us to ensure free access while having a set of restrictions that ensure more responsible use.

@@ -12,9 +12,9 @@ specific language governing permissions and limitations under the License.

# Evaluating Diffusion Models

<a target="_blank" href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/evaluation.ipynb">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Evaluation of generative models like [Stable Diffusion](https://huggingface.co/docs/diffusers/stable_diffusion) is subjective in nature. But as practitioners and researchers, we often have to make careful choices amongst many different possibilities. So, when working with different generative models (like GANs, Diffusion, etc.), how do we choose one over the other?

@@ -23,7 +23,7 @@ However, quantitative metrics don't necessarily correspond to image quality. So,
of both qualitative and quantitative evaluations provides a stronger signal when choosing one model
over the other.

In this document, we provide a non-exhaustive overview of qualitative and quantitative methods to evaluate Diffusion models. For quantitative methods, we specifically focus on how to implement them alongside `diffusers`.

The methods shown in this document can also be used to evaluate different [noise schedulers](https://huggingface.co/docs/diffusers/main/en/api/schedulers/overview) keeping the underlying generation model fixed.

@@ -38,9 +38,9 @@ We cover Diffusion models with the following pipelines:

## Qualitative Evaluation

Qualitative evaluation typically involves human assessment of generated images. Quality is measured across aspects such as compositionality, image-text alignment, and spatial relations. Common prompts provide a degree of uniformity for subjective metrics.
DrawBench and PartiPrompts are prompt datasets used for qualitative benchmarking. DrawBench and PartiPrompts were introduced by [Imagen](https://imagen.research.google/) and [Parti](https://parti.research.google/) respectively.

From the [official Parti website](https://parti.research.google/):

> PartiPrompts (P2) is a rich set of over 1600 prompts in English that we release as part of this work. P2 can be used to measure model capabilities across various categories and challenge aspects.

@@ -52,13 +52,13 @@ PartiPrompts has the following columns:
- Category of the prompt (such as “Abstract”, “World Knowledge”, etc.)
- Challenge reflecting the difficulty (such as “Basic”, “Complex”, “Writing & Symbols”, etc.)

These benchmarks allow for side-by-side human evaluation of different image generation models.

For this, the 🧨 Diffusers team has built **Open Parti Prompts**, which is a community-driven qualitative benchmark based on Parti Prompts to compare state-of-the-art open-source diffusion models:
- [Open Parti Prompts Game](https://huggingface.co/spaces/OpenGenAI/open-parti-prompts): For 10 parti prompts, 4 generated images are shown and the user selects the image that suits the prompt best.
- [Open Parti Prompts Leaderboard](https://huggingface.co/spaces/OpenGenAI/parti-prompts-leaderboard): The leaderboard comparing the currently best open-sourced diffusion models to each other.

To manually compare images, let’s see how we can use `diffusers` on a couple of PartiPrompts.

Below we show some prompts sampled across different challenges: Basic, Complex, Linguistic Structures, Imagination, and Writing & Symbols. Here we are using PartiPrompts as a [dataset](https://huggingface.co/datasets/nateraw/parti-prompts).
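
A minimal sketch of sampling a handful of prompts with 🤗 Datasets (the `Prompt` column name follows the dataset card; the seed and sample size are only illustrative):

```python
from datasets import load_dataset

# Load PartiPrompts and draw a small random sample of prompts
parti_prompts = load_dataset("nateraw/parti-prompts", split="train")
sample = parti_prompts.shuffle(seed=0).select(range(5))
sample_prompts = sample["Prompt"]
```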

@@ -92,16 +92,16 @@ images = sd_pipeline(sample_prompts, num_images_per_prompt=1, generator=generato

![parti-prompts-14](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts-14.png)

We can also set `num_images_per_prompt` accordingly to compare different images for the same prompt.
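
For instance, a rough sketch (reusing the `sd_pipeline`, `sample_prompts`, and `generator` defined above; the value 4 is just an example):

```python
# Generate four candidate images per prompt so they can be compared side by side
images = sd_pipeline(
    sample_prompts,
    num_images_per_prompt=4,
    generator=generator,
    output_type="np",
).images
```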

Running the same pipeline but with a different checkpoint ([v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)) yields:

![parti-prompts-15](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts-15.png)

Once several images are generated from all the prompts using multiple models (under evaluation), these results are presented to human evaluators for scoring. For
more details on the DrawBench and PartiPrompts benchmarks, refer to their respective papers.

<Tip>

It is useful to look at some inference samples while a model is training to measure the
training progress. In our [training scripts](https://github.com/huggingface/diffusers/tree/main/examples/), we support this utility with additional support for
logging to TensorBoard and Weights & Biases.

@@ -177,7 +177,7 @@ generator = torch.manual_seed(seed)

images = sd_pipeline(prompts, num_images_per_prompt=1, generator=generator, output_type="np").images
```

Then we load the [v1-5 checkpoint](https://huggingface.co/runwayml/stable-diffusion-v1-5) to generate images:

```python
model_ckpt_1_5 = "runwayml/stable-diffusion-v1-5"

@@ -205,7 +205,7 @@ It seems like the [v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)

By construction, there are some limitations in this score. The captions in the training dataset
were crawled from the web and extracted from `alt` and similar tags associated with an image on the internet.
They are not necessarily representative of what a human being would use to describe an image. Hence we
had to "engineer" some prompts here.

</Tip>

@@ -551,15 +551,15 @@ FID results tend to be fragile as they depend on a lot of factors:

* The implementation accuracy of the computation.
* The image format (not the same if we start from PNGs vs JPGs).

Keeping that in mind, FID is often most useful when comparing similar runs, but it is
hard to reproduce paper results unless the authors carefully disclose the FID
measurement code.

These points apply to other related metrics too, such as KID and IS.

</Tip>

As a final step, let's visually inspect the `fake_images`.

<p align="center">
    <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/fake-images.png" alt="fake-images"><br>

@@ -27,18 +27,18 @@ In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefor

## Simple over easy

As PyTorch states, **explicit is better than implicit** and **simple is better than complex**. This design philosophy is reflected in multiple parts of the library:
- We follow PyTorch's API with methods like [`DiffusionPipeline.to`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.to) to let the user handle device management.
- Raising concise error messages is preferred to silently correcting erroneous input. Diffusers aims at teaching the user, rather than making the library as easy to use as possible.
- Complex model vs. scheduler logic is exposed instead of magically handled inside. Schedulers/Samplers are separated from diffusion models with minimal dependencies on each other. This forces the user to write the unrolled denoising loop (a minimal sketch follows this list). However, the separation allows for easier debugging and gives the user more control over adapting the denoising process or switching out diffusion models or schedulers.
- Separately trained components of the diffusion pipeline, *e.g.* the text encoder, the unet, and the variational autoencoder, each have their own model class. This forces the user to handle the interaction between the different model components, and the serialization format separates the model components into different files. However, this allows for easier debugging and customization. DreamBooth or Textual Inversion training
is very simple thanks to Diffusers' ability to separate single components of the diffusion pipeline.
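
As a rough sketch of what that unrolled loop looks like (the model and scheduler choices here are just examples):

```python
import torch
from diffusers import DDPMScheduler, UNet2DModel

# Load a small unconditional model and its matching scheduler
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
model = UNet2DModel.from_pretrained("google/ddpm-cat-256")
scheduler.set_timesteps(50)

# Start from pure noise and denoise step by step
sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = model(sample, t).sample
    sample = scheduler.step(noise_pred, t, sample).prev_sample
```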

## Tweakable, contributor-friendly over abstraction

For large parts of the library, Diffusers adopts an important design principle of the [Transformers library](https://github.com/huggingface/transformers), which is to prefer copy-pasted code over hasty abstractions. This design principle is very opinionated and stands in stark contrast to popular design principles such as [Don't repeat yourself (DRY)](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself).
In short, just like Transformers does for modeling files, Diffusers prefers to keep an extremely low level of abstraction and very self-contained code for pipelines and schedulers.
Functions, long code blocks, and even classes can be copied across multiple files, which at first can look like a bad, sloppy design choice that makes the library unmaintainable.
**However**, this design has proven to be extremely successful for Transformers and makes a lot of sense for community-driven, open-source machine learning libraries because:
- Machine Learning is an extremely fast-moving field in which paradigms, model architectures, and algorithms are changing rapidly, which therefore makes it very difficult to define long-lasting code abstractions.
- Machine Learning practitioners like to be able to quickly tweak existing code for ideation and research and therefore prefer self-contained code over one that contains many abstractions.

@@ -47,10 +47,10 @@ Functions, long code blocks, and even classes can be copied across multiple file

At Hugging Face, we call this design the **single-file policy**, which means that almost all of the code of a certain class should be written in a single, self-contained file. To read more about the philosophy, you can have a look
at [this blog post](https://huggingface.co/blog/transformers-design-philosophy).

In Diffusers, we follow this philosophy for both pipelines and schedulers, but only partly for diffusion models. The reason we don't follow this design fully for diffusion models is because almost all diffusion pipelines, such
as [DDPM](https://huggingface.co/docs/diffusers/api/pipelines/ddpm), [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview#stable-diffusion-pipelines), [unCLIP (DALL·E 2)](https://huggingface.co/docs/diffusers/api/pipelines/unclip) and [Imagen](https://imagen.research.google/), all rely on the same diffusion model, the [UNet](https://huggingface.co/docs/diffusers/api/models/unet2d-cond).

Great, now you should have generally understood why 🧨 Diffusers is designed the way it is 🤗.
We try to apply these design principles consistently across the library. Nevertheless, there are some minor exceptions to the philosophy or some unlucky design choices. If you have feedback regarding the design, we would ❤️ to hear it [directly on GitHub](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).

## Design Philosophy in Details

@@ -89,7 +89,7 @@ The following design principles are followed:
- Models should by default have the highest precision and lowest performance setting.
- To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
- Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
- The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and
readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).

### Schedulers

@@ -97,9 +97,9 @@ readable long-term, such as [UNet blocks](https://github.com/huggingface/diffuse

Schedulers are responsible for guiding the denoising process for inference as well as for defining a noise schedule for training. They are designed as individual classes with loadable configuration files and strongly follow the **single-file policy**.

The following design principles are followed:
- All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
- Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained.
- One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper).
- If schedulers share similar functionalities, we can make use of the `#Copied from` mechanism.
- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](../using-diffusers/schedulers.md) and sketched below.
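
A rough illustration of that swap (the checkpoint and scheduler choice are only examples):

```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
# Swap in a different scheduler while reusing the existing scheduler's configuration
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
```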

@@ -31,7 +31,7 @@ Thankfully, Apple engineers developed [a conversion tool](https://github.com/app

Before you convert a model, though, take a moment to explore the Hugging Face Hub – chances are the model you're interested in is already available in Core ML format:

- the [Apple](https://huggingface.co/apple) organization includes Stable Diffusion versions 1.4, 1.5, 2.0 base, and 2.1 base
- the [coreml community](https://huggingface.co/coreml-community) includes custom finetuned models
- use this [filter](https://huggingface.co/models?pipeline_tag=text-to-image&library=coreml&p=2&sort=likes) to return all available Core ML checkpoints

If you can't find the model you're interested in, we recommend you follow the instructions for [Converting Models to Core ML](https://github.com/apple/ml-stable-diffusion#-converting-models-to-core-ml) by Apple.

@@ -90,6 +90,7 @@ snapshot_download(repo_id, allow_patterns=f"{variant}/*", local_dir=model_path,
print(f"Model downloaded at {model_path}")
```

### Inference[[python-inference]]

Once you have downloaded a snapshot of the model, you can test it using Apple's Python script.

@@ -98,7 +99,7 @@ Once you have downloaded a snapshot of the model, you can test it using Apple's
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" -i models/coreml-stable-diffusion-v1-4_original_packages -o </path/to/output/image> --compute-unit CPU_AND_GPU --seed 93
```

Pass the path of the downloaded checkpoint with the `-i` flag to the script. `--compute-unit` indicates the hardware you want to allow for inference. It must be one of the following options: `ALL`, `CPU_AND_GPU`, `CPU_ONLY`, `CPU_AND_NE`. You may also provide an optional output path, and a seed for reproducibility.

The inference script assumes you're using the original version of the Stable Diffusion model, `CompVis/stable-diffusion-v1-4`. If you use another model, you *have* to specify its Hub id in the inference command line, using the `--model-version` option. This works for models already supported and custom models you trained or fine-tuned yourself.

@@ -108,6 +109,7 @@ For example, if you want to use [`runwayml/stable-diffusion-v1-5`](https://huggi
python -m python_coreml_stable_diffusion.pipeline --prompt "a photo of an astronaut riding a horse on mars" --compute-unit ALL -o output --seed 93 -i models/coreml-stable-diffusion-v1-5_original_packages --model-version runwayml/stable-diffusion-v1-5
```

## Core ML inference in Swift

Running inference in Swift is slightly faster than in Python because the models are already compiled in the `mlmodelc` format. This is noticeable on app startup when the model is loaded but shouldn’t be noticeable if you run several generations afterward.

@@ -147,6 +149,7 @@ You have to specify in `--resource-path` one of the checkpoints downloaded in th

For more details, please refer to the [instructions in Apple's repo](https://github.com/apple/ml-stable-diffusion).

## Supported Diffusers Features

The Core ML models and inference code don't support many of the features, options, and flexibility of 🧨 Diffusers. These are some of the limitations to keep in mind:

@@ -155,10 +158,10 @@ The Core ML models and inference code don't support many of the features, option

- Only two schedulers have been ported to Swift, the default one used by Stable Diffusion and `DPMSolverMultistepScheduler`, which we ported to Swift from our `diffusers` implementation. We recommend you use `DPMSolverMultistepScheduler`, since it produces the same quality in about half the steps.
- Negative prompts, classifier-free guidance scale, and image-to-image tasks are available in the inference code. Advanced features such as depth guidance, ControlNet, and latent upscalers are not available yet.

Apple's [conversion and inference repo](https://github.com/apple/ml-stable-diffusion) and our own [swift-coreml-diffusers](https://github.com/huggingface/swift-coreml-diffusers) repos are intended as technology demonstrators to enable other developers to build upon.

If you feel strongly about any missing features, please feel free to open a feature request or, better yet, a contribution PR 🙂.

## Native Diffusers Swift app

One easy way to run Stable Diffusion on your own Apple hardware is to use [our open-source Swift repo](https://github.com/huggingface/swift-coreml-diffusers), based on `diffusers` and Apple's conversion and inference repo. You can study the code, compile it with [Xcode](https://developer.apple.com/xcode/) and adapt it for your own needs. For your convenience, there's also a [standalone Mac app in the App Store](https://apps.apple.com/app/diffusers/id1666309574), so you can play with it without having to deal with the code or IDE. If you are a developer and have determined that Core ML is the best solution to build your Stable Diffusion app, then you can use the rest of this guide to get started with your project. We can't wait to see what you'll build 🙂.

@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

# Speed up inference

There are several ways to optimize 🤗 Diffusers for inference speed. As a general rule of thumb, we recommend using either [xFormers](xformers) or `torch.nn.functional.scaled_dot_product_attention` in PyTorch 2.0 for their memory-efficient attention.
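
As a quick sketch of what that looks like in practice (assuming xFormers is installed; on PyTorch 2.0+ the scaled dot-product attention path is used automatically):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")

# Enable xFormers' memory-efficient attention kernels
pipe.enable_xformers_memory_efficient_attention()
```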

<Tip>

@@ -64,5 +64,5 @@ image = pipe(prompt).images[0]

<Tip warning={true}>

Don't use [`torch.autocast`](https://pytorch.org/docs/stable/amp.html#torch.autocast) in any of the pipelines as it can lead to black images and is always slower than pure float16 precision.

</Tip>

@@ -55,7 +55,8 @@ outputs = pipeline(
)
```

For more information, check out 🤗 Optimum Habana's [documentation](https://huggingface.co/docs/optimum/habana/usage_guides/stable_diffusion) and the [example](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion) provided in the official GitHub repository.

## Benchmark

@@ -1,15 +1,3 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Reduce memory usage

A barrier to using diffusion models is the large amount of memory required. To overcome this challenge, there are several memory-reducing techniques you can use to run even some of the largest models on free-tier or consumer GPUs. Some of these techniques can even be combined to further reduce memory usage.

@@ -30,9 +18,10 @@ The results below are obtained from generating a single 512x512 image from the p
| traced UNet | 3.21s | x2.96 |
| memory-efficient attention | 2.63s | x3.61 |

## Sliced VAE

Sliced VAE enables decoding large batches of images with limited VRAM or batches with 32 images or more by decoding the batches of latents one image at a time. You'll likely want to couple this with [`~ModelMixin.enable_xformers_memory_efficient_attention`] to reduce memory use further if you have xFormers installed.

To use sliced VAE, call [`~StableDiffusionPipeline.enable_vae_slicing`] on your pipeline before inference:

@@ -49,7 +38,6 @@ pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_vae_slicing()
# pipe.enable_xformers_memory_efficient_attention()
images = pipe([prompt] * 32).images
```

@@ -57,7 +45,7 @@ You may see a small performance boost in VAE decoding on multi-image batches, an

## Tiled VAE

Tiled VAE processing also enables working with large images on limited VRAM (for example, generating 4k images on 8GB of VRAM) by splitting the image into overlapping tiles, decoding the tiles, and then blending the outputs together to compose the final image. You should also use tiled VAE with [`~ModelMixin.enable_xformers_memory_efficient_attention`] to reduce memory use further if you have xFormers installed.

To use tiled VAE processing, call [`~StableDiffusionPipeline.enable_vae_tiling`] on your pipeline before inference:

@@ -74,7 +62,7 @@ pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "a beautiful landscape photograph"
pipe.enable_vae_tiling()
# pipe.enable_xformers_memory_efficient_attention()

image = pipe([prompt], width=3840, height=2224, num_inference_steps=20).images[0]
```

@@ -110,6 +98,24 @@ Consider using [model offloading](#model-offloading) if you want to optimize for

</Tip>

CPU offloading can also be chained with attention slicing to reduce memory consumption to less than 2GB.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
)

prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_sequential_cpu_offload()

image = pipe(prompt).images[0]
```

<Tip warning={true}>

When using [`~StableDiffusionPipeline.enable_sequential_cpu_offload`], don't move the pipeline to CUDA beforehand or else the gain in memory consumption will only be minimal (see this [issue](https://github.com/huggingface/diffusers/issues/1934) for more information).

@@ -140,7 +146,7 @@ import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
)

@@ -150,9 +156,28 @@ pipe.enable_model_cpu_offload()
image = pipe(prompt).images[0]
```

Model offloading can also be combined with attention slicing for additional memory savings.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
)

prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_model_cpu_offload()

image = pipe(prompt).images[0]
```

<Tip warning={true}>

In order to properly offload models after they're called, it is required to run the entire pipeline so that models are called in the pipeline's expected order. Exercise caution if models are reused outside the context of the pipeline after hooks have been installed. See [Removing Hooks](https://huggingface.co/docs/accelerate/en/package_reference/big_modeling#accelerate.hooks.remove_hook_from_module) for more information.

[`~StableDiffusionPipeline.enable_model_cpu_offload`] is a stateful operation that installs hooks on the models and state on the pipeline.

@@ -278,7 +303,7 @@ unet_traced = torch.jit.load("unet_traced.pt")

class TracedUNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.in_channels = pipe.unet.config.in_channels
        self.device = pipe.unet.device

    def forward(self, latent_model_input, t, encoder_hidden_states):

@@ -294,7 +319,7 @@ with torch.inference_mode():

## Memory-efficient attention

Recent work on optimizing bandwidth in the attention block has generated huge speed-ups and reductions in GPU memory usage. The most recent type of memory-efficient attention is [Flash Attention](https://arxiv.org/abs/2205.14135) (you can check out the original code at [HazyResearch/flash-attention](https://github.com/HazyResearch/flash-attention)).

<Tip>

@@ -329,4 +354,4 @@ with torch.inference_mode():

# pipe.disable_xformers_memory_efficient_attention()
```

The iteration speed when using `xformers` should match the iteration speed of PyTorch 2.0 as described [here](torch2.0).

@@ -31,8 +31,6 @@ pipe = pipe.to("mps")
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image
```

<Tip warning={true}>

@@ -50,10 +48,10 @@ If you're using **PyTorch 1.13**, you need to "prime" the pipeline with an addit
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"
# First-time "warmup" pass if PyTorch version is 1.13
+ _ = pipe(prompt, num_inference_steps=1)

# Results match those from the CPU device after the warmup pass.
image = pipe(prompt).images[0]
```

@@ -65,7 +63,6 @@ To prevent this from happening, we recommend *attention slicing* to reduce memor

```py
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True).to("mps")
pipeline.enable_attention_slicing()

@@ -10,12 +10,13 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->

# ONNX Runtime

🤗 [Optimum](https://github.com/huggingface/optimum) provides a Stable Diffusion pipeline compatible with ONNX Runtime. You'll need to install 🤗 Optimum with the following command for ONNX Runtime support:

```bash
pip install -q optimum["onnxruntime"]
```

This guide will show you how to use the Stable Diffusion and Stable Diffusion XL (SDXL) pipelines with ONNX Runtime.

@@ -49,7 +50,7 @@ optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/

Then to perform inference (you don't have to specify `export=True` again):

```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

model_id = "sd_v15_onnx"

@@ -10,13 +10,14 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->

# OpenVINO

🤗 [Optimum](https://github.com/huggingface/optimum-intel) provides Stable Diffusion pipelines compatible with OpenVINO to perform inference on a variety of Intel processors (see the [full list](https://docs.openvino.ai/latest/openvino_docs_OV_UG_supported_plugins_Supported_Devices.html) of supported devices).

You'll need to install 🤗 Optimum Intel with the `--upgrade-strategy eager` option to ensure [`optimum-intel`](https://github.com/huggingface/optimum-intel) is using the latest version:

```bash
pip install --upgrade-strategy eager optimum["openvino"]
```
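
A minimal sketch of what inference then looks like (the model id and prompt are just examples; `export=True` converts the PyTorch weights to the OpenVINO format on the fly):

```python
from optimum.intel import OVStableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipeline = OVStableDiffusionPipeline.from_pretrained(model_id, export=True)
image = pipeline("a photo of an astronaut riding a horse on mars").images[0]
```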

@@ -14,25 +14,18 @@ specific language governing permissions and limitations under the License.

[Token merging](https://huggingface.co/papers/2303.17604) (ToMe) merges redundant tokens/patches progressively in the forward pass of a Transformer-based network, which can speed up the inference latency of [`StableDiffusionPipeline`].

Install ToMe from `pip`:

```bash
pip install tomesd
```

You can use ToMe from the [`tomesd`](https://github.com/dbolya/tomesd) library with the [`apply_patch`](https://github.com/dbolya/tomesd?tab=readme-ov-file#usage) function:

```diff
  from diffusers import StableDiffusionPipeline
  import torch
  import tomesd

  pipeline = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True,
  ).to("cuda")
+ tomesd.apply_patch(pipeline, ratio=0.5)

  image = pipeline("a photo of an astronaut riding a horse on mars").images[0]
```

The `apply_patch` function exposes a number of [arguments](https://github.com/dbolya/tomesd#usage) to help strike a balance between pipeline inference speed and the quality of the generated images. The most important argument is `ratio`, which controls the number of tokens that are merged during the forward pass.
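
For example, a rough sketch of a more aggressive configuration (the argument names follow the `tomesd` README; the exact values are only illustrative):

```python
import tomesd

# Merge a larger fraction of tokens and only patch the highest-resolution layers
tomesd.apply_patch(pipeline, ratio=0.6, max_downsample=1, sx=2, sy=2)
```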

@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->

# PyTorch 2.0

🤗 Diffusers supports the latest optimizations from [PyTorch 2.0](https://pytorch.org/get-started/pytorch-2.0/) which include:

@@ -48,6 +48,7 @@ In some cases - such as making the pipeline more deterministic or converting it

```diff
  import torch
  from diffusers import DiffusionPipeline
  from diffusers.models.attention_processor import AttnProcessor

  pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True).to("cuda")
+ pipe.unet.set_default_attn_processor()

@@ -109,14 +110,17 @@ for _ in range(3):

### Stable Diffusion image-to-image

```python
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image
import torch

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

init_image = load_image(url)
init_image = init_image.resize((512, 512))

path = "runwayml/stable-diffusion-v1-5"

@@ -139,16 +143,25 @@ for _ in range(3):

### Stable Diffusion inpainting

```python
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image
import torch

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))

path = "runwayml/stable-diffusion-inpainting"

@@ -170,14 +183,17 @@ for _ in range(3):

### ControlNet

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
import torch

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

init_image = load_image(url)
init_image = init_image.resize((512, 512))

path = "runwayml/stable-diffusion-v1-5"

@@ -205,26 +221,26 @@ for _ in range(3):

### DeepFloyd IF text-to-image + upscaling

```python
from diffusers import DiffusionPipeline
import torch

run_compile = True  # Set True / False

pipe_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-M-v1.0", variant="fp16", text_encoder=None, torch_dtype=torch.float16, use_safetensors=True)
pipe_1.to("cuda")
pipe_2 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-II-M-v1.0", variant="fp16", text_encoder=None, torch_dtype=torch.float16, use_safetensors=True)
pipe_2.to("cuda")
pipe_3 = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16, use_safetensors=True)
pipe_3.to("cuda")

pipe_1.unet.to(memory_format=torch.channels_last)
pipe_2.unet.to(memory_format=torch.channels_last)
pipe_3.unet.to(memory_format=torch.channels_last)

if run_compile:
    pipe_1.unet = torch.compile(pipe_1.unet, mode="reduce-overhead", fullgraph=True)
    pipe_2.unet = torch.compile(pipe_2.unet, mode="reduce-overhead", fullgraph=True)
    pipe_3.unet = torch.compile(pipe_3.unet, mode="reduce-overhead", fullgraph=True)

@@ -234,9 +250,9 @@ prompt_embeds = torch.randn((1, 2, 4096), dtype=torch.float16)
neg_prompt_embeds = torch.randn((1, 2, 4096), dtype=torch.float16)

for _ in range(3):
    image_1 = pipe_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=neg_prompt_embeds, output_type="pt").images
    image_2 = pipe_2(image=image_1, prompt_embeds=prompt_embeds, negative_prompt_embeds=neg_prompt_embeds, output_type="pt").images
    image_3 = pipe_3(prompt=prompt, image=image_1, noise_level=100).images
```
</details>

@@ -410,9 +426,9 @@ In the following tables, we report our findings in terms of the *number of itera
| IF | 9.26 | 9.2 | ❌ | 13.31 |
| SDXL - txt2img | 0.52 | 0.53 | - | - |

## Notes

* Follow this [PR](https://github.com/huggingface/diffusers/pull/3313) for more details on the environment used for conducting the benchmarks.
* For the DeepFloyd IF pipeline where batch sizes > 1, we only used a batch size of > 1 in the first IF pipeline for text-to-image generation and NOT for upscaling. That means the two upscaling pipelines received a batch size of 1.

*Thanks to [Horace He](https://github.com/Chillee) from the PyTorch team for their support in improving our support of `torch.compile()` in Diffusers.*

@@ -257,7 +257,7 @@ To predict a slightly less noisy image, pass the following to the scheduler's [`
torch.Size([1, 3, 256, 256])
```

The `less_noisy_sample` can be passed to the next `timestep` where it'll get even less noisy! Let's bring it all together now and visualize the entire denoising process.

First, create a function that postprocesses and displays the denoised image as a `PIL.Image`:
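
A sketch of such a helper (the bare `display` call assumes a notebook environment):

```python
import numpy as np
import PIL.Image

def display_sample(sample, i):
    # Move to CPU and reorder to (batch, height, width, channels)
    image_processed = sample.cpu().permute(0, 2, 3, 1)
    # Map values from [-1, 1] to [0, 255] and convert to uint8
    image_processed = (image_processed + 1.0) * 127.5
    image_processed = image_processed.numpy().astype(np.uint8)

    image_pil = PIL.Image.fromarray(image_processed[0])
    print(f"Image at step {i}")
    display(image_pil)
```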

@@ -9,12 +9,12 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Effective and efficient diffusion

[[open-in-colab]]

Getting the [`DiffusionPipeline`] to generate images in a certain style or include what you want can be tricky. Oftentimes, you have to run the [`DiffusionPipeline`] several times before you end up with an image you're happy with. But generating something out of nothing is a computationally intensive process, especially if you're running inference over and over again.

This is why it's important to get the most *computational* (speed) and *memory* (GPU vRAM) efficiency from the pipeline to reduce the time between inference cycles so you can iterate faster.

@@ -68,7 +68,7 @@ image
<img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_1.png">
</div>

This process took ~30 seconds on a T4 GPU (it might be faster if your allocated GPU is better than a T4). By default, the [`DiffusionPipeline`] runs inference with full `float32` precision for 50 inference steps. You can speed this up by switching to a lower precision like `float16` or running fewer inference steps.

Let's start by loading the model in `float16` and generating an image:
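
A rough sketch of that step (the checkpoint and prompt are carried over from earlier in this guide and are only examples):

```python
import torch
from diffusers import DiffusionPipeline

# Load the checkpoint in half precision to cut memory use and speed up inference
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")

generator = torch.Generator("cuda").manual_seed(0)
image = pipeline("portrait photo of an old warrior chief", generator=generator).images[0]
image
```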

@@ -113,15 +113,14 @@ Load the LoRA weights from your finetuned model *on top of the base model weight

```py
>>> pipe.unet.load_attn_procs(lora_model_path)
>>> pipe.to("cuda")

# use half the weights from the LoRA finetuned model and half the weights from the base model
>>> image = pipe(
...     "A pokemon with blue eyes.", num_inference_steps=25, guidance_scale=7.5, cross_attention_kwargs={"scale": 0.5}
... ).images[0]

# OR, use the weights from the fully finetuned LoRA model
# >>> image = pipe("A pokemon with blue eyes.", num_inference_steps=25, guidance_scale=7.5).images[0]

>>> image.save("blue_pokemon.png")
```

@@ -226,18 +225,17 @@ Load the LoRA weights from your finetuned DreamBooth model *on top of the base m

```py
>>> pipe.unet.load_attn_procs(lora_model_path)
>>> pipe.to("cuda")

# use half the weights from the LoRA finetuned model and half the weights from the base model
>>> image = pipe(
...     "A picture of a sks dog in a bucket.",
...     num_inference_steps=25,
...     guidance_scale=7.5,
...     cross_attention_kwargs={"scale": 0.5},
... ).images[0]

# OR, use the weights from the fully finetuned LoRA model
# >>> image = pipe("A picture of a sks dog in a bucket.", num_inference_steps=25, guidance_scale=7.5).images[0]

>>> image.save("bucket-dog.png")
```
@@ -10,18 +10,18 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->

# Using callback

[[open-in-colab]]

Most 🤗 Diffusers pipelines now accept a `callback_on_step_end` argument that allows you to change the default behavior of the denoising loop with custom-defined functions. Here is an example of a callback function we can write to disable classifier-free guidance after 40% of the inference steps to save compute with a minimal tradeoff in performance.
```python
def callback_dynamic_cfg(pipe, step_index, timestep, callback_kwargs):
    # adjust the batch_size of prompt_embeds according to guidance_scale
    if step_index == int(pipe.num_timesteps * 0.4):
        prompt_embeds = callback_kwargs["prompt_embeds"]
        prompt_embeds = prompt_embeds.chunk(2)[-1]

        # update guidance_scale and prompt_embeds
        pipe._guidance_scale = 0.0
@@ -34,9 +34,9 @@ Your callback function has below arguments:
* `step_index` and `timestep` tell you where you are in the denoising loop. In our example, we use `step_index` to decide when to turn off CFG.
* `callback_kwargs` is a dict that contains tensor variables you can modify during the denoising loop. It only includes variables specified in the `callback_on_step_end_tensor_inputs` argument passed to the pipeline's `__call__` method. Different pipelines may use different sets of variables, so please check the pipeline class's `_callback_tensor_inputs` attribute for the list of variables that you can modify. Common variables include `latents` and `prompt_embeds`. In our example, we need to adjust the batch size of `prompt_embeds` after setting `guidance_scale` to `0` in order for it to work properly.
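For example, with a pipeline already loaded you can list the tensors it exposes to callbacks (a small sketch; the exact list varies by pipeline class):

```python
# tensor names this pipeline lets a callback read and modify at each step
print(pipe._callback_tensor_inputs)
# e.g. ['latents', 'prompt_embeds', 'negative_prompt_embeds']
```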
You can pass the callback function as the `callback_on_step_end` argument to the pipeline along with `callback_on_step_end_tensor_inputs`.
```python
import torch
from diffusers import StableDiffusionPipeline

@@ -46,7 +46,7 @@ pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"

generator = torch.Generator(device="cuda").manual_seed(1)
out = pipe(prompt, generator=generator, callback_on_step_end=callback_dynamic_cfg, callback_on_step_end_tensor_inputs=['prompt_embeds'])

out.images[0].save("out_custom_cfg.png")
```
@@ -55,6 +55,6 @@ Your callback function will be executed at the end of each denoising step and mo

<Tip>

Currently we only support `callback_on_step_end`. If you have a solid use case and require a callback function with a different execution point, please open a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&projects=&template=feature_request.md&title=) so we can add it!

</Tip>
@@ -38,20 +38,25 @@ device_type = jax.devices()[0].device_kind

print(f"Found {num_devices} JAX devices of type {device_type}.")
assert "TPU" in device_type, "Available device is not a TPU, please select TPU from Runtime > Change runtime type > Hardware accelerator"
# Found 8 JAX devices of type Cloud TPU.
```

Great, now you can import the rest of the dependencies you'll need:

```python
import numpy as np
import jax.numpy as jnp

from pathlib import Path
from jax import pmap
from flax.jax_utils import replicate
from flax.training.common_utils import shard
from PIL import Image

from huggingface_hub import notebook_login
from diffusers import FlaxStableDiffusionPipeline
```
@@ -85,7 +90,7 @@ prompt = "A cinematic film still of Morgan Freeman starring as Jimi Hendrix, por
prompt = [prompt] * jax.device_count()
prompt_ids = pipeline.prepare_inputs(prompt)
prompt_ids.shape
# (8, 77)
```

Model parameters and inputs have to be replicated across the 8 parallel devices. The parameters dictionary is replicated with [`flax.jax_utils.replicate`](https://flax.readthedocs.io/en/latest/api_reference/flax.jax_utils.html#flax.jax_utils.replicate) which traverses the dictionary and changes the shape of the weights so they are repeated 8 times. Arrays are replicated using `shard`.
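Put together, the replication step looks like this (a short sketch using the `params` and `prompt_ids` names from this guide):

```python
from flax.jax_utils import replicate
from flax.training.common_utils import shard

# parameters: repeat the whole weight pytree once per device
p_params = replicate(params)

# arrays: split the leading batch dimension across the devices
prompt_ids = shard(prompt_ids)
```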
@@ -97,7 +102,7 @@ p_params = replicate(params)
# arrays
prompt_ids = shard(prompt_ids)
prompt_ids.shape
# (8, 1, 77)
```

This shape means each one of the 8 devices receives as an input a `jnp` array with shape `(1, 77)`, where `1` is the batch size per device. On TPUs with sufficient memory, you could have a batch size larger than `1` if you want to generate multiple images (per chip) at once.
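For instance, a hypothetical batch of 2 prompts per device would be prepared like this (sketch only; the shapes assume 8 devices):

```python
prompt = ["A cinematic film still of Morgan Freeman starring as Jimi Hendrix, portrait, 40mm lens"] * jax.device_count() * 2
prompt_ids = pipeline.prepare_inputs(prompt)  # (16, 77)
prompt_ids = shard(prompt_ids)                # (8, 2, 77)
```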
@@ -122,7 +127,7 @@ To take advantage of JAX's optimized speed on a TPU, pass `jit=True` to the pipe

<Tip warning={true}>

You need to ensure all your inputs have the same shape in subsequent calls, otherwise JAX will need to recompile the code, which is slower.

</Tip>
@@ -132,18 +137,18 @@ The first inference run takes more time because it needs to compile the code, bu
%%time
images = pipeline(prompt_ids, p_params, rng, jit=True)[0]

# CPU times: user 56.2 s, sys: 42.5 s, total: 1min 38s
# Wall time: 1min 29s
```

The returned array has shape `(8, 1, 512, 512, 3)` which should be reshaped to remove the second dimension and get 8 images of `512 × 512 × 3`. Then you can use the [`~utils.numpy_to_pil`] function to convert the arrays into images.

```python
from diffusers.utils import make_image_grid

images = images.reshape((images.shape[0] * images.shape[1],) + images.shape[-3:])
images = pipeline.numpy_to_pil(images)
make_image_grid(images, rows=2, cols=4)
```

|
||||
@@ -176,6 +181,7 @@ make_image_grid(images, 2, 4)
|
||||
|
||||

|
||||
|
||||
|
||||
## How does parallelization work?
|
||||
|
||||
The Flax pipeline in 🤗 Diffusers automatically compiles the model and runs it in parallel on all available devices. Let's take a closer look at how that process works.
|
||||
@@ -196,7 +202,7 @@ p_generate = pmap(pipeline._generate)
After calling `pmap`, the prepared function `p_generate` will:

1. Make a copy of the underlying function, `pipeline._generate`, on each device.
2. Send each device a different portion of the input arguments (this is why it's necessary to call the *shard* function). In this case, `prompt_ids` has shape `(8, 1, 77, 768)` so the array is split into 8 and each copy of `_generate` receives an input with shape `(1, 77, 768)`.

The most important thing to pay attention to here is the batch size (1 in this example) and the input dimensions that make sense for your code. You don't have to change anything else to make the code work in parallel. A toy `pmap` example is sketched below.
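The same mechanics can be seen outside of the pipeline with a toy function (our illustration, not part of the guide):

```python
import jax
import jax.numpy as jnp

def square(x):
    return x * x

n = jax.device_count()
batch = jnp.arange(n * 3.0).reshape(n, 3)  # leading axis == number of devices
print(jax.pmap(square)(batch).shape)  # (n, 3)
```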
@@ -206,14 +212,13 @@ The first time you call the pipeline takes more time, but the calls afterward ar
%%time
images = p_generate(prompt_ids, p_params, rng)
images = images.block_until_ready()

# CPU times: user 1min 15s, sys: 18.2 s, total: 1min 34s
# Wall time: 1min 15s
```

Check your image dimensions to see if they're correct:

```python
images.shape
# (8, 1, 512, 512, 3)
```
@@ -1,104 +0,0 @@
# Latent Consistency Distillation Example

[Latent Consistency Models (LCMs)](https://arxiv.org/abs/2310.04378) is a method to distill a latent diffusion model to enable swift inference with minimal steps. This example demonstrates how to use latent consistency distillation to distill stable-diffusion-v1.5 for inference with fewer timesteps.
## Full model distillation

### Running locally with PyTorch

#### Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

**Important**

To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```
Then cd into the example folder and run:

```bash
pip install -r requirements.txt
```
And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

Or for a default accelerate configuration without answering questions about your environment:

```bash
accelerate config default
```

Or, if your environment doesn't support an interactive shell (e.g., a notebook):

```python
from accelerate.utils import write_basic_config
write_basic_config()
```
When running `accelerate config`, specifying the torch compile mode as True can give dramatic speedups.

#### Example with LAION-A6+ dataset
```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
PROGRAM="train_lcm_distill_sd_wds.py \
--pretrained_teacher_model=$MODEL_DIR \
--output_dir=$OUTPUT_DIR \
--mixed_precision=fp16 \
--resolution=512 \
--learning_rate=1e-6 --loss_type="huber" --ema_decay=0.95 --adam_weight_decay=0.0 \
--max_train_steps=1000 \
--max_train_samples=4000000 \
--dataloader_num_workers=8 \
--train_shards_path_or_url='pipe:aws s3 cp s3://muse-datasets/laion-aesthetic6plus-min512-data/{00000..01210}.tar -' \
--validation_steps=200 \
--checkpointing_steps=200 --checkpoints_total_limit=10 \
--train_batch_size=12 \
--gradient_checkpointing --enable_xformers_memory_efficient_attention \
--gradient_accumulation_steps=1 \
--use_8bit_adam \
--resume_from_checkpoint=latest \
--report_to=wandb \
--seed=453645634 \
--push_to_hub \
```
## LCM-LoRA

Instead of fine-tuning the full model, we can also just train a LoRA that can be injected into any model based on stable-diffusion-v1.5, as sketched after the training example below.
### Example with LAION-A6+ dataset

```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
PROGRAM="train_lcm_distill_lora_sd_wds.py \
--pretrained_teacher_model=$MODEL_DIR \
--output_dir=$OUTPUT_DIR \
--mixed_precision=fp16 \
--resolution=512 \
--lora_rank=64 \
--learning_rate=1e-6 --loss_type="huber" --adam_weight_decay=0.0 \
--max_train_steps=1000 \
--max_train_samples=4000000 \
--dataloader_num_workers=8 \
--train_shards_path_or_url='pipe:aws s3 cp s3://muse-datasets/laion-aesthetic6plus-min512-data/{00000..01210}.tar -' \
--validation_steps=200 \
--checkpointing_steps=200 --checkpoints_total_limit=10 \
--train_batch_size=12 \
--gradient_checkpointing --enable_xformers_memory_efficient_attention \
--gradient_accumulation_steps=1 \
--use_8bit_adam \
--resume_from_checkpoint=latest \
--report_to=wandb \
--seed=453645634 \
--push_to_hub \
```
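Once training finishes, the distilled LoRA can be plugged into a regular pipeline for few-step inference. A hedged sketch; the Hub repo id is a placeholder for wherever `--push_to_hub` uploaded your weights:

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# swap in the LCM scheduler and load the distilled LoRA
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("your-username/your-lcm-lora")  # placeholder repo id

# LCMs need only a handful of steps and little or no guidance
image = pipe("A pokemon with blue eyes.", num_inference_steps=4, guidance_scale=1.0).images[0]
```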
@@ -1,106 +0,0 @@
# Latent Consistency Distillation Example

[Latent Consistency Models (LCMs)](https://arxiv.org/abs/2310.04378) is a method to distill a latent diffusion model to enable swift inference with minimal steps. This example demonstrates how to use latent consistency distillation to distill SDXL for inference with fewer timesteps.
## Full model distillation

### Running locally with PyTorch

#### Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

**Important**

To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```
Then cd into the example folder and run:

```bash
pip install -r requirements.txt
```
And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

Or for a default accelerate configuration without answering questions about your environment:

```bash
accelerate config default
```

Or, if your environment doesn't support an interactive shell (e.g., a notebook):

```python
from accelerate.utils import write_basic_config
write_basic_config()
```
When running `accelerate config`, specifying the torch compile mode as True can give dramatic speedups.

#### Example with LAION-A6+ dataset
```bash
export MODEL_DIR="stabilityai/stable-diffusion-xl-base-1.0"
PROGRAM="train_lcm_distill_sdxl_wds.py \
--pretrained_teacher_model=$MODEL_DIR \
--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix \
--output_dir=$OUTPUT_DIR \
--mixed_precision=fp16 \
--resolution=1024 \
--learning_rate=1e-6 --loss_type="huber" --use_fix_crop_and_size --ema_decay=0.95 --adam_weight_decay=0.0 \
--max_train_steps=1000 \
--max_train_samples=4000000 \
--dataloader_num_workers=8 \
--train_shards_path_or_url='pipe:aws s3 cp s3://muse-datasets/laion-aesthetic6plus-min512-data/{00000..01210}.tar -' \
--validation_steps=200 \
--checkpointing_steps=200 --checkpoints_total_limit=10 \
--train_batch_size=12 \
--gradient_checkpointing --enable_xformers_memory_efficient_attention \
--gradient_accumulation_steps=1 \
--use_8bit_adam \
--resume_from_checkpoint=latest \
--report_to=wandb \
--seed=453645634 \
--push_to_hub \
```
## LCM-LoRA

Instead of fine-tuning the full model, we can also just train a LoRA that can be injected into any SDXL model.

### Example with LAION-A6+ dataset
```bash
export MODEL_DIR="stabilityai/stable-diffusion-xl-base-1.0"
PROGRAM="train_lcm_distill_lora_sdxl_wds.py \
--pretrained_teacher_model=$MODEL_DIR \
--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix \
--output_dir=$OUTPUT_DIR \
--mixed_precision=fp16 \
--resolution=1024 \
--lora_rank=64 \
--learning_rate=1e-6 --loss_type="huber" --use_fix_crop_and_size --adam_weight_decay=0.0 \
--max_train_steps=1000 \
--max_train_samples=4000000 \
--dataloader_num_workers=8 \
--train_shards_path_or_url='pipe:aws s3 cp s3://muse-datasets/laion-aesthetic6plus-min512-data/{00000..01210}.tar -' \
--validation_steps=200 \
--checkpointing_steps=200 --checkpoints_total_limit=10 \
--train_batch_size=12 \
--gradient_checkpointing --enable_xformers_memory_efficient_attention \
--gradient_accumulation_steps=1 \
--use_8bit_adam \
--resume_from_checkpoint=latest \
--report_to=wandb \
--seed=453645634 \
--push_to_hub \
```
@@ -1,7 +0,0 @@
accelerate>=0.16.0
torchvision
transformers>=4.25.1
ftfy
tensorboard
Jinja2
webdataset
@@ -56,7 +56,7 @@ if is_wandb_available():
    import wandb

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__)

@@ -59,7 +59,7 @@ if is_wandb_available():
    import wandb

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = logging.getLogger(__name__)

@@ -58,7 +58,7 @@ if is_wandb_available():
    import wandb

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__)

@@ -62,7 +62,7 @@ from diffusers.utils.import_utils import is_xformers_available

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__)

@@ -61,7 +61,7 @@ if is_wandb_available():
    import wandb

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__)

@@ -35,7 +35,7 @@ from diffusers.utils import check_min_version

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

# Cache compiled models across invocations of this script.
cc.initialize_cache(os.path.expanduser("~/.cache/jax/compilation_cache"))

@@ -68,7 +68,7 @@ from diffusers.utils.import_utils import is_xformers_available

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__)

@@ -58,7 +58,7 @@ from diffusers.utils.import_utils import is_xformers_available

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__)

@@ -52,7 +52,7 @@ from diffusers.utils.import_utils import is_xformers_available

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__, log_level="INFO")

@@ -55,7 +55,7 @@ from diffusers.utils.import_utils import is_xformers_available

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__, log_level="INFO")

@@ -52,7 +52,7 @@ if is_wandb_available():

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__, log_level="INFO")

@@ -46,7 +46,7 @@ from diffusers.utils import check_min_version, is_wandb_available

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__, log_level="INFO")

@@ -46,7 +46,7 @@ from diffusers.utils import check_min_version, is_wandb_available

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__, log_level="INFO")

@@ -51,7 +51,7 @@ if is_wandb_available():

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__, log_level="INFO")

@@ -58,7 +58,7 @@ if is_wandb_available():
    import wandb

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__)

@@ -53,7 +53,7 @@ if is_wandb_available():

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__, log_level="INFO")

@@ -33,7 +33,7 @@ from diffusers.utils import check_min_version

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = logging.getLogger(__name__)

@@ -49,7 +49,7 @@ from diffusers.utils.import_utils import is_xformers_available

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__, log_level="INFO")

@@ -58,7 +58,7 @@ from diffusers.utils.import_utils import is_xformers_available

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__)

@@ -57,7 +57,7 @@ from diffusers.utils.import_utils import is_xformers_available

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__)

@@ -79,7 +79,7 @@ else:

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__)

@@ -56,7 +56,7 @@ else:
# ------------------------------------------------------------------------------

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = logging.getLogger(__name__)

@@ -29,7 +29,7 @@ from diffusers.utils.import_utils import is_xformers_available

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__, log_level="INFO")

@@ -50,7 +50,7 @@ if is_wandb_available():

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__, log_level="INFO")

@@ -51,7 +51,7 @@ if is_wandb_available():

# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.23.0.dev0")
check_min_version("0.23.0")

logger = get_logger(__name__, log_level="INFO")
@@ -244,7 +244,7 @@ install_requires = [
setup(
    name="diffusers",
    version="0.23.0.dev0",  # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots)
    version="0.23.0",  # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots)
    description="State-of-the-art diffusion in PyTorch and JAX.",
    long_description=open("README.md", "r", encoding="utf-8").read(),
    long_description_content_type="text/markdown",
@@ -1,4 +1,4 @@
__version__ = "0.23.0.dev0"
__version__ = "0.23.0"

from typing import TYPE_CHECKING
@@ -551,7 +551,6 @@ class StableDiffusionImg2ImgPipeline(
        t_start = max(num_inference_steps - init_timestep, 0)
        timesteps = self.scheduler.timesteps[t_start * self.scheduler.order :]
        self.scheduler._step_index_init = t_start * self.scheduler.order

        return timesteps, num_inference_steps - t_start
@@ -215,7 +215,6 @@ class DPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
        self.model_outputs = [None] * solver_order
        self.lower_order_nums = 0
        self._step_index = None
        self._step_index_init = None

    @property
    def step_index(self):
@@ -223,13 +222,6 @@ class DPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
        The index counter for current timestep. It will increase 1 after each scheduler step.
        """
        return self._step_index

    @property
    def step_index_init(self):
        """
        the first step_index for denoising loop.
        """
        return self._step_index_init

    def set_timesteps(self, num_inference_steps: int = None, device: Union[str, torch.device] = None):
        """
@@ -768,28 +760,23 @@ class DPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
        return x_t

    def _init_step_index(self, timestep):

        if self.step_index_init is None:
            if isinstance(timestep, torch.Tensor):
                timestep = timestep.to(self.timesteps.device)
        if isinstance(timestep, torch.Tensor):
            timestep = timestep.to(self.timesteps.device)

            index_candidates = (self.timesteps == timestep).nonzero()
        index_candidates = (self.timesteps == timestep).nonzero()

            if len(index_candidates) == 0:
                step_index = len(self.timesteps) - 1
            # The sigma index that is taken for the **very** first `step`
            # is always the second index (or the last index if there is only 1)
            # This way we can ensure we don't accidentally skip a sigma in
            # case we start in the middle of the denoising schedule (e.g. for image-to-image)
            elif len(index_candidates) > 1:
                step_index = index_candidates[1].item()
            else:
                step_index = index_candidates[0].item()

            self._step_index_init = step_index
            self._step_index = step_index
        if len(index_candidates) == 0:
            step_index = len(self.timesteps) - 1
        # The sigma index that is taken for the **very** first `step`
        # is always the second index (or the last index if there is only 1)
        # This way we can ensure we don't accidentally skip a sigma in
        # case we start in the middle of the denoising schedule (e.g. for image-to-image)
        elif len(index_candidates) > 1:
            step_index = index_candidates[1].item()
        else:
            self._step_index = self.step_index_init
            step_index = index_candidates[0].item()

        self._step_index = step_index
    def step(
        self,
@@ -897,10 +884,8 @@ class DPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
        else:
            schedule_timesteps = self.timesteps.to(original_samples.device)
        timesteps = timesteps.to(original_samples.device)
        if self.step_index_init is None:
            step_indices = [(schedule_timesteps == t).nonzero().item() for t in timesteps]
        else:
            step_indices = [self.step_index_init] * timesteps.shape[0]
        step_indices = [(schedule_timesteps == t).nonzero().item() for t in timesteps]

        sigma = sigmas[step_indices].flatten()
        while len(sigma.shape) < len(original_samples.shape):