mirror of https://github.com/huggingface/diffusers.git
synced 2025-12-29 07:51:21 +08:00

Compare commits

31 Commits

| SHA1 |
|---|
| 4b02f53e62 |
| 27d11a0094 |
| 554e67cb06 |
| 45cb500667 |
| 8c78e73fef |
| c1b378db69 |
| b50a9ae383 |
| ea2e177c1d |
| 513f1fbfb0 |
| d7b692083c |
| 9070c394aa |
| 194ed794d8 |
| 051b34635f |
| 5f25818a0f |
| c25d8c905c |
| 5782e0393d |
| 92b6dbba1a |
| c72e343085 |
| 3228eb1609 |
| c1488ff348 |
| b344c953a8 |
| dd10da76a7 |
| 543ee1e092 |
| 75b6c16567 |
| c4ae7c2421 |
| a2090375ca |
| c4a3b09a36 |
| 616c3a42cb |
| d23cf98769 |
| eeb9264acd |
| b6447fa87e |
37 .github/ISSUE_TEMPLATE/bug-report.yml (vendored, new file)

@@ -0,0 +1,37 @@
name: "\U0001F41B Bug Report"
description: Report a bug on diffusers
labels: [ "bug" ]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to fill out this bug report!
  - type: textarea
    id: bug-description
    attributes:
      label: Describe the bug
      description: A clear and concise description of what the bug is. If you intend to submit a pull request for this issue, tell us in the description. Thanks!
      placeholder: Bug description
    validations:
      required: true
  - type: textarea
    id: reproduction
    attributes:
      label: Reproduction
      description: Please provide a minimal reproducible code which we can copy/paste and reproduce the issue.
      placeholder: Reproduction
  - type: textarea
    id: logs
    attributes:
      label: Logs
      description: "Please include the Python logs if you can."
      render: shell
  - type: textarea
    id: system-info
    attributes:
      label: System Info
      description: Please share your system info with us,
      render: shell
      placeholder: diffusers version, Python Version, etc
    validations:
      required: true
7 .github/ISSUE_TEMPLATE/config.yml (vendored, new file)

@@ -0,0 +1,7 @@
contact_links:
  - name: Forum
    url: https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63
    about: General usage questions and community discussions
  - name: Blank issue
    url: https://github.com/huggingface/diffusers/issues/new
    about: Please note that the Forum is in most places the right place for discussions
20 .github/ISSUE_TEMPLATE/feature_request.md (vendored, new file)

@@ -0,0 +1,20 @@
---
name: "\U0001F680 Feature request"
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
42 README.md

@@ -34,6 +34,40 @@ In order to get started, we recommend taking a look at two notebooks:
- The [Training a diffusers model](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook summarizes diffuser model training methods. This notebook takes a step-by-step approach to training your
  diffuser model on an image dataset, with explanatory graphics.

## **New 🎨🎨🎨** Stable Diffusion is now fully compatible with `diffusers`!

Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information.

**The Stable Diffusion weights are currently only available to universities, academics, research institutions and independent researchers. Please request access by applying via <a href="https://stability.ai/academia-access-form" target="_blank">this form</a>.**

```py
# make sure you're logged in with `huggingface-cli login`
from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear"
)

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-3-diffusers",
    scheduler=lms,
    use_auth_token=True
)

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt, width=768, guidance_scale=7)["sample"][0]

image.save("astronaut_rides_horse.png")
```

For more details, check out [the Stable Diffusion notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb)
and have a look into the [release notes](https://github.com/huggingface/diffusers/releases/tag/v0.2.0).

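The README snippet above swaps in an LMS scheduler explicitly. As a quick sketch (assuming the same gated model id, login flow, and `["sample"]` output format shown above), the checkpoint can also be loaded with whatever scheduler the repository ships with:

```python
# sketch: same checkpoint, default scheduler from the model repository
from torch import autocast
from diffusers import StableDiffusionPipeline

# assumes you are logged in via `huggingface-cli login` and were granted access to the weights
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True
)

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt, guidance_scale=7)["sample"][0]
image.save("astronaut_rides_horse_default_scheduler.png")
```
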
## Examples

If you want to run the code yourself 💻, you can try out:

@@ -120,11 +154,17 @@ The class provides functionality to compute previous image according to alpha, b

## Installation

**With `pip`**

```bash
pip install diffusers # should install diffusers 0.1.3
pip install --upgrade diffusers # should install diffusers 0.2.1
```

**With `conda`**

```sh
conda install -c conda-forge diffusers
```

## In the works

@@ -22,7 +22,7 @@ The command to train a DDPM UNet model on the Oxford Flowers dataset:

```bash
accelerate launch train_unconditional.py \
  --dataset="huggan/flowers-102-categories" \
  --dataset_name="huggan/flowers-102-categories" \
  --resolution=64 \
  --output_dir="ddpm-ema-flowers-64" \
  --train_batch_size=16 \
@@ -46,7 +46,7 @@ The command to train a DDPM UNet model on the Pokemon dataset:

```bash
accelerate launch train_unconditional.py \
  --dataset="huggan/pokemon" \
  --dataset_name="huggan/pokemon" \
  --resolution=64 \
  --output_dir="ddpm-ema-pokemon-64" \
  --train_batch_size=16 \
@@ -62,3 +62,68 @@ An example trained model: https://huggingface.co/anton-l/ddpm-ema-pokemon-64
A full training run takes 2 hours on 4xV100 GPUs.

<img src="https://user-images.githubusercontent.com/26864830/180248200-928953b4-db38-48db-b0c6-8b740fe6786f.png" width="700" />


### Using your own data

To use your own dataset, there are 2 ways:
- you can either provide your own folder as `--train_data_dir`
- or you can upload your dataset to the hub (possibly as a private repo, if you prefer so), and simply pass the `--dataset_name` argument.

Below, we explain both in more detail.

#### Provide the dataset as a folder

If you provide your own folder with images, the script expects the following directory structure:

```bash
data_dir/xxx.png
data_dir/xxy.png
data_dir/[...]/xxz.png
```

In other words, the script will take care of gathering all images inside the folder. You can then run the script like this:

```bash
accelerate launch train_unconditional.py \
    --train_data_dir <path-to-train-directory> \
    <other-arguments>
```

Internally, the script will use the [`ImageFolder`](https://huggingface.co/docs/datasets/v2.0.0/en/image_process#imagefolder) feature, which will automatically turn the folders into 🤗 Dataset objects.

#### Upload your data to the hub, as a (possibly private) repo

It's very easy (and convenient) to upload your image dataset to the hub using the [`ImageFolder`](https://huggingface.co/docs/datasets/v2.0.0/en/image_process#imagefolder) feature available in 🤗 Datasets. Simply do the following:

```python
from datasets import load_dataset

# example 1: local folder
dataset = load_dataset("imagefolder", data_dir="path_to_your_folder")

# example 2: local files (supported formats are tar, gzip, zip, xz, rar, zstd)
dataset = load_dataset("imagefolder", data_files="path_to_zip_file")

# example 3: remote files (supported formats are tar, gzip, zip, xz, rar, zstd)
dataset = load_dataset("imagefolder", data_files="https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip")

# example 4: providing several splits
dataset = load_dataset("imagefolder", data_files={"train": ["path/to/file1", "path/to/file2"], "test": ["path/to/file3", "path/to/file4"]})
```

`ImageFolder` will create an `image` column containing the PIL-encoded images.

Next, push it to the hub!

```python
# assuming you have run the huggingface-cli login command in a terminal
dataset.push_to_hub("name_of_your_dataset")

# if you want to push to a private repo, simply pass private=True:
dataset.push_to_hub("name_of_your_dataset", private=True)
```

and that's it! You can now train your model by simply setting the `--dataset_name` argument to the name of your dataset on the hub.

More on this can also be found in [this blog post](https://huggingface.co/blog/image-search-datasets).

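As a quick sanity check (a sketch, reusing the placeholder name "name_of_your_dataset" from the push example above), the pushed dataset loads back like any other hub dataset before you point the training script at it:

```python
from datasets import load_dataset

# pass use_auth_token=True if you pushed the dataset as a private repo
dataset = load_dataset("name_of_your_dataset", split="train")
print(dataset)                   # should report an "image" column created by ImageFolder
pil_image = dataset[0]["image"]  # a PIL.Image object
```
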
@@ -75,7 +75,17 @@ def main(args):
            Normalize([0.5], [0.5]),
        ]
    )
    dataset = load_dataset(args.dataset, split="train")

    if args.dataset_name is not None:
        dataset = load_dataset(
            args.dataset_name,
            args.dataset_config_name,
            cache_dir=args.cache_dir,
            use_auth_token=True if args.use_auth_token else None,
            split="train",
        )
    else:
        dataset = load_dataset("imagefolder", data_dir=args.train_data_dir, cache_dir=args.cache_dir, split="train")

    def transforms(examples):
        images = [augmentations(image.convert("RGB")) for image in examples["image"]]
@@ -179,9 +189,12 @@ def main(args):
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Simple example of a training script.")
    parser.add_argument("--local_rank", type=int, default=-1)
    parser.add_argument("--dataset", type=str, default="huggan/flowers-102-categories")
    parser.add_argument("--output_dir", type=str, default="ddpm-flowers-64")
    parser.add_argument("--dataset_name", type=str, default=None)
    parser.add_argument("--dataset_config_name", type=str, default=None)
    parser.add_argument("--train_data_dir", type=str, default=None, help="A folder containing the training data.")
    parser.add_argument("--output_dir", type=str, default="ddpm-model-64")
    parser.add_argument("--overwrite_output_dir", action="store_true")
    parser.add_argument("--cache_dir", type=str, default=None)
    parser.add_argument("--resolution", type=int, default=64)
    parser.add_argument("--train_batch_size", type=int, default=16)
    parser.add_argument("--eval_batch_size", type=int, default=16)
@@ -201,6 +214,7 @@ if __name__ == "__main__":
    parser.add_argument("--ema_power", type=float, default=3 / 4)
    parser.add_argument("--ema_max_decay", type=float, default=0.9999)
    parser.add_argument("--push_to_hub", action="store_true")
    parser.add_argument("--use_auth_token", action="store_true")
    parser.add_argument("--hub_token", type=str, default=None)
    parser.add_argument("--hub_model_id", type=str, default=None)
    parser.add_argument("--hub_private_repo", action="store_true")
@@ -222,4 +236,7 @@ if __name__ == "__main__":
    if env_local_rank != -1 and env_local_rank != args.local_rank:
        args.local_rank = env_local_rank

    if args.dataset_name is None and args.train_data_dir is None:
        raise ValueError("You must specify either a dataset name from the hub or a train data directory.")

    main(args)

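Condensed into a standalone helper, the dataset-selection logic added above behaves like this (a paraphrased sketch, not the script itself; the argument names mirror the new CLI flags):

```python
from datasets import load_dataset

def resolve_dataset(dataset_name=None, dataset_config_name=None, train_data_dir=None, cache_dir=None):
    # mirrors the new validation at the bottom of the script
    if dataset_name is None and train_data_dir is None:
        raise ValueError("You must specify either a dataset name from the hub or a train data directory.")
    if dataset_name is not None:
        # hub dataset, optionally a private repo when you are logged in
        return load_dataset(dataset_name, dataset_config_name, cache_dir=cache_dir, split="train")
    # local folder, turned into a Dataset by the imagefolder loader
    return load_dataset("imagefolder", data_dir=train_data_dir, cache_dir=cache_dir, split="train")
```
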
2 setup.py

@@ -181,7 +181,7 @@ install_requires = [

setup(
    name="diffusers",
    version="0.1.3",
    version="0.2.2",
    description="Diffusers",
    long_description=open("README.md", "r", encoding="utf-8").read(),
    long_description_content_type="text/markdown",

@@ -1,10 +1,10 @@
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.
from .utils import is_inflect_available, is_transformers_available, is_unidecode_available
from .utils import is_inflect_available, is_scipy_available, is_transformers_available, is_unidecode_available


__version__ = "0.1.3"
__version__ = "0.2.2"

from .modeling_utils import ModelMixin
from .models import AutoencoderKL, UNet2DConditionModel, UNet2DModel, VQModel
@@ -18,12 +18,26 @@ from .optimization import (
    get_scheduler,
)
from .pipeline_utils import DiffusionPipeline
from .pipelines import DDIMPipeline, DDPMPipeline, LDMPipeline, PNDMPipeline, ScoreSdeVePipeline
from .schedulers import DDIMScheduler, DDPMScheduler, PNDMScheduler, SchedulerMixin, ScoreSdeVeScheduler
from .pipelines import DDIMPipeline, DDPMPipeline, KarrasVePipeline, LDMPipeline, PNDMPipeline, ScoreSdeVePipeline
from .schedulers import (
    DDIMScheduler,
    DDPMScheduler,
    KarrasVeScheduler,
    PNDMScheduler,
    SchedulerMixin,
    ScoreSdeVeScheduler,
)


if is_scipy_available():
    from .schedulers import LMSDiscreteScheduler
else:
    from .utils.dummy_scipy_objects import *

from .training_utils import EMAModel


if is_transformers_available():
    from .pipelines import LDMTextToImagePipeline
    from .pipelines import LDMTextToImagePipeline, StableDiffusionPipeline
else:
    from .utils.dummy_transformers_objects import *

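Since `LMSDiscreteScheduler` now sits behind the `is_scipy_available()` guard, importing it only works when scipy is installed. A small sketch of how downstream code can mirror that check (the config values are the ones from the README example above):

```python
from diffusers.utils import is_scipy_available

if is_scipy_available():
    from diffusers import LMSDiscreteScheduler

    lms = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear")
else:
    print("scipy is not installed; fall back to another scheduler such as PNDM or DDIM")
```
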
@@ -168,13 +168,13 @@ def create_model_card(args, model_name):
            license="apache-2.0",
            library_name="diffusers",
            tags=[],
            datasets=args.dataset,
            datasets=args.dataset_name,
            metrics=[],
        ),
        template_path=MODEL_CARD_TEMPLATE_PATH,
        model_name=model_name,
        repo_name=repo_name,
        dataset_name=args.dataset if hasattr(args, "dataset") else None,
        dataset_name=args.dataset_name if hasattr(args, "dataset_name") else None,
        learning_rate=args.learning_rate,
        train_batch_size=args.train_batch_size,
        eval_batch_size=args.eval_batch_size,

@@ -32,10 +32,10 @@ def get_timestep_embedding(
    assert len(timesteps.shape) == 1, "Timesteps should be a 1d-array"

    half_dim = embedding_dim // 2
    exponent = -math.log(max_period) * torch.arange(start=0, end=half_dim, dtype=torch.float32)
    exponent = exponent / (half_dim - downscale_freq_shift)

    emb_coeff = -math.log(max_period) / (half_dim - downscale_freq_shift)
    emb = torch.arange(half_dim, dtype=torch.float32, device=timesteps.device)
    emb = torch.exp(emb * emb_coeff)
    emb = torch.exp(exponent).to(device=timesteps.device)
    emb = timesteps[:, None].float() * emb[None, :]

    # scale embeddings

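The refactor above folds the old `emb_coeff`/`emb` arithmetic into a single `exponent` term. A self-contained sketch of the resulting sinusoidal embedding (the real `get_timestep_embedding` additionally handles sin/cos ordering flips, scaling, and odd embedding sizes):

```python
import math
import torch

def sinusoidal_timestep_embedding(timesteps, embedding_dim, max_period=10000, downscale_freq_shift=1.0):
    half_dim = embedding_dim // 2
    exponent = -math.log(max_period) * torch.arange(start=0, end=half_dim, dtype=torch.float32)
    exponent = exponent / (half_dim - downscale_freq_shift)
    emb = torch.exp(exponent).to(device=timesteps.device)
    emb = timesteps[:, None].float() * emb[None, :]          # (batch, half_dim)
    return torch.cat([torch.sin(emb), torch.cos(emb)], dim=-1)

print(sinusoidal_timestep_embedding(torch.tensor([0, 10, 500]), 8).shape)  # torch.Size([3, 8])
```
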
@@ -331,7 +331,9 @@ class ResnetBlock(nn.Module):
    def forward(self, x, temb, hey=False):
        h = x

        h = self.norm1(h)
        # make sure hidden states is in float32
        # when running in half-precision
        h = self.norm1(h.float()).type(h.dtype)
        h = self.nonlinearity(h)

        if self.upsample is not None:
@@ -347,7 +349,9 @@ class ResnetBlock(nn.Module):
            temb = self.time_emb_proj(self.nonlinearity(temb))[:, :, None, None]
            h = h + temb

        h = self.norm2(h)
        # make sure hidden states is in float32
        # when running in half-precision
        h = self.norm2(h.float()).type(h.dtype)
        h = self.nonlinearity(h)

        h = self.dropout(h)

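Both hunks apply the same small pattern: run group normalization on a float32 copy of the hidden states and cast the result back, so half-precision inference does not lose precision in the normalization statistics. A minimal standalone sketch of that pattern:

```python
import torch
import torch.nn as nn

norm = nn.GroupNorm(num_groups=32, num_channels=64)  # module parameters stay in float32
h = torch.randn(2, 64, 8, 8).half()                  # half-precision hidden states

# compute the statistics in float32, then cast back to the working dtype, as in the diff above
out = norm(h.float()).type(h.dtype)
print(out.dtype)  # torch.float16
```
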
@@ -132,6 +132,9 @@ class UNet2DModel(ModelMixin, ConfigMixin):
        elif torch.is_tensor(timesteps) and len(timesteps.shape) == 0:
            timesteps = timesteps[None].to(sample.device)

        # broadcast to batch dimension
        timesteps = timesteps.broadcast_to(sample.shape[0])

        t_emb = self.time_proj(timesteps)
        emb = self.time_embedding(t_emb)

@@ -166,7 +169,9 @@ class UNet2DModel(ModelMixin, ConfigMixin):
            sample = upsample_block(sample, res_samples, emb)

        # 6. post-process
        sample = self.conv_norm_out(sample)
        # make sure hidden states is in float32
        # when running in half-precision
        sample = self.conv_norm_out(sample.float()).type(sample.dtype)
        sample = self.conv_act(sample)
        sample = self.conv_out(sample)

@@ -28,6 +28,7 @@ class UNet2DConditionModel(ModelMixin, ConfigMixin):
        act_fn="silu",
        norm_num_groups=32,
        norm_eps=1e-5,
        cross_attention_dim=1280,
        attention_head_dim=8,
    ):
        super().__init__()
@@ -64,6 +65,7 @@ class UNet2DConditionModel(ModelMixin, ConfigMixin):
                add_downsample=not is_final_block,
                resnet_eps=norm_eps,
                resnet_act_fn=act_fn,
                cross_attention_dim=cross_attention_dim,
                attn_num_head_channels=attention_head_dim,
                downsample_padding=downsample_padding,
            )
@@ -77,6 +79,7 @@ class UNet2DConditionModel(ModelMixin, ConfigMixin):
            resnet_act_fn=act_fn,
            output_scale_factor=mid_block_scale_factor,
            resnet_time_scale_shift="default",
            cross_attention_dim=cross_attention_dim,
            attn_num_head_channels=attention_head_dim,
            resnet_groups=norm_num_groups,
        )
@@ -101,6 +104,7 @@ class UNet2DConditionModel(ModelMixin, ConfigMixin):
                add_upsample=not is_final_block,
                resnet_eps=norm_eps,
                resnet_act_fn=act_fn,
                cross_attention_dim=cross_attention_dim,
                attn_num_head_channels=attention_head_dim,
            )
            self.up_blocks.append(up_block)
@@ -129,6 +133,9 @@ class UNet2DConditionModel(ModelMixin, ConfigMixin):
        elif torch.is_tensor(timesteps) and len(timesteps.shape) == 0:
            timesteps = timesteps[None].to(sample.device)

        # broadcast to batch dimension
        timesteps = timesteps.broadcast_to(sample.shape[0])

        t_emb = self.time_proj(timesteps)
        emb = self.time_embedding(t_emb)

@@ -168,8 +175,9 @@ class UNet2DConditionModel(ModelMixin, ConfigMixin):
            sample = upsample_block(hidden_states=sample, temb=emb, res_hidden_states_tuple=res_samples)

        # 6. post-process

        sample = self.conv_norm_out(sample)
        # make sure hidden states is in float32
        # when running in half-precision
        sample = self.conv_norm_out(sample.float()).type(sample.dtype)
        sample = self.conv_act(sample)
        sample = self.conv_out(sample)

@@ -31,6 +31,7 @@ def get_down_block(
    resnet_eps,
    resnet_act_fn,
    attn_num_head_channels,
    cross_attention_dim=None,
    downsample_padding=None,
):
    down_block_type = down_block_type[7:] if down_block_type.startswith("UNetRes") else down_block_type
@@ -58,6 +59,8 @@ def get_down_block(
            attn_num_head_channels=attn_num_head_channels,
        )
    elif down_block_type == "CrossAttnDownBlock2D":
        if cross_attention_dim is None:
            raise ValueError("cross_attention_dim must be specified for CrossAttnUpBlock2D")
        return CrossAttnDownBlock2D(
            num_layers=num_layers,
            in_channels=in_channels,
@@ -67,6 +70,7 @@ def get_down_block(
            resnet_eps=resnet_eps,
            resnet_act_fn=resnet_act_fn,
            downsample_padding=downsample_padding,
            cross_attention_dim=cross_attention_dim,
            attn_num_head_channels=attn_num_head_channels,
        )
    elif down_block_type == "SkipDownBlock2D":
@@ -115,6 +119,7 @@ def get_up_block(
    resnet_eps,
    resnet_act_fn,
    attn_num_head_channels,
    cross_attention_dim=None,
):
    up_block_type = up_block_type[7:] if up_block_type.startswith("UNetRes") else up_block_type
    if up_block_type == "UpBlock2D":
@@ -129,6 +134,8 @@ def get_up_block(
            resnet_act_fn=resnet_act_fn,
        )
    elif up_block_type == "CrossAttnUpBlock2D":
        if cross_attention_dim is None:
            raise ValueError("cross_attention_dim must be specified for CrossAttnUpBlock2D")
        return CrossAttnUpBlock2D(
            num_layers=num_layers,
            in_channels=in_channels,
@@ -138,6 +145,7 @@ def get_up_block(
            add_upsample=add_upsample,
            resnet_eps=resnet_eps,
            resnet_act_fn=resnet_act_fn,
            cross_attention_dim=cross_attention_dim,
            attn_num_head_channels=attn_num_head_channels,
        )
    elif up_block_type == "AttnUpBlock2D":
@@ -632,6 +640,79 @@ class DownEncoderBlock2D(nn.Module):
        return hidden_states


class AttnDownEncoderBlock2D(nn.Module):
    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        dropout: float = 0.0,
        num_layers: int = 1,
        resnet_eps: float = 1e-6,
        resnet_time_scale_shift: str = "default",
        resnet_act_fn: str = "swish",
        resnet_groups: int = 32,
        resnet_pre_norm: bool = True,
        attn_num_head_channels=1,
        output_scale_factor=1.0,
        add_downsample=True,
        downsample_padding=1,
    ):
        super().__init__()
        resnets = []
        attentions = []

        for i in range(num_layers):
            in_channels = in_channels if i == 0 else out_channels
            resnets.append(
                ResnetBlock(
                    in_channels=in_channels,
                    out_channels=out_channels,
                    temb_channels=None,
                    eps=resnet_eps,
                    groups=resnet_groups,
                    dropout=dropout,
                    time_embedding_norm=resnet_time_scale_shift,
                    non_linearity=resnet_act_fn,
                    output_scale_factor=output_scale_factor,
                    pre_norm=resnet_pre_norm,
                )
            )
            attentions.append(
                AttentionBlockNew(
                    out_channels,
                    num_head_channels=attn_num_head_channels,
                    rescale_output_factor=output_scale_factor,
                    eps=resnet_eps,
                    num_groups=resnet_groups,
                )
            )

        self.attentions = nn.ModuleList(attentions)
        self.resnets = nn.ModuleList(resnets)

        if add_downsample:
            self.downsamplers = nn.ModuleList(
                [
                    Downsample2D(
                        in_channels, use_conv=True, out_channels=out_channels, padding=downsample_padding, name="op"
                    )
                ]
            )
        else:
            self.downsamplers = None

    def forward(self, hidden_states):
        for resnet, attn in zip(self.resnets, self.attentions):
            hidden_states = resnet(hidden_states, temb=None)
            hidden_states = attn(hidden_states)

        if self.downsamplers is not None:
            for downsampler in self.downsamplers:
                hidden_states = downsampler(hidden_states)

        return hidden_states


class AttnSkipDownBlock2D(nn.Module):
    def __init__(
        self,
@@ -1079,6 +1160,73 @@ class UpDecoderBlock2D(nn.Module):
        return hidden_states


class AttnUpDecoderBlock2D(nn.Module):
    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        dropout: float = 0.0,
        num_layers: int = 1,
        resnet_eps: float = 1e-6,
        resnet_time_scale_shift: str = "default",
        resnet_act_fn: str = "swish",
        resnet_groups: int = 32,
        resnet_pre_norm: bool = True,
        attn_num_head_channels=1,
        output_scale_factor=1.0,
        add_upsample=True,
    ):
        super().__init__()
        resnets = []
        attentions = []

        for i in range(num_layers):
            input_channels = in_channels if i == 0 else out_channels

            resnets.append(
                ResnetBlock(
                    in_channels=input_channels,
                    out_channels=out_channels,
                    temb_channels=None,
                    eps=resnet_eps,
                    groups=resnet_groups,
                    dropout=dropout,
                    time_embedding_norm=resnet_time_scale_shift,
                    non_linearity=resnet_act_fn,
                    output_scale_factor=output_scale_factor,
                    pre_norm=resnet_pre_norm,
                )
            )
            attentions.append(
                AttentionBlockNew(
                    out_channels,
                    num_head_channels=attn_num_head_channels,
                    rescale_output_factor=output_scale_factor,
                    eps=resnet_eps,
                    num_groups=resnet_groups,
                )
            )

        self.attentions = nn.ModuleList(attentions)
        self.resnets = nn.ModuleList(resnets)

        if add_upsample:
            self.upsamplers = nn.ModuleList([Upsample2D(out_channels, use_conv=True, out_channels=out_channels)])
        else:
            self.upsamplers = None

    def forward(self, hidden_states):
        for resnet, attn in zip(self.resnets, self.attentions):
            hidden_states = resnet(hidden_states, temb=None)
            hidden_states = attn(hidden_states)

        if self.upsamplers is not None:
            for upsampler in self.upsamplers:
                hidden_states = upsampler(hidden_states)

        return hidden_states


class AttnSkipUpBlock2D(nn.Module):
    def __init__(
        self,

@@ -40,6 +40,7 @@ class Encoder(nn.Module):
                out_channels=output_channel,
                add_downsample=not is_final_block,
                resnet_eps=1e-6,
                downsample_padding=0,
                resnet_act_fn=act_fn,
                attn_num_head_channels=None,
                temb_channels=None,

@@ -15,6 +15,7 @@
# limitations under the License.

import importlib
import inspect
import os
from typing import Optional, Union

@@ -148,6 +149,12 @@ class DiffusionPipeline(ConfigMixin):
        diffusers_module = importlib.import_module(cls.__module__.split(".")[0])
        pipeline_class = getattr(diffusers_module, config_dict["_class_name"])

        # some modules can be passed directly to the init
        # in this case they are already instantiated in `kwargs`
        # extract them here
        expected_modules = set(inspect.signature(pipeline_class.__init__).parameters.keys())
        passed_class_obj = {k: kwargs.pop(k) for k in expected_modules if k in kwargs}

        init_dict, _ = pipeline_class.extract_init_dict(config_dict, **kwargs)

        init_kwargs = {}
@@ -158,8 +165,36 @@ class DiffusionPipeline(ConfigMixin):
        # 3. Load each module in the pipeline
        for name, (library_name, class_name) in init_dict.items():
            is_pipeline_module = hasattr(pipelines, library_name)
            loaded_sub_model = None

            # if the model is in a pipeline module, then we load it from the pipeline
            if is_pipeline_module:
            if name in passed_class_obj:
                # 1. check that passed_class_obj has correct parent class
                if not is_pipeline_module:
                    library = importlib.import_module(library_name)
                    class_obj = getattr(library, class_name)
                    importable_classes = LOADABLE_CLASSES[library_name]
                    class_candidates = {c: getattr(library, c) for c in importable_classes.keys()}

                    expected_class_obj = None
                    for class_name, class_candidate in class_candidates.items():
                        if issubclass(class_obj, class_candidate):
                            expected_class_obj = class_candidate

                    if not issubclass(passed_class_obj[name].__class__, expected_class_obj):
                        raise ValueError(
                            f"{passed_class_obj[name]} is of type: {type(passed_class_obj[name])}, but should be"
                            f" {expected_class_obj}"
                        )
                else:
                    logger.warn(
                        f"You have passed a non-standard module {passed_class_obj[name]}. We cannot verify whether it"
                        " has the correct type"
                    )

                # set passed class object
                loaded_sub_model = passed_class_obj[name]
            elif is_pipeline_module:
                pipeline_module = getattr(pipelines, library_name)
                class_obj = getattr(pipeline_module, class_name)
                importable_classes = ALL_IMPORTABLE_CLASSES
@@ -171,23 +206,24 @@ class DiffusionPipeline(ConfigMixin):
                importable_classes = LOADABLE_CLASSES[library_name]
                class_candidates = {c: getattr(library, c) for c in importable_classes.keys()}

            load_method_name = None
            for class_name, class_candidate in class_candidates.items():
                if issubclass(class_obj, class_candidate):
                    load_method_name = importable_classes[class_name][1]
            if loaded_sub_model is None:
                load_method_name = None
                for class_name, class_candidate in class_candidates.items():
                    if issubclass(class_obj, class_candidate):
                        load_method_name = importable_classes[class_name][1]

            load_method = getattr(class_obj, load_method_name)
                load_method = getattr(class_obj, load_method_name)

            # check if the module is in a subdirectory
            if os.path.isdir(os.path.join(cached_folder, name)):
                loaded_sub_model = load_method(os.path.join(cached_folder, name))
            else:
                # else load from the root directory
                loaded_sub_model = load_method(cached_folder)
                # check if the module is in a subdirectory
                if os.path.isdir(os.path.join(cached_folder, name)):
                    loaded_sub_model = load_method(os.path.join(cached_folder, name))
                else:
                    # else load from the root directory
                    loaded_sub_model = load_method(cached_folder)

            init_kwargs[name] = loaded_sub_model  # UNet(...), # DiffusionSchedule(...)

        # 5. Instantiate the pipeline
        # 4. Instantiate the pipeline
        model = pipeline_class(**init_kwargs)
        return model

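The net effect of the restructured loading loop is that any component passed directly as a keyword argument to `from_pretrained` is registered as-is instead of being loaded from the repository. A usage sketch (the model id is only an example of a public pipeline repo; the type check above merely warns for scheduler-style pipeline modules):

```python
from diffusers import DDIMScheduler, DiffusionPipeline

# instantiate a scheduler yourself ...
scheduler = DDIMScheduler(num_train_timesteps=1000)

# ... and hand it to from_pretrained: it is used instead of the scheduler stored in the repo
pipe = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", scheduler=scheduler)
```
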
@@ -4,7 +4,9 @@ from .ddpm import DDPMPipeline
from .latent_diffusion_uncond import LDMPipeline
from .pndm import PNDMPipeline
from .score_sde_ve import ScoreSdeVePipeline
from .stochatic_karras_ve import KarrasVePipeline


if is_transformers_available():
    from .latent_diffusion import LDMTextToImagePipeline
    from .stable_diffusion import StableDiffusionPipeline

@@ -1,3 +1,4 @@
import inspect
from typing import Optional, Tuple, Union

import torch
@@ -45,11 +46,11 @@ class LDMTextToImagePipeline(DiffusionPipeline):
        # get unconditional embeddings for classifier free guidance
        if guidance_scale != 1.0:
            uncond_input = self.tokenizer([""] * batch_size, padding="max_length", max_length=77, return_tensors="pt")
            uncond_embeddings = self.bert(uncond_input.input_ids.to(torch_device))
            uncond_embeddings = self.bert(uncond_input.input_ids.to(torch_device))[0]

        # get prompt text embeddings
        text_input = self.tokenizer(prompt, padding="max_length", max_length=77, return_tensors="pt")
        text_embeddings = self.bert(text_input.input_ids.to(torch_device))
        text_embeddings = self.bert(text_input.input_ids.to(torch_device))[0]

        latents = torch.randn(
            (batch_size, self.unet.in_channels, self.unet.sample_size, self.unet.sample_size),
@@ -59,6 +60,13 @@ class LDMTextToImagePipeline(DiffusionPipeline):

        self.scheduler.set_timesteps(num_inference_steps)

        # prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
        accepts_eta = "eta" in set(inspect.signature(self.scheduler.step).parameters.keys())

        extra_kwargs = {}
        if accepts_eta:
            extra_kwargs["eta"] = eta

        for t in tqdm(self.scheduler.timesteps):
            if guidance_scale == 1.0:
                # guidance_scale of 1 means no guidance
@@ -79,7 +87,7 @@ class LDMTextToImagePipeline(DiffusionPipeline):
                noise_pred = noise_pred_uncond + guidance_scale * (noise_prediction_text - noise_pred_uncond)

            # compute the previous noisy sample x_t -> x_t-1
            latents = self.scheduler.step(noise_pred, t, latents, eta)["prev_sample"]
            latents = self.scheduler.step(noise_pred, t, latents, **extra_kwargs)["prev_sample"]

        # scale and decode the image latents with vae
        latents = 1 / 0.18215 * latents
@@ -618,5 +626,4 @@ class LDMBertModel(LDMBertPreTrainedModel):
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        sequence_output = outputs[0]
        return sequence_output
        return outputs

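Both latent diffusion pipelines now use the same pattern: inspect the scheduler's `step` signature and only forward `eta` when it is accepted (DDIM uses it, other schedulers ignore it). Distilled into a helper for clarity (a sketch, not a diffusers API):

```python
import inspect

def scheduler_step_kwargs(scheduler, eta: float) -> dict:
    # only pass `eta` to schedulers whose step() actually accepts it
    accepts_eta = "eta" in set(inspect.signature(scheduler.step).parameters.keys())
    return {"eta": eta} if accepts_eta else {}

# usage inside a denoising loop:
#   latents = scheduler.step(noise_pred, t, latents, **scheduler_step_kwargs(scheduler, eta))["prev_sample"]
```
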
@@ -1,3 +1,5 @@
import inspect

import torch

from tqdm.auto import tqdm
@@ -31,11 +33,18 @@ class LDMPipeline(DiffusionPipeline):

        self.scheduler.set_timesteps(num_inference_steps)

        # prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
        accepts_eta = "eta" in set(inspect.signature(self.scheduler.step).parameters.keys())

        extra_kwargs = {}
        if accepts_eta:
            extra_kwargs["eta"] = eta

        for t in tqdm(self.scheduler.timesteps):
            # predict the noise residual
            noise_prediction = self.unet(latents, t)["sample"]
            # compute the previous noisy sample x_t -> x_t-1
            latents = self.scheduler.step(noise_prediction, t, latents, eta)["prev_sample"]
            latents = self.scheduler.step(noise_prediction, t, latents, **extra_kwargs)["prev_sample"]

        # decode the image latents with the VAE
        image = self.vqvae.decode(latents)

5 src/diffusers/pipelines/stable_diffusion/__init__.py (new file)

@@ -0,0 +1,5 @@
from ...utils import is_transformers_available


if is_transformers_available():
    from .pipeline_stable_diffusion import StableDiffusionPipeline
@@ -0,0 +1,142 @@
import inspect
from typing import List, Optional, Union

import torch

from tqdm.auto import tqdm
from transformers import CLIPTextModel, CLIPTokenizer

from ...models import AutoencoderKL, UNet2DConditionModel
from ...pipeline_utils import DiffusionPipeline
from ...schedulers import DDIMScheduler, LMSDiscreteScheduler, PNDMScheduler


class StableDiffusionPipeline(DiffusionPipeline):
    def __init__(
        self,
        vae: AutoencoderKL,
        text_encoder: CLIPTextModel,
        tokenizer: CLIPTokenizer,
        unet: UNet2DConditionModel,
        scheduler: Union[DDIMScheduler, PNDMScheduler, LMSDiscreteScheduler],
    ):
        super().__init__()
        scheduler = scheduler.set_format("pt")
        self.register_modules(vae=vae, text_encoder=text_encoder, tokenizer=tokenizer, unet=unet, scheduler=scheduler)

    @torch.no_grad()
    def __call__(
        self,
        prompt: Union[str, List[str]],
        height: Optional[int] = 512,
        width: Optional[int] = 512,
        num_inference_steps: Optional[int] = 50,
        guidance_scale: Optional[float] = 1.0,
        eta: Optional[float] = 0.0,
        generator: Optional[torch.Generator] = None,
        torch_device: Optional[Union[str, torch.device]] = None,
        output_type: Optional[str] = "pil",
    ):
        if torch_device is None:
            torch_device = "cuda" if torch.cuda.is_available() else "cpu"

        if isinstance(prompt, str):
            batch_size = 1
        elif isinstance(prompt, list):
            batch_size = len(prompt)
        else:
            raise ValueError(f"`prompt` has to be of type `str` or `list` but is {type(prompt)}")

        if height % 8 != 0 or width % 8 != 0:
            raise ValueError(f"`height` and `width` have to be divisible by 8 but are {height} and {width}.")

        self.unet.to(torch_device)
        self.vae.to(torch_device)
        self.text_encoder.to(torch_device)

        # get prompt text embeddings
        text_input = self.tokenizer(
            prompt,
            padding="max_length",
            max_length=self.tokenizer.model_max_length,
            truncation=True,
            return_tensors="pt",
        )
        text_embeddings = self.text_encoder(text_input.input_ids.to(torch_device))[0]

        # here `guidance_scale` is defined analogously to the guidance weight `w` of equation (2)
        # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
        # corresponds to doing no classifier free guidance.
        do_classifier_free_guidance = guidance_scale > 1.0
        # get unconditional embeddings for classifier free guidance
        if do_classifier_free_guidance:
            max_length = text_input.input_ids.shape[-1]
            uncond_input = self.tokenizer(
                [""] * batch_size, padding="max_length", max_length=max_length, return_tensors="pt"
            )
            uncond_embeddings = self.text_encoder(uncond_input.input_ids.to(torch_device))[0]

            # For classifier free guidance, we need to do two forward passes.
            # Here we concatenate the unconditional and text embeddings into a single batch
            # to avoid doing two forward passes
            text_embeddings = torch.cat([uncond_embeddings, text_embeddings])

        # get the initial random noise
        latents = torch.randn(
            (batch_size, self.unet.in_channels, height // 8, width // 8),
            generator=generator,
            device=torch_device,
        )

        # set timesteps
        accepts_offset = "offset" in set(inspect.signature(self.scheduler.set_timesteps).parameters.keys())
        extra_set_kwargs = {}
        if accepts_offset:
            extra_set_kwargs["offset"] = 1

        self.scheduler.set_timesteps(num_inference_steps, **extra_set_kwargs)

        # if we use LMSDiscreteScheduler, let's make sure latents are multiplied by sigmas
        if isinstance(self.scheduler, LMSDiscreteScheduler):
            latents = latents * self.scheduler.sigmas[0]

        # prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
        # eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.
        # eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502
        # and should be between [0, 1]
        accepts_eta = "eta" in set(inspect.signature(self.scheduler.step).parameters.keys())
        extra_step_kwargs = {}
        if accepts_eta:
            extra_step_kwargs["eta"] = eta

        for i, t in tqdm(enumerate(self.scheduler.timesteps)):
            # expand the latents if we are doing classifier free guidance
            latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents
            if isinstance(self.scheduler, LMSDiscreteScheduler):
                sigma = self.scheduler.sigmas[i]
                latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)

            # predict the noise residual
            noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings)["sample"]

            # perform guidance
            if do_classifier_free_guidance:
                noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
                noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

            # compute the previous noisy sample x_t -> x_t-1
            if isinstance(self.scheduler, LMSDiscreteScheduler):
                latents = self.scheduler.step(noise_pred, i, latents, **extra_step_kwargs)["prev_sample"]
            else:
                latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs)["prev_sample"]

        # scale and decode the image latents with vae
        latents = 1 / 0.18215 * latents
        image = self.vae.decode(latents)

        image = (image / 2 + 0.5).clamp(0, 1)
        image = image.cpu().permute(0, 2, 3, 1).numpy()
        if output_type == "pil":
            image = self.numpy_to_pil(image)

        return {"sample": image}
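The guidance step in the middle of the loop is plain arithmetic on the two noise predictions. A tiny numeric sketch of the formula used above:

```python
import torch

# classifier-free guidance: noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
noise_pred_uncond, noise_pred_text = torch.randn(2, 4, 64, 64).chunk(2)
guidance_scale = 7.5
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

# guidance_scale = 1.0 recovers the purely text-conditioned prediction,
# which is why the pipeline only enables guidance when guidance_scale > 1.0
assert torch.allclose(noise_pred_uncond + 1.0 * (noise_pred_text - noise_pred_uncond), noise_pred_text)
```
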
1 src/diffusers/pipelines/stochatic_karras_ve/__init__.py (new file)

@@ -0,0 +1 @@
from .pipeline_stochastic_karras_ve import KarrasVePipeline
@@ -0,0 +1,81 @@
#!/usr/bin/env python3
import torch

from tqdm.auto import tqdm

from ...models import UNet2DModel
from ...pipeline_utils import DiffusionPipeline
from ...schedulers import KarrasVeScheduler


class KarrasVePipeline(DiffusionPipeline):
    """
    Stochastic sampling from Karras et al. [1] tailored to the Variance-Expanding (VE) models [2]. Use Algorithm 2 and
    the VE column of Table 1 from [1] for reference.

    [1] Karras, Tero, et al. "Elucidating the Design Space of Diffusion-Based Generative Models."
    https://arxiv.org/abs/2206.00364 [2] Song, Yang, et al. "Score-based generative modeling through stochastic
    differential equations." https://arxiv.org/abs/2011.13456
    """

    unet: UNet2DModel
    scheduler: KarrasVeScheduler

    def __init__(self, unet, scheduler):
        super().__init__()
        scheduler = scheduler.set_format("pt")
        self.register_modules(unet=unet, scheduler=scheduler)

    @torch.no_grad()
    def __call__(self, batch_size=1, num_inference_steps=50, generator=None, torch_device=None, output_type="pil"):
        if torch_device is None:
            torch_device = "cuda" if torch.cuda.is_available() else "cpu"

        img_size = self.unet.config.sample_size
        shape = (batch_size, 3, img_size, img_size)

        model = self.unet.to(torch_device)

        # sample x_0 ~ N(0, sigma_0^2 * I)
        sample = torch.randn(*shape) * self.scheduler.config.sigma_max
        sample = sample.to(torch_device)

        self.scheduler.set_timesteps(num_inference_steps)

        for t in tqdm(self.scheduler.timesteps):
            # here sigma_t == t_i from the paper
            sigma = self.scheduler.schedule[t]
            sigma_prev = self.scheduler.schedule[t - 1] if t > 0 else 0

            # 1. Select temporarily increased noise level sigma_hat
            # 2. Add new noise to move from sample_i to sample_hat
            sample_hat, sigma_hat = self.scheduler.add_noise_to_input(sample, sigma, generator=generator)

            # 3. Predict the noise residual given the noise magnitude `sigma_hat`
            # The model inputs and output are adjusted by following eq. (213) in [1].
            model_output = (sigma_hat / 2) * model((sample_hat + 1) / 2, sigma_hat / 2)["sample"]

            # 4. Evaluate dx/dt at sigma_hat
            # 5. Take Euler step from sigma to sigma_prev
            step_output = self.scheduler.step(model_output, sigma_hat, sigma_prev, sample_hat)

            if sigma_prev != 0:
                # 6. Apply 2nd order correction
                # The model inputs and output are adjusted by following eq. (213) in [1].
                model_output = (sigma_prev / 2) * model((step_output["prev_sample"] + 1) / 2, sigma_prev / 2)["sample"]
                step_output = self.scheduler.step_correct(
                    model_output,
                    sigma_hat,
                    sigma_prev,
                    sample_hat,
                    step_output["prev_sample"],
                    step_output["derivative"],
                )
            sample = step_output["prev_sample"]

        sample = (sample / 2 + 0.5).clamp(0, 1)
        sample = sample.cpu().permute(0, 2, 3, 1).numpy()
        if output_type == "pil":
            sample = self.numpy_to_pil(sample)

        return {"sample": sample}
@@ -1,14 +1,14 @@
# Schedulers

- Schedulers are the algorithms to use diffusion models in inference as well as for training. They include the noise schedules and define algorithm-specific diffusion steps.
- Schedulers can be used interchangable between diffusion models in inference to find the preferred tradef-off between speed and generation quality.
- Schedulers can be used interchangable between diffusion models in inference to find the preferred trade-off between speed and generation quality.
- Schedulers are available in numpy, but can easily be transformed into PyTorch.

## API

- Schedulers should provide one or more `def step(...)` functions that should be called iteratively to unroll the diffusion loop during
  the forward pass.
- Schedulers should be framework-agonstic, but provide a simple functionality to convert the scheduler into a specific framework, such as PyTorch
- Schedulers should be framework-agnostic, but provide a simple functionality to convert the scheduler into a specific framework, such as PyTorch
  with a `set_format(...)` method.

## Examples

@@ -16,9 +16,17 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from ..utils import is_scipy_available
from .scheduling_ddim import DDIMScheduler
from .scheduling_ddpm import DDPMScheduler
from .scheduling_karras_ve import KarrasVeScheduler
from .scheduling_pndm import PNDMScheduler
from .scheduling_sde_ve import ScoreSdeVeScheduler
from .scheduling_sde_vp import ScoreSdeVpScheduler
from .scheduling_utils import SchedulerMixin


if is_scipy_available():
    from .scheduling_lms_discrete import LMSDiscreteScheduler
else:
    from ..utils.dummy_scipy_objects import *

@@ -59,6 +59,7 @@ class DDIMScheduler(SchedulerMixin, ConfigMixin):
        trained_betas=None,
        timestep_values=None,
        clip_sample=True,
        set_alpha_to_one=True,
        tensor_format="pt",
    ):

@@ -75,7 +76,12 @@ class DDIMScheduler(SchedulerMixin, ConfigMixin):

        self.alphas = 1.0 - self.betas
        self.alphas_cumprod = np.cumprod(self.alphas, axis=0)
        self.one = np.array(1.0)

        # At every step in ddim, we are looking into the previous alphas_cumprod
        # For the final step, there is no previous alphas_cumprod because we are already at 0
        # `set_alpha_to_one` decides whether we set this parameter simply to one or
        # whether we use the final alpha of the "non-previous" one.
        self.final_alpha_cumprod = np.array(1.0) if set_alpha_to_one else self.alphas_cumprod[0]

        # setable values
        self.num_inference_steps = None
@@ -86,7 +92,7 @@ class DDIMScheduler(SchedulerMixin, ConfigMixin):

    def _get_variance(self, timestep, prev_timestep):
        alpha_prod_t = self.alphas_cumprod[timestep]
        alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.one
        alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod
        beta_prod_t = 1 - alpha_prod_t
        beta_prod_t_prev = 1 - alpha_prod_t_prev

@@ -94,11 +100,12 @@ class DDIMScheduler(SchedulerMixin, ConfigMixin):

        return variance

    def set_timesteps(self, num_inference_steps):
    def set_timesteps(self, num_inference_steps, offset=0):
        self.num_inference_steps = num_inference_steps
        self.timesteps = np.arange(
            0, self.config.num_train_timesteps, self.config.num_train_timesteps // self.num_inference_steps
        )[::-1].copy()
        self.timesteps += offset
        self.set_format(tensor_format=self.tensor_format)

    def step(
@@ -126,7 +133,7 @@ class DDIMScheduler(SchedulerMixin, ConfigMixin):

        # 2. compute alphas, betas
        alpha_prod_t = self.alphas_cumprod[timestep]
        alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.one
        alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod
        beta_prod_t = 1 - alpha_prod_t

        # 3. compute predicted original sample from predicted noise also called

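The new `offset` argument simply shifts the inference timestep grid; the Stable Diffusion pipeline above passes `offset=1`. A worked example of what `set_timesteps` now produces (plain numpy, mirroring the arithmetic above):

```python
import numpy as np

num_train_timesteps, num_inference_steps, offset = 1000, 4, 1
timesteps = np.arange(0, num_train_timesteps, num_train_timesteps // num_inference_steps)[::-1].copy()
timesteps += offset
print(timesteps)  # [751 501 251   1] instead of [750 500 250   0] without the offset
```
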
@@ -65,6 +65,9 @@ class DDPMScheduler(SchedulerMixin, ConfigMixin):
            self.betas = np.asarray(trained_betas)
        elif beta_schedule == "linear":
            self.betas = np.linspace(beta_start, beta_end, num_train_timesteps, dtype=np.float32)
        elif beta_schedule == "scaled_linear":
            # this schedule is very specific to the latent diffusion model.
            self.betas = np.linspace(beta_start**0.5, beta_end**0.5, num_train_timesteps, dtype=np.float32) ** 2
        elif beta_schedule == "squaredcos_cap_v2":
            # Glide cosine schedule
            self.betas = betas_for_alpha_bar(num_train_timesteps)
@@ -82,6 +85,8 @@ class DDPMScheduler(SchedulerMixin, ConfigMixin):
        self.tensor_format = tensor_format
        self.set_format(tensor_format=tensor_format)

        self.variance_type = variance_type

    def set_timesteps(self, num_inference_steps):
        num_inference_steps = min(self.config.num_train_timesteps, num_inference_steps)
        self.num_inference_steps = num_inference_steps
@@ -90,7 +95,7 @@ class DDPMScheduler(SchedulerMixin, ConfigMixin):
        )[::-1].copy()
        self.set_format(tensor_format=self.tensor_format)

    def _get_variance(self, t, variance_type=None):
    def _get_variance(self, t, predicted_variance=None, variance_type=None):
        alpha_prod_t = self.alphas_cumprod[t]
        alpha_prod_t_prev = self.alphas_cumprod[t - 1] if t > 0 else self.one

@@ -113,6 +118,13 @@ class DDPMScheduler(SchedulerMixin, ConfigMixin):
        elif variance_type == "fixed_large_log":
            # Glide max_log
            variance = self.log(self.betas[t])
        elif variance_type == "learned":
            return predicted_variance
        elif variance_type == "learned_range":
            min_log = variance
            max_log = self.betas[t]
            frac = (predicted_variance + 1) / 2
            variance = frac * max_log + (1 - frac) * min_log

        return variance

@@ -125,6 +137,12 @@ class DDPMScheduler(SchedulerMixin, ConfigMixin):
        generator=None,
    ):
        t = timestep

        if model_output.shape[1] == sample.shape[1] * 2 and self.variance_type in ["learned", "learned_range"]:
            model_output, predicted_variance = torch.split(model_output, sample.shape[1], dim=1)
        else:
            predicted_variance = None

        # 1. compute alphas, betas
        alpha_prod_t = self.alphas_cumprod[t]
        alpha_prod_t_prev = self.alphas_cumprod[t - 1] if t > 0 else self.one
@@ -155,7 +173,7 @@ class DDPMScheduler(SchedulerMixin, ConfigMixin):
        variance = 0
        if t > 0:
            noise = self.randn_like(model_output, generator=generator)
            variance = (self._get_variance(t) ** 0.5) * noise
            variance = (self._get_variance(t, predicted_variance=predicted_variance) ** 0.5) * noise

        pred_prev_sample = pred_prev_sample + variance

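With the new `learned`/`learned_range` handling, a UNet that predicts both the noise and a variance term stacks them on the channel dimension, and `step()` now splits them apart first. A shape-only sketch of that split:

```python
import torch

sample = torch.randn(1, 3, 32, 32)
model_output = torch.randn(1, 6, 32, 32)  # 2 * channels when the variance is learned

model_output, predicted_variance = torch.split(model_output, sample.shape[1], dim=1)
print(model_output.shape, predicted_variance.shape)  # torch.Size([1, 3, 32, 32]) torch.Size([1, 3, 32, 32])
```
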
127
src/diffusers/schedulers/scheduling_karras_ve.py
Normal file
127
src/diffusers/schedulers/scheduling_karras_ve.py
Normal file
@@ -0,0 +1,127 @@
|
||||
# Copyright 2022 NVIDIA and The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
# limitations under the License.


from typing import Union

import numpy as np
import torch

from ..configuration_utils import ConfigMixin, register_to_config
from .scheduling_utils import SchedulerMixin


class KarrasVeScheduler(SchedulerMixin, ConfigMixin):
    """
    Stochastic sampling from Karras et al. [1] tailored to the Variance-Expanding (VE) models [2]. Use Algorithm 2 and
    the VE column of Table 1 from [1] for reference.

    [1] Karras, Tero, et al. "Elucidating the Design Space of Diffusion-Based Generative Models."
    https://arxiv.org/abs/2206.00364 [2] Song, Yang, et al. "Score-based generative modeling through stochastic
    differential equations." https://arxiv.org/abs/2011.13456
    """

    @register_to_config
    def __init__(
        self,
        sigma_min=0.02,
        sigma_max=100,
        s_noise=1.007,
        s_churn=80,
        s_min=0.05,
        s_max=50,
        tensor_format="pt",
    ):
        """
        For more details on the parameters, see the original paper's Appendix E.: "Elucidating the Design Space of
        Diffusion-Based Generative Models." https://arxiv.org/abs/2206.00364. The grid search values used to find the
        optimal {s_noise, s_churn, s_min, s_max} for a specific model are described in Table 5 of the paper.

        Args:
            sigma_min (`float`): minimum noise magnitude
            sigma_max (`float`): maximum noise magnitude
            s_noise (`float`): the amount of additional noise to counteract loss of detail during sampling.
                A reasonable range is [1.000, 1.011].
            s_churn (`float`): the parameter controlling the overall amount of stochasticity.
                A reasonable range is [0, 100].
            s_min (`float`): the start value of the sigma range where we add noise (enable stochasticity).
                A reasonable range is [0, 10].
            s_max (`float`): the end value of the sigma range where we add noise.
                A reasonable range is [0.2, 80].
        """
        # setable values
        self.num_inference_steps = None
        self.timesteps = None
        self.schedule = None  # sigma(t_i)

        self.tensor_format = tensor_format
        self.set_format(tensor_format=tensor_format)

    def set_timesteps(self, num_inference_steps):
        self.num_inference_steps = num_inference_steps
        self.timesteps = np.arange(0, self.num_inference_steps)[::-1].copy()
        self.schedule = [
            (self.sigma_max * (self.sigma_min**2 / self.sigma_max**2) ** (i / (num_inference_steps - 1)))
            for i in self.timesteps
        ]
        self.schedule = np.array(self.schedule, dtype=np.float32)

        self.set_format(tensor_format=self.tensor_format)

    def add_noise_to_input(self, sample, sigma, generator=None):
        """
        Explicit Langevin-like "churn" step of adding noise to the sample according to a factor gamma_i ≥ 0 to reach a
        higher noise level sigma_hat = sigma_i + gamma_i*sigma_i.
        """
        if self.s_min <= sigma <= self.s_max:
            gamma = min(self.s_churn / self.num_inference_steps, 2**0.5 - 1)
        else:
            gamma = 0

        # sample eps ~ N(0, S_noise^2 * I)
        eps = self.s_noise * torch.randn(sample.shape, generator=generator).to(sample.device)
        sigma_hat = sigma + gamma * sigma
        sample_hat = sample + ((sigma_hat**2 - sigma**2) ** 0.5 * eps)

        return sample_hat, sigma_hat

    def step(
        self,
        model_output: Union[torch.FloatTensor, np.ndarray],
        sigma_hat: float,
        sigma_prev: float,
        sample_hat: Union[torch.FloatTensor, np.ndarray],
    ):
        pred_original_sample = sample_hat + sigma_hat * model_output
        derivative = (sample_hat - pred_original_sample) / sigma_hat
        sample_prev = sample_hat + (sigma_prev - sigma_hat) * derivative

        return {"prev_sample": sample_prev, "derivative": derivative}

    def step_correct(
        self,
        model_output: Union[torch.FloatTensor, np.ndarray],
        sigma_hat: float,
        sigma_prev: float,
        sample_hat: Union[torch.FloatTensor, np.ndarray],
        sample_prev: Union[torch.FloatTensor, np.ndarray],
        derivative: Union[torch.FloatTensor, np.ndarray],
    ):
        pred_original_sample = sample_prev + sigma_prev * model_output
        derivative_corr = (sample_prev - pred_original_sample) / sigma_prev
        sample_prev = sample_hat + (sigma_prev - sigma_hat) * (0.5 * derivative + 0.5 * derivative_corr)
        return {"prev_sample": sample_prev, "derivative": derivative_corr}

    def add_noise(self, original_samples, noise, timesteps):
        raise NotImplementedError()
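For orientation, here is a minimal sampling-loop sketch showing how the methods above fit together. It mirrors Algorithm 2 of Karras et al.; `denoise` is an assumed stand-in for a sigma-conditioned noise model and is not part of this diff.

import torch

scheduler = KarrasVeScheduler(tensor_format="pt")
scheduler.set_timesteps(num_inference_steps=50)

sample = torch.randn(1, 3, 256, 256) * scheduler.sigma_max

for t in scheduler.timesteps:
    sigma = scheduler.schedule[t]
    sigma_prev = scheduler.schedule[t - 1] if t > 0 else 0

    # 1. churn: raise the noise level from sigma to sigma_hat
    sample_hat, sigma_hat = scheduler.add_noise_to_input(sample, sigma)

    # 2. Euler step from sigma_hat down to sigma_prev
    model_output = denoise(sample_hat, sigma_hat)  # assumed model call
    output = scheduler.step(model_output, sigma_hat, sigma_prev, sample_hat)

    # 3. optional second-order correction, as in Algorithm 2
    if sigma_prev != 0:
        model_output = denoise(output["prev_sample"], sigma_prev)
        output = scheduler.step_correct(
            model_output, sigma_hat, sigma_prev, sample_hat,
            output["prev_sample"], output["derivative"],
        )

    sample = output["prev_sample"]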
src/diffusers/schedulers/scheduling_lms_discrete.py (new file, 134 lines)
@@ -0,0 +1,134 @@
# Copyright 2022 Katherine Crowson and The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from typing import List, Union

import numpy as np
import torch

from scipy import integrate

from ..configuration_utils import ConfigMixin, register_to_config
from .scheduling_utils import SchedulerMixin


class LMSDiscreteScheduler(SchedulerMixin, ConfigMixin):
    @register_to_config
    def __init__(
        self,
        num_train_timesteps=1000,
        beta_start=0.0001,
        beta_end=0.02,
        beta_schedule="linear",
        trained_betas=None,
        timestep_values=None,
        tensor_format="pt",
    ):
        """
        Linear Multistep Scheduler for discrete beta schedules. Based on the original k-diffusion implementation by
        Katherine Crowson:
        https://github.com/crowsonkb/k-diffusion/blob/481677d114f6ea445aa009cf5bd7a9cdee909e47/k_diffusion/sampling.py#L181
        """

        if beta_schedule == "linear":
            self.betas = np.linspace(beta_start, beta_end, num_train_timesteps, dtype=np.float32)
        elif beta_schedule == "scaled_linear":
            # this schedule is very specific to the latent diffusion model.
            self.betas = np.linspace(beta_start**0.5, beta_end**0.5, num_train_timesteps, dtype=np.float32) ** 2
        else:
            raise NotImplementedError(f"{beta_schedule} is not implemented for {self.__class__}")

        self.alphas = 1.0 - self.betas
        self.alphas_cumprod = np.cumprod(self.alphas, axis=0)

        self.sigmas = ((1 - self.alphas_cumprod) / self.alphas_cumprod) ** 0.5

        # setable values
        self.num_inference_steps = None
        self.timesteps = np.arange(0, num_train_timesteps)[::-1].copy()
        self.derivatives = []

        self.tensor_format = tensor_format
        self.set_format(tensor_format=tensor_format)
    def get_lms_coefficient(self, order, t, current_order):
        """
        Compute a linear multistep coefficient
        """

        def lms_derivative(tau):
            # Lagrange basis polynomial for the node at t - current_order,
            # evaluated at tau over the last `order` sigma values
            prod = 1.0
            for k in range(order):
                if current_order == k:
                    continue
                prod *= (tau - self.sigmas[t - k]) / (self.sigmas[t - current_order] - self.sigmas[t - k])
            return prod

        # integrate the basis polynomial from sigma_t to sigma_{t+1} to obtain the multistep coefficient
        integrated_coeff = integrate.quad(lms_derivative, self.sigmas[t], self.sigmas[t + 1], epsrel=1e-4)[0]

        return integrated_coeff
    def set_timesteps(self, num_inference_steps):
        self.num_inference_steps = num_inference_steps
        self.timesteps = np.linspace(self.num_train_timesteps - 1, 0, num_inference_steps, dtype=float)

        low_idx = np.floor(self.timesteps).astype(int)
        high_idx = np.ceil(self.timesteps).astype(int)
        frac = np.mod(self.timesteps, 1.0)
        sigmas = np.array(((1 - self.alphas_cumprod) / self.alphas_cumprod) ** 0.5)
        sigmas = (1 - frac) * sigmas[low_idx] + frac * sigmas[high_idx]
        self.sigmas = np.concatenate([sigmas, [0.0]])

        self.derivatives = []

        self.set_format(tensor_format=self.tensor_format)

    def step(
        self,
        model_output: Union[torch.FloatTensor, np.ndarray],
        timestep: int,
        sample: Union[torch.FloatTensor, np.ndarray],
        order: int = 4,
    ):
        sigma = self.sigmas[timestep]

        # 1. compute predicted original sample (x_0) from sigma-scaled predicted noise
        pred_original_sample = sample - sigma * model_output

        # 2. Convert to an ODE derivative
        derivative = (sample - pred_original_sample) / sigma
        self.derivatives.append(derivative)
        if len(self.derivatives) > order:
            self.derivatives.pop(0)

        # 3. Compute linear multistep coefficients
        order = min(timestep + 1, order)
        lms_coeffs = [self.get_lms_coefficient(order, timestep, curr_order) for curr_order in range(order)]

        # 4. Compute previous sample based on the derivatives path
        prev_sample = sample + sum(
            coeff * derivative for coeff, derivative in zip(lms_coeffs, reversed(self.derivatives))
        )

        return {"prev_sample": prev_sample}

    def add_noise(self, original_samples, noise, timesteps):
        alpha_prod = self.alphas_cumprod[timesteps]
        alpha_prod = self.match_shape(alpha_prod, original_samples)

        noisy_samples = (alpha_prod**0.5) * original_samples + ((1 - alpha_prod) ** 0.5) * noise
        return noisy_samples

    def __len__(self):
        return self.config.num_train_timesteps
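A minimal sketch of the loop this scheduler expects (illustrative only; `unet` stands in for an assumed noise-prediction model). Note that `step` is indexed by the loop counter, not by the fractional timestep value passed to the model.

import torch

scheduler = LMSDiscreteScheduler(tensor_format="pt")
scheduler.set_timesteps(num_inference_steps=50)

sample = torch.randn(1, 4, 64, 64) * scheduler.sigmas[0]  # start at the largest sigma

for i, t in enumerate(scheduler.timesteps):
    noise_pred = unet(sample, t)  # assumed model call
    sample = scheduler.step(noise_pred, i, sample, order=4)["prev_sample"]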
@@ -56,10 +56,14 @@ class PNDMScheduler(SchedulerMixin, ConfigMixin):
        beta_end=0.02,
        beta_schedule="linear",
        tensor_format="pt",
        skip_prk_steps=False,
    ):

        if beta_schedule == "linear":
            self.betas = np.linspace(beta_start, beta_end, num_train_timesteps, dtype=np.float32)
        elif beta_schedule == "scaled_linear":
            # this schedule is very specific to the latent diffusion model.
            self.betas = np.linspace(beta_start**0.5, beta_end**0.5, num_train_timesteps, dtype=np.float32) ** 2
        elif beta_schedule == "squaredcos_cap_v2":
            # Glide cosine schedule
            self.betas = betas_for_alpha_bar(num_train_timesteps)
@@ -85,6 +89,7 @@ class PNDMScheduler(SchedulerMixin, ConfigMixin):
        # setable values
        self.num_inference_steps = None
        self._timesteps = np.arange(0, num_train_timesteps)[::-1].copy()
        self._offset = 0
        self.prk_timesteps = None
        self.plms_timesteps = None
        self.timesteps = None
@@ -92,19 +97,30 @@ class PNDMScheduler(SchedulerMixin, ConfigMixin):
        self.tensor_format = tensor_format
        self.set_format(tensor_format=tensor_format)

    def set_timesteps(self, num_inference_steps):
    def set_timesteps(self, num_inference_steps, offset=0):
        self.num_inference_steps = num_inference_steps
        self._timesteps = list(
            range(0, self.config.num_train_timesteps, self.config.num_train_timesteps // num_inference_steps)
        )
        self._offset = offset
        self._timesteps = [t + self._offset for t in self._timesteps]

        if self.config.skip_prk_steps:
            # for some models like stable diffusion the prk steps can/should be skipped to
            # produce better results. When using PNDM with `self.config.skip_prk_steps` the implementation
            # is based on crowsonkb's PLMS sampler implementation: https://github.com/CompVis/latent-diffusion/pull/51
            self.prk_timesteps = []
            self.plms_timesteps = list(reversed(self._timesteps[:-1] + self._timesteps[-2:-1] + self._timesteps[-1:]))
        else:
            prk_timesteps = np.array(self._timesteps[-self.pndm_order :]).repeat(2) + np.tile(
                np.array([0, self.config.num_train_timesteps // num_inference_steps // 2]), self.pndm_order
            )
            self.prk_timesteps = list(reversed(prk_timesteps[:-1].repeat(2)[1:-1]))
            self.plms_timesteps = list(reversed(self._timesteps[:-3]))

        prk_timesteps = np.array(self._timesteps[-self.pndm_order :]).repeat(2) + np.tile(
            np.array([0, self.config.num_train_timesteps // num_inference_steps // 2]), self.pndm_order
        )
        self.prk_timesteps = list(reversed(prk_timesteps[:-1].repeat(2)[1:-1]))
        self.plms_timesteps = list(reversed(self._timesteps[:-3]))
        self.timesteps = self.prk_timesteps + self.plms_timesteps

        self.ets = []
        self.counter = 0
        self.set_format(tensor_format=self.tensor_format)
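To make the new `offset` and `skip_prk_steps` behaviour concrete, here is a small worked example with the default num_train_timesteps=1000 (an illustration, not code from this diff):

scheduler = PNDMScheduler(skip_prk_steps=True)
scheduler.set_timesteps(num_inference_steps=50, offset=1)

# _timesteps = [1, 21, 41, ..., 981]   (range(0, 1000, 20) shifted by offset=1)
# with skip_prk_steps=True no Runge-Kutta warm-up steps are generated, and the
# PLMS schedule repeats the second-to-last entry once, so the first entries of
# scheduler.timesteps are 981, 961, 961, 941, ... counting down to 1.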
@@ -114,7 +130,7 @@ class PNDMScheduler(SchedulerMixin, ConfigMixin):
        timestep: int,
        sample: Union[torch.FloatTensor, np.ndarray],
    ):
        if self.counter < len(self.prk_timesteps):
        if self.counter < len(self.prk_timesteps) and not self.config.skip_prk_steps:
            return self.step_prk(model_output=model_output, timestep=timestep, sample=sample)
        else:
            return self.step_plms(model_output=model_output, timestep=timestep, sample=sample)
@@ -163,7 +179,7 @@ class PNDMScheduler(SchedulerMixin, ConfigMixin):
        Step function propagating the sample with the linear multi-step method. This has one forward pass with multiple
        times to approximate the solution.
        """
        if len(self.ets) < 3:
        if not self.config.skip_prk_steps and len(self.ets) < 3:
            raise ValueError(
                f"{self.__class__} can only be run AFTER scheduler has been run "
                "in 'prk' mode for at least 12 iterations "
@@ -172,9 +188,26 @@ class PNDMScheduler(SchedulerMixin, ConfigMixin):
            )

        prev_timestep = max(timestep - self.config.num_train_timesteps // self.num_inference_steps, 0)
        self.ets.append(model_output)

        model_output = (1 / 24) * (55 * self.ets[-1] - 59 * self.ets[-2] + 37 * self.ets[-3] - 9 * self.ets[-4])
        if self.counter != 1:
            self.ets.append(model_output)
        else:
            prev_timestep = timestep
            timestep = timestep + self.config.num_train_timesteps // self.num_inference_steps

        if len(self.ets) == 1 and self.counter == 0:
            model_output = model_output
            self.cur_sample = sample
        elif len(self.ets) == 1 and self.counter == 1:
            model_output = (model_output + self.ets[-1]) / 2
            sample = self.cur_sample
            self.cur_sample = None
        elif len(self.ets) == 2:
            model_output = (3 * self.ets[-1] - self.ets[-2]) / 2
        elif len(self.ets) == 3:
            model_output = (23 * self.ets[-1] - 16 * self.ets[-2] + 5 * self.ets[-3]) / 12
        else:
            model_output = (1 / 24) * (55 * self.ets[-1] - 59 * self.ets[-2] + 37 * self.ets[-3] - 9 * self.ets[-4])

        prev_sample = self._get_prev_sample(sample, timestep, prev_timestep, model_output)
        self.counter += 1
@@ -194,8 +227,8 @@ class PNDMScheduler(SchedulerMixin, ConfigMixin):
        # sample -> x_t
        # model_output -> e_θ(x_t, t)
        # prev_sample -> x_(t−δ)
        alpha_prod_t = self.alphas_cumprod[timestep + 1]
        alpha_prod_t_prev = self.alphas_cumprod[timestep_prev + 1]
        alpha_prod_t = self.alphas_cumprod[timestep + 1 - self._offset]
        alpha_prod_t_prev = self.alphas_cumprod[timestep_prev + 1 - self._offset]
        beta_prod_t = 1 - alpha_prod_t
        beta_prod_t_prev = 1 - alpha_prod_t_prev
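For reference, the branch ladder added to `step_plms` above is the standard Adams–Bashforth family of orders 1–4 applied to the stored noise predictions $e_{t-i}$ in `self.ets`:

$$\bar e_t \in \left\{\, e_t,\ \tfrac{1}{2}\big(3e_t - e_{t-1}\big),\ \tfrac{1}{12}\big(23e_t - 16e_{t-1} + 5e_{t-2}\big),\ \tfrac{1}{24}\big(55e_t - 59e_{t-1} + 37e_{t-2} - 9e_{t-3}\big) \,\right\}$$

The blended estimate $\bar e_t$ is what `_get_prev_sample` consumes, and the last hunk shifts its `alphas_cumprod` lookups by `self._offset` so the indexing stays consistent with offset timesteps.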
@@ -69,6 +69,14 @@ except importlib_metadata.PackageNotFoundError:
    _modelcards_available = False


_scipy_available = importlib.util.find_spec("scipy") is not None
try:
    _scipy_version = importlib_metadata.version("scipy")
    logger.debug(f"Successfully imported scipy version {_scipy_version}")
except importlib_metadata.PackageNotFoundError:
    _scipy_available = False


def is_transformers_available():
    return _transformers_available
@@ -85,6 +93,10 @@ def is_modelcards_available():
    return _modelcards_available


def is_scipy_available():
    return _scipy_available


class RepositoryNotFoundError(HTTPError):
    """
    Raised when trying to access a hf.co URL with an invalid repository name, or with a private repo name the user does
@@ -118,11 +130,18 @@ inflect`
"""


SCIPY_IMPORT_ERROR = """
{0} requires the scipy library but it was not found in your environment. You can install it with pip: `pip install
scipy`
"""


BACKENDS_MAPPING = OrderedDict(
    [
        ("transformers", (is_transformers_available, TRANSFORMERS_IMPORT_ERROR)),
        ("unidecode", (is_unidecode_available, UNIDECODE_IMPORT_ERROR)),
        ("inflect", (is_inflect_available, INFLECT_IMPORT_ERROR)),
        ("scipy", (is_scipy_available, SCIPY_IMPORT_ERROR)),
    ]
)
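The new scipy hooks follow the same soft-dependency pattern as the existing backends. A rough sketch of how downstream code can guard on it (the `diffusers.utils` re-export of `is_scipy_available` is assumed here):

from diffusers.utils import is_scipy_available  # assumed re-export of the helper defined above

if is_scipy_available():
    from diffusers import LMSDiscreteScheduler
    scheduler = LMSDiscreteScheduler(tensor_format="pt")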
src/diffusers/utils/dummy_scipy_objects.py (new file, 24 lines)
@@ -0,0 +1,24 @@
# This file is autogenerated by the command `make fix-copies`, do not edit.
# flake8: noqa
from ..utils import DummyObject, requires_backends


class LMSDiscreteScheduler(metaclass=DummyObject):
    _backends = ["scipy"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["scipy"])


class LDMTextToImagePipeline(metaclass=DummyObject):
    _backends = ["scipy"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["scipy"])


class StableDiffusionPipeline(metaclass=DummyObject):
    _backends = ["scipy"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["scipy"])
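These dummies keep `from diffusers import LMSDiscreteScheduler` importable without scipy and defer the failure to first use, where `requires_backends` raises with the SCIPY_IMPORT_ERROR message. Roughly (assuming scipy is not installed):

from diffusers import LMSDiscreteScheduler  # resolves to the dummy class above

try:
    LMSDiscreteScheduler(tensor_format="pt")
except ImportError as err:
    print(err)  # "... requires the scipy library ... pip install scipy"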
@@ -8,3 +8,10 @@ class LDMTextToImagePipeline(metaclass=DummyObject):

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["transformers"])


class StableDiffusionPipeline(metaclass=DummyObject):
    _backends = ["transformers"]

    def __init__(self, *args, **kwargs):
        requires_backends(self, ["transformers"])
@@ -29,12 +29,16 @@ from diffusers import (
    DDIMScheduler,
    DDPMPipeline,
    DDPMScheduler,
    KarrasVePipeline,
    KarrasVeScheduler,
    LDMPipeline,
    LDMTextToImagePipeline,
    LMSDiscreteScheduler,
    PNDMPipeline,
    PNDMScheduler,
    ScoreSdeVePipeline,
    ScoreSdeVeScheduler,
    StableDiffusionPipeline,
    UNet2DModel,
    VQModel,
)
@@ -555,11 +559,11 @@ class VQModelTests(ModelTesterMixin, unittest.TestCase):

    def prepare_init_args_and_inputs_for_common(self):
        init_dict = {
            "block_out_channels": [64],
            "block_out_channels": [32, 64],
            "in_channels": 3,
            "out_channels": 3,
            "down_block_types": ["DownEncoderBlock2D"],
            "up_block_types": ["UpDecoderBlock2D"],
            "down_block_types": ["DownEncoderBlock2D", "DownEncoderBlock2D"],
            "up_block_types": ["UpDecoderBlock2D", "UpDecoderBlock2D"],
            "latent_channels": 3,
        }
        inputs_dict = self.dummy_input
@@ -595,7 +599,7 @@ class VQModelTests(ModelTesterMixin, unittest.TestCase):

        output_slice = output[0, -1, -3:, -3:].flatten()
        # fmt: off
        expected_output_slice = torch.tensor([-1.1321, 0.1056, 0.3505, -0.6461, -0.2014, 0.0419, -0.5763, -0.8462, -0.4218])
        expected_output_slice = torch.tensor([-0.0153, -0.4044, -0.1880, -0.5161, -0.2418, -0.4072, -0.1612, -0.0633, -0.0143])
        # fmt: on
        self.assertTrue(torch.allclose(output_slice, expected_output_slice, rtol=1e-2))
@@ -623,22 +627,11 @@ class AutoencoderKLTests(ModelTesterMixin, unittest.TestCase):

    def prepare_init_args_and_inputs_for_common(self):
        init_dict = {
            "ch": 64,
            "ch_mult": (1,),
            "embed_dim": 4,
            "in_channels": 3,
            "attn_resolutions": [],
            "num_res_blocks": 1,
            "out_ch": 3,
            "resolution": 32,
            "z_channels": 4,
        }
        init_dict = {
            "block_out_channels": [64],
            "block_out_channels": [32, 64],
            "in_channels": 3,
            "out_channels": 3,
            "down_block_types": ["DownEncoderBlock2D"],
            "up_block_types": ["UpDecoderBlock2D"],
            "down_block_types": ["DownEncoderBlock2D", "DownEncoderBlock2D"],
            "up_block_types": ["UpDecoderBlock2D", "UpDecoderBlock2D"],
            "latent_channels": 4,
        }
        inputs_dict = self.dummy_input
@@ -674,7 +667,7 @@ class AutoencoderKLTests(ModelTesterMixin, unittest.TestCase):

        output_slice = output[0, -1, -3:, -3:].flatten()
        # fmt: off
        expected_output_slice = torch.tensor([-0.3900, -0.2800, 0.1281, -0.4449, -0.4890, -0.0207, 0.0784, -0.1258, -0.0409])
        expected_output_slice = torch.tensor([-4.0078e-01, -3.8304e-04, -1.2681e-01, -1.1462e-01, 2.0095e-01, 1.0893e-01, -8.8248e-02, -3.0361e-01, -9.8646e-03])
        # fmt: on
        self.assertTrue(torch.allclose(output_slice, expected_output_slice, rtol=1e-2))
@@ -725,6 +718,28 @@ class PipelineTesterMixin(unittest.TestCase):

        assert np.abs(image - new_image).sum() < 1e-5, "Models don't give the same forward pass"

    @slow
    def test_from_pretrained_hub_pass_model(self):
        model_path = "google/ddpm-cifar10-32"

        # pass unet into DiffusionPipeline
        unet = UNet2DModel.from_pretrained(model_path)
        ddpm_from_hub_custom_model = DDPMPipeline.from_pretrained(model_path, unet=unet)
        ddpm_from_hub_custom_model = DiffusionPipeline.from_pretrained(model_path, unet=unet)

        ddpm_from_hub = DiffusionPipeline.from_pretrained(model_path)

        ddpm_from_hub_custom_model.scheduler.num_timesteps = 10
        ddpm_from_hub.scheduler.num_timesteps = 10

        generator = torch.manual_seed(0)

        image = ddpm_from_hub_custom_model(generator=generator, output_type="numpy")["sample"]
        generator = generator.manual_seed(0)
        new_image = ddpm_from_hub(generator=generator, output_type="numpy")["sample"]

        assert np.abs(image - new_image).sum() < 1e-5, "Models don't give the same forward pass"

    @slow
    def test_output_format(self):
        model_path = "google/ddpm-cifar10-32"
@@ -848,6 +863,54 @@ class PipelineTesterMixin(unittest.TestCase):
        expected_slice = np.array([0.3163, 0.8670, 0.6465, 0.1865, 0.6291, 0.5139, 0.2824, 0.3723, 0.4344])
        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2

    @slow
    @unittest.skipIf(torch_device == "cpu", "Stable diffusion is supposed to run on GPU")
    def test_stable_diffusion(self):
        # make sure here that pndm scheduler skips prk
        sd_pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-1-diffusers")

        prompt = "A painting of a squirrel eating a burger"
        generator = torch.Generator(device=torch_device).manual_seed(0)
        with torch.autocast("cuda"):
            output = sd_pipe(
                [prompt], generator=generator, guidance_scale=6.0, num_inference_steps=20, output_type="np"
            )

        image = output["sample"]

        image_slice = image[0, -3:, -3:, -1]

        assert image.shape == (1, 512, 512, 3)
        expected_slice = np.array([0.8887, 0.915, 0.91, 0.894, 0.909, 0.912, 0.919, 0.925, 0.883])
        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2

    @slow
    @unittest.skipIf(torch_device == "cpu", "Stable diffusion is supposed to run on GPU")
    def test_stable_diffusion_fast_ddim(self):
        sd_pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-1-diffusers")

        scheduler = DDIMScheduler(
            beta_start=0.00085,
            beta_end=0.012,
            beta_schedule="scaled_linear",
            clip_sample=False,
            set_alpha_to_one=False,
        )
        sd_pipe.scheduler = scheduler

        prompt = "A painting of a squirrel eating a burger"
        generator = torch.Generator(device=torch_device).manual_seed(0)

        with torch.autocast("cuda"):
            output = sd_pipe([prompt], generator=generator, num_inference_steps=2, output_type="numpy")
        image = output["sample"]

        image_slice = image[0, -3:, -3:, -1]

        assert image.shape == (1, 512, 512, 3)
        expected_slice = np.array([0.8354, 0.83, 0.866, 0.838, 0.8315, 0.867, 0.836, 0.8584, 0.869])
        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-3

    @slow
    def test_score_sde_ve_pipeline(self):
        model_id = "google/ncsnpp-church-256"
@@ -863,6 +926,7 @@ class PipelineTesterMixin(unittest.TestCase):
        image_slice = image[0, -3:, -3:, -1]

        assert image.shape == (1, 256, 256, 3)

        expected_slice = np.array([0.64363, 0.5868, 0.3031, 0.2284, 0.7409, 0.3216, 0.25643, 0.6557, 0.2633])
        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2
@@ -920,3 +984,38 @@ class PipelineTesterMixin(unittest.TestCase):

        # the values aren't exactly equal, but the images look the same visually
        assert np.abs(ddpm_images - ddim_images).max() < 1e-1

    @slow
    def test_karras_ve_pipeline(self):
        model_id = "google/ncsnpp-celebahq-256"
        model = UNet2DModel.from_pretrained(model_id)
        scheduler = KarrasVeScheduler(tensor_format="pt")

        pipe = KarrasVePipeline(unet=model, scheduler=scheduler)

        generator = torch.manual_seed(0)
        image = pipe(num_inference_steps=20, generator=generator, output_type="numpy")["sample"]

        image_slice = image[0, -3:, -3:, -1]
        assert image.shape == (1, 256, 256, 3)
        expected_slice = np.array([0.26815, 0.1581, 0.2658, 0.23248, 0.1550, 0.2539, 0.1131, 0.1024, 0.0837])
        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2

    @slow
    @unittest.skipIf(torch_device == "cpu", "Stable diffusion is supposed to run on GPU")
    def test_lms_stable_diffusion_pipeline(self):
        model_id = "CompVis/stable-diffusion-v1-1-diffusers"
        pipe = StableDiffusionPipeline.from_pretrained(model_id, use_auth_token=True)
        scheduler = LMSDiscreteScheduler.from_config(model_id, subfolder="scheduler", use_auth_token=True)
        pipe.scheduler = scheduler

        prompt = "a photograph of an astronaut riding a horse"
        generator = torch.Generator(device=torch_device).manual_seed(0)
        image = pipe([prompt], generator=generator, guidance_scale=7.5, num_inference_steps=10, output_type="numpy")[
            "sample"
        ]

        image_slice = image[0, -3:, -3:, -1]
        assert image.shape == (1, 512, 512, 3)
        expected_slice = np.array([0.9077, 0.9254, 0.9181, 0.9227, 0.9213, 0.9367, 0.9399, 0.9406, 0.9024])
        assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2