# Pipelines

Pipelines provide a simple way to run state-of-the-art diffusion models for inference by bundling all of the necessary components (multiple independently trained models, schedulers, and processors) into a single end-to-end class. Pipelines are flexible and can be adapted to use different schedulers or even different model components.

All pipelines are built from the base [`DiffusionPipeline`] class, which provides basic functionality for downloading, loading, and saving all of the components. The specific pipeline type (for example, [`StableDiffusionPipeline`]) loaded with [`~DiffusionPipeline.from_pretrained`] is automatically detected, and the pipeline components are loaded and passed to the pipeline's `__init__` function.
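As a minimal sketch of this workflow (the checkpoint name and scheduler choice here are illustrative, not prescriptive):

```py
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

# Download and load every component (models, scheduler, processors) in one call;
# "runwayml/stable-diffusion-v1-5" is an illustrative checkpoint
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipeline.to("cuda")

# Pipelines are flexible: swap in a different scheduler built from the same config
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)

image = pipeline("An astronaut riding a horse on Mars").images[0]
```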

You shouldn't use the [`DiffusionPipeline`] class for training. The individual components of a diffusion pipeline (for example, [`UNet2DModel`] and [`UNet2DConditionModel`]) are usually trained individually, so we suggest working with them directly instead.


Pipelines do not offer any training functionality. PyTorch's autograd is disabled by decorating the [`~DiffusionPipeline.__call__`] method with a `torch.no_grad` decorator because pipelines should not be used for training. If you're interested in training, please take a look at the Training guides instead!
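If you do want to train, load a component directly; a minimal sketch (the checkpoint is illustrative):

```py
from diffusers import UNet2DModel

# Individual components are plain torch.nn.Module subclasses, so autograd
# works as usual when they are used outside a pipeline's __call__;
# "google/ddpm-cat-256" is an illustrative checkpoint
model = UNet2DModel.from_pretrained("google/ddpm-cat-256")
model.train()
```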

The table below lists all the pipelines currently available in 🤗 Diffusers and the tasks they support. Each pipeline's documentation page includes its abstract and a link to the published paper.

| Pipeline | Tasks |
|---|---|
| AltDiffusion | image2image |
| AnimateDiff | text2video |
| Attend-and-Excite | text2image |
| Audio Diffusion | image2audio |
| AudioLDM | text2audio |
| AudioLDM2 | text2audio |
| BLIP Diffusion | text2image |
| Consistency Models | unconditional image generation |
| ControlNet | text2image, image2image, inpainting |
| ControlNet with Stable Diffusion XL | text2image |
| ControlNet-XS | text2image |
| ControlNet-XS with Stable Diffusion XL | text2image |
| Cycle Diffusion | image2image |
| Dance Diffusion | unconditional audio generation |
| DDIM | unconditional image generation |
| DDPM | unconditional image generation |
| DeepFloyd IF | text2image, image2image, inpainting, super-resolution |
| DiffEdit | inpainting |
| DiT | text2image |
| GLIGEN | text2image |
| InstructPix2Pix | image editing |
| Kandinsky 2.1 | text2image, image2image, inpainting, interpolation |
| Kandinsky 2.2 | text2image, image2image, inpainting |
| Kandinsky 3 | text2image, image2image |
| Latent Consistency Models | text2image |
| Latent Diffusion | text2image, super-resolution |
| LDM3D | text2image, text-to-3D, text-to-pano, upscaling |
| LEDITS++ | image editing |
| MultiDiffusion | text2image |
| MusicLDM | text2audio |
| Paint by Example | inpainting |
| ParaDiGMS | text2image |
| Pix2Pix Zero | image editing |
| PixArt-α | text2image |
| PNDM | unconditional image generation |
| RePaint | inpainting |
| Score SDE VE | unconditional image generation |
| Self-Attention Guidance | text2image |
| Semantic Guidance | text2image |
| Shap-E | text-to-3D, image-to-3D |
| Spectrogram Diffusion | |
| Stable Audio | text2audio |
| Stable Diffusion | text2image, image2image, depth2image, inpainting, image variation, latent upscaler, super-resolution |
| Stable Diffusion Model Editing | model editing |
| Stable Diffusion XL | text2image, image2image, inpainting |
| Stable Diffusion XL Turbo | text2image, image2image, inpainting |
| Stable unCLIP | text2image, image variation |
| Stochastic Karras VE | unconditional image generation |
| T2I-Adapter | text2image |
| Text2Video | text2video, video2video |
| Text2Video-Zero | text2video |
| unCLIP | text2image, image variation |
| Unconditional Latent Diffusion | unconditional image generation |
| UniDiffuser | text2image, image2text, image variation, text variation, unconditional image generation, unconditional audio generation |
| Value-guided planning | value guided sampling |
| Versatile Diffusion | text2image, image variation |
| VQ Diffusion | text2image |
| Wuerstchen | text2image |

## DiffusionPipeline

[[autodoc]] DiffusionPipeline
	- all
	- __call__
	- device
	- to
	- components
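The `components` property makes it easy to reuse already-loaded weights in another pipeline type without downloading or loading them a second time. A minimal sketch (the checkpoint name is illustrative):

```py
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

# Load a text-to-image pipeline once
text2img = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Construct an image-to-image pipeline from the same components
img2img = StableDiffusionImg2ImgPipeline(**text2img.components)
```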

[[autodoc]] pipelines.StableDiffusionMixin.enable_freeu

[[autodoc]] pipelines.StableDiffusionMixin.disable_freeu
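As a usage sketch, FreeU can be toggled on a loaded pipeline; the scaling factors below are the values commonly suggested for Stable Diffusion v1.5, used here as an assumption rather than a universal recommendation:

```py
# Assuming `pipeline` is a loaded Stable Diffusion pipeline (see above);
# s1/s2 dampen skip connections, b1/b2 amplify backbone features
pipeline.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)
image = pipeline("A squirrel eating a burger").images[0]

# Restore the original behavior
pipeline.disable_freeu()
```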

## FlaxDiffusionPipeline

[[autodoc]] pipelines.pipeline_flax_utils.FlaxDiffusionPipeline
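A minimal loading sketch for the Flax variant; unlike the PyTorch class, Flax `from_pretrained` returns the pipeline and its parameters separately (the checkpoint and `bf16` revision here are illustrative):

```py
import jax.numpy as jnp
from diffusers import FlaxStableDiffusionPipeline

# Flax pipelines return model parameters alongside the pipeline object;
# "bf16" is an illustrative half-precision branch of this checkpoint
pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", revision="bf16", dtype=jnp.bfloat16
)
```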

## PushToHubMixin

[[autodoc]] utils.PushToHubMixin
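For example, a loaded pipeline, model, or scheduler can be uploaded to the Hub with a single call; the repository id below is hypothetical:

```py
# Pushes all pipeline components and an auto-generated model card to the Hub;
# "your-username/my-sd-pipeline" is a hypothetical repository id
pipeline.push_to_hub("your-username/my-sd-pipeline")
```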