# Pipelines
Pipelines provide a simple way to run state-of-the-art diffusion models in inference by bundling all of the necessary components (multiple independently-trained models, schedulers, and processors) into a single end-to-end class. Pipelines are flexible and they can be adapted to use different schedulers or even model components.
All pipelines are built from the base [DiffusionPipeline] class which provides basic functionality for loading, downloading, and saving all the components. Specific pipeline types (for example [StableDiffusionPipeline]) loaded with [~DiffusionPipeline.from_pretrained] are automatically detected and the pipeline components are loaded and passed to the __init__ function of the pipeline.
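For example, loading a Stable Diffusion checkpoint with [~DiffusionPipeline.from_pretrained] and swapping its scheduler takes only a few lines. A minimal sketch (the checkpoint id and prompt are just illustrations):

```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

# the concrete pipeline class (here StableDiffusionPipeline) is detected
# automatically from the checkpoint's model_index.json
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# individual components can be swapped, for example the scheduler
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)

image = pipeline("a photo of an astronaut riding a horse").images[0]
```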
You shouldn't use the [DiffusionPipeline] class for training. Individual components (for example, [UNet2DModel] and [UNet2DConditionModel]) of diffusion pipelines are usually trained individually, so we suggest directly working with them instead.
Pipelines do not offer any training functionality; you'll notice PyTorch's autograd is disabled by decorating the [~DiffusionPipeline.__call__] method with a torch.no_grad decorator. If you're interested in training, please take a look at the Training guides instead!
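If you do want to train, load the components directly instead of going through a pipeline. A minimal sketch (the checkpoint id is a placeholder):

```python
from diffusers import UNet2DConditionModel, AutoencoderKL

# load the denoiser and the VAE as standalone modules; unlike a pipeline's
# __call__, these support autograd and can be put into training mode
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
)
unet.train()
```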
The table below lists all the pipelines currently available in 🤗 Diffusers and the tasks they support. Click on a pipeline to view its abstract and published paper.
| Pipeline | Tasks |
|---|---|
| AltDiffusion | image2image |
| AnimateDiff | text2video |
| Attend-and-Excite | text2image |
| Audio Diffusion | image2audio |
| AudioLDM | text2audio |
| AudioLDM2 | text2audio |
| BLIP Diffusion | text2image |
| Consistency Models | unconditional image generation |
| ControlNet | text2image, image2image, inpainting |
| ControlNet with Stable Diffusion XL | text2image |
| ControlNet-XS | text2image |
| ControlNet-XS with Stable Diffusion XL | text2image |
| Cycle Diffusion | image2image |
| Dance Diffusion | unconditional audio generation |
| DDIM | unconditional image generation |
| DDPM | unconditional image generation |
| DeepFloyd IF | text2image, image2image, inpainting, super-resolution |
| DiffEdit | inpainting |
| DiT | text2image |
| GLIGEN | text2image |
| InstructPix2Pix | image editing |
| Kandinsky 2.1 | text2image, image2image, inpainting, interpolation |
| Kandinsky 2.2 | text2image, image2image, inpainting |
| Kandinsky 3 | text2image, image2image |
| Latent Consistency Models | text2image |
| Latent Diffusion | text2image, super-resolution |
| LDM3D | text2image, text-to-3D, text-to-pano, upscaling |
| LEDITS++ | image editing |
| MultiDiffusion | text2image |
| MusicLDM | text2audio |
| Paint by Example | inpainting |
| ParaDiGMS | text2image |
| Pix2Pix Zero | image editing |
| PixArt-α | text2image |
| PNDM | unconditional image generation |
| RePaint | inpainting |
| Score SDE VE | unconditional image generation |
| Self-Attention Guidance | text2image |
| Semantic Guidance | text2image |
| Shap-E | text-to-3D, image-to-3D |
| Spectrogram Diffusion | MIDI-to-audio |
| Stable Audio | text2audio |
| Stable Diffusion | text2image, image2image, depth2image, inpainting, image variation, latent upscaler, super-resolution |
| Stable Diffusion Model Editing | model editing |
| Stable Diffusion XL | text2image, image2image, inpainting |
| Stable Diffusion XL Turbo | text2image, image2image, inpainting |
| Stable unCLIP | text2image, image variation |
| Stochastic Karras VE | unconditional image generation |
| T2I-Adapter | text2image |
| Text2Video | text2video, video2video |
| Text2Video-Zero | text2video |
| unCLIP | text2image, image variation |
| Unconditional Latent Diffusion | unconditional image generation |
| UniDiffuser | text2image, image2text, image variation, text variation, unconditional image generation, unconditional audio generation |
| Value-guided planning | value guided sampling |
| Versatile Diffusion | text2image, image variation |
| VQ Diffusion | text2image |
| Wuerstchen | text2image |
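For the most common tasks in the table (text2image, image2image, inpainting), the AutoPipeline classes can resolve a checkpoint to the matching pipeline class for you. A minimal sketch:

```python
from diffusers import AutoPipelineForText2Image

# the checkpoint's metadata determines the concrete class that is returned,
# e.g. an SDXL checkpoint resolves to StableDiffusionXLPipeline
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0"
)
print(type(pipeline).__name__)  # StableDiffusionXLPipeline
```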
## DiffusionPipeline

[[autodoc]] DiffusionPipeline
	- all
	- __call__
	- device
	- to
	- components
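The components property is handy for reusing already-loaded weights across tasks. A minimal sketch:

```python
from diffusers import DiffusionPipeline, StableDiffusionImg2ImgPipeline

text2img = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# build an image-to-image pipeline from the same models without
# downloading or allocating the weights a second time
img2img = StableDiffusionImg2ImgPipeline(**text2img.components)
```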
[[autodoc]] pipelines.StableDiffusionMixin.enable_freeu

[[autodoc]] pipelines.StableDiffusionMixin.disable_freeu
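FreeU reweights the UNet's skip and backbone features at inference time and is toggled on a loaded pipeline. A minimal sketch (the s1/s2/b1/b2 values below are the ones commonly suggested for Stable Diffusion 1.x models):

```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# scaling factors for the two skip (s) and backbone (b) stages
pipeline.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)
image = pipeline("an astronaut in a jungle").images[0]

# restore the default behavior
pipeline.disable_freeu()
```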
## FlaxDiffusionPipeline

[[autodoc]] pipelines.pipeline_flax_utils.FlaxDiffusionPipeline
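In Flax, from_pretrained returns the pipeline and its parameters separately, and inference is typically parallelized across devices. A minimal sketch, assuming JAX is set up with one or more accelerators:

```python
import jax
import jax.numpy as jnp
from flax.jax_utils import replicate
from flax.training.common_utils import shard
from diffusers import FlaxStableDiffusionPipeline

# Flax keeps the parameters separate from the pipeline object
pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", revision="bf16", dtype=jnp.bfloat16
)

prompts = ["a photo of an astronaut riding a horse"] * jax.device_count()
prompt_ids = pipeline.prepare_inputs(prompts)

# replicate the parameters and shard the inputs so each device gets a slice
params = replicate(params)
prompt_ids = shard(prompt_ids)
rng = jax.random.split(jax.random.PRNGKey(0), jax.device_count())

images = pipeline(prompt_ids, params, rng, jit=True).images
```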
## PushToHubMixin

[[autodoc]] utils.PushToHubMixin
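Any pipeline, model, or scheduler inheriting this mixin can be uploaded to the Hub in one call. A minimal sketch (the repository id is a placeholder, and a valid Hub login is assumed):

```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# uploads all pipeline components and the model index under your namespace
pipeline.push_to_hub("my-username/my-pipeline")
```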