diff --git a/docs/source/en/modular_diffusers/modular_pipeline.md b/docs/source/en/modular_diffusers/modular_pipeline.md
index 34cd8f72b5..c7e0a54bc7 100644
--- a/docs/source/en/modular_diffusers/modular_pipeline.md
+++ b/docs/source/en/modular_diffusers/modular_pipeline.md
@@ -12,27 +12,28 @@ specific language governing permissions and limitations under the License.

# ModularPipeline

-[`ModularPipeline`] converts [`~modular_pipelines.ModularPipelineBlocks`]'s into an executable pipeline that loads models and performs the computation steps defined in the block. It is the main interface for running a pipeline and it is very similar to the [`DiffusionPipeline`] API.
+[`ModularPipeline`] converts [`~modular_pipelines.ModularPipelineBlocks`] into an executable pipeline that loads models and performs the computation steps defined in the blocks. It is the main interface for running a pipeline, and the API is very similar to [`DiffusionPipeline`] but with a few key differences.

-The main difference is to include an expected `output` argument in the pipeline.
+**Loading is lazy.** With [`DiffusionPipeline`], [`~DiffusionPipeline.from_pretrained`] creates the pipeline and loads all models at the same time. With [`ModularPipeline`], creating and loading are two separate steps: [`~ModularPipeline.from_pretrained`] reads the configuration and knows where to load each component from, but doesn't actually load the model weights. You load the models later with [`~ModularPipeline.load_components`], which is where you pass loading arguments like `torch_dtype` and `quantization_config`.
+
+**Two ways to create a pipeline.** You can use [`~ModularPipeline.from_pretrained`] with an existing diffusers model repository — it automatically maps to the default pipeline blocks and then converts to a [`ModularPipeline`] with no extra setup. Currently supported models include SDXL, Wan, Qwen, Z-Image, Flux, and Flux2. You can also assemble your own pipeline from [`ModularPipelineBlocks`] and convert it with the [`~ModularPipelineBlocks.init_pipeline`] method (see [Creating a pipeline](#creating-a-pipeline) for more details).
+
+**Running the pipeline is the same.** Once loaded, you call the pipeline with the same arguments you're used to. A single [`ModularPipeline`] can support multiple workflows (text-to-image, image-to-image, inpainting, etc.) when the pipeline blocks use [`AutoPipelineBlocks`](./auto_pipeline) to automatically select the workflow based on your inputs.
+
+Below are complete examples for text-to-image, image-to-image, and inpainting with SDXL.

```py
import torch
-from diffusers.modular_pipelines import SequentialPipelineBlocks
-from diffusers.modular_pipelines.stable_diffusion_xl import TEXT2IMAGE_BLOCKS
-
-blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)
-
-modular_repo_id = "YiYiXu/modular-loader-t2i-0704"
-pipeline = blocks.init_pipeline(modular_repo_id)
+from diffusers import ModularPipeline
+pipeline = ModularPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
pipeline.load_components(torch_dtype=torch.float16)
pipeline.to("cuda")

-image = pipeline(prompt="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", output="images")[0]
+image = pipeline(prompt="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k").images[0]
image.save("modular_t2i_out.png")
```
@@ -41,21 +42,17 @@ image.save("modular_t2i_out.png")
```py
import torch
-from diffusers.modular_pipelines import SequentialPipelineBlocks
-from diffusers.modular_pipelines.stable_diffusion_xl import IMAGE2IMAGE_BLOCKS
-
-blocks = SequentialPipelineBlocks.from_blocks_dict(IMAGE2IMAGE_BLOCKS)
-
-modular_repo_id = "YiYiXu/modular-loader-t2i-0704"
-pipeline = blocks.init_pipeline(modular_repo_id)
+from diffusers import ModularPipeline
+from diffusers.utils import load_image
+pipeline = ModularPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
pipeline.load_components(torch_dtype=torch.float16)
pipeline.to("cuda")

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-text2img.png"
init_image = load_image(url)
prompt = "a dog catching a frisbee in the jungle"
-image = pipeline(prompt=prompt, image=init_image, strength=0.8, output="images")[0]
+image = pipeline(prompt=prompt, image=init_image, strength=0.8).images[0]
image.save("modular_i2i_out.png")
```
@@ -64,15 +61,10 @@ image.save("modular_i2i_out.png")
```py
import torch
-from diffusers.modular_pipelines import SequentialPipelineBlocks
-from diffusers.modular_pipelines.stable_diffusion_xl import INPAINT_BLOCKS
+from diffusers import ModularPipeline
from diffusers.utils import load_image

-blocks = SequentialPipelineBlocks.from_blocks_dict(INPAINT_BLOCKS)
-
-modular_repo_id = "YiYiXu/modular-loader-t2i-0704"
-pipeline = blocks.init_pipeline(modular_repo_id)
-
+pipeline = ModularPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
pipeline.load_components(torch_dtype=torch.float16)
pipeline.to("cuda")
@@ -83,96 +75,160 @@ init_image = load_image(img_url)
mask_image = load_image(mask_url)

prompt = "A deep sea diver floating"
-image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.85, output="images")[0]
-image.save("moduar_inpaint_out.png")
+image = pipeline(prompt=prompt, image=init_image, mask_image=mask_image, strength=0.85).images[0]
+image.save("modular_inpaint_out.png")
```

-This guide will show you how to create a [`ModularPipeline`] and manage the components in it.
-
-## Adding blocks
-
-Blocks are [`InsertableDict`] objects that can be inserted at specific positions, providing a flexible way to mix-and-match blocks.
-
-Use [`~modular_pipelines.modular_pipeline_utils.InsertableDict.insert`] on either the block class or `sub_blocks` attribute to add a block.
-
-```py
-# BLOCKS is dict of block classes, you need to add class to it
-BLOCKS.insert("block_name", BlockClass, index)
-# sub_blocks attribute contains instance, add a block instance to the attribute
-t2i_blocks.sub_blocks.insert("block_name", block_instance, index)
-```
-
-Use [`~modular_pipelines.modular_pipeline_utils.InsertableDict.pop`] on either the block class or `sub_blocks` attribute to remove a block.
-
-```py
-# remove a block class from preset
-BLOCKS.pop("text_encoder")
-# split out a block instance on its own
-text_encoder_block = t2i_blocks.sub_blocks.pop("text_encoder")
-```
-
-Swap blocks by setting the existing block to the new block.
-
-```py
-# Replace block class in preset
-BLOCKS["prepare_latents"] = CustomPrepareLatents
-# Replace in sub_blocks attribute using an block instance
-t2i_blocks.sub_blocks["prepare_latents"] = CustomPrepareLatents()
-```
+This guide will show you how to create a [`ModularPipeline`], manage its components, and run it.

## Creating a pipeline

-There are two ways to create a [`ModularPipeline`]. Assemble and create a pipeline from [`ModularPipelineBlocks`] or load an existing pipeline with [`~ModularPipeline.from_pretrained`].
+There are two ways to create a [`ModularPipeline`]. Assemble and create a pipeline from [`ModularPipelineBlocks`] with [`~ModularPipelineBlocks.init_pipeline`], or load an existing pipeline with [`~ModularPipeline.from_pretrained`]. You should also initialize a [`ComponentsManager`] to handle device placement, memory, and component management.

> [!TIP]
> Refer to the [ComponentsManager](./components_manager) doc for more details about how it can help manage components across different workflows.

-
-
+### init_pipeline

-Use the [`~ModularPipelineBlocks.init_pipeline`] method to create a [`ModularPipeline`] from the component and configuration specifications. This method loads the *specifications* from a `modular_model_index.json` file, but it doesn't load the *models* yet.
+[`~ModularPipelineBlocks.init_pipeline`] converts any [`ModularPipelineBlocks`] into a [`ModularPipeline`].
+Let's define a minimal block to see how it works:

```py
-from diffusers import ComponentsManager
-from diffusers.modular_pipelines import SequentialPipelineBlocks
-from diffusers.modular_pipelines.stable_diffusion_xl import TEXT2IMAGE_BLOCKS
+from transformers import CLIPTextModel
+from diffusers.modular_pipelines import (
+    ComponentSpec,
+    ModularPipelineBlocks,
+    PipelineState,
+)

-t2i_blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)
+class MyBlock(ModularPipelineBlocks):
+    @property
+    def expected_components(self):
+        return [
+            ComponentSpec(
+                name="text_encoder",
+                type_hint=CLIPTextModel,
+                pretrained_model_name_or_path="openai/clip-vit-large-patch14",
+            ),
+        ]

-modular_repo_id = "YiYiXu/modular-loader-t2i-0704"
-components = ComponentsManager()
-t2i_pipeline = t2i_blocks.init_pipeline(modular_repo_id, components_manager=components)
+
+    def __call__(self, components, state: PipelineState) -> PipelineState:
+        return components, state
```

-
-
+Call [`~ModularPipelineBlocks.init_pipeline`] to convert it into a pipeline. The `blocks` attribute on the pipeline is the blocks it was created from — it determines the expected inputs, outputs, and computation logic.
+
+```py
+block = MyBlock()
+pipe = block.init_pipeline()
+pipe.blocks
+```
+
+```
+MyBlock {
+  "_class_name": "MyBlock",
+  "_diffusers_version": "0.37.0.dev0"
+}
+```

-The [`~ModularPipeline.from_pretrained`] method creates a [`ModularPipeline`] from a modular repository on the Hub.
+> [!WARNING]
+> Blocks are mutable — you can freely add, remove, or swap blocks before creating a pipeline. However, once a pipeline is created, modifying `pipeline.blocks` won't affect the pipeline because it returns a copy. If you want a different block structure, create a new pipeline after modifying the blocks.
+
+When you call [`~ModularPipelineBlocks.init_pipeline`] without a repository, it uses the `pretrained_model_name_or_path` defined in the block's [`ComponentSpec`] to determine where to load each component from. Printing the pipeline shows the component loading configuration.
+
+```py
+pipe
+```
+
+```
+ModularPipeline {
+  "_blocks_class_name": "MyBlock",
+  "_class_name": "ModularPipeline",
+  "_diffusers_version": "0.37.0.dev0",
+  "text_encoder": [
+    null,
+    null,
+    {
+      "pretrained_model_name_or_path": "openai/clip-vit-large-patch14",
+      "revision": null,
+      "subfolder": "",
+      "type_hint": [
+        "transformers",
+        "CLIPTextModel"
+      ],
+      "variant": null
+    }
+  ]
+}
+```
+
+If you pass a repository to [`~ModularPipelineBlocks.init_pipeline`], it overrides the loading path by matching your block's components against the pipeline config in that repository (`model_index.json` or `modular_model_index.json`).
+
+In the example below, the `pretrained_model_name_or_path` will be updated to `"stabilityai/stable-diffusion-xl-base-1.0"`.
+
+```py
+pipe = block.init_pipeline("stabilityai/stable-diffusion-xl-base-1.0")
+pipe
+```
+
+```
+ModularPipeline {
+  "_blocks_class_name": "MyBlock",
+  "_class_name": "ModularPipeline",
+  "_diffusers_version": "0.37.0.dev0",
+  "text_encoder": [
+    null,
+    null,
+    {
+      "pretrained_model_name_or_path": "stabilityai/stable-diffusion-xl-base-1.0",
+      "revision": null,
+      "subfolder": "text_encoder",
+      "type_hint": [
+        "transformers",
+        "CLIPTextModel"
+      ],
+      "variant": null
+    }
+  ]
+}
+```
+
+If a component in your block doesn't exist in the repository, it remains `null` and is skipped during [`~ModularPipeline.load_components`].
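+
+The same flow also works with a full set of blocks. The sketch below assembles the SDXL text-to-image preset blocks (`TEXT2IMAGE_BLOCKS`) with `SequentialPipelineBlocks.from_blocks_dict` and converts them the same way, reusing the SDXL repository and [`ComponentsManager`] from the examples above.
+
+```py
+import torch
+from diffusers import ComponentsManager
+from diffusers.modular_pipelines import SequentialPipelineBlocks
+from diffusers.modular_pipelines.stable_diffusion_xl import TEXT2IMAGE_BLOCKS
+
+# Assemble the preset text-to-image blocks into a single sequential block.
+blocks = SequentialPipelineBlocks.from_blocks_dict(TEXT2IMAGE_BLOCKS)
+
+# Convert the blocks into a pipeline and point the loaders at the SDXL repository.
+components = ComponentsManager()
+pipeline = blocks.init_pipeline(
+    "stabilityai/stable-diffusion-xl-base-1.0", components_manager=components
+)
+pipeline.load_components(torch_dtype=torch.float16)
+```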
+
+### from_pretrained
+
+[`~ModularPipeline.from_pretrained`] is a convenient way to create a [`ModularPipeline`] without defining blocks yourself.
+
+It works with three types of repositories.
+
+**A regular diffusers repository.** Pass any supported model repository and it automatically maps to the default pipeline blocks. Currently supported models include SDXL, Wan, Qwen, Z-Image, Flux, and Flux2.

```py
from diffusers import ModularPipeline, ComponentsManager

components = ComponentsManager()
-pipeline = ModularPipeline.from_pretrained("YiYiXu/modular-loader-t2i-0704", components_manager=components)
+pipeline = ModularPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-1.0", components_manager=components
+)
```

-Add the `trust_remote_code` argument to load a custom [`ModularPipeline`].
-
+**A modular repository.** These repositories contain a `modular_model_index.json` that specifies where to load each component from — the components can come from different repositories and the modular repository itself may not contain any model weights. For example, [diffusers/flux2-bnb-4bit-modular](https://huggingface.co/diffusers/flux2-bnb-4bit-modular) loads a quantized transformer from one repository and the remaining components from another. See [Modular repository](#modular-repository) for more details on the format.
+
```py
from diffusers import ModularPipeline, ComponentsManager

components = ComponentsManager()
-modular_repo_id = "YiYiXu/modular-diffdiff-0704"
-diffdiff_pipeline = ModularPipeline.from_pretrained(modular_repo_id, trust_remote_code=True, components_manager=components)
+pipeline = ModularPipeline.from_pretrained(
+    "diffusers/flux2-bnb-4bit-modular", components_manager=components
+)
+```
+
+**A modular repository with custom code.** Some repositories include custom pipeline blocks alongside the loading configuration. Add `trust_remote_code=True` to load them. See [Custom blocks](./custom_blocks) for how to create your own.
+
+```py
+from diffusers import ModularPipeline, ComponentsManager
+
+components = ComponentsManager()
+pipeline = ModularPipeline.from_pretrained(
+    "diffusers/Florence2-image-Annotator", trust_remote_code=True, components_manager=components
+)
```
-
-

## Loading components

@@ -184,7 +240,7 @@ A [`ModularPipeline`] doesn't automatically instantiate with components. It only
```py
import torch

-t2i_pipeline.load_components(torch_dtype=torch.float16)
-t2i_pipeline.to("cuda")
+pipeline.load_components(torch_dtype=torch.float16)
+pipeline.to("cuda")
```
@@ -196,7 +252,7 @@ The example below only loads the UNet and VAE.
```py
import torch

-t2i_pipeline.load_components(names=["unet", "vae"], torch_dtype=torch.float16)
+pipeline.load_components(names=["unet", "vae"], torch_dtype=torch.float16)
```
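+
+Putting these pieces together, the sketch below reuses the SDXL repository and [`ComponentsManager`] setup from the earlier examples. It loads only the UNet and VAE in half precision and then prints the pipeline to inspect the loading configuration for each component.
+
+```py
+import torch
+from diffusers import ModularPipeline, ComponentsManager
+
+components = ComponentsManager()
+pipeline = ModularPipeline.from_pretrained(
+    "stabilityai/stable-diffusion-xl-base-1.0", components_manager=components
+)
+
+# Only the named components are instantiated; everything else stays unloaded for now.
+pipeline.load_components(names=["unet", "vae"], torch_dtype=torch.float16)
+
+# Printing the pipeline shows the loading configuration for each component.
+print(pipeline)
+```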