mirror of https://github.com/huggingface/diffusers.git synced 2026-02-15 23:37:08 +08:00

Files

YiYi Xu 3c7506b294 [Modular] update doc for ModularPipeline (#13100 )

* update create pipeline section

* update more

* update more

* more

* add a section on running pipeline moduarly

* refactor update_components, remove support for spec

* style

* bullet points

* update the pipeline block

* small fix in state doc

* update sequential doc

* fix link

* small update on quikstart

* add a note on how to run pipeline without the componen4ts manager

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* remove the supported models mention

* update more

* up

* revert type hint changes

---------

Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-161-123.ec2.internal>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

2026-02-14 11:43:28 -10:00

6.3 KiB

Raw Blame History

ModularPipelineBlocks

[~modular_pipelines.ModularPipelineBlocks] is the basic block for building a [ModularPipeline]. It defines what components, inputs/outputs, and computation a block should perform for a specific step in a pipeline. A [~modular_pipelines.ModularPipelineBlocks] connects with other blocks, using state, to enable the modular construction of workflows.

A [~modular_pipelines.ModularPipelineBlocks] on it's own can't be executed. It is a blueprint for what a step should do in a pipeline. To actually run and execute a pipeline, the [~modular_pipelines.ModularPipelineBlocks] needs to be converted into a [ModularPipeline].

This guide will show you how to create a [~modular_pipelines.ModularPipelineBlocks].

Inputs and outputs

Tip

Refer to the States guide if you aren't familiar with how state works in Modular Diffusers.

A [~modular_pipelines.ModularPipelineBlocks] requires inputs, and intermediate_outputs.

inputs are values a block reads from the [~modular_pipelines.PipelineState] to perform its computation. These can be values provided by a user (like a prompt or image) or values produced by a previous block (like encoded image_latents).

Use InputParam to define inputs.

class ImageEncodeStep(ModularPipelineBlocks):
    ...

    @property
    def inputs(self):
        return [
            InputParam(name="image", type_hint="PIL.Image", required=True, description="raw input image to process"),
        ]
    ...

intermediate_outputs are new values created by a block and added to the [~modular_pipelines.PipelineState]. The intermediate_outputs are available as inputs for subsequent blocks or available as the final output from running the pipeline.

Use OutputParam to define intermediate_outputs.

class ImageEncodeStep(ModularPipelineBlocks):
    ...

    @property
    def intermediate_outputs(self):
        return [
            OutputParam(name="image_latents", description="latents representing the image"),
        ]

    ...

The intermediate inputs and outputs share data to connect blocks. They are accessible at any point, allowing you to track the workflow's progress.

Components and configs

The components and pipeline-level configs a block needs are specified in [ComponentSpec] and [~modular_pipelines.ConfigSpec].

[ComponentSpec] contains the expected components used by a block. You need the name of the component and ideally a type_hint that specifies exactly what the component is.
[~modular_pipelines.ConfigSpec] contains pipeline-level settings that control behavior across all blocks.

class ImageEncodeStep(ModularPipelineBlocks):
    ...

    @property
    def expected_components(self):
        return [
            ComponentSpec(name="vae", type_hint=AutoencoderKL),
        ]

    @property
    def expected_configs(self):
        return [
            ConfigSpec("force_zeros_for_empty_prompt", True),
        ]

    ...

When the blocks are converted into a pipeline, the components become available to the block as the first argument in __call__.

Computation logic

The computation a block performs is defined in the __call__ method and it follows a specific structure.

Retrieve the [~modular_pipelines.BlockState] to get a local view of the inputs.
Implement the computation logic on the inputs.
Update [~modular_pipelines.PipelineState] to push changes from the local [~modular_pipelines.BlockState] back to the global [~modular_pipelines.PipelineState].
Return the components and state which becomes available to the next block.

class ImageEncodeStep(ModularPipelineBlocks):

    def __call__(self, components, state):
        # Get a local view of the state variables this block needs
        block_state = self.get_block_state(state)

        # Your computation logic here
        # block_state contains all your inputs
        # Access them like: block_state.image, block_state.processed_image

        # Update the pipeline state with your updated block_states
        self.set_block_state(state, block_state)
        return components, state

Putting it all together

Here is the complete block with all the pieces connected.

from diffusers import ComponentSpec, AutoencoderKL
from diffusers.modular_pipelines import InputParam, ModularPipelineBlocks, OutputParam


class ImageEncodeStep(ModularPipelineBlocks):

    @property
    def description(self):
        return "Encode an image into latent space."

    @property
    def expected_components(self):
        return [
            ComponentSpec(name="vae", type_hint=AutoencoderKL),
        ]

    @property
    def inputs(self):
        return [
            InputParam(name="image", type_hint="PIL.Image", required=True, description="raw input image to process"),
        ]

    @property
    def intermediate_outputs(self):
        return [
            OutputParam(name="image_latents", type_hint="torch.Tensor", description="latents representing the image"),
        ]

    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        block_state.image_latents = components.vae.encode(block_state.image)
        self.set_block_state(state, block_state)
        return components, state

Every block has a doc property that is automatically generated from the properties you defined above. It provides a summary of the block's description, components, inputs, and outputs.

block = ImageEncoderStep()
print(block.doc)
class ImageEncodeStep

  Encode an image into latent space.

  Components:
      vae (`AutoencoderKL`)

  Inputs:
      image (`PIL.Image`):
          raw input image to process

  Outputs:
      image_latents (`torch.Tensor`):
          latents representing the image

6.3 KiB Raw Blame History