diffusers/docs/source/en/modular_diffusers/custom_blocks.md
YiYi Xu 6d4fc6baa0 [Modular] mellon doc etc (#13051)
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: yiyi@huggingface.co <yiyi@ip-26-0-160-103.ec2.internal>
2026-02-03 13:38:57 -10:00


Building Custom Blocks

ModularPipelineBlocks are the fundamental building blocks of a [ModularPipeline]. You can create custom blocks by defining their inputs, outputs, and computation logic. This guide demonstrates how to create and use a custom block.

Tip

Explore the Modular Diffusers Custom Blocks collection for official custom blocks.

Project Structure

Your custom block project should use the following structure:

.
├── block.py
└── modular_config.json
  • block.py contains the custom block implementation
  • modular_config.json contains the metadata needed to load the block
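
Before loading or uploading a block, it can help to confirm that a folder matches this layout. The following is a minimal sketch in plain Python. The helper name `missing_block_files` is hypothetical and not part of diffusers; it only checks that the two files named above exist.

```python
from pathlib import Path

# The two files a custom block project is expected to contain (from the layout above).
REQUIRED_FILES = ("block.py", "modular_config.json")


def missing_block_files(project_dir):
    """Return the required file names that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `missing_block_files("custom-block-template")` returns an empty list when both files are present.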

Quick Start with Template

The fastest way to create a custom block is to start from the template. It provides a pre-configured project structure with block.py and modular_config.json files, plus commented examples showing how to define components, inputs, outputs, and the __call__ method, so you can focus on your custom logic instead of boilerplate setup.

Download the template

from diffusers import ModularPipelineBlocks

model_id = "diffusers/custom-block-template"
local_dir = model_id.split("/")[-1]

blocks = ModularPipelineBlocks.from_pretrained(
    model_id, 
    trust_remote_code=True, 
    local_dir=local_dir
)

This saves the template files to custom-block-template/ in the current directory. Pass a different local_dir to save them somewhere else.

Edit locally

Open block.py and implement your custom block. The template includes commented examples showing how to define each property. See the Florence-2 example below for a complete implementation.

Test your block

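from diffusers import ModularPipelineBlocks

# Folder that holds your edited block.py and modular_config.json
local_dir = "custom-block-template"

blocks = ModularPipelineBlocks.from_pretrained(local_dir, trust_remote_code=True)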
pipeline = blocks.init_pipeline()
output = pipeline(...)  # your inputs here

Upload to the Hub

pipeline.save_pretrained(local_dir, repo_id="your-username/your-block-name", push_to_hub=True)

Example: Florence-2 Image Annotator

This example creates a custom block with Florence-2 to process an input image and generate a mask for inpainting.

Define components

Define the components the block needs: Florence2ForConditionalGeneration and its processor. For each component, specify the name (how you'll access it in code), type_hint (the model class), and pretrained_model_name_or_path (where to load weights from).

# Inside block.py
from diffusers.modular_pipelines import ModularPipelineBlocks, ComponentSpec
from transformers import AutoProcessor, Florence2ForConditionalGeneration


class Florence2ImageAnnotatorBlock(ModularPipelineBlocks):

    @property
    def expected_components(self):
        return [
            ComponentSpec(
                name="image_annotator",
                type_hint=Florence2ForConditionalGeneration,
                pretrained_model_name_or_path="florence-community/Florence-2-base-ft",
            ),
            ComponentSpec(
                name="image_annotator_processor",
                type_hint=AutoProcessor,
                pretrained_model_name_or_path="florence-community/Florence-2-base-ft",
            ),
        ]

Define inputs and outputs

Inputs are the image, the annotation task, the annotation prompt, and the output type. Outputs are the generated mask, the raw annotations, and the annotated image.

from typing import List, Union
from PIL import Image
from diffusers.modular_pipelines import InputParam, OutputParam


class Florence2ImageAnnotatorBlock(ModularPipelineBlocks):

    # ... expected_components from above ...

    @property
    def inputs(self) -> List[InputParam]:
        return [
            InputParam(
                "image",
                type_hint=Union[Image.Image, List[Image.Image]],
                required=True,
                description="Image(s) to annotate",
            ),
            InputParam(
                "annotation_task",
                type_hint=str,
                default="<REFERRING_EXPRESSION_SEGMENTATION>",
                description="Annotation task to perform (e.g., <OD>, <CAPTION>, <REFERRING_EXPRESSION_SEGMENTATION>)",
            ),
            InputParam(
                "annotation_prompt",
                type_hint=str,
                required=True,
                description="Prompt to provide context for the annotation task",
            ),
            InputParam(
                "annotation_output_type",
                type_hint=str,
                default="mask_image",
                description="Output type: 'mask_image', 'mask_overlay', or 'bounding_box'",
            ),
        ]

    @property
    def intermediate_outputs(self) -> List[OutputParam]:
        return [
            OutputParam(
                "mask_image",
                type_hint=Image.Image,
                description="Inpainting mask for the input image",
            ),
            OutputParam(
                "annotations",
                type_hint=dict,
                description="Raw annotation predictions",
            ),
            OutputParam(
                "image",
                type_hint=Image.Image,
                description="Annotated image",
            ),
        ]

Implement the __call__ method

The __call__ method contains the block's logic. Access inputs via block_state, run your computation, and set outputs back to block_state.

import torch
from diffusers.modular_pipelines import PipelineState


class Florence2ImageAnnotatorBlock(ModularPipelineBlocks):

    # ... expected_components, inputs, intermediate_outputs from above ...

    @torch.no_grad()
    def __call__(self, components, state: PipelineState) -> PipelineState:
        block_state = self.get_block_state(state)
        
        images, annotation_task_prompt = self.prepare_inputs(
            block_state.image, block_state.annotation_prompt
        )
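        task = block_state.annotation_task
        # Note: `fill` is not among the inputs declared above; it needs its own
        # InputParam in `inputs` to be available on block_state.
        fill = block_state.fill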
        
        annotations = self.get_annotations(
            components, images, annotation_task_prompt, task
        )
        block_state.annotations = annotations
        if block_state.annotation_output_type == "mask_image":
            block_state.mask_image = self.prepare_mask(images, annotations)
        else:
            block_state.mask_image = None

        if block_state.annotation_output_type == "mask_overlay":
            block_state.image = self.prepare_mask(images, annotations, overlay=True, fill=fill)

        elif block_state.annotation_output_type == "bounding_box":
            block_state.image = self.prepare_bounding_boxes(images, annotations)

        self.set_block_state(state, block_state)

        return components, state
    
    # Helper methods for mask/bounding box generation...

Tip

See the complete implementation at diffusers/Florence2-image-Annotator.
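
The read, compute, write flow in __call__ can be illustrated without diffusers. The sketch below uses toy stand-ins: a plain dict plays the role of the pipeline state, and `get_block_state`, `set_block_state`, and `toy_block` are simplified illustrations rather than the library's implementations. It assumes declared inputs fall back to their defaults when the caller does not provide them.

```python
from types import SimpleNamespace

# Toy stand-ins only: these are NOT the diffusers classes, just an
# illustration of the read-compute-write flow used in __call__.
DECLARED_INPUTS = {"annotation_output_type": "mask_image"}  # name -> default


def get_block_state(state):
    """Copy declared inputs out of the shared state, falling back to defaults."""
    values = {name: state.get(name, default) for name, default in DECLARED_INPUTS.items()}
    return SimpleNamespace(**values)


def set_block_state(state, block_state):
    """Write every attribute of the block state back into the shared state."""
    state.update(vars(block_state))


def toy_block(state):
    """Read inputs, do some work, and write the results back."""
    block_state = get_block_state(state)
    # "Computation": record which output the block would produce.
    block_state.produced = f"{block_state.annotation_output_type} ready"
    set_block_state(state, block_state)
    return state
```

Calling `toy_block({})` uses the default output type, while `toy_block({"annotation_output_type": "bounding_box"})` overrides it. In both cases the result is written back to the shared state for later blocks to read.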

Using Custom Blocks

Load a custom block with [~ModularPipeline.from_pretrained] and set trust_remote_code=True.

import torch
from diffusers import ModularPipeline
from diffusers.utils import load_image

# Load the Florence-2 annotator pipeline
image_annotator = ModularPipeline.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True
)

# Check the docstring to see inputs/outputs
print(image_annotator.blocks.doc)

Use the block to generate a mask:

image_annotator.load_components(torch_dtype=torch.bfloat16)
image_annotator.to("cuda")

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg")
image = image.resize((1024, 1024))
prompt = ["A red car"]
annotation_task = "<REFERRING_EXPRESSION_SEGMENTATION>"
annotation_prompt = ["the car"]

mask_image = image_annotator(
    prompt=prompt,
    image=image,
    annotation_task=annotation_task,
    annotation_prompt=annotation_prompt,
    annotation_output_type="mask_image",
).images
mask_image[0].save("car-mask.png")

Compose it with other blocks to create a new pipeline:

# Get the annotator block
annotator_block = image_annotator.blocks

# Get an inpainting workflow and insert the annotator at the beginning
inpaint_blocks = ModularPipeline.from_pretrained("Qwen/Qwen-Image").blocks.get_workflow("inpainting")
inpaint_blocks.sub_blocks.insert("image_annotator", annotator_block, 0)

# Initialize the combined pipeline
pipe = inpaint_blocks.init_pipeline()
pipe.load_components(torch_dtype=torch.float16, device="cuda")

# Now the pipeline automatically generates masks from prompts
output = pipe(
    prompt=prompt,
    image=image,
    annotation_task=annotation_task,
    annotation_prompt=annotation_prompt,
    annotation_output_type="mask_image",
    num_inference_steps=35,
    guidance_scale=7.5,
    strength=0.95,
    output="images"
)
output[0].save("florence-inpainting.png")
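
Inserting the annotator at index 0 places it ahead of the inpainting blocks, so the mask it produces is available to the blocks that follow. The ordering idea can be sketched with a plain dict. This is an illustration only, not how diffusers implements `sub_blocks.insert`, and the sub-block names below are hypothetical.

```python
def insert_at(mapping, key, value, index):
    """Return a new dict with key/value placed at position index."""
    # Drop any existing entry for key so it is not duplicated.
    items = [(k, v) for k, v in mapping.items() if k != key]
    items.insert(index, (key, value))
    return dict(items)


# Hypothetical sub-block names, for illustration only.
sub_blocks = {"text_encoder": "...", "denoise": "...", "decode": "..."}
sub_blocks = insert_at(sub_blocks, "image_annotator", "annotator", 0)
print(list(sub_blocks))
```

Because Python dicts preserve insertion order, iterating over the result visits `image_annotator` first.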

Editing custom blocks

To edit a custom block, download it locally. This is the same workflow as the Quick Start with Template, but it starts from an existing block instead of the template.

Use the local_dir argument to download a custom block to a specific folder:

from diffusers import ModularPipelineBlocks

# Download to a local folder for editing
annotator_block = ModularPipelineBlocks.from_pretrained(
    "diffusers/Florence2-image-Annotator",
    trust_remote_code=True,
    local_dir="./my-florence-block"
)

Any changes made to the block files in this folder will be reflected when you load the block again. When you're ready to share your changes, upload to a new repository:

pipeline = annotator_block.init_pipeline()
pipeline.save_pretrained("./my-florence-block", repo_id="your-username/my-custom-florence", push_to_hub=True)

Next Steps

This guide covered creating a single custom block. From here:

  • Make your custom block work with Mellon's visual interface. See the Mellon Custom Blocks guide.
  • Browse the Modular Diffusers Custom Blocks collection for inspiration and ready-to-use blocks.