Compare commits

...

12 Commits

Author SHA1 Message Date
yiyixuxu
077b6972cf update more 2026-01-26 21:36:24 +01:00
YiYi Xu
7dc454f466 Update docs/source/en/modular_diffusers/quickstart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2026-01-26 08:27:04 -10:00
YiYi Xu
b6d05bb69c Update docs/source/en/modular_diffusers/quickstart.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2026-01-26 08:26:40 -10:00
YiYi Xu
8483c06993 Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2026-01-26 08:26:06 -10:00
yiyixuxu
fe4e4d7b66 fix 2026-01-26 02:03:39 +01:00
yiyixuxu
56dd6cca15 update a bit more 2026-01-26 02:01:23 +01:00
YiYi Xu
809fc36e03 Update docs/source/en/modular_diffusers/quickstart.md 2026-01-25 14:10:32 -10:00
yiyixuxu
318f2bf918 add a real quick start guide 2026-01-26 01:06:59 +01:00
Gal Davidi
d7a1c31f4f Fibo edit pipeline (#12930)
* Feature: Add BriaFiboEditPipeline to diffusers

* Introduced BriaFiboEditPipeline class with necessary backend requirements.
* Updated import structures in relevant modules to include BriaFiboEditPipeline.
* Ensured compatibility with existing pipelines and type checking.

* Feature: Introduce Bria Fibo Edit Pipeline

* Added BriaFiboEditPipeline class for structured JSON-native image editing.
* Created documentation for the new pipeline in bria_fibo_edit.md.
* Updated import structures to include the new pipeline and its components.
* Added unit tests for the BriaFiboEditPipeline to ensure functionality and correctness.

* Enhancement: Update Bria Fibo Edit Pipeline and Documentation

* Refined the Bria Fibo Edit model description for clarity and detail.
* Added usage instructions for model authentication and login.
* Implemented mask handling functions in the BriaFiboEditPipeline for improved image editing capabilities.
* Updated unit tests to cover new mask functionalities and ensure input validation.
* Adjusted example code in documentation to reflect changes in the pipeline's usage.

* Update Bria Fibo Edit documentation with corrected Hugging Face page link

* add dreambooth training script

* style and quality

* Delete temp.py

* Enhancement: Improve JSON caption validation in DreamBoothDataset

* Updated the clean_json_caption function to handle both string and dictionary inputs for captions.
* Added error handling to raise a ValueError for invalid caption types, ensuring better input validation.

* Add datasets dependency to requirements_fibo_edit.txt

* Add bria_fibo_edit to docs table of contents

* Fix dummy objects ordering

* Fix BriaFiboEditPipeline to use passed generator parameter

The pipeline was ignoring the generator parameter and only using
the seed parameter. This caused non-deterministic outputs in tests
that pass a seeded generator.

* Remove fibo_edit training script and related files

---------

Co-authored-by: kfirbria <kfir@bria.ai>
2026-01-19 22:09:53 +05:30
Sayak Paul
29b15f41c7 [chore] make style to push new changes. (#12998)
make style to push new changes.
2026-01-19 16:02:13 +05:30
sayakpaul
75edff93a0 Revert "make style && make quality"
This reverts commit 76f51a5e92.
2026-01-19 15:35:20 +05:30
sayakpaul
76f51a5e92 make style && make quality 2026-01-19 15:34:29 +05:30
15 changed files with 1621 additions and 328 deletions

View File

@@ -496,6 +496,8 @@
title: Bria 3.2
- local: api/pipelines/bria_fibo
title: Bria Fibo
- local: api/pipelines/bria_fibo_edit
title: Bria Fibo Edit
- local: api/pipelines/chroma
title: Chroma
- local: api/pipelines/cogview3

View File

@@ -0,0 +1,33 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Bria Fibo Edit
Fibo Edit is an 8B parameter image-to-image model that introduces a new paradigm of structured control, operating on JSON inputs paired with source images to enable deterministic and repeatable editing workflows.
Featuring native masking for granular precision, it moves beyond simple prompt-based diffusion to offer explicit, interpretable control optimized for production environments.
Its lightweight architecture is designed for deep customization, empowering researchers to build specialized "Edit" models for domain-specific tasks while delivering top-tier aesthetic quality.
## Usage
_As the model is gated, before using it with diffusers you first need to go to the [Bria Fibo Edit Hugging Face page](https://huggingface.co/briaai/Fibo-Edit), fill in the form, and accept the gate. Once you are in, you need to log in so that your system knows you've accepted the gate._
Use the command below to log in:
```bash
hf auth login
```
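The snippet below is a minimal usage sketch based on the pipeline tests in this PR; the checkpoint id (`briaai/Fibo-Edit`, the gated repo linked above), the exact JSON prompt fields, the dtype, and the sampler settings are assumptions rather than official example values.
```py
import torch
from PIL import Image

from diffusers import BriaFiboEditPipeline

# Assumed checkpoint id: the gated repo linked above.
pipe = BriaFiboEditPipeline.from_pretrained("briaai/Fibo-Edit", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Structured JSON instruction, mirroring the prompt format used in this PR's tests.
prompt = '{"edit_instruction": "make the sky a deep sunset orange"}'
image = Image.open("input.png").convert("RGB")

edited = pipe(
    prompt=prompt,
    image=image,
    num_inference_steps=30,
    guidance_scale=5.0,
    # Optional: pass a grayscale `mask` the same size as `image` for localized edits,
    # as exercised by the mask tests in this PR.
).images[0]
edited.save("edited.png")
```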
## BriaFiboEditPipeline
[[autodoc]] BriaFiboEditPipeline
- all
- __call__

View File

@@ -24,7 +24,7 @@ The Modular Diffusers docs are organized as shown below.
## Quickstart
- A [quickstart](./quickstart) demonstrating how to implement an example workflow with Modular Diffusers.
- The [quickstart](./quickstart) shows you how to run a modular pipeline, understand its structure, and customize it by modifying the blocks that compose it.
## ModularPipelineBlocks

View File

@@ -12,333 +12,250 @@ specific language governing permissions and limitations under the License.
# Quickstart
Modular Diffusers is a framework for quickly building flexible and customizable pipelines. At the core of Modular Diffusers are [`ModularPipelineBlocks`] that can be combined with other blocks to adapt to new workflows. The blocks are converted into a [`ModularPipeline`], a friendly user-facing interface developers can use.
Modular Diffusers is a framework for quickly building flexible and customizable pipelines. At the core of Modular Diffusers are [`ModularPipelineBlocks`] that can be combined with other blocks to adapt to new workflows. The blocks are converted into a [`ModularPipeline`], a friendly user-facing interface for running generation tasks.
This doc will show you how to implement a [Differential Diffusion](https://differential-diffusion.github.io/) pipeline with the modular framework.
This guide shows you how to run a modular pipeline, understand its structure, and customize it by modifying the blocks that compose it.
## ModularPipelineBlocks
## Run a pipeline
[`ModularPipelineBlocks`] are *definitions* that specify the components, inputs, outputs, and computation logic for a single step in a pipeline. There are four types of blocks.
- [`ModularPipelineBlocks`] is the most basic block for a single step.
- [`SequentialPipelineBlocks`] is a multi-block that composes other blocks linearly. The outputs of one block are the inputs to the next block.
- [`LoopSequentialPipelineBlocks`] is a multi-block that runs iteratively and is designed for iterative workflows.
- [`AutoPipelineBlocks`] is a collection of blocks for different workflows and it selects which block to run based on the input. It is designed to conveniently package multiple workflows into a single pipeline.
[Differential Diffusion](https://differential-diffusion.github.io/) is an image-to-image workflow. Start with the `IMAGE2IMAGE_BLOCKS` preset, a collection of `ModularPipelineBlocks` for image-to-image generation.
```py
from diffusers.modular_pipelines.stable_diffusion_xl import IMAGE2IMAGE_BLOCKS
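# The preset is a dict-like mapping of block names to block classes; its contents are shown below for reference: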
IMAGE2IMAGE_BLOCKS = InsertableDict([
("text_encoder", StableDiffusionXLTextEncoderStep),
("image_encoder", StableDiffusionXLVaeEncoderStep),
("input", StableDiffusionXLInputStep),
("set_timesteps", StableDiffusionXLImg2ImgSetTimestepsStep),
("prepare_latents", StableDiffusionXLImg2ImgPrepareLatentsStep),
("prepare_add_cond", StableDiffusionXLImg2ImgPrepareAdditionalConditioningStep),
("denoise", StableDiffusionXLDenoiseStep),
("decode", StableDiffusionXLDecodeStep)
])
```
## Pipeline and block states
Modular Diffusers uses *state* to communicate data between blocks. There are two types of states.
- [`PipelineState`] is a global state that can be used to track all inputs and outputs across all blocks.
- [`BlockState`] is a local view of relevant variables from [`PipelineState`] for an individual block.
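To make this concrete, here is a minimal toy block that reads from and writes to these states. It is only a sketch: the import path and the `get_block_state`/`set_block_state` helpers follow the block API used elsewhere in this guide, and the block itself is not part of the Differential Diffusion example.
```py
import torch
from diffusers.modular_pipelines import (
    InputParam,
    ModularPipelineBlocks,
    OutputParam,
    PipelineState,
)

class ScaleImageBlock(ModularPipelineBlocks):
    """Toy block: reads `image` from state and writes a scaled copy back as `scaled_image`."""

    @property
    def inputs(self):
        return [InputParam("image", required=True, type_hint=torch.Tensor)]

    @property
    def intermediate_outputs(self):
        return [OutputParam("scaled_image", type_hint=torch.Tensor)]

    def __call__(self, components, state: PipelineState):
        # Pull a local BlockState view of only the variables this block declared.
        block_state = self.get_block_state(state)
        block_state.scaled_image = block_state.image * 0.5
        # Publish the results back to the global PipelineState for downstream blocks.
        self.set_block_state(state, block_state)
        return components, state
```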
## Customizing blocks
[Differential Diffusion](https://differential-diffusion.github.io/) differs from standard image-to-image in its `prepare_latents` and `denoise` blocks. All the other blocks can be reused, but you'll need to modify these two.
Create placeholder `ModularPipelineBlocks` for `prepare_latents` and `denoise` by copying and modifying the existing ones.
Print the `denoise` block to see that it is composed of [`LoopSequentialPipelineBlocks`] with three sub-blocks, `before_denoiser`, `denoiser`, and `after_denoiser`. Only the `before_denoiser` sub-block needs to be modified to prepare the latent input for the denoiser based on the change map.
```py
denoise_blocks = IMAGE2IMAGE_BLOCKS["denoise"]()
print(denoise_blocks)
```
Replace the `StableDiffusionXLLoopBeforeDenoiser` sub-block with the new `SDXLDiffDiffLoopBeforeDenoiser` block.
```py
# Copy existing blocks as placeholders
class SDXLDiffDiffPrepareLatentsStep(ModularPipelineBlocks):
"""Copied from StableDiffusionXLImg2ImgPrepareLatentsStep - will modify later"""
# ... same implementation as StableDiffusionXLImg2ImgPrepareLatentsStep
class SDXLDiffDiffDenoiseStep(StableDiffusionXLDenoiseLoopWrapper):
block_classes = [SDXLDiffDiffLoopBeforeDenoiser, StableDiffusionXLLoopDenoiser, StableDiffusionXLLoopAfterDenoiser]
block_names = ["before_denoiser", "denoiser", "after_denoiser"]
```
### prepare_latents
The `prepare_latents` block requires the following changes.
- a `mask_processor` component to preprocess the change map
- new `inputs` to accept the user-provided change map (`diffdiff_map`), `timesteps` for precomputing all the latents, and `num_inference_steps` to create the masks that control when each image region is updated
- updated computation in the `__call__` method to process the change map, create the per-step masks, and store them in the [`BlockState`]
```diff
class SDXLDiffDiffPrepareLatentsStep(ModularPipelineBlocks):
@property
def expected_components(self) -> List[ComponentSpec]:
return [
ComponentSpec("vae", AutoencoderKL),
ComponentSpec("scheduler", EulerDiscreteScheduler),
+ ComponentSpec("mask_processor", VaeImageProcessor, config=FrozenDict({"do_normalize": False, "do_convert_grayscale": True}))
]
@property
def inputs(self) -> List[Tuple[str, Any]]:
return [
InputParam("generator"),
+ InputParam("diffdiff_map", required=True),
- InputParam("latent_timestep", required=True, type_hint=torch.Tensor),
+ InputParam("timesteps", type_hint=torch.Tensor),
+ InputParam("num_inference_steps", type_hint=int),
]
@property
def intermediate_outputs(self) -> List[OutputParam]:
return [
+ OutputParam("original_latents", type_hint=torch.Tensor),
+ OutputParam("diffdiff_masks", type_hint=torch.Tensor),
]
def __call__(self, components, state: PipelineState):
# ... existing logic ...
+ # Process change map and create masks
+ diffdiff_map = components.mask_processor.preprocess(block_state.diffdiff_map, height=latent_height, width=latent_width)
+ thresholds = torch.arange(block_state.num_inference_steps, dtype=diffdiff_map.dtype) / block_state.num_inference_steps
+ block_state.diffdiff_masks = diffdiff_map > (thresholds + (block_state.denoising_start or 0))
+ block_state.original_latents = block_state.latents
```
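To see what the mask construction above produces, here is a tiny standalone sketch of the thresholding step; the shapes and values are illustrative only and simplified compared to the real preprocessed change map.
```py
import torch

num_inference_steps = 4

# Toy 2x2 change map in [0, 1]; thresholding it against a per-step schedule
# yields one boolean mask per denoising step.
diffdiff_map = torch.tensor([[0.1, 0.4],
                             [0.7, 1.0]])

# One threshold per denoising step: 0, 1/N, 2/N, ...
thresholds = torch.arange(num_inference_steps, dtype=diffdiff_map.dtype) / num_inference_steps

# mask_i is used by the before_denoiser at step i to blend the precomputed
# original latents with the current latents.
diffdiff_masks = diffdiff_map > thresholds.view(-1, 1, 1)
print(diffdiff_masks.shape)  # torch.Size([4, 2, 2])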
### denoise
The `before_denoiser` sub-block requires the following changes.
- new `inputs` to accept a `denoising_start` parameter, plus `original_latents` and `diffdiff_masks` from the `prepare_latents` block
- updated computation in the `__call__` method to apply Differential Diffusion
```diff
class SDXLDiffDiffLoopBeforeDenoiser(ModularPipelineBlocks):
@property
def description(self) -> str:
return (
"Step within the denoising loop for differential diffusion that prepare the latent input for the denoiser"
)
@property
def inputs(self) -> List[str]:
return [
InputParam("latents", required=True, type_hint=torch.Tensor),
+ InputParam("denoising_start"),
+ InputParam("original_latents", type_hint=torch.Tensor),
+ InputParam("diffdiff_masks", type_hint=torch.Tensor),
]
def __call__(self, components, block_state, i, t):
+ # Apply differential diffusion logic
+ if i == 0 and block_state.denoising_start is None:
+ block_state.latents = block_state.original_latents[:1]
+ else:
+ block_state.mask = block_state.diffdiff_masks[i].unsqueeze(0).unsqueeze(1)
+ block_state.latents = block_state.original_latents[i] * block_state.mask + block_state.latents * (1 - block_state.mask)
# ... rest of existing logic ...
```
## Assembling the blocks
You should have all the blocks you need at this point to create a [`ModularPipeline`].
Copy the existing `IMAGE2IMAGE_BLOCKS` preset. For the `set_timesteps` block, use the one from `TEXT2IMAGE_BLOCKS` because Differential Diffusion doesn't require a `strength` parameter.
Set the `prepare_latents` and `denoise` blocks to the `SDXLDiffDiffPrepareLatentsStep` and `SDXLDiffDiffDenoiseStep` blocks you just modified.
Call [`SequentialPipelineBlocks.from_blocks_dict`] on the blocks to create a `SequentialPipelineBlocks`.
```py
DIFFDIFF_BLOCKS = IMAGE2IMAGE_BLOCKS.copy()
DIFFDIFF_BLOCKS["set_timesteps"] = TEXT2IMAGE_BLOCKS["set_timesteps"]
DIFFDIFF_BLOCKS["prepare_latents"] = SDXLDiffDiffPrepareLatentsStep
DIFFDIFF_BLOCKS["denoise"] = SDXLDiffDiffDenoiseStep
dd_blocks = SequentialPipelineBlocks.from_blocks_dict(DIFFDIFF_BLOCKS)
print(dd_blocks)
```
## ModularPipeline
Convert the [`SequentialPipelineBlocks`] into a [`ModularPipeline`] with the [`ModularPipeline.init_pipeline`] method. This initializes the expected components to load from a `modular_model_index.json` file. Explicitly load the components by calling [`ModularPipeline.load_components`].
It is a good idea to initialize the [`ComponentsManager`] with the pipeline to help manage the different components. Once you call [`~ModularPipeline.load_components`], the components are registered to the [`ComponentsManager`] and can be shared between workflows. The example below uses the `collection` argument to assign the components a `"diffdiff"` label for better organization.
```py
from diffusers.modular_pipelines import ComponentsManager
components = ComponentsManager()
dd_pipeline = dd_blocks.init_pipeline("YiYiXu/modular-demo-auto", components_manager=components, collection="diffdiff")
dd_pipeline.load_components(torch_dtype=torch.float16)
dd_pipeline.to("cuda")
```
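Before layering additional workflows on top, a quick check of the base Differential Diffusion pipeline might look like this. It is a sketch that reuses the image, change map, and call pattern from the examples that follow.
```py
import torch
from diffusers.utils import load_image

image = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png?download=true")
mask = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask.png?download=true")

generator = torch.Generator(device="cuda").manual_seed(42)
result = dd_pipeline(
    prompt="a green pear",
    negative_prompt="blurry",
    num_inference_steps=25,
    generator=generator,
    diffdiff_map=mask,
    image=image,
    output="images",
)[0]
```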
## Adding workflows
Other workflows can be added to the [`ModularPipeline`] to support additional features without rewriting the entire pipeline from scratch.
This section demonstrates how to add an IP-Adapter or ControlNet.
### IP-Adapter
Stable Diffusion XL already has a preset IP-Adapter block that you can use without any changes to the existing Differential Diffusion pipeline.
```py
from diffusers.modular_pipelines.stable_diffusion_xl.encoders import StableDiffusionXLAutoIPAdapterStep
ip_adapter_block = StableDiffusionXLAutoIPAdapterStep()
```
Use the [`sub_blocks.insert`] method to insert it into the [`ModularPipeline`]. The example below inserts the `ip_adapter_block` at position `0`. Print the pipeline to see that the `ip_adapter_block` is added and that it requires an `ip_adapter_image`. This also adds two components to the pipeline, the `image_encoder` and `feature_extractor`.
```py
dd_blocks.sub_blocks.insert("ip_adapter", ip_adapter_block, 0)
```
Call [`~ModularPipeline.init_pipeline`] to initialize a [`ModularPipeline`] and use [`~ModularPipeline.load_components`] to load the model components. Load and set the IP-Adapter to run the pipeline.
```py
device = "cuda"  # device used in the examples below
dd_pipeline = dd_blocks.init_pipeline("YiYiXu/modular-demo-auto", collection="diffdiff")
dd_pipeline.load_components(torch_dtype=torch.float16)
dd_pipeline.loader.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
dd_pipeline.loader.set_ip_adapter_scale(0.6)
dd_pipeline = dd_pipeline.to(device)
ip_adapter_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/diffdiff_orange.jpeg")
image = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png?download=true")
mask = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask.png?download=true")
prompt = "a green pear"
negative_prompt = "blurry"
generator = torch.Generator(device=device).manual_seed(42)
image = dd_pipeline(
prompt=prompt,
negative_prompt=negative_prompt,
num_inference_steps=25,
generator=generator,
ip_adapter_image=ip_adapter_image,
diffdiff_map=mask,
image=image,
output="images"
)[0]
```
### ControlNet
Stable Diffusion XL already has a preset ControlNet block that can readily be used.
```py
from diffusers.modular_pipelines.stable_diffusion_xl.modular_blocks import StableDiffusionXLAutoControlNetInputStep
control_input_block = StableDiffusionXLAutoControlNetInputStep()
```
However, it requires modifying the `denoise` block because that's where the ControlNet injects the control information into the UNet.
Modify the `denoise` block by replacing the `StableDiffusionXLLoopDenoiser` sub-block with the `StableDiffusionXLControlNetLoopDenoiser`.
```py
class SDXLDiffDiffControlNetDenoiseStep(StableDiffusionXLDenoiseLoopWrapper):
block_classes = [SDXLDiffDiffLoopBeforeDenoiser, StableDiffusionXLControlNetLoopDenoiser, StableDiffusionXLLoopAfterDenoiser]
block_names = ["before_denoiser", "denoiser", "after_denoiser"]
controlnet_denoise_block = SDXLDiffDiffControlNetDenoiseStep()
```
Insert the `controlnet_input` block and replace the `denoise` block with the new `controlnet_denoise_block`. Initialize a [`ModularPipeline`] and load the model components with [`~ModularPipeline.load_components`].
```py
dd_blocks.sub_blocks.insert("controlnet_input", control_input_block, 7)
dd_blocks.sub_blocks["denoise"] = controlnet_denoise_block
dd_pipeline = dd_blocks.init_pipeline("YiYiXu/modular-demo-auto", collection="diffdiff")
dd_pipeline.load_components(torch_dtype=torch.float16)
dd_pipeline = dd_pipeline.to(device)
control_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/diffdiff_tomato_canny.jpeg")
image = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png?download=true")
mask = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask.png?download=true")
prompt = "a green pear"
negative_prompt = "blurry"
generator = torch.Generator(device=device).manual_seed(42)
image = dd_pipeline(
prompt=prompt,
negative_prompt=negative_prompt,
num_inference_steps=25,
generator=generator,
control_image=control_image,
controlnet_conditioning_scale=0.5,
diffdiff_map=mask,
image=image,
output="images"
)[0]
```
### AutoPipelineBlocks
The Differential Diffusion, IP-Adapter, and ControlNet workflows can be bundled into a single [`ModularPipeline`] by using [`AutoPipelineBlocks`]. This allows automatically selecting which sub-blocks to run based on inputs like `control_image` or `ip_adapter_image`. If none of these inputs are passed, it defaults to Differential Diffusion.
Use `block_trigger_inputs` to only run the `SDXLDiffDiffControlNetDenoiseStep` block if a `control_image` input is provided. Otherwise, the `SDXLDiffDiffDenoiseStep` is used.
```py
class SDXLDiffDiffAutoDenoiseStep(AutoPipelineBlocks):
block_classes = [SDXLDiffDiffControlNetDenoiseStep, SDXLDiffDiffDenoiseStep]
block_names = ["controlnet_denoise", "denoise"]
block_trigger_inputs = ["controlnet_cond", None]
```
Add the `ip_adapter` and `controlnet_input` blocks.
```py
DIFFDIFF_AUTO_BLOCKS = IMAGE2IMAGE_BLOCKS.copy()
DIFFDIFF_AUTO_BLOCKS["prepare_latents"] = SDXLDiffDiffPrepareLatentsStep
DIFFDIFF_AUTO_BLOCKS["set_timesteps"] = TEXT2IMAGE_BLOCKS["set_timesteps"]
DIFFDIFF_AUTO_BLOCKS["denoise"] = SDXLDiffDiffAutoDenoiseStep
DIFFDIFF_AUTO_BLOCKS.insert("ip_adapter", StableDiffusionXLAutoIPAdapterStep, 0)
DIFFDIFF_AUTO_BLOCKS.insert("controlnet_input",StableDiffusionXLControlNetAutoInput, 7)
```
Call [`SequentialPipelineBlocks.from_blocks_dict`] to create a [`SequentialPipelineBlocks`], then initialize a [`ModularPipeline`] and load the model components to run it.
```py
dd_auto_blocks = SequentialPipelineBlocks.from_blocks_dict(DIFFDIFF_AUTO_BLOCKS)
dd_pipeline = dd_auto_blocks.init_pipeline("YiYiXu/modular-demo-auto", collection="diffdiff")
dd_pipeline.load_components(torch_dtype=torch.float16)
```
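With the auto blocks assembled, the same pipeline picks a denoise path based on its inputs: providing a ControlNet conditioning input routes through `SDXLDiffDiffControlNetDenoiseStep`, while omitting it falls back to plain Differential Diffusion. The sketch below reuses the inputs from the ControlNet example above.
```py
import torch
from diffusers.utils import load_image

control_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/diffdiff_tomato_canny.jpeg")
image = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/20240329211129_4024911930.png?download=true")
mask = load_image("https://huggingface.co/datasets/OzzyGT/testing-resources/resolve/main/differential/gradient_mask.png?download=true")

dd_pipeline = dd_pipeline.to("cuda")
generator = torch.Generator(device="cuda").manual_seed(42)
result = dd_pipeline(
    prompt="a green pear",
    negative_prompt="blurry",
    num_inference_steps=25,
    generator=generator,
    control_image=control_image,  # omit this (and the scale below) to run the plain Differential Diffusion denoise step
    controlnet_conditioning_scale=0.5,
    diffdiff_map=mask,
    image=image,
    output="images",
)[0]
```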
## Share
Add your [`ModularPipeline`] to the Hub with [`~ModularPipeline.save_pretrained`] and set the `push_to_hub` argument to `True`.
```py
dd_pipeline.save_pretrained("YiYiXu/test_modular_doc", push_to_hub=True)
```
Other users can load the [`ModularPipeline`] with [`~ModularPipeline.from_pretrained`].
[`ModularPipeline`] is the main interface for loading, running, and managing modular pipelines.
```py
import torch
from diffusers.modular_pipelines import ModularPipeline, ComponentsManager
from diffusers import ModularPipeline
components = ComponentsManager()
pipe = ModularPipeline.from_pretrained("Qwen/Qwen-Image")
pipe.load_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")
diffdiff_pipeline = ModularPipeline.from_pretrained("YiYiXu/modular-diffdiff-0704", trust_remote_code=True, components_manager=components, collection="diffdiff")
diffdiff_pipeline.load_components(torch_dtype=torch.float16)
image = pipe(
prompt="cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney",
).images[0]
image
```
[`~ModularPipeline.from_pretrained`] uses lazy loading. It reads the configuration to learn where to load each component from, but doesn't load the model weights until you call [`~ModularPipeline.load_components`]. This gives you control over when and how components are loaded.
Learn more about creating and loading pipelines in the [Creating a pipeline](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#creating-a-pipeline) and [Loading components](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#loading-components) guides.
## Understand the structure
A [`ModularPipeline`] has two parts:
- **State**: the loaded components (models, schedulers, processors) and configuration
- **Definition**: the [`ModularPipelineBlocks`] that specify inputs, outputs, expected components and computation logic
The blocks define *what* the pipeline does. Access them through `pipe.blocks`.
```py
print(pipe.blocks)
```
```
QwenImageAutoBlocks(
Class: SequentialPipelineBlocks
Description: Auto Modular pipeline for text-to-image, image-to-image, inpainting, and controlnet tasks using QwenImage.
Supported workflows:
- `text2image`: requires `prompt`
- `image2image`: requires `prompt`, `image`
- `inpainting`: requires `prompt`, `mask_image`, `image`
- `controlnet_text2image`: requires `prompt`, `control_image`
...
Components:
text_encoder (`Qwen2_5_VLForConditionalGeneration`)
vae (`AutoencoderKLQwenImage`)
transformer (`QwenImageTransformer2DModel`)
...
Sub-Blocks:
[0] text_encoder (QwenImageAutoTextEncoderStep)
[1] vae_encoder (QwenImageAutoVaeEncoderStep)
[2] controlnet_vae_encoder (QwenImageOptionalControlNetVaeEncoderStep)
[3] denoise (QwenImageAutoCoreDenoiseStep)
[4] decode (QwenImageAutoDecodeStep)
)
```
The output returns:
- The supported workflows (text2image, image2image, inpainting, etc.)
- The Sub-Blocks it's composed of (text_encoder, vae_encoder, denoise, decode)
### Workflows
`QwenImageAutoBlocks` is a [`ConditionalPipelineBlocks`], so this pipeline supports multiple workflows and adapts its behavior based on the inputs you provide. For example, if you pass `image` to the pipeline, it runs an image-to-image workflow instead of text-to-image.
```py
from diffusers.utils import load_image
input_image = load_image("https://github.com/Trgtuan10/Image_storage/blob/main/cute_cat.png?raw=true")
image = pipe(
prompt="cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney",
image=input_image,
).images[0]
```
Use `get_workflow()` to extract the blocks for a specific workflow.
```py
img2img_blocks = pipe.blocks.get_workflow("image2image")
```
Conditional blocks are convenient for users, but their conditional logic adds complexity when customizing or debugging. Extracting a workflow gives you the specific blocks relevant to your workflow, making it easier to work with. Learn more in the [AutoPipelineBlocks](https://huggingface.co/docs/diffusers/modular_diffusers/auto_pipeline_blocks) guide.
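For example, the extracted image-to-image blocks can be initialized as their own pipeline and run directly. This is a sketch that loads a fresh set of components from the same repo rather than reusing the ones already in memory.
```py
import torch

img2img_pipe = img2img_blocks.init_pipeline("Qwen/Qwen-Image")
img2img_pipe.load_components(torch_dtype=torch.bfloat16)
img2img_pipe.to("cuda")

image = img2img_pipe(
    prompt="cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney",
    image=input_image,
).images[0]
```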
### Sub-blocks
`QwenImageAutoBlocks` is itself composed of smaller blocks: `text_encoder`, `vae_encoder`, `controlnet_vae_encoder`, `denoise`, and `decode`. Access them through the `sub_blocks` property.
The `doc` property is useful for seeing the full documentation of any block, including its inputs, outputs, and components.
```py
vae_encoder_block = pipe.blocks.sub_blocks["vae_encoder"]
print(vae_encoder_block.doc)
```
This block can be converted to a pipeline and run on its own with [`~ModularPipelineBlocks.init_pipeline`].
```py
vae_encoder_pipe = vae_encoder_block.init_pipeline()
# Reuse the VAE we already loaded with the update_components() method
vae_encoder_pipe.update_components(vae=pipe.vae)
# Run just this block
image_latents = vae_encoder_pipe(image=input_image).image_latents
print(image_latents.shape)
```
It reuses the VAE from our original pipeline instead of reloading it, keeping memory usage efficient. Learn more in the [Loading components](https://huggingface.co/docs/diffusers/modular_diffusers/modular_pipeline#loading-components) guide.
Since blocks are composable, you can modify the pipeline's definition by adding, removing, or swapping blocks to create new workflows. In the next section, we'll add a canny edge detection block to a ControlNet pipeline, so you can pass a regular image instead of a pre-processed canny edge map.
## Compose new workflows
Let's add a canny edge detection block to a ControlNet pipeline. First, load a pre-built canny block from the Hub (see [Building Custom Blocks](https://huggingface.co/docs/diffusers/modular_diffusers/custom_blocks) to create your own).
```py
from diffusers.modular_pipelines import ModularPipelineBlocks
# Load a canny block from the Hub
canny_block = ModularPipelineBlocks.from_pretrained(
"diffusers-internal-dev/canny-filtering",
trust_remote_code=True,
)
print(canny_block.doc)
```
```
class CannyBlock
Inputs:
image (`Union[Image, ndarray]`):
Image to compute canny filter on
low_threshold (`int`, *optional*, defaults to 50):
Low threshold for the canny filter.
high_threshold (`int`, *optional*, defaults to 200):
High threshold for the canny filter.
...
Outputs:
control_image (`PIL.Image`):
Canny map for input image
```
Use `get_workflow` to extract the ControlNet workflow from [`QwenImageAutoBlocks`].
```py
# Get the controlnet workflow that we want to work with
blocks = pipe.blocks.get_workflow("controlnet_text2image")
print(blocks.doc)
```
```
class SequentialPipelineBlocks
Inputs:
prompt (`str`):
The prompt or prompts to guide image generation.
control_image (`Image`):
Control image for ControlNet conditioning.
...
```
It requires `control_image` as input. After inserting the canny block, the pipeline will accept a regular image instead.
```py
# and insert canny at the beginning
blocks.sub_blocks.insert("canny", canny_block, 0)
# Check the updated structure: CannyBlock is now listed as first sub-block
print(blocks)
# Check the updated doc: notice the pipeline now takes "image" as input
# even though it's a controlnet pipeline, because canny preprocesses it into control_image
print(blocks.doc)
```
```
class SequentialPipelineBlocks
Inputs:
image (`Union[Image, ndarray]`):
Image to compute canny filter on
low_threshold (`int`, *optional*, defaults to 50):
Low threshold for the canny filter.
high_threshold (`int`, *optional*, defaults to 200):
High threshold for the canny filter.
prompt (`str`):
The prompt or prompts to guide image generation.
...
```
Now the pipeline takes `image` as input - the canny block will preprocess it into `control_image` automatically.
Create a pipeline from the modified blocks and load a ControlNet model. We use [`ComponentsManager`] to enable CPU offloading for reduced memory usage (learn more in the [ComponentsManager](./components_manager) guide).
```py
from diffusers import ComponentsManager
manager = ComponentsManager()
manager.enable_auto_cpu_offload(device="cuda:0")
pipeline = blocks.init_pipeline("Qwen/Qwen-Image", components_manager=manager)
pipeline.load_components(torch_dtype=torch.bfloat16)
# Load the ControlNet model
controlnet_spec = pipeline.get_component_spec("controlnet")
controlnet_spec.pretrained_model_name_or_path = "InstantX/Qwen-Image-ControlNet-Union"
controlnet = controlnet_spec.load(torch_dtype=torch.bfloat16)
pipeline.update_components(controlnet=controlnet)
```
Now run the pipeline - the canny block preprocesses the image for ControlNet.
```py
from diffusers.utils import load_image
prompt = "cat wizard with red hat, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney"
image = load_image("https://github.com/Trgtuan10/Image_storage/blob/main/cute_cat.png?raw=true")
output = pipeline(
prompt=prompt,
image=image,
).images[0]
output
```
## Next steps
<hfoptions id="next">
<hfoption id="Build custom blocks">
Learn how to create your own blocks with custom logic in the [Building Custom Blocks](./custom_blocks) guide.
</hfoption>
<hfoption id="Share components">
Use [`ComponentsManager`](./components_manager) to share models across multiple pipelines and manage memory efficiently.
</hfoption>
<hfoption id="Visual interface">
Connect modular pipelines to [Mellon](https://github.com/cubiq/Mellon), a visual node-based interface for building workflows. Custom blocks built with Modular Diffusers work out of the box with Mellon, no UI code required. Read more in the Mellon guide.
</hfoption>
</hfoptions>

View File

@@ -457,6 +457,7 @@ else:
"AuraFlowPipeline",
"BlipDiffusionControlNetPipeline",
"BlipDiffusionPipeline",
"BriaFiboEditPipeline",
"BriaFiboPipeline",
"BriaPipeline",
"ChromaImg2ImgPipeline",
@@ -1185,6 +1186,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
AudioLDM2UNet2DConditionModel,
AudioLDMPipeline,
AuraFlowPipeline,
BriaFiboEditPipeline,
BriaFiboPipeline,
BriaPipeline,
ChromaImg2ImgPipeline,

View File

@@ -129,7 +129,7 @@ else:
"AnimateDiffVideoToVideoControlNetPipeline",
]
_import_structure["bria"] = ["BriaPipeline"]
_import_structure["bria_fibo"] = ["BriaFiboPipeline"]
_import_structure["bria_fibo"] = ["BriaFiboPipeline", "BriaFiboEditPipeline"]
_import_structure["flux2"] = ["Flux2Pipeline", "Flux2KleinPipeline"]
_import_structure["flux"] = [
"FluxControlPipeline",
@@ -597,7 +597,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
from .aura_flow import AuraFlowPipeline
from .blip_diffusion import BlipDiffusionPipeline
from .bria import BriaPipeline
from .bria_fibo import BriaFiboPipeline
from .bria_fibo import BriaFiboEditPipeline, BriaFiboPipeline
from .chroma import ChromaImg2ImgPipeline, ChromaInpaintPipeline, ChromaPipeline
from .chronoedit import ChronoEditPipeline
from .cogvideo import (

View File

@@ -23,6 +23,8 @@ except OptionalDependencyNotAvailable:
_dummy_objects.update(get_objects_from_module(dummy_torch_and_transformers_objects))
else:
_import_structure["pipeline_bria_fibo"] = ["BriaFiboPipeline"]
_import_structure["pipeline_bria_fibo_edit"] = ["BriaFiboEditPipeline"]
if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
try:
@@ -33,6 +35,7 @@ if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
from ...utils.dummy_torch_and_transformers_objects import *
else:
from .pipeline_bria_fibo import BriaFiboPipeline
from .pipeline_bria_fibo_edit import BriaFiboEditPipeline
else:
import sys

File diff suppressed because it is too large

View File

@@ -84,7 +84,6 @@ EXAMPLE_DOC_STRING = """
>>> from diffusers import ControlNetModel, StableDiffusionXLControlNetImg2ImgPipeline, AutoencoderKL
>>> from diffusers.utils import load_image
>>> depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to("cuda")
>>> feature_extractor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
>>> controlnet = ControlNetModel.from_pretrained(

View File

@@ -53,7 +53,6 @@ EXAMPLE_DOC_STRING = """
>>> from transformers import AutoTokenizer, LlamaForCausalLM
>>> from diffusers import HiDreamImagePipeline
>>> tokenizer_4 = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
>>> text_encoder_4 = LlamaForCausalLM.from_pretrained(
... "meta-llama/Meta-Llama-3.1-8B-Instruct",

View File

@@ -85,7 +85,6 @@ EXAMPLE_DOC_STRING = """
>>> from diffusers import ControlNetModel, StableDiffusionXLControlNetPAGImg2ImgPipeline, AutoencoderKL
>>> from diffusers.utils import load_image
>>> depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-hybrid-midas").to("cuda")
>>> feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-hybrid-midas")
>>> controlnet = ControlNetModel.from_pretrained(

View File

@@ -459,7 +459,6 @@ class StableDiffusionLatentUpscalePipeline(DiffusionPipeline, StableDiffusionMix
>>> from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline
>>> import torch
>>> pipeline = StableDiffusionPipeline.from_pretrained(
... "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
... )

View File

@@ -587,6 +587,21 @@ class AuraFlowPipeline(metaclass=DummyObject):
requires_backends(cls, ["torch", "transformers"])
class BriaFiboEditPipeline(metaclass=DummyObject):
_backends = ["torch", "transformers"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch", "transformers"])
@classmethod
def from_config(cls, *args, **kwargs):
requires_backends(cls, ["torch", "transformers"])
@classmethod
def from_pretrained(cls, *args, **kwargs):
requires_backends(cls, ["torch", "transformers"])
class BriaFiboPipeline(metaclass=DummyObject):
_backends = ["torch", "transformers"]

View File

@@ -0,0 +1,192 @@
# Copyright 2024 Bria AI and The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
import numpy as np
import torch
from PIL import Image
from transformers import AutoTokenizer
from transformers.models.smollm3.modeling_smollm3 import SmolLM3Config, SmolLM3ForCausalLM
from diffusers import (
AutoencoderKLWan,
BriaFiboEditPipeline,
FlowMatchEulerDiscreteScheduler,
)
from diffusers.models.transformers.transformer_bria_fibo import BriaFiboTransformer2DModel
from tests.pipelines.test_pipelines_common import PipelineTesterMixin
from ...testing_utils import (
enable_full_determinism,
torch_device,
)
enable_full_determinism()
class BriaFiboPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
pipeline_class = BriaFiboEditPipeline
params = frozenset(["prompt", "height", "width", "guidance_scale"])
batch_params = frozenset(["prompt"])
test_xformers_attention = False
test_layerwise_casting = False
test_group_offloading = False
supports_dduf = False
def get_dummy_components(self):
torch.manual_seed(0)
transformer = BriaFiboTransformer2DModel(
patch_size=1,
in_channels=16,
num_layers=1,
num_single_layers=1,
attention_head_dim=8,
num_attention_heads=2,
joint_attention_dim=64,
text_encoder_dim=32,
pooled_projection_dim=None,
axes_dims_rope=[0, 4, 4],
)
vae = AutoencoderKLWan(
base_dim=80,
decoder_base_dim=128,
dim_mult=[1, 2, 4, 4],
dropout=0.0,
in_channels=12,
latents_mean=[0.0] * 16,
latents_std=[1.0] * 16,
is_residual=True,
num_res_blocks=2,
out_channels=12,
patch_size=2,
scale_factor_spatial=16,
scale_factor_temporal=4,
temperal_downsample=[False, True, True],
z_dim=16,
)
scheduler = FlowMatchEulerDiscreteScheduler()
text_encoder = SmolLM3ForCausalLM(SmolLM3Config(hidden_size=32))
tokenizer = AutoTokenizer.from_pretrained("hf-internal-testing/tiny-random-t5")
components = {
"scheduler": scheduler,
"text_encoder": text_encoder,
"tokenizer": tokenizer,
"transformer": transformer,
"vae": vae,
}
return components
def get_dummy_inputs(self, device, seed=0):
if str(device).startswith("mps"):
generator = torch.manual_seed(seed)
else:
generator = torch.Generator(device="cpu").manual_seed(seed)
inputs = {
"prompt": '{"text": "A painting of a squirrel eating a burger","edit_instruction": "A painting of a squirrel eating a burger"}',
"negative_prompt": "bad, ugly",
"generator": generator,
"num_inference_steps": 2,
"guidance_scale": 5.0,
"height": 192,
"width": 336,
"output_type": "np",
}
image = Image.new("RGB", (336, 192), (255, 255, 255))
inputs["image"] = image
return inputs
@unittest.skip(reason="will not be supported due to dim-fusion")
def test_encode_prompt_works_in_isolation(self):
pass
@unittest.skip(reason="Batching is not supported yet")
def test_num_images_per_prompt(self):
pass
@unittest.skip(reason="Batching is not supported yet")
def test_inference_batch_consistent(self):
pass
@unittest.skip(reason="Batching is not supported yet")
def test_inference_batch_single_identical(self):
pass
def test_bria_fibo_different_prompts(self):
pipe = self.pipeline_class(**self.get_dummy_components())
pipe = pipe.to(torch_device)
inputs = self.get_dummy_inputs(torch_device)
output_same_prompt = pipe(**inputs).images[0]
inputs = self.get_dummy_inputs(torch_device)
inputs["prompt"] = {"edit_instruction": "a different prompt"}
output_different_prompts = pipe(**inputs).images[0]
max_diff = np.abs(output_same_prompt - output_different_prompts).max()
assert max_diff > 1e-6
def test_image_output_shape(self):
pipe = self.pipeline_class(**self.get_dummy_components())
pipe = pipe.to(torch_device)
inputs = self.get_dummy_inputs(torch_device)
height_width_pairs = [(32, 32), (64, 64), (32, 64)]
for height, width in height_width_pairs:
expected_height = height
expected_width = width
inputs.update({"height": height, "width": width})
image = pipe(**inputs).images[0]
output_height, output_width, _ = image.shape
assert (output_height, output_width) == (expected_height, expected_width)
def test_bria_fibo_edit_mask(self):
pipe = self.pipeline_class(**self.get_dummy_components())
pipe = pipe.to(torch_device)
inputs = self.get_dummy_inputs(torch_device)
mask = Image.fromarray((np.ones((192, 336)) * 255).astype(np.uint8), mode="L")
inputs.update({"mask": mask})
output = pipe(**inputs).images[0]
assert output.shape == (192, 336, 3)
def test_bria_fibo_edit_mask_image_size_mismatch(self):
pipe = self.pipeline_class(**self.get_dummy_components())
pipe = pipe.to(torch_device)
inputs = self.get_dummy_inputs(torch_device)
mask = Image.fromarray((np.ones((64, 64)) * 255).astype(np.uint8), mode="L")
inputs.update({"mask": mask})
with self.assertRaisesRegex(ValueError, "Mask and image must have the same size"):
pipe(**inputs)
def test_bria_fibo_edit_mask_no_image(self):
pipe = self.pipeline_class(**self.get_dummy_components())
pipe = pipe.to(torch_device)
inputs = self.get_dummy_inputs(torch_device)
mask = Image.fromarray((np.ones((32, 32)) * 255).astype(np.uint8), mode="L")
# get_dummy_inputs adds an image by default; remove it to exercise the mask-without-image error path
inputs.pop("image", None)
inputs.update({"mask": mask})
with self.assertRaisesRegex(ValueError, "If mask is provided, image must also be provided"):
pipe(**inputs)