mirror of https://github.com/huggingface/diffusers.git synced 2026-02-14 14:55:26 +08:00


CalamitousFelicitousness 99e2cfff27 Feature/zimage inpaint pipeline (#13006)

* Add ZImageInpaintPipeline

Updated the pipeline structure to include ZImageInpaintPipeline
    alongside ZImagePipeline and ZImageImg2ImgPipeline.
Implemented the ZImageInpaintPipeline class for inpainting
    tasks, including necessary methods for encoding prompts,
    preparing masked latents, and denoising.
Enhanced the auto_pipeline to map the new ZImageInpaintPipeline
    for inpainting generation tasks.
Added unit tests for ZImageInpaintPipeline to ensure
    functionality and performance.
Updated dummy objects to include ZImageInpaintPipeline for
    testing purposes.

* Add documentation and improve test stability for ZImageInpaintPipeline

- Add torch.empty fix for x_pad_token and cap_pad_token in test
- Add # Copied from annotations for encode_prompt methods
- Add documentation with usage example and autodoc directive

* Address PR review feedback for ZImageInpaintPipeline

Add batch size validation and callback handling fixes per review,
using diffusers conventions rather than suggested code verbatim.

* Update src/diffusers/pipelines/z_image/pipeline_z_image_inpaint.py

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

* Update src/diffusers/pipelines/z_image/pipeline_z_image_inpaint.py

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

* Add input validation and fix XLA support for ZImageInpaintPipeline

- Add missing is_torch_xla_available import for TPU support
- Add xm.mark_step() in denoising loop for proper XLA execution
- Add check_inputs() method for comprehensive input validation
- Call check_inputs() at the start of __call__

Addresses PR review feedback from @asomoza.

* Cleanup

---------

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

2026-02-05 11:48:25 -03:00

3.4 KiB


Z-Image

Z-Image is a powerful and highly efficient image generation model with 6B parameters. Only one model is currently available, with two more to be released:

Model	Hugging Face
Z-Image-Turbo	https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

Z-Image-Turbo

Z-Image-Turbo is a distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (Number of Function Evaluations). It offers sub-second inference latency on enterprise-grade H800 GPUs and fits comfortably on consumer devices with 16 GB of VRAM. It excels at photorealistic image generation, bilingual text rendering (English and Chinese), and robust instruction adherence.
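This page documents ZImagePipeline but shows no text-to-image call, so here is a minimal sketch. It assumes the same checkpoint and sampling settings as the image-to-image example (`num_inference_steps=9`, `guidance_scale=0.0`). The helper name `generate_turbo_image` and its parameters are hypothetical and not part of diffusers. Imports are deferred into the function so that defining it does not load torch or the model.

```python
def generate_turbo_image(prompt, height=1024, width=1024, seed=42, cpu_offload=False):
    """Text-to-image sketch with Z-Image-Turbo (hypothetical helper, not part of diffusers)."""
    # Deferred imports: the heavy dependencies load only when an image is generated.
    import torch
    from diffusers import ZImagePipeline

    pipe = ZImagePipeline.from_pretrained("Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16)
    if cpu_offload:
        # Keeps submodules on the CPU until they are needed, trading speed for lower VRAM use.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")

    return pipe(
        prompt,
        height=height,
        width=width,
        num_inference_steps=9,  # same settings as the other examples on this page
        guidance_scale=0.0,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
```

Calling `generate_turbo_image("A lighthouse at dusk").save("zimage_t2i.png")` would write the result to disk. Setting `cpu_offload=True` may help on GPUs close to the 16 GB limit, at some cost in speed.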

Image-to-image

Use [ZImageImg2ImgPipeline] to transform an existing image based on a text prompt.

import torch
from diffusers import ZImageImg2ImgPipeline
from diffusers.utils import load_image

pipe = ZImageImg2ImgPipeline.from_pretrained("Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16)
pipe.to("cuda")

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
init_image = load_image(url).resize((1024, 1024))

prompt = "A fantasy landscape with mountains and a river, detailed, vibrant colors"
image = pipe(
    prompt,
    image=init_image,
    strength=0.6,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("zimage_img2img.png")
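The `strength` argument also affects how many denoising steps run. The sketch below assumes that ZImageImg2ImgPipeline follows the convention used by other diffusers image-to-image pipelines, where the first part of the schedule is skipped in proportion to `1 - strength`. That assumption has not been checked against the Z-Image source here, and `effective_steps` is a hypothetical helper.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually run, under the assumed img2img convention."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
    # Assumed convention: skip the first t_start scheduler steps and run the rest.
    init_timestep = min(num_inference_steps * strength, num_inference_steps)
    t_start = int(max(num_inference_steps - init_timestep, 0))
    return num_inference_steps - t_start


print(effective_steps(9, 0.6))  # 6
print(effective_steps(9, 1.0))  # 9
```

Under this assumption, the example above with `strength=0.6` and `num_inference_steps=9` would run 6 denoising steps, while `strength=1.0` would run all 9.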

Inpainting

Use [ZImageInpaintPipeline] to inpaint specific regions of an image based on a text prompt and mask.

import torch
import numpy as np
from PIL import Image
from diffusers import ZImageInpaintPipeline
from diffusers.utils import load_image

pipe = ZImageInpaintPipeline.from_pretrained("Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16)
pipe.to("cuda")

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
init_image = load_image(url).resize((1024, 1024))

# Create a mask (white = inpaint, black = preserve)
mask = np.zeros((1024, 1024), dtype=np.uint8)
mask[256:768, 256:768] = 255  # Inpaint center region
mask_image = Image.fromarray(mask)

prompt = "A beautiful lake with mountains in the background"
image = pipe(
    prompt,
    image=init_image,
    mask_image=mask_image,
    strength=1.0,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("zimage_inpaint.png")
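The mask construction above can be wrapped in a small helper when several regions need to be tried. This is a sketch only: `make_box_mask` is a hypothetical name, and the `(top, left, bottom, right)` box layout is an assumption chosen to match NumPy's row-then-column indexing.

```python
import numpy as np


def make_box_mask(height, width, box):
    """Return a uint8 mask: 255 inside the box (inpaint), 0 elsewhere (preserve)."""
    top, left, bottom, right = box
    # Clamp the box to the image bounds so negative coordinates cannot wrap around.
    top, bottom = max(0, top), min(height, bottom)
    left, right = max(0, left), min(width, right)
    mask = np.zeros((height, width), dtype=np.uint8)
    if top < bottom and left < right:
        mask[top:bottom, left:right] = 255
    return mask


mask = make_box_mask(1024, 1024, (256, 256, 768, 768))
print((mask == 255).mean())  # fraction of pixels to repaint: 0.25
```

The array can then be converted with `Image.fromarray(mask)` and passed as `mask_image`, as in the example above.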

ZImagePipeline

[[autodoc]] ZImagePipeline
	- all
	- __call__

ZImageImg2ImgPipeline

[[autodoc]] ZImageImg2ImgPipeline
	- all
	- __call__

ZImageInpaintPipeline

[[autodoc]] ZImageInpaintPipeline
	- all
	- __call__
