* Add ZImageInpaintPipeline
Updated the pipeline structure to include ZImageInpaintPipeline
alongside ZImagePipeline and ZImageImg2ImgPipeline.
Implemented the ZImageInpaintPipeline class for inpainting
tasks, including necessary methods for encoding prompts,
preparing masked latents, and denoising.
Enhanced the auto_pipeline to map the new ZImageInpaintPipeline
for inpainting generation tasks.
Added unit tests for ZImageInpaintPipeline to ensure
functionality and performance.
Updated dummy objects to include ZImageInpaintPipeline for
testing purposes.
* Add documentation and improve test stability for ZImageInpaintPipeline
- Add torch.empty fix for x_pad_token and cap_pad_token in test
- Add # Copied from annotations for encode_prompt methods
- Add documentation with usage example and autodoc directive
* Address PR review feedback for ZImageInpaintPipeline
Add batch size validation and callback handling fixes per review,
using diffusers conventions rather than suggested code verbatim.
* Update src/diffusers/pipelines/z_image/pipeline_z_image_inpaint.py
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
* Update src/diffusers/pipelines/z_image/pipeline_z_image_inpaint.py
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
* Add input validation and fix XLA support for ZImageInpaintPipeline
- Add missing is_torch_xla_available import for TPU support
- Add xm.mark_step() in denoising loop for proper XLA execution
- Add check_inputs() method for comprehensive input validation
- Call check_inputs() at the start of __call__
Addresses PR review feedback from @asomoza.
* Cleanup
---------
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
Z-Image
Z-Image is a powerful and highly efficient image generation model with 6B parameters. Only one model is currently available, and two more are planned for release:
| Model | Hugging Face |
|---|---|
| Z-Image-Turbo | https://huggingface.co/Tongyi-MAI/Z-Image-Turbo |
Z-Image-Turbo
Z-Image-Turbo is a distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (number of function evaluations). It offers sub-second inference latency on enterprise-grade H800 GPUs and fits comfortably on consumer devices with 16 GB of VRAM. It excels at photorealistic image generation, bilingual text rendering (English and Chinese), and robust instruction adherence.
Image-to-image
Use [ZImageImg2ImgPipeline] to transform an existing image based on a text prompt.
```python
import torch
from diffusers import ZImageImg2ImgPipeline
from diffusers.utils import load_image

pipe = ZImageImg2ImgPipeline.from_pretrained("Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16)
pipe.to("cuda")

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
init_image = load_image(url).resize((1024, 1024))

prompt = "A fantasy landscape with mountains and a river, detailed, vibrant colors"
image = pipe(
    prompt,
    image=init_image,
    strength=0.6,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("zimage_img2img.png")
```
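The `strength` argument controls how much of the input image is re-noised and regenerated. As a rough sketch, assuming `ZImageImg2ImgPipeline` trims its schedule the same way other diffusers img2img pipelines do in their `get_timesteps` method, the number of denoising steps that actually run can be computed as below. The helper `effective_steps` is hypothetical and not part of the diffusers API.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Hypothetical helper: number of scheduler steps run for a given strength.

    Assumes the common diffusers img2img convention: skip the first
    (1 - strength) fraction of the schedule and denoise from that point on.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
    # Portion of the schedule that is kept, in (possibly fractional) steps.
    init_timestep = min(num_inference_steps * strength, num_inference_steps)
    # Index of the first step that is actually run.
    t_start = int(max(num_inference_steps - init_timestep, 0))
    return num_inference_steps - t_start


# The image-to-image example uses 9 steps at strength 0.6.
print(effective_steps(9, 0.6))  # 6
print(effective_steps(9, 1.0))  # 9
```

Under this assumption, `strength=0.6` runs only the last 6 of the 9 steps, so the output stays closer to `init_image`, while `strength=1.0` runs the full schedule.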
Inpainting
Use [ZImageInpaintPipeline] to inpaint specific regions of an image based on a text prompt and mask.
```python
import torch
import numpy as np
from PIL import Image
from diffusers import ZImageInpaintPipeline
from diffusers.utils import load_image

pipe = ZImageInpaintPipeline.from_pretrained("Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16)
pipe.to("cuda")

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
init_image = load_image(url).resize((1024, 1024))

# Create a mask (white = inpaint, black = preserve)
mask = np.zeros((1024, 1024), dtype=np.uint8)
mask[256:768, 256:768] = 255  # Inpaint center region
mask_image = Image.fromarray(mask)

prompt = "A beautiful lake with mountains in the background"
image = pipe(
    prompt,
    image=init_image,
    mask_image=mask_image,
    strength=1.0,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("zimage_inpaint.png")
```
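The mask convention (white = inpaint, black = preserve) can be illustrated with a simplified pixel-space blend. This is only a sketch: it assumes preserved regions are kept by blending the original content back in, as in other diffusers inpainting pipelines, whereas the pipeline itself works on masked latents during denoising rather than on final pixels. The function `blend_with_mask` and the stand-in `original` and `generated` arrays are hypothetical.

```python
import numpy as np


def blend_with_mask(original: np.ndarray, generated: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep `original` where the mask is black (0) and `generated` where it is white (255).

    original, generated: uint8 arrays of shape (H, W, 3)
    mask: uint8 array of shape (H, W) with values in [0, 255]
    """
    # Normalize the mask to [0, 1] and add a channel axis so it broadcasts over RGB.
    m = (mask.astype(np.float32) / 255.0)[..., None]
    blended = (1.0 - m) * original.astype(np.float32) + m * generated.astype(np.float32)
    return np.round(blended).astype(np.uint8)


# Same mask as the inpainting example: inpaint the central 512x512 square.
mask = np.zeros((1024, 1024), dtype=np.uint8)
mask[256:768, 256:768] = 255

original = np.full((1024, 1024, 3), 10, dtype=np.uint8)    # stand-in for init_image
generated = np.full((1024, 1024, 3), 200, dtype=np.uint8)  # stand-in for model output

result = blend_with_mask(original, generated, mask)
print(result[0, 0], result[512, 512])  # [10 10 10] [200 200 200]
```

In this sketch, intermediate gray mask values produce a linear mix of the two images, which is one way to soften the boundary of the inpainted region.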
ZImagePipeline
[[autodoc]] ZImagePipeline
  - all
  - __call__
ZImageImg2ImgPipeline
[[autodoc]] ZImageImg2ImgPipeline
  - all
  - __call__
ZImageInpaintPipeline
[[autodoc]] ZImageInpaintPipeline
  - all
  - __call__