Compare commits

...

2 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Akshan Krithick | 87beae7771 | Fix HunyuanVideo 1.5 I2V by preprocessing image at pixel resolution instead of latent resolution (#13440) | 2026-04-10 09:54:36 -10:00 |
| Xyc2016 | 251676dfda | Fix grammar in LoRA documentation (LoRA's → LoRAs, trigger it → trigger them) (#13423) | 2026-04-10 09:18:30 -07:00 |
2 changed files with 4 additions and 4 deletions


@@ -101,9 +101,9 @@ export_to_video(video, "output.mp4", fps=16)
 ## LoRA
-Adapters insert a small number of trainable parameters to the original base model. Only the inserted parameters are fine-tuned while the rest of the model weights remain frozen. This makes it fast and cheap to fine-tune a model on a new style. Among adapters, [LoRA's](./tutorials/using_peft_for_inference) are the most popular.
+Adapters insert a small number of trainable parameters to the original base model. Only the inserted parameters are fine-tuned while the rest of the model weights remain frozen. This makes it fast and cheap to fine-tune a model on a new style. Among adapters, [LoRAs](./tutorials/using_peft_for_inference) are the most popular.
-Add a LoRA to a pipeline with the [`~loaders.QwenImageLoraLoaderMixin.load_lora_weights`] method. Some LoRA's require a special word to trigger it, such as `Realism`, in the example below. Check a LoRA's model card to see if it requires a trigger word.
+Add a LoRA to a pipeline with the [`~loaders.QwenImageLoraLoaderMixin.load_lora_weights`] method. Some LoRAs require a special word to trigger them, such as `Realism`, in the example below. Check a LoRA's model card to see if it requires a trigger word.
 ```py
 import torch


@@ -611,7 +611,7 @@ class HunyuanVideo15ImageToVideoPipeline(DiffusionPipeline):
     tuple: (cond_latents_concat, mask_concat) - both are zero tensors for t2v
     """
-    batch, channels, frames, height, width = latents.shape
+    batch, channels, frames, latent_height, latent_width = latents.shape
     image_latents = self._get_image_latents(
         vae=self.vae,
@@ -626,7 +626,7 @@ class HunyuanVideo15ImageToVideoPipeline(DiffusionPipeline):
     latent_condition[:, :, 1:, :, :] = 0
     latent_condition = latent_condition.to(device=device, dtype=dtype)
-    latent_mask = torch.zeros(batch, 1, frames, height, width, dtype=dtype, device=device)
+    latent_mask = torch.zeros(batch, 1, frames, latent_height, latent_width, dtype=dtype, device=device)
     latent_mask[:, :, 0, :, :] = 1.0
     return latent_condition, latent_mask
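
The rename above makes the shape contract explicit: the conditioning mask must be allocated at the latent resolution unpacked from `latents`, not at the pixel resolution of the input image. A minimal sketch of that construction, with purely illustrative shapes (in the real pipeline the latents come from the VAE encoder, and pixel dimensions are a VAE compression factor larger than the latent ones):

```python
import torch

# Illustrative latent-space shapes; these are NOT the pipeline's real values.
batch, channels, frames = 1, 16, 9
latent_height, latent_width = 60, 104

latents = torch.zeros(batch, channels, frames, latent_height, latent_width)

# Unpack the latent-space dims directly from the latent tensor, as the fix does,
# rather than reusing pixel-space height/width.
b, c, f, lh, lw = latents.shape

# The mask matches the latent grid; only the first frame carries the
# image condition, so only that frame is set to 1.
latent_mask = torch.zeros(b, 1, f, lh, lw)
latent_mask[:, :, 0, :, :] = 1.0
```

Because the mask is later concatenated with the latents along the channel dimension, any mismatch between pixel and latent height/width here would fail at that concatenation, which is the class of bug this commit removes.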