mirror of https://github.com/huggingface/diffusers.git synced 2026-03-07 01:01:45 +08:00

Files

dg845 33f785b444 Add Helios-14B Video Generation Pipelines (#13208 )

* [1/N] add helios

* fix test

* make fix-copies

* change script path

* fix cus script

* update docs

* fix documented check

* update links for docs and examples

* change default config

* small refactor

* add test

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove register_buffer for _scale_cache

* fix non-cuda devices error

* remove "handle the case when timestep is 2D"

* refactor HeliosMultiTermMemoryPatch and process_input_hidden_states

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* fix calculate_shift

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* rewritten `einops` in pure `torch`

* fix: pass patch_size to apply_schedule_shift instead of hardcoding

* remove the logics of 'vae_decode_type'

* move some validation into check_inputs()

* rename helios scheduler & merge all into one step()

* add some details to doc

* move dmd  step() logics from pipeline to scheduler

* change to Python 3.9+ style type

* fix NoneType error

* refactor DMD scheduler's set_timestep

* change rope related vars name

* fix stage2 sample

* fix dmd sample

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* remove redundant & refactor norm_out

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* change "is_keep_x0" to "keep_first_frame"

* use a more intuitive name

* refactor dynamic_time_shifting

* remove use_dynamic_shifting args

* remove usage of UniPCMultistepScheduler

* separate stage2 sample to HeliosPyramidPipeline

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/transformers/transformer_helios.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* fix transformer

* use a more intuitive name

* update example script

* fix requirements

* remove redudant attention mask

* fix

* optimize pipelines

* make style .

* update TYPE_CHECKING

* change to use torch.split

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* derive memory patch sizes from patch_size multiples

* remove some hardcoding

* move some checks into check_inputs

* refactor sample_block_noise

* optimize encoding chunks logits for v2v

* use num_history_latent_frames = sum(history_sizes)

* Update src/diffusers/pipelines/helios/pipeline_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* remove redudant optimized_scale

* Update src/diffusers/pipelines/helios/pipeline_helios_pyramid.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* use more descriptive name

* optimize history_latents

* remove not used "num_inference_steps"

* removed redudant "pyramid_num_stages"

* add "is_cfg_zero_star" and "is_distilled" to HeliosPyramidPipeline

* remove redudant

* change example scripts name

* change example scripts name

* correct docs

* update example

* update docs

* Update tests/models/transformers/test_models_transformer_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update tests/models/transformers/test_models_transformer_helios.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* separate HeliosDMDScheduler

* fix numerical stability issue:

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/schedulers/scheduling_helios_dmd.py

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* remove redudant

* small refactor

* remove use_interpolate_prompt logits

* simplified model test

* fallbackt to BaseModelTesterConfig

* remove _maybe_expand_t2v_lora_for_i2v

* fix HeliosLoraLoaderMixin

* update docs

* use randn_tensor for test

* fix doc typo

* optimize code

* mark torch.compile xfail

* change paper name

* Make get_dummy_inputs deterministic using self.generator

* Set less strict threshold for test_save_load_float16 test for Helios pipeline

* make style and make quality

* Preparation for merging

* add torch.Generator

* Fix HeliosPipelineOutput doc path

* Fix Helios related (optimize docs & remove redudant) (#13210)

* fix docs

* remove redudant

* remove redudant

* fix group offload

* Removed fixes for group offload

---------

Co-authored-by: yuanshenghai <yuanshenghai@bytedance.com>
Co-authored-by: Shenghai Yuan <140951558+SHYuanBest@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: SHYuanBest <shyuan-cs@hotmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

2026-03-04 21:31:43 +05:30

8.4 KiB

Raw Blame History

Helios

Helios 是首个能够在单张 NVIDIA H100 GPU 上以 19.5 FPS 运行的 14B 视频生成模型。它在支持分钟级视频生成的同时，拥有媲美强大基线模型的生成质量，并在统一架构下原生集成了文生视频（T2V）、图生视频（I2V）和视频生视频（V2V）任务。Helios 的主要特性包括：

无需常用的防漂移策略（例如：自强制/self-forcing、误差库/error-banks、关键帧采样或逆采样），我们的模型即可生成高质量且高度连贯的分钟级视频。
无需标准的加速技术（例如：KV 缓存、因果掩码、稀疏/线性注意力机制、TinyVAE、渐进式噪声调度、隐藏状态缓存或量化），作为一款 14B 规模的视频生成模型，我们在单张 H100 GPU 上的端到端推理速度便达到了 19.5 FPS。
引入了多项优化方案，在降低显存消耗的同时，显著提升了训练与推理的吞吐量。这些改进使得我们无需借助并行或分片（sharding）等基础设施，即可使用与图像模型相当的批大小（batch sizes）来训练 14B 的视频生成模型。

本指南将引导您完成 Helios 在不同场景下的使用。

Load Model Checkpoints

模型权重可以存储在Hub上或本地的单独子文件夹中，在这种情况下，您应该使用 [~DiffusionPipeline.from_pretrained] 方法。

import torch
from diffusers import HeliosPipeline, HeliosPyramidPipeline
from huggingface_hub import snapshot_download

# For Best Quality
snapshot_download(repo_id="BestWishYsh/Helios-Base", local_dir="BestWishYsh/Helios-Base")
pipe = HeliosPipeline.from_pretrained("BestWishYsh/Helios-Base", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Intermediate Weight
snapshot_download(repo_id="BestWishYsh/Helios-Mid", local_dir="BestWishYsh/Helios-Mid")
pipe = HeliosPyramidPipeline.from_pretrained("BestWishYsh/Helios-Mid", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# For Best Efficiency
snapshot_download(repo_id="BestWishYsh/Helios-Distilled", local_dir="BestWishYsh/Helios-Distilled")
pipe = HeliosPyramidPipeline.from_pretrained("BestWishYsh/Helios-Distilled", torch_dtype=torch.bfloat16)
pipe.to("cuda")

Text-to-Video Showcases

Prompt	Generated Video
A Viking warrior driving a modern city bus filled with passengers. The Viking has long blonde hair tied back, a beard, and is adorned with a fur-lined helmet and armor. He wears a traditional tunic and trousers, but also sports a seatbelt as he focuses on navigating the busy streets. The interior of the bus is typical, with rows of seats occupied by diverse passengers going about their daily routines. The exterior shots show the bustling urban environment, including tall buildings and traffic. Medium shot focusing on the Viking at the wheel, with occasional close-ups of his determined expression.
A documentary-style nature photography shot from a camera truck moving to the left, capturing a crab quickly scurrying into its burrow. The crab has a hard, greenish-brown shell and long claws, moving with determined speed across the sandy ground. Its body is slightly arched as it burrows into the sand, leaving a small trail behind. The background shows a shallow beach with scattered rocks and seashells, and the horizon features a gentle curve of the coastline. The photo has a natural and realistic texture, emphasizing the crab's natural movement and the texture of the sand. A close-up shot from a slightly elevated angle.

Prompt

Generated Video

A Viking warrior driving a modern city bus filled with passengers. The Viking has long blonde hair tied back, a beard, and is adorned with a fur-lined helmet and armor. He wears a traditional tunic and trousers, but also sports a seatbelt as he focuses on navigating the busy streets. The interior of the bus is typical, with rows of seats occupied by diverse passengers going about their daily routines. The exterior shots show the bustling urban environment, including tall buildings and traffic. Medium shot focusing on the Viking at the wheel, with occasional close-ups of his determined expression.

A documentary-style nature photography shot from a camera truck moving to the left, capturing a crab quickly scurrying into its burrow. The crab has a hard, greenish-brown shell and long claws, moving with determined speed across the sandy ground. Its body is slightly arched as it burrows into the sand, leaving a small trail behind. The background shows a shallow beach with scattered rocks and seashells, and the horizon features a gentle curve of the coastline. The photo has a natural and realistic texture, emphasizing the crab's natural movement and the texture of the sand. A close-up shot from a slightly elevated angle.

Image-to-Video Showcases

Image	Prompt	Generated Video
	A sleek red Kia car speeds along a rural road under a cloudy sky, its modern design and dynamic movement emphasized by the blurred motion of the surrounding fields and trees stretching into the distance. The car's glossy exterior reflects the overcast sky, highlighting its aerodynamic shape and sporty stance. The license plate reads "KIA 626," and the vehicle's headlights are on, adding to the sense of motion and energy. The road curves gently, with the car positioned slightly off-center, creating a sense of forward momentum. A dynamic front three-quarter view captures the car's powerful presence against the serene backdrop of rolling hills and scattered trees.
	A close-up captures a fluffy orange cat with striking green eyes and white whiskers, gazing intently towards the camera. The cat's fur is soft and well-groomed, with a mix of warm orange and cream tones. Its large, expressive eyes are a vivid green, reflecting curiosity and alertness. The cat's nose is small and pink, and its mouth is slightly open, revealing a hint of its pink tongue. The background is softly blurred, suggesting a cozy indoor setting with neutral tones. The photo has a shallow depth of field, focusing sharply on the cat's face while the background remains out of focus. A close-up shot from a slightly elevated perspective.

Interactive-Video Showcases

Prompt	Generated Video
The prompt can be found here
The prompt can be found here

Resources

通过以下资源了解有关 Helios 的更多信息：

视频1和视频2演示了 Helios 的主要功能;
有关更多详细信息，请参阅研究论文 Helios: Real Real-Time Long Video Generation Model。

8.4 KiB Raw Blame History Unescape Escape