Compare commits

7 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Sayak Paul | 8470ce3d06 | Merge branch 'main' into cache-docs-fixes | 2026-01-10 09:13:39 +05:30 |
| sayakpaul | 73601980c2 | up | 2026-01-10 09:09:44 +05:30 |
| Sayak Paul | 25795856e0 | Update docs/source/en/optimization/cache.md (Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>) | 2026-01-10 09:07:46 +05:30 |
| Sayak Paul | d76b744ac3 | Merge branch 'main' into cache-docs-fixes | 2025-11-26 15:22:39 +05:30 |
| Sayak Paul | b26867b628 | Merge branch 'main' into cache-docs-fixes | 2025-11-20 10:06:19 +05:30 |
| Sayak Paul | e3f441648c | Update docs/source/en/optimization/cache.md (Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>) | 2025-11-20 10:00:46 +05:30 |
| sayakpaul | c6cfc5ce1d | polish caching docs. | 2025-11-19 08:40:28 +05:30 |
3 changed files with 20 additions and 5 deletions

View File

@@ -29,7 +29,7 @@ Cache methods speedup diffusion transformers by storing and reusing intermediate
[[autodoc]] apply_faster_cache
-### FirstBlockCacheConfig
+## FirstBlockCacheConfig
[[autodoc]] FirstBlockCacheConfig

View File

@@ -68,6 +68,20 @@ config = FasterCacheConfig(
pipeline.transformer.enable_cache(config)
```
+## FirstBlockCache
+[FirstBlock Cache](https://huggingface.co/docs/diffusers/main/en/api/cache#diffusers.FirstBlockCacheConfig) checks how much the early layers of the denoiser change from one timestep to the next. If the change is small, the model skips the expensive later layers and reuses the previous output.
+```py
+import torch
+from diffusers import DiffusionPipeline
+from diffusers.hooks import apply_first_block_cache, FirstBlockCacheConfig
+pipeline = DiffusionPipeline.from_pretrained(
+    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
+)
+apply_first_block_cache(pipeline.transformer, FirstBlockCacheConfig(threshold=0.2))
+```
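Review note: the skip decision described in the FirstBlockCache paragraph can be sketched in plain Python. This is a simplified illustration, not the diffusers implementation. The class `FirstBlockCacheSketch`, the helper `relative_change`, the mean-absolute-difference metric, and the choice to refresh the reference only on full passes are all assumptions made for the example; only the `threshold` name comes from the diff.

```python
def relative_change(current, previous):
    """Mean absolute difference, relative to the mean magnitude of `previous`."""
    diff = sum(abs(c - p) for c, p in zip(current, previous)) / len(current)
    scale = sum(abs(p) for p in previous) / len(previous)
    return diff / scale if scale > 0 else float("inf")


class FirstBlockCacheSketch:
    """Hypothetical stand-in for the caching logic (not a diffusers class)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.prev_first = None   # first-block output from the last full pass
        self.prev_output = None  # final output from the last full pass

    def step(self, hidden, first_block, remaining_blocks):
        # The first block always runs; it is the cheap "probe".
        first_out = first_block(hidden)
        can_skip = (
            self.prev_first is not None
            and relative_change(first_out, self.prev_first) < self.threshold
        )
        if can_skip:
            # Small change: reuse the previous output, skip the later blocks.
            return self.prev_output, True
        # Large change (or first step): run the expensive blocks and refresh the cache.
        output = remaining_blocks(first_out)
        self.prev_first = first_out
        self.prev_output = output
        return output, False


# Toy "blocks" operating on lists of floats.
cache = FirstBlockCacheSketch(threshold=0.2)
first_block = lambda h: [2.0 * x for x in h]
remaining_blocks = lambda h: [x + 1.0 for x in h]
for hidden in ([1.0, 2.0], [1.05, 2.05], [2.0, 4.0]):
    output, skipped = cache.step(hidden, first_block, remaining_blocks)
    print(output, "skipped" if skipped else "computed")
```

A larger `threshold` skips more often and trades output fidelity for speed; a smaller one falls back to the full forward pass more frequently.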
## TaylorSeer Cache
[TaylorSeer Cache](https://huggingface.co/papers/2403.06923) accelerates diffusion inference by using Taylor series expansions to approximate and cache intermediate activations across denoising steps. The method predicts future outputs based on past computations, reusing them at specified intervals to reduce redundant calculations.
@@ -87,8 +101,7 @@ from diffusers import FluxPipeline, TaylorSeerCacheConfig
pipe = FluxPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev",
torch_dtype=torch.bfloat16,
-)
-pipe.to("cuda")
+).to("cuda")
config = TaylorSeerCacheConfig(
cache_interval=5,
@@ -97,4 +110,4 @@ config = TaylorSeerCacheConfig(
taylor_factors_dtype=torch.bfloat16,
)
pipe.transformer.enable_cache(config)
-```
+```
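Review note: the Taylor-series prediction that the TaylorSeer paragraph describes can be illustrated with a small scalar sketch. It assumes derivatives are estimated with backward finite differences between full computations; the helpers `update_factors` and `predict` are hypothetical names for this example and are not part of diffusers. Only `cache_interval` appears in the diff.

```python
from math import factorial


def update_factors(prev_factors, new_value, step_gap, max_order):
    """Return [f, f', f'', ...] estimated with backward finite differences.

    `prev_factors` is the list produced at the previous full computation
    (or None on the first one); `step_gap` is the number of steps between them.
    """
    factors = [new_value]
    for order in range(max_order):
        if prev_factors is None or order >= len(prev_factors):
            break  # not enough history yet for a higher-order estimate
        factors.append((factors[order] - prev_factors[order]) / step_gap)
    return factors


def predict(factors, steps_ahead):
    """Taylor expansion around the last full computation."""
    return sum(f * steps_ahead**k / factorial(k) for k, f in enumerate(factors))


# Full computations every `cache_interval` steps, predictions in between.
cache_interval = 5
factors = None
for value in (1.0, 2.0, 4.0):  # stand-ins for fully computed activations
    factors = update_factors(factors, value, cache_interval, max_order=2)
print(predict(factors, steps_ahead=1))
```

With real activations the same arithmetic would apply elementwise to tensors; the scalar version only shows how past computations feed the prediction for the skipped steps.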

View File

@@ -41,9 +41,11 @@ class CacheMixin:
Enable caching techniques on the model.
Args:
-config (`Union[PyramidAttentionBroadcastConfig]`):
+config (`Union[PyramidAttentionBroadcastConfig, FasterCacheConfig, FirstBlockCacheConfig]`):
The configuration for applying the caching technique. Currently supported caching techniques are:
- [`~hooks.PyramidAttentionBroadcastConfig`]
+- [`~hooks.FasterCacheConfig`]
+- [`~hooks.FirstBlockCacheConfig`]
Example: