Compare commits

...

3 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Sayak Paul | ddee236a20 | Merge branch 'main' into fp8-note-torchao | 2025-06-13 06:53:29 +05:30 |
| Sayak Paul | 01c637f3c2 | Update docs/source/en/quantization/torchao.md (Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>) | 2025-06-13 06:53:20 +05:30 |
| sayakpaul | fb0c722661 | mention fp8 benefits on supported hardware. | 2025-06-12 09:34:57 +05:30 |

docs/source/en/quantization/torchao.md

@@ -65,6 +65,9 @@ transformer = torch.compile(transformer, mode="max-autotune", fullgraph=True)

For speed and memory benchmarks on Flux and CogVideoX, please refer to the table [here](https://github.com/huggingface/diffusers/pull/10009#issue-2688781450). You can also find torchao [benchmark](https://github.com/pytorch/ao/tree/main/torchao/quantization#benchmarks) numbers for various hardware.

> [!TIP]
> The FP8 post-training quantization schemes in torchao are effective for GPUs with compute capability of at least 8.9 (RTX-4090, Hopper, etc.). FP8 often provides the best speed, memory, and quality trade-off when generating images and videos. We recommend combining FP8 and torch.compile if your GPU is compatible.
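As a rough illustration of that combination, the sketch below quantizes the Flux transformer to FP8 and then compiles it. The `"float8dq_e4m3"` quant type string and the `black-forest-labs/FLUX.1-dev` checkpoint are assumptions for illustration, not part of this diff; check the quant types supported by your installed diffusers and torchao versions before relying on them.

```py
# Hedged sketch: FP8 quantization + torch.compile on a compatible GPU
# (compute capability >= 8.9). The "float8dq_e4m3" quant type string is an
# assumption; verify it against the supported quant types in your install.
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

model_id = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16

quantization_config = TorchAoConfig("float8dq_e4m3")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=dtype,
)

pipe = FluxPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=dtype)
pipe.to("cuda")

# Compiling the quantized transformer typically adds a further speedup.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("flux_fp8.png")
```
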
torchao also supports an automatic quantization API through [autoquant](https://github.com/pytorch/ao/blob/main/torchao/quantization/README.md#autoquantization). Autoquantization determines the best quantization strategy applicable to a model by comparing the performance of each technique on chosen input types and shapes. Currently, this can be used directly on the underlying modeling components. Diffusers will also expose an autoquant configuration option in the future.
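For reference, here is a minimal sketch of applying autoquant to an underlying modeling component, following the `torchao.autoquant` pattern from the linked torchao README; the Flux checkpoint and the choice of the transformer as the target component are illustrative assumptions, not an official diffusers recipe.

```py
# Hedged sketch: apply torchao's autoquant to a pipeline's transformer component.
# autoquant benchmarks candidate quantization strategies on the shapes it sees
# during the first forward pass and keeps the fastest option per layer.
import torch
from torchao import autoquant
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Wrap the underlying modeling component directly, as mentioned above.
pipe.transformer = autoquant(torch.compile(pipe.transformer, mode="max-autotune"))

# The first call triggers the benchmarking; later calls reuse the selected kernels.
image = pipe("a photo of a cat holding a sign that says hello").images[0]
```
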
The `TorchAoConfig` class accepts three parameters: