# Quantization
Quantization techniques reduce memory and computational costs by representing weights and activations with lower-precision data types like 8-bit integers (int8). This makes it possible to load larger models that normally wouldn't fit into memory and to speed up inference.
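For intuition, a model's weight memory scales with the bytes stored per parameter, so halving the precision roughly halves the footprint. A back-of-the-envelope sketch (the 12B parameter count is a made-up example):

```python
# Rough weight-memory footprint of a hypothetical 12B-parameter model
# at different precisions (weights only, ignoring activations and overhead).
num_params = 12e9

for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    gib = num_params * bytes_per_param / 1024**3
    print(f"{name:>9}: ~{gib:.1f} GiB")
# fp32 ~44.7 GiB, fp16/bf16 ~22.4 GiB, int8 ~11.2 GiB, 4-bit ~5.6 GiB
```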
<Tip>

Learn how to quantize models in the Quantization guide.

</Tip>
## PipelineQuantizationConfig

[[autodoc]] quantizers.PipelineQuantizationConfig
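Below is a minimal sketch of pipeline-level quantization, assuming the bitsandbytes 4-bit backend; the checkpoint and component names are illustrative:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Quantize only the listed pipeline components with the chosen backend.
pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer", "text_encoder_2"],
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # example checkpoint
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
)
```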
## BitsAndBytesConfig

[[autodoc]] quantizers.quantization_config.BitsAndBytesConfig
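A hedged example of quantizing a single model component with 4-bit NF4 weights and bf16 compute; the checkpoint is for illustration:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

# 4-bit NF4 weight quantization with bfloat16 compute dtype.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # example checkpoint
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```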
## GGUFQuantizationConfig

[[autodoc]] quantizers.quantization_config.GGUFQuantizationConfig
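GGUF checkpoints are loaded through `from_single_file`. A sketch, assuming an illustrative community GGUF file URL:

```python
import torch
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

# compute_dtype sets the dtype the GGUF weights are dequantized to
# during the forward pass.
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"  # example file
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```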
## QuantoConfig

[[autodoc]] quantizers.quantization_config.QuantoConfig
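A sketch of int8 weight quantization through the optimum-quanto backend, with an illustrative checkpoint:

```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

# Store weights as int8 via optimum-quanto.
quant_config = QuantoConfig(weights_dtype="int8")

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # example checkpoint
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```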
## TorchAoConfig

[[autodoc]] quantizers.quantization_config.TorchAoConfig
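One more sketch, this time with the torchao backend and its `"int8wo"` (int8 weight-only) quantization type string; the checkpoint is again illustrative:

```python
import torch
from diffusers import FluxTransformer2DModel, TorchAoConfig

# int8 weight-only quantization from torchao.
quant_config = TorchAoConfig("int8wo")

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # example checkpoint
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```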
## DiffusersQuantizer

[[autodoc]] quantizers.base.DiffusersQuantizer
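[`DiffusersQuantizer`] is the abstract base class that the backend-specific quantizers above build on; it is primarily relevant when adding support for a new quantization backend rather than for day-to-day use.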