# Quantization
Quantization techniques reduce memory and computational costs by representing weights and activations with lower-precision data types, such as 8-bit integers (int8). This makes it possible to load larger models that normally wouldn't fit into memory and to speed up inference. Diffusers supports 8-bit and 4-bit quantization with bitsandbytes, as well as the GGUF, Quanto, and torchao backends documented below.
Quantization techniques that aren't supported in Transformers can be added with the [`DiffusersQuantizer`] class; a subclass sketch appears at the end of this page.
Learn how to quantize models in the Quantization guide.
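As a minimal sketch of the workflow, a model component can be quantized on load by passing a [`BitsAndBytesConfig`] to `from_pretrained`. This assumes `bitsandbytes` is installed, and the Flux checkpoint below is used purely as an illustration:

```py
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

# Quantize linear layers to 8-bit on load; 4-bit is available via load_in_4bit=True.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # example checkpoint
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```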
## BitsAndBytesConfig

[[autodoc]] BitsAndBytesConfig

## GGUFQuantizationConfig

[[autodoc]] GGUFQuantizationConfig

## QuantoConfig

[[autodoc]] QuantoConfig

## TorchAoConfig

[[autodoc]] TorchAoConfig

## DiffusersQuantizer

[[autodoc]] quantizers.base.DiffusersQuantizer
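To add a backend Diffusers doesn't ship with, subclass `DiffusersQuantizer` and implement its hooks. The sketch below is illustrative, not an official backend: `MyQuantizer` is a hypothetical name, and the method set is modeled on the abstract interface of `quantizers.base.DiffusersQuantizer`; check the base class for the exact signatures.

```py
from diffusers.quantizers.base import DiffusersQuantizer


class MyQuantizer(DiffusersQuantizer):
    """Hypothetical quantizer sketching the DiffusersQuantizer interface."""

    # Set to True if the backend needs calibration data before inference.
    requires_calibration = False

    def validate_environment(self, *args, **kwargs):
        # Raise if the backend's library isn't installed or the device is unsupported.
        pass

    def _process_model_before_weight_loading(self, model, **kwargs):
        # Replace modules (e.g. nn.Linear) with quantized equivalents
        # so checkpoint weights load directly into them.
        return model

    def _process_model_after_weight_loading(self, model, **kwargs):
        # Run any finalization once all weights are loaded.
        return model

    @property
    def is_serializable(self):
        # Whether quantized weights can be saved with save_pretrained.
        return False

    @property
    def is_trainable(self):
        return False
```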