Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
# Compatibility Matrix
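The tables below show which features are mutually exclusive and how each feature is supported across several hardware platforms. Abbreviations used in the tables: CP = chunked prefill, APC = automatic prefix caching, SD = speculative decoding, enc-dec = encoder-decoder models, logP = logprobs, prmpt logP = prompt logprobs, async output = async output processing, mm = multimodal inputs.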
The symbols used have the following meanings:
- ✅ = Full compatibility
- 🟠 = Partial compatibility
- ❌ = No compatibility
- ❔ = Unknown or TBD

!!! note
    Check the ❌ or 🟠 entries that have links to see the tracking issue for each unsupported feature/hardware combination.
## Feature x Feature
<style>
td:not(:first-child) {
  text-align: center !important;
}
td {
  padding: 0.5rem !important;
  white-space: nowrap;
}
th {
  padding: 0.5rem !important;
  min-width: 0 !important;
}
th:not(:first-child) {
  writing-mode: vertical-lr;
  transform: rotate(180deg)
}
</style>

| Feature | [CP][chunked-prefill] | APC | LoRA | SD | CUDA graph | pooling | enc-dec | logP | prmpt logP | async output | multi-step | mm | best-of | beam-search | prompt-embeds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [CP][chunked-prefill] | ✅ | ||||||||||||||
| APC | ✅ | ✅ | |||||||||||||
| LoRA | ✅ | ✅ | ✅ | ||||||||||||
| SD | ✅ | ✅ | ❌ | ✅ | |||||||||||
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ||||||||||
| pooling | 🟠* | 🟠* | ✅ | ❌ | ✅ | ✅ | |||||||||
| enc-dec | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ||||||||
| logP | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | |||||||
| prmpt logP | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ||||||
| async output | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | |||||
| multi-step | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ||||
| mm | ✅ | ✅ | 🟠^ | ❔ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | |||
| best-of | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | ❌ | ✅ | ✅ | ||
| beam-search | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | ❌ | ❔ | ✅ | ✅ | |
| prompt-embeds | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❔ | ❔ | ❌ | ❔ | ❔ | ✅ |

\* Chunked prefill and prefix caching are only applicable to last-token pooling.

^ LoRA is only applicable to the language backbone of multimodal models.
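
Because compatibility between two features is symmetric, only the lower half of the Feature x Feature table is filled in: to check a pair, read the cell in the row of the feature that appears later and the column of the one that appears earlier. The Python sketch below illustrates this lookup with hypothetical `compatibility()` and `incompatible_with()` helpers (illustrative only, not part of vLLM) over the first seven rows of the table, CP through enc-dec, copied verbatim.

```python
# Hypothetical helpers (not a vLLM API): look up a feature pair in the
# lower-triangular Feature x Feature table. Only the first seven rows are
# copied here; "CP" is the chunked-prefill row/column.
FEATURES = ["CP", "APC", "LoRA", "SD", "CUDA graph", "pooling", "enc-dec"]

# ROWS[i] holds the cells of row i for columns 0..i; the last cell is the diagonal.
ROWS = [
    ["✅"],
    ["✅", "✅"],
    ["✅", "✅", "✅"],
    ["✅", "✅", "❌", "✅"],
    ["✅", "✅", "✅", "✅", "✅"],
    ["🟠*", "🟠*", "✅", "❌", "✅", "✅"],
    ["❌", "❌", "❌", "❌", "✅", "✅", "✅"],
]


def compatibility(a: str, b: str) -> str:
    """Return the table cell for features a and b, given in either order."""
    i, j = FEATURES.index(a), FEATURES.index(b)
    # Only cells with column <= row are filled in, so read the pair from the
    # row of whichever feature appears later in the table.
    return ROWS[max(i, j)][min(i, j)]


def incompatible_with(feature: str) -> list[str]:
    """List the other features marked ❌ against `feature`, in table order."""
    return [
        other
        for other in FEATURES
        if other != feature and compatibility(feature, other) == "❌"
    ]


print(compatibility("SD", "LoRA"))  # ❌
print(incompatible_with("SD"))      # ['LoRA', 'pooling', 'enc-dec']
```

Storing only the lower triangle mirrors the table layout and avoids recording each pair twice; a 🟠 cell still needs its footnote to be interpreted correctly.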
## Feature x Hardware
| Feature | Volta | Turing | Ampere | Ada | Hopper | CPU | AMD | TPU | Intel GPU |
|---|---|---|---|---|---|---|---|---|---|
| [CP][chunked-prefill] | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| APC | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| LoRA | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SD | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | 🟠 |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
| pooling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| enc-dec | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| mm | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | 🟠 |
| logP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| prmpt logP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| async output | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| multi-step | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ |
| best-of | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| beam-search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| prompt-embeds | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❌ | ✅ |
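
The Feature x Hardware table is read one row at a time: each cell gives the status of one feature on one platform. As an illustration, the hypothetical Python helpers below (not part of vLLM) store six rows copied from the table and list the features marked ✅ for a given platform; 🟠 and ❌ cells are deliberately excluded.

```python
# Hypothetical helpers (not a vLLM API) over six rows copied from the
# Feature x Hardware table; "CP" is the chunked-prefill row.
HARDWARE = ["Volta", "Turing", "Ampere", "Ada", "Hopper", "CPU", "AMD", "TPU", "Intel GPU"]

# Each value lists one cell per entry in HARDWARE, separated by spaces.
TABLE = {
    "CP": "❌ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅",
    "APC": "❌ ✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅",
    "SD": "✅ ✅ ✅ ✅ ✅ ✅ ✅ ❌ 🟠",
    "CUDA graph": "✅ ✅ ✅ ✅ ✅ ❌ ✅ ❌ ❌",
    "async output": "✅ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ✅",
    "multi-step": "✅ ✅ ✅ ✅ ✅ ❌ ✅ ❌ ✅",
}


def status(feature: str, hardware: str) -> str:
    """Return the cell for `feature` on `hardware`."""
    return TABLE[feature].split()[HARDWARE.index(hardware)]


def fully_supported(hardware: str) -> list[str]:
    """Features marked ✅ (full compatibility) on `hardware`, in table order."""
    return [feature for feature in TABLE if status(feature, hardware) == "✅"]


print(status("SD", "Intel GPU"))  # 🟠
print(fully_supported("CPU"))     # ['CP', 'APC', 'SD']
```

Treating 🟠 as unsupported is a conservative choice; a caller that can live with partial support would need to check the specific limitations behind that cell before relying on it.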