[CPU Backend] [Doc]: Update Installation Docs for CPUs (#29868)

Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
commit 1bb17ecb39 (parent 15b1511a15)
Author: ioana ghiban
Date: 2025-12-03 14:33:50 +01:00
Committed by: GitHub
5 changed files with 58 additions and 14 deletions

docs/getting_started/installation/cpu.apple.inc.md

@@ -4,9 +4,6 @@ vLLM has experimental support for macOS with Apple Silicon. For now, users must
Currently, the CPU implementation for macOS supports the FP32 and FP16 data types.
!!! warning
There are no pre-built wheels or images for this device, so you must build vLLM from source.
# --8<-- [end:installation]
# --8<-- [start:requirements]
@@ -20,6 +17,8 @@ Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Currently, there are no pre-built Apple silicon CPU wheels.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
@@ -78,6 +77,8 @@ uv pip install -e .
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
Currently, there are no pre-built Apple silicon CPU images.
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]

docs/getting_started/installation/cpu.arm.inc.md

@@ -1,11 +1,6 @@
# --8<-- [start:installation]
vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform.
ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
!!! warning
There are no pre-built wheels or images for this device, so you must build vLLM from source.
vLLM offers basic model inferencing and serving on the Arm CPU platform with NEON support, covering the FP32, FP16, and BF16 data types.
# --8<-- [end:installation]
# --8<-- [start:requirements]
@@ -20,6 +15,23 @@ ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Pre-built vLLM wheels for Arm are available starting with version 0.11.2. These wheels contain pre-compiled C++ binaries.
Please replace `<version>` in the commands below with a specific version string (e.g., `0.11.2`).
```bash
uv pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
```
??? console "pip"
```bash
pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
```
The `uv` approach works for vLLM `v0.6.6` and later. A unique feature of `uv` is that packages in `--extra-index-url` have [higher priority than the default index](https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes). If the latest public release is `v0.6.6.post1`, `uv`'s behavior allows installing a commit before `v0.6.6.post1` by specifying the `--extra-index-url`. In contrast, `pip` combines packages from `--extra-index-url` and the default index, choosing only the latest version, which makes it difficult to install a development version prior to the released version.
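For instance, instantiating the command above with the sample version string mentioned earlier:

```bash
# Example using the sample version string 0.11.2 from above; with uv, the
# extra index takes priority, so the +cpu wheel is selected even if a newer
# plain release exists on PyPI.
uv pip install --pre vllm==0.11.2+cpu --extra-index-url https://wheels.vllm.ai/0.11.2%2Bcpu/
```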
!!! note
Nightly wheels, useful e.g. for bisecting a behavior change or a performance regression, are currently unavailable for this architecture.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
@@ -69,6 +81,8 @@ Testing has been conducted on AWS Graviton3 instances for compatibility.
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
Currently, there are no pre-built Arm CPU images.
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]
```bash

docs/getting_started/installation/cpu.md

@@ -46,11 +46,25 @@ vLLM is a Python library that supports the following CPU variants. Select your C
### Pre-built wheels
Please refer to the instructions for [pre-built wheels on GPU](./gpu.md#pre-built-wheels).
When specifying the index URL, please make sure to use the `cpu` variant subdirectory.
For example, the nightly build index is: `https://wheels.vllm.ai/nightly/cpu/`.
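As a minimal sketch of installing from that nightly CPU index (the canonical flags are described in the linked GPU instructions; the `-U`/`--pre` combination here is an assumption):

```bash
# A sketch, assuming the nightly CPU index is used the same way as the GPU
# nightly index; see the linked GPU instructions for the canonical flags.
uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly/cpu/
```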
=== "Intel/AMD x86"
--8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-wheels"
=== "ARM AArch64"
--8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-wheels"
=== "Apple silicon"
--8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-wheels"
=== "IBM Z (S390X)"
--8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-wheels"
### Build wheel from source
#### Set up using Python-only build (without compilation) {#python-only-build}
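As a rough sketch of the Python-only build, assuming a local vLLM source checkout (the environment variables match the command fragment visible in the hunk below):

```bash
# A sketch of a Python-only (no-compilation) CPU build: clone the source,
# then reuse a precompiled CPU wheel's binaries via the env vars below.
git clone https://github.com/vllm-project/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 VLLM_PRECOMPILED_WHEEL_VARIANT=cpu VLLM_TARGET_DEVICE=cpu uv pip install -e .
```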
@@ -87,6 +101,18 @@ VLLM_USE_PRECOMPILED=1 VLLM_PRECOMPILED_WHEEL_VARIANT=cpu VLLM_TARGET_DEVICE=cpu
--8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-images"
=== "ARM AArch64"
--8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-images"
=== "Apple silicon"
--8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-images"
=== "IBM Z (S390X)"
--8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-images"
### Build image from source
=== "Intel/AMD x86"

docs/getting_started/installation/cpu.s390x.inc.md

@@ -4,9 +4,6 @@ vLLM has experimental support for s390x architecture on IBM Z platform. For now,
Currently, the CPU implementation for s390x architecture supports the FP32 data type only.
!!! warning
There are no pre-built wheels or images for this device, so you must build vLLM from source.
# --8<-- [end:installation]
# --8<-- [start:requirements]
@@ -21,6 +18,8 @@ Currently, the CPU implementation for s390x architecture supports FP32 datatype
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Currently, there are no pre-built IBM Z CPU wheels.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
@@ -69,6 +68,8 @@ Execute the following commands to build and install vLLM from source.
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
Currently, there are no pre-built IBM Z CPU images.
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]

docs/getting_started/installation/cpu.x86.inc.md

@@ -17,6 +17,8 @@ vLLM supports basic model inferencing and serving on x86 CPU platform, with data
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Currently, there are no pre-built x86 CPU wheels.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]