[CPU Backend] [Doc]: Update Installation Docs for CPUs (#29868)

Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
commit 1bb17ecb39 (parent 15b1511a15)
Author: ioana ghiban
Date: 2025-12-03 14:33:50 +01:00
Committed by: GitHub
5 changed files with 58 additions and 14 deletions

docs/getting_started/installation/cpu.apple.inc.md

@@ -4,9 +4,6 @@ vLLM has experimental support for macOS with Apple Silicon. For now, users must
Currently, the CPU implementation for macOS supports the FP32 and FP16 data types.
!!! warning
There are no pre-built wheels or images for this device, so you must build vLLM from source.
# --8<-- [end:installation]
# --8<-- [start:requirements]
@@ -20,6 +17,8 @@ Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Currently, there are no pre-built Apple silicon CPU wheels.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
@@ -78,6 +77,8 @@ uv pip install -e .
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
Currently, there are no pre-built Apple silicon CPU images.
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]

docs/getting_started/installation/cpu.arm.inc.md

@@ -1,11 +1,6 @@
# --8<-- [start:installation]
vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform.
ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
!!! warning
There are no pre-built wheels or images for this device, so you must build vLLM from source.
vLLM offers basic model inferencing and serving on the Arm CPU platform with NEON support, covering the FP32, FP16, and BF16 data types.
# --8<-- [end:installation]
# --8<-- [start:requirements]
@@ -20,6 +15,23 @@ ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Pre-built vLLM wheels for Arm are available starting with version 0.11.2. These wheels contain pre-compiled C++ binaries.
Please replace `<version>` in the commands below with a specific version string (e.g., `0.11.2`).
```bash
uv pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
```
??? console "pip"
```bash
pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
```
The `uv` approach works for vLLM `v0.6.6` and later. A unique feature of `uv` is that packages in `--extra-index-url` have [higher priority than the default index](https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes). If the latest public release is `v0.6.6.post1`, `uv`'s behavior allows installing a commit before `v0.6.6.post1` by specifying the `--extra-index-url`. In contrast, `pip` combines packages from `--extra-index-url` and the default index, choosing only the latest version, which makes it difficult to install a development version prior to the released version.
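For instance, instantiating the command above with the sample version string mentioned earlier:

```bash
# Example using the sample version string 0.11.2 from above; with uv, the
# extra index takes priority, so the +cpu wheel is selected even if a newer
# plain release exists on PyPI.
uv pip install --pre vllm==0.11.2+cpu --extra-index-url https://wheels.vllm.ai/0.11.2%2Bcpu/
```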
!!! note
Nightly wheels, useful e.g. for bisecting a behavior change or a performance regression, are currently unavailable for this architecture.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
@@ -69,6 +81,8 @@ Testing has been conducted on AWS Graviton3 instances for compatibility.
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
Currently, there are no pre-built Arm CPU images.
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]
```bash

docs/getting_started/installation/cpu.md

@@ -46,11 +46,25 @@ vLLM is a Python library that supports the following CPU variants. Select your C
### Pre-built wheels
Please refer to the instructions for [pre-built wheels on GPU](./gpu.md#pre-built-wheels).
When specifying the index URL, please make sure to use the `cpu` variant subdirectory.
For example, the nightly build index is: `https://wheels.vllm.ai/nightly/cpu/`.
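As a minimal sketch of installing from that nightly CPU index (the canonical flags are described in the linked GPU instructions; the `-U`/`--pre` combination here is an assumption):

```bash
# A sketch, assuming the nightly CPU index is used the same way as the GPU
# nightly index; see the linked GPU instructions for the canonical flags.
uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly/cpu/
```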
=== "Intel/AMD x86"
--8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-wheels"
=== "ARM AArch64"
--8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-wheels"
=== "Apple silicon"
--8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-wheels"
=== "IBM Z (S390X)"
--8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-wheels"
### Build wheel from source
#### Set up using Python-only build (without compilation) {#python-only-build}
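As a rough sketch of the Python-only build, assuming a local vLLM source checkout (the environment variables match the command fragment visible in the hunk below):

```bash
# A sketch of a Python-only (no-compilation) CPU build: clone the source,
# then reuse a precompiled CPU wheel's binaries via the env vars below.
git clone https://github.com/vllm-project/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 VLLM_PRECOMPILED_WHEEL_VARIANT=cpu VLLM_TARGET_DEVICE=cpu uv pip install -e .
```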
@@ -87,6 +101,18 @@ VLLM_USE_PRECOMPILED=1 VLLM_PRECOMPILED_WHEEL_VARIANT=cpu VLLM_TARGET_DEVICE=cpu
--8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-images"
=== "ARM AArch64"
--8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-images"
=== "Apple silicon"
--8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-images"
=== "IBM Z (S390X)"
--8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-images"
### Build image from source
=== "Intel/AMD x86"

docs/getting_started/installation/cpu.s390x.inc.md

@@ -4,9 +4,6 @@ vLLM has experimental support for s390x architecture on IBM Z platform. For now,
Currently, the CPU implementation for s390x architecture supports the FP32 data type only.
!!! warning
There are no pre-built wheels or images for this device, so you must build vLLM from source.
# --8<-- [end:installation]
# --8<-- [start:requirements]
@@ -21,6 +18,8 @@ Currently, the CPU implementation for s390x architecture supports FP32 datatype
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Currently, there are no pre-built IBM Z CPU wheels.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
@@ -69,6 +68,8 @@ Execute the following commands to build and install vLLM from source.
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
Currently, there are no pre-built IBM Z CPU images.
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]

docs/getting_started/installation/cpu.x86.inc.md

@@ -17,6 +17,8 @@ vLLM supports basic model inferencing and serving on x86 CPU platform, with data
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Currently, there are no pre-built x86 CPU wheels.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]