Mirror of https://github.com/vllm-project/vllm.git (synced 2025-12-06 06:53:12 +08:00)
[CPU Backend] [Doc]: Update Installation Docs for CPUs (#29868)
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
@@ -4,9 +4,6 @@ vLLM has experimental support for macOS with Apple Silicon. For now, users must
Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
!!! warning
    There are no pre-built wheels or images for this device, so you must build vLLM from source.
# --8<-- [end:installation]
# --8<-- [start:requirements]
@@ -20,6 +17,8 @@ Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Currently, there are no pre-built Apple silicon CPU wheels.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
@@ -78,6 +77,8 @@ uv pip install -e .
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
Currently, there are no pre-built Apple silicon CPU images.
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]
@@ -1,11 +1,6 @@
# --8<-- [start:installation]
vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform.
ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
!!! warning
    There are no pre-built wheels or images for this device, so you must build vLLM from source.

vLLM offers basic model inferencing and serving on the Arm CPU platform, with NEON support and data types FP32, FP16 and BF16.
# --8<-- [end:installation]
# --8<-- [start:requirements]
@@ -20,6 +15,23 @@ ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Pre-built vLLM wheels for Arm are available since version 0.11.2. These wheels contain pre-compiled C++ binaries.
Please replace `<version>` in the commands below with a specific version string (e.g., `0.11.2`).
```bash
uv pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
```

??? console "pip"

    ```bash
    pip install --pre vllm==<version>+cpu --extra-index-url https://wheels.vllm.ai/<version>%2Bcpu/
    ```
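
For illustration, substituting the example version `0.11.2` from above into the `uv` command gives the following (a concrete sketch only; use whichever released CPU wheel version you actually need):

```bash
# Example substitution of <version> with the release 0.11.2 cited above;
# adjust the version string to the CPU wheel release you want to install.
uv pip install --pre vllm==0.11.2+cpu --extra-index-url https://wheels.vllm.ai/0.11.2%2Bcpu/
```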
The `uv` approach works for vLLM `v0.6.6` and later. A unique feature of `uv` is that packages in `--extra-index-url` have [higher priority than the default index](https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes). If the latest public release is `v0.6.6.post1`, `uv`'s behavior allows installing a commit before `v0.6.6.post1` by specifying the `--extra-index-url`. In contrast, `pip` combines packages from `--extra-index-url` and the default index, choosing only the latest version, which makes it difficult to install a development version prior to the released version.
!!! note
    Nightly wheels (used, e.g., to bisect a behavior change or a performance regression) are currently not available for this architecture.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
@@ -69,6 +81,8 @@ Testing has been conducted on AWS Graviton3 instances for compatibility.
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
Currently, there are no pre-built Arm CPU images.
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]
```bash
@@ -46,11 +46,25 @@ vLLM is a Python library that supports the following CPU variants. Select your C
### Pre-built wheels
Please refer to the instructions for [pre-built wheels on GPU](./gpu.md#pre-built-wheels).
When specifying the index URL, please make sure to use the `cpu` variant subdirectory.
For example, the nightly build index is: `https://wheels.vllm.ai/nightly/cpu/`.
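
As a minimal sketch, assuming the CPU variant follows the same install pattern as the GPU pre-built wheel instructions linked above, installing a nightly CPU wheel from that index could look like:

```bash
# Hypothetical example: pull a nightly (pre-release) CPU wheel from the cpu variant index.
# The exact flags mirror the GPU pre-built wheel instructions referenced above.
uv pip install -U --pre vllm --extra-index-url https://wheels.vllm.ai/nightly/cpu/
```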
=== "Intel/AMD x86"

    --8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-wheels"

=== "ARM AArch64"

    --8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-wheels"

=== "Apple silicon"

    --8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-wheels"

=== "IBM Z (S390X)"

    --8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-wheels"

### Build wheel from source
#### Set up using Python-only build (without compilation) {#python-only-build}
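
The hunk below carries the environment variables used for this mode. A minimal sketch of a Python-only CPU setup, assuming those variables are combined with an editable install as in the per-architecture sections above, might be:

```bash
# Sketch only: reuse a precompiled CPU wheel for the compiled parts and install
# the Python sources in editable mode (variables taken from the hunk context below).
VLLM_USE_PRECOMPILED=1 VLLM_PRECOMPILED_WHEEL_VARIANT=cpu VLLM_TARGET_DEVICE=cpu uv pip install -e .
```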
@@ -87,6 +101,18 @@ VLLM_USE_PRECOMPILED=1 VLLM_PRECOMPILED_WHEEL_VARIANT=cpu VLLM_TARGET_DEVICE=cpu
    --8<-- "docs/getting_started/installation/cpu.x86.inc.md:pre-built-images"

=== "ARM AArch64"

    --8<-- "docs/getting_started/installation/cpu.arm.inc.md:pre-built-images"

=== "Apple silicon"

    --8<-- "docs/getting_started/installation/cpu.apple.inc.md:pre-built-images"

=== "IBM Z (S390X)"

    --8<-- "docs/getting_started/installation/cpu.s390x.inc.md:pre-built-images"

### Build image from source
=== "Intel/AMD x86"
@@ -4,9 +4,6 @@ vLLM has experimental support for s390x architecture on IBM Z platform. For now,
Currently, the CPU implementation for s390x architecture supports FP32 datatype only.
!!! warning
    There are no pre-built wheels or images for this device, so you must build vLLM from source.
# --8<-- [end:installation]
# --8<-- [start:requirements]
@@ -21,6 +18,8 @@ Currently, the CPU implementation for s390x architecture supports FP32 datatype
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Currently, there are no pre-built IBM Z CPU wheels.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
@@ -69,6 +68,8 @@ Execute the following commands to build and install vLLM from source.
# --8<-- [end:build-wheel-from-source]
||||
||||
||||
# --8<-- [start:build-image-from-source]
|
||||
@@ -17,6 +17,8 @@ vLLM supports basic model inferencing and serving on x86 CPU platform, with data
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
Currently, there are no pre-built x86 CPU wheels.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]