mirror of https://github.com/vllm-project/vllm.git synced 2025-12-06 06:53:12 +08:00

Files

Varun Sundar Rabindranath 19bee6d12d [Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel (#29470)

Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

2025-12-03 18:04:59 +00:00

auto_tune

[Doc] fix heading levels (#29783)

2025-12-01 14:49:22 +00:00

cutlass_benchmarks

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

disagg_benchmarks

[BugFix][PD]: make example proxy usable with P2pNcclConnector (#26628)

2025-11-20 17:38:31 +00:00

fused_kernels

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

kernels

[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel (#29470)

2025-12-03 18:04:59 +00:00

multi_turn

[Benchmark] multi_turn: Report warmup-inclusive runtime (#28937)

2025-11-18 16:38:22 +00:00

overheads

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

structured_schemas

benchmarks: simplify test jsonschema (#14567)

2025-03-11 13:39:30 +00:00

backend_request_func.py

[Misc] Refactor tokenizer interface (#29693)

2025-11-29 04:02:21 -08:00

benchmark_batch_invariance.py

Adding a benchmark for batch invariance (#28161)

2025-11-16 13:22:17 +08:00

benchmark_block_pool.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_hash.py

[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163)

2025-12-03 16:06:57 +00:00

benchmark_latency.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_long_document_qa_throughput.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_ngram_proposer.py

Remove default values from InitVars so that they're not stored (#29859)

2025-12-02 12:16:37 +00:00

benchmark_prefix_block_hash.py

[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163)

2025-12-03 16:06:57 +00:00

benchmark_prefix_caching.py

[Chore] Move tokenizer initialization methods (#29793)

2025-12-02 13:33:37 +08:00

benchmark_prioritization.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_serving_structured_output.py

[Chore] Move tokenizer initialization methods (#29793)

2025-12-02 13:33:37 +08:00

benchmark_serving.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_throughput.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_utils.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

README.md

[Docs] move benchmarks README to contributing guides (#24820)

2025-09-16 05:52:57 -07:00

run_structured_output_benchmark.sh

[Benchmarks] Refactor run_structured_output_benchmarks.sh (#17722)

2025-05-13 01:47:29 -07:00

sonnet.txt

feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277)

2024-03-27 13:39:26 -07:00

README.md

Benchmarks

This directory used to contain vLLM's benchmark scripts and utilities for performance testing and evaluation.

Contents

- Serving benchmarks: Scripts for testing online inference performance (latency, throughput)
- Throughput benchmarks: Scripts for testing offline batch inference performance
- Specialized benchmarks: Tools for testing specific features like structured output, prefix caching, long document QA, request prioritization, and multi-modal inference
- Dataset utilities: Framework for loading and sampling from various benchmark datasets (ShareGPT, HuggingFace datasets, synthetic data, etc.)

Usage

For detailed usage instructions, examples, and dataset information, see the Benchmark CLI documentation.
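The file listing above records that the old serving, throughput, and latency scripts were deprecated. As a rough illustration of what that means in practice, the sketch below maps each deprecated script name to a `vllm bench` subcommand and builds the matching command line. The mapping table, the `bench_command` helper, and the model name `my-org/my-model` are assumptions made for this example and are not taken from the README; check the Benchmark CLI documentation for the options your vLLM version actually supports.

```python
import shlex

# Assumed mapping from the deprecated scripts in this directory to the
# `vllm bench` subcommands that are understood to replace them.
LEGACY_TO_CLI = {
    "benchmark_serving.py": "serve",
    "benchmark_throughput.py": "throughput",
    "benchmark_latency.py": "latency",
}


def bench_command(legacy_script: str, model: str) -> str:
    """Build the `vllm bench` command line for a deprecated script name.

    This is a hypothetical helper for illustration only; it formats a
    command string and does not run anything.
    """
    subcommand = LEGACY_TO_CLI[legacy_script]
    return shlex.join(["vllm", "bench", subcommand, "--model", model])


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder model name.
    print(bench_command("benchmark_serving.py", "my-org/my-model"))
```

The helper only formats a string, so it can be used to update old invocations in scripts or notes without needing a vLLM installation.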

For full CLI reference see:
