vllm-project/vllm

Mirror of https://github.com/vllm-project/vllm.git, synced 2025-12-06 06:53:12 +08:00

Author	SHA1	Message	Date
Russell Bryant	3633035a3f	[Misc] Rename CohereForAI references to CohereLabs (#30147 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-12-05 19:41:40 +00:00
Yanan Cao	62b3333448	[Frontend] Remove deprecated -O.xx flag (#29991 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-12-05 00:47:22 -08:00
Tiger Xu / Zhonghu Xu	60a66ea2dc	[DOC]: Add kthena to integrations (#29931 ) Signed-off-by: Zhonghu Xu <xuzhonghu@huawei.com>	2025-12-05 08:11:03 +00:00
Hubert de La Jonquiere	befb59e5b1	[Model] Add Holo2 reasoning parser (#30048 ) Signed-off-by: hdlj-h <hubert@hcompany.ai>	2025-12-05 10:38:45 +08:00
TimWang	690cc3ef20	docs: update metrics design doc to use new vllm:kv_cache_usage_perc (#30041 ) Signed-off-by: Tim <tim.wang03@sap.com>	2025-12-04 23:37:14 +00:00
Tao Yun	6dcb07f676	support qwen3-vl handle requests with embeddings (#30037 ) Signed-off-by: taoyun <1069423820@qq.com> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-12-04 17:34:06 +00:00
Shengqi Chen	990f806473	[Doc] clarify nightly builds in developer docs (#30019 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-05 00:28:37 +08:00
Harry Mellor	9998ea5b57	Delete HF version of Phi 4 MM (#30049 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 13:44:50 +00:00
wang.yuqi	74c4d80c6c	[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145 ) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-04 13:44:15 +00:00
dtc	842aba501d	[P/D] Introduce Mooncake Transfer Engine as kv_connector (#24718 ) Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: dtc <dtcccc@linux.alibaba.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2025-12-04 09:51:36 +00:00
CYJiang	fd68e909db	[docs] Remove _total from counter metrics names (#30028 ) In Prometheus, Counters always expose their actual numeric value with a metric name that ends in _total. We should document the base name, as this is what appears in the get_metrics() API. Signed-off-by: CYJiang <86391540+googs1025@users.noreply.github.com>	2025-12-04 07:46:15 +00:00
Cyrus Leung	9ae2f60374	[Misc] Various cleanups for MM input processing (#29970 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-04 06:22:20 +00:00
bnellnm	2902c34826	[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton (#29929 ) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-12-03 20:49:00 +00:00
Lumis Chen	9bcf92295a	[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163 ) Signed-off-by: LuminolT <lumischen01@gmail.com> Signed-off-by: Lumis Chen <lumischen01@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-12-03 16:06:57 +00:00
ioana ghiban	1bb17ecb39	[CPU Backend] [Doc]: Update Installation Docs for CPUs (#29868 ) Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>	2025-12-03 13:33:50 +00:00
ioana ghiban	15b1511a15	[GPU Backend] [Doc]: Remove duplicate statements on missing GPU wheels. (#29962 ) Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>	2025-12-03 12:56:47 +00:00
Amr Mahdi	f5d3d93c40	[docker] Build CUDA kernels in separate Docker stage for faster rebuilds (#29452 ) Signed-off-by: Amr Mahdi <amrmahdi@meta.com>	2025-12-03 11:41:53 +00:00
Fadi Arafeh	78f4bb0ba8	[DOC] Add Arm to list of compute resources providers (#29894 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2025-12-03 11:36:58 +00:00
Russell Bryant	b08025a83b	[Docs] Discuss api key limitations in security guide (#29922 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-12-02 20:57:28 -08:00
wang.yuqi	2eb4fe9129	[examples] Resettle pooling examples. (#29365 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 15:54:28 +00:00
Julien Denize	d8c6210eea	Add Mistral Large 3 and Ministral 3 (#29757 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai> Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: Mickael Seznec <mickael@mistral.ai> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Mickael Seznec <mickael@mistral.ai>	2025-12-02 10:29:00 +00:00
Louie Tsai	8bbcf8b6e7	[vLLM Benchmark Suite] Add default parameters section and update CPU benchmark cases (#29381 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Li, Jiang <bigpyj64@gmail.com>	2025-12-02 09:00:23 +00:00
Shengqi Chen	4b612664fd	[CI] Renovation of nightly wheel build & generation (take 2) (#29838 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-01 22:17:10 -08:00
Kevin H. Luu	1336a1ea24	Revert #29787 and #29690 (#29815 )	2025-12-01 13:42:03 -08:00
Finbarr Timbers	38caf7fa1a	Update FAQ on interleaving sliding windows support (#29796 ) Signed-off-by: Finbarr Timbers <finbarrtimbers@gmail.com>	2025-12-01 19:15:19 +00:00
shivampr	cabc77cc86	[Core][Observability] Add KV cache residency metrics (#27793 ) Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: vllm:kv_block_lifetime_seconds — total lifetime from allocation to free vllm:kv_block_idle_before_evict_seconds — idle duration before eviction vllm:kv_block_reuse_gap_seconds — time between consecutive reuses of the same block These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: --kv-cache-metrics – enable KV cache residency metrics --kv-cache-metrics-sample – control sampling ratio (default: 0.01) Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-01 18:27:53 +00:00
sangbumlikeagod	092bb73b8a	[Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation (#24209 ) Signed-off-by: sangbumlikeagod <oironese@naver.com> Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>	2025-12-01 18:19:17 +01:00
Shengqi Chen	36db0a35e4	[CI] Renovation of nightly wheel build & generation (#29690 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-01 21:25:39 +08:00
wang.yuqi	62de4f4257	[Frontend] Resettle pooling entrypoints (#29634 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-01 15:30:43 +08:00
Yifei Zhang	1ab8fc8197	Make PyTorch profiler gzip and CUDA time dump configurable (#29568 ) Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>	2025-12-01 04:30:46 +00:00
Cyrus Leung	2afcec4dec	[Misc] Update `TokenizerLike` interface and move `get_cached_tokenizer` (#29730 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-30 14:59:47 +08:00
Jinzhen Lin	1656ad3704	[Kernel][Quantization] add w4a8 support for marlin kernel (#24722 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com>	2025-11-29 07:19:33 -08:00
dublc	f4341f45d3	[Doc]: fix code block rendering (#29728 ) Signed-off-by: dublc <jdublc0x@gmail.com>	2025-11-29 13:46:48 +00:00
Cyrus Leung	34a984274e	[Misc] Refactor tokenizer interface (#29693 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 04:02:21 -08:00
Yanan Cao	3461e7efd8	[Frontend] Remap -O to -cc commandline flag (#29557 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-11-28 21:51:12 +00:00
Harry Mellor	4332955602	[Docs] Add CLI reference doc for `vllm bench sweep plot_pareto` (#29689 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-28 08:10:08 -09:00
Wilson Wu	5c2b5cb422	[Docs] Add SPLADE and Ultravox models to supported models documentation (#29659 ) Signed-off-by: Wilson Wu <iwilsonwu@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-11-28 01:29:28 -09:00
Cyrus Leung	ccbdf51bd5	[Doc] Reorganize benchmark docs (#29658 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 17:19:25 +08:00
rongfu.leng	480598958e	[Feature][Bench] Add pareto visualization (#29477 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-11-27 23:53:20 -08:00
Wilson Wu	18523b87f6	[Docs] Update supported models for Olmo 3 in tool calling documentation (#29411 ) Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>	2025-11-28 02:53:55 +00:00
Morrison Turnansky	0838b52e2e	[Frontend][torch.compile] CompilationConfig Overhaul (#20283 ): Set up -O infrastructure (#26847 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: adabeyta <aabeyta@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: adabeyta <aabeyta@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-27 01:55:58 -08:00
TJian	da8e1a1bf9	[DOC] Add vLLM Bangkok Meetup info (#29561 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-11-27 04:42:50 +00:00
Louie Tsai	9bb33c8919	add xpu supported model and model id for cpu (#29380 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com>	2025-11-27 11:30:50 +08:00
Lucas Wilkinson	56539cddac	[Core] Refactor padding logic and pad for CUDA graphs before attention metadata building (#28579 )	2025-11-26 14:07:13 -05:00
Matthew Bonanni	430dd4d9eb	[Attention] Remove imports from `vllm/attention/__init__.py` (#29342 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-26 10:53:15 -07:00
yxt	3650a74ed8	Optimize the wording of the document and unify the terminology and th… (#29491 )	2025-11-26 05:16:12 -08:00
Michael Goin	e502098643	[Kernel] Add NVFP4 MoE CUTLASS support for SM120 (#29242 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-11-25 06:59:07 -08:00
Ben Browning	e1dd706cd1	[Frontend] Respect Chat Completion parallel_tool_calls param (#26233 ) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-11-25 09:56:15 +00:00
Harry Mellor	316c8492bf	Scheduled removal of `guided_*` config fields (#29326 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-25 05:24:05 +00:00
Isotr0py	92effb07a4	[Model] Add HunyuanOCR support (#29327 ) Signed-off-by: manayang <jackmanayang@gmail.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: sergeywang <sergeywang@tencent.com> Co-authored-by: manayang <jackmanayang@gmail.com> Co-authored-by: manayang <manayang@tencent.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-25 03:28:51 +00:00

1729 Commits