vllm-project/vllm

mirror of https://github.com/vllm-project/vllm.git synced 2025-12-06 06:53:12 +08:00

Author	SHA1	Message	Date
Wentao Ye	7b5575fa7d	[Bug] Fix vLLM config is not set error (#29999 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-05 16:42:12 -05:00
Bangsheng Tang	77e4472809	let draft model follow target model's config_format (#30152 )	2025-12-05 13:33:42 -08:00
Divakar Verma	962d703818	[Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-12-05 19:57:26 +00:00
Nicolò Lucchesi	e23ca3a0e8	[CI] Re-use whisper_client for all tests (#30148 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-05 19:47:37 +00:00
Russell Bryant	3633035a3f	[Misc] Rename CohereForAI references to CohereLabs (#30147 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-12-05 19:41:40 +00:00
Nicolò Lucchesi	bff78310d9	[Enc-Dec] Fix OOT tokenizer issue (#30144 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-05 19:23:33 +00:00
Tova Movshovitz	adb315060c	[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache (#27170 ) Signed-off-by: tovam <tovam@pliops.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-05 18:33:26 +00:00
Ilya Markov	4e26d3b09e	[Compile] Conditional compilation. Introduce compile_ranges (#24252 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Luka Govedič <luka.govedic@gmail.com> Signed-off-by: ProExpertProg <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Luka Govedič <luka.govedic@gmail.com>	2025-12-05 18:17:32 +00:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Mark McLoughlin	dff0a2b394	[NIXL] Add remote_request_id to kv_transfer_params (#29665 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 09:43:48 -08:00
Nick Hill	dc264bcea1	[BugFix] Eagerly abort cancelled final-step requests (#29987 ) Currently, when requests are cancelled while executing their final step, "completion" is handled based on normal stop processing (e.g. length or stop token), so the abort has no effect. This is typically not a problem, but when a kv connector is involved it thinks the request completed successfully rather than being aborted. This is problematic for disaggregated prefill which will free kv cache blocks if the request was aborted but not if it completed successfully—since the cancelled request will never be sent to the decode side, kv cache blocks remain pinned until the fall-back timeout expires. The problem is exacerbated when many requests are cancelled and/or there are large prefills whose forward pass takes a long time (since the window is bigger). This PR fixes the problem by processing pending aborts immediately prior to processing model output each step; we process only aborts, not new requests, since it's preferable for latency to process model outputs before new incoming requests. Fixes #26400. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-05 17:28:32 +00:00
Nicolò Lucchesi	78c44fd722	[NIXL] Small cleanup of unused variables (#29618 ) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-05 18:17:36 +01:00
Angela Yi	e7296b08da	[bugfix] Pass globals to aot_compiled function (#29428 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2025-12-05 16:54:26 +00:00
Andrew Xia	da7bc54ea8	[responsesAPI][5] ResponsesParser with tools for full MCP python loop (#29798 ) Signed-off-by: Andrew Xia <axia@fb.com> Signed-off-by: Andrew Xia <axia@meta.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-05 11:11:50 -05:00
Mark McLoughlin	949a6a19d2	[NIXL] Add compatibility checking to NIXL KV connector handshake (#29503 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-12-05 15:52:45 +01:00
Alec S	2c174420f5	Reduce validation to a warning (#28749 ) Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-05 14:02:49 +00:00
Yi Liu	0d8a7d8a26	[Compressed Tensors] Add XPU `wNa16` support (#29484 ) Signed-off-by: yiliu30 <yi4.liu@intel.com>	2025-12-05 22:02:09 +08:00
Elham	9843e332da	[CPU][Perf] Add fast vectorized exp impl from Arm Optimized Routines (#30068 ) Signed-off-by: Ubuntu <ubuntu@ip-10-252-30-150.eu-west-1.compute.internal> Signed-off-by: Elham Harirpoush <elham.harirpoush@arm.com> Co-authored-by: Ubuntu <ubuntu@ip-10-252-30-150.eu-west-1.compute.internal>	2025-12-05 13:09:20 +00:00
Harry Mellor	b7d85cf25c	[CI] Have pre-commit comment on a PR if pre-commit was not used (#30077 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-05 13:03:45 +00:00
Max Hu	c2894d3883	[Feature] Add Layer-wise NVTX Support (#29990 ) Signed-off-by: Max Hu <hyoung2991@gmail.com> Signed-off-by: Max Hu <maxhu@nvidia.com> Co-authored-by: Max Hu <maxhu@nvidia.com>	2025-12-05 11:20:07 +00:00
Zhiwei	3628bcaaf2	[ROCm][MXFP4] Infer w4a4 quant method in rocm aiter fused moe (#29775 ) Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>	2025-12-05 11:01:16 +00:00
strinczer	b73b158ab0	[Bugfix] Fix parse_output_message crash on commentary with no recipient (#29972 ) Signed-off-by: Shai Trinczer <strinczer@icloud.com> Signed-off-by: strinczer <strinczer@icloud.com>	2025-12-05 10:51:12 +00:00
Ning Xie	7ae13c66ba	[typing] fix type (#29964 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-12-05 10:46:08 +00:00
Ming Yang	f16356fe36	[bench] Support common prefix len config (for decode-only bench) (#29934 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-05 10:26:52 +00:00
Alec S	65ee97288a	[BugFix] Adding env variable to disable async grammar compilation (#29996 ) Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-12-05 00:49:37 -08:00
Yanan Cao	62b3333448	[Frontend] Remove deprecated -O.xx flag (#29991 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-12-05 00:47:22 -08:00
rasmith	feecba09af	[CI/Build][AMD] Use float16 in test_reset_prefix_cache_e2e to avoid accuracy issues (#29997 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2025-12-05 08:42:25 +00:00
amitz-nv	6038b1b04b	[Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH (#29978 ) Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>	2025-12-05 00:34:33 -08:00
Tiger Xu / Zhonghu Xu	60a66ea2dc	[DOC]: Add kthena to integrations (#29931 ) Signed-off-by: Zhonghu Xu <xuzhonghu@huawei.com>	2025-12-05 08:11:03 +00:00
Micah Williamson	06579f9a82	[AMD][CI] Add ray[default] Dependency On ROCm To Pass v1/metrics/test_engine_logger_apis.py (#30110 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-05 06:48:23 +00:00
Chukwuma Nwaugha	6e865b6a83	Refactor example prompts fixture (#29854 ) Signed-off-by: nwaughac@gmail.com	2025-12-05 06:44:32 +00:00
Jingchun Gao	d698bb382d	[Bugfix] Correct num_q_heads on DCP for Flashinfer backends (#29487 ) Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>	2025-12-05 05:54:31 +00:00
Charlie Fu	2c22c4ca2d	[ROCm][CI] Increase the memory threshold for test_deep_sleep_fp8_kvcache (#30104 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-12-05 04:51:44 +00:00
Laith Sakka	5867819eaf	Do not guard during noop elimination pass (#30095 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-05 04:10:12 +00:00
Charlie Fu	7c9b2c8f81	[ROCm][CI] Add jiwer dependency for testing (#30081 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-12-05 03:34:51 +00:00
Qiu	0098a6e3da	[PCP&DCP] move CUDAGraph check for PCP&DCP to the check func of platforms (#29952 ) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-04 21:40:51 -05:00
Hubert de La Jonquiere	befb59e5b1	[Model] Add Holo2 reasoning parser (#30048 ) Signed-off-by: hdlj-h <hubert@hcompany.ai>	2025-12-05 10:38:45 +08:00
Shengqi Chen	aaddc9c82a	[CI] fix silent error in nightly wheel index generation script, add generation time to HTML index (#30060 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-05 00:48:59 +00:00
Zhewen Li	263c38d74d	[CI/Build] Update batch invariant test trigger (#30080 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-12-05 00:42:37 +00:00
Zhewen Li	bcf43ab1f3	[CI/Build][AMD] Add Llama4 Maverick FP8 to AMD CI (#28695 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-12-04 16:07:20 -08:00
Alexander Matveev	4470ee2f90	[Perf] Enable separate shared_experts stream only for CUDA (#30085 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-12-05 00:03:17 +00:00
TimWang	690cc3ef20	docs: update metrics design doc to use new vllm:kv_cache_usage_perc (#30041 ) Signed-off-by: Tim <tim.wang03@sap.com>	2025-12-04 23:37:14 +00:00
Laith Sakka	1f0d184590	[aot_compile]change VLLM backend to read fake args from example_value (#29104 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-04 17:33:45 -05:00
Lucas Wilkinson	c8ab988b15	[BugFix] Fix DBO assert `assert B_block_table == B_q` (#29933 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-12-04 14:48:54 -05:00
Peng-YM	48a5fff66e	[Bugfix] Missing tokens in `return_token_ids` when tool parsers is enabled in streaming mode (#29074 ) Signed-off-by: Peng-YM <1048217874pengym@gmail.com>	2025-12-04 19:09:39 +00:00
Mercykid-bash	1119f6e47a	Abstract eplb algo (#26471 ) Signed-off-by: Che Ruan <cr623@ic.ac.uk> Signed-off-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com> Signed-off-by: Mercykid-bash <ruanche0218@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Che Ruan <cr623@ic.ac.uk> Co-authored-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 19:09:09 +00:00
Harry Mellor	e10c84e06a	Access `partial_rotary_factor` from `rope_parameters` (#29966 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-04 18:42:49 +00:00
Kuntai Du	ece2825a29	[KVConnector] Remove v0-related kv connector components such as kv pipe and kv lookup buffer (#29705 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-12-04 18:20:48 +00:00
Jee Jee Li	652ba93da3	[Bugfix] Fix FP8 MoE LoRA (#29890 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-04 18:17:49 +00:00
Tao Yun	6dcb07f676	support qwen3-vl handle requests with embeddings (#30037 ) Signed-off-by: taoyun <1069423820@qq.com> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-12-04 17:34:06 +00:00

11993 Commits