* feat(mcp): preserve tool metadata and full CallToolResult in MCP gateway
This PR fixes two issues that prevented ChatGPT from rendering MCP UI widgets
when proxied through LiteLLM:
1. Preserve Tool Metadata in tools/list
- Modified _create_prefixed_tools() to mutate tools in place instead of
reconstructing them, preserving all fields including metadata/_meta (see the sketch after this list)
- This ensures ChatGPT can see 'openai/outputTemplate' URIs in tools/list
and will call resources/read to fetch widgets
2. Preserve Full CallToolResult (structuredContent + metadata)
- Changed call_mcp_tool() and _handle_managed_mcp_tool() to return full
CallToolResult objects instead of just content
- Updated error handlers to return CallToolResult with isError flag
- Wrapped local tool results in CallToolResult objects
- This preserves structuredContent and metadata fields needed for widget rendering
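A minimal sketch of the in-place approach from point 1 above; the function signature and prefix format are illustrative, not the exact LiteLLM code:
```python
# Hypothetical sketch: only the name is rewritten in place, so every other
# field on the Tool object (including metadata/_meta carrying
# openai/outputTemplate) survives, unlike a field-by-field reconstruction.
from typing import List

from mcp.types import Tool


def _create_prefixed_tools(tools: List[Tool], server_prefix: str) -> List[Tool]:
    for tool in tools:
        tool.name = f"{server_prefix}-{tool.name}"  # prefix format is illustrative
    return tools
```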
Files changed:
- litellm/proxy/_experimental/mcp_server/mcp_server_manager.py
- litellm/proxy/_experimental/mcp_server/server.py
Fixes issues where ChatGPT could not render MCP UI widgets when using
LiteLLM as an MCP gateway.
* feat(mcp): Preserve tool metadata and return full CallToolResult for ChatGPT UI widgets
- Preserve metadata and _meta fields when creating prefixed tools
- Return full CallToolResult instead of just content list
- Ensures ChatGPT can discover and render UI widgets via openai/outputTemplate
- Fixes metadata stripping that prevented widget rendering in ChatGPT
Changes:
- mcp_server_manager.py: Mutate tools in place to preserve all fields including metadata
- server.py: Return CallToolResult with structuredContent and metadata preserved
- Added test to verify metadata preservation
* fix: guard cost calculator when BaseModel lacks _hidden_params
---------
Co-authored-by: Afroz Ahmad <aahmad@Afrozs-MacBook-Pro.local>
Co-authored-by: Afroz Ahmad <aahmad@KNDMCPTMZH3.sephoraus.com>
- Add check for 'global' location to use correct API endpoint
- Global location uses aiplatform.googleapis.com without region prefix
- Regional locations use {region}-aiplatform.googleapis.com format
- Fixes URL construction error when using vertex_location='global'
Resolves issue with gemini-3-pro-image-preview model on global endpoint
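A hedged sketch of the endpoint selection described above (the helper name is illustrative; the real logic lives in LiteLLM's Vertex AI handling):
```python
# Illustrative helper; LiteLLM's actual Vertex AI URL construction may differ.
def _get_vertex_api_base(vertex_location: str) -> str:
    if vertex_location == "global":
        # The global location has no region prefix on the hostname.
        return "https://aiplatform.googleapis.com"
    # Regional locations keep the region prefix, e.g. us-central1-aiplatform.googleapis.com
    return f"https://{vertex_location}-aiplatform.googleapis.com"
```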
* fix(responses): Add image generation support for Responses API
Fixes #16227
## Problem
When using Gemini 2.5 Flash Image with /responses endpoint, image generation
outputs were not being returned correctly. The response contained only text
with empty content instead of the generated images.
## Solution
1. Created new `OutputImageGenerationCall` type for image generation outputs
2. Modified `_extract_message_output_items()` to detect images in completion responses
3. Added `_extract_image_generation_output_items()` to transform images from
completion format (data URL) to responses format (pure base64)
4. Added `_extract_base64_from_data_url()` helper to extract base64 from data URLs
5. Updated `ResponsesAPIResponse.output` type to include `OutputImageGenerationCall`
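A minimal sketch of the helper named in step 4 above; the real implementation in transformation.py may differ:
```python
# Sketch of the data-URL handling described in steps 3-4; illustrative only.
def _extract_base64_from_data_url(data_url: str) -> str:
    """Turn 'data:image/png;base64,iVBOR...' into the bare base64 payload."""
    if data_url.startswith("data:") and "," in data_url:
        return data_url.split(",", 1)[1]
    # Not a data URL: assume it is already plain base64 and return it unchanged.
    return data_url
```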
## Changes
- litellm/types/responses/main.py: Added OutputImageGenerationCall type
- litellm/types/llms/openai.py: Updated ResponsesAPIResponse.output type
- litellm/responses/litellm_completion_transformation/transformation.py:
Added image detection and extraction logic
- tests/test_litellm/responses/litellm_completion_transformation/test_image_generation_output.py:
Added comprehensive unit tests (16 tests, all passing)
## Result
/responses endpoint now correctly returns:
```json
{
  "output": [{
    "type": "image_generation_call",
    "id": "..._img_0",
    "status": "completed",
    "result": "iVBORw0KGgo..."  // Pure base64, no data: prefix
  }]
}
```
This matches the OpenAI Responses API specification, where image generation
outputs have type "image_generation_call" and base64 data in the "result" field.
* docs(responses): Add image generation documentation and tests
- Add comprehensive image generation documentation to response_api.md
- Include examples for Gemini (no tools param) and OpenAI (with tools param)
- Document response format and base64 handling
- Add supported models table with provider-specific requirements
- Add unit tests for image generation output transformation
- Test base64 extraction from data URLs
- Test image generation output item creation
- Test status mapping and integration scenarios
- Verify proper transformation from completions to responses format
Related to #16227
* fix(responses): Correct status type for image generation output
- Add _map_finish_reason_to_image_generation_status() helper function
- Fix MyPy type error: OutputImageGenerationCall.status only accepts
['in_progress', 'completed', 'incomplete', 'failed'], not the full
ResponsesAPIStatus union which includes 'cancelled' and 'queued'
Fixes MyPy error in transformation.py:838
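An illustrative version of the helper named above; the exact finish_reason mapping is an assumption, and the point is only that the return type is narrowed to the four literals MyPy accepts:
```python
# Mapping below is an assumption; only the narrowed return type mirrors the commit.
from typing import Literal

ImageGenerationStatus = Literal["in_progress", "completed", "incomplete", "failed"]


def _map_finish_reason_to_image_generation_status(finish_reason: str) -> ImageGenerationStatus:
    if finish_reason == "stop":
        return "completed"
    if finish_reason in ("length", "content_filter"):
        return "incomplete"
    # Anything unrecognized is treated as failed; 'cancelled'/'queued' from the
    # wider ResponsesAPIStatus union are never produced here.
    return "failed"
```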
When Gemini image generation models return `text_tokens=0` with `image_tokens > 0`,
the cost calculator was assuming no token breakdown existed and treating all
completion tokens as text tokens, resulting in ~10x underestimation of costs.
Changes:
- Fix cost calculation logic to respect the token breakdown when image/audio/reasoning
tokens are present, even if text_tokens=0 (see the sketch below)
- Add `output_cost_per_image_token` pricing for gemini-3-pro-image-preview models
- Add test case reproducing the issue
- Add documentation explaining image token pricing
Fixes #17410
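A hedged sketch of the corrected decision: any modality-specific token count means the breakdown is real and must be respected, even when text_tokens is 0. Field names mirror the usage details mentioned above; the surrounding cost function is simplified for illustration:
```python
# Illustrative helpers; not the actual litellm cost_calculator code.
def _has_token_breakdown(details) -> bool:
    return any(
        (getattr(details, field, 0) or 0) > 0
        for field in ("text_tokens", "image_tokens", "audio_tokens", "reasoning_tokens")
    )


def _completion_cost(details, completion_tokens: int, text_price: float, image_price: float) -> float:
    if details is None or not _has_token_breakdown(details):
        # Genuinely no breakdown: fall back to billing everything as text tokens.
        return completion_tokens * text_price
    # Breakdown present (even with text_tokens == 0): bill each modality at its own rate.
    return (details.text_tokens or 0) * text_price + (details.image_tokens or 0) * image_price
```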
This enables Oracle Cloud Infrastructure (OCI) GenAI authentication via the UI
by allowing users to paste their PEM private key content directly into a
multiline textarea field.
Changes:
- Add `textarea` field type to UI component system
- Configure OCI provider with proper credential fields (oci_key, oci_user,
oci_fingerprint, oci_tenancy, oci_region, oci_compartment_id)
- Handle PEM content newline normalization (literal \\n and \r\n sequences are converted to real newlines; see the sketch below)
- Use OCIError for consistent error handling
Previously OCI only supported file-based authentication (oci_key_file), which
doesn't work for UI-based model configuration. This adds support for inline
PEM content via the new oci_key field.
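A sketch of that normalization (the helper name is illustrative):
```python
def _normalize_pem_content(oci_key: str) -> str:
    # Keys pasted into a textarea often arrive with escaped "\n" sequences or
    # Windows line endings; convert both to real LF newlines so the PEM parses.
    normalized = oci_key.replace("\\n", "\n")
    normalized = normalized.replace("\r\n", "\n")
    return normalized
```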
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
Fixes #17425
- Add length check for tool_calls in model_response.choices[0].delta
- Prevents empty tool call objects from appearing in streaming responses
- Add regression tests for empty and valid tool_calls scenarios
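A hedged sketch of the length check, expressed as a standalone predicate; the real check sits inline in the streaming handler:
```python
# Sketch only; the actual guard lives inline where the delta is built.
def _should_emit_tool_calls(delta) -> bool:
    """Return True only when delta.tool_calls exists and is non-empty, so
    streaming chunks never carry an empty tool_calls list."""
    tool_calls = getattr(delta, "tool_calls", None)
    return tool_calls is not None and len(tool_calls) > 0
```
Used as `if _should_emit_tool_calls(model_response.choices[0].delta): ...` before attaching tool calls to the chunk.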
The previous implementation incorrectly used `thoughtSignature` as the criterion
to detect thinking blocks. However, per Google's docs:
- `thought: true` indicates that a part contains reasoning/thinking content
- `thoughtSignature` is just a token for multi-turn context preservation
(a part can have thoughtSignature without thought:true, e.g., function calls)
This caused functionCall data to leak into reasoning_content when using
Gemini 2.5 Pro with streaming + tools enabled.
Changes:
- _extract_thinking_blocks_from_parts now checks `part.get("thought") is True`
- Extract actual text content instead of json.dumps(part)
- Include signature only when present (optional in Gemini 2.5)
Refs:
- https://ai.google.dev/gemini-api/docs/thinking
- https://ai.google.dev/gemini-api/docs/thought-signatures
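A simplified sketch of the corrected extraction; the helper name matches the Changes above, while the body is illustrative:
```python
# Illustrative body; only the helper name comes from the commit message.
from typing import Dict, List, Optional


def _extract_thinking_blocks_from_parts(parts: List[Dict]) -> List[Dict]:
    thinking_blocks: List[Dict] = []
    for part in parts:
        # `thought: true` is the actual marker for reasoning content;
        # `thoughtSignature` alone (e.g. on function calls) is not.
        if part.get("thought") is True:
            block: Dict = {"type": "thinking", "thinking": part.get("text", "")}
            signature: Optional[str] = part.get("thoughtSignature")
            if signature:  # optional in Gemini 2.5
                block["signature"] = signature
            thinking_blocks.append(block)
    return thinking_blocks
```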
- Skip empty/whitespace text before calling Presidio API
- Handle error dict responses gracefully (e.g., {'error': 'No text provided'})
- Add defensive error handling for invalid result items
- Add comprehensive test coverage for empty content scenarios
Fixes crash in tool/function calling where assistant messages have empty content.
Fixes #17552
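A hedged sketch of the guards described above; names are illustrative rather than the exact Presidio hook code:
```python
# Illustrative helpers, not the actual LiteLLM Presidio integration.
def _should_call_presidio(text) -> bool:
    """Skip the Presidio API entirely for empty or whitespace-only content
    (e.g. assistant messages that only carry tool calls)."""
    return isinstance(text, str) and bool(text.strip())


def _parse_presidio_results(response) -> list:
    """Handle error dicts like {'error': 'No text provided'} and drop
    malformed result items instead of crashing."""
    if isinstance(response, dict) and "error" in response:
        return []
    if not isinstance(response, list):
        return []
    return [item for item in response if isinstance(item, dict)]
```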
- Change Prisma include from 'users' to 'members'
- Use LiteLLM_OrganizationTableWithMembers type for membership validation
- Access organization.members instead of organization.users
- Add tests for membership validation
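A hypothetical sketch of the membership check after the change (prisma-client-py style; accessor and field names are assumptions based on the bullets above):
```python
# Hypothetical sketch; table accessor and field names are assumed, not copied.
async def user_is_org_member(prisma_client, organization_id: str, user_id: str) -> bool:
    organization = await prisma_client.db.litellm_organizationtable.find_unique(
        where={"organization_id": organization_id},
        include={"members": True},  # was include={"users": True}
    )
    if organization is None:
        return False
    return any(member.user_id == user_id for member in (organization.members or []))
```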
* fix async_log_success_event for _PROXY_DynamicRateLimitHandlerV3
* test_async_log_success_event_increments_by_actual_tokens
* fix redis TTL
* Potential fix for code scanning alert no. 3873: Clear-text logging of sensitive information
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Fix Bedrock guardrail apply_guardrail method and test mocks
Fixed 4 failing tests in the guardrail test suite:
1. BedrockGuardrail.apply_guardrail now returns the original texts when the
guardrail allows content but doesn't provide output/outputs fields. Previously
it returned an empty list, causing test_bedrock_apply_guardrail_success to fail.
2. Updated test mocks to use correct Bedrock API response format:
- Changed from 'content' field to 'output' field
- Fixed nested structure from {'text': {'text': '...'}} to {'text': '...'}
- Added missing 'output' field in filter test
3. Fixed endpoint test mocks to return GenericGuardrailAPIInputs format:
- Changed from tuple (List[str], Optional[List[str]]) to dict {'texts': [...]}
- Updated method call assertions to use 'inputs' parameter correctly
All 12 guardrail tests now pass successfully.
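For reference, the corrected mock shape from point 2 looks roughly like this (values are illustrative):
```python
# Illustrative mock; field values are made up, only the shape mirrors point 2.
mock_bedrock_response = {
    "action": "GUARDRAIL_INTERVENED",
    # Masked/modified text lives under "output"/"outputs" with a flat
    # {"text": "..."} structure, not the nested {"text": {"text": "..."}} form.
    "output": [{"text": "My phone number is {PHONE}"}],
    "outputs": [{"text": "My phone number is {PHONE}"}],
}
```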
* fix: remove python3-dev from Dockerfile.build_from_pip to avoid Python version conflict
The base image cgr.dev/chainguard/python:latest-dev already includes Python 3.14
and its development tools. Installing python3-dev pulls Python 3.13 packages
which conflict with the existing Python 3.14 installation, causing file
ownership errors during apk install.
* fix: disable callbacks in vertex fine-tuning tests to prevent Datadog logging interference
The test was failing because Datadog logging was making an HTTP POST request
that was being caught by the mock, causing assert_called_once() to fail.
By disabling callbacks during the test, we prevent Datadog from making any
HTTP calls, allowing the mock to only see the Vertex AI API call.
* fix: ensure test isolation in test_logging_non_streaming_request
Add proper cleanup to restore original litellm.callbacks after test execution.
This prevents test interference when running as part of a larger test suite,
where global state pollution was causing async_log_success_event to be
called multiple times instead of once.
Fixes test failure where the test expected async_log_success_event to be
called once but was being called twice due to callbacks from previous tests
not being cleaned up.
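The cleanup pattern boils down to something like this (pytest-style sketch, not the exact test body):
```python
# Sketch of the cleanup pattern only; the real test registers its own mocks.
import litellm


def test_logging_non_streaming_request():
    original_callbacks = litellm.callbacks
    try:
        litellm.callbacks = []  # register only what this test needs
        ...  # run the request and assert async_log_success_event was called once
    finally:
        # Restore global state so later tests don't see leftover callbacks.
        litellm.callbacks = original_callbacks
```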
Add support for OpenAI's gpt-5.1-codex-max model, their most intelligent
coding model optimized for long-horizon agentic coding tasks.
- 400k context window, 128k max output tokens
- $1.25/1M input, $10/1M output, $0.125/1M cached input
- Only available via /v1/responses endpoint
- Supports vision, function calling, reasoning, prompt caching
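Converted to litellm's per-token pricing units, the listed numbers work out to roughly this (illustrative dict; the real entry lives in model_prices_and_context_window.json and carries more fields):
```python
# Illustrative; the actual JSON entry has additional fields and flags.
gpt_5_1_codex_max = {
    "max_input_tokens": 400_000,        # 400k context window
    "max_output_tokens": 128_000,       # 128k max output tokens
    "input_cost_per_token": 1.25 / 1e6,          # $1.25 per 1M input tokens
    "output_cost_per_token": 10.0 / 1e6,         # $10 per 1M output tokens
    "cache_read_input_token_cost": 0.125 / 1e6,  # $0.125 per 1M cached input tokens
}
```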
* fix(github_copilot): preserve encrypted_content in reasoning items for multi-turn conversations
GitHub Copilot uses encrypted_content in reasoning items to maintain conversation
state across turns. The parent class (OpenAIResponsesAPIConfig._handle_reasoning_item)
strips this field when converting to OpenAI's ResponseReasoningItem model, causing
"encrypted content could not be verified" errors on multi-turn requests.
This override preserves encrypted_content while still filtering out status=None
which OpenAI's API rejects.
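A hedged sketch of what the override does, written as a standalone post-processing helper rather than the actual subclass method:
```python
# Sketch only; the real code overrides _handle_reasoning_item in the
# GitHub Copilot responses config rather than using a helper like this.
def _preserve_encrypted_content(original_item: dict, handled_item: dict) -> dict:
    """Re-attach encrypted_content after the parent class has converted the
    reasoning item, and drop status=None, which OpenAI's API rejects."""
    if original_item.get("encrypted_content") is not None:
        handled_item["encrypted_content"] = original_item["encrypted_content"]
    if handled_item.get("status") is None:
        handled_item.pop("status", None)
    return handled_item
```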
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* chore: regenerate poetry.lock
* Revert "chore: regenerate poetry.lock"
This reverts commit 8796dc8f96.
---------
Co-authored-by: Claude <noreply@anthropic.com>