4546 Commits

Ishaan Jaffer
8b499adba6 Revert "Add license metadata to health/readiness endpoint. (#15997)"
This reverts commit d89990e0c5.
2025-12-05 19:31:30 -08:00
YutaSaito
12850969fb Merge pull request #17570 from BerriAI/litellm_fix_mcp_test 2025-12-06 11:24:35 +09:00
Yuta Saito
21a18128ec fix: mcp test 2025-12-06 10:54:22 +09:00
Ishaan Jaffer
f0a93fb9b9 test_string_cost_values_edge_cases 2025-12-05 17:25:55 -08:00
Ishaan Jaffer
eaa7e61f57 test fixes 2025-12-05 17:12:01 -08:00
yuneng-jiang
8e74a3b692 Merge pull request #17563 from BerriAI/litellm_v2_login_test_fix
[Fix] Mock server_root_path for v2/login test
2025-12-05 16:23:51 -08:00
YutaSaito
b5133c4c7d Feat/mcp preserve tool metadata calltoolresult (#17561)
* feat(mcp): preserve tool metadata and full CallToolResult in MCP gateway

This PR fixes two issues that prevented ChatGPT from rendering MCP UI widgets
when proxied through LiteLLM:

1. Preserve Tool Metadata in tools/list
   - Modified _create_prefixed_tools() to mutate tools in place instead of
     reconstructing them, preserving all fields including metadata/_meta
   - This ensures ChatGPT can see 'openai/outputTemplate' URIs in tools/list
     and will call resources/read to fetch widgets

2. Preserve Full CallToolResult (structuredContent + metadata)
   - Changed call_mcp_tool() and _handle_managed_mcp_tool() to return full
     CallToolResult objects instead of just content
   - Updated error handlers to return CallToolResult with isError flag
   - Wrapped local tool results in CallToolResult objects
   - This preserves structuredContent and metadata fields needed for widget rendering

Files changed:
- litellm/proxy/_experimental/mcp_server/mcp_server_manager.py
- litellm/proxy/_experimental/mcp_server/server.py

Fixes issues where ChatGPT could not render MCP UI widgets when using
LiteLLM as an MCP gateway.

* feat(mcp): Preserve tool metadata and return full CallToolResult for ChatGPT UI widgets

- Preserve metadata and _meta fields when creating prefixed tools
- Return full CallToolResult instead of just content list
- Ensures ChatGPT can discover and render UI widgets via openai/outputTemplate
- Fixes metadata stripping that prevented widget rendering in ChatGPT

Changes:
- mcp_server_manager.py: Mutate tools in place to preserve all fields including metadata
- server.py: Return CallToolResult with structuredContent and metadata preserved
- Added test to verify metadata preservation

* fix: guard cost calculator when BaseModel lacks _hidden_params

---------

Co-authored-by: Afroz Ahmad <aahmad@Afrozs-MacBook-Pro.local>
Co-authored-by: Afroz Ahmad <aahmad@KNDMCPTMZH3.sephoraus.com>
2025-12-05 16:15:22 -08:00
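
Below is a minimal sketch of the in-place mutation pattern this commit describes. `_create_prefixed_tools` is named in the commit; the prefix separator and the `mcp.types.Tool` handling are illustrative assumptions, not LiteLLM's exact implementation.

```python
from mcp.types import Tool

def _create_prefixed_tools(tools: list[Tool], server_prefix: str) -> list[Tool]:
    # Mutate each tool in place rather than reconstructing it, so optional
    # fields such as metadata/_meta (e.g. 'openai/outputTemplate') survive
    # and ChatGPT can discover UI widgets via tools/list.
    for tool in tools:
        tool.name = f"{server_prefix}-{tool.name}"  # separator is an assumption
    return tools
```
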
yuneng-jiang
5afd03fef3 Mock server_root_path for test 2025-12-05 16:13:56 -08:00
Cesar Garcia
87f94172a9 fix(responses): Add image generation support for Responses API (#16586)
* fix(responses): Add image generation support for Responses API

Fixes #16227

## Problem
When using Gemini 2.5 Flash Image with /responses endpoint, image generation
outputs were not being returned correctly. The response contained only text
with empty content instead of the generated images.

## Solution
1. Created new `OutputImageGenerationCall` type for image generation outputs
2. Modified `_extract_message_output_items()` to detect images in completion responses
3. Added `_extract_image_generation_output_items()` to transform images from
   completion format (data URL) to responses format (pure base64)
4. Added `_extract_base64_from_data_url()` helper to extract base64 from data URLs
5. Updated `ResponsesAPIResponse.output` type to include `OutputImageGenerationCall`

## Changes
- litellm/types/responses/main.py: Added OutputImageGenerationCall type
- litellm/types/llms/openai.py: Updated ResponsesAPIResponse.output type
- litellm/responses/litellm_completion_transformation/transformation.py:
  Added image detection and extraction logic
- tests/test_litellm/responses/litellm_completion_transformation/test_image_generation_output.py:
  Added comprehensive unit tests (16 tests, all passing)

## Result
/responses endpoint now correctly returns:
```json
{
  "output": [{
    "type": "image_generation_call",
    "id": "..._img_0",
    "status": "completed",
    "result": "iVBORw0KGgo..."  // Pure base64, no data: prefix
  }]
}
```

This matches OpenAI Responses API specification where image generation
outputs have type "image_generation_call" with base64 data in "result" field.

* docs(responses): Add image generation documentation and tests

- Add comprehensive image generation documentation to response_api.md
  - Include examples for Gemini (no tools param) and OpenAI (with tools param)
  - Document response format and base64 handling
  - Add supported models table with provider-specific requirements

- Add unit tests for image generation output transformation
  - Test base64 extraction from data URLs
  - Test image generation output item creation
  - Test status mapping and integration scenarios
  - Verify proper transformation from completions to responses format

Related to #16227

* fix(responses): Correct status type for image generation output

- Add _map_finish_reason_to_image_generation_status() helper function
- Fix MyPy type error: OutputImageGenerationCall.status only accepts
  ['in_progress', 'completed', 'incomplete', 'failed'], not the full
  ResponsesAPIStatus union which includes 'cancelled' and 'queued'

Fixes MyPy error in transformation.py:838
2025-12-05 15:56:26 -08:00
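
`_extract_base64_from_data_url` is named in the commit above; this body is an assumed implementation of the described behavior (strip the data-URL prefix, return pure base64):

```python
def _extract_base64_from_data_url(data_url: str) -> str:
    # "data:image/png;base64,iVBORw0KGgo..." -> "iVBORw0KGgo..."
    if data_url.startswith("data:") and "," in data_url:
        return data_url.split(",", 1)[1]
    return data_url  # assume it is already pure base64
```
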
Cesar Garcia
829b06f53f Fix: Gemini image_tokens incorrectly treated as text tokens in cost calculation (#17554)
When Gemini image generation models return `text_tokens=0` with `image_tokens > 0`,
the cost calculator was assuming no token breakdown existed and treating all
completion tokens as text tokens, resulting in ~10x underestimation of costs.

Changes:
- Fix cost calculation logic to respect token breakdown when image/audio/reasoning
  tokens are present, even if text_tokens=0
- Add `output_cost_per_image_token` pricing for gemini-3-pro-image-preview models
- Add test case reproducing the issue
- Add documentation explaining image token pricing

Fixes #17410
2025-12-05 15:55:38 -08:00
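
A simplified sketch of the corrected branching; the real calculator covers more modalities and pricing keys, so the names below are assumptions:

```python
def completion_cost_sketch(
    completion_tokens: int,
    text_tokens: int,
    image_tokens: int,
    text_price: float,
    image_price: float,
) -> float:
    # A breakdown exists if *any* modality reports tokens (audio/reasoning
    # omitted for brevity); text_tokens == 0 with image_tokens > 0 must not
    # fall through to the text-only path, which billed every completion
    # token at the text rate and underestimated cost ~10x.
    if text_tokens or image_tokens:
        return text_tokens * text_price + image_tokens * image_price
    return completion_tokens * text_price
```
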
Javier de la Torre
2905feb889 feat(oci): Add textarea field type for OCI private key input (#17159)
This enables Oracle Cloud Infrastructure (OCI) GenAI authentication via the UI
by allowing users to paste their PEM private key content directly into a
multiline textarea field.

Changes:
- Add `textarea` field type to UI component system
- Configure OCI provider with proper credential fields (oci_key, oci_user,
  oci_fingerprint, oci_tenancy, oci_region, oci_compartment_id)
- Handle PEM content newline normalization (\\n -> \n, \r\n -> \n)
- Use OCIError for consistent error handling

Previously OCI only supported file-based authentication (oci_key_file), which
doesn't work for UI-based model configuration. This adds support for inline
PEM content via the new oci_key field.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 15:53:54 -08:00
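
The newline normalization mentioned above, as a standalone sketch (helper name assumed):

```python
def normalize_pem_content(pem: str) -> str:
    # Keys pasted into the textarea may carry literal "\n" escapes or
    # Windows line endings; normalize both so the PEM parses.
    return pem.replace("\\n", "\n").replace("\r\n", "\n")
```
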
Devaj Mody
e5f7a0b0a5 fix(streaming): add length validation for empty tool_calls in delta (#17523)
Fixes #17425

  - Add length check for tool_calls in model_response.choices[0].delta
  - Prevents empty tool call objects from appearing in streaming responses
  - Add regression tests for empty and valid tool_calls scenarios
2025-12-05 15:53:49 -08:00
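
A sketch of the length check described above (helper name and delta shape assumed):

```python
def delta_has_tool_calls(delta) -> bool:
    # Guard on length, not just presence: an empty list previously
    # surfaced as an empty tool-call object in the streamed response.
    tool_calls = getattr(delta, "tool_calls", None)
    return tool_calls is not None and len(tool_calls) > 0
```
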
Chris Lapa
9c5f2ea827 Fixes #13652 - auth not working with ollama.com (#17191)
* ollama: adds missing auth headers if set

* ollama: sets ollama as openai compatible provider.

* ollama: adds tests for ollama auth
2025-12-05 15:52:54 -08:00
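
Roughly what "adds missing auth headers if set" implies, sketched with assumed names:

```python
def get_ollama_headers(api_key: str | None) -> dict[str, str]:
    headers = {"Content-Type": "application/json"}
    # ollama.com rejects unauthenticated requests; attach the key only when
    # one is configured, so local Ollama servers keep working without auth.
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```
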
Cesar Garcia
2cf41d63a6 fix(gemini): use thought:true instead of thoughtSignature to detect thinking blocks (#17266)
The previous implementation incorrectly used `thoughtSignature` as the criterion
to detect thinking blocks. However, per Google's docs:
- `thought: true` indicates that a part contains reasoning/thinking content
- `thoughtSignature` is just a token for multi-turn context preservation
  (a part can have thoughtSignature without thought:true, e.g., function calls)

This caused functionCall data to leak into reasoning_content when using
Gemini 2.5 Pro with streaming + tools enabled.

Changes:
- _extract_thinking_blocks_from_parts now checks `part.get("thought") is True`
- Extract actual text content instead of json.dumps(part)
- Include signature only when present (optional in Gemini 2.5)

Refs:
- https://ai.google.dev/gemini-api/docs/thinking
- https://ai.google.dev/gemini-api/docs/thought-signatures
2025-12-05 15:51:51 -08:00
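
A sketch of the corrected detection; the block shape returned here is an assumption, while the `thought`/`thoughtSignature` semantics come from the commit and Google's docs:

```python
def extract_thinking_blocks(parts: list[dict]) -> list[dict]:
    blocks = []
    for part in parts:
        # Per Google's docs, `thought: true` marks reasoning content;
        # `thoughtSignature` alone (e.g. on a functionCall part) does not.
        if part.get("thought") is True:
            block = {"type": "thinking", "thinking": part.get("text", "")}
            if part.get("thoughtSignature"):  # optional in Gemini 2.5
                block["signature"] = part["thoughtSignature"]
            blocks.append(block)
    return blocks
```
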
Irfan Sofyana Putra
bffc118170 fix bedrock qwen anthropic beta (#17467) 2025-12-05 15:47:34 -08:00
Dominic Fallows
2ffe8ee204 fix(presidio): handle empty content and error dict responses (#17489)
- Skip empty/whitespace text before calling Presidio API
- Handle error dict responses gracefully (e.g., {'error': 'No text provided'})
- Add defensive error handling for invalid result items
- Add comprehensive test coverage for empty content scenarios

Fixes crash in tool/function calling where assistant messages have empty content.
2025-12-05 15:45:19 -08:00
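
A sketch of the guard order described above; `presidio_analyze` stands in for the actual Presidio API call and is hypothetical:

```python
async def analyze_text(text: str, presidio_analyze) -> list:
    # Skip empty/whitespace content (e.g. assistant messages that carry
    # only tool calls) instead of sending it to the Presidio API.
    if not text or not text.strip():
        return []
    response = await presidio_analyze(text)
    # Presidio can answer with an error dict rather than a result list,
    # e.g. {'error': 'No text provided'}; treat that as "no findings".
    if isinstance(response, dict) and "error" in response:
        return []
    # Drop malformed result items defensively.
    return [item for item in response if isinstance(item, dict)]
```
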
yuneng-jiang
cb18af542e Merge pull request #17498 from BerriAI/litellm_customer_usage_backend
[Feature] Customer (end user) Usage
2025-12-05 15:31:08 -08:00
Devaj Mody
6ff7ed14f6 fix(team): use organization.members instead of deprecated organization.users (#17557)
Fixes #17552

  - Change Prisma include from 'users' to 'members'
  - Use LiteLLM_OrganizationTableWithMembers type for membership validation
  - Access organization.members instead of organization.users
  - Add tests for membership validation
2025-12-05 15:30:59 -08:00
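
In prisma-client-py terms, the change looks roughly like this; everything except `members` replacing the deprecated `users` relation is an assumption about the schema:

```python
async def is_org_member(prisma_client, organization_id: str, user_id: str) -> bool:
    org = await prisma_client.db.litellm_organizationtable.find_unique(
        where={"organization_id": organization_id},
        include={"members": True},  # was include={"users": True}
    )
    if org is None:
        return False
    return any(m.user_id == user_id for m in (org.members or []))
```
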
Ishaan Jaff
769f3cc310 [Bug fix] Secret Managers Integration - Make email and secret manager operations independent in key management hooks (#17551)
* TestKeyManagementEventHooksIndependentOperations

* KeyManagementEventHooks - make ops independent
2025-12-05 15:26:00 -08:00
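
The independence fix amounts to isolating each side effect; hook names here are hypothetical:

```python
import logging

logger = logging.getLogger(__name__)

async def run_key_created_hooks(key_data, store_secret, send_email) -> None:
    # Each side effect gets its own try/except, so a secret-manager failure
    # no longer suppresses the email notification, and vice versa.
    try:
        await store_secret(key_data)
    except Exception:
        logger.exception("secret manager hook failed")
    try:
        await send_email(key_data)
    except Exception:
        logger.exception("email hook failed")
```
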
Ishaan Jaff
a78f40f75a [Fixes] Dynamic Rate Limiter - Dynamic rate limiting token count increases/decreases by 1 instead of actual count + Redis TTL (#17558)
* fix async_log_success_event for _PROXY_DynamicRateLimitHandlerV3

* test_async_log_success_event_increments_by_actual_tokens

* fix redis TTL

* Potential fix for code scanning alert no. 3873: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-12-05 15:25:45 -08:00
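
A sketch of the corrected accounting, assuming a redis.asyncio client; names other than `incrby`/`expire` are illustrative:

```python
async def log_success(redis_client, counter_key: str, response_obj,
                      window_seconds: int) -> None:
    usage = getattr(response_obj, "usage", None)
    tokens_used = getattr(usage, "total_tokens", 0) if usage else 0
    # Count the tokens actually consumed, not a fixed increment of 1.
    await redis_client.incrby(counter_key, tokens_used)
    # Refresh the TTL so counters expire with the rate-limit window.
    await redis_client.expire(counter_key, window_seconds)
```
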
YutaSaito
4d39a1a18f Fix: MLflow streaming spans for Anthropic passthrough (#17288)
* Fix: MLflow streaming spans for Anthropic passthrough

* fix: Revert "Handle MLflow chunk events without delta"
2025-12-05 14:59:36 -08:00
Alexsander Hamir
655e04f16c Fix: apply_guardrail method and improve test isolation (#17555)
* Fix Bedrock guardrail apply_guardrail method and test mocks

Fixed 4 failing tests in the guardrail test suite:

1. BedrockGuardrail.apply_guardrail now returns original texts when guardrail
   allows content but doesn't provide output/outputs fields. Previously returned
   empty list, causing test_bedrock_apply_guardrail_success to fail.

2. Updated test mocks to use correct Bedrock API response format:
   - Changed from 'content' field to 'output' field
   - Fixed nested structure from {'text': {'text': '...'}} to {'text': '...'}
   - Added missing 'output' field in filter test

3. Fixed endpoint test mocks to return GenericGuardrailAPIInputs format:
   - Changed from tuple (List[str], Optional[List[str]]) to dict {'texts': [...]}
   - Updated method call assertions to use 'inputs' parameter correctly

All 12 guardrail tests now pass successfully.

* fix: remove python3-dev from Dockerfile.build_from_pip to avoid Python version conflict

The base image cgr.dev/chainguard/python:latest-dev already includes Python 3.14
and its development tools. Installing python3-dev pulls Python 3.13 packages
which conflict with the existing Python 3.14 installation, causing file
ownership errors during apk install.

* fix: disable callbacks in vertex fine-tuning tests to prevent Datadog logging interference

The test was failing because Datadog logging was making an HTTP POST request
that was being caught by the mock, causing assert_called_once() to fail.
By disabling callbacks during the test, we prevent Datadog from making any
HTTP calls, allowing the mock to only see the Vertex AI API call.

* fix: ensure test isolation in test_logging_non_streaming_request

Add proper cleanup to restore original litellm.callbacks after test execution.
This prevents test interference when running as part of a larger test suite,
where global state pollution was causing async_log_success_event to be
called multiple times instead of once.

Fixes test failure where the test expected async_log_success_event to be
called once but was being called twice due to callbacks from previous tests
not being cleaned up.
2025-12-05 12:59:35 -08:00
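
The fallback described in item 1 above, sketched against the response shapes the commit quotes; the surrounding call is hypothetical:

```python
def apply_guardrail(inputs: dict, request_data: dict, input_type: str,
                    bedrock_response: dict) -> dict:
    # When Bedrock allows the content it may omit 'output'/'outputs';
    # fall back to echoing the original texts instead of returning [].
    outputs = bedrock_response.get("output") or bedrock_response.get("outputs")
    if not outputs:
        return {"texts": inputs["texts"]}
    return {"texts": [o["text"] for o in outputs]}
```
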
Cesar Garcia
4eb9f8036f Add gpt-5.1-codex-max model pricing and configuration (#17541)
Add support for OpenAI's gpt-5.1-codex-max model, their most intelligent
coding model optimized for long-horizon agentic coding tasks.

- 400k context window, 128k max output tokens
- $1.25/1M input, $10/1M output, $0.125/1M cached input
- Only available via /v1/responses endpoint
- Supports vision, function calling, reasoning, prompt caching
2025-12-05 12:46:14 -08:00
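
An assumed shape for the cost-map entry, built from the figures quoted above; key names follow LiteLLM's model_prices convention but are not copied from the PR:

```python
GPT_5_1_CODEX_MAX_PRICING = {
    "max_input_tokens": 400_000,
    "max_output_tokens": 128_000,
    "input_cost_per_token": 1.25 / 1_000_000,          # $1.25 per 1M input
    "output_cost_per_token": 10.0 / 1_000_000,         # $10 per 1M output
    "cache_read_input_token_cost": 0.125 / 1_000_000,  # $0.125 per 1M cached
    "litellm_provider": "openai",
    "supported_endpoints": ["/v1/responses"],
    "supports_vision": True,
    "supports_function_calling": True,
    "supports_reasoning": True,
    "supports_prompt_caching": True,
}
```
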
rgshr
1ea7803d39 fix(github_copilot): preserve encrypted_content in reasoning items for multi-turn conversations (#17130)
* fix(github_copilot): preserve encrypted_content in reasoning items for multi-turn conversations

GitHub Copilot uses encrypted_content in reasoning items to maintain conversation
state across turns. The parent class (OpenAIResponsesAPIConfig._handle_reasoning_item)
strips this field when converting to OpenAI's ResponseReasoningItem model, causing
"encrypted content could not be verified" errors on multi-turn requests.

This override preserves encrypted_content while still filtering out status=None
which OpenAI's API rejects.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: regenerate poetry.lock

* Revert "chore: regenerate poetry.lock"

This reverts commit 8796dc8f96.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 12:42:25 -08:00
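
The override's intent, reduced to a sketch (the real method signature and item model differ):

```python
def handle_reasoning_item(item: dict) -> dict:
    # The parent implementation round-trips through OpenAI's
    # ResponseReasoningItem model, which drops unknown fields such as
    # Copilot's 'encrypted_content'. Keep every field, but still strip
    # status=None, which OpenAI's API rejects.
    return {k: v for k, v in item.items() if not (k == "status" and v is None)}
```
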
yuneng-jiang
62045477ba Merge pull request #16335 from BerriAI/litellm_ui_callback_fix
[Feature] Show all callbacks on UI
2025-12-05 12:35:58 -08:00
Sameer Kankute
b9bcb51f1b Merge pull request #17542 from BerriAI/litellm_pcs_vertex_fix
fix failing vertex tests
2025-12-06 01:15:59 +05:30
Ishaan Jaff
6021f31ebc Fix: Allow null max_budget in budget update endpoint (#17545)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ishaan <ishaan@berri.ai>
2025-12-05 11:45:23 -08:00
yuneng-jiang
4a0893ca22 Merge remote-tracking branch 'origin' into litellm_ui_callback_fix 2025-12-05 11:43:35 -08:00
Ishaan Jaff
77cce4202e [Bug fix] WatsonX audio transcriptions, don't force content type in request headers (#17546)
* fix watsonx content type

* watsonx content type
2025-12-05 10:56:15 -08:00
Sameer Kankute
64c001255d Add embedding pcs support 2025-12-06 00:20:30 +05:30
Sameer Kankute
e924b6978a Merge pull request #17137 from BerriAI/litellm_gemini3_media_res_fix
Make sure that media resolution is only for gemini 3 model
2025-12-06 00:06:55 +05:30
Sameer Kankute
43914796d6 fix failing vertex tests 2025-12-06 00:04:04 +05:30
Krish Dholakia
85d73403f4 Refactor: Skip PublicAI tests if API key is not set (#17540)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-12-05 10:22:07 -08:00
Alexsander Hamir
c0d149e0a9 Fix: Lack of None value checks & update publicai_chat_transformation tests (#17539)
* fix: handle none content

* fix: defensive check on none value

* Fix test failures: Azure OCR skip, None content handling, PublicAI JSON config

- Skip aocr/ocr call types in Azure test (they don't use Azure SDK client)
- Handle None content in Responses API transformation (skip message creation)
- Update PublicAI tests to use JSON-based configuration system
- Add None check in PublicAI test fixture to fix type error
2025-12-05 09:43:52 -08:00
Sameer Kankute
a21f1ce21f Merge pull request #17528 from BerriAI/litellm_save_background_checks
Add background health checks to db
2025-12-05 22:25:03 +05:30
Sameer Kankute
558c8f92d1 Merge pull request #17519 from BerriAI/litellm_cursor_integration
Add support for cursor BYOK with its own configuration
2025-12-05 22:23:45 +05:30
Sameer Kankute
49a344ebd9 Merge pull request #17525 from BerriAI/litellm_fix_in_memory_vector_store
Fix vector store configuration synchronization failure
2025-12-05 22:23:17 +05:30
Sameer Kankute
b6867184a8 Merge pull request #17534 from colinlin-stripe/colinlin/opus-budget-thinking
[fix] parse <budget:thinking> blocks for opus 4.5
2025-12-05 22:21:05 +05:30
Sameer Kankute
5f23d94b7e Fixed media resolution for gemini 3 2025-12-05 22:16:36 +05:30
Alexsander Hamir
96122a8b5a Fix Presidio guardrail test TypeError and license base64 decoding error (#17538)
Fixed two issues:

1. Presidio guardrail test TypeError:
   - Issue: test_presidio_apply_guardrail() was calling apply_guardrail() with
     incorrect arguments (text=, language=) instead of the correct signature
     (inputs=, request_data=, input_type=)
   - Fix: Updated test to use correct method signature:
     - Changed from: apply_guardrail(text=..., language=...)
     - Changed to: apply_guardrail(inputs={'texts': [...]}, request_data={}, input_type='request')
   - Also updated assertions to extract text from response['texts'][0]

2. License verification base64 decoding error:
   - Issue: verify_license_without_api_request() was failing with
     'Invalid base64-encoded string: number of data characters (185) cannot be
     1 more than a multiple of 4' when license keys lacked proper base64 padding
   - Root cause: Base64 strings must be a multiple of 4 characters. Some license
     keys were missing padding characters (=) needed for proper decoding
   - Fix: Added automatic padding before base64 decoding:
     - Calculate padding needed: len(license_key) % 4
     - Add '=' characters to make length a multiple of 4
     - This makes license verification robust to keys with or without padding

Both fixes ensure the code handles edge cases properly and tests use correct APIs.
2025-12-05 08:45:02 -08:00
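
The padding fix in item 2, as a standalone sketch (function name assumed):

```python
import base64

def pad_and_decode(license_key: str) -> bytes:
    # base64 input must be a multiple of 4 characters; some license keys
    # arrive without trailing '=' padding, so restore it before decoding.
    missing = len(license_key) % 4
    if missing:
        license_key += "=" * (4 - missing)
    return base64.b64decode(license_key)
```
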
Colin Lin
0bd144103d [stripe] simplify opus test 2025-12-05 10:56:09 -05:00
Colin Lin
3046b9f163 [stripe] opus budget thinking 2025-12-05 10:56:02 -05:00
Sameer Kankute
3d6b7f0d3d Add background health checks to db 2025-12-05 14:27:37 +05:30
yuneng-jiang
b9b5d638c8 Merge pull request #17524 from BerriAI/litellm_team_user_settings_fix
[Fix] Select in Edit Membership Modal
2025-12-04 23:13:13 -08:00
yuneng-jiang
37bfe65bdd Adding screenshot to debug 2025-12-04 23:05:00 -08:00
yuneng-jiang
50283a00a3 e2e fix 2025-12-04 22:51:52 -08:00
Sameer Kankute
acc0b5fe27 Merge pull request #17362 from BerriAI/litellm_vertex-bge-cherrypick
[Feat] VertexAI - Add BGE Embeddings support
2025-12-05 11:53:42 +05:30
Sameer Kankute
99fd96687f Fix vector store configuration synchronization failure 2025-12-05 11:46:14 +05:30
Krish Dholakia
51cc102c30 fix(unified_guardrail.py): support during_call event type for unified guardrails (#17514)
* fix(unified_guardrail.py): support during_call event type for unified guardrails

allows guardrails overriding apply_guardrails to work 'during_call'

* feat(generic_guardrail_api.py): support new 'tool_calls' field for generic guardrail api

returns the tool calls emitted by the LLM API to the user

* fix(generic_guardrail_api.py): working anthropic /v1/messages tool call response

send llm tool calls to guardrail api when called via `/v1/messages` API

* fix(responses/): run generic_guardrail_api on responses api tool call responses

* fix: fix tests

* test: fix tests

* fix: fix tests
2025-12-04 22:06:13 -08:00
Cesar Garcia
316f7671a9 fix(gemini): handle partial JSON chunks after first valid chunk (#17496)
* fix(gemini): allow JSON accumulation on any chunk, not just first

* test(gemini): add tests for partial JSON chunk handling
2025-12-04 22:01:59 -08:00
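
A minimal sketch of accumulating partial JSON on any chunk, as the fix describes; the class and its interface are assumptions, not LiteLLM's parser:

```python
import json

class ChunkAccumulator:
    def __init__(self) -> None:
        self.buffer = ""

    def feed(self, chunk: str):
        # Accumulate on *any* chunk, not only the first: Gemini can split
        # a JSON payload across chunk boundaries at any point in the stream.
        self.buffer += chunk
        try:
            parsed = json.loads(self.buffer)
        except json.JSONDecodeError:
            return None  # still incomplete; wait for the next chunk
        self.buffer = ""
        return parsed
```
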