4546 Commits

Ishaan Jaffer
8b499adba6 Revert "Add license metadata to health/readiness endpoint. (#15997)"
This reverts commit d89990e0c5.
2025-12-05 19:31:30 -08:00
YutaSaito
12850969fb Merge pull request #17570 from BerriAI/litellm_fix_mcp_test 2025-12-06 11:24:35 +09:00
Yuta Saito
21a18128ec fix: mcp test 2025-12-06 10:54:22 +09:00
Ishaan Jaffer
f0a93fb9b9 test_string_cost_values_edge_cases 2025-12-05 17:25:55 -08:00
Ishaan Jaffer
eaa7e61f57 test fixes 2025-12-05 17:12:01 -08:00
yuneng-jiang
8e74a3b692 Merge pull request #17563 from BerriAI/litellm_v2_login_test_fix
[Fix] Mock server_root_path for v2/login test
2025-12-05 16:23:51 -08:00
YutaSaito
b5133c4c7d Feat/mcp preserve tool metadata calltoolresult (#17561)
* feat(mcp): preserve tool metadata and full CallToolResult in MCP gateway

This PR fixes two issues that prevented ChatGPT from rendering MCP UI widgets
when proxied through LiteLLM:

1. Preserve Tool Metadata in tools/list
   - Modified _create_prefixed_tools() to mutate tools in place instead of
     reconstructing them, preserving all fields including metadata/_meta
   - This ensures ChatGPT can see 'openai/outputTemplate' URIs in tools/list
     and will call resources/read to fetch widgets

2. Preserve Full CallToolResult (structuredContent + metadata)
   - Changed call_mcp_tool() and _handle_managed_mcp_tool() to return full
     CallToolResult objects instead of just content
   - Updated error handlers to return CallToolResult with isError flag
   - Wrapped local tool results in CallToolResult objects
   - This preserves structuredContent and metadata fields needed for widget rendering

Files changed:
- litellm/proxy/_experimental/mcp_server/mcp_server_manager.py
- litellm/proxy/_experimental/mcp_server/server.py

Fixes issues where ChatGPT could not render MCP UI widgets when using
LiteLLM as an MCP gateway.

* feat(mcp): Preserve tool metadata and return full CallToolResult for ChatGPT UI widgets

- Preserve metadata and _meta fields when creating prefixed tools
- Return full CallToolResult instead of just content list
- Ensures ChatGPT can discover and render UI widgets via openai/outputTemplate
- Fixes metadata stripping that prevented widget rendering in ChatGPT

Changes:
- mcp_server_manager.py: Mutate tools in place to preserve all fields including metadata
- server.py: Return CallToolResult with structuredContent and metadata preserved
- Added test to verify metadata preservation

* fix: guard cost calculator when BaseModel lacks _hidden_params

---------

Co-authored-by: Afroz Ahmad <aahmad@Afrozs-MacBook-Pro.local>
Co-authored-by: Afroz Ahmad <aahmad@KNDMCPTMZH3.sephoraus.com>
2025-12-05 16:15:22 -08:00
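
Below is a minimal sketch of the in-place mutation pattern this commit describes. `_create_prefixed_tools` is named in the commit; the prefix separator and the `mcp.types.Tool` handling are illustrative assumptions, not LiteLLM's exact implementation.

```python
from mcp.types import Tool

def _create_prefixed_tools(tools: list[Tool], server_prefix: str) -> list[Tool]:
    # Mutate each tool in place rather than reconstructing it, so optional
    # fields such as metadata/_meta (e.g. 'openai/outputTemplate') survive
    # and ChatGPT can discover UI widgets via tools/list.
    for tool in tools:
        tool.name = f"{server_prefix}-{tool.name}"  # separator is an assumption
    return tools
```
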
yuneng-jiang
5afd03fef3 Mock server_root_path for test 2025-12-05 16:13:56 -08:00
Cesar Garcia
87f94172a9 fix(responses): Add image generation support for Responses API (#16586)
* fix(responses): Add image generation support for Responses API

Fixes #16227

## Problem
When using Gemini 2.5 Flash Image with /responses endpoint, image generation
outputs were not being returned correctly. The response contained only text
with empty content instead of the generated images.

## Solution
1. Created new `OutputImageGenerationCall` type for image generation outputs
2. Modified `_extract_message_output_items()` to detect images in completion responses
3. Added `_extract_image_generation_output_items()` to transform images from
   completion format (data URL) to responses format (pure base64)
4. Added `_extract_base64_from_data_url()` helper to extract base64 from data URLs
5. Updated `ResponsesAPIResponse.output` type to include `OutputImageGenerationCall`

## Changes
- litellm/types/responses/main.py: Added OutputImageGenerationCall type
- litellm/types/llms/openai.py: Updated ResponsesAPIResponse.output type
- litellm/responses/litellm_completion_transformation/transformation.py:
  Added image detection and extraction logic
- tests/test_litellm/responses/litellm_completion_transformation/test_image_generation_output.py:
  Added comprehensive unit tests (16 tests, all passing)

## Result
/responses endpoint now correctly returns:
```json
{
  "output": [{
    "type": "image_generation_call",
    "id": "..._img_0",
    "status": "completed",
    "result": "iVBORw0KGgo..."  // Pure base64, no data: prefix
  }]
}
```

This matches OpenAI Responses API specification where image generation
outputs have type "image_generation_call" with base64 data in "result" field.

* docs(responses): Add image generation documentation and tests

- Add comprehensive image generation documentation to response_api.md
  - Include examples for Gemini (no tools param) and OpenAI (with tools param)
  - Document response format and base64 handling
  - Add supported models table with provider-specific requirements

- Add unit tests for image generation output transformation
  - Test base64 extraction from data URLs
  - Test image generation output item creation
  - Test status mapping and integration scenarios
  - Verify proper transformation from completions to responses format

Related to #16227

* fix(responses): Correct status type for image generation output

- Add _map_finish_reason_to_image_generation_status() helper function
- Fix MyPy type error: OutputImageGenerationCall.status only accepts
  ['in_progress', 'completed', 'incomplete', 'failed'], not the full
  ResponsesAPIStatus union which includes 'cancelled' and 'queued'

Fixes MyPy error in transformation.py:838
2025-12-05 15:56:26 -08:00
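
`_extract_base64_from_data_url` is named in the commit above; this body is an assumed implementation of the described behavior (strip the data-URL prefix, return pure base64):

```python
def _extract_base64_from_data_url(data_url: str) -> str:
    # "data:image/png;base64,iVBORw0KGgo..." -> "iVBORw0KGgo..."
    if data_url.startswith("data:") and "," in data_url:
        return data_url.split(",", 1)[1]
    return data_url  # assume it is already pure base64
```
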
Cesar Garcia
829b06f53f Fix: Gemini image_tokens incorrectly treated as text tokens in cost calculation (#17554)
When Gemini image generation models return `text_tokens=0` with `image_tokens > 0`,
the cost calculator was assuming no token breakdown existed and treating all
completion tokens as text tokens, resulting in ~10x underestimation of costs.

Changes:
- Fix cost calculation logic to respect token breakdown when image/audio/reasoning
  tokens are present, even if text_tokens=0
- Add `output_cost_per_image_token` pricing for gemini-3-pro-image-preview models
- Add test case reproducing the issue
- Add documentation explaining image token pricing

Fixes #17410
2025-12-05 15:55:38 -08:00
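
A simplified sketch of the corrected branching; the real calculator covers more modalities and pricing keys, so the names below are assumptions:

```python
def completion_cost_sketch(
    completion_tokens: int,
    text_tokens: int,
    image_tokens: int,
    text_price: float,
    image_price: float,
) -> float:
    # A breakdown exists if *any* modality reports tokens (audio/reasoning
    # omitted for brevity); text_tokens == 0 with image_tokens > 0 must not
    # fall through to the text-only path, which billed every completion
    # token at the text rate and underestimated cost ~10x.
    if text_tokens or image_tokens:
        return text_tokens * text_price + image_tokens * image_price
    return completion_tokens * text_price
```
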
Javier de la Torre
2905feb889 feat(oci): Add textarea field type for OCI private key input (#17159)
This enables Oracle Cloud Infrastructure (OCI) GenAI authentication via the UI
by allowing users to paste their PEM private key content directly into a
multiline textarea field.

Changes:
- Add `textarea` field type to UI component system
- Configure OCI provider with proper credential fields (oci_key, oci_user,
  oci_fingerprint, oci_tenancy, oci_region, oci_compartment_id)
- Handle PEM content newline normalization (\\n -> \n, \r\n -> \n)
- Use OCIError for consistent error handling

Previously OCI only supported file-based authentication (oci_key_file), which
doesn't work for UI-based model configuration. This adds support for inline
PEM content via the new oci_key field.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 15:53:54 -08:00
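
The newline normalization mentioned above, as a standalone sketch (helper name assumed):

```python
def normalize_pem_content(pem: str) -> str:
    # Keys pasted into the textarea may carry literal "\n" escapes or
    # Windows line endings; normalize both so the PEM parses.
    return pem.replace("\\n", "\n").replace("\r\n", "\n")
```
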
Devaj Mody
e5f7a0b0a5 fix(streaming): add length validation for empty tool_calls in delta (#17523)
Fixes #17425

  - Add length check for tool_calls in model_response.choices[0].delta
  - Prevents empty tool call objects from appearing in streaming responses
  - Add regression tests for empty and valid tool_calls scenarios
2025-12-05 15:53:49 -08:00
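
A sketch of the length check described above (helper name and delta shape assumed):

```python
def delta_has_tool_calls(delta) -> bool:
    # Guard on length, not just presence: an empty list previously
    # surfaced as an empty tool-call object in the streamed response.
    tool_calls = getattr(delta, "tool_calls", None)
    return tool_calls is not None and len(tool_calls) > 0
```
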
Chris Lapa
9c5f2ea827 Fixes #13652 - auth not working with ollama.com (#17191)
* ollama: adds missing auth headers if set

* ollama: sets ollama as openai compatible provider.

* ollama: adds tests for ollama auth
2025-12-05 15:52:54 -08:00
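
Roughly what "adds missing auth headers if set" implies, sketched with assumed names:

```python
def get_ollama_headers(api_key: str | None) -> dict[str, str]:
    headers = {"Content-Type": "application/json"}
    # ollama.com rejects unauthenticated requests; attach the key only when
    # one is configured, so local Ollama servers keep working without auth.
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```
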
Cesar Garcia
2cf41d63a6 fix(gemini): use thought:true instead of thoughtSignature to detect thinking blocks (#17266)
The previous implementation incorrectly used `thoughtSignature` as the criterion
to detect thinking blocks. However, per Google's docs:
- `thought: true` indicates that a part contains reasoning/thinking content
- `thoughtSignature` is just a token for multi-turn context preservation
  (a part can have thoughtSignature without thought:true, e.g., function calls)

This caused functionCall data to leak into reasoning_content when using
Gemini 2.5 Pro with streaming + tools enabled.

Changes:
- _extract_thinking_blocks_from_parts now checks `part.get("thought") is True`
- Extract actual text content instead of json.dumps(part)
- Include signature only when present (optional in Gemini 2.5)

Refs:
- https://ai.google.dev/gemini-api/docs/thinking
- https://ai.google.dev/gemini-api/docs/thought-signatures
2025-12-05 15:51:51 -08:00
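
A sketch of the corrected detection; the block shape returned here is an assumption, while the `thought`/`thoughtSignature` semantics come from the commit and Google's docs:

```python
def extract_thinking_blocks(parts: list[dict]) -> list[dict]:
    blocks = []
    for part in parts:
        # Per Google's docs, `thought: true` marks reasoning content;
        # `thoughtSignature` alone (e.g. on a functionCall part) does not.
        if part.get("thought") is True:
            block = {"type": "thinking", "thinking": part.get("text", "")}
            if part.get("thoughtSignature"):  # optional in Gemini 2.5
                block["signature"] = part["thoughtSignature"]
            blocks.append(block)
    return blocks
```
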
Irfan Sofyana Putra
bffc118170 fix bedrock qwen anthropic beta (#17467) 2025-12-05 15:47:34 -08:00
Dominic Fallows
2ffe8ee204 fix(presidio): handle empty content and error dict responses (#17489)
- Skip empty/whitespace text before calling Presidio API
- Handle error dict responses gracefully (e.g., {'error': 'No text provided'})
- Add defensive error handling for invalid result items
- Add comprehensive test coverage for empty content scenarios

Fixes crash in tool/function calling where assistant messages have empty content.
2025-12-05 15:45:19 -08:00
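
A sketch of the guard order described above; `presidio_analyze` stands in for the actual Presidio API call and is hypothetical:

```python
async def analyze_text(text: str, presidio_analyze) -> list:
    # Skip empty/whitespace content (e.g. assistant messages that carry
    # only tool calls) instead of sending it to the Presidio API.
    if not text or not text.strip():
        return []
    response = await presidio_analyze(text)
    # Presidio can answer with an error dict rather than a result list,
    # e.g. {'error': 'No text provided'}; treat that as "no findings".
    if isinstance(response, dict) and "error" in response:
        return []
    # Drop malformed result items defensively.
    return [item for item in response if isinstance(item, dict)]
```
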
yuneng-jiang
cb18af542e Merge pull request #17498 from BerriAI/litellm_customer_usage_backend
[Feature] Customer (end user) Usage
2025-12-05 15:31:08 -08:00
Devaj Mody
6ff7ed14f6 fix(team): use organization.members instead of deprecated organization.users (#17557)
Fixes #17552

  - Change Prisma include from 'users' to 'members'
  - Use LiteLLM_OrganizationTableWithMembers type for membership validation
  - Access organization.members instead of organization.users
  - Add tests for membership validation
2025-12-05 15:30:59 -08:00
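
In prisma-client-py terms, the change looks roughly like this; everything except `members` replacing the deprecated `users` relation is an assumption about the schema:

```python
async def is_org_member(prisma_client, organization_id: str, user_id: str) -> bool:
    org = await prisma_client.db.litellm_organizationtable.find_unique(
        where={"organization_id": organization_id},
        include={"members": True},  # was include={"users": True}
    )
    if org is None:
        return False
    return any(m.user_id == user_id for m in (org.members or []))
```
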
Ishaan Jaff
769f3cc310 [Bug fix] Secret Managers Integration - Make email and secret manager operations independent in key management hooks (#17551)
* TestKeyManagementEventHooksIndependentOperations

* KeyManagementEventHooks - make ops independent
2025-12-05 15:26:00 -08:00
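
The independence fix amounts to isolating each side effect; hook names here are hypothetical:

```python
import logging

logger = logging.getLogger(__name__)

async def run_key_created_hooks(key_data, store_secret, send_email) -> None:
    # Each side effect gets its own try/except, so a secret-manager failure
    # no longer suppresses the email notification, and vice versa.
    try:
        await store_secret(key_data)
    except Exception:
        logger.exception("secret manager hook failed")
    try:
        await send_email(key_data)
    except Exception:
        logger.exception("email hook failed")
```
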
Ishaan Jaff
a78f40f75a [Fixes] Dynamic Rate Limiter - Dynamic rate limiting token count increases/decreases by 1 instead of actual count + Redis TTL (#17558)
* fix async_log_success_event for _PROXY_DynamicRateLimitHandlerV3

* test_async_log_success_event_increments_by_actual_tokens

* fix redis TTL

* Potential fix for code scanning alert no. 3873: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-12-05 15:25:45 -08:00
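
A sketch of the corrected accounting, assuming a redis.asyncio client; names other than `incrby`/`expire` are illustrative:

```python
async def log_success(redis_client, counter_key: str, response_obj,
                      window_seconds: int) -> None:
    usage = getattr(response_obj, "usage", None)
    tokens_used = getattr(usage, "total_tokens", 0) if usage else 0
    # Count the tokens actually consumed, not a fixed increment of 1.
    await redis_client.incrby(counter_key, tokens_used)
    # Refresh the TTL so counters expire with the rate-limit window.
    await redis_client.expire(counter_key, window_seconds)
```
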
YutaSaito
4d39a1a18f Fix: MLflow streaming spans for Anthropic passthrough (#17288)
* Fix: MLflow streaming spans for Anthropic passthrough

* fix: Revert "Handle MLflow chunk events without delta"
2025-12-05 14:59:36 -08:00
Alexsander Hamir
655e04f16c Fix: apply_guardrail method and improve test isolation (#17555)
* Fix Bedrock guardrail apply_guardrail method and test mocks

Fixed 4 failing tests in the guardrail test suite:

1. BedrockGuardrail.apply_guardrail now returns original texts when guardrail
   allows content but doesn't provide output/outputs fields. Previously returned
   empty list, causing test_bedrock_apply_guardrail_success to fail.

2. Updated test mocks to use correct Bedrock API response format:
   - Changed from 'content' field to 'output' field
   - Fixed nested structure from {'text': {'text': '...'}} to {'text': '...'}
   - Added missing 'output' field in filter test

3. Fixed endpoint test mocks to return GenericGuardrailAPIInputs format:
   - Changed from tuple (List[str], Optional[List[str]]) to dict {'texts': [...]}
   - Updated method call assertions to use 'inputs' parameter correctly

All 12 guardrail tests now pass successfully.

* fix: remove python3-dev from Dockerfile.build_from_pip to avoid Python version conflict

The base image cgr.dev/chainguard/python:latest-dev already includes Python 3.14
and its development tools. Installing python3-dev pulls Python 3.13 packages
which conflict with the existing Python 3.14 installation, causing file
ownership errors during apk install.

* fix: disable callbacks in vertex fine-tuning tests to prevent Datadog logging interference

The test was failing because Datadog logging was making an HTTP POST request
that was being caught by the mock, causing assert_called_once() to fail.
By disabling callbacks during the test, we prevent Datadog from making any
HTTP calls, allowing the mock to only see the Vertex AI API call.

* fix: ensure test isolation in test_logging_non_streaming_request

Add proper cleanup to restore original litellm.callbacks after test execution.
This prevents test interference when running as part of a larger test suite,
where global state pollution was causing async_log_success_event to be
called multiple times instead of once.

Fixes test failure where the test expected async_log_success_event to be
called once but was being called twice due to callbacks from previous tests
not being cleaned up.
2025-12-05 12:59:35 -08:00
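
The fallback described in item 1 above, sketched against the response shapes the commit quotes; the surrounding call is hypothetical:

```python
def apply_guardrail(inputs: dict, request_data: dict, input_type: str,
                    bedrock_response: dict) -> dict:
    # When Bedrock allows the content it may omit 'output'/'outputs';
    # fall back to echoing the original texts instead of returning [].
    outputs = bedrock_response.get("output") or bedrock_response.get("outputs")
    if not outputs:
        return {"texts": inputs["texts"]}
    return {"texts": [o["text"] for o in outputs]}
```
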
Cesar Garcia
4eb9f8036f Add gpt-5.1-codex-max model pricing and configuration (#17541)
Add support for OpenAI's gpt-5.1-codex-max model, their most intelligent
coding model optimized for long-horizon agentic coding tasks.

- 400k context window, 128k max output tokens
- $1.25/1M input, $10/1M output, $0.125/1M cached input
- Only available via /v1/responses endpoint
- Supports vision, function calling, reasoning, prompt caching
2025-12-05 12:46:14 -08:00
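
An assumed shape for the cost-map entry, built from the figures quoted above; key names follow LiteLLM's model_prices convention but are not copied from the PR:

```python
GPT_5_1_CODEX_MAX_PRICING = {
    "max_input_tokens": 400_000,
    "max_output_tokens": 128_000,
    "input_cost_per_token": 1.25 / 1_000_000,          # $1.25 per 1M input
    "output_cost_per_token": 10.0 / 1_000_000,         # $10 per 1M output
    "cache_read_input_token_cost": 0.125 / 1_000_000,  # $0.125 per 1M cached
    "litellm_provider": "openai",
    "supported_endpoints": ["/v1/responses"],
    "supports_vision": True,
    "supports_function_calling": True,
    "supports_reasoning": True,
    "supports_prompt_caching": True,
}
```
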
rgshr
1ea7803d39 fix(github_copilot): preserve encrypted_content in reasoning items for multi-turn conversations (#17130)
* fix(github_copilot): preserve encrypted_content in reasoning items for multi-turn conversations

GitHub Copilot uses encrypted_content in reasoning items to maintain conversation
state across turns. The parent class (OpenAIResponsesAPIConfig._handle_reasoning_item)
strips this field when converting to OpenAI's ResponseReasoningItem model, causing
"encrypted content could not be verified" errors on multi-turn requests.

This override preserves encrypted_content while still filtering out status=None
which OpenAI's API rejects.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: regenerate poetry.lock

* Revert "chore: regenerate poetry.lock"

This reverts commit 8796dc8f96.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 12:42:25 -08:00
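
The override's intent, reduced to a sketch (the real method signature and item model differ):

```python
def handle_reasoning_item(item: dict) -> dict:
    # The parent implementation round-trips through OpenAI's
    # ResponseReasoningItem model, which drops unknown fields such as
    # Copilot's 'encrypted_content'. Keep every field, but still strip
    # status=None, which OpenAI's API rejects.
    return {k: v for k, v in item.items() if not (k == "status" and v is None)}
```
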
yuneng-jiang
62045477ba Merge pull request #16335 from BerriAI/litellm_ui_callback_fix
[Feature] Show all callbacks on UI
2025-12-05 12:35:58 -08:00
Sameer Kankute
b9bcb51f1b Merge pull request #17542 from BerriAI/litellm_pcs_vertex_fix
fix failing vertex tests
2025-12-06 01:15:59 +05:30
Ishaan Jaff
6021f31ebc Fix: Allow null max_budget in budget update endpoint (#17545)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ishaan <ishaan@berri.ai>
2025-12-05 11:45:23 -08:00
yuneng-jiang
4a0893ca22 Merge remote-tracking branch 'origin' into litellm_ui_callback_fix 2025-12-05 11:43:35 -08:00
Ishaan Jaff
77cce4202e [Bug fix] WatsonX audio transcriptions, don't force content type in request headers (#17546)
* fix watsonx content type

* watsonx content type
2025-12-05 10:56:15 -08:00
Sameer Kankute
64c001255d Add embedding pcs support 2025-12-06 00:20:30 +05:30
Sameer Kankute
e924b6978a Merge pull request #17137 from BerriAI/litellm_gemini3_media_res_fix
Make sure that media resolution is only for gemini 3 model
2025-12-06 00:06:55 +05:30
Sameer Kankute
43914796d6 fix failing vertex tests 2025-12-06 00:04:04 +05:30
Krish Dholakia
85d73403f4 Refactor: Skip PublicAI tests if API key is not set (#17540)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2025-12-05 10:22:07 -08:00
Alexsander Hamir
c0d149e0a9 Fix: Lack of None value checks & update publicai_chat_transformation tests (#17539)
* fix: handle none content

* fix: defensive check on none value

* Fix test failures: Azure OCR skip, None content handling, PublicAI JSON config

- Skip aocr/ocr call types in Azure test (they don't use Azure SDK client)
- Handle None content in Responses API transformation (skip message creation)
- Update PublicAI tests to use JSON-based configuration system
- Add None check in PublicAI test fixture to fix type error
2025-12-05 09:43:52 -08:00
Sameer Kankute
a21f1ce21f Merge pull request #17528 from BerriAI/litellm_save_background_checks
Add background health checks to db
2025-12-05 22:25:03 +05:30
Sameer Kankute
558c8f92d1 Merge pull request #17519 from BerriAI/litellm_cursor_integration
Add support for cursor BYOK with its own configuration
2025-12-05 22:23:45 +05:30
Sameer Kankute
49a344ebd9 Merge pull request #17525 from BerriAI/litellm_fix_in_memory_vector_store
Fix vector store configuration synchronization failure
2025-12-05 22:23:17 +05:30
Sameer Kankute
b6867184a8 Merge pull request #17534 from colinlin-stripe/colinlin/opus-budget-thinking
[fix] parse <budget:thinking> blocks for opus 4.5
2025-12-05 22:21:05 +05:30
Sameer Kankute
5f23d94b7e Fixed media resolution for gemini 3 2025-12-05 22:16:36 +05:30
Alexsander Hamir
96122a8b5a Fix Presidio guardrail test TypeError and license base64 decoding error (#17538)
Fixed two issues:

1. Presidio guardrail test TypeError:
   - Issue: test_presidio_apply_guardrail() was calling apply_guardrail() with
     incorrect arguments (text=, language=) instead of the correct signature
     (inputs=, request_data=, input_type=)
   - Fix: Updated test to use correct method signature:
     - Changed from: apply_guardrail(text=..., language=...)
     - Changed to: apply_guardrail(inputs={'texts': [...]}, request_data={}, input_type='request')
   - Also updated assertions to extract text from response['texts'][0]

2. License verification base64 decoding error:
   - Issue: verify_license_without_api_request() was failing with
     'Invalid base64-encoded string: number of data characters (185) cannot be
     1 more than a multiple of 4' when license keys lacked proper base64 padding
   - Root cause: Base64 strings must be a multiple of 4 characters. Some license
     keys were missing padding characters (=) needed for proper decoding
   - Fix: Added automatic padding before base64 decoding:
     - Calculate padding needed: len(license_key) % 4
     - Add '=' characters to make length a multiple of 4
     - This makes license verification robust to keys with or without padding

Both fixes ensure the code handles edge cases properly and tests use correct APIs.
2025-12-05 08:45:02 -08:00
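
The padding fix in item 2, as a standalone sketch (function name assumed):

```python
import base64

def pad_and_decode(license_key: str) -> bytes:
    # base64 input must be a multiple of 4 characters; some license keys
    # arrive without trailing '=' padding, so restore it before decoding.
    missing = len(license_key) % 4
    if missing:
        license_key += "=" * (4 - missing)
    return base64.b64decode(license_key)
```
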
Colin Lin
0bd144103d [stripe] simplify opus test 2025-12-05 10:56:09 -05:00
Colin Lin
3046b9f163 [stripe] opus budget thinking 2025-12-05 10:56:02 -05:00
Sameer Kankute
3d6b7f0d3d Add background health checks to db 2025-12-05 14:27:37 +05:30
yuneng-jiang
b9b5d638c8 Merge pull request #17524 from BerriAI/litellm_team_user_settings_fix
[Fix] Select in Edit Membership Modal
2025-12-04 23:13:13 -08:00
yuneng-jiang
37bfe65bdd Adding screenshot to debug 2025-12-04 23:05:00 -08:00
yuneng-jiang
50283a00a3 e2e fix 2025-12-04 22:51:52 -08:00
Sameer Kankute
acc0b5fe27 Merge pull request #17362 from BerriAI/litellm_vertex-bge-cherrypick
[Feat] VertexAI - Add BGE Embeddings support
2025-12-05 11:53:42 +05:30
Sameer Kankute
99fd96687f Fix vector store configuration synchronization failure 2025-12-05 11:46:14 +05:30
Krish Dholakia
51cc102c30 fix(unified_guardrail.py): support during_call event type for unified guardrails (#17514)
* fix(unified_guardrail.py): support during_call event type for unified guardrails

allows guardrails overriding apply_guardrails to work 'during_call'

* feat(generic_guardrail_api.py): support new 'tool_calls' field for generic guardrail api

returns the tool calls emitted by the LLM API to the user

* fix(generic_guardrail_api.py): working anthropic /v1/messages tool call response

send llm tool calls to guardrail api when called via `/v1/messages` API

* fix(responses/): run generic_guardrail_api on responses api tool call responses

* fix: fix tests

* test: fix tests

* fix: fix tests
2025-12-04 22:06:13 -08:00
Cesar Garcia
316f7671a9 fix(gemini): handle partial JSON chunks after first valid chunk (#17496)
* fix(gemini): allow JSON accumulation on any chunk, not just first

* test(gemini): add tests for partial JSON chunk handling
2025-12-04 22:01:59 -08:00
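
A minimal sketch of accumulating partial JSON on any chunk, as the fix describes; the class and its interface are assumptions, not LiteLLM's parser:

```python
import json

class ChunkAccumulator:
    def __init__(self) -> None:
        self.buffer = ""

    def feed(self, chunk: str):
        # Accumulate on *any* chunk, not only the first: Gemini can split
        # a JSON payload across chunk boundaries at any point in the stream.
        self.buffer += chunk
        try:
            parsed = json.loads(self.buffer)
        except json.JSONDecodeError:
            return None  # still incomplete; wait for the next chunk
        self.buffer = ""
        return parsed
```
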