
Author SHA1 Message Date
Ishaan Jaffer
8b499adba6 Revert "Add license metadata to health/readiness endpoint. (#15997)"
This reverts commit d89990e0c5.
2025-12-05 19:31:30 -08:00
YutaSaito
12850969fb Merge pull request #17570 from BerriAI/litellm_fix_mcp_test 2025-12-06 11:24:35 +09:00
Yuta Saito
21a18128ec fix: mcp test 2025-12-06 10:54:22 +09:00
Ishaan Jaffer
ce4b5daf70 ollama fix 2025-12-05 17:25:55 -08:00
Ishaan Jaffer
f0a93fb9b9 test_string_cost_values_edge_cases 2025-12-05 17:25:55 -08:00
yuneng-jiang
fdb49c97f2 Merge pull request #17562 from BerriAI/litellm_ui_compare_images
[Feature] Support Images in Compare UI
2025-12-05 17:24:05 -08:00
Ishaan Jaffer
96e4c9e078 fix _update_metadata_with_tags_in_header 2025-12-05 17:20:14 -08:00
dependabot[bot]
83291d394e build(deps): bump mdast-util-to-hast in /ui/litellm-dashboard (#17444)
Bumps [mdast-util-to-hast](https://github.com/syntax-tree/mdast-util-to-hast) from 13.2.0 to 13.2.1.
- [Release notes](https://github.com/syntax-tree/mdast-util-to-hast/releases)
- [Commits](https://github.com/syntax-tree/mdast-util-to-hast/compare/13.2.0...13.2.1)

---
updated-dependencies:
- dependency-name: mdast-util-to-hast
  dependency-version: 13.2.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-05 17:12:51 -08:00
Ishaan Jaffer
eaa7e61f57 test fixes 2025-12-05 17:12:01 -08:00
Ishaan Jaffer
58f8be60a1 fix REDIS_DAILY_END_USER_SPEND_UPDATE_QUEUE 2025-12-05 17:07:09 -08:00
yuneng-jiang
82376d8b76 Merge pull request #17564 from BerriAI/litellm_end_user_spend_redis_test
[Fix] CI/CD - Adding end user and org to service types
2025-12-05 16:44:37 -08:00
yuneng-jiang
86baa9e5fb Adding end user and org to service types 2025-12-05 16:38:09 -08:00
yuneng-jiang
8e74a3b692 Merge pull request #17563 from BerriAI/litellm_v2_login_test_fix
[Fix] Mock server_root_path for v2/login test
2025-12-05 16:23:51 -08:00
yuneng-jiang
df8b0e8389 Merge pull request #17506 from BerriAI/litellm_ui_customer_usage
[Feature] Customer Usage UI
2025-12-05 16:18:42 -08:00
YutaSaito
b5133c4c7d Feat/mcp preserve tool metadata calltoolresult (#17561)
* feat(mcp): preserve tool metadata and full CallToolResult in MCP gateway

This PR fixes two issues that prevented ChatGPT from rendering MCP UI widgets
when proxied through LiteLLM:

1. Preserve Tool Metadata in tools/list
   - Modified _create_prefixed_tools() to mutate tools in place instead of
     reconstructing them, preserving all fields including metadata/_meta
   - This ensures ChatGPT can see 'openai/outputTemplate' URIs in tools/list
     and will call resources/read to fetch widgets

2. Preserve Full CallToolResult (structuredContent + metadata)
   - Changed call_mcp_tool() and _handle_managed_mcp_tool() to return full
     CallToolResult objects instead of just content
   - Updated error handlers to return CallToolResult with isError flag
   - Wrapped local tool results in CallToolResult objects
   - This preserves structuredContent and metadata fields needed for widget rendering

Files changed:
- litellm/proxy/_experimental/mcp_server/mcp_server_manager.py
- litellm/proxy/_experimental/mcp_server/server.py

Fixes issues where ChatGPT could not render MCP UI widgets when using
LiteLLM as an MCP gateway.

* feat(mcp): Preserve tool metadata and return full CallToolResult for ChatGPT UI widgets

- Preserve metadata and _meta fields when creating prefixed tools
- Return full CallToolResult instead of just content list
- Ensures ChatGPT can discover and render UI widgets via openai/outputTemplate
- Fixes metadata stripping that prevented widget rendering in ChatGPT

Changes:
- mcp_server_manager.py: Mutate tools in place to preserve all fields including metadata
- server.py: Return CallToolResult with structuredContent and metadata preserved
- Added test to verify metadata preservation

* fix: guard cost calculator when BaseModel lacks _hidden_params

---------

Co-authored-by: Afroz Ahmad <aahmad@Afrozs-MacBook-Pro.local>
Co-authored-by: Afroz Ahmad <aahmad@KNDMCPTMZH3.sephoraus.com>
2025-12-05 16:15:22 -08:00
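
A minimal sketch of the "mutate in place" idea from the first change above; `Tool`, `meta`, and `create_prefixed_tools` are illustrative stand-ins, not the actual `mcp_server_manager.py` code:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Tool:
    name: str
    description: str = ""
    meta: Optional[Dict[str, Any]] = None  # e.g. {"openai/outputTemplate": "ui://widget/..."}


def create_prefixed_tools(tools: List[Tool], server_prefix: str) -> List[Tool]:
    # Rename each tool in place instead of rebuilding it from a subset of fields,
    # so extra fields such as meta/_meta survive into tools/list.
    for tool in tools:
        tool.name = f"{server_prefix}-{tool.name}"
    return tools
```
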
yuneng-jiang
5afd03fef3 Mock server_root_path for test 2025-12-05 16:13:56 -08:00
Xingjian Li
342723eb12 fix: Handle global location for Vertex AI Gemini image generation (#17255)
- Add check for 'global' location to use correct API endpoint
- Global location uses aiplatform.googleapis.com without region prefix
- Regional locations use {region}-aiplatform.googleapis.com format
- Fixes URL construction error when using vertex_location='global'

Resolves issue with gemini-3-pro-image-preview model on global endpoint
2025-12-05 15:56:38 -08:00
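
A sketch of the endpoint rule described above, using a hypothetical helper (not the actual Vertex AI integration code):

```python
def vertex_api_base(vertex_location: str) -> str:
    # The global location has no region prefix on the hostname.
    if vertex_location == "global":
        return "https://aiplatform.googleapis.com"
    # Regional locations use the {region}-aiplatform.googleapis.com format.
    return f"https://{vertex_location}-aiplatform.googleapis.com"


assert vertex_api_base("global") == "https://aiplatform.googleapis.com"
assert vertex_api_base("us-central1") == "https://us-central1-aiplatform.googleapis.com"
```
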
Cesar Garcia
87f94172a9 fix(responses): Add image generation support for Responses API (#16586)
* fix(responses): Add image generation support for Responses API

Fixes #16227

## Problem
When using Gemini 2.5 Flash Image with /responses endpoint, image generation
outputs were not being returned correctly. The response contained only text
with empty content instead of the generated images.

## Solution
1. Created new `OutputImageGenerationCall` type for image generation outputs
2. Modified `_extract_message_output_items()` to detect images in completion responses
3. Added `_extract_image_generation_output_items()` to transform images from
   completion format (data URL) to responses format (pure base64)
4. Added `_extract_base64_from_data_url()` helper to extract base64 from data URLs
5. Updated `ResponsesAPIResponse.output` type to include `OutputImageGenerationCall`

## Changes
- litellm/types/responses/main.py: Added OutputImageGenerationCall type
- litellm/types/llms/openai.py: Updated ResponsesAPIResponse.output type
- litellm/responses/litellm_completion_transformation/transformation.py:
  Added image detection and extraction logic
- tests/test_litellm/responses/litellm_completion_transformation/test_image_generation_output.py:
  Added comprehensive unit tests (16 tests, all passing)

## Result
/responses endpoint now correctly returns:
```json
{
  "output": [{
    "type": "image_generation_call",
    "id": "..._img_0",
    "status": "completed",
    "result": "iVBORw0KGgo..."  // Pure base64, no data: prefix
  }]
}
```

This matches OpenAI Responses API specification where image generation
outputs have type "image_generation_call" with base64 data in "result" field.

* docs(responses): Add image generation documentation and tests

- Add comprehensive image generation documentation to response_api.md
  - Include examples for Gemini (no tools param) and OpenAI (with tools param)
  - Document response format and base64 handling
  - Add supported models table with provider-specific requirements

- Add unit tests for image generation output transformation
  - Test base64 extraction from data URLs
  - Test image generation output item creation
  - Test status mapping and integration scenarios
  - Verify proper transformation from completions to responses format

Related to #16227

* fix(responses): Correct status type for image generation output

- Add _map_finish_reason_to_image_generation_status() helper function
- Fix MyPy type error: OutputImageGenerationCall.status only accepts
  ['in_progress', 'completed', 'incomplete', 'failed'], not the full
  ResponsesAPIStatus union which includes 'cancelled' and 'queued'

Fixes MyPy error in transformation.py:838
2025-12-05 15:56:26 -08:00
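
A rough sketch of the two helpers named above; the function names follow the PR description, but the bodies (including the finish-reason mapping) are illustrative guesses, not the actual `transformation.py` code:

```python
def _extract_base64_from_data_url(data_url: str) -> str:
    # "data:image/png;base64,iVBORw0KGgo..." -> "iVBORw0KGgo..."
    if data_url.startswith("data:") and "," in data_url:
        return data_url.split(",", 1)[1]
    return data_url  # already plain base64


def _map_finish_reason_to_image_generation_status(finish_reason: str) -> str:
    # OutputImageGenerationCall.status only accepts
    # 'in_progress' | 'completed' | 'incomplete' | 'failed'.
    return "completed" if finish_reason == "stop" else "incomplete"
```
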
Cesar Garcia
829b06f53f Fix: Gemini image_tokens incorrectly treated as text tokens in cost calculation (#17554)
When Gemini image generation models return `text_tokens=0` with `image_tokens > 0`,
the cost calculator was assuming no token breakdown existed and treating all
completion tokens as text tokens, resulting in ~10x underestimation of costs.

Changes:
- Fix cost calculation logic to respect token breakdown when image/audio/reasoning
  tokens are present, even if text_tokens=0
- Add `output_cost_per_image_token` pricing for gemini-3-pro-image-preview models
- Add test case reproducing the issue
- Add documentation explaining image token pricing

Fixes #17410
2025-12-05 15:55:38 -08:00
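
A sketch of the breakdown-aware pricing branch described above, with made-up parameter names (not the actual cost calculator):

```python
def completion_cost_with_breakdown(
    completion_tokens: int,
    text_tokens: int,
    image_tokens: int,
    cost_per_text_token: float,
    output_cost_per_image_token: float,
) -> float:
    # A breakdown exists whenever any component is reported, even if
    # text_tokens == 0; the buggy path treated text_tokens == 0 as
    # "no breakdown" and priced every completion token as text.
    if text_tokens > 0 or image_tokens > 0:
        return (
            text_tokens * cost_per_text_token
            + image_tokens * output_cost_per_image_token
        )
    return completion_tokens * cost_per_text_token
```
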
Javier de la Torre
2905feb889 feat(oci): Add textarea field type for OCI private key input (#17159)
This enables Oracle Cloud Infrastructure (OCI) GenAI authentication via the UI
by allowing users to paste their PEM private key content directly into a
multiline textarea field.

Changes:
- Add `textarea` field type to UI component system
- Configure OCI provider with proper credential fields (oci_key, oci_user,
  oci_fingerprint, oci_tenancy, oci_region, oci_compartment_id)
- Handle PEM content newline normalization (\\n -> \n, \r\n -> \n)
- Use OCIError for consistent error handling

Previously OCI only supported file-based authentication (oci_key_file), which
doesn't work for UI-based model configuration. This adds support for inline
PEM content via the new oci_key field.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 15:53:54 -08:00
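
A minimal sketch of the newline normalization mentioned above (the helper name is illustrative):

```python
def normalize_pem_key(oci_key: str) -> str:
    # Keys pasted into a UI textarea often arrive with literal "\n" sequences
    # or Windows line endings; normalize both to real newlines.
    return oci_key.replace("\\n", "\n").replace("\r\n", "\n")
```
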
Devaj Mody
e5f7a0b0a5 fix(streaming): add length validation for empty tool_calls in delta (#17523)
Fixes #17425

  - Add length check for tool_calls in model_response.choices[0].delta
  - Prevents empty tool call objects from appearing in streaming responses
  - Add regression tests for empty and valid tool_calls scenarios
2025-12-05 15:53:49 -08:00
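
A sketch of the length check described above, assuming a dict-shaped delta (not the actual streaming handler):

```python
def has_tool_calls(delta: dict) -> bool:
    # Only treat tool_calls as present when the list is non-empty, so empty
    # tool-call objects never show up in streamed chunks.
    tool_calls = delta.get("tool_calls")
    return isinstance(tool_calls, list) and len(tool_calls) > 0
```
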
Yuichiro Utsumi
d18e489872 fix(docs): remove source .env (#17466)
Remove `source .env` since `docker compose` automatically loads
the `.env` file.

Signed-off-by: utsumi.yuichiro <utsumi.yuichiro@fujitsu.com>
2025-12-05 15:53:05 -08:00
Chris Lapa
9c5f2ea827 Fixes #13652 - auth not working with ollama.com (#17191)
* ollama: adds missing auth headers if set

* ollama: sets ollama as openai compatible provider.

* ollama: adds tests for ollama auth
2025-12-05 15:52:54 -08:00
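
A sketch of the auth-header behaviour described above; the helper is hypothetical, not the actual Ollama provider code:

```python
from typing import Dict, Optional


def ollama_request_headers(api_key: Optional[str]) -> Dict[str, str]:
    headers = {"Content-Type": "application/json"}
    if api_key:  # only attach auth when a key is configured, e.g. for ollama.com
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```
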
yuneng-jiang
852a1fee89 Support images in compare UI 2025-12-05 15:51:56 -08:00
Cesar Garcia
2cf41d63a6 fix(gemini): use thought:true instead of thoughtSignature to detect thinking blocks (#17266)
The previous implementation incorrectly used `thoughtSignature` as the criterion
to detect thinking blocks. However, per Google's docs:
- `thought: true` indicates that a part contains reasoning/thinking content
- `thoughtSignature` is just a token for multi-turn context preservation
  (a part can have thoughtSignature without thought:true, e.g., function calls)

This caused functionCall data to leak into reasoning_content when using
Gemini 2.5 Pro with streaming + tools enabled.

Changes:
- _extract_thinking_blocks_from_parts now checks `part.get("thought") is True`
- Extract actual text content instead of json.dumps(part)
- Include signature only when present (optional in Gemini 2.5)

Refs:
- https://ai.google.dev/gemini-api/docs/thinking
- https://ai.google.dev/gemini-api/docs/thought-signatures
2025-12-05 15:51:51 -08:00
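
A sketch of the detection rule above; the part dicts loosely follow the Gemini response shape and the function name is illustrative:

```python
from typing import Any, Dict, List


def extract_thinking_blocks(parts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    blocks = []
    for part in parts:
        # `thought: true` marks reasoning content; a thoughtSignature alone does
        # not (function-call parts may carry a signature without being thoughts).
        if part.get("thought") is True:
            block: Dict[str, Any] = {"type": "thinking", "thinking": part.get("text", "")}
            if part.get("thoughtSignature"):  # signature is optional in Gemini 2.5
                block["signature"] = part["thoughtSignature"]
            blocks.append(block)
    return blocks
```
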
Irfan Sofyana Putra
bffc118170 fix bedrock qwen anthropic beta (#17467) 2025-12-05 15:47:34 -08:00
Ishaan Jaffer
e519462efa fix MYPY linting 2025-12-05 15:46:26 -08:00
Ishaan Jaffer
ae065525ea fix ZAI 2025-12-05 15:46:26 -08:00
Dominic Fallows
2ffe8ee204 fix(presidio): handle empty content and error dict responses (#17489)
- Skip empty/whitespace text before calling Presidio API
- Handle error dict responses gracefully (e.g., {'error': 'No text provided'})
- Add defensive error handling for invalid result items
- Add comprehensive test coverage for empty content scenarios

Fixes crash in tool/function calling where assistant messages have empty content.
2025-12-05 15:45:19 -08:00
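
A sketch of the guards described above; `analyze_text` is a stand-in for the Presidio API call, not the actual guardrail hook:

```python
from typing import Any, Callable, Dict, List


def analyze_if_nonempty(
    text: str, analyze_text: Callable[[str], Any]
) -> List[Dict[str, Any]]:
    if not text or not text.strip():
        # Skip empty/whitespace content, e.g. assistant messages that only
        # carry tool calls.
        return []
    result = analyze_text(text)
    if isinstance(result, dict) and "error" in result:
        # e.g. {'error': 'No text provided'} -- treat as "nothing detected".
        return []
    # Drop invalid result items defensively.
    return [item for item in result if isinstance(item, dict)]
```
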
dependabot[bot]
5fb7530d8c build(deps): bump jws from 3.2.2 to 3.2.3 in /ui/litellm-dashboard (#17494)
Bumps [jws](https://github.com/brianloveswords/node-jws) from 3.2.2 to 3.2.3.
- [Release notes](https://github.com/brianloveswords/node-jws/releases)
- [Changelog](https://github.com/auth0/node-jws/blob/master/CHANGELOG.md)
- [Commits](https://github.com/brianloveswords/node-jws/compare/v3.2.2...v3.2.3)

---
updated-dependencies:
- dependency-name: jws
  dependency-version: 3.2.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-05 15:44:15 -08:00
Ishaan Jaff
f02df3035a [Feat] Allow using dynamic rate limit/priority reservation on teams (#17061)
* use helper to get key/team priority

* test_team_metadata_priority

* docs team priority
2025-12-05 15:42:27 -08:00
yuneng-jiang
cb18af542e Merge pull request #17498 from BerriAI/litellm_customer_usage_backend
[Feature] Customer (end user) Usage
2025-12-05 15:31:08 -08:00
Devaj Mody
6ff7ed14f6 fix(team): use organization.members instead of deprecated organization.users (#17557)
Fixes #17552

  - Change Prisma include from 'users' to 'members'
  - Use LiteLLM_OrganizationTableWithMembers type for membership validation
  - Access organization.members instead of organization.users
  - Add tests for membership validation
2025-12-05 15:30:59 -08:00
Cesar Garcia
7259de2f12 feat: add Mistral Large 3 model support (#17547)
Add Mistral Large 3 (675B MoE) to model catalog for both providers:
- mistral/mistral-large-3
- azure_ai/mistral-large-3

Specs:
- 256k context window
- $0.50/1M input, $1.50/1M output
- Supports vision (multimodal)
- Supports function calling

Closes #17527
2025-12-05 15:26:20 -08:00
Ishaan Jaff
769f3cc310 [Bug fix] Secret Managers Integration - Make email and secret manager operations independent in key management hooks (#17551)
* TestKeyManagementEventHooksIndependentOperations

* KeyManagementEventHooks - make ops independent
2025-12-05 15:26:00 -08:00
Ishaan Jaff
a78f40f75a [Fixes] Dynamic Rate Limiter - Dynamic rate limiting token count increases/decreases by 1 instead of actual count + Redis TTL (#17558)
* fix async_log_success_event for _PROXY_DynamicRateLimitHandlerV3

* test_async_log_success_event_increments_by_actual_tokens

* fix redis TTL

* Potential fix for code scanning alert no. 3873: Clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-12-05 15:25:45 -08:00
YutaSaito
4d39a1a18f Fix: MLflow streaming spans for Anthropic passthrough (#17288)
* Fix: MLflow streaming spans for Anthropic passthrough

* fix: Revert "Handle MLflow chunk events without delta"
2025-12-05 14:59:36 -08:00
Alexsander Hamir
655e04f16c Fix: apply_guardrail method and improve test isolation (#17555)
* Fix Bedrock guardrail apply_guardrail method and test mocks

Fixed 4 failing tests in the guardrail test suite:

1. BedrockGuardrail.apply_guardrail now returns original texts when guardrail
   allows content but doesn't provide output/outputs fields. Previously returned
   empty list, causing test_bedrock_apply_guardrail_success to fail.

2. Updated test mocks to use correct Bedrock API response format:
   - Changed from 'content' field to 'output' field
   - Fixed nested structure from {'text': {'text': '...'}} to {'text': '...'}
   - Added missing 'output' field in filter test

3. Fixed endpoint test mocks to return GenericGuardrailAPIInputs format:
   - Changed from tuple (List[str], Optional[List[str]]) to dict {'texts': [...]}
   - Updated method call assertions to use 'inputs' parameter correctly

All 12 guardrail tests now pass successfully.

* fix: remove python3-dev from Dockerfile.build_from_pip to avoid Python version conflict

The base image cgr.dev/chainguard/python:latest-dev already includes Python 3.14
and its development tools. Installing python3-dev pulls Python 3.13 packages
which conflict with the existing Python 3.14 installation, causing file
ownership errors during apk install.

* fix: disable callbacks in vertex fine-tuning tests to prevent Datadog logging interference

The test was failing because Datadog logging was making an HTTP POST request
that was being caught by the mock, causing assert_called_once() to fail.
By disabling callbacks during the test, we prevent Datadog from making any
HTTP calls, allowing the mock to only see the Vertex AI API call.

* fix: ensure test isolation in test_logging_non_streaming_request

Add proper cleanup to restore original litellm.callbacks after test execution.
This prevents test interference when running as part of a larger test suite,
where global state pollution was causing async_log_success_event to be
called multiple times instead of once.

Fixes test failure where the test expected async_log_success_event to be
called once but was being called twice due to callbacks from previous tests
not being cleaned up.
2025-12-05 12:59:35 -08:00
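
A sketch of the fallback in point 1 above; the response shape is illustrative, not the actual `BedrockGuardrail.apply_guardrail` implementation:

```python
from typing import Any, Dict, List


def guardrailed_texts(original_texts: List[str], response: Dict[str, Any]) -> List[str]:
    outputs = response.get("outputs") or response.get("output")
    if not outputs:
        # The guardrail allowed the content but returned no rewritten output:
        # hand back the original texts instead of an empty list.
        return original_texts
    return [o.get("text", "") for o in outputs]
```
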
Cesar Garcia
4eb9f8036f Add gpt-5.1-codex-max model pricing and configuration (#17541)
Add support for OpenAI's gpt-5.1-codex-max model, their most intelligent
coding model optimized for long-horizon agentic coding tasks.

- 400k context window, 128k max output tokens
- $1.25/1M input, $10/1M output, $0.125/1M cached input
- Only available via /v1/responses endpoint
- Supports vision, function calling, reasoning, prompt caching
2025-12-05 12:46:14 -08:00
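
A sketch of a catalog entry carrying the numbers above, written as a plain dict; the key names are illustrative, not the exact model-pricing schema:

```python
gpt_5_1_codex_max = {
    "max_input_tokens": 400_000,
    "max_output_tokens": 128_000,
    "input_cost_per_token": 1.25 / 1_000_000,
    "output_cost_per_token": 10.0 / 1_000_000,
    "cached_input_cost_per_token": 0.125 / 1_000_000,
    "mode": "responses",  # only available via /v1/responses
    "supports_vision": True,
    "supports_function_calling": True,
    "supports_reasoning": True,
    "supports_prompt_caching": True,
}
```
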
rgshr
1ea7803d39 fix(github_copilot): preserve encrypted_content in reasoning items for multi-turn conversations (#17130)
* fix(github_copilot): preserve encrypted_content in reasoning items for multi-turn conversations

GitHub Copilot uses encrypted_content in reasoning items to maintain conversation
state across turns. The parent class (OpenAIResponsesAPIConfig._handle_reasoning_item)
strips this field when converting to OpenAI's ResponseReasoningItem model, causing
"encrypted content could not be verified" errors on multi-turn requests.

This override preserves encrypted_content while still filtering out status=None
which OpenAI's API rejects.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: regenerate poetry.lock

* Revert "chore: regenerate poetry.lock"

This reverts commit 8796dc8f96.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 12:42:25 -08:00
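
A sketch of the override's intent; the handling is illustrative, not the actual GitHub Copilot config class:

```python
from typing import Any, Dict


def handle_reasoning_item(item: Dict[str, Any]) -> Dict[str, Any]:
    # Keep every field -- including encrypted_content, which GitHub Copilot
    # needs to verify multi-turn conversation state -- but drop status=None,
    # which the upstream API rejects.
    out = dict(item)
    if out.get("status") is None:
        out.pop("status", None)
    return out
```
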
yuneng-jiang
62045477ba Merge pull request #16335 from BerriAI/litellm_ui_callback_fix
[Feature] Show all callbacks on UI
2025-12-05 12:35:58 -08:00
Sameer Kankute
b9bcb51f1b Merge pull request #17542 from BerriAI/litellm_pcs_vertex_fix
fix failing vertex tests
2025-12-06 01:15:59 +05:30
Ishaan Jaff
6021f31ebc Fix: Allow null max_budget in budget update endpoint (#17545)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ishaan <ishaan@berri.ai>
2025-12-05 11:45:23 -08:00
yuneng-jiang
4a0893ca22 Merge remote-tracking branch 'origin' into litellm_ui_callback_fix 2025-12-05 11:43:35 -08:00
yuneng-jiang
2b0e83b79d Merge pull request #17549 from BerriAI/litellm_yuneng_temp
[Infra] Bump LiteLLM Enterprise Version
2025-12-05 11:24:04 -08:00
yuneng-jiang
6a60c950fe bumping enterprise build 2025-12-05 11:14:00 -08:00
yuneng-jiang
a750f5ca69 bump: version 0.1.22 → 0.1.23 2025-12-05 11:08:04 -08:00
Ishaan Jaff
77cce4202e [Bug fix] WatsonX audio transcriptions, don't force content type in request headers (#17546)
* fix watsonx content type

* watsonx content type
2025-12-05 10:56:15 -08:00
Sameer Kankute
64c001255d Add embedding pcs support 2025-12-06 00:20:30 +05:30
Sameer Kankute
e924b6978a Merge pull request #17137 from BerriAI/litellm_gemini3_media_res_fix
Make sure that media resolution is only for gemini 3 model
2025-12-06 00:06:55 +05:30