* feat(mcp): preserve tool metadata and full CallToolResult in MCP gateway
This PR fixes two issues that prevented ChatGPT from rendering MCP UI widgets
when proxied through LiteLLM:
1. Preserve Tool Metadata in tools/list
- Modified _create_prefixed_tools() to mutate tools in place instead of
reconstructing them, preserving all fields including metadata/_meta (see the sketch after this list)
- This ensures ChatGPT can see 'openai/outputTemplate' URIs in tools/list
and will call resources/read to fetch widgets
2. Preserve Full CallToolResult (structuredContent + metadata)
- Changed call_mcp_tool() and _handle_managed_mcp_tool() to return full
CallToolResult objects instead of just content
- Updated error handlers to return CallToolResult with isError flag
- Wrapped local tool results in CallToolResult objects
- This preserves structuredContent and metadata fields needed for widget rendering
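A minimal sketch of the in-place approach from point 1 above; the function signature and prefix format are illustrative, not the exact LiteLLM code:
```python
# Hypothetical sketch: only the name is rewritten in place, so every other
# field on the Tool object (including metadata/_meta carrying
# openai/outputTemplate) survives, unlike a field-by-field reconstruction.
from typing import List

from mcp.types import Tool


def _create_prefixed_tools(tools: List[Tool], server_prefix: str) -> List[Tool]:
    for tool in tools:
        tool.name = f"{server_prefix}-{tool.name}"  # prefix format is illustrative
    return tools
```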
Files changed:
- litellm/proxy/_experimental/mcp_server/mcp_server_manager.py
- litellm/proxy/_experimental/mcp_server/server.py
Fixes issues where ChatGPT could not render MCP UI widgets when using
LiteLLM as an MCP gateway.
* feat(mcp): Preserve tool metadata and return full CallToolResult for ChatGPT UI widgets
- Preserve metadata and _meta fields when creating prefixed tools
- Return full CallToolResult instead of just content list
- Ensures ChatGPT can discover and render UI widgets via openai/outputTemplate
- Fixes metadata stripping that prevented widget rendering in ChatGPT
Changes:
- mcp_server_manager.py: Mutate tools in place to preserve all fields including metadata
- server.py: Return CallToolResult with structuredContent and metadata preserved
- Added test to verify metadata preservation
* fix: guard cost calculator when BaseModel lacks _hidden_params
---------
Co-authored-by: Afroz Ahmad <aahmad@Afrozs-MacBook-Pro.local>
Co-authored-by: Afroz Ahmad <aahmad@KNDMCPTMZH3.sephoraus.com>
- Add check for 'global' location to use correct API endpoint
- Global location uses aiplatform.googleapis.com without region prefix
- Regional locations use {region}-aiplatform.googleapis.com format
- Fixes URL construction error when using vertex_location='global'
Resolves issue with gemini-3-pro-image-preview model on global endpoint
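A hedged sketch of the endpoint selection described above (the helper name is illustrative; the real logic lives in LiteLLM's Vertex AI handling):
```python
# Illustrative helper; LiteLLM's actual Vertex AI URL construction may differ.
def _get_vertex_api_base(vertex_location: str) -> str:
    if vertex_location == "global":
        # The global location has no region prefix on the hostname.
        return "https://aiplatform.googleapis.com"
    # Regional locations keep the region prefix, e.g. us-central1-aiplatform.googleapis.com
    return f"https://{vertex_location}-aiplatform.googleapis.com"
```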
* fix(responses): Add image generation support for Responses API
Fixes #16227
## Problem
When using Gemini 2.5 Flash Image with /responses endpoint, image generation
outputs were not being returned correctly. The response contained only text
with empty content instead of the generated images.
## Solution
1. Created new `OutputImageGenerationCall` type for image generation outputs
2. Modified `_extract_message_output_items()` to detect images in completion responses
3. Added `_extract_image_generation_output_items()` to transform images from
completion format (data URL) to responses format (pure base64)
4. Added `_extract_base64_from_data_url()` helper to extract base64 from data URLs
5. Updated `ResponsesAPIResponse.output` type to include `OutputImageGenerationCall`
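A minimal sketch of the helper named in step 4 above; the real implementation in transformation.py may differ:
```python
# Sketch of the data-URL handling described in steps 3-4; illustrative only.
def _extract_base64_from_data_url(data_url: str) -> str:
    """Turn 'data:image/png;base64,iVBOR...' into the bare base64 payload."""
    if data_url.startswith("data:") and "," in data_url:
        return data_url.split(",", 1)[1]
    # Not a data URL: assume it is already plain base64 and return it unchanged.
    return data_url
```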
## Changes
- litellm/types/responses/main.py: Added OutputImageGenerationCall type
- litellm/types/llms/openai.py: Updated ResponsesAPIResponse.output type
- litellm/responses/litellm_completion_transformation/transformation.py:
Added image detection and extraction logic
- tests/test_litellm/responses/litellm_completion_transformation/test_image_generation_output.py:
Added comprehensive unit tests (16 tests, all passing)
## Result
/responses endpoint now correctly returns:
```json
{
  "output": [{
    "type": "image_generation_call",
    "id": "..._img_0",
    "status": "completed",
    "result": "iVBORw0KGgo..."  // Pure base64, no data: prefix
  }]
}
```
This matches the OpenAI Responses API specification, where image generation
outputs have type "image_generation_call" and base64 data in the "result" field.
* docs(responses): Add image generation documentation and tests
- Add comprehensive image generation documentation to response_api.md
- Include examples for Gemini (no tools param) and OpenAI (with tools param)
- Document response format and base64 handling
- Add supported models table with provider-specific requirements
- Add unit tests for image generation output transformation
- Test base64 extraction from data URLs
- Test image generation output item creation
- Test status mapping and integration scenarios
- Verify proper transformation from completions to responses format
Related to #16227
* fix(responses): Correct status type for image generation output
- Add _map_finish_reason_to_image_generation_status() helper function
- Fix MyPy type error: OutputImageGenerationCall.status only accepts
['in_progress', 'completed', 'incomplete', 'failed'], not the full
ResponsesAPIStatus union which includes 'cancelled' and 'queued'
Fixes MyPy error in transformation.py:838
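An illustrative version of the helper named above; the exact finish_reason mapping is an assumption, and the point is only that the return type is narrowed to the four literals MyPy accepts:
```python
# Mapping below is an assumption; only the narrowed return type mirrors the commit.
from typing import Literal

ImageGenerationStatus = Literal["in_progress", "completed", "incomplete", "failed"]


def _map_finish_reason_to_image_generation_status(finish_reason: str) -> ImageGenerationStatus:
    if finish_reason == "stop":
        return "completed"
    if finish_reason in ("length", "content_filter"):
        return "incomplete"
    # Anything unrecognized is treated as failed; 'cancelled'/'queued' from the
    # wider ResponsesAPIStatus union are never produced here.
    return "failed"
```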
When Gemini image generation models return `text_tokens=0` with `image_tokens > 0`,
the cost calculator was assuming no token breakdown existed and treating all
completion tokens as text tokens, resulting in ~10x underestimation of costs.
Changes:
- Fix cost calculation logic to respect the token breakdown when image/audio/reasoning
tokens are present, even if text_tokens=0 (see the sketch below)
- Add `output_cost_per_image_token` pricing for gemini-3-pro-image-preview models
- Add test case reproducing the issue
- Add documentation explaining image token pricing
Fixes #17410
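A hedged sketch of the corrected decision: any modality-specific token count means the breakdown is real and must be respected, even when text_tokens is 0. Field names mirror the usage details mentioned above; the surrounding cost function is simplified for illustration:
```python
# Illustrative helpers; not the actual litellm cost_calculator code.
def _has_token_breakdown(details) -> bool:
    return any(
        (getattr(details, field, 0) or 0) > 0
        for field in ("text_tokens", "image_tokens", "audio_tokens", "reasoning_tokens")
    )


def _completion_cost(details, completion_tokens: int, text_price: float, image_price: float) -> float:
    if details is None or not _has_token_breakdown(details):
        # Genuinely no breakdown: fall back to billing everything as text tokens.
        return completion_tokens * text_price
    # Breakdown present (even with text_tokens == 0): bill each modality at its own rate.
    return (details.text_tokens or 0) * text_price + (details.image_tokens or 0) * image_price
```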
This enables Oracle Cloud Infrastructure (OCI) GenAI authentication via the UI
by allowing users to paste their PEM private key content directly into a
multiline textarea field.
Changes:
- Add `textarea` field type to UI component system
- Configure OCI provider with proper credential fields (oci_key, oci_user,
oci_fingerprint, oci_tenancy, oci_region, oci_compartment_id)
- Handle PEM content newline normalization (literal \\n and \r\n sequences are converted to real newlines; see the sketch below)
- Use OCIError for consistent error handling
Previously OCI only supported file-based authentication (oci_key_file), which
doesn't work for UI-based model configuration. This adds support for inline
PEM content via the new oci_key field.
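A sketch of that normalization (the helper name is illustrative):
```python
def _normalize_pem_content(oci_key: str) -> str:
    # Keys pasted into a textarea often arrive with escaped "\n" sequences or
    # Windows line endings; convert both to real LF newlines so the PEM parses.
    normalized = oci_key.replace("\\n", "\n")
    normalized = normalized.replace("\r\n", "\n")
    return normalized
```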
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <noreply@anthropic.com>
Fixes #17425
- Add length check for tool_calls in model_response.choices[0].delta
- Prevents empty tool call objects from appearing in streaming responses
- Add regression tests for empty and valid tool_calls scenarios
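A hedged sketch of the length check, expressed as a standalone predicate; the real check sits inline in the streaming handler:
```python
# Sketch only; the actual guard lives inline where the delta is built.
def _should_emit_tool_calls(delta) -> bool:
    """Return True only when delta.tool_calls exists and is non-empty, so
    streaming chunks never carry an empty tool_calls list."""
    tool_calls = getattr(delta, "tool_calls", None)
    return tool_calls is not None and len(tool_calls) > 0
```
Used as `if _should_emit_tool_calls(model_response.choices[0].delta): ...` before attaching tool calls to the chunk.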
The previous implementation incorrectly used `thoughtSignature` as the criterion
to detect thinking blocks. However, per Google's docs:
- `thought: true` indicates that a part contains reasoning/thinking content
- `thoughtSignature` is just a token for multi-turn context preservation
(a part can have thoughtSignature without thought:true, e.g., function calls)
This caused functionCall data to leak into reasoning_content when using
Gemini 2.5 Pro with streaming + tools enabled.
Changes:
- _extract_thinking_blocks_from_parts now checks `part.get("thought") is True`
- Extract actual text content instead of json.dumps(part)
- Include signature only when present (optional in Gemini 2.5)
Refs:
- https://ai.google.dev/gemini-api/docs/thinking
- https://ai.google.dev/gemini-api/docs/thought-signatures
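A simplified sketch of the corrected extraction; the helper name matches the Changes above, while the body is illustrative:
```python
# Illustrative body; only the helper name comes from the commit message.
from typing import Dict, List, Optional


def _extract_thinking_blocks_from_parts(parts: List[Dict]) -> List[Dict]:
    thinking_blocks: List[Dict] = []
    for part in parts:
        # `thought: true` is the actual marker for reasoning content;
        # `thoughtSignature` alone (e.g. on function calls) is not.
        if part.get("thought") is True:
            block: Dict = {"type": "thinking", "thinking": part.get("text", "")}
            signature: Optional[str] = part.get("thoughtSignature")
            if signature:  # optional in Gemini 2.5
                block["signature"] = signature
            thinking_blocks.append(block)
    return thinking_blocks
```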
- Skip empty/whitespace text before calling Presidio API
- Handle error dict responses gracefully (e.g., {'error': 'No text provided'})
- Add defensive error handling for invalid result items
- Add comprehensive test coverage for empty content scenarios
Fixes crash in tool/function calling where assistant messages have empty content.
Fixes #17552
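A hedged sketch of the guards described above; names are illustrative rather than the exact Presidio hook code:
```python
# Illustrative helpers, not the actual LiteLLM Presidio integration.
def _should_call_presidio(text) -> bool:
    """Skip the Presidio API entirely for empty or whitespace-only content
    (e.g. assistant messages that only carry tool calls)."""
    return isinstance(text, str) and bool(text.strip())


def _parse_presidio_results(response) -> list:
    """Handle error dicts like {'error': 'No text provided'} and drop
    malformed result items instead of crashing."""
    if isinstance(response, dict) and "error" in response:
        return []
    if not isinstance(response, list):
        return []
    return [item for item in response if isinstance(item, dict)]
```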
- Change Prisma include from 'users' to 'members'
- Use LiteLLM_OrganizationTableWithMembers type for membership validation
- Access organization.members instead of organization.users
- Add tests for membership validation
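A hypothetical sketch of the membership check after the change (prisma-client-py style; accessor and field names are assumptions based on the bullets above):
```python
# Hypothetical sketch; table accessor and field names are assumed, not copied.
async def user_is_org_member(prisma_client, organization_id: str, user_id: str) -> bool:
    organization = await prisma_client.db.litellm_organizationtable.find_unique(
        where={"organization_id": organization_id},
        include={"members": True},  # was include={"users": True}
    )
    if organization is None:
        return False
    return any(member.user_id == user_id for member in (organization.members or []))
```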
* fix async_log_success_event for _PROXY_DynamicRateLimitHandlerV3
* test_async_log_success_event_increments_by_actual_tokens
* fix redis TTL
* Potential fix for code scanning alert no. 3873: Clear-text logging of sensitive information
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Fix Bedrock guardrail apply_guardrail method and test mocks
Fixed 4 failing tests in the guardrail test suite:
1. BedrockGuardrail.apply_guardrail now returns the original texts when the
guardrail allows content but doesn't provide output/outputs fields. Previously
it returned an empty list, causing test_bedrock_apply_guardrail_success to fail.
2. Updated test mocks to use correct Bedrock API response format:
- Changed from 'content' field to 'output' field
- Fixed nested structure from {'text': {'text': '...'}} to {'text': '...'}
- Added missing 'output' field in filter test
3. Fixed endpoint test mocks to return GenericGuardrailAPIInputs format:
- Changed from tuple (List[str], Optional[List[str]]) to dict {'texts': [...]}
- Updated method call assertions to use 'inputs' parameter correctly
All 12 guardrail tests now pass successfully.
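For reference, the corrected mock shape from point 2 looks roughly like this (values are illustrative):
```python
# Illustrative mock; field values are made up, only the shape mirrors point 2.
mock_bedrock_response = {
    "action": "GUARDRAIL_INTERVENED",
    # Masked/modified text lives under "output"/"outputs" with a flat
    # {"text": "..."} structure, not the nested {"text": {"text": "..."}} form.
    "output": [{"text": "My phone number is {PHONE}"}],
    "outputs": [{"text": "My phone number is {PHONE}"}],
}
```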
* fix: remove python3-dev from Dockerfile.build_from_pip to avoid Python version conflict
The base image cgr.dev/chainguard/python:latest-dev already includes Python 3.14
and its development tools. Installing python3-dev pulls Python 3.13 packages
which conflict with the existing Python 3.14 installation, causing file
ownership errors during apk install.
* fix: disable callbacks in vertex fine-tuning tests to prevent Datadog logging interference
The test was failing because Datadog logging was making an HTTP POST request
that was being caught by the mock, causing assert_called_once() to fail.
By disabling callbacks during the test, we prevent Datadog from making any
HTTP calls, allowing the mock to only see the Vertex AI API call.
* fix: ensure test isolation in test_logging_non_streaming_request
Add proper cleanup to restore original litellm.callbacks after test execution.
This prevents test interference when running as part of a larger test suite,
where global state pollution was causing async_log_success_event to be
called multiple times instead of once.
Fixes test failure where the test expected async_log_success_event to be
called once but was being called twice due to callbacks from previous tests
not being cleaned up.
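The cleanup pattern boils down to something like this (pytest-style sketch, not the exact test body):
```python
# Sketch of the cleanup pattern only; the real test registers its own mocks.
import litellm


def test_logging_non_streaming_request():
    original_callbacks = litellm.callbacks
    try:
        litellm.callbacks = []  # register only what this test needs
        ...  # run the request and assert async_log_success_event was called once
    finally:
        # Restore global state so later tests don't see leftover callbacks.
        litellm.callbacks = original_callbacks
```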
Add support for OpenAI's gpt-5.1-codex-max model, their most intelligent
coding model optimized for long-horizon agentic coding tasks.
- 400k context window, 128k max output tokens
- $1.25/1M input, $10/1M output, $0.125/1M cached input
- Only available via /v1/responses endpoint
- Supports vision, function calling, reasoning, prompt caching
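Converted to litellm's per-token pricing units, the listed numbers work out to roughly this (illustrative dict; the real entry lives in model_prices_and_context_window.json and carries more fields):
```python
# Illustrative; the actual JSON entry has additional fields and flags.
gpt_5_1_codex_max = {
    "max_input_tokens": 400_000,        # 400k context window
    "max_output_tokens": 128_000,       # 128k max output tokens
    "input_cost_per_token": 1.25 / 1e6,          # $1.25 per 1M input tokens
    "output_cost_per_token": 10.0 / 1e6,         # $10 per 1M output tokens
    "cache_read_input_token_cost": 0.125 / 1e6,  # $0.125 per 1M cached input tokens
}
```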
* fix(github_copilot): preserve encrypted_content in reasoning items for multi-turn conversations
GitHub Copilot uses encrypted_content in reasoning items to maintain conversation
state across turns. The parent class (OpenAIResponsesAPIConfig._handle_reasoning_item)
strips this field when converting to OpenAI's ResponseReasoningItem model, causing
"encrypted content could not be verified" errors on multi-turn requests.
This override preserves encrypted_content while still filtering out status=None
which OpenAI's API rejects.
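A hedged sketch of what the override does, written as a standalone post-processing helper rather than the actual subclass method:
```python
# Sketch only; the real code overrides _handle_reasoning_item in the
# GitHub Copilot responses config rather than using a helper like this.
def _preserve_encrypted_content(original_item: dict, handled_item: dict) -> dict:
    """Re-attach encrypted_content after the parent class has converted the
    reasoning item, and drop status=None, which OpenAI's API rejects."""
    if original_item.get("encrypted_content") is not None:
        handled_item["encrypted_content"] = original_item["encrypted_content"]
    if handled_item.get("status") is None:
        handled_item.pop("status", None)
    return handled_item
```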
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* chore: regenerate poetry.lock
* Revert "chore: regenerate poetry.lock"
This reverts commit 8796dc8f96.
---------
Co-authored-by: Claude <noreply@anthropic.com>