Mirror of https://github.com/BerriAI/litellm.git (synced 2025-12-06 11:33:26 +08:00)
fix(responses): Add image generation support for Responses API (#16586)
* fix(responses): Add image generation support for Responses API

Fixes #16227

## Problem

When using Gemini 2.5 Flash Image with the /responses endpoint, image generation outputs were not returned correctly. The response contained only text with empty content instead of the generated images.

## Solution

1. Created new `OutputImageGenerationCall` type for image generation outputs
2. Modified `_extract_message_output_items()` to detect images in completion responses
3. Added `_extract_image_generation_output_items()` to transform images from completion format (data URL) to responses format (pure base64)
4. Added `_extract_base64_from_data_url()` helper to extract base64 from data URLs
5. Updated `ResponsesAPIResponse.output` type to include `OutputImageGenerationCall`

## Changes

- litellm/types/responses/main.py: Added OutputImageGenerationCall type
- litellm/types/llms/openai.py: Updated ResponsesAPIResponse.output type
- litellm/responses/litellm_completion_transformation/transformation.py: Added image detection and extraction logic
- tests/test_litellm/responses/litellm_completion_transformation/test_image_generation_output.py: Added comprehensive unit tests (16 tests, all passing)

## Result

The /responses endpoint now correctly returns:

```json
{
  "output": [{
    "type": "image_generation_call",
    "id": "..._img_0",
    "status": "completed",
    "result": "iVBORw0KGgo..."  // Pure base64, no data: prefix
  }]
}
```

This matches the OpenAI Responses API specification, where image generation outputs have type "image_generation_call" with base64 data in the "result" field.
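The Solution steps above amount to a small data transformation, sketched here without LiteLLM installed. This is a simplified approximation, not the committed code: it works on plain dicts instead of the real `ModelResponse`/`Choices` objects, and `split_data_url` and `images_to_output_items` are hypothetical names. It follows the same id scheme (`<response id>_img_<index>`) and strips the `data:...,` prefix as described.

```python
from typing import Any, Dict, List, Optional


def split_data_url(url: str) -> Optional[str]:
    """Return the payload after the first comma of a data URL,
    or the input unchanged if it has no 'data:' prefix."""
    if not url:
        return None
    if url.startswith("data:"):
        # partition() returns an empty separator when there is no comma.
        _, sep, payload = url.partition(",")
        return payload if sep else None
    return url


def images_to_output_items(
    response_id: str, images: List[Dict[str, Any]], status: str = "completed"
) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn chat-completion style message.images entries
    into Responses-style image_generation_call items (plain dicts)."""
    items: List[Dict[str, Any]] = []
    for idx, image in enumerate(images):
        url = image.get("image_url", {}).get("url", "")
        payload = split_data_url(url)
        if payload:  # entries without a usable payload are skipped
            items.append(
                {
                    "type": "image_generation_call",
                    "id": f"{response_id}_img_{idx}",
                    "status": status,
                    "result": payload,
                }
            )
    return items


if __name__ == "__main__":
    images = [
        {"image_url": {"url": "data:image/png;base64,iVBORw0KGgo="}, "type": "image_url", "index": 0},
    ]
    print(images_to_output_items("resp_abc123", images))
```

Because `idx` comes from `enumerate` over all images, a skipped entry leaves a gap in the ids; the committed loop has the same property.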
* docs(responses): Add image generation documentation and tests

- Add comprehensive image generation documentation to response_api.md
- Include examples for Gemini (no tools param) and OpenAI (with tools param)
- Document response format and base64 handling
- Add supported models table with provider-specific requirements
- Add unit tests for image generation output transformation
- Test base64 extraction from data URLs
- Test image generation output item creation
- Test status mapping and integration scenarios
- Verify proper transformation from completions to responses format

Related to #16227

* fix(responses): Correct status type for image generation output

- Add _map_finish_reason_to_image_generation_status() helper function
- Fix MyPy type error: OutputImageGenerationCall.status only accepts ['in_progress', 'completed', 'incomplete', 'failed'], not the full ResponsesAPIStatus union which includes 'cancelled' and 'queued'

Fixes MyPy error in transformation.py:838
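The MyPy fix comes down to returning a narrower `Literal` than the general status union. A minimal standalone sketch of that idea follows; `ImageGenerationStatus` and `map_finish_reason` are illustrative names, and the branch table copies the mapping used by this commit's `_map_finish_reason_to_image_generation_status`. `typing.get_args` is used only to check at runtime that every branch stays inside the allowed set.

```python
from typing import Literal, Optional, get_args

# Narrow status type accepted by an image_generation_call item
# (no "cancelled" or "queued", unlike the general Responses API status).
ImageGenerationStatus = Literal["in_progress", "completed", "incomplete", "failed"]


def map_finish_reason(finish_reason: Optional[str]) -> ImageGenerationStatus:
    """Map a chat-completion finish_reason onto the narrow status type."""
    if finish_reason == "stop":
        return "completed"
    if finish_reason == "length":
        return "incomplete"
    if finish_reason in ("content_filter", "error"):
        return "failed"
    # Anything else (including None) is treated as completed.
    return "completed"


# The set of strings the Literal allows, extracted at runtime.
ALLOWED_STATUSES = frozenset(get_args(ImageGenerationStatus))

if __name__ == "__main__":
    for reason in ("stop", "length", "content_filter", "error", "tool_calls", None):
        status = map_finish_reason(reason)
        assert status in ALLOWED_STATUSES
        print(f"{reason!r:>16} -> {status}")
```

Note that this mapping never produces `"in_progress"`; that value is only part of the permitted type.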
@@ -81,6 +81,85 @@ for event in stream:
 f.write(image_bytes)
 ```
 
+#### Image Generation (Non-streaming)
+
+Image generation is supported for models that generate images. Generated images are returned in the `output` array with `type: "image_generation_call"`.
+
+**Gemini (Google AI Studio):**
+```python showLineNumbers title="Gemini Image Generation"
+import litellm
+import base64
+
+# Gemini image generation models don't require tools parameter
+response = litellm.responses(
+    model="gemini/gemini-2.5-flash-image",
+    input="Generate a cute cat playing with yarn"
+)
+
+# Access generated images from output
+for item in response.output:
+    if item.type == "image_generation_call":
+        # item.result contains pure base64 (no data: prefix)
+        image_bytes = base64.b64decode(item.result)
+
+        # Save the image
+        with open(f"generated_{item.id}.png", "wb") as f:
+            f.write(image_bytes)
+
+print(f"Image saved: generated_{response.output[0].id}.png")
+```
+
+**OpenAI:**
+```python showLineNumbers title="OpenAI Image Generation"
+import litellm
+import base64
+
+# OpenAI models require tools parameter for image generation
+response = litellm.responses(
+    model="openai/gpt-4o",
+    input="Generate a futuristic city at sunset",
+    tools=[{"type": "image_generation"}]
+)
+
+# Access generated images from output
+for item in response.output:
+    if item.type == "image_generation_call":
+        image_bytes = base64.b64decode(item.result)
+        with open(f"generated_{item.id}.png", "wb") as f:
+            f.write(image_bytes)
+```
+
+**Response Format:**
+
+When image generation is successful, the response contains:
+
+```json
+{
+  "id": "resp_abc123",
+  "status": "completed",
+  "output": [
+    {
+      "type": "image_generation_call",
+      "id": "resp_abc123_img_0",
+      "status": "completed",
+      "result": "iVBORw0KGgo..."  // Pure base64 string (no data: prefix)
+    }
+  ]
+}
+```
+
+**Supported Models:**
+
+| Provider | Models | Requires `tools` Parameter |
+|----------|--------|---------------------------|
+| Google AI Studio | `gemini/gemini-2.5-flash-image` | ❌ No |
+| Vertex AI | `vertex_ai/gemini-2.5-flash-image-preview` | ❌ No |
+| OpenAI | `gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `o3` | ✅ Yes |
+| AWS Bedrock | Stability AI, Amazon Nova Canvas models | Model-specific |
+| Fal AI | Various image generation models | Check model docs |
+
+**Note:** The `result` field contains pure base64-encoded image data without the `data:image/png;base64,` prefix. You must decode it with `base64.b64decode()` before saving.
+
 #### GET a Response
 ```python showLineNumbers title="Get Response by ID"
 import litellm
@@ -39,6 +39,7 @@ from litellm.types.responses.main import (
     GenericResponseOutputItem,
     GenericResponseOutputItemContentAnnotation,
     OutputFunctionToolCall,
+    OutputImageGenerationCall,
     OutputText,
 )
 from litellm.types.utils import (
@@ -830,9 +831,9 @@ class LiteLLMCompletionResponsesConfig:
     def _transform_chat_completion_choices_to_responses_output(
         chat_completion_response: ModelResponse,
         choices: List[Choices],
-    ) -> List[Union[GenericResponseOutputItem, OutputFunctionToolCall]]:
+    ) -> List[Union[GenericResponseOutputItem, OutputFunctionToolCall, OutputImageGenerationCall]]:
         responses_output: List[
-            Union[GenericResponseOutputItem, OutputFunctionToolCall]
+            Union[GenericResponseOutputItem, OutputFunctionToolCall, OutputImageGenerationCall]
         ] = []
 
         responses_output.extend(
@@ -881,28 +882,130 @@ class LiteLLMCompletionResponsesConfig:
             ]
         return []
 
+    @staticmethod
+    def _extract_image_generation_output_items(
+        chat_completion_response: ModelResponse,
+        choice: Choices,
+    ) -> List[OutputImageGenerationCall]:
+        """
+        Extract image generation outputs from a choice that contains images.
+
+        Transforms message.images from chat completion format:
+        {
+            'image_url': {'url': 'data:image/png;base64,...'},
+            'type': 'image_url',
+            'index': 0
+        }
+
+        To Responses API format:
+        {
+            'type': 'image_generation_call',
+            'id': 'img_...',
+            'status': 'completed',
+            'result': 'iVBORw0...'  # Pure base64 without data: prefix
+        }
+        """
+        image_generation_items: List[OutputImageGenerationCall] = []
+
+        images = getattr(choice.message, 'images', [])
+        if not images:
+            return image_generation_items
+
+        for idx, image_item in enumerate(images):
+            # Extract base64 from data URL
+            image_url = image_item.get('image_url', {}).get('url', '')
+            base64_data = LiteLLMCompletionResponsesConfig._extract_base64_from_data_url(image_url)
+
+            if base64_data:
+                image_generation_items.append(
+                    OutputImageGenerationCall(
+                        type="image_generation_call",
+                        id=f"{chat_completion_response.id}_img_{idx}",
+                        status=LiteLLMCompletionResponsesConfig._map_finish_reason_to_image_generation_status(
+                            choice.finish_reason
+                        ),
+                        result=base64_data,
+                    )
+                )
+
+        return image_generation_items
+
+    @staticmethod
+    def _map_finish_reason_to_image_generation_status(
+        finish_reason: Optional[str],
+    ) -> Literal["in_progress", "completed", "incomplete", "failed"]:
+        """
+        Map finish_reason to image generation status.
+
+        Image generation status only supports: in_progress, completed, incomplete, failed
+        (does not support: cancelled, queued like general ResponsesAPIStatus)
+        """
+        if finish_reason == "stop":
+            return "completed"
+        elif finish_reason == "length":
+            return "incomplete"
+        elif finish_reason in ["content_filter", "error"]:
+            return "failed"
+        else:
+            # Default to completed for other cases
+            return "completed"
+
+    @staticmethod
+    def _extract_base64_from_data_url(data_url: str) -> Optional[str]:
+        """
+        Extract pure base64 string from a data URL.
+
+        Input: 'data:image/png;base64,iVBORw0KGgoAAAANS...'
+        Output: 'iVBORw0KGgoAAAANS...'
+
+        If input is already pure base64 (no prefix), return as-is.
+        """
+        if not data_url:
+            return None
+
+        # Check if it's a data URL with prefix
+        if data_url.startswith('data:'):
+            # Split by comma to separate prefix from base64 data
+            parts = data_url.split(',', 1)
+            if len(parts) == 2:
+                return parts[1]  # Return the base64 part
+            return None
+        else:
+            # Already pure base64
+            return data_url
+
     @staticmethod
     def _extract_message_output_items(
         chat_completion_response: ModelResponse,
         choices: List[Choices],
-    ) -> List[GenericResponseOutputItem]:
-        message_output_items = []
+    ) -> List[Union[GenericResponseOutputItem, OutputImageGenerationCall]]:
+        message_output_items: List[Union[GenericResponseOutputItem, OutputImageGenerationCall]] = []
         for choice in choices:
-            message_output_items.append(
-                GenericResponseOutputItem(
-                    type="message",
-                    id=chat_completion_response.id,
-                    status=LiteLLMCompletionResponsesConfig._map_chat_completion_finish_reason_to_responses_status(
-                        choice.finish_reason
-                    ),
-                    role=choice.message.role,
-                    content=[
-                        LiteLLMCompletionResponsesConfig._transform_chat_message_to_response_output_text(
-                            choice.message
-                        )
-                    ],
+            # Check if message has images (image generation)
+            if hasattr(choice.message, 'images') and choice.message.images:
+                # Extract image generation output
+                image_generation_items = LiteLLMCompletionResponsesConfig._extract_image_generation_output_items(
+                    chat_completion_response=chat_completion_response,
+                    choice=choice,
+                )
+                message_output_items.extend(image_generation_items)
+            else:
+                # Regular message output
+                message_output_items.append(
+                    GenericResponseOutputItem(
+                        type="message",
+                        id=chat_completion_response.id,
+                        status=LiteLLMCompletionResponsesConfig._map_chat_completion_finish_reason_to_responses_status(
+                            choice.finish_reason
+                        ),
+                        role=choice.message.role,
+                        content=[
+                            LiteLLMCompletionResponsesConfig._transform_chat_message_to_response_output_text(
+                                choice.message
+                            )
+                        ],
+                    )
                 )
-            )
         return message_output_items
 
     @staticmethod
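The `_extract_base64_from_data_url` helper in this hunk treats everything after the first comma of a `data:` URL as base64; it does not check the `;base64` flag or the media type. If stricter handling were wanted, one possible variant (an assumption, not what this commit does) follows RFC 2397's `data:[<mediatype>][;base64],<data>` layout, validates the payload, and also returns the media type so a caller could choose a file extension. `parse_base64_data_url` and `ParsedDataUrl` are hypothetical names. Unlike the committed helper, it returns `None` for input without a `data:` prefix, so callers wanting the pass-through behavior would check for that case first.

```python
import base64
import binascii
from typing import NamedTuple, Optional


class ParsedDataUrl(NamedTuple):
    media_type: str  # e.g. "image/png"
    payload: str     # base64 text without the "data:...," prefix


def parse_base64_data_url(url: str) -> Optional[ParsedDataUrl]:
    """Parse 'data:<mediatype>;base64,<payload>'.

    Returns None unless the input is a data URL flagged as base64
    with a non-empty payload that decodes cleanly.
    """
    if not url or not url.startswith("data:"):
        return None
    header, sep, payload = url[len("data:"):].partition(",")
    if not sep or not payload:
        return None  # no comma or nothing after it
    params = header.split(";")
    if params[-1].strip().lower() != "base64":
        return None  # percent-encoded (non-base64) data URL
    # RFC 2397 defaults an omitted media type to text/plain (charset dropped here).
    media_type = params[0] or "text/plain"
    try:
        base64.b64decode(payload, validate=True)
    except binascii.Error:
        return None
    return ParsedDataUrl(media_type=media_type, payload=payload)


if __name__ == "__main__":
    print(parse_base64_data_url("data:image/png;base64,iVBORw0KGgo="))
    print(parse_base64_data_url("data:text/plain,hello"))
```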
@@ -76,6 +76,7 @@ from litellm.types.llms.base import BaseLiteLLMOpenAIResponseObject
 from litellm.types.responses.main import (
     GenericResponseOutputItem,
     OutputFunctionToolCall,
+    OutputImageGenerationCall,
 )
 
 FileContent = Union[IO[bytes], bytes, PathLike]
@@ -1071,7 +1072,7 @@ class ResponsesAPIResponse(BaseLiteLLMOpenAIResponseObject):
     object: Optional[str] = None
     output: Union[
         List[Union[ResponseOutputItem, Dict]],
-        List[Union[GenericResponseOutputItem, OutputFunctionToolCall]],
+        List[Union[GenericResponseOutputItem, OutputFunctionToolCall, OutputImageGenerationCall]],
     ]
     parallel_tool_calls: Optional[bool] = None
     temperature: Optional[float] = None
@@ -36,6 +36,15 @@ class OutputFunctionToolCall(BaseLiteLLMOpenAIResponseObject):
     status: Literal["in_progress", "completed", "incomplete"]
 
 
+class OutputImageGenerationCall(BaseLiteLLMOpenAIResponseObject):
+    """An image generation call output"""
+
+    type: Literal["image_generation_call"]
+    id: str
+    status: Literal["in_progress", "completed", "incomplete", "failed"]
+    result: Optional[str]  # Base64 encoded image data (without data:image prefix)
+
+
 class GenericResponseOutputItem(BaseLiteLLMOpenAIResponseObject):
     """
     Generic response API output item
@@ -0,0 +1,172 @@
+"""
+Unit tests for Responses API image generation support
+
+Tests the fix for Issue #16227:
+https://github.com/BerriAI/litellm/issues/16227
+
+Verifies that image generation outputs are correctly transformed
+from /chat/completions format to /responses API format.
+"""
+import pytest
+from unittest.mock import Mock
+from litellm.responses.litellm_completion_transformation.transformation import (
+    LiteLLMCompletionResponsesConfig,
+)
+from litellm.types.responses.main import OutputImageGenerationCall
+from litellm.types.utils import ModelResponse, Choices, Message
+
+
+class TestExtractBase64FromDataUrl:
+    """Tests for _extract_base64_from_data_url helper function"""
+
+    def test_extracts_base64_from_data_url(self):
+        """Should extract pure base64 from data URL with prefix"""
+        data_url = "data:image/png;base64,iVBORw0KGgoAAAANS"
+        result = LiteLLMCompletionResponsesConfig._extract_base64_from_data_url(
+            data_url
+        )
+        assert result == "iVBORw0KGgoAAAANS"
+
+    def test_returns_base64_as_is_if_no_prefix(self):
+        """Should return base64 as-is if no data: prefix"""
+        pure_base64 = "iVBORw0KGgoAAAANS"
+        result = LiteLLMCompletionResponsesConfig._extract_base64_from_data_url(
+            pure_base64
+        )
+        assert result == pure_base64
+
+    def test_handles_invalid_inputs(self):
+        """Should return None for empty/None/malformed inputs"""
+        assert LiteLLMCompletionResponsesConfig._extract_base64_from_data_url("") is None
+        assert LiteLLMCompletionResponsesConfig._extract_base64_from_data_url(None) is None
+        assert LiteLLMCompletionResponsesConfig._extract_base64_from_data_url("data:image/png;base64") is None
+
+
+class TestExtractImageGenerationOutputItems:
+    """Tests for _extract_image_generation_output_items function"""
+
+    def test_extracts_images_correctly(self):
+        """Should extract OutputImageGenerationCall objects from images"""
+        mock_response = Mock(spec=ModelResponse)
+        mock_response.id = "test_123"
+
+        mock_message = Mock(spec=Message)
+        mock_message.images = [
+            {"image_url": {"url": "data:image/png;base64,IMG1"}, "type": "image_url", "index": 0},
+            {"image_url": {"url": "data:image/png;base64,IMG2"}, "type": "image_url", "index": 1},
+        ]
+
+        mock_choice = Mock(spec=Choices)
+        mock_choice.message = mock_message
+        mock_choice.finish_reason = "stop"
+
+        result = LiteLLMCompletionResponsesConfig._extract_image_generation_output_items(
+            chat_completion_response=mock_response,
+            choice=mock_choice,
+        )
+
+        assert len(result) == 2
+        assert result[0].type == "image_generation_call"
+        assert result[0].result == "IMG1"
+        assert result[1].result == "IMG2"
+        assert result[0].id == "test_123_img_0"
+        assert result[1].id == "test_123_img_1"
+        assert result[0].status == "completed"
+
+    def test_returns_empty_for_no_images(self):
+        """Should return empty list if no images"""
+        mock_response = Mock(spec=ModelResponse)
+        mock_message = Mock(spec=Message)
+        mock_message.images = []
+
+        mock_choice = Mock(spec=Choices)
+        mock_choice.message = mock_message
+        mock_choice.finish_reason = "stop"
+
+        result = LiteLLMCompletionResponsesConfig._extract_image_generation_output_items(
+            chat_completion_response=mock_response,
+            choice=mock_choice,
+        )
+
+        assert result == []
+
+    def test_maps_finish_reason_to_status(self):
+        """Should correctly map finish_reason to status"""
+        mock_response = Mock(spec=ModelResponse)
+        mock_response.id = "test_finish"
+
+        mock_message = Mock(spec=Message)
+        mock_message.images = [
+            {"image_url": {"url": ""}, "type": "image_url", "index": 0}
+        ]
+
+        mock_choice = Mock(spec=Choices)
+        mock_choice.message = mock_message
+        mock_choice.finish_reason = "length"
+
+        result = LiteLLMCompletionResponsesConfig._extract_image_generation_output_items(
+            chat_completion_response=mock_response,
+            choice=mock_choice,
+        )
+
+        assert result[0].status == "incomplete"
+
+
+class TestExtractMessageOutputItemsIntegration:
+    """Integration tests for _extract_message_output_items with images"""
+
+    def test_detects_images_and_creates_image_generation_call(self):
+        """Should detect images in message and create image_generation_call output"""
+        mock_response = Mock(spec=ModelResponse)
+        mock_response.id = "integration_test_123"
+
+        mock_message = Mock(spec=Message)
+        mock_message.images = [
+            {
+                "image_url": {"url": "data:image/png;base64,INTEGRATION_TEST"},
+                "type": "image_url",
+                "index": 0,
+            }
+        ]
+        mock_message.role = "assistant"
+        mock_message.content = "Here's your image!"
+
+        mock_choice = Mock(spec=Choices)
+        mock_choice.message = mock_message
+        mock_choice.finish_reason = "stop"
+
+        result = LiteLLMCompletionResponsesConfig._extract_message_output_items(
+            chat_completion_response=mock_response,
+            choices=[mock_choice],
+        )
+
+        # Should return image_generation_call, NOT regular message
+        assert len(result) == 1
+        assert isinstance(result[0], OutputImageGenerationCall)
+        assert result[0].type == "image_generation_call"
+        assert result[0].result == "INTEGRATION_TEST"
+
+    def test_creates_regular_message_when_no_images(self):
+        """Should create regular GenericResponseOutputItem when no images"""
+        from litellm.types.responses.main import GenericResponseOutputItem
+
+        mock_response = Mock(spec=ModelResponse)
+        mock_response.id = "no_images_123"
+
+        mock_message = Mock(spec=Message)
+        # No images attribute or empty
+        mock_message.role = "assistant"
+        mock_message.content = "Just text, no images"
+
+        mock_choice = Mock(spec=Choices)
+        mock_choice.message = mock_message
+        mock_choice.finish_reason = "stop"
+
+        result = LiteLLMCompletionResponsesConfig._extract_message_output_items(
+            chat_completion_response=mock_response,
+            choices=[mock_choice],
+        )
+
+        assert len(result) == 1
+        assert isinstance(result[0], GenericResponseOutputItem)
+        assert result[0].type == "message"
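The tests above use short placeholder payloads. A complementary round-trip check would encode real bytes, wrap them in a data URL, and confirm that stripping the prefix and decoding returns the original bytes. The sketch below is self-contained, so it uses a local stand-in, `extract_base64_from_data_url`, that copies the logic of the helper added in this commit instead of importing LiteLLM; with LiteLLM installed, `LiteLLMCompletionResponsesConfig._extract_base64_from_data_url` could be substituted.

```python
import base64


def extract_base64_from_data_url(data_url):
    # Local stand-in with the same logic as the helper added in this commit.
    if not data_url:
        return None
    if data_url.startswith("data:"):
        parts = data_url.split(",", 1)
        if len(parts) == 2:
            return parts[1]
        return None
    return data_url


def test_round_trip_preserves_bytes():
    original = bytes(range(256))  # every byte value, including non-printable ones
    b64 = base64.b64encode(original).decode("ascii")
    for mime in ("image/png", "image/jpeg", "image/webp"):
        extracted = extract_base64_from_data_url(f"data:{mime};base64,{b64}")
        assert extracted == b64
        assert base64.b64decode(extracted) == original


def test_pure_base64_passes_through_unchanged():
    b64 = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
    assert extract_base64_from_data_url(b64) == b64


if __name__ == "__main__":
    test_round_trip_preserves_bytes()
    test_pure_base64_passes_through_unchanged()
    print("ok")
```

Splitting on the first comma is safe here because the standard base64 alphabet never contains a comma, so the payload cannot be cut short.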