fix(responses): Add image generation support for Responses API (#16586)

* fix(responses): Add image generation support for Responses API

Fixes #16227

## Problem

When using Gemini 2.5 Flash Image with the /responses endpoint, image generation outputs were not returned correctly. The response contained only a text message with empty content instead of the generated images.

## Solution

1. Created a new `OutputImageGenerationCall` type for image generation outputs
2. Modified `_extract_message_output_items()` to detect images in completion responses
3. Added `_extract_image_generation_output_items()` to transform images from completion format (data URL) to Responses format (pure base64)
4. Added an `_extract_base64_from_data_url()` helper to extract base64 from data URLs
5. Updated the `ResponsesAPIResponse.output` type to include `OutputImageGenerationCall`

## Changes

- litellm/types/responses/main.py: Added OutputImageGenerationCall type
- litellm/types/llms/openai.py: Updated ResponsesAPIResponse.output type
- litellm/responses/litellm_completion_transformation/transformation.py: Added image detection and extraction logic
- tests/test_litellm/responses/litellm_completion_transformation/test_image_generation_output.py: Added comprehensive unit tests (16 tests, all passing)

## Result

The /responses endpoint now correctly returns the following, where `result` holds pure base64 with no `data:` prefix:

```json
{
  "output": [{
    "type": "image_generation_call",
    "id": "..._img_0",
    "status": "completed",
    "result": "iVBORw0KGgo..."
  }]
}
```

This matches the OpenAI Responses API specification, where image generation outputs have type `image_generation_call` with base64 data in the `result` field.
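The data URL to Responses item transformation can be sketched without litellm as follows. This is a minimal standalone illustration under stated assumptions, not the code in this commit: `extract_base64_from_data_url` and `images_to_output_items` are hypothetical names, and plain dicts stand in for `OutputImageGenerationCall` and for the chat completion `message.images` entries.

```python
import base64
from typing import Any, Dict, List, Optional


def extract_base64_from_data_url(data_url: Optional[str]) -> Optional[str]:
    """Strip a 'data:<mime>;base64,' prefix; return plain base64 unchanged."""
    if not data_url:
        return None
    if data_url.startswith("data:"):
        # Everything after the first comma is the payload; no comma means malformed.
        _, sep, payload = data_url.partition(",")
        return payload if sep else None
    # No data: prefix, so assume the string is already pure base64.
    return data_url


def images_to_output_items(
    response_id: str,
    images: List[Dict[str, Any]],
    status: str = "completed",
) -> List[Dict[str, Any]]:
    """Hypothetical helper: convert image entries into image_generation_call dicts."""
    items: List[Dict[str, Any]] = []
    for idx, image in enumerate(images):
        b64 = extract_base64_from_data_url(image.get("image_url", {}).get("url", ""))
        if b64:  # skip entries with no usable payload
            items.append(
                {
                    "type": "image_generation_call",
                    "id": f"{response_id}_img_{idx}",
                    "status": status,
                    "result": b64,
                }
            )
    return items


if __name__ == "__main__":
    # Encode the 8-byte PNG signature so the round trip can be checked.
    png_b64 = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode()
    images = [{"image_url": {"url": f"data:image/png;base64,{png_b64}"}, "type": "image_url", "index": 0}]
    for item in images_to_output_items("resp_abc123", images):
        print(item["id"], base64.b64decode(item["result"]) == b"\x89PNG\r\n\x1a\n")
```

As in the diff, the `_img_{idx}` suffix comes from the entry's position in `images`, so a skipped malformed entry leaves a gap in the numbering.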
* docs(responses): Add image generation documentation and tests

- Add comprehensive image generation documentation to response_api.md
- Include examples for Gemini (no tools param) and OpenAI (with tools param)
- Document response format and base64 handling
- Add supported models table with provider-specific requirements
- Add unit tests for image generation output transformation
  - Test base64 extraction from data URLs
  - Test image generation output item creation
  - Test status mapping and integration scenarios
  - Verify proper transformation from completions to responses format

Related to #16227

* fix(responses): Correct status type for image generation output

- Add _map_finish_reason_to_image_generation_status() helper function
- Fix MyPy type error: OutputImageGenerationCall.status only accepts ['in_progress', 'completed', 'incomplete', 'failed'], not the full ResponsesAPIStatus union, which also includes 'cancelled' and 'queued'

Fixes MyPy error in transformation.py:838
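The narrower status type can be illustrated with a small sketch. `ImageGenerationStatus` and `map_finish_reason_to_image_status` are hypothetical names that mirror `_map_finish_reason_to_image_generation_status`; annotating the return type with the four-value `Literal` lets MyPy reject any branch that would return `cancelled` or `queued`.

```python
from typing import Literal, Optional, get_args

# The four values OutputImageGenerationCall.status accepts in this commit.
ImageGenerationStatus = Literal["in_progress", "completed", "incomplete", "failed"]


def map_finish_reason_to_image_status(finish_reason: Optional[str]) -> ImageGenerationStatus:
    """Map a chat-completion finish_reason onto the image generation status subset."""
    if finish_reason == "length":
        # Output was cut off by the token limit.
        return "incomplete"
    if finish_reason in ("content_filter", "error"):
        return "failed"
    # "stop" and any unrecognized or missing reason default to completed, as in the diff.
    return "completed"


if __name__ == "__main__":
    for reason in ("stop", "length", "content_filter", "error", None):
        status = map_finish_reason_to_image_status(reason)
        assert status in get_args(ImageGenerationStatus)
        print(reason, "->", status)
```

Note that the mapping never returns `in_progress`; that value is only part of the allowed type.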
2025-12-06 11:33:26 +08:00 · 2025-12-05 20:56:26 -03:00
parent 829b06f53f
commit 87f94172a9
5 changed files with 383 additions and 19 deletions
--- a/docs/my-website/docs/response_api.md
+++ b/docs/my-website/docs/response_api.md
@@ -81,6 +81,85 @@ for event in stream:
            f.write(image_bytes)
 ```

+#### Image Generation (Non-streaming)
+
+The Responses API supports image generation models. Each generated image is returned as an item in the `output` array with `type: "image_generation_call"`.
+
+**Gemini (Google AI Studio):**
+```python showLineNumbers title="Gemini Image Generation"
+import litellm
+import base64
+
+# Gemini image generation models don't require the tools parameter
+response = litellm.responses(
+    model="gemini/gemini-2.5-flash-image",
+    input="Generate a cute cat playing with yarn"
+)
+
+# Access generated images from output
+for item in response.output:
+    if item.type == "image_generation_call":
+        # item.result contains pure base64 (no data: prefix)
+        image_bytes = base64.b64decode(item.result)
+
+        # Save the image and report where it was written
+        filename = f"generated_{item.id}.png"
+        with open(filename, "wb") as f:
+            f.write(image_bytes)
+        print(f"Image saved: {filename}")
+```
+
+**OpenAI:**
+```python showLineNumbers title="OpenAI Image Generation"
+import litellm
+import base64
+
+# OpenAI models require the tools parameter for image generation
+response = litellm.responses(
+    model="openai/gpt-4o",
+    input="Generate a futuristic city at sunset",
+    tools=[{"type": "image_generation"}]
+)
+
+# Access generated images from output
+for item in response.output:
+    if item.type == "image_generation_call":
+        image_bytes = base64.b64decode(item.result)
+        with open(f"generated_{item.id}.png", "wb") as f:
+            f.write(image_bytes)
+```
+
+**Response Format:**
+
+When image generation is successful, the response contains:
+
+```json
+{
+  "id": "resp_abc123",
+  "status": "completed",
+  "output": [
+    {
+      "type": "image_generation_call",
+      "id": "resp_abc123_img_0",
+      "status": "completed",
+      "result": "iVBORw0KGgo..."
+    }
+  ]
+}
+```
+
+**Supported Models:**
+
+| Provider | Models | Requires `tools` Parameter |
+|----------|--------|---------------------------|
+| Google AI Studio | `gemini/gemini-2.5-flash-image` | ❌ No |
+| Vertex AI | `vertex_ai/gemini-2.5-flash-image-preview` | ❌ No |
+| OpenAI | `gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `o3` | ✅ Yes |
+| AWS Bedrock | Stability AI, Amazon Nova Canvas models | Model-specific |
+| Fal AI | Various image generation models | Check model docs |
+
+**Note:** The `result` field contains pure base64-encoded image data without the `data:image/png;base64,` prefix. You must decode it with `base64.b64decode()` before saving.
+
 #### GET a Response
 ```python showLineNumbers title="Get Response by ID"
 import litellm
--- a/litellm/responses/litellm_completion_transformation/transformation.py
+++ b/litellm/responses/litellm_completion_transformation/transformation.py
@@ -39,6 +39,7 @@ from litellm.types.responses.main import (
    GenericResponseOutputItem,
    GenericResponseOutputItemContentAnnotation,
    OutputFunctionToolCall,
+    OutputImageGenerationCall,
    OutputText,
 )
 from litellm.types.utils import (
@@ -830,9 +831,9 @@ class LiteLLMCompletionResponsesConfig:
    def _transform_chat_completion_choices_to_responses_output(
        chat_completion_response: ModelResponse,
        choices: List[Choices],
-    ) -> List[Union[GenericResponseOutputItem, OutputFunctionToolCall]]:
+    ) -> List[Union[GenericResponseOutputItem, OutputFunctionToolCall, OutputImageGenerationCall]]:
        responses_output: List[
-            Union[GenericResponseOutputItem, OutputFunctionToolCall]
+            Union[GenericResponseOutputItem, OutputFunctionToolCall, OutputImageGenerationCall]
        ] = []

        responses_output.extend(
@@ -881,28 +882,130 @@ class LiteLLMCompletionResponsesConfig:
                    ]
        return []

+    @staticmethod
+    def _extract_image_generation_output_items(
+        chat_completion_response: ModelResponse,
+        choice: Choices,
+    ) -> List[OutputImageGenerationCall]:
+        """
+        Extract image generation outputs from a choice that contains images.
+
+        Transforms message.images from chat completion format:
+        {
+            'image_url': {'url': 'data:image/png;base64,iVBORw0...'},
+            'type': 'image_url',
+            'index': 0
+        }
+
+        To Responses API format:
+        {
+            'type': 'image_generation_call',
+            'id': 'img_...',
+            'status': 'completed',
+            'result': 'iVBORw0...'  # Pure base64 without data: prefix
+        }
+        """
+        image_generation_items: List[OutputImageGenerationCall] = []
+
+        images = getattr(choice.message, 'images', [])
+        if not images:
+            return image_generation_items
+
+        for idx, image_item in enumerate(images):
+            # Extract base64 from data URL
+            image_url = image_item.get('image_url', {}).get('url', '')
+            base64_data = LiteLLMCompletionResponsesConfig._extract_base64_from_data_url(image_url)
+
+            if base64_data:
+                image_generation_items.append(
+                    OutputImageGenerationCall(
+                        type="image_generation_call",
+                        id=f"{chat_completion_response.id}_img_{idx}",
+                        status=LiteLLMCompletionResponsesConfig._map_finish_reason_to_image_generation_status(
+                            choice.finish_reason
+                        ),
+                        result=base64_data,
+                    )
+                )
+
+        return image_generation_items
+
+    @staticmethod
+    def _map_finish_reason_to_image_generation_status(
+        finish_reason: Optional[str],
+    ) -> Literal["in_progress", "completed", "incomplete", "failed"]:
+        """
+        Map finish_reason to image generation status.
+
+        Image generation status only supports: in_progress, completed, incomplete, failed
+        (unlike the general ResponsesAPIStatus, it does not include cancelled or queued).
+        """
+        if finish_reason == "stop":
+            return "completed"
+        elif finish_reason == "length":
+            return "incomplete"
+        elif finish_reason in ["content_filter", "error"]:
+            return "failed"
+        else:
+            # Default to completed for other cases
+            return "completed"
+
+    @staticmethod
+    def _extract_base64_from_data_url(data_url: str) -> Optional[str]:
+        """
+        Extract pure base64 string from a data URL.
+
+        Input: 'data:image/png;base64,iVBORw0KGgoAAAANS...'
+        Output: 'iVBORw0KGgoAAAANS...'
+
+        If input is already pure base64 (no prefix), return as-is.
+        """
+        if not data_url:
+            return None
+
+        # Check if it's a data URL with prefix
+        if data_url.startswith('data:'):
+            # Split by comma to separate prefix from base64 data
+            parts = data_url.split(',', 1)
+            if len(parts) == 2:
+                return parts[1]  # Return the base64 part
+            return None
+        else:
+            # Already pure base64
+            return data_url
+
    @staticmethod
    def _extract_message_output_items(
        chat_completion_response: ModelResponse,
        choices: List[Choices],
-    ) -> List[GenericResponseOutputItem]:
-        message_output_items = []
+    ) -> List[Union[GenericResponseOutputItem, OutputImageGenerationCall]]:
+        message_output_items: List[Union[GenericResponseOutputItem, OutputImageGenerationCall]] = []
        for choice in choices:
-            message_output_items.append(
-                GenericResponseOutputItem(
-                    type="message",
-                    id=chat_completion_response.id,
-                    status=LiteLLMCompletionResponsesConfig._map_chat_completion_finish_reason_to_responses_status(
-                        choice.finish_reason
-                    ),
-                    role=choice.message.role,
-                    content=[
-                        LiteLLMCompletionResponsesConfig._transform_chat_message_to_response_output_text(
-                            choice.message
-                        )
-                    ],
+            # Check if message has images (image generation)
+            if hasattr(choice.message, 'images') and choice.message.images:
+                # Extract image generation output
+                image_generation_items = LiteLLMCompletionResponsesConfig._extract_image_generation_output_items(
+                    chat_completion_response=chat_completion_response,
+                    choice=choice,
+                )
+                message_output_items.extend(image_generation_items)
+            else:
+                # Regular message output
+                message_output_items.append(
+                    GenericResponseOutputItem(
+                        type="message",
+                        id=chat_completion_response.id,
+                        status=LiteLLMCompletionResponsesConfig._map_chat_completion_finish_reason_to_responses_status(
+                            choice.finish_reason
+                        ),
+                        role=choice.message.role,
+                        content=[
+                            LiteLLMCompletionResponsesConfig._transform_chat_message_to_response_output_text(
+                                choice.message
+                            )
+                        ],
+                    )
                )
-            )
        return message_output_items

    @staticmethod
--- a/litellm/types/llms/openai.py
+++ b/litellm/types/llms/openai.py
@@ -76,6 +76,7 @@ from litellm.types.llms.base import BaseLiteLLMOpenAIResponseObject
 from litellm.types.responses.main import (
    GenericResponseOutputItem,
    OutputFunctionToolCall,
+    OutputImageGenerationCall,
 )

 FileContent = Union[IO[bytes], bytes, PathLike]
@@ -1071,7 +1072,7 @@ class ResponsesAPIResponse(BaseLiteLLMOpenAIResponseObject):
    object: Optional[str] = None
    output: Union[
        List[Union[ResponseOutputItem, Dict]],
-        List[Union[GenericResponseOutputItem, OutputFunctionToolCall]],
+        List[Union[GenericResponseOutputItem, OutputFunctionToolCall, OutputImageGenerationCall]],
    ]
    parallel_tool_calls: Optional[bool] = None
    temperature: Optional[float] = None
--- a/litellm/types/responses/main.py
+++ b/litellm/types/responses/main.py
@@ -36,6 +36,15 @@ class OutputFunctionToolCall(BaseLiteLLMOpenAIResponseObject):
    status: Literal["in_progress", "completed", "incomplete"]


+class OutputImageGenerationCall(BaseLiteLLMOpenAIResponseObject):
+    """An image generation call output"""
+
+    type: Literal["image_generation_call"]
+    id: str
+    status: Literal["in_progress", "completed", "incomplete", "failed"]
+    result: Optional[str]  # Base64 encoded image data (without data:image prefix)
+
+
 class GenericResponseOutputItem(BaseLiteLLMOpenAIResponseObject):
    """
    Generic response API output item
--- a/tests/test_litellm/responses/litellm_completion_transformation/test_image_generation_output.py
+++ b/tests/test_litellm/responses/litellm_completion_transformation/test_image_generation_output.py
@@ -0,0 +1,172 @@
+"""
+Unit tests for Responses API image generation support
+
+Tests the fix for Issue #16227:
+https://github.com/BerriAI/litellm/issues/16227
+
+Verifies that image generation outputs are correctly transformed
+from /chat/completions format to /responses API format.
+"""
+import pytest
+from unittest.mock import Mock
+from litellm.responses.litellm_completion_transformation.transformation import (
+    LiteLLMCompletionResponsesConfig,
+)
+from litellm.types.responses.main import OutputImageGenerationCall
+from litellm.types.utils import ModelResponse, Choices, Message
+
+
+class TestExtractBase64FromDataUrl:
+    """Tests for _extract_base64_from_data_url helper function"""
+
+    def test_extracts_base64_from_data_url(self):
+        """Should extract pure base64 from data URL with prefix"""
+        data_url = "data:image/png;base64,iVBORw0KGgoAAAANS"
+        result = LiteLLMCompletionResponsesConfig._extract_base64_from_data_url(
+            data_url
+        )
+        assert result == "iVBORw0KGgoAAAANS"
+
+    def test_returns_base64_as_is_if_no_prefix(self):
+        """Should return base64 as-is if no data: prefix"""
+        pure_base64 = "iVBORw0KGgoAAAANS"
+        result = LiteLLMCompletionResponsesConfig._extract_base64_from_data_url(
+            pure_base64
+        )
+        assert result == pure_base64
+
+    def test_handles_invalid_inputs(self):
+        """Should return None for empty/None/malformed inputs"""
+        assert LiteLLMCompletionResponsesConfig._extract_base64_from_data_url("") is None
+        assert LiteLLMCompletionResponsesConfig._extract_base64_from_data_url(None) is None
+        assert LiteLLMCompletionResponsesConfig._extract_base64_from_data_url("data:image/png;base64") is None
+
+
+class TestExtractImageGenerationOutputItems:
+    """Tests for _extract_image_generation_output_items function"""
+
+    def test_extracts_images_correctly(self):
+        """Should extract OutputImageGenerationCall objects from images"""
+        mock_response = Mock(spec=ModelResponse)
+        mock_response.id = "test_123"
+
+        mock_message = Mock(spec=Message)
+        mock_message.images = [
+            {"image_url": {"url": "data:image/png;base64,IMG1"}, "type": "image_url", "index": 0},
+            {"image_url": {"url": "data:image/jpeg;base64,IMG2"}, "type": "image_url", "index": 1},
+        ]
+
+        mock_choice = Mock(spec=Choices)
+        mock_choice.message = mock_message
+        mock_choice.finish_reason = "stop"
+
+        result = LiteLLMCompletionResponsesConfig._extract_image_generation_output_items(
+            chat_completion_response=mock_response,
+            choice=mock_choice,
+        )
+
+        assert len(result) == 2
+        assert result[0].type == "image_generation_call"
+        assert result[0].result == "IMG1"
+        assert result[1].result == "IMG2"
+        assert result[0].id == "test_123_img_0"
+        assert result[1].id == "test_123_img_1"
+        assert result[0].status == "completed"
+
+    def test_returns_empty_for_no_images(self):
+        """Should return empty list if no images"""
+        mock_response = Mock(spec=ModelResponse)
+        mock_message = Mock(spec=Message)
+        mock_message.images = []
+
+        mock_choice = Mock(spec=Choices)
+        mock_choice.message = mock_message
+        mock_choice.finish_reason = "stop"
+
+        result = LiteLLMCompletionResponsesConfig._extract_image_generation_output_items(
+            chat_completion_response=mock_response,
+            choice=mock_choice,
+        )
+
+        assert result == []
+
+    def test_maps_finish_reason_to_status(self):
+        """Should correctly map finish_reason to status"""
+        mock_response = Mock(spec=ModelResponse)
+        mock_response.id = "test_finish"
+
+        mock_message = Mock(spec=Message)
+        mock_message.images = [
+            {"image_url": {"url": "data:image/png;base64,TEST"}, "type": "image_url", "index": 0}
+        ]
+
+        mock_choice = Mock(spec=Choices)
+        mock_choice.message = mock_message
+        mock_choice.finish_reason = "length"
+
+        result = LiteLLMCompletionResponsesConfig._extract_image_generation_output_items(
+            chat_completion_response=mock_response,
+            choice=mock_choice,
+        )
+
+        assert result[0].status == "incomplete"
+
+
+class TestExtractMessageOutputItemsIntegration:
+    """Integration tests for _extract_message_output_items with images"""
+
+    def test_detects_images_and_creates_image_generation_call(self):
+        """Should detect images in message and create image_generation_call output"""
+        mock_response = Mock(spec=ModelResponse)
+        mock_response.id = "integration_test_123"
+
+        mock_message = Mock(spec=Message)
+        mock_message.images = [
+            {
+                "image_url": {"url": "data:image/png;base64,INTEGRATION_TEST"},
+                "type": "image_url",
+                "index": 0,
+            }
+        ]
+        mock_message.role = "assistant"
+        mock_message.content = "Here's your image!"
+
+        mock_choice = Mock(spec=Choices)
+        mock_choice.message = mock_message
+        mock_choice.finish_reason = "stop"
+
+        result = LiteLLMCompletionResponsesConfig._extract_message_output_items(
+            chat_completion_response=mock_response,
+            choices=[mock_choice],
+        )
+
+        # Should return image_generation_call, NOT regular message
+        assert len(result) == 1
+        assert isinstance(result[0], OutputImageGenerationCall)
+        assert result[0].type == "image_generation_call"
+        assert result[0].result == "INTEGRATION_TEST"
+
+    def test_creates_regular_message_when_no_images(self):
+        """Should create regular GenericResponseOutputItem when no images"""
+        from litellm.types.responses.main import GenericResponseOutputItem
+
+        mock_response = Mock(spec=ModelResponse)
+        mock_response.id = "no_images_123"
+
+        mock_message = Mock(spec=Message)
+        # `images` is deliberately left unset on this mock, so it is treated as plain text
+        mock_message.role = "assistant"
+        mock_message.content = "Just text, no images"
+
+        mock_choice = Mock(spec=Choices)
+        mock_choice.message = mock_message
+        mock_choice.finish_reason = "stop"
+
+        result = LiteLLMCompletionResponsesConfig._extract_message_output_items(
+            chat_completion_response=mock_response,
+            choices=[mock_choice],
+        )
+
+        assert len(result) == 1
+        assert isinstance(result[0], GenericResponseOutputItem)
+        assert result[0].type == "message"