mirror of https://github.com/BerriAI/litellm.git
synced 2025-12-06 11:33:26 +08:00

[Feat] Add Azure AI Doc Intelligence OCR (#16219)

* TestAzureDocumentIntelligenceOCR
* add AZURE_DOCUMENT_INTELLIGENCE_API_VERSION
* add AzureDocumentIntelligenceOCRConfig
* add async_transform_ocr_response
* use async transform
* add AzureDocumentIntelligenceOCRConfig
* add AzureDocumentIntelligenceOCRConfig
* add AzureDocumentIntelligenceOCRConfig
* add get_azure_ai_ocr_config
* add azure_ai/doc-intelligence
* add azure_ai/doc-intelligence
* docs fix
* docs fix
* add azure doc intel
* fix lint error

This commit is contained in:

docs/my-website/docs/providers/azure_document_intelligence.md (new file, 408 lines)
@@ -0,0 +1,408 @@
# Azure Document Intelligence OCR

## Overview

| Property | Details |
|-------|-------|
| Description | Azure Document Intelligence (formerly Form Recognizer) provides advanced document analysis capabilities including text extraction, layout analysis, and structure recognition |
| Provider Route on LiteLLM | `azure_ai/doc-intelligence/` |
| Supported Operations | `/ocr` |
| Link to Provider Doc | [Azure Document Intelligence ↗](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/) |

Extract text and analyze document structure using Azure Document Intelligence's powerful prebuilt models.

## Quick Start

### **LiteLLM SDK**

```python showLineNumbers title="SDK Usage"
import litellm
import os

# Set environment variables
os.environ["AZURE_DOCUMENT_INTELLIGENCE_API_KEY"] = "your-api-key"
os.environ["AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"] = "https://your-resource.cognitiveservices.azure.com"

# OCR with PDF URL
response = litellm.ocr(
    model="azure_ai/doc-intelligence/prebuilt-layout",
    document={
        "type": "document_url",
        "document_url": "https://example.com/document.pdf"
    }
)

# Access extracted text
for page in response.pages:
    print(f"Page {page.index}:")
    print(page.markdown)
```

### **LiteLLM PROXY**

```yaml showLineNumbers title="proxy_config.yaml"
model_list:
  - model_name: azure-doc-intel
    litellm_params:
      model: azure_ai/doc-intelligence/prebuilt-layout
      api_key: os.environ/AZURE_DOCUMENT_INTELLIGENCE_API_KEY
      api_base: os.environ/AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT
    model_info:
      mode: ocr
```

**Start Proxy**
```bash
litellm --config proxy_config.yaml
```

**Call OCR via Proxy**
```bash showLineNumbers title="cURL Request"
curl -X POST http://localhost:4000/ocr \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "azure-doc-intel",
    "document": {
      "type": "document_url",
      "document_url": "https://arxiv.org/pdf/2201.04234"
    }
  }'
```

## How It Works

Azure Document Intelligence uses an asynchronous API pattern. LiteLLM AI Gateway handles the request/response transformation and polling automatically.

### Complete Flow Diagram

```mermaid
sequenceDiagram
    participant Client
    box rgb(200, 220, 255) LiteLLM AI Gateway
        participant LiteLLM
    end
    participant Azure as Azure Document Intelligence

    Client->>LiteLLM: POST /ocr (Mistral format)
    Note over LiteLLM: Transform to Azure format

    LiteLLM->>Azure: POST :analyze
    Azure-->>LiteLLM: 202 Accepted + polling URL

    Note over LiteLLM: Automatic Polling
    loop Every 2-10 seconds
        LiteLLM->>Azure: GET polling URL
        Azure-->>LiteLLM: Status: running
    end

    LiteLLM->>Azure: GET polling URL
    Azure-->>LiteLLM: Status: succeeded + results

    Note over LiteLLM: Transform to Mistral format
    LiteLLM-->>Client: OCR Response (Mistral format)
```

### What LiteLLM Does For You

When you call `litellm.ocr()` via SDK or `/ocr` via Proxy:

1. **Request Transformation**: Converts Mistral OCR format → Azure Document Intelligence format
2. **Submits Document**: Sends the transformed request to the Azure DI API
3. **Handles 202 Response**: Captures the `Operation-Location` URL from the response headers
4. **Automatic Polling**:
   - Polls the operation URL at intervals specified by the `retry-after` header (default: 2 seconds)
   - Continues until status is `succeeded` or `failed`
   - Respects Azure's rate limiting via `retry-after` headers
5. **Response Transformation**: Converts Azure DI format → Mistral OCR format
6. **Returns Result**: Sends the unified Mistral-format response to the client

**Polling Configuration:**
- Default timeout: 120 seconds
- Configurable via the `AZURE_OPERATION_POLLING_TIMEOUT` environment variable
- Uses sync (`time.sleep()`) or async (`await asyncio.sleep()`) polling based on call type
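The 202-then-poll flow in steps 3 and 4 above reduces to one small loop. Below is a minimal, provider-agnostic sketch of that pattern — the `get_status` callable is a hypothetical stand-in for a GET on the `Operation-Location` URL, not LiteLLM's actual implementation:

```python
import time
from typing import Callable, Tuple

def poll_operation(
    get_status: Callable[[], Tuple[str, float]],
    timeout_secs: float = 120.0,  # mirrors the 120s default timeout described above
) -> str:
    """Poll until the operation reports 'succeeded'; raise on 'failed' or timeout.

    get_status returns (status, retry_after_seconds), standing in for the
    JSON body status and the `retry-after` header of the operation URL.
    """
    start = time.time()
    while True:
        if time.time() - start > timeout_secs:
            raise TimeoutError(f"polling timed out after {timeout_secs}s")
        status, retry_after = get_status()
        if status == "succeeded":
            return status
        if status == "failed":
            raise ValueError("analysis failed")
        time.sleep(retry_after)  # honor the server's retry-after hint

# Stub: two "running" polls, then success
responses = iter([("running", 0.01), ("running", 0.01), ("succeeded", 0.0)])
print(poll_operation(lambda: next(responses), timeout_secs=5))  # succeeded
```

Sleeping for the server-supplied `retry_after` between polls is what keeps the loop within Azure's rate limits.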
:::info
**Typical processing time**: 2-10 seconds depending on document size and complexity
:::

## Supported Models

Azure Document Intelligence offers several prebuilt models optimized for different use cases:

### prebuilt-layout (Recommended)

Best for general document OCR with structure preservation.

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<Tabs>
<TabItem value="sdk" label="SDK">

```python showLineNumbers title="Layout Model - SDK"
import litellm
import os

os.environ["AZURE_DOCUMENT_INTELLIGENCE_API_KEY"] = "your-api-key"
os.environ["AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"] = "https://your-resource.cognitiveservices.azure.com"

response = litellm.ocr(
    model="azure_ai/doc-intelligence/prebuilt-layout",
    document={
        "type": "document_url",
        "document_url": "https://example.com/document.pdf"
    }
)
```

</TabItem>
<TabItem value="proxy" label="Proxy Config">

```yaml showLineNumbers title="proxy_config.yaml"
model_list:
  - model_name: azure-layout
    litellm_params:
      model: azure_ai/doc-intelligence/prebuilt-layout
      api_key: os.environ/AZURE_DOCUMENT_INTELLIGENCE_API_KEY
      api_base: os.environ/AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT
    model_info:
      mode: ocr
```

**Usage:**
```bash
curl -X POST http://localhost:4000/ocr \
  -H "Authorization: Bearer your-api-key" \
  -d '{"model": "azure-layout", "document": {"type": "document_url", "document_url": "https://example.com/doc.pdf"}}'
```

</TabItem>
</Tabs>

**Features:**
- Text extraction with markdown formatting
- Table detection and extraction
- Document structure analysis
- Paragraph and section recognition

**Pricing:** $10 per 1,000 pages

### prebuilt-read

Optimized for reading text from documents - fastest and most cost-effective.

<Tabs>
<TabItem value="sdk" label="SDK">

```python showLineNumbers title="Read Model - SDK"
import litellm
import os

os.environ["AZURE_DOCUMENT_INTELLIGENCE_API_KEY"] = "your-api-key"
os.environ["AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"] = "https://your-resource.cognitiveservices.azure.com"

response = litellm.ocr(
    model="azure_ai/doc-intelligence/prebuilt-read",
    document={
        "type": "document_url",
        "document_url": "https://example.com/document.pdf"
    }
)
```

</TabItem>
<TabItem value="proxy" label="Proxy Config">

```yaml showLineNumbers title="proxy_config.yaml"
model_list:
  - model_name: azure-read
    litellm_params:
      model: azure_ai/doc-intelligence/prebuilt-read
      api_key: os.environ/AZURE_DOCUMENT_INTELLIGENCE_API_KEY
      api_base: os.environ/AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT
    model_info:
      mode: ocr
```

**Usage:**
```bash
curl -X POST http://localhost:4000/ocr \
  -H "Authorization: Bearer your-api-key" \
  -d '{"model": "azure-read", "document": {"type": "document_url", "document_url": "https://example.com/doc.pdf"}}'
```

</TabItem>
</Tabs>

**Features:**
- Fast text extraction
- Optimized for reading-heavy documents
- Basic structure recognition

**Pricing:** $1.50 per 1,000 pages

### prebuilt-document

General-purpose document analysis with key-value pairs.

<Tabs>
<TabItem value="sdk" label="SDK">

```python showLineNumbers title="Document Model - SDK"
import litellm
import os

os.environ["AZURE_DOCUMENT_INTELLIGENCE_API_KEY"] = "your-api-key"
os.environ["AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT"] = "https://your-resource.cognitiveservices.azure.com"

response = litellm.ocr(
    model="azure_ai/doc-intelligence/prebuilt-document",
    document={
        "type": "document_url",
        "document_url": "https://example.com/document.pdf"
    }
)
```

</TabItem>
<TabItem value="proxy" label="Proxy Config">

```yaml showLineNumbers title="proxy_config.yaml"
model_list:
  - model_name: azure-document
    litellm_params:
      model: azure_ai/doc-intelligence/prebuilt-document
      api_key: os.environ/AZURE_DOCUMENT_INTELLIGENCE_API_KEY
      api_base: os.environ/AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT
    model_info:
      mode: ocr
```

**Usage:**
```bash
curl -X POST http://localhost:4000/ocr \
  -H "Authorization: Bearer your-api-key" \
  -d '{"model": "azure-document", "document": {"type": "document_url", "document_url": "https://example.com/doc.pdf"}}'
```

</TabItem>
</Tabs>

**Pricing:** $10 per 1,000 pages

## Document Types

Azure Document Intelligence supports various document formats.

### PDF Documents

```python showLineNumbers title="PDF OCR"
response = litellm.ocr(
    model="azure_ai/doc-intelligence/prebuilt-layout",
    document={
        "type": "document_url",
        "document_url": "https://example.com/document.pdf"
    }
)
```

### Image Documents

```python showLineNumbers title="Image OCR"
response = litellm.ocr(
    model="azure_ai/doc-intelligence/prebuilt-layout",
    document={
        "type": "image_url",
        "image_url": "https://example.com/image.png"
    }
)
```

**Supported image formats:** JPEG, PNG, BMP, TIFF

### Base64 Encoded Documents

```python showLineNumbers title="Base64 PDF"
import base64

# Read and encode PDF
with open("document.pdf", "rb") as f:
    pdf_base64 = base64.b64encode(f.read()).decode()

response = litellm.ocr(
    model="azure_ai/doc-intelligence/prebuilt-layout",
    document={
        "type": "document_url",
        "document_url": f"data:application/pdf;base64,{pdf_base64}"
    }
)
```
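Building the data URI by hand is the only fiddly part; a small helper keeps it in one place (a sketch — `to_data_uri` is not a LiteLLM function):

```python
import base64

def to_data_uri(raw: bytes, mime: str = "application/pdf") -> str:
    """Encode raw bytes as the data URI accepted by the document_url field."""
    return f"data:{mime};base64," + base64.b64encode(raw).decode()

uri = to_data_uri(b"%PDF-1.4 minimal")
print(uri[:28])  # data:application/pdf;base64,
```

Pass `mime="image/png"` (or another supported type) when encoding image bytes instead of a PDF.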
## Response Format

```python showLineNumbers title="Response Structure"
# Response has the following structure
response.pages       # List of pages with extracted text
response.model       # Model used
response.object      # "ocr"
response.usage_info  # Token usage information

# Access page content
for page in response.pages:
    print(f"Page {page.index}:")
    print(page.markdown)

    # Page dimensions (in pixels)
    if page.dimensions:
        print(f"Width: {page.dimensions.width}px")
        print(f"Height: {page.dimensions.height}px")
```
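A common follow-up is stitching the per-page markdown back into a single document string. A minimal sketch using the page shape documented above (the `Page` class here is a stand-in for the real response objects):

```python
class Page:
    """Stand-in for a response page: only index and markdown matter here."""
    def __init__(self, index: int, markdown: str):
        self.index = index
        self.markdown = markdown

def join_pages(pages) -> str:
    # Sort by index so out-of-order pages still concatenate correctly
    return "\n\n".join(p.markdown for p in sorted(pages, key=lambda p: p.index))

pages = [Page(1, "Second page"), Page(0, "# Title\nFirst page")]
print(join_pages(pages))
```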
## Async Support

```python showLineNumbers title="Async Usage"
import litellm
import asyncio

async def process_document():
    response = await litellm.aocr(
        model="azure_ai/doc-intelligence/prebuilt-layout",
        document={
            "type": "document_url",
            "document_url": "https://example.com/document.pdf"
        }
    )
    return response

# Run async function
response = asyncio.run(process_document())
```
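Because each Azure DI request spends most of its time waiting on the polling loop, `asyncio.gather` overlaps those waits when you have several documents. A sketch of the fan-out pattern — `fake_aocr` is a hypothetical stand-in for `litellm.aocr` so the example runs offline:

```python
import asyncio

async def fake_aocr(document_url: str) -> str:
    # Stand-in for: await litellm.aocr(model="azure_ai/doc-intelligence/prebuilt-layout",
    #     document={"type": "document_url", "document_url": document_url})
    await asyncio.sleep(0)  # where the real network wait would happen
    return f"ocr:{document_url}"

async def ocr_many(urls):
    # gather runs the requests concurrently; each call polls independently
    return await asyncio.gather(*(fake_aocr(u) for u in urls))

results = asyncio.run(ocr_many(["a.pdf", "b.pdf"]))
print(results)  # ['ocr:a.pdf', 'ocr:b.pdf']
```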
## Cost Tracking

LiteLLM automatically tracks costs for Azure Document Intelligence OCR:

| Model | Cost per 1,000 Pages |
|-------|---------------------|
| prebuilt-read | $1.50 |
| prebuilt-layout | $10.00 |
| prebuilt-document | $10.00 |
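Per-request cost is simply pages × (rate per 1,000 pages) / 1,000. A quick sanity check using the rates from the table above:

```python
# USD rates per 1,000 pages, taken from the pricing table in this doc
RATES_PER_1K = {
    "prebuilt-read": 1.50,
    "prebuilt-layout": 10.00,
    "prebuilt-document": 10.00,
}

def page_cost(model: str, pages: int) -> float:
    """Cost in USD for processing `pages` pages with the given prebuilt model."""
    return pages * RATES_PER_1K[model] / 1000

print(page_cost("prebuilt-layout", 50))  # 0.5
print(page_cost("prebuilt-read", 50))    # 0.075
```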
```python showLineNumbers title="View Cost"
response = litellm.ocr(
    model="azure_ai/doc-intelligence/prebuilt-layout",
    document={"type": "document_url", "document_url": "https://..."}
)

# Access cost information
print(f"Cost: ${response._hidden_params.get('response_cost', 0)}")
```

## Additional Resources

- [Azure Document Intelligence Documentation](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/)
- [Pricing Details](https://azure.microsoft.com/en-us/pricing/details/ai-document-intelligence/)
- [Supported File Formats](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept-model-overview)
- [LiteLLM OCR Documentation](https://docs.litellm.ai/docs/ocr)
@@ -1,4 +1,4 @@
-# Azure AI OCR
+# Azure AI OCR (Mistral)
 
 ## Overview
@@ -455,6 +455,7 @@ const sidebars = {
       items: [
         "providers/azure_ai",
         "providers/azure_ocr",
+        "providers/azure_document_intelligence",
         "providers/azure_ai_speech",
         "providers/azure_ai_img",
         "providers/azure_ai_vector_stores",
@@ -209,6 +209,12 @@ DEFAULT_POLLING_INTERVAL = float(
     os.getenv("DEFAULT_POLLING_INTERVAL", 0.03)
 )  # default polling interval for the scheduler
 AZURE_OPERATION_POLLING_TIMEOUT = int(os.getenv("AZURE_OPERATION_POLLING_TIMEOUT", 120))
+AZURE_DOCUMENT_INTELLIGENCE_API_VERSION = str(
+    os.getenv("AZURE_DOCUMENT_INTELLIGENCE_API_VERSION", "2024-11-30")
+)
+AZURE_DOCUMENT_INTELLIGENCE_DEFAULT_DPI = int(
+    os.getenv("AZURE_DOCUMENT_INTELLIGENCE_DEFAULT_DPI", 96)
+)
 REDIS_SOCKET_TIMEOUT = float(os.getenv("REDIS_SOCKET_TIMEOUT", 0.1))
 REDIS_CONNECTION_POOL_TIMEOUT = int(os.getenv("REDIS_CONNECTION_POOL_TIMEOUT", 5))
 NON_LLM_CONNECTION_TIMEOUT = int(
@@ -1,5 +1,13 @@
 """Azure AI OCR module."""
+from .common_utils import get_azure_ai_ocr_config
+from .document_intelligence.transformation import (
+    AzureDocumentIntelligenceOCRConfig,
+)
 from .transformation import AzureAIOCRConfig
 
-__all__ = ["AzureAIOCRConfig"]
+__all__ = [
+    "AzureAIOCRConfig",
+    "AzureDocumentIntelligenceOCRConfig",
+    "get_azure_ai_ocr_config",
+]
litellm/llms/azure_ai/ocr/common_utils.py (new file, 53 lines)
@@ -0,0 +1,53 @@
"""
Common utilities for Azure AI OCR providers.

This module provides routing logic to determine which OCR configuration to use
based on the model name.
"""

from typing import TYPE_CHECKING, Optional

from litellm._logging import verbose_logger

if TYPE_CHECKING:
    from litellm.llms.base_llm.ocr.transformation import BaseOCRConfig


def get_azure_ai_ocr_config(model: str) -> Optional["BaseOCRConfig"]:
    """
    Determine which Azure AI OCR configuration to use based on the model name.

    Azure AI supports multiple OCR services:
    - Azure Document Intelligence: azure_ai/doc-intelligence/<model>
    - Mistral OCR (via Azure AI): azure_ai/<model>

    Args:
        model: The model name (e.g., "azure_ai/doc-intelligence/prebuilt-read",
               "azure_ai/pixtral-12b-2409")

    Returns:
        OCR configuration instance for the specified model

    Examples:
        >>> get_azure_ai_ocr_config("azure_ai/doc-intelligence/prebuilt-read")
        <AzureDocumentIntelligenceOCRConfig object>

        >>> get_azure_ai_ocr_config("azure_ai/pixtral-12b-2409")
        <AzureAIOCRConfig object>
    """
    from litellm.llms.azure_ai.ocr.document_intelligence.transformation import (
        AzureDocumentIntelligenceOCRConfig,
    )
    from litellm.llms.azure_ai.ocr.transformation import AzureAIOCRConfig

    # Check for Azure Document Intelligence models
    if "doc-intelligence" in model or "documentintelligence" in model:
        verbose_logger.debug(
            f"Routing {model} to Azure Document Intelligence OCR config"
        )
        return AzureDocumentIntelligenceOCRConfig()

    # Default to Mistral-based OCR for other azure_ai models
    verbose_logger.debug(f"Routing {model} to Azure AI (Mistral) OCR config")
    return AzureAIOCRConfig()
@@ -0,0 +1,5 @@
"""Azure Document Intelligence OCR module."""
from .transformation import AzureDocumentIntelligenceOCRConfig

__all__ = ["AzureDocumentIntelligenceOCRConfig"]
@@ -0,0 +1,697 @@
"""
Azure Document Intelligence OCR transformation implementation.

Azure Document Intelligence (formerly Form Recognizer) provides advanced document analysis capabilities.
This implementation transforms between Mistral OCR format and Azure Document Intelligence API v4.0.

Note: Azure Document Intelligence API is async - POST returns 202 Accepted with Operation-Location header.
The operation location must be polled until the analysis completes.
"""
import asyncio
import re
import time
from typing import Any, Dict, Optional

import httpx

from litellm._logging import verbose_logger
from litellm.constants import (
    AZURE_DOCUMENT_INTELLIGENCE_API_VERSION,
    AZURE_DOCUMENT_INTELLIGENCE_DEFAULT_DPI,
    AZURE_OPERATION_POLLING_TIMEOUT,
)
from litellm.llms.base_llm.ocr.transformation import (
    BaseOCRConfig,
    DocumentType,
    OCRPage,
    OCRPageDimensions,
    OCRRequestData,
    OCRResponse,
    OCRUsageInfo,
)
from litellm.llms.custom_httpx.http_handler import AsyncHTTPHandler
from litellm.secret_managers.main import get_secret_str
class AzureDocumentIntelligenceOCRConfig(BaseOCRConfig):
    """
    Azure Document Intelligence OCR transformation configuration.

    Supports Azure Document Intelligence v4.0 (2024-11-30) API.
    Model route: azure_ai/doc-intelligence/<model>

    Supported models:
    - prebuilt-layout: Extracts text with markdown, tables, and structure (closest to Mistral OCR)
    - prebuilt-read: Basic text extraction optimized for reading
    - prebuilt-document: General document analysis

    Reference: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/
    """

    def __init__(self) -> None:
        super().__init__()

    def get_supported_ocr_params(self, model: str) -> list:
        """
        Get supported OCR parameters for Azure Document Intelligence.

        Azure DI has minimal optional parameters compared to Mistral OCR.
        Most Mistral-specific params are ignored during transformation.
        """
        return []
    def validate_environment(
        self,
        headers: Dict,
        model: str,
        api_key: Optional[str] = None,
        api_base: Optional[str] = None,
        litellm_params: Optional[dict] = None,
        **kwargs,
    ) -> Dict:
        """
        Validate environment and return headers for Azure Document Intelligence.

        Authentication uses Ocp-Apim-Subscription-Key header.
        """
        # Get API key from environment if not provided
        if api_key is None:
            api_key = get_secret_str("AZURE_DOCUMENT_INTELLIGENCE_API_KEY")

        if api_key is None:
            raise ValueError(
                "Missing Azure Document Intelligence API Key - Set AZURE_DOCUMENT_INTELLIGENCE_API_KEY environment variable or pass api_key parameter"
            )

        # Validate API base/endpoint is provided
        if api_base is None:
            api_base = get_secret_str("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT")

        if api_base is None:
            raise ValueError(
                "Missing Azure Document Intelligence Endpoint - Set AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT environment variable or pass api_base parameter"
            )

        headers = {
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
            **headers,
        }

        return headers
    def get_complete_url(
        self,
        api_base: Optional[str],
        model: str,
        optional_params: dict,
        litellm_params: Optional[dict] = None,
        **kwargs,
    ) -> str:
        """
        Get complete URL for Azure Document Intelligence endpoint.

        Format: {endpoint}/documentintelligence/documentModels/{modelId}:analyze?api-version=2024-11-30

        Note: API version 2024-11-30 uses /documentintelligence/ path (not /formrecognizer/)

        Args:
            api_base: Azure Document Intelligence endpoint (e.g., https://your-resource.cognitiveservices.azure.com)
            model: Model ID (e.g., "prebuilt-layout", "prebuilt-read")
            optional_params: Optional parameters

        Returns: Complete URL for Azure DI analyze endpoint
        """
        if api_base is None:
            raise ValueError(
                "Missing Azure Document Intelligence Endpoint - Set AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT environment variable or pass api_base parameter"
            )

        # Ensure no trailing slash
        api_base = api_base.rstrip("/")

        # Extract model ID from full model path if needed
        # Model can be "prebuilt-layout" or "azure_ai/doc-intelligence/prebuilt-layout"
        model_id = model
        if "/" in model:
            # Extract the last part after the last slash
            model_id = model.split("/")[-1]

        # Azure Document Intelligence analyze endpoint
        # Note: API version 2024-11-30+ uses /documentintelligence/ (not /formrecognizer/)
        return f"{api_base}/documentintelligence/documentModels/{model_id}:analyze?api-version={AZURE_DOCUMENT_INTELLIGENCE_API_VERSION}"
    def _extract_base64_from_data_uri(self, data_uri: str) -> str:
        """
        Extract base64 content from a data URI.

        Args:
            data_uri: Data URI like "data:application/pdf;base64,..."

        Returns:
            Base64 string without the data URI prefix
        """
        # Match pattern: data:[<mediatype>][;base64],<data>
        match = re.match(r"data:([^;]+)(?:;base64)?,(.+)", data_uri)
        if match:
            return match.group(2)
        return data_uri
    def transform_ocr_request(
        self,
        model: str,
        document: DocumentType,
        optional_params: dict,
        headers: dict,
        **kwargs,
    ) -> OCRRequestData:
        """
        Transform OCR request to Azure Document Intelligence format.

        Mistral OCR format:
            {
                "document": {
                    "type": "document_url",
                    "document_url": "https://example.com/doc.pdf"
                }
            }

        Azure DI format:
            {
                "urlSource": "https://example.com/doc.pdf"
            }
            OR
            {
                "base64Source": "base64_encoded_content"
            }

        Args:
            model: Model name
            document: Document dict from user (Mistral format)
            optional_params: Already mapped optional parameters
            headers: Request headers

        Returns:
            OCRRequestData with JSON data
        """
        verbose_logger.debug(
            f"Azure Document Intelligence transform_ocr_request - model: {model}"
        )

        if not isinstance(document, dict):
            raise ValueError(f"Expected document dict, got {type(document)}")

        # Extract document URL from Mistral format
        doc_type = document.get("type")
        document_url = None

        if doc_type == "document_url":
            document_url = document.get("document_url", "")
        elif doc_type == "image_url":
            document_url = document.get("image_url", "")
        else:
            raise ValueError(
                f"Invalid document type: {doc_type}. Must be 'document_url' or 'image_url'"
            )

        if not document_url:
            raise ValueError("Document URL is required")

        # Build Azure DI request
        data: Dict[str, Any] = {}

        # Check if it's a data URI (base64)
        if document_url.startswith("data:"):
            # Extract base64 content
            base64_content = self._extract_base64_from_data_uri(document_url)
            data["base64Source"] = base64_content
            verbose_logger.debug("Using base64Source for Azure Document Intelligence")
        else:
            # Regular URL
            data["urlSource"] = document_url
            verbose_logger.debug("Using urlSource for Azure Document Intelligence")

        # Azure DI doesn't support most Mistral-specific params
        # Ignore pages, include_image_base64, etc.

        return OCRRequestData(data=data, files=None)
    def _extract_page_markdown(self, page_data: Dict[str, Any]) -> str:
        """
        Extract text from Azure DI page and format as markdown.

        Azure DI provides text in 'lines' array. We concatenate them with newlines.

        Args:
            page_data: Azure DI page object

        Returns:
            Markdown-formatted text
        """
        lines = page_data.get("lines", [])
        if not lines:
            return ""

        # Extract text content from each line
        text_lines = [line.get("content", "") for line in lines]

        # Join with newlines to preserve structure
        return "\n".join(text_lines)
    def _convert_dimensions(
        self, width: float, height: float, unit: str
    ) -> OCRPageDimensions:
        """
        Convert Azure DI dimensions to pixels.

        Azure DI provides dimensions in inches. We convert to pixels using configured DPI.

        Args:
            width: Width in specified unit
            height: Height in specified unit
            unit: Unit of measurement (e.g., "inch")

        Returns:
            OCRPageDimensions with pixel values
        """
        # Convert to pixels using configured DPI
        dpi = AZURE_DOCUMENT_INTELLIGENCE_DEFAULT_DPI
        if unit == "inch":
            width_px = int(width * dpi)
            height_px = int(height * dpi)
        else:
            # If unit is not inches, assume it's already in pixels
            width_px = int(width)
            height_px = int(height)

        return OCRPageDimensions(width=width_px, height=height_px, dpi=dpi)
    @staticmethod
    def _check_timeout(start_time: float, timeout_secs: int) -> None:
        """
        Check if operation has timed out.

        Args:
            start_time: Start time of the operation
            timeout_secs: Timeout duration in seconds

        Raises:
            TimeoutError: If operation has exceeded timeout
        """
        if time.time() - start_time > timeout_secs:
            raise TimeoutError(
                f"Azure Document Intelligence operation polling timed out after {timeout_secs} seconds"
            )
    @staticmethod
    def _get_retry_after(response: httpx.Response) -> int:
        """
        Get retry-after duration from response headers.

        Args:
            response: HTTP response

        Returns:
            Retry-after duration in seconds (default: 2)
        """
        retry_after = int(response.headers.get("retry-after", "2"))
        verbose_logger.debug(f"Retry polling after: {retry_after} seconds")
        return retry_after
    @staticmethod
    def _check_operation_status(response: httpx.Response) -> str:
        """
        Check Azure DI operation status from response.

        Args:
            response: HTTP response from operation endpoint

        Returns:
            Operation status string

        Raises:
            ValueError: If operation failed or status is unknown
        """
        try:
            result = response.json()
            status = result.get("status")

            verbose_logger.debug(f"Azure DI operation status: {status}")

            if status == "succeeded":
                return "succeeded"
            elif status == "failed":
                error_msg = result.get("error", {}).get("message", "Unknown error")
                raise ValueError(
                    f"Azure Document Intelligence analysis failed: {error_msg}"
                )
            elif status in ["running", "notStarted"]:
                return "running"
            else:
                raise ValueError(f"Unknown operation status: {status}")

        except Exception as e:
            if "succeeded" in str(e) or "failed" in str(e):
                raise
            # If we can't parse JSON, something went wrong
            raise ValueError(f"Failed to parse Azure DI operation response: {e}")
    def _poll_operation_sync(
        self,
        operation_url: str,
        headers: Dict[str, str],
        timeout_secs: int,
    ) -> httpx.Response:
        """
        Poll Azure Document Intelligence operation until completion (sync).

        Azure DI POST returns 202 with Operation-Location header.
        We need to poll that URL until status is "succeeded" or "failed".

        Args:
            operation_url: The Operation-Location URL to poll
            headers: Request headers (including auth)
            timeout_secs: Total timeout in seconds

        Returns:
            Final response with completed analysis
        """
        from litellm.llms.custom_httpx.http_handler import _get_httpx_client

        client = _get_httpx_client()
        start_time = time.time()

        verbose_logger.debug(f"Polling Azure DI operation: {operation_url}")

        while True:
            self._check_timeout(start_time=start_time, timeout_secs=timeout_secs)

            # Poll the operation status
            response = client.get(url=operation_url, headers=headers)

            # Check operation status
            status = self._check_operation_status(response=response)

            if status == "succeeded":
                return response
            elif status == "running":
                # Wait before polling again
                retry_after = self._get_retry_after(response=response)
                time.sleep(retry_after)
    async def _poll_operation_async(
        self,
        operation_url: str,
        headers: Dict[str, str],
        timeout_secs: int,
    ) -> httpx.Response:
        """
        Poll Azure Document Intelligence operation until completion (async).

        Args:
            operation_url: The Operation-Location URL to poll
            headers: Request headers (including auth)
            timeout_secs: Total timeout in seconds

        Returns:
            Final response with completed analysis
        """
        import litellm
        from litellm.llms.custom_httpx.http_handler import get_async_httpx_client

        client = get_async_httpx_client(llm_provider=litellm.LlmProviders.AZURE_AI)
        start_time = time.time()

        verbose_logger.debug(f"Polling Azure DI operation (async): {operation_url}")

        while True:
            self._check_timeout(start_time=start_time, timeout_secs=timeout_secs)

            # Poll the operation status
            response = await client.get(url=operation_url, headers=headers)

            # Check operation status
            status = self._check_operation_status(response=response)

            if status == "succeeded":
                return response
            elif status == "running":
                # Wait before polling again
                retry_after = self._get_retry_after(response=response)
                await asyncio.sleep(retry_after)
    def transform_ocr_response(
        self,
        model: str,
        raw_response: httpx.Response,
        logging_obj: Any,
        **kwargs,
    ) -> OCRResponse:
        """
        Transform Azure Document Intelligence response to Mistral OCR format.

        Handles async operation polling: If response is 202 Accepted, polls Operation-Location
        until analysis completes.

        Azure DI response (after polling):
        {
            "status": "succeeded",
            "analyzeResult": {
                "content": "Full document text...",
                "pages": [
                    {
                        "pageNumber": 1,
                        "width": 8.5,
                        "height": 11,
                        "unit": "inch",
                        "lines": [{"content": "text", "boundingBox": [...]}]
                    }
                ]
            }
        }

        Mistral OCR format:
        {
            "pages": [
                {
                    "index": 0,
                    "markdown": "extracted text",
                    "dimensions": {"width": 816, "height": 1056, "dpi": 96}
                }
            ],
            "model": "azure_ai/doc-intelligence/prebuilt-layout",
            "usage_info": {"pages_processed": 1},
            "object": "ocr"
        }

        Args:
            model: Model name
            raw_response: Raw HTTP response from Azure DI (may be 202 Accepted)
            logging_obj: Logging object

        Returns:
            OCRResponse in Mistral format
        """
        try:
            # Check if we got 202 Accepted (async operation started)
            if raw_response.status_code == 202:
                verbose_logger.debug(
                    "Azure DI returned 202 Accepted, polling operation..."
                )

                # Get Operation-Location header
                operation_url = raw_response.headers.get("Operation-Location")
                if not operation_url:
                    raise ValueError(
                        "Azure Document Intelligence returned 202 but no Operation-Location header found"
                    )

                # Get headers for polling (need auth)
                poll_headers = {
                    "Ocp-Apim-Subscription-Key": raw_response.request.headers.get(
                        "Ocp-Apim-Subscription-Key", ""
                    )
                }

                # Get timeout from kwargs or use default
                timeout_secs = AZURE_OPERATION_POLLING_TIMEOUT

                # Poll until operation completes
                raw_response = self._poll_operation_sync(
                    operation_url=operation_url,
                    headers=poll_headers,
                    timeout_secs=timeout_secs,
                )

            # Now parse the completed response
            response_json = raw_response.json()

            verbose_logger.debug(
                f"Azure Document Intelligence response status: {response_json.get('status')}"
            )

            # Check if request succeeded
            status = response_json.get("status")
            if status != "succeeded":
                raise ValueError(
                    f"Azure Document Intelligence analysis failed with status: {status}"
                )

            # Extract analyze result
            analyze_result = response_json.get("analyzeResult", {})
            azure_pages = analyze_result.get("pages", [])

            # Transform pages to Mistral format
            mistral_pages = []
            for azure_page in azure_pages:
                page_number = azure_page.get("pageNumber", 1)
                index = page_number - 1  # Convert to 0-based index

                # Extract markdown text
                markdown = self._extract_page_markdown(azure_page)

                # Convert dimensions
                width = azure_page.get("width", 8.5)
                height = azure_page.get("height", 11)
                unit = azure_page.get("unit", "inch")
                dimensions = self._convert_dimensions(
                    width=width, height=height, unit=unit
                )

                # Build OCR page
                ocr_page = OCRPage(
                    index=index, markdown=markdown, dimensions=dimensions
                )
                mistral_pages.append(ocr_page)

            # Build usage info
            usage_info = OCRUsageInfo(
                pages_processed=len(mistral_pages), doc_size_bytes=None
            )

            # Return Mistral OCR response
            return OCRResponse(
                pages=mistral_pages,
                model=model,
                usage_info=usage_info,
                object="ocr",
            )

        except Exception as e:
            verbose_logger.error(
                f"Error parsing Azure Document Intelligence response: {e}"
            )
            raise e
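The per-page mapping in the docstring above (Azure `pageNumber`/inches → Mistral 0-based index/pixels) can be shown as a small pure function. This is a sketch only: `azure_page_to_mistral` is a hypothetical name, and it assumes line-joined text plus a 96-dpi inch-to-pixel conversion, consistent with the 8.5x11in → 816x1056px example; the commit's actual logic lives in the `_extract_page_markdown` and `_convert_dimensions` helpers, which are not shown in this hunk.

```python
def azure_page_to_mistral(azure_page: dict, dpi: int = 96) -> dict:
    """Convert one Azure DI page dict to the Mistral-style OCR page shape."""
    width = azure_page.get("width", 8.5)
    height = azure_page.get("height", 11)
    if azure_page.get("unit", "inch") == "inch":
        # 8.5in x 11in at 96 dpi -> 816 x 1056 pixels
        width_px, height_px = round(width * dpi), round(height * dpi)
    else:
        # Assume the page is already measured in pixels
        width_px, height_px = round(width), round(height)
    # Join recognized lines into a single text block
    markdown = "\n".join(
        line.get("content", "") for line in azure_page.get("lines", [])
    )
    return {
        "index": azure_page.get("pageNumber", 1) - 1,  # Mistral uses 0-based pages
        "markdown": markdown,
        "dimensions": {"width": width_px, "height": height_px, "dpi": dpi},
    }
```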
    async def async_transform_ocr_response(
        self,
        model: str,
        raw_response: httpx.Response,
        logging_obj: Any,
        **kwargs,
    ) -> OCRResponse:
        """
        Async transform Azure Document Intelligence response to Mistral OCR format.

        Handles async operation polling: If response is 202 Accepted, polls Operation-Location
        until analysis completes using async polling.

        Args:
            model: Model name
            raw_response: Raw HTTP response from Azure DI (may be 202 Accepted)
            logging_obj: Logging object

        Returns:
            OCRResponse in Mistral format
        """
        try:
            # Check if we got 202 Accepted (async operation started)
            if raw_response.status_code == 202:
                verbose_logger.debug(
                    "Azure DI returned 202 Accepted, polling operation (async)..."
                )

                # Get Operation-Location header
                operation_url = raw_response.headers.get("Operation-Location")
                if not operation_url:
                    raise ValueError(
                        "Azure Document Intelligence returned 202 but no Operation-Location header found"
                    )

                # Get headers for polling (need auth)
                poll_headers = {
                    "Ocp-Apim-Subscription-Key": raw_response.request.headers.get(
                        "Ocp-Apim-Subscription-Key", ""
                    )
                }

                # Get timeout from kwargs or use default
                timeout_secs = AZURE_OPERATION_POLLING_TIMEOUT

                # Poll until operation completes (async)
                raw_response = await self._poll_operation_async(
                    operation_url=operation_url,
                    headers=poll_headers,
                    timeout_secs=timeout_secs,
                )

            # Now parse the completed response
            response_json = raw_response.json()

            verbose_logger.debug(
                f"Azure Document Intelligence response status: {response_json.get('status')}"
            )

            # Check if request succeeded
            status = response_json.get("status")
            if status != "succeeded":
                raise ValueError(
                    f"Azure Document Intelligence analysis failed with status: {status}"
                )

            # Extract analyze result
            analyze_result = response_json.get("analyzeResult", {})
            azure_pages = analyze_result.get("pages", [])

            # Transform pages to Mistral format
            mistral_pages = []
            for azure_page in azure_pages:
                page_number = azure_page.get("pageNumber", 1)
                index = page_number - 1  # Convert to 0-based index

                # Extract markdown text
                markdown = self._extract_page_markdown(azure_page)

                # Convert dimensions
                width = azure_page.get("width", 8.5)
                height = azure_page.get("height", 11)
                unit = azure_page.get("unit", "inch")
                dimensions = self._convert_dimensions(
                    width=width, height=height, unit=unit
                )

                # Build OCR page
                ocr_page = OCRPage(
                    index=index, markdown=markdown, dimensions=dimensions
                )
                mistral_pages.append(ocr_page)

            # Build usage info
            usage_info = OCRUsageInfo(
                pages_processed=len(mistral_pages), doc_size_bytes=None
            )

            # Return Mistral OCR response
            return OCRResponse(
                pages=mistral_pages,
                model=model,
                usage_info=usage_info,
                object="ocr",
            )

        except Exception as e:
            verbose_logger.error(
                f"Error parsing Azure Document Intelligence response (async): {e}"
            )
            raise e
@@ -198,6 +198,36 @@ class BaseOCRConfig:
        """
        raise NotImplementedError("transform_ocr_response must be implemented by provider")

    async def async_transform_ocr_response(
        self,
        model: str,
        raw_response: httpx.Response,
        logging_obj: LiteLLMLoggingObj,
        **kwargs,
    ) -> OCRResponse:
        """
        Async transform provider-specific OCR response to standard format.
        Optional method - providers can override if they need async transformations
        (e.g., Azure Document Intelligence for async operation polling).

        Default implementation falls back to sync transform_ocr_response.

        Args:
            model: Model name
            raw_response: Raw HTTP response
            logging_obj: Logging object

        Returns:
            OCRResponse in standard format
        """
        # Default implementation: call sync version
        return self.transform_ocr_response(
            model=model,
            raw_response=raw_response,
            logging_obj=logging_obj,
            **kwargs,
        )

    def get_error_class(
        self,
        error_message: str,
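The hook-with-sync-fallback pattern this hunk adds to `BaseOCRConfig` can be illustrated with toy classes. These are simplified stand-ins, not the real signatures: the base class's async method just delegates to the sync transform, and only providers that truly need async work (here mimicking Azure DI's operation polling) override it.

```python
import asyncio


class BaseConfigSketch:
    """Base: the async hook defaults to the sync transform."""

    def transform(self, payload: str) -> str:
        return payload.upper()

    async def async_transform(self, payload: str) -> str:
        # Default implementation: call sync version
        return self.transform(payload)


class PollingConfigSketch(BaseConfigSketch):
    """Provider that needs real async work overrides only the async hook."""

    async def async_transform(self, payload: str) -> str:
        await asyncio.sleep(0)  # stands in for async operation polling
        return self.transform(payload) + " (polled)"
```

Callers can always await the async hook; providers without async needs inherit correct behavior for free.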
@@ -1552,10 +1552,10 @@ class BaseLLMHTTPHandler:
         except Exception as e:
             raise self._handle_error(e=e, provider_config=provider_config)

-        return self._transform_ocr_response(
-            provider_config=provider_config,
+        # Use async response transform for async operations
+        return await provider_config.async_transform_ocr_response(
             model=model,
-            response=response,
+            raw_response=response,
             logging_obj=logging_obj,
         )
@@ -3387,6 +3387,33 @@
        ],
        "source": "https://devblogs.microsoft.com/foundry/whats-new-in-azure-ai-foundry-august-2025/#mistral-document-ai-(ocr)-%E2%80%94-serverless-in-foundry"
    },
    "azure_ai/doc-intelligence/prebuilt-read": {
        "litellm_provider": "azure_ai",
        "ocr_cost_per_page": 1.5e-3,
        "mode": "ocr",
        "supported_endpoints": [
            "/v1/ocr"
        ],
        "source": "https://azure.microsoft.com/en-us/pricing/details/ai-document-intelligence/"
    },
    "azure_ai/doc-intelligence/prebuilt-layout": {
        "litellm_provider": "azure_ai",
        "ocr_cost_per_page": 1e-2,
        "mode": "ocr",
        "supported_endpoints": [
            "/v1/ocr"
        ],
        "source": "https://azure.microsoft.com/en-us/pricing/details/ai-document-intelligence/"
    },
    "azure_ai/doc-intelligence/prebuilt-document": {
        "litellm_provider": "azure_ai",
        "ocr_cost_per_page": 1e-2,
        "mode": "ocr",
        "supported_endpoints": [
            "/v1/ocr"
        ],
        "source": "https://azure.microsoft.com/en-us/pricing/details/ai-document-intelligence/"
    },
    "azure_ai/MAI-DS-R1": {
        "input_cost_per_token": 1.35e-06,
        "litellm_provider": "azure_ai",
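These cost-map entries price OCR per page rather than per token, so total cost is just pages processed times the per-page rate. A minimal sketch of that arithmetic (`ocr_cost` is an illustrative helper, not a LiteLLM function):

```python
def ocr_cost(pages_processed: int, ocr_cost_per_page: float) -> float:
    """Per-page OCR pricing: total cost = pages x per-page rate."""
    return pages_processed * ocr_cost_per_page


# With the rates above: 10 pages through prebuilt-read (1.5e-3/page) costs
# ~$0.015, while the richer prebuilt-layout model (1e-2/page) costs ~$0.10.
```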
@@ -7741,12 +7741,16 @@ class ProviderConfigManager:
        """
        Get OCR configuration for a given provider.
        """
        from litellm.llms.azure_ai.ocr.transformation import AzureAIOCRConfig
        from litellm.llms.vertex_ai.ocr.transformation import VertexAIOCRConfig

        # Special handling for Azure AI - distinguish between Mistral OCR and Document Intelligence
        if provider == litellm.LlmProviders.AZURE_AI:
            from litellm.llms.azure_ai.ocr.common_utils import get_azure_ai_ocr_config

            return get_azure_ai_ocr_config(model=model)

        PROVIDER_TO_CONFIG_MAP = {
            litellm.LlmProviders.MISTRAL: MistralOCRConfig,
            litellm.LlmProviders.AZURE_AI: AzureAIOCRConfig,
            litellm.LlmProviders.VERTEX_AI: VertexAIOCRConfig,
        }
        config_class = PROVIDER_TO_CONFIG_MAP.get(provider, None)
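The body of `get_azure_ai_ocr_config` is not shown in this diff; based on the `azure_ai/doc-intelligence/` provider route, a plausible sketch is a prefix check on the model name. Both the function body and the returned names below are assumptions for illustration, not the commit's actual implementation.

```python
def pick_azure_ai_ocr_config(model: str) -> str:
    """Illustrative dispatch (assumed): Document Intelligence models are
    routed via the 'doc-intelligence/' prefix; every other azure_ai OCR
    model keeps the Mistral-OCR-compatible config."""
    if model.startswith("doc-intelligence/"):
        return "AzureDocumentIntelligenceOCRConfig"
    return "AzureAIOCRConfig"
```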
@@ -197,6 +197,23 @@
      "ocr": true
    }
  },
  "azure_ai/doc-intelligence": {
    "display_name": "Azure AI Document Intelligence (`azure_ai/doc-intelligence`)",
    "url": "https://docs.litellm.ai/docs/providers/azure_document_intelligence",
    "endpoints": {
      "chat_completions": false,
      "messages": false,
      "responses": false,
      "embeddings": false,
      "image_generations": false,
      "audio_transcriptions": false,
      "audio_speech": false,
      "moderations": false,
      "batches": false,
      "rerank": false,
      "ocr": true
    }
  },
  "azure_text": {
    "display_name": "Azure Text (`azure_text`)",
    "url": "https://docs.litellm.ai/docs/providers/azure",
44
tests/ocr_tests/test_ocr_azure_document_intelligence.py
Normal file
@@ -0,0 +1,44 @@
"""
Test OCR functionality with Azure Document Intelligence API.

Azure Document Intelligence provides advanced document analysis capabilities
using the v4.0 (2024-11-30) API.
"""
import os

import pytest

from base_ocr_unit_tests import BaseOCRTest


class TestAzureDocumentIntelligenceOCR(BaseOCRTest):
    """
    Test class for Azure Document Intelligence OCR functionality.

    Inherits from BaseOCRTest and provides Azure Document Intelligence-specific configuration.

    Tests the azure_ai/doc-intelligence/<model> provider route.
    """

    def get_base_ocr_call_args(self) -> dict:
        """
        Return the base OCR call args for Azure Document Intelligence.

        Uses prebuilt-layout model which is closest to Mistral OCR format.
        """
        # Check for required environment variables
        api_key = os.environ.get("AZURE_DOCUMENT_INTELLIGENCE_API_KEY")
        endpoint = os.environ.get("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT")

        if not api_key or not endpoint:
            pytest.skip(
                "AZURE_DOCUMENT_INTELLIGENCE_API_KEY and AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT "
                "environment variables are required for Azure Document Intelligence tests"
            )

        return {
            "model": "azure_ai/doc-intelligence/prebuilt-layout",
            "api_key": api_key,
            "api_base": endpoint,
        }