85 Commits

Author SHA1 Message Date
abi_jey
a40c6ae0c0 fix: remove the unnecessary config changes 2025-11-26 13:33:52 +00:00
abi_jey
98344417ab fix: tested e2e implementation and added sample config. 2025-11-26 12:37:13 +00:00
Ishaan Jaffer
d6b0c11d5b test fixes, fk azure 2025-10-25 17:15:52 -07:00
Ishaan Jaffer
6ac21ddcec fix build and test gpt-3.5-turbo 2025-10-25 16:39:14 -07:00
Ishaan Jaffer
a6b6e56246 fixes azure 2025-10-25 15:54:30 -07:00
Sameer Kankute
b9585b1db5 Update documentation for enable_caching_on_provider_specific_optional_params (#15885) 2025-10-24 10:22:27 -07:00
Sameer Kankute
dce6cd1051 Add shared healthcheck 2025-10-09 22:18:05 +05:30
Ishaan Jaffer
4054eeea20 test build and test 2025-09-27 09:26:38 -07:00
Ishaan Jaff
9761ba7c7a [Bug Fix] Responses api session management for streaming responses (#13396)
* fix proxy config

* fix(responses api): fix streaming ID consistency and tool format handling (#12640)

* fix(responses): ensure streaming chunk IDs use consistent encoding format

Fixes streaming ID inconsistency where streaming responses used raw provider IDs
while non-streaming responses used properly encoded IDs with provider context.

Changes:
- Updated LiteLLMCompletionStreamingIterator to accept provider context
- Added _encode_chunk_id() method using the same logic as non-streaming responses
- Modified chunk transformation to encode all streaming item_ids with the resp_ prefix
- Updated handlers to pass custom_llm_provider and litellm_metadata to the streaming iterator

Impact:
- Streaming chunk IDs now use the format: resp_<base64_encoded_provider_context>
- Enables session continuity when using streaming response IDs as previous_response_id
- Allows provider detection and load balancing with streaming responses
- Maintains backward compatibility with existing streaming functionality

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(types): add explicit Optional[str] type annotation for model_id

This resolves MyPy type checking error where model_id could be None
but wasn't explicitly typed as Optional[str].

* fix(types): handle None case for litellm_metadata access

Prevents 'Item None has no attribute get' error by checking for None
before accessing litellm_metadata dictionary.

* test: add comprehensive tests for streaming ID consistency

Adds unit and E2E tests to verify streaming chunk IDs are properly encoded
with consistent format across streaming responses.

## Tests Added

### Unit Test (test_reasoning_content_transformation.py)
- `test_streaming_chunk_id_encoding()`: Validates the `_encode_chunk_id()` method
  correctly encodes chunk IDs with `resp_` prefix and provider context

### E2E Tests (test_e2e_openai_responses_api.py)
- `test_streaming_id_consistency_across_chunks()`: Tests that all streaming chunk IDs
  are properly encoded across multiple chunks in a real streaming response
- `test_streaming_response_id_as_previous_response_id()`: Tests the core use case -
  using streaming response IDs for session continuity with `previous_response_id`

## Key Testing Approach
- Uses **Gemini** (non-OpenAI model) to test the transformation logic rather than
  OpenAI passthrough, since the streaming ID consistency issue occurs when LiteLLM
  transforms responses rather than just passing through to native OpenAI responses API
- Tests validate that streaming chunk IDs now use same encoding as non-streaming responses
- Verifies session continuity works with streaming responses

Addresses @ishaan-jaff's request for unit tests covering the streaming ID consistency fix.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(lint): remove unused imports in transformation.py

Removes unused imports to fix CI linting errors:
- GenericResponseOutputItem
- OutputFunctionToolCall

* test: remove E2E tests from openai_endpoints_tests

Remove streaming ID consistency E2E tests as requested by @ishaan-jaff.
Keep only the mock/unit test in test_reasoning_content_transformation.py

* revert: remove streaming chunk ID encoding to original behavior

This reverts the streaming chunk ID encoding changes to understand the original issue better.
Original behavior was:
- Streaming chunks: raw provider IDs
- Streaming final response: raw IDs (PROBLEM!)
- Non-streaming final response: encoded IDs (correct)

The real issue: streaming final response IDs were not encoded, breaking session continuity.

* fix(responses): encode streaming final response IDs to match OpenAI behavior

Fixes streaming ID inconsistency to match OpenAI's Responses API behavior:
- Streaming chunks: raw message IDs (like OpenAI's msg_xxx)
- Final response: encoded IDs (like OpenAI's resp_xxx)

This enables session continuity by ensuring streaming final response IDs
have the same encoded format as non-streaming responses, allowing them
to be used as previous_response_id in follow-up requests.

Changes:
- Add custom_llm_provider and litellm_metadata to LiteLLMCompletionStreamingIterator
- Update handlers to pass provider context to streaming iterator
- Apply _update_responses_api_response_id_with_model_id to final streaming response
- Keep streaming chunks as raw IDs to match OpenAI format

Impact:
- Session continuity works with streaming responses
- Load balancing can detect provider from streaming final response IDs
- Format matches OpenAI's Responses API exactly

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* test: update unit test to match correct OpenAI-compatible behavior

Updates the unit test to verify streaming chunk IDs are raw (not encoded)
to match OpenAI's responses API format:
- Streaming chunks: raw message IDs (like msg_xxx)
- Final response: encoded IDs (like resp_xxx)

This reflects the correct behavior implemented in the fix.

---------

Co-authored-by: Claude <noreply@anthropic.com>

* cleanup

* TestBaseResponsesAPIStreamingIterator

---------

Co-authored-by: Javier de la Torre <jatorre@carto.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-08-07 20:13:24 -07:00
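
The entry above settles on encoding only the final streaming response ID with provider context so it can be reused as `previous_response_id`. A minimal sketch of what such an encoding could look like, purely for illustration: the helper names and payload fields below are assumptions, not LiteLLM's actual transformation code.

```python
import base64
import json
from typing import Optional

def encode_response_id(raw_id: str, custom_llm_provider: str, model_id: Optional[str]) -> str:
    # Hypothetical: wrap the raw provider ID plus routing context into an
    # opaque resp_<base64> token (assumed payload shape, for illustration only).
    payload = {"id": raw_id, "custom_llm_provider": custom_llm_provider, "model_id": model_id}
    return "resp_" + base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()

def decode_response_id(encoded_id: str) -> dict:
    # Reverse of the above: recover provider context from a resp_<base64> token.
    assert encoded_id.startswith("resp_")
    return json.loads(base64.urlsafe_b64decode(encoded_id[len("resp_"):]))

# A follow-up request can send the encoded ID as previous_response_id, and the
# router can recover which provider/deployment produced the original response.
ctx = decode_response_id(encode_response_id("msg_abc123", "gemini", "deployment-1"))
print(ctx["custom_llm_provider"])  # -> gemini
```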
Krish Dholakia
9c32525c17 build: update model in test (#10706) 2025-05-09 13:33:11 -07:00
Krrish Dholakia
96e31edad3 build(proxy_server_config.yaml): move to model with higher quota 2025-05-08 22:18:27 -07:00
Ishaan Jaff
97d7a5e78e fix deployment name 2025-04-19 09:23:22 -07:00
Ishaan Jaff
8a1023fa2d test image gen fix in build and test 2025-04-02 21:33:24 -07:00
Ishaan Jaff
6b3bfa2b42 (Feat) - return x-litellm-attempted-fallbacks in responses from litellm proxy (#8558)
* add_fallback_headers_to_response

* test x-litellm-attempted-fallbacks

* unit test attempted fallbacks

* fix add_fallback_headers_to_response

* docs document response headers

* fix file name
2025-02-15 14:54:23 -08:00
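
The commit above adds an `x-litellm-attempted-fallbacks` header to proxy responses. A minimal sketch of inspecting it from a client, assuming a locally running proxy at http://localhost:4000 with a placeholder key:

```python
import requests

resp = requests.post(
    "http://localhost:4000/v1/chat/completions",  # placeholder proxy URL
    headers={"Authorization": "Bearer sk-1234"},   # placeholder key
    json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]},
)

# Header documented by this PR: how many fallback deployments were attempted.
print(resp.headers.get("x-litellm-attempted-fallbacks"))
```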
Krish Dholakia
6bafdbc546 Litellm dev 01 25 2025 p4 (#8006)
* feat(main.py): use asyncio.sleep for mock_Timeout=true on async request

Adds unit testing to ensure the proxy does not fail if specific OpenAI requests hang (e.g. the recent o1 outage)

* fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming

Fixes https://github.com/BerriAI/litellm/issues/7942

* Revert "fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming"

This reverts commit 7a052a64e3.

* fix(deepseek-r-1): return reasoning_content as a top-level param

ensures compatibility with existing tools that use it

* fix: fix linting error
2025-01-26 08:01:05 -08:00
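
The first bullet above moves the mocked-timeout path onto asyncio.sleep so a simulated hang suspends only that coroutine instead of blocking the event loop. A minimal sketch of the idea; the function below is illustrative and not the actual main.py code:

```python
import asyncio

async def mock_completion(mock_timeout: bool = False, timeout: float = 5.0) -> dict:
    # time.sleep() here would stall every in-flight request on the proxy;
    # asyncio.sleep() suspends only this coroutine while others keep running.
    if mock_timeout:
        await asyncio.sleep(timeout)
        raise TimeoutError("mocked upstream timeout")
    return {"choices": [{"message": {"content": "ok"}}]}

async def main() -> None:
    results = await asyncio.gather(
        mock_completion(mock_timeout=True, timeout=0.1),
        mock_completion(),
        return_exceptions=True,
    )
    print(results)  # [TimeoutError(...), {'choices': [...]}]

asyncio.run(main())
```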
Krish Dholakia
08b124aeb6 Litellm dev 01 25 2025 p2 (#8003)
* fix(base_utils.py): support nested json schemas passed in for anthropic calls

* refactor(base_utils.py): refactor ref parsing to prevent infinite loop

* test(test_openai_endpoints.py): refactor anthropic test to use bedrock

* fix(langfuse_prompt_management.py): add unit test for sync langfuse calls

Resolves https://github.com/BerriAI/litellm/issues/7938#issuecomment-2613293757
2025-01-25 16:50:57 -08:00
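
The second bullet refactors `$ref` parsing so a self-referential schema cannot recurse forever. A minimal sketch of the general technique (tracking refs already seen on the current path); the function and schema layout are assumptions, not the actual base_utils.py code:

```python
from typing import Optional

def resolve_refs(schema: dict, defs: dict, seen: Optional[frozenset] = None) -> dict:
    # Expand $ref entries, refusing to follow the same ref twice on one path
    # so cyclic / self-referential schemas cannot cause an infinite loop.
    seen = seen or frozenset()
    if "$ref" in schema:
        ref = schema["$ref"]
        if ref in seen:
            return {}  # cycle detected: stop expanding instead of recursing forever
        seen = seen | {ref}
        schema = defs[ref.split("/")[-1]]
    return {
        key: resolve_refs(value, defs, seen) if isinstance(value, dict) else value
        for key, value in schema.items()
    }

defs = {"Node": {"type": "object", "properties": {"child": {"$ref": "#/$defs/Node"}}}}
print(resolve_refs({"$ref": "#/$defs/Node"}, defs))
# -> {'type': 'object', 'properties': {'child': {}}}
```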
Krish Dholakia
513b1904ab Add attempted-retries and timeout values to response headers + more testing (#7926)
* feat(router.py): add retry headers to response

makes it easy to add testing to ensure model-specific retries are respected

* fix(add_retry_headers.py): clarify attempted retries vs. max retries

* test(test_fallbacks.py): add test for checking if max retries set for model is respected

* test(test_fallbacks.py): assert values for attempted retries and max retries are as expected

* fix(utils.py): return timeout in litellm proxy response headers

* test(test_fallbacks.py): add test to assert model specific timeout used on timeout error

* test: add bad model with timeout to proxy

* fix: fix linting error

* fix(router.py): fix get model list from model alias

* test: loosen test restriction - account for other events on proxy
2025-01-22 22:19:44 -08:00
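
The commit above surfaces attempted retries and the timeout value in the proxy's response headers so tests can assert that model-specific settings are respected. A minimal sketch of such a check; the exact header names below are assumptions inferred from the x-litellm-* convention, so verify them against the headers your proxy actually returns:

```python
import requests

resp = requests.post(
    "http://localhost:4000/v1/chat/completions",  # placeholder proxy URL
    headers={"Authorization": "Bearer sk-1234"},   # placeholder key
    json={"model": "bad-model", "messages": [{"role": "user", "content": "hi"}]},
)

# Assumed header names (x-litellm-* convention); adjust to what the proxy emits.
print(resp.headers.get("x-litellm-attempted-retries"))
print(resp.headers.get("x-litellm-max-retries"))
print(resp.headers.get("x-litellm-timeout"))
```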
Krish Dholakia
3a7b13efa2 feat(health_check.py): set upperbound for api when making health check call (#7865)
* feat(health_check.py): set upperbound for api when making health check call

prevents a bad model in the health check from hanging and causing pod restarts

* fix(health_check.py): cleanup task once completed

* fix(constants.py): bump default health check timeout to 1min

* docs(health.md): add 'health_check_timeout' to health docs on litellm

* build(proxy_server_config.yaml): add bad model to health check
2025-01-18 19:47:43 -08:00
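
The first bullet puts an upper bound on each provider call made during a health check so one hanging model cannot stall the whole check (and trigger pod restarts). A minimal sketch of that pattern with asyncio.wait_for; the names and 60-second value are illustrative, matching the bumped 1-minute default mentioned above:

```python
import asyncio

HEALTH_CHECK_TIMEOUT = 60  # seconds; the commit bumps the default to 1 min

async def check_deployment(model_name: str) -> dict:
    # Stand-in for a real provider health probe.
    await asyncio.sleep(0.1)
    return {"model": model_name, "healthy": True}

async def safe_health_check(model_name: str) -> dict:
    # asyncio.wait_for cancels the underlying probe once the deadline passes,
    # so a bad model yields an unhealthy result instead of hanging the endpoint.
    try:
        return await asyncio.wait_for(check_deployment(model_name), timeout=HEALTH_CHECK_TIMEOUT)
    except asyncio.TimeoutError:
        return {"model": model_name, "healthy": False, "error": "health check timed out"}

print(asyncio.run(safe_health_check("bad-model")))
```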
Ishaan Jaff
47e12802df (feat) /batches Add support for using /batches endpoints in OAI format (#7402)
* run azure testing on ci/cd

* update docs on azure batches endpoints

* add input azure.jsonl

* refactor - use separate file for batches endpoints

* fixes for passing custom llm provider to /batch endpoints

* pass custom llm provider to files endpoints

* update azure batches doc

* add info for azure batches api

* update batches endpoints

* use simple helper for raising proxy exception

* update config.yml

* fix imports

* update tests

* use existing settings

* update env var used

* update configs

* update config.yml

* update ft testing
2024-12-24 16:58:05 -08:00
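
The commit above wires /batches through the proxy in the OpenAI format. A minimal sketch using the OpenAI SDK pointed at a locally running proxy; the URL, key, and file name are placeholders:

```python
from openai import OpenAI

# Point the standard OpenAI client at the LiteLLM proxy (placeholder URL/key).
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# Upload a batch input file in OpenAI's JSONL format.
batch_file = client.files.create(file=open("azure.jsonl", "rb"), purpose="batch")

# Create the batch via the OpenAI-compatible /batches endpoint.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)
```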
Krish Dholakia
4ac66bd843 LiteLLM Minor Fixes and Improvements (09/07/2024) (#5580)
* fix(litellm_logging.py): set completion_start_time_float to end_time_float if none

Fixes https://github.com/BerriAI/litellm/issues/5500

* feat(__init__.py): add new 'openai_text_completion_compatible_providers' list

Fixes https://github.com/BerriAI/litellm/issues/5558

Correctly handles routing Fireworks AI calls made via text completions

* fix: fix linting errors

* fix: fix linting errors

* fix(openai.py): fix exception raised

* fix(openai.py): fix error handling

* fix(_redis.py): allow all supported arguments for redis cluster (#5554)

* Revert "fix(_redis.py): allow all supported arguments for redis cluster (#5554)" (#5583)

This reverts commit f2191ef4cb.

* fix(router.py): return model alias w/ underlying deployment on router.get_model_list()

Fixes https://github.com/BerriAI/litellm/issues/5524#issuecomment-2336410666

* test: handle flaky tests

---------

Co-authored-by: Jonas Dittrich <58814480+Kakadus@users.noreply.github.com>
2024-09-09 18:54:17 -07:00
Krrish Dholakia
0a016d33e6 Revert "fix(router.py): return model alias w/ underlying deployment on router.get_model_list()"
This reverts commit 638896309c.
2024-09-07 18:04:56 -07:00
Krrish Dholakia
638896309c fix(router.py): return model alias w/ underlying deployment on router.get_model_list()
Fixes https://github.com/BerriAI/litellm/issues/5524#issuecomment-2336410666
2024-09-07 18:01:31 -07:00
Ishaan Jaff
f1ffa82062 fix use provider specific routing 2024-08-07 14:37:20 -07:00
Ishaan Jaff
404360b28d test pass through endpoint 2024-08-06 12:16:00 -07:00
Ishaan Jaff
b35c63001d fix setup for endpoints 2024-07-31 17:09:08 -07:00
Ishaan Jaff
c8dfc95e90 add examples on config 2024-07-31 15:29:06 -07:00
Ishaan Jaff
9863520376 support using */* 2024-07-25 18:48:56 -07:00
Ishaan Jaff
e2397c3b83 fix test_team_2logging langfuse 2024-06-19 21:14:18 -07:00
Ishaan Jaff
d409ffbaa9 fix test_chat_completion_different_deployments 2024-06-17 23:04:48 -07:00
Ishaan Jaff
cb386fda20 test - making mistral embedding request on proxy 2024-06-12 15:10:20 -07:00
Marc Abramowitz
83c242bbb3 Add commented set_verbose line to proxy_config
because I've wanted to do this a couple of times and couldn't remember
the exact syntax.
2024-05-16 15:59:37 -07:00
Krrish Dholakia
54587db402 fix(alerting.py): fix datetime comparison logic 2024-05-14 22:10:09 -07:00
Ishaan Jaff
9bde3ccd1d (ci/cd) fixes 2024-05-13 20:49:02 -07:00
Krrish Dholakia
99e8f0715e test(test_end_users.py): fix end user region routing test 2024-05-11 22:42:43 -07:00
Ishaan Jaff
9c4f1ec3e5 fix - failing test_end_user_specific_region test 2024-05-11 17:05:37 -07:00
Ishaan Jaff
a4695c3010 test - using langfuse as a failure callback 2024-05-10 17:37:32 -07:00
Krrish Dholakia
3d18897d69 feat(router.py): enable filtering model group by 'allowed_model_region' 2024-05-08 22:10:17 -07:00
Ishaan Jaff
6a06aba443 (ci/cd) use db connection limit 2024-05-06 11:15:22 -07:00
Ishaan Jaff
e8d3dd475a fix fake endpoint used on ci/cd 2024-05-06 10:37:39 -07:00
Ishaan Jaff
56a75ee7fe (ci/cd) fix tests 2024-05-01 13:42:54 -07:00
Krrish Dholakia
d4bca6707b ci(proxy_server_config.yaml): use redis for usage-based-routing-v2 2024-04-22 13:34:36 -07:00
Krrish Dholakia
1507b23e30 test(test_openai_endpoints.py): make test stricter 2024-04-20 12:11:54 -07:00
Krrish Dholakia
01a1a8f731 fix(caching.py): dual cache async_batch_get_cache fix + testing
This fixes a bug in usage-based-routing-v2 that was caused by how the result was being returned from the dual cache's async_batch_get_cache. It also adds unit testing for that function (and its sync equivalent).
2024-04-19 15:03:25 -07:00
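
The fix above concerns the shape of what the dual cache's async_batch_get_cache returns to usage-based-routing-v2. A minimal sketch of the dual-cache batch-get idea (fast in-memory layer first, shared layer only for the misses, results kept aligned with the requested keys); the class and method names are illustrative, not litellm's caching.py:

```python
import asyncio
from typing import Any, List, Optional

class DualCacheSketch:
    # Illustrative only: in-memory layer backed by a slower shared layer.
    def __init__(self) -> None:
        self.local: dict = {}
        self.shared: dict = {}  # stand-in for Redis

    async def async_batch_get(self, keys: List[str]) -> List[Optional[Any]]:
        # Start from the fast in-memory layer.
        results: List[Optional[Any]] = [self.local.get(k) for k in keys]
        # Fall back to the shared layer only for the misses, writing hits back
        # into the same positions so the returned list stays aligned with keys.
        for i, key in enumerate(keys):
            if results[i] is None:
                results[i] = self.shared.get(key)
                if results[i] is not None:
                    self.local[key] = results[i]
        return results

async def main() -> None:
    cache = DualCacheSketch()
    cache.shared["user:rpm"] = 3
    print(await cache.async_batch_get(["user:rpm", "missing"]))  # [3, None]

asyncio.run(main())
```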
Ishaan Jaff
adae555fb1 Merge branch 'main' into litellm_fix_using_wildcard_openai_models_proxy 2024-04-15 14:35:06 -07:00
Ishaan Jaff
6df5337e65 test - wildcard openai models on proxy 2024-04-15 14:05:26 -07:00
Ishaan Jaff
ecc6aa060f test - team based logging on proxy 2024-04-15 13:26:55 -07:00
Krrish Dholakia
ea1574c160 test(test_openai_endpoints.py): add concurrency testing for user defined rate limits on proxy 2024-04-12 18:56:13 -07:00
Krrish Dholakia
74aa230eac fix(main.py): automatically infer mode for text completion models 2024-04-12 14:16:21 -07:00
Krrish Dholakia
3665b890f8 build(proxy_server_config.yaml): cleanup config 2024-04-11 20:20:09 -07:00
Krrish Dholakia
bdfb74f8a5 test(test_openai_endpoints.py): add local test, for proxy concurrency 2024-04-11 17:16:23 -07:00