litellm

mirror of https://github.com/BerriAI/litellm.git synced 2025-12-06 11:33:26 +08:00

Author	SHA1	Message	Date
abi_jey	a40c6ae0c0	fix: remove the unnecessary config changes	2025-11-26 13:33:52 +00:00
abi_jey	98344417ab	fix: tested e2e implementation and added sample config.	2025-11-26 12:37:13 +00:00
Ishaan Jaffer	d6b0c11d5b	test fixes, fk azure	2025-10-25 17:15:52 -07:00
Ishaan Jaffer	6ac21ddcec	fix build and test gpt-3.5-turbo	2025-10-25 16:39:14 -07:00
Ishaan Jaffer	a6b6e56246	fixes azure	2025-10-25 15:54:30 -07:00
Sameer Kankute	b9585b1db5	Update documentation for enable_caching_on_provider_specific_optional_params (#15885 )	2025-10-24 10:22:27 -07:00
Sameer Kankute	dce6cd1051	Add shared healthcheck	2025-10-09 22:18:05 +05:30
Ishaan Jaffer	4054eeea20	test build and test	2025-09-27 09:26:38 -07:00
Ishaan Jaff	9761ba7c7a	[Bug Fix] Responses api session management for streaming responses (#13396 ) * fix proxy config * fix(responses api): fix streaming ID consistency and tool format handling (#12640) * fix(responses): ensure streaming chunk IDs use consistent encoding format Fixes streaming ID inconsistency where streaming responses used raw provider IDs while non-streaming responses used properly encoded IDs with provider context. Changes: - Updated LiteLLMCompletionStreamingIterator to accept provider context - Added _encode_chunk_id() method using same logic as non-streaming responses - Modified chunk transformation to encode all streaming item_ids with resp_ prefix - Updated handlers to pass custom_llm_provider and litellm_metadata to streaming iterator Impact: - Streaming chunk IDs now format: resp_<base64_encoded_provider_context> - Enables session continuity when using streaming response IDs as previous_response_id - Allows provider detection and load balancing with streaming responses - Maintains backward compatibility with existing streaming functionality 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(types): add explicit Optional[str] type annotation for model_id This resolves MyPy type checking error where model_id could be None but wasn't explicitly typed as Optional[str]. * fix(types): handle None case for litellm_metadata access Prevents 'Item None has no attribute get' error by checking for None before accessing litellm_metadata dictionary. * test: add comprehensive tests for streaming ID consistency Adds unit and E2E tests to verify streaming chunk IDs are properly encoded with consistent format across streaming responses. ## Tests Added ### Unit Test (test_reasoning_content_transformation.py) - `test_streaming_chunk_id_encoding()`: Validates the `_encode_chunk_id()` method correctly encodes chunk IDs with `resp_` prefix and provider context ### E2E Tests (test_e2e_openai_responses_api.py) - `test_streaming_id_consistency_across_chunks()`: Tests that all streaming chunk IDs are properly encoded across multiple chunks in a real streaming response - `test_streaming_response_id_as_previous_response_id()`: Tests the core use case - using streaming response IDs for session continuity with `previous_response_id` ## Key Testing Approach - Uses Gemini (non-OpenAI model) to test the transformation logic rather than OpenAI passthrough, since the streaming ID consistency issue occurs when LiteLLM transforms responses rather than just passing through to native OpenAI responses API - Tests validate that streaming chunk IDs now use same encoding as non-streaming responses - Verifies session continuity works with streaming responses Addresses @ishaan-jaff's request for unit tests covering the streaming ID consistency fix. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(lint): remove unused imports in transformation.py Removes unused imports to fix CI linting errors: - GenericResponseOutputItem - OutputFunctionToolCall * test: remove E2E tests from openai_endpoints_tests Remove streaming ID consistency E2E tests as requested by @ishaan-jaff. Keep only the mock/unit test in test_reasoning_content_transformation.py * revert: remove streaming chunk ID encoding to original behavior This reverts the streaming chunk ID encoding changes to understand the original issue better. Original behavior was: - Streaming chunks: raw provider IDs - Streaming final response: raw IDs (PROBLEM!) - Non-streaming final response: encoded IDs (correct) The real issue: streaming final response IDs were not encoded, breaking session continuity. * fix(responses): encode streaming final response IDs to match OpenAI behavior Fixes streaming ID inconsistency to match OpenAI's Responses API behavior: - Streaming chunks: raw message IDs (like OpenAI's msg_xxx) - Final response: encoded IDs (like OpenAI's resp_xxx) This enables session continuity by ensuring streaming final response IDs have the same encoded format as non-streaming responses, allowing them to be used as previous_response_id in follow-up requests. Changes: - Add custom_llm_provider and litellm_metadata to LiteLLMCompletionStreamingIterator - Update handlers to pass provider context to streaming iterator - Apply _update_responses_api_response_id_with_model_id to final streaming response - Keep streaming chunks as raw IDs to match OpenAI format Impact: - Session continuity works with streaming responses - Load balancing can detect provider from streaming final response IDs - Format matches OpenAI's Responses API exactly 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * test: update unit test to match correct OpenAI-compatible behavior Updates the unit test to verify streaming chunk IDs are raw (not encoded) to match OpenAI's responses API format: - Streaming chunks: raw message IDs (like msg_xxx) - Final response: encoded IDs (like resp_xxx) This reflects the correct behavior implemented in the fix. --------- Co-authored-by: Claude <noreply@anthropic.com> * cleanup * TestBaseResponsesAPIStreamingIterator --------- Co-authored-by: Javier de la Torre <jatorre@carto.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-08-07 20:13:24 -07:00
Krish Dholakia	9c32525c17	build: update model in test (#10706 )	2025-05-09 13:33:11 -07:00
Krrish Dholakia	96e31edad3	build(proxy_server_config.yaml): move to model with higher quota	2025-05-08 22:18:27 -07:00
Ishaan Jaff	97d7a5e78e	fix deployment name	2025-04-19 09:23:22 -07:00
Ishaan Jaff	8a1023fa2d	test image gen fix in build and test	2025-04-02 21:33:24 -07:00
Ishaan Jaff	6b3bfa2b42	(Feat) - return `x-litellm-attempted-fallbacks` in responses from litellm proxy (#8558 ) * add_fallback_headers_to_response * test x-litellm-attempted-fallbacks * unit test attempted fallbacks * fix add_fallback_headers_to_response * docs document response headers * fix file name	2025-02-15 14:54:23 -08:00
Krish Dholakia	6bafdbc546	Litellm dev 01 25 2025 p4 (#8006 ) * feat(main.py): use asyncio.sleep for mock_Timeout=true on async request adds unit testing to ensure proxy does not fail if specific Openai requests hang (e.g. recent o1 outage) * fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming Fixes https://github.com/BerriAI/litellm/issues/7942 * Revert "fix(streaming_handler.py): fix deepseek r1 return reasoning content on streaming" This reverts commit `7a052a64e3`. * fix(deepseek-r-1): return reasoning_content as a top-level param ensures compatibility with existing tools that use it * fix: fix linting error	2025-01-26 08:01:05 -08:00
Krish Dholakia	08b124aeb6	Litellm dev 01 25 2025 p2 (#8003 ) * fix(base_utils.py): supported nested json schema passed in for anthropic calls * refactor(base_utils.py): refactor ref parsing to prevent infinite loop * test(test_openai_endpoints.py): refactor anthropic test to use bedrock * fix(langfuse_prompt_management.py): add unit test for sync langfuse calls Resolves https://github.com/BerriAI/litellm/issues/7938#issuecomment-2613293757	2025-01-25 16:50:57 -08:00
Krish Dholakia	513b1904ab	Add `attempted-retries` and `timeout` values to response headers + more testing (#7926 ) * feat(router.py): add retry headers to response makes it easy to add testing to ensure model-specific retries are respected * fix(add_retry_headers.py): clarify attempted retries vs. max retries * test(test_fallbacks.py): add test for checking if max retries set for model is respected * test(test_fallbacks.py): assert values for attempted retries and max retries are as expected * fix(utils.py): return timeout in litellm proxy response headers * test(test_fallbacks.py): add test to assert model specific timeout used on timeout error * test: add bad model with timeout to proxy * fix: fix linting error * fix(router.py): fix get model list from model alias * test: loosen test restriction - account for other events on proxy	2025-01-22 22:19:44 -08:00
Krish Dholakia	3a7b13efa2	feat(health_check.py): set upperbound for api when making health check call (#7865 ) * feat(health_check.py): set upperbound for api when making health check call prevent bad model from health check to hang and cause pod restarts * fix(health_check.py): cleanup task once completed * fix(constants.py): bump default health check timeout to 1min * docs(health.md): add 'health_check_timeout' to health docs on litellm * build(proxy_server_config.yaml): add bad model to health check	2025-01-18 19:47:43 -08:00
Ishaan Jaff	47e12802df	(feat) `/batches` Add support for using `/batches` endpoints in OAI format (#7402 ) * run azure testing on ci/cd * update docs on azure batches endpoints * add input azure.jsonl * refactor - use separate file for batches endpoints * fixes for passing custom llm provider to /batch endpoints * pass custom llm provider to files endpoints * update azure batches doc * add info for azure batches api * update batches endpoints * use simple helper for raising proxy exception * update config.yml * fix imports * update tests * use existing settings * update env var used * update configs * update config.yml * update ft testing	2024-12-24 16:58:05 -08:00
Krish Dholakia	4ac66bd843	LiteLLM Minor Fixes and Improvements (09/07/2024) (#5580 ) * fix(litellm_logging.py): set completion_start_time_float to end_time_float if none Fixes https://github.com/BerriAI/litellm/issues/5500 * feat(_init_.py): add new 'openai_text_completion_compatible_providers' list Fixes https://github.com/BerriAI/litellm/issues/5558 Handles correctly routing fireworks ai calls when done via text completions * fix: fix linting errors * fix: fix linting errors * fix(openai.py): fix exception raised * fix(openai.py): fix error handling * fix(_redis.py): allow all supported arguments for redis cluster (#5554) * Revert "fix(_redis.py): allow all supported arguments for redis cluster (#5554)" (#5583) This reverts commit `f2191ef4cb`. * fix(router.py): return model alias w/ underlying deployment on router.get_model_list() Fixes https://github.com/BerriAI/litellm/issues/5524#issuecomment-2336410666 * test: handle flaky tests --------- Co-authored-by: Jonas Dittrich <58814480+Kakadus@users.noreply.github.com>	2024-09-09 18:54:17 -07:00
Krrish Dholakia	0a016d33e6	Revert "fix(router.py): return model alias w/ underlying deployment on router.get_model_list()" This reverts commit `638896309c`.	2024-09-07 18:04:56 -07:00
Krrish Dholakia	638896309c	fix(router.py): return model alias w/ underlying deployment on router.get_model_list() Fixes https://github.com/BerriAI/litellm/issues/5524#issuecomment-2336410666	2024-09-07 18:01:31 -07:00
Ishaan Jaff	f1ffa82062	fix use provider specific routing	2024-08-07 14:37:20 -07:00
Ishaan Jaff	404360b28d	test pass through endpoint	2024-08-06 12:16:00 -07:00
Ishaan Jaff	b35c63001d	fix setup for endpoints	2024-07-31 17:09:08 -07:00
Ishaan Jaff	c8dfc95e90	add examples on config	2024-07-31 15:29:06 -07:00
Ishaan Jaff	9863520376	support using /	2024-07-25 18:48:56 -07:00
Ishaan Jaff	e2397c3b83	fix test_team_2logging langfuse	2024-06-19 21:14:18 -07:00
Ishaan Jaff	d409ffbaa9	fix test_chat_completion_different_deployments	2024-06-17 23:04:48 -07:00
Ishaan Jaff	cb386fda20	test - making mistral embedding request on proxy	2024-06-12 15:10:20 -07:00
Marc Abramowitz	83c242bbb3	Add commented set_verbose line to proxy_config because I've wanted to do this a couple of times and couldn't remember the exact syntax.	2024-05-16 15:59:37 -07:00
Krrish Dholakia	54587db402	fix(alerting.py): fix datetime comparison logic	2024-05-14 22:10:09 -07:00
Ishaan Jaff	9bde3ccd1d	(ci/cd) fixes	2024-05-13 20:49:02 -07:00
Krrish Dholakia	99e8f0715e	test(test_end_users.py): fix end user region routing test	2024-05-11 22:42:43 -07:00
Ishaan Jaff	9c4f1ec3e5	fix - failing test_end_user_specific_region test	2024-05-11 17:05:37 -07:00
Ishaan Jaff	a4695c3010	test - using langfuse as a failure callback	2024-05-10 17:37:32 -07:00
Krrish Dholakia	3d18897d69	feat(router.py): enable filtering model group by 'allowed_model_region'	2024-05-08 22:10:17 -07:00
Ishaan Jaff	6a06aba443	(ci/cd) use db connection limit	2024-05-06 11:15:22 -07:00
Ishaan Jaff	e8d3dd475a	fix fake endpoint used on ci/cd	2024-05-06 10:37:39 -07:00
Ishaan Jaff	56a75ee7fe	(ci/cd) fix tests	2024-05-01 13:42:54 -07:00
Krrish Dholakia	d4bca6707b	ci(proxy_server_config.yaml): use redis for usage-based-routing-v2	2024-04-22 13:34:36 -07:00
Krrish Dholakia	1507b23e30	test(test_openai_endpoints.py): make test stricter	2024-04-20 12:11:54 -07:00
Krrish Dholakia	01a1a8f731	fix(caching.py): dual cache async_batch_get_cache fix + testing this fixes a bug in usage-based-routing-v2 which was caused b/c of how the result was being returned from dual cache async_batch_get_cache. it also adds unit testing for that function (and it's sync equivalent)	2024-04-19 15:03:25 -07:00
Ishaan Jaff	adae555fb1	Merge branch 'main' into litellm_fix_using_wildcard_openai_models_proxy	2024-04-15 14:35:06 -07:00
Ishaan Jaff	6df5337e65	test - wildcard openai models on proxy	2024-04-15 14:05:26 -07:00
Ishaan Jaff	ecc6aa060f	test - team based logging on proxy	2024-04-15 13:26:55 -07:00
Krrish Dholakia	ea1574c160	test(test_openai_endpoints.py): add concurrency testing for user defined rate limits on proxy	2024-04-12 18:56:13 -07:00
Krrish Dholakia	74aa230eac	fix(main.py): automatically infer mode for text completion models	2024-04-12 14:16:21 -07:00
Krrish Dholakia	3665b890f8	build(proxy_server_config.yaml): cleanup config	2024-04-11 20:20:09 -07:00
Krrish Dholakia	bdfb74f8a5	test(test_openai_endpoints.py): add local test, for proxy concurrency	2024-04-11 17:16:23 -07:00

1 2

85 Commits