fix(bedrock): force tool cachePoint via cache_control_injection_points
Bug
LiteLLM 1.83.0's Bedrock Converse adapter reads cache_control on system content blocks but ignores cache_control on individual tool dicts. Only the native Anthropic adapter inspects tool dicts.
Result: the 16k tokens of tool definitions are re-billed at full input price on every Bedrock turn instead of hitting the cache_read tier (10x cheaper).
Confirmed by reading litellm/llms/bedrock/chat/converse_transformation.py: the tool path appends a cachePoint only when the kwarg cache_control_injection_points=[{location: tool_config}] is passed at the top level. The cache_control on the last tool dict (set by agent/core/prompt_caching.py for the Anthropic native path) is silently dropped.
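To make the dropped-key behavior concrete, here is a toy model of the Converse tool path as described above. It is not LiteLLM's code: `build_tool_config` is a hypothetical name, and the OpenAI-style tool dict shape (with `cache_control` as a top-level key on the tool) is an assumption. The output follows the Bedrock Converse `toolConfig` shape, where a cache checkpoint is a separate `{"cachePoint": {"type": "default"}}` entry in the tools list.

```python
from typing import Any, Dict, List, Optional


def build_tool_config(
    tools: List[Dict[str, Any]],
    injection_points: Optional[List[Dict[str, str]]] = None,
) -> Dict[str, Any]:
    """Toy model (hypothetical) of the Converse tool translation, not LiteLLM's code."""
    converse_tools: List[Dict[str, Any]] = []
    for tool in tools:
        fn = tool["function"]
        # Only name, description and schema are carried over. A per-tool
        # "cache_control" key is never read, so it is silently dropped.
        converse_tools.append({
            "toolSpec": {
                "name": fn["name"],
                "description": fn.get("description", ""),
                "inputSchema": {"json": fn.get("parameters", {})},
            }
        })
    # A cachePoint is appended only when the top-level kwarg requests it.
    if any(p.get("location") == "tool_config" for p in (injection_points or [])):
        converse_tools.append({"cachePoint": {"type": "default"}})
    return {"tools": converse_tools}
```

With a tool dict that carries `cache_control`, the first call below yields no cachePoint; only passing `[{"location": "tool_config"}]` adds one.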
Fix
Pass cache_control_injection_points=[{location: tool_config}] to the Converse adapter from _resolve_llm_params for any bedrock/... model. System-prompt caching still works via cache_control on system content blocks (Converse reads those).
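A minimal sketch of the fix, assuming `_resolve_llm_params` returns a dict of kwargs that is later passed to `litellm.completion(...)`. The function below is a hypothetical stand-in (its name and signature are not taken from the actual diff); it adds the `tool_config` injection point for `bedrock/...` models without duplicating one the caller already supplied, and leaves other providers untouched.

```python
from typing import Any, Dict, List


def resolve_llm_params(model: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for _resolve_llm_params (not the real signature)."""
    resolved = dict(params)  # shallow copy so the caller's dict is not mutated
    if model.startswith("bedrock/"):
        points: List[Dict[str, str]] = list(
            resolved.get("cache_control_injection_points") or []
        )
        # Ask the Converse adapter to append a cachePoint to toolConfig.
        if not any(p.get("location") == "tool_config" for p in points):
            points.append({"location": "tool_config"})
        resolved["cache_control_injection_points"] = points
    return resolved
```

The resulting dict would then be splatted into the completion call alongside `model`, `messages` and `tools`. Keeping the check on the `bedrock/` prefix means the Anthropic-direct path never sees the extra kwarg.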
Expected impact
Cache-read ratio on Opus 4.6 should jump from ~17% (Bedrock auto-prefix caching only) towards 50-70% on multi-turn sessions, cutting the input bill ~25-35% (= roughly $2-3k/day at current run-rate).
Test plan
- Deploy
- Compare the CloudWatch CacheReadInputTokenCount / InputTokenCount ratio on us.anthropic.claude-opus-4-6-v1 before vs. 24h after deploy. Expect at least a 2x improvement.
- Verify no regression on the Anthropic-direct path (anthropic/claude-...), since that branch is untouched.
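The pass/fail check in the second test-plan step reduces to comparing two ratios. A minimal sketch, assuming the inputs are per-window `Sum` values of the two CloudWatch metrics (e.g. fetched with GetMetricStatistics in the `AWS/Bedrock` namespace, filtered by model ID); the helper names and the sample numbers in the test are hypothetical.

```python
from typing import Tuple


def cache_read_ratio(cache_read_tokens: int, input_tokens: int) -> float:
    """CacheReadInputTokenCount / InputTokenCount for one time window."""
    if input_tokens <= 0:
        raise ValueError("InputTokenCount sum must be positive")
    return cache_read_tokens / input_tokens


def meets_target(
    before: Tuple[int, int], after: Tuple[int, int], factor: float = 2.0
) -> bool:
    """True if the post-deploy ratio is at least `factor` times the baseline.

    Each tuple is (CacheReadInputTokenCount sum, InputTokenCount sum).
    """
    return cache_read_ratio(*after) >= factor * cache_read_ratio(*before)
```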
Diff: 1 file, +12 / -1.
Closing: local test shows no measurable cache benefit on top of the existing system-block cache_control.
| Scenario | cache_creation (tokens) | cache_read (tokens) |
|---|---|---|
| system cache_control only | 11550 | 11550 |
| system + tool inject (this PR) | 11546 | 11546 |
| tool inject only (no system cache) | 0 | 5521 |
Bedrock's system cachePoint already extends through the deterministic prefix (system + tools). The tool injection only helps when system cache_control is absent, which prompt_caching.py rules out for "anthropic" models (the check matches bedrock/us.anthropic.*).
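Why the extra tool cachePoint adds nothing can be seen with a toy prefix-cache model. It assumes the prompt is laid out as tools, then system, then messages (the order Anthropic documents for prompt caching; that Bedrock follows it is an assumption consistent with the observation above), and that the cache covers everything up to and including the last checkpoint. The section sizes are hypothetical, not the measured values from the table.

```python
from typing import Iterable, List, Tuple

# Hypothetical section sizes in tokens, in assumed prompt order.
PROMPT: List[Tuple[str, int]] = [("tools", 6000), ("system", 5500), ("messages", 900)]


def cached_prefix_tokens(prompt: List[Tuple[str, int]], breakpoints: Iterable[str]) -> int:
    """Tokens covered by the cache: the prefix up to and including the last checkpoint."""
    marks = set(breakpoints)
    running = 0
    covered = 0
    for section, tokens in prompt:
        running += tokens
        if section in marks:
            covered = running  # a later checkpoint supersedes earlier ones
    return covered
```

Under this model a system checkpoint alone already covers tools + system, adding a tool checkpoint on top leaves coverage unchanged, and a tool checkpoint alone covers only the tools, which matches the shape of the results above.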
See huggingface/ml-intern#154 for full details.