fix(bedrock): force tool cachePoint via cache_control_injection_points

#24
by GuillaumeSalouHF HF Staff - opened
smolagents org

Bug

LiteLLM 1.83.0's Bedrock Converse adapter reads cache_control blocks on system content blocks but ignores cache_control on individual tool dicts. Only the native Anthropic adapter inspects tool dicts.

Result: the 16k tokens of tool definitions are re-billed at full input price on every Bedrock turn instead of hitting the cache_read tier (10x cheaper).

Confirmed by reading litellm/llms/bedrock/chat/converse_transformation.py: the tool path appends a cachePoint only when the kwarg cache_control_injection_points=[{location: tool_config}] is passed top-level. The cache_control on the last tool dict (set by agent/core/prompt_caching.py for the native Anthropic path) is silently dropped.
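For illustration, the two shapes involved (tool schema simplified; only the second is read on the Converse path):

```python
# Per-tool cache_control, as set by agent/core/prompt_caching.py.
# Honored by the native Anthropic adapter, silently dropped by Converse:
tools = [{
    "type": "function",
    "function": {"name": "search", "parameters": {"type": "object", "properties": {}}},
    "cache_control": {"type": "ephemeral"},
}]

# Top-level kwarg the Converse adapter actually inspects:
kwargs = {"cache_control_injection_points": [{"location": "tool_config"}]}
```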

Fix

Pass cache_control_injection_points=[{location: tool_config}] to the Converse adapter from _resolve_llm_params for any bedrock/... model. System-prompt caching still works via cache_control on system content blocks (Converse reads those).
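A minimal sketch of the change, assuming `_resolve_llm_params` returns a kwargs dict that is forwarded to the LiteLLM call (function and parameter names here are illustrative, not the exact code in the diff):

```python
def resolve_llm_params(model_id: str, base_params: dict) -> dict:
    """Build LiteLLM kwargs, forcing a tool cachePoint on the Bedrock Converse path."""
    params = dict(base_params)
    if model_id.startswith("bedrock/"):
        # The Converse adapter appends a cachePoint to toolConfig only when
        # this top-level kwarg is present; per-tool cache_control is ignored.
        params["cache_control_injection_points"] = [{"location": "tool_config"}]
    return params
```

The Anthropic-direct branch is untouched: no `bedrock/` prefix, no injected kwarg.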

Expected impact

Cache-read ratio on Opus 4.6 should jump from ~17% (Bedrock auto-prefix caching only) towards 50-70% on multi-turn sessions, cutting the input bill ~25-35% (= roughly $2-3k/day at current run-rate).
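The estimate can be sanity-checked with a quick model, assuming cache reads bill at one tenth of the full input rate and ignoring any cache-write premium:

```python
def input_cost_factor(cache_read_ratio: float, read_discount: float = 0.1) -> float:
    """Relative input cost when a fraction of input tokens hits the cache-read tier."""
    return (1 - cache_read_ratio) + read_discount * cache_read_ratio

def savings(r_before: float, r_after: float) -> float:
    """Fractional cut to the input bill when the cache-read ratio improves."""
    return 1 - input_cost_factor(r_after) / input_cost_factor(r_before)

# ~17% -> 50% cache-read ratio cuts the input bill by roughly 35%
print(round(savings(0.17, 0.50), 2))  # -> 0.35
```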

Test plan

  • Deploy
  • Compare CloudWatch CacheReadInputTokenCount / InputTokenCount ratio on us.anthropic.claude-opus-4-6-v1 before vs 24h after deploy. Expect at least a 2x improvement.
  • Verify no regression on the Anthropic-direct path (anthropic/claude-...) since that branch is untouched.

Diff: 1 file, +12 / -1.

smolagents org

Closing: local testing shows no measurable cache benefit on top of the existing system-block cache_control.

| Scenario | cache_creation | cache_read |
| --- | --- | --- |
| system cache_control only | 11550 | 11550 |
| system + tool inject (this PR) | 11546 | 11546 |
| tool inject only (no system cache) | 0 | 5521 |

Bedrock's system cachePoint already extends through the whole deterministic prefix (tools + system). The tool inject only helps when system cache_control is absent, which prompt_caching.py rules out for "anthropic" models (a pattern that also matches bedrock/us.anthropic.*).

See huggingface/ml-intern#154 for full details.

GuillaumeSalouHF changed pull request status to closed
