smolagents/ml-intern


fix(bedrock): force tool cachePoint via cache_control_injection_points

#24

by GuillaumeSalouHF HF Staff - opened 8 days ago

base: refs/heads/main

←

from: refs/pr/24


+39

-11

feat(session): include user_id and total_cost_usd in trajectory dump (#22) 384bc5e8

fix(session-uploader): forward user_id + total_cost_usd to dataset row (#23) 9f90f1fe

fix(bedrock): force tool cachePoint via cache_control_injection_points 4701de6e

GuillaumeSalouHF

smolagents org 8 days ago

Bug

LiteLLM 1.83.0's Bedrock Converse adapter reads cache_control blocks on system content blocks but ignores cache_control on individual tool dicts. Only the native Anthropic adapter inspects tool dicts.

Result: the ~16k tokens of tool definitions are re-billed at full input price on every Bedrock turn instead of hitting the cache_read tier (~10x cheaper).

Confirmed by reading litellm/llms/bedrock/chat/converse_transformation.py — the tool path appends a cachePoint only when the kwarg cache_control_injection_points=[{location: tool_config}] is passed top-level. The cache_control on the last tool dict (set by agent/core/prompt_caching.py for the Anthropic native path) is silently dropped.
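For context, the per-tool marking described above can be sketched as follows. This is a minimal illustration, not the contents of agent/core/prompt_caching.py: the helper name `mark_last_tool_for_caching` and the example tool names are hypothetical, and the `{"type": "ephemeral"}` value is the usual Anthropic-style cache_control payload, assumed here rather than taken from the PR.

```python
import copy


def mark_last_tool_for_caching(tools):
    """Return a copy of `tools` with cache_control set on the last tool dict.

    Hypothetical stand-in for the marking done on the Anthropic native path.
    """
    if not tools:
        return tools
    marked = copy.deepcopy(tools)  # leave the caller's list untouched
    # Assumed payload: Anthropic-style ephemeral cache marker.
    marked[-1]["cache_control"] = {"type": "ephemeral"}
    return marked


# Illustrative tool definitions (names are made up for this sketch).
tools = [
    {"type": "function", "function": {"name": "read_file", "parameters": {"type": "object", "properties": {}}}},
    {"type": "function", "function": {"name": "run_shell", "parameters": {"type": "object", "properties": {}}}},
]
marked = mark_last_tool_for_caching(tools)
```

According to the description above, a marker placed like this is honored by the native Anthropic adapter but dropped by the Bedrock Converse adapter.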

Fix

Pass cache_control_injection_points=[{location: tool_config}] to the Converse adapter from _resolve_llm_params for any bedrock/... model. System-prompt caching still works via cache_control on system content blocks (Converse reads those).
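A rough sketch of the proposed change, under stated assumptions: the real `_resolve_llm_params` signature is not shown in this PR, so `resolve_llm_params` below is a hypothetical stand-in, and the `"tool_config"` location value is copied from the description above rather than verified against LiteLLM. The quoted dict keys are also an assumption about how the kwarg is spelled in code.

```python
def resolve_llm_params(model, base_params=None):
    """Build completion kwargs, adding the tool cache injection for Bedrock.

    Hypothetical stand-in for _resolve_llm_params; names and shape are assumed.
    """
    params = dict(base_params or {})
    params["model"] = model
    if model.startswith("bedrock/"):
        # Value taken from the PR description; not verified against LiteLLM.
        params["cache_control_injection_points"] = [{"location": "tool_config"}]
    return params


bedrock_params = resolve_llm_params("bedrock/us.anthropic.claude-opus-4-6-v1")
direct_params = resolve_llm_params("anthropic/claude-example")  # placeholder model id
```

Keying on the `bedrock/` prefix keeps the Anthropic-direct path unchanged, which is what the test plan relies on.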

Expected impact

Cache-read ratio on Opus 4.6 should jump from ~17% (Bedrock auto-prefix caching only) towards 50-70% on multi-turn sessions, cutting the input bill ~25-35% (= roughly $2-3k/day at current run-rate).

Test plan

Deploy
Compare CloudWatch CacheReadInputTokenCount / InputTokenCount ratio on us.anthropic.claude-opus-4-6-v1 before vs 24h after deploy. Expect at least a 2x improvement.
Verify no regression on the Anthropic-direct path (anthropic/claude-...) since that branch is untouched.
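The before/after comparison in the test plan could be checked with something like the sketch below. It computes the ratio exactly as written above (CacheReadInputTokenCount / InputTokenCount) from numbers you have already pulled from CloudWatch. The function names are hypothetical, and whether InputTokenCount includes cache-read tokens is not stated here, so treat the ratio as a relative indicator only.

```python
def cache_read_ratio(cache_read_tokens, input_tokens):
    """Ratio of cache-read tokens to input tokens; 0.0 when there is no input."""
    if input_tokens <= 0:
        return 0.0
    return cache_read_tokens / input_tokens


def improved_at_least(before_ratio, after_ratio, factor=2.0):
    """True when the ratio after deploy is at least `factor` times the ratio before."""
    return after_ratio >= factor * before_ratio


# Illustrative inputs only: 17 cache-read tokens per 100 input tokens mirrors
# the ~17% baseline quoted in the description; the "after" figure is made up.
before = cache_read_ratio(17, 100)
after = cache_read_ratio(50, 100)
passed = improved_at_least(before, after)
```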

Diff: 1 file, +12 / -1.

narrow cache injection to Anthropic-on-Bedrock only e4cec204

GuillaumeSalouHF

smolagents org 8 days ago

Closing — local test shows no measurable cache benefit on top of the existing system-block cache_control.

Scenario	cache_creation	cache_read
system cache_control only	11550	11550
system + tool inject (this PR)	11546	11546
tool inject only (no system cache)	0	5521

Bedrock's system cachePoint already extends through the deterministic prefix (system + tools), so adding a tool cachePoint on top of it changes nothing measurable. Tool injection helps only when system cache_control is absent, and prompt_caching.py rules that case out for "anthropic" models (a check that also matches bedrock/us.anthropic.*).

See huggingface/ml-intern#154 for full details.

GuillaumeSalouHF changed pull request status to closed 8 days ago
