rtk does what it says at its own layer: it shrinks ~53% of the bash output
it intercepts. But in Claude Code, that bash output lands in `cache_creation`,
a bucket that is roughly 4% of the input tokens on each request. The other 96%
lives in `cache_read` (system prompt, `CLAUDE.md`, skill manifests,
prior tool results), which rtk cannot touch. So the proxy is honest
about its layer; the README's "session drops 150K → 45K"
extrapolation just doesn't hold once you map the bucket sizes.
rtk-ai/rtk is a Rust CLI proxy that compresses tool output before it reaches your LLM context. The repo's headline claims:
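- 60–90% fewer tokens from intercepted commands, applied automatically through a PreToolUse hook that rewrites `git status` → `rtk git status`
- a session-level example: "session drops 150K → 45K"

The second claim, the session-level one, is the one most users will read first. The first is mechanical and easy to verify; the second requires looking at what the LLM actually consumed.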
Installed via `brew install rtk` on 2026-05-13 02:17 COT
(confirmed from the Homebrew `INSTALL_RECEIPT`, `brew info`, and the
binary mtime). Version 0.39.0, single binary, hook wired up
(`rtk hook claude` in `~/.claude/settings.json`).
Eight days of usage since install. Telemetry: Claude Code → OTEL → Langfuse Cloud (US). Two 7-day comparison windows:
| Window | Dates (UTC) | Observations |
|---|---|---|
| PRE | 2026-05-06 → 2026-05-13 | 1,500 |
| POST | 2026-05-15 → 2026-05-22 | 1,500 |
Sample: 500 `claude_code.llm_request` generations/day × 3
representative days per window. Install day is excluded.
Every Claude Code request is metered in four token buckets: three on the input side, plus output. Knowing which bucket a tool operates on tells you the ceiling on how much it can save.
| Bucket | ~tokens/req | $/M tokens (Sonnet) | What lives there | rtk? |
|---|---|---|---|---|
| `cache_read` | ~60,000 | $0.30 | System prompt, `CLAUDE.md`, `MEMORY.md`, skill manifests, hook outputs, every prior tool result, full conversation history | No |
| `cache_creation` | ~2,400 | $3.75 | What's new this turn, including the bash output rtk just compressed | Yes |
| `output` | ~440 | $15.00 | What the model generates | No |
| `input` (uncached) | ~3 | $3.00 | Truly fresh non-cached bytes | No |
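The shares implied by the table can be checked with a few lines of arithmetic. This sketch uses only the approximate per-request sizes and list prices above. By token volume it reproduces the ~96/4 split between `cache_read` and `cache_creation`; it also weights each input bucket by price, since a cache write costs 12.5× a cache read, and prints both views.

```python
# Approximate per-request tokens and Sonnet list prices ($ per million tokens),
# copied from the four-buckets table above.
BUCKETS = {
    "cache_read": (60_000, 0.30),
    "cache_creation": (2_400, 3.75),
    "input": (3, 3.00),
    "output": (440, 15.00),
}
INPUT_SIDE = ("cache_read", "cache_creation", "input")


def input_shares(buckets):
    """Return {bucket: (share of input tokens, share of input dollars)}."""
    tokens = {name: buckets[name][0] for name in INPUT_SIDE}
    dollars = {name: buckets[name][0] * buckets[name][1] / 1e6 for name in INPUT_SIDE}
    total_tokens = sum(tokens.values())
    total_dollars = sum(dollars.values())
    return {
        name: (tokens[name] / total_tokens, dollars[name] / total_dollars)
        for name in INPUT_SIDE
    }


for name, (tok_share, usd_share) in input_shares(BUCKETS).items():
    print(f"{name:15s} {tok_share:6.1%} of input tokens  {usd_share:6.1%} of input $")
```

At these rates `cache_read` is about 96.1% of input tokens and `cache_creation` about 3.8%. By dollars the split is closer to two thirds and one third, because cache writes are priced higher. Either way, rtk only reaches the bash-output slice of `cache_creation`.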
rtk operates on `cache_creation`, and only on the
bash-output portion of it. `Read`, `Edit`,
`Write`, `WebFetch`, and every MCP tool flow past
rtk untouched. In agent-heavy flows (issue enrichment, code review),
those non-bash tools do most of the work.
Before challenging the downstream claim, confirm rtk is doing anything
at all. `rtk gain` prints a daily summary, which here reports 53.5% savings:
below the advertised 60–90%, but in the same order of magnitude. The
interesting part is the per-command breakdown from `rtk gain --json`.
Aggregate savings are dominated by a handful of heavy
commands (`vitest`, `playwright`, the stash dump). The high-frequency
one, `rtk read` with 2,770 calls, trims only ~10%.
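Why a few heavy commands can dominate the aggregate: if the headline percentage is computed over tokens rather than over calls (an assumption about how `rtk gain` aggregates, not something shown here), each command counts in proportion to the tokens it would have produced. The sketch below contrasts the two weightings. The `CommandStats` type is hypothetical and is not rtk's JSON schema. The token totals in the example are invented for illustration; only the call count and the ~10% and 90%+ rates echo the figures above.

```python
from dataclasses import dataclass


@dataclass
class CommandStats:
    """Hypothetical per-command totals; not rtk's actual output format."""
    name: str
    calls: int
    tokens_before: int  # tokens the raw command output would have cost
    tokens_after: int   # tokens after compression


def token_weighted_savings(stats):
    """1 - (total tokens after) / (total tokens before): heavy commands dominate."""
    before = sum(s.tokens_before for s in stats)
    after = sum(s.tokens_after for s in stats)
    return 1 - after / before if before else 0.0


def call_weighted_savings(stats):
    """Per-command savings rate averaged by call count instead of by tokens."""
    used = [s for s in stats if s.tokens_before]
    calls = sum(s.calls for s in used)
    if not calls:
        return 0.0
    return sum(s.calls * (1 - s.tokens_after / s.tokens_before) for s in used) / calls


# Invented totals: a frequent, lightly compressed command vs a rare, heavy one.
example = [
    CommandStats("read", calls=2_770, tokens_before=1_000_000, tokens_after=894_000),
    CommandStats("vitest", calls=80, tokens_before=2_000_000, tokens_after=100_000),
]
print(f"token-weighted: {token_weighted_savings(example):.1%}")
print(f"call-weighted:  {call_weighted_savings(example):.1%}")
```

With these made-up totals, the token-weighted figure comes out near 67% and the call-weighted one near 13%. The rare heavy command carries the aggregate.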
So rtk is doing the work it says it's doing. The question is whether that work translates into fewer tokens charged to my Anthropic account.
Per-request token averages across both windows are pulled from
`metadata.attributes` on every `claude_code.llm_request`
generation.
The uncached `input` bucket is a sliver.
`cache_read` dwarfs everything, and it grew 3% POST. The
meter went up, not down.
The −56% on uncached `input` looks dramatic, but it's the smallest bucket by roughly four orders of magnitude (~3 tokens against ~60,000): a saving of about three tokens per request. Set side by side with `cache_read` at the same per-token scale, that's the gap between the bucket that moved and the bucket that dominates the bill, which is exactly what the four-buckets table above predicts.
Three things stack up, and all three are features of how Claude Code works rather than failures of rtk.
rtk sits in the right pond: `cache_creation` is where new bash output
enters the model. But that pond is a ~4% wedge of input. The 96% wedge
(`cache_read`) carries the system prompt, your
`CLAUDE.md`, `MEMORY.md`, ~80 installed skill
manifests, hook outputs, and every prior tool result, all replayed
each turn. rtk can't see any of it.
In a typical agent flow, `Read`, `Edit`,
`Write`, `WebFetch`, and MCP servers each
contribute new content to `cache_creation`, none of it through rtk. If a
session makes 50 tool calls and 40 are non-bash, rtk's slice of
`cache_creation` is diluted even within its own bucket.
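The dilution compounds with the bucket share. The following is a first-order sketch. It assumes, hypothetically, that tool outputs are equal-sized, so call counts can stand in for token counts. It uses the 53.5% compression reported by `rtk gain` and the bucket sizes from the table, and it deliberately ignores the later replay of compressed output through `cache_read`, which is discussed next.

```python
def first_turn_input_reduction(bash_share, compression, creation_share):
    """Fraction of one request's input tokens removed by compressing new bash output.

    bash_share:     fraction of new (cache_creation) tokens that are bash output
    compression:    fraction of that bash output rtk strips
    creation_share: cache_creation's share of per-request input tokens
    First-order only: ignores the later replay of this output via cache_read.
    """
    return bash_share * compression * creation_share


CREATION_SHARE = 2_400 / (60_000 + 2_400 + 3)  # from the four-buckets table

# Hypothetical: 10 of 50 tool calls are bash, and all outputs are the same size.
diluted = first_turn_input_reduction(10 / 50, 0.535, CREATION_SHARE)
# Upper bound: every new token this turn came from bash.
ceiling = first_turn_input_reduction(1.0, 0.535, CREATION_SHARE)
print(f"diluted: {diluted:.2%} of input, ceiling: {ceiling:.2%} of input")
```

Under these assumptions, the diluted case removes about 0.4% of per-request input. Even the all-bash ceiling stays around 2%.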
Smaller bash outputs now mean smaller cached entries replayed on
every later turn, so rtk does slow the growth of
`cache_read` slightly. But `cache_read` also grows with every
non-bash tool call, every skill that gets loaded, every `CLAUDE.md`
edit. rtk's deceleration is swamped by everything else accreting.
And activity tends to expand into the freed space. Per-request input stays flat across the install boundary (2026-05-13), while per-day token spend goes up, simply because there are more requests.
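The flat-per-request, rising-per-day pattern is a product of two ratios. This is a minimal sketch with illustrative argument names. Because the windows above use a fixed 500-generations/day sample, the per-day request counts would have to come from unsampled daily totals.

```python
def decompose_daily_change(pre_per_request, post_per_request, pre_requests, post_requests):
    """Split the POST/PRE ratio of daily tokens into its two factors.

    Daily tokens = average tokens per request x requests per day, so the daily
    ratio is exactly the per-request ratio times the request-volume ratio.
    """
    per_request_ratio = post_per_request / pre_per_request
    volume_ratio = post_requests / pre_requests
    return {
        "per_request": per_request_ratio,
        "volume": volume_ratio,
        "daily": per_request_ratio * volume_ratio,
    }


# Hypothetical inputs: same per-request size, 20% more requests per day.
print(decompose_daily_change(62_000, 62_000, 400, 480))
```

A per-request ratio near 1.0 combined with a daily ratio above 1.0 says the growth is volume, not payload size.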
And one shape-of-traffic observation worth naming: the high-frequency
intercepts are low-margin, and the high-margin ones are low-frequency.
`rtk read` at 10.6% savings fires 2,770 times; the 90%+
savers (`vitest`, `playwright`, `stash show`) fire fewer than 100
times each. That's not an rtk failure; it's what dev traffic looks
like.
Moving the session total would mean shrinking `cache_read`, and that's a different tool,
probably an impossible one, since `cache_read` content is mostly
user-supplied (your `CLAUDE.md`, your skills, your hooks).
The proxy is doing its job. The README's headline extrapolates a per-command saving to session totals, but session totals are dominated by replayed cached context that rtk can't see. That's a map question, not a pass/fail.
If cache_read is 96% of input volume and rtk can't touch
it, what can? The answer is unglamorous: shrink what you
cram into the cached prelude. Every byte you keep in
CLAUDE.md, MEMORY.md, the skill manifest,
and hook outputs gets replayed on every single turn for the entire
session.
Concrete moves in rough order of impact:
1. `CLAUDE.md` / `MEMORY.md` / `SOUL.md`. These are read at every session start and replayed from cache on every turn. Trim ruthlessly.
2. Hook outputs. A `SessionStart` hook that prints 5KB of context is 5KB replayed on every subsequent turn until the session ends.
3. Run `/compact` before long sessions when you don't need the full history.
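To put a rough number on a fixed chunk of prelude over a session, here is a sketch. None of its assumptions were measured: about 4 bytes per token, 5KB treated as 5,000 bytes, and the chunk written to cache once and then read back on every later turn with no mid-session cache expiry. Prices are the Sonnet list prices from the four-buckets table.

```python
# Rough cost of keeping a fixed chunk of text in the cached prelude for one session.
# Assumptions (not measured): ~4 bytes per token; the chunk is written to cache once
# and read back on every later turn with no mid-session cache expiry.
BYTES_PER_TOKEN = 4        # rule-of-thumb ratio for English text
CACHE_WRITE_PER_M = 3.75   # $/M tokens, cache_creation (Sonnet, from the table)
CACHE_READ_PER_M = 0.30    # $/M tokens, cache_read (Sonnet, from the table)


def prelude_cost(extra_bytes, turns):
    """Return (tokens processed across all turns, dollars) for one prelude chunk."""
    tokens = extra_bytes / BYTES_PER_TOKEN
    replayed = tokens * (turns - 1)  # every turn after the first reads it from cache
    dollars = (tokens * CACHE_WRITE_PER_M + replayed * CACHE_READ_PER_M) / 1e6
    return tokens * turns, dollars


# Hypothetical: a 5KB SessionStart hook output (5,000 bytes) in a 100-turn session.
tokens, usd = prelude_cost(5_000, 100)
print(f"{tokens:,.0f} tokens processed, ${usd:.4f}")
```

Under these assumptions, a 100-turn session processes 125,000 tokens of pure replay and costs about $0.04 at list prices. That is small per session, but it recurs in every session that loads the chunk.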
rtk stays installed — it's free, zero-dependency, and the savings it
does book are real money on heavy-output commands (vitest,
playwright, gh pr diff). But the headline win in a Claude
Code workflow comes from treating your cached prelude as a budget, not
from compressing bash.
`jq '.hooks' ~/.claude/settings.json` shows `rtk hook claude` is wired up, and `rtk gain` captured 16,345 commands. The hook works; that's not where the disconnect is.
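The same check can be scripted. This sketch walks the `hooks` section of `~/.claude/settings.json` and collects every string mentioning `rtk hook claude`, so it makes no assumption about how the hook entries are nested. The function names are illustrative.

```python
import json
from pathlib import Path


def find_strings(node, needle):
    """Recursively yield every string in a parsed JSON tree that contains `needle`."""
    if isinstance(node, dict):
        for value in node.values():
            yield from find_strings(value, needle)
    elif isinstance(node, list):
        for item in node:
            yield from find_strings(item, needle)
    elif isinstance(node, str) and needle in node:
        yield node


def rtk_hook_entries(settings_path=None):
    """Return hook strings in Claude Code settings that mention `rtk hook claude`."""
    path = Path(settings_path) if settings_path else Path.home() / ".claude" / "settings.json"
    settings = json.loads(path.read_text())
    return list(find_strings(settings.get("hooks", {}), "rtk hook claude"))
```

`rtk_hook_entries()` returns an empty list when no hook mentions rtk, and it raises `FileNotFoundError` if the settings file does not exist.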
A cleaner test would compare totals per `session.id` across a controlled task (e.g. "enrich this issue") run before and after rtk, with cache state reset. That isolates session size from activity volume. My windows don't control for what I was doing.
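The per-session comparison comes down to grouping by `session.id` instead of by day. This is a minimal sketch. It assumes requests have already been flattened into dicts with a `session_id` key and the four token fields; that flat shape and its key names are hypothetical.

```python
from collections import defaultdict

TOKEN_FIELDS = ("input_tokens", "output_tokens", "cache_read_tokens", "cache_creation_tokens")


def totals_by_session(requests):
    """Sum each token bucket per session id.

    `requests` is an iterable of flat dicts with a "session_id" key plus any of
    TOKEN_FIELDS (a hypothetical shape; adapt it to however spans are exported).
    Values may be ints or numeric strings; missing fields count as zero.
    """
    totals = defaultdict(lambda: dict.fromkeys(TOKEN_FIELDS, 0))
    for req in requests:
        session = totals[req["session_id"]]
        for field in TOKEN_FIELDS:
            session[field] += int(req.get(field) or 0)
    return dict(totals)
```

To use it, run the same task once before and once after installing rtk, then compare the two sessions' `cache_read_tokens` and `cache_creation_tokens`.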
Token counts come from `metadata.attributes.{input,output,cache_read,cache_creation}_tokens` on Claude Code's OTEL spans. Langfuse's auto-extracted `promptTokens`/`completionTokens` fields are zero because the attribute naming doesn't match Langfuse's mapping, so I had to read the attributes directly.

Filter: `type=GENERATION`, `level=DEFAULT`, `name=claude_code.llm_request`. Excludes 429 rate-limit errors (zero tokens).
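Reading the attributes directly looks roughly like the sketch below. It assumes observations have already been fetched from Langfuse as plain dicts with `type`, `level`, `name`, and `metadata.attributes` keys; how they are fetched is left out. Dropping all-zero rows as the 429 exclusion is a heuristic of this sketch, not necessarily how the numbers above were filtered.

```python
from statistics import mean

TOKEN_KEYS = ("input_tokens", "output_tokens", "cache_read_tokens", "cache_creation_tokens")


def is_llm_request(obs):
    """Keep Claude Code LLM generations, per the filter described above."""
    return (
        obs.get("type") == "GENERATION"
        and obs.get("level") == "DEFAULT"
        and obs.get("name") == "claude_code.llm_request"
    )


def token_counts(obs):
    """Read the four buckets from metadata.attributes; OTEL values may be strings."""
    attrs = (obs.get("metadata") or {}).get("attributes") or {}
    return {key: int(attrs.get(key, 0) or 0) for key in TOKEN_KEYS}


def per_request_averages(observations):
    """Average each bucket over matching generations, skipping all-zero rows."""
    rows = [token_counts(o) for o in observations if is_llm_request(o)]
    rows = [r for r in rows if any(r.values())]  # stand-in for excluding 429s
    if not rows:
        return {}
    return {key: mean(r[key] for r in rows) for key in TOKEN_KEYS}
```

Reading the attributes explicitly sidesteps the empty `promptTokens`/`completionTokens` fields entirely.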