marcuss.pro
FIELD NOTES · 02 · 22 MAY 2026 · 9 MIN READ

rtk works. The pond it's fishing is just 4% of your bill.

The proxy is honest about its layer — it shrinks ~53% of the bash output it intercepts. In Claude Code, that bash output is roughly 4% of the input you actually pay for. The other 96% lives in cache_read, which rtk cannot touch. A map, not an accusation.

TL;DR

rtk works exactly as advertised — it shrinks ~53% of the bash output it intercepts. But in Claude Code, that bash output is roughly 4% of the input you actually pay for. The other 96% lives in cache_read — system prompt, CLAUDE.md, skill manifests, prior tool results — which rtk cannot touch. So the proxy is honest about its layer; the README's "session drops 150K → 45K" extrapolation just doesn't hold once you map the bucket sizes.

01. What rtk claims

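rtk-ai/rtk is a Rust CLI proxy that compresses tool output before it reaches your LLM context. The repo's headline claims:

1. Per command: 60–90% fewer tokens on the output it intercepts.
2. Per session: a session drops from 150K to 45K tokens (~70%).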

The second claim — the session-level one — is the one most users will read first. The first is mechanically true and easy to verify; the second requires looking at what the LLM actually consumed.

02. Setup

Installed via brew install rtk on 2026-05-13 02:17 COT (confirmed from the Homebrew INSTALL_RECEIPT, brew info, and the binary mtime). Version 0.39.0, single binary, hook wired up (rtk hook claude in ~/.claude/settings.json).

Eight days of usage since install. Telemetry: Claude Code → OTEL → Langfuse Cloud (US). Two 7-day comparison windows:

Window | Dates (UTC) | Observations
PRE | 2026-05-06 → 2026-05-13 | 1,500
POST | 2026-05-15 → 2026-05-22 | 1,500

Sample: 500 claude_code.llm_request generations/day × 3 representative days per window. Install day is excluded.

03. The four buckets of a Claude Code request

Every Claude Code turn sends the model an input made of four buckets. Knowing which bucket a tool operates on tells you the ceiling on how much it can save.

Bucket | ~size/req (tokens) | $/M (Sonnet) | What lives there | rtk?
cache_read | ~60,000 | $0.30 | System prompt, CLAUDE.md, MEMORY.md, skill manifests, hook outputs, every prior tool result, full conversation history | No
cache_creation | ~2,400 | $3.75 | What's new this turn, including the bash output rtk just compressed | Yes
output | ~440 | $15.00 | What the model generates | No
input (uncached) | ~3 | $3.00 | Truly fresh non-cached bytes | No

rtk operates on cache_creation, and only on the bash-output portion of it. Read, Edit, Write, WebFetch, and every MCP tool flow past rtk untouched. In agent-heavy flows (issue enrichment, code review) those non-bash tools do most of the work.

// the ceiling: before measuring anything, rtk's theoretical maximum reach is ~4% of input tokens, and that assumes the whole cache_creation wedge is bash output. A heroic 80% cut on the bash slice nets ~1–2% off the whole input when bash is about a third to a half of that wedge, which is right at the noise floor we measured.
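The ceiling can be reproduced from the table's own per-request sizes. This is a minimal sketch, not a measurement: `bash_fraction` (the share of cache_creation that is bash output) is not measured in this post, so the values passed in below are assumptions, and the function names are illustrative.

```python
# Per-request bucket sizes (tokens) and Sonnet prices ($ per million tokens),
# copied from the four-buckets table. Output is left out: this is input only.
BUCKETS = {
    "cache_read": (60_000, 0.30),
    "cache_creation": (2_400, 3.75),
    "uncached_input": (3, 3.00),
}


def token_share(bucket: str) -> float:
    """Fraction of input tokens that falls in `bucket`."""
    total = sum(tokens for tokens, _ in BUCKETS.values())
    return BUCKETS[bucket][0] / total


def cost_share(bucket: str) -> float:
    """Fraction of input spend that falls in `bucket`, weighting by price."""
    total = sum(tokens * price for tokens, price in BUCKETS.values())
    tokens, price = BUCKETS[bucket]
    return tokens * price / total


def rtk_token_ceiling(bash_fraction: float, cut: float) -> float:
    """Fraction of input tokens removed if rtk cuts `cut` of the bash slice.

    bash_fraction is an assumption (share of cache_creation that is bash
    output); the post does not measure it.
    """
    return token_share("cache_creation") * bash_fraction * cut


if __name__ == "__main__":
    for name in BUCKETS:
        print(f"{name:15s} tokens {token_share(name):6.1%}  spend {cost_share(name):6.1%}")
    # A heroic 80% cut, with bash assumed to be half or all of cache_creation.
    print(f"ceiling, bash = 50%:  {rtk_token_ceiling(0.5, 0.8):.1%}")
    print(f"ceiling, bash = 100%: {rtk_token_ceiling(1.0, 0.8):.1%}")
```

With these inputs the token ceiling lands between roughly 1.5% and 3.1% of input. The spend column is there because the table's prices differ by bucket: cache_creation's share of input spend comes out larger than its share of input tokens, so the two views are worth reading side by side.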

04. What rtk's own counter says

Before challenging the downstream claim, confirm rtk is doing something at all. rtk gain prints a daily summary:

$ rtk gain
RTK Token Savings (Global Scope)
════════════════════════════════════════════════════════════
Total commands:   16,345
Input tokens:     34.0M
Output tokens:    15.8M
Tokens saved:     18.2M (53.5%)
Total exec time:  442m23s (avg 1.6s)

53.5% — below the advertised 60–90% but in the same order of magnitude. The interesting part is the per-command breakdown:

FIG · 01 · Tokens saved by command, top 7 (out of 41 distinct entries) · source: rtk gain --json

Aggregate savings are dominated by a handful of heavy commands (vitest, playwright, the stash dump). The high-frequency one — rtk read, 2,770 calls — only trims 10%.

So rtk is doing the work it says it's doing. The question is whether that work translates into fewer tokens charged to my Anthropic account.
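The figures in the rtk gain summary above are internally consistent, which a few lines of arithmetic can confirm. The sketch assumes "Input tokens" is what the wrapped commands produced and "Output tokens" is what rtk passed on; that reading is inferred from the labels, not from rtk's documentation.

```python
# Figures copied from the `rtk gain` summary.
input_tokens = 34.0e6      # assumed: raw command output seen by rtk
output_tokens = 15.8e6     # assumed: what rtk forwarded after compression
total_commands = 16_345
exec_seconds = 442 * 60 + 23  # "442m23s"

saved = input_tokens - output_tokens
saved_pct = saved / input_tokens
avg_exec = exec_seconds / total_commands

print(f"saved: {saved / 1e6:.1f}M ({saved_pct:.1%})")
print(f"avg exec: {avg_exec:.1f}s")
```

Both lines match what the counter reports: 18.2M saved (53.5%) and 1.6s per command on average.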

05. What Langfuse actually shows

Per-request token averages across both windows, pulled from metadata.attributes on every claude_code.llm_request generation:

FIG · 02 · Effective input tokens per LLM request, PRE vs POST · stacked: cache_read · cache_creation · uncached

The uncached bucket — the one rtk shrinks — is a sliver. cache_read dwarfs everything; it grew 3% POST. The meter went up, not down.

Uncached input · Δ −56% · 5.5 → 2.4 tokens
cache_read · Δ +3% · 60,331 → 62,171 tokens
Effective input · Δ +3% · 62,572 → 64,571 tokens

The −56% looks dramatic but it's the smallest bucket by four orders of magnitude — a saving of three tokens per request. Side by side:

// rtk's actual reach: −56% · uncached input · −3 tokens / request
// the bucket above it: +3% · cache_read · +1,840 tokens / request

Both bars are drawn to the same per-token scale. That's the bandwidth difference between "the bucket rtk touches" and "the bucket Anthropic bills you for" — and it's exactly what the four-buckets table at the top of this post predicts.
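The deltas can be recomputed from the PRE and POST averages quoted in this section. The sketch also backs out cache_creation by subtraction, which assumes effective input is exactly the sum of the three stacked buckets; that assumption comes from the figure legend, not from a stated formula.

```python
# Per-request averages (tokens) from the PRE/POST comparison.
PRE = {"uncached": 5.5, "cache_read": 60_331, "effective_input": 62_572}
POST = {"uncached": 2.4, "cache_read": 62_171, "effective_input": 64_571}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`."""
    return (after - before) / before


def implied_cache_creation(window: dict) -> float:
    """Assumption: effective input = cache_read + cache_creation + uncached."""
    return window["effective_input"] - window["cache_read"] - window["uncached"]


if __name__ == "__main__":
    for key in PRE:
        delta = POST[key] - PRE[key]
        change = pct_change(PRE[key], POST[key])
        print(f"{key:16s} {delta:+10,.1f} tokens  ({change:+.0%})")
    print(f"implied cache_creation PRE:  {implied_cache_creation(PRE):,.1f}")
    print(f"implied cache_creation POST: {implied_cache_creation(POST):,.1f}")
```

Under that assumption the implied cache_creation average sits near the ~2,400 tokens listed in the buckets table in both windows.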

06. Why we didn't see more improvement

Three things stack up, and all three are features of how Claude Code works rather than failures of rtk.

1. rtk only fishes the cache_creation bucket.

It's the right pond — that is where new bash output enters the model — but it's a ~4% wedge of input. The 96% wedge (cache_read) carries the system prompt, your CLAUDE.md, MEMORY.md, ~80 installed skill manifests, hook outputs, and every prior tool result, all replayed each turn. rtk can't see any of it.

2. Most tool calls aren't bash.

In a typical agent flow, Read, Edit, Write, WebFetch, and MCP servers each contribute new content to cache_creation, none of it through rtk. If a session does 50 tool calls and 40 are non-bash, rtk's slice of cache_creation is dilute even within its own bucket.

3. The compounding effect into cache is real but quiet.

Smaller bash outputs today mean smaller cached entries replayed tomorrow — so rtk does slow the growth of cache_read slightly. But cache_read also grows from every non-bash tool call, every skill that gets loaded, every CLAUDE.md edit. rtk's deceleration is swamped by everything else accreting.
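The dilution in point 2 can be put into numbers. This is a rough sketch that assumes every tool call adds the same amount to cache_creation, which is a simplification (a test run and a one-line Edit are not the same size); `bash_share_of_input` is an illustrative helper, not part of any tool.

```python
def bash_share_of_input(total_calls: int, bash_calls: int,
                        cache_creation_share: float = 0.04) -> float:
    """Share of total input that is bash output.

    Assumes each tool call contributes equally to cache_creation,
    and that cache_creation is ~4% of input as in the buckets table.
    """
    return cache_creation_share * bash_calls / total_calls


# The example from the text: 50 tool calls, 40 of them non-bash.
bash_share = bash_share_of_input(total_calls=50, bash_calls=10)

# Apply the aggregate rate that rtk gain reports (53.5%).
saved_share = bash_share * 0.535

print(f"bash share of input: {bash_share:.1%}")
print(f"input removed by rtk: {saved_share:.1%}")
```

Under those assumptions bash output is under 1% of input, and rtk's aggregate compression rate removes about half of that.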

And activity often expands to fill the freed space. Per-request input stays flat across the install boundary; per-day token spend goes up, simply because there are more requests:

FIG · 03 · Daily trace volume, 14 days across the install boundary · ~70 → ~230 traces/day · dashed marker = install day
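Daily spend is volume times per-request size, so the two effects can be separated. The sketch below assumes the number of LLM requests per trace stayed constant across the install boundary, which this post does not verify; the trace counts are the approximate values from the figure.

```python
# Approximate daily trace counts from FIG 03 and per-request
# effective input (tokens) from the PRE/POST comparison.
traces_pre, traces_post = 70, 230
per_request_pre, per_request_post = 62_572, 64_571

volume_ratio = traces_post / traces_pre
per_request_ratio = per_request_post / per_request_pre

# Assumption: requests per trace is unchanged, so daily input tokens
# scale with traces/day times tokens/request.
daily_ratio = volume_ratio * per_request_ratio

print(f"volume:      x{volume_ratio:.2f}")
print(f"per request: x{per_request_ratio:.2f}")
print(f"daily input: x{daily_ratio:.2f}")
```

Nearly all of the increase in daily input comes from the volume term; the per-request term contributes about 3%.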

And one shape-of-traffic observation worth naming: the high-frequency intercepts are low-margin, and the high-margin ones are low-frequency. rtk read at 10.6% savings fires 2,770 times; the 90%+ savers (vitest, playwright, stash show) fire <100 times each. That's not an rtk failure; it's what dev traffic looks like.

// none of this is an rtk bug; these are shape-of-the-stack facts. rtk would have to operate inside Claude Code's prompt assembly to reach cache_read, and that's a different tool, probably an impossible one, since cache_read content is mostly user-supplied (your CLAUDE.md, your skills, your hooks).

07. Where each claim lands on the map

// rtk's claims, mapped to the buckets
Shell-layer compression works as advertised | "60–90% per command" | In its layer
Per-command savings, on rtk's own counter | "rtk gain reports 53.5%" | Squarely in spirit
Session drops 150K → 45K (~70%) | per session, charged to your account | Doesn't survive cache_read

The proxy is doing its job. The README's headline extrapolates a per-command saving to session totals, but session totals are dominated by replayed cached context that rtk can't see. That's a map question, not a pass/fail.
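To see how far the session-level figure is from what the bucket sizes allow, one can apply rtk's measured rate only to the slice it can reach. This is a sketch under two assumptions: that the README's 150K session has the same bucket split measured in this post, and, generously, that all of the ~4% cache_creation wedge is bash output.

```python
SESSION_BEFORE = 150_000        # tokens, from the README's session claim
SESSION_CLAIMED_AFTER = 45_000  # tokens, from the same claim

claimed_cut = 1 - SESSION_CLAIMED_AFTER / SESSION_BEFORE

# Upper bound on the slice rtk can reach (assumes every cache_creation
# token is bash output), and the aggregate rate rtk gain reports.
REACHABLE_SHARE = 0.04
MEASURED_RATE = 0.535

expected_after = SESSION_BEFORE * (1 - REACHABLE_SHARE * MEASURED_RATE)

print(f"claimed cut:    {claimed_cut:.0%}")
print(f"expected after: {expected_after:,.0f} tokens")
```

Even with the generous upper bound, the session would be expected to land a few thousand tokens below 150K, not near 45K.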

08. The lever that would move your bill

If cache_read is 96% of input volume and rtk can't touch it, what can? The answer is unglamorous: shrink what you cram into the cached prelude. Every byte you keep in CLAUDE.md, MEMORY.md, the skill manifest, and hook outputs gets replayed on every single turn for the entire session.

Concrete moves in rough order of impact:

rtk stays installed — it's free, zero-dependency, and the savings it does book are real money on heavy-output commands (vitest, playwright, gh pr diff). But the headline win in a Claude Code workflow comes from treating your cached prelude as a budget, not from compressing bash.
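Treating the cached prelude as a budget can be made concrete with the cache_read price from the buckets table. The session length (200 turns) and the size of the trim (10,000 tokens) below are hypothetical values chosen for illustration, and `replay_cost` is an illustrative helper.

```python
CACHE_READ_PRICE_PER_M = 0.30  # $ per million tokens (Sonnet), from the table


def replay_cost(cached_tokens: int, turns: int) -> float:
    """Dollars spent re-reading `cached_tokens` on every turn of a session."""
    return cached_tokens * turns * CACHE_READ_PRICE_PER_M / 1_000_000


# Hypothetical session: 200 turns, before and after trimming
# 10,000 tokens out of a ~60,000-token cached context.
before = replay_cost(60_000, turns=200)
after = replay_cost(50_000, turns=200)

print(f"before: ${before:.2f}  after: ${after:.2f}  saved: ${before - after:.2f}")
```

The saving scales with both the size of the trim and the number of turns, which is why a byte removed from CLAUDE.md or a skill manifest keeps paying back for the whole session.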

09. Likely pushback


10. Methodology notes