Claude Code Was Silently Inflating Token Costs by ~40%. What Agencies Need to Know.
In March and April 2026, something went wrong with Claude Code.
Developers on Max 20x plans — the $200/month tier with the highest usage limits — were burning through their entire quota in under 90 minutes. Usage that used to last a full workday was gone before lunch. Community threads named it "Tokenocalypse." Anthropic acknowledged the issue on Reddit. The Register covered it.
For individual developers, it was frustrating. For agencies running Claude Code across a team of developers on client projects, it was a billing and capacity problem that nobody saw coming — and that most agencies still don't fully understand.
Here's what actually happened, what it means for your costs, and why it exposed a structural problem that outlasts the specific bug.
What Happened: Three Overlapping Problems
Anthropic published an April 23, 2026 postmortem identifying three separate issues that converged between March and April:
Problem 1: Effort Downgrade (March 4 – April 7)
Anthropic silently changed Claude Code's default effort level from high to medium on March 4. This degraded output quality — developers found Claude making more mistakes, requiring more correction turns, generating more back-and-forth.
More correction turns means more context accumulation. More context means higher token costs per session. This one change increased effective token consumption without changing any pricing.
Problem 2: Cache Invalidation Bug (March 26 – April 10)
A bug introduced March 26 accidentally cleared thinking tokens every turn instead of only on idle cache misses. This destroyed prompt cache efficiency.
Normally, Claude Code's prompt cache means that stable context — your CLAUDE.md, system instructions, tool definitions — gets re-read at a 90% discount on cache hits. When caching breaks, that same context is processed as fresh input at full price, every turn.
In practical terms: a session that cost $4 with working cache could cost $40 with broken cache. The token volume was the same; the billing was not.
Problem 3: Verbosity Reduction Prompt (April 16 – April 20)
Anthropic shipped a system prompt intended to reduce verbosity. It reduced quality instead. Developers compensated with more prompting and correction, again driving up turn count and context accumulation.
All three issues were resolved by v2.1.116, according to Anthropic's postmortem. Anthropic also reset affected usage limits on April 23.
The Separate Bug: +20,000 Tokens Per Request
Alongside the broader Tokenocalypse, GitHub issue #46917 documented a more specific and measurable problem in Claude Code v2.1.100 and v2.1.101.
The reporter compared token consumption across versions on identical requests:
| Version | Request size | cache_creation_input_tokens |
|---|---|---|
| v2.1.98 | 169,514 bytes | 49,726 |
| v2.1.100 | 168,536 bytes | 69,922 |
| v2.1.101 | 171,903 bytes | ~72,000 |
v2.1.100 consumed 20,196 more cache_creation_input_tokens per request than v2.1.98 — despite sending a smaller payload. The v2.1.101 figure was higher still.
Cache creation tokens are billed at 1.25× the standard input rate on Sonnet 4.6. That's 20,000 extra tokens billed at $3.75/million — about $0.075 per cold request, on every request.
For a developer running 50 cold requests per day across a project, that's $3.75/day in invisible extra cost. Across a team of eight developers over a month: roughly $750 in unattributed token inflation, on top of normal usage.
The workaround documented in the issue was to downgrade to v2.1.98:
npx claude-code@2.1.98
Though the reporter noted that auto-updates could overwrite the downgrade.
The Official Status vs Community Reports
Anthropic's position: the April quality and cache issues were resolved by v2.1.116. Usage limits that were consumed faster than expected during the affected period were reset on April 23.
The community picture is more nuanced. A GitHub issue filed May 12, 2026, against v2.1.137 — released after the claimed fix — reported two Max 20x accounts being exhausted in a single day and questioned whether the token inflation from v2.1.100 had been fully resolved.
The current Claude Code release as of writing is v2.1.141.
The conservative read: Anthropic resolved the identified April issues. Whether all token/cache anomalies are fully closed is less certain. Community reports on later versions suggest the issue may not be entirely resolved.
The practical implication for agencies: you cannot assume your Claude Code billing is stable across version updates. A silent version upgrade can meaningfully change your team's token consumption — and without session-level monitoring, you won't see it until the invoice arrives.
What This Means for Agencies Specifically
Individual developers felt Tokenocalypse as frustration — hitting limits mid-afternoon, waiting for reset windows, losing productivity.
Agencies running Claude Code across a team on client projects experienced something different:
Unpredictable project costs. If your team's Claude Code usage doubled in March and April due to broken caching, your AI cost per project doubled with it. On fixed-price contracts, that increase hit your margin directly.
No warning signal. Without session-level cost monitoring, there was no way to see the spike happening. The first signal was the invoice — or for subscription users, hitting limits much faster than expected with no clear explanation.
No attribution. Even if you identified the spike, you couldn't trace which projects or clients drove it. The Anthropic console showed total usage. The project-level breakdown didn't exist.
Version drift across the team. Different developers on different machine setups may have been on different Claude Code versions simultaneously. The developer who auto-updated to v2.1.100 the day it released was suddenly consuming 40% more tokens than the developer still on v2.1.98 — on the same project.
The Structural Problem Tokenocalypse Exposed
The specific bugs were resolved. The structural issue they exposed hasn't been.
Claude Code's token behavior is a function of the version you're running, the caching efficiency in that session, the model selected, and the context accumulation pattern of the specific work. All four variables change without warning:
- Anthropic ships updates automatically
- Cache efficiency degrades in long sessions or on resume
- Developers switch models based on task complexity
- Complex projects generate more agent loops and longer sessions
None of this is visible in the Anthropic console. The console shows totals. It doesn't show you which session spiked, which version a developer was running when it happened, or which client project absorbed the cost.
For an agency, this means your AI cost per project is always partially opaque — even in normal operation, let alone during an incident like Tokenocalypse.
What Proactive Monitoring Looks Like
The agencies that caught the Tokenocalypse impact earliest were the ones watching session-level usage — either manually through /usage and community tools like ccusage, or through attribution tooling that flagged cost spikes in real time.
What session-level monitoring catches that console billing doesn't:
- A developer whose daily usage doubled after a version update
- A session that's been running for 4 hours with a growing agent loop
- A project whose per-developer cost is running 3× the team average
- A cache invalidation event that's inflating costs on every turn
TokenWatch monitors this across Claude Code, Cursor, and Cline — surfacing cost spikes by developer and project as they happen, not at invoice time. When a session crosses the threshold where the context has outrun its value, you see it before the next billing cycle reflects it.
That's the difference between catching a version-related cost spike in week one and discovering it when you reconcile the month.
Practical Steps for Agency Owners Right Now
Check which Claude Code version your team is running.
claude --version
Current release: v2.1.141. If developers are significantly behind or have divergent versions across the team, standardize and document your version policy.
Pin your version in onboarding docs. A silent auto-update that changes token behavior is a billing risk. Include claude --version verification in your developer onboarding checklist.
Look at your March and April invoices. If your Anthropic costs were significantly higher than February or May, the timeline aligns with the documented bug window. That's a legitimate basis for a conversation with Anthropic support.
Implement session-level monitoring before the next incident. The next version-related cost anomaly won't announce itself. Teams with attribution tooling in place will catch it in days; teams without it will catch it in the next invoice.