This explains a lot about the quota burn rate increase people have been reporting. A 1h → 5min cache TTL change means cache_create operations happen 12x more frequently for the same session, and cache_create tokens cost significantly more than cache_read.
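To put rough numbers on that (a hypothetical sketch, using the Sonnet rates quoted later in this thread; the context size and pause count are assumptions):

```python
# Rough cost of one hour of work, assuming a 150k-token cached context
# and six idle gaps longer than 5 minutes. Rates: Sonnet figures cited
# elsewhere in this thread; session shape is a made-up example.
WRITE_5M, WRITE_1H, READ = 3.75, 6.00, 0.30  # $/MTok

ctx_mtok = 0.150   # assumed cached context: 150k tokens
expiries = 6       # assumed pauses > 5 min within the hour

# 5m TTL: every pause past the TTL forces a full cache_creation re-write.
cost_5m = (1 + expiries) * ctx_mtok * WRITE_5M
# 1h TTL: one (pricier) write, then cheap cache reads on later turns.
cost_1h = ctx_mtok * WRITE_1H + expiries * ctx_mtok * READ

print(f"5m TTL: ${cost_5m:.2f}  vs  1h TTL: ${cost_1h:.2f}")
# -> 5m TTL: $3.94  vs  1h TTL: $1.17 under these assumptions
```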
For anyone trying to work around this in the meantime: keeping sessions shorter and more focused (one task per session) reduces the impact since you hit cache invalidation less often. Also, structuring your CLAUDE.md to front-load the most critical context means the cache_create tokens are at least spent on high-value content.
Would be great to get official transparency on pricing-related infrastructure changes like this — silent downgrades erode trust, especially for teams budgeting based on observed costs.
Thanks for the writeup — the JSONL analysis and date pinpointing is good detective work. Let me walk through what's going on.
The March 6 change makes Claude Code cheaper, not more expensive: a 1h TTL for every request would cost more, not less.
The cost tables assume every 5m-tier write would have become a cheap cache read under 1h TTL. That's only true when the cached content is re-accessed within the hour. A meaningful share of Claude Code's requests are one-shot calls where the cached context is used once and not revisited — under a 1h TTL those would just be more expensive writes with no follow-up read to amortize them, because 1h writes cost more than 5m writes (roughly 2× base input vs. 1.25× — see the prompt caching docs). So "1h everywhere" isn't the cheaper baseline the tables frame it as; for the requests that are on 5m, it would be more expensive.
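To make that trade-off concrete, a rough sketch (multipliers per the prompt caching docs; the base rate and request shapes here are illustrative assumptions, not the actual request mix):

```python
# Why "1h everywhere" is not automatically the cheaper baseline.
# Write multipliers per the prompt caching docs (5m = 1.25x base input,
# 1h = 2x, reads = 0.1x); base rate and request shapes are assumptions.
BASE = 3.00                                    # assumed base input $/MTok
W5M, W1H, READ = 1.25 * BASE, 2.0 * BASE, 0.1 * BASE

def session_cost(write_rate: float, ctx_mtok: float, later_reads: int) -> float:
    """One cache write plus `later_reads` cheap reads of the same context."""
    return ctx_mtok * write_rate + later_reads * ctx_mtok * READ

ctx = 0.100  # 100k-token context, in MTok

# One-shot request: the cached context is never read back, so the 1h
# write is pure extra cost over the 5m write.
print(session_cost(W5M, ctx, later_reads=0))   # 0.375
print(session_cost(W1H, ctx, later_reads=0))   # 0.600

# Slow multi-turn session where a 5m cache would expire every turn:
print(session_cost(W1H, ctx, later_reads=8))   # 0.840
print((1 + 8) * ctx * W5M)                     # 3.375 if every turn re-writes
```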
Prompt cache optimization is something the Claude Code team invests heavily in on an ongoing basis. Different request types benefit from different TTL tiers, and the client selects per request. The March 6 change you spotted is part of that ongoing optimization work — it wasn't a regression, on balance it lowers total cost for users across the request mix. The pre-March-6 behavior (what your Phase 2 captures) wasn't the intended steady state.
A bug fixed in v2.1.90
A client-side bug could cause sessions that had already exhausted their subscription quota at application start, and had begun using overages, to stay on the 5m TTL until the session exited. This was fixed in v2.1.90.
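To illustrate the shape of that bug (a hypothetical sketch, not the actual Claude Code client code; every name here is invented):

```python
# Hypothetical illustration of the pre-v2.1.90 bug shape: the TTL tier
# is derived from quota state once, at application start, and reused for
# the whole session. Not real client code; the real per-request selection
# logic is more nuanced.
class CacheTtlPolicy:
    def __init__(self, quota_exhausted_at_start: bool):
        # BUG: the tier is pinned here for the session's lifetime, even
        # after the session moves onto overages.
        self._pinned_ttl = "5m" if quota_exhausted_at_start else "1h"

    def buggy_ttl(self) -> str:
        return self._pinned_ttl

    def fixed_ttl(self, quota_exhausted_now: bool) -> str:
        # v2.1.90-style fix: decide per request from current state.
        return "5m" if quota_exhausted_now else "1h"
```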
Responses to your specific asks
- Was there a change? Yes — March 6, intentional, part of ongoing cache optimization. You pinpointed the date correctly.
- Intended TTL behavior? The client picks per request based on the expected cache-reuse pattern; there is no single global default, by design.
- Restore 1h as the default / expose as configurable? 1h everywhere would increase total cost given the request mix, so we're not planning a global toggle.
- Cache-read quota weighting (ref [BUG] Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage #45756): we'll follow up there.
Jarred, what I'm getting from your convoluted post is that the change is to reduce costs to Anthropic at the users' expense.
A lot of time typing "claude pls open an issue about this, add lots of detail"
Classic scammer tactics: first, lure users in by promising a huge deal, then scam the hell out of them.
The March 6 change you spotted is part of that ongoing optimization work — it wasn't a regression, on balance it lowers total cost for users across the request mix.
How does say a Claude Pro user benefit from this "lower total cost" exactly?
User has already paid the monthly/annual subscription cost. The money is already in Anthropic's bank account.
What we are seeing instead is we exhaust the session quota under an hour and have to wait 4 hours to resume our work. This effectively wipes out our day.
This means even if I stay up 24 hours to try and max my utilization, I get about 4 hours of real use out of my subscription.
Responses to your specific asks
- Was there a change? Yes — March 6, intentional, part of ongoing cache optimization. You pinpointed the date correctly.
- Intended TTL behavior? The client picks per request based on the expected cache-reuse pattern; there is no single global default, by design.
- Restore 1h as the default / expose as configurable? 1h everywhere would increase total cost given the request mix, so we're not planning a global toggle.
If the caching TTL is being so drastically changed -- from 1 hour to 5 minutes is a humongous change -- and that change has this oversized impact on user experience, to the point that Claude is effectively unusable for the majority of the day...
... you ought to give your paying customers the control to toggle this the way they want. Let us experiment with setting it at 5 minutes, 15 minutes, or 1 hour. With some experience we will figure out what works best for our way of working -- and will apply the right settings for each session. Just like selecting a model, enabling extended thinking, or turning on plan mode. And I am sure the community will find optimal ways of utilizing the cache TTLs that lower the cost for Anthropic as well in the long run.
PS: The drastic drop in user experience seen in mid March that I experienced first-hand is compelling me to add this voice. Otherwise, I am deeply respectful of the technical work being done by Anthropic team and wish you the very best -- want to see you succeed and make greater things. Cheers!
Jarred’s response was very informative and very transparent, but came at the wrong time. Instead of having this answer as a post-mortem, it should have been a pre-mortem.
As amazing as your product is, I sense a lack of transparency and strategic thinking from the product team.
You need to be clear about it: every change that touches consumption limits or could change how customers are billed must be announced well in advance. Folks are getting tired of new features while their regular workflows are starting to misbehave and their limits are getting smaller and smaller.
The only proper way now is to make your roadmap public. This is the best thing you can do for yourself. Not so much for the customers. They will learn to live with the TTLs, with the bugs, and in the end, they will vote with their wallets. But the optics are getting really bad really quickly now for Anthropic.
My usage is API usage and not a subscription, this is a good change, works out cheaper.
My usage is API usage and not a subscription, this is a good change, works out cheaper.
API was always 5 minutes, don't spread misinformation. This is a bad overall change.
@Jarred-Sumner How does this work out cheaper? People not paying for caching writes? How about stop charging people for caching writes then, as I am pretty sure OpenAI does not charge extra for caching; it's done automatically.
Not to mention, you are expecting people to read, process, formulate, and then type a reply to every conversation turn within 5 minutes, or risk 10x token use?
This is AFTER changing limits to burn them faster during peak hours?
How is any of this customer-friendly? We're risking our tokens burning TWENTY TIMES FASTER and you are out here saying it's cheaper? No bro, it's to make YOUR costs cheaper. Be honest about it at least.
My usage is API usage and not a subscription, this is a good change, works out cheaper.
API was always 5 minutes, don't spread misinformation. This is a bad overall change.
Via CC? Since when is the caching different between subscription and API via CC? If you're suffering from lack of 1h caching then it sounds like you're not even using CC enough to hit any subscription limits.
Via CC? Since when is the caching different between subscription and API via CC?
Literally check the documentation and the results of this exact thread? Are you being dense?
If you're suffering from lack of 1h caching then it sounds like you're not even using CC enough to hit any subscription limits.
What? It's the exact opposite. 1h caching prevents you from resending every message as cache writes every 6 minutes.
Yeah, it makes sense to me. I use Claude Code daily and noticed it slacking, and switched over to Codex, which can't write but can code. Claude's ability to keep context has definitely declined; it feels 'throttled' or nerfed. Thanks for writing this. As someone who's been around the block, I think they found it more profitable to sell the sizzle than the steak. You can see the turn of events, from the political pressure to the inflated claims and mythos to the cyber use case restrictions. They're not what they were, and it's clear.
Via CC? Since when is the caching different between subscription and API via CC?
Literally check the documentation and the results of this exact thread? Are you being dense?
CC doesn't go to different places if you're using a subscription vs an API Key. 😆
@phillip-haydon Where "CC goes" is irrelevant. API keys are distinct from subscription keys. If these are distinct entities, then Anthropic's servers obviously can and do treat the requests differently.
In pseudocode:
if (isAPIKey(key)) {
// ... apply API key TTL, etc ...
} else {
// ... apply subscription TTL, etc ...
}
Here's the API docs on prompt caching from January: https://web.archive.org/web/20260124153111/https://platform.claude.com/docs/en/build-with-claude/prompt-caching
By default, the cache has a 5-minute lifetime. The cache is refreshed for no additional cost each time the cached content is used.
If you find that 5 minutes is too short, Anthropic also offers a 1-hour cache duration at additional cost.
For more information, see 1-hour cache duration.
So I feel the need to reiterate what @hi-fox said:
Literally check the documentation and the results of this exact thread? Are you being dense?
If you can't wrap your head around this, I'd recommend spending more time sharpening your programming skills and spending less time in Claude Code.
Cache TTL appears to have silently regressed from 1h to 5m around early March 2026, causing significant quota and cost inflation
Summary
Analysis of raw Claude Code session JSONL files spanning Jan 11 – Apr 11, 2026 shows that Anthropic appears to have silently changed the prompt cache TTL default from 1 hour to 5 minutes sometime in early March 2026. Prior to this change, Claude Code was receiving 1-hour TTL cache writes — which we believe was the intended default. The reversion to 5-minute TTL has caused a 20–32% increase in cache creation costs and a measurable spike in quota consumption for subscription users who have never previously hit their limits.
This appears directly related to the behavior described in #45756.
Data
Session data extracted from `~/.claude/projects/` JSONL files across two machines (Linux workstation + Windows laptop, different accounts/sessions), totaling 119,866 API calls from Jan 11 – Apr 11, 2026. Each assistant message includes a `usage.cache_creation.ephemeral_5m_input_tokens` / `ephemeral_1h_input_tokens` breakdown that makes the TTL tier observable per call. Having two independent machines strengthens the signal — both show the same behavioral shift at the same dates.

Phase breakdown

- Phase 1 (Jan 11 – 31): `ephemeral_1h` absent/zero — likely predates 1h tier availability in the API
- Phase 2 (Feb 1 – Mar 5): `ephemeral_5m = 0`, `ephemeral_1h > 0` across 33+ consecutive days on both machines — near-zero exceptions
- Phase 3 (Mar 6 onward): 5m tokens reappear and quickly come to dominate (see the day-by-day data below)

We believe Phase 2 represents Anthropic's intended default behavior — 1h TTL was rolled out as the Claude Code standard around Feb 1 and held consistently for over a month across two independent machines on two different accounts. January's all-5m data most likely predates the 1h TTL tier being available in the API. The regression began around March 6–8, 2026.
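For anyone who wants to reproduce the per-day tier split from their own logs, a minimal sketch (field names follow the `usage` payloads described above; the `timestamp` key and path layout are assumptions that may vary by client version):

```python
# Tally 5m vs 1h cache-creation tokens per day from Claude Code JSONL
# logs. Keys follow the usage payloads described above; adjust if your
# client version stores them differently.
import json
from collections import defaultdict
from pathlib import Path

per_day = defaultdict(lambda: {"5m": 0, "1h": 0})

for path in Path.home().glob(".claude/projects/**/*.jsonl"):
    for line in path.read_text(encoding="utf-8", errors="ignore").splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("type") != "assistant":
            continue
        usage = entry.get("message", {}).get("usage", {})
        cc = usage.get("cache_creation") or {}
        day = (entry.get("timestamp") or "")[:10]  # ISO date prefix
        per_day[day]["5m"] += cc.get("ephemeral_5m_input_tokens", 0)
        per_day[day]["1h"] += cc.get("ephemeral_1h_input_tokens", 0)

for day in sorted(per_day):
    print(day, per_day[day])
```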
No client-side changes were made between phases. The same Claude Code version and usage patterns were in place throughout. The TTL tier is set server-side by Anthropic.
Day-by-day TTL data showing the regression (combined, both machines)
The transition is visible to the day: March 6 is when 5m tokens first reappear after 33 days of clean 1h-only behavior. By March 8, 5m tokens outnumber 1h by 5:1. This is consistent with a server-side configuration change being rolled out gradually then completing around March 8.
Cost impact
Applying official Anthropic pricing (rates.json, updated 2026-04-09):
Combined dataset (119,866 API calls, two machines):
claude-sonnet-4-6 (`cache_write_5m` = $3.75/MTok, `cache_write_1h` = $6.00/MTok, `cache_read` = $0.30/MTok)

claude-opus-4-6 (`cache_write_5m` = $6.25/MTok, `cache_write_1h` = $10.00/MTok, `cache_read` = $0.50/MTok)

February — the month Anthropic was defaulting to 1h TTL — shows only 1.1% waste (trace 5m activity from one machine on one day). Every other month shows 15–53% overpayment from 5m cache re-creations. The cost difference is explained entirely by TTL tier, not by usage volume. The percentage waste is identical across model tiers (17.1%) because it is driven purely by the 5m/1h token split, not by per-token price.
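The waste percentages can be reproduced with a simple counterfactual: re-price every observed 5m-tier write at the read rate (this adopts the issue's framing; the response above disputes that assumption for one-shot requests). Token volumes below are invented for illustration:

```python
# Counterfactual behind the tables: what the same token mix would have
# cost if 5m-tier writes had been cache reads instead. Sonnet rates from
# rates.json as quoted above; token volumes are made up.
WRITE_5M, WRITE_1H, READ = 3.75, 6.00, 0.30  # $/MTok

def waste_pct(mtok_5m: float, mtok_1h: float) -> float:
    actual = mtok_5m * WRITE_5M + mtok_1h * WRITE_1H
    counterfactual = mtok_5m * READ + mtok_1h * WRITE_1H
    return 100.0 * (actual - counterfactual) / actual

print(f"{waste_pct(mtok_5m=10.0, mtok_1h=28.0):.1f}% waste")  # ~16.8%
```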
Why 5m TTL is so expensive in practice
With 5m TTL, any pause in a session longer than 5 minutes causes the entire cached context to expire. On the next turn, Claude Code must re-upload that context as a fresh `cache_creation` at the write rate, rather than a `cache_read` at the read rate. The write rate is 12.5× more expensive than the read rate for Sonnet, and the same ratio holds for Opus.

For long coding sessions — which are the primary Claude Code use case — this creates a compounding penalty: the longer and more complex your session, the more context you have cached, and the more expensive each cache expiry becomes.
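Concretely, for an assumed 150k-token cached context on Sonnet:

```python
# Cost of a single expiry-driven re-write vs the read it replaces, for
# an assumed 150k-token Sonnet context (rates as quoted in this issue).
ctx_mtok = 0.150
rewrite = ctx_mtok * 3.75   # expired cache -> full cache_creation write
read = ctx_mtok * 0.30      # warm cache -> cache_read
print(f"re-write ${rewrite:.3f} vs read ${read:.3f} ({rewrite / read:.1f}x)")
# -> re-write $0.562 vs read $0.045 (12.5x)
```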
Over the 3-month period analyzed:

- a substantial volume of cached context that could have been reads ($0.30–0.50/MTok) was instead billed as re-creations ($3.75–6.25/MTok)

Quota impact
Users on Pro/subscription plans are quota-limited, not just cost-limited. Cache creation tokens count toward quota at full rate; cache reads are significantly cheaper (the exact coefficient is under investigation in #45756). The silent reversion to 5m TTL in March is the most likely explanation for why subscription users began hitting their 5-hour quota limits for the first time — including the author of this issue, who had never hit quota limits before March 2026.
Hypothesis
The data strongly suggests that 1h TTL was the intended default for Claude Code and was in place as of early February 2026. Sometime between Feb 27 and Mar 8, 2026, Anthropic silently changed the default to 5m TTL — either intentionally as a cost-saving measure, or accidentally as an infrastructure regression.
Evidence supporting "1h was the intended default":
The most likely sequence of events:
The 33-day window of clean 1h-only behavior (Feb 1 – Mar 5) across two independent machines and two separate accounts makes this one of the strongest available signals that 1h TTL was Anthropic's deliberate default, not a fluke.
Request
Methodology
- `~/.claude/projects/**/*.jsonl` session files (Claude Code stores per-message API responses, including full `usage` objects)
- `type: "assistant"` entries with a `message.usage.cache_creation` field
- `quota-analysis --source` mode (added to support this investigation)
- `rates.json` (updated 2026-04-09)