Anthropic is probing a surge of complaints about Claude Code burning through user quotas far faster than expected, leaving many developers rate‑limited after relatively light use and forcing the company to treat the issue as a top‑priority investigation.
What sparked the investigation
Over the past few weeks, Claude Code users across paid tiers have reported that their allotted “5‑hour” or weekly session limits are evaporating in a fraction of the time they used to. Some Max subscribers say sessions that previously lasted most of a workday now hit 100 percent usage within an hour or two on the same workflows. Others describe quota bars jumping from around 20 percent to well over 60 percent after only a handful of prompts, or even draining while the CLI sits idle. In the most extreme anecdotes, users claim that something as trivial as saying “hello” to Claude can eat a couple of percentage points of their weekly allowance.
The frustration has spilled across Reddit, GitHub issues, and social channels, where developers share screenshots of dashboards and ccusage logs that, in their view, simply do not add up. Many of these reports come from power users on Claude Pro or Max plans who pay $100–$200 per month and rely on Claude Code as their primary day‑to‑day coding assistant.
Anthropic’s public response so far
Anthropic has publicly acknowledged that something is off with Claude Code’s usage behavior, even if the company has not yet pinned down a single root cause. Developer advocate Lydia Hallie told users on X that “people are hitting usage limits in Claude Code way faster than expected” and stressed that the team is “actively investigating” the issue and treating it as their “top priority.” A report in the tech press echoed that internal message, noting that Anthropic admits quotas are “running out too fast” but is still working to understand why.
At the same time, Anthropic leadership has pointed to deliberate rate‑limit adjustments as one factor behind the perceived quota crunch. On March 26, the company confirmed that session limits for Claude were tweaked to burn down faster during weekday peak hours (roughly 5 a.m. to 11 a.m. Pacific), in an effort to manage growing demand without changing the underlying weekly caps. Those policy changes may explain some of the faster‑than‑usual drain, especially for users who work primarily during busy time windows.
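As a back‑of‑the‑envelope model, the announced change can be thought of as a time‑of‑day multiplier on quota burn. Everything below is invented for illustration: Anthropic has not published the actual multiplier or accounting formula, and the 1.5x factor is an assumption. Times are treated as naive Pacific‑time values.

```python
# Hypothetical model of peak-hour quota burn. The 1.5x multiplier is an
# assumption for illustration; Anthropic has not published the real figure.
from datetime import datetime

PEAK_START, PEAK_END = 5, 11   # 5 a.m. - 11 a.m., interpreted as Pacific time
PEAK_MULTIPLIER = 1.5          # invented value, for illustration only

def quota_cost(tokens: int, when: datetime) -> float:
    """Tokens charged against the session quota for a request at `when`,
    burning faster during weekday peak hours."""
    is_weekday = when.weekday() < 5
    is_peak = PEAK_START <= when.hour < PEAK_END
    return tokens * (PEAK_MULTIPLIER if is_weekday and is_peak else 1.0)

print(quota_cost(1_000, datetime(2026, 3, 2, 8)))    # Monday 8 a.m.: peak rate
print(quota_cost(1_000, datetime(2026, 3, 2, 14)))   # Monday 2 p.m.: base rate
```

Under a scheme like this, two identical workloads can drain visibly different amounts of quota depending only on when they run, which would match reports of faster burn from users working during busy morning windows.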
Suspected technical causes: cache bugs and context costs
User investigations and third‑party write‑ups suggest that policy alone cannot account for the most dramatic cases of rapid quota depletion. Several developers claim to have reverse‑engineered parts of the Claude Code client and found prompt‑caching bugs that can silently inflate effective token usage by 10–20 times, particularly when cache misses cause the system to resend large prompts instead of reusing previous context. In at least one report, downgrading Claude Code to an earlier version noticeably reduced usage burn, which further fueled suspicions of a regression tied to recent updates.
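The arithmetic behind an inflation factor of that size is easy to sketch. The figures below are illustrative assumptions, and the model ignores discounted cache‑read pricing, but they show how resending a large, supposedly cached prefix on every turn lands squarely in the reported 10–20x range:

```python
# Illustrative model: effective tokens consumed when prompt caching works
# vs. when a bug forces the full prompt to be resent on every turn.
# All numbers are hypothetical, chosen only to show the order of magnitude.

CACHED_PREFIX = 50_000   # tokens of project context that should be cached
NEW_INPUT = 500          # fresh tokens the user actually adds per turn
TURNS = 20

# With a working cache, the prefix is paid for once; later turns pay only
# for the new input (discounted cache reads are ignored for simplicity).
with_cache = CACHED_PREFIX + TURNS * NEW_INPUT

# With a broken cache, every turn resends the entire prefix from scratch.
without_cache = TURNS * (CACHED_PREFIX + NEW_INPUT)

inflation = without_cache / with_cache
print(f"with cache:    {with_cache:,} tokens")
print(f"without cache: {without_cache:,} tokens")
print(f"inflation:     {inflation:.1f}x")
```

With these assumed numbers the broken‑cache path consumes roughly 17 times as many tokens, which is why a silent cache regression could drain quotas dramatically without any visible change in user behavior.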
Longer‑term context growth is also part of the story. Claude Code’s architecture tends to resend substantial portions of the conversation history with every new turn, which means each follow‑up question can be more expensive than the last as the transcript grows. Combined with heavier models such as Opus 4.6, which draw down quota allowances far faster than leaner variants like Haiku, even typical coding workflows can accumulate large token counts surprisingly quickly.
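That compounding effect can be made concrete with a toy model: if turn k resends the whole transcript of the previous k−1 turns plus its own input, cumulative input tokens grow quadratically in the number of turns. The per‑turn figure below is an assumption for illustration:

```python
# Toy model of context growth: each turn resends the full transcript,
# so cumulative input tokens grow quadratically with the turn count.
TOKENS_PER_TURN = 1_000  # assumed average tokens added to the transcript per turn

def cumulative_input_tokens(turns: int) -> int:
    """Total input tokens sent across `turns` turns when turn k resends
    the transcript of the previous k-1 turns plus its own input."""
    return sum(k * TOKENS_PER_TURN for k in range(1, turns + 1))

for n in (10, 50, 100):
    print(f"{n:>3} turns -> {cumulative_input_tokens(n):,} input tokens")
```

Under these assumptions a 100‑turn session sends about 5 million input tokens even though the transcript itself is only 100,000 tokens long, which is why long‑running sessions burn quota far faster than their visible output suggests.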
GitHub issues paint a broader picture
GitHub has become a de facto incident log for Claude Code’s quota behavior, with multiple high‑traffic issues documenting abnormal patterns going back to late January. One bug report describes “extremely rapid token consumption” where small prompts trigger double‑digit percentage drops in session quota and even idle terminals appear to drain usage. Another details a case where 65 percent of a 5‑hour Max session was consumed despite relatively modest input and output token totals, highlighting that the number of tokens behind “1 percent” of quota can vary enormously from session to session.
Earlier, in February, a separate issue tied to the Opus 4.6 update flagged weekly quotas that seemed to disappear much faster than before, especially for non‑English languages, which are already disadvantaged by tokenization inefficiencies. While some of these tickets have been closed or marked “not planned,” they collectively sketch a months‑long pattern of users finding Claude Code’s real‑world usage both opaque and inconsistent with the advertised multipliers on Max plans.
Why rapid quota drain hurts developers
For many teams, Claude Code is not a novelty tool but a core part of the daily development workflow—used to scaffold new projects, refactor legacy code, write tests, and even run and debug apps directly from the CLI. When rate limits are hit unexpectedly in the middle of a work session, developers can be left without their assistant for the rest of the day or week, derailing carefully planned tasks and automation pipelines.
The lack of fine‑grained visibility also compounds the pain. Claude Code currently offers little real‑time reporting of its own token or quota consumption, forcing users to jump periodically to the Claude web dashboard to see how close they are to being cut off. A feature request on GitHub asks Anthropic to add both “pull” (on‑demand) and “push” (proactive warning) modes, so that Claude Code can tell users when they are approaching thresholds or when further use might trigger extra charges. Until something like that ships, many developers feel they are flying blind.
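A minimal sketch of what those two modes might look like, assuming a hypothetical QuotaMonitor with an invented 80 percent warning threshold; nothing here reflects a real Claude Code interface:

```python
# Hypothetical sketch of "pull" and "push" quota reporting as described in
# the feature request. The QuotaMonitor class, its threshold, and its
# callback shape are all invented for illustration.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class QuotaMonitor:
    limit_tokens: int
    used_tokens: int = 0
    warn_at: float = 0.8                                # push a warning at 80% use
    on_warn: Optional[Callable[[float], None]] = None   # push-mode callback
    _warned: bool = False

    def record(self, tokens: int) -> None:
        """Account for a completed request; push a one-time warning if the
        configured threshold is crossed."""
        self.used_tokens += tokens
        if self.on_warn and not self._warned and self.fraction_used() >= self.warn_at:
            self._warned = True
            self.on_warn(self.fraction_used())

    def fraction_used(self) -> float:
        """Pull mode: report current consumption on demand."""
        return self.used_tokens / self.limit_tokens

monitor = QuotaMonitor(limit_tokens=100_000,
                       on_warn=lambda f: print(f"warning: {f:.0%} of quota used"))
monitor.record(50_000)
print(f"{monitor.fraction_used():.0%}")   # pull: prints 50%
monitor.record(35_000)                    # push: crosses the 80% threshold
```

The point of the design is that pull mode answers “how much is left?” inside the tool, while push mode interrupts before a session silently runs dry, the two failure modes users currently complain about.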
What users can do right now
While Anthropic works on a long‑term fix, community guides and early investigations suggest several practical steps users can take to reduce the risk of sudden quota exhaustion. Some developers report that temporarily downgrading to an older Claude Code client version mitigates at least part of the suspected caching problem, though this is an unofficial workaround and may not be viable for everyone. Others have seen improvements by restarting sessions more frequently to avoid extremely long context windows, or by trimming prompts and results instead of letting conversations grow unchecked.
Third‑party tutorials also recommend favoring lighter models (like Haiku or Sonnet) for routine tasks and reserving heavy Opus variants only for jobs that truly require maximum reasoning depth. Avoiding peak‑hour usage where possible can lessen the impact of Anthropic’s time‑of‑day adjustments, and switching some workloads from the Claude Code CLI to direct API calls can give teams more precise control and observability over token accounting. Finally, documenting your own usage anomalies—screenshots, logs, timestamps—and sharing them with Anthropic support or in relevant GitHub threads increases the chances that edge‑case bugs are spotted and prioritized.
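For teams taking the direct‑API route, per‑request accounting is feasible because Messages API responses report token usage on a usage object (input_tokens and output_tokens). The sketch below stubs out the client call so it runs offline; in real code the stub would be replaced by an actual SDK call, and the stub’s token figures are obviously fake:

```python
# Sketch: tallying per-request token usage from API responses. A stub stands
# in for the real client so the example runs offline; its token counts are
# fabricated for illustration.
from dataclasses import dataclass

@dataclass
class Usage:
    input_tokens: int
    output_tokens: int

@dataclass
class UsageLedger:
    """Running total of tokens consumed across requests."""
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, usage: Usage) -> None:
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens

def fake_create(prompt: str) -> Usage:
    # Stub client: pretend each request costs len(prompt) input tokens
    # and a flat 100 output tokens.
    return Usage(input_tokens=len(prompt), output_tokens=100)

ledger = UsageLedger()
for prompt in ("refactor module", "write tests"):
    ledger.add(fake_create(prompt))
print(ledger.input_tokens, ledger.output_tokens)
```

Keeping a ledger like this per project or per developer gives exactly the observability the CLI currently lacks: every token is attributed to a specific request rather than to an opaque percentage bar.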
What to watch for next
Anthropic has signaled that resolving Claude Code’s quota‑drain problems is a top engineering priority, but meaningful fixes will likely span product policy, infrastructure, and tooling. On the technical side, users will be looking for confirmation that any prompt‑caching or quota‑accounting bugs have been identified and patched, ideally accompanied by retroactive rate‑limit resets or credits where appropriate. Just as important will be improvements to transparency: better in‑tool reporting of token usage, clearer documentation around peak‑hour multipliers, and more predictable mapping between plan tiers and real‑world capacity.
For now, the quota crisis around Claude Code is a reminder of how fragile developer trust can be when essential tools become unpredictable—even if the underlying models continue to impress. If Anthropic can diagnose and fix the rapid‑depletion issues quickly, while giving users more control and insight into their own consumption, Claude Code may emerge from this episode stronger and more battle‑tested than before.
