Anthropic last month reduced the TTL (time to live) for the Claude Code prompt cache from one hour to five minutes for many requests, but said this should not increase costs, despite users reporting faster-depleting quotas.

User Sean Swanson posted a bug report showing that Anthropic introduced a one-hour cache for Claude Code context around February 1, then changed it back to a five-minute cache around March 7. “The 5m TTL is disproportionately punishing for the long-session, high-context use case that defines Claude Code usage,” said Swanson.

When using AI coding assistants or agents, the context is extra data sent along with the user’s prompts, such as existing code or background instructions. Context improves the accuracy of the AI but also requires more processing.

Claude prompt caching avoids re-processing previously used prompts, including context and background information. The cache can have either a five-minute or a one-hour TTL. Writing to the five-minute cache costs 25 percent more in tokens, and writing to the one-hour cache 100 percent more, but reading from the cache costs around 10 percent of the base price.
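For concreteness, here is roughly how a caller opts into prompt caching via Anthropic’s Messages API, per the company’s published documentation. The model name and context text below are placeholders, and the one-hour TTL has at times required a beta header, so treat this as a sketch rather than gospel:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Mark a large, stable block (system prompt, project context) as cacheable.
# The "ttl" field selects the five-minute or one-hour cache tier.
response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model name
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a coding assistant. <large project context here>",
            "cache_control": {"type": "ephemeral", "ttl": "1h"},  # or "5m"
        }
    ],
    messages=[{"role": "user", "content": "Explain what a cache TTL is."}],
)

# The usage block reports how many tokens were written to and read from cache,
# which is where the 1.25x/2x write premiums and ~0.1x read discount apply.
print(response.usage.cache_creation_input_tokens,
      response.usage.cache_read_input_tokens)
```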

Jarred Sumner, the creator of the Bun JavaScript runtime who now works for Anthropic, agreed that the analysis was “good detective work” but claimed that the change back to the five-minute cache made Claude Code cheaper because “a significant share of Claude Code’s requests are one-shot calls where the cached context is used once and never revisited.” Sumner said that the Claude Code client determines the cache TTL automatically and there are no plans for a global setting.

Swanson revised his analysis in response, agreeing that sessions using subagents do benefit from the lower write cost of the five-minute cache, since they interact rapidly and “their caches virtually never expire.” However, he said he has been a $200 per month subscriber for over six months and had never hit a quota limit until March. The “extra burn rate” is “making a once great service unusable,” he said.
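The trade-off Sumner and Swanson are arguing over can be sketched with back-of-the-envelope arithmetic using the multipliers quoted above: writes at 1.25x or 2x the base input price, cache reads at roughly 0.1x. The model below is illustrative, not Anthropic’s:

```python
# Illustrative cost model for one session, in units of the base input price.
# Cache writes cost 1.25x (5m TTL) or 2.0x (1h TTL); cache reads about 0.1x.
WRITE = {"5m": 1.25, "1h": 2.0}
READ = 0.1

def session_cost(ttl: str, gaps_minutes: list[float]) -> float:
    """Cost of a session whose turns are separated by the given idle gaps.

    The first turn always writes the cache; each later turn re-writes it
    if the idle gap exceeded the TTL, otherwise it reads cheaply.
    """
    limit = 5 if ttl == "5m" else 60
    cost = WRITE[ttl]  # first turn writes the cache
    for gap in gaps_minutes:
        cost += WRITE[ttl] if gap > limit else READ
    return cost

# A one-shot call, Sumner's case: the 5m tier is clearly cheaper.
print(session_cost("5m", []), session_cost("1h", []))  # 1.25 vs 2.0

# A long session with 15-minute pauses, Swanson's case: every pause
# busts the 5m cache, so the context is re-written again and again.
pauses = [15] * 8
print(session_cost("5m", pauses))  # 11.25 -- nine full-premium writes
print(session_cost("1h", pauses))  # 2.8   -- one write, eight cheap reads
```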

Another factor is that the large one-million-token context window available on paid plans with the Claude Opus 4.6 or Sonnet 4.6 models increases costs, especially with cache misses. Claude Code creator Boris Cherny stated that “prompt cache misses when using 1M token context window are expensive… if you leave your computer for over an hour then continue a stale session, it’s often a full cache miss.” He said that Anthropic is considering a 400,000-token context window by default, with an option for one million tokens if preferred. There is already a configuration setting for this.
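Cherny’s stale-session scenario is easy to put numbers on. Assuming a hypothetical base input price of $3 per million tokens and the roughly 10 percent cache-read rate cited earlier, a full miss on a 1M-token context costs ten times what a warm read does, and a 400,000-token default window caps that worst case:

```python
# Rough per-turn context cost, assuming a hypothetical $3.00 per million
# input tokens and cache reads at ~10 percent of base (figures cited above).
# This slightly understates a miss: re-caching also pays the write premium.
BASE_PER_MTOK = 3.00
CACHE_READ_FACTOR = 0.10

def turn_cost(context_tokens: int, cache_hit: bool) -> float:
    rate = BASE_PER_MTOK * (CACHE_READ_FACTOR if cache_hit else 1.0)
    return context_tokens / 1_000_000 * rate

print(turn_cost(1_000_000, cache_hit=True))   # $0.30 -- warm 1M-token session
print(turn_cost(1_000_000, cache_hit=False))  # $3.00 -- stale session, full miss
print(turn_cost(400_000, cache_hit=False))    # $1.20 -- worst case at 400K default
```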

Cherny said that larger contexts are now common because users are “pulling in lots of skills, or running many agents or background automations.”

Some developers are convinced that cache rebuilding and cache misses are major factors in Claude Code quota exhaustion, which has reached the point where Pro users ($20 per month) may get as few as two prompts in five hours. A number of bugs in the caching code have been reported, prompting one user to say: “Before these are fixed, likely any 5 minutes vs 1h discussion is completely moot since numbers are completely wrong.”

The focus on cache optimization is also evidence that, under the covers, Anthropic’s quotas are simply buying less processing time than they used to.

Swanson is not alone in reporting that Claude’s performance has dropped. For example, a user on the enterprise team plan said: “In March I could use Opus all day and it was getting great results. Since the last week of March and into April, I’ve had sessions where I maxed out session usage in under 2 hours and it got stuck in overthinking loops, multiple turns of realising the same thing, dozens of paragraphs of ‘but wait, actually I need to do x’ with slight variations.” That chimes with similar comments from an AI director at AMD.

Cache optimization may be significant, but it seems unlikely to account for all these reported issues. ®

