The observe of tokenmaxxing seems to be dying out, even earlier than I had an opportunity to put in writing about it. Good riddance. Burning tokens to create the looks of productiveness was fated to final solely till the accountants discovered about it, and the strictest of all accountants is one’s private checkbook. What acquired many builders occupied with the price of AI was the change in GitHub Copilot’s utilization fees. The price of Copilot went from a month-to-month price with limitless use to a monthly fee that bought a restricted variety of credit, that are used to pay the AI supplier of your selection. One credit score is equal to US$0.01; whenever you’ve used up your credit, you’ll be able to improve your account or pay for extra credit as you go.

The query isn’t why this didn’t occur earlier; it’s why this occurred now. Tokenmaxxing is each the creation and sufferer of two large-scale developments in AI. First, beginning with OpenAI, the key AI suppliers had been all enjoying a blitzscaling recreation that prioritized consumer development over profitability. Giving AI providers away without spending a dime acquired you extra customers, and in the long term, scalers would work out how one can earn money from end-user charges, promoting consumer knowledge, or promoting. This course of inevitably ends in enshittification, and continues to be very a lot the highway we’re on.

Second, token utilization exploded late in 2025. The looks of “reasoning fashions,” which use tokens to keep up an inner dialog in the middle of fixing an issue, elevated the variety of tokens used to answer every immediate. Reasoning tokens are a mannequin’s dialog with itself about doable responses to the immediate, and are sometimes extra quite a few than the immediate and response themselves. Whether or not or not customers see the reasoning course of (usually they don’t), reasoning tokens add to the invoice. They’re regularly counted as “output tokens” as a result of they’re generated by the mannequin, and are costlier than enter tokens.

The looks of brokers additionally multiplied the speed at which customers consumed tokens. In Might, 2025, Simon Willison quoted Anthropic’s Hannah Moran’s definition of an agent: “Brokers are fashions utilizing instruments in a loop.” The Tredence weblog writes: “The agent loop is a repeating cycle wherein the AI reads the present knowledge, thinks by means of what it means, chooses an motion, carries it out, checks what occurs and begins over.” If you happen to’ve ever watched Claude Code, OpenClaw, or every other agent work, a single request can turn out to be many calls to a mannequin, every one utilizing a whole lot of tokens, if not hundreds. Along with the present request, one agent-generated invocation can comprise the duty’s whole gathered context and related paperwork. Between reasoning tokens and brokers, token utilization goes up by an element of a whole lot.

The rise in token utilization may not be a difficulty if it leads to issues being solved and duties accomplished extra successfully. But it surely collides with the loss-leader pricing of the blitzscalers; their willingness to function at a loss to realize management of a market has limits. No matter whether or not the variety of AI customers is rising, the quantity of computation, and due to this fact value, per consumer grows as the usage of brokers will increase. Reasoning fashions elevated token utilization; brokers compounded the issue; and that led to cost will increase.1 Microsoft/GitHub doesn’t need to pay Copilot prospects’ AI payments. We haven’t but seen across-the-board worth will increase from the AI suppliers themselves. However we have now seen GitHub’s token credit, and we have now seen Anthropic and OpenAI worth extra succesful fashions considerably greater than older or much less succesful fashions. Fable is twice as costly as Opus 4.8, and whereas some writers have referred to as this pricing “improbable,” that’s most likely as a result of they had been anticipating an excellent larger enhance. Whereas Fable can delegate duties to Anthropic’s cheaper fashions, most early customers observe that with Fable, token use goes up somewhat than down. Anthropic’s change to token-based billing for its agent SDK (currently on hold) is one other sign that the times of cheap AI are coming to an finish. OpenAI’s story is comparable: GPT 5.5 prices twice as a lot GPT 5.4 per million tokens.

It’s additionally necessary to take capability under consideration. Large knowledge facilities have been within the information, however these knowledge facilities haven’t been constructed but. Extra necessary, {the electrical} infrastructure wanted to assist these knowledge facilities—transmission traces, mills—hasn’t been constructed both, and that’s not an funding over which AI firms have a lot management. They will construct their very own energy era amenities on an information middle campus, however that’s an enormous funding in applied sciences that they’re not conversant in. And even should you generate energy domestically, you want different kinds of infrastructure: rail for coal, pipelines for gasoline. This isn’t (but) an essay about knowledge middle energy consumption and its penalties, however it’s one other issue that limits elevated token utilization. We’ve seen Anthropic’s outages blamed on capability, and Anthropic has responded by leasing unused knowledge middle capability from SpaceX. However the different means to answer elevated demand that may’t be met by present capability is to extend costs, limiting prospects to those that can afford to pay. That enhance is being seen by managers, accountants, and unbiased builders.

Token optimization and accountability are the inevitable consequence of upward strain on token worth. One strategy to construct accountability is thru higher governance, a route Bennie Haelen describes in “The Subsidy Ended: What Tool-Using Agents Actually Cost.” Higher governance is achieved by means of constructing an observability layer that permits you to see precisely what the brokers and fashions are doing. With a well-designed observability layer, you’ll be able to see whether or not the information despatched to the mannequin is rising with every invocation, whether or not the mannequin is utilizing acceptable instruments, whether or not instruments are being referred to as repeatedly, and quite a lot of different data that can inform you whether or not your agent is operating effectively.

One other piece of token accountability is knowing which fashions are operating your agent’s requests. Normal-purpose reasoning fashions vary from costly high-performance fashions like Claude Fable or Opus 4.8 to fashions like Gemma 4 26B that may run on a well-equipped laptop computer, and a few fashions which can be even smaller. Whereas it’s tempting to say “I want one of the best; I’ll run Opus 4.8 or Fable with most reasoning,” most requests don’t require that degree of reasoning or expense. Brokers will be capable of determine what mannequin is finest for processing each request. Fable can delegate, and we anticipate different frontier suppliers to comply with as fashions incorporate agent capabilities. And there’s an energetic world of open fashions exterior of the frontier AI suppliers. Vicki Boykis writes that fashions operating domestically now work virtually in addition to frontier fashions. Instruments like OpenRouter offer you a model-independent means of routing requests to totally different fashions, together with open fashions that run domestically. OpenRouter may be built-in with OpenClaw, Claude Code, Cursor, Codex, and different brokers to offer clever routing.

Tokenmaxxing is dying. It’ll little doubt take time for its vestiges to die away, and there’ll all the time be builders who suppose they’ll recreation the trail to a promotion, together with managers who insist on being “all in” with AI. However spending tokens responsibly is now the norm, whether or not you pay with your individual checkbook or an organization account. Token optimization will solely turn out to be extra necessary as per-token fees enhance. They undoubtedly will.

Footnotes


Source link