The token bill is coming | Centered Networks

Dell reported $43.84 billion in revenue, up 88% year over year, with AI server revenue of $16.1 billion in a single quarter and a full-year AI server outlook raised to $60 billion. The headline writes itself: AI demand is staggering, and the companies selling the hardware are the early winners.

It is a good headline. It is also only the first chapter. Every one of those servers exists to do one thing: run AI workloads at scale. And when those workloads leave the lab and reach production, the cost of running them stops being someone else’s capital expense and starts being your operating expense. Metered, monthly, and growing with every person you add.

The servers shipping today are the leading edge of a cost wave that will land on organizational budgets over the next several quarters, denominated not in GPUs but in tokens. Most teams are not ready for it, because the economics of an AI pilot and the economics of AI in production are not the same thing, and almost nobody plans for the gap.

“The hardware boom is a capital expense for someone else. For everyone running AI in production, it arrives as an operating expense. And operating expenses are where CFOs live.”

A proof of concept is cheap by design. A handful of people, a few hundred queries a day, short prompts, a single model. The monthly bill is a rounding error, the demo lands, and the project gets a green light. The number everyone remembers is the pilot number.

Production is a different animal. The same use case, rolled out across the organization, changes shape in ways that compound:

Volume. Ten testers become 10,000 staff, each making requests all day instead of a few times during a demo.
Context. Useful answers need grounding. Retrieval pulls documents, policies, and history into every prompt, so the token count per request climbs from hundreds to tens of thousands.
Agents. A single user action can trigger a chain of model calls (planning, tool use, verification, retries), where one question quietly becomes a dozen billable steps.
Always-on. Summarization, classification, and monitoring jobs run in the background whether anyone is watching or not.

None of this is waste. It is what makes the output good. But it means the unit cost that justified the pilot can be off by one or two orders of magnitude by the time the same feature is serving the whole organization. The bill doesn’t grow with adoption alone; it grows with adoption multiplied by the richness of every interaction.
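To make the compounding concrete, here is a minimal back-of-the-envelope sketch. Every number in it is an assumption chosen for illustration: the per-token prices are hypothetical rather than any provider's actual rates, and the pilot and production profiles are invented to match the shape described above (more users, more requests, longer prompts, more model calls per request).

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; not any provider's real rates.
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00
WORKING_DAYS_PER_MONTH = 22  # assumption


@dataclass
class Workload:
    users: int
    requests_per_user_per_day: float
    input_tokens_per_call: int      # prompt plus retrieved context
    output_tokens_per_call: int
    model_calls_per_request: float  # greater than 1 once agents chain calls


def cost_per_call(w: Workload) -> float:
    """Cost in USD of a single model call."""
    return (
        w.input_tokens_per_call * INPUT_PRICE_PER_M
        + w.output_tokens_per_call * OUTPUT_PRICE_PER_M
    ) / 1_000_000


def cost_per_request(w: Workload) -> float:
    """Cost in USD of one user request, including chained model calls."""
    return cost_per_call(w) * w.model_calls_per_request


def monthly_cost(w: Workload) -> float:
    """Cost in USD per month across all users."""
    requests = w.users * w.requests_per_user_per_day * WORKING_DAYS_PER_MONTH
    return requests * cost_per_request(w)


# Illustrative profiles only.
pilot = Workload(users=10, requests_per_user_per_day=5,
                 input_tokens_per_call=500, output_tokens_per_call=300,
                 model_calls_per_request=1)
production = Workload(users=10_000, requests_per_user_per_day=20,
                      input_tokens_per_call=20_000, output_tokens_per_call=500,
                      model_calls_per_request=6)

for name, w in (("pilot", pilot), ("production", production)):
    print(f"{name}: ${cost_per_request(w):.4f} per request, "
          f"${monthly_cost(w):,.2f} per month")
```

The point is not the totals, which depend entirely on the assumed inputs. It is that cost per request and monthly cost move separately: context and agent steps raise the first, and headcount and usage multiply it into the second.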

If this pattern feels familiar, it should. It is the cloud story, told again with a different unit of measure.

A decade ago, organizations moved to the cloud on the promise of elasticity: no more data centers, pay only for what you use, scale on demand. All true. And for a few years the bills were small enough that nobody scrutinized them. Then the workloads grew, the architecture got casual, and the monthly invoice became one of the largest and least predictable lines in the budget. Finance noticed. An entire discipline, FinOps, grew up to answer a question the early enthusiasm had skipped: what are we actually paying per unit of value, and can we bring it down without breaking anything?

AI is now entering the same phase, faster. The on-premises-versus-cloud debate was never really about where the servers lived; it was about who controlled the cost curve and how visible it was. The same question is already forming around AI: rent intelligence by the token from a frontier provider, run smaller models on infrastructure you control, or (most likely) a deliberate mix of both, chosen workload by workload. Dell’s order book shows one side of that decision being made at scale right now.

So far, most AI spending has been governed by curiosity and competitive pressure. Budgets were approved on the strength of the demo and the worry of falling behind. That window is closing. As AI moves from initiative to infrastructure, it crosses a threshold every meaningful technology eventually reaches: it gets a budget owner who asks hard questions.

Those questions are coming whether teams are ready or not:

What does this feature cost per user, per request, per resolved outcome?
Are we using a frontier model where a smaller one would do?
How much of the bill is retries, runaway context, or jobs nobody uses?
Can we forecast next quarter’s spend within a reasonable margin?
What is the return that justifies the run rate?
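Several of these questions can be answered from per-call usage logs, provided each call is tagged at the source. The sketch below assumes a hypothetical `UsageRecord` shape (feature, user, token counts, a retry flag, and a flag on the call that resolved the request) and hypothetical prices; real logs will differ, but the roll-up logic is the same.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens.
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00


@dataclass
class UsageRecord:
    """One model call. The field names are illustrative assumptions."""
    feature: str
    user_id: str
    input_tokens: int
    output_tokens: int
    is_retry: bool   # True if this call repeats an earlier failed call
    resolved: bool   # True on the call that resolved the user's request


def call_cost(r: UsageRecord) -> float:
    return (r.input_tokens * INPUT_PRICE_PER_M
            + r.output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def unit_economics(records):
    """Roll per-call records up into per-feature unit costs."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r.feature].append(r)

    report = {}
    for feature, rows in grouped.items():
        total = sum(call_cost(r) for r in rows)
        retry_cost = sum(call_cost(r) for r in rows if r.is_retry)
        requests = sum(1 for r in rows if not r.is_retry)
        users = len({r.user_id for r in rows})
        resolved = sum(1 for r in rows if r.resolved)
        report[feature] = {
            "total_cost": total,
            "cost_per_request": total / requests if requests else None,
            "cost_per_user": total / users,
            "cost_per_resolved_outcome": total / resolved if resolved else None,
            "retry_share": retry_cost / total if total else 0.0,
        }
    return report
```

Calling `unit_economics(records)` returns one dictionary per feature. The `retry_share` figure is the fraction of that feature's spend that went to retries, which is one direct way to put a number on the third question.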

The organizations that struggle won’t be the ones that adopted AI too slowly. They’ll be the ones that scaled it with no instrumentation, no unit economics, and no plan for the moment the meter started running in earnest.

The good news: token costs are far more controllable than most teams realize. And unlike the early cloud era, the tooling to manage them is arriving alongside the workloads, much of it native to the Microsoft and Azure platform where these workloads already run. The work is mostly engineering and governance, not magic.

Right-size the model. Route each task to the smallest model that meets the quality bar. A capable small model handling the bulk of traffic, with a frontier model reserved for the hard cases, can cut spend sharply with no visible drop in quality.
Cache what repeats. Prompt and response caching turns expensive repeated work into near-free lookups, often the single largest lever available.
Discipline the context. Retrieve precisely instead of stuffing the prompt. Most production token bills are inflated by context that never needed to be there.
Instrument before you scale. Cost-per-feature observability and token budgets turn a surprise invoice into a managed number you can see, attribute, and forecast.
Govern the architecture. Set boundaries on agent loops, retries, and runaway calls, and pair every deployment with the identity and data governance that keeps it safe for the donor, patient, and constituent data that mission-driven organizations hold.
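Several of the levers above fit in one thin wrapper around the model client. What follows is a minimal sketch, not a production design: `GovernedClient`, `rough_tokens`, and `BudgetExceeded` are hypothetical names, the two models are plain callables standing in for real clients, and token counts are approximated by word counts where a real system would use the provider's tokenizer and reported usage.

```python
import hashlib
from typing import Callable, Dict, Optional

Model = Callable[[str], str]  # stand-in for a real model client


class BudgetExceeded(RuntimeError):
    """Raised when a call would push usage past the token budget."""


def rough_tokens(text: str) -> int:
    # Crude proxy (whitespace-separated words), used only for illustration.
    return len(text.split())


class GovernedClient:
    def __init__(self, small: Model, large: Model,
                 is_hard: Callable[[str], bool],
                 token_budget: int, max_attempts: int = 3):
        self.small = small
        self.large = large
        self.is_hard = is_hard            # routing rule supplied by the caller
        self.token_budget = token_budget
        self.max_attempts = max_attempts  # hard cap on retries
        self.tokens_used = 0
        self.cache: Dict[str, str] = {}

    def complete(self, prompt: str) -> str:
        # Cache what repeats: identical prompts are served without a model call.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.cache:
            return self.cache[key]

        # Right-size the model: the large model is used only for hard prompts.
        model = self.large if self.is_hard(prompt) else self.small

        last_error: Optional[Exception] = None
        for _ in range(self.max_attempts):
            # Enforce the budget before every attempt, retries included.
            cost = rough_tokens(prompt)
            if self.tokens_used + cost > self.token_budget:
                raise BudgetExceeded(
                    f"{self.tokens_used} of {self.token_budget} tokens used")
            self.tokens_used += cost
            try:
                answer = model(prompt)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                continue
            self.tokens_used += rough_tokens(answer)
            self.cache[key] = answer
            return answer

        raise RuntimeError(
            f"gave up after {self.max_attempts} attempts") from last_error
```

The design choice worth noting is that the budget check sits inside the retry loop, so retries draw down the same budget as first attempts and a failing dependency cannot quietly multiply spend.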

Done well, none of this dulls the product. It is the difference between AI that quietly compounds in value and AI that quietly compounds in cost.

The hard part isn’t any single lever. It is the judgment of where to place each one. Push too hard on cost and you ship something slow and disappointing that staff abandon. Ignore cost and you ship something good that finance shuts down a year later when the invoice arrives. The right answer is a deliberate balance, set per workload, revisited as both prices and models keep moving.

This is the work we do. Centered Networks helps mission-driven organizations, foundations, and rural hospitals move AI from pilot to production on the Microsoft platform, with cost discipline and governance built in from the start rather than bolted on after the first painful bill. We map use cases to the right models, design the architecture so it scales economically, and put the instrumentation in place so finance and IT read the same numbers before anyone is surprised by them.

The AI infrastructure boom is real, and the value on the other side of it is real too. The organizations that capture that value will be the ones that treated cost as a design input from day one, not the ones that waited for the token bill to teach them the lesson the cloud already tried to.

Source: AI server demand drives staggering revenue growth; Dell stock soars, SiliconANGLE, May 28, 2026.

Know what your AI will cost before you scale it.

Start with a two-week Discovery Sprint. We’ll map your highest-impact AI use cases, pressure-test them against real production economics, and build a 90-day roadmap. No commitment beyond insight.

