The ROI Reckoning
Uber burned its entire 2026 AI coding tools budget in four months and cannot draw a line to shipped features. Meanwhile, frontier firms now consume 3.5 times as much AI per worker as typical firms, not by spending more, but by deploying differently. The gap between AI adoption and AI production has become a financial event.
The Spend Without a Line
Uber exhausted its entire 2026 AI coding tools budget by April. Microsoft is canceling internal AI licenses by the thousands. The problem is not that AI is expensive. It is that the spending was decoupled from production outcomes. And the firms that got this right now consume 3.5 times as much AI per worker as typical firms.
Uber's chief operating officer, Andrew Macdonald, made an admission in May 2026 that every enterprise CFO should read carefully. The company spent its entire annual AI coding tools budget within the first four months of the year. Average per-engineer costs ran $150 to $250 per month. Power users hit $500 to $2,000. Uber's own CTO spent $1,200 in a single two-hour session. And when the budget ran out, Macdonald stated the core problem directly: "If you're not actually able to draw a direct line to how many useful features and functionality you're shipping, that trade becomes harder to justify."
The attribution gap -- the distance between AI spend and production output -- is the defining enterprise AI governance failure of 2026. Separately, Microsoft has begun canceling the majority of its internal Claude Code licenses and redirecting thousands of engineers to GitHub Copilot. The phenomenon now has a name: tokenmaxxing, which describes what happens when employees run trivial or speculative tasks through AI tools to maximize usage metrics rather than production value. Enterprise AI plans marketed as "unlimited" carry real token costs. The firms that deploy before building governance infrastructure are discovering that the hard way.
The problem is not that Uber bought the wrong tools. The problem is that Uber built its AI deployment around adoption metrics -- usage volume, seats activated, sessions completed -- because those were the metrics its vendors optimized for, and because they were easy to report. Production outcomes are harder to attribute and slower to materialize. Most enterprise AI rollouts are still measured the way Uber's was: by how much gets used, not by how much gets built.
The same week, OpenAI published its B2B Signals report, which documents what the firms that avoided this failure actually look like. Frontier firms -- organizations that have embedded AI into production workflows -- now consume 3.5 times as much intelligence per worker as typical firms. A year ago that gap was 2x. The gap is widening faster than most organizations are closing it. But the report is careful about causality: message volume explains only 36% of the frontier advantage. Frontier firms send 16x as many Codex messages per worker as typical firms -- not because they spend more on AI, but because AI is doing more of the actual execution work. Typical firms use AI to answer questions. Frontier firms use it to complete tasks.
These two data points tell the same story from opposite ends. Uber's tokenmaxxing crisis is what happens when AI deployment is measured by consumption. The frontier firm advantage is what happens when it is measured by output. The difference is not budget size, model quality, or vendor choice. It is whether the AI is running beside the workflow or inside it -- and whether there is a measurement system that can tell the difference.
The tokenmaxxing problem is a governance problem, not a technology problem. The enterprises now burning through budget without ROI attribution built their deployments around adoption metrics because those were the metrics their vendors optimized for. The fix is not to spend less. It is to build the measurement infrastructure that connects AI spend to production outcomes before scaling. Every dollar you scale before that infrastructure exists is a dollar you will have difficulty defending when the board asks where the value went.
Moving Pieces
Three of the Four Largest Consulting Firms Now Run on Anthropic's Models
Deloitte (470,000 employees, DARTfusion platform), KPMG (276,000 employees, 138 countries), and PwC (scaling from 30,000 to 364,000 active users by year-end) all announced enterprise-wide Claude deployments this week. EY went a different direction, committing to Microsoft Copilot. OpenAI's $4 billion Deployment Company -- backed by McKinsey, Bain, and Capgemini -- arrived simultaneously, with the same stated mission: close the enterprise AI execution gap. The advisory infrastructure that sits between AI vendors and enterprise buyers now has material commercial alignment with the vendors it recommends. Ask your advisors which platforms they have embedded before accepting any roadmap. The answer is more informative than the roadmap.
Modular AI Data Centers Reach Enterprise Investment Grade
Armada closed a $230 million Series B at a $2 billion valuation, with BlackRock and Johnson Controls as co-investors. Customer bookings grew 540% year over year; Q1 FY27 alone saw a 2,000% spike. An Arizona factory launching this summer will run continuous production of Leviathan-class modular systems -- from briefcase-sized units to megawatt-scale containerized data centers. Enterprise demand for AI infrastructure that doesn't require a hyperscaler lease agreement has reached the scale where institutional capital follows.
Canada Rules ChatGPT Was Built on Broken Privacy Consent
A joint federal-provincial investigation concluded that OpenAI violated Canadian privacy law in training and deploying ChatGPT: overcollection, no valid consent, factual inaccuracies involving personal data, inadequate deletion rights. The federal commissioner conditionally resolved the matter; British Columbia and Alberta refused, stating consent for scraped data cannot be obtained retroactively. For enterprise legal and compliance teams: this is the clearest jurisdictional ruling yet on the consent gap embedded in LLM training pipelines, and it has implications for every enterprise AI vendor relationship subject to Canadian law.
ServiceNow Otto: The Enterprise "AI Agent of Agents"
ServiceNow introduced Otto at Knowledge 2026 -- a unified AI layer combining Now Assist, Moveworks, and AI Experience into a single enterprise interface. Otto understands employee intent, routes work to the right AI agent, and executes to completion across departments without requiring tool-switching. Governance runs through AI Control Tower: every interaction logged, every policy enforced, every decision explainable. The enterprise AI orchestration layer -- who owns the dispatch function for all the other agents -- is now a defined product category. ServiceNow is staking its platform future on owning it.
Uber exhausted its entire 2026 AI coding tools budget by April, four months into the year. Per-engineer costs ranged from $150 to $250 per month for typical users, and from $500 to $2,000 for power users. The company's COO acknowledged no clear attribution between that spending and shipped product features. These numbers are not exceptional. They are representative. Most enterprise AI budgets were sized for steady-state adoption, not for agentic AI that consumes up to 1,000 times more tokens than standard queries. The firms that built usage governance before scaling avoided this budget overrun. Most did not build it first.
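The budget arithmetic can be sketched in a few lines. This is an illustration only: the per-engineer costs fall inside the ranges quoted above, but the headcounts, the annual budget, and the names `Cohort`, `monthly_spend`, and `months_of_runway` are assumptions made for the sketch, not figures reported by Uber.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    name: str
    engineers: int            # hypothetical headcount
    monthly_cost_usd: float   # per-engineer monthly spend


def monthly_spend(cohorts):
    """Total monthly spend across all cohorts."""
    return sum(c.engineers * c.monthly_cost_usd for c in cohorts)


def months_of_runway(annual_budget_usd, cohorts):
    """How many months the annual budget lasts at the current run rate."""
    spend = monthly_spend(cohorts)
    if spend <= 0:
        raise ValueError("monthly spend must be positive")
    return annual_budget_usd / spend


# Hypothetical inputs: costs sit inside the ranges quoted in the text,
# headcounts and budget are invented for illustration.
cohorts = [
    Cohort("typical", engineers=900, monthly_cost_usd=200.0),
    Cohort("power", engineers=100, monthly_cost_usd=1200.0),
]
annual_budget_usd = 1_200_000.0

runway = months_of_runway(annual_budget_usd, cohorts)
print(f"monthly spend: ${monthly_spend(cohorts):,.0f}")  # monthly spend: $300,000
print(f"runway: {runway:.1f} months")                    # runway: 4.0 months
```

In this hypothetical, the power users are 10% of the engineers and 40% of the monthly spend. That is why a budget sized around the typical user can run out long before December.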
The Infrastructure Bet Is Not Slowing Down
While enterprise operators question AI ROI and cancel licenses, hyperscalers are locking in infrastructure at a scale that implies a fundamentally different view of where demand is headed.
Microsoft spent $11.1 billion on data center leases alone in Q1 FY2026. It guided to approximately $40 billion in hardware and data center spend for Q2. The company expects to increase AI capacity by more than 80% through fiscal 2026. This is the same company that is canceling internal Claude Code licenses and tightening AI tool spend at the team level.
The two things are not contradictory -- they operate on different time horizons and serve different functions. The hyperscaler CapEx bet is about long-run AI demand: inference at scale, compute rental, and platform services for enterprises that eventually solve their governance problems. The tokenmaxxing pullback is short-run budget management at the team level. Both can be rational simultaneously. What the juxtaposition reveals is that the supply side has high conviction about where AI value lands. The demand side is still developing the discipline to capture it.
Measure What Ships, Not What Runs
There is a standard playbook for failing at enterprise AI. Deploy broadly, measure adoption, report seats and sessions to the board, and call it progress. Uber followed that playbook. Microsoft's internal teams followed it. Thousands of companies are following it right now and will discover the budget problem when the year's AI spend is exhausted in Q1 or Q2.
The counterplaybook is documented, even if it is not yet widely practiced. OpenAI's B2B Signals data shows what it looks like when a firm builds AI into execution rather than inquiry. The 3.5x intelligence advantage that frontier firms carry is not a product of larger budgets or better models. It is a product of measurement discipline: knowing which tasks the AI is completing, tracing that completion to outcomes, and scaling only the deployments that show a clear line to production output. Codex usage at 16x the typical rate is not a sign of reckless spending. It is a sign of AI embedded in work product delivery.
The practical question for any enterprise leader reviewing AI spend right now: for each tool in your portfolio, can you draw the line from usage to output? Not "how many sessions" or "how many tokens" -- but "what shipped, what closed, what was decided, what didn't require a human to do manually?" If you cannot draw that line today, you cannot defend the budget tomorrow. And you cannot build toward frontier firm performance without it.
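One way to make that question concrete is a per-tool attribution check. The sketch below is a minimal illustration under stated assumptions: the record fields, the tool names, and every number are hypothetical, and deciding what counts as a "shipped outcome" is left to the organization. The point is only that sessions are recorded but never used as the measure of value.

```python
from dataclasses import dataclass


@dataclass
class ToolRecord:
    tool: str
    spend_usd: float
    sessions: int            # adoption metric, deliberately not used below
    shipped_outcomes: int    # features shipped, tickets closed, etc. (hypothetical)


def cost_per_outcome(record):
    """Spend per attributed outcome, or None when nothing can be attributed."""
    if record.shipped_outcomes == 0:
        return None
    return record.spend_usd / record.shipped_outcomes


def attribution_report(records):
    """Map each tool to its cost per shipped outcome."""
    return {r.tool: cost_per_outcome(r) for r in records}


def unattributed_tools(records):
    """Tools with spend but no line to any shipped outcome."""
    return [r.tool for r in records if r.spend_usd > 0 and r.shipped_outcomes == 0]


# Hypothetical portfolio: all values invented for illustration.
records = [
    ToolRecord("assistant_a", spend_usd=50_000.0, sessions=12_000, shipped_outcomes=40),
    ToolRecord("assistant_b", spend_usd=80_000.0, sessions=30_000, shipped_outcomes=0),
]

report = attribution_report(records)
for tool, cost in report.items():
    label = f"${cost:,.0f} per outcome" if cost is not None else "no attributable output"
    print(f"{tool}: {label}")
print("cannot defend:", unattributed_tools(records))
```

In this made-up portfolio, the tool with the most sessions is the one with no line to output. An adoption dashboard would rank it first.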
Build the measurement infrastructure before you scale the deployment. In that order. Every time.