What AI Actually Costs: Reference Cost Ranges for Enterprise AI

Key takeaways

Interpretation: Model inference fees are the most visible AI cost but typically represent only ten to twenty per cent of total enterprise AI spend once integration, governance, and support are included.
Evidence: Inference costs vary by several orders of magnitude depending on model choice, deployment pattern, and workload design. A well-optimised AI workflow can cost ten to fifty times less than a naively assembled one using the same underlying model.
Interpretation: Integration and maintenance engineering is consistently the largest hidden cost for organisations in the first two years of AI deployment.
Evidence: Governance overhead is real, measurable, and frequently underbudgeted. It typically runs at fifteen to thirty per cent of the operational AI investment in regulated environments.
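
The two percentage ranges in these takeaways can be turned into rough planning arithmetic. The sketch below is illustrative only: the function names are hypothetical, the default shares are the ranges quoted above, and "operational spend" is whatever figure the planner chooses to treat as the operational AI investment, since the takeaway does not define it more precisely.

```python
def implied_total_spend(inference_spend, low_share=0.10, high_share=0.20):
    """Return the (low, high) total AI spend implied by an inference bill.

    Assumes inference is between low_share and high_share of the total,
    which is the ten-to-twenty per cent range quoted in the takeaways.
    """
    if inference_spend < 0:
        raise ValueError("inference_spend must be non-negative")
    if not 0 < low_share <= high_share <= 1:
        raise ValueError("shares must satisfy 0 < low_share <= high_share <= 1")
    # A larger inference share implies a smaller total, so high_share
    # produces the low end of the range and low_share the high end.
    return inference_spend / high_share, inference_spend / low_share


def governance_overhead(operational_spend, low_rate=0.15, high_rate=0.30):
    """Return the (low, high) governance budget for a regulated environment.

    Assumes governance runs at fifteen to thirty per cent of the
    operational figure supplied by the caller.
    """
    if operational_spend < 0:
        raise ValueError("operational_spend must be non-negative")
    return operational_spend * low_rate, operational_spend * high_rate


if __name__ == "__main__":
    # Hypothetical inputs, not figures from any real organisation.
    low, high = implied_total_spend(100_000)
    print(f"Implied total spend: {low:,.0f} to {high:,.0f}")
    g_low, g_high = governance_overhead(1_000_000)
    print(f"Governance overhead: {g_low:,.0f} to {g_high:,.0f}")
```

Read this way, an inference bill is a lower bound on the conversation about cost: the same invoice is consistent with a total between five and ten times larger.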

Why AI cost benchmarks are difficult

AI costs are unusually hard to benchmark because the inputs vary enormously: different models, different deployment architectures, different workloads, different governance requirements, and different organisational maturity all produce cost structures that resist direct comparison.

Inference costs

Inference is the cost of running a model to produce an output. It is the most visible line item in enterprise AI spend and the one most frequently used as a proxy for total AI cost.

High-capability frontier models (top-tier reasoning, multimodal): $3–$30 per million output tokens
Mid-tier models (strong general-purpose, fast): $0.40–$3 per million output tokens
Efficient small models (specific tasks, constrained contexts): $0.05–$0.40 per million output tokens
Self-hosted open models (on managed infrastructure): effective cost depends on utilisation; typically $0.10–$1.50 per million tokens at reasonable scale once infrastructure is included
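
As a rough illustration of how these ranges translate into a bill, the sketch below multiplies a monthly output-token volume by the low and high ends of each range. The dictionary keys and the function name are hypothetical, the prices are the indicative ranges listed above rather than any vendor's price list, and input-token charges are deliberately left out.

```python
# Indicative USD price ranges per million output tokens, copied from the
# list above. The self-hosted range is an effective per-token cost that
# already includes infrastructure, so it is not a vendor price.
PRICE_PER_MILLION_TOKENS = {
    "frontier": (3.00, 30.00),
    "mid_tier": (0.40, 3.00),
    "small": (0.05, 0.40),
    "self_hosted": (0.10, 1.50),
}


def output_cost_range(model_class, output_tokens):
    """Return the (low, high) USD cost of producing output_tokens."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    low_price, high_price = PRICE_PER_MILLION_TOKENS[model_class]
    millions = output_tokens / 1_000_000
    return low_price * millions, high_price * millions


if __name__ == "__main__":
    # Hypothetical workload: two billion output tokens per month.
    monthly_tokens = 2_000_000_000
    for model_class in PRICE_PER_MILLION_TOKENS:
        low, high = output_cost_range(model_class, monthly_tokens)
        print(f"{model_class:12s} ${low:>10,.2f} to ${high:>10,.2f}")
```

For the same hypothetical volume, the gap between the cheapest small-model price and the most expensive frontier price is a factor of 600, which is where the "several orders of magnitude" in the key takeaways comes from.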

Figure: Inference cost ranges by model class (log-scale range bars of cost per million output tokens for frontier, mid-tier, small, and self-hosted models). Inference is only 10 to 20 per cent of total AI spend.

Real-world cost variance: Uber spent its entire 2026 AI coding budget in four months, with per-engineer API costs running $500–$2,000 per month. The variance was driven not by model choice but by workload design — how much context was sent with each request, how often agents re-sent growing context, and whether caching was used effectively.
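
The effect of re-sending a growing context can be sketched with simple arithmetic. The model below is an assumption made for illustration, not a description of any particular agent framework or provider: every turn re-sends the whole conversation so far, and with caching the previously sent prefix is billed at a reduced fraction of the normal input price (the hypothetical `cached_rate` parameter). Actual cache pricing and eligibility rules differ between providers.

```python
def resent_context_tokens(turns, base_tokens, added_per_turn):
    """Total input tokens when every turn re-sends the full context.

    Turn i (starting at 0) sends base_tokens + i * added_per_turn, so the
    total grows with the square of the number of turns.
    """
    return sum(base_tokens + i * added_per_turn for i in range(turns))


def billable_tokens_with_cache(turns, base_tokens, added_per_turn, cached_rate=0.1):
    """Full-price token equivalents when the repeated prefix is cached.

    Assumes the tokens already sent on the previous turn are billed at
    cached_rate of the normal price and only new tokens are billed in full.
    """
    total = 0.0
    previously_sent = 0
    for i in range(turns):
        sent = base_tokens + i * added_per_turn
        new_tokens = sent - previously_sent
        total += new_tokens + previously_sent * cached_rate
        previously_sent = sent
    return total


if __name__ == "__main__":
    # Hypothetical agent session: 20 turns, 2,000-token starting context,
    # 1,500 tokens added to the context on each turn.
    naive = resent_context_tokens(20, 2_000, 1_500)
    cached = billable_tokens_with_cache(20, 2_000, 1_500)
    print(f"Without caching: {naive:,} input tokens")
    print(f"With caching:    {cached:,.0f} full-price token equivalents")
    print(f"Ratio:           {naive / cached:.1f}x")
```

With these made-up inputs the session bills 325,000 input tokens without caching against roughly 60,000 full-price equivalents with it, a gap that widens as sessions get longer because the uncached total grows quadratically with the number of turns.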

Platform and orchestration costs

Most enterprise AI use cases are not direct API calls. They involve an orchestration layer — retrieval-augmented generation pipelines, agent frameworks, workflow automation, or AI application platforms — that carries its own costs.

Adoption versus spend reality: Market observations in early 2026 indicated significant enterprise Copilot subscription cancellations, with organisations citing low adoption and unclear value as primary reasons. The pattern: seats purchased on potential, cancelled on measured reality. Many organisations discovered they were paying for hundreds of seats with adoption rates below 40%.
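
The gap between seats purchased and seats used can be expressed as a cost per active seat. The sketch below is a minimal illustration with a hypothetical function name and made-up inputs; the seat price is a placeholder, not the list price of any product.

```python
def seat_utilisation(seats_purchased, active_seats, price_per_seat):
    """Summarise subscription spend against measured adoption.

    active_seats is whatever the organisation counts as real usage, for
    example users active in the last 30 days (an assumption of this sketch).
    """
    if seats_purchased <= 0:
        raise ValueError("seats_purchased must be positive")
    if not 0 <= active_seats <= seats_purchased:
        raise ValueError("active_seats must be between 0 and seats_purchased")
    total_spend = seats_purchased * price_per_seat
    idle_spend = (seats_purchased - active_seats) * price_per_seat
    # Cost per active seat is undefined when nobody uses the product.
    cost_per_active = total_spend / active_seats if active_seats else None
    return {
        "adoption_rate": active_seats / seats_purchased,
        "total_spend": total_spend,
        "idle_spend": idle_spend,
        "cost_per_active_seat": cost_per_active,
    }


if __name__ == "__main__":
    # Hypothetical: 500 seats at 30 per seat per month, 200 in active use.
    for key, value in seat_utilisation(500, 200, 30).items():
        print(f"{key}: {value}")
```

At a 40% adoption rate the effective price of each seat that is actually used is two and a half times the contract price, which is the number to weigh against measured value.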

Integration and engineering costs

Building an AI capability into a business workflow requires connecting it to data sources, access controls, existing applications, and output consumers — none of which is trivial.

When integration costs exceed value: Starbucks retired its inventory management AI agent after discovering the system's recommendations were being routinely overridden by store managers who understood local context the model missed. The agent consumed resources but delivered no measurable improvement over human judgment. The retirement decision came only after establishing baseline performance metrics and calculating full integration and maintenance costs.

Governance and oversight costs

Support and operations

Production AI workloads require operational support that differs from conventional software because the failure modes are different.

Total cost of ownership: indicative ranges

Unbudgeted spend at scale: Marc Benioff said in 2026 that Salesforce expected to spend about $300 million on Anthropic tokens that year, largely for coding-related work. The pattern: distributed procurement, embedded AI tiers in SaaS contracts, and departmental experiments that never appeared in central IT budgets.

The Optimist's Case

The Sceptic's Case

Implications for planning

Three observations are consistently useful for organisations building AI cost plans.

References and further reading

BCG, The Widening AI Value Gap: Build for the Future, 2025
BCG, From Potential to Profit: Closing the AI Impact Gap, 2024
FinOps Foundation, FinOps for AI: Scopes and Capabilities, 2025
AWS, Closing the AI Value Gap, 2024
Fortune, "Uber COO: AI spending on tokens like Claude Code is hard to justify", 26 May 2026, https://fortune.com/2026/05/26/uber-coo-ai-spending-tokens-claude-code/