Key takeaways
- Model inference fees are the most visible AI cost but typically represent only ten to twenty per cent of total enterprise AI spend once integration, governance, and support are included.
- Inference costs vary by several orders of magnitude depending on model choice, deployment pattern, and workload design. A well-optimised AI workflow can cost ten to fifty times less than a naively assembled one using the same underlying model.
- Integration and maintenance engineering is consistently the largest hidden cost for organisations in the first two years of AI deployment.
- Governance overhead is real, measurable, and frequently underbudgeted. It typically runs at fifteen to thirty per cent of the operational AI investment in regulated environments.
Why AI cost benchmarks are difficult
AI costs are unusually hard to benchmark because the inputs vary enormously: different models, different deployment architectures, different workloads, different governance requirements, and different organisational maturity all produce cost structures that resist direct comparison.
Provider pricing also changes frequently, sometimes dramatically. What was a useful reference number in one quarter may be significantly different six months later as model efficiency improves, competition increases, or providers rebalance their pricing tiers.
The ranges in this piece are intended as planning references, not procurement benchmarks. They represent the landscape of what enterprise organisations are actually spending across the main cost layers, based on public pricing, research, and reported enterprise experience. They will drift as the market evolves, but the structural patterns — the relative size of each layer, the variance drivers, and the cost optimisation levers — are more durable than specific numbers.
Inference costs
Inference is the cost of running a model to produce an output. It is the most visible line item in enterprise AI spend and the one most frequently used as a proxy for total AI cost. That proxy is misleading in most enterprise contexts, but understanding inference costs is still the right starting point.
Per-token pricing (API-based) is the dominant commercial model for frontier and large language model usage. Current ranges (mid-2026) vary substantially:
- High-capability frontier models (top-tier reasoning, multimodal): $3–$30 per million output tokens
- Mid-tier models (strong general-purpose, fast): $0.40–$3 per million output tokens
- Efficient small models (specific tasks, constrained contexts): $0.05–$0.40 per million output tokens
- Self-hosted open models (on self-managed infrastructure): effective cost depends on utilisation; typically $0.10–$1.50 per million tokens at reasonable scale once infrastructure is included
To translate token pricing into task costs: a document summary typically uses 2,000–8,000 tokens, a code review 3,000–15,000 tokens, and a complex analysis 10,000–60,000 tokens. At a flat $1 per million tokens (treating input and output tokens at the same rate for simplicity), summarising one hundred thousand documents costs roughly $200–$800 in inference fees alone.
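The arithmetic above can be reproduced in a few lines. This is a minimal sketch that applies one flat rate to every token, which is the simplification the example relies on; `inference_cost_usd` is a hypothetical helper, not a provider API.

```python
def inference_cost_usd(num_tasks: int, tokens_per_task: int, usd_per_million_tokens: float) -> float:
    """Estimate inference fees for a batch of similar tasks at one flat token rate."""
    total_tokens = num_tasks * tokens_per_task
    return total_tokens / 1_000_000 * usd_per_million_tokens


# One hundred thousand summaries at the low and high ends of the 2,000–8,000 token range.
low = inference_cost_usd(100_000, 2_000, 1.0)
high = inference_cost_usd(100_000, 8_000, 1.0)
print(f"${low:,.0f} to ${high:,.0f}")
```

Splitting input and output tokens at different rates is a straightforward extension once a specific provider's price sheet is in hand.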
Workload design often matters more than the choice between comparable models. Context stuffing (sending large amounts of text when only a fraction is needed) can increase costs by five to twenty times for the same task. Caching, batching, and prompt engineering are not optimisation niceties; they are the primary levers on inference spend. Organisations that optimise workload design routinely achieve fifty to eighty per cent cost reductions on the same outputs compared with their initial deployment patterns.
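To show why workload design moves spend so much, the sketch below prices a single call under hypothetical input and output rates and an assumed discount on cached prompt prefixes. The rates, the 90 per cent cache discount, and the token counts are all illustrative assumptions; real providers structure caching and batch discounts differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_usd_per_m: float              # hypothetical rate per million input tokens
    output_usd_per_m: float             # hypothetical rate per million output tokens
    cached_input_discount: float = 0.0  # assumed fraction off cached input tokens


def call_cost_usd(p: Pricing, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Cost of one model call, charging cached input tokens at a discounted rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    cached_rate = p.input_usd_per_m * (1 - p.cached_input_discount)
    return (
        uncached * p.input_usd_per_m
        + cached_tokens * cached_rate
        + output_tokens * p.output_usd_per_m
    ) / 1_000_000


pricing = Pricing(input_usd_per_m=0.50, output_usd_per_m=2.00, cached_input_discount=0.90)

# Whole document sent vs only the relevant passages vs relevant passages with shared instructions cached.
stuffed = call_cost_usd(pricing, input_tokens=40_000, output_tokens=500)
trimmed = call_cost_usd(pricing, input_tokens=4_000, output_tokens=500)
cached = call_cost_usd(pricing, input_tokens=4_000, output_tokens=500, cached_tokens=3_000)

print(f"stuffed ${stuffed:.5f}, trimmed ${trimmed:.5f}, trimmed+cached ${cached:.5f}")
print(f"reduction vs stuffed: {stuffed / cached:.1f}x")
```

Under these assumed numbers, trimming the context alone cuts per-call cost sevenfold, and caching the shared prefix nearly doubles that saving again, before any model change.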
Hyperscaler AI services (AWS Bedrock, Azure OpenAI Service, Google Vertex AI) add a platform margin above raw model costs — typically ten to twenty-five per cent — but offer tighter integration with existing governance, access control, and billing infrastructure. For organisations already running significant cloud workloads, this margin is often worth paying for the operational simplicity.
Platform and orchestration costs
Most enterprise AI use cases are not direct API calls. They involve an orchestration layer — retrieval-augmented generation pipelines, agent frameworks, workflow automation, or AI application platforms — that carries its own costs.
Enterprise AI platform licensing (for products like Microsoft Copilot, Salesforce Einstein, ServiceNow AI, Workday AI) typically runs at $20–$60 per user per month for AI-augmented seat licences, often layered on top of existing platform fees. At scale, this becomes the dominant AI cost line for many organisations — a workforce of five hundred knowledge workers using AI-augmented productivity tools costs $120,000–$360,000 annually in licence fees before any customisation or integration work.
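The licence figure is simply seats times monthly rate times twelve. A minimal sketch using the per-seat range quoted above; the helper name is hypothetical.

```python
def annual_licence_cost_usd(users: int, usd_per_user_month: float) -> float:
    """Annual seat-licence cost, before any customisation or integration work."""
    return users * usd_per_user_month * 12


# Five hundred knowledge workers at the low and high ends of the $20–$60 range.
low, high = (annual_licence_cost_usd(500, rate) for rate in (20, 60))
print(f"${low:,.0f} to ${high:,.0f} per year")
```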
Orchestration infrastructure (LLM gateways, vector databases, embedding services, retrieval pipelines) typically adds thirty to sixty per cent on top of inference costs for production-grade RAG or agentic deployments. This layer is often underestimated because it does not appear on the model provider's invoice.
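A rough monthly run rate can combine the hyperscaler platform margin and the orchestration overhead described above. This sketch assumes the orchestration percentage is applied to billed (post-margin) inference spend; that base, the helper name, and the example figures are hypothetical planning choices, not provider data.

```python
def monthly_run_rate_usd(
    raw_inference: float,
    platform_margin: float = 0.0,
    orchestration_overhead: float = 0.0,
) -> dict[str, float]:
    """Split a monthly AI run rate into billed inference and orchestration layers.

    Assumes orchestration overhead is a percentage of billed inference spend;
    both percentages are planning inputs, not provider figures.
    """
    billed_inference = raw_inference * (1 + platform_margin)
    orchestration = billed_inference * orchestration_overhead
    return {
        "billed_inference": billed_inference,
        "orchestration": orchestration,
        "total": billed_inference + orchestration,
    }


# Hypothetical: $10,000/month raw inference through a hyperscaler (15% margin),
# with a production RAG stack adding 45% on top.
print(monthly_run_rate_usd(10_000, platform_margin=0.15, orchestration_overhead=0.45))
```

Keeping the layers separate in the output makes it easier to reconcile the estimate against the model provider's invoice, which covers only the first line.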
Integration and engineering costs
Integration engineering is consistently the largest underestimated cost category in enterprise AI deployments. Building an AI capability into a business workflow requires connecting it to data sources, access controls, existing applications, and output consumers — none of which is trivial.
Initial integration build: $50,000–$300,000 for a production-grade, single-use-case integration in a typical enterprise environment. The wide range reflects the complexity of the target workflow and the quality of the existing data infrastructure. Simpler document processing use cases sit at the lower end; complex multi-system integrations involving real-time data, compliance controls, and legacy system connections sit at the upper end.
Ongoing maintenance: fifteen to twenty-five per cent of the initial build cost per year, driven by model updates, prompt degradation, upstream system changes, and expanding scope. Organisations that treat AI integrations as one-time builds rather than maintained software assets consistently underperform on total cost of ownership.
Enablement and change management: often excluded from technical cost models but typically equivalent to thirty to fifty per cent of the engineering cost for any workflow-level AI deployment. Tools that require significant user behaviour change have higher realisation costs, not lower.
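These three cost lines compound over a multi-year horizon. A minimal sketch for a single use case, assuming the build and enablement land in year one, enablement is priced as a share of the build cost, and maintenance starts in year two; the helper and the mid-range inputs are illustrative.

```python
def integration_cost_by_year(
    initial_build: float,
    maintenance_rate: float,
    enablement_rate: float,
    years: int = 3,
) -> list[float]:
    """Per-year integration cost for one use case.

    Assumes build and enablement land in year one, enablement is a share of
    the build cost, and maintenance (a share of build cost) starts in year two.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    year_one = initial_build * (1 + enablement_rate)
    maintenance = initial_build * maintenance_rate
    return [year_one] + [maintenance] * (years - 1)


# Mid-range example: $150,000 build, 20% annual maintenance, 40% enablement.
costs = integration_cost_by_year(150_000, maintenance_rate=0.20, enablement_rate=0.40)
print([round(c) for c in costs], round(sum(costs)))
```

Treating maintenance as a recurring line, rather than zero after launch, is what separates this from a one-time-build estimate.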
Governance and oversight costs
Governance costs are real, measurable, and consistently underestimated — particularly in regulated environments.
Prompt and output review: depends heavily on the use case risk level. Low-risk applications (summarisation, drafting assistance with human review) add ten to fifteen per cent to operational cost. Medium-risk applications (analysis used in decisions, customer-facing outputs) add twenty to forty per cent. High-risk or regulated applications (financial recommendations, clinical applications, legal documents) can add fifty to one hundred per cent of the raw AI cost in oversight and review.
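The tiered ranges above can be encoded as a lookup for planning. This sketch treats all three tiers as a percentage of the same raw AI cost base, which is a simplifying assumption; the tier labels and helper name are hypothetical.

```python
# Oversight add-on ranges from the text, as fractions of raw AI cost.
OVERSIGHT_RANGES = {
    "low": (0.10, 0.15),
    "medium": (0.20, 0.40),
    "high": (0.50, 1.00),
}


def oversight_cost_range(raw_ai_cost: float, risk_tier: str) -> tuple[float, float]:
    """Return (low, high) review and oversight cost for a use case's risk tier."""
    try:
        lo, hi = OVERSIGHT_RANGES[risk_tier]
    except KeyError:
        raise ValueError(f"unknown risk tier: {risk_tier!r}") from None
    return raw_ai_cost * lo, raw_ai_cost * hi


# Hypothetical customer-facing assistant with $100,000 of raw annual AI cost.
print(oversight_cost_range(100_000, "medium"))
```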
Audit and compliance documentation: $20,000–$100,000 per year for a moderate AI portfolio in a regulated industry, driven by the requirement to document model versions, prompt changes, output sampling, and incident records. This is not optional in financial services, healthcare, or government contexts.
Model risk management: organisations subject to SR 11-7 or equivalent model risk frameworks treat AI models as managed models, which adds a structured validation and ongoing monitoring cost. Initial validation: $30,000–$150,000 per model in scope. Annual monitoring: ten to twenty per cent of the initial validation cost.
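Audit documentation and model risk management can be projected together for a regulated portfolio. A minimal sketch, assuming every in-scope model is validated in year one and monitored from year two onwards, and that audit documentation is a flat annual cost; the helper and the figures are hypothetical.

```python
def governance_cost_by_year(
    models_in_scope: int,
    validation_per_model: float,
    monitoring_rate: float,
    annual_audit: float,
    years: int = 3,
) -> list[float]:
    """Per-year model risk and compliance documentation cost for an AI portfolio.

    Assumes all in-scope models are validated in year one, monitored from year
    two onwards, and that audit documentation is a flat cost every year.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    validation = models_in_scope * validation_per_model
    monitoring = validation * monitoring_rate
    return [validation + annual_audit] + [monitoring + annual_audit] * (years - 1)


# Hypothetical regulated portfolio: three models at $60,000 validation each,
# 15% annual monitoring, $40,000 a year of audit documentation.
print(governance_cost_by_year(3, 60_000, 0.15, 40_000))
```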
Support and operations
Production AI workloads require operational support that differs from conventional software because the failure modes are different. An AI workflow can degrade gradually — producing outputs of lower quality rather than failing visibly — which means monitoring requires evaluation of output quality, not just system availability.
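Because degradation is gradual, monitoring has to track a quality signal over time rather than a pass/fail health check. A minimal sketch, assuming each output already receives a numeric quality score between 0 and 1 from some evaluation step (human sampling or an automated grader, neither shown here); the class name, baseline, tolerance, and window size are hypothetical tuning choices.

```python
from collections import deque


class QualityMonitor:
    """Flags gradual degradation in a rolling window of output-quality scores."""

    def __init__(self, baseline: float, tolerance: float, window: int = 50):
        self.baseline = baseline    # expected mean score when the workflow is healthy
        self.tolerance = tolerance  # allowed drop before alerting
        self.scores: deque[float] = deque(maxlen=window)

    def record(self, score: float) -> bool:
        """Add one score; return True if the rolling mean has fallen below threshold."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough history yet to judge a trend
        mean = sum(self.scores) / len(self.scores)
        return mean < self.baseline - self.tolerance


monitor = QualityMonitor(baseline=0.90, tolerance=0.05, window=4)
for s in [0.9, 0.9, 0.9, 0.9, 0.8, 0.7]:
    print(s, monitor.record(s))
```

Every individual call here would pass an availability check; only the rolling mean reveals that output quality is slipping.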
AI operations (MLOps/LLMOps) overhead: five to fifteen per cent of the total AI investment per year for organisations with mature practices. Organisations without dedicated AI operations capability typically see this cost absorbed into engineering time at higher effective rates.
Incident response: AI incidents (hallucinations at scale, prompt injection, data leakage, unexpected output patterns) require different response capabilities than conventional software incidents. The cost of not having these capabilities is not in the operations budget — it appears in remediation, legal, and reputational recovery.
Total cost of ownership: indicative ranges
The following ranges represent the total annual cost of ownership for enterprise AI at different scale and complexity levels, combining all layers above.
Proof-of-concept or limited pilot (one or two use cases, controlled environment, limited users): $80,000–$300,000 in year one, primarily engineering and platform costs. Inference costs are typically negligible at this scale.
Departmental deployment (one or two use cases in production, active user population, governance in place): $300,000–$1.2m annually in stabilised operation, depending on use case risk, regulatory environment, and user scale.
Enterprise platform deployment (multiple use cases, shared AI platform, governance function, portfolio oversight): $2m–$15m annually. At this level, platform fees and governance infrastructure dominate. Inference costs represent a smaller fraction of total spend than at earlier stages.
What drives variance most: governance requirements, the quality of existing data infrastructure, whether the organisation is building on existing platforms or assembling a custom stack, and the maturity of the engineering team working on AI. Organisations with well-governed cloud infrastructure and strong data practices consistently report lower total AI costs than those assembling AI capability on top of fragmented or ungoverned data environments.
Implications for planning
Three observations are consistently useful for organisations building AI cost plans.
First, optimise workload design before optimising vendor selection. The difference between a well-designed and a poorly designed AI workflow is larger than the difference between any two commercial model providers at comparable capability levels. Invest engineering time in workload design early.
Second, plan for governance costs from the start, especially in regulated environments. The most common cost surprise in enterprise AI programmes is the governance overhead that appears in year two — after the initial deployment, when the organisation realises it needs to manage the growing portfolio seriously.
Third, model total cost of ownership over three years, not year one. Year-one costs are inflated by integration engineering. Year-two and year-three costs reveal the stabilised operating cost, which is the number that should drive portfolio and scaling decisions.
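A three-year view can be assembled from the cost layers discussed above. This is a minimal sketch with made-up figures for a departmental deployment; the layer split, class, and helper names are hypothetical, and the stabilised run rate is taken as the mean of years two and three.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class YearCosts:
    """One year of AI spend, split by the cost layers discussed above."""
    inference: float
    platform: float
    integration: float
    governance: float
    operations: float

    def total(self) -> float:
        return sum(astuple(self))


def three_year_view(years: list[YearCosts]) -> dict[str, float]:
    """Summarise a three-year plan; years two and three give the stabilised run rate."""
    if len(years) != 3:
        raise ValueError("expected exactly three years of costs")
    totals = [y.total() for y in years]
    return {
        "year_one": totals[0],
        "stabilised_annual": (totals[1] + totals[2]) / 2,
        "three_year_total": sum(totals),
    }


# Hypothetical plan: integration-heavy year one, governance growing in year two.
plan = [
    YearCosts(inference=20_000, platform=120_000, integration=250_000, governance=40_000, operations=30_000),
    YearCosts(inference=30_000, platform=120_000, integration=50_000, governance=60_000, operations=35_000),
    YearCosts(inference=35_000, platform=120_000, integration=50_000, governance=65_000, operations=35_000),
]
print(three_year_view(plan))
```

In this made-up plan, year one comes out about half again as expensive as the stabilised annual cost, which is the distortion that planning on year-one numbers alone would carry into scaling decisions.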