Flagship essay

The AI Value Gap

Why AI investments frequently fail to capture economic value and what leaders must change to close the gap.

The central challenge in enterprise AI is no longer access to models. It is the discipline required to connect capability, cost, ownership, and proof before demand scales beyond governance.

Strategic thesis | CIO | CFO | FinOps | TBM
The AI Value Gap: Visibility, Accountability, and Proof
The three questions most organisations cannot yet answer.

Core diagnosis

Most AI portfolios scale spend before they scale proof.

FinOps Foundation reported in 2026 that 98% of practitioners now manage AI spend, up sharply from 31% in 2024. Yet McKinsey's 2025 State of AI suggests only around 5% to 6% of organisations are seeing clear financial ROI. Cost is becoming real faster than value is becoming provable.

1. The problem statement

Enterprise AI has entered a new phase. The question is no longer whether organisations can access capable models, launch pilots, or generate internal enthusiasm. The question is whether they can govern AI as an economic system with enough clarity to justify continued investment.

The external evidence is increasingly consistent. FinOps Foundation says AI spend is now part of almost every FinOps practice. IBM found late in 2025 that only 29% of executives felt confident in their ability to measure AI ROI. Kyndryl reported that 61% of CEOs felt more pressure than a year earlier to prove AI returns. Deloitte's 2025 work points to a much longer payback cycle for AI, typically two to four years rather than the seven to twelve months many leaders expect from more standard technology investments. IDC has also warned that large enterprises risk materially underestimating infrastructure cost as AI estates scale, while a large share of AI tool spending still sits outside formal IT budgets.

This creates a familiar but dangerous pattern. Organisations industrialise access, experimentation, and platform build-out before they industrialise visibility, accountability, and proof. MIT's 2025 reporting on generative AI pilots suggests failure rates remain extremely high. McKinsey's survey data suggests widespread AI use, but only a small minority of companies can translate that use into financially credible outcome claims. The economic gap is not a side issue. It is becoming the main management problem.

This is the AI Value Gap: the distance between AI ambition and AI evidence. It appears when leaders cannot state what AI really costs, who owns the economic result, or what proof standard should govern scale decisions.

2. The three gaps: visibility, accountability, proof

Visibility gap

The visibility gap exists when leaders can see the model invoice but not the economic system around it. IDC says 56% of AI tool spending happens outside IT budgets. Apptio and IBM point to higher IT budgets in 2026 with AI as a major driver, while IDC warns that infrastructure costs are being underestimated and can rise sharply over the next two years. In practice, that means the visible line item is often the least complete one.

What is missing is the surrounding stack: platform engineering, retrieval and context systems, evaluation, governance controls, integration work, support, and the labour required to keep AI useful in production. A pilot can look inexpensive when only API spend is counted. The economics often change once the organisation needs reliability, security, workflow integration, and human oversight.

Visibility question

Can we state total AI cost to within 20%?

A board-ready answer needs platform, model, cloud, data, integration, governance, support, and labour costs together. If only the vendor invoice is visible, decision quality is still weak.
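
One way to make the 20% test concrete is to carry a low and a high estimate for each cost layer and check how wide the total range is. The sketch below is illustrative only. It assumes one reading of "within 20%" (the half-width of the total range is no more than 20% of its midpoint), the `CostEstimate` structure and function names are hypothetical, and the figures are placeholders rather than data from any organisation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostEstimate:
    """Annual cost range for one layer of the AI cost stack (hypothetical structure)."""
    layer: str
    low: float
    high: float


def total_range(estimates):
    """Sum the low and high bounds across all layers."""
    return sum(e.low for e in estimates), sum(e.high for e in estimates)


def relative_uncertainty(estimates):
    """Half-width of the total range divided by its midpoint."""
    low, high = total_range(estimates)
    midpoint = (low + high) / 2
    if midpoint <= 0:
        raise ValueError("total cost estimate must be positive")
    return (high - low) / 2 / midpoint


def is_board_ready(estimates, tolerance=0.20):
    """Assumed reading of 'within 20%': uncertainty no larger than the tolerance."""
    return relative_uncertainty(estimates) <= tolerance


# Placeholder figures for illustration only.
stack = [
    CostEstimate("model", 400_000, 450_000),
    CostEstimate("cloud", 300_000, 500_000),
    CostEstimate("integration", 200_000, 600_000),
    CostEstimate("labour", 500_000, 1_250_000),
]

print(total_range(stack))
print(round(relative_uncertainty(stack), 2), is_board_ready(stack))
```

With these placeholder numbers the vendor invoice is tightly known, but the labour and integration ranges push total uncertainty to roughly a third, so the test fails even though the model cost looks precise.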

Accountability gap

The accountability gap appears when many functions influence AI value but no single owner is answerable for the economic result. Technology may deliver the capability. A business unit may sponsor it. Finance may review the spending. Risk, legal, procurement, and architecture may add controls. Yet nobody owns whether the use case is still worth funding six months later.

Deloitte's 2025 research suggests only 16% of organisations have fully designed the roles, processes, and operating models needed for AI integration. That is a direct explanation for why many AI investments drift. Delivery ownership exists, but economic ownership does not. Without a named owner for cost, outcome, and proof standard, underperforming work survives on narrative and platform spend becomes socially distributed in ways that nobody can challenge cleanly.

Proof gap

The proof gap is the failure to show that AI is creating durable, attributable business value rather than isolated moments of usefulness. McKinsey's 2025 data suggests only a small minority of organisations can point to real financial ROI. IBM found fewer than one-third of executives can measure it confidently. Deloitte argues that satisfactory AI ROI often takes much longer than leaders expect from standard technology investments. In other words, proof is both rare and slow.

Proof usually fails for predictable reasons. Baselines are weak. Adoption is uneven. Benefits are counted before workflows change. Time saved is treated as if it were automatically monetised. Risk reduction is asserted without showing what was avoided. Revenue claims are made without isolating AI's contribution from wider commercial changes.

3. Why the gap exists: the causes governance frameworks rarely address

The AI Value Gap is usually described as a governance maturity problem. The diagnosis is that organisations have not yet built the visibility, accountability, and proof disciplines needed to manage AI economically. That is true. But it is incomplete.

Governance immaturity explains why the gap is hard to close. It does not fully explain why the gap opened in the first place. Three deeper causes are consistently present in organisations where the gap is wide — and they are causes that governance frameworks alone do not resolve.

Vendor incentive misalignment

AI vendors are paid on adoption. Their commercial success depends on organisations using more AI, deploying more widely, and scaling faster. The incentive to prove that adoption is delivering value is structurally much weaker than the incentive to accelerate adoption itself. The result is a vendor relationship that is genuinely helpful in accelerating deployment and systematically unhelpful in establishing honest proof standards.

This does not require vendors to be dishonest. It requires only that they optimise for what their business model rewards — and their business model rewards seats deployed, API calls processed, and platform commitments made, not verified financial returns. Advisory briefings, benchmark comparisons, and customer success narratives all serve the adoption goal. A vendor that told a prospective customer "you probably should not scale this until you have stronger value evidence" would be commercially irrational, even if the advice were correct.

The implication for enterprise AI governance is that proof standards need to be internal, not outsourced to the vendor relationship. Vendor-provided metrics, case studies, and ROI calculators should be treated as marketing rather than as independent evidence.

FOMO and the politics of AI approval

A significant fraction of enterprise AI investment is approved on competitive and reputational grounds rather than economic ones. "We cannot afford to fall behind" and "our competitors are already doing this" are statements about perceived strategic risk, not about the economic case of a specific investment. They are also statements that are very difficult to disprove, which makes them durable throughout an approval cycle.

This pattern is not exclusive to AI — it appears in every technology investment cycle. But AI has created an unusually permissive approval environment because the combination of genuine capability, intense media coverage, visible competitor announcements, and executive anxiety has made the cost of appearing to resist AI feel higher than the cost of approving AI poorly.

When investment decisions are made primarily to signal competitive intent rather than to pursue economic value, the governance disciplines that normally constrain capital allocation are bypassed. Business cases are constructed after the investment intent is established, not before. Proof standards are looser because the decision drivers are political rather than economic. And the value gap widens because the investments that entered the portfolio on narrative have no mechanism for honest evaluation other than the passage of time.

Fixing this requires leadership behaviour change at the board and CEO level — a willingness to apply the same economic rigour to AI investments as to other capital decisions, even when the competitive narrative makes rigour feel costly. Governance frameworks are necessary but not sufficient here.

The measurement capture problem

In most enterprises, the teams that deploy AI systems are the same teams that measure whether those AI systems are working. This creates a structural conflict of interest in the evaluation function, not because of dishonesty, but because confirmation bias is powerful and measurement design choices matter enormously. What to measure, when to measure it, which baseline to use, and how to attribute improvement to the AI rather than to other factors are all decisions that can be made to produce a range of outcomes. Teams with a stake in the result will naturally make those choices in the direction of the result they want to show.

The measurement capture problem is particularly acute for productivity-based AI cases, because productivity measurement depends on choices that are inherently subjective: what counts as the relevant task, how to account for quality alongside speed, whether to use the same team as the control group or a different one, and how to handle the Hawthorne effect that makes measured performance different from unmeasured performance.

Addressing measurement capture requires structural independence — either an internal function with explicit independence from the AI programme, or independent third-party evaluation for material cases. This is organisationally uncomfortable, and organisations that have made AI deployment a visible success story have a particular reason to resist genuine independence in evaluation. But without it, the proof gap will persist regardless of how good the governance framework is.

4. The framework for closing the gaps

Closing the AI Value Gap requires a more integrated operating model across strategy, finance, technology, and portfolio governance. Five moves matter most.

1. Frame AI as an economic system, not a collection of experiments

AI should be managed as a portfolio of shared capabilities, domain investments, and local experiments.

  1. Create an AI investment register that classifies every initiative as foundational platform, domain capability, or local experiment.
  2. Review that register quarterly with finance, technology, and business sponsors together.
  3. Separate shared platform investment from use-case demand so early pilots do not distort later economics.
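
A minimal sketch of such a register follows, assuming a simple in-memory structure. The three classes come from the list above; the `Initiative` fields, function names, and spend figures are hypothetical and only illustrate how shared platform spend can be reported separately from use-case demand.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class InvestmentClass(Enum):
    FOUNDATIONAL_PLATFORM = "foundational platform"
    DOMAIN_CAPABILITY = "domain capability"
    LOCAL_EXPERIMENT = "local experiment"


@dataclass(frozen=True)
class Initiative:
    """One row of the AI investment register (hypothetical fields)."""
    name: str
    investment_class: InvestmentClass
    annual_spend: float


def spend_by_class(register):
    """Total annual spend for each investment class present in the register."""
    totals = defaultdict(float)
    for item in register:
        totals[item.investment_class] += item.annual_spend
    return dict(totals)


def platform_share(register):
    """Fraction of total spend that is shared platform investment."""
    totals = spend_by_class(register)
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return totals.get(InvestmentClass.FOUNDATIONAL_PLATFORM, 0.0) / total


# Placeholder entries for illustration only.
register = [
    Initiative("shared retrieval platform", InvestmentClass.FOUNDATIONAL_PLATFORM, 600_000),
    Initiative("claims triage assistant", InvestmentClass.DOMAIN_CAPABILITY, 300_000),
    Initiative("marketing copy pilot", InvestmentClass.LOCAL_EXPERIMENT, 100_000),
]

print(platform_share(register))
```

Keeping the class on every entry means a quarterly review can ask how much of the total is shared foundation before judging any single pilot's economics.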

2. Make cost visible where decisions are actually made

Decision-makers need total cost views, not isolated vendor invoices.

  1. Define a standard AI cost stack covering model, infrastructure, data, integration, governance, support, and labour.
  2. Build reporting at the level of team, workflow, or service, including cost per inference and cost per action where relevant.
  3. Surface shadow AI and non-IT spend explicitly in monthly reviews rather than leaving it for annual clean-up.
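
The reporting described above can be sketched as a small aggregation, assuming spend has already been attributed to teams. `SpendRecord`, its fields, and the figures are hypothetical; a real implementation would draw on billing exports and allocation rules that this essay does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpendRecord:
    """Attributed AI spend for one team in one period (hypothetical fields)."""
    team: str
    cost: float          # total cost for the period, across the cost stack
    inferences: int      # model calls attributed to the team in the period
    in_it_budget: bool   # False marks shadow AI or other non-IT spend


def cost_per_inference(records):
    """Return {team: cost per inference}, skipping teams with no recorded calls."""
    cost, calls = {}, {}
    for r in records:
        cost[r.team] = cost.get(r.team, 0.0) + r.cost
        calls[r.team] = calls.get(r.team, 0) + r.inferences
    return {team: cost[team] / calls[team] for team in cost if calls[team] > 0}


def non_it_share(records):
    """Fraction of total spend that sits outside formal IT budgets."""
    total = sum(r.cost for r in records)
    outside = sum(r.cost for r in records if not r.in_it_budget)
    return outside / total if total else 0.0


# Placeholder records for illustration only.
records = [
    SpendRecord("support", 12_000.0, 400_000, True),
    SpendRecord("support", 3_000.0, 100_000, False),
    SpendRecord("sales", 5_000.0, 50_000, False),
]

print(cost_per_inference(records))
print(non_it_share(records))
```

Here the support team's unit cost includes its non-IT spend, which is the point: leaving that record out would understate cost per inference for the same workload.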

3. Assign owners for value realisation, not just delivery

Each material initiative should have one named economic owner.

  1. Assign an accountable executive or functional leader for cost envelope, target outcome, and stop or scale decisions.
  2. Record owner, baseline, expected return dimension, and review cadence in the investment register.
  3. Distinguish shared platform ownership from initiative-level value ownership so neither becomes invisible.

4. Raise the proof standard before broad scale

The evidence bar should rise as spend and operational dependency rise.

  1. Define leading and lagging metrics before deployment, including baseline measures and timing assumptions.
  2. Introduce stage gates so pilots, limited production, and scaled roll-out each require stronger proof.
  3. Require evidence of workflow change and adoption, not only time-saved estimates or user satisfaction.
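
As an illustration of a rising evidence bar, the sketch below encodes each stage's requirements as a set and reports what is still missing. The stage names follow the list above; the evidence labels and the `REQUIRED_EVIDENCE` mapping are assumptions chosen for the example, not a prescribed standard.

```python
from enum import IntEnum


class Stage(IntEnum):
    PILOT = 1
    LIMITED_PRODUCTION = 2
    SCALED_ROLLOUT = 3


# Hypothetical evidence requirements; each stage adds to the one before it.
REQUIRED_EVIDENCE = {
    Stage.PILOT: {"baseline_defined", "metrics_defined"},
    Stage.LIMITED_PRODUCTION: {
        "baseline_defined", "metrics_defined", "adoption_measured",
    },
    Stage.SCALED_ROLLOUT: {
        "baseline_defined", "metrics_defined", "adoption_measured",
        "workflow_change_verified",
    },
}


def missing_evidence(target_stage, evidence):
    """Evidence still required before entering the target stage."""
    return REQUIRED_EVIDENCE[target_stage] - set(evidence)


def can_advance(target_stage, evidence):
    """True only when every requirement for the target stage is met."""
    return not missing_evidence(target_stage, evidence)


# A time-saved estimate alone does not satisfy any later gate.
evidence = {"baseline_defined", "metrics_defined", "time_saved_estimate"}

print(can_advance(Stage.PILOT, evidence))
print(sorted(missing_evidence(Stage.SCALED_ROLLOUT, evidence)))
```

In this example the initiative clears the pilot gate but cannot scale, because adoption and workflow change have not been evidenced.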

5. Review AI as a managed portfolio, not an accumulation of exceptions

Portfolio governance is where AI becomes strategically governable.

  1. Hold a recurring AI portfolio review that combines spend, owner status, demand trends, benefit evidence, and risk posture.
  2. Compare initiatives against one another, not only against their own original narratives.
  3. Reallocate capital toward proven or strategically necessary capabilities and close weak cases earlier.
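
Comparing initiatives against one another can be sketched as a simple ranking by evidenced benefit per unit of cost. Everything here is an assumption for illustration: the `PortfolioItem` fields, the ratio itself, and the `scale_at` and `close_below` thresholds would need to be set by the organisation, and a real review would also weigh risk posture and demand trends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortfolioItem:
    """One initiative in the portfolio review (hypothetical fields)."""
    name: str
    annual_cost: float          # assumed to be greater than zero
    evidenced_benefit: float    # benefit backed by measured evidence, not forecasts
    strategically_necessary: bool = False


def benefit_ratio(item):
    """Evidenced benefit per unit of annual cost."""
    return item.evidenced_benefit / item.annual_cost


def recommend(item, scale_at=1.5, close_below=0.5):
    """Map an item to a review outcome using illustrative thresholds."""
    if item.strategically_necessary:
        return "retain"
    ratio = benefit_ratio(item)
    if ratio >= scale_at:
        return "more capital"
    if ratio < close_below:
        return "close or redesign"
    return "hold"


def review(portfolio):
    """Rank items by benefit ratio, highest first, with a recommendation each."""
    ranked = sorted(portfolio, key=benefit_ratio, reverse=True)
    return [(item.name, recommend(item)) for item in ranked]


# Placeholder portfolio for illustration only.
portfolio = [
    PortfolioItem("A", 100.0, 200.0),
    PortfolioItem("B", 100.0, 80.0),
    PortfolioItem("C", 100.0, 20.0),
    PortfolioItem("D", 200.0, 50.0, strategically_necessary=True),
]

for name, action in review(portfolio):
    print(name, action)
```

Ranking the whole portfolio on one measure makes weak cases visible next to strong ones, which is harder to do when each initiative is judged only against its own original business case.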

5. The call to action

The next phase of enterprise AI will be defined less by model access and more by management discipline. Organisations that close the AI Value Gap earliest will not simply reduce waste. They will create a more credible basis for sustained investment, faster reallocation, and stronger trust between finance, technology, and business leaders.

For CFOs, that means treating AI as a capital-discipline problem as well as a technology strategy. For CIOs, it means designing architecture and operating models that expose economic truth rather than hiding it. For CAIOs, it means building the proof system as carefully as the use-case pipeline. For FinOps, TBM, and engineering leaders, it means making AI demand legible enough to be challenged before it hardens into habit.

The questions remain straightforward: one for each of the three gaps, and one for the portfolio as a whole.

  1. What are we really spending?
  2. Who owns the economic result?
  3. What proof standard governs the next tranche of scale?
  4. Which initiatives deserve more capital, less capital, or redesign?

Organisations that can answer those questions consistently are not just measuring AI better. They are governing it better.