What CFOs Should Ask of AI ROI Claims

Key takeaways

Evidence
The research on AI ROI is less authoritative than it is routinely presented as being. The widely cited figures ("only 6% see real ROI," "payback takes two to four years") are self-reported survey data with significant methodological limitations. They are directionally useful, not analytically precise.
Evidence
By April 2026, a Bain survey found that 44% of enterprises were funding the next AI wave from savings that had not yet materialised. By May 2026, corporate America had begun rationing AI access as costs outpaced budgets and value proof remained elusive.
Interpretation
Most AI ROI claims are designed to be accepted, not tested. The structure of a typical AI business case — productivity estimates without baselines, strategic framing that defers economic proof, TCO models that exclude governance and labour — reflects the path of least resistance through an approval process, not a genuine attempt to forecast return.
Interpretation
CFOs who apply the same standards to AI investment claims that they apply to capital expenditure or acquisition proposals will find most AI cases incomplete. That is not a reason to reject them. It is a reason to send them back for better work.
Interpretation
The right finance role in AI is not to obstruct investment — it is to raise the quality of the conversation until the economic case can be genuinely trusted.

A note on the research

Why most AI business cases are built to pass approval, not to be right

Before examining what CFOs should ask, it is worth being direct about the dynamics that produce the typical AI business case.

The person sponsoring an AI investment has a stake in its approval. They may believe, genuinely and correctly, that the investment is worthwhile. They may also believe that presenting the case in the most favourable possible light is the most rational behaviour given how approvals work. These two things are not in conflict; they coexist in every capital approval process.

The result is structural optimism. Productivity gains are estimated at the high end of plausible ranges. Baselines are chosen to maximise apparent improvement. Cost models exclude items that are genuinely difficult to estimate — governance, integration complexity, the labour required to make the AI actually useful in production. Strategic framing is added to provide a justification that survives even if the financial case is challenged. Time horizons for proof are set long enough that approval can precede accountability.

None of this is dishonesty in the ordinary sense. It is the normal operation of an investment approval process in which the incentive structure favours positive cases. The CFO's role in this system is to be the counterforce — to ask the questions that the sponsor's incentives prevent them from asking about their own case.

The question of baselines: the single most common weakness

If there is one question that exposes weak AI business cases more reliably than any other, it is: compared with what?

AI productivity claims are almost never stated in isolation. They are stated as improvements — faster, cheaper, more accurate, higher throughput. Every improvement claim requires a baseline. Without a baseline, the claimed improvement is a direction, not a measurement.

The baseline problem manifests in several ways. Some cases have no baseline at all — the productivity gain is presented as an estimate of what the AI capability can produce without a reference to what the current process produces. Some cases have a nominal baseline that was not actually measured — the "before" state was estimated rather than observed, and was estimated by the same team that needs the improvement to look large. Some cases have a measured baseline but cherry-pick the measurement period — using a period of unusually low performance as the "before" to maximise apparent improvement.

Even tier-1 enterprises struggle with this. Uber's experience of exhausting annual AI budgets in four months illustrates how consumption can outpace both planning and proof systems when baseline measurement and attribution frameworks are weak. The UK government's Copilot trial provides a contrasting example: a rigorous baseline methodology found no robust evidence that time savings translated into productivity gains, which is precisely the kind of honest measurement that most business cases avoid.

Genuine baseline measurement requires capturing the current state of the process before the AI is deployed, using the same metrics that will be used to measure improvement, over a period long enough to smooth out normal variability. It requires documentation of the measurement methodology. And it requires that the measurement is conducted independently enough from the investment case that there is no structural incentive to make the baseline look worse than it is.

In most organisations, almost none of this happens. The baseline is a rough estimate constructed after the AI investment is already partially committed. The CFO who asks for the actual measured baseline — not the estimate — will often find that it does not exist, and that the entire productivity improvement claim is built on an assumption rather than an observation.
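The effect of the measurement period on the apparent improvement can be sketched in a few lines. This is a minimal illustration, not a measurement method: the weekly figures, the post-deployment figure, and the function name `improvement` are all invented for the example. It shows only that the same "after" number produces a very different improvement depending on which weeks are treated as the "before".

```python
from statistics import mean

# Hypothetical weekly average review times (minutes per document) before
# deployment. These figures are invented for illustration only.
baseline_weeks = [52, 48, 50, 61, 63, 49, 51, 50]
post_deployment_minutes = 42  # hypothetical figure observed after deployment


def improvement(baseline: float, after: float) -> float:
    """Fractional reduction relative to the chosen baseline."""
    return (baseline - after) / baseline


# Full-period baseline: long enough to smooth out normal variability.
full_baseline = mean(baseline_weeks)

# Cherry-picked baseline: only the two slowest weeks are used as "before".
worst_weeks = sorted(baseline_weeks, reverse=True)[:2]
cherry_baseline = mean(worst_weeks)

full_gain = improvement(full_baseline, post_deployment_minutes)
cherry_gain = improvement(cherry_baseline, post_deployment_minutes)

print(f"Full-period baseline:   {full_baseline:.1f} min, improvement {full_gain:.1%}")
print(f"Cherry-picked baseline: {cherry_baseline:.1f} min, improvement {cherry_gain:.1%}")
```

With these invented numbers the claimed improvement rises from roughly a fifth to roughly a third purely through the choice of baseline weeks, which is why the documented methodology and measurement period matter as much as the headline figure.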

The time-saved problem: the most persistent inflation mechanism

Show me the conversion mechanism

Every AI business case should specify:

Baseline workload and cost

What is the current volume, cycle time, cost, and quality?
How was the baseline measured?
Is the measurement period representative?
Who validated the baseline?

Gross task saving

What percentage of task time will AI reduce?
Is this based on controlled measurement or estimation?
Does the saving apply to all work or only specific conditions?

Usable capacity

How much of the saved time can actually be redeployed?
Are the savings in large enough blocks to be useful?
Can the capacity be aggregated across individuals or teams?

Named management action

What specific operating change will capture the capacity?
Reduced overtime? Avoided hiring? Role consolidation? Increased throughput?
Who owns that decision and when will it be made?

Benefit type

Is this cost avoided, cashable saving, capacity for growth, quality improvement, or risk reduction?
These are not interchangeable and should not be combined into one undifferentiated total.

Timing

When does the capacity become available?
When does the management action occur?
When does the financial benefit appear in the P&L?

Owner

Who is accountable for realising the benefit?
Is it the technology team, the business owner, or finance?

Incremental build, run, control, and change cost

What are the model, platform, data, integration, governance, training, and support costs?
Are these one-time or recurring?
Are shared platform costs allocated?

Realised net outcome

What is the expected net value after all incremental costs?
How will realised value be validated?
What is the attribution method?
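The chain above, from gross task saving through usable capacity and a named management action to a net outcome, can be expressed as a small calculation. The sketch below is one possible way to structure it, under stated assumptions: the class name `ConversionCase`, its field names, and every number are hypothetical, and it values captured capacity at loaded labour cost, which fits only cashable savings or avoided cost. Other benefit types would need separate treatment rather than being folded into the same total.

```python
from dataclasses import dataclass


@dataclass
class ConversionCase:
    """Hypothetical structure for one AI business case; names are illustrative."""

    baseline_hours_per_year: float  # measured workload before deployment
    loaded_cost_per_hour: float     # fully loaded labour cost
    gross_task_saving: float        # fraction of task time the AI removes (0 to 1)
    usable_fraction: float          # share of saved time that can be redeployed
    actioned_fraction: float        # share of usable capacity captured by a named action
    annual_incremental_cost: float  # build, run, control, and change cost per year

    def gross_saved_hours(self) -> float:
        return self.baseline_hours_per_year * self.gross_task_saving

    def usable_hours(self) -> float:
        return self.gross_saved_hours() * self.usable_fraction

    def realised_benefit(self) -> float:
        # Only capacity tied to a management action is given a financial value.
        return self.usable_hours() * self.actioned_fraction * self.loaded_cost_per_hour

    def net_outcome(self) -> float:
        return self.realised_benefit() - self.annual_incremental_cost


# All inputs below are invented for illustration.
case = ConversionCase(
    baseline_hours_per_year=10_000,
    loaded_cost_per_hour=50.0,
    gross_task_saving=0.30,
    usable_fraction=0.60,
    actioned_fraction=0.50,
    annual_incremental_cost=20_000.0,
)

# The "headline" figure values every saved hour, which is what many cases present.
headline = case.gross_saved_hours() * case.loaded_cost_per_hour

print(f"Headline time-saved value: {headline:,.0f}")
print(f"Realised benefit:          {case.realised_benefit():,.0f}")
print(f"Net outcome after cost:    {case.net_outcome():,.0f}")
```

The design choice worth noting is that each step is a separate, inspectable number. A reviewer can challenge the usable fraction or the actioned fraction directly instead of arguing with a single blended total.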

Finance AI benchmarks with full caveats: IBM IBV and APQC's 2026 survey of 1,025 active finance AI practitioners found that respondents reported a median 8% reduction in total annual finance cost. More mature, end-to-end implementations reported reductions reaching 18%. Median payback ranged from six to eight months. These are self-reported outcomes from organisations already deploying AI in finance, not universal targets. The sample is not representative of all finance organisations. Results reflect workflow redesign, data improvement, and operating-model maturity as well as AI technology. The 18% figure represents more mature execution and should not be described as typical. Use these as external reference points for what active practitioners report, not as benchmarks every organisation should expect.

Ask about maturity: pilot, integrated, or governed?

Discrete pilot

Limited scope, controlled conditions, often with dedicated support
Results may not transfer to production scale
Useful for learning, not for forecasting scaled economics

Integrated sub-process

AI embedded in a specific workflow with production data and users
More reliable than pilot results, but may not reflect full process complexity
Requires validation of adoption, quality, and exception handling

Governed end-to-end workflow

AI operating across a complete process with controls, monitoring, and continuous improvement
Most reliable basis for scaled value claims
Includes full run costs, governance overhead, and operating-model maturity

Illustrative case: two versions of the same business case

The following illustrates the difference between a typical AI business case and a genuinely credible one, using a composite legal document review scenario.

Typical case:
A legal operations team proposes an AI document review capability. The case projects that AI will reduce document review time by 40%, representing £1.2M in annual productivity savings against a fully-loaded cost of four senior paralegal FTEs. Implementation cost is £280,000. Payback period is presented as 2.8 months, ROI at 329%.

The 40% productivity improvement is based on vendor benchmarks from similar implementations at comparable organisations. No current-state baseline has been measured. The cost model includes model and platform licences; it excludes the time of the three senior lawyers who will need to configure and validate the system, the integration work required to connect the document management system to the AI platform, and the ongoing quality review overhead required to maintain acceptable accuracy. The case assumes that the four paralegal FTEs will be redeployed without describing to what. It is approved.
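The headline figures in the typical case can be reproduced with simple arithmetic. The sketch below assumes the conventional definitions (payback as cost divided by annual benefit, ROI as first-year benefit net of cost divided by cost); the case itself does not state its formulas, but these conventions do reproduce the numbers presented. The function names are hypothetical.

```python
def payback_months(implementation_cost: float, annual_benefit: float) -> float:
    """Months of benefit needed to recover the implementation cost."""
    return implementation_cost / annual_benefit * 12


def simple_roi(implementation_cost: float, annual_benefit: float) -> float:
    """First-year benefit net of cost, divided by cost (an assumed convention)."""
    return (annual_benefit - implementation_cost) / implementation_cost


# Figures as presented in the typical case (GBP).
claimed_annual_saving = 1_200_000
implementation_cost = 280_000

print(f"Payback: {payback_months(implementation_cost, claimed_annual_saving):.1f} months")
print(f"ROI: {simple_roi(implementation_cost, claimed_annual_saving):.0%}")
# Prints "Payback: 2.8 months" and "ROI: 329%".
```

The arithmetic is correct, which is part of the problem: both ratios inherit every weakness of their inputs, so an unmeasured saving and an incomplete cost model pass straight through to a precise-looking result.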

Credible case:
The same legal operations team proposes the same capability. Before preparing the case, the team conducts a four-week measurement of current document review throughput, time per document category, error rate, and escalation frequency. The baseline is documented and available to finance for review.

The productivity improvement estimate is based on a controlled pilot of 200 documents, with accuracy evaluated by a senior lawyer blind to which documents were AI-assisted. The pilot shows a 28% reduction in review time on standard commercial contracts and a 12% reduction on complex litigation documents.

The cost model includes model and platform licences, integration development (estimated at £85,000 in engineering time), quarterly quality review (estimated at 0.2 FTE of senior legal time ongoing), and a contingency for the first-year accuracy improvement programme. The case explicitly addresses workforce planning: two of the four paralegal FTEs will be redeployed to contract negotiation support, increasing throughput in that function by an estimated 15%; the other two are expected to be absorbed through natural attrition over 18 months.

The resulting payback period is 14 months. ROI is 87%. The case is approved with stronger confidence.

The second case is not a better news story. It is a better decision. The CFO who receives the first case and asks the questions that would produce the second is doing their job.

The cost denominator: why the number at the bottom is usually wrong

An ROI claim is only as credible as the cost it is divided into. The single most reliable way to inflate apparent AI ROI is to minimise the cost denominator.

The most commonly excluded costs in AI business cases are:

People cost for ongoing operation. AI capabilities require support — configuration, monitoring, quality review, retraining, vendor relationship management. These are not one-time costs; they are recurring obligations that belong in the annual operating cost model. Many AI cases count only the licence fee.

Integration and workflow change. Making AI useful in a production environment requires connecting it to existing systems, embedding it in workflows, and changing the way people work. These costs are rarely trivial and are almost always underestimated at the approval stage.

Governance overhead. Model validation, policy compliance, safety review, and audit trail requirements are not free. In regulated industries they are substantial. In any environment, they are real and recurring.

Shared platform burden. Many AI initiatives rely on shared infrastructure — common model access, shared orchestration, shared data pipelines — that was funded before the specific use case was proposed. This shared cost is rarely allocated to individual use cases, which means each case appears cheaper than it is.

Foundational data work. AI that relies on high-quality structured data often requires data preparation work that precedes and enables the AI deployment. This work is sometimes booked against the data programme rather than the AI programme, removing a real cost from the AI case.

Hidden consumption in SaaS pricing. When AI capabilities are delivered through SaaS subscriptions, the underlying token consumption is often aggregated and obscured, making it difficult to understand true unit economics or predict cost at scale. For more on this visibility gap, see SaaS Token Opacity: The Hidden Economics of AI Subscriptions.

The CFO's practical test is to ask the sponsor to walk through the TCO model line by line and explain what is and is not included. The items that are excluded deserve scrutiny proportional to their likely magnitude. A model that includes only vendor cost and one-time integration is almost certainly incomplete.
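A line-by-line walk-through of this kind can be mirrored in a simple calculation. The sketch below is illustrative only: every figure is invented, the cost lines are a subset of the categories discussed above, and the `included` flag is a hypothetical way of recording what the sponsor's model covers. It shows how far the ROI moves once the excluded lines are added back into the denominator.

```python
# Hypothetical first-year figures, chosen only to illustrate the effect.
annual_benefit = 500_000

# Each entry: cost line -> (first-year cost, included in the sponsor's model?)
tco_lines = {
    "vendor licence": (120_000, True),
    "one-time integration": (60_000, True),
    "people cost for ongoing operation": (70_000, False),
    "governance overhead": (30_000, False),
    "shared platform allocation": (40_000, False),
    "foundational data work": (50_000, False),
}


def roi(benefit: float, cost: float) -> float:
    """Benefit net of cost, divided by cost."""
    return (benefit - cost) / cost


presented_cost = sum(cost for cost, included in tco_lines.values() if included)
full_cost = sum(cost for cost, _ in tco_lines.values())

print(f"ROI on presented cost ({presented_cost:,}): {roi(annual_benefit, presented_cost):.0%}")
print(f"ROI on full cost ({full_cost:,}): {roi(annual_benefit, full_cost):.0%}")

# List the excluded lines largest first, so scrutiny follows magnitude.
excluded = sorted(
    ((name, cost) for name, (cost, included) in tco_lines.items() if not included),
    key=lambda item: item[1],
    reverse=True,
)
for name, cost in excluded:
    print(f"Excluded: {name} ({cost:,})")
```

With these invented numbers the benefit is unchanged, yet the ROI falls from about 178% to about 35% once the omitted lines are counted.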

The real finance role

The right finance function in AI is not the one that approves everything with a plausible narrative, and it is not the one that obstructs investment until proof is perfect. It is the one that insists on honest, comparable, complete economic cases, and maintains that standard consistently enough that teams stop bringing cases built only to survive examination rather than to invite it.

By mid-2026, market behaviour had begun to validate this approach. With enterprises rationing AI access and 44% funding expansion from unrealised savings, the organisations with rigorous baseline measurement and attribution frameworks were better positioned to make informed scale decisions.

That standard is not a constraint on AI ambition. It is the management discipline that distinguishes AI investment from AI expenditure. Organisations that develop it earlier will make fewer bad investments, reallocate capital faster, and build the executive credibility that allows their AI programmes to survive scrutiny as well as generate enthusiasm.

What to do next

For CFOs:

Return business cases that lack a measured baseline, regardless of how compelling the strategic framing is. Require one.
Ask every case to explain the mechanism by which time saved becomes economic value. "Productivity improvement" is not sufficient without a workforce planning or cost-structure implication.
Require a TCO model that has been reviewed against the standard seven cost layers. Accept an honest explanation of what is excluded and why; do not accept an unexplained gap.

For CIOs and CAIOs:

Proactively redesign governance packs to show both leading and lagging indicators, referenced against documented baselines.
Separate foundational platform spend from initiative-level business cases so that AI cases are not overstated by excluding shared cost.
Introduce stage gates so that scale decisions require stronger evidence than pilot approval — and communicate this standard clearly before pilots begin, not after.

For FinOps, TBM, and ITFM leaders:

Build reporting that connects unit economics, shared platform cost, and business ownership in a format that CFOs can reference directly in investment reviews.
Make shadow AI and non-IT spend visible inside the finance conversation before annual budget cycles, not as a clean-up exercise after the fact.
Develop a common evidence template so AI cases can be compared across the portfolio rather than each being evaluated in isolation against its own narrative.

Further reading on AI ROI and business cases

IBM Institute for Business Value: Finance Execution Unlocks AI Value at Scale

AI TCO Framework

When to Stop an AI Initiative

The State of AI in 2023

State of Generative AI in the Enterprise

To Get Real Value from AI, Treat It Like a Product
