Key takeaways
- The research on AI ROI is less authoritative than it is routinely presented. The widely-cited figures — "only 6% see real ROI," "payback takes two to four years" — are self-reported survey data with significant methodological limitations. They are directionally useful, not analytically precise.
- Most AI ROI claims are designed to be accepted, not tested. The structure of a typical AI business case — productivity estimates without baselines, strategic framing that defers economic proof, TCO models that exclude governance and labour — reflects the path of least resistance through an approval process, not a genuine attempt to forecast return.
- CFOs who apply the same standards to AI investment claims that they apply to capital expenditure or acquisition proposals will find most AI cases incomplete. That is not a reason to reject them. It is a reason to send them back for better work.
- The right finance role in AI is not to obstruct investment — it is to raise the quality of the conversation until the economic case can be genuinely trusted.
A note on the research
This article, like others on this site, references McKinsey's finding that only about 5-6% of organisations see real financial AI ROI, and Deloitte's finding that satisfactory AI payback typically takes two to four years. These are useful data points. They are also worth scrutinising.
Both are based on executive surveys: self-reported assessments of ROI status and payback experience. "Real financial ROI" is defined differently by different respondents in different organisations at different stages of AI maturity. A team that has just saved 10% on customer service costs using a chatbot may report strong ROI. A team that has failed to capture any productivity improvement from a generative AI deployment may report no ROI. More importantly, two teams with the same capability and the same economic result can give different survey responses, depending on how each interprets "ROI".
This does not make the research wrong. It makes it directional. The surveys are telling us that executive confidence in AI ROI measurement is low, that many organisations are struggling to demonstrate financial returns, and that the timelines many expected have not been met. Those signals are consistent and meaningful.
What the research does not tell us is the precise state of AI ROI across enterprises. Any CFO who reads "only 6% see real ROI" and concludes they should drastically reduce AI investment is applying survey data beyond its appropriate scope. A CFO who reads it and concludes that their own organisation's AI programme is probably no worse than most, unless there is specific evidence that it is unusually underperforming, is making a more defensible inference.
The useful application of this research is not as a statistical benchmark. It is as a diagnostic signal: the fact that AI ROI is hard to demonstrate, and that most organisations admit they struggle to measure it, should make any CFO more demanding of the cases presented to them, not more accepting of the ambiguity as normal.
Why most AI business cases are built to pass approval, not to be right
Before examining what CFOs should ask, it is worth being direct about the dynamics that produce the typical AI business case.
The person sponsoring an AI investment has a stake in its approval. They may believe, genuinely and correctly, that the investment is worthwhile. They may also believe that presenting the case in the most favourable possible light is the most rational behaviour given how approvals work. These two things are not in conflict; they coexist in every capital approval process.
The result is structural optimism. Productivity gains are estimated at the high end of plausible ranges. Baselines are chosen to maximise apparent improvement. Cost models exclude items that are genuinely difficult to estimate — governance, integration complexity, the labour required to make the AI actually useful in production. Strategic framing is added to provide a justification that survives even if the financial case is challenged. Time horizons for proof are set long enough that approval can precede accountability.
None of this is dishonesty in the ordinary sense. It is the normal operation of an investment approval process in which the incentive structure favours positive cases. The CFO's role in this system is to be the counterforce — to ask the questions that the sponsor's incentives prevent them from asking about their own case.
The question of baselines: the single most common weakness
If there is one question that exposes weak AI business cases more reliably than any other, it is: compared with what?
AI productivity claims are almost never stated in isolation. They are stated as improvements — faster, cheaper, more accurate, higher throughput. Every improvement claim requires a baseline. Without a baseline, the claimed improvement is a direction, not a measurement.
The baseline problem manifests in several ways. Some cases have no baseline at all — the productivity gain is presented as an estimate of what the AI capability can produce without a reference to what the current process produces. Some cases have a nominal baseline that was not actually measured — the "before" state was estimated rather than observed, and was estimated by the same team that needs the improvement to look large. Some cases have a measured baseline but cherry-pick the measurement period — using a period of unusually low performance as the "before" to maximise apparent improvement.
Genuine baseline measurement requires capturing the current state of the process before the AI is deployed, using the same metrics that will be used to measure improvement, over a period long enough to smooth out normal variability. It requires documentation of the measurement methodology. And it requires that the measurement is conducted independently enough from the investment case that there is no structural incentive to make the baseline look worse than it is.
In most organisations, almost none of this happens. The baseline is a rough estimate constructed after the AI investment is already partially committed. The CFO who asks for the actual measured baseline — not the estimate — will often find that it does not exist, and that the entire productivity improvement claim is built on an assumption rather than an observation.
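The baseline requirements described above can be expressed as a small check. The following is a minimal sketch under stated assumptions, not a prescribed method: the names `Baseline` and `improvement_vs_baseline`, the 20-observation minimum, and all sample figures are hypothetical. It assumes a "lower is better" metric such as minutes per document, and it declines to report an improvement when the baseline was estimated rather than observed, uses a different metric, or is too short to smooth out normal variability.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Baseline:
    metric: str                # e.g. "minutes per document" (lower is better)
    observations: List[float]  # one value per sampled unit of work
    measured: bool             # False if the "before" state was only estimated


def improvement_vs_baseline(
    baseline: Baseline,
    after_metric: str,
    after_observations: List[float],
    min_observations: int = 20,  # hypothetical threshold; set it per process
) -> Optional[float]:
    """Fractional reduction versus baseline, or None when the baseline
    cannot support a measurement claim."""
    if not baseline.measured:
        return None  # an estimated baseline gives a direction, not a measurement
    if baseline.metric != after_metric:
        return None  # before and after must use the same metric
    if len(baseline.observations) < min_observations or not after_observations:
        return None  # too little data to compare
    before = mean(baseline.observations)
    if before == 0:
        return None  # a fractional change is undefined against a zero baseline
    return (before - mean(after_observations)) / before


# Illustrative figures only; they are not taken from any real measurement.
measured = Baseline("minutes per document", [40.0] * 20, measured=True)
estimated = Baseline("minutes per document", [40.0] * 20, measured=False)

print(improvement_vs_baseline(measured, "minutes per document", [30.0] * 20))   # 0.25
print(improvement_vs_baseline(estimated, "minutes per document", [30.0] * 20))  # None
```

Returning `None` instead of a number is a deliberate choice in this sketch: it keeps an unmeasured baseline from producing a figure that looks like a measurement.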
The time-saved problem: the most persistent inflation mechanism
The phrase "time saved" appears in the majority of AI ROI cases. It is also the most inflation-prone claim in enterprise AI, and it is worth examining exactly why.
When an AI capability reduces the time required to complete a task, two things happen. First, the individual completing the task has freed capacity. Second, that capacity is available to be used for something else. The first thing is a technology outcome. The second thing is a management decision.
An AI system that saves 30% of an analyst's time has created an asset — 30% of an analyst's working capacity, freed from a specific task. The value of that asset depends entirely on what happens next. If the analyst is redeployed to higher-value work, the saving is real. If the organisation reduces headcount proportionately, the saving is real. If the analyst uses the freed time for additional breaks, expanded administrative overhead, or simply less-efficient work because there is no pressure to fill the time productively, the saving is theoretical.
Most AI business cases treat time saved as if it were value created. The correct treatment is to ask explicitly: what change in cost structure, output volume, or revenue will follow from the freed capacity? If the answer is "we expect productivity to improve" without a specified mechanism for how the freed capacity will be captured, the ROI claim is aspirational, not economic.
This is a governance question as much as a measurement question. Time savings from AI are only realised if the organisation makes an active management decision to capture them. The CFO should ask not only what time will be saved, but what the workforce planning and operational change assumptions are that convert the saving into economic value.
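The gap between time saved and value created can be made explicit with one extra factor. The sketch below is illustrative and rests on assumptions: the function name, the idea of a single `capture_fraction`, and the £100,000 loaded cost are all hypothetical. Only the 30% time saving comes from the analyst example above. The capture fraction stands for the share of freed capacity covered by a committed redeployment or headcount decision.

```python
def realised_annual_value(
    loaded_annual_cost: float,   # fully loaded annual cost of the affected role(s)
    time_saved_fraction: float,  # e.g. 0.30 for 30% of working time
    capture_fraction: float,     # share of freed capacity with a committed plan
) -> float:
    """Value of freed capacity that a management decision actually captures."""
    for name, value in (
        ("time_saved_fraction", time_saved_fraction),
        ("capture_fraction", capture_fraction),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return loaded_annual_cost * time_saved_fraction * capture_fraction


ANALYST_COST = 100_000.0  # hypothetical fully loaded annual cost

theoretical = realised_annual_value(ANALYST_COST, 0.30, 1.0)    # every freed hour captured
no_plan = realised_annual_value(ANALYST_COST, 0.30, 0.0)        # no mechanism specified
half_captured = realised_annual_value(ANALYST_COST, 0.30, 0.5)  # plan covers half

print(theoretical, no_plan, half_captured)
```

A case that states only the time saving is implicitly setting the capture fraction to 1. Asking the sponsor to justify that value is the same question as asking for the workforce planning assumptions.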
Illustrative case: two versions of the same business case
The following illustrates the difference between a typical AI business case and a genuinely credible one, using a composite legal document review scenario.
Typical case:
A legal operations team proposes an AI document review capability. The case projects that AI will reduce document review time by 40%, representing £1.2M in annual productivity savings against a fully-loaded cost of four senior paralegal FTEs. Implementation cost is £280,000. Payback period is presented as 2.8 months, ROI at 329%.
The 40% productivity improvement is based on vendor benchmarks from similar implementations at comparable organisations. No current-state baseline has been measured. The cost model includes model and platform licences; it excludes the time of the three senior lawyers who will need to configure and validate the system, the integration work required to connect the document management system to the AI platform, and the ongoing quality review overhead required to maintain acceptable accuracy. The case assumes that the four paralegal FTEs will be redeployed without describing to what. It is approved.
Credible case:
The same legal operations team proposes the same capability. Before preparing the case, the team conducts a four-week measurement of current document review throughput, time per document category, error rate, and escalation frequency. The baseline is documented and available to finance for review.
The productivity improvement estimate is based on a controlled pilot of 200 documents, with accuracy evaluated by a senior lawyer blind to which documents were AI-assisted. The pilot shows a 28% reduction in review time on standard commercial contracts and a 12% reduction on complex litigation documents.
The cost model includes model and platform licences, integration development (estimated at £85,000 in engineering time), quarterly quality review (estimated at 0.2 FTE of senior legal time ongoing), and a contingency for the first-year accuracy improvement programme. The case explicitly addresses workforce planning: two of the four paralegal FTEs will be redeployed to contract negotiation support, increasing throughput in that function by an estimated 15%; the other two are expected to be absorbed through natural attrition over 18 months.
The resulting payback period is 14 months. ROI is 87%. The case is approved with stronger confidence.
The second case is not a better news story. It is a better decision. The CFO who receives the first case and asks the questions that would produce the second is doing their job.
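The headline figures in the typical case are simple arithmetic, which is part of why they are fragile. The sketch below assumes one common convention (simple payback and first-year ROI on total cost); the article does not state which formulas the case used, but this convention reproduces the 2.8 months and 329% from the stated £1.2M benefit and £280,000 cost. The stress-test inputs at the end are hypothetical and exist only to show sensitivity.

```python
def payback_months(total_cost: float, annual_benefit: float) -> float:
    """Months of benefit needed to recover the cost (simple payback)."""
    if annual_benefit <= 0:
        raise ValueError("annual_benefit must be positive")
    return total_cost / (annual_benefit / 12)


def first_year_roi(total_cost: float, annual_benefit: float) -> float:
    """First-year net benefit as a fraction of total cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (annual_benefit - total_cost) / total_cost


# Figures stated in the typical case.
headline_cost = 280_000
headline_benefit = 1_200_000

print(round(payback_months(headline_cost, headline_benefit), 1))  # 2.8
print(round(first_year_roi(headline_cost, headline_benefit) * 100))  # 329

# Hypothetical stress test: only half the benefit is realised and
# 150,000 of previously excluded cost is added. Neither number is
# from the article.
stressed_cost = headline_cost + 150_000
stressed_benefit = headline_benefit * 0.5

print(round(payback_months(stressed_cost, stressed_benefit), 1))
print(round(first_year_roi(stressed_cost, stressed_benefit) * 100))
```

Both outputs depend on only two inputs, so an unmeasured benefit or an incomplete cost figure moves the headline directly.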
The cost denominator: why the number at the bottom is usually wrong
An ROI claim is only as credible as the cost it is divided by. The single most reliable way to inflate apparent AI ROI is to minimise the cost denominator.
The most commonly excluded costs in AI business cases are:
People cost for ongoing operation. AI capabilities require support — configuration, monitoring, quality review, retraining, vendor relationship management. These are not one-time costs; they are recurring obligations that belong in the annual operating cost model. Many AI cases count only the licence fee.
Integration and workflow change. Making AI useful in a production environment requires connecting it to existing systems, embedding it in workflows, and changing the way people work. These costs are rarely trivial and are almost always underestimated at the approval stage.
Governance overhead. Model validation, policy compliance, safety review, and audit trail requirements are not free. In regulated industries they are substantial. In any environment, they are real and recurring.
Shared platform burden. Many AI initiatives rely on shared infrastructure — common model access, shared orchestration, shared data pipelines — that was funded before the specific use case was proposed. This shared cost is rarely allocated to individual use cases, which means each case appears cheaper than it is.
Foundational data work. AI that relies on high-quality structured data often requires data preparation work that precedes and enables the AI deployment. This work is sometimes booked against the data programme rather than the AI programme, removing a real cost from the AI case.
The CFO's practical test is to ask the sponsor to walk through the TCO model line by line and explain what is and is not included. The items that are excluded deserve scrutiny proportional to their likely magnitude. A model that includes only vendor cost and one-time integration is almost certainly incomplete.
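The line-by-line walk-through can be supported by a simple completeness check. The following is a sketch under assumptions: the category labels are taken from the list above, while the function name, the dictionary layout, and every amount and explanation in the example are hypothetical. It reports which commonly excluded categories are neither costed nor explicitly explained.

```python
from typing import Dict, List

# The commonly excluded cost categories named in this section.
COMMONLY_EXCLUDED = (
    "people cost for ongoing operation",
    "integration and workflow change",
    "governance overhead",
    "shared platform burden",
    "foundational data work",
)


def unexplained_gaps(
    costed: Dict[str, float],             # category -> annual amount in the model
    explained_exclusions: Dict[str, str]  # category -> sponsor's stated reason
) -> List[str]:
    """Categories that are neither costed nor explicitly explained."""
    return [
        category
        for category in COMMONLY_EXCLUDED
        if category not in costed and category not in explained_exclusions
    ]


# Hypothetical case: vendor cost and integration only, one explained exclusion.
costed = {
    "vendor licences": 120_000.0,
    "integration and workflow change": 60_000.0,
}
explained_exclusions = {
    "shared platform burden": "runs on a standalone vendor service",
}

for gap in unexplained_gaps(costed, explained_exclusions):
    print("Unexplained gap:", gap)
```

The check does not judge whether an explanation is good; it only separates exclusions the sponsor has acknowledged from those nobody has addressed, which is where the review should start.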
The real finance role
The right finance function in AI is not the one that approves everything with a plausible narrative, and it is not the one that obstructs investment until proof is perfect. It is the one that insists on honest, comparable, complete economic cases, and maintains that standard consistently enough that teams stop bringing cases built merely to survive examination and start bringing cases that invite it.
That standard is not a constraint on AI ambition. It is the management discipline that distinguishes AI investment from AI expenditure. Organisations that develop it earlier will make fewer bad investments, reallocate capital faster, and build the executive credibility that allows their AI programmes to survive scrutiny as well as generate enthusiasm.
What to do next
For CFOs:
- Return business cases that lack a measured baseline, regardless of how compelling the strategic framing is. Require one.
- Ask every case to explain the mechanism by which time saved becomes economic value. "Productivity improvement" is not sufficient without a workforce planning or cost-structure implication.
- Require a TCO model that has been reviewed against the standard seven cost layers. Accept an honest explanation of what is excluded and why; do not accept an unexplained gap.
For CIOs and CAIOs:
- Proactively redesign governance packs to show both leading and lagging indicators, referenced against documented baselines.
- Separate foundational platform spend from initiative-level business cases so that AI cases are not overstated by excluding shared cost.
- Introduce stage gates so that scale decisions require stronger evidence than pilot approval — and communicate this standard clearly before pilots begin, not after.
For FinOps, TBM, and ITFM leaders:
- Build reporting that connects unit economics, shared platform cost, and business ownership in a format that CFOs can reference directly in investment reviews.
- Make shadow AI and non-IT spend visible inside the finance conversation before annual budget cycles, not as a clean-up exercise after the fact.
- Develop a common evidence template so AI cases can be compared across the portfolio rather than each being evaluated in isolation against its own narrative.