Key takeaways
- Most AI business cases fail board scrutiny not because the technology is wrong, but because the benefit claim is vague, the cost model is incomplete, and the proof standard is undefined.
- A CFO-grade AI business case must answer three questions: what does it cost in full, what return is expected and when, and how will we prove it happened.
- Productivity claims need a realisation assumption: freeing up time is not the same as capturing value.
- Stage-gate structures and exit criteria should be built into the approval, not added later when investment is already at stake.
Why most AI business cases fail
The characteristic failure mode of an AI business case is not dishonesty. It is optimism combined with structural incompleteness. The person presenting the case typically believes the technology will work. What they have not done is apply the same rigour to the cost model, the benefit realisation path, or the proof standard that they would apply to a capital expenditure request in any other category.
The result is a case that looks compelling in a slide deck and unravels in the first quarterly review. Investment continues while the evidence base stays thin. The organisation ends up in a familiar position: substantial AI spend, no clear proof of return, and increasing reluctance to authorise the next round of funding without something changing.
CFOs are increasingly aware of this pattern. The question is what a more rigorous standard looks like.
The three questions a credible AI business case must answer
A business case that can survive scrutiny must do three things: quantify the full cost, specify the expected return with a realistic realisation path, and define in advance what evidence will count as proof.
What does it cost in full? The model invoice, API fees, or vendor contract is typically the most visible cost. It is rarely the dominant one. A more complete cost picture includes engineering time to integrate and maintain the workflow, data infrastructure and security changes, user enablement and training, ongoing governance and oversight, and the productivity cost of imperfect outputs during the learning period. See the AI TCO Framework for a structured approach to building this estimate.
What return is expected and when? Benefit claims in AI cases tend to fall into three categories: productivity gains (time saved), capability gains (things now possible that were not before), and strategic optionality (positioning for future opportunity). All three are legitimate, but each requires a different realisation model. Time savings only become economic value if the released capacity is redeployed to work that has value, or if headcount grows more slowly than it otherwise would. Capability gains require a link to revenue or cost outcomes. Strategic optionality requires a stage at which the option is exercised or abandoned.
How will we know it happened? This is the question most business cases skip entirely. Without a pre-agreed proof standard — a specific metric, measurement method, and baseline — no review process can determine whether the investment is delivering. The result is that positive stories circulate while unfavourable evidence goes uncaptured.
The productivity claim problem
Productivity is the most common AI benefit claim. It is also the one most likely to fail realisation.
The underlying mechanism is real. AI tools do reduce the time to produce a first draft, search a dataset, generate code, or summarise a document. In controlled testing, time savings are frequently measurable. The difficulty is that measured time savings in a task do not automatically translate into economic value for the organisation.
A knowledge worker who completes a task twenty per cent faster does not automatically deliver twenty per cent more output — particularly if the task they completed faster was not the binding constraint on their productivity. If the bottleneck is decision-making, coordination, or access to information rather than execution time, then the AI intervention may improve the experience of doing the task without improving the outcome that matters.
A credible productivity claim needs a realisation assumption: where does the freed time go, who makes that decision, and how is it tracked? Without this assumption explicitly stated, the productivity claim is a theoretical maximum rather than an expected return.
Structuring the benefit case
A rigorous benefit case separates claims into categories and disciplines each one differently.
Definite near-term savings are cases where AI removes a specific activity that currently has a direct cost: a vendor contract, a headcount role, or a defined volume of outsourced work. These claims are the most tractable because the baseline is clear and the capture mechanism is simple. They are also usually the smallest category.
Productivity return covers the use of freed time. The conservative approach is to model realisation at a fraction of the theoretical maximum — typically between twenty and forty per cent for well-managed deployments — and tie the estimate to a specific redeployment assumption. Unmeasured time savings should be excluded from the expected return.
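As a minimal sketch of the arithmetic behind this approach, the Python function below discounts a theoretical time saving by a realisation rate. The function name, its parameters, and every figure (staff count, hours saved, loaded hourly cost) are illustrative assumptions, not values from a real deployment.

```python
def realised_productivity_value(
    hours_saved_per_year: float,
    loaded_hourly_cost: float,
    realisation_rate: float,
) -> float:
    """Expected annual value of freed time after applying a realisation rate."""
    if not 0.0 <= realisation_rate <= 1.0:
        raise ValueError("realisation_rate must be between 0 and 1")
    # Theoretical maximum: every measured hour saved is converted into value.
    theoretical_maximum = hours_saved_per_year * loaded_hourly_cost
    return theoretical_maximum * realisation_rate


# Hypothetical inputs: 200 staff each saving 50 measured hours a year,
# at an assumed loaded cost of 60 per hour.
hours = 200 * 50
theoretical = realised_productivity_value(hours, 60.0, 1.0)
low = realised_productivity_value(hours, 60.0, 0.20)
high = realised_productivity_value(hours, 60.0, 0.40)

print(f"Theoretical maximum: {theoretical:,.0f}")
print(f"Expected range: {low:,.0f} to {high:,.0f}")
```

Only measured hours go into `hours_saved_per_year`. The gap between the theoretical maximum and the expected range is the part of the claim that depends on a redeployment decision someone still has to make.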
Capability gains are returns that depend on AI making something possible that was not previously done. These are harder to baseline but often more material. A legal team that can now review contract volumes previously beyond their capacity, or a finance team that can produce scenario analyses at a cadence previously impractical, are generating returns that time-saving models do not capture. Document the enabling mechanism and the downstream impact it should produce.
Strategic optionality should be held separately from the operating return. It reflects the value of being positioned to scale if the technology proves out, or to integrate with capabilities that are not yet mature. Boards can assess optionality arguments, but they need to be labelled as such. Mixing optionality arguments with productivity claims produces benefit totals that neither category supports.
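One way to keep the categories from blurring is to total them separately. The sketch below is a hypothetical illustration: the `BenefitLine` structure, the category labels, and all amounts are assumptions made for the example. It sums low, central, and high estimates for the operating return and reports optionality on its own line.

```python
from dataclasses import dataclass


@dataclass
class BenefitLine:
    """One benefit claim with a low, central, and high annual estimate."""
    category: str  # "savings", "productivity", "capability", or "optionality"
    low: float
    central: float
    high: float


def operating_return(lines):
    """Sum low/central/high across all lines except strategic optionality."""
    included = [line for line in lines if line.category != "optionality"]
    return (
        sum(line.low for line in included),
        sum(line.central for line in included),
        sum(line.high for line in included),
    )


# Hypothetical figures chosen only to show the structure.
benefit_lines = [
    BenefitLine("savings", 40_000, 50_000, 55_000),
    BenefitLine("productivity", 120_000, 180_000, 240_000),
    BenefitLine("capability", 0, 100_000, 250_000),
    BenefitLine("optionality", 0, 0, 500_000),
]

low, central, high = operating_return(benefit_lines)
print(f"Operating return: {low:,.0f} to {high:,.0f} (central {central:,.0f})")

# Optionality is reported separately and never added to the operating total.
for line in benefit_lines:
    if line.category == "optionality":
        print(f"Optionality (held separately): up to {line.high:,.0f}")
```

Adding the lows and the highs is a deliberately simple way to form a range. A more careful model would consider how the categories move together.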
The cost model in detail
A business case that understates cost is not conservative — it is misleading. It also tends to produce cost overruns that undermine confidence in future proposals.
Build the cost model across three horizons. Year one costs are typically the highest because they include integration engineering, enablement, and change management in addition to the ongoing run cost. Year two costs reflect the stabilised operating position, usually lower than year one but often higher than the vendor contract alone. Year three costs give the first indication of total cost of ownership at scale: what happens when the use case expands, the model is upgraded, or governance requirements increase.
Costs to include in full:
- Model or platform fees (including potential volume tiers if adoption grows)
- Integration engineering (initial build and ongoing maintenance)
- Infrastructure changes (data pipelines, security controls, access management)
- User enablement (training, support, adoption management)
- Governance and oversight (prompt review, output auditing, compliance documentation)
- Contingency for transition costs (productivity dip during rollout, rework of AI-assisted outputs)
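The list above can be sketched as a simple three-year model. The class name, field names, and all amounts below are hypothetical. They are chosen only to show the pattern described here: year one is the highest, and platform fees are a minority of the total in every year.

```python
from dataclasses import dataclass


@dataclass
class YearCosts:
    """Cost categories for one year; all amounts are illustrative."""
    platform_fees: float
    integration_engineering: float
    infrastructure: float
    user_enablement: float
    governance: float
    contingency: float

    def total(self) -> float:
        return (
            self.platform_fees
            + self.integration_engineering
            + self.infrastructure
            + self.user_enablement
            + self.governance
            + self.contingency
        )


# Hypothetical figures: year one carries the integration build,
# year two is the stabilised run cost, year three reflects expansion.
cost_model = {
    1: YearCosts(100_000, 250_000, 80_000, 60_000, 40_000, 50_000),
    2: YearCosts(100_000, 60_000, 20_000, 20_000, 40_000, 10_000),
    3: YearCosts(150_000, 60_000, 30_000, 20_000, 60_000, 10_000),
}

for year, costs in cost_model.items():
    share = costs.platform_fees / costs.total()
    print(f"Year {year}: total {costs.total():,.0f}, platform fees {share:.0%} of total")

three_year_total = sum(costs.total() for costs in cost_model.values())
print(f"Three-year total: {three_year_total:,.0f}")
```

Printing the platform-fee share alongside each total makes it easy to see how much of the cost sits outside the vendor contract.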
Stage-gate and exit criteria
One of the most useful structural elements in an AI business case is an explicit stage-gate: a defined decision point at which the organisation evaluates whether to proceed, scale, redesign, or stop.
Stage-gate approval forces the benefit case to specify interim proofs rather than only a final return. It also creates a legitimate exit path, which makes the original investment more credible. Boards are more likely to approve investment with a defined stop condition than investment with no exit, because the former demonstrates that the case has been stress-tested.
An exit criterion should specify: the proof standard that would trigger a redesign or stop decision, the timeframe in which that evidence should be available, and who owns the decision. Without these elements, the stage-gate becomes a reporting milestone rather than a genuine decision gate.
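The three elements of an exit criterion can be written down as a small record with a decision rule attached. The following is a sketch under stated assumptions: the `ExitCriterion` class, its thresholds, the example metric, and the rule that missing evidence after the deadline counts as a stop are all hypothetical. It also assumes a higher metric value is better.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ExitCriterion:
    """One pre-agreed gate: proof standard, deadline, and decision owner."""
    metric: str
    baseline: float            # measured before deployment, kept for reference
    proceed_threshold: float   # at or above this value: proceed
    redesign_threshold: float  # at or above this but below proceed: redesign
    evidence_due: date
    decision_owner: str

    def decide(self, measured: Optional[float], today: date) -> str:
        if measured is None:
            # Assumption: no evidence by the deadline fails the gate.
            return "stop" if today > self.evidence_due else "pending"
        if measured >= self.proceed_threshold:
            return "proceed"
        if measured >= self.redesign_threshold:
            return "redesign"
        return "stop"


# Hypothetical gate for a contract-review use case.
gate = ExitCriterion(
    metric="contracts reviewed per analyst per week",
    baseline=20.0,
    proceed_threshold=26.0,
    redesign_threshold=22.0,
    evidence_due=date(2025, 6, 30),
    decision_owner="Finance director",
)

print(gate.decide(27.5, date(2025, 6, 15)))
print(gate.decide(23.0, date(2025, 6, 15)))
print(gate.decide(None, date(2025, 7, 15)))
```

Fixing the thresholds and the owner before approval is what separates this from a reporting milestone: the outcome of the review follows from the evidence rather than from the mood of the meeting.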
What board-ready looks like
A business case ready for board-level approval has the following characteristics:
- The cost model is complete, includes year-one integration costs, and shows the stabilised run cost in year two.
- The benefit case separates categories by type and includes a realisation assumption for productivity claims.
- Expected return is stated as a range with a central estimate, not a single optimistic figure.
- The proof standard is defined before approval, not after the first review.
- A stage-gate structure is included with explicit decision criteria and an owner.
None of this requires more complexity than the investment deserves. A cost estimate built on incomplete assumptions will look more precise than one built on honest ranges, but it will also be wrong. The objective is not a number that looks compelling in a presentation. It is a commitment the organisation can actually hold.
The CFO's role in this process is not to make the case more conservative. It is to make it more honest: to ask where the benefit assumption requires future decisions that have not yet been made, where the cost model excludes categories that will still have to be paid for, and where the proof standard would actually change the investment decision if it were not met. That is the standard that allows AI investment to scale without accumulating an unaccountable tail of stranded spending.