A baseline is foundational to any credible AI value claim, yet missing or weak baselines are the single most common weakness in AI business cases. The reasons are structural: a baseline must be measured before the AI capability is deployed, which means anticipating what will need to be measured before the AI design is complete. That sequencing does not come naturally to most programme delivery processes, where measurement tends to be designed around what is convenient to measure after deployment.
There are three common baseline failures in AI ROI practice.
No baseline at all. The pre-AI state was not measured. The productivity improvement is an estimate based on expected AI performance rather than a comparison against an observed reference.
An estimated baseline. The pre-AI state was approximated by asking users how long tasks took, reviewing historical system logs, or benchmarking against industry averages. Estimated baselines tend to be biased in the direction of the sponsor's interest, and they cannot be independently verified.
A cherry-picked measurement period. The pre-AI baseline was measured during an unusually bad period (high volume, staffing pressure, a known quality problem), which makes the subsequent improvement look larger than it would against a typical period. This is rarely conscious manipulation; it is the natural consequence of measuring the baseline shortly after a problem has been identified.
The governance requirement is that baselines be measured before AI deployment, over a period representative of normal operating conditions, using the same metrics that will be used to measure improvement, with documentation available for independent review. Finance functions that accept anything less are accepting a proof standard they would not accept for other investment categories.
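To make the "representative period" and "same metrics" requirements concrete, here is a minimal Python sketch under stated assumptions. It treats the metric as lower-is-better (for example, average handling time in minutes), screens a candidate baseline window by how many standard deviations its mean sits from the longer pre-deployment history, and computes improvement from the same metric before and after deployment. The function names, the 1.0 threshold, and all the weekly figures are hypothetical and chosen for illustration only; the z-score screen is one simple heuristic, not an established standard.

```python
from statistics import mean, stdev


def baseline_is_representative(history, window, max_z=1.0):
    """Heuristic screen (assumption): flag a baseline window whose mean sits more
    than max_z standard deviations from the mean of the full pre-AI history."""
    spread = stdev(history)  # week-to-week variation across the whole history
    if spread == 0:
        raise ValueError("history has no variation; cannot assess the window")
    z = (mean(window) - mean(history)) / spread
    return abs(z) <= max_z, z


def improvement_pct(baseline, post):
    """Percentage reduction for a lower-is-better metric, using the same
    metric before and after deployment."""
    before = mean(baseline)
    return (before - mean(post)) / before * 100


# Hypothetical weekly average handling times (minutes) before deployment;
# weeks 7-9 reflect a temporary volume spike.
history = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 12.4, 12.8, 12.6, 10.1, 9.9, 10.0]
typical_window = history[0:6]
bad_window = history[6:9]
post_ai = [9.0, 9.1, 8.9]  # hypothetical post-deployment weeks

for label, window in [("typical", typical_window), ("bad", bad_window)]:
    ok, z = baseline_is_representative(history, window)
    gain = improvement_pct(window, post_ai)
    print(f"{label}: representative={ok}, z={z:.2f}, improvement={gain:.1f}%")
```

With these illustrative numbers the typical window passes the screen and yields roughly a 10% improvement, while the spike window fails it and would report close to 29% for the same post-deployment performance. That gap is the distortion the cherry-picked-period failure describes.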