The five rungs
Every AI metric sits on one of five rungs. The rungs are ordered by how close they are to business value. The higher the rung, the harder the metric is to game, and the more it matters.
Rung 1: Activity
Things happening. Tokens consumed, API calls made, models deployed, users registered. Activity metrics tell you the system is being used. They do not tell you if it is useful.
Example: 2.4 million tokens consumed this month
Rung 2: Adoption
People using it. Active users, session frequency, feature usage. Adoption metrics tell you people are trying it. They do not tell you if it is working.
Example: 340 active users, 68% weekly return rate
Rung 3: Productivity
Work done faster. Time saved, tasks completed, throughput increased. Productivity metrics tell you the work is getting done quicker. They do not tell you if the work matters.
Example: 4.2 hours saved per user per week
Rung 4: Value
Business outcomes improved. Revenue increased, costs reduced, quality improved, risk lowered. Value metrics tell you the business result changed. They do not tell you if it was strategic.
Example: £840k in labour cost avoided, 12% reduction in processing errors
Rung 5: Strategic impact
Capability built, position strengthened, options created. Strategic metrics tell you the organisation is more capable than it was. They are the hardest to measure and the most important.
Example: New product capability enabled, competitive position defended, regulatory risk reduced
The ladder is not a judgement. Activity metrics are not bad. They are just on the first rung. The problem is when organisations treat first-rung metrics as if they were fourth-rung metrics, or when dashboards mix all five rungs together without distinction.
The risk track
There is a parallel track for risk metrics. They run alongside the value ladder, but in the opposite direction: the higher the risk metric, the more it threatens value.
- Risk exposure: What could go wrong (model bias, data leakage, compliance breach)
- Risk events: What did go wrong (incidents, failures, audit findings)
- Risk impact: What it cost when it went wrong (financial loss, reputational damage, regulatory penalty)
Risk metrics belong on the dashboard too, alongside the ladder. A dashboard that shows value metrics without risk metrics is incomplete. The full picture is value minus risk.
Where common metrics sit
Here is where the most common AI metrics sit on the ladder:
| Metric | Rung |
|---|---|
| Tokens consumed | Activity (1) |
| API calls made | Activity (1) |
| Models deployed | Activity (1) |
| Active users | Adoption (2) |
| Session frequency | Adoption (2) |
| Time saved per task | Productivity (3) |
| Tasks completed faster | Productivity (3) |
| Cost per successful outcome | Value (4) |
| Revenue attributed to AI | Value (4) |
| Quality improvement | Value (4) |
| New capability enabled | Strategic (5) |
| Competitive position defended | Strategic (5) |
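The table above can be treated as a simple lookup. The sketch below is illustrative only: the names `Rung`, `METRIC_RUNGS`, `group_by_rung` and `highest_rung` are assumptions made for this example, not part of any existing tool. It shows one way to tag each metric on a dashboard with its rung and to surface metrics that have not been classified at all.

```python
from collections import defaultdict
from enum import IntEnum


class Rung(IntEnum):
    """The five rungs, ordered by closeness to business value."""
    ACTIVITY = 1
    ADOPTION = 2
    PRODUCTIVITY = 3
    VALUE = 4
    STRATEGIC = 5


# Lookup copied from the table above; extend it with your own metrics.
METRIC_RUNGS = {
    "Tokens consumed": Rung.ACTIVITY,
    "API calls made": Rung.ACTIVITY,
    "Models deployed": Rung.ACTIVITY,
    "Active users": Rung.ADOPTION,
    "Session frequency": Rung.ADOPTION,
    "Time saved per task": Rung.PRODUCTIVITY,
    "Tasks completed faster": Rung.PRODUCTIVITY,
    "Cost per successful outcome": Rung.VALUE,
    "Revenue attributed to AI": Rung.VALUE,
    "Quality improvement": Rung.VALUE,
    "New capability enabled": Rung.STRATEGIC,
    "Competitive position defended": Rung.STRATEGIC,
}


def group_by_rung(dashboard_metrics):
    """Group metric names by rung; unclassified metrics land under the key None."""
    grouped = defaultdict(list)
    for name in dashboard_metrics:
        grouped[METRIC_RUNGS.get(name)].append(name)
    return dict(grouped)


def highest_rung(dashboard_metrics):
    """Return the highest rung the dashboard reaches, or None if nothing is classified."""
    rungs = [METRIC_RUNGS[m] for m in dashboard_metrics if m in METRIC_RUNGS]
    return max(rungs) if rungs else None


# Hypothetical dashboard used only to exercise the functions.
dashboard = ["Tokens consumed", "Active users", "Time saved per task", "Prompts per day"]
print(group_by_rung(dashboard))
print(highest_rung(dashboard).name)
```

For this hypothetical dashboard, the highest rung reached is Productivity, and "Prompts per day" is grouped under `None` because nobody has yet decided which rung it sits on.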
The lead metric: attribution coverage
If you can only track one metric across your AI portfolio, track attribution coverage. It is the percentage of your total AI cost that can be linked to a measured business outcome.
MIT NANDA study (2025)
The MIT NANDA study found that roughly 95% of AI spend cannot be attributed to a measured outcome. That is the value gap. Attribution coverage is the complement: the percentage you can attribute. Coverage and gap always sum to 100%, so the higher the coverage, the smaller the gap.
Attribution coverage is a lead metric because it predicts value realisation. If you cannot attribute cost to outcome, you cannot prove value. If you cannot prove value, you cannot defend the spend. If you cannot defend the spend, the programme gets cut. Coverage is the early warning.
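As a rough illustration of the arithmetic, attribution coverage is attributed cost divided by total cost, expressed as a percentage. The sketch below assumes each initiative records its cost and, optionally, the measured outcome it is linked to. The `Initiative` class, its field names and the portfolio figures are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Initiative:
    name: str
    cost: float  # total AI cost for the period, in any single currency
    measured_outcome: Optional[str] = None  # None means no measured outcome is linked


def attribution_coverage(initiatives):
    """Percentage of total AI cost that is linked to a measured business outcome."""
    total = sum(i.cost for i in initiatives)
    if total == 0:
        raise ValueError("total AI cost is zero; coverage is undefined")
    attributed = sum(i.cost for i in initiatives if i.measured_outcome)
    return 100.0 * attributed / total


# Hypothetical portfolio: the figures are made up for illustration.
portfolio = [
    Initiative("Invoice triage", 120_000, "processing error rate"),
    Initiative("Internal chatbot", 300_000),
    Initiative("Sales forecasting", 80_000, "forecast accuracy"),
]

coverage = attribution_coverage(portfolio)
print(f"Attribution coverage: {coverage:.0f}%")
print(f"Value gap: {100 - coverage:.0f}%")
```

Here 200,000 of 500,000 in cost is linked to an outcome, so coverage is 40% and the value gap is 60%. Weighting by cost rather than by number of initiatives is deliberate: one large unattributed programme should drag coverage down more than several small ones.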
The diagnostic question
The distinction ladder gives you one diagnostic question to ask of any AI dashboard:
Which rung is each metric on, and why does it matter?
If the dashboard cannot answer that question, it is a vanity dashboard. It is showing activity and calling it value. The ladder forces the distinction.
One-page self-audit
Use this checklist to audit your AI measurement system. Answer honestly. The point is diagnosis, not judgement.
Measurement maturity checklist
Scoring: with 0-4 checks, measurement is ad hoc; with 5-9 checks, measurement is emerging; with 10-14 checks, measurement is systematic. The goal is not perfection. The goal is honest diagnosis and a plan to improve.
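The scoring bands can be written down as a small function. This is a minimal sketch that assumes the checklist has 14 items, which is implied by the top band but not stated elsewhere. The function name `maturity_band` is hypothetical.

```python
def maturity_band(checks: int) -> str:
    """Map the number of ticked checklist items to a maturity band."""
    # Assumption: the checklist has 14 items, as implied by the 10-14 band.
    if not 0 <= checks <= 14:
        raise ValueError("expected between 0 and 14 checks")
    if checks <= 4:
        return "ad hoc"
    if checks <= 9:
        return "emerging"
    return "systematic"


for score in (3, 7, 12):
    print(score, maturity_band(score))
```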