The Distinction Ladder: Five Rungs Every AI Metric Belongs On

Figure: a five-rung ladder of metrics from activity up to strategic impact, with attribution coverage noted. The higher the rung, the harder to game.

The five rungs

Every AI metric sits on one of five rungs. The rungs are ordered by how close they are to business value. The higher the rung, the harder the metric is to game, and the more it matters.

Rung 1: Activity

Things happening. Tokens consumed, API calls made, models deployed, users registered. Activity metrics tell you the system is being used. They do not tell you if it is useful.

Example: 2.4 million tokens consumed this month

Rung 2: Adoption

People using it. Active users, session frequency, feature usage. Adoption metrics tell you people are trying it. They do not tell you if it is working.

Example: 340 active users, 68% weekly return rate

Rung 3: Productivity

Work done faster. Time saved, tasks completed, throughput increased. Productivity metrics tell you the work is getting done quicker. They do not tell you if the work matters.

Example: 4.2 hours saved per user per week

Rung 4: Value

Business outcomes improved. Revenue increased, costs reduced, quality improved, risk lowered. Value metrics tell you the business result changed. They do not tell you if it was strategic.

Example: £840k in labour cost avoided, 12% reduction in processing errors

Rung 5: Strategic impact

Capability built, position strengthened, options created. Strategic metrics tell you the organisation is more capable than it was. They are the hardest to measure and the most important.

Example: New product capability enabled, competitive position defended, regulatory risk reduced

The ladder is not a judgement. Activity metrics are not bad. They are just on the first rung. The problem is when organisations treat first-rung metrics as if they were fourth-rung metrics, or when dashboards mix all five rungs together without distinction.

The risk track

There is a parallel track for risk metrics. They run alongside the value ladder but point in the opposite direction: the higher a risk metric reads, the more value is under threat.

Risk exposure: What could go wrong (model bias, data leakage, compliance breach)
Risk events: What did go wrong (incidents, failures, audit findings)
Risk impact: What it cost when it went wrong (financial loss, reputational damage, regulatory penalty)

Risk metrics belong on the dashboard too. A dashboard that shows value metrics without risk metrics is incomplete. The full picture is value minus risk.

Where common metrics sit

Here is where the most common AI metrics sit on the ladder:

Metric	Rung
Tokens consumed	Activity (1)
API calls made	Activity (1)
Models deployed	Activity (1)
Active users	Adoption (2)
Session frequency	Adoption (2)
Time saved per task	Productivity (3)
Tasks completed faster	Productivity (3)
Cost per successful outcome	Value (4)
Revenue attributed to AI	Value (4)
Quality improvement	Value (4)
New capability enabled	Strategic (5)
Competitive position defended	Strategic (5)
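The table above can be kept as a lookup so that a dashboard's metrics can be grouped by rung. This is a minimal sketch in Python, not a prescribed implementation: the names `Rung`, `METRIC_RUNGS` and `group_by_rung` are illustrative assumptions, and "Prompt count" is a hypothetical metric added only to show what happens to a metric that has not been classified.

```python
from collections import defaultdict
from enum import IntEnum


class Rung(IntEnum):
    """The five rungs, ordered by closeness to business value."""
    ACTIVITY = 1
    ADOPTION = 2
    PRODUCTIVITY = 3
    VALUE = 4
    STRATEGIC = 5


# Rung assignments copied from the table above.
METRIC_RUNGS = {
    "Tokens consumed": Rung.ACTIVITY,
    "API calls made": Rung.ACTIVITY,
    "Models deployed": Rung.ACTIVITY,
    "Active users": Rung.ADOPTION,
    "Session frequency": Rung.ADOPTION,
    "Time saved per task": Rung.PRODUCTIVITY,
    "Tasks completed faster": Rung.PRODUCTIVITY,
    "Cost per successful outcome": Rung.VALUE,
    "Revenue attributed to AI": Rung.VALUE,
    "Quality improvement": Rung.VALUE,
    "New capability enabled": Rung.STRATEGIC,
    "Competitive position defended": Rung.STRATEGIC,
}


def group_by_rung(metric_names):
    """Group metric names by rung; names missing from the lookup go under None."""
    groups = defaultdict(list)
    for name in metric_names:
        groups[METRIC_RUNGS.get(name)].append(name)
    return dict(groups)


# "Prompt count" is hypothetical and deliberately absent from the lookup.
dashboard = ["Tokens consumed", "Active users", "Revenue attributed to AI", "Prompt count"]
grouped = group_by_rung(dashboard)

# Print rungs in ladder order, with unclassified metrics last.
for rung, names in sorted(grouped.items(), key=lambda kv: (kv[0] is None, kv[0] or 0)):
    label = rung.name.title() if rung is not None else "Unclassified"
    print(f"{label}: {', '.join(names)}")
```

Using an ordered enum means rungs can be compared directly (for example, `Rung.VALUE > Rung.ADOPTION`), which makes it easy to separate lower-rung metrics from value metrics on a dashboard.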

The lead metric: attribution coverage

If you can only track one metric across your AI portfolio, track attribution coverage. It is the percentage of your total AI cost that can be linked to a measured business outcome.

~95% (disputed), MIT NANDA study (2025)

The MIT NANDA study reported that roughly 95% of enterprises see no measurable P&L impact from AI investments. That is the value gap. Attribution coverage looks at the gap from the other side: the percentage of AI spend you can connect to a measured outcome. The higher the coverage, the smaller the gap.
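As a rough illustration, attribution coverage can be computed as attributed spend divided by total spend. The sketch below is in Python and rests on assumptions: the `Initiative` type, its field names, and the portfolio figures are hypothetical, and it treats each initiative as either fully attributed or not attributed at all, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    name: str
    annual_cost: float          # total AI spend on the initiative
    has_measured_outcome: bool  # True if the spend is linked to a measured business outcome


def attribution_coverage(initiatives):
    """Return the share of total AI cost linked to a measured outcome, as a percentage."""
    total = sum(i.annual_cost for i in initiatives)
    if total == 0:
        raise ValueError("no AI spend recorded")
    attributed = sum(i.annual_cost for i in initiatives if i.has_measured_outcome)
    return 100.0 * attributed / total


# Hypothetical portfolio, for illustration only.
portfolio = [
    Initiative("support-assistant", 400_000, True),
    Initiative("code-copilot", 350_000, False),
    Initiative("document-triage", 250_000, True),
]
coverage = attribution_coverage(portfolio)
print(f"Attribution coverage: {coverage:.0f}%")  # prints "Attribution coverage: 65%"
```

Here 650,000 of the 1,000,000 in spend is linked to a measured outcome, so coverage is 65%. An organisation that attributes part of an initiative's cost would need a per-initiative attributed amount rather than a yes/no flag.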

Interpretation

Attribution coverage is a lead metric because it predicts value realisation. If you cannot attribute cost to outcome, you cannot prove value. If you cannot prove value, you cannot defend the spend. If you cannot defend the spend, the programme gets cut. Coverage is the early warning.

The diagnostic question

The distinction ladder gives you one diagnostic question to ask of any AI dashboard:

Which rung is each metric on, and why does it matter?

If the dashboard cannot answer that question, it is a vanity dashboard. It is showing activity and calling it value. The ladder forces the distinction.
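One way to apply the diagnostic question mechanically is to record, for each metric, its rung and a one-line reason it matters, then list the metrics where either is missing. This is a sketch in Python under assumptions: the function name `audit_dashboard`, the data shape, and the example entries (including the hypothetical "Prompt count" metric and the rationale strings) are illustrative only.

```python
def audit_dashboard(metrics):
    """Return the metrics that cannot answer "which rung, and why does it matter?".

    `metrics` maps a metric name to a (rung, rationale) pair; either may be None.
    """
    failures = []
    for name, (rung, rationale) in metrics.items():
        # A metric fails if it has no rung or no stated reason for mattering.
        if rung is None or not rationale:
            failures.append(name)
    return failures


# Hypothetical dashboard entries, for illustration only.
dashboard = {
    "Tokens consumed": (1, "Shows the system is in use; a capacity signal only"),
    "Active users": (2, None),
    "Cost per successful outcome": (4, "Ties spend to a business result"),
    "Prompt count": (None, None),
}
unanswered = audit_dashboard(dashboard)
print(unanswered)  # prints ['Active users', 'Prompt count']
```

A non-empty result does not by itself make a dashboard a vanity dashboard, but it does show which metrics still need the distinction made.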

One-page self-audit

Use this checklist to audit your AI measurement system. Answer honestly. The point is diagnosis, not judgement.

Measurement maturity checklist

☐ We can list all AI initiatives and their current spend

☐ We know which rung each metric sits on

☐ We have at least one metric on rung 4 (value) for each major initiative

☐ We can calculate attribution coverage across the portfolio

☐ We track risk metrics alongside value metrics

☐ We can explain why each metric matters to a non-technical executive

☐ We have a process for retiring metrics that no longer matter

☐ Our dashboards separate activity from value

☐ We can answer: what percentage of AI spend is attributed to measured outcomes?

☐ We review attribution coverage quarterly with leadership

☐ We have a plan to improve coverage over the next 12 months

☐ We can defend our measurement approach to an auditor or regulator

☐ We know which initiatives have weak attribution and why

☐ We have stopped at least one initiative because attribution was too weak

Scoring: 0-4 checks: measurement is ad hoc. 5-9 checks: measurement is emerging. 10-14 checks: measurement is systematic. The goal is not perfection. The goal is honest diagnosis and a plan to improve.
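The scoring bands can be written as a small function. This is a minimal Python sketch; the function name `maturity_band` is an assumption, and the thresholds are taken directly from the scoring line above for the 14-item checklist.

```python
def maturity_band(checks: int) -> str:
    """Map the number of ticked checklist items (0 to 14) to a maturity band."""
    if not 0 <= checks <= 14:
        raise ValueError("checks must be between 0 and 14")
    if checks <= 4:
        return "ad hoc"
    if checks <= 9:
        return "emerging"
    return "systematic"


print(maturity_band(7))  # prints "emerging"
```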

Where we might be wrong

The distinction ladder assumes that value is the goal and that higher rungs are always better. In practice, some organisations need to optimise for adoption first, or for risk reduction, or for strategic positioning. The ladder does not tell you which rung to prioritise. It just tells you which rung you are on.

It also assumes that metrics can be cleanly separated into rungs. In practice, many metrics span multiple rungs. Time saved (productivity) can lead to cost reduction (value), which can enable new capability (strategic). The ladder simplifies this entanglement, which makes it useful but incomplete.

Finally, the ladder is silent on causation. A metric on rung 4 (value) does not prove that AI caused the value. It just proves that value happened while AI was running. Attribution is harder than measurement, and the ladder does not solve it.

What would change our mind: a real organisation where optimising for lower-rung metrics (activity, adoption) led to better long-term outcomes than optimising for higher-rung metrics (value, strategic impact). That would suggest the ladder is wrong about the hierarchy, or that the hierarchy is context-dependent.
