Framework

The Distinction Ladder

Five rungs every AI metric belongs on

Most AI dashboards mix activity, adoption, productivity, value and strategic impact into one undifferentiated list. The distinction ladder separates them. It poses the question that kills vanity dashboards: which rung is this metric on, and why does it matter?

Measurement framework

The five rungs

Every AI metric sits on one of five rungs. The rungs are ordered by how close they are to business value. The higher the rung, the harder the metric is to game, and the more it matters.

Rung 1: Activity

Things happening. Tokens consumed, API calls made, models deployed, users registered. Activity metrics tell you the system is being used. They do not tell you if it is useful.

Example: 2.4 million tokens consumed this month

Rung 2: Adoption

People using it. Active users, session frequency, feature usage. Adoption metrics tell you people are trying it. They do not tell you if it is working.

Example: 340 active users, 68% weekly return rate

Rung 3: Productivity

Work done faster. Time saved, tasks completed, throughput increased. Productivity metrics tell you the work is getting done quicker. They do not tell you if the work matters.

Example: 4.2 hours saved per user per week

Rung 4: Value

Business outcomes improved. Revenue increased, costs reduced, quality improved, risk lowered. Value metrics tell you the business result changed. They do not tell you if it was strategic.

Example: £840k in labour cost avoided, 12% reduction in processing errors

Rung 5: Strategic impact

Capability built, position strengthened, options created. Strategic metrics tell you the organisation is more capable than it was. They are the hardest to measure and the most important.

Example: New product capability enabled, competitive position defended, regulatory risk reduced

The ladder is not a judgement. Activity metrics are not bad. They are just on the first rung. The problem is when organisations treat first-rung metrics as if they were fourth-rung metrics, or when dashboards mix all five rungs together without distinction.

The risk track

There is a parallel track for risk metrics. They run alongside the value ladder, but in the opposite direction: the higher the risk metric, the more it threatens value.

  • Risk exposure: What could go wrong (model bias, data leakage, compliance breach)
  • Risk events: What did go wrong (incidents, failures, audit findings)
  • Risk impact: What it cost when it went wrong (financial loss, reputational damage, regulatory penalty)

Risk metrics belong on the dashboard alongside the ladder. A dashboard that shows value metrics without risk metrics is incomplete. The full picture is value minus risk.

Where common metrics sit

Here is where the most common AI metrics sit on the ladder:

Metric                          Rung
Tokens consumed                 Activity (1)
API calls made                  Activity (1)
Models deployed                 Activity (1)
Active users                    Adoption (2)
Session frequency               Adoption (2)
Time saved per task             Productivity (3)
Tasks completed faster          Productivity (3)
Cost per successful outcome     Value (4)
Revenue attributed to AI        Value (4)
Quality improvement             Value (4)
New capability enabled          Strategic (5)
Competitive position defended   Strategic (5)
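
As a rough sketch, the table above can be encoded so that a dashboard groups its metrics by rung instead of showing one undifferentiated list. The names `Rung`, `METRIC_RUNGS` and `group_by_rung` are illustrative assumptions, not part of the framework itself.

```python
from collections import defaultdict
from enum import IntEnum


class Rung(IntEnum):
    """The five rungs, ordered by closeness to business value."""
    ACTIVITY = 1
    ADOPTION = 2
    PRODUCTIVITY = 3
    VALUE = 4
    STRATEGIC = 5


# Mapping taken from the table above; extend it with your own metrics.
METRIC_RUNGS = {
    "Tokens consumed": Rung.ACTIVITY,
    "API calls made": Rung.ACTIVITY,
    "Models deployed": Rung.ACTIVITY,
    "Active users": Rung.ADOPTION,
    "Session frequency": Rung.ADOPTION,
    "Time saved per task": Rung.PRODUCTIVITY,
    "Tasks completed faster": Rung.PRODUCTIVITY,
    "Cost per successful outcome": Rung.VALUE,
    "Revenue attributed to AI": Rung.VALUE,
    "Quality improvement": Rung.VALUE,
    "New capability enabled": Rung.STRATEGIC,
    "Competitive position defended": Rung.STRATEGIC,
}


def group_by_rung(metric_names):
    """Return (metrics grouped by rung, metrics with no known rung)."""
    grouped = defaultdict(list)
    unclassified = []
    for name in metric_names:
        rung = METRIC_RUNGS.get(name)
        if rung is None:
            unclassified.append(name)
        else:
            grouped[rung].append(name)
    # Sort so the lowest rung comes first.
    return dict(sorted(grouped.items())), unclassified


grouped, unclassified = group_by_rung(
    ["Active users", "Tokens consumed", "Revenue attributed to AI", "Prompts written"]
)
for rung, names in grouped.items():
    print(f"Rung {rung.value} ({rung.name.title()}): {', '.join(names)}")
print("No rung assigned:", unclassified)
```

Any metric that ends up in the unclassified list is a prompt to ask which rung it belongs on before it goes on a dashboard.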

The lead metric: attribution coverage

If you can only track one metric across your AI portfolio, track attribution coverage. It is the percentage of your total AI cost that can be linked to a measured business outcome.

~95% (disputed)

MIT NANDA study (2025)

The MIT NANDA study reported that roughly 95% of AI spend cannot be attributed to a measured outcome. That is the value gap. Attribution coverage is the complement: the percentage you can attribute, or 100% minus the gap. The higher the coverage, the smaller the gap.
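
The calculation itself is simple. The sketch below assumes each initiative carries a cost and a yes/no flag for whether that cost is linked to a measured business outcome. The `Initiative` class, the field names and the portfolio figures are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    name: str
    cost: float                 # total AI cost for the period, in any one currency
    has_measured_outcome: bool  # True if the cost is linked to a measured outcome


def attribution_coverage(initiatives):
    """Percentage of total AI cost that is linked to a measured outcome."""
    total = sum(i.cost for i in initiatives)
    if total <= 0:
        raise ValueError("total AI cost must be positive")
    attributed = sum(i.cost for i in initiatives if i.has_measured_outcome)
    return 100.0 * attributed / total


# Hypothetical portfolio, for illustration only.
portfolio = [
    Initiative("Invoice processing", 50_000, True),
    Initiative("Internal chatbot", 150_000, False),
    Initiative("Code assistant licences", 200_000, False),
]

coverage = attribution_coverage(portfolio)
value_gap = 100.0 - coverage
print(f"Attribution coverage: {coverage:.1f}%")
print(f"Value gap: {value_gap:.1f}%")
```

A yes/no flag is the crudest possible model. Where only part of an initiative's cost can be attributed, a fractional share per initiative would be a natural refinement.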

Interpretation

Attribution coverage is a lead metric because it predicts value realisation. If you cannot attribute cost to outcome, you cannot prove value. If you cannot prove value, you cannot defend the spend. If you cannot defend the spend, the programme gets cut. Coverage is the early warning.

The diagnostic question

The distinction ladder gives you one diagnostic question to ask of any AI dashboard:

Which rung is each metric on, and why does it matter?

If the dashboard cannot answer that question, it is a vanity dashboard. It is showing activity and calling it value. The ladder forces the distinction.
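
One way to apply the question mechanically is to require every metric on a dashboard to declare its rung and a one-line reason it matters, then list the metrics that cannot. This is a minimal sketch under that assumption. `DashboardMetric` and `unanswered_metrics` are hypothetical names, and the example entries reuse metrics from earlier on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DashboardMetric:
    name: str
    rung: Optional[int]  # 1 (activity) to 5 (strategic impact), or None if unknown
    why_it_matters: str  # one-line rationale; empty if nobody can give one


def unanswered_metrics(dashboard):
    """Names of metrics that cannot answer: which rung, and why does it matter?"""
    failing = []
    for metric in dashboard:
        has_rung = metric.rung is not None and 1 <= metric.rung <= 5
        has_reason = bool(metric.why_it_matters.strip())
        if not (has_rung and has_reason):
            failing.append(metric.name)
    return failing


dashboard = [
    DashboardMetric("Tokens consumed", 1, "Confirms the system is in use"),
    DashboardMetric("Active users", None, ""),
    DashboardMetric("Cost per successful outcome", 4, "Links spend to a business result"),
]

print(unanswered_metrics(dashboard))
```

An empty result does not prove the dashboard is good. It only shows that each metric has been placed on the ladder and justified.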

One-page self-audit

Use this checklist to audit your AI measurement system. Answer honestly. The point is diagnosis, not judgement.

Measurement maturity checklist

  • We can list all AI initiatives and their current spend
  • We know which rung each metric sits on
  • We have at least one metric on rung 4 (value) for each major initiative
  • We can calculate attribution coverage across the portfolio
  • We track risk metrics alongside value metrics
  • We can explain why each metric matters to a non-technical executive
  • We have a process for retiring metrics that no longer matter
  • Our dashboards separate activity from value
  • We can answer: what percentage of AI spend is attributed to measured outcomes?
  • We review attribution coverage quarterly with leadership
  • We have a plan to improve coverage over the next 12 months
  • We can defend our measurement approach to an auditor or regulator
  • We know which initiatives have weak attribution and why
  • We have stopped at least one initiative because attribution was too weak

Scoring: 0-4 checks: measurement is ad hoc. 5-9 checks: measurement is emerging. 10-14 checks: measurement is systematic. The goal is not perfection. The goal is honest diagnosis and a plan to improve.
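
For teams that record the audit in a script or spreadsheet export, the scoring bands can be written as a small function. This is a sketch of the bands stated above. The function name `maturity_level` is an assumption.

```python
TOTAL_ITEMS = 14  # number of statements in the checklist


def maturity_level(checks: int) -> str:
    """Map the number of ticked checklist items to a maturity band."""
    if not 0 <= checks <= TOTAL_ITEMS:
        raise ValueError(f"checks must be between 0 and {TOTAL_ITEMS}")
    if checks <= 4:
        return "ad hoc"
    if checks <= 9:
        return "emerging"
    return "systematic"


print(maturity_level(7))
```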