
A field sequence, not a framework. Assumes a mid-to-large organisation with AI spend already running and nobody formally owning its economics.

Evidence

Everything here is doable with finance extracts, vendor portals and a spreadsheet; no tooling purchase required in month one.

The urgency is real.

Evidence

Uber spent its entire 2026 AI coding budget in four months, and the COO could not draw a line between usage and value.

Evidence

The Wall Street Journal reports enterprise-wide AI rationing as costs skyrocket.

Interpretation

The organisations that avoid this pattern will be those that built the value meter while the cost meter was still small.


Week 1: Find the money

Day 1-2. Name an owner.

Evidence

One person accountable for AI economics across the estate, with a mandate letter from the CFO or CIO. Not a committee.
The rest of this playbook fails without this step, because every later action needs someone empowered to ask awkward questions across budget lines.

Day 2-5. Build the spend inventory. Every AI cost, in one register, in four categories:

  1. Direct consumption - API and token spend, by provider and team
  2. Licensed seats - copilots, coding assistants, AI tools with per-seat pricing
  3. Embedded tiers - AI components inside SaaS contracts (ask vendors for the AI-attributable share in writing; log refusals as "opaque")
  4. Infrastructure - GPU, hosting, model platform costs inside the cloud bill
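
A minimal sketch of what such a register could look like in code rather than a spreadsheet. Only the four categories and the "opaque" convention come from the list above; the field names, the `summarise` helper and the idea of storing a monthly cost per line are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# The four categories from the inventory list; the identifiers are illustrative.
CATEGORIES = (
    "direct_consumption",
    "licensed_seats",
    "embedded_tiers",
    "infrastructure",
)


@dataclass
class RegisterLine:
    item: str                      # tool, contract or cost line
    category: str                  # one of CATEGORIES
    owner_team: str
    monthly_cost: Optional[float]  # None = vendor refused to state the AI share ("opaque")


def summarise(lines):
    """Return (total known monthly spend, spend per category, opaque item names)."""
    by_category = defaultdict(float)
    opaque = []
    for line in lines:
        if line.category not in CATEGORIES:
            raise ValueError(f"unknown category: {line.category}")
        if line.monthly_cost is None:
            opaque.append(line.item)  # logged, but excluded from the total
        else:
            by_category[line.category] += line.monthly_cost
    return sum(by_category.values()), dict(by_category), opaque
```

Keeping opaque lines in the register with no cost, rather than dropping them, means the published total is visibly a floor and the list of refusals is available for later contract conversations.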

Interpretation

Expect this to be harder than it sounds and revealing in itself: duplicate tools, departmental contracts finance has never categorised as AI, embedded tiers nobody chose.
The total is your first published number.

Evidence

Most organisations have never seen it.

Real-world pattern: In 2026, Marc Benioff said Salesforce expected to spend about $300 million on Anthropic tokens that year, largely for coding-related work.

Day 5. Pull the consumption telemetry. Provider dashboards, seat-activity reports, cloud cost tags.

Evidence

You are establishing what is measurable today, not building anything yet.

Week 2: Find the usage and the owners

Day 6-8. Map spend to teams and use cases. For each register line: who uses it, for what workflow, and who claimed it would deliver what. Where there was a business case, attach it.

Evidence

Where there wasn't, record "no business case" without ceremony - the gaps are findings, not accusations.

Day 8-10. Measure real adoption.

Evidence

Active users versus paid seats per tool. Repeat use, not log-ins.
Flag every contract where adoption is under 40% of seats - renewal leverage later.
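
The adoption check can be sketched in a few lines. The 40% threshold is the one stated above; the `SeatContract` fields and the choice to count "repeat users" as the numerator are assumptions about how a seat-activity report might be summarised.

```python
from dataclasses import dataclass

ADOPTION_FLAG_THRESHOLD = 0.40  # from the text: flag contracts under 40% of seats


@dataclass
class SeatContract:
    tool: str
    paid_seats: int
    repeat_users: int  # users with repeat use in the period, not mere log-ins


def adoption_rate(contract):
    """Share of paid seats showing repeat use."""
    if contract.paid_seats <= 0:
        raise ValueError("paid_seats must be positive")
    return contract.repeat_users / contract.paid_seats


def flag_low_adoption(contracts, threshold=ADOPTION_FLAG_THRESHOLD):
    """Names of tools whose adoption rate is strictly below the threshold."""
    return [c.tool for c in contracts if adoption_rate(c) < threshold]
```

What counts as "repeat use" (for example, activity in several separate weeks) still has to be defined per tool; the sketch only assumes that the definition has been applied before the numbers arrive.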

Adoption reality check: Microsoft reported that enterprise Copilot subscription cancellations reached significant levels in early 2026, with organisations citing low adoption and unclear value as primary reasons. The pattern: seats purchased on potential, cancelled on measured reality.

Day 10. Identify the overlap. Tools answering the same job (general assistant ×2, coding assistant ×2, embedded copilot doing both).

Interpretation

Don't consolidate yet; just establish the duplicate-spend number.
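
One possible convention for the duplicate-spend number, offered as an assumption rather than a standard: for each job, treat the highest-spend tool as the incumbent and count the spend on every other tool doing the same job as duplicate. The function name and the tuple layout are illustrative.

```python
from collections import defaultdict


def duplicate_spend(tools):
    """Estimate duplicate monthly spend.

    tools: iterable of (tool_name, job, monthly_cost).
    Assumed convention: within each job, all spend except the
    largest single tool counts as duplicate.
    """
    costs_by_job = defaultdict(list)
    for _name, job, cost in tools:
        costs_by_job[job].append(cost)
    return sum(
        sum(costs) - max(costs)
        for costs in costs_by_job.values()
        if len(costs) > 1  # a job served by one tool has no overlap
    )
```

Other conventions are defensible (for instance, counting everything except the cheapest tool); what matters at this stage is stating the rule next to the number.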

Week 3: Baselines and unit costs

Day 11-15. Capture baselines for the top three use cases by spend. Whatever pre-AI history exists - cycle times, tickets resolved per FTE, drafting turnaround - capture now, before it ages out of systems.

Evidence

For use cases already past rollout with no baseline, record that the productivity claim is untestable and decide whether a control-group comparison is still possible.

Interpretation

Without a pre-rollout baseline, "maybe implicitly there's more that is getting shipped" - the Uber COO's actual words - becomes the strongest claim available.
A company that measures everything about a ride could not measure this, because measurement was never designed in.

Day 15-18. Define one unit cost per major use case. Examples: cost per accepted code change; cost per resolved ticket (resolved, not deflected); cost per document processed, fully loaded with review time; cost per workflow completed for agentic processes.

Evidence

Imperfect definitions are fine; consistency matters more than precision in month one.

The hidden costs: At Uber, the visible API invoice ran $500-$2,000 per engineer per month. The hidden costs - review time of engineers checking agent-generated code, rework when agent-committed code needs correction, orchestration engineering, opportunity cost of attention shifting to prompt-wrangling - never appeared on the Anthropic invoice but belong in cost per unit of shipped work.
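
A hedged sketch of a fully loaded unit cost for the first example, cost per accepted code change. It folds review and rework time into the numerator alongside the invoice; orchestration engineering and opportunity cost are left out because they are harder to meter in month one. All parameter names are illustrative, and the hourly rate is whatever internal figure finance already uses.

```python
def cost_per_accepted_change(
    api_spend,         # invoice for the period
    review_hours,      # engineer time spent checking agent-generated code
    rework_hours,      # engineer time correcting agent-committed code
    hourly_rate,       # assumed loaded engineer cost per hour
    accepted_changes,  # changes merged and kept, not changes proposed
):
    """Invoice plus people time, divided by accepted changes."""
    if accepted_changes <= 0:
        raise ValueError("accepted_changes must be positive")
    loaded_cost = api_spend + (review_hours + rework_hours) * hourly_rate
    return loaded_cost / accepted_changes
```

With made-up figures of 1,000 in API spend, 10 review hours, 5 rework hours, a rate of 100 per hour and 50 accepted changes, the loaded cost is 2,500 and the unit cost is 50, against 20 if only the invoice were counted.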

Day 18-20. Compute attribution coverage, version one. Classify every register line: outcome-linked (a measured result exists), activity-linked (usage data only), unattributed (neither).

Evidence

Publish the three percentages. This is the headline metric the whole discipline improves over time, and version one is allowed to be embarrassing.

Interpretation

Attribution coverage is the lead metric because it measures the organisation's ability to connect AI spend to measured outcomes, which is precisely what the AI value gap describes.
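
The three percentages can be computed directly from the classified register. The text does not say whether the shares are by spend or by number of lines; the sketch below assumes share of spend, which weights large lines more heavily. The class identifiers and the function name are illustrative.

```python
CLASSES = ("outcome_linked", "activity_linked", "unattributed")


def attribution_coverage(lines):
    """Share of spend per attribution class, in percent.

    lines: iterable of (monthly_cost, classification).
    Assumption: coverage is weighted by spend, not by line count.
    """
    totals = {c: 0.0 for c in CLASSES}
    for cost, classification in lines:
        if classification not in totals:
            raise ValueError(f"unknown classification: {classification}")
        totals[classification] += cost
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("register has no spend to classify")
    return {c: round(100 * v / grand_total, 1) for c, v in totals.items()}
```

For example, lines of 2,000 outcome-linked, 5,000 activity-linked and 3,000 unattributed give 20%, 50% and 30%.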

Week 4: Forecast, controls, and the first decisions

Day 21-24. Build the first consumption forecast. Monthly, by category, ninety days out, with stated assumptions about adoption and agentic intensity.

Evidence

Set variance triggers: at 15% over forecast, the owner investigates; at 30%, spend approval tightens automatically.
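
The two triggers reduce to a small comparison. The 15% and 30% thresholds are those stated above; the function name and the returned labels are illustrative, and the sketch assumes the comparison is made per category per month.

```python
def variance_action(forecast, actual, investigate_at=0.15, tighten_at=0.30):
    """Map actual spend against forecast to one of three actions."""
    if forecast <= 0:
        raise ValueError("forecast must be positive")
    overrun = (actual - forecast) / forecast  # negative means under forecast
    if overrun >= tighten_at:
        return "tighten_approval"
    if overrun >= investigate_at:
        return "investigate"
    return "ok"
```

Only overruns trigger anything here; a large underspend may also deserve a look, since it can mean adoption has stalled, but that is outside the rule as stated.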

Interpretation

Agentic workflows multiply tokens per task - an agent iterating on a codebase resends growing context with every step, so the marginal task is more expensive than the average task.
This is why static annual budgets fail: per-token prices may fall, but agentic systems consume so many more tokens per task that total spend rises anyway.
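
The growth pattern is easy to see with arithmetic. The sketch below is a simplified model, not a description of any particular agent: it assumes every step resends the full context so far, that each step adds a fixed number of tokens, and that there is no prompt caching or context truncation. The function name and the figures are illustrative.

```python
def total_input_tokens(base_context, tokens_added_per_step, steps):
    """Input tokens across a run where each step resends the whole context.

    Step k (counting from 0) sends base_context + k * tokens_added_per_step.
    """
    return sum(base_context + k * tokens_added_per_step for k in range(steps))


if __name__ == "__main__":
    print(total_input_tokens(10_000, 2_000, 10))  # 190000
    print(total_input_tokens(10_000, 2_000, 20))  # 580000
```

Under these assumptions, doubling the number of steps roughly triples the input tokens, and the tenth step alone sends 28,000 tokens against a ten-step average of 19,000, which is the sense in which the marginal step costs more than the average one.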

Day 24-26. Set guardrails where consumption is unbounded.

Evidence

Per-task and per-run budgets on any autonomous agent, with automatic halt. Per-user anomaly alerts on coding tools (investigate the top decile before capping it - your best users and your wasteful ones both live there).
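
A per-run budget with automatic halt can be as simple as a counter that raises once the limit is crossed. This is a sketch under assumptions: the class and exception names are hypothetical, the cost of each step is assumed to be known only after the step completes, and wiring this into a real agent loop depends on the framework in use.

```python
class BudgetExceeded(RuntimeError):
    """Raised to halt an agent run that has spent past its limit."""


class RunBudget:
    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd):
        """Record a completed step's cost, then halt if the limit is crossed.

        Because the cost is recorded after the step, a run can overshoot
        the limit by at most one step's cost.
        """
        self.spent_usd += cost_usd
        if self.spent_usd > self.limit_usd:
            raise BudgetExceeded(
                f"run spent {self.spent_usd:.2f} against a limit of {self.limit_usd:.2f}"
            )
```

The agent loop would call `charge` after every model call and let the exception end the run; where a step's cost can be estimated in advance, checking before the call avoids the one-step overshoot.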

Evidence

A standing rule: no consumption-based leaderboards or usage targets, anywhere, ever.

The leaderboard trap: Uber ran internal leaderboards ranking AI coding tool usage competitively. Once usage is ranked, usage data stops telling you about productivity, which is exactly the data Uber then needed to justify the spend it had encouraged. Some unknown share of consumption became performative, buying status rather than value.

Day 26-28. Fix the procurement posture. Standard clauses from now on: AI-attributable price component stated in every SaaS contract; notice period for pricing-model changes (Cursor's June 2025 repricing is the cautionary precedent); usage data export rights; no auto-renewal of AI tiers with sub-40% adoption.

Day 28-30. Hold the first value review and publish the pack. One hour, CFO or delegate in the room. Contents: total AI spend and trend; adoption versus seats; duplicate-spend estimate; attribution coverage v1; top-three unit costs; forecast with triggers; three decisions (typically: one tool to consolidate, one use case to baseline properly, one contract to renegotiate).

Evidence

Schedule it monthly. The pack is the discipline; the meeting is just where it becomes visible.

What you have at day 30

A named owner. A total spend number. An adoption-versus-paid map. Baselines for the biggest use cases. Three unit costs. Attribution coverage, version one. A forecast with tripwires. Agent guardrails. Procurement clauses. A monthly review with decision authority.

Interpretation

This is not AI Value Management complete. This is AI Value Management started, with the minimum viable governance to prevent the Uber pattern while building the data foundation for everything that follows.

What this deliberately leaves for later

Evidence

Chargeback and showback models (month 2-3, once allocation data is trustworthy). Tooling selection for AI cost management (buy after you know your requirements, not before). Portfolio-level investment reprioritisation (needs two or three review cycles of data). Value realisation audits of pre-existing business cases (month 3, with the baseline question settled).

The three failure modes of month one

  1. The inventory stalls in pursuit of completeness.

    Evidence

    Ship the 80% version at day 5; the register is a living document.
  2. The exercise becomes a policing project.

    Interpretation

    The owner's posture is economist, not auditor - teams that fear the data will starve it.
  3. Measurement becomes the deliverable.

    Evidence

    The day-30 review must make three real decisions, or the organisation learns that AI Value Management produces packs rather than consequences.

Real-world precedents

When AI agents fail economics: Starbucks retired its inventory management AI agent after discovering the system's recommendations were being routinely overridden by store managers who understood local context the model missed. The agent consumed resources but delivered no measurable improvement over human judgment. The retirement decision came only after establishing baseline performance metrics.

When trivial tasks game the system: Amazon's internal AI usage reportedly included significant volumes of trivial-task gaming, where employees used AI tools for low-value work to meet adoption targets or demonstrate engagement. The pattern emerged only after usage telemetry was analysed by task type and outcome, not just volume.

Optimist

Thirty days is sufficient to establish visibility, ownership, and basic controls. The FinOps Foundation's guidance on AI value management emphasises that early action prevents later crisis. BCG's research on the AI value gap shows that organisations with early measurement discipline close the gap faster than those that wait for perfect data.

Sceptic

Thirty days produces theatre, not discipline. Real baselines require months of pre-rollout data collection. Unit cost definitions need cross-functional agreement that takes quarters, not weeks. Attribution coverage version one will be so incomplete as to be misleading. The risk is that organisations declare victory at day 30 and stop improving the system.

Synthesis

House view
The playbook is deliberately titled 'first 30 days', not 'complete AI value management'. The goal is minimum viable governance to prevent budget overruns and establish the data foundation. Perfection is the enemy of starting. The organisations that wait for perfect measurement systems are the ones reporting unbudgeted AI spend and cancelled subscriptions.

Where this fits in the broader discipline

Evidence

This 30-day sequence addresses the bottom two rungs of the five-level distinction ladder: activity (what is being used) and adoption (who is using it, how much).

Interpretation

Productivity measurement, value realisation, and strategic impact assessment require the data foundation this playbook builds, but they are month 2-6 work, not month 1.

Evidence

The FinOps Foundation's 2025 guidance on AI scopes emphasises that AI cost management must extend beyond infrastructure to include model and inference costs, data and pipeline costs, and governance costs.
This playbook's four-category spend inventory maps directly to that expanded scope.

Interpretation

The NIST AI Risk Management Framework treats economic sustainability as a governance concern, not just a finance concern.
A named owner with CFO or CIO mandate (day 1-2) positions AI value management as governance, which is where it belongs.


References and further reading

  1. BCG, The Widening AI Value Gap: Build for the Future, 2025
  2. BCG, From Potential to Profit: Closing the AI Impact Gap, 2024
  3. FinOps Foundation, FinOps for AI: Scopes and Capabilities, 2025
  4. AWS, Closing the AI Value Gap, 2024
  5. UK Government Digital Service, Microsoft 365 Copilot Experiment: Cross-Government Findings Report, 2024, https://www.gov.uk/government/publications/microsoft-365-copilot-experiment-cross-government-findings-report
  6. NIST, AI Risk Management Framework, 2023
  7. Fortune, "Uber COO: AI spending on tokens like Claude Code is hard to justify", 26 May 2026, https://fortune.com/2026/05/26/uber-coo-ai-spending-tokens-claude-code/
  8. Wall Street Journal, "Corporate America Is Starting to Ration AI as Cost Skyrockets", May 2026