Key takeaways
- Most AI business cases price only the model invoice, missing 60-80% of the true operating cost in integration, governance, review time, and infrastructure.
- The seven-sheet TCO worksheet maps directly to the canonical seven-category AI cost taxonomy used by FinOps Foundation and enterprise practitioners.
- A finance analyst and platform engineer can complete the worksheet together in an afternoon per use case, producing three numbers a CFO can repeat: fully loaded unit cost, naive-versus-loaded gap, and attribution status.
- The pilot-to-production bridge sheet addresses the most common business case failure: assuming pilot economics scale linearly to production.
Why this matters
The hidden cost multiplier: At Uber, the visible API invoice for AI coding tools ran $500-$2,000 per engineer per month. The hidden costs - review time of engineers checking agent-generated code, rework when agent-committed code needs correction, orchestration engineering, opportunity cost of attention shifting to prompt-wrangling - never appeared on the Anthropic invoice but belong in cost per unit of shipped work.
The seven-sheet structure
Sheet 1: Use case header
| Field | Notes |
|---|---|
| Use case name and owner | One named person accountable for economics |
| Workflow description | What the AI does, who consumes the output |
| Stage | Pilot / scaling / production |
| Business case reference | Link, or "none" recorded as a finding |
| Baseline captured? | Y/N, date, where stored |
| Unit of work | The denominator everything divides by: ticket resolved, change merged, document processed, workflow completed |
Sheet 2: Direct AI costs (monthly)
Model and API consumption:
- Input tokens, output tokens, blended £/unit by provider
- Agentic multiplier: tokens per task for agent-driven work versus interactive work (tracked separately; this is where forecasts break)
Seat licences and SaaS tiers:
- Tool licences for this workflow
- Embedded SaaS AI tier allocation (the AI-attributable share of the relevant SaaS contract, even if estimated)
Training and hosting:
- Fine-tuning or training runs, amortised over expected life
- GPU, hosting, vector store, orchestration platform share
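The Sheet 2 lines reduce to two small calculations: pricing tokens at a blended rate and allocating a share of a shared contract. The following is a minimal sketch, not a prescribed implementation. The function names are hypothetical, and the inputs are the figures from the customer service worked example later in this article.

```python
from decimal import Decimal


def token_cost(input_tokens: int, output_tokens: int, rate_per_1k: Decimal) -> Decimal:
    """Blended model cost: all tokens priced at one rate per 1,000 tokens."""
    return Decimal(input_tokens + output_tokens) / Decimal(1000) * rate_per_1k


def allocated_share(contract_cost: Decimal, share: Decimal) -> Decimal:
    """Portion of a shared platform or SaaS contract attributed to this use case."""
    return contract_cost * share


# Inputs taken from the worked example; the blended rate is a simplification,
# since providers usually price input and output tokens differently.
model = token_cost(2_500_000, 800_000, Decimal("0.015"))
platform = allocated_share(Decimal("2400"), Decimal("0.15"))
vector_store = Decimal("180")

sheet2_total = model + platform + vector_store
print(f"Model: £{model:.2f}, platform share: £{platform:.2f}")
print(f"Sheet 2 monthly total: £{sheet2_total:.2f}")
```

Using `Decimal` rather than floats keeps currency arithmetic exact, which matters once these lines are summed and divided by volume on Sheet 4.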
Sheet 3: The costs business cases forget
Human review and rework:
- Minutes per output × volume × loaded rate of the reviewer
- For coding tools, review of AI-generated changes; for service AI, QA sampling and escalation handling
- Share of AI outputs corrected or redone, × cost of correction
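The two formulas above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the review inputs come from the worked example later in this article (2,400 tickets, 12% reviewed, 8 minutes each, £45/hour loaded rate), and the rework inputs are invented purely to show the shape of the calculation.

```python
from decimal import Decimal


def review_cost(volume: int, review_share: Decimal,
                minutes_each: Decimal, hourly_rate: Decimal) -> Decimal:
    """Monthly review cost: reviewed outputs x minutes per review x loaded rate."""
    hours = Decimal(volume) * review_share * minutes_each / Decimal(60)
    return hours * hourly_rate


def rework_cost(volume: int, corrected_share: Decimal,
                cost_per_correction: Decimal) -> Decimal:
    """Monthly rework cost: share of outputs corrected x cost of each correction."""
    return Decimal(volume) * corrected_share * cost_per_correction


# Review inputs from the worked example.
review = review_cost(2400, Decimal("0.12"), Decimal("8"), Decimal("45"))

# Hypothetical rework inputs: 1,000 outputs, 5% corrected, £10 per correction.
rework = rework_cost(1000, Decimal("0.05"), Decimal("10"))

print(f"Review: £{review:.2f}/month, rework: £{rework:.2f}/month")
```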
When review costs exceed model costs: Starbucks retired its inventory management AI agent after discovering the system's recommendations were being routinely overridden by store managers who understood local context the model missed. The agent consumed resources but delivered no measurable improvement over human judgement. The retirement decision came only after establishing baseline performance metrics and full-stack cost accounting, including the manager review time that exceeded the model API cost.
Quality gates and governance:
- Senior sign-off, compliance review steps that didn't exist pre-AI
- Security and compliance assessments (one-off and recurring)
- Evaluation harness and monitoring: building and running the thing that tells you quality held
Integration and enabling infrastructure:
- Integration and pipeline build: one-off, amortised over an assumed life (state the assumption)
- Data preparation and retrieval infrastructure: the enabling layer - cleaning, permissions, indexing
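Because the monthly charge for a one-off build depends entirely on the assumed life, it helps to show the figure under more than one assumption. The sketch below uses straight-line amortisation, which is an assumption here (the worksheet only says "amortised"), and a hypothetical function name. The £24,000 build cost matches the worked example later in this article.

```python
from decimal import Decimal


def monthly_amortisation(one_off_cost: Decimal, assumed_life_months: int) -> Decimal:
    """Straight-line monthly charge for a one-off cost over its assumed life."""
    if assumed_life_months <= 0:
        raise ValueError("assumed life must be at least one month")
    return one_off_cost / Decimal(assumed_life_months)


build_cost = Decimal("24000")

# Show how sensitive the monthly line is to the stated life assumption.
for life in (12, 24, 36):
    charge = monthly_amortisation(build_cost, life)
    print(f"{life}-month life: £{charge:.2f}/month")
```

Halving the assumed life doubles the monthly line, which is why the assumption belongs in the cell note next to the number.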
Organisational costs:
- Change management and training
- Vendor management overhead (meaningful once tool count grows)
Failure-state consumption:
- For agentic workflows, tokens burned by retries, loops and abandoned runs - tracked as its own line, not blended into the average
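One way to keep failure-state consumption as its own line is to split token totals by run outcome before any averaging. The sketch below assumes run telemetry is available with a status label per run. The `AgentRun` record, the status labels, and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRun:
    run_id: str
    status: str  # hypothetical labels: "completed", "retried", "looped", "abandoned"
    tokens: int


# Statuses treated as failure-state consumption (an assumption for this sketch).
FAILURE_STATES = {"retried", "looped", "abandoned"}


def split_consumption(runs):
    """Return (productive_tokens, failure_tokens_by_status)."""
    productive = 0
    failures = defaultdict(int)
    for run in runs:
        if run.status in FAILURE_STATES:
            failures[run.status] += run.tokens
        else:
            productive += run.tokens
    return productive, dict(failures)


# Invented sample data for illustration only.
runs = [
    AgentRun("r1", "completed", 12_000),
    AgentRun("r2", "retried", 4_000),
    AgentRun("r3", "abandoned", 9_000),
    AgentRun("r4", "completed", 11_000),
    AgentRun("r5", "looped", 30_000),
]

productive, failures = split_consumption(runs)
failure_total = sum(failures.values())
print(f"Productive tokens: {productive}")
print(f"Failure-state tokens: {failure_total} ({failures})")
```

In this invented sample the failure-state line is larger than the productive line, a pattern that a blended per-task average would hide.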
Sheet 4: Unit economics
Calculations:
- Fully loaded monthly cost (Sheets 2+3) ÷ units of work = fully loaded unit cost
- Same calculation with Sheet 2 only = naive unit cost
- The gap between the two numbers is itself a finding worth reporting
Comparators:
- Pre-AI unit cost from the baseline, or the human-only alternative at its full loaded cost
- Threshold: the unit cost at which this use case is agreed to be working (set in advance, signed by finance and the use-case owner)
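The Sheet 4 calculations can be sketched as a single function. This is an illustration, not the worksheet's own formula layout. The function name and the £4.00 threshold are hypothetical, and the cost and volume inputs are the totals from the worked example later in this article. The gap is computed from unrounded unit costs so that rounding does not distort the multiple.

```python
from decimal import Decimal, ROUND_HALF_UP

PENCE = Decimal("0.01")


def unit_economics(direct: Decimal, hidden: Decimal, units: int, threshold: Decimal) -> dict:
    """Fully loaded and naive unit cost, the gap between them, and a threshold check."""
    if units <= 0:
        raise ValueError("units of work must be positive")
    loaded = (direct + hidden) / units  # Sheets 2 + 3
    naive = direct / units              # Sheet 2 only
    return {
        "fully_loaded_unit_cost": loaded.quantize(PENCE, ROUND_HALF_UP),
        "naive_unit_cost": naive.quantize(PENCE, ROUND_HALF_UP),
        "gap_multiple": (loaded / naive).quantize(Decimal("0.1"), ROUND_HALF_UP),
        "within_threshold": loaded <= threshold,
    }


# Totals from the worked example; the threshold is a hypothetical agreed value.
result = unit_economics(
    direct=Decimal("589.50"),
    hidden=Decimal("5878"),
    units=2400,
    threshold=Decimal("4.00"),
)
for name, value in result.items():
    print(f"{name}: {value}")
```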
Sheet 5: Pilot-to-production bridge
For each assumption, pilot value → production value → reason it changes:
Volume and case mix:
- Volume (and whether case mix changes with it - the hard 20% usually arrives at scale)
Consumption patterns:
- Tokens per unit (retrieval depth, context growth, agentic intensity)
- Review rate (pilots skip the quality gate production requires)
Edge cases and build costs:
- Edge-case share and cost
- One-off build costs not incurred in pilot
Output:
- Production fully loaded unit cost versus pilot unit cost, as a multiple. If the multiple is unknown, the scaling decision is being made on fiction.
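The bridge is a table of "pilot value → production value → reason" rows plus one output multiple. The following sketch shows one possible shape. The `BridgeRow` record and function name are hypothetical, the two sample rows reuse values from the worked example later in this article, and the unit costs are invented for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class BridgeRow:
    assumption: str
    pilot: str
    production: str
    reason: str


def production_multiple(pilot_unit_cost: Optional[Decimal],
                        production_unit_cost: Optional[Decimal]) -> Optional[Decimal]:
    """Production unit cost as a multiple of pilot unit cost, or None if unknown."""
    if pilot_unit_cost is None or production_unit_cost is None or pilot_unit_cost == 0:
        return None
    return production_unit_cost / pilot_unit_cost


bridge = [
    BridgeRow("Volume", "200 tickets/month", "2,400 tickets/month", "full rollout"),
    BridgeRow("Review rate", "5%", "12%", "production quality gate and harder case mix"),
]

# Hypothetical unit costs, chosen only to illustrate the calculation.
multiple = production_multiple(Decimal("1.50"), Decimal("3.60"))
unknown = production_multiple(None, Decimal("3.60"))

for row in bridge:
    print(f"{row.assumption}: {row.pilot} -> {row.production} ({row.reason})")
print(f"Production multiple: {multiple}x")
print("Multiple unknown" if unknown is None else f"{unknown}x")
```

Returning `None` rather than a default value makes an unknown multiple visible, so it cannot be mistaken for a multiple of one.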
Sheet 6: Forecast and variance
Forecast structure:
- 12-month consumption forecast by line, with stated adoption and intensity assumptions
- Actual versus forecast, monthly, with variance triggers
- Suggested triggers: investigate at 15%, restrict at 30%
Sensitivity analysis:
- Vendor price change ±25% (see Cursor, June 2025)
- Agentic intensity ×2
- Adoption ±20%
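The triggers and sensitivity cases above can be sketched as two small functions. Several choices here are assumptions rather than worksheet rules: only overruns (not underspend) trip a trigger, thresholds are inclusive, and both the vendor price change and the agentic intensity scenario are applied to the model token line only. The function names and cost lines are hypothetical.

```python
from decimal import Decimal

INVESTIGATE_AT = Decimal("0.15")  # suggested trigger: investigate at 15% over forecast
RESTRICT_AT = Decimal("0.30")     # suggested trigger: restrict at 30% over forecast


def variance_status(forecast: Decimal, actual: Decimal) -> str:
    """Classify a month's overrun against forecast."""
    if forecast <= 0:
        raise ValueError("forecast must be positive")
    overrun = (actual - forecast) / forecast
    if overrun >= RESTRICT_AT:
        return "restrict"
    if overrun >= INVESTIGATE_AT:
        return "investigate"
    return "ok"


def scenario_total(lines: dict, multipliers: dict) -> Decimal:
    """Re-total monthly cost lines with per-line multipliers applied."""
    return sum(
        (amount * multipliers.get(name, Decimal("1")) for name, amount in lines.items()),
        Decimal("0"),
    )


# Hypothetical monthly cost lines, for illustration only.
lines = {
    "model_tokens": Decimal("1000"),
    "platform": Decimal("500"),
    "review": Decimal("2000"),
}

base = scenario_total(lines, {})
price_up = scenario_total(lines, {"model_tokens": Decimal("1.25")})
agentic_x2 = scenario_total(lines, {"model_tokens": Decimal("2")})

print(f"Base: £{base:.2f}, vendor +25%: £{price_up:.2f}, agentic x2: £{agentic_x2:.2f}")
print(variance_status(base, Decimal("4100")))
```

In practice higher agentic intensity may also raise review and failure-state lines, so the multipliers dictionary should reflect whichever lines the team believes move together.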
Sheet 7: Value side (paired, so the sheet never ships cost-only)
Value tracking:
- The measured outcome metric for this use case and its current value versus baseline
- Value realised versus business case, quarterly
- Attribution status: outcome-linked / activity-linked / unattributed
Net position:
- Value evidenced minus fully loaded cost, with confidence noted
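A minimal sketch of the net position record follows, keeping attribution status and confidence attached to the number so that it never travels without them. The class names, field names, and figures are hypothetical. One assumption is worth stating: `value_evidenced` is the gross value, not a saving that has already had AI cost deducted, otherwise the cost would be subtracted twice.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Attribution(Enum):
    OUTCOME_LINKED = "outcome-linked"
    ACTIVITY_LINKED = "activity-linked"
    UNATTRIBUTED = "unattributed"


@dataclass(frozen=True)
class NetPosition:
    value_evidenced: Decimal    # gross monthly value, before AI cost
    fully_loaded_cost: Decimal  # Sheets 2 + 3 monthly total
    attribution: Attribution
    confidence: str             # free-text note, e.g. "high", "medium", "low"

    @property
    def net(self) -> Decimal:
        """Value evidenced minus fully loaded cost."""
        return self.value_evidenced - self.fully_loaded_cost


# Hypothetical figures for illustration only.
position = NetPosition(
    value_evidenced=Decimal("10000"),
    fully_loaded_cost=Decimal("4000"),
    attribution=Attribution.OUTCOME_LINKED,
    confidence="medium",
)

print(f"Net: £{position.net:.2f}/month "
      f"({position.attribution.value}, {position.confidence} confidence)")
```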
Worked example: Customer service AI agent
Use case header (Sheet 1)
- Use case: Customer service AI agent for tier-1 support
- Owner: Head of Customer Operations
- Workflow: Agent handles incoming support tickets, escalates complex cases to human agents
- Stage: Scaling (pilot complete, moving to full production)
- Baseline: Pre-AI resolution time 4.2 hours, cost per ticket £18.50
- Unit of work: Ticket resolved
Direct AI costs (Sheet 2) - Monthly
- Model consumption: 2.5M input tokens, 800K output tokens at £0.015/1K tokens = £49.50
- Agentic multiplier: 3.2× tokens per ticket versus interactive chat (agent retrieves context, checks knowledge base, formats response)
- Orchestration platform: £2,400/month (shared across 8 use cases, allocated 15% = £360)
- Vector database: £180/month
- Total Sheet 2: £589.50/month
Hidden costs (Sheet 3) - Monthly
- Human review: 12% of tickets reviewed, 8 minutes per review, 2,400 tickets/month × 12% × 8 min × £45/hour loaded rate = £1,728
- Rework: 4% of tickets require human correction, 15 minutes per correction = £270
- Quality gate: Weekly QA sampling, 4 hours/week × £55/hour = £880
- Integration build: £24,000 one-off, amortised over 24 months = £1,000/month
- Data preparation: Ongoing knowledge base curation, 20 hours/month × £50/hour = £1,000
- Monitoring and evaluation: £400/month
- Change management: Training and adoption support, 10 hours/month × £60/hour = £600
- Total Sheet 3: £5,878/month
Unit economics (Sheet 4)
- Volume: 2,400 tickets/month
- Fully loaded unit cost: (£589.50 + £5,878) ÷ 2,400 = £2.69 per ticket
- Naive unit cost: £589.50 ÷ 2,400 = £0.25 per ticket
- Gap: 11.0× (£6,467.50 ÷ £589.50; the naive cost understates the true cost by a factor of 11)
- Pre-AI baseline: £18.50 per ticket
- Savings: £15.81 per ticket (85% reduction)
Pilot-to-production bridge (Sheet 5)
- Volume: Pilot 200 tickets/month → Production 2,400 tickets/month (12× increase)
- Case mix: Pilot handled curated easy cases → Production includes full case mix (hard 20% drives review rate from 5% to 12%)
- Review rate: Pilot 5% → Production 12% (quality gate requirement)
- Tokens per ticket: Pilot 2.1× → Production 3.2× (deeper context retrieval at scale)
- Production multiplier: Fully loaded unit cost increased 2.4× from pilot to production
Forecast and variance (Sheet 6)
- 12-month forecast: £77,616 (assumes 2,400 tickets/month steady state)
- Sensitivity: Vendor price +25% = £84,993; agentic intensity ×2 = £91,344
- Variance trigger: Investigate if monthly cost exceeds £7,438 (15% over the £6,468 monthly forecast)
Value side (Sheet 7)
- Outcome metric: Customer satisfaction score (CSAT)
- Baseline CSAT: 3.8/5.0
- Current CSAT: 4.1/5.0 (8% improvement)
- Resolution time: 4.2 hours → 0.8 hours (81% improvement)
- Attribution: Outcome-linked (measured before/after with control group)
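- Net position: £44,400/month pre-AI cost avoided (£18.50 × 2,400 tickets) minus £6,468 fully loaded cost = £37,932/month net value (high confidence); the £15.81 per-ticket saving is already net of AI cost, so the cost is subtracted only once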
Design rules for implementation
1. Label every input cell with provenance
2. Produce three numbers a CFO can repeat
- Fully loaded unit cost
- Naive-versus-loaded gap
- Attribution status
3. Ship with a worked example
Mapping to the AI TCO Framework
Seven-category taxonomy mapping:
- Model and inference → Sheet 2 (model consumption, API calls)
- Data and pipeline → Sheet 3 (data preparation, retrieval infrastructure)
- Compute and infrastructure → Sheet 2 (GPU, hosting, vector store)
- Tooling and platform → Sheet 2 (orchestration platform, development tools)
- Integration and orchestration → Sheet 3 (integration build, pipeline work)
- Governance, risk and compliance → Sheet 3 (quality gates, security assessments, monitoring)
- Change, adoption and human → Sheet 3 (review time, rework, training, change management)
Lifecycle stage mapping:
- Prepare: Sheet 1 (baseline capture, unit of work definition)
- Build: Sheet 3 (integration build, data preparation, one-off costs)
- Run: Sheets 2, 3, 4, 6 (operating costs, unit economics, variance tracking)
- Scale/Retire: Sheet 5 (pilot-to-production bridge), Sheet 7 (value validation for continuation decisions)
- Govern (cross-cutting): Sheet 3 (governance costs), Sheet 7 (attribution and value evidence)
What to do next
For finance leaders:
- Start with one high-visibility use case and complete the worksheet with the platform team.
- Use the naive-versus-loaded gap as the lead finding in the next AI investment review.
- Establish the worksheet as the standard template for all AI business cases going forward.
For platform and engineering leaders:
- Instrument the top three AI workflows so Sheet 2 and Sheet 6 can be populated from telemetry rather than estimates.
- Work with finance to establish the loaded rates for review time and rework that populate Sheet 3.
- Build the pilot-to-production bridge (Sheet 5) into every scaling decision.
For operating-model leaders:
- Use the worksheet as the economic foundation for AI governance, not a finance-only exercise.
- Track attribution coverage (Sheet 7) as the lead metric for AI value management maturity.
- Publish completed worksheets internally to build shared understanding of what AI really costs to run well.
References and further reading
FinOps Foundation, FinOps for AI: Scopes and Capabilities, 2025
Canonical guidance on AI cost management and value measurement
BCG, The Widening AI Value Gap: Build for the Future, 2025
Research on why AI value gaps persist and how to close them
AWS, Closing the AI Value Gap, 2024
Practical guidance on measuring and improving AI value realisation
AI TCO Framework
The seven-category framework this worksheet implements
Enterprise AI Cost Basics
Foundational primer on where enterprise AI costs accumulate