AI TCO Worksheet: The Seven-Sheet Model

Figure: The seven sheets, and the three numbers a CFO can repeat (worksheet map with a panel showing naive cost £0.25 versus fully loaded £2.69 per ticket, a 10.8× gap).

Key takeaways

Evidence: Most AI business cases price only the model invoice, missing 60-80% of the true operating cost in integration, governance, review time, and infrastructure.
Evidence: The seven-sheet TCO worksheet maps directly to the canonical seven-layer AI TCO Framework used throughout this site.
Interpretation: A finance analyst and platform engineer can complete the worksheet together in an afternoon per use case, producing three numbers a CFO can repeat: fully loaded unit cost, naive-versus-loaded gap, and attribution status.
Evidence: The pilot-to-production bridge sheet addresses the most common business case failure: assuming pilot economics scale linearly to production.

Why this matters

The hidden cost multiplier: At Uber, the visible API invoice for AI coding tools ran $500-$2,000 per engineer per month. The hidden costs - review time of engineers checking agent-generated code, rework when agent-committed code needs correction, orchestration engineering, opportunity cost of attention shifting to prompt-wrangling - never appeared on the Anthropic invoice but belong in cost per unit of shipped work.

The seven-sheet structure

Sheet 1: Use case header

| Field | Notes |
|---|---|
| Use case name and owner | One named person accountable for economics |
| Workflow description | What the AI does, who consumes the output |
| Stage | Pilot / scaling / production |
| Business case reference | Link, or "none" recorded as a finding |
| Baseline captured? | Y/N, date, where stored |
| Unit of work | The denominator everything divides by: ticket resolved, change merged, document processed, workflow completed |
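Sheet 1 can also be held as a typed record, so that missing fields surface as findings rather than blank cells. The following is a minimal Python sketch. The class, field and method names are hypothetical and not part of any published template, and the example values are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    PILOT = "pilot"
    SCALING = "scaling"
    PRODUCTION = "production"


@dataclass
class UseCaseHeader:
    """Sheet 1 fields. Names are illustrative, not a standard schema."""
    name: str
    owner: str                            # one named, accountable person
    workflow: str                         # what the AI does, who consumes the output
    stage: Stage
    business_case_ref: Optional[str]      # None means "none", recorded as a finding
    baseline_captured_on: Optional[date]
    baseline_location: Optional[str]
    unit_of_work: str                     # the denominator for every later sheet

    def findings(self) -> List[str]:
        """Return gaps to report rather than leave as blank cells."""
        issues = []
        if not self.owner.strip():
            issues.append("no accountable owner")
        if self.business_case_ref is None:
            issues.append("no business case on record")
        if self.baseline_captured_on is None or self.baseline_location is None:
            issues.append("baseline not captured")
        if not self.unit_of_work.strip():
            issues.append("unit of work undefined")
        return issues


# Hypothetical example: baseline captured, but no business case exists.
header = UseCaseHeader(
    name="Tier-1 support agent",
    owner="Head of Customer Operations",
    workflow="Agent resolves tier-1 tickets; escalations go to human agents",
    stage=Stage.SCALING,
    business_case_ref=None,
    baseline_captured_on=date(2025, 1, 31),
    baseline_location="finance shared drive",
    unit_of_work="ticket resolved",
)
print(header.findings())  # ['no business case on record']
```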

Sheet 2: Direct AI costs (monthly)

Model and API consumption:

Input tokens, output tokens, blended £/unit by provider
Agentic multiplier: tokens per task for agent-driven work versus interactive work (tracked separately; this is where forecasts break)

Seat licences and SaaS tiers:

Tool licences for this workflow
Embedded SaaS AI tier allocation (the AI-attributable share of the relevant SaaS contract, even if estimated)

Training and hosting:

Fine-tuning or training runs, amortised over expected life
GPU, hosting, vector store, orchestration platform share
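The two consumption lines can be computed from usage logs. This is a minimal sketch. It assumes a single blended £/1K-token rate for input and output, as Sheet 2 does, and it assumes agent-driven and interactive usage are already logged separately. `TokenUsage` and the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    input_tokens: int
    output_tokens: int
    tasks: int  # units of work these tokens were spent on

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens

    def per_task(self) -> float:
        return self.total / self.tasks


def blended_cost(usage: TokenUsage, gbp_per_1k: float) -> float:
    """Monthly model cost at one blended £/1K-token rate."""
    return usage.total / 1000 * gbp_per_1k


def agentic_multiplier(agentic: TokenUsage, interactive: TokenUsage) -> float:
    """Tokens per task for agent-driven work relative to interactive work."""
    return agentic.per_task() / interactive.per_task()


# Hypothetical month: agent-driven and interactive usage tracked separately.
agentic = TokenUsage(input_tokens=2_500_000, output_tokens=800_000, tasks=2_400)
interactive = TokenUsage(input_tokens=400_000, output_tokens=150_000, tasks=1_000)

print(round(blended_cost(agentic, 0.015), 2))    # 49.5
print(agentic_multiplier(agentic, interactive))  # 2.5
```

The multiplier is kept as its own number for a reason. A forecast built on interactive tokens per task would understate an agent workload by exactly that factor.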

Sheet 3: The costs business cases forget

Human review and rework:

Minutes per output × volume × loaded rate of the reviewer
For coding tools, review of AI-generated changes; for service AI, QA sampling and escalation handling
Share of AI outputs corrected or redone, × cost of correction
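The review and rework lines reduce to two small formulas. This sketch makes two assumptions. First, review may be sampled: a `review_share` of outputs is reviewed, defaulting to all of them. Second, the cost of one correction is known. The function names and figures are illustrative.

```python
def review_cost(volume: int, minutes_per_output: float, loaded_rate_per_hour: float,
                review_share: float = 1.0) -> float:
    """Minutes per output x reviewed volume x the reviewer's loaded hourly rate."""
    reviewed = volume * review_share
    return reviewed * minutes_per_output / 60 * loaded_rate_per_hour


def rework_cost(volume: int, rework_share: float, cost_per_correction: float) -> float:
    """Share of outputs corrected or redone x the cost of one correction."""
    return volume * rework_share * cost_per_correction


# Hypothetical month: 1,000 outputs, 10% sampled for review, 5% corrected.
print(round(review_cost(1_000, 6, 40, review_share=0.10), 2))                   # 400.0
print(round(rework_cost(1_000, rework_share=0.05, cost_per_correction=20), 2))  # 1000.0
```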

When review costs exceed model costs: Starbucks retired its inventory management AI agent after discovering the system's recommendations were being routinely overridden by store managers who understood local context the model missed. The agent consumed resources but delivered no measurable improvement over human judgement. The retirement decision came only after establishing baseline performance metrics and full-stack cost accounting, including the manager review time that exceeded the model API cost.

Quality gates and governance:

Senior sign-off, compliance review steps that didn't exist pre-AI
Security and compliance assessments (one-off and recurring)
Evaluation harness and monitoring: building and running the thing that tells you quality held

Integration and enabling infrastructure:

Integration and pipeline build: one-off, amortised over an assumed life (state the assumption)
Data preparation and retrieval infrastructure: the enabling layer - cleaning, permissions, indexing
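One-off build costs enter the monthly view only through an amortisation assumption, which is why the sheet asks for that assumption to be stated. The following is a minimal sketch using straight-line amortisation. That method is an assumption, since the worksheet does not prescribe one. The input is the £24,000 integration build from the worked example below.

```python
def amortised_monthly(one_off_cost: float, assumed_life_months: int) -> float:
    """Straight-line amortisation of a one-off cost over an assumed life."""
    if assumed_life_months <= 0:
        raise ValueError("assumed life must be a positive number of months")
    return one_off_cost / assumed_life_months


# The assumed life moves the monthly figure substantially, so report it with the result.
for life in (12, 24, 36):
    print(f"{life} months: £{amortised_monthly(24_000, life):,.2f}/month")
# 12 months: £2,000.00/month
# 24 months: £1,000.00/month
# 36 months: £666.67/month
```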

Organisational costs:

Change management and training
Vendor management overhead (meaningful once tool count grows)

Failure-state consumption:

For agentic workflows, tokens burned by retries, loops and abandoned runs - tracked as its own line, not blended into the average
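Keeping failure-state tokens on their own line requires tagging each agent run with how it ended. This sketch works under that assumption. The `Run` record and its outcome labels are hypothetical, not fields of any particular agent framework.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Run:
    tokens: int
    outcome: str  # hypothetical labels: "completed", "retry", "loop", "abandoned"


def split_consumption(runs: Iterable[Run]) -> Tuple[int, int, int]:
    """Return (tokens on completed runs, tokens on failure states, completed count)."""
    completed_tokens = failed_tokens = completed = 0
    for run in runs:
        if run.outcome == "completed":
            completed_tokens += run.tokens
            completed += 1
        else:
            failed_tokens += run.tokens
    return completed_tokens, failed_tokens, completed


runs = [Run(1_000, "completed"), Run(1_200, "completed"), Run(800, "completed"),
        Run(900, "retry"), Run(2_100, "abandoned")]
ok_tokens, failure_tokens, done = split_consumption(runs)
print(ok_tokens / done)                     # 1000.0 tokens per completed task
print((ok_tokens + failure_tokens) / done)  # 2000.0 once failure states are included
print(failure_tokens)                       # 3000, reported as its own line
```

A blended average would report 2,000 tokens per task. It would hide the fact that half of this hypothetical month's consumption produced no completed work.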

Sheet 4: Unit economics

Calculations:

Fully loaded monthly cost (Sheets 2+3) ÷ units of work = fully loaded unit cost
Same calculation with Sheet 2 only = naive unit cost
The gap between the two numbers is itself a finding worth reporting

Comparators:

Pre-AI unit cost from the baseline, or the human-only alternative at its full loaded cost
Threshold: the unit cost at which this use case is agreed to be working (set in advance, signed by finance and the use-case owner)
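Sheet 4's calculations and comparators fit in one function. This sketch uses hypothetical inputs. The dictionary keys are illustrative choices, not a fixed definition. So is the rule that a use case is "working" when its fully loaded unit cost is at or below the pre-agreed threshold.

```python
def unit_economics(sheet2_monthly: float, sheet3_monthly: float, units: int,
                   baseline_unit_cost: float, threshold_unit_cost: float) -> dict:
    """Naive and fully loaded unit cost, the gap between them, and the comparators."""
    if units <= 0 or sheet2_monthly <= 0:
        raise ValueError("need positive volume and positive direct costs")
    loaded = (sheet2_monthly + sheet3_monthly) / units  # Sheets 2+3
    naive = sheet2_monthly / units                      # Sheet 2 only
    return {
        "fully_loaded_unit_cost": loaded,
        "naive_unit_cost": naive,
        "gap": loaded / naive,  # report this as a finding
        "saving_vs_baseline": baseline_unit_cost - loaded,
        "working": loaded <= threshold_unit_cost,
    }


# Hypothetical use case: £1,000 direct, £4,000 hidden, 1,000 units of work.
result = unit_economics(1_000, 4_000, 1_000, baseline_unit_cost=10.0, threshold_unit_cost=6.0)
print(result)
# {'fully_loaded_unit_cost': 5.0, 'naive_unit_cost': 1.0, 'gap': 5.0,
#  'saving_vs_baseline': 5.0, 'working': True}
```

The units cancel in the gap, which reduces to fully loaded monthly cost divided by Sheet 2 cost. The gap therefore does not change with volume, while the unit costs themselves do.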

Sheet 5: Pilot-to-production bridge

For each assumption, pilot value → production value → reason it changes:

Volume and case mix:

Volume (and whether case mix changes with it - the hard 20% usually arrives at scale)

Consumption patterns:

Tokens per unit (retrieval depth, context growth, agentic intensity)
Review rate (pilots skip the quality gate production requires)

Edge cases and build costs:

Edge-case share and cost
One-off build costs not incurred in pilot

Output:

Production fully loaded unit cost versus pilot unit cost, as a multiple. If the multiple is unknown, the scaling decision is being made on fiction.
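The bridge is a table of pilot and production values for each assumption, plus one headline multiple. In this sketch, `BridgeLine` and `unit_cost_multiple` are hypothetical names. The volume, review-rate and tokens-per-ticket rows reuse the pilot and production values from the worked example below.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BridgeLine:
    assumption: str
    pilot: float
    production: float
    reason: str

    @property
    def multiple(self) -> float:
        return self.production / self.pilot


def unit_cost_multiple(pilot_unit_cost: Optional[float],
                       production_unit_cost: Optional[float]) -> Optional[float]:
    """Production fully loaded unit cost as a multiple of the pilot's; None if unknown."""
    if pilot_unit_cost is None or production_unit_cost is None:
        return None  # no evidence behind the scaling decision
    return production_unit_cost / pilot_unit_cost


bridge = [
    BridgeLine("volume (tickets/month)", 200, 2_400, "moving to full production"),
    BridgeLine("review rate", 0.05, 0.12, "full case mix; quality gate requirement"),
    BridgeLine("tokens per ticket (x interactive)", 2.1, 3.2, "deeper retrieval at scale"),
]
for line in bridge:
    print(f"{line.assumption}: {line.pilot} -> {line.production} "
          f"({line.multiple:.2f}x; {line.reason})")

print(unit_cost_multiple(None, 2.69))  # None: the pilot unit cost was never measured
```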

Sheet 6: Forecast and variance

Forecast structure:

12-month consumption forecast by line, with stated adoption and intensity assumptions
Actual versus forecast, monthly, with variance triggers
Suggested triggers: investigate at 15%, restrict at 30%
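The suggested triggers translate into a simple monthly check. This sketch assumes the triggers apply only to overspend, meaning actual cost above forecast. The sheet leaves open whether an underspend should also prompt investigation, for example as a sign of weak adoption; that is a policy choice.

```python
def variance_action(forecast: float, actual: float,
                    investigate_at: float = 0.15, restrict_at: float = 0.30) -> str:
    """Map actual-versus-forecast overspend to the suggested trigger levels."""
    if forecast <= 0:
        raise ValueError("forecast must be positive")
    overspend = (actual - forecast) / forecast
    if overspend >= restrict_at:
        return "restrict"
    if overspend >= investigate_at:
        return "investigate"
    return "within tolerance"


# Hypothetical monthly line forecast at £10,000.
for actual in (11_000, 12_000, 13_500):
    print(actual, variance_action(10_000, actual))
# 11000 within tolerance
# 12000 investigate
# 13500 restrict
```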

Sensitivity analysis:

Vendor price change ±25% (see Cursor, June 2025)
Agentic intensity ×2
Adoption ±20%
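A sensitivity run re-prices the forecast with one assumption flexed at a time. This sketch rests on stated assumptions. The cost lines and their values are hypothetical. Which lines each scenario scales is a modelling choice that should be written down next to the result:

- vendor price applies to the model and seat lines;
- agentic intensity applies to model consumption only;
- adoption applies to the volume-driven lines.

```python
from typing import Dict, Iterable


def flex(lines: Dict[str, float], factor: float, affected: Iterable[str]) -> float:
    """Monthly total with the named lines multiplied by `factor`, others unchanged."""
    affected = set(affected)
    return sum(value * factor if name in affected else value
               for name, value in lines.items())


# Hypothetical monthly cost lines for one use case (GBP).
base = {"model_api": 1_000.0, "seats": 500.0, "review": 2_000.0, "integration": 800.0}

scenarios = {
    "vendor price +25%": (1.25, ["model_api", "seats"]),
    "vendor price -25%": (0.75, ["model_api", "seats"]),
    "agentic intensity x2": (2.0, ["model_api"]),
    "adoption +20%": (1.2, ["model_api", "review"]),
    "adoption -20%": (0.8, ["model_api", "review"]),
}

print(f"base: £{sum(base.values()) * 12:,.0f}/year")
for label, (factor, affected) in scenarios.items():
    print(f"{label}: £{flex(base, factor, affected) * 12:,.0f}/year")
```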

Sheet 7: Value side (paired, so the sheet never ships cost-only)

Value tracking:

The measured outcome metric for this use case and its current value versus baseline
Value realised versus business case, quarterly
Attribution status: outcome-linked / activity-linked / unattributed

Net position:

Value evidenced minus fully loaded cost, with confidence noted
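Recording the net position and its confidence together keeps the attribution status attached to the number. In this sketch, the mapping from attribution status to a confidence label is a hypothetical default, not a rule stated in the worksheet. `value_evidenced` should be a gross figure, such as baseline cost avoided. A per-unit saving already computed against the fully loaded AI cost would double-count that cost if it were subtracted again here.

```python
from enum import Enum
from typing import Tuple


class Attribution(Enum):
    OUTCOME_LINKED = "outcome-linked"
    ACTIVITY_LINKED = "activity-linked"
    UNATTRIBUTED = "unattributed"


# Hypothetical default; adjust to the organisation's evidence standards.
CONFIDENCE = {
    Attribution.OUTCOME_LINKED: "high",
    Attribution.ACTIVITY_LINKED: "medium",
    Attribution.UNATTRIBUTED: "low",
}


def net_position(value_evidenced: float, fully_loaded_cost: float,
                 attribution: Attribution) -> Tuple[float, str]:
    """Value evidenced minus fully loaded cost, with a confidence label attached."""
    return value_evidenced - fully_loaded_cost, CONFIDENCE[attribution]


print(net_position(10_000, 4_000, Attribution.OUTCOME_LINKED))  # (6000, 'high')
print(net_position(10_000, 4_000, Attribution.UNATTRIBUTED))    # (6000, 'low')
```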

Worked example: Customer service AI agent

This example uses a customer service AI agent handling tier-1 support queries.

Use case header (Sheet 1)

Use case: Customer service AI agent for tier-1 support
Owner: Head of Customer Operations
Workflow: Agent handles incoming support tickets, escalates complex cases to human agents
Stage: Scaling (pilot complete, moving to full production)
Baseline: Pre-AI resolution time 4.2 hours, cost per ticket £18.50
Unit of work: Ticket resolved

Direct AI costs (Sheet 2) - Monthly

Model consumption: 2.5M input + 800K output tokens = 3.3M tokens at a blended £0.015/1K tokens = £49.50
Agentic multiplier: 3.2× tokens per ticket versus interactive chat (agent retrieves context, checks knowledge base, formats response)
Orchestration platform: £2,400/month (shared across 8 use cases, allocated 15% = £360)
Vector database: £180/month
Total Sheet 2: £589.50/month

Hidden costs (Sheet 3) - Monthly

Human review: 12% of tickets reviewed, 8 minutes per review, 2,400 tickets/month × 12% × 8 min × £45/hour loaded rate = £1,728
Rework: 4% of tickets require human correction, 15 minutes per correction = £270
Quality gate: Weekly QA sampling, 4 hours/week × £55/hour × 4 weeks = £880
Integration build: £24,000 one-off, amortised over 24 months = £1,000/month
Data preparation: Ongoing knowledge base curation, 20 hours/month × £50/hour = £1,000
Monitoring and evaluation: £400/month
Change management: Training and adoption support, 10 hours/month × £60/hour = £600
Total Sheet 3: £5,878/month

Unit economics (Sheet 4)

Volume: 2,400 tickets/month
Fully loaded unit cost: (£589.50 + £5,878) ÷ 2,400 = £2.69 per ticket
Naive unit cost: £589.50 ÷ 2,400 = £0.25 per ticket
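Gap: 10.8× using the rounded unit costs (£2.69 ÷ £0.25), or about 11.0× using the unrounded monthly totals (£6,467.50 ÷ £589.50); either way, the naive figure understates the true cost roughly elevenfold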
Pre-AI baseline: £18.50 per ticket
Savings: £15.81 per ticket (85% reduction)
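The Sheet 2 to Sheet 4 arithmetic above can be reproduced line by line, which is a useful check before the numbers go to a CFO. This sketch is plain Python, and the dictionary keys are illustrative. The rework line is entered as the stated £270, because the example does not give the rate used to cost it.

```python
# Sheet 2: direct AI costs (GBP/month), from the worked example.
sheet2 = {
    "model_consumption": (2_500_000 + 800_000) / 1000 * 0.015,  # blended £/1K tokens
    "orchestration_share": 2_400 * 0.15,                       # 15% allocation
    "vector_database": 180.0,
}

# Sheet 3: hidden costs (GBP/month).
tickets = 2_400
sheet3 = {
    "human_review": tickets * 0.12 * 8 / 60 * 45,  # 12% reviewed, 8 min, £45/h
    "rework": 270.0,                               # as stated in the example
    "quality_gate": 4 * 55 * 4,                    # 4 h/week, £55/h, 4 weeks
    "integration_amortised": 24_000 / 24,          # one-off, 24-month life
    "data_preparation": 20 * 50,
    "monitoring_evaluation": 400.0,
    "change_management": 10 * 60,
}

direct = sum(sheet2.values())
hidden = sum(sheet3.values())
loaded_unit = (direct + hidden) / tickets
naive_unit = direct / tickets

print(f"Sheet 2 total: £{direct:,.2f}")                          # £589.50
print(f"Sheet 3 total: £{hidden:,.2f}")                          # £5,878.00
print(f"Fully loaded: £{loaded_unit:.2f} per ticket")            # £2.69
print(f"Naive: £{naive_unit:.2f} per ticket")                    # £0.25
print(f"Gap: {(direct + hidden) / direct:.1f}x")                 # 11.0x, unrounded totals
print(f"Saving vs £18.50 baseline: £{18.50 - loaded_unit:.2f}")  # £15.81
```

The two gap conventions give different figures. Dividing the rounded unit costs (£2.69 ÷ £0.25) gives 10.8×, while dividing the unrounded monthly totals gives about 11.0×. A reported gap should therefore say which convention it uses.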

Pilot-to-production bridge (Sheet 5)

Volume: Pilot 200 tickets/month → Production 2,400 tickets/month (12× increase)
Case mix: Pilot handled curated easy cases → Production includes full case mix (hard 20% drives review rate from 5% to 12%)
Review rate: Pilot 5% → Production 12% (quality gate requirement)
Tokens per ticket: Pilot 2.1× → Production 3.2× (deeper context retrieval at scale)
Production multiplier: Fully loaded unit cost increased 2.4× from pilot to production

Forecast and variance (Sheet 6)

12-month forecast: £77,616 (12 × £6,468 monthly fully loaded cost; assumes 2,400 tickets/month steady state)
Sensitivity: Vendor price +25% = £84,993; agentic intensity ×2 = £91,344
Variance trigger: Investigate if monthly cost exceeds £7,438 (15% over the £6,468 monthly forecast)

Value side (Sheet 7)

Outcome metric: Customer satisfaction score (CSAT)
Baseline CSAT: 3.8/5.0
Current CSAT: 4.1/5.0 (8% improvement)
Resolution time: 4.2 hours → 0.8 hours (81% improvement)
Attribution: Outcome-linked (measured before/after with control group)
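Net position: £44,400/month baseline cost avoided (£18.50 × 2,400 tickets) minus £6,468 fully loaded cost = £37,932/month net value (high confidence). This matches £15.81 saving × 2,400 tickets (£37,944, the difference being rounding); because the per-ticket saving already nets out the AI cost, subtracting the £6,468 again would double-count it.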

Design rules for implementation

1. Label every input cell with provenance

2. Produce three numbers a CFO can repeat

Fully loaded unit cost
Naive-versus-loaded gap
Attribution status

3. Ship with a worked example
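Provenance labelling is easiest to enforce when each input is stored with its source rather than as a bare number. In this sketch, the three provenance categories (telemetry, invoice, estimate) and the `estimate_share` summary are hypothetical choices. The telemetry and estimate categories echo the telemetry-versus-estimates distinction in the recommendations below.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Provenance(Enum):
    TELEMETRY = "telemetry"  # measured from platform logs
    INVOICE = "invoice"      # taken from a vendor bill or contract
    ESTIMATE = "estimate"    # judgement; should name who estimated it


@dataclass(frozen=True)
class InputCell:
    sheet: int
    name: str
    value: float
    provenance: Provenance
    note: str  # where a reviewer can check the figure


def estimate_share(cells: Iterable[InputCell]) -> float:
    """Fraction of total monthly cost that rests on estimates rather than data."""
    cells = list(cells)
    total = sum(c.value for c in cells)
    estimated = sum(c.value for c in cells if c.provenance is Provenance.ESTIMATE)
    return estimated / total if total else 0.0


# Hypothetical inputs for one use case (GBP/month).
cells = [
    InputCell(2, "model consumption", 100.0, Provenance.TELEMETRY, "usage export"),
    InputCell(2, "vector database", 300.0, Provenance.INVOICE, "vendor invoice"),
    InputCell(3, "human review", 600.0, Provenance.ESTIMATE, "team lead, sampled week"),
]
print(estimate_share(cells))  # 0.6
```

A high estimate share could be reported alongside the three CFO numbers as a measure of how much of the model is still judgement.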

Mapping to the AI TCO Framework

Seven-layer TCO mapping:

Infrastructure → Sheet 2 (GPU, hosting, vector store, compute capacity)
Data and context → Sheet 3 (data preparation, retrieval infrastructure, indexing)
Models → Sheet 2 (model consumption, API calls, fine-tuning)
Integration and workflow redesign → Sheet 3 (integration build, workflow changes, API work)
People and capability → Sheet 3 (review time, rework, training, platform engineering)
Governance, safety, and compliance → Sheet 3 (quality gates, security assessments, monitoring, compliance)
Operations and portfolio oversight → Sheet 3 (monitoring, vendor management, portfolio review)

Lifecycle stage mapping:

Prepare: Sheet 1 (baseline capture, unit of work definition)
Build: Sheet 3 (integration build, data preparation, one-off costs)
Run: Sheets 2, 3, 4, 6 (operating costs, unit economics, variance tracking)
Scale/Retire: Sheet 5 (pilot-to-production bridge), Sheet 7 (value validation for continuation decisions)
Govern (cross-cutting): Sheet 3 (governance costs), Sheet 7 (attribution and value evidence)

What to do next

For finance leaders:

Start with one high-visibility use case and complete the worksheet with the platform team.
Use the naive-versus-loaded gap as the lead finding in the next AI investment review.
Establish the worksheet as the standard template for all AI business cases going forward.

For platform and engineering leaders:

Instrument the top three AI workflows so Sheet 2 and Sheet 6 can be populated from telemetry rather than estimates.
Work with finance to establish the loaded rates for review time and rework that populate Sheet 3.
Build the pilot-to-production bridge (Sheet 5) into every scaling decision.

For operating-model leaders:

Use the worksheet as the economic foundation for AI governance, not a finance-only exercise.
Track attribution coverage (Sheet 7) as the lead metric for AI value management maturity.
Publish completed worksheets internally to build shared understanding of what AI really costs to run well.

References and further reading