
AI TCO Worksheet: The Seven-Sheet Model

A working spreadsheet structure for pricing AI use cases from pilot through production. Designed so a finance analyst and platform engineer can fill it in together in an afternoon, with the pilot-to-production bridge most business cases skip.

1 week ago · AI TCO · Cost Management · Operational Guides · Unit Economics · Business Cases

Key takeaways

  • Evidence

    Most AI business cases price only the model invoice, missing 60-80% of the true operating cost in integration, governance, review time, and infrastructure.
  • Evidence

    The seven-sheet TCO worksheet maps directly to the canonical seven-category AI cost taxonomy used by FinOps Foundation and enterprise practitioners.
  • Interpretation

    A finance analyst and platform engineer can complete the worksheet together in an afternoon per use case, producing three numbers a CFO can repeat: fully loaded unit cost, naive-versus-loaded gap, and attribution status.
  • Evidence

    The pilot-to-production bridge sheet addresses the most common business case failure: assuming pilot economics scale linearly to production.

Why this matters

Evidence

Marc Benioff said Salesforce expected to spend about $300 million on Anthropic tokens in 2026, largely for coding-related work.

Interpretation

The spend was fragmented across departments, embedded in SaaS contracts, and categorised inconsistently across finance systems.

Interpretation

This pattern repeats across enterprises: AI costs accumulate faster than visibility systems can track them.

Evidence

Most business cases price only the visible model API invoice, missing the larger costs in human review, integration work, governance, and infrastructure.

The hidden cost multiplier: At Uber, the visible API invoice for AI coding tools ran $500-$2,000 per engineer per month. The hidden costs - review time of engineers checking agent-generated code, rework when agent-committed code needs correction, orchestration engineering, opportunity cost of attention shifting to prompt-wrangling - never appeared on the Anthropic invoice but belong in cost per unit of shipped work.

Evidence

The seven-sheet TCO worksheet provides a structured approach to capturing full-stack AI costs before they become budget surprises.

Interpretation

It is designed as a working tool, not a framework diagram - something a finance analyst and platform engineer can fill in together in an afternoon per use case.

The seven-sheet structure

Evidence

The worksheet maps directly to the canonical seven-category AI cost taxonomy: model and inference; data and pipeline; compute and infrastructure; tooling and platform; integration and orchestration; governance, risk and compliance; change, adoption and human.

Interpretation

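Sheets 2 and 3 capture the cost layers, with Sheet 3 dedicated to the costs business cases routinely forget. The remaining sheets frame those costs (Sheet 1), normalise them per unit of work (Sheet 4), stress-test them (Sheets 5 and 6), and pair them with value (Sheet 7).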

Sheet 1: Use case header

Interpretation

Every use case needs a named owner, a clear workflow description, and an explicit stage designation.

Evidence

The stage designation - pilot, scaling, or production - determines which cost assumptions apply and which governance requirements trigger.

| Field | Notes |
|---|---|
| Use case name and owner | One named person accountable for economics |
| Workflow description | What the AI does, who consumes the output |
| Stage | Pilot / scaling / production |
| Business case reference | Link, or "none" recorded as a finding |
| Baseline captured? | Y/N, date, where stored |
| Unit of work | The denominator everything divides by: ticket resolved, change merged, document processed, workflow completed |

Interpretation

The unit of work is the most important field on the sheet.

Evidence

Without a clear denominator, cost per unit cannot be calculated, and the use case cannot be compared to alternatives or tracked over time.
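As an illustration, the Sheet 1 fields can be held in a small typed record that reports its own gaps. This is a minimal sketch, not part of the worksheet: the names `UseCaseHeader`, `Stage`, and `findings` are assumptions introduced here, and the sample values are borrowed from the customer service example later in the article.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    PILOT = "pilot"
    SCALING = "scaling"
    PRODUCTION = "production"


@dataclass
class UseCaseHeader:
    """Sheet 1 fields; names are illustrative, not prescribed by the worksheet."""
    name: str
    owner: str                                 # one named person accountable for economics
    workflow: str                              # what the AI does, who consumes the output
    stage: Stage
    unit_of_work: str                          # the denominator, e.g. "ticket resolved"
    business_case_ref: Optional[str] = None    # None is recorded as a finding
    baseline_captured: bool = False

    def findings(self) -> list[str]:
        """Return gaps that should be reported rather than silently ignored."""
        gaps = []
        if not self.unit_of_work.strip():
            gaps.append("no unit of work: cost per unit cannot be calculated")
        if self.business_case_ref is None:
            gaps.append("no business case reference")
        if not self.baseline_captured:
            gaps.append("no baseline captured")
        return gaps


header = UseCaseHeader(
    name="Customer service AI agent",
    owner="Head of Customer Operations",
    workflow="Handles tier-1 tickets, escalates complex cases",
    stage=Stage.SCALING,
    unit_of_work="ticket resolved",
    baseline_captured=True,
)
print(header.findings())  # ['no business case reference']
```

Treating a missing business case as a reportable finding, rather than a validation error, mirrors the table's instruction to record "none" as a finding.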

Sheet 2: Direct AI costs (monthly)

Evidence

Direct AI costs include model and API consumption, seat licences, embedded SaaS AI tiers, fine-tuning runs, and hosting infrastructure.

Interpretation

This is the cost layer most business cases capture, but it typically represents only 20-40% of the fully loaded cost.

Model and API consumption:

  • Input tokens, output tokens, blended £/unit by provider
  • Evidence

    Agentic multiplier: tokens per task for agent-driven work versus interactive work (tracked separately; this is where forecasts break)

Seat licences and SaaS tiers:

  • Tool licences for this workflow
  • Evidence

    Embedded SaaS AI tier allocation (the AI-attributable share of the relevant SaaS contract, even if estimated)

Training and hosting:

  • Fine-tuning or training runs, amortised over expected life
  • GPU, hosting, vector store, orchestration platform share
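The Sheet 2 lines reduce to three small calculations: token spend, an allocated share of a shared contract, and a one-off cost amortised over its expected life. The sketch below shows one way to express them; the function names and every input value are hypothetical, and real providers usually price input and output tokens at different rates, which is why the two rates are kept separate.

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Monthly model spend from token counts and per-1K-token rates."""
    return ((input_tokens / 1000) * input_rate_per_1k
            + (output_tokens / 1000) * output_rate_per_1k)


def allocated_share(contract_cost: float, share: float) -> float:
    """AI-attributable share of a shared platform or SaaS contract (0 <= share <= 1)."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return contract_cost * share


def amortised(one_off_cost: float, expected_life_months: int) -> float:
    """Spread a one-off cost (e.g. a fine-tuning run) evenly over its expected life."""
    if expected_life_months <= 0:
        raise ValueError("expected life must be positive")
    return one_off_cost / expected_life_months


# Hypothetical inputs for one workflow, one month.
direct_lines = {
    "model_api": token_cost(4_000_000, 1_000_000, 0.01, 0.03),
    "seat_licences": 20 * 30.0,                           # 20 seats at 30 per seat
    "embedded_saas_ai": allocated_share(5_000.0, 0.10),   # estimated 10% AI share
    "fine_tuning": amortised(6_000.0, 12),                # assumed 12-month life
    "hosting_share": allocated_share(2_000.0, 0.25),
}
sheet2_total = sum(direct_lines.values())
print(round(sheet2_total, 2))
```

Keeping each line as a named entry, rather than a single total, makes it easier to label each input with its source later.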

Sheet 3: The costs business cases forget

Interpretation

This sheet is the worksheet's reason to exist.

Evidence

These are the costs that appear in operating budgets, engineering time, and governance overhead but rarely make it into AI business cases.

Human review and rework:

  • Evidence

    Minutes per output × volume × loaded rate of the reviewer
  • Interpretation

    For coding tools, review of AI-generated changes; for service AI, QA sampling and escalation handling
  • Evidence

    Share of AI outputs corrected or redone, × cost of correction
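The two formulas in this list translate directly into code. The following is a minimal sketch with hypothetical inputs; `review_cost` and `rework_cost` are names introduced here for illustration.

```python
def review_cost(volume: int, review_share: float,
                minutes_per_review: float, loaded_rate_per_hour: float) -> float:
    """Minutes per output x reviewed volume x loaded rate of the reviewer."""
    reviewed = volume * review_share
    return reviewed * (minutes_per_review / 60) * loaded_rate_per_hour


def rework_cost(volume: int, corrected_share: float, cost_per_correction: float) -> float:
    """Share of AI outputs corrected or redone x cost of each correction."""
    return volume * corrected_share * cost_per_correction


# Hypothetical month: 1,000 outputs, 10% reviewed for 6 minutes at 50/hour,
# 5% redone at a cost of 12 each.
monthly_review = review_cost(1_000, 0.10, 6, 50.0)
monthly_rework = rework_cost(1_000, 0.05, 12.0)
print(round(monthly_review, 2), round(monthly_rework, 2))
```

Review and rework are kept as separate functions because they scale differently: review follows the sampling policy, while rework follows output quality.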

When review costs exceed model costs: Starbucks retired its inventory management AI agent after discovering the system's recommendations were being routinely overridden by store managers who understood local context the model missed. The agent consumed resources but delivered no measurable improvement over human judgement. The retirement decision came only after establishing baseline performance metrics and full-stack cost accounting, including the manager review time that exceeded the model API cost.

Quality gates and governance:

  • Senior sign-off, compliance review steps that didn't exist pre-AI
  • Security and compliance assessments (one-off and recurring)
  • Evidence

    Evaluation harness and monitoring: building and running the thing that tells you quality held

Integration and enabling infrastructure:

  • Evidence

    Integration and pipeline build: one-off, amortised over an assumed life (state the assumption)
  • Evidence

    Data preparation and retrieval infrastructure: the enabling layer - cleaning, permissions, indexing

Organisational costs:

  • Change management and training
  • Vendor management overhead (meaningful once tool count grows)

Failure-state consumption:

  • Evidence

    For agentic workflows, tokens burned by retries, loops and abandoned runs - tracked as its own line, not blended into the average
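One way to keep failure-state consumption as its own line is to tag each agent run with an outcome and total tokens per outcome. The sketch below assumes a simple run log and outcome labels (`retried`, `looped`, `abandoned`) that are hypothetical, not taken from any particular orchestration platform.

```python
from collections import defaultdict

# Hypothetical run log: (run outcome, tokens consumed).
runs = [
    ("completed", 4_000),
    ("completed", 5_000),
    ("retried", 3_000),
    ("looped", 12_000),
    ("abandoned", 6_000),
]

FAILURE_STATES = {"retried", "looped", "abandoned"}  # assumed outcome labels

# Total tokens per outcome so failure states stay visible as separate lines.
tokens_by_outcome = defaultdict(int)
for outcome, tokens in runs:
    tokens_by_outcome[outcome] += tokens

total_tokens = sum(tokens_by_outcome.values())
failure_tokens = sum(t for o, t in tokens_by_outcome.items() if o in FAILURE_STATES)
failure_share = failure_tokens / total_tokens

print(failure_tokens, total_tokens, f"{failure_share:.0%}")  # 21000 30000 70%
```

In this made-up log, the average of 6,000 tokens per run would hide that more than two-thirds of consumption produced no completed work.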

Sheet 4: Unit economics

Interpretation

Unit economics is where the worksheet produces its most important output: the fully loaded unit cost versus the naive unit cost most business cases quote.

Calculations:

  • Fully loaded monthly cost (Sheets 2+3) ÷ units of work = fully loaded unit cost
  • Same calculation with Sheet 2 only = naive unit cost
  • Interpretation

    The gap between the two numbers is itself a finding worth reporting

Comparators:

  • Pre-AI unit cost from the baseline, or the human-only alternative at its full loaded cost
  • Evidence

    Threshold: the unit cost at which this use case is agreed to be working (set in advance, signed by finance and the use-case owner)
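The Sheet 4 calculations can be sketched as a small class that derives both unit costs, the gap between them, and the threshold check. The class name `UnitEconomics` and the figures are hypothetical; the gap is expressed here as a multiple (loaded ÷ naive), which is one reasonable reading of "the gap between the two numbers".

```python
from dataclasses import dataclass


@dataclass
class UnitEconomics:
    sheet2_monthly: float      # direct AI costs
    sheet3_monthly: float      # costs business cases forget
    units_of_work: int         # monthly volume of the Sheet 1 denominator

    def __post_init__(self) -> None:
        if self.units_of_work <= 0:
            raise ValueError("units of work must be positive")

    @property
    def naive_unit_cost(self) -> float:
        return self.sheet2_monthly / self.units_of_work

    @property
    def loaded_unit_cost(self) -> float:
        return (self.sheet2_monthly + self.sheet3_monthly) / self.units_of_work

    @property
    def gap_multiple(self) -> float:
        """How many times larger the loaded cost is than the naive cost."""
        return self.loaded_unit_cost / self.naive_unit_cost

    def is_working(self, threshold_unit_cost: float) -> bool:
        """True when loaded unit cost is at or below the pre-agreed threshold."""
        return self.loaded_unit_cost <= threshold_unit_cost


# Hypothetical figures, not taken from any real use case.
ue = UnitEconomics(sheet2_monthly=1_000.0, sheet3_monthly=4_000.0, units_of_work=2_000)
print(ue.naive_unit_cost, ue.loaded_unit_cost, ue.gap_multiple, ue.is_working(3.0))
# 0.5 2.5 5.0 True
```

The threshold is passed in rather than stored so that it stays an externally agreed number, not something the cost owner can adjust inside the model.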

Sheet 5: Pilot-to-production bridge

Evidence

The pilot-to-production bridge addresses the most common business case failure: assuming pilot economics scale linearly to production.

Interpretation

They rarely do.

For each assumption, pilot value → production value → reason it changes:

Volume and case mix:

  • Evidence

    Volume (and whether case mix changes with it - the hard 20% usually arrives at scale)

Consumption patterns:

  • Tokens per unit (retrieval depth, context growth, agentic intensity)
  • Interpretation

    Review rate (pilots skip the quality gate production requires)

Edge cases and build costs:

  • Edge-case share and cost
  • One-off build costs not incurred in pilot

Output:

  • Evidence

    Production fully loaded unit cost versus pilot unit cost, as a multiple. If the multiple is unknown, the scaling decision is being made on fiction.
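The "pilot value → production value → reason it changes" structure can be captured as one record per assumption, with the multiple derived rather than typed in. This is a sketch under assumed names (`BridgeLine`, `multiple`), and every value is illustrative.

```python
from dataclasses import dataclass


@dataclass
class BridgeLine:
    assumption: str
    pilot: float
    production: float
    reason: str                # why the value changes between stages

    @property
    def multiple(self) -> float:
        return self.production / self.pilot


# Hypothetical bridge for one use case; every value is illustrative.
bridge = [
    BridgeLine("monthly volume", 100, 1_000, "full rollout"),
    BridgeLine("tokens per unit", 2_000, 3_000, "deeper retrieval, longer context"),
    BridgeLine("review rate", 0.05, 0.10, "production quality gate"),
    BridgeLine("fully loaded unit cost", 1.50, 3.00, "review, edge cases, build costs"),
]

for line in bridge:
    print(f"{line.assumption}: {line.pilot} -> {line.production} "
          f"({line.multiple:.1f}x) because {line.reason}")

# The headline output of Sheet 5: production unit cost as a multiple of pilot.
unit_cost_multiple = bridge[-1].multiple
```

Making `reason` a required field means a line cannot be entered without stating why the production value differs from the pilot value.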

Sheet 6: Forecast and variance

Evidence

Forecast and variance tracking prevents the budget surprises that force reactive rationing.

Forecast structure:

  • 12-month consumption forecast by line, with stated adoption and intensity assumptions
  • Actual versus forecast, monthly, with variance triggers
  • Interpretation

    Suggested triggers: investigate at 15%, restrict at 30%

Sensitivity analysis:

  • Evidence

    Vendor price change ±25% (see Cursor, June 2025)
  • Agentic intensity ×2
  • Adoption ±20%
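The variance triggers and sensitivity scenarios can be sketched as two functions. The simplifying assumption here, which is not stated in the worksheet, is that vendor price, agentic intensity, and adoption all scale token spend multiplicatively and leave other costs fixed. In practice adoption also moves review volume, so treat the scenario figures as a lower bound. All names and numbers are hypothetical.

```python
def variance_status(actual: float, forecast: float,
                    investigate_at: float = 0.15, restrict_at: float = 0.30) -> str:
    """Classify a month's overspend against the suggested 15% / 30% triggers."""
    if forecast <= 0:
        raise ValueError("forecast must be positive")
    variance = (actual - forecast) / forecast
    if variance >= restrict_at:
        return "restrict"
    if variance >= investigate_at:
        return "investigate"
    return "ok"


def scenario_cost(token_spend: float, other_costs: float,
                  price_change: float = 0.0, intensity: float = 1.0,
                  adoption_change: float = 0.0) -> float:
    """Monthly cost when only token spend responds to the scenario (assumption)."""
    scaled = token_spend * (1 + price_change) * intensity * (1 + adoption_change)
    return scaled + other_costs


# Hypothetical base case: 1,000 token spend plus 4,000 of other monthly costs.
base = scenario_cost(1_000.0, 4_000.0)
scenarios = {
    "vendor price +25%": scenario_cost(1_000.0, 4_000.0, price_change=0.25),
    "agentic intensity x2": scenario_cost(1_000.0, 4_000.0, intensity=2.0),
    "adoption +20%": scenario_cost(1_000.0, 4_000.0, adoption_change=0.20),
}
for name, cost in scenarios.items():
    print(f"{name}: {cost:.0f} ({variance_status(cost, base)})")
```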

Sheet 7: Value side (paired, so the sheet never ships cost-only)

Interpretation

The worksheet must never ship cost-only.

Evidence

Sheet 7 pairs cost with value, ensuring every TCO analysis includes the outcome side.

Value tracking:

  • The measured outcome metric for this use case and its current value versus baseline
  • Value realised versus business case, quarterly
  • Evidence

    Attribution status: outcome-linked / activity-linked / unattributed

Net position:

  • Interpretation

    Value evidenced minus fully loaded cost, with confidence noted
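A minimal sketch of the net position follows. It assumes that only outcome-linked value counts as "evidenced"; the worksheet does not prescribe that rule, so adjust it to local policy. The type names and figures are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Attribution(Enum):
    OUTCOME_LINKED = "outcome-linked"
    ACTIVITY_LINKED = "activity-linked"
    UNATTRIBUTED = "unattributed"


@dataclass
class ValueLine:
    description: str
    monthly_value: float
    attribution: Attribution


def net_position(lines: list[ValueLine], loaded_monthly_cost: float) -> float:
    """Evidenced value minus fully loaded cost.

    Assumption of this sketch: only outcome-linked value counts as evidenced.
    """
    evidenced = sum(line.monthly_value for line in lines
                    if line.attribution is Attribution.OUTCOME_LINKED)
    return evidenced - loaded_monthly_cost


# Hypothetical value lines for one use case.
lines = [
    ValueLine("handling cost avoided", 10_000.0, Attribution.OUTCOME_LINKED),
    ValueLine("hours of tool usage", 3_000.0, Attribution.ACTIVITY_LINKED),
]
print(net_position(lines, loaded_monthly_cost=6_000.0))  # 4000.0
```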

Worked example: Customer service AI agent

Interpretation

A concrete worked example shows how to use the seven-sheet model for a specific use case.
This example uses a customer service AI agent handling tier-1 support queries.

Use case header (Sheet 1)

  • Use case: Customer service AI agent for tier-1 support
  • Owner: Head of Customer Operations
  • Workflow: Agent handles incoming support tickets, escalates complex cases to human agents
  • Stage: Scaling (pilot complete, moving to full production)
  • Baseline: Pre-AI resolution time 4.2 hours, cost per ticket £18.50
  • Unit of work: Ticket resolved

Direct AI costs (Sheet 2) - Monthly

  • Model consumption: 2.5M input tokens, 800K output tokens at £0.015/1K tokens = £49.50
  • Agentic multiplier: 3.2× tokens per ticket versus interactive chat (agent retrieves context, checks knowledge base, formats response)
  • Orchestration platform: £2,400/month (shared across 8 use cases, allocated 15% = £360)
  • Vector database: £180/month
  • Total Sheet 2: £589.50/month

Hidden costs (Sheet 3) - Monthly

  • Human review: 12% of tickets reviewed, 8 minutes per review, 2,400 tickets/month × 12% × 8 min × £45/hour loaded rate = £1,728
  • Rework: 4% of tickets require human correction, 15 minutes per correction = £270
  • Quality gate: Weekly QA sampling, 4 hours/week × 4 weeks × £55/hour = £880
  • Integration build: £24,000 one-off, amortised over 24 months = £1,000/month
  • Data preparation: Ongoing knowledge base curation, 20 hours/month × £50/hour = £1,000
  • Monitoring and evaluation: £400/month
  • Change management: Training and adoption support, 10 hours/month × £60/hour = £600
  • Total Sheet 3: £5,878/month

Unit economics (Sheet 4)

  • Volume: 2,400 tickets/month
  • Fully loaded unit cost: (£589.50 + £5,878) ÷ 2,400 = £2.69 per ticket
  • Naive unit cost: £589.50 ÷ 2,400 = £0.25 per ticket
  • Gap: 11.0× (£6,467.50 ÷ £589.50 on unrounded totals; the naive cost understates the true cost by a factor of about 11)
  • Pre-AI baseline: £18.50 per ticket
  • Savings: £15.81 per ticket (85% reduction)
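The Sheet 2 to Sheet 4 arithmetic in this example can be re-derived in a few lines. The sketch uses only figures stated in the example, with two caveats labelled in the comments: the rework line is entered as the stated £270 because its hourly rate is not given, and the QA line assumes four weeks per month, which is what the stated £880 implies.

```python
TICKETS = 2_400

# Sheet 2: direct AI costs (GBP per month)
model = (2_500_000 + 800_000) / 1000 * 0.015   # blended rate per 1K tokens
orchestration = 2_400 * 0.15                   # 15% allocation of shared platform
vector_db = 180
sheet2 = model + orchestration + vector_db

# Sheet 3: costs business cases forget (GBP per month)
review = TICKETS * 0.12 * (8 / 60) * 45
rework = 270                       # as stated; rate not given in the example
qa_gate = 4 * 4 * 55               # 4 h/week x 4 weeks (assumed) x GBP 55/hour
integration = 24_000 / 24
data_prep = 20 * 50
monitoring = 400
change_mgmt = 10 * 60
sheet3 = review + rework + qa_gate + integration + data_prep + monitoring + change_mgmt

# Sheet 4: unit economics
loaded_unit = (sheet2 + sheet3) / TICKETS
naive_unit = sheet2 / TICKETS
gap = loaded_unit / naive_unit
saving = 18.50 - loaded_unit

print(f"Sheet 2 {sheet2:.2f}, Sheet 3 {sheet3:.2f}")
print(f"loaded {loaded_unit:.2f}, naive {naive_unit:.2f}, gap {gap:.1f}x")
print(f"saving {saving:.2f} ({saving / 18.50:.0%})")
```

On unrounded totals the gap comes out at about 11.0×. Dividing the rounded unit costs (£2.69 ÷ £0.25) gives a slightly lower 10.8×, which shows why the unrounded figures should be carried through to the final step.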

Pilot-to-production bridge (Sheet 5)

  • Volume: Pilot 200 tickets/month → Production 2,400 tickets/month (12× increase)
  • Case mix: Pilot handled curated easy cases → Production includes full case mix (hard 20% drives review rate from 5% to 12%)
  • Review rate: Pilot 5% → Production 12% (quality gate requirement)
  • Agentic multiplier (tokens per ticket versus interactive chat): Pilot 2.1× → Production 3.2× (deeper context retrieval at scale)
  • Production multiplier: Fully loaded unit cost increased 2.4× from pilot to production

Forecast and variance (Sheet 6)

  • 12-month forecast: £77,616 (assumes 2,400 tickets/month steady state)
  • Sensitivity: Vendor price +25% = £84,993; agentic intensity ×2 = £91,344
  • Variance trigger: Investigate if monthly cost exceeds £7,438 (15% over the £6,468 monthly forecast); restrict above £8,408 (30% over)

Value side (Sheet 7)

  • Outcome metric: Customer satisfaction score (CSAT)
  • Baseline CSAT: 3.8/5.0
  • Current CSAT: 4.1/5.0 (8% improvement)
  • Resolution time: 4.2 hours → 0.8 hours (81% improvement)
  • Attribution: Outcome-linked (measured before/after with control group)
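  • Net position: £44,400/month baseline cost avoided (£18.50 × 2,400 tickets) minus £6,468 fully loaded cost = £37,932/month net value (high confidence). This is equivalent to the roughly £15.81 per-ticket saving × 2,400 tickets, because that saving is already net of the AI cost.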

Interpretation

This worked example shows the worksheet in practice: the naive cost of £0.25 per ticket would have produced a wildly optimistic business case, while the fully loaded cost of £2.69 per ticket still delivers 85% savings versus the pre-AI baseline.

Design rules for implementation

Evidence

The worksheet must be usable by practitioners, not just comprehensible to executives.

Interpretation

Three design rules ensure practical utility:

1. Label every input cell with provenance

Evidence

Every input cell must be labelled with where the number comes from: invoice, telemetry, sampled study, estimate.

Interpretation

Estimates are allowed, but visibly so.

Evidence

This prevents the false precision that undermines trust in cost models.
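Provenance labelling can be enforced structurally by making the source a required field on every input. The sketch below is one possible shape, with hypothetical names (`InputCell`, `Provenance`, `estimate_share`) and made-up monthly cost lines. It also reports how much of the total rests on estimates, so estimates stay visible.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    INVOICE = "invoice"
    TELEMETRY = "telemetry"
    SAMPLED_STUDY = "sampled study"
    ESTIMATE = "estimate"


@dataclass(frozen=True)
class InputCell:
    label: str
    monthly_cost: float
    provenance: Provenance     # required: a cell cannot exist without a source


def estimate_share(cells: list[InputCell]) -> float:
    """Fraction of total monthly cost that rests on estimates."""
    total = sum(c.monthly_cost for c in cells)
    estimated = sum(c.monthly_cost for c in cells
                    if c.provenance is Provenance.ESTIMATE)
    return estimated / total if total else 0.0


# Hypothetical cost lines for one use case.
cells = [
    InputCell("model API", 500.0, Provenance.INVOICE),
    InputCell("review time", 1_500.0, Provenance.SAMPLED_STUDY),
    InputCell("embedded SaaS AI share", 500.0, Provenance.ESTIMATE),
]
print(f"{estimate_share(cells):.0%} of monthly cost is estimated")  # 20% of monthly cost is estimated
```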

2. Produce three numbers a CFO can repeat

Interpretation

The worksheet must produce three numbers a CFO can repeat in any forum:
  • Fully loaded unit cost
  • Naive-versus-loaded gap
  • Attribution status

3. Ship with a worked example

Interpretation

Ship the worksheet with one completed worked example using real-shaped numbers, so users see what good looks like rather than staring at a blank grid.

Mapping to the AI TCO Framework

Evidence

The seven-sheet worksheet maps directly to the canonical seven-category AI cost taxonomy and the four-stage lifecycle model (Prepare, Build, Run, Scale/Retire with Govern cross-cutting).

Seven-category taxonomy mapping:

  1. Model and inference → Sheet 2 (model consumption, API calls)
  2. Data and pipeline → Sheet 3 (data preparation, retrieval infrastructure)
  3. Compute and infrastructure → Sheet 2 (GPU, hosting, vector store)
  4. Tooling and platform → Sheet 2 (orchestration platform, development tools)
  5. Integration and orchestration → Sheet 3 (integration build, pipeline work)
  6. Governance, risk and compliance → Sheet 3 (quality gates, security assessments, monitoring)
  7. Change, adoption and human → Sheet 3 (review time, rework, training, change management)

Lifecycle stage mapping:

  • Prepare: Sheet 1 (baseline capture, unit of work definition)
  • Build: Sheet 3 (integration build, data preparation, one-off costs)
  • Run: Sheets 2, 3, 4, 6 (operating costs, unit economics, variance tracking)
  • Scale/Retire: Sheet 5 (pilot-to-production bridge), Sheet 7 (value validation for continuation decisions)
  • Govern (cross-cutting): Sheet 3 (governance costs), Sheet 7 (attribution and value evidence)

What to do next

For finance leaders:

  • Interpretation

    Start with one high-visibility use case and complete the worksheet with the platform team.
  • Evidence

    Use the naive-versus-loaded gap as the lead finding in the next AI investment review.
  • Interpretation

    Establish the worksheet as the standard template for all AI business cases going forward.

For platform and engineering leaders:

  • Evidence

    Instrument the top three AI workflows so Sheet 2 and Sheet 6 can be populated from telemetry rather than estimates.
  • Interpretation

    Work with finance to establish the loaded rates for review time and rework that populate Sheet 3.
  • Evidence

    Build the pilot-to-production bridge (Sheet 5) into every scaling decision.

For operating-model leaders:

  • Interpretation

    Use the worksheet as the economic foundation for AI governance, not a finance-only exercise.
  • Evidence

    Track attribution coverage (Sheet 7) as the lead metric for AI value management maturity.
  • Interpretation

    Publish completed worksheets internally to build shared understanding of what AI really costs to run well.

References and further reading