AI Economics for the Engineering Leader

Key takeaways

Interpretation: Engineering leaders now influence AI economics directly through model routing, prompt design, workflow architecture, and platform choices.
Interpretation: The strongest engineering AI strategies treat cost per feature, cost per user, and cost per action as real delivery metrics, not finance afterthoughts.
Evidence: Prompt engineering, caching, and selective model choice can create meaningful savings without damaging product value when designed intentionally.
Interpretation: Build-versus-buy decisions should be judged on total operating burden, not only on provider price or hosting cost.

Engineering is now shaping the denominator

Engineering cost visibility matters: Uber spent its entire 2026 AI coding budget in four months, and the COO could not draw a line between usage and value. The visible API invoice ran $500-$2,000 per engineer per month. The hidden costs never appeared on the Anthropic invoice: review time for engineers checking agent-generated code, rework when agent-committed code needs correction, and orchestration engineering. All of these belong in cost per unit of shipped work.
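The gap between the invoice and the fully loaded figure can be made concrete with a small calculation. The sketch below is illustrative only: the class, the function name, the hourly rate, and every input number are assumptions, not Uber data.

```python
from dataclasses import dataclass


@dataclass
class MonthlyAgentCosts:
    """Hypothetical monthly inputs for one engineering team."""
    api_invoice: float          # visible provider spend
    review_hours: float         # engineer time spent checking agent-generated code
    rework_hours: float         # time correcting agent-committed code
    orchestration_hours: float  # time building and running agent tooling
    loaded_hourly_rate: float   # assumed fully loaded cost of one engineer hour
    shipped_units: int          # e.g. merged changes that reached production


def fully_loaded_cost_per_unit(c: MonthlyAgentCosts) -> float:
    """Invoice plus hidden labour cost, divided by shipped work."""
    if c.shipped_units <= 0:
        raise ValueError("shipped_units must be positive")
    hidden_hours = c.review_hours + c.rework_hours + c.orchestration_hours
    total = c.api_invoice + hidden_hours * c.loaded_hourly_rate
    return total / c.shipped_units


# Made-up figures, for illustration only.
example = MonthlyAgentCosts(
    api_invoice=10_000.0,
    review_hours=120.0,
    rework_hours=60.0,
    orchestration_hours=20.0,
    loaded_hourly_rate=100.0,
    shipped_units=150,
)
invoice_only = example.api_invoice / example.shipped_units
fully_loaded = fully_loaded_cost_per_unit(example)
print(f"invoice only: {invoice_only:.2f}, fully loaded: {fully_loaded:.2f}")
```

With these assumed inputs the fully loaded figure is three times the invoice-only figure, which is the kind of gap that stays invisible when only the provider bill is tracked.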

Model selection is an economic decision

Prompt engineering as cost optimisation

Build versus buy versus fine-tune

How differentiated is the workflow?
How much control, observability, and sovereignty is actually required?
What internal platform and support burden does each option create?
How sensitive is the feature to inference cost at scale?
What is the likely marginal cost of adding more users, tasks, or product surfaces later?
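The last two questions, on cost sensitivity at scale and marginal cost, lend themselves to a simple break-even check. This is a minimal sketch under strong assumptions: buying is modelled as a pure per-request price, building as a fixed monthly burden plus a lower per-request cost, and all names and figures are hypothetical.

```python
from typing import Optional


def monthly_cost(fixed_cost: float, unit_cost: float, requests: int) -> float:
    """Total monthly cost for one option at a given request volume."""
    return fixed_cost + unit_cost * requests


def breakeven_requests(buy_unit: float, build_fixed: float, build_unit: float) -> Optional[float]:
    """Monthly volume above which building is cheaper than buying.

    Returns None when the built option never catches up on unit cost.
    """
    if build_unit >= buy_unit:
        return None
    return build_fixed / (buy_unit - build_unit)


# Hypothetical figures, for illustration only.
BUY_UNIT = 0.020       # provider price per request
BUILD_FIXED = 40_000   # monthly platform, on-call, and support burden
BUILD_UNIT = 0.004     # marginal self-hosted cost per request

threshold = breakeven_requests(BUY_UNIT, BUILD_FIXED, BUILD_UNIT)
for volume in (500_000, 2_500_000, 10_000_000):
    buy = monthly_cost(0.0, BUY_UNIT, volume)
    build = monthly_cost(BUILD_FIXED, BUILD_UNIT, volume)
    print(volume, round(buy), round(build))
```

The fixed term is where internal platform and support burden belongs. Leaving it out is what makes a build option look cheaper than it is.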

When AI agents fail economics: Starbucks retired its inventory management AI agent after discovering the system's recommendations were being routinely overridden by store managers who understood local context the model missed. The agent consumed resources but delivered no measurable improvement over human judgement. The retirement decision came only after establishing baseline performance metrics - a discipline engineering leaders must build into deployment decisions from the start.

Shift-left cost awareness for developers

Trivial-task gaming: Amazon's internal AI usage reportedly included significant volumes of trivial-task gaming, where employees used AI tools for low-value work to meet adoption targets or demonstrate engagement. The pattern emerged only after usage telemetry was analysed by task type and outcome, not just volume. Engineering leaders must design telemetry that distinguishes productive use from performative use.
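One way to separate productive from performative use is to aggregate telemetry by task type and outcome rather than by raw volume. The sketch below assumes a hypothetical event shape of (task type, cost, whether the output was accepted). The task labels and the acceptance signal are illustrative, not a description of Amazon's telemetry.

```python
from collections import defaultdict

# Hypothetical telemetry events: (task_type, cost_usd, output_accepted)
events = [
    ("code_change", 0.40, True),
    ("code_change", 0.35, False),
    ("trivial_rewrite", 0.05, False),
    ("trivial_rewrite", 0.05, False),
    ("test_generation", 0.20, True),
]


def summarise(events):
    """Group usage by task type and report volume, cost, and acceptance rate."""
    totals = defaultdict(lambda: {"count": 0, "cost": 0.0, "accepted": 0})
    for task_type, cost, accepted in events:
        row = totals[task_type]
        row["count"] += 1
        row["cost"] += cost
        row["accepted"] += int(accepted)
    return {
        task: {**row, "acceptance_rate": row["accepted"] / row["count"]}
        for task, row in totals.items()
    }


summary = summarise(events)
for task, row in summary.items():
    print(task, row["count"], round(row["cost"], 2), row["acceptance_rate"])
```

A task type with high volume and a near-zero acceptance rate is a candidate for closer inspection. It is not proof of gaming on its own.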

Useful engineering KPIs for AI economics

Cost per inference
Cost per action
Cost per feature
Cost per active user or workflow
Cache hit rate
Percentage of workflow steps routed to premium models
Quality-adjusted unit cost
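Several of these KPIs reduce to simple ratios over counters most platforms already collect. The sketch below is one possible set of definitions, not a standard. In particular, "quality-adjusted unit cost" is assumed here to mean cost divided by the number of units that met the quality bar, and all input figures are invented.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Safe division used by the KPI helpers below."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def cost_per_action(total_cost: float, actions: int) -> float:
    return ratio(total_cost, actions)


def cache_hit_rate(cache_hits: int, cache_lookups: int) -> float:
    return ratio(cache_hits, cache_lookups)


def premium_routing_share(premium_steps: int, total_steps: int) -> float:
    return ratio(premium_steps, total_steps)


def quality_adjusted_unit_cost(total_cost: float, units: int, quality_rate: float) -> float:
    """Assumed definition: cost per unit that passed the quality bar."""
    return ratio(total_cost, units * quality_rate)


# Invented monthly counters for one workflow.
kpis = {
    "cost_per_action": cost_per_action(500.0, 1000),
    "cache_hit_rate": cache_hit_rate(300, 1000),
    "premium_routing_share": premium_routing_share(150, 1000),
    "quality_adjusted_unit_cost": quality_adjusted_unit_cost(500.0, 1000, 0.8),
}
print(kpis)
```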

BCG's six technology use-case families

1. Software development

Current state: Around two-thirds of respondents use AI in the software development lifecycle, with about 36% at scaled or full deployment. Current productivity improvement is reported at around 25%, with higher improvement expected at full scale.

Economics: Value comes from faster feature delivery, but only if review burden, rework, and quality are managed. The task-to-business-value chain is: code generation → review and correction → integration and testing → deployment → adoption → business outcome. Value can be lost at any step.

Key metrics: Review-to-generation ratio, change failure rate, rework rate, deployment frequency, lead time, escaped defects

2. Data management

Current state: AI is used for data profiling, classification, lineage tracking, quality validation, and metadata enrichment.

Economics: Value is often indirect. Better data management enables faster use-case delivery, improved model quality, and lower compliance risk. Direct cost reduction is harder to demonstrate.

Key metrics: Data discovery time, quality improvement, compliance coverage, manual effort avoided, time to deploy new use cases

3. Compliance and security monitoring

Current state: AI analyses logs, validates policies, detects anomalies, generates audit trails, and tests controls.

Economics: Regulatory and risk-driven. Value includes avoided penalties, faster audits, and reduced manual control effort. Cost must be justified against risk exposure, not only efficiency.

Key metrics: Coverage percentage, detection accuracy, false positive rate, time to detection, audit efficiency, control cost per transaction

4. Service desk automation

Current state: AI handles incident deflection, first-contact resolution, and knowledge retrieval. Examples show lower handling time and higher first-contact resolution.

Economics: High volume, low complexity. Value comes from scale, not individual transaction quality. Requires continuous quality monitoring to prevent degradation.

Key metrics: Deflection rate, deflection quality, first-contact resolution, cost per resolved ticket, escalation rate, user satisfaction
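Because value here depends on scale and on quality holding up, it can help to report deflection rate and deflection quality together. The sketch below assumes one possible definition of deflection quality, the share of deflected tickets that were not reopened, and uses invented counts.

```python
def service_desk_metrics(tickets: int, deflected: int, reopened: int,
                         resolved: int, total_cost: float) -> dict:
    """Combine volume, quality, and cost into one view (assumed definitions)."""
    if min(tickets, deflected, resolved) <= 0:
        raise ValueError("tickets, deflected, and resolved must be positive")
    deflection_rate = deflected / tickets
    deflection_quality = 1 - reopened / deflected  # share of deflections that stuck
    return {
        "deflection_rate": deflection_rate,
        "deflection_quality": deflection_quality,
        "effective_deflection": deflection_rate * deflection_quality,
        "cost_per_resolved_ticket": total_cost / resolved,
    }


# Invented monthly figures.
desk = service_desk_metrics(
    tickets=1000, deflected=400, reopened=80, resolved=950, total_cost=6000.0
)
print(desk)
```

In this example a headline deflection rate of 40% corresponds to an effective rate of 32% once reopened tickets are taken into account.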

5. Technology sourcing

Current state: AI assists with RFP drafting, vendor research, contract analysis, and knowledge retrieval.

Economics: Episodic rather than continuous. Value comes from faster, better-informed decisions rather than transaction volume. Requires integration with procurement systems and knowledge bases.

Key metrics: Sourcing cycle time, contract quality, knowledge reuse, analyst productivity, decision quality

6. Legacy modernisation

Current state: One case reports analysis of 3.1 million lines of code and more than 5,000 business rules in under three weeks, with a dramatic reduction in estimated effort.

Economics: High upfront cost with long-term value. Business case should include immediate delivery savings, avoided future maintenance, risk reduction, platform simplification, and improved change velocity. Requires explicit decommissioning milestones to realise value.

Key metrics: Analysis time, migration effort, decommissioning progress, risk reduction, future change velocity, avoided maintenance cost

The 5% scale problem: BCG reported that only about 5% of companies in its 2025 technology-function study were generating measurable AI value at scale. Around 60% reported no material value, while 35% were scaling and seeing some return. This suggests that AI in the technology function is expanding rapidly, but value realisation remains concentrated among early adopters with mature execution models. Engineering leaders should not quote the high end of productivity ranges without this caveat about scale.

The task-to-business-value chain

Task acceleration → Workflow throughput → Release outcome → Adoption → Business result

Value can be lost at each link in the chain:
Review bottlenecks: AI generates code faster, but review queues slow deployment
Rework: AI outputs require correction, consuming the time saved
Quality issues: Defects, rollbacks, or security findings create downstream cost
Architecture debt: Fast generation creates technical debt that slows future work
Unused output: Features ship but are not adopted
Unchanged priorities: Faster delivery does not change what gets built
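A rough way to reason about these losses is to treat each stage as retaining only a fraction of the task-level gain. The multiplicative model, the stage names, and the retention figures below are all assumptions made for illustration. Real losses are unlikely to be this independent.

```python
import math


def realised_gain(task_gain: float, retention_by_stage: dict) -> float:
    """Task-level gain multiplied by the fraction retained at every stage."""
    for stage, retained in retention_by_stage.items():
        if not 0.0 <= retained <= 1.0:
            raise ValueError(f"retention for {stage} must be between 0 and 1")
    return task_gain * math.prod(retention_by_stage.values())


# Hypothetical retention fractions for each link in the chain.
retention = {
    "review": 0.7,    # review queues absorb part of the speed-up
    "rework": 0.8,    # corrections consume some saved time
    "quality": 0.9,   # defects and rollbacks create downstream cost
    "adoption": 0.6,  # some shipped features go unused
}
gain = realised_gain(0.25, retention)
print(f"task-level gain 25.0%, realised gain {gain:.1%}")
```

Under these assumed fractions, a 25% task-level gain shrinks to roughly 7.6% at the business-result end of the chain.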

Current versus expected-at-scale distinction

Observed current outcome: What is actually happening now
Expected scaled outcome: What is projected at full adoption
Confidence range: The uncertainty in the projection
Assumptions required for conversion: What must change to reach the scaled outcome
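The four elements above can be kept together in a single record so that a projected figure is never reported without its observed baseline and assumptions. The class and field names below are hypothetical. The 25% observed value echoes the figure quoted earlier for software development, while the scaled value and the range are invented.

```python
from dataclasses import dataclass, field


@dataclass
class OutcomeEstimate:
    """Keeps an observed outcome, a projection, and its caveats in one place."""
    metric: str
    observed_current: float
    expected_scaled: float
    confidence_low: float
    confidence_high: float
    assumptions: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.confidence_low <= self.expected_scaled <= self.confidence_high:
            raise ValueError("expected_scaled must lie inside the confidence range")

    def gap(self) -> float:
        """Improvement still to be earned between today and full adoption."""
        return self.expected_scaled - self.observed_current

    def is_reportable(self) -> bool:
        """A projection without stated assumptions should not be reported."""
        return len(self.assumptions) > 0


estimate = OutcomeEstimate(
    metric="developer productivity improvement",
    observed_current=0.25,
    expected_scaled=0.40,   # invented for illustration
    confidence_low=0.30,    # invented for illustration
    confidence_high=0.50,   # invented for illustration
    assumptions=["review capacity scales with generation volume"],
)
print(round(estimate.gap(), 2), estimate.is_reportable())
```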

Model routing and small-model economics

Real routing economics (May 2026): Inferbase's enterprise pricing audit found that the same open-source model can vary up to 9x in fully loaded cost across different hosting providers once caching, output ratios, rate tiers, and side-charges are included. This means routing decisions must account for more than headline per-token rates. Engineering leaders should model fully loaded cost per task, not just API rate cards, when designing routing strategies.
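To see why headline rates can mislead, the sketch below computes a fully loaded cost per task from token counts, a cache hit rate, and a per-request side-charge. The providers, prices, and pricing structure are invented for illustration and are not taken from the Inferbase audit.

```python
from dataclasses import dataclass


@dataclass
class ProviderPricing:
    """Hypothetical rate card, in USD per million tokens plus a flat fee."""
    name: str
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float
    per_request_fee: float = 0.0


def cost_per_task(p: ProviderPricing, input_tokens: int, output_tokens: int,
                  cache_hit_rate: float) -> float:
    """Fully loaded cost of one task, including caching and side-charges."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    token_cost = (
        fresh * p.input_per_mtok
        + cached * p.cached_input_per_mtok
        + output_tokens * p.output_per_mtok
    ) / 1_000_000
    return token_cost + p.per_request_fee


providers = [
    # Lower headline input rate, but no cache discount and a per-request fee.
    ProviderPricing("provider_a", 0.50, 0.50, 1.50, per_request_fee=0.001),
    # Higher headline input rate, but a deep discount on cached input.
    ProviderPricing("provider_b", 0.60, 0.06, 1.20),
]
costs = {p.name: cost_per_task(p, 20_000, 1_000, cache_hit_rate=0.8) for p in providers}
cheapest = min(costs, key=costs.get)
print(costs, cheapest)
```

Here the provider with the higher headline input rate is the cheaper one per task, because the workload is cache-heavy and the other provider adds a per-request fee.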

Review, quality, security, and downstream capacity costs

Reuse and technology velocity measures

Time from idea to funded experiment
Time from approved use case to first production user
Time from production release to measurable outcome
Percentage of components reused across use cases
Percentage of portfolio spend redirected within a quarter
Decision latency for data, risk, security, and architecture approvals
Learning cost per validated or rejected hypothesis
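Most of these measures are elapsed times or simple ratios, so they can be computed from milestone dates and a few portfolio counters. The milestone names, dates, and amounts below are hypothetical.

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    """Elapsed calendar days between two milestones."""
    if end < start:
        raise ValueError("end must not precede start")
    return (end - start).days


def learning_cost_per_hypothesis(experiment_spend: float, validated: int, rejected: int) -> float:
    """Spend divided by every hypothesis that reached a verdict, either way."""
    decided = validated + rejected
    if decided <= 0:
        raise ValueError("at least one hypothesis must be validated or rejected")
    return experiment_spend / decided


# Hypothetical milestones for one use case.
approved = date(2026, 1, 12)
first_production_user = date(2026, 3, 2)
measurable_outcome = date(2026, 4, 13)

approval_to_production = days_between(approved, first_production_user)
production_to_outcome = days_between(first_production_user, measurable_outcome)
reuse_percentage = 100 * 6 / 20  # 6 of 20 components reused across use cases
learning_cost = learning_cost_per_hypothesis(90_000.0, validated=4, rejected=8)
print(approval_to_production, production_to_outcome, reuse_percentage, learning_cost)
```

Counting rejected hypotheses in the denominator reflects the wording of the measure above: a quick, cheap rejection is also a learning outcome.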

What product and engineering should align on

The Optimist's Case

The Sceptic's Case

The practical conclusion

References and further reading

BCG, The Widening AI Value Gap: Build for the Future, 2025

FinOps Foundation, FinOps for AI: Scopes and Capabilities, 2025

AWS, Closing the AI Value Gap, 2024

Fortune, Uber COO: AI spending on tokens like Claude Code is hard to justify, 26 May 2026

BCG, How AI is paying off in the technology function

IBM, The enterprise in 2030

McKinsey, Global Tech Agenda 2026

AI TCO Framework

FinOps for AI

Five-Level AI Economics Maturity Model
