Agentic AI Economics: Why Your Existing Frameworks Are Already Obsolete

The hidden assumption in every framework you are using

Agentic deployment in practice: In 2024 Klarna deployed an AI customer service agent that, by the company's own account, handled work equivalent to that of 700 full-time agents. The system operated autonomously across multiple conversation turns, accessing customer data, making policy decisions, and resolving most issues without human intervention. This is a fundamentally different economic model from AI assistants that support human agents. By May 2025, however, Klarna had scaled back its AI customer service claims and resumed hiring human agents, which illustrates that initial agentic deployments require ongoing validation and adjustment.

Why the cost model is wrong

The cost unit is no longer the prompt

A single agentic task can trigger a whole sequence of model calls:
An initial planning call to decompose the objective into subtasks
Retrieval calls to gather relevant context from internal systems
Multiple reasoning calls as the agent works through sub-problems
Tool invocations (API calls, code execution, database queries) and the resulting context that must be processed
Verification calls to check intermediate outputs against quality or compliance criteria
Error recovery calls when intermediate steps produce unexpected results or require revision
A final synthesis call to produce the output
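
The call types listed above can be tallied per task rather than per prompt. The following is a minimal sketch, assuming a hypothetical event log in which every model call carries a task ID, a call type, and a token count. The `CallEvent` type, the category names, and the flat price per 1,000 tokens are illustrative assumptions, not the schema of any particular platform.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical call categories mirroring the list above.
CALL_TYPES = (
    "planning", "retrieval", "reasoning", "tool",
    "verification", "error_recovery", "synthesis",
)

@dataclass(frozen=True)
class CallEvent:
    task_id: str
    call_type: str
    tokens: int

def cost_per_task(events, price_per_1k_tokens):
    """Return {task_id: {call_type: cost}} so the task is the cost unit."""
    totals = defaultdict(lambda: defaultdict(float))
    for e in events:
        if e.call_type not in CALL_TYPES:
            raise ValueError(f"unknown call type: {e.call_type}")
        totals[e.task_id][e.call_type] += e.tokens / 1000 * price_per_1k_tokens
    return {task: dict(by_type) for task, by_type in totals.items()}

# Made-up token counts and price, for illustration only.
events = [
    CallEvent("task-1", "planning", 2000),
    CallEvent("task-1", "reasoning", 6000),
    CallEvent("task-1", "error_recovery", 4000),
    CallEvent("task-1", "synthesis", 2000),
]
breakdown = cost_per_task(events, price_per_1k_tokens=0.5)
task_total = sum(breakdown["task-1"].values())
recovery_share = breakdown["task-1"]["error_recovery"] / task_total
```

Grouping by task and call type makes the share of spend going to error recovery visible, which a per-prompt view would hide.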

At a glance

20–50 model calls for a well-designed, moderately complex task
200–500 model calls for a poorly designed version of the same task
Up to 900% cost variance between equivalent agent implementations

Analysis · Agentic AI Economics

How Agentic Systems Break the Cost Model

Every AI economics framework in use today assumes one human request generates one model response. Agentic systems break this assumption — one business objective can trigger 20–200 model calls, with cost variance up to 10× for equivalent output quality.

Traditional AI assistant: request-response pattern. A human initiates, the model responds. 1 request = 1 API call.
Agentic system: the same business task, executed autonomously over multiple steps. 1 task = 20–200 API calls.

Planning overhead: 1–15% of total token consumption
Tool invocations: 20–40% of total cost at scale
Error recovery: 20–40%, invisible in most FinOps dashboards
Implementation variance: 5–10× cost difference for the same task quality

The cost unit is no longer the prompt; it is the task. Agentic systems require a new cost model that tracks orchestration overhead, tool invocation cost, error recovery consumption, and verification loops separately. Two agent implementations producing equivalent outputs can differ by up to 900% in token cost (roughly 10×), a gap that frameworks built for request-response AI cannot see.

Illustrative. Call counts are indicative ranges based on published AWS agentic workflow research and AI Economics Hub analysis. Error recovery estimates based on observed production agentic patterns. Sources: AWS ML blog (agentic optimisation), FinOps Foundation emerging guidance on agentic AI (2026), AI Economics Hub analysis.
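
The figures above quote implementation variance both as a multiple (5–10×) and as a percentage (up to 900%). A short sketch of how the two relate, using made-up per-task costs in arbitrary units; the function names are illustrative:

```python
def cost_ratio(cost_a, cost_b):
    """How many times more expensive the costlier implementation is."""
    low, high = sorted((cost_a, cost_b))
    if low <= 0:
        raise ValueError("costs must be positive")
    return high / low

def percent_increase(cost_a, cost_b):
    """Extra cost of the costlier implementation, as a percentage of the cheaper one."""
    return (cost_ratio(cost_a, cost_b) - 1) * 100

# Hypothetical per-task costs for two implementations with equivalent output.
lean, wasteful = 2, 20
ratio = cost_ratio(lean, wasteful)
increase = percent_increase(lean, wasteful)
```

Under this reading, a 10× ratio and a 900% increase describe the same gap, so the two figures are consistent rather than competing estimates.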

Shared infrastructure cost changes character

Error recovery is expensive and invisible

Why the ROI model is wrong

Attribution breaks in multi-step autonomous systems

The human review assumption may not hold

Scale beyond human review: Klarna reported that its agentic customer service system handled 2.3 million conversations in its first month, about two-thirds of the company's customer service chats. At this scale, reviewing individual transactions is not economically viable. Value measurement shifted to aggregate metrics: resolution rate, customer satisfaction, escalation patterns, and error frequency. The subsequent scaling back of claims in 2025 underscores the importance of rigorous ongoing measurement at this scale.

Why the governance model is wrong

Stage gates were designed for human-paced change

The Optimist's Case

The Sceptic's Case

Existing controls assume human-initiated risk

Framework · Agentic AI Economics

The Agent Autonomy Ladder

The economically optimal autonomy level minimises total cost: execution + review + delay + expected error and remediation. High oversight and no oversight both carry high total costs — for different reasons. The sweet spot is levels 4–5 for high-volume, reversible decisions.

[Chart: one stacked bar per autonomy level, split into execution cost, review cost, delay cost, and error and remediation cost. Bar length = total economic cost (relative units).]

The decision cost equation

Total cost = Execution cost + Review cost + Delay cost + Expected error and remediation cost

The right autonomy level is the one that minimises this sum for a given decision type, risk, and reversibility.

Illustrative framework — AI Economics Hub, based on Deloitte 2026 decision-making research (one-way/two-way door distinction), BCG AI Radar 2026 (58% of leading organisations expect agent autonomy to require governance changes), and AI Economics Hub analysis of agentic cost structures. Cost units are relative and scenario-dependent. Actual optimal level depends on decision volume, reversibility, error consequence, and value of speed.
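
The decision cost equation can be sketched in a few lines. This is a minimal illustration, assuming that expected error and remediation cost is modelled as error probability times remediation cost. That decomposition, the `LevelCost` type, and all numbers (relative units) are assumptions made for the example, not values from the framework.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LevelCost:
    level: int
    execution: float
    review: float
    delay: float
    error_probability: float  # chance that an error executes
    remediation: float        # cost of fixing one executed error

    @property
    def total(self):
        expected_error = self.error_probability * self.remediation
        return self.execution + self.review + self.delay + expected_error

def optimal_level(options):
    """Pick the autonomy level with the lowest total cost."""
    return min(options, key=lambda o: o.total)

# Made-up relative costs for one decision type at three autonomy levels.
options = [
    LevelCost(level=1, execution=1, review=8, delay=6, error_probability=0.01, remediation=100),
    LevelCost(level=4, execution=1, review=2, delay=1, error_probability=0.03, remediation=100),
    LevelCost(level=6, execution=1, review=0.5, delay=0, error_probability=0.10, remediation=100),
]
best = optimal_level(options)
```

With these invented inputs the middle option wins: level 1 is dominated by review and delay cost, level 6 by expected error cost. Different volumes or remediation costs would move the minimum.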

What an agentic AI economics framework needs to include

On cost:

On ROI:

On governance:

On risk:

The autonomy ladder: matching economics to decision rights

Six levels of agent autonomy

1. Recommend: The agent analyses the situation and proposes an action, but a human must approve before execution.

Execution cost: Low (single recommendation generation)
Review cost: High (human must evaluate every recommendation)
Delay cost: High (queueing for human approval)
Error cost: Low (human catches errors before execution)
Best for: High-impact irreversible decisions, regulated actions, novel situations

2. Draft: The agent prepares complete work products (documents, code, communications) that humans review and approve before use.

Execution cost: Medium (complete draft generation)
Review cost: Medium-high (human reviews full output)
Delay cost: Medium (review queue, but work is complete)
Error cost: Low-medium (errors caught in review)
Best for: Customer communications, regulatory filings, code commits, policy documents

3. Execute with approval: The agent completes the task and presents it for approval, but can execute immediately upon approval without rework.

Execution cost: Medium (complete execution, held pending approval)
Review cost: Medium (human approves or rejects)
Delay cost: Medium (approval queue)
Error cost: Medium (errors may execute if approval is cursory)
Best for: Routine transactions with financial or operational consequence

4. Execute within limits: The agent acts autonomously within defined boundaries (spend limits, data access, communication scope) and escalates exceptions.

Execution cost: Low (autonomous execution)
Review cost: Low (exception review only)
Delay cost: Low (no queue for routine cases)
Error cost: Medium (errors execute but are bounded)
Exception rate: Critical metric (high rate indicates poor boundary design)
Best for: High-volume routine decisions with clear rules and bounded risk

5. Execute and report: The agent acts autonomously and reports actions for audit and pattern review, but does not require approval.

Execution cost: Low (fully autonomous)
Review cost: Very low (periodic audit only)
Delay cost: None
Error cost: Medium-high (errors execute and may compound)
Override rate: Key metric (human intervention frequency)
Best for: Low-risk reversible decisions, monitoring and alerting, data processing

6. Autonomous with monitoring: The agent operates continuously with automated monitoring for anomalies, cost thresholds, or quality degradation.

Execution cost: Very low (no human involvement)
Review cost: Very low (automated monitoring)
Delay cost: None
Error cost: High (errors may propagate before detection)
Value of faster action: Must significantly exceed error risk
Best for: Real-time systems, high-volume low-risk decisions, time-sensitive responses
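
Levels 4 and 5 name exception rate and override rate as their key metrics. A minimal sketch of how both could be computed, assuming a hypothetical decision log in which each entry records whether the agent escalated the case and whether a human later overrode it. The field names are illustrative, and a real log would likely track the two metrics for different agents or levels.

```python
def rate(decisions, flag):
    """Share of logged decisions where the given boolean flag is set."""
    if not decisions:
        raise ValueError("no decisions logged")
    return sum(1 for d in decisions if d[flag]) / len(decisions)

# Made-up log entries, for illustration only.
decisions = [
    {"escalated": False, "overridden": False},
    {"escalated": True, "overridden": False},
    {"escalated": False, "overridden": True},
    {"escalated": False, "overridden": False},
]
exception_rate = rate(decisions, "escalated")  # level 4: share escalated as exceptions
override_rate = rate(decisions, "overridden")  # level 5: share reversed by a human
```

A rising exception rate suggests the boundaries need redesign; a rising override rate suggests the agent is acting where it should not.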

Economics by autonomy level

For high-volume routine decisions with low error consequence, autonomous execution (levels 5–6) is economically optimal. The cost of human review exceeds the cost of occasional errors.

For high-impact irreversible decisions, recommendation or draft modes (levels 1–2) are economically optimal despite higher review and delay costs. The cost of a single significant error exceeds the cumulative cost of review.

For decisions in between, the optimal level depends on:

Volume: Higher volume favours autonomy (review cost compounds)
Reversibility: Easier reversal favours autonomy (error cost is bounded)
Value of speed: Time-sensitive decisions favour autonomy (delay cost is high)
Error consequence: Higher consequence favours human approval (error cost is high)
Exception rate: High exception rate indicates boundaries are poorly defined (redesign before increasing autonomy)
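
The interaction of these factors can be made concrete with a break-even calculation. The sketch below assumes that moving to a more autonomous level saves review and delay cost on every decision, adds some error probability, and requires a fixed monitoring cost per period. The fixed-cost term and all numbers are assumptions introduced for illustration.

```python
def net_saving_per_decision(review_saving, delay_saving,
                            added_error_probability, error_cost):
    """Per-decision saving from more autonomy, net of extra expected error cost."""
    return review_saving + delay_saving - added_error_probability * error_cost

def break_even_volume(fixed_monitoring_cost, net_saving):
    """Decisions per period needed to cover the fixed cost; None if never."""
    if net_saving <= 0:
        return None  # errors cost more than review saves at any volume
    return fixed_monitoring_cost / net_saving

# Made-up relative costs: a reversible decision versus a costly-to-reverse one.
reversible = net_saving_per_decision(2.0, 1.0, 0.01, error_cost=100)
irreversible = net_saving_per_decision(2.0, 1.0, 0.01, error_cost=500)

volume_needed = break_even_volume(5000, reversible)
never = break_even_volume(5000, irreversible)
```

In this toy case the reversible decision pays for autonomy once volume passes the break-even point, while the higher error consequence never does, which matches the direction of the factors listed above.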

Expected agent growth: In IBM's late-2025 Enterprise 2030 survey of 2,000 executives, respondents expected agentic AI to become embedded across finance, HR, sales, marketing, IT, manufacturing, supply chain, and R&D by 2030, with substantial increases from 2025 levels. However, these are executive expectations about future adoption, not observed deployment outcomes. Organisations should treat these as directional indicators of intent rather than validated forecasts.

Decision-rights module: who decides what

What objectives is this system authorised to pursue?
What systems can it access?
What can it commit or communicate autonomously?
What requires human approval?
Who is the escalation owner for exceptions?
What monitoring is required?
What triggers a stop or capability reduction?
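
The seven questions above map naturally onto a structured record per agent. A minimal sketch, assuming a hypothetical `DecisionRights` type with one field per question. The field names and the example refund agent are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class DecisionRights:
    objectives: list           # what the system is authorised to pursue
    accessible_systems: list   # what it can access
    autonomous_actions: list   # what it can commit or communicate on its own
    approval_required: list    # what needs human approval
    escalation_owner: str      # who owns exceptions
    monitoring: list           # what monitoring is required
    stop_triggers: list        # what triggers a stop or capability reduction

    def unanswered(self):
        """Names of the questions that are still blank."""
        return [name for name, value in vars(self).items() if not value]

# Hypothetical example: a refund agent with no escalation owner assigned yet.
refund_agent = DecisionRights(
    objectives=["resolve refund requests"],
    accessible_systems=["order database"],
    autonomous_actions=["refunds up to a set limit"],
    approval_required=["refunds above the limit"],
    escalation_owner="",
    monitoring=["daily refund totals"],
    stop_triggers=["refund total exceeds daily threshold"],
)
gaps = refund_agent.unanswered()
```

Checking for blank answers before deployment is one simple way to make sure no decision right is left implicit.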