Key takeaways
- Every enterprise AI economics framework currently in use assumes a simple request-response model: a human initiates, a model responds. Agentic systems — those that plan, decide, call tools, and act across multiple steps without human approval at each stage — break this assumption fundamentally.
- The economics of agentic systems are not worse versions of existing AI economics. They are structurally different. Cost predictability, attribution logic, ROI measurement, governance checkpoints, and risk exposure all require a different model.
- Token spend in agentic workflows is not a linear function of task count. It is a non-linear function of task complexity, error recovery, verification loops, and branching logic, and it can grow far faster than the number of tasks. The difference between a well-designed and a poorly designed agentic workflow can be an order of magnitude in cost, not a few percentage points.
- Most organisations currently deploying early agentic capabilities are doing so with governance designed for simpler AI patterns. The gap between current governance and appropriate governance for agentic deployment is not yet creating major visible failures — but it is creating exposure that will become apparent as these systems scale.
The hidden assumption in every framework you are using
Go back to any AI economics framework you have used in the past two years. The AI TCO model, the ROI calculation, the FinOps cost attribution practice, the portfolio governance approach. Each of them contains an implicit assumption that is so fundamental it is almost invisible: there is a human on one end of the interaction, and a model on the other.
The human has a task. The human queries, prompts, or requests. The model responds. The human reviews. The cost is the cost of that exchange. The value is the value of what that exchange produces. The governance question is whether the exchange is happening enough, at sufficient quality, at acceptable cost.
This is a reasonable model for a world of AI assistants — tools that help knowledge workers do their jobs faster. It is not a useful model for agentic systems.
In an agentic system, the AI does not wait for a human to formulate a request. It receives an objective — summarise this set of contracts, process this backlog of applications, monitor these systems and respond to anomalies, complete this research task — and it plans and executes a series of steps to reach that objective. Those steps may involve dozens or hundreds of model calls. They may involve retrieving information from external systems, calling APIs, writing and executing code, reviewing intermediate outputs, revising plans, and making subsidiary decisions. The human may specify the objective and review the final output. Everything in between is autonomous.
This is not a future capability. It is a current and rapidly expanding one. Enterprises deploying coding assistants with autonomous commit capability, autonomous research workflows, autonomous customer communication agents, and autonomous process orchestration systems are already operating in this mode. The economics they are applying to these systems are, in most cases, the economics designed for the generation before.
Why the cost model is wrong
The cost unit is no longer the prompt
In a request-response AI system, cost can be reasonably tracked per prompt or per interaction. Inference cost scales with the number of interactions and the average token depth of each one. This is predictable enough to forecast and govern.
In an agentic system, the cost unit is the task — and the relationship between task count and cost is non-linear. A single business task assigned to an agentic system may involve:
- An initial planning call to decompose the objective into subtasks
- Retrieval calls to gather relevant context from internal systems
- Multiple reasoning calls as the agent works through sub-problems
- Tool invocations (API calls, code execution, database queries) and the resulting context that must be processed
- Verification calls to check intermediate outputs against quality or compliance criteria
- Error recovery calls when intermediate steps produce unexpected results or require revision
- A final synthesis call to produce the output
Even a well-designed agentic workflow for a moderately complex task can consume 20-50 model calls. A poorly designed one — with verbose planning, inefficient retrieval, unnecessary verification loops, or fragile tool interactions that trigger repeated retries — can consume 200-500 calls for the same task.
The difference is not 20%. It is not 100%. At ten times the call volume it can be 900% or more, and it can occur between two different agent implementations that produce outputs of equivalent quality from the user's perspective. In the request-response world, the worst-case cost variance was bounded by the length of a prompt. In the agentic world, it is bounded by the complexity ceiling of the task and the efficiency of the implementation.
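The arithmetic behind that gap can be sketched in a few lines. This is a minimal illustration, not a pricing model: the call counts are the low ends of the two ranges above, while `avg_tokens_per_call`, the flat `PRICE`, and the `WorkflowProfile` type are hypothetical values and names introduced only for the example.

```python
from dataclasses import dataclass


@dataclass
class WorkflowProfile:
    """Hypothetical per-task call profile for one agent implementation."""
    name: str
    calls_per_task: int        # model calls needed to finish one business task
    avg_tokens_per_call: int   # input + output tokens, averaged over all calls


def cost_per_task(profile: WorkflowProfile, price_per_1k_tokens: float) -> float:
    """Model spend for one task, assuming a flat per-token price."""
    total_tokens = profile.calls_per_task * profile.avg_tokens_per_call
    return total_tokens / 1000 * price_per_1k_tokens


# Call counts follow the ranges in the text; token depth and price are invented.
lean = WorkflowProfile("well-designed", calls_per_task=20, avg_tokens_per_call=3000)
heavy = WorkflowProfile("poorly designed", calls_per_task=200, avg_tokens_per_call=3000)
PRICE = 0.01  # hypothetical blended price per 1,000 tokens

lean_cost = cost_per_task(lean, PRICE)
heavy_cost = cost_per_task(heavy, PRICE)
increase_pct = (heavy_cost - lean_cost) / lean_cost * 100

print(f"{lean.name}: {lean_cost:.2f} per task")
print(f"{heavy.name}: {heavy_cost:.2f} per task")
print(f"increase: {increase_pct:.0f}%")
```

Because both profiles share the same token depth and price, the percentage increase depends only on the ratio of call counts. In practice token depth per call also tends to grow as context accumulates, which this sketch deliberately ignores.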
Shared infrastructure cost changes character
In request-response AI, shared infrastructure — vector databases, retrieval systems, API orchestration — scales roughly proportionally with query volume. More users, more queries, more cost. The relationship is relatively predictable.
In agentic systems, shared infrastructure is under different pressure. Every agent action that touches an external system is a transaction: an API call, a database read, a file system operation, a web request. The cost of these transactions accumulates inside the agent's execution loop, often multiplied by the retries and refinements that are inherent to complex task execution. An agentic workflow that completes 1,000 business tasks may generate 100,000 infrastructure transactions, distributed across systems that were sized for much lower load.
This creates a new category of cost-scale risk: an agentic system that is performing as designed — completing tasks at the expected rate — can create infrastructure load that was not anticipated in any TCO model that was constructed with request-response patterns in mind.
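A rough way to surface this risk before deployment is to estimate transaction volume from the agent's execution profile rather than from user query volume. The sketch below assumes a simple multiplicative model; the function name and the per-task figures (80 tool actions, 1.25 attempts per action, 3 lookups for the baseline) are hypothetical, chosen only so that 1,000 tasks reproduce the 100,000 transactions mentioned above.

```python
def infrastructure_transactions(tasks: int,
                                tool_actions_per_task: int,
                                avg_attempts_per_action: float) -> int:
    """Estimate transactions hitting shared systems, retries included."""
    return round(tasks * tool_actions_per_task * avg_attempts_per_action)


# Hypothetical agentic profile: many tool actions per task, some of them retried.
agentic_load = infrastructure_transactions(
    tasks=1_000, tool_actions_per_task=80, avg_attempts_per_action=1.25
)

# Hypothetical request-response baseline: a few lookups per task, no retries.
baseline_load = infrastructure_transactions(
    tasks=1_000, tool_actions_per_task=3, avg_attempts_per_action=1.0
)

print(f"agentic: {agentic_load:,} transactions")
print(f"baseline: {baseline_load:,} transactions")
print(f"load multiple: {agentic_load / baseline_load:.1f}x")
```

The useful output is the load multiple: it indicates how far the shared systems' original sizing assumptions may be from the load an agentic workflow actually generates.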
Error recovery is expensive and invisible
Humans retry tasks occasionally, and when they do, it is visible. An agentic system retries continuously as part of its normal operation. When a tool call fails, the agent retries. When an intermediate result fails a quality check, the agent revises and retries. When the plan hits an unexpected constraint, the agent replans and retries. All of this retry behaviour generates model calls, infrastructure transactions, and latency — and almost none of it is visible in the output the human sees.
In a production agentic system under realistic operating conditions, 20-40% of the total token consumption may come from error recovery, replanning, and verification loops rather than from the direct work of completing the task. In poorly designed systems with fragile tool integrations or inadequate planning logic, this fraction can be higher.
Most TCO models do not have a line for error recovery spend. Most FinOps dashboards do not surface it. It is consumed quietly, at scale, inside what looks like normal usage.
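One way to make this spend visible is to tag every model call with its purpose and report overhead as a share of total tokens. The following is a minimal sketch under stated assumptions: the purpose labels, the `OVERHEAD_PURPOSES` set, and the sample log are all hypothetical, and a real system would need its orchestration layer to emit such tags.

```python
from collections import Counter

# Hypothetical purpose tags that count as overhead rather than direct task work.
OVERHEAD_PURPOSES = {"retry", "replan", "verify"}

# Hypothetical call log for one task: (purpose, tokens consumed).
call_log = [
    ("plan", 1200),
    ("retrieve", 800),
    ("reason", 2500),
    ("retry", 900),
    ("verify", 600),
    ("replan", 500),
    ("reason", 2000),
    ("synthesise", 1500),
]


def overhead_share(log):
    """Fraction of total tokens spent on retries, replanning, and verification."""
    tokens_by_purpose = Counter()
    for purpose, tokens in log:
        tokens_by_purpose[purpose] += tokens
    total = sum(tokens_by_purpose.values())
    overhead = sum(
        tokens for purpose, tokens in tokens_by_purpose.items()
        if purpose in OVERHEAD_PURPOSES
    )
    return overhead / total if total else 0.0


share = overhead_share(call_log)
print(f"overhead share of token spend: {share:.0%}")
```

Tracked per workflow over time, this single ratio gives a FinOps dashboard the line item it is otherwise missing.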
Why the ROI model is wrong
Attribution breaks in multi-step autonomous systems
In a simple AI assistant case, attribution is imperfect but tractable. The AI helped write the report. The report contributed to winning the contract. The contract contributed to revenue. The chain of reasoning is speculative but bounded.
In an agentic system completing complex tasks autonomously, attribution becomes a genuinely difficult problem. Consider an agentic research system that analyses 500 regulatory filings, identifies a risk pattern, flags it for review, and provides supporting evidence. A lawyer reviews the output, makes a decision, and the organisation avoids a compliance failure. What is the value of the agentic system's contribution?
The autonomous system made the task possible at all — the volume of filings could not be reviewed manually within the available time. But the system also failed to identify three relevant filings, which the lawyer caught during review. And the risk pattern identified would have been caught by a different process in a longer timeframe, at greater cost. So the value is the difference between the outcome with the system and the outcome without it — under a counterfactual that can only be estimated, not observed.
This is a philosophically hard problem, and it gets harder as agentic systems become more deeply embedded in decision-making processes. When the agent is making thousands of micro-decisions per day — each individually small, collectively shaping business outcomes — the notion of attributing value to any one system or intervention becomes analytically untenable.
This does not mean that value measurement is impossible or that ROI frameworks are useless. It means that the existing frameworks, which assume a cleanly attributable human-AI interaction, need to be replaced with portfolio-level value measurement that assesses overall outcome improvement rather than initiative-level attribution.
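As an illustration of what portfolio-level measurement might look like, the sketch below compares a domain's aggregate outcomes before and after agentic deployment instead of attributing value to individual agent actions. The `PeriodOutcome` type and all figures are hypothetical, and the choice of "cost per acceptable outcome" as the headline metric is an assumption rather than a prescribed standard.

```python
from dataclasses import dataclass


@dataclass
class PeriodOutcome:
    """Aggregate results for one domain over one measurement period."""
    outcomes_completed: int    # e.g. filings reviewed
    outcomes_acceptable: int   # those that passed a sampled quality audit
    total_cost: float          # all-in cost: people, model spend, infrastructure

    @property
    def quality_rate(self) -> float:
        return self.outcomes_acceptable / self.outcomes_completed

    @property
    def cost_per_acceptable(self) -> float:
        return self.total_cost / self.outcomes_acceptable


# Hypothetical figures for the same domain before and after deployment.
before = PeriodOutcome(outcomes_completed=200, outcomes_acceptable=190, total_cost=40_000.0)
after = PeriodOutcome(outcomes_completed=500, outcomes_acceptable=460, total_cost=46_000.0)

for label, period in (("before", before), ("after", after)):
    print(f"{label}: volume={period.outcomes_acceptable}, "
          f"quality={period.quality_rate:.0%}, "
          f"cost per acceptable outcome={period.cost_per_acceptable:.2f}")
```

In this invented example, volume rises and unit cost falls while the quality rate dips slightly, which is exactly the kind of trade-off a portfolio-level view exposes and an initiative-level attribution model would miss.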
The human review assumption may not hold
Most AI ROI models implicitly assume that humans review AI outputs before consequential decisions are made. The value chain is: AI accelerates task completion → human reviews and decides → organisation benefits.
Agentic systems increasingly operate outside this model. When an agentic system is processing 10,000 customer communication responses per day, or executing 500 research tasks per week, or monitoring and routing 50,000 service events per day, the volume of AI outputs exceeds any realistic human review capacity. The governance model that supports a request-response AI interaction — human review at the point of output — does not scale to agentic operation.
This has two economic consequences. First, the value of AI judgment becomes higher (because less human judgment is being applied alongside it) and so does the cost of AI error (because fewer human checkpoints exist to catch mistakes before they become consequential). Second, the measurement of value requires looking at aggregate outcome quality rather than individual transaction quality — which requires measurement infrastructure that most organisations do not yet have.
Why the governance model is wrong
Stage gates were designed for human-paced change
The governance frameworks in this publication — stage gates, proof thresholds, portfolio reviews, stop-or-scale decisions — were designed for a world where AI capability changes are delivered in development cycles and released to users who interact with them at human speed.
Agentic systems can change the scale and character of their impact much faster than this. An agentic system deployed with one task type can be redirected to a different task type in hours. An agentic system that was generating low volume can be scaled to high volume without a release cycle. The governance cadence that is adequate for quarterly portfolio reviews is not adequate for systems that can materially change their economic and operational impact between meetings.
This is not an argument against portfolio governance — it is an argument for tighter operating loops. Agentic systems need operational governance at the task level (what objectives is the system allowed to pursue?) and at the tool level (what external systems can the agent interact with, and under what constraints?), not only at the portfolio level.
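Task-level and tool-level governance can be expressed as an explicit policy that is checked before every agent action. The sketch below is one hypothetical shape for such a check; `AgentPolicy`, `authorise`, the decision strings, and the objective and tool names are all invented for illustration and do not correspond to any particular agent framework.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical authorisation policy for one deployed agent."""
    allowed_objectives: FrozenSet[str]                  # task-level scope
    allowed_tools: FrozenSet[str]                       # tool-level scope
    approval_required_tools: FrozenSet[str] = frozenset()
    max_commitment: float = 0.0                         # spend limit per action


def authorise(policy: AgentPolicy, objective: str, tool: str,
              commitment: float = 0.0) -> str:
    """Return 'allow', 'needs_human_approval', or 'deny' for a proposed action."""
    if objective not in policy.allowed_objectives:
        return "deny"
    if tool not in policy.allowed_tools:
        return "deny"
    if tool in policy.approval_required_tools or commitment > policy.max_commitment:
        return "needs_human_approval"
    return "allow"


policy = AgentPolicy(
    allowed_objectives=frozenset({"summarise_contracts"}),
    allowed_tools=frozenset({"document_store.read", "email.send"}),
    approval_required_tools=frozenset({"email.send"}),
)

print(authorise(policy, "summarise_contracts", "document_store.read"))
print(authorise(policy, "summarise_contracts", "email.send"))
print(authorise(policy, "book_travel", "document_store.read"))
```

The design choice worth noting is that the policy defaults to denial: an objective or tool that nobody has explicitly authorised is refused, and anything that communicates or commits resources is routed to a human.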
Existing controls assume human-initiated risk
Almost all enterprise AI risk governance is built around the assumption that risk materialises because a human makes a bad decision using AI assistance. A model produces a hallucination; the human incorporates it uncritically; an error results. The control is human review quality.
In agentic systems, risk can materialise without a human decision point. An agent that is authorised to write and execute code can introduce a software vulnerability before any human has reviewed the code. An agent that is authorised to interact with external systems can send communications, trigger transactions, or access data before the consequence of those actions is visible to any oversight function. An agent that is authorised to book travel or commit resources can make commitments that create financial obligations before a human has validated that the commitment is appropriate.
These are not exotic failure scenarios. They are natural consequences of giving autonomous systems the capability to act rather than merely to advise. The governance frameworks that manage AI risk in the request-response world — output review, explainability requirements, human-in-the-loop design — require fundamental redesign for systems that act autonomously.
What an agentic AI economics framework needs to include
The frameworks on this site remain useful starting points — but they require extension to address agentic deployment.
On cost: The TCO model needs a layer for orchestration economics: the cost of planning loops, tool invocations, error recovery, and verification passes that occur inside the agent's execution. This is distinct from model inference cost and from infrastructure cost as traditionally defined. It should be tracked at the workflow level, with explicit measurement of the ratio between task-completion calls and overhead calls.
On ROI: Attribution models need to shift from initiative-level to portfolio-level measurement. The question is not "what did this agentic system contribute to this outcome?" — that question will often be unanswerable. The question is "what is the quality and volume of outcomes in this domain, compared with what it was before agentic deployment, at what total cost?" This is a different measurement design, but it is a tractable one.
On governance: Portfolio reviews need to be supplemented by operational governance at the task-authorisation and tool-access level. Before deploying an agentic system, the organisation needs to specify: what objectives is this system authorised to pursue? What systems can it access? What can it commit or communicate autonomously? What requires human approval? These are governance design questions, not technology questions — and they need to be answered before deployment, not after the first incident.
On risk: Tail-risk exposure from agentic systems is higher than from request-response systems, because autonomous action can propagate errors faster and further than human-reviewed output. This exposure should be included in economic analysis. The expected cost of a significant agentic failure (in operational disruption, regulatory consequence, reputational damage, or financial commitment) should be estimated and weighed against the expected operating benefit.
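The weighing described in the risk point can be sketched as a simple expected-value calculation. The probabilities, loss amounts, and benefit figure below are hypothetical placeholders, and the function name is invented for the example; an expected value also understates tail risk, so this should be read as a starting point rather than a complete risk model.

```python
def risk_adjusted_benefit(annual_benefit, failure_scenarios):
    """Expected operating benefit minus probability-weighted failure losses.

    failure_scenarios is a list of (annual_probability, loss_if_it_occurs) pairs.
    """
    expected_loss = sum(probability * loss for probability, loss in failure_scenarios)
    return annual_benefit - expected_loss


# Hypothetical scenarios: a moderate incident and a rare severe one.
scenarios = [
    (0.05, 2_000_000),    # e.g. operational disruption
    (0.01, 10_000_000),   # e.g. regulatory consequence
]

net = risk_adjusted_benefit(annual_benefit=1_500_000, failure_scenarios=scenarios)
print(f"risk-adjusted annual benefit: {net:,.0f}")
```

Even with rough inputs, writing the scenarios down forces the estimate the text calls for and makes it visible when a small change in failure probability would erase the operating benefit.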
The practical implication
Organisations deploying agentic systems now — coding agents, research agents, customer communication agents, process orchestration agents — are doing so with governance designed for their predecessors. Most of them know this. Most of them are proceeding anyway, for competitive or operational reasons that are genuinely compelling.
That is a defensible choice. It is not a safe one. The gap between agentic governance requirements and current agentic governance practice is where the next generation of AI economic failures will originate.
The organisations that get ahead of this will not be the ones that deploy agentic systems most aggressively. They will be the ones that pair aggressive deployment with honest assessment of what their current governance frameworks can and cannot adequately cover — and invest proportionately in closing the gap before scale forces the lesson on them.