Why Cheaper AI Will Cost More
Standfirst
The most dangerous AI budget assumption is that falling model prices will solve the cost problem. They may do the opposite.
1. The comforting forecast
AI buyers are repeatedly told that models and hardware will become more efficient. That is likely.
The conclusion usually follows automatically: AI will become cheaper.
At the level of an individual model call, this may be true. At the level of an enterprise budget, it is not guaranteed.
Deloitte frames the problem through Jevons' paradox: efficiency lowers the cost of using a resource, which can expand consumption enough that total use of the resource rises rather than falls.[^deloitte] This is not a law that mechanically predicts every AI budget. It is a warning about behaviour and system design.
The relevant equation is simple:
Total AI cost = unit cost × volume × workflow complexity × operating overhead
A fall in the first term can be overwhelmed by growth in the others.
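As a rough illustration of that arithmetic, the sketch below multiplies the four terms for a "before" and an "after" state. Every figure is hypothetical, and reading workflow complexity as calls per task and operating overhead as a multiplier is an assumption made for the example, not a definition from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostDrivers:
    unit_cost: float            # cost per model call (assumed currency units)
    volume: float               # tasks per period
    workflow_complexity: float  # assumed to mean average model calls per task
    operating_overhead: float   # assumed multiplier for review and tooling (1.0 = none)


def total_cost(d: CostDrivers) -> float:
    """Multiply the four terms of the cost equation."""
    return d.unit_cost * d.volume * d.workflow_complexity * d.operating_overhead


# Hypothetical figures, chosen only to show the arithmetic.
before = CostDrivers(unit_cost=0.010, volume=100_000,
                     workflow_complexity=2, operating_overhead=1.2)
after = CostDrivers(unit_cost=0.002, volume=300_000,
                    workflow_complexity=6, operating_overhead=1.5)

cost_before = total_cost(before)
cost_after = total_cost(after)

print(f"before: {cost_before:,.0f}")
print(f"after:  {cost_after:,.0f}")
print(f"unit cost change:  {after.unit_cost / before.unit_cost - 1:+.0%}")
print(f"total cost change: {cost_after / cost_before - 1:+.0%}")
```

With these invented inputs the unit cost falls by 80% while total cost rises from about 2,400 to about 5,400, an increase of 125%.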
2. Five demand multipliers
1. More users
Lower prices reduce the friction of giving more employees and customers access to AI.
2. More use cases
Tasks that failed a cost threshold become viable. Summarisation expands into document analysis, then into continuous monitoring and autonomous action.
3. More calls per task
Agentic systems plan, retrieve, call tools, verify, retry and reflect. One user request can create a chain of model interactions.
4. More context and memory
Systems ingest larger documents, histories, customer records and multimodal inputs. The unit of demand expands even when the user count stays flat.
5. Higher quality expectations
As AI becomes embedded in material workflows, organisations may move from cheap models to better reasoning, verification and redundancy. The cost per successful outcome can rise even while list prices fall.
3. Human demand and machine demand are different
Traditional SaaS demand is anchored to people:
- users
- seats
- sessions
- working hours
Agent demand is anchored to objectives and permissions.
An agent can:
- monitor continuously
- retry automatically
- delegate to other agents
- generate its own subtasks
- operate outside human working hours
- respond to machine-generated events
This changes the forecasting denominator.
Cost no longer scales with headcount. A 10% increase in employee adoption may not mean a 10% increase in cost, while a small increase in agent autonomy can create a much larger increase in calls.
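The contrast can be sketched with two toy demand models. Both functions, their parameters and all the numbers are hypothetical. The agent model assumes every task delegates to a fixed number of subtasks and that each retry happens with a fixed probability, which real systems will not follow exactly.

```python
def human_calls(users: int, sessions_per_user: float, calls_per_session: float) -> float:
    """Seat-anchored demand: scales linearly with people."""
    return users * sessions_per_user * calls_per_session


def agent_calls(events: int, steps_per_task: int, max_retries: int,
                retry_probability: float, fan_out: int, depth: int) -> float:
    """Objective-anchored demand: each event triggers a tree of delegated tasks."""
    # Tasks in a delegation tree where every task spawns `fan_out` subtasks,
    # down to `depth` levels below the root task.
    tasks_per_event = sum(fan_out ** level for level in range(depth + 1))
    # Expected attempts per step when each failed attempt is retried
    # with a fixed probability, up to `max_retries` times.
    attempts_per_step = sum(retry_probability ** k for k in range(max_retries + 1))
    return events * tasks_per_event * steps_per_task * attempts_per_step


# Human demand: 10% more users means 10% more calls.
human_growth = human_calls(1_100, 20, 5) / human_calls(1_000, 20, 5)

# Agent demand: one extra level of delegation, everything else unchanged.
shallow = agent_calls(events=10_000, steps_per_task=4, max_retries=2,
                      retry_probability=0.2, fan_out=3, depth=1)
deeper = agent_calls(events=10_000, steps_per_task=4, max_retries=2,
                     retry_probability=0.2, fan_out=3, depth=2)
agent_growth = deeper / shallow

print(f"human call growth: {human_growth:.2f}x")
print(f"agent call growth: {agent_growth:.2f}x")
```

Under these assumptions, allowing one more level of delegation multiplies calls by 3.25 with no change in users or events.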
4. The rebound loop
A typical enterprise optimisation programme may work like this:
- Engineering routes simple work to a smaller model.
- Cost per call falls.
- The business expands the workflow to more cases.
- More context and tool calls are added.
- The agent is allowed to retry more often.
- Total cost rises.
- The programme reports successful unit-cost optimisation and a budget overrun at the same time.
The two results do not contradict each other.
This is the AI rebound loop.
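The loop can be traced step by step. In the sketch below each step applies a multiplier to cost per call, number of cases or calls per case. The starting values and multipliers are invented for illustration, and the model ignores effects such as longer context raising the price of each call.

```python
# Each step: (description, cost-per-call multiplier, cases multiplier,
# calls-per-case multiplier). All multipliers are hypothetical.
STEPS = [
    ("route simple work to a smaller model", 0.4, 1.0, 1.0),
    ("expand the workflow to more cases", 1.0, 2.5, 1.0),
    ("add context and tool calls", 1.0, 1.0, 1.5),
    ("allow more retries", 1.0, 1.0, 1.3),
]


def run_loop(cost_per_call: float, cases: float, calls_per_case: float, steps):
    """Return (description, unit cost, total cost) after each step."""
    history = [("start", cost_per_call, cost_per_call * cases * calls_per_case)]
    for description, price_mult, case_mult, call_mult in steps:
        cost_per_call *= price_mult
        cases *= case_mult
        calls_per_case *= call_mult
        total = cost_per_call * cases * calls_per_case
        history.append((description, cost_per_call, total))
    return history


history = run_loop(cost_per_call=0.01, cases=10_000, calls_per_case=3.0, steps=STEPS)
for description, unit, total in history:
    print(f"{description:<40} unit={unit:.4f} total={total:,.0f}")
```

In this run the unit cost ends 60% lower than it started while total cost ends roughly 95% higher, so both lines of the programme report are accurate.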
5. Why savings do not save themselves
A model-routing improvement may reduce the cost of an existing workflow.
Three things can happen.
Savings are banked
Budget or capacity is removed. The financial benefit is realised.
Savings are reinvested deliberately
The organisation funds additional high-value work. Cost stays flat or rises, but expected value rises faster.
Savings disappear into consumption
Usage expands without a portfolio decision. The unit-cost improvement is real, but no financial or strategic value is captured.
Many organisations do not distinguish these outcomes clearly, so the third can pass unnoticed.
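One way to keep the three outcomes apart is a simple ledger that starts from the saving expected at constant volume and subtracts what was explicitly banked or reinvested. The function name, its inputs and the figures are hypothetical, and treating the remainder as absorbed by consumption is a simplifying assumption.

```python
def savings_ledger(expected_saving: float, budget_removed: float,
                   approved_reinvestment: float) -> dict:
    """Split a unit-cost saving into banked, reinvested and absorbed parts.

    expected_saving: saving the optimisation should deliver at constant volume.
    budget_removed: budget or capacity actually taken out (banked).
    approved_reinvestment: spend explicitly approved for new work (reinvested).
    Whatever remains is assumed to have been absorbed by unplanned usage.
    """
    if budget_removed < 0 or approved_reinvestment < 0:
        raise ValueError("banked and reinvested amounts must be non-negative")
    absorbed = expected_saving - budget_removed - approved_reinvestment
    return {
        "banked": budget_removed,
        "reinvested": approved_reinvestment,
        "absorbed": max(absorbed, 0.0),
    }


# Hypothetical example: 100,000 of expected saving, of which only half
# is covered by an explicit decision.
ledger = savings_ledger(expected_saving=100_000,
                        budget_removed=20_000,
                        approved_reinvestment=30_000)
print(ledger)
```

Here 50,000 of the saving has no decision attached to it, which is the amount a portfolio review would need to explain.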
6. Forecast demand as a range
A single-point annual AI budget is too fragile. Forecast with scenarios instead.
Baseline scenario
- current users and workflows
- expected adoption
- known provider pricing
- current model mix
Expansion scenario
- more functions onboarded
- longer context
- more agent loops
- increased multimodal use
- higher quality requirements
Autonomy scenario
- event-driven agents
- agent-to-agent delegation
- continuous monitoring
- retries and verification
- tool execution
Constraint scenario
- capacity shortage
- vendor price change
- premium model requirement
- regulation or sovereignty requirement
- human review increase
Each scenario should show:
- total tokens or equivalent capacity
- full cost
- cost per successful outcome
- quality and service level
- value at risk
- breakpoints for architecture or contract change
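A minimal sketch of such a forecast follows. It covers only three of the outputs listed above (total tokens, full cost and cost per successful outcome). The `Scenario` fields, the idea of a flat overhead cost and every number are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    tasks: int
    tokens_per_task: int
    price_per_million_tokens: float
    success_rate: float   # share of tasks that produce a usable outcome
    overhead_cost: float  # assumed flat cost for review, tooling and monitoring


def summarise(s: Scenario) -> dict:
    """Compute total tokens, full cost and cost per successful outcome."""
    total_tokens = s.tasks * s.tokens_per_task
    model_cost = total_tokens / 1_000_000 * s.price_per_million_tokens
    full_cost = model_cost + s.overhead_cost
    successes = s.tasks * s.success_rate
    return {
        "scenario": s.name,
        "total_tokens": total_tokens,
        "full_cost": full_cost,
        "cost_per_success": full_cost / successes,
    }


# Hypothetical inputs for the four scenarios.
scenarios = [
    Scenario("baseline", tasks=100_000, tokens_per_task=2_000,
             price_per_million_tokens=2.0, success_rate=0.90, overhead_cost=500.0),
    Scenario("expansion", tasks=250_000, tokens_per_task=6_000,
             price_per_million_tokens=2.0, success_rate=0.90, overhead_cost=1_500.0),
    Scenario("autonomy", tasks=400_000, tokens_per_task=10_000,
             price_per_million_tokens=2.0, success_rate=0.85, overhead_cost=2_000.0),
    Scenario("constraint", tasks=100_000, tokens_per_task=2_000,
             price_per_million_tokens=5.0, success_rate=0.90, overhead_cost=3_000.0),
]

rows = [summarise(s) for s in scenarios]
low = min(r["full_cost"] for r in rows)
high = max(r["full_cost"] for r in rows)

for r in rows:
    print(f"{r['scenario']:<11} cost={r['full_cost']:>9,.0f} "
          f"per success={r['cost_per_success']:.4f}")
print(f"forecast range: {low:,.0f} to {high:,.0f}")
```

The point of the output is the range rather than any single row: the budget conversation starts from the spread between the lowest and highest scenario.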
7. Service levels for intelligence
Not every task needs frontier reasoning, immediate response or exhaustive verification.
Define intelligence service levels.
Bronze
- low-cost model
- batch where possible
- limited context
- lower assurance
- non-material work
Silver
- balanced model
- standard latency
- evaluation and fallback
- normal enterprise workflows
Gold
- high-capability model
- verification
- strong provenance
- human review
- material decisions
Critical
- redundant checks
- constrained actions
- auditable evidence
- explicit human authority
- risk and resilience controls
This makes cost a design choice rather than a surprise.
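The tiers can be written down as configuration so that the choice is explicit and reviewable. The sketch below is one possible encoding. The `Policy` fields, the model class labels and the rules in `assign_tier` are hypothetical simplifications of the lists above, not a standard.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    BRONZE = "bronze"
    SILVER = "silver"
    GOLD = "gold"
    CRITICAL = "critical"


@dataclass(frozen=True)
class Policy:
    model_class: str      # hypothetical label, not a real model name
    batch_allowed: bool
    verification: bool
    human_review: bool


POLICIES = {
    Tier.BRONZE: Policy("low-cost", batch_allowed=True, verification=False, human_review=False),
    Tier.SILVER: Policy("balanced", batch_allowed=False, verification=True, human_review=False),
    Tier.GOLD: Policy("high-capability", batch_allowed=False, verification=True, human_review=True),
    Tier.CRITICAL: Policy("high-capability", batch_allowed=False, verification=True, human_review=True),
}


def assign_tier(material_decision: bool, takes_action: bool, enterprise_workflow: bool) -> Tier:
    """Assumed routing rule: the tier follows materiality and autonomy."""
    if material_decision and takes_action:
        return Tier.CRITICAL
    if material_decision:
        return Tier.GOLD
    if enterprise_workflow:
        return Tier.SILVER
    return Tier.BRONZE


tier = assign_tier(material_decision=False, takes_action=False, enterprise_workflow=True)
print(tier.value, POLICIES[tier])
```

A real policy would also carry the Critical tier's additional controls, such as constrained actions and auditable evidence, which this sketch leaves out for brevity.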
8. Portfolio governance for abundance
When intelligence becomes cheaper, prioritisation becomes more important, not less.
Scarcity once limited demand. Abundance removes that discipline.
Portfolio reviews should ask:
- Which new demand was planned?
- Which demand is machine-generated?
- Which unit-cost savings were banked or reinvested?
- Which workflows have rising cost per successful outcome?
- Which agents are creating calls without corresponding outcomes?
- What should be throttled, redesigned or stopped?
- Has lower cost enabled strategic value or merely more activity?
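Some of these questions can be answered from usage data. The sketch below flags agents whose calls per successful outcome exceed a threshold. The agent names, the shape of the `usage` records and the threshold of 20 are all hypothetical.

```python
def calls_per_success(calls: int, successes: int) -> float:
    """Calls per successful outcome; infinite when nothing succeeded."""
    return calls / successes if successes else float("inf")


def flag_agents(usage: dict, threshold: float) -> list:
    """Return (agent, ratio) pairs above the threshold, worst first."""
    flagged = []
    for agent, stats in usage.items():
        ratio = calls_per_success(stats["calls"], stats["successes"])
        if ratio > threshold:
            flagged.append((agent, ratio))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical usage records for one review period.
usage = {
    "invoice-matcher": {"calls": 12_000, "successes": 3_000},
    "contract-monitor": {"calls": 90_000, "successes": 1_500},
    "inbox-triage": {"calls": 5_000, "successes": 0},
}

flagged = flag_agents(usage, threshold=20)
for agent, ratio in flagged:
    print(f"{agent}: {ratio:.1f} calls per successful outcome")
```

A flagged agent is a prompt for a decision to throttle, redesign or stop, not proof of waste. What counts as a successful outcome still has to be defined per workflow.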
9. Where Jevons may not apply
Total spend can fall when:
- demand is saturated
- budgets are hard capped
- models become efficient faster than use expands
- on-device inference displaces paid capacity
- workflows are simplified rather than expanded
- competition transfers efficiency gains to buyers
- organisations deliberately bank savings
The correct claim is not "AI will always cost more".
It is:
Falling unit prices are not a cost-control strategy.
10. Management actions
CFO
Require volume and complexity scenarios, not only provider price assumptions.
FinOps
Separate price variance, volume variance, model-mix variance and workflow-complexity variance.
Engineering
Track calls per successful outcome, retry rates, context growth and agent fan-out.
Product and operations
Define where increased intelligence depth has measurable value.
Procurement
Negotiate visibility, volume bands, commitment flexibility and exit rights.
Board
Distinguish rising cost caused by successful value expansion from rising cost caused by uncontrolled demand.
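The FinOps split into price, volume, model-mix and workflow-complexity variance can be sketched as a sequential substitution: start from the previous period and swap in the current period's values one driver at a time. The cost model (tasks × calls per task × blended price per call), the `Period` fields and all figures are assumptions. The result depends on the order of substitution, so the order should be fixed and stated.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Period:
    tasks: float
    calls_per_task: float
    mix: dict     # model name -> share of calls, summing to 1
    prices: dict  # model name -> price per call


def cost(p: Period) -> float:
    """Tasks x calls per task x blended price per call."""
    blended = sum(share * p.prices[model] for model, share in p.mix.items())
    return p.tasks * p.calls_per_task * blended


def decompose(prev: Period, curr: Period) -> dict:
    """Attribute the cost change to one driver at a time, in a fixed order."""
    order = [
        ("volume", "tasks"),
        ("workflow_complexity", "calls_per_task"),
        ("model_mix", "mix"),
        ("price", "prices"),
    ]
    variances = {}
    state = prev
    for label, field in order:
        next_state = replace(state, **{field: getattr(curr, field)})
        variances[label] = cost(next_state) - cost(state)
        state = next_state
    return variances


# Hypothetical periods: cheaper models and prices, but more tasks and calls.
prev = Period(tasks=10_000, calls_per_task=3,
              mix={"small": 0.2, "large": 0.8},
              prices={"small": 0.002, "large": 0.02})
curr = Period(tasks=25_000, calls_per_task=5,
              mix={"small": 0.7, "large": 0.3},
              prices={"small": 0.001, "large": 0.015})

variances = decompose(prev, curr)
for label, value in variances.items():
    print(f"{label:<20} {value:+,.0f}")
print(f"{'total change':<20} {cost(curr) - cost(prev):+,.0f}")
```

In this example the price and model-mix variances are negative while the volume and complexity variances are positive and larger, so total cost rises even though every unit-level measure improved. The four variances sum to the total change by construction.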
Conclusion
AI can become dramatically cheaper per unit and materially more expensive in total.
That is not a paradox once demand is visible.
The enterprise problem is no longer only the cost of intelligence. It is the ability of humans and agents to create demand for intelligence faster than governance can judge its value.
Related reading
- Token Economics
- The Token Is the Meter, Not the Value
- AI TCO Framework
- Agentic AI Economics
- Inference Cost Crisis
Sources
[^deloitte]: Deloitte, "The pivot to tokenomics: Navigating AI's new spend dynamics", p. 12 and pp. 16-19. The report discusses Jevons' paradox in the context of AI efficiency improvements potentially increasing total consumption.
[^jevons]: Blake Alcott, "Jevons' paradox", Ecological Economics 54(1), 2005, pp. 9-21. https://doi.org/10.1016/j.ecolecon.2005.03.020
[^finops]: FinOps Foundation, "Token Economics: The Atomic Unit of AI Value", https://www.finops.org/insights/token-economics-the-atomic-unit-of-ai-value/
[^agentic]: AI Economics Hub, "Agentic AI Economics", https://aieconomicshub.com/articles/agentic-ai-economics