Why Cheaper AI Will Cost More

Standfirst

The most dangerous AI budget assumption is that falling model prices will solve the cost problem. They may do the opposite.

1. The comforting forecast

AI buyers are repeatedly told that models and hardware will become more efficient. That is likely.

The conclusion usually follows automatically: AI will become cheaper.

At the level of an individual model call, this may be true. At the level of an enterprise budget, it is not guaranteed.

Deloitte frames the problem through Jevons' paradox: efficiency lowers the cost of using a resource, which can expand consumption enough to increase total demand.[^deloitte] This is not a law that mechanically predicts every AI budget. It is a warning about behaviour and system design.

The relevant equation is simple:

Total AI cost = unit cost × volume × workflow complexity × operating overhead

A fall in the first term can be overwhelmed by growth in the others.
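A minimal sketch of that arithmetic follows. The figures are hypothetical and chosen only for illustration, and the reading of each term (unit cost as cents per model call, volume as tasks, workflow complexity as calls per task, operating overhead as a multiplier) is an assumption, not a definition from the Deloitte report.

```python
def total_ai_cost(unit_cost, volume, workflow_complexity, operating_overhead):
    """Multiply the four terms of the cost equation above."""
    return unit_cost * volume * workflow_complexity * operating_overhead


# Hypothetical "before" state: 10 cents per call, 1,000 tasks,
# 1 call per task, 25% operating overhead.
before = total_ai_cost(unit_cost=10, volume=1_000,
                       workflow_complexity=1, operating_overhead=1.25)

# Hypothetical "after" state: unit cost falls 90%, but volume grows 5x
# and each task now needs 4 calls.
after = total_ai_cost(unit_cost=1, volume=5_000,
                      workflow_complexity=4, operating_overhead=1.25)

print(before, after)  # 12500.0 25000.0 (cents)
```

In this illustration a 90% fall in unit cost coincides with a doubling of total cost, because volume and calls per task together grow twentyfold.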

Figure (Analysis · Inference Economics): The Jevons Paradox in Enterprise AI

Per-token prices have fallen dramatically. Enterprise AI budgets have not fallen; they have risen as cheaper access triggered more use cases, wider deployment, and agentic integration that multiplies consumption.

~99%: fall in leading-model input cost per 1M tokens since 2023. Unit economics improved dramatically.

800%+: rise in the enterprise AI budget index over the same period. Total spend rose as cheaper access triggered more demand.

Note: Illustrative, AI Economics Hub analysis. Token cost trajectory based on publicly reported commercial model pricing changes (GPT-4 class models, 2023–2026). The budget index is indicative of observed enterprise AI spend trajectories, not a survey benchmark. Agentic workloads increase consumption per task 5–30×, accelerating the Jevons effect. Sources: Gartner (2025), WSJ May 2026 (corporate rationing), AI Economics Hub inference.

2. Five demand multipliers

1. More users

Lower prices reduce the friction of giving more employees and customers access to AI.

2. More use cases

Tasks that failed a cost threshold become viable. Summarisation expands into document analysis, then into continuous monitoring and autonomous action.

3. More calls per task

Agentic systems plan, retrieve, call tools, verify, retry and reflect. One user request can create a chain of model interactions.

4. More context and memory

Systems ingest larger documents, histories, customer records and multimodal inputs. The unit of demand expands even when the user count stays flat.

5. Higher quality expectations

As AI becomes embedded in material workflows, organisations may move from cheap models to better reasoning, verification and redundancy. The cost per successful outcome can rise even while list prices fall.

3. Human demand and machine demand are different

Traditional SaaS demand is anchored to people:

users
seats
sessions
working hours

Agent demand is anchored to objectives and permissions.

An agent can:

monitor continuously
retry automatically
delegate to other agents
generate its own subtasks
operate outside human working hours
respond to machine-generated events

This changes the forecasting denominator: demand scales with objectives, events and permissions rather than with headcount.

A 10% increase in employee adoption may not mean a 10% increase in cost. A small increase in agent autonomy can create a much larger increase in calls.
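The difference can be sketched with a deliberately simplified model. The functions and parameters below (`human_calls`, `agent_calls`, `fan_out`, `depth`, `attempts`) are hypothetical names introduced for illustration; real agent frameworks will not behave this regularly.

```python
def human_calls(users: int, calls_per_user: int) -> int:
    """Seat-anchored demand: calls grow in proportion to users."""
    return users * calls_per_user


def agent_calls(objectives: int, fan_out: int, depth: int, attempts: int) -> int:
    """Objective-anchored demand.

    Assumes each step delegates to `fan_out` subtasks for `depth` levels
    and that every step is attempted `attempts` times.
    """
    steps_per_objective = sum(fan_out ** level for level in range(depth + 1))
    return objectives * steps_per_objective * attempts


# 10% more users gives 10% more calls in this simplified model.
print(human_calls(100, 20), human_calls(110, 20))      # 2000 2200

# One extra level of delegation roughly triples calls per objective.
print(agent_calls(100, fan_out=3, depth=2, attempts=1))  # 1300
print(agent_calls(100, fan_out=3, depth=3, attempts=1))  # 4000

# Allowing each step a second attempt doubles it again.
print(agent_calls(100, fan_out=3, depth=3, attempts=2))  # 8000
```

The point of the sketch is the shape of the curve, not the numbers: autonomy settings act as exponents and multipliers, while headcount acts as a linear term.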

4. The rebound loop

A typical enterprise optimisation programme may work like this:

Engineering routes simple work to a smaller model.
Cost per call falls.
The business expands the workflow to more cases.
More context and tool calls are added.
The agent is allowed to retry more often.
Total cost rises.
The programme reports successful unit-cost optimisation and a budget overrun at the same time.

The two results do not contradict each other.

This is the AI rebound loop.

5. Why savings do not save themselves

A model-routing improvement may reduce the cost of an existing workflow.

One of three things can then happen to the saving.

Savings are banked

Budget or capacity is removed. The financial benefit is realised.

Savings are reinvested deliberately

The organisation funds additional high-value work. Cost stays flat or rises, but expected value rises faster.

Savings disappear into consumption

Usage expands without a portfolio decision. The unit-cost improvement is real, but no financial or strategic value is captured.

Most organisations do not distinguish these outcomes clearly.

6. Forecast demand as a range

A single annual AI budget is too fragile. Use scenarios.

Baseline scenario

current users and workflows
expected adoption
known provider pricing
current model mix

Expansion scenario

more functions onboarded
longer context
more agent loops
increased multimodal use
higher quality requirements

Autonomy scenario

event-driven agents
agent-to-agent delegation
continuous monitoring
retries and verification
tool execution

Constraint scenario

capacity shortage
vendor price change
premium model requirement
regulation or sovereignty requirement
human review increase

Each scenario should show:

total tokens or equivalent capacity
full cost
cost per successful outcome
quality and service level
value at risk
breakpoints for architecture or contract change
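As a rough sketch, the quantitative outputs in that list can be derived from a handful of scenario inputs. The `Scenario` class, its field names and all figures below are assumptions made for illustration; quality, value at risk and breakpoints need judgement that a formula like this does not capture.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    tasks: int                # tasks attempted per year
    tokens_per_task: int      # input and output tokens across all calls
    price_per_million: float  # blended price per 1M tokens
    overhead_cost: float      # non-token cost: review, tooling, platform
    success_rate: float       # share of tasks that meet the quality bar

    @property
    def total_tokens(self) -> int:
        return self.tasks * self.tokens_per_task

    @property
    def full_cost(self) -> float:
        token_cost = self.total_tokens / 1_000_000 * self.price_per_million
        return token_cost + self.overhead_cost

    @property
    def cost_per_successful_outcome(self) -> float:
        return self.full_cost / (self.tasks * self.success_rate)


# Hypothetical inputs: the autonomy scenario pays 75% less per token
# but runs more tasks with ten times the tokens per task.
baseline = Scenario("baseline", tasks=100_000, tokens_per_task=2_000,
                    price_per_million=4.0, overhead_cost=1_200.0,
                    success_rate=0.8)
autonomy = Scenario("autonomy", tasks=400_000, tokens_per_task=20_000,
                    price_per_million=1.0, overhead_cost=4_000.0,
                    success_rate=0.75)

for s in (baseline, autonomy):
    print(s.name, s.total_tokens, s.full_cost,
          round(s.cost_per_successful_outcome, 4))
```

With these inputs the autonomy scenario costs six times the baseline in total and more per successful outcome, despite the lower token price. Running the same calculation for the expansion and constraint scenarios gives the range the budget should cover.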

7. Service levels for intelligence

Not every task needs frontier reasoning, immediate response or exhaustive verification.

Define intelligence service levels.

Bronze

low-cost model
batch where possible
limited context
lower assurance
non-material work

Silver

balanced model
standard latency
evaluation and fallback
normal enterprise workflows

Gold

high-capability model
verification
strong provenance
human review
material decisions

Critical

redundant checks
constrained actions
auditable evidence
explicit human authority
risk and resilience controls

This makes cost a design choice rather than a surprise.
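One way to make the tiers operational is to encode them as configuration and assign each task a tier before it reaches a model. The sketch below is an assumption-heavy illustration: the `ServiceLevel` fields, the `assign_tier` rules and the model class used for Critical are hypothetical, and an organisation would set its own criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    BRONZE = "bronze"
    SILVER = "silver"
    GOLD = "gold"
    CRITICAL = "critical"


@dataclass(frozen=True)
class ServiceLevel:
    model_class: str
    verification: bool
    human_review: bool
    batch_allowed: bool


# Settings paraphrase the tier descriptions above; the Critical model
# class is an assumption, since the article does not name one.
SERVICE_LEVELS = {
    Tier.BRONZE: ServiceLevel("low-cost", False, False, True),
    Tier.SILVER: ServiceLevel("balanced", False, False, False),
    Tier.GOLD: ServiceLevel("high-capability", True, True, False),
    Tier.CRITICAL: ServiceLevel("high-capability", True, True, False),
}


def assign_tier(material: bool, autonomous_action: bool,
                business_workflow: bool) -> Tier:
    """Hypothetical routing rule: escalate on materiality, then on action."""
    if material and autonomous_action:
        return Tier.CRITICAL
    if material:
        return Tier.GOLD
    if business_workflow:
        return Tier.SILVER
    return Tier.BRONZE


tier = assign_tier(material=True, autonomous_action=False,
                   business_workflow=True)
print(tier.value, SERVICE_LEVELS[tier])
```

Routing by explicit rule means a move from Silver to Gold is a recorded decision with a known cost, rather than a default that drifts upward.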

8. Portfolio governance for abundance

When intelligence becomes cheaper, prioritisation becomes more important, not less.

Scarcity once limited demand. Abundance removes that discipline.

Portfolio reviews should ask:

Which new demand was planned?
Which demand is machine-generated?
Which unit-cost savings were banked or reinvested?
Which workflows have rising cost per successful outcome?
Which agents are creating calls without corresponding outcomes?
What should be throttled, redesigned or stopped?
Has lower cost enabled strategic value or merely more activity?

9. Where Jevons may not apply

Total spend can fall when:

demand is saturated
budgets are hard capped
models become efficient faster than use expands
on-device inference displaces paid capacity
workflows are simplified rather than expanded
competition transfers efficiency gains to buyers
organisations deliberately bank savings

The correct claim is not "AI will always cost more".

It is:

Falling unit prices are not a cost-control strategy.

10. Management actions

CFO

Require volume and complexity scenarios, not only provider price assumptions.

FinOps

Separate price variance, volume variance, model-mix variance and workflow-complexity variance.
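A minimal sketch of such a decomposition is shown below, using sequential substitution: one driver is changed at a time, in a fixed order, and the resulting cost change is attributed to that driver. The cost model, the driver order and all figures are assumptions for illustration. The split between drivers depends on the order chosen, although the variances always sum to the total change.

```python
def blended_price(mix: dict, prices: dict) -> float:
    """Price per 1M tokens, weighted by each model's share of tokens."""
    return sum(share * prices[model] for model, share in mix.items())


def cost(tasks, tokens_per_task, mix, prices) -> float:
    return tasks * tokens_per_task / 1_000_000 * blended_price(mix, prices)


# Fixed substitution order: (variance label, driver key).
ORDER = [
    ("volume", "tasks"),
    ("workflow_complexity", "tokens_per_task"),
    ("model_mix", "mix"),
    ("price", "prices"),
]


def decompose(old: dict, new: dict) -> dict:
    """Attribute the change in cost to one driver at a time."""
    current = dict(old)
    variances = {}
    for label, key in ORDER:
        before = cost(**current)
        current[key] = new[key]
        variances[label] = cost(**current) - before
    return variances


# Hypothetical periods: routing moves 75% of tokens to a small model and
# the large model gets cheaper, while tasks and tokens per task grow.
old = {"tasks": 100_000, "tokens_per_task": 2_000,
       "mix": {"large": 1.0, "small": 0.0},
       "prices": {"large": 10.0, "small": 1.0}}
new = {"tasks": 300_000, "tokens_per_task": 8_000,
       "mix": {"large": 0.25, "small": 0.75},
       "prices": {"large": 4.0, "small": 1.0}}

variances = decompose(old, new)
print(variances)
print(sum(variances.values()), cost(**new) - cost(**old))
```

In this example the mix and price variances are both strongly favourable, yet total cost still rises from 2,000 to 4,200 because the volume and complexity variances are larger. Reporting only the favourable terms would hide that.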

Engineering

Track calls per successful outcome, retry rates, context growth and agent fan-out.
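These metrics can be computed from per-task telemetry. The sketch below assumes a hypothetical `TaskRecord` shape; the field names are illustrative and do not correspond to any particular observability tool.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    calls: int         # model calls made for the task, retries included
    retries: int       # calls that repeated a failed or rejected step
    input_tokens: int  # total context sent across all calls
    subtasks: int      # subtasks or delegated agents spawned
    succeeded: bool


def workflow_metrics(records: list) -> dict:
    total_calls = sum(r.calls for r in records)
    successes = sum(r.succeeded for r in records)
    if not records or total_calls == 0:
        raise ValueError("need at least one record with at least one call")
    return {
        "calls_per_successful_outcome":
            total_calls / successes if successes else float("inf"),
        "retry_rate": sum(r.retries for r in records) / total_calls,
        "avg_input_tokens_per_call":
            sum(r.input_tokens for r in records) / total_calls,
        "avg_fan_out": sum(r.subtasks for r in records) / len(records),
    }


# Hypothetical sample: the failed task consumed half of all calls.
records = [
    TaskRecord(calls=4, retries=1, input_tokens=8_000, subtasks=2, succeeded=True),
    TaskRecord(calls=10, retries=5, input_tokens=30_000, subtasks=4, succeeded=False),
    TaskRecord(calls=6, retries=2, input_tokens=10_000, subtasks=3, succeeded=True),
]
metrics = workflow_metrics(records)
print(metrics)
```

Context growth shows up as a rising `avg_input_tokens_per_call` when the same calculation is repeated period over period.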

Product and operations

Define where increased intelligence depth has measurable value.

Procurement

Negotiate visibility, volume bands, commitment flexibility and exit rights.

Board

Distinguish rising cost caused by successful value expansion from rising cost caused by uncontrolled demand.

Conclusion

AI can become dramatically cheaper per unit and materially more expensive in total.

That is not a paradox once demand is visible.

The enterprise problem is no longer only the cost of intelligence. It is the ability of humans and agents to create demand for intelligence faster than governance can judge its value.

Sources

[^deloitte]: Deloitte, The pivot to tokenomics: Navigating AI's new spend dynamics, p. 12 and pp. 16-19. The report discusses Jevons' paradox in the context of AI efficiency improvements potentially increasing total consumption.

[^jevons]: Blake Alcott, "Jevons' paradox", Ecological Economics 54(1), 2005, pp. 9-21. https://doi.org/10.1016/j.ecolecon.2005.03.020

[^finops]: FinOps Foundation, "Token Economics: The Atomic Unit of AI Value", https://www.finops.org/insights/token-economics-the-atomic-unit-of-ai-value/

[^agentic]: AI Economics Hub, "Agentic AI Economics", https://aieconomicshub.com/articles/agentic-ai-economics
