Why Cheaper AI Will Cost More

Standfirst

The most dangerous AI budget assumption is that falling model prices will solve the cost problem. They may do the opposite.

1. The comforting forecast

AI buyers are repeatedly told that models and hardware will become more efficient. That is likely.

The conclusion usually follows automatically: AI will become cheaper.

At the level of an individual model call, this may be true. At the level of an enterprise budget, it is not guaranteed.

Deloitte frames the problem through Jevons' paradox: efficiency lowers the cost of using a resource, which can expand consumption enough to increase total demand.[^deloitte] This is not a law that mechanically predicts every AI budget. It is a warning about behaviour and system design.

The relevant equation is simple:

Total AI cost = unit cost × volume × workflow complexity × operating overhead

A fall in the first term can be overwhelmed by growth in the others.
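A minimal worked sketch of the equation follows. All figures are illustrative assumptions rather than data from any source, and the function name `total_ai_cost` is hypothetical.

```python
def total_ai_cost(unit_cost, volume, workflow_complexity, operating_overhead):
    """Total AI cost = unit cost x volume x workflow complexity x operating overhead."""
    return unit_cost * volume * workflow_complexity * operating_overhead


# Assumed starting position: price per call, calls per year,
# a complexity multiplier and an overhead multiplier.
before = total_ai_cost(
    unit_cost=0.010, volume=1_000_000, workflow_complexity=2.0, operating_overhead=1.2
)

# Assumed later position: unit cost falls by 60%, but volume,
# complexity and overhead all grow.
after = total_ai_cost(
    unit_cost=0.004, volume=2_500_000, workflow_complexity=3.0, operating_overhead=1.3
)

print(f"before: {before:,.0f}")
print(f"after:  {after:,.0f}")
print(f"change in total cost: {after / before - 1:+.1%}")
```

Under these assumed inputs, total cost rises from about 24,000 to about 39,000 even though the unit cost falls by 60%. Different assumptions can produce a fall in total cost, so the sketch makes the case for modelling every term, not for predicting an increase.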

2. Five demand multipliers

1. More users

Lower prices reduce the friction of giving more employees and customers access to AI.

2. More use cases

Tasks that failed a cost threshold become viable. Summarisation expands into document analysis, then into continuous monitoring and autonomous action.

3. More calls per task

Agentic systems plan, retrieve, call tools, verify, retry and reflect. One user request can create a chain of model interactions.

4. More context and memory

Systems ingest larger documents, histories, customer records and multimodal inputs. The unit of demand expands even when the user count stays flat.

5. Higher quality expectations

As AI becomes embedded in material workflows, organisations may move from cheap models to stronger reasoning models, added verification and redundancy. The cost per successful outcome can rise even while list prices fall.

3. Human demand and machine demand are different

Traditional SaaS demand is anchored to people:

  • users
  • seats
  • sessions
  • working hours

Agent demand is anchored to objectives and permissions.

An agent can:

  • monitor continuously
  • retry automatically
  • delegate to other agents
  • generate its own subtasks
  • operate outside human working hours
  • respond to machine-generated events

This changes the forecasting denominator: demand scales with objectives, permissions and events rather than with headcount.

A 10% increase in employee adoption may not mean a 10% increase in cost. A small increase in agent autonomy can create a much larger increase in calls.
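The contrast can be sketched with two simple demand models. Both are assumptions made for illustration: human demand is modelled as users × sessions × calls, and agent demand as a delegation tree in which every task delegates to `fan_out` further tasks down to a given `depth`, with a fixed failure rate and retry budget per call. None of the names or figures come from a real system.

```python
def human_calls(users, sessions_per_user, calls_per_session):
    """Human-anchored demand scales linearly with the number of people."""
    return users * sessions_per_user * calls_per_session


def expected_attempts(failure_rate, max_retries):
    """Expected attempts per call: one try, plus a retry after each failure,
    capped at max_retries retries."""
    return sum(failure_rate ** k for k in range(max_retries + 1))


def agent_calls_per_event(steps, fan_out, depth, failure_rate, max_retries):
    """Expected model calls triggered by one event.

    The number of tasks at delegation level d is fan_out ** d;
    each task runs `steps` model calls.
    """
    tasks = sum(fan_out ** level for level in range(depth + 1))
    return tasks * steps * expected_attempts(failure_rate, max_retries)


# 10% more users gives 10% more human-anchored calls.
human_growth = human_calls(110, 20, 5) / human_calls(100, 20, 5)

# One extra level of delegation, everything else unchanged.
shallow = agent_calls_per_event(steps=4, fan_out=2, depth=1, failure_rate=0.2, max_retries=1)
deeper = agent_calls_per_event(steps=4, fan_out=2, depth=2, failure_rate=0.2, max_retries=1)

print(f"human growth: {human_growth:.2f}x")
print(f"agent growth: {deeper / shallow:.2f}x")
```

With these assumed parameters, one extra delegation level takes expected calls per event from about 14.4 to about 33.6, which is more than double. Real agents rarely branch this uniformly, so the model is only a way to reason about the shape of the curve.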

4. The rebound loop

A typical enterprise optimisation programme may work like this:

  1. Engineering routes simple work to a smaller model.
  2. Cost per call falls.
  3. The business expands the workflow to more cases.
  4. More context and tool calls are added.
  5. The agent is allowed to retry more often.
  6. Total cost rises.
  7. The programme reports successful unit-cost optimisation and a budget overrun at the same time.

The two results do not contradict each other: the first measures cost per call, the second measures total spend.

This is the AI rebound loop.

5. Why savings do not save themselves

A model-routing improvement may reduce the cost of an existing workflow.

Three things can happen.

Savings are banked

Budget or capacity is removed. The financial benefit is realised.

Savings are reinvested deliberately

The organisation funds additional high-value work. Cost stays flat or rises, but expected value rises faster.

Savings disappear into consumption

Usage expands without a portfolio decision. The unit-cost improvement is real, but no financial or strategic value is captured.

Many organisations do not track which of these three outcomes a given saving produced.

6. Forecast demand as a range

A single annual AI budget is too fragile. Use scenarios.

Baseline scenario

  • current users and workflows
  • expected adoption
  • known provider pricing
  • current model mix

Expansion scenario

  • more functions onboarded
  • longer context
  • more agent loops
  • increased multimodal use
  • higher quality requirements

Autonomy scenario

  • event-driven agents
  • agent-to-agent delegation
  • continuous monitoring
  • retries and verification
  • tool execution

Constraint scenario

  • capacity shortage
  • vendor price change
  • premium model requirement
  • regulation or sovereignty requirement
  • human review increase

Each scenario should show:

  • total tokens or equivalent capacity
  • full cost
  • cost per successful outcome
  • quality and service level
  • value at risk
  • breakpoints for architecture or contract change
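The scenario approach can be sketched as a small model that reports a range instead of a single number. The `Scenario` fields, the scenario parameters and the overhead treatment (non-model cost as a fraction of model cost) are all illustrative assumptions. Quality, value at risk and breakpoints are left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    tasks: int                   # tasks attempted per year
    calls_per_task: float        # model calls per task, including agent loops
    tokens_per_call: float       # average tokens per call
    price_per_1k_tokens: float   # blended provider price
    overhead_rate: float         # review, tooling and operations as a share of model cost
    success_rate: float          # share of tasks that reach a successful outcome

    @property
    def total_tokens(self):
        return self.tasks * self.calls_per_task * self.tokens_per_call

    @property
    def full_cost(self):
        model_cost = self.total_tokens / 1000 * self.price_per_1k_tokens
        return model_cost * (1 + self.overhead_rate)

    @property
    def cost_per_successful_outcome(self):
        return self.full_cost / (self.tasks * self.success_rate)


# Hypothetical parameters for the four scenarios described above.
scenarios = [
    Scenario("baseline", 100_000, 3, 2_000, 0.002, 0.25, 0.90),
    Scenario("expansion", 250_000, 5, 4_000, 0.002, 0.25, 0.90),
    Scenario("autonomy", 400_000, 12, 4_000, 0.002, 0.30, 0.85),
    Scenario("constraint", 100_000, 3, 2_000, 0.006, 0.50, 0.90),
]

for s in scenarios:
    print(
        f"{s.name:<11} tokens={s.total_tokens:>16,.0f} "
        f"cost={s.full_cost:>10,.0f} per_outcome={s.cost_per_successful_outcome:.4f}"
    )

costs = [s.full_cost for s in scenarios]
print(f"budget range: {min(costs):,.0f} to {max(costs):,.0f}")
```

The useful output is the spread between the lowest and highest scenario, together with the parameter that drives it, rather than any individual figure.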

7. Service levels for intelligence

Not every task needs frontier reasoning, immediate response or exhaustive verification.

Define intelligence service levels.

Bronze

  • low-cost model
  • batch where possible
  • limited context
  • lower assurance
  • non-material work

Silver

  • balanced model
  • standard latency
  • evaluation and fallback
  • normal enterprise workflows

Gold

  • high-capability model
  • verification
  • strong provenance
  • human review
  • material decisions

Critical

  • redundant checks
  • constrained actions
  • auditable evidence
  • explicit human authority
  • risk and resilience controls

This makes cost a design choice rather than a surprise.
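One way to make the tiers operational is to encode them as configuration and assign each workflow a tier explicitly. The sketch below is an assumption about how that might look: the `ServiceLevel` fields, the model class labels and the `assign_tier` rules are hypothetical and would need to reflect an organisation's own risk policy.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    BRONZE = "bronze"
    SILVER = "silver"
    GOLD = "gold"
    CRITICAL = "critical"


@dataclass(frozen=True)
class ServiceLevel:
    model_class: str        # hypothetical label, not a real model name
    batch_allowed: bool
    verification: bool
    human_review: bool


SERVICE_LEVELS = {
    Tier.BRONZE: ServiceLevel("low-cost", batch_allowed=True, verification=False, human_review=False),
    Tier.SILVER: ServiceLevel("balanced", batch_allowed=False, verification=True, human_review=False),
    Tier.GOLD: ServiceLevel("high-capability", batch_allowed=False, verification=True, human_review=True),
    Tier.CRITICAL: ServiceLevel("high-capability", batch_allowed=False, verification=True, human_review=True),
}


def assign_tier(is_material_decision, takes_autonomous_action, is_enterprise_workflow):
    """Assumed classification rule: the strictest matching tier wins."""
    if is_material_decision and takes_autonomous_action:
        return Tier.CRITICAL
    if is_material_decision:
        return Tier.GOLD
    if is_enterprise_workflow:
        return Tier.SILVER
    return Tier.BRONZE


tier = assign_tier(
    is_material_decision=False, takes_autonomous_action=False, is_enterprise_workflow=True
)
print(tier.value, SERVICE_LEVELS[tier])
```

Critical and Gold share the same model settings in this sketch. The extra Critical controls listed above (constrained actions, auditable evidence, explicit human authority) sit outside model selection and are not modelled here.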

8. Portfolio governance for abundance

When intelligence becomes cheaper, prioritisation becomes more important, not less.

Scarcity once limited demand. Abundance removes that discipline.

Portfolio reviews should ask:

  • Which new demand was planned?
  • Which demand is machine-generated?
  • Which unit-cost savings were banked, which were reinvested, and which disappeared into consumption?
  • Which workflows have rising cost per successful outcome?
  • Which agents are creating calls without corresponding outcomes?
  • What should be throttled, redesigned or stopped?
  • Has lower cost enabled strategic value or merely more activity?

9. Where Jevons may not apply

Total spend can fall when:

  • demand is saturated
  • budgets are hard capped
  • model efficiency improves faster than usage expands
  • on-device inference displaces paid capacity
  • workflows are simplified rather than expanded
  • competition transfers efficiency gains to buyers
  • organisations deliberately bank savings

The correct claim is not "AI will always cost more".

It is:

Falling unit prices are not a cost-control strategy.

10. Management actions

CFO

Require volume and complexity scenarios, not only provider price assumptions.

FinOps

Separate price variance, volume variance, model-mix variance and workflow-complexity variance.
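A minimal sketch of that separation, using sequential substitution: each driver is switched from its baseline value to its actual value in turn, and the change in cost at each step is attributed to that driver. The cost model (tasks × calls per task × blended price per call), the driver order and all figures are assumptions. The split between drivers depends on the order chosen, although the variances always sum to the total change.

```python
def blended_price(mix, prices):
    """Average price per call given the share of calls sent to each model."""
    return sum(share * prices[model] for model, share in mix.items())


def cost(tasks, calls_per_task, mix, prices):
    return tasks * calls_per_task * blended_price(mix, prices)


def decompose(base, actual):
    """Attribute the change in cost to each driver, one substitution at a time."""
    order = [
        ("volume", "tasks"),
        ("workflow_complexity", "calls_per_task"),
        ("model_mix", "mix"),
        ("price", "prices"),
    ]
    current = dict(base)
    previous_cost = cost(**current)
    variances = {}
    for label, key in order:
        current[key] = actual[key]      # switch this driver to its actual value
        new_cost = cost(**current)
        variances[label] = new_cost - previous_cost
        previous_cost = new_cost
    return variances


# Hypothetical budget and actuals.
base = {
    "tasks": 100_000,
    "calls_per_task": 3,
    "mix": {"small": 0.5, "large": 0.5},
    "prices": {"small": 0.001, "large": 0.010},
}
actual = {
    "tasks": 200_000,
    "calls_per_task": 5,
    "mix": {"small": 0.8, "large": 0.2},
    "prices": {"small": 0.0005, "large": 0.008},
}

variances = decompose(base, actual)
for label, value in variances.items():
    print(f"{label:<20} {value:+,.0f}")
print(f"{'total':<20} {cost(**actual) - cost(**base):+,.0f}")
```

In this assumed example the price and model-mix variances are favourable while the volume and complexity variances are adverse, and the net result is an overrun. A report that showed only the price line would miss it.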

Engineering

Track calls per successful outcome, retry rates, context growth and agent fan-out.
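These metrics can be derived from per-task records. The sketch below assumes a hypothetical `TaskRecord` shape. Real telemetry will differ, and the field names are not taken from any particular tracing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    calls: int             # model calls made for the task, including retries
    retries: int           # how many of those calls were retries
    context_tokens: int    # total input tokens across all calls
    delegated_agents: int  # other agents this task handed work to
    succeeded: bool


def summarise(records):
    """Aggregate engineering cost signals over a period of task records."""
    total_calls = sum(r.calls for r in records)
    successes = sum(1 for r in records if r.succeeded)
    return {
        "calls_per_successful_outcome": total_calls / successes if successes else float("inf"),
        "retry_rate": sum(r.retries for r in records) / total_calls if total_calls else 0.0,
        "context_tokens_per_call": (
            sum(r.context_tokens for r in records) / total_calls if total_calls else 0.0
        ),
        "mean_agent_fan_out": (
            sum(r.delegated_agents for r in records) / len(records) if records else 0.0
        ),
    }


# Illustrative records only.
records = [
    TaskRecord(calls=4, retries=1, context_tokens=8_000, delegated_agents=0, succeeded=True),
    TaskRecord(calls=10, retries=4, context_tokens=30_000, delegated_agents=3, succeeded=False),
    TaskRecord(calls=6, retries=1, context_tokens=12_000, delegated_agents=1, succeeded=True),
]

metrics = summarise(records)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Calls made by failed tasks count in the numerator of calls per successful outcome, which is what makes the metric sensitive to agents that consume capacity without producing results. Context growth can be tracked by comparing `context_tokens_per_call` across reporting periods.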

Product and operations

Define where increased intelligence depth has measurable value.

Procurement

Negotiate visibility, volume bands, commitment flexibility and exit rights.

Board

Distinguish rising cost caused by successful value expansion from rising cost caused by uncontrolled demand.

Conclusion

AI can become dramatically cheaper per unit and materially more expensive in total.

That is not a paradox once demand is visible.

The enterprise problem is no longer only the cost of intelligence. It is the ability of humans and agents to create demand for intelligence faster than governance can judge its value.

Sources

[^deloitte]: Deloitte, The pivot to tokenomics: Navigating AI's new spend dynamics, p. 12 and pp. 16-19. The report discusses Jevons' paradox in the context of AI efficiency improvements potentially increasing total consumption.

[^jevons]: Blake Alcott, "Jevons' paradox", Ecological Economics 54(1), 2005, pp. 9-21. https://doi.org/10.1016/j.ecolecon.2005.03.020

[^finops]: FinOps Foundation, "Token Economics: The Atomic Unit of AI Value", https://www.finops.org/insights/token-economics-the-atomic-unit-of-ai-value/

[^agentic]: AI Economics Hub, "Agentic AI Economics", https://aieconomicshub.com/articles/agentic-ai-economics