Rent, Reserve or Own Intelligence?

The enterprise infrastructure question is no longer simply cloud or on-premises. It is how much intelligence capacity to rent, reserve, manage or own for each workload.

From software sourcing to intelligence sourcing

Enterprises once chose whether to buy software or build it.

They now choose how to source machine intelligence:

  • Packaged SaaS
  • Public API
  • Reserved model or throughput capacity
  • Hyperscaler managed service
  • Neocloud capacity
  • Private cloud
  • Colocation
  • Owned AI factory
  • Edge or device inference
  • Hybrid combinations

The same model capability can arrive through radically different economic structures.

Four sourcing modes

Rent

Pay per token, request, action or usage.

Best for:

  • Pilots
  • Uncertain demand
  • Low volume
  • Rapid access
  • Model experimentation
  • Variable workloads

Risks:

  • Price exposure
  • Limited infrastructure control
  • Opacity
  • Vendor dependency
  • Data and residency constraints

Reserve

Commit to capacity, throughput or spend for a period.

Best for:

  • Growing but still variable workloads
  • Predictable base demand plus bursts
  • Need for discount without infrastructure ownership
  • Enterprise service levels

Risks:

  • Commitment waste
  • Forecast error
  • Model change during contract
  • Lock-in
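Commitment waste and forecast error can be put into rough numbers. The sketch below is illustrative only: the function name, the unit of "capacity units" and all rates are assumptions, not any provider's pricing model. It compares a discounted commitment against what the same usage would have cost on demand.

```python
def reserve_outcome(committed_units, used_units, committed_rate, on_demand_rate):
    """Return (total_cost, wasted_spend) for one billing period.

    Assumption: the commitment is paid in full whether or not it is used,
    and any usage above the commitment is billed at the on-demand rate.
    """
    overflow = max(0.0, used_units - committed_units)
    unused = max(0.0, committed_units - used_units)
    total_cost = committed_units * committed_rate + overflow * on_demand_rate
    wasted_spend = unused * committed_rate
    return total_cost, wasted_spend


# Hypothetical figures: forecast 1,000 units at a 25% discount, actual use 700.
total, waste = reserve_outcome(1000, 700, committed_rate=0.75, on_demand_rate=1.0)
pure_rent = 700 * 1.0

print(total, waste, pure_rent)
```

With these placeholder numbers the commitment costs more than simply renting would have, because the forecast error is larger than the discount.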

Managed capacity

Use dedicated or specialised infrastructure operated by a provider.

Best for:

  • Performance-sensitive workloads
  • Regional or security requirements
  • Elastic access with more control
  • Organisations lacking facilities capability

Risks:

  • Shared responsibility complexity
  • Provider concentration
  • Contract and capacity constraints
  • Migration difficulty

Own

Acquire and operate infrastructure directly or through colocation.

Best for:

  • Large predictable demand
  • High utilisation
  • Material latency or sovereignty needs
  • Stable model strategy
  • Strong platform capability

Risks:

  • Capital
  • Stranded capacity
  • Rapid obsolescence
  • Power and cooling
  • Staffing
  • Integration
  • Slower deployment
  • Refresh cycles

What Deloitte's model shows

Deloitte models API, neocloud and self-hosted AI-factory economics across rising token volume.

The useful lesson is the shape of the curve.

The dangerous lesson is treating 84 billion tokens as a universal threshold.

Interpretation

Deloitte's 84-billion-token crossover is a modelled result under specific assumptions: B200 hardware, US costs, hourly neocloud pricing, and specific workload simulation. The model explicitly excludes storage and egress, security, one-time integration, and staffing costs. These exclusions can materially change the crossover point.
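To see why exclusions move a crossover, consider a deliberately simple break-even sketch. It is not Deloitte's model: it assumes a flat API price, a fixed monthly cost for owned capacity and a constant marginal cost per token, and every figure is a placeholder chosen for easy arithmetic.

```python
def breakeven_tokens_millions(fixed_monthly_cost, api_price_per_million,
                              own_marginal_per_million):
    """Monthly volume (millions of tokens) at which owning matches renting.

    Rent cost  = api_price_per_million * volume
    Own cost   = fixed_monthly_cost + own_marginal_per_million * volume
    Returns None when owning never catches up under these assumptions.
    """
    saving_per_million = api_price_per_million - own_marginal_per_million
    if saving_per_million <= 0:
        return None
    return fixed_monthly_cost / saving_per_million


# Placeholder figures, not taken from any published model.
narrow = breakeven_tokens_millions(100_000, 5.0, 1.0)
# Same comparison after adding hypothetical staffing, security and storage costs.
fuller = breakeven_tokens_millions(100_000 + 60_000, 5.0, 1.0)

print(narrow, fuller)
```

In this toy case, adding the previously excluded costs pushes the break-even volume from 25,000 million to 40,000 million tokens per month. The direction of the shift matters more than the specific numbers.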

Why the threshold moves

Model and quality

A smaller open model and a frontier reasoning model do not do equivalent work, so their per-token costs are not directly comparable.

Input and output mix

Output tokens can carry different economics from input and cached tokens, so two workloads with the same total token count can have very different costs.
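A small sketch makes the token-mix effect concrete. The three prices are hypothetical placeholders, and the split into input, output and cached tokens is assumed to be available from usage reporting.

```python
def blended_cost(input_m, output_m, cached_m,
                 price_in, price_out, price_cached):
    """Cost of a workload given volumes in millions of tokens
    and prices per million tokens for each token type."""
    return input_m * price_in + output_m * price_out + cached_m * price_cached


# Hypothetical prices per million tokens.
PRICES = dict(price_in=2.0, price_out=8.0, price_cached=0.5)

# Two workloads with the same total volume (100 million tokens).
summarisation = blended_cost(80, 10, 10, **PRICES)  # input-heavy
generation = blended_cost(20, 70, 10, **PRICES)     # output-heavy

print(summarisation, generation)
```

Under these assumed prices the output-heavy workload costs more than twice as much as the input-heavy one, although both process the same number of tokens.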

Utilisation

Owned infrastructure wins only if it is productively used.
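The utilisation effect follows from simple division: a fixed monthly cost is spread over only the tokens actually served. The sketch below assumes a single fixed monthly cost and a known maximum throughput; both figures are placeholders.

```python
def owned_cost_per_million(monthly_cost, capacity_millions, utilisation):
    """Effective cost per million tokens on owned capacity.

    capacity_millions: tokens (in millions) the estate could serve per month
    utilisation: fraction of that capacity doing productive work, in (0, 1]
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return monthly_cost / (capacity_millions * utilisation)


# Placeholder estate: 120,000 per month, able to serve 60,000 million tokens.
for u in (1.0, 0.5, 0.25):
    print(u, owned_cost_per_million(120_000, 60_000, u))
```

Halving utilisation doubles the effective unit cost, which is why an owned estate that looks cheap on paper can be expensive in practice.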

Demand shape

Steady demand supports capacity ownership. Spiky demand supports renting.
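One way to describe demand shape is the peak-to-average ratio. The sketch below assumes owned capacity is sized for the peak period, which is a simplification, and uses made-up demand series.

```python
def peak_to_average(demand):
    """Ratio of the busiest period to the mean period."""
    return max(demand) / (sum(demand) / len(demand))


def utilisation_if_sized_for_peak(demand):
    """Average utilisation when capacity equals the peak (assumption)."""
    return (sum(demand) / len(demand)) / max(demand)


# Hypothetical demand per period, same average in both cases.
steady = [100, 100, 100, 100]
spiky = [20, 20, 20, 340]

print(peak_to_average(steady), utilisation_if_sized_for_peak(steady))
print(peak_to_average(spiky), utilisation_if_sized_for_peak(spiky))
```

Both series have the same average, but capacity sized for the spiky series would sit idle most of the time, which is the case for renting the bursts.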

Hardware generation

Performance and energy efficiency change rapidly between hardware generations, so a crossover computed for one generation may not hold for the next.

Contract terms

Reserved API or neocloud pricing can change the comparison.

Full cost

Deloitte's model excludes storage and egress, security, one-time integration and staffing. An enterprise decision must include them.

Risk and sovereignty

Control may justify higher nominal cost.

Exit and optionality

The ability to switch models or providers has economic value.

Workload decision profile

For each material workload, record:

  • Business outcome
  • Criticality
  • Volume
  • Growth
  • Seasonality
  • Context size
  • Reasoning depth
  • Latency
  • Quality threshold
  • Data sensitivity
  • Residency
  • Integration
  • Availability
  • Model portability
  • Human review
  • Expected lifespan
  • Full cost
  • Exit requirement
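The profile above can be kept as a simple structured record so that gaps are visible. This is a minimal sketch: the class name, the free-text field types and the example values are all assumptions, and a real register would likely use enumerations or numeric fields for some items.

```python
from dataclasses import dataclass, fields


@dataclass
class WorkloadProfile:
    """One record per material workload; every field is free text here."""
    business_outcome: str = ""
    criticality: str = ""
    volume: str = ""
    growth: str = ""
    seasonality: str = ""
    context_size: str = ""
    reasoning_depth: str = ""
    latency: str = ""
    quality_threshold: str = ""
    data_sensitivity: str = ""
    residency: str = ""
    integration: str = ""
    availability: str = ""
    model_portability: str = ""
    human_review: str = ""
    expected_lifespan: str = ""
    full_cost: str = ""
    exit_requirement: str = ""


def missing_fields(profile):
    """Names of fields that are still blank."""
    return [f.name for f in fields(profile)
            if not getattr(profile, f.name).strip()]


# Hypothetical, partly completed profile.
draft = WorkloadProfile(business_outcome="Faster claims triage",
                        criticality="High")
print(missing_fields(draft))
```

Listing the blank fields before a sourcing decision is taken is a cheap way to check that the decision is not being made on volume and cost alone.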

Decision matrix

Sourcing mode comparison across key decision factors

| Factor | Rent | Reserve | Managed capacity | Own |
|---|---|---|---|---|
| Demand uncertainty | Strongest | Medium | Medium | Weakest |
| Speed to start | Strongest | Strong | Strong | Weakest |
| Unit cost at high utilisation | Weakest | Medium | Strong | Potentially strongest |
| Capital exposure | Lowest | Low | Low-medium | Highest |
| Control | Lowest | Medium | High | Highest |
| Model flexibility | High initially | Medium | Medium | Depends on stack |
| Sovereignty | Low-medium | Medium | High | Highest |
| Obsolescence risk | Provider | Shared | Provider/shared | Enterprise |
| Skills burden | Lowest | Low | Medium | Highest |

A hybrid portfolio

A sensible enterprise portfolio may contain:

  • SaaS for commoditised capabilities
  • APIs for exploration
  • Reserved capacity for scaled shared services
  • Managed regional infrastructure for sensitive workloads
  • Owned capacity for stable, material workloads
  • Edge models for latency and privacy

The value-aware crossover

A cost-only crossover asks:

"When is owned capacity cheaper per token?"

A value-aware crossover asks:

"When does a sourcing model produce the best cost per successful outcome, at the required quality, latency, control and resilience?"

A cheaper stack may deliver:

  • Lower quality
  • Slower iteration
  • More engineering burden
  • Less model choice
  • Delayed market entry

A more expensive API may create strategic value through speed and optionality.
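The value-aware question can be approximated by dividing the cost of an attempt by the share of attempts that meet the quality bar. The sketch below is one possible formulation, not a standard metric; the success rates and costs are invented for illustration, and it ignores latency, control and resilience.

```python
def cost_per_successful_outcome(cost_per_attempt, success_rate,
                                review_cost_per_attempt=0.0):
    """Expected spend to obtain one outcome that meets the quality bar.

    success_rate: fraction of attempts accepted, in (0, 1]
    review_cost_per_attempt: optional human review cost (assumption)
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return (cost_per_attempt + review_cost_per_attempt) / success_rate


# Hypothetical comparison: a cheaper stack that succeeds less often.
cheaper_stack = cost_per_successful_outcome(2.0, success_rate=0.5)
pricier_api = cost_per_successful_outcome(3.0, success_rate=0.8)

print(cheaper_stack, pricier_api)
```

With these placeholder inputs the option that is cheaper per attempt is the more expensive one per successful outcome.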

Governance

Quarterly capacity review

  • Demand versus forecast
  • Utilisation
  • Unit cost
  • Quality
  • Workload placement
  • Vendor changes
  • Obsolescence
  • Sovereignty
  • Exit readiness

Trigger events

  • Sustained utilisation threshold
  • Cost crossover
  • Material price change
  • Model change
  • Regulation
  • Acquisition
  • New data class
  • Service incident
  • Major demand expansion
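Some triggers can be checked mechanically. As one example, a sustained utilisation threshold could be tested as below. The threshold, the number of periods and the readings are assumptions to be set by the reviewing team.

```python
def sustained_above(readings, threshold, periods):
    """True if the most recent `periods` readings are all at or above
    the threshold. A single spike does not count as sustained."""
    if periods <= 0:
        raise ValueError("periods must be positive")
    if len(readings) < periods:
        return False
    return all(r >= threshold for r in readings[-periods:])


# Hypothetical monthly utilisation readings for a reserved tier.
history = [0.50, 0.80, 0.85, 0.90]
print(sustained_above(history, threshold=0.75, periods=3))
```

A check like this only flags the workload for review. Whether to move it between sourcing modes remains a judgement for the capacity review.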

Future tool

A "Rent, Reserve or Own Intelligence" calculator should include:

Inputs:

  • Monthly input, output and cached tokens
  • Growth
  • Peak-to-average ratio
  • Model classes
  • Target utilisation
  • API and capacity price
  • Hardware and facilities
  • Staffing
  • Storage and network
  • Security and compliance
  • Capital cost
  • Refresh period
  • Migration and exit cost
  • Quality adjustment

Outputs:

  • Three-year TCO range
  • Crossover range
  • Utilisation sensitivity
  • Risk-adjusted recommendation
  • Caveats
  • Hybrid split
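A first cut of the TCO comparison might look like the sketch below. It covers only a small subset of the inputs listed above (volume, growth, API price, capital cost and a single monthly running cost), assumes owned capacity is large enough for both growth scenarios, and uses placeholder figures throughout.

```python
def rent_tco(monthly_millions, monthly_growth, price_per_million, months=36):
    """Total API spend over the period with compounding monthly growth."""
    total, volume = 0.0, float(monthly_millions)
    for _ in range(months):
        total += volume * price_per_million
        volume *= 1.0 + monthly_growth
    return total


def own_tco(capital_cost, monthly_running_cost, months=36):
    """Capital plus running costs; ignores refresh and exit (assumption)."""
    return capital_cost + monthly_running_cost * months


def tco_range(monthly_millions, growth_low, growth_high, price_per_million,
              capital_cost, monthly_running_cost):
    """Three-year costs under a low and a high growth scenario."""
    return {
        "rent_low": rent_tco(monthly_millions, growth_low, price_per_million),
        "rent_high": rent_tco(monthly_millions, growth_high, price_per_million),
        "own": own_tco(capital_cost, monthly_running_cost),
    }


# Placeholder scenario: 1,000 million tokens a month, 0% to 5% monthly growth.
result = tco_range(1000, 0.0, 0.05, 5.0,
                   capital_cost=1_000_000, monthly_running_cost=10_000)
print(result)
```

Reporting a range rather than a single figure keeps the dependence on the growth forecast visible. A fuller calculator would add the remaining inputs and a quality adjustment.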

Conclusion

The right question is not whether owned AI is cheaper than an API.

It is which sourcing model best matches the workload's economics, risk and strategic importance, and how that answer changes as demand evolves.
