Rent, Reserve or Own Intelligence?
The enterprise infrastructure question is no longer simply cloud or on-premises. It is how much intelligence capacity to rent, reserve, manage or own for each workload.
From software sourcing to intelligence sourcing
Enterprises once chose whether to buy software or build it.
They now choose how to source machine intelligence:
- Packaged SaaS
- Public API
- Reserved model or throughput capacity
- Hyperscaler managed service
- Neocloud capacity
- Private cloud
- Colocation
- Owned AI factory
- Edge or device inference
- Hybrid combinations
The same model capability can arrive through radically different economic structures.
Four sourcing modes
Rent
Pay per token, request, action or other unit of usage.
Best for:
- Pilots
- Uncertain demand
- Low volume
- Rapid access
- Model experimentation
- Variable workloads
Risks:
- Price exposure
- Limited infrastructure control
- Opacity
- Vendor dependency
- Data and residency constraints
Reserve
Commit to capacity, throughput or spend for a period.
Best for:
- Growing but still variable workloads
- Predictable base demand plus bursts
- Need for discount without infrastructure ownership
- Enterprise service levels
Risks:
- Commitment waste
- Forecast error
- Model change during contract
- Lock-in
Managed capacity
Use dedicated or specialised infrastructure operated by a provider.
Best for:
- Performance-sensitive workloads
- Regional or security requirements
- Elastic access with more control
- Organisations lacking facilities capability
Risks:
- Shared responsibility complexity
- Provider concentration
- Contract and capacity constraints
- Migration difficulty
Own
Acquire and operate infrastructure directly or through colocation.
Best for:
- Large predictable demand
- High utilisation
- Material latency or sovereignty needs
- Stable model strategy
- Strong platform capability
Risks:
- Capital
- Stranded capacity
- Rapid obsolescence
- Power and cooling
- Staffing
- Integration
- Slower deployment
- Refresh cycles
What Deloitte's model shows
Deloitte models API, neocloud and self-hosted AI-factory economics across rising token volume.
The useful lesson is the shape of the curve.
The dangerous lesson is treating its 84-billion-token crossover as a universal threshold.
Interpretation
Deloitte's 84-billion-token crossover is a modelled result under specific assumptions: B200 hardware, US costs, hourly neocloud pricing and a specific workload simulation. The model explicitly excludes storage and egress, security, one-time integration and staffing costs. These exclusions can materially change the crossover point.
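How assumptions move a crossover can be sketched with a simple break-even calculation. This is an illustration only, not Deloitte's model: the function name, the monthly period and every figure below are hypothetical placeholders, chosen to show how adding excluded costs shifts the result.

```python
def crossover_tokens(fixed_cost: float,
                     api_price_per_million: float,
                     owned_price_per_million: float) -> float:
    """Token volume per period at which owned and API costs are equal.

    fixed_cost: owned cost per period that does not vary with volume.
    *_per_million: variable cost per million tokens on each route.
    """
    saving = api_price_per_million - owned_price_per_million
    if saving <= 0:
        return float("inf")  # owning never becomes cheaper
    return fixed_cost / saving * 1_000_000


# Hypothetical figures in arbitrary currency units, per month.
hardware_only = crossover_tokens(200_000, 5.0, 1.0)
# Same workload with hypothetical staffing, security and storage costs added.
full_cost = crossover_tokens(300_000, 5.0, 1.0)

print(f"{hardware_only:,.0f}")  # 50,000,000,000
print(f"{full_cost:,.0f}")      # 75,000,000,000
```

In this sketch, a 50% increase in fixed cost pushes the break-even volume up by the same 50%, which is why a single published threshold should not be reused without checking what it includes.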
Why the threshold moves
Model and quality
A smaller open model and a frontier reasoning model do not perform equivalent work, so their per-token costs are not directly comparable.
Input and output mix
Output tokens can carry different economics from input and cached tokens.
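A minimal sketch of why the mix matters, assuming separate per-million prices for input, output and cached tokens. The `TokenPrices` type, the `workload_cost` helper and all prices are hypothetical and do not reflect any provider's rate card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrices:
    """Hypothetical price per million tokens for each token type."""
    input_price: float
    output_price: float
    cached_price: float


def workload_cost(prices: TokenPrices, input_m: float,
                  output_m: float, cached_m: float) -> float:
    """Cost of a workload whose volumes are given in millions of tokens."""
    return (input_m * prices.input_price
            + output_m * prices.output_price
            + cached_m * prices.cached_price)


prices = TokenPrices(input_price=2.0, output_price=8.0, cached_price=0.5)

# Two workloads with the same total volume (100 million tokens).
retrieval_heavy = workload_cost(prices, input_m=80, output_m=10, cached_m=10)
generation_heavy = workload_cost(prices, input_m=20, output_m=80, cached_m=0)

print(retrieval_heavy, generation_heavy)  # 245.0 680.0
```

Both workloads consume the same number of tokens, yet under these placeholder prices the generation-heavy one costs almost three times as much, so a crossover quoted in total tokens hides the mix behind it.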
Utilisation
Owned infrastructure wins only if it is kept productively busy. Idle capacity still incurs its full fixed cost.
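The effect of utilisation on owned unit cost can be sketched as follows, assuming the fixed cost is spread only over tokens that are productively served. The function name, capacity and cost figures are hypothetical.

```python
def owned_cost_per_million(fixed_cost: float,
                           capacity_tokens: float,
                           utilisation: float) -> float:
    """Fixed cost per million productive tokens at a given utilisation.

    capacity_tokens: tokens the infrastructure could serve per period.
    utilisation: fraction of that capacity actually used, in (0, 1].
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    productive_tokens = capacity_tokens * utilisation
    return fixed_cost / productive_tokens * 1_000_000


# Hypothetical: 200,000 currency units per month for 100 billion tokens of capacity.
for u in (0.2, 0.5, 0.8):
    cost = owned_cost_per_million(200_000, 100_000_000_000, u)
    print(f"utilisation {u:.0%}: {cost:.2f} per million tokens")
```

With these placeholder figures the unit cost falls from about 10 to about 2.5 per million tokens as utilisation rises from 20% to 80%. The same hardware can sit on either side of an API price depending on how busy it is.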
Demand shape
Steady demand supports capacity ownership. Spiky demand supports renting.
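A small sketch of the same point, assuming a commitment is paid in full every month and any overflow is bought on demand at a higher price. The demand profiles, prices and function names are hypothetical.

```python
from typing import Sequence


def total_cost(demand_m: Sequence[float], reserved_m: float,
               reserved_price: float, on_demand_price: float) -> float:
    """Cost over all months; volumes in millions of tokens per month.

    The reserved volume is paid for whether or not it is used.
    Demand above the reservation is bought at the on-demand price.
    """
    cost = 0.0
    for demand in demand_m:
        committed = reserved_m * reserved_price
        overflow = max(0.0, demand - reserved_m) * on_demand_price
        cost += committed + overflow
    return cost


steady = [100, 100, 100, 100, 100, 100]  # 600 in total
spiky = [20, 20, 20, 20, 20, 500]        # also 600 in total

for name, profile in (("steady", steady), ("spiky", spiky)):
    rent_only = total_cost(profile, 0, 3.0, 5.0)
    reserve_mean = total_cost(profile, 100, 3.0, 5.0)
    print(name, rent_only, reserve_mean)
# steady 3000.0 1800.0
# spiky 3000.0 3800.0
```

Both profiles consume the same total volume. Committing to the average pays off for the steady profile but costs more than pure renting for the spiky one, because most of the commitment goes unused and the peak is still bought on demand.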
Hardware generation
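Performance and energy efficiency improve rapidly between hardware generations, so a crossover calculated for one generation may not hold for the next.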
Contract terms
Reserved API or neocloud pricing can change the comparison.
Full cost
Deloitte's model excludes storage and egress, security, one-time integration and staffing. An enterprise decision must include them.
Risk and sovereignty
Control may justify higher nominal cost.
Exit and optionality
The ability to switch models or providers has economic value.
Workload decision profile
For each material workload, record:
- Business outcome
- Criticality
- Volume
- Growth
- Seasonality
- Context size
- Reasoning depth
- Latency
- Quality threshold
- Data sensitivity
- Residency
- Integration
- Availability
- Model portability
- Human review
- Expected lifespan
- Full cost
- Exit requirement
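One way to keep such a profile consistent across workloads is a simple record type. The sketch below is an assumption about how the list could be captured, not a prescribed schema: the class name, the free-text field types and the `unrecorded` helper are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class WorkloadProfile:
    """One record per material workload; every attribute is free text here."""
    name: str
    business_outcome: Optional[str] = None
    criticality: Optional[str] = None
    volume: Optional[str] = None
    growth: Optional[str] = None
    seasonality: Optional[str] = None
    context_size: Optional[str] = None
    reasoning_depth: Optional[str] = None
    latency: Optional[str] = None
    quality_threshold: Optional[str] = None
    data_sensitivity: Optional[str] = None
    residency: Optional[str] = None
    integration: Optional[str] = None
    availability: Optional[str] = None
    model_portability: Optional[str] = None
    human_review: Optional[str] = None
    expected_lifespan: Optional[str] = None
    full_cost: Optional[str] = None
    exit_requirement: Optional[str] = None

    def unrecorded(self) -> List[str]:
        """Names of attributes that have not been filled in yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


profile = WorkloadProfile(name="claims triage", criticality="high",
                          residency="EU only")
print(len(profile.unrecorded()), "attributes still to record")
```

Listing the gaps explicitly makes it harder to place a workload on the strength of volume alone while residency, exit or review requirements remain unknown.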
Decision matrix
Sourcing mode comparison across key decision factors

| Factor | Rent | Reserve | Managed capacity | Own |
| --- | --- | --- | --- | --- |
| Demand uncertainty | Strongest | Medium | Medium | Weakest |
| Speed to start | Strongest | Strong | Strong | Weakest |
| Unit cost at high utilisation | Weakest | Medium | Strong | Potentially strongest |
| Capital exposure | Lowest | Low | Low-medium | Highest |
| Control | Lowest | Medium | High | Highest |
| Model flexibility | High initially | Medium | Medium | Depends on stack |
| Sovereignty | Low-medium | Medium | High | Highest |
| Obsolescence risk | Provider | Shared | Provider/shared | Enterprise |
| Skills burden | Lowest | Low | Medium | Highest |
A hybrid portfolio
A sensible enterprise portfolio may contain:
- SaaS for commoditised capabilities
- APIs for exploration
- Reserved capacity for scaled shared services
- Managed regional infrastructure for sensitive workloads
- Owned capacity for stable, material workloads
- Edge models for latency and privacy
The value-aware crossover
A cost-only crossover asks:
"When is owned capacity cheaper per token?"
A value-aware crossover asks:
"When does a sourcing model produce the best cost per successful outcome, at the required quality, latency, control and resilience?"
A cheaper stack may deliver:
- Lower quality
- Slower iteration
- More engineering burden
- Less model choice
- Delayed market entry
A more expensive API may create strategic value through speed and optionality.
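The quality part of the value-aware question can be sketched as cost per successful outcome. The function and all figures are hypothetical, and the sketch deliberately ignores latency, control and resilience, which would need their own measures.

```python
def cost_per_success(cost_per_task: float, success_rate: float) -> float:
    """Average spend needed to obtain one outcome that meets the quality bar.

    success_rate: fraction of tasks that meet the bar, in (0, 1].
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_task / success_rate


# Hypothetical: the cheaper stack fails the quality bar half the time.
cheaper_stack = cost_per_success(cost_per_task=0.02, success_rate=0.5)
pricier_api = cost_per_success(cost_per_task=0.03, success_rate=0.9)

print(f"cheaper stack: {cheaper_stack:.4f} per successful outcome")
print(f"pricier API:   {pricier_api:.4f} per successful outcome")
```

Under these placeholder numbers the option that costs 50% more per task is the cheaper one per successful outcome (about 0.033 against 0.04).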
Governance
Quarterly capacity review
- Demand versus forecast
- Utilisation
- Unit cost
- Quality
- Workload placement
- Vendor changes
- Obsolescence
- Sovereignty
- Exit readiness
Trigger events
- Sustained utilisation threshold
- Cost crossover
- Material price change
- Model change
- Regulation
- Acquisition
- New data class
- Service incident
- Major demand expansion
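Two of these triggers are numeric and can be checked automatically between quarterly reviews. The sketch below assumes utilisation is sampled once per period and prices are compared as a relative change. The function names and thresholds are hypothetical and should be set by the organisation.

```python
from typing import Sequence


def sustained_above(readings: Sequence[float], threshold: float,
                    periods: int) -> bool:
    """True if the most recent `periods` readings are all at or above threshold."""
    if periods <= 0:
        raise ValueError("periods must be positive")
    recent = readings[-periods:]
    return len(recent) == periods and all(r >= threshold for r in recent)


def material_price_change(old_price: float, new_price: float,
                          tolerance: float) -> bool:
    """True if the relative price change exceeds the tolerance."""
    return abs(new_price - old_price) / old_price > tolerance


utilisation_history = [0.50, 0.80, 0.85, 0.90]
print(sustained_above(utilisation_history, threshold=0.75, periods=3))  # True
print(material_price_change(5.0, 6.0, tolerance=0.10))                  # True
```

Requiring several consecutive readings keeps a single busy month from triggering a sourcing review. The remaining triggers, such as regulation or an acquisition, still need human judgement.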
Future tool
A "Rent, Reserve or Own Intelligence" calculator should include:
Inputs:
- Monthly input, output and cached tokens
- Growth
- Peak-to-average ratio
- Model classes
- Target utilisation
- API and capacity price
- Hardware and facilities
- Staffing
- Storage and network
- Security and compliance
- Capital cost
- Refresh period
- Migration and exit cost
- Quality adjustment
Outputs:
- Three-year TCO range
- Crossover range
- Utilisation sensitivity
- Risk-adjusted recommendation
- Caveats
- Hybrid split
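As a starting point, the core of such a calculator could look like the sketch below. It covers only a few of the inputs (volume, growth, unit price and a fixed monthly cost) and one output (a three-year TCO range across two growth scenarios). All names and figures are hypothetical.

```python
from typing import Tuple


def three_year_tco(start_tokens_m: float, monthly_growth: float,
                   price_per_million: float, fixed_per_month: float = 0.0,
                   months: int = 36) -> float:
    """Total cost over the period; volume in millions of tokens per month."""
    total = 0.0
    volume_m = start_tokens_m
    for _ in range(months):
        total += fixed_per_month + volume_m * price_per_million
        volume_m *= 1 + monthly_growth  # compound growth into the next month
    return total


def tco_range(start_tokens_m: float, growth_low: float, growth_high: float,
              price_per_million: float,
              fixed_per_month: float = 0.0) -> Tuple[float, float]:
    """TCO under a low-growth and a high-growth scenario."""
    return (
        three_year_tco(start_tokens_m, growth_low, price_per_million, fixed_per_month),
        three_year_tco(start_tokens_m, growth_high, price_per_million, fixed_per_month),
    )


# Hypothetical: 1,000 million tokens per month, growth between 0% and 5% per month.
rent = tco_range(1_000, 0.0, 0.05, price_per_million=5.0)
own = tco_range(1_000, 0.0, 0.05, price_per_million=1.0, fixed_per_month=3_000.0)

print(f"rent: {rent[0]:,.0f} to {rent[1]:,.0f}")
print(f"own:  {own[0]:,.0f} to {own[1]:,.0f}")
```

Reporting a range instead of a single figure keeps the forecast uncertainty visible. A fuller tool would add the remaining inputs, a utilisation sensitivity and the quality adjustment before making any recommendation.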
Conclusion
The right question is not whether owned AI is cheaper than an API.
It is which sourcing model best matches the workload's economics, risk and strategic importance, and how that answer changes as demand evolves.
Sources and further reading
- Deloitte, "The pivot to tokenomics: Navigating AI's new spend dynamics", pp. 6-19 (model limitations: pp. 25-26)
- FinOps Foundation, "GenAI FinOps: How Token Pricing Really Works"
- European Commission, "AI Factories"
- AI TCO Framework
- Token Economics
- Inference Cost Crisis