Glossary entry

Inference Cost

The recurring cost of generating outputs from a deployed AI system in production.

Why it matters

Inference cost matters because AI spend often scales with usage, and a workflow that appears affordable in testing can become structurally expensive once it spreads across teams, customers, or transactions.

Inference cost can include token usage, API requests, compute, retrieval steps, guardrails, and monitoring overhead. Because most of these components grow with request volume, it is one of the clearest examples of why AI economics is demand-sensitive rather than fixed.
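As a rough illustration of that demand sensitivity, the sketch below adds up a per-request cost from the components listed above and multiplies it by request volume. Every name and number in it is an assumption made for illustration (`RequestProfile`, `PriceSheet`, the unit prices, and modelling monitoring as a flat percentage); none of it reflects a real provider's pricing or a prescribed costing method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestProfile:
    """Hypothetical usage of a single request (placeholder values)."""
    input_tokens: int
    output_tokens: int
    retrieval_calls: int = 0
    guardrail_calls: int = 0


@dataclass(frozen=True)
class PriceSheet:
    """Hypothetical unit prices in USD, not taken from any real provider."""
    per_1k_input_tokens: float
    per_1k_output_tokens: float
    per_retrieval_call: float
    per_guardrail_call: float
    monitoring_overhead: float = 0.0  # assumed fraction added on top of direct cost


def cost_per_request(profile: RequestProfile, prices: PriceSheet) -> float:
    """Sum token and per-step charges, then apply the monitoring overhead."""
    token_cost = (
        profile.input_tokens / 1000 * prices.per_1k_input_tokens
        + profile.output_tokens / 1000 * prices.per_1k_output_tokens
    )
    step_cost = (
        profile.retrieval_calls * prices.per_retrieval_call
        + profile.guardrail_calls * prices.per_guardrail_call
    )
    return (token_cost + step_cost) * (1 + prices.monitoring_overhead)


def monthly_cost(profile: RequestProfile, prices: PriceSheet,
                 requests_per_month: int) -> float:
    """Scale the per-request cost linearly with monthly request volume."""
    return cost_per_request(profile, prices) * requests_per_month


if __name__ == "__main__":
    # Placeholder inputs: one retrieval step and one guardrail check per request.
    profile = RequestProfile(input_tokens=2000, output_tokens=500,
                             retrieval_calls=1, guardrail_calls=1)
    prices = PriceSheet(per_1k_input_tokens=0.001, per_1k_output_tokens=0.002,
                        per_retrieval_call=0.0005, per_guardrail_call=0.0002,
                        monitoring_overhead=0.10)
    # Same workflow, three usage levels: pilot, team rollout, customer-facing.
    for volume in (1_000, 100_000, 10_000_000):
        print(f"{volume:>12,} requests/month: ${monthly_cost(profile, prices, volume):,.2f}")
```

The per-request figure stays the same at every volume; only the multiplier changes, which is why a workflow that looks cheap in a pilot can become a significant line item at production scale.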

For the wider cost stack, see AI TCO Framework.