Inference cost can include token charges, API request fees, compute, retrieval steps, guardrail checks, and monitoring overhead. Because most of these components scale with usage, it is one of the clearest examples of why AI economics is demand-sensitive rather than fixed.
For the wider cost stack, see AI TCO Framework.
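
To make the demand sensitivity concrete, here is a minimal sketch of a per-request cost model. The class names, field names, and every price below are hypothetical placeholders chosen for round arithmetic, not real vendor rates. Compute is assumed to be bundled into the token prices, which will not hold for self-hosted models.

```python
from dataclasses import dataclass


@dataclass
class RequestProfile:
    """Average usage of one inference request (hypothetical figures)."""
    input_tokens: int
    output_tokens: int
    retrieval_calls: int
    guardrail_checks: int


@dataclass
class UnitPrices:
    """Illustrative unit prices in USD; placeholders, not real rates."""
    per_1k_input_tokens: float
    per_1k_output_tokens: float
    per_api_request: float
    per_retrieval_call: float
    per_guardrail_check: float
    monitoring_per_request: float


def cost_per_request(profile: RequestProfile, prices: UnitPrices) -> float:
    """Sum the usage-driven components of a single request."""
    return (
        profile.input_tokens / 1000 * prices.per_1k_input_tokens
        + profile.output_tokens / 1000 * prices.per_1k_output_tokens
        + prices.per_api_request
        + profile.retrieval_calls * prices.per_retrieval_call
        + profile.guardrail_checks * prices.per_guardrail_check
        + prices.monitoring_per_request
    )


def monthly_inference_cost(
    requests: int, profile: RequestProfile, prices: UnitPrices
) -> float:
    """Total cost scales linearly with request volume in this simple model."""
    return requests * cost_per_request(profile, prices)


if __name__ == "__main__":
    profile = RequestProfile(
        input_tokens=1000, output_tokens=500,
        retrieval_calls=2, guardrail_checks=1,
    )
    prices = UnitPrices(
        per_1k_input_tokens=0.5, per_1k_output_tokens=1.0,
        per_api_request=0.25, per_retrieval_call=0.25,
        per_guardrail_check=0.125, monitoring_per_request=0.125,
    )
    for volume in (1_000, 2_000, 10_000):
        print(volume, monthly_inference_cost(volume, profile, prices))
```

In this model, doubling request volume doubles the bill, which is the sense in which inference cost is demand-sensitive. A real deployment would likely add fixed terms (reserved capacity, minimum commitments) and volume discounts on top of this linear part.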