Glossary entry

Inference Optimisation

The practice of reducing the cost, latency, or resource intensity of AI inference without unacceptable loss of quality.

Why it matters

Inference optimisation matters because recurring production cost, not one-off model development cost, often becomes the main economic bottleneck in enterprise AI.

Common techniques include routing simpler tasks to smaller models, caching repeated requests, batching, and improving prompt efficiency. For the operating discipline behind that work, see FinOps & AI.
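Two of these techniques, routing and caching, can be illustrated with a minimal sketch. Everything named here is hypothetical: `choose_model`, `call_model`, and `answer` are illustrative helpers, the word-count threshold is only a crude stand-in for task complexity, and `call_model` is a placeholder for whatever inference API is actually in use.

```python
from functools import lru_cache


def choose_model(prompt: str, max_small_words: int = 50) -> str:
    """Route short prompts to a small model and longer ones to a large model.

    Word count is an assumed proxy for task complexity; a production
    router would more likely use a classifier or task metadata.
    """
    return "small" if len(prompt.split()) <= max_small_words else "large"


def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real inference call (hypothetical)."""
    return f"[{model}] response to: {prompt[:30]}"


@lru_cache(maxsize=1024)
def answer(prompt: str) -> str:
    """Serve identical prompts from an in-memory cache instead of re-running inference."""
    return call_model(choose_model(prompt), prompt)


if __name__ == "__main__":
    answer("Summarise this ticket.")   # first call runs inference
    answer("Summarise this ticket.")   # identical call is served from the cache
    print(answer.cache_info())         # hit and miss counts show the avoided call
```

The design choice is that each avoided or downsized call reduces recurring cost, which is the concern this entry describes. Exact-match caching like this only helps when prompts repeat verbatim.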
