Common techniques include routing simpler tasks to smaller models, caching, batching, and improving prompt efficiency. For the operating discipline behind that work, see FinOps & AI.
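As a rough illustration of two of these techniques, the sketch below routes short prompts to a cheaper model and caches repeated prompts so they skip inference entirely. Everything named here is an assumption made for the example: the model names, the relative costs, the word-count threshold, and the `call_model` stand-in are hypothetical and do not describe any real provider or API.

```python
from functools import lru_cache

# Hypothetical model tiers with illustrative relative costs per call.
MODEL_COSTS = {"small-model": 1, "large-model": 10}

# Records every call that actually reaches the (stand-in) model backend.
calls_made = []


def choose_model(prompt: str, max_simple_words: int = 20) -> str:
    """Route short prompts to the cheaper model.

    Word count is only a crude, assumed proxy for task simplicity.
    """
    if len(prompt.split()) <= max_simple_words:
        return "small-model"
    return "large-model"


def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real inference call; it only records that a call happened."""
    calls_made.append(model)
    return f"[{model}] response to: {prompt}"


@lru_cache(maxsize=1024)
def answer(prompt: str) -> str:
    """Cache by exact prompt text so a repeated prompt triggers no new call."""
    return call_model(choose_model(prompt), prompt)


def total_cost() -> int:
    """Sum the illustrative cost of every call that was not served from cache."""
    return sum(MODEL_COSTS[model] for model in calls_made)
```

In this sketch, asking the same short question twice incurs the small-model cost once, and only long prompts reach the larger model. A production system would likely need a better routing signal than word count and a cache policy that accounts for stale or context-dependent answers.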
Glossary entry
Inference Optimisation
The practice of reducing the cost, latency, or resource intensity of AI inference without unacceptable loss of quality.
Why it matters
Inference optimisation matters because recurring production cost, not one-off model development cost, often becomes the main economic bottleneck in enterprise AI.