What is AI Caching?

Reducing LLM costs with prompt and response caching.

AI caching stores LLM responses so that identical or semantically similar requests can be served without calling the model again. For queries that hit the cache, this can cut inference costs substantially (commonly cited at 50-90%) and reduce latency from seconds to milliseconds.

Caching Strategies

  • Exact match: return a cached response only when the prompt is identical (often after normalizing whitespace and case)
  • Semantic: match queries by embedding similarity, so differently worded questions with the same meaning share a cached answer
  • Prefix: cache the shared portion of prompts, such as a common system prompt, so the model only processes the unique suffix
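The exact-match strategy is the simplest to implement. Below is a minimal sketch (the `PromptCache` class and its normalization rules are illustrative, not a specific product's API): prompts are normalized, hashed, and used as keys into an in-memory store with a time-to-live.

```python
import hashlib
import time


class PromptCache:
    """Exact-match LLM response cache keyed on a normalized prompt hash."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (response, timestamp)

    def _key(self, prompt):
        # Normalize whitespace and case so trivially different prompts
        # ("What is AI caching?" vs "what is  AI caching?") share one entry.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get(self, prompt):
        entry = self.store.get(self._key(prompt))
        if entry is None:
            return None
        response, ts = entry
        if time.time() - ts > self.ttl:
            return None  # expired: treat as a miss
        return response

    def put(self, prompt, response):
        self.store[self._key(prompt)] = (response, time.time())
```

In practice the same pattern is usually backed by a shared store such as Redis rather than a process-local dict, so that cache hits are shared across application instances.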

Caching Risks

  • Stale responses: cached answers to time-sensitive queries (prices, news, account state) go out of date
  • Cache poisoning: a bad or malicious response, once cached, is served repeatedly
  • PII leakage: one user's cached response may contain personal data that is then served to another user
  • Semantic false positives: a similar-but-different query matches a cached entry and receives the wrong answer
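The false-positive risk is governed by the similarity threshold used for semantic lookups. The sketch below (the `semantic_cache_lookup` helper and threshold value are illustrative assumptions) shows why: a strict threshold rejects near-miss matches and falls through to the model, while a lenient one would serve the wrong cached answer.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def semantic_cache_lookup(query_vec, cache, threshold=0.95):
    """Return the best-matching cached response, or None on a miss.

    `cache` is a list of (embedding, response) pairs. Only matches at or
    above `threshold` are returned; setting this too low is exactly how
    semantic false positives happen.
    """
    best_score, best_response = 0.0, None
    for vec, response in cache:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_response = score, response
    if best_score >= threshold:
        return best_response
    return None  # miss: call the model instead
```

Tuning the threshold is a cost/correctness trade-off: raising it lowers the hit rate but shrinks the false-positive surface, which matters most for queries where a wrong cached answer is worse than a slow one.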
