What is semantic caching?
Semantic caching stores LLM responses and returns cached results for semantically similar queries, not just exact matches. This reduces API costs and latency by avoiding redundant LLM calls for questions with the same meaning.
How It Works
- Embed incoming queries into vectors
- Search cache for similar embeddings
- Return cached response if similarity exceeds threshold
- Otherwise, call LLM and cache the result
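The steps above can be sketched as a minimal, self-contained cache. This is an illustrative toy: the `embed` function here is a bag-of-words stand-in for a real embedding model (e.g. a sentence-transformer), and the class and function names are assumptions, not a specific library's API.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: word counts. A real system would call an
    # embedding model here; this stands in for one.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def lookup(self, query):
        # Return the cached response for the most similar stored
        # query, if its similarity exceeds the threshold.
        qvec = embed(query)
        best = max(self.entries, key=lambda e: cosine(qvec, e[0]),
                   default=None)
        if best and cosine(qvec, best[0]) >= self.threshold:
            return best[1]  # cache hit
        return None         # cache miss

    def store(self, query, response):
        self.entries.append((embed(query), response))

def answer(cache, query, call_llm):
    cached = cache.lookup(query)
    if cached is not None:
        return cached              # hit: skip the LLM call
    response = call_llm(query)     # miss: call the LLM...
    cache.store(query, response)   # ...and cache the result
    return response
```

With this sketch, two paraphrased queries that embed close together share one LLM call; the second is served from the cache.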
Benefits
- Cost reduction: Avoid redundant API calls
- Lower latency: Cache hits skip the LLM round trip, returning in milliseconds
- Consistency: Same questions get same answers
Monitoring Cached Responses
- Track cache hit rates and quality
- Monitor for stale or mismatched responses
- Compare cached vs fresh response quality
- Adjust similarity thresholds based on metrics
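A minimal sketch of the hit-rate tracking described above. The class and method names are illustrative assumptions; production systems would typically emit these counts to a metrics backend instead.

```python
class CacheMetrics:
    """Tracks semantic-cache hits and misses (illustrative names)."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        # Call once per lookup with whether it was a cache hit.
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def hit_rate(self):
        # Fraction of lookups served from cache; 0.0 before any lookups.
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A falling hit rate, or a high hit rate paired with poor response quality, is the signal to revisit the similarity threshold.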
Does caching affect quality?
It can. Cached responses may become stale or not perfectly match the new query's intent. Monitor cache hit quality and set appropriate similarity thresholds to balance cost savings with accuracy.
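The threshold trade-off can be made concrete with a small example. The embedding vectors below are made-up numbers standing in for two paraphrased queries; the threshold values are illustrative, not recommendations.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical embeddings of a cached query and a new paraphrase.
q_cached = [0.9, 0.4, 0.1]
q_new    = [0.8, 0.5, 0.2]
sim = cosine(q_cached, q_new)  # roughly 0.98 for these vectors

# A looser threshold serves the cached answer (more savings, more
# risk of intent mismatch); a stricter one forces a fresh LLM call.
loose_hit  = sim >= 0.80   # True here: treated as a cache hit
strict_hit = sim >= 0.99   # False here: falls through to the LLM
```

Tuning amounts to moving that cutoff until the measured hit quality and cost savings are both acceptable.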