What is the Attention Mechanism?
How transformers weigh context for generation.
What is attention?
Attention is a mechanism that lets a model weigh the importance of different parts of its input when generating each output token. It enables LLMs to draw on relevant context regardless of where that context appears in the sequence.
How It Works
- Query: What the current token is looking for
- Key: What each token in the context offers
- Value: The actual information retrieved from each token
- Score: How relevant each key is to the query, typically a scaled dot product normalized with softmax
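The four components above combine into scaled dot-product attention. Here is a minimal sketch in numpy; the matrices are random stand-ins for learned projections, and the shapes (4 positions, dimension 8) are arbitrary choices for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # score: relevance of each key to each query
    weights = softmax(scores, axis=-1)   # each row is a distribution over tokens
    return weights @ V, weights          # weighted sum of values, plus the weights

# Toy inputs: 4 positions, head dimension 8 (illustrative values, not learned).
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = attention(Q, K, V)
```

Each row of `w` sums to 1: for every query position, the model distributes a fixed budget of attention across all tokens, and the output is the values mixed in those proportions.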
Types of Attention
- Self-attention: Tokens in a sequence attend to one another
- Multi-head: Several attention patterns computed in parallel over different subspaces
- Cross-attention: One sequence attends to another (e.g., a decoder attending to encoder outputs)
- Sparse: Each token attends to only a subset of positions, making long contexts cheaper
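The multi-head idea can be sketched by splitting the embedding into per-head slices and running attention independently in each. This is a simplified illustration: real implementations apply learned projection matrices W_Q, W_K, W_V per head, which are omitted here so each head simply works on its own slice:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, n_heads):
    """Each head attends over its own d_model/n_heads slice, then results are concatenated."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    heads = []
    for h in range(n_heads):
        Xh = X[:, h * d_head:(h + 1) * d_head]       # this head's subspace
        scores = Xh @ Xh.T / np.sqrt(d_head)         # self-attention: Q = K = V = Xh
        heads.append(softmax(scores, axis=-1) @ Xh)  # each head forms its own pattern
    return np.concatenate(heads, axis=-1)            # back to (seq_len, d_model)

X = np.random.default_rng(1).normal(size=(5, 16))    # 5 tokens, d_model = 16
Y = multi_head_self_attention(X, n_heads=4)
```

Because each head computes its own softmax over its own scores, the heads are free to specialize, e.g. one tracking nearby tokens while another attends to a distant subject.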
Attention and Quality
- "Lost in the middle" problem from attention patterns
- Long contexts can dilute attention
- Attention visualization helps debug issues
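The dilution point above follows directly from the softmax: attention is a fixed budget, so adding tokens shrinks the share any one token can receive. A toy illustration (deterministic scores chosen for clarity, not taken from any real model): one relevant token with a fixed score advantage of 2.0 over a growing crowd of zero-score distractors.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# One relevant token (score 2.0) among n-1 distractor tokens (score 0.0).
max_weight = {}
for n in (10, 100, 1000):
    scores = np.zeros(n)
    scores[0] = 2.0                   # fixed relevance advantage
    max_weight[n] = softmax(scores)[0]
# max_weight[n] = e^2 / (e^2 + n - 1): the relevant token's share of
# attention falls roughly as 1/n even though its score never changed.
```

At n=10 the relevant token still captures close to half the attention; by n=1000 it gets well under 1%, which is one intuition for why burying a key fact in a very long context can hurt answer quality.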
Why does attention matter for quality?
Attention determines which parts of the context the model considers for each token it generates. Poor attention patterns can cause the model to ignore relevant information or to hallucinate from irrelevant context. Understanding attention helps diagnose these quality issues.