What is the Attention Mechanism?
How transformers weigh context for generation.
What is attention?
Attention is a mechanism that lets a model weigh the importance of different parts of its input when generating each output token. It enables LLMs to draw on relevant context regardless of where that context appears in the sequence.
How It Works
- Query: What the current token is looking for
- Key: What each token in the context offers
- Value: The actual information retrieved from each token
- Score: How relevant each key is to the query, typically a scaled dot product normalized with softmax
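The four components above combine into scaled dot-product attention. Here is a minimal sketch in numpy; the matrices are random stand-ins for learned projections, and the shapes (4 positions, dimension 8) are arbitrary choices for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # score: relevance of each key to each query
    weights = softmax(scores, axis=-1)   # each row is a distribution over tokens
    return weights @ V, weights          # weighted sum of values, plus the weights

# Toy inputs: 4 positions, head dimension 8 (illustrative values, not learned).
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = attention(Q, K, V)
```

Each row of `w` sums to 1: for every query position, the model distributes a fixed budget of attention across all tokens, and the output is the values mixed in those proportions.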
Types of Attention
- Self-attention: Tokens in a sequence attend to one another
- Multi-head: Several attention patterns computed in parallel over different subspaces
- Cross-attention: One sequence attends to another (e.g., a decoder attending to encoder outputs)
- Sparse: Each token attends to only a subset of positions, making long contexts cheaper
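The multi-head idea can be sketched by splitting the embedding into per-head slices and running attention independently in each. This is a simplified illustration: real implementations apply learned projection matrices W_Q, W_K, W_V per head, which are omitted here so each head simply works on its own slice:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, n_heads):
    """Each head attends over its own d_model/n_heads slice, then results are concatenated."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    heads = []
    for h in range(n_heads):
        Xh = X[:, h * d_head:(h + 1) * d_head]       # this head's subspace
        scores = Xh @ Xh.T / np.sqrt(d_head)         # self-attention: Q = K = V = Xh
        heads.append(softmax(scores, axis=-1) @ Xh)  # each head forms its own pattern
    return np.concatenate(heads, axis=-1)            # back to (seq_len, d_model)

X = np.random.default_rng(1).normal(size=(5, 16))    # 5 tokens, d_model = 16
Y = multi_head_self_attention(X, n_heads=4)
```

Because each head computes its own softmax over its own scores, the heads are free to specialize, e.g. one tracking nearby tokens while another attends to a distant subject.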
Attention and Quality
- "Lost in the middle" problem from attention patterns
- Long contexts can dilute attention
- Attention visualization helps debug issues
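The dilution point above follows directly from the softmax: attention is a fixed budget, so adding tokens shrinks the share any one token can receive. A toy illustration (deterministic scores chosen for clarity, not taken from any real model): one relevant token with a fixed score advantage of 2.0 over a growing crowd of zero-score distractors.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# One relevant token (score 2.0) among n-1 distractor tokens (score 0.0).
max_weight = {}
for n in (10, 100, 1000):
    scores = np.zeros(n)
    scores[0] = 2.0                   # fixed relevance advantage
    max_weight[n] = softmax(scores)[0]
# max_weight[n] = e^2 / (e^2 + n - 1): the relevant token's share of
# attention falls roughly as 1/n even though its score never changed.
```

At n=10 the relevant token still captures close to half the attention; by n=1000 it gets well under 1%, which is one intuition for why burying a key fact in a very long context can hurt answer quality.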
Why does attention matter for quality?
Attention determines which parts of the context the model considers for each token it generates. Poor attention patterns can cause the model to ignore relevant information or to hallucinate from irrelevant context. Understanding attention helps diagnose these quality issues.