What is Context Length?

Understanding LLM context windows and their implications.

What is context length?

Context length (or context window) is the maximum number of tokens an LLM can process in a single request, including both input and output. Current models range from 8K to 1M+ tokens.
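Because input and output share one budget, a long prompt leaves less room for the response. A minimal sketch of that bookkeeping, using a crude ~4 characters-per-token heuristic (a real tokenizer such as the model provider's gives exact counts; the function names here are illustrative, not any library's API):

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return estimate_tokens(prompt) + max_output_tokens <= context_window

# An 8K-token window with a ~1,000-token prompt leaves room for a 1,024-token reply:
prompt = "Summarize the following report... " + "x" * 4000
print(fits_in_context(prompt, max_output_tokens=1024, context_window=8192))
```

The key point: reserving output tokens up front avoids requests that are silently truncated when the prompt alone nearly fills the window.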

Context Lengths (2025)

  • GPT-5: 400K tokens
  • Claude 4: 200K tokens
  • Gemini 2.5: Up to 1M tokens
  • Llama 4: Varies by model
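The figures above can drive a simple capacity check: given a document's token count, which models can hold it plus an output budget? A sketch using the approximate limits listed above (Llama 4 is omitted since its window varies by model):

```python
# Approximate context windows from the list above (tokens).
CONTEXT_WINDOWS = {
    "GPT-5": 400_000,
    "Claude 4": 200_000,
    "Gemini 2.5": 1_000_000,
}

def models_that_fit(doc_tokens: int, output_budget: int = 4096) -> list[str]:
    """Return models whose window can hold the document plus an output budget."""
    needed = doc_tokens + output_budget
    return [m for m, limit in CONTEXT_WINDOWS.items() if limit >= needed]

# A ~300K-token document rules out a 200K window:
print(models_that_fit(300_000))
```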

Context Length Trade-offs

  • Cost: Longer contexts mean more tokens processed per request, and cost scales with tokens
  • Latency: More tokens increase processing time
  • Quality: Information buried mid-context may be missed (the "lost in the middle" problem)
  • Relevance: More context isn't always better; irrelevant material can dilute the signal
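The cost trade-off is linear: doubling the context roughly doubles the input cost of every request. A sketch of that arithmetic (the per-million-token prices below are hypothetical placeholders, not any provider's real rates):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_mtok: float = 3.00,    # hypothetical $/1M input tokens
                 out_price_per_mtok: float = 15.00,  # hypothetical $/1M output tokens
                 ) -> float:
    """Cost of one request: tokens times price, per direction."""
    return (input_tokens * in_price_per_mtok
            + output_tokens * out_price_per_mtok) / 1_000_000

# Stuffing 10x more context into the prompt makes the input cost 10x higher:
print(request_cost(10_000, 1_000))   # 0.045
print(request_cost(100_000, 1_000))  # 0.315
```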

Does longer context mean better?

Not necessarily. Models can struggle with "lost in the middle" problems where information in the center of long contexts is ignored. Longer contexts also increase latency and cost. Monitor quality across different context lengths.
