What is Context Length?

Understanding LLM context windows and their implications.

What is context length?

Context length (or context window) is the maximum number of tokens an LLM can process in a single request, including both input and output. Current models range from 8K to 1M+ tokens.
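Because input and output share one budget, a long prompt leaves less room for the response. A minimal sketch of that bookkeeping, using a crude ~4 characters-per-token heuristic (a real tokenizer such as the model provider's gives exact counts; the function names here are illustrative, not any library's API):

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return estimate_tokens(prompt) + max_output_tokens <= context_window

# An 8K-token window with a ~1,000-token prompt leaves room for a 1,024-token reply:
prompt = "Summarize the following report... " + "x" * 4000
print(fits_in_context(prompt, max_output_tokens=1024, context_window=8192))
```

The key point: reserving output tokens up front avoids requests that are silently truncated when the prompt alone nearly fills the window.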

Context Lengths (2025)

  • GPT-5: 400K tokens
  • Claude 4: 200K tokens
  • Gemini 2.5: Up to 1M tokens
  • Llama 4: Varies by model
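The figures above can drive a simple capacity check: given a document's token count, which models can hold it plus an output budget? A sketch using the approximate limits listed above (Llama 4 is omitted since its window varies by model):

```python
# Approximate context windows from the list above (tokens).
CONTEXT_WINDOWS = {
    "GPT-5": 400_000,
    "Claude 4": 200_000,
    "Gemini 2.5": 1_000_000,
}

def models_that_fit(doc_tokens: int, output_budget: int = 4096) -> list[str]:
    """Return models whose window can hold the document plus an output budget."""
    needed = doc_tokens + output_budget
    return [m for m, limit in CONTEXT_WINDOWS.items() if limit >= needed]

# A ~300K-token document rules out a 200K window:
print(models_that_fit(300_000))
```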

Context Length Trade-offs

  • Cost: Longer contexts mean more tokens processed per request, and cost scales with tokens
  • Latency: More tokens increase processing time
  • Quality: Information buried mid-context may be missed (the "lost in the middle" problem)
  • Relevance: More context isn't always better; irrelevant material can dilute the signal
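The cost trade-off is linear: doubling the context roughly doubles the input cost of every request. A sketch of that arithmetic (the per-million-token prices below are hypothetical placeholders, not any provider's real rates):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_mtok: float = 3.00,    # hypothetical $/1M input tokens
                 out_price_per_mtok: float = 15.00,  # hypothetical $/1M output tokens
                 ) -> float:
    """Cost of one request: tokens times price, per direction."""
    return (input_tokens * in_price_per_mtok
            + output_tokens * out_price_per_mtok) / 1_000_000

# Stuffing 10x more context into the prompt makes the input cost 10x higher:
print(request_cost(10_000, 1_000))   # 0.045
print(request_cost(100_000, 1_000))  # 0.315
```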

Does longer context mean better?

Not necessarily. Models can struggle with "lost in the middle" problems where information in the center of long contexts is ignored. Longer contexts also increase latency and cost. Monitor quality across different context lengths.
