What is a Transformer?
The architecture powering modern LLMs.
What is a transformer?
A transformer is a neural network architecture that uses self-attention to process sequences. Introduced in the 2017 paper "Attention Is All You Need," it is the foundation of virtually all modern LLMs, including GPT-5, Claude 4, Gemini, and Llama.
Key Components
- Self-attention: Weighs the relationship between every pair of tokens
- Feed-forward network: Transforms each position independently
- Layer normalization: Stabilizes training in deep stacks of layers
- Positional encoding: Injects sequence-order information
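The self-attention step in the list above can be sketched numerically. This is a minimal single-head version in NumPy; the function name `self_attention` and the matrix shapes are illustrative assumptions, not any library's API:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Row i of `scores` weighs token i's relationship to every token.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))             # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                        # one output vector per token
```

In a full transformer block, this output would then pass through the feed-forward network, with layer normalization and residual connections around both sublayers.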
Transformer Variants
- Decoder-only: GPT, Claude, Llama (text generation)
- Encoder-only: BERT (understanding)
- Encoder-decoder: T5 (translation, summarization)
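The practical difference between decoder-only and encoder-only attention comes down to masking: a decoder masks out future positions so each token attends only to earlier ones, which is what enables left-to-right text generation. A minimal sketch of a causal mask (illustrative, not any library's API):

```python
import numpy as np

seq_len = 4
# Causal mask: position i may attend only to positions j <= i.
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

scores = np.zeros((seq_len, seq_len))   # stand-in attention scores
scores[mask] = -np.inf                  # block future positions
e = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = e / e.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))
# Row i spreads its weight uniformly over positions 0..i;
# an encoder-only model skips the mask and attends in both directions.
```

Encoder-decoder models like T5 combine both: the encoder attends bidirectionally over the input, and the decoder attends causally over the output while cross-attending to the encoder.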
Why Transformers Enabled LLMs
- Parallel processing enables massive scale
- Attention captures long-range dependencies
- Scales predictably with compute
- Flexible for many modalities
Why are transformers important?
Transformers enabled the scaling that produced modern LLMs. Unlike recurrent predecessors such as LSTMs, which process tokens one at a time, transformers process entire sequences in parallel and capture long-range dependencies directly through attention. This made training on massive datasets practical.