
What is a Transformer?

The architecture powering modern LLMs.


A transformer is a neural network architecture that uses self-attention to process sequences. Introduced in the 2017 paper "Attention Is All You Need," it is the foundation of virtually all modern LLMs, including GPT-5, Claude 4, Gemini, and Llama.

Key Components

  • Self-attention: lets each token weigh its relevance to every other token in the sequence
  • Feed-forward network: transforms each position independently
  • Layer normalization: stabilizes training in deep stacks
  • Positional encoding: injects sequence-order information the attention mechanism otherwise lacks
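The core of the list above, self-attention, can be sketched in a few lines. This is a minimal single-head, NumPy-only illustration (the weight matrices `Wq`, `Wk`, `Wv` and the helper name `self_attention` are illustrative, not from any particular library):

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over x of shape (seq_len, d_model)."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv           # project tokens to queries, keys, values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # pairwise token-to-token relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                         # each output is a weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)
```

Real transformers run many such heads in parallel (multi-head attention) and stack the result with the feed-forward, normalization, and positional components listed above.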

Transformer Variants

  • Decoder-only: GPT, Claude, Llama (text generation)
  • Encoder-only: BERT (understanding)
  • Encoder-decoder: T5 (translation, summarization)

Why Transformers Enabled LLMs

  • Parallel processing enables massive scale
  • Attention captures long-range dependencies
  • Scales predictably with compute
  • Flexible for many modalities

Why are transformers important?

Transformers enabled the scaling that created modern LLMs. Unlike recurrent architectures such as LSTMs, which process tokens one at a time, they process entire sequences in parallel and capture long-range dependencies directly through attention. This made training on massive datasets practical.
