What is Mixture of Experts?
Sparse architectures for efficient large-scale LLMs.
Mixture of Experts (MoE) is an architecture where a model has multiple specialized sub-networks (experts) and a router that selects which experts to use for each input. This allows models to have more total parameters while only activating a subset during inference.
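The gap between total and active parameters is easiest to see with arithmetic. A minimal sketch, using made-up numbers for a hypothetical MoE configuration (not any real model's actual parameter counts):

```python
# Illustrative parameter accounting for a hypothetical MoE model.
# All numbers are invented for the example, not taken from a real model.
n_experts = 8            # experts per MoE layer
top_k = 2                # experts activated per token
expert_params = 5.0e9    # parameters per expert (summed across MoE layers)
shared_params = 3.0e9    # attention, embeddings, etc. (always active)

total_params = shared_params + n_experts * expert_params
active_params = shared_params + top_k * expert_params

print(f"total:  {total_params / 1e9:.0f}B")   # total:  43B
print(f"active: {active_params / 1e9:.0f}B")  # active: 13B
```

The model stores all 43B parameters, but each token only touches 13B of them, which is why MoE inference cost tracks the active count rather than the total.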
How MoE Works
- Experts: Multiple feed-forward sub-networks that learn to specialize on different kinds of input
- Router: A learned gating network that scores the experts and selects which ones handle each token
- Sparse activation: Typically only the top 1-2 experts run per token; the rest stay idle
- Result: Large total capacity with inference cost close to a much smaller dense model
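The steps above can be sketched in a few lines. This is a toy single-token forward pass, not any production implementation: the "experts" are stand-in linear maps, and the router is a plain matrix followed by a softmax.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_forward(token, experts, router_weights, top_k=2):
    """Route one token vector through the top-k experts.

    experts: list of callables standing in for feed-forward networks.
    router_weights: matrix mapping the token to one logit per expert.
    """
    logits = router_weights @ token          # one score per expert
    gates = softmax(logits)                  # routing probabilities
    top = np.argsort(gates)[-top_k:]         # indices of the top-k experts
    weights = gates[top] / gates[top].sum()  # renormalize over chosen experts
    # Sparse activation: only the selected experts run for this token.
    return sum(w * experts[i](token) for w, i in zip(weights, top))

# Toy setup: 4 "experts", each a fixed random linear map on 8-dim vectors.
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.standard_normal((8, 8)): W @ x for _ in range(4)]
router_weights = rng.standard_normal((4, 8))

out = moe_forward(rng.standard_normal(8), experts, router_weights)
print(out.shape)  # (8,)
```

Real MoE layers differ in the details (batched tokens, load-balancing losses, capacity limits), but the core idea is the same: score, pick top-k, run only those experts, and gate-weight their outputs.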
MoE Models (2025)
- Llama 4: Scout, Maverick, and Behemoth all use MoE
- Mixtral: Mistral's 8x7B and 8x22B open-weight MoE models
- Gemini 3 Pro: Built on a sparse mixture-of-experts architecture
Monitoring MoE Models
MoE models have characteristics worth monitoring:
- Expert routing can vary between similar inputs, affecting output consistency
- Different experts may have different failure modes, so errors can cluster by input type
- Track quality separately across query types, since a single aggregate metric can hide expert-specific regressions
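One way to act on the last point is to slice quality scores by query type instead of averaging them globally. A minimal sketch with hypothetical evaluation records and an arbitrary 0.7 threshold (both invented for the example):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical eval records: (query_type, quality_score in [0, 1]).
records = [
    ("code",      0.92), ("code",      0.88),
    ("math",      0.61), ("math",      0.58),
    ("summarize", 0.85), ("summarize", 0.90),
]

# Group scores per query type.
scores = defaultdict(list)
for query_type, score in records:
    scores[query_type].append(score)

# Flag slices whose average quality falls below the threshold; a
# per-slice view surfaces regressions that the global mean would hide.
THRESHOLD = 0.7
for query_type, vals in sorted(scores.items()):
    avg = mean(vals)
    flag = "  <-- below threshold" if avg < THRESHOLD else ""
    print(f"{query_type}: {avg:.2f}{flag}")
```

Here the global average looks healthy, but the per-type breakdown flags the math slice, which is exactly the kind of localized degradation expert-specific failure modes can produce.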