What is Mixture of Experts?

Sparse architectures for efficient large-scale LLMs.

Mixture of Experts (MoE) is an architecture in which a model contains multiple specialized sub-networks (experts) and a router that selects which experts to apply to each input. This lets a model hold many more total parameters while activating only a small subset per token, so inference cost grows far more slowly than parameter count.

How MoE Works

  • Experts: Multiple feed-forward networks that learn to specialize on different input patterns
  • Router: Learned gating network that scores experts and selects them per token
  • Sparse activation: Typically only the top 1-2 experts (top-k routing) run per token
  • Result: Large model capacity with efficient, near-constant per-token compute
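The routing steps above can be sketched as a minimal top-k MoE layer. This is an illustrative toy, not any production implementation: the sizes, the ReLU experts, and the router weights are all hypothetical, and real systems add load-balancing losses and batched expert dispatch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen small for illustration
d_model, d_ff, n_experts, top_k = 16, 32, 8, 2

# Each expert is a small feed-forward network: W1 (d_model -> d_ff), W2 (d_ff -> d_model)
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.1,
     rng.standard_normal((d_ff, d_model)) * 0.1)
    for _ in range(n_experts)
]

# Router: a learned linear layer producing one logit per expert
W_router = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_forward(x):
    """Route a single token vector x through its top-k experts."""
    logits = x @ W_router
    top = np.argsort(logits)[-top_k:]           # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                    # softmax over the selected experts only
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        W1, W2 = experts[i]
        out += w * (np.maximum(x @ W1, 0) @ W2)  # ReLU feed-forward expert
    return out, top

x = rng.standard_normal(d_model)
y, chosen = moe_forward(x)
```

Only `top_k` of the `n_experts` feed-forward networks run for each token, which is the source of the capacity/compute trade-off described above.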

MoE Models (2025)

  • Llama 4: Scout, Maverick, Behemoth all use MoE
  • Mixtral: 8x7B and 8x22B from Mistral
  • Gemini 3 Pro: Sparse mixture-of-experts

Monitoring MoE Models

MoE models have unique characteristics to monitor:

  • Routing is input-dependent, so small prompt changes can shift which experts fire and affect output consistency
  • Different experts may have different failure modes, since each sees only a slice of the traffic
  • Track quality across different query types to catch expert-specific regressions
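One concrete signal for the routing concerns above is expert utilization. A sketch, assuming you can log which experts the router selected per token (the `routing_log` here is made-up data): skewed utilization or low routing entropy suggests a few experts dominate, a known MoE failure mode sometimes called routing collapse.

```python
from collections import Counter
import math

n_experts = 8

# Hypothetical log: which two experts the router selected for each token
routing_log = [(0, 3), (3, 5), (0, 3), (1, 3), (0, 7), (3, 5)]

# Fraction of routing decisions that went to each expert
counts = Counter(e for pair in routing_log for e in pair)
total = sum(counts.values())
utilization = {e: counts.get(e, 0) / total for e in range(n_experts)}

# Routing entropy: lower values mean fewer experts carry the load
entropy = -sum(p * math.log(p) for p in utilization.values() if p > 0)
max_entropy = math.log(n_experts)  # entropy if all experts were used equally
```

Comparing `entropy` against `max_entropy` over time gives a simple drift indicator for expert routing.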
