What is Constitutional AI?
Training AI with explicit principles for safety and helpfulness.
Constitutional AI (CAI) is Anthropic's approach to training AI systems using a set of explicit principles (a "constitution") that guide the model toward helpful, harmless, and honest behavior.
How Constitutional AI Works
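1. Generate responses: The model produces initial outputs for a set of prompts.
2. Self-critique: The model evaluates each output against principles from the constitution.
3. Revision: The model rewrites the output to address the critique.
4. Supervised fine-tuning: The model is fine-tuned on the revised outputs.
5. RL training: The model compares pairs of responses against the principles. These AI-generated preferences train a preference model, which supplies the reward signal for reinforcement learning (RL from AI feedback, or RLAIF).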
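The generate, critique, and revise steps above can be sketched as a small loop. This is a minimal illustration, not Anthropic's implementation: `Model`, `critique_and_revise`, `build_sft_dataset`, `toy_model`, and the prompt wording are all hypothetical names chosen for this example, and the toy model returns canned strings so the code runs without any API.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model call: prompt text in, completion out.
Model = Callable[[str], str]


def critique_and_revise(
    model: Model, prompt: str, principles: List[str]
) -> Tuple[str, str]:
    """Return (initial_response, final_revision) for one prompt."""
    initial = model(prompt)  # generate an initial response
    response = initial
    for principle in principles:
        # Self-critique against one principle (prompt wording is an assumption).
        critique = model(
            f"Critique the response against this principle: {principle}\n"
            f"Prompt: {prompt}\nResponse: {response}"
        )
        # Revision that addresses the critique.
        response = model(
            "Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return initial, response


def build_sft_dataset(
    model: Model, prompts: List[str], principles: List[str]
) -> List[Tuple[str, str]]:
    """Collect (prompt, revised_response) pairs that a later training step could consume."""
    return [(p, critique_and_revise(model, p, principles)[1]) for p in prompts]


def toy_model(prompt: str) -> str:
    """Canned replies so the sketch runs offline; a real model would go here."""
    if prompt.startswith("Critique"):
        return "The response could be safer."
    if prompt.startswith("Rewrite"):
        return "revised answer"
    return "draft answer"
```

Calling `build_sft_dataset(toy_model, ["q1", "q2"], ["Choose the response that is least harmful"])` yields one (prompt, revision) pair per prompt. In a real pipeline the model call, the prompt templates, and the number of critique rounds would all be design choices.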
Example Principles
- Choose the response that is least harmful
- Choose the response that is most helpful
- Choose the response that is most honest
- Avoid responses that are deceptive or manipulative
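Principles phrased as "Choose the response that..." lend themselves to pairwise comparison: a judge model is shown two candidate responses and asked which one better satisfies a principle. The sketch below shows how such a comparison could produce a preference record. The names `Judge`, `label_preference`, and `toy_judge`, the question template, and the choice to sample one principle per comparison are assumptions made for illustration. The toy judge uses a trivial keyword rule in place of a real model.

```python
import random
from typing import Callable, Dict, List

# Hypothetical judge: receives a comparison question, answers "A" or "B".
Judge = Callable[[str], str]

# Example principles taken from the list above.
PRINCIPLES = [
    "Choose the response that is least harmful",
    "Choose the response that is most helpful",
    "Choose the response that is most honest",
    "Avoid responses that are deceptive or manipulative",
]


def label_preference(
    judge: Judge,
    prompt: str,
    response_a: str,
    response_b: str,
    principles: List[str],
    rng: random.Random,
) -> Dict[str, str]:
    """Ask the judge which response better fits one sampled principle."""
    principle = rng.choice(principles)  # assumption: one principle per comparison
    question = (
        f"{principle}\nPrompt: {prompt}\n"
        f"(A) {response_a}\n(B) {response_b}\nAnswer with A or B."
    )
    answer = judge(question).strip().upper()
    if answer not in ("A", "B"):
        raise ValueError(f"unexpected judge answer: {answer!r}")
    if answer == "A":
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    return {
        "prompt": prompt,
        "principle": principle,
        "chosen": chosen,
        "rejected": rejected,
    }


def toy_judge(question: str) -> str:
    """Toy rule standing in for a model: reject option A if it mentions a 'trick'."""
    option_a = question.split("(A) ")[1].split("\n")[0]
    return "B" if "trick" in option_a else "A"
```

Records of this shape (prompt, chosen, rejected) are the kind of data a preference model is typically trained on. How the judge is prompted and how its answers are aggregated are left open here.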
CAI vs RLHF
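Standard RLHF learns its reward signal from human comparisons of model outputs. CAI replaces much of that human labeling, particularly for harmlessness, with AI-generated critiques and preference labels. This lowers labeling cost as training scales, and it states the guiding principles explicitly rather than leaving them implicit in labelers' preferences.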
Which models use Constitutional AI?
Anthropic's Claude models are trained with Constitutional AI. Some other providers have described similar principle-based approaches that they use alongside RLHF.