What is LoRA?
Low-Rank Adaptation for efficient LLM fine-tuning.
LoRA (Low-Rank Adaptation) is a technique for efficiently fine-tuning large language models: instead of updating all model weights, it trains small low-rank adapter matrices added alongside the frozen base weights. This cuts the number of trainable parameters by orders of magnitude and substantially reduces GPU memory requirements, while matching full fine-tuning quality on many tasks.
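To see where the savings come from, consider the trainable parameter count for a single adapted layer. The dimensions and rank below are hypothetical, chosen only to illustrate the arithmetic:

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters for one LoRA adapter pair:
    A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

# Hypothetical 4096x4096 projection matrix, adapter rank 8:
full_ft = 4096 * 4096                         # 16,777,216 weights updated by full fine-tuning
lora = lora_trainable_params(4096, 4096, 8)   # 65,536 weights updated by LoRA
reduction = full_ft // lora                   # 256x fewer trainable parameters
```

Only the adapter parameters need optimizer state and gradients, which is where most of the memory savings come from.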
How LoRA Works
- Freezes the original model weights
- Adds small trainable low-rank matrices (adapters) alongside selected layers
- Adapters can optionally be merged into the base weights for zero-overhead inference
- Multiple LoRA adapters can be swapped in and out without reloading the base model
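The steps above can be sketched as a single adapted linear layer. This is an illustrative NumPy sketch using the paper's notation (frozen weight W, trainable low-rank factors A and B, scaling alpha/rank), not a production implementation:

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA sketch: frozen weight W plus a low-rank update B @ A."""

    def __init__(self, d_out, d_in, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))        # frozen base weight
        self.A = rng.standard_normal((rank, d_in)) * 0.01  # trainable, small random init
        self.B = np.zeros((d_out, rank))                   # trainable, zero init =>
                                                           # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen base path plus scaled low-rank adapter path
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def merged_weight(self):
        # The adapter can be folded into W, so inference adds no overhead
        return self.W + self.scale * (self.B @ self.A)
```

Because B starts at zero, training begins from exactly the base model's behavior, and the merged weight reproduces the adapter path exactly.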
LoRA Variants
- QLoRA: Quantizes the frozen base model (typically to 4-bit) and trains LoRA adapters on top, for even lower memory use
- DoRA: Decomposes weights into magnitude and direction components, often improving quality at the same rank
- AdaLoRA: Adaptively allocates the rank budget across layers during training
Monitoring Fine-Tuned Models
Fine-tuning introduces new risks:
- May increase hallucinations on out-of-domain queries
- Can introduce unexpected behaviors
- Quality may degrade over time as data drifts
To catch these issues early, compare the fine-tuned model's metrics against the base model's on the same evaluation set.
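A minimal sketch of such a comparison, assuming you already have per-example evaluation scores for both models. The function name and tolerance threshold are illustrative, not from any particular library:

```python
def flag_regression(base_scores, tuned_scores, tolerance=0.05):
    """Flag a regression if the fine-tuned model's mean eval score drops
    more than `tolerance` below the base model's mean on the same data.
    Illustrative sketch; real monitoring would track this per slice and over time."""
    base_mean = sum(base_scores) / len(base_scores)
    tuned_mean = sum(tuned_scores) / len(tuned_scores)
    return tuned_mean < base_mean - tolerance
```

Running this per domain slice (in-domain vs out-of-domain queries) helps surface the hallucination risk noted above, and re-running it on fresh data catches drift-driven degradation.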
Do fine-tuned models need monitoring?
Yes. Fine-tuning can introduce new failure modes, increase hallucinations on out-of-domain queries, or create unexpected behaviors. Monitor fine-tuned models to catch regressions and ensure quality.