Glossary

What is LoRA?

Low-Rank Adaptation for efficient LLM fine-tuning.

LoRA (Low-Rank Adaptation) is a technique for efficiently fine-tuning large language models by training small adapter layers instead of all model weights. This reduces memory requirements by 10-100x while achieving similar results to full fine-tuning.
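The parameter savings can be sketched with back-of-the-envelope arithmetic. The matrix size (4096×4096) and rank (8) below are illustrative assumptions, not measurements from any particular model:

```python
# Trainable-parameter count for one weight matrix: full fine-tuning
# updates the whole d x k matrix, while LoRA trains only a rank-r
# pair of adapters A (r x k) and B (d x r).
d, k = 4096, 4096   # assumed dimensions of one attention weight matrix
r = 8               # assumed LoRA rank

full_params = d * k         # updated by full fine-tuning
lora_params = r * (d + k)   # updated by LoRA

print(full_params)               # 16777216
print(lora_params)               # 65536
print(full_params // lora_params)  # 256
```

For this single matrix, LoRA trains 256× fewer parameters; the overall memory saving for a whole model also depends on optimizer state and which layers receive adapters.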

How LoRA Works

  • Freezes original model weights
  • Adds small trainable matrices (adapters)
  • Adapters can be merged into the base weights at inference time, adding no latency
  • Multiple LoRAs can be swapped without reloading base model
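The steps above can be sketched in a few lines of NumPy. This is a toy example with assumed dimensions and scaling, not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 16, 16, 2                 # toy sizes; real layers are thousands wide
alpha = 4                           # assumed LoRA scaling hyperparameter

W = rng.normal(size=(d, k))         # frozen pretrained weight (never updated)
A = rng.normal(size=(r, k)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-initialised

x = rng.normal(size=(k,))

# Training-time forward pass: frozen path plus scaled adapter path.
y_adapted = W @ x + (alpha / r) * (B @ (A @ x))

# At inference the adapter can be folded into W, so no extra latency remains.
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_adapted, y_merged)
```

Because B starts at zero, the adapted model initially behaves exactly like the base model; swapping LoRAs amounts to swapping the small (A, B) pairs while W stays loaded.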

LoRA Variants

  • QLoRA: Combines quantization with LoRA for even lower memory
  • DoRA: Weight-decomposed LoRA for better performance
  • AdaLoRA: Adaptive rank allocation
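For QLoRA in particular, the memory saving can be estimated with rough arithmetic. All figures below (model size, adapter size, bits per parameter) are illustrative assumptions, not benchmarks:

```python
# Rough GPU memory for the weights of a 7B-parameter model.
params = 7e9
gib = 1024 ** 3

fp16_base = params * 2 / gib       # base weights at 16 bits per parameter
int4_base = params * 0.5 / gib     # base weights quantized to 4 bits (QLoRA)
lora_adapters = 40e6 * 2 / gib     # assumed ~40M adapter params kept in fp16

print(round(fp16_base, 1))                   # 13.0
print(round(int4_base + lora_adapters, 1))   # 3.3
```

Weights drop from roughly 13 GiB to about 3.3 GiB; real usage is higher once activations, gradients, and optimizer state for the adapters are included.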

Monitoring Fine-Tuned Models

Fine-tuning introduces new risks:

  • May increase hallucinations on out-of-domain queries
  • Can introduce unexpected behaviors absent from the base model
  • Quality may degrade over time as production data drifts

To catch these, compare fine-tuned and base model metrics on the same evaluation set.
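A minimal comparison between base and fine-tuned metrics might look like the sketch below. The metric names, scores, and tolerance are hypothetical, and it assumes you already log per-model evaluation results:

```python
# Flag metrics where the fine-tuned model regresses past a tolerance.
# Assumes higher is better for every metric.
def find_regressions(base_metrics, tuned_metrics, tolerance=0.02):
    return {
        name: (base_metrics[name], tuned_metrics[name])
        for name in base_metrics
        if tuned_metrics.get(name, 0.0) < base_metrics[name] - tolerance
    }

base = {"in_domain_accuracy": 0.71, "ood_accuracy": 0.68, "faithfulness": 0.90}
tuned = {"in_domain_accuracy": 0.85, "ood_accuracy": 0.61, "faithfulness": 0.89}

print(find_regressions(base, tuned))  # {'ood_accuracy': (0.68, 0.61)}
```

Here the fine-tuned model improves in-domain but regresses out-of-domain, exactly the failure mode the list above warns about.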

Do fine-tuned models need monitoring?

Yes. Fine-tuning can introduce new failure modes, increase hallucinations on out-of-domain queries, or create unexpected behaviors. Monitor fine-tuned models to catch regressions and ensure quality.
