Glossary

What is LoRA?

Low-Rank Adaptation for efficient LLM fine-tuning.

LoRA (Low-Rank Adaptation) is a technique for efficiently fine-tuning large language models by training small adapter layers instead of all model weights. This reduces memory requirements by 10-100x while achieving similar results to full fine-tuning.
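The parameter savings can be sketched with back-of-the-envelope arithmetic. The matrix size (4096×4096) and rank (8) below are illustrative assumptions, not measurements from any particular model:

```python
# Trainable-parameter count for one weight matrix: full fine-tuning
# updates the whole d x k matrix, while LoRA trains only a rank-r
# pair of adapters A (r x k) and B (d x r).
d, k = 4096, 4096   # assumed dimensions of one attention weight matrix
r = 8               # assumed LoRA rank

full_params = d * k         # updated by full fine-tuning
lora_params = r * (d + k)   # updated by LoRA

print(full_params)               # 16777216
print(lora_params)               # 65536
print(full_params // lora_params)  # 256
```

For this single matrix, LoRA trains 256× fewer parameters; the overall memory saving for a whole model also depends on optimizer state and which layers receive adapters.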

How LoRA Works

  • Freezes original model weights
  • Adds small trainable matrices (adapters)
  • Adapters can be merged into the base weights at inference time, adding no latency
  • Multiple LoRAs can be swapped without reloading base model
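The steps above can be sketched in a few lines of NumPy. This is a toy example with assumed dimensions and scaling, not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 16, 16, 2                 # toy sizes; real layers are thousands wide
alpha = 4                           # assumed LoRA scaling hyperparameter

W = rng.normal(size=(d, k))         # frozen pretrained weight (never updated)
A = rng.normal(size=(r, k)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-initialised

x = rng.normal(size=(k,))

# Training-time forward pass: frozen path plus scaled adapter path.
y_adapted = W @ x + (alpha / r) * (B @ (A @ x))

# At inference the adapter can be folded into W, so no extra latency remains.
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_adapted, y_merged)
```

Because B starts at zero, the adapted model initially behaves exactly like the base model; swapping LoRAs amounts to swapping the small (A, B) pairs while W stays loaded.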

LoRA Variants

  • QLoRA: Combines quantization with LoRA for even lower memory
  • DoRA: Weight-decomposed LoRA for better performance
  • AdaLoRA: Adaptive rank allocation
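For QLoRA in particular, the memory saving can be estimated with rough arithmetic. All figures below (model size, adapter size, bits per parameter) are illustrative assumptions, not benchmarks:

```python
# Rough GPU memory for the weights of a 7B-parameter model.
params = 7e9
gib = 1024 ** 3

fp16_base = params * 2 / gib       # base weights at 16 bits per parameter
int4_base = params * 0.5 / gib     # base weights quantized to 4 bits (QLoRA)
lora_adapters = 40e6 * 2 / gib     # assumed ~40M adapter params kept in fp16

print(round(fp16_base, 1))                   # 13.0
print(round(int4_base + lora_adapters, 1))   # 3.3
```

Weights drop from roughly 13 GiB to about 3.3 GiB; real usage is higher once activations, gradients, and optimizer state for the adapters are included.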

Monitoring Fine-Tuned Models

Fine-tuning introduces new risks:

  • May increase hallucinations on out-of-domain queries
  • Can introduce unexpected behaviors absent from the base model
  • Quality may degrade over time as production data drifts

To catch these, compare fine-tuned and base model metrics on the same evaluation set.
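A minimal comparison between base and fine-tuned metrics might look like the sketch below. The metric names, scores, and tolerance are hypothetical, and it assumes you already log per-model evaluation results:

```python
# Flag metrics where the fine-tuned model regresses past a tolerance.
# Assumes higher is better for every metric.
def find_regressions(base_metrics, tuned_metrics, tolerance=0.02):
    return {
        name: (base_metrics[name], tuned_metrics[name])
        for name in base_metrics
        if tuned_metrics.get(name, 0.0) < base_metrics[name] - tolerance
    }

base = {"in_domain_accuracy": 0.71, "ood_accuracy": 0.68, "faithfulness": 0.90}
tuned = {"in_domain_accuracy": 0.85, "ood_accuracy": 0.61, "faithfulness": 0.89}

print(find_regressions(base, tuned))  # {'ood_accuracy': (0.68, 0.61)}
```

Here the fine-tuned model improves in-domain but regresses out-of-domain, exactly the failure mode the list above warns about.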

Do fine-tuned models need monitoring?

Yes. Fine-tuning can introduce new failure modes, increase hallucinations on out-of-domain queries, or create unexpected behaviors. Monitor fine-tuned models to catch regressions and ensure quality.
