When should I fine-tune an LLM vs. use RAG?
Fine-tuning and RAG solve different problems and are often used together. Use RAG when the model needs access to specific, current, or proprietary facts, when the information changes frequently, or when you need source citations. Use fine-tuning when the model needs to behave differently: adopt a specific response format, use domain vocabulary consistently, follow particular reasoning patterns, or match brand tone in ways prompt engineering cannot reliably achieve. Fine-tuning teaches style and behaviour; RAG provides knowledge. Most organisations asking for fine-tuning actually need RAG, which is why ClickMasters identifies the correct solution before any training begins.
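To make the split concrete, here is a minimal sketch of the RAG side: knowledge lives in a document store and is retrieved into the prompt at query time, so facts can be updated or cited without retraining. The embedding model, document contents, and prompt wording are illustrative assumptions, not a prescribed setup.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Proprietary or frequently changing facts live outside the model.
documents = [
    "Refund requests are processed within 14 working days.",
    "The enterprise plan includes SSO and audit logging.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def build_prompt(query: str, k: int = 2) -> str:
    """Retrieve the top-k documents and pack them into the prompt."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q              # cosine similarity (vectors are normalised)
    top = np.argsort(scores)[::-1][:k]
    context = "\n".join(documents[i] for i in top)
    # The LLM answers from the supplied context, so updates and citations
    # never require retraining the model.
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long do refunds take?"))
```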
What is LoRA and why is it used for LLM fine-tuning?
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that trains small adapter matrices alongside frozen base model weights. A 7B-parameter Llama model has 7 billion weights, and updating all of them requires massive GPU memory. LoRA instead trains adapters representing roughly 0.1-1% of the original parameters, reducing VRAM requirements by 75-90% and training cost proportionally. QLoRA combines 4-bit quantisation with LoRA, enabling fine-tuning of a 70B model on a single consumer GPU where full fine-tuning would otherwise require 8 high-end server GPUs. ClickMasters uses LoRA/QLoRA as the default for open-source model fine-tuning.
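A minimal sketch of this setup using the Hugging Face transformers and peft libraries; the base model name, rank, and target modules are illustrative assumptions rather than a fixed recipe:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA-style loading: the frozen base model is quantised to 4-bit to cut VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",           # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters: small low-rank matrices trained on top of the frozen weights.
lora_config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()         # typically well under 1% of total weights
```

Only the adapter matrices receive gradients, which is where the memory and cost savings come from.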
How much training data do I need for fine-tuning?
For format and style fine-tuning: 50-200 high-quality examples. For domain adaptation: 500-2,000 examples covering the range of production inputs. For complex reasoning tasks: 1,000-5,000+ examples. Quality matters far more than quantity: 100 carefully curated examples consistently outperform 10,000 examples with label noise or format inconsistency. ClickMasters includes a dataset quality review as part of every fine-tuning engagement.
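As a sketch of what a basic quality review can automate, the snippet below scans a JSONL instruction-tuning file for the format problems that quietly degrade fine-tuning. The field names and file path are assumptions for illustration, not a fixed schema.

```python
import json

REQUIRED_KEYS = {"instruction", "response"}    # assumed schema for this sketch

def check_dataset(path: str) -> None:
    """Flag missing fields, empty responses, and duplicate examples."""
    seen = set()
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            record = json.loads(line)
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                print(f"line {line_no}: missing fields {missing}")
                continue
            if not record["response"].strip():
                print(f"line {line_no}: empty response")
            key = (record["instruction"].strip(), record["response"].strip())
            if key in seen:
                print(f"line {line_no}: duplicate example")
            seen.add(key)

check_dataset("train.jsonl")                   # illustrative file name
```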
What is MLOps and when do I need it?
MLOps is the engineering discipline of deploying, monitoring, and maintaining ML models in production. You need MLOps when you train and retrain models regularly on new data, need to track experiment results (Weights & Biases, MLflow), need model versioning traceable to training data, need to detect model degradation (data drift), or serve multiple model versions simultaneously (A/B testing). For teams training a model once and retraining infrequently, a full MLOps pipeline is unnecessary overhead.
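To illustrate the experiment-tracking piece, here is a minimal MLflow sketch; the experiment name, parameters, metric values, and artifact path are hypothetical placeholders around where a real training loop would run.

```python
import mlflow

mlflow.set_experiment("support-classifier")    # illustrative experiment name

with mlflow.start_run():
    # Log hyperparameters and the dataset so every result is traceable
    # to the exact configuration and training data that produced it.
    mlflow.log_param("learning_rate", 2e-5)
    mlflow.log_param("train_dataset", "tickets-2024-06.jsonl")

    # ... training loop would run here ...

    # Log metrics per epoch to compare runs and spot degradation over time.
    mlflow.log_metric("val_accuracy", 0.91, step=1)
    mlflow.log_metric("val_accuracy", 0.93, step=2)

    # Store the trained weights as an artifact so serving can pin a version.
    mlflow.log_artifact("model/adapter_weights.bin")
```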