
Fine-tuning

Fine-tuning means taking a general-purpose AI model and training it further on examples from your specific domain (your vocabulary, your data, your preferred behavior) so that it performs better on tasks where a general model is too generic. The base model knows language broadly; the fine-tuned model knows how your business talks, what your output should look like, and which answer you want when a question has many plausible answers.

How it works

How fine-tuning applies in practice

Fine-tuning starts with a base model, a training set, and an evaluation set. The training set consists of the input-output pairs you want the model to learn. The evaluation set consists of held-out examples that the model never trains on and that you grade its outputs against. Training adjusts the model's weights on your examples, pulling its behavior toward your domain while aiming to keep its broader language ability largely intact.
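
As a rough illustration of the training and evaluation sets described above, here is a minimal Python sketch. It assumes examples are stored as input-output pairs and serialized as JSON Lines. The field names `input` and `output`, the 80/20 split, and the helpers `split_examples` and `to_jsonl` are illustrative assumptions, not a format any particular provider requires.

```python
import json
import random


def split_examples(examples, eval_fraction=0.2, seed=0):
    """Shuffle a copy of the examples, then hold out a fraction for evaluation."""
    if not 0 < eval_fraction < 1:
        raise ValueError("eval_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    # Returns (training set, evaluation set); the two never share an example.
    return shuffled[n_eval:], shuffled[:n_eval]


def to_jsonl(examples):
    """Serialize examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(ex, sort_keys=True) for ex in examples)


# Hypothetical categorization examples: an expense description and its category.
examples = [
    {"input": f"Invoice line {i}", "output": "travel" if i % 2 else "office-supplies"}
    for i in range(10)
]

train_set, eval_set = split_examples(examples)
train_jsonl = to_jsonl(train_set)
```

Fixing the seed means the same examples stay held out from one training run to the next, so scores remain comparable as the training set evolves.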

  • Pick the base. Choose a model that is already close enough — usually the smallest one that performs acceptably with prompting and RAG.
  • Build the training set. Curate examples that show the model the input it will see and the output you want. Quality of examples matters far more than quantity.
  • Build the evaluation set. Set aside held-out examples that the model will not train on, and use them to measure whether fine-tuning actually improved things.
  • Train. Run the fine-tuning job, which needs only a modest amount of compute relative to training a base model.
  • Evaluate. Compare the fine-tuned model against the base on the evaluation set. Fine-tuning that does not measurably help is usually a sign that the training set is wrong, not the model.
  • Iterate. Real fine-tuning is rarely one-and-done; the training set evolves as edge cases appear in production.
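
The Evaluate step above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each model is represented by a plain function from input text to output text, grading is exact match, and the 0.05 minimum gain is an arbitrary illustrative threshold. The names `exact_match_accuracy`, `compare_models`, `base_model`, and `tuned_model` are hypothetical stand-ins, not part of any library.

```python
def exact_match_accuracy(predict, eval_set):
    """Fraction of held-out examples where the prediction equals the expected output."""
    if not eval_set:
        raise ValueError("evaluation set is empty")
    hits = sum(
        1 for ex in eval_set
        if predict(ex["input"]).strip() == ex["output"].strip()
    )
    return hits / len(eval_set)


def compare_models(base_predict, tuned_predict, eval_set, min_gain=0.05):
    """Score both models on the same held-out set and report the difference."""
    base = exact_match_accuracy(base_predict, eval_set)
    tuned = exact_match_accuracy(tuned_predict, eval_set)
    gain = tuned - base
    return {"base": base, "tuned": tuned, "gain": gain, "keep_tuned": gain >= min_gain}


# Hypothetical held-out examples for an expense categorizer.
eval_set = [
    {"input": "Flight to Denver", "output": "travel"},
    {"input": "Printer paper", "output": "office-supplies"},
    {"input": "Hotel, two nights", "output": "travel"},
    {"input": "Stapler", "output": "office-supplies"},
]


def base_model(text):
    # Stand-in for a base model that always answers "travel".
    return "travel"


def tuned_model(text):
    # Stand-in for a fine-tuned model, imitated here with a keyword rule.
    lowered = text.lower()
    return "travel" if ("flight" in lowered or "hotel" in lowered) else "office-supplies"


report = compare_models(base_model, tuned_model, eval_set)
```

Exact match suits categorization and extraction, where there is one right answer. Free-text outputs such as draft emails would need a different grading function in place of the equality check.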

Why it matters

Why fine-tuning matters

Most applied-AI work in small and mid-sized businesses can be done well without fine-tuning — careful prompting, RAG, and a thin validation layer cover a remarkable amount of ground. Where fine-tuning earns its place is in workflows that need consistency: a categorization engine that has to produce the same answer for the same input every time, a draft-email system that has to sound like the firm's voice, an extraction model that has to use the company's specific schema. Those use cases are the ones where the model's behavior, not just its knowledge, has to be locked in.

The flip side is that fine-tuning is harder to update than prompting or RAG. Once a model has been trained on your data, changing its behavior requires another training cycle. That trade-off is why a good applied-AI practice fine-tunes deliberately, when the cost of inconsistency in production is higher than the cost of locking the behavior in.

Related terms

Closely related concepts

Large language model (LLM)

The kind of model fine-tuning is most often applied to.

Retrieval-augmented generation (RAG)

The alternative for knowledge-based use cases.

Embedding

Sometimes also fine-tuned for domain-specific similarity.

Applied AI

The broader discipline fine-tuning lives inside.

Agentic workflow

Where fine-tuned models often show up as the agent's brain.

Document intelligence

A common fine-tuning target when schemas are domain-specific.

FAQ

Common questions about fine-tuning

When should you fine-tune?

When prompting and RAG can't get the model to behave consistently — usually when domain vocabulary, output format, or judgment criteria need to be locked in. For most knowledge use cases, RAG is enough.

What does it cost?

Less than people think, but more than nothing. Fine-tuning requires labeled examples, compute time, and evaluation infrastructure. The biggest cost is usually building the training set, not the training run.

Fine-tuning vs RAG?

RAG gives the model facts at query time; fine-tuning changes the model itself. RAG is easier to update, usually cheaper, and the right answer for knowledge questions. Fine-tuning is the right answer when you need consistent behavior or output format.

Does AMG fine-tune for clients?

When it's the right tool. For most engagements, prompting plus RAG plus validation gets us where we need to be. We fine-tune when consistent domain behavior matters more than flexibility.
