Retrieval-augmented generation (RAG)
Retrieval-augmented generation, almost always shortened to RAG, is the pattern that lets a language model look up your documents before it answers. A user asks a question. The system searches your data for the most relevant material. Those passages are handed to the model as context. The model produces an answer grounded in what it found. RAG is the reason a general-purpose LLM can answer specific questions about your contracts, your books, your SOPs, or your portfolio — without being trained on any of them.
How RAG applies in practice
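A working RAG system has four core stages (ingestion, embedding, retrieval, and generation) plus two practices, citations and evaluation, that keep it trustworthy. Each one is independently tunable, and getting them all right is what separates a useful system from a chatbot that hallucinates with confidence.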
- Document ingestion. Files are loaded, OCRed if needed, split into chunks, and stored in a way that preserves source metadata.
- Embedding. Each chunk is converted into a numeric vector that represents its meaning, so it can be matched against future queries by semantic similarity.
- Retrieval. When a question comes in, the retriever finds the chunks most relevant to it — often combining vector search with keyword search.
- Generation. The retrieved chunks are placed in the LLM's context window, and the model produces an answer that should be grounded in those chunks.
- Citations. Good RAG systems show the source documents and quoted passages behind every answer, so the user can verify.
- Evaluation. Answers to real questions are scored against expected answers. Because overall quality depends on both retrieval and generation, it helps to measure each stage separately as well as end to end.
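The ingestion, embedding, retrieval, and generation steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production design: the "embedding" is a bag-of-words count vector standing in for a real embedding model, the two documents are invented, and every function name (chunk, embed, retrieve, build_prompt) is hypothetical. The final prompt is what would be sent to an LLM; the model call itself is left out.

```python
import math
import re
from collections import Counter


def chunk(text, source, max_words=40):
    """Ingestion: split a document into chunks and keep the source as metadata."""
    words = text.split()
    return [
        {"source": source, "text": " ".join(words[i:i + max_words])}
        for i in range(0, len(words), max_words)
    ]


def embed(text):
    """Toy embedding (assumption): word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents):
    """Chunk every document and store each chunk with its vector."""
    index = []
    for source, text in documents.items():
        for c in chunk(text, source):
            index.append({**c, "vector": embed(c["text"])})
    return index


def retrieve(index, question, k=2):
    """Retrieval: return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(index, key=lambda c: cosine(q, c["vector"]), reverse=True)[:k]


def build_prompt(question, chunks):
    """Generation input: place the retrieved chunks, tagged by source, in the prompt."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Invented sample documents, for illustration only.
documents = {
    "lease.txt": "The lease renews automatically for one year unless either party gives ninety days written notice.",
    "expenses.txt": "Travel expenses over five hundred dollars require approval from the controller.",
}
index = build_index(documents)
question = "When does the lease renew?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Keeping the source name on every chunk is what makes citations possible later: the answer can point back to the file each passage came from.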
Why RAG matters
General-purpose LLMs do not know your business. They know what they were trained on — the public web up to some cutoff date, with all the gaps and biases that come with it. They do not know what your operating agreement says, how your accounting policy treats a particular kind of expense, what the renewal clause is in the lease for the property in Plano, or how your firm decides which trades to flag for review. Asked any of those questions, a raw LLM will guess — and the guess will sound plausible.
RAG closes that gap. Because the model receives the right passages of your data at query time, its answer is anchored in something verifiable. The user can read the source. The model can be told to refuse if the retrieved material does not actually answer the question. The system also stays current without retraining: update a document, re-index it, and the answers change too. For any business AI workflow that has to reason about specific company information, RAG is usually the right place to start.
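One way to make answers verifiable and to support refusal is to handle both in the prompt-building step. The sketch below is an assumption-laden illustration: the passage format (source, text, score), the 0.2 similarity threshold, the refusal wording, and the function name build_grounded_prompt are all hypothetical, and a real threshold would need tuning against real questions.

```python
REFUSAL = "I could not find this in the provided documents."


def build_grounded_prompt(question, passages, min_score=0.2):
    """Build a prompt that asks for cited answers, or return None to signal a refusal.

    passages: list of dicts with 'source', 'text', and 'score' (retriever similarity).
    min_score: hypothetical cutoff below which a passage is treated as irrelevant.
    """
    usable = [p for p in passages if p["score"] >= min_score]
    if not usable:
        # Nothing relevant was retrieved: the caller can return REFUSAL without calling the model.
        return None
    numbered = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}" for i, p in enumerate(usable, start=1)
    )
    return (
        "Answer the question using only the numbered passages below. "
        "Cite passage numbers like [1] after each claim. "
        f'If the passages do not answer the question, reply exactly: "{REFUSAL}"\n\n'
        f"Passages:\n{numbered}\n\nQuestion: {question}"
    )


# Invented example passages, for illustration only.
passages = [
    {"source": "lease.txt", "text": "The lease renews automatically for one year.", "score": 0.61},
    {"source": "expenses.txt", "text": "Travel expenses require approval.", "score": 0.05},
]
prompt = build_grounded_prompt("When does the lease renew?", passages)
print(prompt if prompt is not None else REFUSAL)
```

The refusal is enforced twice here: once in code, when no passage clears the threshold, and once in the instructions, for cases where passages were retrieved but do not contain the answer.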
Closely related concepts
Large language model (LLM)
The generation half of RAG.
Embedding
The numeric representation that makes retrieval work.
Fine-tuning
The alternative approach to specializing a model — usually complementary, not competing.
Document intelligence
A major application area for RAG.
Agentic workflow
Where RAG often shows up as one tool the agent uses.
Entity-aware document vault
The structured document store RAG retrieves from in multi-entity work.
Common questions about RAG
Why does RAG matter?
Because a general LLM does not know your contracts, your books, your SOPs, or your data. RAG is the bridge — it lets the model answer questions about your specific business with current information instead of making things up.
What does a RAG system include?
A document store, an embedding step, a retriever that finds the most relevant chunks for any query, and the LLM that uses those chunks to compose an answer.
Is RAG the same as fine-tuning?
No. Fine-tuning changes the model itself; RAG leaves the model alone and feeds it context at query time. RAG is typically faster, cheaper, easier to update, and the right answer for most business knowledge use cases.
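The "easier to update" point can be made concrete. With RAG, keeping answers current means refreshing the index entries for documents that changed, while the model stays untouched. The following is a minimal sketch under assumptions: the index is a plain dictionary keyed by source name, change detection uses a content hash, and sync_index is a hypothetical helper. A real system would re-chunk and re-embed each changed document at the marked step.

```python
import hashlib


def fingerprint(text):
    """Content hash used to detect whether a document changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_index(index, documents):
    """Bring the index in line with the current documents.

    index: dict mapping source name -> {'hash': str, 'text': str}
    documents: dict mapping source name -> current text
    Returns the list of sources that were added or re-indexed.
    """
    changed = []
    for source, text in documents.items():
        digest = fingerprint(text)
        if index.get(source, {}).get("hash") != digest:
            # A real system would re-chunk and re-embed the document here.
            index[source] = {"hash": digest, "text": text}
            changed.append(source)
    # Drop entries for documents that no longer exist.
    for source in list(index):
        if source not in documents:
            del index[source]
    return changed


index = {}
docs = {"lease.txt": "Renews yearly.", "policy.txt": "Approval needed over $500."}
first = sync_index(index, docs)    # both documents are new
second = sync_index(index, docs)   # nothing changed
docs["lease.txt"] = "Renews every two years."
third = sync_index(index, docs)    # only the edited lease is refreshed
print(first, second, third)
```

Nothing about the model changes in any of these steps, which is the practical difference from fine-tuning described above.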
Does AMG build RAG into its systems?
Yes. The vault holds the documents; the retriever finds the right ones; the LLM composes the answer with citations back to source.
Want AI that knows your documents?
See how AMG builds RAG-based document Q&A into operational systems.