Fine-tune RAG transformations

This guide shows you how to fine-tune RAG transformations by adjusting chunking parameters for your documents.

Understanding chunking and its impact

When you ingest documents, Vertex AI RAG Engine splits them into smaller pieces called chunks before it creates embeddings. The size and overlap of these chunks can significantly impact the quality of your RAG system's responses.

The default chunking settings work well for a wide range of use cases. However, you might want to adjust them based on the nature of your documents and your specific application. The following guidance can help you choose a chunk size.

Smaller chunks
  Pros:
  • Create more precise embeddings that capture specific details.
  • Improve retrieval of factual, short-form answers.
  Cons:
  • Can lose the broader context of the original document.
  • Can lead to fragmented information retrieval.
  Best for: Question answering on documents where answers are concise and factual, such as FAQs.

Larger chunks
  Pros:
  • Preserve more surrounding context in each chunk.
  • Capture more general, high-level concepts in embeddings.
  Cons:
  • Can create embeddings that are too general and miss specific details.
  • Might not fit into the embedding model's context window, which can cause information loss.
  Best for: Summarization tasks or querying for broader themes and concepts within a document.
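
To make these trade-offs concrete, the following sketch shows how a fixed-size window with overlap walks through a document. This is an illustration only, not RAG Engine's internal splitter: the chunk_tokens helper is hypothetical, and whitespace-separated words stand in for real model tokens.

```python
# Simplified sketch of fixed-size chunking with overlap. Not the splitter
# that Vertex AI RAG Engine uses internally; words approximate tokens here.

def chunk_tokens(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start : start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

tokens = "one two three four five six seven eight nine ten".split()

# chunk_size=4 with chunk_overlap=1: each window repeats the last token
# of the previous window, so context carries across chunk boundaries.
for chunk in chunk_tokens(tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
# ['one', 'two', 'three', 'four']
# ['four', 'five', 'six', 'seven']
# ['seven', 'eight', 'nine', 'ten']
```

Notice that a larger chunk_size produces fewer, broader windows, while a larger chunk_overlap repeats more tokens between neighbors, which is what drives the precision, context, and cost trade-offs described above.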

Available chunking parameters

You can control chunking behavior by using the following parameters during data ingestion.

• chunk_size: The size of each chunk in tokens. The default is 1,024. To help you decide on a size, see the preceding guidance.
• chunk_overlap: The number of tokens that overlap between adjacent chunks. Overlap helps maintain context between chunks. A larger overlap can improve retrieval quality but also increases processing and storage costs. The default is 256.
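
The following sketch shows how these parameters are passed during ingestion with the Vertex AI SDK's rag module. The project ID, location, corpus display name, and Cloud Storage path are placeholders, and the TransformationConfig/ChunkingConfig shape reflects the SDK at the time of writing; verify the exact class and argument names against the current SDK reference.

```python
# Minimal sketch of setting chunking parameters during data ingestion.
# Placeholder values: project ID, location, display name, and GCS path.
import vertexai
from vertexai import rag

vertexai.init(project="your-project-id", location="us-central1")

rag_corpus = rag.create_corpus(display_name="my-rag-corpus")

rag.import_files(
    rag_corpus.name,
    ["gs://your-bucket/your-docs/"],
    # Smaller chunks with a modest overlap, per the preceding guidance.
    transformation_config=rag.TransformationConfig(
        chunking_config=rag.ChunkingConfig(
            chunk_size=512,
            chunk_overlap=100,
        ),
    ),
)
```

If you omit the transformation configuration, ingestion uses the defaults listed above: a chunk_size of 1,024 tokens and a chunk_overlap of 256 tokens.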
