After a document is ingested, RAG Engine runs a set of transformations to prepare the data for indexing. You can control how your documents are split into chunks using the following parameters:
| Parameter | Description |
|---|---|
| `chunk_size` | When documents are ingested into an index, they are split into chunks. The `chunk_size` parameter specifies the size of each chunk in tokens. The default chunk size is 1,024 tokens. |
| `chunk_overlap` | By default, adjacent chunks share a certain number of tokens, which improves relevance and retrieval quality. The `chunk_overlap` parameter specifies that number of shared tokens. The default chunk overlap is 200 tokens. |
A smaller chunk size produces more precise embeddings. A larger chunk size produces embeddings that are more general but that can miss specific details. For example, if you convert 1,000 words, as opposed to 200 words, into an embedding array of the same dimension, you can lose details. Chunk size also matters for the model's context length limit: a large chunk of text might not fit into a model with a small context window.
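To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. This is an illustration of the general technique only, not RAG Engine's actual implementation; the function name and the use of a plain token list are assumptions for the example.

```python
def chunk_tokens(tokens, chunk_size=1024, chunk_overlap=200):
    """Split a token list into overlapping chunks (illustrative only).

    Each chunk holds up to chunk_size tokens, and consecutive chunks
    share chunk_overlap tokens, mirroring the defaults described above.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the final chunk reached the end of the document
    return chunks

# A 2,500-token document yields three chunks: two full 1,024-token
# chunks and a final partial chunk, with 200 tokens shared between
# consecutive chunks.
tokens = list(range(2500))
chunks = chunk_tokens(tokens)
```

With the defaults, the window advances by 824 tokens (1,024 minus 200) on each step, so the last 200 tokens of one chunk reappear at the start of the next; this keeps sentences that straddle a chunk boundary retrievable from at least one chunk.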
What's next
- To learn about the file size limits, see Supported document types.
- To learn about quotas related to RAG Engine, see RAG Engine quotas.
- To learn about customizing parameters, see Retrieval parameters.
- To learn more about the RAG API, see RAG Engine API.
- To learn more about grounding, see Grounding overview.
- To learn more about the difference between grounding and RAG, see Ground responses using RAG.
- To learn more about Generative AI on Vertex AI, see Overview of Generative AI on Vertex AI.
- To learn more about the RAG architecture, see Infrastructure for a RAG-capable generative AI application using Vertex AI.