RAG quotas

The following quotas apply to each service that performs retrieval-augmented generation (RAG) using RAG Engine. Quotas are measured in requests per minute (RPM).
| Service | Quota | Metric |
|---|---|---|
| RAG Engine data management APIs | 60 RPM | VertexRagDataService requests per minute per region |
| RetrievalContexts API | 1,500 RPM | VertexRagService retrieve requests per minute per region |
| base_model: textembedding-gecko | 1,500 RPM | Online prediction requests per base model per minute per region per base_model |

For the online prediction quota, you can specify an additional filter: base_model: textembedding-gecko
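
As an illustration of staying under a per-minute quota from the client side, the sketch below spaces calls evenly so that roughly `rpm` calls start per minute (for example, 60 for the data management APIs). The class name `MinuteRateLimiter` and its parameters are hypothetical and not part of any Vertex AI SDK. The sketch also assumes a single client process, so it does not account for other callers that share the same project and region.

```python
import time


class MinuteRateLimiter:
    """Hypothetical helper: spaces calls so about `rpm` calls start per minute."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.interval = 60.0 / rpm   # minimum seconds between call starts
        self._clock = clock          # injectable for testing
        self._sleep = sleep          # injectable for testing
        self._next_allowed = None    # earliest time the next call may start

    def acquire(self):
        """Block until the next call may start; return the scheduled start time."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return now
```

A caller would create one limiter per quota, for example `MinuteRateLimiter(60)`, and call `acquire()` immediately before each request. Even spacing is a deliberately conservative choice: it gives up burst capacity in exchange for a simple guarantee on the gap between requests.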
The following limits apply:
| Service | Limit | Metric |
|---|---|---|
| Concurrent ImportRagFiles requests | 3 RPM | VertexRagService concurrent import requests per region |
| Maximum number of files per ImportRagFiles request | 10,000 | VertexRagService import rag files requests per region |
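
One way to respect both limits when importing a large file set is to split the files into batches of at most 10,000 and run at most 3 imports at a time. The following is a minimal sketch of that idea. The functions `chunk` and `import_all` and the `import_batch` callable are hypothetical names; `import_batch` stands in for whatever code actually issues an ImportRagFiles request and is supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_FILES_PER_IMPORT = 10_000   # per-request file limit from the table above
MAX_CONCURRENT_IMPORTS = 3      # concurrent import limit from the table above


def chunk(items, size=MAX_FILES_PER_IMPORT):
    """Split `items` into consecutive lists of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def import_all(file_uris, import_batch, max_workers=MAX_CONCURRENT_IMPORTS):
    """Run `import_batch` on each batch, with at most `max_workers` in flight.

    `import_batch` is a caller-supplied (hypothetical) function that takes a
    list of file URIs and performs one import request.
    Results are returned in batch order.
    """
    batches = chunk(list(file_uris))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(import_batch, batches))
```

For example, 25,000 files would be split into three batches of 10,000, 10,000, and 5,000 files. The thread pool bounds concurrency only within one process, so other processes importing into the same region still count toward the same limit.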

For more rate limits and quotas, see Generative AI on Vertex AI rate limits.

What's next