Vertex AI RAG Engine billing

This page explains pricing and billing for Vertex AI RAG Engine. It covers the following topics:

  • Pricing and billing: Learn how billing works for key RAG components like data ingestion, transformation, embedding, indexing, and reranking.
  • What's next: Find resources to get started with RAG and learn more about its architecture.

There is no additional charge for Vertex AI RAG Engine itself, but you are billed for the underlying Google Cloud components that you use, such as models, reranking, and vector storage.

For more information, see the Vertex AI RAG Engine overview.

Pricing and billing

You are billed for the underlying Google Cloud components that you use with Vertex AI RAG Engine.

Data ingestion

You can ingest data from various sources, such as local file uploads, Cloud Storage, and Google Drive. Accessing files from these sources is free, but the data sources might charge for data transfer, such as data egress costs.
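As a rough illustration, any egress cost scales with the volume of data transferred from the source. The following sketch estimates that cost under a hypothetical per-GiB rate (the rate is a placeholder, not a published price):

```python
# Illustrative only: estimates data-egress cost for ingesting files from an
# external source. The per-GiB rate is a placeholder, not a published price.

def estimate_egress_cost(total_bytes: int, rate_per_gib: float) -> float:
    """Return the estimated transfer cost for moving total_bytes."""
    gib = total_bytes / (1024 ** 3)
    return gib * rate_per_gib

# Example: 50 GiB transferred at a hypothetical $0.12/GiB rate.
print(round(estimate_egress_cost(50 * 1024 ** 3, 0.12), 2))  # 6.0
```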

Data transformation

File parsing

You can choose from the following file parsers:

  • Default parser: A free, simple parser for basic text extraction from plain text files or simple document formats.
  • LLM parser: Uses a specified LLM to parse files, which enables a more nuanced understanding of unstructured or semi-structured content. You are billed for the underlying LLM usage.
  • Document AI layout parser: A specialized parser for documents with complex layouts, such as PDFs with tables, columns, and forms. You are billed for Document AI usage.
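As a rough comparison of the two billable parsers, the sketch below estimates costs under hypothetical per-token and per-page rates (both rates are placeholders; check the pricing pages for your LLM and for Document AI for actual rates):

```python
# Illustrative only: compares parsing costs under hypothetical per-unit rates.
# LLM parsing is typically billed by token usage and Document AI by page count;
# the rates below are placeholders, not published prices.

def llm_parser_cost(total_tokens: int, rate_per_1k_tokens: float) -> float:
    """Estimate LLM parsing cost from total token usage."""
    return total_tokens / 1000 * rate_per_1k_tokens

def layout_parser_cost(pages: int, rate_per_page: float) -> float:
    """Estimate Document AI layout parsing cost from page count."""
    return pages * rate_per_page

# A 200-page corpus: ~150,000 tokens through an LLM parser, or 200 pages
# through the layout parser, at hypothetical rates.
print(llm_parser_cost(150_000, 0.000125))
print(layout_parser_cost(200, 0.01))
```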

File chunking

You can use fixed-size chunking, which is free.
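Fixed-size chunking splits text into equal-size pieces, typically with some overlap so that context isn't lost at chunk boundaries. A minimal sketch, measuring size in characters for simplicity:

```python
# A minimal sketch of fixed-size chunking with overlap. Sizes are measured
# in characters here for simplicity; a production chunker typically measures
# them in tokens.

def chunk_fixed(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size chunks, each overlapping the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_fixed("a" * 1000, chunk_size=400, overlap=100)
print(len(chunks))  # 4
```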

Embedding generation

When you generate embeddings, Vertex AI RAG Engine uses the embedding model that you specify. You are billed for the costs associated with that model.

For more pricing information, see Cost of building and deploying AI models in Vertex AI.
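As an illustration, if the embedding model that you specify is billed per 1,000 input characters, you can estimate the one-time cost of embedding a corpus as follows (the rate is a placeholder; check the pricing page for the model you use):

```python
# Illustrative only: estimates embedding cost assuming the model is billed
# per 1,000 input characters. The rate is a placeholder, not a published
# price; some models bill by token instead.

def estimate_embedding_cost(documents: list[str], rate_per_1k_chars: float) -> float:
    """Estimate total embedding cost for a list of document texts."""
    total_chars = sum(len(doc) for doc in documents)
    return total_chars / 1000 * rate_per_1k_chars

docs = ["x" * 2500, "y" * 1500]  # 4,000 characters in total
print(estimate_embedding_cost(docs, 0.0001))
```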

Data indexing and retrieval

For vector search, you can use one of two types of vector databases:

  • RAG-managed database: A fully managed vector database solution. This database serves two purposes:

    • Stores RAG resources, such as RAG corpora and RAG files (file contents are excluded).
    • Optionally handles embedding indexing and retrieval for vector search.

    A RAG-managed database uses a Spanner instance as its backend. For each project, Vertex AI RAG Engine provisions a customer-specific Google Cloud project to manage these resources. This ensures that your data is physically isolated.

    If you choose the RagManagedDB Basic tier or Scaled tier, Vertex AI RAG Engine provisions a Spanner Enterprise edition instance in the corresponding project:

    • Basic tier: 100 processing units with backup.
    • Scaled tier: Starts at 1 node (1,000 processing units) and autoscales up to 10 nodes with backup.

    If any RAG corpus in your project uses a RAG-managed database for vector search, you are charged for the RAG-managed Spanner instance.

    Vertex AI RAG Engine surfaces the Spanner costs from your corresponding RAG-managed project to your Google Cloud project, so that you can view and pay for the Spanner instance costs.

    For more pricing details, see Spanner pricing.

  • Bring-your-own (BYO) vector database: You can use an existing vector database, such as Vector Search. You are responsible for provisioning, managing, and paying for your vector database. Vertex AI RAG Engine doesn't charge for this integration.
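As a rough illustration of the RAG-managed database cost, Spanner compute charges scale with the instance's processing units. The sketch below compares the two tiers' compute cost under a hypothetical per-node-hour rate (the rate is a placeholder, and backup and storage charges aren't modeled; see Spanner pricing for actual rates):

```python
# Illustrative only: estimates monthly Spanner compute cost for the
# RAG-managed instance from its processing-unit count. The per-node-hour
# rate is a placeholder; backup and storage charges are not modeled.

HOURS_PER_MONTH = 730  # common billing approximation

def spanner_monthly_cost(processing_units: int, node_hourly_rate: float) -> float:
    """Estimate monthly compute cost; 1,000 processing units equal one node."""
    nodes = processing_units / 1000
    return nodes * node_hourly_rate * HOURS_PER_MONTH

# Basic tier (100 processing units) vs. Scaled tier floor (1 node), at a
# hypothetical $1.00 per node-hour.
print(round(spanner_monthly_cost(100, 1.00), 2))   # Basic tier
print(round(spanner_monthly_cost(1000, 1.00), 2))  # Scaled tier minimum
```

Note that the Scaled tier autoscales, so its actual cost varies between the 1-node floor and the 10-node ceiling with load.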

Reranking for Vertex AI RAG Engine

After initial retrieval, you can use one of the following reranking tools to improve search result relevance:

  • LLM reranker: Uses a specified LLM to reorder retrieved documents based on their semantic relevance to the query. You are billed for the underlying LLM usage.
  • Vertex AI Search ranking API: Uses the specialized ranking API from Vertex AI Search for general-purpose reranking that is optimized for performance and relevance. Ranking API charges are billed directly to your project.
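Conceptually, both tools assign each retrieved chunk a relevance score and reorder the results by it. A minimal sketch of that reordering step (the scores here are hard-coded placeholders; in practice they come from the billed reranker calls described above):

```python
# Conceptual sketch: rerank retrieved chunks by relevance score. In practice
# the scores come from an LLM reranker or the Vertex AI Search ranking API;
# here they are hard-coded placeholders.

def rerank(chunks: list[str], scores: list[float], top_k: int) -> list[str]:
    """Return the top_k chunks ordered by descending relevance score."""
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]

retrieved = ["chunk A", "chunk B", "chunk C"]
print(rerank(retrieved, scores=[0.2, 0.9, 0.5], top_k=2))  # ['chunk B', 'chunk C']
```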

What's next