Vertex AI RAG Engine supported models

You can use Vertex AI RAG Engine with several types of models. This page describes the following supported model types:

  • Gemini models: Learn about the native Gemini models and versions that support Vertex AI RAG Engine.
  • Self-deployed models: Use Vertex AI RAG Engine with your own models deployed on Vertex AI endpoints.
  • Models with managed APIs: Use Vertex AI RAG Engine with third-party models available as a managed service on Vertex AI.

The following list compares the supported model types:

  • Gemini models: Natively integrated, multimodal models developed by Google. Best for general-purpose tasks that take advantage of the latest features and optimizations from Google.
  • Self-deployed models: Open-source or custom models that you deploy and manage on your own Vertex AI endpoints. Ideal if you require full control over the model, architecture, and serving environment, or need to use a custom or fine-tuned model.
  • Models with managed APIs: Third-party models, such as Llama and Mistral, offered as fully managed API endpoints on Vertex AI. Suitable if you want to use popular third-party models without the overhead of deploying and managing the infrastructure.

Gemini models

The following table lists the Gemini models and their versions that support Vertex AI RAG Engine:

Fine-tuned versions of Gemini models aren't supported with Vertex AI RAG Engine.
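
Attaching a RAG retrieval tool to a Gemini model follows the same pattern as the code samples later on this page. The following is a minimal sketch that assumes the Vertex AI SDK's preview rag module; the RAG corpus resource name and the Gemini model version shown are illustrative placeholders, not values from this page:

      from vertexai.preview import rag
      from vertexai.generative_models import GenerativeModel, Tool

      # Build a retrieval tool backed by an existing RAG corpus.
      # RAG_CORPUS_RESOURCE_NAME is a placeholder, for example:
      # "projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID"
      rag_retrieval_tool = Tool.from_retrieval(
          retrieval=rag.Retrieval(
              source=rag.VertexRagStore(
                  rag_resources=[
                      rag.RagResource(rag_corpus="RAG_CORPUS_RESOURCE_NAME")
                  ],
                  similarity_top_k=10,
              ),
          )
      )

      # Attach the retrieval tool to a Gemini model (version is illustrative)
      rag_model = GenerativeModel(
          "gemini-2.0-flash-001",
          tools=[rag_retrieval_tool],
      )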

Self-deployed models

Vertex AI RAG Engine supports all models in Model Garden.

To use Vertex AI RAG Engine with a self-deployed model, create a model instance that points to your model's endpoint.

Before you run the code sample, replace the following variables:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • ENDPOINT_ID: Your endpoint ID.

      from vertexai.generative_models import GenerativeModel

      # Create a model instance with your self-deployed open model endpoint.
      # rag_retrieval_tool is a RAG retrieval tool that you created earlier,
      # such as the one in the preceding Gemini example.
      rag_model = GenerativeModel(
          "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
          tools=[rag_retrieval_tool],
      )
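
After you create the model instance, you can query it like any other generative model. This follow-up is a sketch; INPUT_PROMPT is a placeholder for your own prompt:

      # Ask a question that the RAG retrieval tool can ground in your corpus
      response = rag_model.generate_content("INPUT_PROMPT")
      print(response.text)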
    

Models with managed APIs on Vertex AI

The following models with managed APIs on Vertex AI support Vertex AI RAG Engine:

The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance that points to a managed model endpoint. The model ID, publishers/meta/models/llama-3.1-405b-instruct-maas, is found on the model card.

Before you run the code sample, replace the following variables:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.

      # Create a model instance with the Llama 3.1 MaaS endpoint
      rag_model = GenerativeModel(
          "projects/PROJECT_ID/locations/LOCATION/publishers/meta/models/llama-3.1-405b-instruct-maas",
          tools=[RAG_RETRIEVAL_TOOL],
      )
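
You can then call generate_content on rag_model, the same as in the self-deployed model example.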
    

The following code sample uses the OpenAI-compatible ChatCompletions API to generate a model response.

Before you run the code sample, replace the following variables:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • MODEL_ID: The LLM to use for content generation. For example, meta/llama-3.1-405b-instruct-maas.
  • INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in your RAG corpus.
  • RAG_CORPUS_ID: The ID of the RAG corpus resource.
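
The code sample assumes an OpenAI client that's configured to call Vertex AI. The following setup is a minimal sketch, assuming application-default credentials and the OpenAI-compatible endpoint URL pattern; verify both against your environment:

      import openai
      from google.auth import default
      from google.auth.transport.requests import Request

      # Obtain an access token from application-default credentials
      credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
      credentials.refresh(Request())

      # Point the OpenAI-compatible client at the Vertex AI endpoint
      # (the base_url pattern is an assumption; check the Vertex AI docs)
      client = openai.OpenAI(
          base_url="https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi",
          api_key=credentials.token,
      )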

      # Generate a response with the Llama 3.1 MaaS endpoint
      response = client.chat.completions.create(
          model="MODEL_ID",
          messages=[{"role": "user", "content": "INPUT_PROMPT"}],
          extra_body={
              # Vertex AI reads the RAG settings from a nested extra_body field
              "extra_body": {
                  "google": {
                      "vertex_rag_store": {
                          "rag_resources": {
                              "rag_corpus": "RAG_CORPUS_ID"
                          },
                          "similarity_top_k": 10
                      }
                  }
              }
          },
      )
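
To inspect the grounded answer, read the message content from the first choice:

      # Print the generated, corpus-grounded response text
      print(response.choices[0].message.content)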
    

What's next