This page lists Gemini models, self-deployed models, and models with managed APIs on Vertex AI that support Vertex AI RAG Engine.
Gemini models
The following models support Vertex AI RAG Engine:
- Gemini 2.5 Flash (Preview)
- Gemini 2.5 Flash-Lite (Preview)
- Gemini 2.5 Flash-Lite
- Gemini 2.5 Pro
- Gemini 2.5 Flash
- Gemini 2.0 Flash
 
Fine-tuned Gemini models aren't supported with Vertex AI RAG Engine.
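For reference, the following is a minimal sketch of wiring one of these Gemini models to Vertex AI RAG Engine with the Python SDK. It assumes an existing RAG corpus; the project, location, and corpus IDs are placeholders, and the `rag` module path can vary by SDK version (older releases expose it as `vertexai.preview.rag`).

```python
import vertexai
from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool

# Initialize the SDK; replace with your project and region.
vertexai.init(project="PROJECT_ID", location="LOCATION")

# Build a retrieval tool backed by an existing RAG corpus.
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus="projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID"
                )
            ],
        ),
    )
)

# Any model ID from the list above works here.
rag_model = GenerativeModel("gemini-2.0-flash", tools=[rag_retrieval_tool])
```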
Self-deployed models
Vertex AI RAG Engine supports all models in Model Garden.
Use Vertex AI RAG Engine with your self-deployed open model endpoints.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- ENDPOINT_ID: Your endpoint ID.
```python
from vertexai.generative_models import GenerativeModel

# Create a model instance with your self-deployed open model endpoint.
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
    tools=[rag_retrieval_tool],
)
```
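Once created, the instance is queried like any other `GenerativeModel`. A brief usage sketch, with a placeholder prompt:

```python
# Responses are grounded in the corpus behind rag_retrieval_tool.
response = rag_model.generate_content("INPUT_PROMPT")
print(response.text)
```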
Models with managed APIs on Vertex AI
Models with managed APIs on Vertex AI that support Vertex AI RAG Engine include Llama 3.1 served as a managed API (model as a service, or MaaS), as shown in the following examples.
The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance. The model ID, /publisher/meta/models/llama-3.1-405B-instruct-maas, is found on the model card.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.
```python
from vertexai.generative_models import GenerativeModel

# Create a model instance with the Llama 3.1 MaaS endpoint.
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/publisher/meta/models/llama-3.1-405B-instruct-maas",
    tools=[RAG_RETRIEVAL_TOOL],
)
```
The following code sample demonstrates how to use the OpenAI-compatible ChatCompletions API to generate a model response.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- MODEL_ID: The LLM used for content generation. For example, meta/llama-3.1-405b-instruct-maas.
- INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt that's relevant to the documents in Vertex AI Search.
- RAG_CORPUS_ID: The ID of the RAG corpus resource.
- ROLE: The role of the message author, such as user.
- CONTENT: The text of the message, such as your INPUT_PROMPT.
```python
# Generate a response with the Llama 3.1 MaaS endpoint.
response = client.chat.completions.create(
    model="MODEL_ID",
    messages=[{"role": "ROLE", "content": "CONTENT"}],
    extra_body={
        "extra_body": {
            "google": {
                "vertex_rag_store": {
                    "rag_resources": {
                        "rag_corpus": "RAG_CORPUS_ID"
                    },
                    "similarity_top_k": 10
                }
            }
        }
    },
)
```
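The sample assumes `client` is an OpenAI client already pointed at your Vertex AI project. A minimal sketch of that setup, assuming the `openai` and `google-auth` packages and the Vertex AI OpenAI-compatible endpoint path (replace PROJECT_ID and LOCATION with your values):

```python
import google.auth
import google.auth.transport.requests
import openai

# Fetch an OAuth access token to authenticate against Vertex AI.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

# Point the OpenAI client at the Vertex AI OpenAI-compatible endpoint.
client = openai.OpenAI(
    base_url="https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi",
    api_key=credentials.token,
)
```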