This page lists Gemini models, self-deployed models, and models with managed APIs on Vertex AI that support RAG.
Gemini models
The following table lists the Gemini models and their versions that support RAG Engine:
Model | Version |
---|---|
Gemini 1.5 Flash | gemini-1.5-flash-002, gemini-1.5-flash-001 |
Gemini 1.5 Pro | gemini-1.5-pro-002, gemini-1.5-pro-001 |
Gemini 1.0 Pro | gemini-1.0-pro-001, gemini-1.0-pro-002 |
Gemini 1.0 Pro Vision | gemini-1.0-pro-vision-001 |
Gemini | gemini-experimental |
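To ground one of these models with RAG Engine, you attach a retrieval tool that points at your RAG corpus. The following is a minimal sketch, assuming an existing corpus and a recent `vertexai` SDK (older SDK versions expose the same classes under `vertexai.preview`); PROJECT_ID, LOCATION, and RAG_CORPUS_ID are placeholders to replace with your own values:

```python
import vertexai
from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool

vertexai.init(project="PROJECT_ID", location="LOCATION")

# Build a retrieval tool that reads from an existing RAG corpus.
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus="projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID"
                )
            ],
        ),
    )
)

# Any model version from the table above works here.
rag_model = GenerativeModel("gemini-1.5-flash-002", tools=[rag_retrieval_tool])
response = rag_model.generate_content("What do my documents say about pricing?")
print(response.text)
```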
Self-deployed models
RAG Engine supports all models in Model Garden, so you can use RAG Engine with your self-deployed open model endpoints.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- ENDPOINT_ID: Your endpoint ID.
```python
from vertexai.generative_models import GenerativeModel

# Create a model instance with your self-deployed open model endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
    tools=[rag_retrieval_tool],
)
```
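Once the instance exists, you query it like any other GenerativeModel; for example (assuming `rag_retrieval_tool` was built as in the sketch under the Gemini models table above):

```python
# The retrieval tool fetches corpus context before the endpoint generates.
response = rag_model.generate_content("Summarize the key points in my documents.")
print(response.text)
```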
Models with managed APIs on Vertex AI
Models with managed APIs on Vertex AI, such as Llama 3.1, also support RAG Engine.
The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance. The model ID, `publishers/meta/models/llama-3.1-405b-instruct-maas`, is found in the model card.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.
```python
# Create a model instance with the Llama 3.1 MaaS endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/publishers/meta/models/llama-3.1-405b-instruct-maas",
    tools=[RAG_RETRIEVAL_TOOL],
)
```
The following code sample demonstrates how to use the OpenAI-compatible ChatCompletions API to generate a model response.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- MODEL_ID: The LLM model for content generation. For example, `meta/llama-3.1-405b-instruct-maas`.
- INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
- RAG_CORPUS_ID: The ID of the RAG corpus resource.
- ROLE: The role of the message author, for example, `user`.
- CONTENT: The content of the message, such as your INPUT_PROMPT text.
```python
# Generate a response with the Llama 3.1 MaaS endpoint
response = client.chat.completions.create(
    model="MODEL_ID",
    messages=[{"role": "ROLE", "content": "CONTENT"}],
    extra_body={
        "extra_body": {
            "google": {
                "vertex_rag_store": {
                    "rag_resources": {
                        "rag_corpus": "RAG_CORPUS_ID"
                    },
                    "similarity_top_k": 10,
                }
            }
        }
    },
)
```
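The sample assumes that `client` is an OpenAI-compatible client already authenticated against Vertex AI. One way to construct it, sketched here with the `openai` and `google-auth` packages (the base URL follows the Vertex AI OpenAI-compatible endpoint format; verify it for your region):

```python
import google.auth
import google.auth.transport.requests
import openai

# Exchange Application Default Credentials for a short-lived access token.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

client = openai.OpenAI(
    base_url=(
        "https://LOCATION-aiplatform.googleapis.com/v1beta1/"
        "projects/PROJECT_ID/locations/LOCATION/endpoints/openapi"
    ),
    api_key=credentials.token,
)
```

The generated text is then available as `response.choices[0].message.content`, following the standard OpenAI response shape.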
What's next
- To learn about the file size limits, see Supported document types.
- To learn about quotas related to RAG Engine, see RAG Engine quotas.
- To learn about customizing parameters, see Retrieval parameters.
- To learn more about the RAG API, see RAG Engine API.
- To learn more about grounding, see Grounding overview.
- To learn more about the difference between grounding and RAG, see Ground responses using RAG.
- To learn more about Generative AI on Vertex AI, see Overview of Generative AI on Vertex AI.
- To learn more about the RAG architecture, see Infrastructure for a RAG-capable generative AI application using Vertex AI.