Generative models

This page lists Gemini models, self-deployed models, and models with managed APIs on Vertex AI that support RAG Engine.

Gemini models

The following table lists the Gemini models and their versions that support RAG Engine:

Model                   Version
Gemini 1.5 Flash        gemini-1.5-flash-002
                        gemini-1.5-flash-001
Gemini 1.5 Pro          gemini-1.5-pro-002
                        gemini-1.5-pro-001
Gemini 1.0 Pro          gemini-1.0-pro-001
                        gemini-1.0-pro-002
Gemini 1.0 Pro Vision   gemini-1.0-pro-vision-001
Gemini                  gemini-experimental
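
Pass any of these version IDs directly to the SDK when you create the model instance. The following is a minimal sketch, assuming the Vertex AI SDK and a rag_retrieval_tool built from your RAG corpus (the tool itself is sketched in the self-deployed models section below):

      from vertexai.generative_models import GenerativeModel

      # Attach the RAG retrieval tool to a supported Gemini version by ID
      rag_model = GenerativeModel(
          "gemini-1.5-flash-002",
          tools=[rag_retrieval_tool],
      )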

Self-deployed models

RAG Engine supports all models in Model Garden. Use RAG Engine with any open model that you self-deploy to a Vertex AI endpoint.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • ENDPOINT_ID: Your endpoint ID.

      from vertexai.generative_models import GenerativeModel

      # Create a model instance with your self-deployed open model endpoint;
      # rag_retrieval_tool is the RAG retrieval tool created from your corpus.
      rag_model = GenerativeModel(
          "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
          tools=[rag_retrieval_tool],
      )
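
The preceding sample assumes a rag_retrieval_tool that points at your RAG corpus. As a minimal sketch, assuming the vertexai.preview.rag module and a RAG_CORPUS_RESOURCE placeholder of the form projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID, the tool can be built like this:

      from vertexai.preview import rag
      from vertexai.generative_models import Tool

      # Wrap the RAG corpus in a retrieval tool that the model can call
      rag_retrieval_tool = Tool.from_retrieval(
          retrieval=rag.Retrieval(
              source=rag.VertexRagStore(
                  rag_resources=[
                      rag.RagResource(rag_corpus="RAG_CORPUS_RESOURCE")
                  ],
                  similarity_top_k=10,  # number of contexts to retrieve
              ),
          )
      )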
    

Models with managed APIs on Vertex AI

Models with managed APIs on Vertex AI (model as a service, or MaaS), such as Llama 3.1, also support RAG Engine.

The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance. The model ID, publishers/meta/models/llama-3.1-405b-instruct-maas, can be found on the model card.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.

      from vertexai.generative_models import GenerativeModel

      # Create a model instance with the Llama 3.1 MaaS endpoint
      rag_model = GenerativeModel(
          "projects/PROJECT_ID/locations/LOCATION/publishers/meta/models/llama-3.1-405b-instruct-maas",
          tools=[RAG_RETRIEVAL_TOOL],
      )
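
After the model instance is created, query it as you would any other GenerativeModel. This is a minimal sketch; the prompt placeholder is illustrative:

      # Send a grounded request; retrieval runs through the attached RAG tool
      response = rag_model.generate_content("INPUT_PROMPT")
      print(response.text)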
    

The following code sample demonstrates how to use the OpenAI-compatible Chat Completions API to generate a model response.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • MODEL_ID: The LLM used for content generation. For example, meta/llama-3.1-405b-instruct-maas.
  • INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in your RAG corpus.
  • RAG_CORPUS_ID: The ID of the RAG corpus resource.
  • ROLE: The role of the message author, for example, user.
  • CONTENT: The message text sent to the model, for example, your INPUT_PROMPT.

      # Generate a response with the Llama 3.1 MaaS endpoint
      response = client.chat.completions.create(
          model="MODEL_ID",
          messages=[{"role": "ROLE", "content": "CONTENT"}],
          # The outer extra_body is an OpenAI SDK argument; the inner
          # extra_body key belongs to the Vertex AI request payload.
          extra_body={
              "extra_body": {
                  "google": {
                      "vertex_rag_store": {
                          "rag_resources": {
                              "rag_corpus": "RAG_CORPUS_ID"
                          },
                          "similarity_top_k": 10
                      }
                  }
              }
          },
      )
    
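
The preceding sample assumes an existing client. As a minimal sketch, assuming the openai Python package and Google application default credentials, the client can be pointed at the Vertex AI OpenAI-compatible endpoint. The base_url pattern shown here is an assumption; verify it against the current Vertex AI documentation for your region and API version:

      import google.auth
      import google.auth.transport.requests
      import openai

      # Exchange application default credentials for an OAuth access token
      credentials, _ = google.auth.default(
          scopes=["https://www.googleapis.com/auth/cloud-platform"]
      )
      credentials.refresh(google.auth.transport.requests.Request())

      # Point the OpenAI SDK at the Vertex AI OpenAI-compatible endpoint
      # (assumed URL pattern; the access token expires and must be refreshed)
      client = openai.OpenAI(
          base_url=(
              "https://LOCATION-aiplatform.googleapis.com/v1beta1/"
              "projects/PROJECT_ID/locations/LOCATION/endpoints/openapi"
          ),
          api_key=credentials.token,
      )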

What's next