This page lists the Gemini models, self-deployed models, and models with managed APIs on Vertex AI that support Vertex AI RAG Engine.
Gemini models
The following table lists the Gemini models and their versions that support Vertex AI RAG Engine:
| Model | Version |
|---|---|
| Gemini 1.5 Flash | gemini-1.5-flash-002, gemini-1.5-flash-001 |
| Gemini 1.5 Pro | gemini-1.5-pro-002, gemini-1.5-pro-001 |
| Gemini 1.0 Pro | gemini-1.0-pro-002, gemini-1.0-pro-001 |
| Gemini 1.0 Pro Vision | gemini-1.0-pro-vision-001 |
| Gemini | gemini-experimental |
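For example, here is a minimal sketch of pairing one of these Gemini models with a RAG retrieval tool. It assumes an existing RAG corpus and the Vertex AI SDK's `rag` module; exact module paths and field names vary by SDK version, so treat it as illustrative rather than definitive.

```python
from vertexai.preview import rag
from vertexai.generative_models import GenerativeModel, Tool

# Build a retrieval tool backed by an existing RAG corpus.
# The corpus resource name below is a placeholder; replace it with your own.
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus="projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID"
                )
            ],
            similarity_top_k=10,  # number of retrieved chunks to ground on
        ),
    )
)

# Any supported Gemini version from the table above can be used here.
rag_model = GenerativeModel("gemini-1.5-flash-002", tools=[rag_retrieval_tool])
```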
Self-deployed models
Vertex AI RAG Engine supports all models in Model Garden.
Use Vertex AI RAG Engine with your self-deployed open model endpoints.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- ENDPOINT_ID: Your endpoint ID.
```python
from vertexai.generative_models import GenerativeModel

# Create a model instance with your self-deployed open model endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
    tools=[rag_retrieval_tool],
)
```
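You can then query the endpoint like any other `GenerativeModel` instance; the prompt below is only illustrative:

```python
# Send a grounded request to the self-deployed endpoint.
response = rag_model.generate_content("Summarize the documents in my corpus.")
print(response.text)
```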
Models with managed APIs on Vertex AI
Vertex AI RAG Engine also supports models with managed APIs on Vertex AI, such as Llama 3.1.
The following code sample demonstrates how to use the Gemini `GenerateContent` API to create a generative model instance. The model ID, `/publisher/meta/models/llama-3.1-405B-instruct-maas`, is found in the model card.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.
```python
# Create a model instance with the Llama 3.1 MaaS endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/publisher/meta/models/llama-3.1-405B-instruct-maas",
    tools=[RAG_RETRIEVAL_TOOL],
)
```
The following code sample demonstrates how to use the OpenAI-compatible `ChatCompletions` API to generate a model response.
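The sample assumes an OpenAI SDK client that is already configured against the Vertex AI OpenAI-compatible endpoint. One way to construct that `client` is sketched below, assuming the `openai` and `google-auth` packages; the base URL pattern and credential flow shown are the standard ones for Vertex AI's OpenAI compatibility layer.

```python
import openai
from google.auth import default
from google.auth.transport.requests import Request

# Fetch an OAuth 2.0 access token to use as the API key.
credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
credentials.refresh(Request())

# Point the OpenAI SDK at the Vertex AI OpenAI-compatible endpoint.
client = openai.OpenAI(
    base_url=(
        "https://LOCATION-aiplatform.googleapis.com/v1beta1/"
        "projects/PROJECT_ID/locations/LOCATION/endpoints/openapi"
    ),
    api_key=credentials.token,
)
```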
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- MODEL_ID: The LLM used for content generation. For example, `meta/llama-3.1-405b-instruct-maas`.
- INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
- RAG_CORPUS_ID: The ID of the RAG corpus resource.
- ROLE: The role of the message author, such as user.
- CONTENT: The content of the message, such as your INPUT_PROMPT.
```python
# Generate a response with the Llama 3.1 MaaS endpoint
response = client.chat.completions.create(
    model="MODEL_ID",
    messages=[{"role": "ROLE", "content": "CONTENT"}],
    extra_body={
        "extra_body": {
            "google": {
                "vertex_rag_store": {
                    "rag_resources": {
                        "rag_corpus": "RAG_CORPUS_ID"
                    },
                    "similarity_top_k": 10
                }
            }
        }
    },
)
```
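The grounded answer can then be read off the response object in the standard OpenAI SDK shape:

```python
# Print the model's grounded answer from the first choice.
print(response.choices[0].message.content)
```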