This page lists Gemini models, self-deployed models, and models with managed APIs on Vertex AI that support RAG.
Gemini models
The following table lists the Gemini models and their versions that support RAG Engine:
Model | Version |
---|---|
Gemini 1.5 Flash | gemini-1.5-flash-002, gemini-1.5-flash-001 |
Gemini 1.5 Pro | gemini-1.5-pro-002, gemini-1.5-pro-001 |
Gemini 1.0 Pro | gemini-1.0-pro-001, gemini-1.0-pro-002 |
Gemini 1.0 Pro Vision | gemini-1.0-pro-vision-001 |
Gemini | gemini-experimental |
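To ground one of these models with RAG Engine, you attach a retrieval tool that points at your RAG corpus. The following is a minimal sketch, assuming an existing corpus and a recent `vertexai` SDK (older SDK versions expose the same classes under `vertexai.preview`); PROJECT_ID, LOCATION, and RAG_CORPUS_ID are placeholders to replace with your own values:

```python
import vertexai
from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool

vertexai.init(project="PROJECT_ID", location="LOCATION")

# Build a retrieval tool that reads from an existing RAG corpus.
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus="projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID"
                )
            ],
        ),
    )
)

# Any model version from the table above works here.
rag_model = GenerativeModel("gemini-1.5-flash-002", tools=[rag_retrieval_tool])
response = rag_model.generate_content("What do my documents say about pricing?")
print(response.text)
```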
Self-deployed models
RAG Engine supports all models in Model Garden, so you can use RAG Engine with your self-deployed open model endpoints.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- ENDPOINT_ID: Your endpoint ID.
```python
from vertexai.generative_models import GenerativeModel

# Create a model instance with your self-deployed open model endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
    tools=[rag_retrieval_tool],
)
```
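Once the instance exists, you query it like any other GenerativeModel; for example (assuming `rag_retrieval_tool` was built as in the sketch under the Gemini models table above):

```python
# The retrieval tool fetches corpus context before the endpoint generates.
response = rag_model.generate_content("Summarize the key points in my documents.")
print(response.text)
```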
Models with managed APIs on Vertex AI
Models with managed APIs on Vertex AI, such as Llama 3.1, also support RAG Engine.
The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance. The model ID, `publishers/meta/models/llama-3.1-405b-instruct-maas`, is found in the model card.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.
```python
# Create a model instance with the Llama 3.1 MaaS endpoint
rag_model = GenerativeModel(
    "projects/PROJECT_ID/locations/LOCATION/publishers/meta/models/llama-3.1-405b-instruct-maas",
    tools=[RAG_RETRIEVAL_TOOL],
)
```
The following code sample demonstrates how to use the OpenAI-compatible ChatCompletions API to generate a model response.
Replace the variables used in the code sample:
- PROJECT_ID: Your project ID.
- LOCATION: The region to process your request.
- MODEL_ID: The LLM model for content generation. For example, `meta/llama-3.1-405b-instruct-maas`.
- INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
- RAG_CORPUS_ID: The ID of the RAG corpus resource.
- ROLE: The role of the message author, for example, `user`.
- CONTENT: The content of the message, such as your INPUT_PROMPT text.
```python
# Generate a response with the Llama 3.1 MaaS endpoint
response = client.chat.completions.create(
    model="MODEL_ID",
    messages=[{"role": "ROLE", "content": "CONTENT"}],
    extra_body={
        "extra_body": {
            "google": {
                "vertex_rag_store": {
                    "rag_resources": {
                        "rag_corpus": "RAG_CORPUS_ID"
                    },
                    "similarity_top_k": 10,
                }
            }
        }
    },
)
```
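The sample assumes that `client` is an OpenAI-compatible client already authenticated against Vertex AI. One way to construct it, sketched here with the `openai` and `google-auth` packages (the base URL follows the Vertex AI OpenAI-compatible endpoint format; verify it for your region):

```python
import google.auth
import google.auth.transport.requests
import openai

# Exchange Application Default Credentials for a short-lived access token.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

client = openai.OpenAI(
    base_url=(
        "https://LOCATION-aiplatform.googleapis.com/v1beta1/"
        "projects/PROJECT_ID/locations/LOCATION/endpoints/openapi"
    ),
    api_key=credentials.token,
)
```

The generated text is then available as `response.choices[0].message.content`, following the standard OpenAI response shape.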
What's next
- To learn about the file size limits, see Supported document types.
- To learn about quotas related to RAG Engine, see RAG Engine quotas.
- To learn about customizing parameters, see Retrieval parameters.
- To learn more about the RAG API, see RAG Engine API.
- To learn more about grounding, see Grounding overview.
- To learn more about the difference between grounding and RAG, see Ground responses using RAG.
- To learn more about Generative AI on Vertex AI, see Overview of Generative AI on Vertex AI.
- To learn more about the RAG architecture, see Infrastructure for a RAG-capable generative AI application using Vertex AI.