This page explains reranking and shows you how to use the API to rerank your retrieved responses.
Post-retrieval reranking is a technique that improves the relevance of retrieval results. Vertex AI RAG Engine offers optional rerankers that assess the relevance of retrieved chunks to a query and reorder the results accordingly. The reordered results produce responses that better match the query, and they can be included in prompts for model inference to generate more relevant and accurate responses.
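Conceptually, a reranker is a scoring step applied after retrieval: it scores each retrieved chunk against the query and sorts the chunks by that score. The following minimal Python sketch illustrates the pattern; score_relevance is a hypothetical stand-in for whichever reranker you use, not part of the RAG Engine API.

from typing import Callable

def rerank(query: str, chunks: list[str],
           score_relevance: Callable[[str, str], float]) -> list[str]:
    # Score each retrieved chunk against the query, then sort by
    # descending relevance so the best chunks come first.
    return sorted(chunks, key=lambda c: score_relevance(query, c), reverse=True)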
Available rerankers
This section explores the types of rerankers.
LLM reranker
The LLM reranker uses an LLM to assess the relevance of each retrieved chunk to the query and reorders the results accordingly, which leads to more suitable responses or improved prompts for model inference.
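Conceptually, an LLM reranker scores each query-chunk pair by asking the model to judge relevance. The sketch below shows one way such a judgment could be phrased; the prompt wording and the call_llm helper are illustrative assumptions, not the prompt that RAG Engine uses internally.

def llm_relevance_score(query: str, chunk: str, call_llm) -> float:
    # call_llm is a hypothetical helper that sends a prompt to an LLM
    # and returns its text reply; the prompt wording is illustrative.
    prompt = (
        "Rate from 0 to 10 how relevant the passage is to the query.\n"
        f"Query: {query}\nPassage: {chunk}\nAnswer with a number only:"
    )
    return float(call_llm(prompt).strip())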
Vertex AI rank service reranker
The rank service reranker is based on the rank API, which takes a list of documents and reranks them by how relevant they are to a query. Unlike embeddings, which capture only the semantic similarity between a document and a query, the rank API gives you precise scores for how well a document answers a given query.
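For contrast, embedding-based retrieval scores a document by comparing two independently computed vectors, typically with cosine similarity, as in the sketch below; the rank API instead scores the query and document jointly, which is what yields the more precise relevance scores.

import math

def cosine_similarity(query_vec: list[float], doc_vec: list[float]) -> float:
    # Embedding-based scoring: compares two vectors computed independently
    # of each other, so it captures only semantic similarity, not how well
    # the document actually answers the query.
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    norms = math.hypot(*query_vec) * math.hypot(*doc_vec)
    return dot / norms if norms else 0.0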
How to use rerankers
This section presents the prerequisites and code samples for using rerankers.
Prerequisites for using the LLM reranker
The LLM reranker supports only Gemini models, which are accessible when the RAG API is enabled. To view the list of supported models, see Gemini models.
Retrieve relevant contexts using the RAG API
This code sample demonstrates how to retrieve relevant contexts using the RAG API.
REST
Replace the following variables used in the code sample:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
- TEXT: The query text to get relevant contexts.
- MODEL_NAME: The name of the model used for reranking.
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION:retrieveContexts" \
-d '{
"vertex_rag_store": {
"rag_resources": {
"rag_corpus": """RAG_CORPUS_RESOURCE"
}
},
"query": {
"text": "TEXT",
"rag_retrieval_config": {
"top_k": 10,
"ranking": {
"llm_ranker": {
"model_name": "MODEL_NAME"
}
}
}
}
}'
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Replace the following variables used in the code sample:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
- TEXT: The query text to get relevant contexts.
- MODEL_NAME: The name of the model used for reranking.
from vertexai.preview import rag
import vertexai
PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "projects/[PROJECT_ID]/locations/LOCATION/ragCorpora/[RAG_CORPUS_ID]"
MODEL_NAME= "MODEL_NAME"
# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="LOCATION")
rag_retrieval_config = rag.RagRetrievalConfig(
top_k=10,
ranking=rag.Ranking(
llm_ranker=rag.LlmRanker(
model_name=MODEL_NAME
)
)
)
response = rag.retrieval_query(
rag_resources=[
rag.RagResource(
rag_corpus=CORPUS_NAME,
)
],
text="TEXT",
rag_retrieval_config=rag_retrieval_config,
)
print(response)
# Example response:
# contexts {
# contexts {
# source_uri: "gs://your-bucket-name/file.txt"
# text: "....
# ....
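If you want to work with the reranked chunks directly instead of printing the raw response, you can iterate them. This is a minimal sketch based on the response shape shown above; the field names (contexts.contexts, source_uri, text) follow the example output, so verify them against your installed SDK version.

# Iterate the reranked contexts (field names follow the example
# response above; verify against your SDK version).
for ctx in response.contexts.contexts:
    print(ctx.source_uri, ctx.text[:80])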
Generate content using the RAG API
REST
To generate content using Gemini models, make a call to the Vertex AI GenerateContent API. By specifying the RAG_CORPUS_RESOURCE in the request, the model automatically retrieves data from Vertex AI Search.
Replace the following variables used in the sample code:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- MODEL_ID: The LLM model for content generation. For example, gemini-1.5-flash-002.
- GENERATION_METHOD: The LLM method for content generation. Options are generateContent and streamGenerateContent.
- INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
- SIMILARITY_TOP_K: Optional: The number of top contexts to retrieve.
- MODEL_NAME: The name of the model used for reranking.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATION_METHOD" \
-d '{
"contents": {
"role": "user",
"parts": {
"text": "INPUT_PROMPT"
}
},
"tools": {
"retrieval": {
"disable_attribution": false,
"vertex_rag_store": {
"rag_resources": {
"rag_corpus": "RAG_CORPUS_RESOURCE"
},
"rag_retrieval_config": {
"top_k": 10,
"ranking": {
"llm_ranker": {
"model_name": "MODEL_NAME"
}
}
}
}
}
}
}'
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Replace the following variables used in the sample code:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- MODEL_ID: LLM model for content generation. For
example,
gemini-1.5-flash-002
. - GENERATION_METHOD: LLM method for content generation.
Options are
generateContent
andstreamGenerateContent
. - INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus
resource.
Format:projects/{project}/locations/{location}/ragCorpora/{rag_corpus}
. - SIMILARITY_TOP_K: Optional: The number of top contexts to retrieve.
- MODEL_NAME: The name of the model used for reranking.
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool
import vertexai
PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "projects/{PROJECT_ID}/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"
MODEL_NAME= "MODEL_NAME"
# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="LOCATION")
config = rag.RagRetrievalConfig(
top_k=10,
ranking=rag.Ranking(
llm_ranker=rag.LlmRanker(
model_name=MODEL_NAME
)
)
)
rag_retrieval_tool = Tool.from_retrieval(
retrieval=rag.Retrieval(
source=rag.VertexRagStore(
rag_resources=[
rag.RagResource(
rag_corpus=CORPUS_NAME,
)
],
rag_retrieval_config=config
),
)
)
rag_model = GenerativeModel(
    # Use the generation model (MODEL_ID), not the reranking model name.
    model_name="MODEL_ID", tools=[rag_retrieval_tool]
)
response = rag_model.generate_content("Why is the sky blue?")
print(response.text)
# Example response:
# The sky appears blue due to a phenomenon called Rayleigh scattering.
# Sunlight, which contains all colors of the rainbow, is scattered
# by the tiny particles in the Earth's atmosphere....
# ...
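The variable lists above include streamGenerateContent as an alternative generation method. In the Vertex AI SDK for Python, the equivalent is to pass stream=True to generate_content, which returns an iterable of partial responses. A minimal sketch, reusing the rag_model from the sample above:

# Streaming variant: stream=True yields partial responses as they arrive.
for chunk in rag_model.generate_content("Why is the sky blue?", stream=True):
    print(chunk.text, end="")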
Prerequisites for using the Vertex AI rank service reranker
To use the Vertex AI rank service reranker, the Discovery Engine API must be enabled in your project, for example by running gcloud services enable discoveryengine.googleapis.com.
Retrieve relevant contexts using the RAG API
After you create your RAG corpus, relevant contexts can be retrieved from Vertex AI Search through the RetrieveContexts API.
These code samples demonstrate how to use the API to retrieve contexts from Vertex AI Search.
REST
Replace the following variables used in the sample code:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process your request.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
- TEXT: The query text to get relevant contexts.
- MODEL_NAME: The name of the model used for reranking.
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION:retrieveContexts" \
-d '{
"vertex_rag_store": {
"rag_resources": {
"rag_corpus": "RAG_CORPUS_RESOURCE"
}
},
"query": {
"text": "TEXT",
"rag_retrieval_config": {
"top_k": 5,
"ranking": {
"rank_service": {
"model_name": "MODEL_NAME"
}
}
}
}
}'
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Replace the following variables used in the sample code:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process your request.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
- TEXT: The query text to get relevant contexts.
- MODEL_NAME: The name of the model used for reranking.
from vertexai.preview import rag
import vertexai
PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "projects/[PROJECT_ID]/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"
MODEL_NAME= "MODEL_NAME"
# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="LOCATION")
rag_retrieval_config = rag.RagRetrievalConfig(
top_k=10,
ranking=rag.Ranking(
rank_service=rag.RankService(
model_name=MODEL_NAME
)
)
)
response = rag.retrieval_query(
rag_resources=[
rag.RagResource(
rag_corpus=CORPUS_NAME,
)
],
text="TEXT",
rag_retrieval_config=rag_retrieval_config,
)
print(response)
# Example response:
# contexts {
# contexts {
# source_uri: "gs://your-bucket-name/file.txt"
# text: "....
# ....
Generate content using the RAG API
REST
To generate content using Gemini models, make a call to the Vertex AI GenerateContent API. By specifying the RAG_CORPUS_RESOURCE in the request, the model automatically retrieves data from Vertex AI Search.
Replace the following variables used in the sample code:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- MODEL_ID: The LLM model for content generation. For example, gemini-1.5-flash-002.
- GENERATION_METHOD: The LLM method for content generation. Options include generateContent and streamGenerateContent.
- INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
- SIMILARITY_TOP_K: Optional: The number of top contexts to retrieve.
- MODEL_NAME: The name of the model used for reranking.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATION_METHOD" \
-d '{
"contents": {
"role": "user",
"parts": {
"text": "INPUT_PROMPT"
}
},
"tools": {
"retrieval": {
"disable_attribution": false,
"vertex_rag_store": {
"rag_resources": {
"rag_corpus": "RAG_CORPUS_RESOURCE"
},
"rag_retrieval_config": {
"top_k": 10,
"ranking": {
"rank_service": {
"model_name": "MODEL_NAME"
}
}
}
}
}
}
}'
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
Replace the following variables used in the sample code:
- PROJECT_ID: The ID of your Google Cloud project.
- LOCATION: The region to process the request.
- MODEL_ID: The LLM model for content generation. For example, gemini-1.5-flash-002.
- GENERATION_METHOD: The LLM method for content generation. Options include generateContent and streamGenerateContent.
- INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
- RAG_CORPUS_RESOURCE: The name of the RAG corpus resource. Format: projects/{project}/locations/{location}/ragCorpora/{rag_corpus}.
- SIMILARITY_TOP_K: Optional: The number of top contexts to retrieve.
- MODEL_NAME: The name of the model used for reranking.
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool
import vertexai
PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = "projects/{PROJECT_ID}/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"
# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="LOCATION")
config = rag.RagRetrievalConfig(
top_k=10,
ranking=rag.Ranking(
rank_service=rag.RankService(
model_name=MODEL_NAME
)
)
)
rag_retrieval_tool = Tool.from_retrieval(
retrieval=rag.Retrieval(
source=rag.VertexRagStore(
rag_resources=[
rag.RagResource(
rag_corpus=CORPUS_NAME,
)
],
rag_retrieval_config=config
),
)
)
rag_model = GenerativeModel(
    # Use the generation model (MODEL_ID), not the reranking model name.
    model_name="MODEL_ID", tools=[rag_retrieval_tool]
)
response = rag_model.generate_content("INPUT_PROMPT")
print(response.text)
# Example response:
# The sky appears blue due to a phenomenon called Rayleigh scattering.
# Sunlight, which contains all colors of the rainbow, is scattered
# by the tiny particles in the Earth's atmosphere....
# ...
What's next
- To learn more about ranking models, see Rank service models.
- For information about pricing, see Vertex AI pricing.
- For information about limitations, see Request quotas.
- To learn more about choosing embedding models, see Use embedding models with RAG Engine.
- To learn more about RAG Engine, see Overview of RAG Engine.