As part of your search experience with Agentspace Enterprise, you can evaluate the quality of your search results for generic search apps using sample query sets.
You can evaluate the performance of generic search apps that contain structured and unstructured data.
You can't evaluate the performance of apps with multiple data stores.
This page explains why, when, and how to evaluate search quality using the evaluation API.
Overview
This section describes why and when to perform search quality evaluation. For information on how to perform search quality evaluation, see Process for evaluating search quality.
Reasons to perform evaluation
Assessing your search quality provides metrics that help you perform tasks such as the following:
- At an aggregate level, gauge your search engine's performance
- At a query level, identify patterns that reveal potential biases or shortcomings in the ranking algorithms
- Compare historical evaluation results to understand the impact of changes in your search configuration
For a list of metrics, see Understand the results.
When to perform evaluation
Agentspace Enterprise offers several search configurations to enhance your search experience. You can perform search quality evaluation after you make the following changes:
- Configure serving controls for search
- Tune your search results
- Use custom embeddings
- Filter search results
- Boost search results
You can also run evaluations regularly, because search behavior is updated periodically.
About sample query sets
Sample query sets are used for quality evaluation. The sample query set must adhere to the prescribed format, and it must contain query entries that have the following nested fields:
- Queries: the query whose search results are used to generate the evaluation metrics and determine the search quality. Google recommends using a diverse set of queries that reflect your users' search patterns and behavior.
- Targets: the URI of the document that's expected as a search result for the sample query. To understand the definition of a document for structured and unstructured data, see Documents.
Performance metrics are generated by comparing the target documents with the documents retrieved in the search response. Metrics are generated using these two techniques:
- Document matching: the URIs of the target documents are compared with the URIs of the retrieved documents. This determines whether the expected documents are present in the search results. During the comparison, the evaluation API tries to extract the following fields in the following order, and uses the first available value to match the target with the retrieved document (see the sketch after this list):
  - cdoc_url in the structData field of the document definition
  - uri in the structData field of the document definition
  - link in the derivedStructData field of the document definition
  - url in the derivedStructData field of the document definition
- Page matching: when you include page numbers in your sample targets, the evaluation API compares the results at a page level. This determines whether the pages mentioned in the targets are also cited in the search response. You must enable extractive answers for page-level matching. The evaluation API matches the page from the first extractive answer in the search result.
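To make the priority order for document matching concrete, here's a minimal jq sketch that picks the first available identifier from a single document definition. The file name document.json is a hypothetical stand-in for one document's JSON; it isn't something the evaluation API requires.

# Hypothetical helper: extract the matching identifier in the same priority
# order the evaluation API uses (cdoc_url, uri, link, url).
jq -r '.structData.cdoc_url // .structData.uri // .derivedStructData.link // .derivedStructData.url' document.json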
Purpose of sample query sets
Using the same sample query set for all search quality evaluations of a given data store provides a consistent and reliable way to measure search quality. It also establishes a fair and repeatable system of comparison.
The results from each evaluation are compared to target results for each sample query to calculate different metrics, such as recall, precision, and normalized discounted cumulative gain (NDCG). These quantitative metrics are used to rank the results from different search configurations.
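As a rough illustration of these metrics (assuming the standard binary-relevance definitions; the evaluation API reports the exact values it computes), suppose a sample query has two target documents and only one of them appears in the top three results, at position 2. Then:

$$\text{recall@3} = \frac{1}{2} = 0.5, \qquad \text{precision@3} = \frac{1}{3} \approx 0.33, \qquad \text{NDCG@3} = \frac{1/\log_2 3}{1/\log_2 2 + 1/\log_2 3} \approx 0.39$$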
Quotas and limits
The following limit applies to the sample query sets:
- Each sample query set can contain a maximum of 20,000 queries.
The following quota applies to the sample query sets:
- You can create a maximum of 100 sample query sets per project and 500 sample query sets per organization.
For more information, see Quotas and limits.
Sample query set format
The query set must conform to the following schema when constructed in JSON format. The query set can contain multiple query entries with one query in each query entry. When presented in newline delimited JSON (NDJSON) format, each query entry must be on a new line.
Import from BigQuery and Cloud Storage
The following section provides the sample query set templates for importing from BigQuery and Cloud Storage.
Unstructured data
Use the following template to draft a sample query file in JSON format to evaluate unstructured data with metadata.
{
"queryEntry": {
"query": "SAMPLE_QUERY",
"targets": [
{
"uri": "gs://PATH/TO/CLOUD/STORAGE/LOCATION_1.docx"
},
{
"uri": "gs://PATH/TO/CLOUD/STORAGE/LOCATION_2.pdf",
"pageNumbers": [
PAGE_NUMBER_1,
PAGE_NUMBER_2
]
},
{
"uri": "CDOC_URL"
}
]
}
}
Replace the following:
- SAMPLE_QUERY: the query used to evaluate the search quality.
- PATH/TO/CLOUD/STORAGE/LOCATION: the path to the Cloud Storage location where the expected result resides. This is the value of the link field in the derivedStructData field of the document definition.
- PAGE_NUMBER_1: an optional field that indicates the page numbers in the PDF file where the expected response for the query is located. This is useful when the file has multiple pages.
- CDOC_URL: an optional field that indicates the custom document ID in the cdoc_url field in the document metadata in the Agentspace Enterprise data store schema.
Structured data
Use the following template to draft a sample query file in JSON format to evaluate structured data from BigQuery.
{
"queryEntry": {
"query": "SAMPLE_QUERY",
"targets": [
{
"uri": "CDOC_URL"
}
]
}
}
Replace the following:
- SAMPLE_QUERY: the query used to evaluate the search quality.
- CDOC_URL: a required field that indicates the custom cdoc_url field for the structured data in the Agentspace Enterprise data store schema.
Here's an example of a sample query set in JSON and NDJSON formats:
JSON
[
{
"queryEntry": {
"query": "2018 Q4 Google revenue",
"targets": [
{
"uri": "gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/2018Q4_alphabet_earnings_release.pdf"
},
{
"uri": "gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/201802024_alphabet_10K.pdf"
}
]
}
},
{
"queryEntry": {
"query": "2019 Q4 Google revenue",
"targets": [
{
"uri": "gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/2019Q4_alphabet_earnings_release.pdf"
}
]
}
}
]
NDJSON
{"queryEntry":{"query":"2018 Q4 Google revenue","targets":[{"uri":"gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/2018Q4_alphabet_earnings_release.pdf"},{"uri":"gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/201802024_alphabet_10K.pdf"}]}}
{"queryEntry":{"query":"2019 Q4 Google revenue","targets":[{"uri":"gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/2019Q4_alphabet_earnings_release.pdf"}]}}
Import from local file system
The following section provides the sample query set templates for importing from the local file system.
Unstructured data
Use the following template to draft a sample query file in JSON format to evaluate unstructured data with metadata.
{
"inlineSource": {
"sampleQueries": [
{
"queryEntry": {
"query": "SAMPLE_QUERY",
"targets": [
{
"uri": "gs://PATH/TO/CLOUD/STORAGE/LOCATION_1.docx"
},
{
"uri": "gs://PATH/TO/CLOUD/STORAGE/LOCATION_2.pdf",
"pageNumbers": [
PAGE_NUMBER_1,
PAGE_NUMBER_2
]
},
{
"uri": "CDOC_URL"
}
]
}
}
]
}
}
Replace the following:
- SAMPLE_QUERY: the query used to evaluate the search quality.
- PATH/TO/CLOUD/STORAGE/LOCATION: the path to the Cloud Storage location where the unstructured data file to be queried resides. This is the value of the link field in the derivedStructData field of the document definition.
- PAGE_NUMBER_1: an optional field that indicates the page numbers in the PDF file where the expected response for the query is located. This is useful if the file has multiple pages.
- CDOC_URL: an optional field that indicates the custom document ID in the cdoc_url field in the document metadata in the Agentspace Enterprise data store schema.
Structured data
Use the following template to draft a sample query file in JSON format to evaluate structured data from BigQuery.
{
"inlineSource": {
"sampleQueries": [
{
"queryEntry": {
"query": "SAMPLE_QUERY",
"targets": [
{
"uri": "CDOC_URL"
}
]
}
}
]
}
}
Replace the following:
- SAMPLE_QUERY: the query used to evaluate the search quality.
- CDOC_URL: a required field that indicates the custom cdoc_url field for the structured data in the Agentspace Enterprise data store schema.
Here's an example of a sample query set:
JSON
{
"inlineSource": {
"sampleQueries": [
{
"queryEntry": {
"query": "2018 Q4 Google revenue",
"targets": [
{
"uri": "gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/2018Q4_alphabet_earnings_release.pdf"
},
{
"uri": "gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/201802024_alphabet_10K.pdf"
}
]
}
},
{
"queryEntry": {
"query": "2019 Q4 Google revenue",
"targets": [
{
"uri": "gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/2019Q4_alphabet_earnings_release.pdf"
}
]
}
}
]
}
}
Process for evaluating search quality
The process of search quality evaluation is as follows:
- Create a sample query set.
- Import sample query data that conforms to the prescribed JSON format.
- Run search quality evaluation.
- Understand the results.
The following sections give the instructions to perform these steps using REST API methods.
Before you begin
- The following limit applies:
- At any given time, you can have only one active evaluation per project.
- The following quota applies:
- You can initiate a maximum of five evaluation requests per day per project. For more information, see Quotas and limits.
- To get page-level metrics, you must enable extractive answers.
Create a sample query set
You can create a sample query set and use it to evaluate the quality of the search responses for a given data store. To create a sample query set, do the following.
REST
The following sample shows how to create the sample query set using the sampleQuerySets.create method.
Create the sample query set.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/sampleQuerySets?sampleQuerySetId=SAMPLE_QUERY_SET_ID" \
  -d '{
    "displayName": "SAMPLE_QUERY_SET_DISPLAY_NAME"
  }'
Replace the following:
- PROJECT_ID: the ID of your project.
- SAMPLE_QUERY_SET_ID: a custom ID for your sample query set.
- SAMPLE_QUERY_SET_DISPLAY_NAME: a custom name for your sample query set.
Import sample query data
After creating the sample query set, import the sample query data. To import the sample query data, you can do any of the following:
- Import from Cloud Storage: import an NDJSON file from a Cloud Storage location.
- Import from BigQuery: import BigQuery data from a BigQuery table. To create the BigQuery table from your NDJSON file, see Loading JSON data from Cloud Storage; a minimal loading sketch also follows this list.
- Import from your local file system: create the sample query set in your local file system and import it.
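For the BigQuery option, one possible way to load the NDJSON file into a table is with the bq command-line tool, as in the following sketch. The bucket, dataset, and table names are placeholders, and --autodetect is only one way to supply the schema.

bq load \
  --source_format=NEWLINE_DELIMITED_JSON \
  --autodetect \
  DATASET_ID.TABLE_ID \
  gs://BUCKET_NAME/PATH/TO/sample_queries.ndjson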
Cloud Storage
Create the sample query sets that conform to the sample query set format.
Import the file containing the sample queries from a Cloud Storage location using the sampleQueries.import method.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/sampleQuerySets/SAMPLE_QUERY_SET_ID/sampleQueries:import" \
  -d '{
    "gcsSource": {
      "inputUris": ["INPUT_FILE_PATH"]
    },
    "errorConfig": {
      "gcsPrefix": "ERROR_DIRECTORY"
    }
  }'
Replace the following:
- PROJECT_ID: the ID of your project.
- SAMPLE_QUERY_SET_ID: the custom ID for your sample query set that you defined during sample query set creation.
- INPUT_FILE_PATH: the path to the Cloud Storage location of your sample query set.
- ERROR_DIRECTORY: an optional field that specifies the path to the Cloud Storage location where error files are logged when import errors occur. Google recommends leaving this empty or removing the errorConfig field so that Agentspace Enterprise can automatically create a temporary location.
Get the status of the long-running operation (LRO) using the operations.get method.
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_NUMBER/locations/global/sampleQuerySets/SAMPLE_QUERY_SET_ID/operations/OPERATION_ID"
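If you want to wait for the import from a script instead of re-running the call manually, a simple polling loop over the same operations.get request could look like the following sketch. It assumes jq is installed and relies on the standard done field that long-running operations expose.

# Poll the import operation every 30 seconds until it completes.
OPERATION_URL="https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_NUMBER/locations/global/sampleQuerySets/SAMPLE_QUERY_SET_ID/operations/OPERATION_ID"
until curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" "$OPERATION_URL" | jq -e '.done == true' > /dev/null; do
  sleep 30
done
echo "Import operation finished."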
BigQuery
Create the sample query sets that conform to the sample query set format.
Import the sample query data from a BigQuery table using the sampleQueries.import method.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/sampleQuerySets/SAMPLE_QUERY_SET_ID/sampleQueries:import" \
  -d '{
    "bigquerySource": {
      "projectId": "PROJECT_ID",
      "datasetId": "DATASET_ID",
      "tableId": "TABLE_ID"
    },
    "errorConfig": {
      "gcsPrefix": "ERROR_DIRECTORY"
    }
  }'
Replace the following:
- PROJECT_ID: the ID of your project.
- SAMPLE_QUERY_SET_ID: the custom ID for your sample query set that you defined during sample query set creation.
- DATASET_ID: the ID of the BigQuery dataset that contains the sample query set.
- TABLE_ID: the ID of the BigQuery table that contains the sample query set.
- ERROR_DIRECTORY: an optional field that specifies the path to the Cloud Storage location where error files are logged when import errors occur. Google recommends leaving this empty or removing the errorConfig field so that Agentspace Enterprise can automatically create a temporary location.
Get the status of the long-running operation (LRO) using the operations.get method.
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_NUMBER/locations/global/sampleQuerySets/SAMPLE_QUERY_SET_ID/operations/OPERATION_ID"
Local file system
Create the sample query sets that conform to the sample query set format.
Import the JSON file containing the sample queries from your local file system using the sampleQueries.import method.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/sampleQuerySets/SAMPLE_QUERY_SET_ID/sampleQueries:import" \
  --data @PATH/TO/LOCAL/FILE.json
Replace the following:
- PROJECT_ID: the ID of your project.
- SAMPLE_QUERY_SET_ID: the custom ID for your sample query set that you defined during sample query set creation.
- PATH/TO/LOCAL/FILE.json: the path to the JSON file that contains the sample query set.
Get the status of the long-running operation (LRO) using the operations.get method.
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_NUMBER/locations/global/sampleQuerySets/SAMPLE_QUERY_SET_ID/operations/OPERATION_ID"
Run search quality evaluation
After importing the sample query data into the sample query sets, follow these steps to run the search quality evaluation.
REST
Initiate a search quality evaluation.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/evaluations" \
  -d '{
    "evaluationSpec": {
      "querySetSpec": {
        "sampleQuerySet": "projects/PROJECT_ID/locations/global/sampleQuerySets/SAMPLE_QUERY_SET_ID"
      },
      "searchRequest": {
        "servingConfig": "projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search"
      }
    }
  }'
Replace the following:
- PROJECT_ID: the ID of your project.
- SAMPLE_QUERY_SET_ID: the custom ID for your sample query set that you defined during sample query set creation.
- APP_ID: the ID of the Agentspace Enterprise app whose search quality you want to evaluate.
Monitor progress of the evaluation.
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/evaluations/EVALUATION_ID"
Replace the following:
- PROJECT_ID: the ID of your project.
- EVALUATION_ID: the ID of your evaluation job, returned when you initiated the evaluation in the previous step.
Retrieve the aggregate results.
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/evaluations/EVALUATION_ID"
Replace the following:
- PROJECT_ID: the ID of your project.
- EVALUATION_ID: the ID of your evaluation job, returned when you initiated the evaluation.
Retrieve query-level results.
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/evaluations/EVALUATION_ID:listResults"
Replace the following:
- PROJECT_ID: the ID of your project.
- EVALUATION_ID: the ID of your evaluation job, returned when you initiated the evaluation.
Understand the results
The following table describes the metrics that are returned in your evaluation results.
Name | Description | Requirements
---|---|---
docRecall | Recall per document, at various top-k cutoff levels. Recall is the fraction of relevant documents retrieved out of all relevant documents. For example, for a single query, if 3 out of 5 relevant documents are retrieved in the top 5, the recall value at top-5 is 3/5, or 0.6. | The sample query must contain the URI field.
pageRecall | Recall per page, at various top-k cutoff levels. Recall is the fraction of relevant pages retrieved out of all relevant pages. For example, for a single query, if 3 out of 5 relevant pages are retrieved in the top 5, the recall value at top-5 is 3/5, or 0.6. | The sample query must contain the URI field with page numbers, and extractive answers must be enabled.
docNdcg | Normalized discounted cumulative gain (NDCG) per document, at various top-k cutoff levels. NDCG measures ranking quality, giving higher weight to results at the top. The NDCG value is calculated for each query using the normalized DCG formula. | The sample query must contain the URI field.
pageNdcg | Normalized discounted cumulative gain (NDCG) per page, at various top-k cutoff levels. NDCG measures ranking quality, giving higher weight to results at the top. The NDCG value is calculated for each query using the normalized DCG formula. | The sample query must contain the URI field with page numbers, and extractive answers must be enabled.
docPrecision | Precision per document, at various top-k cutoff levels. Precision is the fraction of retrieved documents that are relevant. For example, for a single query, if 4 out of 5 documents retrieved in the top 5 are relevant, the precision value at top-5 is 4/5, or 0.8. | The sample query must contain the URI field.
Based on the values of these supported metrics, you can perform the following tasks:
- Analyze aggregated metrics:
- Examine overall metrics like average recall, precision, and normalized discounted cumulative gain (NDCG).
- These metrics provide a high-level view of your search engine's performance.
- Review query-level results:
- Drill down into individual queries to identify specific areas where the search engine performs well or poorly.
- Look for patterns in the results to understand potential biases or shortcomings in the ranking algorithms.
- Compare results over time:
- Run evaluations regularly to track changes in search quality over time.
- Use historical data to identify trends and assess the impact of any changes you make to your search engine.
What's next
- Use Cloud Scheduler to set up scheduled quality evaluation. For more information, see Use authentication with HTTP targets.
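As a rough sketch of what a scheduled evaluation could look like, the following gcloud command creates a Cloud Scheduler HTTP job that starts an evaluation every Monday morning. The job name, region, schedule, and service account are placeholders, and the service account must be authorized to call the API as described in Use authentication with HTTP targets.

gcloud scheduler jobs create http weekly-search-quality-eval \
  --location=us-central1 \
  --schedule="0 6 * * 1" \
  --http-method=POST \
  --uri="https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/evaluations" \
  --headers="Content-Type=application/json" \
  --oauth-service-account-email=SERVICE_ACCOUNT_EMAIL \
  --message-body='{"evaluationSpec":{"querySetSpec":{"sampleQuerySet":"projects/PROJECT_ID/locations/global/sampleQuerySets/SAMPLE_QUERY_SET_ID"},"searchRequest":{"servingConfig":"projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search"}}}'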