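This page describes how to find out whether the metrics for your media data meet their required thresholds.
About checking media data quality
Because recent user events are critical to media recommendations, you must regularly check the quality of your ingested data and user events. To do so, review the Optimization tab of your media recommendations app and determine which improvements to your data would improve the quality of your recommendations.
If a metric doesn't meet its threshold, the status of that metric is Warning. Review the metric and its description to determine what action to take to improve the quality of your media data.
All models and objectives must pass the general quality metric thresholds. Some models and objectives have additional app-specific quality metrics and thresholds. The general quality metrics are the same for all apps that use the same data store, but the app-specific quality metrics vary with the model and objective of the app.
For information about recommendations models and objectives, see About media app recommendations types.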
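Check data quality
To check the quality of your media recommendations data, follow these steps:
1. In the Google Cloud console, go to the Agent Builder page.
2. Click the name of the media recommendations app whose data quality you want to check.
3. In the navigation menu, click Data quality, and then click the Optimization tab. This page shows the status of each metric for the data associated with your app.
4. Review the general quality and app-specific quality statuses at the top of the page. If one or more metrics don't meet their thresholds, the summary status at the top of the page is Warning.
   The two metrics tables, General quality and App-specific quality, list the individual metrics.
5. In a metrics table, click View details to learn more about any metric that is in the Warning state.
6. Optional: To see the threshold for a metric that meets its requirement, click View details. Thresholds for passing metrics are not shown in the metrics tables.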
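Use the requirements:checkRequirement method to check the quality of your media recommendations data, as follows.
To check quality from the command line, follow these steps:
1. Find your data store ID. If you already have your data store ID, skip to the next step.
   a. In the Google Cloud console, go to the Agent Builder page and, in the navigation menu, click Data Stores.
   b. Click the name of your data store.
   c. On the Data page for your data store, get the data store ID.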
2. Run the following curl command to find out whether your media recommendations data meets the thresholds for the general metrics:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-GFE-SSL: yes" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/requirements:checkRequirement" \
  -d '{
    "location": "projects/PROJECT_ID/locations/global",
    "requirementType": "discoveryengine.googleapis.com/media_recs/general/all/warning",
    "resources": [
      {
        "labels": {
          "branch_id": "0",
          "collection_id": "default_collection",
          "datastore_id": "DATA_STORE_ID",
          "location_id": "global",
          "project_number": "PROJECT_ID"
        },
        "type": "discoveryengine.googleapis.com/Branch"
      },
      {
        "labels": {
          "collection_id": "default_collection",
          "datastore_id": "DATA_STORE_ID",
          "location_id": "global",
          "project_number": "PROJECT_ID"
        },
        "type": "discoveryengine.googleapis.com/DataStore"
      }
    ]
  }'

Replace the following:
PROJECT_ID: the ID of your Google Cloud project.
DATA_STORE_ID: the ID of your Vertex AI Search data store.
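The request body in the preceding command can also be assembled programmatically, which avoids editing the same placeholder in several places. The following is a minimal Python sketch under stated assumptions, not official tooling: the function name build_check_requirement_body and its requirement_suffix parameter are hypothetical, while the field names and fixed values are copied from the curl command.

```python
import json


def build_check_requirement_body(project_id, data_store_id,
                                 requirement_suffix="general/all"):
    """Return the JSON body for a requirements:checkRequirement call.

    requirement_suffix is a hypothetical parameter; "general/all" yields the
    general requirement type used in the curl command.
    """
    # Labels shared by the Branch and DataStore resources in the request.
    labels = {
        "collection_id": "default_collection",
        "datastore_id": data_store_id,
        "location_id": "global",
        "project_number": project_id,
    }
    return {
        "location": f"projects/{project_id}/locations/global",
        "requirementType": (
            "discoveryengine.googleapis.com/media_recs/"
            f"{requirement_suffix}/warning"
        ),
        "resources": [
            {
                # Only the Branch resource carries a branch_id label.
                "labels": {"branch_id": "0", **labels},
                "type": "discoveryengine.googleapis.com/Branch",
            },
            {
                "labels": dict(labels),
                "type": "discoveryengine.googleapis.com/DataStore",
            },
        ],
    }


if __name__ == "__main__":
    body = build_check_requirement_body("my-project-123", "my-data-store")
    # The printed JSON can be passed to curl with the -d option.
    print(json.dumps(body, indent=2))
```

The sketch only builds the payload; sending it still requires the headers and URL shown in the curl command.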
Example command and result

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-GFE-SSL: yes" \
  -H "X-Goog-User-Project: my-project-123" \
  "https://discoveryengine.googleapis.com/v1alpha/projects/my-project-123/locations/global/requirements:checkRequirement" \
  -d '{
    "location": "projects/123456/locations/global",
    "requirementType": "discoveryengine.googleapis.com/media_recs/general/all/warning",
    "resources": [
      {
        "labels": {
          "branch_id": "0",
          "collection_id": "default_collection",
          "datastore_id": "my-data-store",
          "location_id": "global",
          "project_number": "123456"
        },
        "type": "discoveryengine.googleapis.com/Branch"
      },
      {
        "labels": {
          "collection_id": "default_collection",
          "datastore_id": "my-data-store",
          "location_id": "global",
          "project_number": "123456"
        },
        "type": "discoveryengine.googleapis.com/DataStore"
      }
    ]
  }'

{
  "requirement": {
    "type": "discoveryengine.googleapis.com/media_recs/general/all/warning",
    "displayName": "Warning level requirements for all models and all business objectives.",
    "description": "Requirements for the media recommendations model that will result in performance issue if not met for all media recommendations models and all business objectives.",
    "condition": {
      "expression": "doc_with_same_title_percentage \u003c doc_with_same_title_percentage_threshold && most_common_visitor_id_percentage \u003c most_common_visitor_id_percentage_threshold && short_term_unjoined_events_percentage \u003c short_term_unjoined_events_percentage_threshold && long_term_unjoined_events_percentage \u003c long_term_unjoined_events_percentage_threshold"
    },
    "metricBindings": [
      {
        "variableId": "doc_with_same_title_percentage",
        "resourceType": "discoveryengine.googleapis.com/Branch",
        "metricFilter": "metric.type = 'discoveryengine.googleapis.com/branch/documents/items_with_same_title' AND metric.labels.is_percentage = 'True' AND resource.labels.project_number = '123456' AND resource.labels.branch_id = '0' AND resource.labels.datastore_id = 'my-data-store' AND resource.labels.location_id = 'global' AND resource.labels.collection_id = 'default_collection'",
        "description": "The percentage of the documents with the same title in a branch.",
        "category": "Document"
      },
      {
        "variableId": "most_common_visitor_id_percentage",
        "resourceType": "discoveryengine.googleapis.com/DataStore",
        "metricFilter": "metric.type = 'discoveryengine.googleapis.com/branch/datastore/user_events/most_used_visitor_id_events' AND metric.labels.is_percentage = 'True' AND resource.labels.datastore_id = 'my-data-store' AND resource.labels.project_number = '123456' AND resource.labels.location_id = 'global' AND resource.labels.collection_id = 'default_collection'",
        "description": "The percentage of the events with the same visitor id.",
        "category": "DataStore"
      },
      {
        "variableId": "short_term_unjoined_events_percentage",
        "resourceType": "discoveryengine.googleapis.com/DataStore",
        "metricFilter": "metric.type = 'discoveryengine.googleapis.com/datastore/user_events/unjoined_events_for_document_ids' AND metric.labels.is_percentage = 'True' AND metric.conditions.time_range = 'WEEK' AND resource.labels.datastore_id = 'my-data-store' AND resource.labels.project_number = '123456' AND resource.labels.location_id = 'global' AND resource.labels.collection_id = 'default_collection'",
        "description": "The percentage of events refers to a document id that is not in the catalog in the last 7 days.",
        "category": "DataStore"
      },
      {
        "variableId": "long_term_unjoined_events_percentage",
        "resourceType": "discoveryengine.googleapis.com/DataStore",
        "metricFilter": "metric.type = 'discoveryengine.googleapis.com/datastore/user_events/unjoined_events_for_document_ids' AND metric.labels.is_percentage = 'True' AND metric.conditions.time_range = 'NINETY_DAYS' AND resource.labels.datastore_id = 'my-data-store' AND resource.labels.project_number = '123456' AND resource.labels.location_id = 'global' AND resource.labels.collection_id = 'default_collection'",
        "description": "The percentage of events refers to a document id that is not in the catalog in the last 90 days.",
        "category": "DataStore"
      }
    ],
    "thresholdBindings": [
      {
        "variableId": "doc_with_same_title_percentage_threshold",
        "threshold_values": {
          "severity": "WARNING",
          "value": 1.0
        },
        "description": "The threshold for the percentage of the documents with the same title in a branch."
      },
      {
        "variableId": "most_common_visitor_id_percentage_threshold",
        "threshold_values": {
          "severity": "WARNING",
          "value": 5.0
        },
        "description": "The threshold for the percentage of the events with the same visitor id."
      },
      {
        "variableId": "short_term_unjoined_events_percentage_threshold",
        "threshold_values": {
          "severity": "WARNING",
          "value": 5.0
        },
        "description": "The threshold for the percentage of the events refers to a document id that is not in the catalog in the last 7 days."
      },
      {
        "variableId": "long_term_unjoined_events_percentage_threshold",
        "threshold_values": {
          "severity": "WARNING",
          "value": 2.0
        },
        "description": "The threshold for the percentage of the events refers to a document id that is not in the catalog in the last 90 days"
      }
    ]
  },
  "result": "WARNING",
  "requirementCondition": {
    "expression": "doc_with_same_title_percentage \u003c doc_with_same_title_percentage_threshold && most_common_visitor_id_percentage \u003c most_common_visitor_id_percentage_threshold && short_term_unjoined_events_percentage \u003c short_term_unjoined_events_percentage_threshold && long_term_unjoined_events_percentage \u003c long_term_unjoined_events_percentage_threshold"
  },
  "metricResults": [
    {
      "name": "short_term_unjoined_events_percentage",
      "value": {
        "doubleValue": 0
      },
      "timestamp": "2024-06-06T03:03:13.416900898Z",
      "unit": "%",
      "metricType": "discoveryengine.googleapis.com/datastore/user_events/unjoined_events_for_document_ids"
    },
    {
      "name": "long_term_unjoined_events_percentage",
      "value": {
        "doubleValue": 0
      },
      "timestamp": "2024-06-06T03:03:13.417962744Z",
      "unit": "%",
      "metricType": "discoveryengine.googleapis.com/datastore/user_events/unjoined_events_for_document_ids"
    },
    {
      "name": "most_common_visitor_id_percentage",
      "value": {
        "doubleValue": 0.8
      },
      "timestamp": "2024-06-06T03:03:16.090037135Z",
      "unit": "%",
      "metricType": "discoveryengine.googleapis.com/datastore/user_events/most_used_visitor_id_events"
    },
    {
      "name": "doc_with_same_title_percentage",
      "value": {
        "doubleValue": 30.47
      },
      "timestamp": "2024-06-06T03:03:17.599458357Z",
      "unit": "%",
      "metricType": "discoveryengine.googleapis.com/documents/items_with_same_title"
    }
  ],
  "oldestMetricTimestamp": "2024-06-06T03:03:13.416900898Z"
}

3. Review the output:
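   a. Look for the value of result:
      If the value is SUCCESS, your data meets the general requirements. Continue to step 4.
      If the value is WARNING, continue to step b.
      If you don't see result in the output, there are several possible causes: the PROJECT_ID or DATA_STORE_ID in the request is incorrect, or some metric values are not available. Try again after 6 hours, or contact your customer engineer for help.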
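   b. Look at the expression (requirement.condition.expression). If the expression evaluates to false, there is a problem with your data. The metric values are in the metricResults.value fields, and the warning thresholds are in the requirement.thresholdBindings.thresholdValues fields. The description fields help you understand what each metric is for.
      For example, doc_with_same_title_percentage has a value of 30.47, but the warning threshold doc_with_same_title_percentage_threshold is 1. Because 30.47 is not less than 1, the expression is false: many documents in the data store share the same title, which is a data issue that you need to investigate.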
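The comparison described in this step can be automated once the response is parsed. The following Python sketch is illustrative only and rests on assumptions drawn from the sample output: the expression is a chain of `metric < threshold` or `metric <= threshold` clauses joined by `&&`, metric values arrive as doubleValue, and thresholds appear under either threshold_values or thresholdValues. The helper names warning_thresholds and failing_clauses are hypothetical, and SAMPLE_RESPONSE is a trimmed copy of the sample output.

```python
import re

# One clause of the expression, for example "a < a_threshold".
CLAUSE = re.compile(r"^\s*(\w+)\s*(<=|<)\s*(\w+)\s*$")


def warning_thresholds(response):
    """Map each threshold variableId to its WARNING value."""
    thresholds = {}
    for binding in response["requirement"]["thresholdBindings"]:
        # Assumption: the sample outputs show both spellings and both shapes.
        values = binding.get("thresholdValues",
                             binding.get("threshold_values", []))
        if isinstance(values, dict):
            values = [values]
        for entry in values:
            if entry.get("severity") == "WARNING":
                thresholds[binding["variableId"]] = float(entry["value"])
    return thresholds


def failing_clauses(response):
    """Return (metric, value, threshold) for every clause that is false."""
    metrics = {m["name"]: float(m["value"]["doubleValue"])
               for m in response.get("metricResults", [])}
    thresholds = warning_thresholds(response)
    expression = response["requirement"]["condition"]["expression"]
    failures = []
    for clause in expression.split("&&"):
        match = CLAUSE.match(clause)
        if not match:
            continue  # Skip clause shapes this sketch does not understand.
        metric, op, threshold = match.groups()
        if metric not in metrics or threshold not in thresholds:
            continue  # Value or threshold missing from the response.
        value, limit = metrics[metric], thresholds[threshold]
        passed = value < limit if op == "<" else value <= limit
        if not passed:
            failures.append((metric, value, limit))
    return failures


SAMPLE_RESPONSE = {
    "requirement": {
        "condition": {"expression": (
            "doc_with_same_title_percentage < "
            "doc_with_same_title_percentage_threshold && "
            "most_common_visitor_id_percentage < "
            "most_common_visitor_id_percentage_threshold")},
        "thresholdBindings": [
            {"variableId": "doc_with_same_title_percentage_threshold",
             "threshold_values": {"severity": "WARNING", "value": 1.0}},
            {"variableId": "most_common_visitor_id_percentage_threshold",
             "threshold_values": {"severity": "WARNING", "value": 5.0}},
        ],
    },
    "result": "WARNING",
    "metricResults": [
        {"name": "most_common_visitor_id_percentage",
         "value": {"doubleValue": 0.8}},
        {"name": "doc_with_same_title_percentage",
         "value": {"doubleValue": 30.47}},
    ],
}

if __name__ == "__main__":
    # prints [('doc_with_same_title_percentage', 30.47, 1.0)]
    print(failing_clauses(SAMPLE_RESPONSE))
```

When the response is loaded with json.loads, the `\u003c` escapes in the expression are decoded to `<`, so the same clause pattern applies.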
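4. If the model and objective combination that your recommendations app uses appears in the following table, you must also call the checkRequirement method again, using the MODEL_OBJ value for that model and objective:

Model                 Objective                    MODEL_OBJ
Others you may like   Conversion rate              oyml/cvr
Recommended for you   Conversion rate              rfy/cvr
More like this        Conversion rate              mlt/cvr
Most popular          Conversion rate              mp/cvr
Others you may like   Watch duration per session   oyml/wdps
Recommended for you   Watch duration per session   rfy/wdps
More like this        Watch duration per session   mlt/wdps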

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-GFE-SSL: yes" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/requirements:checkRequirement" \
  -d '{
    "location": "projects/PROJECT_ID/locations/global",
    "requirementType": "discoveryengine.googleapis.com/media_recs/MODEL_OBJ/warning",
    "resources": [
      {
        "labels": {
          "branch_id": "0",
          "collection_id": "default_collection",
          "datastore_id": "DATA_STORE_ID",
          "location_id": "global",
          "project_number": "PROJECT_ID"
        },
        "type": "discoveryengine.googleapis.com/Branch"
      },
      {
        "labels": {
          "collection_id": "default_collection",
          "datastore_id": "DATA_STORE_ID",
          "location_id": "global",
          "project_number": "PROJECT_ID"
        },
        "type": "discoveryengine.googleapis.com/DataStore"
      }
    ]
  }'

Replace the following:
PROJECT_ID: the ID of your Google Cloud project.
DATA_STORE_ID: the ID of your Vertex AI Search data store.
MODEL_OBJ: the value for your recommendations app, taken from the preceding table.
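Choosing MODEL_OBJ is a simple lookup, so it can be expressed as a mapping. The following Python sketch is an illustration, not an official helper: the dictionary contents come from the table in this step, while the names MODEL_OBJ_VALUES and app_requirement_type are hypothetical. It returns None for combinations that are not in the table, on the assumption that only the general check applies to them.

```python
# Model and objective combinations that have an app-specific requirement.
# The codes are taken from the table in this step.
MODEL_OBJ_VALUES = {
    ("Others you may like", "Conversion rate"): "oyml/cvr",
    ("Recommended for you", "Conversion rate"): "rfy/cvr",
    ("More like this", "Conversion rate"): "mlt/cvr",
    ("Most popular", "Conversion rate"): "mp/cvr",
    ("Others you may like", "Watch duration per session"): "oyml/wdps",
    ("Recommended for you", "Watch duration per session"): "rfy/wdps",
    ("More like this", "Watch duration per session"): "mlt/wdps",
}


def app_requirement_type(model, objective):
    """Return the app-specific requirementType, or None if the table has none."""
    model_obj = MODEL_OBJ_VALUES.get((model, objective))
    if model_obj is None:
        # Assumption: combinations outside the table need only the general check.
        return None
    return f"discoveryengine.googleapis.com/media_recs/{model_obj}/warning"


if __name__ == "__main__":
    print(app_requirement_type("More like this", "Watch duration per session"))
    print(app_requirement_type("Most popular", "Watch duration per session"))
```

The returned string is the value to place in the requirementType field of the request body.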
Example command and result
The following example is for the More like this model and the Watch duration per session objective:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-GFE-SSL: yes" \
  -H "X-Goog-User-Project: my-project-123" \
  "https://discoveryengine.googleapis.com/v1alpha/projects/my-project-123/locations/global/requirements:checkRequirement" \
  -d '{
    "location": "projects/my-project-123/locations/global",
    "requirementType": "discoveryengine.googleapis.com/media_recs/mlt/wdps/warning",
    "resources": [
      {
        "labels": {
          "branch_id": "0",
          "collection_id": "default_collection",
          "datastore_id": "my-data-store",
          "location_id": "global",
          "project_number": "my-project-123"
        },
        "type": "discoveryengine.googleapis.com/Branch"
      },
      {
        "labels": {
          "collection_id": "default_collection",
          "datastore_id": "my-data-store",
          "location_id": "global",
          "project_number": "my-project-123"
        },
        "type": "discoveryengine.googleapis.com/DataStore"
      }
    ]
  }'

{
  "requirement": {
    "type": "discoveryengine.googleapis.com/media_recs/mlt/wdps/warning",
    "displayName": "Warning level requirements for 'More Like This' models and 'Watch duration per session' business objectives.",
    "description": "Requirements for the media recommendations model that will result in performance issue if not met for the 'More Like This' model and the 'Watch duration per session' business objective.",
    "condition": {
      "expression": "invalid_sequence_percentage \u003c= invalid_sequence_percentage_threshold"
    },
    "metricBindings": [
      {
        "variableId": "invalid_sequence_percentage",
        "resourceType": "discoveryengine.googleapis.com/DataStore",
        "metricFilter": "metric.type = 'discoveryengine.googleapis.com/datastore/user_events/invalid_sequences_media_play_media_complete' AND metric.labels.is_percentage = 'True' AND resource.labels.location_id = 'global' AND resource.labels.collection_id = 'default_collection' AND resource.labels.project_number = '123456' AND resource.labels.datastore_id = 'my-data-store'",
        "description": "The percentage of invalid sequences for media play and media complete events sampled by randomly selected visitor ids.",
        "category": "DataStore"
      }
    ],
    "thresholdBindings": [
      {
        "variableId": "invalid_sequence_percentage_threshold",
        "thresholdValues": [
          {
            "severity": "WARNING",
            "value": 50
          }
        ],
        "description": "The threshold for the percentage of invalid sequences sampled among all media play and media complete events."
      }
    ]
  },
  "result": "SUCCESS",
  "requirementCondition": {
    "expression": "invalid_sequence_percentage \u003c= invalid_sequence_percentage_threshold"
  },
  "metricResults": [
    {
      "name": "invalid_sequence_percentage",
      "value": {
        "doubleValue": 0
      },
      "timestamp": "2024-06-06T02:32:00.460056386Z",
      "unit": "%",
      "metricType": "discoveryengine.googleapis.com/datastore/user_events/invalid_sequences_media_play_media_complete"
    }
  ],
  "oldestMetricTimestamp": "2024-06-06T02:32:00.460056386Z"
}

5. Review the output:
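   a. Look for the value of result:
      If the value is SUCCESS, your data meets the requirements for this model and objective.
      If the value is WARNING, continue to step b.
      If you don't see result in the output, there are several possible causes: the PROJECT_ID or DATA_STORE_ID in the request is incorrect, or some metric values are not available. Try again after 6 hours, or contact your customer engineer for help.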
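   b. Look at the expression (requirement.condition.expression). If the expression evaluates to false, there is a problem with your data. The metric values are in the metricResults.value fields, and the warning thresholds are in the requirement.thresholdBindings.thresholdValues fields. The description fields help you understand what each metric is for.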
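After running both the general check and the app-specific check, the two result values can be combined into one status, similar to the summary status described for the console. The following Python sketch is a hypothetical helper, not part of the API: the function name overall_status and the UNKNOWN label (used when result is missing from a response) are assumptions introduced for illustration.

```python
def overall_status(responses):
    """Combine several checkRequirement responses into one status.

    responses maps a label of your choice (for example "general" or
    "mlt/wdps") to the parsed JSON response of that check.
    """
    # A response without a result field is reported as UNKNOWN (assumed label).
    statuses = {label: response.get("result", "UNKNOWN")
                for label, response in responses.items()}
    values = list(statuses.values())
    if "WARNING" in values:
        overall = "WARNING"   # At least one check needs investigation.
    elif values and all(value == "SUCCESS" for value in values):
        overall = "SUCCESS"   # Every check passed its thresholds.
    else:
        overall = "UNKNOWN"   # Missing result: verify the IDs or retry later.
    return overall, statuses


if __name__ == "__main__":
    checks = {
        "general": {"result": "WARNING"},
        "mlt/wdps": {"result": "SUCCESS"},
    }
    overall, per_check = overall_status(checks)
    print(overall)     # prints WARNING
    print(per_check)
```

An UNKNOWN status corresponds to the case where result is absent, for which the guidance in this procedure is to verify PROJECT_ID and DATA_STORE_ID, or to try again after 6 hours.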