使用多变量时序预测模型执行异常值检测
本教程介绍了如何执行以下任务:
- 创建
ARIMA_PLUS_XREG
时序预测模型。 - 通过对模型运行
ML.DETECT_ANOMALIES
函数来检测时序数据中的异常值。
本教程使用公共 epa_historical_air_quality
数据集中的以下表,该数据集包含从美国多个城市收集的每日 PM 2.5、温度和风速信息:
epa_historical_air_quality.pm25_nonfrm_daily_summary
epa_historical_air_quality.wind_daily_summary
epa_historical_air_quality.temperature_daily_summary
所需权限
- 如需创建数据集,您需要拥有
bigquery.datasets.create
IAM 权限。 如需创建连接资源,您需要以下权限:
bigquery.connections.create
bigquery.connections.get
如需创建模型,您需要以下权限:
bigquery.jobs.create
bigquery.models.create
bigquery.models.getData
bigquery.models.updateData
bigquery.connections.delegate
如需运行推理,您需要以下权限:
bigquery.models.getData
bigquery.jobs.create
如需详细了解 BigQuery 中的 IAM 角色和权限,请参阅 IAM 简介。
费用
在本文档中,您将使用 Google Cloud 的以下收费组件:
- BigQuery: You incur costs for the data you process in BigQuery.
您可使用价格计算器根据您的预计使用情况来估算费用。
如需了解详情,请参阅 BigQuery 价格。
须知事项
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Google Cloud project.
-
Enable the BigQuery API.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Make sure that billing is enabled for your Google Cloud project.
-
Enable the BigQuery API.
创建数据集
创建 BigQuery 数据集以存储您的机器学习模型:
在 Google Cloud 控制台中,转到 BigQuery 页面。
在探索器窗格中,点击您的项目名称。
点击
查看操作 > 创建数据集。在创建数据集页面上,执行以下操作:
在数据集 ID 部分,输入
bqml_tutorial
。在位置类型部分,选择多区域,然后选择 US (multiple regions in United States)(美国[美国的多个区域])。
公共数据集存储在
US
多区域中。为简单起见,请将数据集存储在同一位置。保持其余默认设置不变,然后点击创建数据集。
准备训练数据
PM2.5、温度和风速数据位于不同的表中。通过组合这些公开表中的数据来创建 bqml_tutorial.seattle_air_quality_daily
训练数据表。bqml_tutorial.seattle_air_quality_daily
包含以下列:
date
:观察的日期PM2.5
:每天的平均 PM2.5 值wind_speed
:每天的平均风速temperature
:每天的最高温度
新表包含 2009 年 8 月 11 日至 2022 年 1 月 31 日的每日数据。
转到 BigQuery 页面。
在 SQL 编辑器窗格中,运行以下 SQL 语句:
CREATE TABLE `bqml_tutorial.seattle_air_quality_daily` AS WITH pm25_daily AS ( SELECT avg(arithmetic_mean) AS pm25, date_local AS date FROM `bigquery-public-data.epa_historical_air_quality.pm25_nonfrm_daily_summary` WHERE city_name = 'Seattle' AND parameter_name = 'Acceptable PM2.5 AQI & Speciation Mass' GROUP BY date_local ), wind_speed_daily AS ( SELECT avg(arithmetic_mean) AS wind_speed, date_local AS date FROM `bigquery-public-data.epa_historical_air_quality.wind_daily_summary` WHERE city_name = 'Seattle' AND parameter_name = 'Wind Speed - Resultant' GROUP BY date_local ), temperature_daily AS ( SELECT avg(first_max_value) AS temperature, date_local AS date FROM `bigquery-public-data.epa_historical_air_quality.temperature_daily_summary` WHERE city_name = 'Seattle' AND parameter_name = 'Outdoor Temperature' GROUP BY date_local ) SELECT pm25_daily.date AS date, pm25, wind_speed, temperature FROM pm25_daily JOIN wind_speed_daily USING (date) JOIN temperature_daily USING (date)
创建模型
使用 bqml_tutorial.seattle_air_quality_daily
中的数据作为训练数据,创建多变量时序模型。
转到 BigQuery 页面。
在 SQL 编辑器窗格中,运行以下 SQL 语句:
CREATE OR REPLACE MODEL `bqml_tutorial.arimax_model` OPTIONS ( model_type = 'ARIMA_PLUS_XREG', auto_arima=TRUE, time_series_data_col = 'temperature', time_series_timestamp_col = 'date' ) AS SELECT * FROM `bqml_tutorial.seattle_air_quality_daily`;
查询需要几秒钟才能完成,之后模型
arimax_model
会显示在探索器窗格的bqml_tutorial
数据集中。由于查询使用
CREATE MODEL
语句来创建模型,因此没有查询结果。
对历史数据执行异常值检测
针对用于训练模型的历史数据运行异常值检测。
转到 BigQuery 页面。
在 SQL 编辑器窗格中,运行以下 SQL 语句:
SELECT * FROM ML.DETECT_ANOMALIES ( MODEL `bqml_tutorial.arimax_model`, STRUCT(0.6 AS anomaly_prob_threshold) ) ORDER BY date ASC;
结果类似于以下内容:
+-------------------------+-------------+------------+--------------------+--------------------+---------------------+ | date | temperature | is_anomaly | lower_bound | upper_bound | anomaly_probability | +--------------------------------------------------------------------------------------------------------------------+ | 2009-08-11 00:00:00 UTC | 70.1 | false | 67.65880237416745 | 72.541197625832538 | 0 | +--------------------------------------------------------------------------------------------------------------------+ | 2009-08-12 00:00:00 UTC | 73.4 | false | 71.715603233887791 | 76.597998485552878 | 0.20589853827304627 | +--------------------------------------------------------------------------------------------------------------------+ | 2009-08-13 00:00:00 UTC | 64.6 | true | 67.741606808079425 | 72.624002059744512 | 0.94627126678202522 | +-------------------------+-------------+------------+--------------------+--------------------+---------------------+
对新数据执行异常值检测
对您生成的新数据运行异常值检测。
转到 BigQuery 页面。
在 SQL 编辑器窗格中,运行以下 SQL 语句:
SELECT * FROM ML.DETECT_ANOMALIES ( MODEL `bqml_tutorial.arimax_model`, STRUCT(0.6 AS anomaly_prob_threshold), ( SELECT * FROM UNNEST( [ STRUCT<date TIMESTAMP, pm25 FLOAT64, wind_speed FLOAT64, temperature FLOAT64> ('2023-02-01 00:00:00 UTC', 8.8166665, 1.6525, 44.0), ('2023-02-02 00:00:00 UTC', 11.8354165, 1.558333, 40.5), ('2023-02-03 00:00:00 UTC', 10.1395835, 1.6895835, 46.5), ('2023-02-04 00:00:00 UTC', 11.439583500000001, 2.0854165, 45.0), ('2023-02-05 00:00:00 UTC', 9.7208335, 1.7083335, 46.0), ('2023-02-06 00:00:00 UTC', 13.3020835, 2.23125, 43.5), ('2023-02-07 00:00:00 UTC', 5.7229165, 2.377083, 47.5), ('2023-02-08 00:00:00 UTC', 7.6291665, 2.24375, 44.5), ('2023-02-09 00:00:00 UTC', 8.5208335, 2.2541665, 40.5), ('2023-02-10 00:00:00 UTC', 9.9086955, 7.333335, 39.5) ] ) ) );
结果类似于以下内容:
+-------------------------+-------------+------------+--------------------+--------------------+---------------------+------------+------------+ | date | temperature | is_anomaly | lower_bound | upper_bound | anomaly_probability | pm25 | wind_speed | +----------------------------------------------------------------------------------------------------------------------------------------------+ | 2023-02-01 00:00:00 UTC | 44.0 | true | 36.917405956304407 | 41.79980120796948 | 0.890904731626234 | 8.8166665 | 1.6525 | +----------------------------------------------------------------------------------------------------------------------------------------------+ | 2023-02-02 00:00:00 UTC | 40.5 | false | 34.622436643607685 | 40.884690866417984 | 0.53985850962605064 | 11.8354165 | 1.558333 | +--------------------------------------------------------------------------------------------------------------------+-------------------------+ | 2023-02-03 00:00:00 UTC | 46.5 | true | 33.769587937313183 | 40.7478502941026 | 0.97434506593220793 | 10.1395835 | 1.6895835 | +-------------------------+-------------+------------+--------------------+--------------------+---------------------+-------------------------+
清理
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.