Cloud Composer 1 | Cloud Composer 2 | Cloud Composer 3
This page describes how to implement a consolidated monitoring dashboard for multiple Cloud Composer environments across selected projects in the same organization.
Overview
The described solution helps a central enterprise platform team support Cloud Composer environments that are used by other teams. It can be used to monitor all Cloud Composer environments, even environments that were not created with Terraform.
This guide shows how to implement a Cloud Monitoring dashboard and alerting policies that continuously report key metrics of your Cloud Composer environments and raise incidents when problems occur. The dashboard automatically covers all Cloud Composer environments in the projects selected for monitoring. The implementation relies on Terraform.
The model uses one Google Cloud project, the Monitoring project, that performs read-only monitoring of Cloud Composer environments deployed in multiple Monitored projects. The central dashboard renders its content from the Cloud Monitoring metrics stored in the monitored projects.
The dashboard monitors a number of metrics and creates alerts for them, for example environment health and CPU metrics. Hovering over a specific line in a chart shows which environment it represents, along with the project name and the resource. If a metric exceeds a predefined threshold, an incident is raised and an alert is displayed in the chart that corresponds to that metric.
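Each chart on the dashboard is backed by a Monitoring alert policy whose condition is a Monitoring Query Language (MQL) query with a threshold. For example, the database CPU alert from the main.tf provided later in this guide raises an incident when mean utilization over 10-minute windows exceeds 80%; it is reproduced here, reformatted, for illustration only:

# Alert policy from the main.tf below: raises an incident when database CPU
# utilization, averaged over 10-minute windows, exceeds 80%.
resource "google_monitoring_alert_policy" "database_cpu" {
  display_name = "Database CPU"
  combiner     = "OR"
  conditions {
    display_name = "Database CPU"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/database/cpu/utilization'",
        "| group_by 10m, [value_utilization_mean: mean(value.utilization)]",
        "| every 10m",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() > 0.8"])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
}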
List of monitored metrics
The full list of monitored metrics:
- Cloud Composer environment health (based on the monitoring DAG)
- Database health
- Web server health
- Scheduler heartbeats
- CPU and memory utilization of all workers
- CPU and memory utilization of the Airflow database
- CPU and memory utilization of the web server
- CPU and memory utilization of the Airflow scheduler
- Percentage of queued, scheduled, and queued or scheduled tasks in the environment (helps to spot issues with Airflow concurrency configuration)
- DAG parsing time
- Current number of workers compared to the minimum number of workers (helps to identify worker stability or scaling issues)
- Worker pod evictions
- Errors reported by workers, schedulers, the web server, or other components (separate charts)
Before you begin
To use Cloud Composer and Cloud Monitoring, create a Google Cloud project and enable billing for it. The project must contain a Cloud Composer environment. In this guide, this project is referred to as the Monitoring project.
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Install Terraform if it is not installed yet.
- Configure the metrics scope for your project. By default, a project can only display and monitor the time series data that it stores. If you want to display or monitor data stored in multiple projects, configure the project's metrics scope. For more information, see the Metrics scope overview. A minimal Terraform sketch follows this list.
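If you use the Terraform configuration in this guide, the metrics scope is configured for you by the google_monitoring_monitored_project resource in the main.tf below. As a minimal sketch, adding a single monitored project to the Monitoring project's metrics scope looks like the following; both project names are placeholders:

# Minimal sketch: attach one monitored project to the metrics scope of the
# central Monitoring project (project names below are placeholders).
resource "google_monitoring_monitored_project" "example" {
  # Metrics scope of the Monitoring project
  metrics_scope = "locations/global/metricsScopes/my-monitoring-project"

  # Project that hosts the Cloud Composer environments to monitor
  name = "my-composer-project"
}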
Implementation steps
On the local machine where you run Terraform, set the GOOGLE_CLOUD_PROJECT environment variable to the ID of your Monitoring project:

export GOOGLE_CLOUD_PROJECT=MONITORING_PROJECT_ID
Make sure that your Terraform Google provider is authenticated and has the following permissions:
- roles/monitoring.editor permission in the Monitoring project
- roles/monitoring.viewer and roles/logging.viewer permissions in the monitored projects
One way to set this up is sketched after this list.
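A minimal sketch of one way to satisfy these requirements: authenticate the provider with application-default credentials for your own account and grant the roles to that identity with gcloud. The project IDs and the user account below are placeholders; your organization may instead use a dedicated service account.

# Authenticate the Terraform Google provider with application-default credentials
gcloud auth application-default login

# Grant the editor role in the Monitoring project (placeholder values)
gcloud projects add-iam-policy-binding MONITORING_PROJECT_ID \
    --member="user:you@example.com" \
    --role="roles/monitoring.editor"

# Grant the read-only roles in each monitored project (repeat per project)
gcloud projects add-iam-policy-binding MONITORED_PROJECT_ID \
    --member="user:you@example.com" \
    --role="roles/monitoring.viewer"
gcloud projects add-iam-policy-binding MONITORED_PROJECT_ID \
    --member="user:you@example.com" \
    --role="roles/logging.viewer"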
Copy the following main.tf file to the local machine where you run Terraform:
# Monitoring for multiple Cloud Composer environments # # Usage: # 1. Create a new project that you will use for monitoring of Cloud Composer environments in other projects # 2. Replace YOUR_MONITORING_PROJECT with the name of this project in the "metrics_scope" parameter that is part of the "Add Monitored Projects to the Monitoring project" section # 3. Replace the list of projects to monitor with your list of projects with Cloud Composer environments to be monitored in the "for_each" parameter of the "Add Monitored Projects to the Monitoring project" section # 4. Set up your environment and apply the configuration following these steps: https://cloud.google.com/docs/terraform/basic-commands. Your GOOGLE_CLOUD_PROJECT environment variable should be the new monitoring project you just created. # # The script creates the following resources in the monitoring project: # 1. Adds monitored projects to Cloud Monitoring # 2. Creates Alert Policies # 3. Creates Monitoring Dashboard # ####################################################### # # Add Monitored Projects to the Monitoring project # ######################################################## resource "google_monitoring_monitored_project" "projects_monitored" { for_each = toset(["YOUR_PROJECT_TO_MONITOR_1", "YOUR_PROJECT_TO_MONITOR_2", "YOUR_PROJECT_TO_MONITOR_3"]) metrics_scope = join("", ["locations/global/metricsScopes/", "YOUR_MONITORING_PROJECT"]) name = each.value } ####################################################### # # Create alert policies in Monitoring project # ######################################################## resource "google_monitoring_alert_policy" "environment_health" { display_name = "Environment Health" combiner = "OR" conditions { display_name = "Environment Health" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| {metric 'composer.googleapis.com/environment/dagbag_size'", "| group_by 5m, [value_dagbag_size_mean: if(mean(value.dagbag_size) > 0, 1, 0)]", "| align mean_aligner(5m)", "| group_by [resource.project_id, resource.environment_name], [value_dagbag_size_mean_aggregate: aggregate(value_dagbag_size_mean)]; ", "metric 'composer.googleapis.com/environment/healthy'", "| group_by 5m, [value_sum_signals: aggregate(if(value.healthy,1,0))]", "| align mean_aligner(5m)| absent_for 5m }", "| outer_join 0", "| group_by [resource.project_id, resource.environment_name]", "| value val(2)", "| align mean_aligner(5m)", "| window(5m)", "| condition val(0) < 0.9" ]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "database_health" { display_name = "Database Health" combiner = "OR" conditions { display_name = "Database Health" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/database_health'", "| group_by 5m,", " [value_database_health_fraction_true: fraction_true(value.database_health)]", "| every 5m", "| group_by 5m,", " [value_database_health_fraction_true_aggregate:", " aggregate(value_database_health_fraction_true)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name],", " [value_database_health_fraction_true_aggregate_aggregate:", " aggregate(value_database_health_fraction_true_aggregate)]", "| condition val() < 0.95"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert 
#alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "webserver_health" { display_name = "Web Server Health" combiner = "OR" conditions { display_name = "Web Server Health" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/web_server/health'", "| group_by 5m, [value_health_fraction_true: fraction_true(value.health)]", "| every 5m", "| group_by 5m,", " [value_health_fraction_true_aggregate:", " aggregate(value_health_fraction_true)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name],", " [value_health_fraction_true_aggregate_aggregate:", " aggregate(value_health_fraction_true_aggregate)]", "| condition val() < 0.95"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "scheduler_heartbeat" { display_name = "Scheduler Heartbeat" combiner = "OR" conditions { display_name = "Scheduler Heartbeat" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/scheduler_heartbeat_count'", "| group_by 10m,", " [value_scheduler_heartbeat_count_aggregate:", " aggregate(value.scheduler_heartbeat_count)]", "| every 10m", "| group_by 10m,", " [value_scheduler_heartbeat_count_aggregate_mean:", " mean(value_scheduler_heartbeat_count_aggregate)]", "| every 10m", "| group_by [resource.project_id, resource.environment_name],", " [value_scheduler_heartbeat_count_aggregate_mean_aggregate:", " aggregate(value_scheduler_heartbeat_count_aggregate_mean)]", "| condition val() < 80"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "database_cpu" { display_name = "Database CPU" combiner = "OR" conditions { display_name = "Database CPU" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/database/cpu/utilization'", "| group_by 10m, [value_utilization_mean: mean(value.utilization)]", "| every 10m", "| group_by [resource.project_id, resource.environment_name]", "| condition val() > 0.8"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "scheduler_cpu" { display_name = "Scheduler CPU" combiner = "OR" conditions { display_name = "Scheduler CPU" condition_monitoring_query_language { query = join("", [ "fetch k8s_container", "| metric 'kubernetes.io/container/cpu/limit_utilization'", "| filter (resource.pod_name =~ 'airflow-scheduler-.*')", "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]", "| every 10m", "| group_by [resource.cluster_name],", " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]", "| condition val() > 0.8"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "worker_cpu" { display_name = "Worker CPU" combiner = "OR" conditions { display_name = "Worker CPU" condition_monitoring_query_language { query = join("", [ "fetch k8s_container", "| metric 'kubernetes.io/container/cpu/limit_utilization'", "| filter 
(resource.pod_name =~ 'airflow-worker.*')", "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]", "| every 10m", "| group_by [resource.cluster_name],", " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]", "| condition val() > 0.8"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "webserver_cpu" { display_name = "Web Server CPU" combiner = "OR" conditions { display_name = "Web Server CPU" condition_monitoring_query_language { query = join("", [ "fetch k8s_container", "| metric 'kubernetes.io/container/cpu/limit_utilization'", "| filter (resource.pod_name =~ 'airflow-webserver.*')", "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]", "| every 10m", "| group_by [resource.cluster_name],", " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]", "| condition val() > 0.8"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "parsing_time" { display_name = "DAG Parsing Time" combiner = "OR" conditions { display_name = "DAG Parsing Time" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/dag_processing/total_parse_time'", "| group_by 5m, [value_total_parse_time_mean: mean(value.total_parse_time)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name]", "| condition val(0) > cast_units(30,\"s\")"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "database_memory" { display_name = "Database Memory" combiner = "OR" conditions { display_name = "Database Memory" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/database/memory/utilization'", "| group_by 10m, [value_utilization_mean: mean(value.utilization)]", "| every 10m", "| group_by [resource.project_id, resource.environment_name]", "| condition val() > 0.8"]) duration = "0s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "scheduler_memory" { display_name = "Scheduler Memory" combiner = "OR" conditions { display_name = "Scheduler Memory" condition_monitoring_query_language { query = join("", [ "fetch k8s_container", "| metric 'kubernetes.io/container/memory/limit_utilization'", "| filter (resource.pod_name =~ 'airflow-scheduler-.*')", "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]", "| every 10m", "| group_by [resource.cluster_name],", " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]", "| condition val() > 0.8"]) duration = "0s" trigger { count = "1" } } } documentation { content = join("", [ "Scheduler Memory exceeds a threshold, summed across all schedulers in the environment. ", "Add more schedulers OR increase scheduler's memory OR reduce scheduling load (e.g. 
through lower parsing frequency or lower number of DAGs/tasks running"]) } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "worker_memory" { display_name = "Worker Memory" combiner = "OR" conditions { display_name = "Worker Memory" condition_monitoring_query_language { query = join("", [ "fetch k8s_container", "| metric 'kubernetes.io/container/memory/limit_utilization'", "| filter (resource.pod_name =~ 'airflow-worker.*')", "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]", "| every 10m", "| group_by [resource.cluster_name],", " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]", "| condition val() > 0.8"]) duration = "0s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "webserver_memory" { display_name = "Web Server Memory" combiner = "OR" conditions { display_name = "Web Server Memory" condition_monitoring_query_language { query = join("", [ "fetch k8s_container", "| metric 'kubernetes.io/container/memory/limit_utilization'", "| filter (resource.pod_name =~ 'airflow-webserver.*')", "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]", "| every 10m", "| group_by [resource.cluster_name],", " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]", "| condition val() > 0.8"]) duration = "0s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "scheduled_tasks_percentage" { display_name = "Scheduled Tasks Percentage" combiner = "OR" conditions { display_name = "Scheduled Tasks Percentage" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/unfinished_task_instances'", "| align mean_aligner(10m)", "| every(10m)", "| window(10m)", "| filter_ratio_by [resource.project_id, resource.environment_name], metric.state = 'scheduled'", "| condition val() > 0.80"]) duration = "300s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "queued_tasks_percentage" { display_name = "Queued Tasks Percentage" combiner = "OR" conditions { display_name = "Queued Tasks Percentage" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/unfinished_task_instances'", "| align mean_aligner(10m)", "| every(10m)", "| window(10m)", "| filter_ratio_by [resource.project_id, resource.environment_name], metric.state = 'queued'", "| group_by [resource.project_id, resource.environment_name]", "| condition val() > 0.95"]) duration = "300s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "queued_or_scheduled_tasks_percentage" { display_name = "Queued or Scheduled Tasks Percentage" combiner = "OR" conditions { display_name = "Queued or Scheduled Tasks Percentage" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/unfinished_task_instances'", "| align mean_aligner(10m)", "| every(10m)", "| window(10m)", "| 
filter_ratio_by [resource.project_id, resource.environment_name], or(metric.state = 'queued', metric.state = 'scheduled' )", "| group_by [resource.project_id, resource.environment_name]", "| condition val() > 0.80"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "workers_above_minimum" { display_name = "Workers above minimum (negative = missing workers)" combiner = "OR" conditions { display_name = "Workers above minimum" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| { metric 'composer.googleapis.com/environment/num_celery_workers'", "| group_by 5m, [value_num_celery_workers_mean: mean(value.num_celery_workers)]", "| every 5m", "; metric 'composer.googleapis.com/environment/worker/min_workers'", "| group_by 5m, [value_min_workers_mean: mean(value.min_workers)]", "| every 5m }", "| outer_join 0", "| sub", "| group_by [resource.project_id, resource.environment_name]", "| condition val() < 0"]) duration = "0s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "pod_evictions" { display_name = "Worker pod evictions" combiner = "OR" conditions { display_name = "Worker pod evictions" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/worker/pod_eviction_count'", "| align delta(1m)", "| every 1m", "| group_by [resource.project_id, resource.environment_name]", "| condition val() > 0"]) duration = "60s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "scheduler_errors" { display_name = "Scheduler Errors" combiner = "OR" conditions { display_name = "Scheduler Errors" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'logging.googleapis.com/log_entry_count'", "| filter (metric.log == 'airflow-scheduler' && metric.severity == 'ERROR')", "| group_by 5m,", " [value_log_entry_count_aggregate: aggregate(value.log_entry_count)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name],", " [value_log_entry_count_aggregate_max: max(value_log_entry_count_aggregate)]", "| condition val() > 50"]) duration = "300s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "worker_errors" { display_name = "Worker Errors" combiner = "OR" conditions { display_name = "Worker Errors" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'logging.googleapis.com/log_entry_count'", "| filter (metric.log == 'airflow-worker' && metric.severity == 'ERROR')", "| group_by 5m,", " [value_log_entry_count_aggregate: aggregate(value.log_entry_count)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name],", " [value_log_entry_count_aggregate_max: max(value_log_entry_count_aggregate)]", "| condition val() > 50"]) duration = "300s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "webserver_errors" { display_name = "Web Server Errors" combiner = "OR" 
conditions { display_name = "Web Server Errors" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'logging.googleapis.com/log_entry_count'", "| filter (metric.log == 'airflow-webserver' && metric.severity == 'ERROR')", "| group_by 5m,", " [value_log_entry_count_aggregate: aggregate(value.log_entry_count)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name],", " [value_log_entry_count_aggregate_max: max(value_log_entry_count_aggregate)]", "| condition val() > 50"]) duration = "300s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "other_errors" { display_name = "Other Errors" combiner = "OR" conditions { display_name = "Other Errors" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'logging.googleapis.com/log_entry_count'", "| filter", " (metric.log !~ 'airflow-scheduler|airflow-worker|airflow-webserver'", " && metric.severity == 'ERROR')", "| group_by 5m, [value_log_entry_count_max: max(value.log_entry_count)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name],", " [value_log_entry_count_max_aggregate: aggregate(value_log_entry_count_max)]", "| condition val() > 10"]) duration = "300s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } ####################################################### # # Create Monitoring Dashboard # ######################################################## resource "google_monitoring_dashboard" "Composer_Dashboard" { dashboard_json = <<EOF { "category": "CUSTOM", "displayName": "Cloud Composer - Monitoring Platform", "mosaicLayout": { "columns": 12, "tiles": [ { "height": 1, "widget": { "text": { "content": "", "format": "MARKDOWN" }, "title": "Health" }, "width": 12, "xPos": 0, "yPos": 0 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.environment_health.name}" } }, "width": 6, "xPos": 0, "yPos": 1 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.database_health.name}" } }, "width": 6, "xPos": 6, "yPos": 1 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_health.name}" } }, "width": 6, "xPos": 0, "yPos": 5 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_heartbeat.name}" } }, "width": 6, "xPos": 6, "yPos": 5 }, { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Airflow Task Execution and DAG Parsing" }, "width": 12, "xPos": 0, "yPos": 9 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduled_tasks_percentage.name}" } }, "width": 6, "xPos": 0, "yPos": 10 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.queued_tasks_percentage.name}" } }, "width": 6, "xPos": 6, "yPos": 10 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.queued_or_scheduled_tasks_percentage.name}" } }, "width": 6, "xPos": 0, "yPos": 14 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.parsing_time.name}" } }, "width": 6, "xPos": 6, "yPos": 14 }, { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Workers presence" }, "width": 12, "xPos": 0, "yPos": 18 }, { "height": 
4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.workers_above_minimum.name}" } }, "width": 6, "xPos": 0, "yPos": 19 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.pod_evictions.name}" } }, "width": 6, "xPos": 6, "yPos": 19 }, { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "CPU Utilization" }, "width": 12, "xPos": 0, "yPos": 23 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.database_cpu.name}" } }, "width": 6, "xPos": 0, "yPos": 24 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_cpu.name}" } }, "width": 6, "xPos": 6, "yPos": 24 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.worker_cpu.name}" } }, "width": 6, "xPos": 0, "yPos": 28 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_cpu.name}" } }, "width": 6, "xPos": 6, "yPos": 28 }, { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Memory Utilization" }, "width": 12, "xPos": 0, "yPos": 32 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.database_memory.name}" } }, "width": 6, "xPos": 0, "yPos": 33 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_memory.name}" } }, "width": 6, "xPos": 6, "yPos": 33 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.worker_memory.name}" } }, "width": 6, "xPos": 0, "yPos": 37 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_memory.name}" } }, "width": 6, "xPos": 6, "yPos": 37 }, { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Airflow component errors" }, "width": 12, "xPos": 0, "yPos": 41 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_errors.name}" } }, "width": 6, "xPos": 0, "yPos": 42 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.worker_errors.name}" } }, "width": 6, "xPos": 6, "yPos": 42 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_errors.name}" } }, "width": 6, "xPos": 0, "yPos": 48 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.other_errors.name}" } }, "width": 6, "xPos": 6, "yPos": 48 }, { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Task errors" }, "width": 12, "xPos": 0, "yPos": 52 } ] } } EOF }
Edit the "google_monitoring_monitored_project" resource block:
- Replace the list of projects in the for_each block with your list of monitored projects.
- Replace "YOUR_MONITORING_PROJECT" in metrics_scope with the name of your Monitoring project.
A possible result of these edits is sketched after this list.
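For illustration, the edited block could look like the following, assuming a Monitoring project named my-monitoring-project and two monitored projects; all project names are placeholders:

resource "google_monitoring_monitored_project" "projects_monitored" {
  # Projects that contain the Cloud Composer environments to monitor (placeholders)
  for_each      = toset(["my-composer-project-1", "my-composer-project-2"])

  # Metrics scope of the Monitoring project (placeholder name)
  metrics_scope = join("", ["locations/global/metricsScopes/", "my-monitoring-project"])
  name          = each.value
}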
Review the configuration and verify that the resources that Terraform is going to create or update match your expectations. Make corrections if necessary. To preview the changes, run:
terraform plan
Run the following command to apply the Terraform configuration, entering yes at the prompt:
terraform apply
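If main.tf is in a fresh working directory, the Google provider also needs to be initialized before the plan step above. The full sequence, as a sketch:

# Download the Google provider and initialize the working directory
terraform init

# Preview the resources that Terraform will create in the Monitoring project
terraform plan

# Create the monitored projects, alert policies, and the dashboard
terraform apply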
In the Google Cloud console for the Monitoring project, go to the Monitoring Dashboards page.
On the Custom tab, find the custom dashboard named Cloud Composer - Monitoring Platform.
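As an optional check from the command line, you can also list the dashboards in the Monitoring project. This is a sketch and assumes the gcloud monitoring dashboards commands are available in your installed gcloud version; MONITORING_PROJECT_ID is a placeholder.

# List dashboard display names in the Monitoring project and look for
# "Cloud Composer - Monitoring Platform"
gcloud monitoring dashboards list \
    --project=MONITORING_PROJECT_ID \
    --format="value(displayName)"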