This page describes how to implement a consolidated monitoring dashboard for multiple Cloud Composer environments across selected projects in the same organization.
Overview
The described solution can help a central enterprise platform team that supports Cloud Composer environments used by other teams. It can be used to monitor all Cloud Composer environments, including ones that were not created with Terraform.
This guide creates a Cloud Composer monitoring dashboard in Google Cloud Monitoring, together with alerting policies that continuously report key Cloud Composer metrics and raise incidents when problems occur. The dashboard automatically scans all Cloud Composer environments in the selected projects for the chosen metrics. The implementation relies on Terraform.
The model uses one Google Cloud project as the monitoring project, which monitors (read-only) Cloud Composer environments deployed in multiple monitored projects. The central dashboard renders its content from Cloud Monitoring metrics stored in the monitored projects.
The dashboard monitors, and creates alerts for, several metrics, including environment health and CPU metrics.
Hold the pointer over a specific line in a chart to see which environment it represents; the dashboard then shows the project name and the resource.
If a metric exceeds a predefined threshold, an incident is raised and the corresponding alert is displayed in the chart for that metric.
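Under the hood, each chart on the dashboard is backed by a Cloud Monitoring alert policy whose condition is expressed in Monitoring Query Language (MQL). The following sketch mirrors the Database Health policy defined in the main.tf file later in this guide, rewritten with a heredoc instead of join() for readability; it is shown for illustration only and is already part of the full configuration:

resource "google_monitoring_alert_policy" "database_health_example" {
  display_name = "Database Health"
  combiner     = "OR"

  conditions {
    display_name = "Database Health"

    condition_monitoring_query_language {
      # Fraction of healthy database signals per environment; the condition
      # fires when the value drops below 95% for the configured duration.
      query    = <<-EOT
        fetch cloud_composer_environment
        | metric 'composer.googleapis.com/environment/database_health'
        | group_by 5m,
            [value_database_health_fraction_true: fraction_true(value.database_health)]
        | every 5m
        | group_by 5m,
            [value_database_health_fraction_true_aggregate:
              aggregate(value_database_health_fraction_true)]
        | every 5m
        | group_by [resource.project_id, resource.environment_name],
            [value_database_health_fraction_true_aggregate_aggregate:
              aggregate(value_database_health_fraction_true_aggregate)]
        | condition val() < 0.95
      EOT
      duration = "120s"

      trigger {
        count = "1"
      }
    }
  }
}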
List of monitored metrics
The full list of monitored metrics:
- Cloud Composer environment health (based on the monitoring DAG)
- Database health
- Web server health
- Scheduler heartbeat
- CPU and memory utilization of all workers
- CPU and memory utilization of the Airflow database
- CPU and memory utilization of the web server (available only in Cloud Composer 2)
- CPU and memory utilization of Airflow schedulers
- Percentage of queued, scheduled, and queued or scheduled tasks in the environment (helps to detect problems with the Airflow concurrency configuration)
- DAG parsing time
- Current number of workers compared to the minimum number of workers (helps to understand worker stability or scaling problems)
- Worker pod evictions
- Number of errors thrown in logs by workers, schedulers, the web server, and other components (separate charts for each)
Before you begin
To use Cloud Composer and Cloud Monitoring, you need to create a Google Cloud project and enable billing for it. The project must contain Cloud Composer environments. This project is referred to in this guide as the Monitoring project.
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Install Terraform if it is not already installed.
- Configure your project's metrics scope. By default, a project can display or monitor only the time-series data that it stores. If you want to display or monitor data stored in multiple projects, configure the project's metrics scope. For more information, see the metrics scope overview. In this guide, the provided main.tf configuration takes care of this by adding the monitored projects to the Monitoring project's metrics scope with the google_monitoring_monitored_project resource.
Implementation steps
On the local machine where you run Terraform, set the GOOGLE_CLOUD_PROJECT environment variable to the ID of your Monitoring project:
export GOOGLE_CLOUD_PROJECT=MONITORING_PROJECT_ID
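If you prefer not to rely on the environment variable, you can also set the project explicitly in the provider configuration. A minimal sketch, assuming the provider block lives next to main.tf (MONITORING_PROJECT_ID is a placeholder):

provider "google" {
  # Monitoring project in which the dashboard and alert policies are created.
  # Replace the placeholder with your Monitoring project ID; without this
  # argument, the provider typically falls back to environment variables
  # such as GOOGLE_CLOUD_PROJECT.
  project = "MONITORING_PROJECT_ID"
}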
Make sure that your Terraform Google provider is authenticated and has the following permissions (a sketch for granting them with Terraform follows this list):
- the roles/monitoring.editor role in the Monitoring project
- the roles/monitoring.viewer and roles/logging.viewer roles in all monitored projects
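If you manage these grants with Terraform as well, a minimal sketch for one monitored project could look like the following (the monitored project ID and the service account email are hypothetical placeholders):

# Grants read access to metrics and logs in one monitored project
# to the identity that runs Terraform.
resource "google_project_iam_member" "monitoring_read_access" {
  for_each = toset([
    "roles/monitoring.viewer",
    "roles/logging.viewer",
  ])

  project = "YOUR_PROJECT_TO_MONITOR_1"  # hypothetical monitored project ID
  role    = each.value
  member  = "serviceAccount:terraform-sa@YOUR_MONITORING_PROJECT.iam.gserviceaccount.com"  # placeholder
}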
Copy the following main.tf file to the local machine where you run Terraform.
# Monitoring for multiple Cloud Composer environments # # Usage: # 1. Create a new project that you will use for monitoring of Cloud Composer environments in other projects # 2. Replace YOUR_MONITORING_PROJECT with the name of this project in the "metrics_scope" parameter that is part of the "Add Monitored Projects to the Monitoring project" section # 3. Replace the list of projects to monitor with your list of projects with Cloud Composer environments to be monitored in the "for_each" parameter of the "Add Monitored Projects to the Monitoring project" section # 4. Set up your environment and apply the configuration following these steps: https://cloud.google.com/docs/terraform/basic-commands. Your GOOGLE_CLOUD_PROJECT environment variable should be the new monitoring project you just created. # # The script creates the following resources in the monitoring project: # 1. Adds monitored projects to Cloud Monitoring # 2. Creates Alert Policies # 3. Creates Monitoring Dashboard # ####################################################### # # Add Monitored Projects to the Monitoring project # ######################################################## resource "google_monitoring_monitored_project" "projects_monitored" { for_each = toset(["YOUR_PROJECT_TO_MONITOR_1", "YOUR_PROJECT_TO_MONITOR_2", "YOUR_PROJECT_TO_MONITOR_3"]) metrics_scope = join("", ["locations/global/metricsScopes/", "YOUR_MONITORING_PROJECT"]) name = each.value } ####################################################### # # Create alert policies in Monitoring project # ######################################################## resource "google_monitoring_alert_policy" "environment_health" { display_name = "Environment Health" combiner = "OR" conditions { display_name = "Environment Health" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| {metric 'composer.googleapis.com/environment/dagbag_size'", "| group_by 5m, [value_dagbag_size_mean: if(mean(value.dagbag_size) > 0, 1, 0)]", "| align mean_aligner(5m)", "| group_by [resource.project_id, resource.environment_name], [value_dagbag_size_mean_aggregate: aggregate(value_dagbag_size_mean)]; ", "metric 'composer.googleapis.com/environment/healthy'", "| group_by 5m, [value_sum_signals: aggregate(if(value.healthy,1,0))]", "| align mean_aligner(5m)| absent_for 5m }", "| outer_join 0", "| group_by [resource.project_id, resource.environment_name]", "| value val(2)", "| align mean_aligner(5m)", "| window(5m)", "| condition val(0) < 0.9" ]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "database_health" { display_name = "Database Health" combiner = "OR" conditions { display_name = "Database Health" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/database_health'", "| group_by 5m,", " [value_database_health_fraction_true: fraction_true(value.database_health)]", "| every 5m", "| group_by 5m,", " [value_database_health_fraction_true_aggregate:", " aggregate(value_database_health_fraction_true)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name],", " [value_database_health_fraction_true_aggregate_aggregate:", " aggregate(value_database_health_fraction_true_aggregate)]", "| condition val() < 0.95"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert 
#alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "webserver_health" { display_name = "Web Server Health" combiner = "OR" conditions { display_name = "Web Server Health" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/web_server/health'", "| group_by 5m, [value_health_fraction_true: fraction_true(value.health)]", "| every 5m", "| group_by 5m,", " [value_health_fraction_true_aggregate:", " aggregate(value_health_fraction_true)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name],", " [value_health_fraction_true_aggregate_aggregate:", " aggregate(value_health_fraction_true_aggregate)]", "| condition val() < 0.95"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "scheduler_heartbeat" { display_name = "Scheduler Heartbeat" combiner = "OR" conditions { display_name = "Scheduler Heartbeat" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/scheduler_heartbeat_count'", "| group_by 10m,", " [value_scheduler_heartbeat_count_aggregate:", " aggregate(value.scheduler_heartbeat_count)]", "| every 10m", "| group_by 10m,", " [value_scheduler_heartbeat_count_aggregate_mean:", " mean(value_scheduler_heartbeat_count_aggregate)]", "| every 10m", "| group_by [resource.project_id, resource.environment_name],", " [value_scheduler_heartbeat_count_aggregate_mean_aggregate:", " aggregate(value_scheduler_heartbeat_count_aggregate_mean)]", "| condition val() < 80"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "database_cpu" { display_name = "Database CPU" combiner = "OR" conditions { display_name = "Database CPU" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/database/cpu/utilization'", "| group_by 10m, [value_utilization_mean: mean(value.utilization)]", "| every 10m", "| group_by [resource.project_id, resource.environment_name]", "| condition val() > 0.8"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "scheduler_cpu" { display_name = "Scheduler CPU" combiner = "OR" conditions { display_name = "Scheduler CPU" condition_monitoring_query_language { query = join("", [ "fetch k8s_container", "| metric 'kubernetes.io/container/cpu/limit_utilization'", "| filter (resource.pod_name =~ 'airflow-scheduler-.*')", "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]", "| every 10m", "| group_by [resource.cluster_name],", " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]", "| condition val() > 0.8"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "worker_cpu" { display_name = "Worker CPU" combiner = "OR" conditions { display_name = "Worker CPU" condition_monitoring_query_language { query = join("", [ "fetch k8s_container", "| metric 'kubernetes.io/container/cpu/limit_utilization'", "| filter 
(resource.pod_name =~ 'airflow-worker.*')", "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]", "| every 10m", "| group_by [resource.cluster_name],", " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]", "| condition val() > 0.8"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "webserver_cpu" { display_name = "Web Server CPU" combiner = "OR" conditions { display_name = "Web Server CPU" condition_monitoring_query_language { query = join("", [ "fetch k8s_container", "| metric 'kubernetes.io/container/cpu/limit_utilization'", "| filter (resource.pod_name =~ 'airflow-webserver.*')", "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]", "| every 10m", "| group_by [resource.cluster_name],", " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]", "| condition val() > 0.8"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "parsing_time" { display_name = "DAG Parsing Time" combiner = "OR" conditions { display_name = "DAG Parsing Time" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/dag_processing/total_parse_time'", "| group_by 5m, [value_total_parse_time_mean: mean(value.total_parse_time)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name]", "| condition val(0) > cast_units(30,\"s\")"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "database_memory" { display_name = "Database Memory" combiner = "OR" conditions { display_name = "Database Memory" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/database/memory/utilization'", "| group_by 10m, [value_utilization_mean: mean(value.utilization)]", "| every 10m", "| group_by [resource.project_id, resource.environment_name]", "| condition val() > 0.8"]) duration = "0s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "scheduler_memory" { display_name = "Scheduler Memory" combiner = "OR" conditions { display_name = "Scheduler Memory" condition_monitoring_query_language { query = join("", [ "fetch k8s_container", "| metric 'kubernetes.io/container/memory/limit_utilization'", "| filter (resource.pod_name =~ 'airflow-scheduler-.*')", "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]", "| every 10m", "| group_by [resource.cluster_name],", " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]", "| condition val() > 0.8"]) duration = "0s" trigger { count = "1" } } } documentation { content = join("", [ "Scheduler Memory exceeds a threshold, summed across all schedulers in the environment. ", "Add more schedulers OR increase scheduler's memory OR reduce scheduling load (e.g. 
through lower parsing frequency or lower number of DAGs/tasks running"]) } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "worker_memory" { display_name = "Worker Memory" combiner = "OR" conditions { display_name = "Worker Memory" condition_monitoring_query_language { query = join("", [ "fetch k8s_container", "| metric 'kubernetes.io/container/memory/limit_utilization'", "| filter (resource.pod_name =~ 'airflow-worker.*')", "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]", "| every 10m", "| group_by [resource.cluster_name],", " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]", "| condition val() > 0.8"]) duration = "0s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "webserver_memory" { display_name = "Web Server Memory" combiner = "OR" conditions { display_name = "Web Server Memory" condition_monitoring_query_language { query = join("", [ "fetch k8s_container", "| metric 'kubernetes.io/container/memory/limit_utilization'", "| filter (resource.pod_name =~ 'airflow-webserver.*')", "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]", "| every 10m", "| group_by [resource.cluster_name],", " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]", "| condition val() > 0.8"]) duration = "0s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "scheduled_tasks_percentage" { display_name = "Scheduled Tasks Percentage" combiner = "OR" conditions { display_name = "Scheduled Tasks Percentage" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/unfinished_task_instances'", "| align mean_aligner(10m)", "| every(10m)", "| window(10m)", "| filter_ratio_by [resource.project_id, resource.environment_name], metric.state = 'scheduled'", "| condition val() > 0.80"]) duration = "300s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "queued_tasks_percentage" { display_name = "Queued Tasks Percentage" combiner = "OR" conditions { display_name = "Queued Tasks Percentage" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/unfinished_task_instances'", "| align mean_aligner(10m)", "| every(10m)", "| window(10m)", "| filter_ratio_by [resource.project_id, resource.environment_name], metric.state = 'queued'", "| group_by [resource.project_id, resource.environment_name]", "| condition val() > 0.95"]) duration = "300s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "queued_or_scheduled_tasks_percentage" { display_name = "Queued or Scheduled Tasks Percentage" combiner = "OR" conditions { display_name = "Queued or Scheduled Tasks Percentage" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/unfinished_task_instances'", "| align mean_aligner(10m)", "| every(10m)", "| window(10m)", "| 
filter_ratio_by [resource.project_id, resource.environment_name], or(metric.state = 'queued', metric.state = 'scheduled' )", "| group_by [resource.project_id, resource.environment_name]", "| condition val() > 0.80"]) duration = "120s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "workers_above_minimum" { display_name = "Workers above minimum (negative = missing workers)" combiner = "OR" conditions { display_name = "Workers above minimum" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| { metric 'composer.googleapis.com/environment/num_celery_workers'", "| group_by 5m, [value_num_celery_workers_mean: mean(value.num_celery_workers)]", "| every 5m", "; metric 'composer.googleapis.com/environment/worker/min_workers'", "| group_by 5m, [value_min_workers_mean: mean(value.min_workers)]", "| every 5m }", "| outer_join 0", "| sub", "| group_by [resource.project_id, resource.environment_name]", "| condition val() < 0"]) duration = "0s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "pod_evictions" { display_name = "Worker pod evictions" combiner = "OR" conditions { display_name = "Worker pod evictions" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'composer.googleapis.com/environment/worker/pod_eviction_count'", "| align delta(1m)", "| every 1m", "| group_by [resource.project_id, resource.environment_name]", "| condition val() > 0"]) duration = "60s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "scheduler_errors" { display_name = "Scheduler Errors" combiner = "OR" conditions { display_name = "Scheduler Errors" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'logging.googleapis.com/log_entry_count'", "| filter (metric.log == 'airflow-scheduler' && metric.severity == 'ERROR')", "| group_by 5m,", " [value_log_entry_count_aggregate: aggregate(value.log_entry_count)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name],", " [value_log_entry_count_aggregate_max: max(value_log_entry_count_aggregate)]", "| condition val() > 50"]) duration = "300s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "worker_errors" { display_name = "Worker Errors" combiner = "OR" conditions { display_name = "Worker Errors" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'logging.googleapis.com/log_entry_count'", "| filter (metric.log == 'airflow-worker' && metric.severity == 'ERROR')", "| group_by 5m,", " [value_log_entry_count_aggregate: aggregate(value.log_entry_count)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name],", " [value_log_entry_count_aggregate_max: max(value_log_entry_count_aggregate)]", "| condition val() > 50"]) duration = "300s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "webserver_errors" { display_name = "Web Server Errors" combiner = "OR" 
conditions { display_name = "Web Server Errors" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'logging.googleapis.com/log_entry_count'", "| filter (metric.log == 'airflow-webserver' && metric.severity == 'ERROR')", "| group_by 5m,", " [value_log_entry_count_aggregate: aggregate(value.log_entry_count)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name],", " [value_log_entry_count_aggregate_max: max(value_log_entry_count_aggregate)]", "| condition val() > 50"]) duration = "300s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } resource "google_monitoring_alert_policy" "other_errors" { display_name = "Other Errors" combiner = "OR" conditions { display_name = "Other Errors" condition_monitoring_query_language { query = join("", [ "fetch cloud_composer_environment", "| metric 'logging.googleapis.com/log_entry_count'", "| filter", " (metric.log !~ 'airflow-scheduler|airflow-worker|airflow-webserver'", " && metric.severity == 'ERROR')", "| group_by 5m, [value_log_entry_count_max: max(value.log_entry_count)]", "| every 5m", "| group_by [resource.project_id, resource.environment_name],", " [value_log_entry_count_max_aggregate: aggregate(value_log_entry_count_max)]", "| condition val() > 10"]) duration = "300s" trigger { count = "1" } } } # uncomment to set an auto close strategy for the alert #alert_strategy { # auto_close = "30m" #} } ####################################################### # # Create Monitoring Dashboard # ######################################################## resource "google_monitoring_dashboard" "Composer_Dashboard" { dashboard_json = <<EOF { "category": "CUSTOM", "displayName": "Cloud Composer - Monitoring Platform", "mosaicLayout": { "columns": 12, "tiles": [ { "height": 1, "widget": { "text": { "content": "", "format": "MARKDOWN" }, "title": "Health" }, "width": 12, "xPos": 0, "yPos": 0 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.environment_health.name}" } }, "width": 6, "xPos": 0, "yPos": 1 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.database_health.name}" } }, "width": 6, "xPos": 6, "yPos": 1 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_health.name}" } }, "width": 6, "xPos": 0, "yPos": 5 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_heartbeat.name}" } }, "width": 6, "xPos": 6, "yPos": 5 }, { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Airflow Task Execution and DAG Parsing" }, "width": 12, "xPos": 0, "yPos": 9 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduled_tasks_percentage.name}" } }, "width": 6, "xPos": 0, "yPos": 10 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.queued_tasks_percentage.name}" } }, "width": 6, "xPos": 6, "yPos": 10 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.queued_or_scheduled_tasks_percentage.name}" } }, "width": 6, "xPos": 0, "yPos": 14 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.parsing_time.name}" } }, "width": 6, "xPos": 6, "yPos": 14 }, { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Workers presence" }, "width": 12, "xPos": 0, "yPos": 18 }, { "height": 
4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.workers_above_minimum.name}" } }, "width": 6, "xPos": 0, "yPos": 19 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.pod_evictions.name}" } }, "width": 6, "xPos": 6, "yPos": 19 }, { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "CPU Utilization" }, "width": 12, "xPos": 0, "yPos": 23 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.database_cpu.name}" } }, "width": 6, "xPos": 0, "yPos": 24 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_cpu.name}" } }, "width": 6, "xPos": 6, "yPos": 24 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.worker_cpu.name}" } }, "width": 6, "xPos": 0, "yPos": 28 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_cpu.name}" } }, "width": 6, "xPos": 6, "yPos": 28 }, { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Memory Utilization" }, "width": 12, "xPos": 0, "yPos": 32 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.database_memory.name}" } }, "width": 6, "xPos": 0, "yPos": 33 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_memory.name}" } }, "width": 6, "xPos": 6, "yPos": 33 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.worker_memory.name}" } }, "width": 6, "xPos": 0, "yPos": 37 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_memory.name}" } }, "width": 6, "xPos": 6, "yPos": 37 }, { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Airflow component errors" }, "width": 12, "xPos": 0, "yPos": 41 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_errors.name}" } }, "width": 6, "xPos": 0, "yPos": 42 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.worker_errors.name}" } }, "width": 6, "xPos": 6, "yPos": 42 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_errors.name}" } }, "width": 6, "xPos": 0, "yPos": 48 }, { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.other_errors.name}" } }, "width": 6, "xPos": 6, "yPos": 48 }, { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Task errors" }, "width": 12, "xPos": 0, "yPos": 52 } ] } } EOF }
Edit the "google_monitoring_monitored_project" resource block (an example of the edited block follows this list):
- Replace the list of projects in the for_each block with the list of your monitored projects.
- Replace "YOUR_MONITORING_PROJECT" in metrics_scope with the name of your Monitoring project.
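For example, the edited block could look like the following (the monitored project IDs and the Monitoring project name are hypothetical placeholders):

resource "google_monitoring_monitored_project" "projects_monitored" {
  # Hypothetical monitored projects; replace with your own project IDs.
  for_each = toset(["composer-team-a-prod", "composer-team-b-prod"])

  # Metrics scope of the Monitoring project (hypothetical project name).
  metrics_scope = "locations/global/metricsScopes/composer-central-monitoring"
  name          = each.value
}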
Review the configuration and verify that the resources that Terraform is going to create or update match your expectations. Make corrections if necessary. (If you have not initialized the working directory yet, run terraform init first.)
terraform plan
Run the following command to apply the Terraform configuration, and enter yes at the prompt:
terraform apply
In the Google Cloud console of the Monitoring project, go to the Monitoring Dashboards page.
On the Custom tab, find the custom dashboard named Cloud Composer - Monitoring Platform.
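The alert policies created by this configuration raise incidents but do not reference any notification channels, so no notifications are sent by default. If you want to be notified when an incident opens, one option is to create a notification channel and attach it to the policies. A minimal sketch, assuming email notifications (the channel name and address are placeholders):

resource "google_monitoring_notification_channel" "email_oncall" {
  display_name = "Composer monitoring on-call"  # hypothetical channel name
  type         = "email"
  labels = {
    email_address = "oncall@example.com"  # placeholder address
  }
}

Each google_monitoring_alert_policy resource can then reference the channel, for example by adding notification_channels = [google_monitoring_notification_channel.email_oncall.id] to the policy.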