This page describes how to implement a consolidated monitoring dashboard for multiple Cloud Composer environments in selected projects of the same organization.
Overview
The described solution helps a central enterprise platform team support Cloud Composer environments used by other teams. This implementation can be used to monitor all Cloud Composer environments, even if those environments were not created with Terraform.
This guide implements a Cloud Monitoring dashboard for Cloud Composer and sets up alerting policies that continuously report key metrics of your Cloud Composer environments and raise incidents when problems occur. The dashboard automatically scans all Cloud Composer environments in the projects selected for this monitoring. The implementation uses Terraform.
This model uses one Google Cloud project as the monitoring project, which monitors (read-only) Cloud Composer environments deployed in multiple monitored projects. The central dashboard renders its contents by using Cloud Monitoring metrics from the monitored projects.
The dashboard monitors several metrics and creates alerts for them, including environment health:
Or CPU metrics:
Hover over a specific line of a metric chart to see which environment it represents. The dashboard then displays the project name and the resource:
If a metric exceeds a predefined threshold, an incident is raised and the related alert is displayed on the chart of the corresponding metric:
List of monitored metrics
The complete list of monitored metrics:
- Cloud Composer environment health (based on the monitoring DAG)
- Database health
- Web server health
- Scheduler heartbeat
- CPU and memory usage across all workers
- CPU and memory usage of the Airflow database
- CPU and memory usage of the web server (applicable only to Cloud Composer 2)
- CPU and memory usage of the Airflow scheduler
- Percentage of tasks in the environment in the queued, scheduled, or queued-or-scheduled state (helps identify Airflow concurrency configuration problems)
- DAG parsing time
- Current versus minimum number of workers, which helps identify worker stability or scaling problems
- Number of worker pod evictions
- Number of errors thrown in logs by workers, schedulers, web servers, or other components (separate charts)
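The task-percentage metrics in the list above are computed with Monitoring Query Language (MQL) filter ratios. As a sketch of that technique (a condition fragment only, not a complete policy; the 0.80 threshold is illustrative), an alert condition of roughly this shape fires when more than 80% of unfinished task instances sit in the scheduled state:

```hcl
# Sketch only: the MQL condition shape used for task-ratio alerting.
# This is a fragment of a conditions block, not a standalone policy.
condition_monitoring_query_language {
  query = join("", [
    "fetch cloud_composer_environment",
    "| metric 'composer.googleapis.com/environment/unfinished_task_instances'",
    "| align mean_aligner(10m)",
    "| window(10m)",
    # filter_ratio_by divides the matching series (state = 'scheduled')
    # by the total, per project and environment
    "| filter_ratio_by [resource.project_id, resource.environment_name], metric.state = 'scheduled'",
    "| condition val() > 0.80",
  ])
  duration = "300s"
}
```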
Before you begin
To use Cloud Composer and Cloud Monitoring, create a Google Cloud project and enable billing. The project must contain a Cloud Composer environment. In this guide, this project is referred to as the monitoring project.
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Verify that billing is enabled for your Google Cloud project.
- Install Terraform if it is not installed yet.
- Configure the metrics scope of your project. By default, a project can chart or monitor only the time-series data that it stores. To chart or monitor data stored across multiple projects, configure the project's metrics scope. For more information, see Metrics scope overview.
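With Terraform, a metrics scope can be extended one monitored project at a time through the `google_monitoring_monitored_project` resource. A minimal sketch, assuming hypothetical project IDs `my-monitoring-project` (the monitoring project) and `my-composer-project` (a monitored project):

```hcl
# Sketch: add one monitored project to the metrics scope of the
# monitoring project. Both project IDs below are placeholders.
resource "google_monitoring_monitored_project" "composer_monitored" {
  metrics_scope = "locations/global/metricsScopes/my-monitoring-project"
  name          = "my-composer-project"
}
```

The main.tf in this guide applies the same resource with `for_each` to cover a list of monitored projects.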
On the local machine where you run Terraform, set the GOOGLE_CLOUD_PROJECT environment variable to the ID of your monitoring project:

```shell
export GOOGLE_CLOUD_PROJECT=MONITORING_PROJECT_ID
```
Make sure that your Terraform Google provider is authenticated with the following permissions:

- roles/monitoring.editor on the monitoring project
- roles/monitoring.viewer and roles/logging.viewer on all monitored projects
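If these roles are not yet granted, one option is to manage them with Terraform as well. A minimal sketch for the monitoring project, assuming a hypothetical service account `terraform@example-project.iam.gserviceaccount.com` runs Terraform (the project ID and member address are placeholders):

```hcl
# Sketch: grant the identity that runs Terraform the monitoring editor role
# on the monitoring project. Repeat with roles/monitoring.viewer and
# roles/logging.viewer on each monitored project.
resource "google_project_iam_member" "monitoring_editor" {
  project = "MONITORING_PROJECT_ID"  # placeholder
  role    = "roles/monitoring.editor"
  member  = "serviceAccount:terraform@example-project.iam.gserviceaccount.com"  # placeholder
}
```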
Copy the following main.tf file to the local machine where you run Terraform.
```hcl
# Monitoring for multiple Cloud Composer environments
#
# Usage:
# 1. Create a new project that you will use for monitoring of Cloud Composer environments in other projects
# 2. Replace YOUR_MONITORING_PROJECT with the name of this project in the "metrics_scope" parameter
#    that is part of the "Add Monitored Projects to the Monitoring project" section
# 3. Replace the list of projects to monitor with your list of projects with Cloud Composer environments
#    to be monitored in the "for_each" parameter of the "Add Monitored Projects to the Monitoring project" section
# 4. Set up your environment and apply the configuration following these steps:
#    https://cloud.google.com/docs/terraform/basic-commands. Your GOOGLE_CLOUD_PROJECT environment
#    variable should be the new monitoring project you just created.
#
# The script creates the following resources in the monitoring project:
# 1. Adds monitored projects to Cloud Monitoring
# 2. Creates Alert Policies
# 3. Creates Monitoring Dashboard

#######################################################
#
# Add Monitored Projects to the Monitoring project
#
#######################################################

resource "google_monitoring_monitored_project" "projects_monitored" {
  for_each      = toset(["YOUR_PROJECT_TO_MONITOR_1", "YOUR_PROJECT_TO_MONITOR_2", "YOUR_PROJECT_TO_MONITOR_3"])
  metrics_scope = join("", ["locations/global/metricsScopes/", "YOUR_MONITORING_PROJECT"])
  name          = each.value
}

#######################################################
#
# Create alert policies in Monitoring project
#
#######################################################

resource "google_monitoring_alert_policy" "environment_health" {
  display_name = "Environment Health"
  combiner     = "OR"
  conditions {
    display_name = "Environment Health"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| {metric 'composer.googleapis.com/environment/dagbag_size'",
        "| group_by 5m, [value_dagbag_size_mean: if(mean(value.dagbag_size) > 0, 1, 0)]",
        "| align mean_aligner(5m)",
        "| group_by [resource.project_id, resource.environment_name], [value_dagbag_size_mean_aggregate: aggregate(value_dagbag_size_mean)]; ",
        "metric 'composer.googleapis.com/environment/healthy'",
        "| group_by 5m, [value_sum_signals: aggregate(if(value.healthy,1,0))]",
        "| align mean_aligner(5m)| absent_for 5m }",
        "| outer_join 0",
        "| group_by [resource.project_id, resource.environment_name]",
        "| value val(2)",
        "| align mean_aligner(5m)",
        "| window(5m)",
        "| condition val(0) < 0.9"
      ])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "database_health" {
  display_name = "Database Health"
  combiner     = "OR"
  conditions {
    display_name = "Database Health"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/database_health'",
        "| group_by 5m,",
        "    [value_database_health_fraction_true: fraction_true(value.database_health)]",
        "| every 5m",
        "| group_by 5m,",
        "    [value_database_health_fraction_true_aggregate:",
        "       aggregate(value_database_health_fraction_true)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name],",
        "    [value_database_health_fraction_true_aggregate_aggregate:",
        "       aggregate(value_database_health_fraction_true_aggregate)]",
        "| condition val() < 0.95"
      ])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "webserver_health" {
  display_name = "Web Server Health"
  combiner     = "OR"
  conditions {
    display_name = "Web Server Health"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/web_server/health'",
        "| group_by 5m, [value_health_fraction_true: fraction_true(value.health)]",
        "| every 5m",
        "| group_by 5m,",
        "    [value_health_fraction_true_aggregate:",
        "       aggregate(value_health_fraction_true)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name],",
        "    [value_health_fraction_true_aggregate_aggregate:",
        "       aggregate(value_health_fraction_true_aggregate)]",
        "| condition val() < 0.95"
      ])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "scheduler_heartbeat" {
  display_name = "Scheduler Heartbeat"
  combiner     = "OR"
  conditions {
    display_name = "Scheduler Heartbeat"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/scheduler_heartbeat_count'",
        "| group_by 10m,",
        "    [value_scheduler_heartbeat_count_aggregate:",
        "       aggregate(value.scheduler_heartbeat_count)]",
        "| every 10m",
        "| group_by 10m,",
        "    [value_scheduler_heartbeat_count_aggregate_mean:",
        "       mean(value_scheduler_heartbeat_count_aggregate)]",
        "| every 10m",
        "| group_by [resource.project_id, resource.environment_name],",
        "    [value_scheduler_heartbeat_count_aggregate_mean_aggregate:",
        "       aggregate(value_scheduler_heartbeat_count_aggregate_mean)]",
        "| condition val() < 80"
      ])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "database_cpu" {
  display_name = "Database CPU"
  combiner     = "OR"
  conditions {
    display_name = "Database CPU"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/database/cpu/utilization'",
        "| group_by 10m, [value_utilization_mean: mean(value.utilization)]",
        "| every 10m",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() > 0.8"
      ])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "scheduler_cpu" {
  display_name = "Scheduler CPU"
  combiner     = "OR"
  conditions {
    display_name = "Scheduler CPU"
    condition_monitoring_query_language {
      query = join("", [
        "fetch k8s_container",
        "| metric 'kubernetes.io/container/cpu/limit_utilization'",
        "| filter (resource.pod_name =~ 'airflow-scheduler-.*')",
        "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]",
        "| every 10m",
        "| group_by [resource.cluster_name],",
        "    [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]",
        "| condition val() > 0.8"
      ])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "worker_cpu" {
  display_name = "Worker CPU"
  combiner     = "OR"
  conditions {
    display_name = "Worker CPU"
    condition_monitoring_query_language {
      query = join("", [
        "fetch k8s_container",
        "| metric 'kubernetes.io/container/cpu/limit_utilization'",
        "| filter (resource.pod_name =~ 'airflow-worker.*')",
        "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]",
        "| every 10m",
        "| group_by [resource.cluster_name],",
        "    [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]",
        "| condition val() > 0.8"
      ])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "webserver_cpu" {
  display_name = "Web Server CPU"
  combiner     = "OR"
  conditions {
    display_name = "Web Server CPU"
    condition_monitoring_query_language {
      query = join("", [
        "fetch k8s_container",
        "| metric 'kubernetes.io/container/cpu/limit_utilization'",
        "| filter (resource.pod_name =~ 'airflow-webserver.*')",
        "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]",
        "| every 10m",
        "| group_by [resource.cluster_name],",
        "    [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]",
        "| condition val() > 0.8"
      ])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "parsing_time" {
  display_name = "DAG Parsing Time"
  combiner     = "OR"
  conditions {
    display_name = "DAG Parsing Time"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/dag_processing/total_parse_time'",
        "| group_by 5m, [value_total_parse_time_mean: mean(value.total_parse_time)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val(0) > cast_units(30,\"s\")"
      ])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "database_memory" {
  display_name = "Database Memory"
  combiner     = "OR"
  conditions {
    display_name = "Database Memory"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/database/memory/utilization'",
        "| group_by 10m, [value_utilization_mean: mean(value.utilization)]",
        "| every 10m",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() > 0.8"
      ])
      duration = "0s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "scheduler_memory" {
  display_name = "Scheduler Memory"
  combiner     = "OR"
  conditions {
    display_name = "Scheduler Memory"
    condition_monitoring_query_language {
      query = join("", [
        "fetch k8s_container",
        "| metric 'kubernetes.io/container/memory/limit_utilization'",
        "| filter (resource.pod_name =~ 'airflow-scheduler-.*')",
        "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]",
        "| every 10m",
        "| group_by [resource.cluster_name],",
        "    [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]",
        "| condition val() > 0.8"
      ])
      duration = "0s"
      trigger {
        count = "1"
      }
    }
  }
  documentation {
    content = join("", [
      "Scheduler Memory exceeds a threshold, summed across all schedulers in the environment. ",
      "Add more schedulers OR increase scheduler's memory OR reduce scheduling load (e.g. through lower parsing frequency or lower number of DAGs/tasks running"
    ])
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "worker_memory" {
  display_name = "Worker Memory"
  combiner     = "OR"
  conditions {
    display_name = "Worker Memory"
    condition_monitoring_query_language {
      query = join("", [
        "fetch k8s_container",
        "| metric 'kubernetes.io/container/memory/limit_utilization'",
        "| filter (resource.pod_name =~ 'airflow-worker.*')",
        "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]",
        "| every 10m",
        "| group_by [resource.cluster_name],",
        "    [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]",
        "| condition val() > 0.8"
      ])
      duration = "0s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "webserver_memory" {
  display_name = "Web Server Memory"
  combiner     = "OR"
  conditions {
    display_name = "Web Server Memory"
    condition_monitoring_query_language {
      query = join("", [
        "fetch k8s_container",
        "| metric 'kubernetes.io/container/memory/limit_utilization'",
        "| filter (resource.pod_name =~ 'airflow-webserver.*')",
        "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]",
        "| every 10m",
        "| group_by [resource.cluster_name],",
        "    [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]",
        "| condition val() > 0.8"
      ])
      duration = "0s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "scheduled_tasks_percentage" {
  display_name = "Scheduled Tasks Percentage"
  combiner     = "OR"
  conditions {
    display_name = "Scheduled Tasks Percentage"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/unfinished_task_instances'",
        "| align mean_aligner(10m)",
        "| every(10m)",
        "| window(10m)",
        "| filter_ratio_by [resource.project_id, resource.environment_name], metric.state = 'scheduled'",
        "| condition val() > 0.80"
      ])
      duration = "300s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "queued_tasks_percentage" {
  display_name = "Queued Tasks Percentage"
  combiner     = "OR"
  conditions {
    display_name = "Queued Tasks Percentage"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/unfinished_task_instances'",
        "| align mean_aligner(10m)",
        "| every(10m)",
        "| window(10m)",
        "| filter_ratio_by [resource.project_id, resource.environment_name], metric.state = 'queued'",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() > 0.95"
      ])
      duration = "300s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "queued_or_scheduled_tasks_percentage" {
  display_name = "Queued or Scheduled Tasks Percentage"
  combiner     = "OR"
  conditions {
    display_name = "Queued or Scheduled Tasks Percentage"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/unfinished_task_instances'",
        "| align mean_aligner(10m)",
        "| every(10m)",
        "| window(10m)",
        "| filter_ratio_by [resource.project_id, resource.environment_name], or(metric.state = 'queued', metric.state = 'scheduled' )",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() > 0.80"
      ])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "workers_above_minimum" {
  display_name = "Workers above minimum (negative = missing workers)"
  combiner     = "OR"
  conditions {
    display_name = "Workers above minimum"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| { metric 'composer.googleapis.com/environment/num_celery_workers'",
        "| group_by 5m, [value_num_celery_workers_mean: mean(value.num_celery_workers)]",
        "| every 5m",
        "; metric 'composer.googleapis.com/environment/worker/min_workers'",
        "| group_by 5m, [value_min_workers_mean: mean(value.min_workers)]",
        "| every 5m }",
        "| outer_join 0",
        "| sub",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() < 0"
      ])
      duration = "0s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "pod_evictions" {
  display_name = "Worker pod evictions"
  combiner     = "OR"
  conditions {
    display_name = "Worker pod evictions"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/worker/pod_eviction_count'",
        "| align delta(1m)",
        "| every 1m",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() > 0"
      ])
      duration = "60s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "scheduler_errors" {
  display_name = "Scheduler Errors"
  combiner     = "OR"
  conditions {
    display_name = "Scheduler Errors"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'logging.googleapis.com/log_entry_count'",
        "| filter (metric.log == 'airflow-scheduler' && metric.severity == 'ERROR')",
        "| group_by 5m,",
        "    [value_log_entry_count_aggregate: aggregate(value.log_entry_count)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name],",
        "    [value_log_entry_count_aggregate_max: max(value_log_entry_count_aggregate)]",
        "| condition val() > 50"
      ])
      duration = "300s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "worker_errors" {
  display_name = "Worker Errors"
  combiner     = "OR"
  conditions {
    display_name = "Worker Errors"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'logging.googleapis.com/log_entry_count'",
        "| filter (metric.log == 'airflow-worker' && metric.severity == 'ERROR')",
        "| group_by 5m,",
        "    [value_log_entry_count_aggregate: aggregate(value.log_entry_count)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name],",
        "    [value_log_entry_count_aggregate_max: max(value_log_entry_count_aggregate)]",
        "| condition val() > 50"
      ])
      duration = "300s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "webserver_errors" {
  display_name = "Web Server Errors"
  combiner     = "OR"
  conditions {
    display_name = "Web Server Errors"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'logging.googleapis.com/log_entry_count'",
        "| filter (metric.log == 'airflow-webserver' && metric.severity == 'ERROR')",
        "| group_by 5m,",
        "    [value_log_entry_count_aggregate: aggregate(value.log_entry_count)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name],",
        "    [value_log_entry_count_aggregate_max: max(value_log_entry_count_aggregate)]",
        "| condition val() > 50"
      ])
      duration = "300s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

resource "google_monitoring_alert_policy" "other_errors" {
  display_name = "Other Errors"
  combiner     = "OR"
  conditions {
    display_name = "Other Errors"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'logging.googleapis.com/log_entry_count'",
        "| filter",
        "    (metric.log !~ 'airflow-scheduler|airflow-worker|airflow-webserver'",
        "     && metric.severity == 'ERROR')",
        "| group_by 5m, [value_log_entry_count_max: max(value.log_entry_count)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name],",
        "    [value_log_entry_count_max_aggregate: aggregate(value_log_entry_count_max)]",
        "| condition val() > 10"
      ])
      duration = "300s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  # alert_strategy {
  #   auto_close = "30m"
  # }
}

#######################################################
#
# Create Monitoring Dashboard
#
#######################################################

resource "google_monitoring_dashboard" "Composer_Dashboard" {
  dashboard_json = <<EOF
{
  "category": "CUSTOM",
  "displayName": "Cloud Composer - Monitoring Platform",
  "mosaicLayout": {
    "columns": 12,
    "tiles": [
      { "height": 1, "widget": { "text": { "content": "", "format": "MARKDOWN" }, "title": "Health" }, "width": 12, "xPos": 0, "yPos": 0 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.environment_health.name}" } }, "width": 6, "xPos": 0, "yPos": 1 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.database_health.name}" } }, "width": 6, "xPos": 6, "yPos": 1 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_health.name}" } }, "width": 6, "xPos": 0, "yPos": 5 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_heartbeat.name}" } }, "width": 6, "xPos": 6, "yPos": 5 },
      { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Airflow Task Execution and DAG Parsing" }, "width": 12, "xPos": 0, "yPos": 9 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduled_tasks_percentage.name}" } }, "width": 6, "xPos": 0, "yPos": 10 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.queued_tasks_percentage.name}" } }, "width": 6, "xPos": 6, "yPos": 10 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.queued_or_scheduled_tasks_percentage.name}" } }, "width": 6, "xPos": 0, "yPos": 14 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.parsing_time.name}" } }, "width": 6, "xPos": 6, "yPos": 14 },
      { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Workers presence" }, "width": 12, "xPos": 0, "yPos": 18 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.workers_above_minimum.name}" } }, "width": 6, "xPos": 0, "yPos": 19 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.pod_evictions.name}" } }, "width": 6, "xPos": 6, "yPos": 19 },
      { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "CPU Utilization" }, "width": 12, "xPos": 0, "yPos": 23 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.database_cpu.name}" } }, "width": 6, "xPos": 0, "yPos": 24 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_cpu.name}" } }, "width": 6, "xPos": 6, "yPos": 24 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.worker_cpu.name}" } }, "width": 6, "xPos": 0, "yPos": 28 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_cpu.name}" } }, "width": 6, "xPos": 6, "yPos": 28 },
      { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Memory Utilization" }, "width": 12, "xPos": 0, "yPos": 32 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.database_memory.name}" } }, "width": 6, "xPos": 0, "yPos": 33 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_memory.name}" } }, "width": 6, "xPos": 6, "yPos": 33 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.worker_memory.name}" } }, "width": 6, "xPos": 0, "yPos": 37 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_memory.name}" } }, "width": 6, "xPos": 6, "yPos": 37 },
      { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Airflow component errors" }, "width": 12, "xPos": 0, "yPos": 41 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_errors.name}" } }, "width": 6, "xPos": 0, "yPos": 42 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.worker_errors.name}" } }, "width": 6, "xPos": 6, "yPos": 42 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_errors.name}" } }, "width": 6, "xPos": 0, "yPos": 48 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.other_errors.name}" } }, "width": 6, "xPos": 6, "yPos": 48 },
      { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Task errors" }, "width": 12, "xPos": 0, "yPos": 52 }
    ]
  }
}
EOF
}
```
Edit the "google_monitoring_monitored_project" resource block:

- Replace the list of projects in the for_each block with your monitored projects.
- Replace "YOUR_MONITORING_PROJECT" in metrics_scope with the name of your monitoring project.
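The policies in main.tf do not send notifications by default. If you also want to be notified about incidents, one option (a sketch; the channel type and address below are placeholders) is to add a notification channel and reference it from each policy through the `notification_channels` argument:

```hcl
# Sketch: an e-mail notification channel for the alert policies.
# The display name and address are placeholders.
resource "google_monitoring_notification_channel" "email" {
  display_name = "Composer alerts (example)"
  type         = "email"
  labels = {
    email_address = "oncall@example.com"
  }
}
```

Then, in a policy you want notifications for, add `notification_channels = [google_monitoring_notification_channel.email.id]` at the top level of the resource.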
Preview the configuration and verify that the resources that Terraform is going to create or update match your expectations. Make corrections, if needed:

```shell
terraform plan
```
Apply the Terraform configuration by running the following command and entering yes at the prompt:

```shell
terraform apply
```
In the Google Cloud console of the monitoring project, go to the Monitoring Dashboards page.
On the Custom tab, find the custom dashboard named Cloud Composer - Monitoring Platform.