Cloud Composer 1 | Cloud Composer 2 | Cloud Composer 3
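This page shows how to implement an integrated monitoring dashboard for multiple Cloud Composer environments across selected projects in the same organization.
Overview
The described solution can help central enterprise platform teams support Cloud Composer environments used by other teams. This implementation can be used to monitor all Cloud Composer environments, including those that were not created with Terraform.
This guide implements a Cloud Monitoring dashboard for Cloud Composer, together with alerting policies that continuously report key metrics of Cloud Composer environments and raise incidents when problems occur. The dashboard automatically scans all Cloud Composer environments in the projects selected for this monitoring. The implementation relies on Terraform.
This model uses one Google Cloud project as the Monitoring Project, which provides read-only monitoring of Cloud Composer environments deployed in multiple Monitored Projects. The central dashboard renders its contents from the Cloud Monitoring metrics of the Monitored Projects.
The dashboard monitors, and creates alerts for, several metrics, including environment health and CPU metrics.
Hover over a line in a chart to see which environment it represents. The dashboard then displays the project name and the resource.
If a metric exceeds its predefined threshold, an incident is raised and the corresponding alert is displayed in the chart for that metric.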
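List of monitored metrics
The full list of monitored metrics:
- Cloud Composer environment health (based on the Monitoring DAG)
- Database health
- Web server health
- Scheduler heartbeats
- CPU and memory utilization for all workers
- CPU and memory utilization for the Airflow database
- CPU and memory utilization for the web server
- CPU and memory utilization for Airflow schedulers
- Proportion of queued, scheduled, and queued-or-scheduled tasks in an environment (useful for spotting Airflow concurrency configuration issues)
- DAG parsing time
- Current number of workers compared with the minimum number of workers (useful for understanding worker stability or scaling issues)
- Worker pod evictions
- Number of errors written to logs by workers, schedulers, the web server, or other components (individual charts)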
Before you begin
To use Cloud Composer and Cloud Monitoring, you must create a Google Cloud project and enable billing. The project must contain a Cloud Composer environment. This guide refers to this project as the Monitoring Project.
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
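- Install Terraform if it is not already installed.
- Configure the metrics scope of your project. By default, a project can display or monitor only the time series data that it stores. To display or monitor data stored in multiple projects, configure the project's metrics scope. For more information, see Metrics scopes overview.
Implementation steps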
On the local computer where you run Terraform, set the GOOGLE_CLOUD_PROJECT environment variable to the ID of your Monitoring Project:
export GOOGLE_CLOUD_PROJECT=MONITORING_PROJECT_ID
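The main.tf file in this guide does not declare a provider block, so the project comes from the environment. As a minimal sketch, assuming you keep provider settings in a separate file (the file name providers.tf is only a suggestion, not part of the original guide), the provider can be declared without a project argument so that it falls back to GOOGLE_CLOUD_PROJECT:

```hcl
# providers.tf (hypothetical file name, not part of the original guide)
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
  }
}

# No "project" argument is set here on purpose: the Google provider
# reads the project ID from the GOOGLE_CLOUD_PROJECT environment variable.
provider "google" {}
```

Leaving the project out of the configuration keeps the same files reusable for a different Monitoring Project by changing only the environment variable.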
Make sure that the Terraform Google provider is authenticated and has access to the following permissions:
- roles/monitoring.editor in the Monitoring Project
- roles/monitoring.viewer and roles/logging.viewer in all Monitored Projects
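The roles listed above could be granted with Terraform as well. The following is a sketch under stated assumptions: the variable terraform_identity and the local names are hypothetical, the project IDs are the same placeholders used in main.tf, and an administrator who is already allowed to change IAM policies would apply this from a separate configuration.

```hcl
# Hypothetical variable: the IAM member that runs the monitoring
# configuration, for example "serviceAccount:NAME@PROJECT.iam.gserviceaccount.com".
variable "terraform_identity" {
  type = string
}

locals {
  # Placeholders, as in main.tf.
  monitored_projects = ["YOUR_PROJECT_TO_MONITOR_1", "YOUR_PROJECT_TO_MONITOR_2"]
  viewer_roles       = ["roles/monitoring.viewer", "roles/logging.viewer"]

  # One map entry per (project, role) pair.
  viewer_bindings = {
    for pair in setproduct(local.monitored_projects, local.viewer_roles) :
    "${pair[0]}/${pair[1]}" => { project = pair[0], role = pair[1] }
  }
}

# Editor access in the Monitoring Project.
resource "google_project_iam_member" "monitoring_editor" {
  project = "YOUR_MONITORING_PROJECT"
  role    = "roles/monitoring.editor"
  member  = var.terraform_identity
}

# Read-only access in every Monitored Project.
resource "google_project_iam_member" "monitored_viewers" {
  for_each = local.viewer_bindings
  project  = each.value.project
  role     = each.value.role
  member   = var.terraform_identity
}
```

google_project_iam_member is additive, so it does not remove other members that already hold these roles.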
Copy the following main.tf file to the local computer where you run Terraform.
# Monitoring for multiple Cloud Composer environments
#
# Usage:
# 1. Create a new project that you will use for monitoring of Cloud Composer environments in other projects
# 2. Replace YOUR_MONITORING_PROJECT with the name of this project in the "metrics_scope" parameter that is part of the "Add Monitored Projects to the Monitoring project" section
# 3. Replace the list of projects to monitor with your list of projects with Cloud Composer environments to be monitored in the "for_each" parameter of the "Add Monitored Projects to the Monitoring project" section
# 4. Set up your environment and apply the configuration following these steps: https://cloud.google.com/docs/terraform/basic-commands. Your GOOGLE_CLOUD_PROJECT environment variable should be the new monitoring project you just created.
#
# The script creates the following resources in the monitoring project:
# 1. Adds monitored projects to Cloud Monitoring
# 2. Creates Alert Policies
# 3. Creates Monitoring Dashboard
#

#######################################################
#
# Add Monitored Projects to the Monitoring project
#
########################################################

resource "google_monitoring_monitored_project" "projects_monitored" {
  for_each      = toset(["YOUR_PROJECT_TO_MONITOR_1", "YOUR_PROJECT_TO_MONITOR_2", "YOUR_PROJECT_TO_MONITOR_3"])
  metrics_scope = join("", ["locations/global/metricsScopes/", "YOUR_MONITORING_PROJECT"])
  name          = each.value
}

#######################################################
#
# Create alert policies in Monitoring project
#
########################################################

resource "google_monitoring_alert_policy" "environment_health" {
  display_name = "Environment Health"
  combiner     = "OR"
  conditions {
    display_name = "Environment Health"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| {metric 'composer.googleapis.com/environment/dagbag_size'",
        "| group_by 5m, [value_dagbag_size_mean: if(mean(value.dagbag_size) > 0, 1, 0)]",
        "| align mean_aligner(5m)",
        "| group_by [resource.project_id, resource.environment_name], [value_dagbag_size_mean_aggregate: aggregate(value_dagbag_size_mean)]; ",
        "metric 'composer.googleapis.com/environment/healthy'",
        "| group_by 5m, [value_sum_signals: aggregate(if(value.healthy,1,0))]",
        "| align mean_aligner(5m)| absent_for 5m }",
        "| outer_join 0",
        "| group_by [resource.project_id, resource.environment_name]",
        "| value val(2)",
        "| align mean_aligner(5m)",
        "| window(5m)",
        "| condition val(0) < 0.9"
      ])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "database_health" {
  display_name = "Database Health"
  combiner     = "OR"
  conditions {
    display_name = "Database Health"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/database_health'",
        "| group_by 5m,",
        " [value_database_health_fraction_true: fraction_true(value.database_health)]",
        "| every 5m",
        "| group_by 5m,",
        " [value_database_health_fraction_true_aggregate:",
        " aggregate(value_database_health_fraction_true)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name],",
        " [value_database_health_fraction_true_aggregate_aggregate:",
        " aggregate(value_database_health_fraction_true_aggregate)]",
        "| condition val() < 0.95"])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "webserver_health" {
  display_name = "Web Server Health"
  combiner     = "OR"
  conditions {
    display_name = "Web Server Health"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/web_server/health'",
        "| group_by 5m, [value_health_fraction_true: fraction_true(value.health)]",
        "| every 5m",
        "| group_by 5m,",
        " [value_health_fraction_true_aggregate:",
        " aggregate(value_health_fraction_true)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name],",
        " [value_health_fraction_true_aggregate_aggregate:",
        " aggregate(value_health_fraction_true_aggregate)]",
        "| condition val() < 0.95"])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "scheduler_heartbeat" {
  display_name = "Scheduler Heartbeat"
  combiner     = "OR"
  conditions {
    display_name = "Scheduler Heartbeat"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/scheduler_heartbeat_count'",
        "| group_by 10m,",
        " [value_scheduler_heartbeat_count_aggregate:",
        " aggregate(value.scheduler_heartbeat_count)]",
        "| every 10m",
        "| group_by 10m,",
        " [value_scheduler_heartbeat_count_aggregate_mean:",
        " mean(value_scheduler_heartbeat_count_aggregate)]",
        "| every 10m",
        "| group_by [resource.project_id, resource.environment_name],",
        " [value_scheduler_heartbeat_count_aggregate_mean_aggregate:",
        " aggregate(value_scheduler_heartbeat_count_aggregate_mean)]",
        "| condition val() < 80"])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "database_cpu" {
  display_name = "Database CPU"
  combiner     = "OR"
  conditions {
    display_name = "Database CPU"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/database/cpu/utilization'",
        "| group_by 10m, [value_utilization_mean: mean(value.utilization)]",
        "| every 10m",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() > 0.8"])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "scheduler_cpu" {
  display_name = "Scheduler CPU"
  combiner     = "OR"
  conditions {
    display_name = "Scheduler CPU"
    condition_monitoring_query_language {
      query = join("", [
        "fetch k8s_container",
        "| metric 'kubernetes.io/container/cpu/limit_utilization'",
        "| filter (resource.pod_name =~ 'airflow-scheduler-.*')",
        "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]",
        "| every 10m",
        "| group_by [resource.cluster_name],",
        " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]",
        "| condition val() > 0.8"])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "worker_cpu" {
  display_name = "Worker CPU"
  combiner     = "OR"
  conditions {
    display_name = "Worker CPU"
    condition_monitoring_query_language {
      query = join("", [
        "fetch k8s_container",
        "| metric 'kubernetes.io/container/cpu/limit_utilization'",
        "| filter (resource.pod_name =~ 'airflow-worker.*')",
        "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]",
        "| every 10m",
        "| group_by [resource.cluster_name],",
        " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]",
        "| condition val() > 0.8"])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "webserver_cpu" {
  display_name = "Web Server CPU"
  combiner     = "OR"
  conditions {
    display_name = "Web Server CPU"
    condition_monitoring_query_language {
      query = join("", [
        "fetch k8s_container",
        "| metric 'kubernetes.io/container/cpu/limit_utilization'",
        "| filter (resource.pod_name =~ 'airflow-webserver.*')",
        "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]",
        "| every 10m",
        "| group_by [resource.cluster_name],",
        " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]",
        "| condition val() > 0.8"])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "parsing_time" {
  display_name = "DAG Parsing Time"
  combiner     = "OR"
  conditions {
    display_name = "DAG Parsing Time"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/dag_processing/total_parse_time'",
        "| group_by 5m, [value_total_parse_time_mean: mean(value.total_parse_time)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val(0) > cast_units(30,\"s\")"])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "database_memory" {
  display_name = "Database Memory"
  combiner     = "OR"
  conditions {
    display_name = "Database Memory"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/database/memory/utilization'",
        "| group_by 10m, [value_utilization_mean: mean(value.utilization)]",
        "| every 10m",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() > 0.8"])
      duration = "0s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "scheduler_memory" {
  display_name = "Scheduler Memory"
  combiner     = "OR"
  conditions {
    display_name = "Scheduler Memory"
    condition_monitoring_query_language {
      query = join("", [
        "fetch k8s_container",
        "| metric 'kubernetes.io/container/memory/limit_utilization'",
        "| filter (resource.pod_name =~ 'airflow-scheduler-.*')",
        "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]",
        "| every 10m",
        "| group_by [resource.cluster_name],",
        " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]",
        "| condition val() > 0.8"])
      duration = "0s"
      trigger {
        count = "1"
      }
    }
  }
  documentation {
    content = join("", [
      "Scheduler Memory exceeds a threshold, summed across all schedulers in the environment. ",
      "Add more schedulers OR increase scheduler's memory OR reduce scheduling load (e.g. through lower parsing frequency or lower number of DAGs/tasks running"])
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "worker_memory" {
  display_name = "Worker Memory"
  combiner     = "OR"
  conditions {
    display_name = "Worker Memory"
    condition_monitoring_query_language {
      query = join("", [
        "fetch k8s_container",
        "| metric 'kubernetes.io/container/memory/limit_utilization'",
        "| filter (resource.pod_name =~ 'airflow-worker.*')",
        "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]",
        "| every 10m",
        "| group_by [resource.cluster_name],",
        " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]",
        "| condition val() > 0.8"])
      duration = "0s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "webserver_memory" {
  display_name = "Web Server Memory"
  combiner     = "OR"
  conditions {
    display_name = "Web Server Memory"
    condition_monitoring_query_language {
      query = join("", [
        "fetch k8s_container",
        "| metric 'kubernetes.io/container/memory/limit_utilization'",
        "| filter (resource.pod_name =~ 'airflow-webserver.*')",
        "| group_by 10m, [value_limit_utilization_mean: mean(value.limit_utilization)]",
        "| every 10m",
        "| group_by [resource.cluster_name],",
        " [value_limit_utilization_mean_mean: mean(value_limit_utilization_mean)]",
        "| condition val() > 0.8"])
      duration = "0s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "scheduled_tasks_percentage" {
  display_name = "Scheduled Tasks Percentage"
  combiner     = "OR"
  conditions {
    display_name = "Scheduled Tasks Percentage"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/unfinished_task_instances'",
        "| align mean_aligner(10m)",
        "| every(10m)",
        "| window(10m)",
        "| filter_ratio_by [resource.project_id, resource.environment_name], metric.state = 'scheduled'",
        "| condition val() > 0.80"])
      duration = "300s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "queued_tasks_percentage" {
  display_name = "Queued Tasks Percentage"
  combiner     = "OR"
  conditions {
    display_name = "Queued Tasks Percentage"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/unfinished_task_instances'",
        "| align mean_aligner(10m)",
        "| every(10m)",
        "| window(10m)",
        "| filter_ratio_by [resource.project_id, resource.environment_name], metric.state = 'queued'",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() > 0.95"])
      duration = "300s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "queued_or_scheduled_tasks_percentage" {
  display_name = "Queued or Scheduled Tasks Percentage"
  combiner     = "OR"
  conditions {
    display_name = "Queued or Scheduled Tasks Percentage"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/unfinished_task_instances'",
        "| align mean_aligner(10m)",
        "| every(10m)",
        "| window(10m)",
        "| filter_ratio_by [resource.project_id, resource.environment_name], or(metric.state = 'queued', metric.state = 'scheduled' )",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() > 0.80"])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "workers_above_minimum" {
  display_name = "Workers above minimum (negative = missing workers)"
  combiner     = "OR"
  conditions {
    display_name = "Workers above minimum"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| { metric 'composer.googleapis.com/environment/num_celery_workers'",
        "| group_by 5m, [value_num_celery_workers_mean: mean(value.num_celery_workers)]",
        "| every 5m",
        "; metric 'composer.googleapis.com/environment/worker/min_workers'",
        "| group_by 5m, [value_min_workers_mean: mean(value.min_workers)]",
        "| every 5m }",
        "| outer_join 0",
        "| sub",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() < 0"])
      duration = "0s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "pod_evictions" {
  display_name = "Worker pod evictions"
  combiner     = "OR"
  conditions {
    display_name = "Worker pod evictions"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/worker/pod_eviction_count'",
        "| align delta(1m)",
        "| every 1m",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() > 0"])
      duration = "60s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "scheduler_errors" {
  display_name = "Scheduler Errors"
  combiner     = "OR"
  conditions {
    display_name = "Scheduler Errors"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'logging.googleapis.com/log_entry_count'",
        "| filter (metric.log == 'airflow-scheduler' && metric.severity == 'ERROR')",
        "| group_by 5m,",
        " [value_log_entry_count_aggregate: aggregate(value.log_entry_count)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name],",
        " [value_log_entry_count_aggregate_max: max(value_log_entry_count_aggregate)]",
        "| condition val() > 50"])
      duration = "300s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "worker_errors" {
  display_name = "Worker Errors"
  combiner     = "OR"
  conditions {
    display_name = "Worker Errors"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'logging.googleapis.com/log_entry_count'",
        "| filter (metric.log == 'airflow-worker' && metric.severity == 'ERROR')",
        "| group_by 5m,",
        " [value_log_entry_count_aggregate: aggregate(value.log_entry_count)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name],",
        " [value_log_entry_count_aggregate_max: max(value_log_entry_count_aggregate)]",
        "| condition val() > 50"])
      duration = "300s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "webserver_errors" {
  display_name = "Web Server Errors"
  combiner     = "OR"
  conditions {
    display_name = "Web Server Errors"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'logging.googleapis.com/log_entry_count'",
        "| filter (metric.log == 'airflow-webserver' && metric.severity == 'ERROR')",
        "| group_by 5m,",
        " [value_log_entry_count_aggregate: aggregate(value.log_entry_count)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name],",
        " [value_log_entry_count_aggregate_max: max(value_log_entry_count_aggregate)]",
        "| condition val() > 50"])
      duration = "300s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

resource "google_monitoring_alert_policy" "other_errors" {
  display_name = "Other Errors"
  combiner     = "OR"
  conditions {
    display_name = "Other Errors"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'logging.googleapis.com/log_entry_count'",
        "| filter",
        " (metric.log !~ 'airflow-scheduler|airflow-worker|airflow-webserver'",
        " && metric.severity == 'ERROR')",
        "| group_by 5m, [value_log_entry_count_max: max(value.log_entry_count)]",
        "| every 5m",
        "| group_by [resource.project_id, resource.environment_name],",
        " [value_log_entry_count_max_aggregate: aggregate(value_log_entry_count_max)]",
        "| condition val() > 10"])
      duration = "300s"
      trigger {
        count = "1"
      }
    }
  }
  # uncomment to set an auto close strategy for the alert
  #alert_strategy {
  #    auto_close = "30m"
  #}
}

#######################################################
#
# Create Monitoring Dashboard
#
########################################################

resource "google_monitoring_dashboard" "Composer_Dashboard" {
  dashboard_json = <<EOF
{
  "category": "CUSTOM",
  "displayName": "Cloud Composer - Monitoring Platform",
  "mosaicLayout": {
    "columns": 12,
    "tiles": [
      { "height": 1, "widget": { "text": { "content": "", "format": "MARKDOWN" }, "title": "Health" }, "width": 12, "xPos": 0, "yPos": 0 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.environment_health.name}" } }, "width": 6, "xPos": 0, "yPos": 1 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.database_health.name}" } }, "width": 6, "xPos": 6, "yPos": 1 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_health.name}" } }, "width": 6, "xPos": 0, "yPos": 5 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_heartbeat.name}" } }, "width": 6, "xPos": 6, "yPos": 5 },
      { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Airflow Task Execution and DAG Parsing" }, "width": 12, "xPos": 0, "yPos": 9 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduled_tasks_percentage.name}" } }, "width": 6, "xPos": 0, "yPos": 10 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.queued_tasks_percentage.name}" } }, "width": 6, "xPos": 6, "yPos": 10 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.queued_or_scheduled_tasks_percentage.name}" } }, "width": 6, "xPos": 0, "yPos": 14 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.parsing_time.name}" } }, "width": 6, "xPos": 6, "yPos": 14 },
      { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Workers presence" }, "width": 12, "xPos": 0, "yPos": 18 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.workers_above_minimum.name}" } }, "width": 6, "xPos": 0, "yPos": 19 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.pod_evictions.name}" } }, "width": 6, "xPos": 6, "yPos": 19 },
      { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "CPU Utilization" }, "width": 12, "xPos": 0, "yPos": 23 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.database_cpu.name}" } }, "width": 6, "xPos": 0, "yPos": 24 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_cpu.name}" } }, "width": 6, "xPos": 6, "yPos": 24 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.worker_cpu.name}" } }, "width": 6, "xPos": 0, "yPos": 28 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_cpu.name}" } }, "width": 6, "xPos": 6, "yPos": 28 },
      { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Memory Utilization" }, "width": 12, "xPos": 0, "yPos": 32 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.database_memory.name}" } }, "width": 6, "xPos": 0, "yPos": 33 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_memory.name}" } }, "width": 6, "xPos": 6, "yPos": 33 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.worker_memory.name}" } }, "width": 6, "xPos": 0, "yPos": 37 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_memory.name}" } }, "width": 6, "xPos": 6, "yPos": 37 },
      { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Airflow component errors" }, "width": 12, "xPos": 0, "yPos": 41 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.scheduler_errors.name}" } }, "width": 6, "xPos": 0, "yPos": 42 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.worker_errors.name}" } }, "width": 6, "xPos": 6, "yPos": 42 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.webserver_errors.name}" } }, "width": 6, "xPos": 0, "yPos": 48 },
      { "height": 4, "widget": { "alertChart": { "name": "${google_monitoring_alert_policy.other_errors.name}" } }, "width": 6, "xPos": 6, "yPos": 48 },
      { "height": 1, "widget": { "text": { "content": "", "format": "RAW" }, "title": "Task errors" }, "width": 12, "xPos": 0, "yPos": 52 }
    ]
  }
}
EOF
}
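The alert policies in main.tf raise incidents but do not attach notification channels, and their auto close strategy is commented out. The following sketch shows how one policy might enable both. It makes several assumptions: the resource names platform_team_email and database_cpu_with_notifications and the email address are hypothetical, and the auto close period is written as "1800s" because the provider documentation describes auto_close as a duration in seconds, so the "30m" value in the commented template may need the same conversion.

```hcl
# Hypothetical email channel; replace the address with a real one.
resource "google_monitoring_notification_channel" "platform_team_email" {
  display_name = "Composer platform team (example)"
  type         = "email"
  labels = {
    email_address = "platform-team@example.com"
  }
}

# Same query as the "database_cpu" policy in main.tf, under a different
# (hypothetical) resource name so that it does not clash with it.
resource "google_monitoring_alert_policy" "database_cpu_with_notifications" {
  display_name = "Database CPU (with notifications)"
  combiner     = "OR"
  conditions {
    display_name = "Database CPU"
    condition_monitoring_query_language {
      query = join("", [
        "fetch cloud_composer_environment",
        "| metric 'composer.googleapis.com/environment/database/cpu/utilization'",
        "| group_by 10m, [value_utilization_mean: mean(value.utilization)]",
        "| every 10m",
        "| group_by [resource.project_id, resource.environment_name]",
        "| condition val() > 0.8"])
      duration = "120s"
      trigger {
        count = "1"
      }
    }
  }

  # Send incidents for this policy to the channel defined above.
  notification_channels = [
    google_monitoring_notification_channel.platform_team_email.name
  ]

  # Close open incidents after 30 minutes without data (30 * 60 = 1800 seconds).
  alert_strategy {
    auto_close = "1800s"
  }
}
```

In practice you would add the notification_channels argument and the alert_strategy block to the existing policies rather than duplicate them; the separate resource here only keeps the sketch self-contained.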
Edit the "google_monitoring_monitored_project" resource block:
- Replace the list of projects in the for_each argument with your Monitored Projects.
- Replace "YOUR_MONITORING_PROJECT" in metrics_scope with the name of your Monitoring Project.
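Instead of editing the placeholders in place, the same block could read the project names from input variables. This is only a sketch: the variable names monitoring_project and monitored_projects are hypothetical, and the block would replace the existing "projects_monitored" resource in main.tf rather than be added next to it.

```hcl
# Hypothetical input variables; they are not part of the original main.tf.
variable "monitoring_project" {
  description = "Name of the Monitoring Project that hosts the dashboard"
  type        = string
}

variable "monitored_projects" {
  description = "Projects with Cloud Composer environments to monitor"
  type        = list(string)
}

# Replaces the "projects_monitored" resource from main.tf.
resource "google_monitoring_monitored_project" "projects_monitored" {
  for_each      = toset(var.monitored_projects)
  metrics_scope = "locations/global/metricsScopes/${var.monitoring_project}"
  name          = each.value
}
```

The values can then be supplied in a terraform.tfvars file or with -var flags, so adding a Monitored Project later does not require touching the resource definition.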
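Review the configuration and verify that the resources that Terraform is going to create or update match your expectations. Make corrections as necessary.
terraform plan
Apply the Terraform configuration by running the following command and entering yes at the prompt:
terraform apply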
In the Google Cloud console for your Monitoring Project, go to the Monitoring dashboards page.
On the Custom tab, find the custom dashboard named Cloud Composer - Monitoring Platform.
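To locate the dashboard without browsing the console, the configuration could also expose its identifier as a Terraform output. A minimal sketch, assuming the output name composer_dashboard_id (hypothetical) and the dashboard resource exactly as it is named in main.tf:

```hcl
# Hypothetical output; prints the dashboard's resource identifier
# at the end of "terraform apply".
output "composer_dashboard_id" {
  description = "Identifier of the Cloud Composer monitoring dashboard"
  value       = google_monitoring_dashboard.Composer_Dashboard.id
}
```

After applying, terraform output composer_dashboard_id prints the value again on demand.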