Monitor environments with Cloud Monitoring

You can use Cloud Monitoring and Cloud Logging with Cloud Composer.

Cloud Monitoring provides visibility into the performance, uptime, and overall health of cloud-powered applications. Cloud Monitoring collects and ingests metrics, events, and metadata from Cloud Composer to generate insights in dashboards and charts. You can use Cloud Monitoring to understand the performance and health of your Cloud Composer environments and Airflow metrics.

Logging captures logs produced by the scheduler and worker containers in your environment's cluster. These logs contain system-level and Airflow dependency information to help with debugging. For information about viewing logs, see View Airflow logs.

Before you begin

  • The following permissions are required to access logs and metrics for your Cloud Composer environment:

    • Read-only access to logs and metrics: logging.viewer and monitoring.viewer
    • Read-only access to logs, including private logs: logging.privateLogViewer
    • Read/write access to metrics: monitoring.editor

    For more information about other permissions and roles for Cloud Composer, see Access control.

  • To avoid duplicate logging, Cloud Logging for Google Kubernetes Engine is disabled.

  • Cloud Logging produces an entry for each status and event that occurs in your Google Cloud project. You can use exclusion filters to reduce the volume of logs, including the logs that Cloud Logging produces for Cloud Composer.

    Excluding logs produced by jobs.py can cause health check failures and CrashLoopBackOff errors. To prevent these logs from being excluded, include -jobs.py in your exclusion filters (see the sketch after this list).

  • Monitoring cannot plot the count values for DAGs and tasks that execute more than once per minute, and does not plot metrics for failed tasks.
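
If you manage log exclusions programmatically, the following sketch shows one way to create such a filter with the Cloud Logging Python client while keeping jobs.py entries. The project ID, exclusion name, and the filter expression itself (including how jobs.py appears in your log entries) are assumptions for illustration; verify the filter in the Logs Explorer before relying on it.

```python
# Hypothetical sketch: create a log exclusion that drops INFO-level
# airflow-worker entries but never excludes lines produced by jobs.py,
# so that Cloud Composer health checks keep working.
from google.cloud.logging_v2.services.config_service_v2 import ConfigServiceV2Client
from google.cloud.logging_v2.types import LogExclusion

PROJECT_ID = "your-project-id"  # placeholder

# Example filter only: adjust the log name, severity, and the way jobs.py
# shows up in your entries to match the logs you actually want to drop.
exclusion_filter = (
    'resource.type="cloud_composer_environment" '
    'AND log_id("airflow-worker") '
    'AND severity=INFO '
    'AND NOT textPayload:"jobs.py"'  # keeps jobs.py logs out of the exclusion
)

client = ConfigServiceV2Client()
exclusion = client.create_exclusion(
    parent=f"projects/{PROJECT_ID}",
    exclusion=LogExclusion(
        name="composer-worker-info-logs",
        description="Drop INFO worker logs, but keep jobs.py entries",
        filter=exclusion_filter,
    ),
)
print(f"Created exclusion: {exclusion.name}")
```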

Environment metrics

You can use environment metrics to check the resource usage and health of your Cloud Composer environments.

Environment health

To check the health of your environment, you can use the following health status metric: composer.googleapis.com/environment/healthy.

Cloud Composer runs a liveness DAG named airflow_monitoring every five minutes and reports environment health as follows:

  • If the liveness DAG run finishes successfully, the health status is True.
  • If the liveness DAG run fails, the health status is False.
  • If the liveness DAG run does not finish, Cloud Composer polls the liveness DAG state every 5 minutes and reports False after 20 minutes have passed.

The liveness DAG is stored in the dags/ folder and is visible in the Airflow UI. Do not change the schedule or contents of the liveness DAG; any changes you make to it do not persist.
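
For example, the following sketch reads recent values of this metric with the Cloud Monitoring Python client (google-cloud-monitoring). The project and environment names are placeholders; the same pattern works for the other metrics on this page by changing the metric type in the filter.

```python
import time

from google.cloud import monitoring_v3

PROJECT_ID = "your-project-id"          # placeholder
ENVIRONMENT_NAME = "your-environment"   # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": (
            'metric.type = "composer.googleapis.com/environment/healthy" '
            f'AND resource.labels.environment_name = "{ENVIRONMENT_NAME}"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    for point in series.points:
        # environment/healthy is a boolean metric: True means the most
        # recent liveness DAG run finished successfully.
        print(point.interval.end_time, point.value.bool_value)
```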

Database health

To check the health of your database, you can use the following health status metric: composer.googleapis.com/environment/database_health.

The Airflow monitoring pod pings the database every minute and reports health status as True if a SQL connection can be established or False if not.
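
Because this is a boolean metric, you can aggregate it to estimate availability. The following sketch computes the fraction of samples that reported a healthy database over the last 24 hours; the project and environment names are placeholders.

```python
import time

from google.cloud import monitoring_v3

PROJECT_ID = "your-project-id"
ENVIRONMENT_NAME = "your-environment"

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 24 * 3600}, "end_time": {"seconds": now}}
)

# ALIGN_FRACTION_TRUE turns the boolean samples into the fraction of
# samples that were True within each alignment period.
aggregation = monitoring_v3.Aggregation(
    {
        "alignment_period": {"seconds": 24 * 3600},
        "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_FRACTION_TRUE,
    }
)

results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": (
            'metric.type = "composer.googleapis.com/environment/database_health" '
            f'AND resource.labels.environment_name = "{ENVIRONMENT_NAME}"'
        ),
        "interval": interval,
        "aggregation": aggregation,
    }
)

for series in results:
    for point in series.points:
        print(f"Fraction of healthy samples over the last day: {point.value.double_value:.3f}")
```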

Database metrics

The following environment metrics are available for the Airflow metadata database used by Cloud Composer environments. You can use these metrics to monitor the performance and resource usage of your environment's database instance.

For example, you might want to upgrade the Cloud SQL machine type of your environment if your environment approaches resource limits. Or you might want to optimize costs related to Airflow metadata database usage by doing a database cleanup, in order to keep storage under a certain threshold.

  • Database CPU usage: composer.googleapis.com/environment/database/cpu/usage_time
  • Database CPU cores: composer.googleapis.com/environment/database/cpu/reserved_cores
  • Database CPU utilization: composer.googleapis.com/environment/database/cpu/utilization
  • Database memory usage: composer.googleapis.com/environment/database/memory/bytes_used
  • Database memory quota: composer.googleapis.com/environment/database/memory/quota
  • Database memory utilization: composer.googleapis.com/environment/database/memory/utilization
  • Database disk usage: composer.googleapis.com/environment/database/disk/bytes_used
  • Database disk quota: composer.googleapis.com/environment/database/disk/quota
  • Database disk utilization: composer.googleapis.com/environment/database/disk/utilization
  • Database connections limit: composer.googleapis.com/environment/database/network/max_connections
  • Database connections: composer.googleapis.com/environment/database/network/connections
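
For example, the following sketch flags high database disk utilization over the last hour, assuming the utilization metric is reported as a fraction between 0 and 1. The project and environment names and the 0.8 threshold are placeholder assumptions.

```python
import time

from google.cloud import monitoring_v3

PROJECT_ID = "your-project-id"
ENVIRONMENT_NAME = "your-environment"

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

# Take the highest utilization sample seen in the last hour.
aggregation = monitoring_v3.Aggregation(
    {
        "alignment_period": {"seconds": 3600},
        "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_MAX,
    }
)

results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": (
            'metric.type = "composer.googleapis.com/environment/database/disk/utilization" '
            f'AND resource.labels.environment_name = "{ENVIRONMENT_NAME}"'
        ),
        "interval": interval,
        "aggregation": aggregation,
    }
)

for series in results:
    for point in series.points:
        if point.value.double_value > 0.8:
            print(
                "Database disk utilization exceeded 80% in the last hour; "
                "consider a database cleanup or a larger Cloud SQL machine type."
            )
```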

Web server metrics

The following environment metrics are available for the Airflow web server used by Cloud Composer environments. You can use these metrics to check the performance and resource usage of your environment's Airflow web server instance.

For example, you might want to upgrade the web server machine type if it constantly approaches resource limits.

  • Web server CPU usage: composer.googleapis.com/environment/web_server/cpu/usage_time
  • Web server CPU quota: composer.googleapis.com/environment/web_server/cpu/reserved_cores
  • Web server memory usage: composer.googleapis.com/environment/web_server/memory/bytes_used
  • Web server memory quota: composer.googleapis.com/environment/web_server/memory/quota
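
There is no predefined utilization metric for the web server in this list, but you can approximate one by dividing memory usage by the memory quota. The following sketch does this for the most recent samples; the project and environment names and the latest_value helper are assumptions for illustration.

```python
import time

from google.cloud import monitoring_v3

PROJECT_ID = "your-project-id"
ENVIRONMENT_NAME = "your-environment"

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)


def latest_value(metric_type: str) -> float:
    """Returns the most recent sample of a web server metric, or 0.0."""
    results = client.list_time_series(
        request={
            "name": f"projects/{PROJECT_ID}",
            "filter": (
                f'metric.type = "{metric_type}" '
                f'AND resource.labels.environment_name = "{ENVIRONMENT_NAME}"'
            ),
            "interval": interval,
        }
    )
    for series in results:
        if series.points:
            # Points are returned newest first; assumes int64 byte values.
            return float(series.points[0].value.int64_value)
    return 0.0


used = latest_value("composer.googleapis.com/environment/web_server/memory/bytes_used")
quota = latest_value("composer.googleapis.com/environment/web_server/memory/quota")
if quota:
    print(f"Web server memory utilization: {used / quota:.0%}")
```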

DAG metrics

To help you monitor the efficiency of your DAG runs and identify tasks that cause high latency, the following DAG metrics are available:

  • Number of DAG runs: composer.googleapis.com/workflow/run_count
  • Duration of each DAG run: composer.googleapis.com/workflow/run_duration
  • Number of task runs: composer.googleapis.com/workflow/task/run_count
  • Duration of each task run: composer.googleapis.com/workflow/task/run_duration

Cloud Monitoring shows only the metrics for completed workflow and task runs (success or failure). When there is no workflow activity, or while workflow and task runs are still in progress, Cloud Monitoring displays No Data.
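
For example, the following sketch lists recent run counts for a single DAG by filtering on the workflow_name label of the Cloud Composer Workflow resource. The project ID and DAG ID are placeholders.

```python
import time

from google.cloud import monitoring_v3

PROJECT_ID = "your-project-id"
DAG_ID = "your_dag_id"

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 24 * 3600}, "end_time": {"seconds": now}}
)

results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": (
            'metric.type = "composer.googleapis.com/workflow/run_count" '
            f'AND resource.labels.workflow_name = "{DAG_ID}"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    # Each series carries metric labels, such as the state of the runs it counts.
    print(dict(series.metric.labels))
    for point in series.points:
        print(point.interval.end_time, point.value.int64_value)
```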

Celery Executor metrics

The following Celery Executor metrics are available. These metrics can help you determine if there are sufficient worker resources in your environment.

  • Number of tasks in the queue: composer.googleapis.com/environment/task_queue_length
  • Number of online Celery workers: composer.googleapis.com/environment/num_celery_workers
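
The following sketch compares the most recent queue length with the number of online Celery workers, as a rough signal that worker capacity may be too low. The project and environment names are placeholders, and the latest helper assumes both metrics report integer values.

```python
import time

from google.cloud import monitoring_v3

PROJECT_ID = "your-project-id"
ENVIRONMENT_NAME = "your-environment"

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 600}, "end_time": {"seconds": now}}
)


def latest(metric_type: str) -> int:
    """Returns the most recent sample of an environment metric, or 0."""
    results = client.list_time_series(
        request={
            "name": f"projects/{PROJECT_ID}",
            "filter": (
                f'metric.type = "{metric_type}" '
                f'AND resource.labels.environment_name = "{ENVIRONMENT_NAME}"'
            ),
            "interval": interval,
        }
    )
    for series in results:
        if series.points:
            return int(series.points[0].value.int64_value)
    return 0


queued = latest("composer.googleapis.com/environment/task_queue_length")
workers = latest("composer.googleapis.com/environment/num_celery_workers")
print(f"{queued} queued tasks across {workers} online Celery workers")
```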

Airflow metrics

The following Airflow metrics are available:

  • Open slots in the pool (composer.googleapis.com/environment/pool/open_slots): number of open slots in the pool.
  • Running slots in the pool (composer.googleapis.com/environment/pool/running_slots): number of running slots in the pool.
  • Starving tasks in the pool (composer.googleapis.com/environment/pool/starving_tasks): number of starving tasks in the pool.
  • Smart sensor poked tasks (composer.googleapis.com/environment/smart_sensor/poked_tasks): number of tasks poked by the smart sensor in the previous poking loop.
  • Smart sensor successfully poked tasks (composer.googleapis.com/environment/smart_sensor/poked_success): number of newly succeeded tasks poked by the smart sensor in the previous poking loop.
  • Smart sensor poking exceptions (composer.googleapis.com/environment/smart_sensor/poked_exception): number of exceptions in the previous smart sensor poking loop.
  • Smart sensor poking exception failures (composer.googleapis.com/environment/smart_sensor/exception_failures): number of failures caused by exceptions in the previous smart sensor poking loop.
  • Smart sensor poking infrastructure failures (composer.googleapis.com/environment/smart_sensor/infra_failures): number of infrastructure failures in the previous smart sensor poking loop.
  • Failed SLA miss email notifications (composer.googleapis.com/environment/email/sla_notification_failure_count): number of failed SLA miss email notification attempts.
  • Blocking triggers (composer.googleapis.com/environment/trigger/blocking_count): number of triggers that blocked the main thread, likely because they were not fully asynchronous.
  • Failed triggers (composer.googleapis.com/environment/trigger/failed_count): number of triggers that errored before they could fire an event.
  • Succeeded triggers (composer.googleapis.com/environment/trigger/succeeded_count): number of triggers that have fired at least one event.

Using Monitoring for Cloud Composer environments

Console

You can use Metrics Explorer to display metrics related to your environments and DAGs:

  • The Cloud Composer Environment resource contains metrics for environments.

    To show metrics for a specific environment, filter metrics by the environment_name label. You can also filter by other labels, such as the environment's location or image version.

  • The Cloud Composer Workflow resource contains metrics for DAGs.

    To show metrics for a specific DAG or task, filter metrics by the workflow_name and task_name labels. You can also filter by other labels, such as the task status or the Airflow operator name.

API and gcloud

You can create and manage custom dashboards and their widgets through the Cloud Monitoring API and the gcloud monitoring dashboards command. For more information, see Manage dashboards by API.
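
For example, the following sketch builds a minimal dashboard definition as JSON and creates it with the gcloud monitoring dashboards command. The dashboard name, layout, and the two charts shown are illustrative assumptions, not a required configuration.

```python
import json

# Dashboard resource as accepted by the Cloud Monitoring dashboards API.
dashboard = {
    "displayName": "Cloud Composer environment overview",
    "gridLayout": {
        "columns": 2,
        "widgets": [
            {
                "title": "Environment health",
                "xyChart": {
                    "dataSets": [
                        {
                            "timeSeriesQuery": {
                                "timeSeriesFilter": {
                                    "filter": 'metric.type = "composer.googleapis.com/environment/healthy"',
                                    "aggregation": {
                                        "alignmentPeriod": "300s",
                                        "perSeriesAligner": "ALIGN_COUNT_TRUE",
                                    },
                                }
                            }
                        }
                    ]
                },
            },
            {
                "title": "Task queue length",
                "xyChart": {
                    "dataSets": [
                        {
                            "timeSeriesQuery": {
                                "timeSeriesFilter": {
                                    "filter": 'metric.type = "composer.googleapis.com/environment/task_queue_length"',
                                    "aggregation": {
                                        "alignmentPeriod": "300s",
                                        "perSeriesAligner": "ALIGN_MEAN",
                                    },
                                }
                            }
                        }
                    ]
                },
            },
        ],
    },
}

with open("composer-dashboard.json", "w") as f:
    json.dump(dashboard, f, indent=2)

# Then create the dashboard, for example:
#   gcloud monitoring dashboards create --config-from-file=composer-dashboard.json
```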

For more information about resources, metrics, and filters, see the Cloud Monitoring API reference.

Using Cloud Monitoring alerts

You can create alerting policies to monitor the values of metrics and to notify you when those metrics violate a condition.

  1. In the Google Cloud console, go to the Monitoring page.

    Go to Monitoring

  2. In the Monitoring navigation pane, select Alerting.
  3. If you want to be notified and haven't yet created notification channels, click Edit Notification Channels and add your channels. Return to the Alerting page after you add your channels.
  4. From the Alerting page, select Create policy.
  5. To select the metric, expand the Select a metric menu and then do the following:
    1. To limit the menu to relevant entries, enter Cloud Composer into the filter bar. If there are no results after you filter the menu, then disable the Show only active resources & metrics toggle.
    2. For the Resource type, select Cloud Composer Environment or Cloud Composer Workflow.
    3. Select a Metric category and a Metric, and then select Apply.
  6. Click Next.
  7. The settings in the Configure alert trigger page determine when the alert is triggered. Select a condition type and, if necessary, specify a threshold. For more information, see Condition trigger.
  8. Click Next.
  9. Optional: To add notifications to your alerting policy, click Notification channels. In the dialog, select one or more notification channels from the menu, and then click OK.
  10. Optional: Update the Incident autoclose duration. This field determines when Monitoring closes incidents in the absence of metric data.
  11. Optional: Click Documentation, and then add any information that you want included in a notification message.
  12. Click Alert name and enter a name for the alerting policy.
  13. Click Create Policy.

For more information, see Alerting policies.
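
If you prefer to manage alerting policies as code, the following sketch creates a comparable policy with the Cloud Monitoring Python client. It alerts when no healthy sample is reported for 10 minutes; the project ID, display names, threshold, and durations are placeholder assumptions, and notification channels are left empty.

```python
from google.cloud import monitoring_v3

PROJECT_ID = "your-project-id"  # placeholder

client = monitoring_v3.AlertPolicyServiceClient()

condition = monitoring_v3.AlertPolicy.Condition(
    display_name="Environment reported unhealthy",
    condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
        filter=(
            'metric.type = "composer.googleapis.com/environment/healthy" '
            'AND resource.type = "cloud_composer_environment"'
        ),
        aggregations=[
            monitoring_v3.Aggregation(
                alignment_period={"seconds": 300},
                # Count the True samples in each 5-minute window.
                per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_COUNT_TRUE,
            )
        ],
        comparison=monitoring_v3.ComparisonType.COMPARISON_LT,
        threshold_value=1,
        duration={"seconds": 600},
    ),
)

policy = monitoring_v3.AlertPolicy(
    display_name="Cloud Composer environment unhealthy",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.AND,
    conditions=[condition],
    notification_channels=[],  # add notification channel resource names here
)

created = client.create_alert_policy(
    name=f"projects/{PROJECT_ID}", alert_policy=policy
)
print(f"Created alerting policy: {created.name}")
```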

What's next