Monitoring environments in Stackdriver

You can use Stackdriver Monitoring and Stackdriver Logging with Cloud Composer.

Monitoring provides visibility into the performance, uptime, and overall health of cloud-powered applications. Stackdriver collects and ingests metrics, events, and metadata from Cloud Composer to generate insights via dashboards and charts. You can use Monitoring to understand the performance and health of your Cloud Composer environments and Airflow metrics.

Logging captures the logs that the scheduler and worker containers produce. The logs contain useful system-level and Airflow dependency information to help with debugging. For information about viewing logs, see Viewing Airflow logs.

Before you begin

  • The following permissions are required to access the logs and metrics for your Cloud Composer environment:

    • Read-only logging and monitoring: logging.viewer and monitoring.viewer
    • Read-only logging including private logs: logging.privateLogViewer
    • Read-write monitoring: monitoring.editor

      For more information, see Cloud Composer Access Control.

  • To avoid duplicate logging, Stackdriver Logging for Google Kubernetes Engine is disabled.

  • Stackdriver Logging produces an entry for each status and event that occurs in your Google Cloud Platform project. You can use exclusion filters to reduce the volume of logs, including the logs that Stackdriver produces for Cloud Composer.

  • Monitoring cannot plot count values for workflows and tasks that execute more than once per minute, and it currently does not plot metrics for failed tasks.
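As an illustration of the exclusion filters mentioned above, the following minimal sketch builds a Logging filter string that matches low-severity entries from one Cloud Composer environment. The function name, the environment name, and the use of the environment_name resource label are assumptions made for this example; check the filter against your own log entries before using it in an exclusion.

```python
def composer_exclusion_filter(environment_name, max_severity="INFO"):
    """Build a Logging filter that matches low-severity Cloud Composer entries.

    Hypothetical helper: the label name `environment_name` is an assumption
    about the cloud_composer_environment resource type.
    """
    clauses = [
        'resource.type="cloud_composer_environment"',
        f'resource.labels.environment_name="{environment_name}"',
        f"severity<={max_severity}",
    ]
    # Logging filters combine clauses with AND.
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(composer_exclusion_filter("example-environment"))
```

The resulting string can be pasted into the exclusion filter field. Keeping warnings and errors out of the exclusion preserves the entries that are most useful for debugging.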

Metrics and resource types

You can examine Airflow metrics in Monitoring for workflows (DAGs) and the Celery Executor.

Environment

Environment health

To check the health of your environment, you can use the following health status metric: composer.googleapis.com/environment/healthy

Cloud Composer runs a liveness DAG named airflow_monitoring every 5 minutes and reports environment health as follows:

  • When the DAG run finishes successfully, the health status is True.
  • If the DAG run fails, the health status is False.
  • If the DAG run does not finish, Cloud Composer polls the DAG's state every 5 minutes and reports False if the run is still unfinished when the one-hour timeout elapses.

The liveness DAG is stored in the dags/ folder and visible in the Airflow web UI.
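The reporting rules above can be sketched as a small decision function. This is an illustrative model only, not Cloud Composer's implementation: the names DagRunState and report_health are hypothetical, and returning None for a run that is still in progress before the timeout is an assumption, because the behavior in that window is not described here.

```python
from enum import Enum

TIMEOUT_MINUTES = 60  # the one-hour timeout described above


class DagRunState(Enum):
    SUCCESS = "success"
    FAILED = "failed"
    RUNNING = "running"


def report_health(state, minutes_since_start):
    """Return True or False for a health verdict, or None if undecided.

    None (no verdict yet) is an assumption made for this sketch.
    """
    if state is DagRunState.SUCCESS:
        return True
    if state is DagRunState.FAILED:
        return False
    # The run has not finished: it only counts as unhealthy after the timeout.
    if minutes_since_start >= TIMEOUT_MINUTES:
        return False
    return None


if __name__ == "__main__":
    # Simulate polling an unfinished run every 5 minutes.
    for minutes in range(0, 70, 5):
        print(minutes, report_health(DagRunState.RUNNING, minutes))
```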

Database health

To check the health of your database, you can use the following health status metric: composer.googleapis.com/environment/database_health

The Cloud Composer Airflow monitoring pod pings the database every minute and reports health status as True if a SQL connection can be established or False if not.
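The check described above amounts to "try to open a SQL connection and report whether it worked". The following minimal sketch shows that pattern, using an in-memory sqlite3 database as a stand-in because the actual driver and connection settings used by the monitoring pod are not described here. The function names are hypothetical.

```python
import sqlite3


def database_healthy(connect):
    """Return True if connect() establishes a connection, otherwise False.

    `connect` is any zero-argument callable that returns a DB-API connection.
    """
    try:
        conn = connect()
    except Exception:
        # Any failure to establish the connection counts as unhealthy.
        return False
    conn.close()
    return True


def failing_connect():
    """Stand-in for an unreachable database."""
    raise sqlite3.OperationalError("cannot reach database")


if __name__ == "__main__":
    print(database_healthy(lambda: sqlite3.connect(":memory:")))
    print(database_healthy(failing_connect))
```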

Workflows

To help you monitor the efficiency of your workflow runs and identify straggler tasks that cause long latency, the following workflow metrics are available:

Workflow Metric                  API
Number of workflow runs          composer.googleapis.com/workflow/run_count
Duration of each workflow run    composer.googleapis.com/workflow/run_duration
Number of task runs              composer.googleapis.com/workflow/task/run_count
Duration of each task            composer.googleapis.com/workflow/task/run_duration

Stackdriver shows metrics only for completed workflow and task runs (success or failure). Charts display No Data when there is no workflow activity and while workflow and task runs are still in progress.

Celery Executor

The following Celery Executor metrics are available. These metrics can help you determine if there are sufficient worker resources in your environment.

Celery Executor Metric             API
Number of tasks in the queue       composer.googleapis.com/environment/task_queue_length
Number of online Celery workers    composer.googleapis.com/environment/num_celery_workers
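One way to read these two metrics together is to compare queued tasks with online workers over the same sample points. The sketch below is a hypothetical heuristic, not a Cloud Composer rule: the function name and the default limit of two queued tasks per worker are assumptions chosen only for illustration.

```python
def workers_look_saturated(queue_lengths, worker_counts, max_queued_per_worker=2.0):
    """Return True if the average queued-tasks-per-worker ratio exceeds the limit.

    Both lists hold samples taken at the same points in time.
    The threshold is an illustrative assumption, not a documented value.
    """
    if len(queue_lengths) != len(worker_counts):
        raise ValueError("sample lists must have the same length")

    ratios = []
    for queued, workers in zip(queue_lengths, worker_counts):
        if workers == 0:
            # With no workers online, any queued task cannot make progress.
            if queued > 0:
                return True
            continue
        ratios.append(queued / workers)

    if not ratios:
        return False
    return sum(ratios) / len(ratios) > max_queued_per_worker


if __name__ == "__main__":
    print(workers_look_saturated([10, 12, 14], [3, 3, 3]))
    print(workers_look_saturated([1, 0, 2], [3, 3, 3]))
```

A sustained high ratio suggests that tasks wait for a free worker, while a ratio near zero suggests the environment has spare capacity.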

The Stackdriver documentation also includes the following information about Cloud Composer metrics and resources:

  • For the list of usage metrics that Cloud Composer reports to Stackdriver, see Metrics List.
  • For details on the cloud_composer_environment resource type, see Monitored Resource Types in the Stackdriver documentation.

Using Monitoring on Cloud Composer environments

You can access Monitoring from the Monitoring console or using the Monitoring API.

Console

  1. After creating a Cloud Composer environment, go to the Monitoring console to view environment monitoring data.
  2. When you first access Monitoring, you are asked to create a Workspace and select a project.
  3. After setting up the Workspace, the Monitoring console appears.

  4. Select Resources > Metrics Explorer and choose Cloud Composer:
    1. Click in the Find resource type and metric input box to display the resource drop-down list.
    2. Select the Cloud Composer Environment or Cloud Composer Workflow resource. Alternatively, enter cloud_composer_environment or cloud_composer_workflow in the box.
  5. Click again in the input box and then select a metric from the drop-down list. Hovering over the metric name displays information about the metric.
  6. Cloud Composer environment information is contained in the workflow_name label: workflow_name=environment.workflow. To view workflow metrics for a specific environment, add a filter:
    1. Create a filter for workflow_name.
    2. Filter on the prefix by using the regular expression =~ "your-environment-name.*", replacing your-environment-name with the name of the environment whose workflow metrics you want to view. For information about using regular expressions to filter labels, see Filtering.
  7. Click Save Chart.

    You can also group by metric labels, perform aggregations, and select chart viewing options. See the Monitoring documentation.
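To see which workflow_name values a prefix expression such as "your-environment-name.*" selects, you can test it locally. The sketch below uses Python's re module as a stand-in for the Monitoring filter engine and assumes that the expression must match the whole label value, which the trailing .* suggests. The helper name is hypothetical.

```python
import re


def matches_environment(workflow_name, environment_name):
    """Return True if workflow_name starts with the given environment name.

    Mirrors the filter =~ "your-environment-name.*" under the assumption
    that the expression is matched against the entire label value.
    """
    # Escape the name so characters such as "." or "-" are treated literally.
    pattern = re.escape(environment_name) + ".*"
    return re.fullmatch(pattern, workflow_name) is not None


if __name__ == "__main__":
    for name in ["prod-env.daily_etl", "prod-env-2.daily_etl", "staging.daily_etl"]:
        print(name, matches_environment(name, "prod-env"))
```

Because this is a plain prefix match, an environment named prod-env-2 also matches the prefix prod-env. Given the environment.workflow format of the label, ending the prefix with an escaped dot (for example "prod-env\..*") is one way to narrow the match to a single environment.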

API

You can use the Monitoring timeSeries.list API to capture and list metrics defined by a filter expression. Use the Try this API template on the API page to send an API request and display the response.
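As a rough illustration of what a timeSeries.list request looks like, the sketch below assembles the request URL for one Cloud Composer metric over a recent time window. It only builds the URL and does not send the request or handle authentication. The project ID and function name are placeholders, and the parameter names reflect the v3 REST interface as understood here; confirm them on the API page.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamp in UTC


def time_series_list_url(project_id, metric_type, end_time, window=timedelta(hours=1)):
    """Build a timeSeries.list URL for one metric over [end_time - window, end_time]."""
    end_utc = end_time.astimezone(timezone.utc)
    start_utc = end_utc - window
    params = {
        "filter": f'metric.type="{metric_type}"',
        "interval.startTime": start_utc.strftime(TIMESTAMP_FORMAT),
        "interval.endTime": end_utc.strftime(TIMESTAMP_FORMAT),
    }
    base = f"https://monitoring.googleapis.com/v3/projects/{project_id}/timeSeries"
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(
        time_series_list_url(
            "example-project",  # placeholder project ID
            "composer.googleapis.com/environment/healthy",
            datetime.now(timezone.utc),
        )
    )
```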

Building a custom Monitoring dashboard

You can build a custom Monitoring dashboard that displays charts of selected metrics for your Cloud Composer environment.

  1. Select Dashboards > Create Dashboard from the Monitoring console.

  2. In the Untitled Dashboard, click Add Chart and create the chart:

    1. In the Add Chart window, select Cloud Composer Environment as the resource type.
    2. Select one or more metrics and chart properties.
    3. Confirm or type a new chart title and click Save.
    4. Add additional charts to your dashboard, as needed, and Save.

    For example, the Task Duration metric plots the duration of active tasks in your workflows, which is useful for fine-tuning performance.

  3. To view the dashboard, click the title in the Monitoring Dashboards menu.

  4. From the dashboard display page, you can view, update, and delete charts.

Using Monitoring alerts

You can create a Monitoring alert that notifies you when a Cloud Composer metric crosses a specified threshold.

Creating an alert

To create an alert:

  1. Select Alerting > Create a Policy from the Monitoring console. The Create a new alerting policy page displays.

  2. In Conditions, click Add Condition.

  3. On the Create condition page:

    1. Add a name for the condition.
    2. In the tab header, scroll to Metric.
    3. Under Target, choose Cloud Composer Environment or Cloud Composer Workflow as the Resource type.
    4. Select a metric for the selected Resource type.
    5. Click Save.

  4. After setting the alert condition, complete the alert policy by setting notification channels, documentation, and the name for the new alert policy from the Create a new alerting policy page.
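The console steps above produce an alerting policy with a threshold condition. For readers who prefer to see the pieces as data, the sketch below assembles a comparable policy as a Python dictionary shaped like the Monitoring API's AlertPolicy resource. Treat the field names as an assumption to verify against the API reference. The display names, the threshold of 50 queued tasks, and the five-minute duration are illustrative values only, and notification channels are omitted.

```python
import json


def queue_length_alert_policy(threshold, duration_seconds=300):
    """Return a dict describing an alert on the Celery task queue length.

    The structure follows the AlertPolicy resource as understood here;
    all names and values are illustrative.
    """
    metric_filter = (
        'resource.type="cloud_composer_environment" AND '
        'metric.type="composer.googleapis.com/environment/task_queue_length"'
    )
    return {
        "displayName": "Composer task queue too long",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": f"Task queue length above {threshold}",
                "conditionThreshold": {
                    "filter": metric_filter,
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    # The condition must hold for this long before an incident opens.
                    "duration": f"{duration_seconds}s",
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(queue_length_alert_policy(50), indent=2))
```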

See the Stackdriver documentation for more information about managing alerting policies and specifying conditions for alerting policies.

Viewing alerts

When an alert is triggered by a metric threshold condition, Monitoring creates an incident (and a corresponding event).

You can review incidents from the Monitoring Alerting > Incidents page.

If you defined a notification mechanism in the alert policy, such as an email or SMS notification, Monitoring also sends a notification of the incident.

What's next
