Use Cloud Monitoring for Dataflow pipelines

Cloud Monitoring provides powerful logging and diagnostics. The Dataflow integration with Monitoring lets you access Dataflow job metrics such as Job Status, Element Counts, System Lag (for streaming jobs), and User Counters from the Monitoring dashboards. You can also use Monitoring alerting to notify you of conditions such as high streaming system lag or failed jobs.

Before you begin

Follow one of the quickstarts to set up your Dataflow project and to construct and run your pipeline.

To see metrics in Metrics Explorer, the controller service account must have the roles/monitoring.metricWriter role.

Custom metrics

Any metric that you define in your Apache Beam pipeline is reported by Dataflow to Monitoring as a custom metric. Apache Beam supports three types of pipeline metrics: Counter, Distribution, and Gauge. Dataflow currently reports only Counter and Distribution to Monitoring. A Distribution is reported as four submetrics suffixed with _MAX, _MIN, _MEAN, and _COUNT. Dataflow does not support creating a histogram from Distribution metrics.
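
As a concrete reference, the following sketch shows how a Counter and a Distribution might be declared in an Apache Beam Python DoFn. The namespace, metric names, and the DoFn itself are illustrative placeholders rather than part of any particular pipeline.

    import apache_beam as beam
    from apache_beam.metrics import Metrics


    class CountAndMeasureFn(beam.DoFn):
        """Example DoFn that reports a Counter and a Distribution metric."""

        def __init__(self):
            # Placeholder namespace and metric names.
            self.processed_elements = Metrics.counter('my_namespace', 'processed_elements')
            self.element_size = Metrics.distribution('my_namespace', 'element_size_chars')

        def process(self, element):
            # Counter: reported to Monitoring as a running total.
            self.processed_elements.inc()
            # Distribution: reported as the _MAX, _MIN, _MEAN, and _COUNT submetrics.
            # Assumes the element is a string or another object that supports len().
            self.element_size.update(len(element))
            yield element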

Dataflow reports incremental updates to Monitoring approximately every 30 seconds. All Dataflow custom metrics are exported as the double data type to avoid conflicts. They appear in Monitoring as dataflow.googleapis.com/job/user_counter with the labels metric_name: metric-name and ptransform: ptransform-name. These metrics are subject to the cardinality limitations in Monitoring.

For backward compatibility, Dataflow also reports custom metrics to Monitoring as custom.googleapis.com/dataflow/metric-name. There is a limit of 100 Dataflow custom metrics per project published as custom.googleapis.com/dataflow/metric-name.
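
To illustrate the naming scheme above, the following sketch reads a custom metric back through the Cloud Monitoring API with the google-cloud-monitoring Python client. The project ID, metric name, and one-hour time window are placeholder values.

    import time

    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    project_name = "projects/my-project-id"  # Placeholder project ID.

    now = time.time()
    interval = monitoring_v3.TimeInterval(
        {
            "end_time": {"seconds": int(now)},
            "start_time": {"seconds": int(now) - 3600},  # Last hour.
        }
    )

    # Filter on the exported metric type and the metric_name label described above.
    results = client.list_time_series(
        request={
            "name": project_name,
            "filter": (
                'metric.type = "dataflow.googleapis.com/job/user_counter" '
                'AND metric.labels.metric_name = "my-metric-name"'  # Placeholder name.
            ),
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )

    for series in results:
        # Dataflow custom metrics are exported as doubles, so read double_value.
        print(dict(series.metric.labels), dict(series.resource.labels))
        for point in series.points:
            print("  value:", point.value.double_value)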

Custom metrics reported to Monitoring incur charges based on Cloud Monitoring pricing.

Explore metrics

You can explore Dataflow metrics using Monitoring. Follow the steps in this section to view the standard metrics provided for each of your Apache Beam pipelines.

  1. In the Google Cloud console, select Monitoring:

    Go to Monitoring

  2. In the left navigation pane, click Metrics Explorer.

  3. In the Find resource type and metric pane, select the dataflow_job resource type.

  4. From the list that appears, select a metric you'd like to observe for one of your jobs.


    Choose metrics

Create alerting policies and dashboards

Monitoring provides access to Dataflow-related metrics. You can create dashboards to chart time series of metrics, and you can create alerting policies that notify you when metrics reach specified values.

Create groups of resources

You can create resource groups that include multiple Apache Beam pipelines so that you can easily set alerts and build dashboards.

  1. In the Google Cloud console, select Monitoring:

    Go to Monitoring

  2. In the Groups menu, select Create Group.

  3. Add filter criteria that define the Dataflow resources included in the group. For example, one of your filter criteria can be the name prefix of your pipelines.

  4. After the group is created, you can see the basic metrics related to resources in that group.
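
As an alternative to the console, you can also create a group with the Cloud Monitoring API. The following sketch uses the google-cloud-monitoring Python client; the display name and the name-prefix filter are assumptions chosen for illustration.

    from google.cloud import monitoring_v3

    client = monitoring_v3.GroupServiceClient()
    project_name = "projects/my-project-id"  # Placeholder project ID.

    # Placeholder display name and filter; the filter is intended to match
    # Dataflow job resources whose name starts with a pipeline prefix.
    group = monitoring_v3.Group(
        display_name="my-dataflow-pipelines",
        filter=(
            'resource.type = "dataflow_job" '
            'AND resource.metadata.name = starts_with("my-pipeline-prefix")'
        ),
    )

    created = client.create_group(request={"name": project_name, "group": group})
    print("Created group:", created.name)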

Create alerting policies for Dataflow metrics

Monitoring lets you create alerts and be notified when a certain metric crosses a specified threshold, for example, when the System lag of a streaming pipeline rises above a predefined value.

  1. In the Google Cloud console, select Monitoring:

    Go to Monitoring

  2. In the Alerting menu, click Create Policy.

  3. On the Create new alerting policy page, you can define the alerting conditions and notification channels.
    For example, to set an alert on the System lag for the WindowedWordCount Apache Beam pipeline group, complete the following steps:

    1. Select Add condition.
    2. In the Find resource type or metric field, enter and select dataflow_job.
    3. In the Find resource type or metric field, select System lag.
  4. After you've created an alert, you can review the events related to Dataflow by selecting See all events in the Events section. Every time an alert is triggered, an incident and a corresponding event are created. If you specified a notification mechanism in the alert, such as email or SMS, you also receive a notification.
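
If you prefer to manage alerting policies programmatically, the following sketch creates a comparable policy with the Cloud Monitoring API, using a threshold condition on the dataflow.googleapis.com/job/system_lag metric. The project ID, threshold, and duration are placeholder values, and no notification channels are attached.

    from google.cloud import monitoring_v3
    from google.protobuf import duration_pb2

    client = monitoring_v3.AlertPolicyServiceClient()
    project_name = "projects/my-project-id"  # Placeholder project ID.

    # Threshold condition on Dataflow system lag; the 30-second threshold and
    # five-minute duration are placeholder values.
    condition = monitoring_v3.AlertPolicy.Condition(
        display_name="System lag above 30s",
        condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
            filter=(
                'metric.type = "dataflow.googleapis.com/job/system_lag" '
                'AND resource.type = "dataflow_job"'
            ),
            comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
            threshold_value=30,
            duration=duration_pb2.Duration(seconds=300),
        ),
    )

    policy = monitoring_v3.AlertPolicy(
        display_name="Dataflow system lag alert",
        combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.AND,
        conditions=[condition],
    )

    created = client.create_alert_policy(
        request={"name": project_name, "alert_policy": policy}
    )
    print("Created alert policy:", created.name)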

Build your own custom monitoring dashboard

You can build Monitoring dashboards with the most relevant Dataflow-related charts.

  1. In the Google Cloud console, select Monitoring:

    Go to Monitoring

  2. Select Dashboards > Create Dashboard.

  3. Click Add Chart.

  4. In the Add Chart window, select the dataflow_job resource type and the metric that you want to chart.

  5. In the Filter field, select a group that contains Apache Beam pipelines.

You can add as many charts to the dashboard as you like.
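
You can also define dashboards programmatically. The following sketch creates a single-chart dashboard with the google-cloud-monitoring-dashboards Python client; the display names and the choice of the system lag metric for the chart are illustrative assumptions.

    from google.cloud import monitoring_dashboard_v1

    client = monitoring_dashboard_v1.DashboardsServiceClient()
    project_name = "projects/my-project-id"  # Placeholder project ID.

    # A one-chart dashboard that plots Dataflow system lag; display names are
    # placeholders.
    dashboard = monitoring_dashboard_v1.Dashboard(
        display_name="Dataflow pipeline dashboard",
        grid_layout={
            "widgets": [
                {
                    "title": "System lag",
                    "xy_chart": {
                        "data_sets": [
                            {
                                "time_series_query": {
                                    "time_series_filter": {
                                        "filter": (
                                            'metric.type = "dataflow.googleapis.com/job/system_lag" '
                                            'AND resource.type = "dataflow_job"'
                                        )
                                    }
                                }
                            }
                        ]
                    },
                }
            ]
        },
    )

    created = client.create_dashboard(
        request={"parent": project_name, "dashboard": dashboard}
    )
    print("Created dashboard:", created.name)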

Receive worker VM metrics from the Monitoring agent

If you would like to monitor persistent disk, CPU, network, and process metrics from your Dataflow worker VM instances, you can enable the Monitoring agent when you run your pipeline. See the list of available Monitoring agent metrics.

To enable the Monitoring agent, use the --experiments=enable_stackdriver_agent_metrics option when running your pipeline. The controller service account must have the roles/monitoring.metricWriter role.
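
For example, with the Apache Beam Python SDK, you can add the experiment through the pipeline's DebugOptions, as in the following sketch; the project, region, and temp_location values are placeholders.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import DebugOptions, PipelineOptions

    # Placeholder pipeline options; the experiments flag is the relevant part.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project-id",
        region="us-central1",
        temp_location="gs://my-bucket/temp",
    )

    # Enable the Monitoring agent on the Dataflow worker VMs.
    options.view_as(DebugOptions).add_experiment("enable_stackdriver_agent_metrics")

    with beam.Pipeline(options=options) as pipeline:
        # Trivial placeholder pipeline.
        _ = pipeline | beam.Create([1, 2, 3]) | beam.Map(print)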

To disable the Monitoring agent without stopping your pipeline, update your pipeline by launching a replacement job without the --experiments=enable_stackdriver_agent_metrics parameter.

Storage and retention

Information about completed or cancelled Dataflow jobs is stored for 30 days.

Operational logs are stored in the _Default log bucket. The logging API service name is dataflow.googleapis.com. For more information about the Google Cloud monitored resource types and services used in Cloud Logging, see Monitored resources and services.

For details about how long log entries are retained by Logging, see the retention information in Quotas and limits: Logs retention periods.

For information about viewing operational logs, see Monitor and view pipeline logs.

What's next

To learn more, consider exploring these other resources: