Using Monitoring for Dataflow pipelines

Cloud Monitoring provides powerful logging and diagnostics. Dataflow integration with Monitoring lets you access Dataflow job metrics such as Job Status, Element Counts, System Lag (for streaming jobs), and User Counters from the Monitoring dashboards. You can also employ Monitoring alerting capabilities to notify you of various conditions, such as long streaming system lag or failed jobs.

Before you begin

Follow one of the quickstarts to set up your Dataflow project and to construct and run your pipeline.

Custom metrics

Any metric you define in your Apache Beam pipeline is reported by Dataflow to Monitoring as a custom metric. There are three types of Apache Beam pipeline metrics: Counter, Distribution, and Gauge. Dataflow currently reports only Counter and Distribution to Monitoring. A Distribution is reported as four sub-metrics suffixed with _MAX, _MIN, _MEAN, and _COUNT. Dataflow does not support creating a histogram from Distribution metrics.

Dataflow reports incremental updates to Monitoring approximately every 30 seconds. All user metrics are exported as a double data type to avoid conflicts. Custom metrics in Dataflow appear in Monitoring as custom.googleapis.com/dataflow/metric-name and are limited to 500 metrics per project.
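
For illustration, the following sketch shows how a Counter and a Distribution can be defined in a DoFn with the Apache Beam Python SDK. The namespace, metric names, and parsing logic are placeholders, not part of any specific pipeline:

    import apache_beam as beam
    from apache_beam.metrics import Metrics

    class ParseFn(beam.DoFn):
        """Illustrative DoFn that reports a Counter and a Distribution."""

        def __init__(self):
            # Dataflow reports these to Monitoring as
            # custom.googleapis.com/dataflow/<metric-name>.
            self.parse_errors = Metrics.counter('example_namespace', 'parse_errors')
            self.element_size = Metrics.distribution('example_namespace', 'element_size')

        def process(self, element):
            self.element_size.update(len(element))
            try:
                yield int(element)
            except ValueError:
                self.parse_errors.inc()

Because element_size is a Distribution, it appears in Monitoring as the element_size_MAX, element_size_MIN, element_size_MEAN, and element_size_COUNT sub-metrics.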

Custom metrics reported to Monitoring incur charges based on Cloud Monitoring pricing.

Explore metrics

You can explore Dataflow metrics using Monitoring. Follow the steps in this section to observe the standard metrics provided for each of your Apache Beam pipelines.

  1. In the Google Cloud Console, select Monitoring:

    Go to Monitoring

  2. In the left navigation pane, click Metrics Explorer.

  3. In the Find resource type and metric pane, select the dataflow_job resource type.

  4. From the list that appears, select a metric you'd like to observe for one of your jobs.
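
If you prefer to read the same metrics programmatically, you can query the Monitoring API directly. The following sketch uses the google-cloud-monitoring Python client to list System Lag time series for Dataflow jobs over the last hour; the project ID is a placeholder, and the snippet assumes the built-in dataflow.googleapis.com/job/system_lag metric:

    import time
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    project_name = "projects/YOUR_PROJECT_ID"  # placeholder

    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
    )

    # Restrict the query to the dataflow_job resource type and a single metric.
    results = client.list_time_series(
        request={
            "name": project_name,
            "filter": (
                'resource.type = "dataflow_job" AND '
                'metric.type = "dataflow.googleapis.com/job/system_lag"'
            ),
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )

    for series in results:
        job_name = series.resource.labels["job_name"]
        print(job_name, series.points[0].value)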

Create alerting policies and dashboards

Monitoring not only provides access to Dataflow-related metrics, but also lets you create alerting policies and dashboards so you can chart time series of metrics and be notified when these metrics reach values you specify.

Create groups of resources

You can create resource groups that include multiple Apache Beam pipelines so that you can easily set alerts and build dashboards.

  1. In the Google Cloud Console, select Monitoring:

    Go to Monitoring

  2. In the Groups menu, select Create Group.

  3. Add filter criteria that define the Dataflow resources included in the group. For example, one of your filter criteria can be the name prefix of your pipelines.

  4. After the group is created, you can see the basic metrics related to resources in that group.
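
As an alternative to the console, groups can also be created through the Monitoring API. The following Python sketch is illustrative: the project ID, display name, and in particular the filter string (a prefix match on the resource name) are assumptions you would adapt to your own pipelines:

    from google.cloud import monitoring_v3

    client = monitoring_v3.GroupServiceClient()
    project_name = "projects/YOUR_PROJECT_ID"  # placeholder

    group = monitoring_v3.Group(
        display_name="WindowedWordCount pipelines",
        # Illustrative filter: dataflow_job resources whose name starts
        # with the given prefix.
        filter=(
            'resource.type = "dataflow_job" AND '
            'resource.metadata.name = starts_with("windowedwordcount")'
        ),
    )

    created = client.create_group(name=project_name, group=group)
    print("Created group:", created.name)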

Create alerting policies for Dataflow metrics

Monitoring gives you the ability to create alerts and be notified when a certain metric crosses a specified threshold. For example, you can be notified when the System lag of a streaming pipeline rises above a predefined value.

  1. In the Google Cloud Console, select Monitoring:

    Go to Monitoring

  2. In the Alerting menu, click on Create Policy.

  3. In the Create new alerting policy page, you can define the alerting conditions and notification channels.
    For example, to set an alert on the System lag for the WindowedWordCount Apache Beam pipeline group, complete the following steps:

    1. Select Add condition.
    2. In the Find resource type or metric field, enter and select dataflow_job.
    3. In the Find resource type or metric field, select System lag.
  4. After you've created an alert, you can review the events related to Dataflow by selecting See all events in the Events section. Every time an alert is triggered, an incident and a corresponding event are created. If you specified a notification mechanism in the alert, such as email or SMS, you will also receive a notification.
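
The same kind of alert can also be defined programmatically. The sketch below uses the google-cloud-monitoring Python client to create a policy that fires when System Lag exceeds five minutes; the threshold, duration, display names, and project ID are illustrative, and notification channels are omitted:

    from google.cloud import monitoring_v3
    from google.protobuf import duration_pb2

    client = monitoring_v3.AlertPolicyServiceClient()
    project_name = "projects/YOUR_PROJECT_ID"  # placeholder

    condition = monitoring_v3.AlertPolicy.Condition(
        display_name="System lag above 5 minutes",
        condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
            filter=(
                'resource.type = "dataflow_job" AND '
                'metric.type = "dataflow.googleapis.com/job/system_lag"'
            ),
            comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
            threshold_value=300,  # seconds of lag; illustrative threshold
            duration=duration_pb2.Duration(seconds=60),
        ),
    )

    policy = monitoring_v3.AlertPolicy(
        display_name="Dataflow system lag alert",
        combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
        conditions=[condition],
    )

    created = client.create_alert_policy(name=project_name, alert_policy=policy)
    print("Created alert policy:", created.name)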

Build your own custom monitoring dashboard

You can build Monitoring dashboards that show the most relevant Dataflow-related charts.

  1. Go to the Google Cloud Console, and select Monitoring:

    Go to Monitoring

  2. Select Dashboards > Create Dashboard.

  3. Click on Add Chart.

  4. In the Add Chart window, select the dataflow_job resource type and the metric you want to chart.

  5. In the Filter field, select a group that contains Apache Beam pipelines.

You can add as many charts to the dashboard as you like.
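
Dashboards can also be defined programmatically with the Dashboard API. The following Python sketch, using the google-cloud-monitoring-dashboards client, creates a single-chart dashboard for System Lag; the project ID, titles, filter, and aggregation settings are all illustrative:

    from google.cloud import monitoring_dashboard_v1
    from google.protobuf import duration_pb2

    client = monitoring_dashboard_v1.DashboardsServiceClient()
    project_name = "projects/YOUR_PROJECT_ID"  # placeholder

    dashboard = monitoring_dashboard_v1.Dashboard(
        display_name="Dataflow pipelines",
        grid_layout=monitoring_dashboard_v1.GridLayout(
            widgets=[
                monitoring_dashboard_v1.Widget(
                    title="System lag",
                    xy_chart=monitoring_dashboard_v1.XyChart(
                        data_sets=[
                            monitoring_dashboard_v1.XyChart.DataSet(
                                time_series_query=monitoring_dashboard_v1.TimeSeriesQuery(
                                    time_series_filter=monitoring_dashboard_v1.TimeSeriesFilter(
                                        filter=(
                                            'resource.type = "dataflow_job" AND '
                                            'metric.type = "dataflow.googleapis.com/job/system_lag"'
                                        ),
                                        aggregation=monitoring_dashboard_v1.Aggregation(
                                            alignment_period=duration_pb2.Duration(seconds=60),
                                            per_series_aligner=monitoring_dashboard_v1.Aggregation.Aligner.ALIGN_MEAN,
                                        ),
                                    ),
                                ),
                            )
                        ],
                    ),
                )
            ],
        ),
    )

    created = client.create_dashboard(
        request={"parent": project_name, "dashboard": dashboard}
    )
    print("Created dashboard:", created.name)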

Receive worker VM metrics from the Monitoring agent

If you would like to monitor persistent disk, CPU, network, and process metrics from your Dataflow worker VM instances, you can enable the Monitoring agent when you run your pipeline. See the list of available Monitoring agent metrics.

To enable the Monitoring agent, use the --experiments=enable_stackdriver_agent_metrics option when running your pipeline. The controller service account must have the roles/monitoring.metricWriter role.
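
For example, with the Apache Beam Python SDK the experiment can be set through PipelineOptions when the job is launched; the project, region, and bucket below are placeholders:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="YOUR_PROJECT_ID",              # placeholder
        region="us-central1",                   # placeholder
        temp_location="gs://YOUR_BUCKET/temp",  # placeholder
        # Enables the Monitoring agent on the worker VMs.
        experiments=["enable_stackdriver_agent_metrics"],
    )

    with beam.Pipeline(options=options) as pipeline:
        (pipeline
         | "Create" >> beam.Create(["hello", "world"])
         | "Print" >> beam.Map(print))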

To disable the Monitoring agent without stopping your pipeline, update your pipeline by launching a replacement job without the --experiments=enable_stackdriver_agent_metrics parameter.

What's next

To learn more, consider exploring these other resources: