Using Stackdriver Monitoring for Cloud Dataflow pipelines

Stackdriver provides powerful monitoring, logging, and diagnostics. Dataflow integration with Stackdriver Monitoring lets you access Dataflow job metrics such as Job Status, Element Counts, System Lag (for streaming jobs), and User Counters from the Stackdriver dashboards. You can also employ Stackdriver alerting capabilities to notify you of various conditions, such as long streaming system lag or failed jobs.

Before you begin

Follow one of the quickstarts to set up your Dataflow project, then construct and run your pipeline.

Custom metrics

Any metric you define in your Apache Beam pipeline is reported by Dataflow to Stackdriver as a custom metric. There are three types of Apache Beam pipeline metrics: Counter, Distribution, and Gauge. Dataflow currently only reports Counter and Distribution to Stackdriver. Distribution is reported as four sub-metrics suffixed with _MAX, _MIN, _MEAN, and _COUNT. Dataflow does not support creating a histogram from Distribution metrics.
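As a rough illustration of the four sub-metrics, the following Python sketch summarizes a list of recorded values under the suffixes named above. It is not Dataflow's implementation: the function name distribution_submetrics, the metric name, and the sample values are all hypothetical.

```python
from statistics import mean


def distribution_submetrics(name, values):
    """Summarize recorded values as the four sub-metrics named in the text.

    Illustrative only: this mimics the _MAX, _MIN, _MEAN, and _COUNT
    suffixes, not how Dataflow computes or exports them.
    """
    if not values:
        raise ValueError("at least one recorded value is required")
    # Values are converted to float because user metrics are exported as doubles.
    return {
        f"{name}_MAX": float(max(values)),
        f"{name}_MIN": float(min(values)),
        f"{name}_MEAN": float(mean(values)),
        f"{name}_COUNT": float(len(values)),
    }


# Hypothetical distribution of record sizes, in bytes.
summary = distribution_submetrics("record_size", [120, 80, 100])
for metric_name, value in summary.items():
    print(metric_name, value)
```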

Dataflow reports incremental updates to Stackdriver approximately every 30 seconds. All user metrics are exported as a double data type to avoid conflicts. Custom metrics in Dataflow appear in Stackdriver as and are limited to 500 metrics per project.

Custom metrics reported to Stackdriver incur charges based on the Stackdriver Monitoring pricing.

Explore metrics

You can explore Dataflow metrics using Stackdriver. Follow the steps in this section to observe the standard metrics provided for each of your Apache Beam pipelines.

  1. In the Google Cloud Console, go to Stackdriver Monitoring.

  2. If the Add your project to a Workspace dialog is displayed, create a new Workspace by selecting your Google Cloud project under New Workspace and then clicking Add. In the following image, the Google Cloud project name is Quickstart:

    Creating a new workspace dialog.

    The Add your project to a Workspace dialog is displayed only when you have at least one existing Workspace available to you. The Workspaces listed under Existing Workspace are Workspaces you've created or Workspaces for Google Cloud projects where you have editorial permission. You can choose between creating a new Workspace and adding your project to an existing Workspace by using this dialog.

  3. In the Resource menu, select Metrics Explorer.

  4. In the Find a resource type and/or a metric pane, select the dataflow_job resource type.

  5. From the list that appears, select a metric you'd like to observe for one of your jobs.

    For example, consider a streaming pipeline that reads from a Cloud Pub/Sub topic and writes to BigQuery. It has 5 steps, one of which is PubsubIO.Read. For this pipeline, you can chart dataflow/job/element_count for the PubsubIO.Read step.
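The same selection can also be written as a Monitoring filter string. The sketch below assembles one in Python. It assumes that the full metric type is dataflow.googleapis.com/job/element_count and that the dataflow_job resource carries a job_name label. The helper name and the job name are hypothetical, so check all of these against your own project.

```python
def dataflow_metric_filter(metric, job_name=None):
    """Build a Monitoring filter for one Dataflow job metric.

    `metric` is the short name after "dataflow/", e.g. "job/element_count".
    The "dataflow.googleapis.com/" prefix and the job_name resource label
    are assumptions, not taken from the page above.
    """
    clauses = [
        'resource.type = "dataflow_job"',
        f'metric.type = "dataflow.googleapis.com/{metric}"',
    ]
    if job_name is not None:
        clauses.append(f'resource.labels.job_name = "{job_name}"')
    return " AND ".join(clauses)


# Hypothetical job name for the Pub/Sub to BigQuery example.
element_count_filter = dataflow_metric_filter(
    "job/element_count", "pubsub-to-bigquery"
)
print(element_count_filter)
```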

Create alerts and dashboards

Stackdriver not only gives you access to Dataflow-related metrics but also lets you create alerts and dashboards, so you can chart time series of metrics and choose to be notified when these metrics reach specified values.

Create groups of resources

You can create resource groups that include multiple Apache Beam pipelines so that you can easily set alerts and build dashboards.

  1. In the Google Cloud Console, go to Stackdriver Monitoring.

  2. In the Groups menu, select Create Groups.

  3. Add filter criteria that define the Dataflow resources included in the group. For example, one of your filter criteria can be the name prefix of your pipelines.

  4. After the group is created, you can see the basic metrics related to resources in that group.
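To preview which pipelines a name-prefix criterion would capture, you can apply the same prefix test locally before creating the group. This is only a sketch of the matching idea, not how Stackdriver evaluates group filters. The job names and the prefix are hypothetical.

```python
def jobs_matching_prefix(job_names, prefix):
    """Return the job names that start with the given prefix, in input order."""
    return [name for name in job_names if name.startswith(prefix)]


# Hypothetical Dataflow job names in one project.
jobs = [
    "windowedwordcount-prod",
    "windowedwordcount-staging",
    "pubsub-to-bigquery",
]
group_members = jobs_matching_prefix(jobs, "windowedwordcount")
print(group_members)
```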

Create alerts for Cloud Dataflow metrics

Stackdriver lets you create alerts and be notified when a metric crosses a specified threshold. For example, you can be notified when the System Lag of a streaming pipeline rises above a predefined value.

  1. In the Google Cloud Console, go to Stackdriver Monitoring.

  2. In the Alerting menu, select Policies Overview.

  3. Click Add Policy.

  4. In the Create new alerting policy page, you can define the alerting conditions and the channels of communication for alerts.
    For example, to set an alert on the System Lag for the WindowedWordCount Apache Beam pipeline group, select 'Dataflow Job' in the Resource Type dropdown, 'Group' in the Applies To dropdown, and 'System Lag' in the If Metric dropdown.

  5. After you've created an alert, you can review the events related to Dataflow by navigating to Alerting > Events. Every time an alert is triggered by a Metric Threshold condition, an Incident and a corresponding Event are created in Stackdriver. If you specified a notification mechanism in the alert, such as email or SMS, you will also receive a notification.
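If you prefer to keep alert definitions in code, a policy like the one in step 4 can be described as JSON. The sketch below builds such a document with the Python standard library. The field names are assumed to follow the Monitoring API v3 AlertPolicy resource, and the metric type dataflow.googleapis.com/job/system_lag, the threshold and duration values, and the display names are assumptions for illustration. The sketch does not restrict the condition to a group. Verify everything against the API reference before use.

```python
import json


def system_lag_policy(threshold_seconds, duration_seconds):
    """Return a dict shaped like a Monitoring API v3 AlertPolicy (assumed shape)."""
    condition_filter = (
        'resource.type = "dataflow_job" AND '
        'metric.type = "dataflow.googleapis.com/job/system_lag"'
    )
    return {
        "displayName": "Dataflow system lag too high",  # hypothetical name
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "System lag above threshold",
                "conditionThreshold": {
                    "filter": condition_filter,
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold_seconds,
                    # Lag must stay above the threshold this long to fire.
                    "duration": f"{duration_seconds}s",
                },
            }
        ],
    }


# Hypothetical values: alert when lag exceeds 5 minutes for 1 minute.
policy = system_lag_policy(threshold_seconds=300, duration_seconds=60)
print(json.dumps(policy, indent=2))
```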

Build your own custom monitoring dashboard

You can build Stackdriver monitoring dashboards with the most relevant Dataflow-related charts.

  1. In the Google Cloud Console, go to Stackdriver Monitoring.

  2. Select Dashboards > Create Dashboard.

  3. Click on Add Chart.

  4. In the Add Chart window, select "Dataflow Job" as the Resource Type, select a metric you want to chart in the Metric Type field, and select a group that contains Apache Beam pipelines in the Filter panel.

You can add as many charts to the dashboard as you like.

Receive worker VM metrics from Stackdriver Monitoring agent

If you would like to monitor persistent disk, CPU, network, and process metrics from your Dataflow worker VM instances, you can enable the Stackdriver Monitoring Agent when you run your pipeline. See the list of available Monitoring agent metrics.

To enable the Monitoring agent, use the --experiments=enable_stackdriver_agent_metrics option when running your pipeline.

To disable the Monitoring agent without stopping your pipeline, update your pipeline by launching a replacement job without specifying the --experiments=enable_stackdriver_agent_metrics parameter.
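Enabling and disabling the agent differ only in the arguments you pass at launch. The sketch below assembles both argument lists in Python. Only the --experiments=enable_stackdriver_agent_metrics option comes from this page. The --runner, --job_name, and --update flags are assumed to be the usual Apache Beam Python SDK options, and the helper name and job name are hypothetical.

```python
AGENT_EXPERIMENT = "--experiments=enable_stackdriver_agent_metrics"


def pipeline_args(job_name, enable_agent, update=False):
    """Assemble command-line arguments for launching a Dataflow job."""
    args = ["--runner=DataflowRunner", f"--job_name={job_name}"]
    if update:
        # Launch a replacement for the running job with the same name.
        args.append("--update")
    if enable_agent:
        args.append(AGENT_EXPERIMENT)
    return args


# Initial launch with the Monitoring agent enabled.
initial_args = pipeline_args("wordcount-stream", enable_agent=True)

# Replacement job that omits the experiment, which disables the agent.
replacement_args = pipeline_args(
    "wordcount-stream", enable_agent=False, update=True
)

print(initial_args)
print(replacement_args)
```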
