Use the Dataflow job monitoring interface

When you run your pipeline using the Dataflow managed service, you can view that job and any others in Dataflow's web-based monitoring interface. The monitoring interface lets you see and interact with your Dataflow jobs.

You can access the Dataflow monitoring interface by using the Google Cloud console. The monitoring interface can show you:

  • A list of all running Dataflow jobs and all jobs run within the last 30 days.
  • A graphical representation of each pipeline.
  • Details about your job's status, type, and SDK version.
  • Links to information about the Google Cloud services running your pipeline, such as Compute Engine and Cloud Storage.
  • Any errors or warnings that occur during a job.
  • Additional diagnostics for a job.

You can view job visualizers within the Dataflow monitoring interface. These charts display metrics over the duration of a pipeline job and include the following information:

  • Step-level visibility to help identify which steps might be causing pipeline lag.
  • Statistical information that can surface anomalous behavior.
  • I/O metrics that can help identify bottlenecks in your sources and sinks.

Access the Dataflow monitoring interface

To access the Dataflow monitoring interface, follow these steps:

  1. Sign in to the Google Cloud console.
  2. Select your Google Cloud project.
  3. Open the navigation menu.
  4. In Analytics, click Dataflow.

A list of Dataflow jobs appears along with their status. If you don't see any jobs, you need to run a new job. To learn how to run a job, see the Java quickstart, Python quickstart, or Go quickstart.

Figure 1: A list of Dataflow jobs in the Google Cloud console with jobs in the Running, Failed, and Succeeded states.
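
You can also read the same job list outside the console through the Dataflow API. The following is a minimal sketch, assuming the google-cloud-dataflow-client Python package, Application Default Credentials, and placeholder project and region values:

```python
# Minimal sketch: list recent Dataflow jobs for a project, similar to the
# console view above. Project ID and region are placeholders.
from google.cloud import dataflow_v1beta3

client = dataflow_v1beta3.JobsV1Beta3Client()

request = dataflow_v1beta3.ListJobsRequest(
    project_id="my-project",   # placeholder project ID
    location="us-central1",    # regional endpoint where the jobs run
)

for job in client.list_jobs(request=request):
    # current_state is a JobState enum value, for example JOB_STATE_RUNNING.
    print(job.id, job.name, job.current_state.name)
```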

A job can have the following statuses:

  • (No status): the monitoring interface has not yet received a status from the Dataflow service.
  • Running: the job is running.
  • Starting...: the job has been created, but the system needs some time to prepare before launching.
  • Queued: either a FlexRS job is queued or a Flex template job is being launched (which might take several minutes).
  • Canceling...: the job is being canceled.
  • Canceled: the job is canceled.
  • Draining...: the job is being drained.
  • Drained: the job is drained.
  • Updating...: the job is being updated.
  • Updated: the job is updated.
  • Succeeded: the job has finished successfully.
  • Failed: the job failed to complete.

For more information about a pipeline, click the name of the job.
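
The console statuses correspond to JobState values in the Dataflow API, so you can also poll a job's state programmatically. A minimal sketch, again assuming the google-cloud-dataflow-client package and placeholder identifiers:

```python
# Minimal sketch: fetch one job and print its state. Console statuses map to
# JobState values such as JOB_STATE_RUNNING, JOB_STATE_QUEUED,
# JOB_STATE_DRAINING, JOB_STATE_DONE (Succeeded), and JOB_STATE_FAILED.
from google.cloud import dataflow_v1beta3

client = dataflow_v1beta3.JobsV1Beta3Client()

job = client.get_job(
    request=dataflow_v1beta3.GetJobRequest(
        project_id="my-project",   # placeholder project ID
        location="us-central1",    # region where the job runs
        job_id="JOB_ID",           # placeholder job ID from the job list
    )
)

print(f"{job.name}: {job.current_state.name}")
```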

Access job visualizers

To access charts for monitoring your job, click the job name within the Dataflow monitoring interface. The Job details page is displayed, which contains the following information:

  • Job graph: visual representation of your pipeline
  • Execution details: tool to optimize your pipeline performance
  • Job metrics: metrics about the running of your job
  • Cost: metrics about the estimated cost of your job
  • Autoscaling: metrics related to streaming job autoscaling events
  • Job info panel: descriptive information about your pipeline
  • Job logs: logs generated by the Dataflow service at the job level
  • Worker logs: logs generated by the Dataflow service at the worker level
  • Diagnostics: table showing where errors occurred along the chosen timeline and possible recommendations for your pipeline
  • Data sampling: tool that lets you observe the data at each step of a pipeline. See Use data sampling to observe pipeline data.

Within the Job details page, you can switch your job view with the Job graph, Execution details, Job metrics, Cost, and Autoscaling tabs.
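
The job-level logs on this page are also exposed by the Dataflow API as job messages. A minimal sketch, assuming the google-cloud-dataflow-client package and placeholder identifiers:

```python
# Minimal sketch: read job-level messages (the "Job logs" shown on the Job
# details page) through the Dataflow API.
from google.cloud import dataflow_v1beta3

client = dataflow_v1beta3.MessagesV1Beta3Client()

request = dataflow_v1beta3.ListJobMessagesRequest(
    project_id="my-project",   # placeholder project ID
    location="us-central1",    # region where the job runs
    job_id="JOB_ID",           # placeholder job ID
    # Only return warnings and errors; lower the threshold for more detail.
    minimum_importance=dataflow_v1beta3.JobMessageImportance.JOB_MESSAGE_WARNING,
)

for message in client.list_job_messages(request=request):
    print(message.time, message.message_importance.name, message.message_text)
```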

Job graphs

When you select a specific Dataflow job, the monitoring interface provides a graphical representation of your pipeline: the job graph. The job graph page in the console also provides a job summary, a job log, and information about each step in the pipeline. For more details about job graphs, see Dataflow job graph.

Job metrics

You can view charts in the Job metrics tab of the Dataflow web interface. Each metric is organized into the following dashboards:

  • Overview metrics
  • Streaming metrics (streaming pipelines only)
  • Resource metrics
  • Input metrics
  • Output metrics
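
The values behind these dashboards can also be read programmatically. A minimal sketch that fetches a job's metrics, assuming the google-cloud-dataflow-client package and placeholder identifiers:

```python
# Minimal sketch: fetch a job's metrics through the Dataflow API, which backs
# the charts on the Job metrics tab.
from google.cloud import dataflow_v1beta3

client = dataflow_v1beta3.MetricsV1Beta3Client()

metrics = client.get_job_metrics(
    request=dataflow_v1beta3.GetJobMetricsRequest(
        project_id="my-project",   # placeholder project ID
        location="us-central1",    # region where the job runs
        job_id="JOB_ID",           # placeholder job ID
    )
)

for metric in metrics.metrics:
    # Each MetricUpdate has a structured name (origin, name, context) and a value.
    print(metric.name.origin, metric.name.name, metric.scalar)
```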

Cloud Monitoring alerts

See Create Cloud Monitoring alerts.

Cost monitoring

The Cost page in the Google Cloud console shows the estimated cost of your current Dataflow job. Estimated costs are calculated by multiplying the resource usage metrics as shown in Cloud Monitoring by the price of those resources in the job region.

Use cost monitoring

Job cost estimates are available for both batch and streaming jobs. The Cost page in the Google Cloud console provides the following information:

  • Details about which resources contribute to the job cost and by how much. Resources include vCPUs, memory, Dataflow Shuffle data processed or Streaming Engine data processed, and SSD and HDD disk usage.
  • Costs over specific time windows, such as the time since the job started, the previous hour, the last 24 hours, the last seven days, and a user-specified time range.
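
As a rough illustration of that calculation, the following sketch multiplies hypothetical usage figures by hypothetical unit prices; actual rates depend on your job region and current Dataflow pricing:

```python
# Illustrative only: hypothetical usage and unit prices, not actual Dataflow
# rates. The console derives usage from Cloud Monitoring metrics and applies
# the prices for the job's region.
usage = {
    "vcpu_hours": 12.0,            # vCPU usage over the window
    "memory_gb_hours": 45.0,       # memory usage over the window
    "shuffle_gb_processed": 8.0,   # Shuffle / Streaming Engine data processed
    "pd_ssd_gb_hours": 20.0,       # SSD Persistent Disk usage
}

unit_prices = {                    # hypothetical prices per unit, in USD
    "vcpu_hours": 0.056,
    "memory_gb_hours": 0.0035,
    "shuffle_gb_processed": 0.011,
    "pd_ssd_gb_hours": 0.00024,
}

estimated_cost = sum(usage[k] * unit_prices[k] for k in usage)
print(f"Estimated cost: ${estimated_cost:.2f}")
```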

You can use monitoring alerts to get notifications when your job costs cross a specified threshold. You can also use alerts to make changes to your jobs, such as stopping or canceling jobs, based on the thresholds that you set.

To create a Cloud Monitoring alert rule, click Create alert. For instructions about how to configure these alerts, see Use Cloud Monitoring for Dataflow pipelines.
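
If you prefer to define the alert outside the console, you can create an equivalent alerting policy with the Cloud Monitoring API. A minimal sketch, assuming the google-cloud-monitoring package; the metric type in the filter is a placeholder, so check which metric the Create alert flow configures for your project:

```python
# Minimal sketch: create a threshold-based alerting policy with the Cloud
# Monitoring API. The metric type below is a PLACEHOLDER, not a confirmed
# Dataflow metric name. Notification channels are omitted for brevity.
from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

client = monitoring_v3.AlertPolicyServiceClient()
project_name = "projects/my-project"   # placeholder project

policy = monitoring_v3.AlertPolicy(
    display_name="Dataflow job cost above threshold",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="Estimated cost above threshold",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                filter=(
                    'metric.type="dataflow.googleapis.com/job/PLACEHOLDER_COST_METRIC" '
                    'AND resource.type="dataflow_job"'
                ),
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=100.0,                        # cost threshold
                duration=duration_pb2.Duration(seconds=300),  # sustained for 5 min
            ),
        )
    ],
)

created = client.create_alert_policy(name=project_name, alert_policy=policy)
print(created.name)
```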

Limitations

Dataflow cost monitoring doesn't support Dataflow Prime jobs or GPU metrics.

Autoscaling metrics

You can view autoscaling monitoring charts for streaming jobs within the Dataflow monitoring interface. These charts display metrics over the duration of a pipeline job and include the following information:

  • The number of worker instances used by your job at any point in time
  • Autoscaling log files
  • The estimated backlog over time
  • Average CPU utilization over time

For more information, see Monitor Dataflow autoscaling.
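
These autoscaling time series can also be queried from Cloud Monitoring. A minimal sketch, assuming the google-cloud-monitoring package; the metric type shown (job/current_num_vcpus) is an assumption, so check the Dataflow metrics reference for the exact names available in your project:

```python
# Minimal sketch: read an autoscaling-related Dataflow time series from Cloud
# Monitoring over the last hour. Project and job name are placeholders.
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project"    # placeholder project

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    end_time={"seconds": now},
    start_time={"seconds": now - 3600},  # last hour
)

results = client.list_time_series(
    request={
        "name": project_name,
        "filter": (
            'metric.type="dataflow.googleapis.com/job/current_num_vcpus" '
            'AND resource.labels.job_name="my-job"'   # placeholder job name
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    for point in series.points:
        # Use double_value instead if the metric is a double-typed gauge.
        print(point.interval.end_time, point.value.int64_value)
```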

Recommendations and diagnostics

Dataflow provides recommendations for improving job performance, reducing cost, and troubleshooting errors. This section explains how to review and interpret the recommendations. Keep in mind that some recommendations might not be relevant to your use case.

Recommendations

The Recommendations tab displays insights from Dataflow regarding the pipeline. The goal of these insights is to identify situations in which improvements in cost and performance might be made.

The Recommendations tab for a Dataflow job with sample recommendations.

The Update date column indicates the last time that an insight was observed. Recommendations are stored for 30 days from the Update date.

Programmatic access to recommendations

For programmatic access to recommendations, use the Recommender API.
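
A minimal sketch of that call, assuming the google-cloud-recommender package; the recommender ID is a placeholder, so look up the Dataflow recommender IDs in the Recommender documentation:

```python
# Minimal sketch: list recommendations with the Recommender API client.
# Project, location, and recommender ID are placeholders.
from google.cloud import recommender_v1

client = recommender_v1.RecommenderClient()

parent = (
    "projects/my-project/locations/us-central1/"
    "recommenders/RECOMMENDER_ID"   # placeholder Dataflow recommender ID
)

for recommendation in client.list_recommendations(parent=parent):
    print(recommendation.name)
    print(recommendation.description)
    print(recommendation.last_refresh_time)
```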

Dismiss a recommendation

You can dismiss a recommendation from the Recommendation Hub for your project.

To dismiss a recommendation, click the navigation menu in the upper left of the Google Cloud console and select Home > Recommendations. On the Dataflow Diagnostics card, click View all, select the recommendation you want to dismiss, and click Dismiss.

Diagnostics

The Diagnostics tab of the Logs pane collects and displays certain log entries produced in your pipelines. These include messages that indicate a probable issue with the pipeline and error messages with stack traces. Collected log entries are deduplicated and combined into error groups.

The Diagnostics tab for a Dataflow job with a Service Error error group.

The error report includes the following information:

  • A list of errors with error messages.
  • The number of times each error occurred.
  • A histogram indicating when each error occurred.
  • The time that the error most recently occurred.
  • The time that the error first occurred.
  • The status of the error.

To view the error report for a specific error, click the description under the Errors column. The Error reporting page is displayed. If the error is a Service Error, an additional Troubleshooting guide link is displayed, pointing to documentation with further steps.

The error group detail page for a Dataflow Service Error.

For more information about this page, see Viewing errors.
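
Because these error groups are managed by Error Reporting, you can also list them programmatically. A minimal sketch, assuming the google-cloud-error-reporting package and a placeholder project ID:

```python
# Minimal sketch: list error group statistics with the Error Reporting API,
# which backs the Diagnostics error report described above.
from google.cloud import errorreporting_v1beta1

client = errorreporting_v1beta1.ErrorStatsServiceClient()

request = errorreporting_v1beta1.ListGroupStatsRequest(
    project_name="projects/my-project",   # placeholder project ID
    time_range=errorreporting_v1beta1.QueryTimeRange(
        period=errorreporting_v1beta1.QueryTimeRange.Period.PERIOD_1_DAY
    ),
)

for group_stats in client.list_group_stats(request=request):
    print(group_stats.group.group_id, group_stats.count)
    print(group_stats.first_seen_time, group_stats.last_seen_time)
```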

Mute an error

To mute an error, open the Diagnostics tab, click the error that you want to mute, open the resolution status menu (labeled Open, Acknowledged, Resolved, or Muted), and then select Muted.

What's next