When you run your pipeline using the Dataflow-managed service, you can view that job and any others by using Dataflow's web-based monitoring user interface. The monitoring interface lets you see and interact with your Dataflow jobs.
You can access the Dataflow monitoring interface by using the Google Cloud console. The monitoring interface can show you:
- A list of all currently running Dataflow jobs and previously run jobs within the last 30 days.
- A graphical representation of each pipeline.
- Details about your job's status, type, and SDK version.
- Links to information about the Google Cloud services running your pipeline, such as Compute Engine and Cloud Storage.
- Any errors or warnings that occur during a job.
- Additional diagnostics for a job.
You can view job visualizers within the Dataflow monitoring interface. These charts display metrics over the duration of a pipeline job and include the following information:
- Step-level visibility to help identify which steps might be causing pipeline lag.
- Statistical information that can surface anomalous behavior.
- I/O metrics that can help identify bottlenecks in your sources and sinks.
Access the Dataflow monitoring interface
To access the Dataflow monitoring interface, follow these steps:
- Sign in to the Google Cloud console.
- Select your Google Cloud project.
- Open the navigation menu.
- Under Big Data, click Dataflow.
A list of Dataflow jobs appears along with their status. If you don't see any jobs, you need to run a new job. To learn how to run a job, see the Dataflow quickstarts.
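The same job list is also available from the command line. As a sketch (the region value is illustrative; substitute the region your jobs run in):

```shell
# List Dataflow jobs and their current states in one region.
gcloud dataflow jobs list --region=us-central1

# Show only jobs that are currently running.
gcloud dataflow jobs list --region=us-central1 --status=active
```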
A job can have the following statuses:
- —: the monitoring interface has not yet received a status from the Dataflow service.
- Running: the job is running.
- Starting...: the job is created, but the system needs some time to prepare before launching.
- Queued: either a FlexRS job is queued or a Flex template job is being launched (which might take several minutes).
- Canceling...: the job is being canceled.
- Canceled: the job is canceled.
- Draining...: the job is being drained.
- Drained: the job is drained.
- Updating...: the job is being updated.
- Updated: the job is updated.
- Succeeded: the job has finished successfully.
- Failed: the job failed to complete.
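These interface statuses correspond to `JOB_STATE_*` values in the Dataflow API. If you poll job state programmatically, a small helper like this sketch (the state names below are the API's enum values; the helper itself is illustrative) can decide when a job has stopped changing:

```python
# Terminal JOB_STATE_* values from the Dataflow API's JobState enum.
# A job in one of these states will not change state again.
TERMINAL_STATES = {
    "JOB_STATE_DONE",       # Succeeded
    "JOB_STATE_FAILED",     # Failed
    "JOB_STATE_CANCELLED",  # Canceled
    "JOB_STATE_UPDATED",    # Updated (replaced by a new job)
    "JOB_STATE_DRAINED",    # Drained
}

def is_terminal(state: str) -> bool:
    """Return True if the job has finished and will not change state again."""
    return state in TERMINAL_STATES

print(is_terminal("JOB_STATE_RUNNING"))  # False: keep polling
print(is_terminal("JOB_STATE_DONE"))     # True: job finished successfully
```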
For more information about a pipeline, click that job's Name.
Access job visualizers
To access charts for monitoring your job, click the job Name within the Dataflow monitoring interface. The Job details page is displayed, which contains the following information:
- Job graph: visual representation of your pipeline
- Execution details: tool to optimize your pipeline performance
- Job metrics: metrics about the running of your job
- Autoscaling: metrics related to streaming job autoscaling events
- Job info panel: descriptive information about your pipeline
- Job logs: logs generated by the Dataflow service at the job level
- Worker logs: logs generated by the Dataflow service at the worker level
- Diagnostics: table showing where errors occurred along the chosen timeline and possible recommendations for your pipeline
- Time selector: tool that lets you adjust the timespan of your metrics
Within the Job details page, you can switch your job view with the Job graph, Execution details, Job metrics, and Autoscaling tabs.
Create Cloud Monitoring alerts
Dataflow is fully integrated with Cloud Monitoring, which lets you create alerts when your job exceeds a user-defined threshold. To create a Cloud Monitoring alert from a metric chart, click Create alerting policy.
For instructions on creating these alerts, read the Using Cloud Monitoring for Dataflow pipelines page. If you are unable to see the monitoring graphs or create alerts, you might need additional Monitoring permissions.
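Alert policies can also be defined as resources rather than through the chart UI. The following is a minimal sketch of an alerting policy that fires when a job's system lag stays high; the metric type and resource type are real Cloud Monitoring identifiers for Dataflow, but the display names, threshold, and durations are illustrative values you should tune:

```json
{
  "displayName": "Dataflow system lag too high",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "System lag above 5 minutes",
      "conditionThreshold": {
        "filter": "metric.type=\"dataflow.googleapis.com/job/system_lag\" AND resource.type=\"dataflow_job\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 300,
        "duration": "300s",
        "aggregations": [
          {
            "alignmentPeriod": "60s",
            "perSeriesAligner": "ALIGN_MEAN"
          }
        ]
      }
    }
  ]
}
```

A policy file like this can be deployed with `gcloud alpha monitoring policies create --policy-from-file=policy.json`.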
Full screen mode
To view a metric chart in full screen mode, click the full screen icon on the chart.
Use the time selector tool
You can adjust the timespan of the metrics with the time selector tool. You can select a predefined duration or select a custom time interval to analyze your job.
For streaming or in-flight batch jobs, the charts show the previous six hours of metrics by default. For stopped or completed streaming jobs, the charts show the entire runtime of the job by default.
When you select a specific Dataflow job, the monitoring interface provides a graphical representation of your pipeline: the job graph. The job graph page in the console also provides a job summary, a job log, and information about each step in the pipeline. For more details about job graphs, see Dataflow job graph.
You can view charts in the Job metrics tab of the Dataflow UI. Each metric is organized into the following dashboards:
Streaming metrics (streaming pipelines only)
- Data freshness (with and without Streaming Engine)
- System latency (with and without Streaming Engine)
- Processing (Streaming Engine only)
- Parallelism (Streaming Engine only)
- Persistence (Streaming Engine only)
- Duplicates (Streaming Engine only)
- Timers (Streaming Engine only)
You can view autoscaling monitoring charts for streaming jobs within the Dataflow monitoring interface. These charts display metrics over the duration of a pipeline job and include the following information:
- The number of worker instances used by your job at any point in time
- Autoscaling log files
- The estimated backlog over time
- Average CPU utilization over time
For more information, see Monitor Dataflow autoscaling.
Recommendations and Diagnostics
Dataflow provides recommendations for improving job performance, reducing cost, and troubleshooting errors. This section explains how to review and interpret the recommendations. Keep in mind that some recommendations might not be relevant to your use case.
The Diagnostics tab under Logs collects and displays certain log entries produced in your pipelines. These include messages that indicate a probable issue with the pipeline, and error messages with stack traces. Collected log entries are deduplicated and combined into error groups.
The error report includes the following information:
- A list of errors with error messages.
- The number of times each error occurred.
- A histogram indicating when each error occurred.
- The time that the error most recently occurred.
- The time that the error first occurred.
- The status of the error.
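To make the grouping concrete, here is a rough sketch of the kind of aggregation the error report performs, using hypothetical `(timestamp, message)` log entries; the real service groups by error signature, not by the exact message string:

```python
from collections import defaultdict

def group_errors(entries):
    """Combine raw (timestamp, message) error entries into groups with
    the fields shown in the error report: occurrence count, first
    occurrence, and most recent occurrence."""
    groups = defaultdict(
        lambda: {"count": 0, "first_seen": None, "last_seen": None}
    )
    for ts, message in entries:
        g = groups[message]
        g["count"] += 1
        g["first_seen"] = ts if g["first_seen"] is None else min(g["first_seen"], ts)
        g["last_seen"] = ts if g["last_seen"] is None else max(g["last_seen"], ts)
    return dict(groups)

report = group_errors([
    (1, "KeyError: 'user_id'"),
    (5, "KeyError: 'user_id'"),
    (3, "OutOfMemoryError"),
])
# report["KeyError: 'user_id'"] has count 2, first_seen 1, last_seen 5.
```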
To view the error report for a specific error, click the description under the Errors column. The Error reporting page is displayed. If the error is a Service Error, an additional Troubleshooting guide link to documentation with further steps is displayed.
To learn more about the page, see Viewing errors.
Mute an error
To mute an error message, open the Diagnostics tab, click the error that you want to mute, open the resolution status menu (labeled Open, Acknowledged, Resolved, or Muted), and then select Muted.
The Recommendations tab displays insights from Dataflow regarding the pipeline. The goal of these insights is to identify situations in which improvements in cost and performance might be made.
The Update date column indicates the last time that an insight was observed. Recommendations are stored for 30 days from the Update date.
Programmatic access to recommendations
For programmatic access to recommendations, use the Recommender API.
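Recommendations returned by the Recommender API carry a `lastRefreshTime` field. Since recommendations are retained for 30 days from their update date, a client-side filter like this sketch can drop stale entries; it assumes RFC 3339 timestamps with an explicit UTC offset, and the sample data is hypothetical:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)

def fresh_recommendations(recs, now=None):
    """Keep recommendations whose lastRefreshTime (a Recommender API
    field) falls within the 30-day retention window."""
    now = now or datetime.now(timezone.utc)
    return [
        r for r in recs
        if now - datetime.fromisoformat(r["lastRefreshTime"]) <= RETENTION
    ]

recs = [
    {"lastRefreshTime": "2024-06-01T00:00:00+00:00", "description": "Recent insight"},
    {"lastRefreshTime": "2024-01-01T00:00:00+00:00", "description": "Stale insight"},
]
kept = fresh_recommendations(
    recs, now=datetime(2024, 6, 15, tzinfo=timezone.utc)
)
# Only the recommendation refreshed within the last 30 days remains.
```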
Dismiss a recommendation
You can dismiss a recommendation at the Recommendation Hub for your project.
To dismiss a recommendation, click the navigation menu in the upper left of the Google Cloud console and select Home > Recommendations. On the Dataflow Diagnostics card, click View all, select the recommendation you want to dismiss, and click Dismiss.
- Read how to use Execution details to optimize a Dataflow job
- Explore Cloud Monitoring to create alerts and view Dataflow metrics, including custom metrics
- Learn more about building production-ready data pipelines