Use the Dataflow monitoring interface

When you run your pipeline using the Dataflow-managed service, you can view that job and any others by using Dataflow's web-based monitoring user interface. The monitoring interface lets you see and interact with your Dataflow jobs.

You can access the Dataflow monitoring interface by using the Google Cloud console. The monitoring interface can show you:

  • A list of all currently running Dataflow jobs and previously run jobs within the last 30 days.
  • A graphical representation of each pipeline.
  • Details about your job's status, type, and SDK version.
  • Links to information about the Google Cloud services running your pipeline, such as Compute Engine and Cloud Storage.
  • Any errors or warnings that occur during a job.
  • Additional diagnostics for a job.

You can view job monitoring charts within the Dataflow monitoring interface. These charts display metrics over the duration of a pipeline job and include the following information:

  • Step-level visibility to help identify which steps might be causing pipeline lag.
  • Statistical information that can surface anomalous behavior.
  • I/O metrics that can help identify bottlenecks in your sources and sinks.

Access the Dataflow monitoring interface

To access the Dataflow monitoring interface, follow these steps:

  1. Sign in to the Google Cloud console.
  2. Select your Google Cloud project.
  3. Open the navigation menu.
  4. Under Big Data, click Dataflow.

A list of Dataflow jobs appears along with their status. If you don't see any jobs, you need to run a new job. To learn how to run a job, see the Dataflow quickstarts.

Figure 1: A list of Dataflow jobs in the Google Cloud console with jobs in the Running, Failed, and Succeeded states.

A job can have the following statuses:

  • A dash: the monitoring interface has not yet received a status from the Dataflow service.
  • Running: the job is running.
  • Starting...: the job is created, but the system needs some time to prepare before launching.
  • Queued: either a FlexRS job is queued or a Flex template job is being launched (which might take several minutes).
  • Canceling...: the job is being canceled.
  • Canceled: the job is canceled.
  • Draining...: the job is being drained.
  • Drained: the job is drained.
  • Updating...: the job is being updated.
  • Updated: the job is updated.
  • Succeeded: the job has finished successfully.
  • Failed: the job failed to complete.

For more information about a pipeline, click that job's Name.

Access job monitoring charts

To access a job's monitoring charts, click the job Name within the Dataflow monitoring interface. The Job details page is displayed, which contains the following information:

  • Job graph: visual representation of your pipeline
  • Execution details: tool to optimize your pipeline performance
  • Job metrics: metrics about the running of your job
  • Autoscaling: metrics related to streaming job autoscaling events
  • Job info panel: descriptive information about your pipeline
  • Job logs: logs generated by the Dataflow service at the job level
  • Worker logs: logs generated by the Dataflow service at the worker level
  • Diagnostics: table showing where errors occurred along the chosen timeline and possible recommendations for your pipeline
  • Time selector: tool that lets you adjust the timespan of your metrics

Within the Job details page, you can switch your job view with the Job graph, Execution details, Job metrics, and Autoscaling tabs.

View of the Dataflow monitoring interface with the Job graph tab selected. In this mode, you can view your pipeline graph, Job info, Job logs, Worker logs, Diagnostics, and the time selector tool.

View of the Dataflow monitoring interface with the Job metrics tab selected. In this mode, you can view Job metrics charts, Job info, Job logs, Worker logs, Diagnostics, and the time selector tool.

Create Cloud Monitoring alerts

Dataflow is fully integrated with Cloud Monitoring, which lets you create alerts when your job exceeds a user-defined threshold. To create a Cloud Monitoring alert from a metric chart, click Create alerting policy.

The Create alerting policy link lets you create an alert from a metric chart.

For instructions on creating these alerts, read the Using Cloud Monitoring for Dataflow pipelines page. If you are unable to see the monitoring graphs or create alerts, you might need additional Monitoring permissions.
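If you prefer to manage alerts as code, the following Python sketch creates a comparable policy with the Cloud Monitoring client library. It alerts when a job's system lag stays above five minutes; the project ID and threshold are assumptions, and no notification channels are attached.

from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

client = monitoring_v3.AlertPolicyServiceClient()
project_name = "projects/my-project-id"  # hypothetical project ID

policy = monitoring_v3.AlertPolicy(
    display_name="Dataflow system lag too high",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="System lag above 5 minutes",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                filter=(
                    'metric.type = "dataflow.googleapis.com/job/system_lag" '
                    'AND resource.type = "dataflow_job"'
                ),
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=300,  # seconds of system lag
                duration=duration_pb2.Duration(seconds=300),
            ),
        )
    ],
)

created = client.create_alert_policy(name=project_name, alert_policy=policy)
print("Created policy:", created.name)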

Full screen mode

To view a metric chart in full screen, click the full screen button on the chart.

Use the time selector tool

You can adjust the timespan of the metrics with the time selector tool. You can select a predefined duration or select a custom time interval to analyze your job.

The time selector tool lets you select a time range in increments of hours and days, or a custom range.

For streaming or in-flight batch jobs, the default display of the charts shows the previous six hours of metrics for that job. For stopped or completed streaming jobs, the default display of the charts shows the entire duration of the job.

Stage and worker metrics

You can view charts in the Job metrics tab of the Dataflow UI. The metrics are organized into the following dashboards:

  • Overview metrics
  • Streaming metrics (streaming pipelines only)
  • Resource metrics
  • Input metrics
  • Output metrics

To access additional information in these charts, click the Expand chart legend toggle.

The Expand chart legend toggle is located near the Create alerting policy button.

Some of these charts are specific to streaming pipelines. For a list of scenarios where these metrics are useful for debugging, see the Use streaming Dataflow monitoring metrics section.

Autoscaling

The Dataflow service automatically chooses the number of worker instances required to run your autoscaling job. The number of worker instances can change over time according to the job requirements.

A data visualization showing number of workers in a pipeline.

To see the history of autoscaling changes, click the More History button. A table with information about your pipeline's worker history is shown.

Table showing a pipeline's worker history.

Throughput

Throughput is the volume of data that is processed at any point in time. This per-step metric is displayed as the number of elements per second. To view this metric in bytes per second, scroll down to the Throughput (bytes/sec) chart.

A data visualization showing throughput of four steps in a pipeline.

Worker error log count

The Worker error log count shows you the rate of errors observed across all workers at any point in time.

A summary of each logged error and the number of times it occurred.

Data freshness (with and without Streaming Engine)

The data freshness metric shows the difference in seconds between the timestamp on the data element and the time the event is processed in your pipeline. The data element receives a timestamp when an event occurs on the element, such as a click on a website or ingestion by Pub/Sub. The time the data is processed is the output watermark.

At any time, the Dataflow job is processing multiple elements. The data points in the data freshness chart show the element with the largest delay relative to its event time. Therefore, the same line in the chart displays data for multiple elements, with each data point in the line displaying data for the slowest element at that stage in the pipeline.

If some input data has not yet been processed, the output watermark might be delayed, which impacts data freshness. A significant difference between the watermark time and the event time indicates a slow operation.

For more information, see Watermarks and late data in the Apache Beam documentation.
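To make the event-time side of this comparison concrete, the following Python sketch assigns event timestamps to elements so that the watermark, and therefore data freshness, tracks when events occurred rather than when they were read. The element shape and the event_time field are assumptions.

import apache_beam as beam
from apache_beam.transforms.window import TimestampedValue

def attach_event_time(event):
    # event['event_time'] is assumed to be Unix seconds recorded when the
    # event occurred (for example, when a user clicked on a website).
    return TimestampedValue(event, event['event_time'])

with beam.Pipeline() as pipeline:
    stamped = (
        pipeline
        | beam.Create([{'event_time': 1700000000, 'click': 'home'}])
        | 'AttachEventTime' >> beam.Map(attach_event_time))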

The dashboard includes the following two charts:

  • Data freshness by stages
  • Data freshness

A data visualization showing data freshness in a streaming pipeline.

In the preceding image, the highlighted area shows a substantial difference between the event time and the output watermark time, indicating a slow operation.

System latency (with and without Streaming Engine)

System latency is the current maximum number of seconds that an item of data has been processing or awaiting processing. This metric indicates how long an element waits inside any one source in the pipeline. The maximum duration is adjusted after processing. The following cases are additional considerations:

  • For multiple sources and sinks, system latency is the maximum amount of time that an element waits inside a source before it is written to all sinks.
  • Sometimes, a source does not provide a value for the time period for which an element waits inside the source. In addition, the element might not have metadata to define its event time. In this scenario, system latency is calculated from the time the pipeline first receives the element.

The dashboard includes the following two charts:

  • System latency by stages
  • System latency

A data visualization showing system latency in a streaming pipeline.

Backlog

The Backlog dashboard provides information about elements waiting to be processed. The dashboard includes the following two charts:

  • Backlog seconds (Streaming Engine only)
  • Backlog bytes (with and without Streaming Engine)

The Backlog seconds chart shows an estimate of the amount of time in seconds needed to consume the current backlog if no new data arrives and throughput doesn't change. The estimated backlog time is calculated from both the throughput and the backlog bytes from the input source that still need to be processed. This metric is used by the streaming autoscaling feature to determine when to scale up or down.

A data visualization showing the backlog seconds chart in a streaming pipeline.
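As a rough illustration of that calculation (an approximation for intuition, not the exact formula the service uses), the estimate behaves like the following sketch:

def estimated_backlog_seconds(backlog_bytes, throughput_bytes_per_sec):
    """Approximate time to drain the current backlog if no new data arrives
    and throughput stays constant."""
    if throughput_bytes_per_sec <= 0:
        return float('inf')  # no progress, so the backlog never drains
    return backlog_bytes / throughput_bytes_per_sec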

The Backlog bytes chart shows the amount of known unprocessed input for a stage in bytes. This metric compares the remaining bytes to be consumed by each stage against upstream stages. For this metric to report accurately, each source ingested by the pipeline must be configured correctly. Native sources such as Pub/Sub and BigQuery are supported out of the box; custom sources require some extra implementation. For more details, see autoscaling for custom unbounded sources.

A data visualization showing the backlog bytes chart in a streaming pipeline.

Processing (Streaming Engine only)

The Processing dashboard provides information about active user operations. The dashboard includes the following two charts:

  • User processing latencies heatmap
  • User processing latencies by stage

The User processing latencies heatmap shows the maximum user operation latencies over the 50th, 95th, and 99th percentile distributions. You can use the heatmap to see if any long-tail operations are causing high overall system latency or are negatively impacting overall data freshness.

Having high latencies in the 99th percentile is usually less impactful on the job than having high latencies in the 50th percentile. To fix an upstream issue before it becomes a problem downstream, set an alerting policy for high latencies in the 50th percentile.

A data visualization showing the user processing latencies heatmap chart for a streaming pipeline.

The User processing latencies by stage chart shows the 99th percentile for every active user operation broken down by stage. If user code is causing a bottleneck, this chart shows which stage contains the bottleneck. You can use the following steps to debug the pipeline:

  1. Use the chart to find a stage with an unusually high latency.

  2. On the job details page, in the Execution details tab, for Graph view, select Stage workflow. In the Stage workflow graph, find the stage that has unusually high latency.

  3. To find the associated user operations, in the graph, click the node for that stage.

  4. To find additional details, navigate to Cloud Profiler, and use Cloud Profiler to debug the stack trace at the correct time range. Look for the user operations that you identified in the previous step.

A data visualization showing the user processing latencies by stage chart for a streaming pipeline.
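Cloud Profiler has data for the job only if profiling was enabled when the job was launched. The following Python sketch shows one way to enable it through a Dataflow service option; the project, region, and bucket values are placeholders.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner='DataflowRunner',
    project='my-project-id',              # hypothetical project
    region='us-central1',                 # hypothetical region
    temp_location='gs://my-bucket/tmp',   # hypothetical bucket
    dataflow_service_options=['enable_google_cloud_profiler'],
)

with beam.Pipeline(options=options) as pipeline:
    # Hypothetical minimal pipeline body.
    _ = pipeline | beam.Create([1, 2, 3]) | beam.Map(lambda x: x * x)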

Parallelism (Streaming Engine only)

The Parallel processing chart shows the approximate number of keys in use for data processing for each stage. Dataflow scales based on the parallelism of a pipeline. In Dataflow, the parallelism of a pipeline is an estimate of the number of threads needed to most efficiently process data at any given time. Processing for any given key is serialized, so the total number of keys for a stage represents the maximum available parallelism at that stage. Parallelism metrics can be useful for finding hot keys or bottlenecks for slow or stuck pipelines.

A data visualization showing the parallel processing chart in a streaming pipeline.
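If the chart points to a hot key in a combining step, one common mitigation in Beam Python is to fan out the combine for hot keys so that intermediate combiners share the work. The following sketch is illustrative; the key names and fanout value are assumptions.

import apache_beam as beam

with beam.Pipeline() as pipeline:
    sums = (
        pipeline
        | beam.Create([('hot_key', 1)] * 1000 + [('cold_key', 1)])
        # Spread work for hot keys across 16 intermediate combiners.
        | 'SumPerKey' >> beam.CombinePerKey(sum).with_hot_key_fanout(16))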

Persistence (Streaming Engine only)

The Persistence dashboard provides information about the rate at which Persistent Disk storage is written and read by a particular pipeline stage in bytes per second. The dashboard includes the following two charts:

  • Storage write
  • Storage read

The speed of write and read operations is limited by the maximum IOPS (input/output operations per second) of the selected disk. To determine if the current disk is causing a bottleneck, review the IOPS of the disks that the workers are using. For more information about the performance limits for persistent disks, see Performance limits.

A data visualization showing the storage write chart for a streaming pipeline.

Duplicates (Streaming Engine only)

The Duplicates chart shows the number of messages processed by a particular stage that have been filtered out as duplicates. Dataflow supports many sources and sinks that guarantee at-least-once delivery, and a consequence of at-least-once delivery is that it can produce duplicates. Dataflow guarantees exactly-once delivery by filtering out these duplicates automatically, which saves downstream stages from reprocessing the same elements and keeps state and outputs unaffected. You can optimize the pipeline for resources and performance by reducing the number of duplicates produced in each stage.

A data visualization showing the duplicates chart in a streaming pipeline.
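For Pub/Sub sources, deduplication can also key off a unique attribute that you attach to each published message by setting id_label, as in the following Python sketch. The subscription path and attribute name are placeholders.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as pipeline:
    messages = pipeline | beam.io.ReadFromPubSub(
        subscription='projects/my-project/subscriptions/my-sub',
        with_attributes=True,
        # Deduplicate on a caller-supplied attribute instead of the
        # Pub/Sub message ID.
        id_label='unique_id')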

Timers (Streaming Engine only)

The Timers dashboard provides information about the number of timers pending and the number of timers already processed in a particular pipeline stage. Because windows rely on timers, this metric allows you to track the progress of windows.

The dashboard includes the following two charts:

  • Timers pending by stage
  • Timers processing by stage

These charts show the rate at which windows are pending or processing at a specific point in time. The Timers pending by stage chart indicates how many windows are delayed due to bottlenecks. The Timers processing by stage chart indicates how many windows are currently collecting elements.

These charts display all job timers, so if timers are used elsewhere in your code, those timers also appear in these charts.

A data visualization showing the number of timers pending in a particular stage.

A data visualization showing the number of timers already processed in a particular stage.
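For reference, a user-defined timer in a Beam Python stateful DoFn looks like the following sketch; timers like this one are counted in these charts alongside the timers that implement windowing. The keyed input and the one-minute delay are assumptions.

import apache_beam as beam
from apache_beam.transforms.timeutil import TimeDomain
from apache_beam.transforms.userstate import TimerSpec, on_timer

class RemindLaterFn(beam.DoFn):
  # An event-time timer; requires keyed input.
  REMINDER = TimerSpec('reminder', TimeDomain.WATERMARK)

  def process(
      self,
      element,
      timestamp=beam.DoFn.TimestampParam,
      timer=beam.DoFn.TimerParam(REMINDER)):
    # Fire roughly one minute after this element's event time.
    timer.set(timestamp + 60)
    yield element

  @on_timer(REMINDER)
  def on_reminder(self):
    # Runs when the watermark passes the timer's timestamp.
    yield 'reminder fired'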

CPU utilization

CPU utilization is the amount of CPU used divided by the amount of CPU available for processing. This per-worker metric is displayed as a percentage. The dashboard includes the following four charts:

  • CPU utilization (All workers)
  • CPU utilization (Stats)
  • CPU utilization (Top 4)
  • CPU utilization (Bottom 4)

An animated data visualization showing CPU utilization for one Dataflow worker.

Memory utilization

Memory utilization is the estimated amount of memory used by the workers in bytes per second. The dashboard includes the following two charts:

  • Max worker memory utilization (estimated bytes per second)
  • Memory utilization (estimated bytes per second)

The Max worker memory utilization chart provides information about the workers that use the most memory in the Dataflow job at each point in time. If at different points during a job, the worker using the maximum amount of memory changes, the same line in the chart displays data for multiple workers, with each data point in the line displaying data for the worker using the maximum amount of memory at that time. The chart compares the estimated memory used by the worker to the memory limit in bytes.

You can use this chart to troubleshoot out-of-memory (OOM) issues. Worker out-of-memory crashes are not shown on this chart.

The Memory utilization chart shows an estimate of the memory used by all workers in the Dataflow job compared to the memory limit in bytes.

Input and Output Metrics

Input metrics and output metrics are displayed if your streaming Dataflow job reads or writes records using Pub/Sub.

All input metrics of the same type are combined, and all output metrics are also combined; for example, all Pub/Sub metrics are grouped together under one section. Each metric type is organized into a separate section. To change which metrics are displayed, select the section on the left that best represents the metrics you are looking for. The following images show all the available sections.

An example image which shows the separate input and output sections for a Dataflow job.

The following two charts are displayed in both the Input Metrics and Output Metrics sections.

A series of charts showing input and output metrics for a Dataflow job.

Requests per sec

Requests per sec is the rate of API requests to read or write data by the source or sink over time. If this rate drops to zero, or decreases significantly for an extended time period relative to expected behavior, then the pipeline might be blocked from performing certain operations. Additionally, there might be no data to read. In such a case, review the job steps that have a high system watermark. Also, examine the worker logs for errors or indications about slow processing.

A chart showing the number of API requests to read or write data by the source or sink over time.

Response errors per sec by error type

Response errors per sec by error type is the rate of failed API requests to read or write data by the source or sink over time. If such errors occur frequently, these API requests might slow down processing and should be investigated. To help troubleshoot these issues, review the general I/O error code documentation and any error code documentation specific to the source or sink, such as the Pub/Sub error codes.

A chart showing the rate of failed API requests to read or write data by the source or sink over time.

Use Metrics explorer

You can also view Dataflow I/O metrics in Metrics explorer. For the complete list of Dataflow metrics, see the Google Cloud metrics documentation.
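Metrics explorer is backed by the Cloud Monitoring API, so the same metrics can be read programmatically. The following Python sketch lists an hour of the job/element_count metric; the project ID is a placeholder.

import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project-id"  # hypothetical project ID

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}})

results = client.list_time_series(
    request={
        "name": project_name,
        "filter": 'metric.type = "dataflow.googleapis.com/job/element_count"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    })
for series in results:
    print(series.metric.type, series.resource.labels, len(series.points))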

Upcoming changes to Pub/Sub metrics (Streaming Engine only)

Streaming Engine currently uses Synchronous Pull to consume data from Pub/Sub, but starting soon, we will migrate to Streaming Pull for improved performance. The existing Requests per sec and Response errors per sec graphs and metrics are appropriate for Synchronous Pull only. We will add a metric about the health of Streaming Pull connections, and any errors that terminate those connections.

The migration will not require any involvement from users. During the migration, a job may be using Synchronous Pull for one period of time, and Streaming Pull for another period of time. Therefore, the same job may show the Synchronous Pull metrics for one period of time, and the Streaming Pull metrics for another period of time. After the migration is complete, the Synchronous Pull metrics will be removed from the UI.

The migration will also affect the job/pubsub/read_count and job/pubsub/read_latencies metrics in Cloud Monitoring. Those counters will not be incremented while a job is using Streaming Pull.

Streaming jobs that don't use Streaming Engine will not be migrated to Streaming Pull, and will not be affected by this change. They will continue to display the Synchronous Pull metrics.

Additional information about the Streaming Pull migration can be found on the Streaming with Pub/Sub page.

Please contact your account team if you have any questions about this change.

Autoscaling metrics

You can view autoscaling monitoring charts for streaming jobs within the Dataflow monitoring interface. These charts display metrics over the duration of a pipeline job and include the following information:

  • The number of worker instances used by your job at any point in time
  • Autoscaling log files
  • The estimated backlog over time
  • Average CPU utilization over time

For more information, see Monitor Dataflow autoscaling.

View a pipeline

When you select a specific Dataflow job, the monitoring interface shows information about the pipeline in that job. This information includes a graphical representation of your pipeline as it runs on the Dataflow service, a job summary, a job log, and information about each step in the pipeline.

The Dataflow monitoring interface provides a graphical representation of your pipeline: the execution graph. A pipeline's execution graph represents each transform in the pipeline as a box. Each box contains the transform name and information about the job status, which includes the following:

  • Running: the step is running.
  • Queued: the step in a FlexRS job is queued.
  • Succeeded: the step finished successfully.
  • Stopped: the step stopped because the job stopped.
  • Unknown: the step failed to report status.
  • Failed: the step failed to complete.

Basic execution graph

Pipeline Code:

Java

  // Read the lines of the input text.
  p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
     // Count the words.
     .apply(new CountWords())
     // Write the formatted word counts to output.
     .apply("WriteCounts", TextIO.write().to(options.getOutput()));

Python

(
    pipeline
    # Read the lines of the input text.
    | 'ReadLines' >> beam.io.ReadFromText(args.input_file)
    # Count the words.
    | CountWords()
    # Write the formatted word counts to output.
    | 'WriteCounts' >> beam.io.WriteToText(args.output_path))

Go

  // Create the pipeline.
  p := beam.NewPipeline()
  s := p.Root()
  // Read the lines of the input text.
  lines := textio.Read(s, *input)
  // Count the words (CountWords is the composite transform shown later).
  counted := CountWords(s, lines)
  // Format each word and count into a printable string.
  // (formatFn is defined elsewhere in the full WordCount example.)
  formatted := beam.ParDo(s, formatFn, counted)
  // Write the formatted word counts to output.
  textio.Write(s, *output, formatted)

Execution Graph:

The execution graph for a WordCount pipeline as shown in the Dataflow monitoring interface.

Figure 2: The pipeline code for a WordCount pipeline shown with the resulting execution graph in the Dataflow monitoring interface.

Composite transforms

In the execution graph, composite transforms (transforms that contain multiple nested sub-transforms) are expandable. Expandable composite transforms are marked with an arrow in the graph. Click the arrow to expand the transform and view the sub-transforms within.

Pipeline Code:

Java

  // The CountWords Composite Transform
  // inside the WordCount pipeline.

  public static class CountWords
    extends PTransform<PCollection<String>, PCollection<KV<String, Long>>> {

    @Override
    public PCollection<KV<String, Long>> expand(PCollection<String> lines) {

      // Convert lines of text into individual words.
      PCollection<String> words = lines.apply(
        ParDo.of(new ExtractWordsFn()));

      // Count the number of times each word occurs.
      PCollection<KV<String, Long>> wordCounts =
        words.apply(Count.<String>perElement());

      return wordCounts;
    }
  }

Python

# The CountWords Composite Transform inside the WordCount pipeline.
@beam.ptransform_fn
def CountWords(pcoll):
  return (
      pcoll
      # Convert lines of text into individual words.
      | 'ExtractWords' >> beam.ParDo(ExtractWordsFn())
      # Count the number of times each word occurs.
      | beam.combiners.Count.PerElement()
      # Format each word and count into a printable string.
      | 'FormatCounts' >> beam.ParDo(FormatCountsFn()))

Go

  // The CountWords Composite Transform inside the WordCount pipeline.
  func CountWords(s beam.Scope, lines beam.PCollection) beam.PCollection {
    s = s.Scope("CountWords")

    // Convert lines of text into individual words.
    col := beam.ParDo(s, &extractFn{SmallWordLength: *smallWordLength}, lines)

    // Count the number of times each word occurs.
    return stats.Count(s, col)
  }

Execution Graph:

The execution graph for a WordCount pipeline with the CountWords transform expanded to show its component transforms.

Figure 3: The pipeline code for the sub-steps of the CountWords transform shown with the expanded execution graph for the entire pipeline.

Transform names

Dataflow has a few different ways to obtain the transform name that's shown in the monitoring execution graph:

Java

  • Dataflow can use a name that you assign when you apply your transform. The first argument you supply to the apply method is your transform name.
  • Dataflow can infer the transform name, either from the class name (if you've built a custom transform) or the name of your DoFn function object (if you're using a core transform such as ParDo).

Python

  • Dataflow can use a name that you assign when you apply your transform. You can set the transform name by specifying the transform's label argument.
  • Dataflow can infer the transform name, either from the class name (if you've built a custom transform) or the name of your DoFn function object (if you're using a core transform such as ParDo).

Go

  • Dataflow can use a name that you assign when you apply your transform. You can set the transform name by specifying the Scope.
  • Dataflow can infer the transform name, either from the struct name if you're using a structural DoFn or from the function name if you're using a functional DoFn.

Understand the metrics

Wall time

When you click a step, the Wall time metric is displayed in the Step info panel. Wall time provides the total approximate time spent across all threads in all workers on the following actions:

  • Initializing the step
  • Processing data
  • Shuffling data
  • Ending the step

For composite steps, wall time tells you the sum of time spent in the component steps. This estimate can help you identify slow steps and diagnose which part of your pipeline is taking more time than required.

You can view the amount of time it takes for a step to run in your pipeline.
Figure 4: The Wall time metric can help you ensure your pipeline is running efficiently.

Side input metrics

Side input metrics show you how your side input access patterns and algorithms affect your pipeline's performance. When your pipeline uses a side input, Dataflow writes the collection to a persistent layer, such as a disk, and your transforms read from this persistent collection. These reads and writes affect your job's run time.
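For context, a side input in Beam is an extra input that is materialized and made available alongside every element of the main input. The following Python sketch shows one way to create and consume one; the collections and lookup shape are illustrative assumptions.

import apache_beam as beam

with beam.Pipeline() as pipeline:
    names = pipeline | 'Names' >> beam.Create([(1, 'ada'), (2, 'lin')])
    scores = pipeline | 'Scores' >> beam.Create([(1, 95), (2, 88)])

    joined = (
        scores
        # The 'names' collection is materialized as a dict side input.
        | 'AttachName' >> beam.Map(
            lambda kv, lookup: (lookup[kv[0]], kv[1]),
            lookup=beam.pvalue.AsDict(names)))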

The Dataflow monitoring interface displays side input metrics when you select a transform that creates or consumes a side input collection. You can view the metrics in the Side Input Metrics section of the Step info panel.

Transforms that create a side input

If the selected transform creates a side input collection, the Side Input Metrics section displays the name of the collection, along with the following metrics:

  • Time spent writing: The time spent writing the side input collection.
  • Bytes written: The total number of bytes written to the side input collection.
  • Time & bytes read from side input: A table that contains additional metrics for all transforms that consume the side input collection, called side input consumers.

The Time & bytes read from side input table contains the following information for each side input consumer:

  • Side input consumer: The transform name of the side input consumer.
  • Time spent reading: The time this consumer spent reading the side input collection.
  • Bytes read: The number of bytes this consumer read from the side input collection.

If your pipeline has a composite transform that creates a side input, expand the composite transform until you see the specific subtransform that creates the side input. Then, select that subtransform to view the Side Input Metrics section.

Figure 5 shows side input metrics for a transform that creates a side input collection.

Figure 5: The execution graph has an expanded composite transform (MakeMapView). The subtransform that creates the side input (CreateDataflowView) is selected, and the side input metrics are visible in the Step info side panel.

Transforms that consume one or more side inputs

If the selected transform consumes one or more side inputs, the Side Input Metrics section displays the Time & bytes read from side input table. This table contains the following information for each side input collection:

  • Side input collection: The name of the side input collection.
  • Time spent reading: The time the transform spent reading this side input collection.
  • Bytes read: The number of bytes the transform read from this side input collection.

If your pipeline has a composite transform that reads a side input, expand the composite transform until you see the specific subtransform that reads the side input. Then, select that subtransform to view the Side Input Metrics section.

Figure 6 shows side input metrics for a transform that reads from a side input collection.

Figure 6: The JoinBothCollections transform reads from a side input collection. JoinBothCollections is selected in the execution graph and the side input metrics are visible in the Step info side panel.

Identify side input performance issues

Reiteration is a common side input performance issue. If your side input PCollection is too large, workers can't cache the entire collection in memory. As a result, the workers must repeatedly read from the persistent side input collection.

In figure 7, side input metrics show that the total bytes read from the side input collection are much larger than the collection's size (the total bytes written).

Figure 7: An example of reiteration. The side input collection is 563 MB, and the sum of the bytes read by consuming transforms is almost 12 GB.

To improve the performance of this pipeline, redesign your algorithm to avoid iterating or refetching the side input data. In this example, the pipeline creates the Cartesian product of two collections. The algorithm iterates through the entire side input collection for each element of the main collection. You can improve the access pattern of the pipeline by batching multiple elements of the main collection together. This change reduces the number of times workers must re-read the side input collection.
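As a sketch of that batching idea in Beam Python (under the assumption of a large side input consumed by a hypothetical process_batch function), you can group main-input elements with BatchElements so the side input is iterated once per batch rather than once per element:

import apache_beam as beam

def process_batch(batch, side):
    # 'side' is the materialized side input; iterate it once per batch.
    side_size = len(side)
    for element in batch:
        yield (element, side_size)

def attach_side(main, side_pcoll):
    return (
        main
        | 'Batch' >> beam.BatchElements(min_batch_size=100, max_batch_size=1000)
        | 'ProcessBatch' >> beam.FlatMap(
            process_batch, side=beam.pvalue.AsList(side_pcoll)))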

Another common performance issue can occur if your pipeline performs a join by applying a ParDo with one or more large side inputs. In this case, workers spend a large percentage of the processing time for the join operation reading from the side input collections.

Figure 8 shows example side input metrics for this issue:

Figure 8: The JoinBothCollections transform has a total processing time of 18 min 31 sec. Workers spend the majority of the processing time (10 min 3 sec) reading from the 10 GB side input collection.

To improve the performance of this pipeline, use CoGroupByKey instead of side inputs.
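The following Python sketch shows a keyed join expressed with CoGroupByKey, which shuffles both collections by key instead of materializing one of them as a side input. The collections are illustrative assumptions.

import apache_beam as beam

with beam.Pipeline() as pipeline:
    names = pipeline | 'Names' >> beam.Create([(1, 'ada'), (2, 'lin')])
    scores = pipeline | 'Scores' >> beam.Create([(1, 95), (2, 88)])

    joined = (
        {'names': names, 'scores': scores}
        | 'Join' >> beam.CoGroupByKey()
        # Each output is (key, {'names': [...], 'scores': [...]}).
        | 'Format' >> beam.MapTuple(
            lambda key, grouped: (key, list(grouped['names']), list(grouped['scores']))))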

Recommendations and Diagnostics

Dataflow provides recommendations for improving job performance, reducing cost, and troubleshooting errors. This section explains how to review and interpret the recommendations. Keep in mind that some recommendations might not be relevant to your use case.

Diagnostics

The Diagnostics tab under Logs collects and displays certain log entries produced in your pipelines. These include messages that indicate a probable issue with the pipeline, and error messages with stack traces. Collected log entries are deduplicated and combined into error groups.

The Diagnostics tab for a Dataflow job with a Service Error error group.

The error report includes the following information:

  • A list of errors with error messages.
  • The number of times each error occurred.
  • A histogram indicating when each error occurred.
  • The time that the error most recently occurred.
  • The time that the error first occurred.
  • The status of the error.

To view the error report for a specific error, click the description in the Errors column. The Error reporting page is displayed. If the error is a Service Error, a Troubleshooting guide link to documentation with further steps is also displayed.

The error group detail page for a Dataflow Service Error.

To learn more about the page, see Viewing errors.

Mute an error

To mute an error message, open the Diagnostics tab, click the error that you want to mute, open the resolution status menu (labeled Open, Acknowledged, Resolved, or Muted), and select Muted.

Recommendations

The Recommendations tab displays insights from Dataflow about the pipeline. These insights identify situations where you might be able to improve cost and performance.

The Recommendations tab for a Dataflow job with sample recommendations.

The Update date column indicates the last time that an insight was observed. Recommendations are stored for 30 days from the Update date.

Programmatic access to recommendations

For programmatic access to recommendations, use the Recommender API.
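The following Python sketch lists recommendations with the Recommender client library; the project, location, and recommender ID are placeholders, so check the Recommender documentation for the Dataflow-specific recommender ID.

from google.cloud import recommender_v1

client = recommender_v1.RecommenderClient()
parent = (
    "projects/my-project-id/locations/us-central1/"
    "recommenders/RECOMMENDER_ID"  # placeholder recommender ID
)

for recommendation in client.list_recommendations(parent=parent):
    print(recommendation.name, recommendation.description)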

Dismiss a recommendation

You can dismiss a recommendation at the Recommendation Hub for your project.

To dismiss a recommendation, click the navigation menu in the upper left of the Google Cloud console and select Home > Recommendations. On the Dataflow Diagnostics card, click View all, select the recommendation you want to dismiss, and click Dismiss.

What's next