Using the Dataflow Monitoring Interface

When you execute your pipeline using the Cloud Dataflow managed service, you can view that job and any others by using Dataflow's web-based monitoring user interface. The monitoring interface lets you see and interact with your Dataflow jobs.

You can access the Dataflow Monitoring Interface by using the Google Cloud Platform Console. The monitoring interface can show you:

  • A list of all Dataflow jobs currently running and previously run
  • A graphical representation of each pipeline
  • Details about your job's status and execution
  • Links to information about the Cloud Platform services running your pipeline (such as Google Compute Engine and Google Cloud Storage)
  • Any errors or warnings that occur during a job

Accessing the Dataflow Monitoring Interface

To access the Dataflow Monitoring Interface:

  1. Log in to the Cloud Platform Console.
  2. Select your Cloud Platform project.
  3. Click the menu in the upper left corner.
  4. Navigate to the Big Data section and click Dataflow.

A list of Cloud Dataflow jobs appears along with their status, which might include running, succeeded, or failed. Select an individual job to see more information about that pipeline.

Figure 1: A list of Dataflow jobs in the Cloud Platform Console with jobs in the running, succeeded, and failed states.

Viewing a Pipeline

When you select a specific Dataflow job, the monitoring interface shows detailed information about the pipeline in that job. This includes a graphical representation of your pipeline as it is executed on the Dataflow service, as well as a job summary, the job log, and detailed information on each step in the pipeline.

Figure 2: An individual pipeline job shown in the Dataflow monitoring interface.

The Dataflow monitoring interface provides a graphical representation of your pipeline: the execution graph. A pipeline's execution graph represents each transform in the pipeline as a box that contains the transform name and some status information.

Basic Execution Graph

Pipeline Code:

Java

  // Read the lines of the input text.
  p.apply(TextIO.Read.named("ReadLines").from(options.getInput()))
     // Count the words.
     .apply(new CountWords())
     // Write the formatted word counts to output.
     .apply(TextIO.Write.named("WriteCounts")
         .to(options.getOutput())
         .withNumShards(options.getNumShards()));

Python

(p
 # Read the lines of the input text.
 | 'ReadLines' >> beam.io.Read(beam.io.TextFileSource(options.input))
 # Count the words.
 | CountWords()
 # Write the formatted word counts to output.
 | 'WriteCounts' >> beam.io.Write(beam.io.TextFileSink(options.output)))

Execution Graph:

Figure 3: The pipeline code for a WordCount pipeline shown side-by-side with the resulting execution graph in the Dataflow monitoring interface.

Composite Transforms

In the execution graph, composite transforms (transforms that contain multiple nested sub-transforms) are expandable. Expandable composite transforms are marked with an arrow in the graph; click the arrow to expand the transform and view the sub-transforms within.

Pipeline Code:

Java

  // The CountWords Composite Transform
  // inside the WordCount pipeline.

  public static class CountWords
    extends PTransform<PCollection<String>, PCollection<String>> {

    @Override
    public PCollection<String> apply(PCollection<String> lines) {

      // Convert lines of text into individual words.
      PCollection<String> words = lines.apply(
        ParDo.of(new ExtractWordsFn()));

      // Count the number of times each word occurs.
      PCollection<KV<String, Long>> wordCounts =
        words.apply(Count.<String>perElement());

      // Format each word and count into a printable string.
      PCollection<String> results = wordCounts.apply(
        ParDo.of(new FormatCountsFn()));

      return results;
    }
  }

Python

# The CountWords Composite Transform inside the WordCount pipeline.
class CountWords(beam.PTransform):

  def expand(self, pcoll):
    return (pcoll
            # Convert lines of text into individual words.
            | 'ExtractWords' >> beam.ParDo(ExtractWordsFn())
            # Count the number of times each word occurs.
            | beam.combiners.Count.PerElement()
            # Format each word and count into a printable string.
            | 'FormatCounts' >> beam.ParDo(FormatCountsFn()))

Execution Graph:

Figure 4: The pipeline code for the sub-steps of the CountWords transform shown side-by-side with the expanded execution graph for the entire pipeline.

Transform Names

Dataflow has a few different ways to obtain the transform name that's shown in the monitoring execution graph:

Java

  • Dataflow can use a name that you assign when you apply your transform. You can set the transform name by invoking the .named operation on your transform.
  • Dataflow can infer the transform name, either from the class name (if you've built a custom transform) or the name of your DoFn function object (if you're using a core transform such as ParDo).
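
For example, in the Dataflow Java SDK, the two styles might look like the following (a minimal sketch; ExtractWordsFn is the DoFn from the WordCount example above):

  // Explicit name: this step appears in the execution graph as "ReadLines".
  p.apply(TextIO.Read.named("ReadLines").from(options.getInput()))
   // No explicit name: Dataflow infers one from the DoFn class, so this
   // step is typically labeled with "ExtractWordsFn".
   .apply(ParDo.of(new ExtractWordsFn()));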

Python

  • Dataflow can use a name that you assign when you apply your transform. You can set the transform name by specifying the transform's label argument.
  • Dataflow can infer the transform name, either from the class name (if you've built a custom transform) or the name of your DoFn function object (if you're using a core transform such as ParDo).
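
For example, using the label syntax from the WordCount code above (a minimal sketch; ExtractWordsFn is assumed to be defined elsewhere):

(p
 # Explicit label: this step appears in the execution graph as 'ReadLines'.
 | 'ReadLines' >> beam.io.Read(beam.io.TextFileSource(options.input))
 # No explicit label: Dataflow infers one from the DoFn class, so this
 # step is typically labeled with 'ExtractWordsFn'.
 | beam.ParDo(ExtractWordsFn()))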

Autoscaling

When autoscaling is enabled, the Dataflow service automatically chooses the number of worker instances required to run your job. The number of worker instances may change over time according to your job's needs. You can view this number and other information about your autoscaling job in the Summary tab.

Figure 5: The Summary tab, showing the number of worker instances used by an autoscaling pipeline.
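
Autoscaling itself is enabled through pipeline options when you launch the job, not from the monitoring interface. A minimal sketch using the Dataflow Java SDK (this assumes the SDK 1.x worker pool option interfaces; verify the option names against your SDK version):

Java

  // Build the pipeline options and enable throughput-based autoscaling.
  DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
      .withValidation()
      .as(DataflowPipelineOptions.class);
  options.setAutoscalingAlgorithm(
      DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType.THROUGHPUT_BASED);
  // Cap the number of Compute Engine worker instances the service may use.
  options.setMaxNumWorkers(10);
  Pipeline p = Pipeline.create(options);

The same settings can typically be passed on the command line as --autoscalingAlgorithm and --maxNumWorkers.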

To see the history of autoscaling changes, click the See More History link. A window with information about your pipeline's worker history opens.

Note: You can view autoscaling details for streaming pipelines run or updated on or after December 12, 2016. If your pipeline was run or last updated before that date, you will be able to see autoscaling details only after you update your pipeline.
