Logging pipeline messages

You can use the Dataflow SDK's built-in logging infrastructure to log information during your pipeline's execution. You can use the Google Cloud Console to monitor logging information during and after your pipeline runs.

Adding log messages to your pipeline

Java: SDK 2.x

The Apache Beam SDK for Java recommends that you log worker messages through the open source SLF4J (Simple Logging Facade for Java) library. The Apache Beam SDK for Java implements the required logging infrastructure so your Java code need only import the SLF4J API. Then, it instantiates a Logger to enable message logging within your pipeline code.

For pre-existing code and/or libraries, the Apache Beam SDK for Java sets up additional logging infrastructure. This occurs when executing on the worker to capture log messages produced by the following logging libraries for Java:

Python

The Apache Beam SDK for Python provides the logging library package, which allows your pipeline's workers to output log messages. To use the library functions, you must import the library:

import logging

Java: SDK 1.x

The Apache Beam SDK for Java recommends that you log worker messages through the open source SLF4J (Simple Logging Facade for Java) library. The Apache Beam SDK for Java implements the required logging infrastructure so your Java code need only import the SLF4J API. Then, it instantiates a Logger to enable message logging within your pipeline code.

For pre-existing code and/or libraries, the Apache Beam SDK for Java sets up additional logging infrastructure. This occurs when executing on the worker to capture log messages produced by the following logging libraries for Java:

Worker log message code example

Java: SDK 2.x

The Apache Beam WordCount example can be modified to output a log message when the word "love" is found in a line of the processed text. The added code is indicated in bold below (surrounding code is included for context).

 package org.apache.beam.examples;
 // Import SLF4J packages.
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 ...
 public class WordCount {
   ...
   static class ExtractWordsFn extends DoFn<String, String> {
     // Instantiate Logger.
     // Suggestion: As shown, specify the class name of the containing class
     // (WordCount).
     private static final Logger LOG = LoggerFactory.getLogger(WordCount.class);
     ...
     @ProcessElement
     public void processElement(ProcessContext c) {
       ...
       // Output each word encountered into the output PCollection.
       for (String word : words) {
         if (!word.isEmpty()) {
           c.output(word);
         }
         // Log INFO messages when the word "love" is found.
         if(word.toLowerCase().equals("love")) {
           LOG.info("Found " + word.toLowerCase());
         }
       }
     }
   }
 ... // Remaining WordCount example code ...

Python

The Apache Beam wordcount.py example can be modified to output a log message when the word "love" is found in a line of the processed text.

# import Python logging module.
import logging

class ExtractWordsFn(beam.DoFn):

  def process(self, element):
    words = re.findall(r'[A-Za-z\']+', element)
    for word in words:
      yield word

      if word.lower() == 'love':
        # Log using the root logger at info or higher levels
        logging.info('Found : %s', word.lower())

# Remaining WordCount example code ...

Java: SDK 1.x

The WordCount example can be modified to output a log message when the word "love" is found in a line of the processed text. The added code is indicated in bold below (surrounding code is included for context).

 package com.google.cloud.dataflow.examples;
 // Import SLF4J packages.
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 ...
 public class WordCount {
   ...
   static class ExtractWordsFn extends DoFn<String, String> {
     // Instantiate Logger.
     // Suggestion: As shown, specify the class name of the containing class
     // (WordCount).
     private static final Logger LOG = LoggerFactory.getLogger(WordCount.class);
     ...
     @Override
     public void processElement(ProcessContext c) {
       ...
       // Output each word encountered into the output PCollection.
       for (String word : words) {
         if (!word.isEmpty()) {
           c.output(word);
         }
         // Log INFO messages when the word "love" is found.
         if(word.toLowerCase().equals("love")) {
           LOG.info("Found " + word.toLowerCase());
         }
       }
     }
   }
 ... // Remaining WordCount example code ...

Java: SDK 2.x

If the modified WordCount pipeline is run locally using the default DirectRunner with the output sent to a local file (--output=./local-wordcounts), console output includes the added log messages:

INFO: Executing pipeline using the DirectRunner.
...
Feb 11, 2015 1:13:22 PM org.apache.beam.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
Feb 11, 2015 1:13:22 PM org.apache.beam.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
Feb 11, 2015 1:13:22 PM org.apache.beam.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
...
INFO: Pipeline execution complete.

By default, only log lines marked INFO and higher will be sent to Cloud Logging. If you wish to change this behavior, see Setting Pipeline Worker Log Levels.

Python

If the modified WordCount pipeline is run locally using the default DirectRunner with the output sent to a local file (--output=./local-wordcounts), console output includes the added log messages:

INFO:root:Found : love
INFO:root:Found : love
INFO:root:Found : love

By default, only log lines marked INFO and higher will be sent to Cloud Logging.

Java: SDK 1.x

If the modified WordCount pipeline is run locally using the default DirectPipelineRunner with the output sent to a local file (--output=./local-wordcounts), console output includes the added log messages:

INFO: Executing pipeline using the DirectPipelineRunner.
...
Feb 11, 2015 1:13:22 PM com.google.cloud.dataflow.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
Feb 11, 2015 1:13:22 PM com.google.cloud.dataflow.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
Feb 11, 2015 1:13:22 PM com.google.cloud.dataflow.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
...
INFO: Pipeline execution complete.

By default, only log lines marked INFO and higher will be sent to Cloud Logging. If you wish to change this behavior, see Setting Pipeline Worker Log Levels.