Adding log messages to your pipeline
Java: SDK 2.x
The Apache Beam SDK for Java recommends that you log worker messages through the open source SLF4J (Simple Logging Facade for Java) library. The Apache Beam SDK for Java implements the required logging infrastructure, so your Java code need only import the SLF4J API and instantiate a Logger to enable message logging within your pipeline code.
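For example, a minimal sketch of the import-and-instantiate pattern, using a hypothetical MyDoFn class (the complete WordCount example appears later on this page):

import org.apache.beam.sdk.transforms.DoFn;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MyDoFn extends DoFn<String, String> {

  // Instantiate a Logger, typically keyed to the containing class.
  private static final Logger LOG = LoggerFactory.getLogger(MyDoFn.class);

  @ProcessElement
  public void processElement(ProcessContext c) {
    // Messages logged here are captured by the worker's logging infrastructure.
    LOG.info("Processing element: " + c.element());
    c.output(c.element());
  }
}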
For pre-existing code and libraries, the Apache Beam SDK for Java sets up additional logging infrastructure when your pipeline executes on the worker, capturing log messages produced by the following logging libraries for Java:
Python
The Apache Beam SDK for Python provides the logging library package, which allows your pipeline's workers to output log messages. To use the library functions, you must import the library:
import logging
Java: SDK 1.x
The Apache Beam SDK for Java recommends that you log worker messages through the open source SLF4J (Simple Logging Facade for Java) library. The Apache Beam SDK for Java implements the required logging infrastructure, so your Java code need only import the SLF4J API and instantiate a Logger to enable message logging within your pipeline code.
For pre-existing code and libraries, the Apache Beam SDK for Java sets up additional logging infrastructure when your pipeline executes on the worker, capturing log messages produced by the following logging libraries for Java:
Worker log message code example
Java: SDK 2.x
The Apache Beam WordCount example can be modified to output a log message when the word "love" is found in a line of the processed text. The added code is shown below (surrounding code is included for context).
package org.apache.beam.examples;

// Import SLF4J packages.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
...
public class WordCount {
  ...
  static class ExtractWordsFn extends DoFn<String, String> {

    // Instantiate Logger.
    // Suggestion: As shown, specify the class name of the containing class
    // (WordCount).
    private static final Logger LOG = LoggerFactory.getLogger(WordCount.class);
    ...
    @ProcessElement
    public void processElement(ProcessContext c) {
      ...
      // Output each word encountered into the output PCollection.
      for (String word : words) {
        if (!word.isEmpty()) {
          c.output(word);
        }
        // Log INFO messages when the word "love" is found.
        if (word.toLowerCase().equals("love")) {
          LOG.info("Found " + word.toLowerCase());
        }
      }
    }
  }
  ...
  // Remaining WordCount example code ...
Python
The Apache Beam wordcount.py example can be modified to output a log message when the word "love" is found in a line of the processed text.
# import Python logging module.
import logging

class ExtractWordsFn(beam.DoFn):

  def process(self, element):
    words = re.findall(r'[A-Za-z\']+', element)

    for word in words:
      yield word

      if word.lower() == 'love':
        # Log using the root logger at info or higher levels
        logging.info('Found : %s', word.lower())

# Remaining WordCount example code ...
Java: SDK 1.x
The WordCount example can be modified to output a log message when the word "love" is found in a line of the processed text. The added code is shown below (surrounding code is included for context).
package com.google.cloud.dataflow.examples;

// Import SLF4J packages.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
...
public class WordCount {
  ...
  static class ExtractWordsFn extends DoFn<String, String> {

    // Instantiate Logger.
    // Suggestion: As shown, specify the class name of the containing class
    // (WordCount).
    private static final Logger LOG = LoggerFactory.getLogger(WordCount.class);
    ...
    @Override
    public void processElement(ProcessContext c) {
      ...
      // Output each word encountered into the output PCollection.
      for (String word : words) {
        if (!word.isEmpty()) {
          c.output(word);
        }
        // Log INFO messages when the word "love" is found.
        if (word.toLowerCase().equals("love")) {
          LOG.info("Found " + word.toLowerCase());
        }
      }
    }
  }
  ...
  // Remaining WordCount example code ...
Java: SDK 2.x
If the modified WordCount pipeline is run locally using the default DirectRunner with the output sent to a local file (--output=./local-wordcounts), the console output includes the added log messages:
INFO: Executing pipeline using the DirectRunner.
...
Feb 11, 2015 1:13:22 PM org.apache.beam.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
Feb 11, 2015 1:13:22 PM org.apache.beam.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
Feb 11, 2015 1:13:22 PM org.apache.beam.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
...
INFO: Pipeline execution complete.
By default, only log lines marked INFO and higher will be sent to Cloud Logging. If you wish to change this behavior, see Setting Pipeline Worker Log Levels.
Python
If the modified WordCount pipeline is run locally using the default DirectRunner with the output sent to a local file (--output=./local-wordcounts), the console output includes the added log messages:
INFO:root:Found : love
INFO:root:Found : love
INFO:root:Found : love
By default, only log lines marked INFO and higher will be sent to Cloud Logging.
Java: SDK 1.x
If the modified WordCount pipeline is run locally using the default DirectPipelineRunner with the output sent to a local file (--output=./local-wordcounts), the console output includes the added log messages:
INFO: Executing pipeline using the DirectPipelineRunner.
...
Feb 11, 2015 1:13:22 PM com.google.cloud.dataflow.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
Feb 11, 2015 1:13:22 PM com.google.cloud.dataflow.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
Feb 11, 2015 1:13:22 PM com.google.cloud.dataflow.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
...
INFO: Pipeline execution complete.
By default, only log lines marked INFO and higher will be sent to Cloud Logging. If you wish to change this behavior, see Setting Pipeline Worker Log Levels.
Monitoring pipeline logs
When you run your pipeline on the Dataflow service, you can use the Dataflow Monitoring Interface to view logs emitted by your pipeline.
Cloud Dataflow worker log example
The modified WordCount pipeline can be run in the cloud with the following options:
Java: SDK 2.x
--project=WordCountExample
--output=gs://<bucket-name>/counts
--runner=DataflowRunner
--tempLocation=gs://<bucket-name>/temp
--stagingLocation=gs://<bucket-name>/binaries
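For reference, one common way to pass these options is through the Maven exec plugin, as in the Beam quickstart. Treat the following invocation as a sketch: the exact main class, Maven profile, and project layout depend on your setup.

mvn compile exec:java \
  -Dexec.mainClass=org.apache.beam.examples.WordCount \
  -Dexec.args="--project=WordCountExample \
    --output=gs://<bucket-name>/counts \
    --runner=DataflowRunner \
    --tempLocation=gs://<bucket-name>/temp \
    --stagingLocation=gs://<bucket-name>/binaries" \
  -Pdataflow-runner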
Python
--project=WordCountExample
--output=gs://<bucket-name>/counts
--runner=DataflowRunner
--staging_location=gs://<bucket-name>/binaries
Java: SDK 1.x
--project=WordCountExample
--output=gs://<bucket-name>/counts
--runner=BlockingDataflowPipelineRunner
--stagingLocation=gs://<bucket-name>/binaries
Viewing job summary and status
Because the WordCount cloud pipeline uses blocking execution, console messages are output during pipeline execution. Once the job starts, a link to the Cloud Console page is output to the console, followed by the pipeline job ID:
INFO: To access the Dataflow monitoring console, please navigate to
https://console.developers.google.com/dataflow/job/2017-04-13_13_58_10-6217777367720337669
Submitted job: 2017-04-13_13_58_10-6217777367720337669
The console URL leads to the Dataflow Monitoring Interface with a summary page for the submitted job. It shows a dynamic execution graph on the left, with summary information on the right:

The Logs button opens the bottom logs panel, defaulting to show Job Log messages that report the status of the job as a whole. You can use the Minimum Severity selector to filter job progress and status messages.
Selecting a pipeline step in the graph changes the view to Step Logs generated by your code and the generated code running in the pipeline step.

To get back to Job Logs, deselect the step by clicking outside the graph or using the Close button in the right side panel.
Viewing logs
The Step Logs in the Dataflow Monitoring Interface show only the most recent log messages. You can view all Step Logs for a pipeline step in Stackdriver Logging by clicking the Stackdriver link on the right side of the logs pane.

Cloud Logging also includes other infrastructure logs for your pipeline. Clicking the external link button from your Job Logs navigates to Cloud Logging with a menu to select different log types.

Here is a summary of the different log types available for viewing from the Monitoring→Logs page:
- job-message logs contain job-level messages that various components of Dataflow generate. Examples include the autoscaling configuration, when workers start up or shut down, progress on the job step, and job errors. Worker-level errors that originate from crashing user code and that are present in worker logs also propagate up to the job-message logs.
- worker logs are produced by Dataflow workers. Workers do most of the pipeline work (for example, applying your ParDos to data). Worker logs contain messages logged by your code and Dataflow.
- worker-startup logs are present on most Dataflow jobs and can capture messages related to the startup process. The startup process includes downloading a job's jars from Cloud Storage, then starting the workers. If there is a problem starting workers, these logs are a good place to look.
- shuffler logs contain messages from workers that consolidate the results of parallel pipeline operations.
- docker and kubelet logs contain messages related to these public technologies, which are used on Dataflow workers.
Setting pipeline worker log levels
Java: SDK 2.x
The default SLF4J logging level set on workers by the Apache Beam SDK for Java is INFO. All log messages of INFO or higher (INFO, WARN, ERROR) will be emitted. You can set a different default log level to support lower SLF4J logging levels (TRACE or DEBUG), or set different log levels for different packages of classes in your code.
Two pipeline options are provided to allow you to set worker log levels from the command line or programmatically:
- --defaultWorkerLogLevel=<level>: use this option to set all loggers at the specified default level. For example, the following command-line option will override the default Dataflow INFO log level, and set it to DEBUG:

  --defaultWorkerLogLevel=DEBUG
- --workerLogLevelOverrides={"<package or class>":"<level>"}: use this option to set the logging level for specified package(s) or class(es). For example, to override the default pipeline log level for the com.google.cloud.dataflow package, and set it to TRACE:

  --workerLogLevelOverrides={"com.google.cloud.dataflow":"TRACE"}

  or to override the default pipeline logging level for the com.google.cloud.Foo class, and set it to DEBUG:

  --workerLogLevelOverrides={"com.google.cloud.Foo":"DEBUG"}

  Multiple overrides can be accomplished by providing a JSON map:
  (--workerLogLevelOverrides={"<package/class>":"<level>","<package/class>":"<level>",...}).
The following example programmatically sets pipeline logging options with default values that can be overridden from the command line:
PipelineOptions options = ...

DataflowWorkerLoggingOptions loggingOptions = options.as(DataflowWorkerLoggingOptions.class);

// Overrides the default log level on the worker to emit logs at TRACE or higher.
loggingOptions.setDefaultWorkerLogLevel(Level.TRACE);

// Overrides the Foo class and "com.google.cloud.dataflow" package to emit logs at WARN or higher.
loggingOptions.setWorkerLogLevelOverrides(
    WorkerLogLevelOverride.forClass(Foo.class, Level.WARN),
    WorkerLogLevelOverride.forPackage(Package.getPackage("com.google.cloud.dataflow"), Level.WARN));
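The elided PipelineOptions value is typically parsed from the program's command-line arguments. A minimal sketch of that step, assuming the standard PipelineOptionsFactory entry point from the Beam SDK (the main method shown is illustrative):

import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
...
public static void main(String[] args) {
  // Parse flags such as --defaultWorkerLogLevel and --workerLogLevelOverrides,
  // along with the other pipeline options, from the command line.
  PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();

  // View the same options through the logging-specific interface before
  // applying the programmatic settings shown above.
  DataflowWorkerLoggingOptions loggingOptions = options.as(DataflowWorkerLoggingOptions.class);
  ...
}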
Python
This feature is not yet available in the Apache Beam SDK for Python.
Java: SDK 1.x
The default SLF4J logging level set on workers by the Apache Beam SDK for Java is INFO. All log messages of INFO or higher (INFO, WARN, ERROR) will be emitted. You can set a different default log level to support lower SLF4J logging levels (TRACE or DEBUG), or set different log levels for different packages of classes in your code.
Two pipeline options are provided to allow you to set worker log levels from the command line or programmatically:
- --defaultWorkerLogLevel=<level>: use this option to set all loggers at the specified default level. For example, the following command-line option will override the default INFO log level, and set it to DEBUG:

  --defaultWorkerLogLevel=DEBUG
- --workerLogLevelOverrides={"<package or class>":"<level>"}: use this option to set the logging level for specified package(s) or class(es). For example, to override the default pipeline log level for the com.google.cloud.dataflow package, and set it to TRACE:

  --workerLogLevelOverrides={"com.google.cloud.dataflow":"TRACE"}

  or to override the default pipeline logging level for the com.google.cloud.Foo class, and set it to DEBUG:

  --workerLogLevelOverrides={"com.google.cloud.Foo":"DEBUG"}

  Multiple overrides can be accomplished by providing a JSON map:
  (--workerLogLevelOverrides={"<package/class>":"<level>","<package/class>":"<level>",...}).
The following example programmatically sets pipeline logging options with default values that you can override from the command line:
PipelineOptions options = ...

DataflowWorkerLoggingOptions loggingOptions = options.as(DataflowWorkerLoggingOptions.class);

// Overrides the default log level on the worker to emit logs at TRACE or higher.
loggingOptions.setDefaultWorkerLogLevel(Level.TRACE);

// Overrides the Foo class and "com.google.cloud.dataflow" package to emit logs at WARN or higher.
loggingOptions.setWorkerLogLevelOverrides(
    WorkerLogLevelOverride.forClass(Foo.class, Level.WARN),
    WorkerLogLevelOverride.forPackage(Package.getPackage("com.google.cloud.dataflow"), Level.WARN));