
Logging pipeline messages

You can use the Apache Beam SDK's built-in logging infrastructure to log information during your pipeline's execution. You can use the Google Cloud console to monitor logging information during and after your pipeline runs.

Add log messages to your pipeline


The Apache Beam SDK for Java recommends that you log worker messages through the open source Simple Logging Facade for Java (SLF4J) library. The Apache Beam SDK for Java implements the required logging infrastructure, so your Java code only needs to import the SLF4J API and instantiate a Logger to enable message logging within your pipeline code.

For pre-existing code and libraries, the Apache Beam SDK for Java sets up additional logging infrastructure when executing on the worker to capture log messages produced by other common logging libraries for Java, such as java.util.logging, Apache Log4j, and Apache Commons Logging.


The Apache Beam SDK for Python provides the logging library package, which allows your pipeline's workers to output log messages. To use the library functions, you must import the library:

import logging
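Once imported, messages can be emitted through the root logger at the standard levels. The following is an illustrative sketch (the function and message text are examples, not part of the WordCount sample):

```python
import logging

# Make the example self-contained; on Dataflow workers the logging
# infrastructure is already configured for you.
logging.basicConfig(level=logging.INFO)

def count_words(line):
    words = line.split()
    # INFO and higher are forwarded to Cloud Logging by default on workers.
    logging.info('found %d words', len(words))
    # DEBUG is below the default INFO threshold and is dropped by default.
    logging.debug('words were: %s', words)
    return len(words)
```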

Worker log message code example


The following example uses SLF4J for Dataflow logging. To learn more about configuring SLF4J for Dataflow logging, see the Java Tips article.

The Apache Beam WordCount example can be modified to output a log message when the word "love" is found in a line of the processed text. The added code is shown below (surrounding code is included for context).

package org.apache.beam.examples;
// Import SLF4J packages.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
...
public class WordCount {
  ...
  static class ExtractWordsFn extends DoFn<String, String> {
    // Instantiate Logger.
    // Suggestion: As shown, specify the class name of the containing class
    // (WordCount).
    private static final Logger LOG = LoggerFactory.getLogger(WordCount.class);
    ...
    @ProcessElement
    public void processElement(ProcessContext c) {
      ...
      // Output each word encountered into the output PCollection.
      for (String word : words) {
        if (!word.isEmpty()) {
          c.output(word);
        }
        // Log INFO messages when the word "love" is found.
        if (word.toLowerCase().equals("love")) {
          LOG.info("Found " + word.toLowerCase());
        }
      }
    }
  }
  ... // Remaining WordCount example code ...


The Apache Beam example can be modified to output a log message when the word "love" is found in a line of the processed text.

# Import Python logging module.
import logging

class ExtractWordsFn(beam.DoFn):
  def process(self, element):
    words = re.findall(r'[A-Za-z\']+', element)
    for word in words:
      yield word

      if word.lower() == 'love':
        # Log using the root logger at info or higher levels.'Found : %s', word.lower())

# Remaining WordCount example code ...


If the modified WordCount pipeline is run locally using the default DirectRunner with the output sent to a local file (--output=./local-wordcounts), console output includes the added log messages:

INFO: Executing pipeline using the DirectRunner.
Feb 11, 2015 1:13:22 PM org.apache.beam.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
Feb 11, 2015 1:13:22 PM org.apache.beam.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
Feb 11, 2015 1:13:22 PM org.apache.beam.examples.WordCount$ExtractWordsFn processElement
INFO: Found love
INFO: Pipeline execution complete.

By default, only log lines marked INFO and higher will be sent to Cloud Logging. If you wish to change this behavior, see Setting Pipeline Worker Log Levels.


If the modified WordCount pipeline is run locally using the default DirectRunner with the output sent to a local file (--output=./local-wordcounts), console output includes the added log messages:

INFO:root:Found : love
INFO:root:Found : love
INFO:root:Found : love

By default, only log lines marked INFO and higher will be sent to Cloud Logging.

Control log volume

You can reduce the volume of logs that your pipeline generates by changing the pipeline log levels. If you do not want to continue ingesting some or all of your Dataflow logs, you can add a Logging exclusion to exclude Dataflow logs, and instead export them to a different destination such as BigQuery, Cloud Storage, or Pub/Sub.

Logging limit and throttling

Worker log messages are limited to 15,000 messages every 30 seconds, per worker. If this limit is reached, a single worker log message is added saying that logging is throttled:

Throttling logger worker. It used up its 30s quota for logs in only 12.345s
No more messages are logged until the 30 second interval is over. This limit is shared by log messages generated by the Apache Beam SDK and user code.
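If a DoFn logs a message for every element, a busy pipeline can exhaust this quota quickly. One way to stay under it is to sample high-frequency messages inside your code rather than logging every occurrence. The helper below is an illustrative sketch, not part of the Apache Beam or Dataflow APIs:

```python
import logging

class SampledLogger:
    """Emit only 1 out of every `sample_rate` messages.

    Illustrative helper, not part of the Apache Beam or Dataflow APIs.
    """

    def __init__(self, sample_rate=1000):
        self.sample_rate = sample_rate
        self.count = 0

    def info(self, msg):
        self.count += 1
        # Emit call 1, then call 1 + sample_rate, 1 + 2 * sample_rate, ...
        emitted = (self.count - 1) % self.sample_rate == 0
        if emitted:
            # Include the running count so sampled logs stay interpretable.
            logging.info('%s (call %d)', msg, self.count)
        return emitted
```

For example, with `sample_rate=1000`, one million matching elements produce roughly one thousand log records instead of one million.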

Log storage and retention

Operational logs are stored in the _Default log bucket. The logging API service name is For more information about the Google Cloud monitored resource types and services used in Cloud Logging, see Monitored resources and services.

For details about how long log entries are retained by Logging, see the retention information in Quotas and limits: Logs retention periods.

For information about viewing operational logs, see Monitor and view pipeline logs.

Monitor and view pipeline logs

When you run your pipeline on the Dataflow service, you can use the Dataflow Monitoring Interface to view logs emitted by your pipeline.

Dataflow worker log example

The modified WordCount pipeline can be run in the cloud by specifying the DataflowRunner along with your Google Cloud project, a Cloud Storage location for staging and temporary files, and an output location.
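For example, a typical invocation might look like the following (the project ID and bucket name are placeholders, and the exact options depend on your environment):

```shell
mvn compile exec:java \
  -Dexec.mainClass=org.apache.beam.examples.WordCount \
  -Dexec.args="--project=PROJECT_ID \
    --stagingLocation=gs://BUCKET_NAME/staging \
    --tempLocation=gs://BUCKET_NAME/temp \
    --output=gs://BUCKET_NAME/output \
    --runner=DataflowRunner"
```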





View logs

Because the WordCount cloud pipeline uses blocking execution, console messages are output during pipeline execution. After the job starts, a link to the Cloud console page is output to the console, followed by the pipeline job ID:

INFO: To access the Dataflow monitoring console, please navigate to
Submitted job: 2017-04-13_13_58_10-6217777367720337669

The console URL leads to the Dataflow Monitoring Interface with a summary page for the submitted job. It shows a dynamic execution graph on the left, with summary information on the right. Click on the bottom panel to expand the logs panel.

The logs panel defaults to showing Job Logs that report the status of the job as a whole. You can filter the messages that appear in the logs panel by clicking Info and Filter logs.

Selecting a pipeline step in the graph changes the view to Step Logs generated by your code and the generated code running in the pipeline step.

To get back to Job Logs, deselect the step by clicking outside the graph or using the Deselect step button in the right side panel.

Clicking the external link button from the logs panel navigates to Logging with a menu to select different log types.

Logging also includes other infrastructure logs for your pipeline. For more details on how to explore your logs, refer to the Logs explorer guide.

Here is a summary of the different log types available for viewing from the Logging page:

  • job-message logs contain job-level messages that various components of Dataflow generate. Examples include the autoscaling configuration, when workers start up or shut down, progress on the job step, and job errors. Worker-level errors that originate from crashing user code and that are present in worker logs also propagate up to the job-message logs.
  • worker logs are produced by Dataflow workers. Workers do most of the pipeline work (for example, applying your ParDos to data). Worker logs contain messages logged by your code and Dataflow.
  • worker-startup logs are present on most Dataflow jobs and can capture messages related to the startup process. The startup process includes downloading a job's jars from Cloud Storage, then starting the workers. If there is a problem starting workers, these logs are a good place to look.
  • shuffler logs contain messages from workers that consolidate the results of parallel pipeline operations.
  • docker and kubelet logs contain messages related to these public technologies, which are used on Dataflow workers.

Set pipeline worker log levels


The default SLF4J logging level set on workers by the Apache Beam SDK for Java is INFO. All log messages of INFO or higher (INFO, WARN, ERROR) will be emitted. You can set a different default log level to support lower SLF4J logging levels (TRACE or DEBUG) or set different log levels for different packages of classes in your code.

Two pipeline options are provided to allow you to set worker log levels from the command line or programmatically:

  • --defaultWorkerLogLevel=<level>: use this option to set all loggers at the specified default level. For example, the following command-line option will override the default Dataflow INFO log level, and set it to DEBUG:
  • --workerLogLevelOverrides={"<package or class>":"<level>"}: use this option to set the logging level for specified package(s) or class(es). For example, to override the default pipeline log level for the package, and set it to TRACE:
    or to override the default pipeline logging level for the class, and set it to DEBUG:
    Multiple overrides can be accomplished by providing a JSON map:
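For example, the options above might be combined as follows (the package and class names are placeholders for your own code):

```shell
--defaultWorkerLogLevel=DEBUG
--workerLogLevelOverrides={"com.example.myproject":"TRACE"}
--workerLogLevelOverrides={"com.example.myproject.Foo":"DEBUG"}
--workerLogLevelOverrides={"com.example.a":"TRACE","com.example.b":"DEBUG"}
```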

The following example programmatically sets pipeline logging options with default values that can be overridden from the command line:

 PipelineOptions options = ...
 DataflowWorkerLoggingOptions loggingOptions = options.as(DataflowWorkerLoggingOptions.class);
 // Overrides the default log level on the worker to emit logs at TRACE or higher.
 loggingOptions.setDefaultWorkerLogLevel(Level.TRACE);
 // Overrides the Foo class and "" package to emit logs at WARN or higher.
 loggingOptions.setWorkerLogLevelOverrides(
     new WorkerLogLevelOverrides()
         .addOverrideForClass(Foo.class, Level.WARN)
         .addOverrideForPackage(Package.getPackage(""), Level.WARN));


This feature is not yet available in the Apache Beam SDK for Python.

View the log of launched BigQuery jobs

When using BigQuery in your Dataflow pipeline, BigQuery jobs are launched to perform various actions on your behalf, such as loading data, exporting data, and so on. For troubleshooting and monitoring purposes, the Dataflow monitoring UI has additional information on these BigQuery jobs available in the Logs panel.

The BigQuery jobs information displayed in the Logs panel is stored and loaded from a BigQuery system table, so a billing cost is incurred when the underlying BigQuery table is queried.

Set up your project

To view the BigQuery jobs information, your pipeline must use Apache Beam 2.24.0 or later. Until that version is released, you must use a development version of the Apache Beam SDK built from the main branch.


  1. Add the following profile to the pom.xml file for your project.

      <!-- Additional profiles listed here. -->
  2. When testing or running your project, set the profile option to the id value listed in your pom.xml and set the beam.version property to 2.24.0-SNAPSHOT or later. For example:

    mvn test -Psnapshot -Dbeam.version=2.24.0-SNAPSHOT

    For more snapshot values, see the snapshot index.
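A snapshot profile whose id matches the -Psnapshot flag used above might look like the following sketch (the repository URL is assumed to be the standard Apache snapshots repository):

```xml
<profile>
  <id>snapshot</id>
  <repositories>
    <repository>
      <id>apache.snapshots</id>
      <name>Apache Snapshot Repository</name>
      <url></url>
      <releases>
        <enabled>false</enabled>
      </releases>
    </repository>
  </repositories>
</profile>
```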


  1. Log into GitHub.

  2. Navigate to the results list for successfully completed Apache Beam Python SDK builds.

  3. Click a recently-completed job that was built from the main (master) branch.

  4. In the side panel, click List files on Google Cloud Storage Bucket.

  5. In the main panel, expand List file on Google Cloud Storage Bucket.

  6. Download the zip file from the file list to a local machine or location where you run your Python project.

    The Cloud Storage bucket name is beam-wheels-staging, so you must include that when constructing your download URL. For example:

    gsutil cp gs://beam-wheels-staging/master/02bf081d0e86f16395af415cebee2812620aff4b-207975627/ <var>SAVE_TO_LOCATION</var>
  7. Install the downloaded zip file.

    pip install <var>PATH_TO_DOWNLOADED_ZIP</var>
  8. When you run your Apache Beam pipeline, pass in the --sdk_location flag and reference the SDK zip file.
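For example, the flag might be passed as follows (the module, options, and path are placeholders):

```shell
python -m apache_beam.examples.wordcount \
  --runner DataflowRunner \
  --project PROJECT_ID \
  --temp_location gs://BUCKET_NAME/tmp/ \
  --sdk_location /path/to/
```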

View the BigQuery job details

To list the BigQuery jobs, open the BigQuery Jobs tab and select the location of the BigQuery jobs. Next, click Load BigQuery Jobs and confirm the dialog. After the query completes, the jobs list is displayed.

The Load BigQuery Jobs button in the BigQuery jobs information tab.

Basic information about each job is provided, including job ID, type, duration, and so on.

A table showing the BigQuery jobs that were run during the current pipeline job execution.

For more detailed information on a specific job, click Command line in the More Info column.

In the modal window for the command line, copy the bq jobs describe command and run it locally or in Cloud Shell.

gcloud alpha bq jobs describe BIGQUERY_JOB_ID

The bq jobs describe command outputs JobStatistics, which provides further details that are useful when diagnosing a slow or stuck BigQuery job.

Alternatively, when you use BigQueryIO with a SQL query, a query job is issued. Click View query in the More Info column to see the SQL query used by the job.