Dataproc job and cluster logs can be viewed, searched, filtered, and archived in Stackdriver Logging.
- See Stackdriver Pricing to understand your costs.
- See Logs retention periods for information on logging retention.
- See Logs Exclusions to disable all logs or exclude logs from Logging.
- See Overview of Logs Exports to export logs from Stackdriver to Cloud Storage, BigQuery, or Pub/Sub.
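As a minimal sketch of such an export, the following command creates a sink that routes Dataproc cluster logs to a Cloud Storage bucket (my-sink and my-bucket are hypothetical names; the bucket must already exist):

gcloud logging sinks create my-sink storage.googleapis.com/my-bucket \
    --log-filter='resource.type=cloud_dataproc_cluster'  # my-sink and my-bucket are placeholders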
Dataproc job logs in Stackdriver
When you run a Dataproc job, job driver output is streamed to the Cloud Console, displayed in the command terminal window (for jobs submitted from the command line), and stored in Cloud Storage (see Accessing job driver output). This section explains how to enable Dataproc to also save job driver logs in Logging.
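For example, when a job is submitted from the command line as shown below, the driver output streams back to your terminal (cluster-name and region are placeholders; the sketch assumes the Spark examples jar shipped on standard Dataproc images):

gcloud dataproc jobs submit spark \
    --cluster=cluster-name \
    --region=region \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
    -- 1000  # cluster-name and region are placeholders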
Enabling job driver logs in Stackdriver Logging
To enable job driver logs in Logging, set the following cluster property when creating the cluster:
dataproc:dataproc.logging.stackdriver.job.driver.enable=true
The following cluster properties are also required, and are set by default when a cluster is created:
dataproc:dataproc.logging.stackdriver.enable=true
dataproc:jobs.file-backed-output.enable=true
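For example, a cluster created from the command line might enable job driver logs as follows (cluster-name and region are placeholders):

gcloud dataproc clusters create cluster-name \
    --region=region \
    --properties=dataproc:dataproc.logging.stackdriver.job.driver.enable=true  # cluster-name and region are placeholders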
Enabling YARN container logs in Stackdriver Logging
To enable YARN container logs in Logging, set the following cluster property when creating the cluster:
dataproc:dataproc.logging.stackdriver.job.yarn.container.enable=true
The following property is also required, and is set by default when a cluster is created:
dataproc:dataproc.logging.stackdriver.enable=true
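As with job driver logs, the property can be set at cluster creation, for example (cluster-name and region are placeholders):

gcloud dataproc clusters create cluster-name \
    --region=region \
    --properties=dataproc:dataproc.logging.stackdriver.job.yarn.container.enable=true  # cluster-name and region are placeholders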
Accessing job logs in Stackdriver Logging
You can access Logging using the Logging console, the gcloud logging command, or the Logging API.
Console
Dataproc job driver and YARN container logs are listed under the Cloud Dataproc Job resource.
Job driver log example:

YARN container log example:

gcloud
You can read job log entries using the gcloud logging read command. The following command uses cluster labels to filter the returned log entries.
gcloud logging read \
    'resource.type=cloud_dataproc_job
    resource.labels.region=cluster-region
    resource.labels.job_id=my-job-id'
Sample output (partial):
jsonPayload:
  class: org.apache.hadoop.hdfs.StateChange
  filename: hadoop-hdfs-namenode-test-dataproc-resize-cluster-20190410-38an-m-0.log
  ...
logName: projects/project-id/logs/hadoop-hdfs-namenode
---
jsonPayload:
  class: SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager
  filename: cluster-name-dataproc-resize-cluster-20190410-38an-m-0.log
  ...
logName: projects/google.com:hadoop-cloud-dev/logs/hadoop-hdfs-namenode
REST API
You can use the Logging REST API to list log entries (see entries.list).
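As an illustration, the following curl sketch calls entries.list with the same kind of job filter used above (project-id and my-job-id are placeholders):

curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://logging.googleapis.com/v2/entries:list \
    -d '{
      "resourceNames": ["projects/project-id"],
      "filter": "resource.type=cloud_dataproc_job AND resource.labels.job_id=my-job-id"
    }'  # project-id and my-job-id are placeholders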
Dataproc cluster logs in Stackdriver
Dataproc exports the following Apache Hadoop, Spark, Hive, Zookeeper, and other Dataproc cluster logs to Stackdriver Logging.
Log Type | Log Name | Description |
---|---|---|
Master daemon logs | hadoop-hdfs<br>hadoop-hdfs-namenode<br>hadoop-hdfs-secondarynamenode<br>hadoop-hdfs-zkfc<br>hive-metastore<br>hive-server2<br>mapred-mapred-historyserver<br>yarn-yarn-resourcemanager<br>yarn-yarn-timelineserver<br>zookeeper | Journal node<br>HDFS namenode<br>HDFS secondary namenode<br>Zookeeper failover controller<br>Hive metastore<br>Hive server2<br>Mapreduce job history server<br>YARN resource manager<br>YARN timeline server<br>Zookeeper server |
Worker daemon logs | hadoop-hdfs-datanode<br>yarn-yarn-nodemanager | HDFS datanode<br>YARN nodemanager |
System logs | autoscaler<br>google.dataproc.agent<br>google.dataproc.startup | Dataproc autoscaler log<br>Dataproc agent log<br>Dataproc startup script log + initialization action log |
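For example, to read only one of the logs listed above, you can add a logName term to a gcloud logging read filter (project-id and cluster-name are placeholders):

gcloud logging read \
    'resource.type=cloud_dataproc_cluster
    resource.labels.cluster_name=cluster-name
    logName=projects/project-id/logs/yarn-yarn-resourcemanager'  # project-id and cluster-name are placeholders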
Accessing cluster logs in Stackdriver Logging
You can access Logging using the Logging console, the gcloud logging command, or the Logging API.
Console
Dataproc cluster logs are listed under the Cloud Dataproc Cluster resource. Select a log type from the drop-down list.

gcloud
You can read cluster log entries using the gcloud logging read command. The following command uses cluster labels to filter the returned log entries.
gcloud logging read \
    'resource.type=cloud_dataproc_cluster
    resource.labels.region=cluster-region
    resource.labels.cluster_name=cluster-name
    resource.labels.cluster_uuid=cluster-uuid'
Sample output (partial):
jsonPayload:
  class: org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger
  filename: yarn-yarn-resourcemanager-cluster-name-m.log
  ...
logName: projects/project-id/logs/yarn-yarn-resourcemanager
---
jsonPayload:
  class: org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger
  filename: yarn-yarn-resourcemanager-component-gateway-cluster-m.log
  ...
logName: projects/project-name/logs/yarn-yarn-resourcemanager
REST API
You can use the Logging REST API to list log entries (see entries.list).
What's next
- Explore Stackdriver