Dataproc job and cluster logs can be viewed, searched, filtered, and archived in Cloud Logging.
See Google Cloud's operations suite Pricing to understand your costs.
See Logs retention periods for information on logging retention.
See Logs Exclusions to disable all logs or exclude logs from Logging.
See Overview of Logs Exports to export logs from Logging to Cloud Storage, BigQuery, or Pub/Sub.
Dataproc job logs in Logging
When you run a Dataproc job, job driver output is streamed to the Cloud Console, displayed in the command terminal window (for jobs submitted from the command line), and stored in Cloud Storage (see Accessing job driver output). This section explains how to enable Dataproc to also save job driver logs in Logging.
Enabling job driver logs in Cloud Logging
To enable job driver logs in Logging, set the following cluster property when creating the cluster:

```
dataproc:dataproc.logging.stackdriver.job.driver.enable=true
```

The following cluster properties are also required, and are set by default when a cluster is created:

```
dataproc:dataproc.logging.stackdriver.enable=true
dataproc:jobs.file-backed-output.enable=true
```
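As an illustration, the following command creates a cluster with job driver logging enabled; the cluster name and region are placeholders. The other two properties default to true, so they don't need to be set explicitly.

```
# Create a cluster that saves job driver logs in Cloud Logging.
# "example-cluster" and "us-central1" are illustrative placeholders.
gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --properties='dataproc:dataproc.logging.stackdriver.job.driver.enable=true'
```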
Enabling YARN container logs in Cloud Logging
To enable YARN container logs in Logging, set the following cluster property when creating the cluster:

```
dataproc:dataproc.logging.stackdriver.job.yarn.container.enable=true
```

The following property is also required, and is set by default when a cluster is created:

```
dataproc:dataproc.logging.stackdriver.enable=true
```
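As a sketch, you can enable job driver and YARN container logging together at cluster creation (cluster name and region are placeholders):

```
# Create a cluster that saves both job driver logs and
# YARN container logs in Cloud Logging.
gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --properties='dataproc:dataproc.logging.stackdriver.job.driver.enable=true,dataproc:dataproc.logging.stackdriver.job.yarn.container.enable=true'
```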
Accessing job logs in Cloud Logging
You can access Logging using the Logging console, the gcloud logging command, or the Logging API.
Console
Dataproc job driver and YARN container logs are listed under the Cloud Dataproc Job resource.
Job driver log example:

[screenshot]

YARN container log example:

[screenshot]
gcloud
You can read job log entries using the gcloud logging read command. The following command uses job resource labels to filter the returned log entries.

```
gcloud logging read \
    "resource.type=cloud_dataproc_job
    resource.labels.region=cluster-region
    resource.labels.job_id=my-job-id"
```
Sample output (partial):
```
jsonPayload:
  class: org.apache.hadoop.hdfs.StateChange
  filename: hadoop-hdfs-namenode-test-dataproc-resize-cluster-20190410-38an-m-0.log
  ...
logName: projects/project-id/logs/hadoop-hdfs-namenode
---
jsonPayload:
  class: SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager
  filename: cluster-name-dataproc-resize-cluster-20190410-38an-m-0.log
  ...
logName: projects/google.com:hadoop-cloud-dev/logs/hadoop-hdfs-namenode
```
REST API
You can use the Logging REST API to list log entries (see entries.list).
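For example, a minimal entries.list request using curl might look like the following; the project ID and job ID are placeholders, and the access token is taken from your gcloud credentials.

```
# List Dataproc job log entries via the Logging REST API.
# "project-id" and "my-job-id" are placeholders.
curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d '{
          "resourceNames": ["projects/project-id"],
          "filter": "resource.type=cloud_dataproc_job AND resource.labels.job_id=my-job-id",
          "pageSize": 10
        }' \
    "https://logging.googleapis.com/v2/entries:list"
```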
Dataproc cluster logs in Logging
Dataproc exports the following Apache Hadoop, Spark, Hive, Zookeeper, and other Dataproc cluster logs to Cloud Logging.
Log Type | Log Name | Description |
---|---|---|
Master daemon logs | hadoop-hdfs | Journal node |
Master daemon logs | hadoop-hdfs-namenode | HDFS namenode |
Master daemon logs | hadoop-hdfs-secondarynamenode | HDFS secondary namenode |
Master daemon logs | hadoop-hdfs-zkfc | Zookeeper failover controller |
Master daemon logs | hive-metastore | Hive metastore |
Master daemon logs | hive-server2 | Hive server2 |
Master daemon logs | mapred-mapred-historyserver | Mapreduce job history server |
Master daemon logs | yarn-yarn-resourcemanager | YARN resource manager |
Master daemon logs | yarn-yarn-timelineserver | YARN timeline server |
Master daemon logs | zookeeper | Zookeeper server |
Worker daemon logs | hadoop-hdfs-datanode | HDFS datanode |
Worker daemon logs | yarn-yarn-nodemanager | YARN nodemanager |
System logs | autoscaler | Dataproc autoscaler log |
System logs | google.dataproc.agent | Dataproc agent log |
System logs | google.dataproc.startup | Dataproc startup script log + initialization action log |
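Each log name in the table becomes a logName path under your project. For example, a sketch of reading recent YARN resource manager entries (the project ID is a placeholder):

```
# Read the 10 most recent YARN resource manager log entries.
gcloud logging read \
    "resource.type=cloud_dataproc_cluster AND logName=projects/project-id/logs/yarn-yarn-resourcemanager" \
    --limit=10
```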
Accessing cluster logs in Cloud Logging
You can access Logging using the Logging console, the gcloud logging command, or the Logging API.
Console
Dataproc cluster logs are listed under the Cloud Dataproc Cluster resource. Select a log type from the drop-down list.

[screenshot]
gcloud
You can read cluster log entries using the gcloud logging read command. The following command uses cluster labels to filter the returned log entries.
```
gcloud logging read \
    "resource.type=cloud_dataproc_cluster
    resource.labels.region=cluster-region
    resource.labels.cluster_name=cluster-name
    resource.labels.cluster_uuid=cluster-uuid"
```
Sample output (partial):
```
jsonPayload:
  class: org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger
  filename: yarn-yarn-resourcemanager-cluster-name-m.log
  ...
logName: projects/project-id/logs/yarn-yarn-resourcemanager
---
jsonPayload:
  class: org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger
  filename: yarn-yarn-resourcemanager-component-gateway-cluster-m.log
  ...
logName: projects/project-name/logs/yarn-yarn-resourcemanager
```
REST API
You can use the Logging REST API to list log entries (see entries.list).
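The request parallels the job-log example above, with the filter switched to the cluster resource; the project ID and cluster name are placeholders.

```
# List Dataproc cluster log entries via the Logging REST API.
curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d '{
          "resourceNames": ["projects/project-id"],
          "filter": "resource.type=cloud_dataproc_cluster AND resource.labels.cluster_name=cluster-name"
        }' \
    "https://logging.googleapis.com/v2/entries:list"
```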
Permissions
To write logs to Logging, the Dataproc VM service account must have the logging.logWriter IAM role. The default Dataproc service account has this role. If you use a custom service account, you must assign this role to the service account.
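For a custom service account, one way to grant the role is with gcloud; the project and service account names below are placeholders.

```
# Grant the logging.logWriter role to a custom service account.
gcloud projects add-iam-policy-binding project-id \
    --member="serviceAccount:custom-sa@project-id.iam.gserviceaccount.com" \
    --role="roles/logging.logWriter"
```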
What's next
- Explore Google Cloud's operations suite