Dataproc job and cluster logs can be viewed, searched, filtered, and archived in Cloud Logging.
See Google Cloud's operations suite Pricing to understand your costs.
See Logs retention periods for information on logging retention.
See Logs exclusions to disable all logs or exclude logs from Logging.
See Routing and storage overview to route logs from Logging to Cloud Storage, BigQuery, or Pub/Sub.
Job driver logging levels
Dataproc uses a default logging level of INFO for job driver programs. You can change this setting for one or more packages with the gcloud dataproc jobs submit command, which lets you submit a job and specify job driver logging levels with the --driver-log-levels flag.
The root package controls the root logger level. For example:
gcloud dataproc jobs submit hadoop ... \
    --driver-log-levels root=FATAL,com.example=INFO
Logging levels can also be set at a more granular level for a specific job. For example, to assist in debugging issues when reading files from Cloud Storage, you can submit a job as follows:
gcloud dataproc jobs submit hadoop ... \
    --driver-log-levels com.google.cloud.hadoop.gcsio=DEBUG
Component executor logging levels
You can set Spark, Hadoop, Flink, and other Dataproc component executor logging levels when you create a cluster by one or both of the following methods (see the sketch after this list):
- Write an initialization action that edits the component log4j.properties or log4j2.properties file.
- Specify component log4j or log4j2 cluster properties.
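For example, here is a minimal sketch of the cluster-properties method. It assumes the spark-log4j file prefix is available for your image version (check the Dataproc cluster properties documentation for supported prefixes); the cluster name, region, and logger are illustrative placeholders:
# Sketch: raise the org.apache.spark logger to DEBUG at cluster creation.
# Assumes the spark-log4j file prefix maps to the Spark log4j.properties file;
# my-cluster and us-central1 are placeholders.
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --properties='spark-log4j:log4j.logger.org.apache.spark=DEBUG'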
Dataproc job driver logs in Logging
See Dataproc job output and logs for information on enabling Dataproc job driver logs in Logging.
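As a quick sketch, job driver logs are typically enabled at cluster creation with the cluster property described on that page; the cluster name and region here are placeholders:
# Sketch: enable Dataproc job driver logs in Cloud Logging at cluster creation.
# The property name follows the linked "Dataproc job output and logs" page;
# my-cluster and us-central1 are placeholders.
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --properties='dataproc:dataproc.logging.stackdriver.job.driver.enable=true'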
Access job logs in Logging
You can access Dataproc job logs using the Logs Explorer, the gcloud logging command, or the Logging API.
Console
Dataproc job driver and YARN container logs are listed under the Cloud Dataproc Job resource.
Example: Job driver log after running a Logs Explorer query with the following selections:
- Resource: Cloud Dataproc Job
- Log name: dataproc.job.driver

Example: YARN container log after running a Logs Explorer query with the following selections:
- Resource: Cloud Dataproc Job
- Log name: dataproc.job.yarn.container
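Equivalently, here is a minimal sketch of the Logs Explorer query these selections build (project-id is a placeholder; the job driver query is analogous with dataproc.job.driver):
resource.type="cloud_dataproc_job"
logName="projects/project-id/logs/dataproc.job.yarn.container"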

gcloud
You can read job log entries using the gcloud logging read command. The resource arguments must be enclosed in quotes ("..."). The following command uses job resource labels to filter the returned log entries.
gcloud logging read \
    "resource.type=cloud_dataproc_job \
    resource.labels.region=cluster-region \
    resource.labels.job_id=my-job-id"
Sample output (partial):
jsonPayload:
  class: org.apache.hadoop.hdfs.StateChange
  filename: hadoop-hdfs-namenode-test-dataproc-resize-cluster-20190410-38an-m-0.log
  ...
logName: projects/project-id/logs/hadoop-hdfs-namenode
---
jsonPayload:
  class: SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager
  filename: cluster-name-dataproc-resize-cluster-20190410-38an-m-0.log
  ...
logName: projects/google.com:hadoop-cloud-dev/logs/hadoop-hdfs-namenode
REST API
You can use the Logging REST API to list log entries (see entries.list).
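The following is a minimal sketch of an entries.list call with curl; project-id and my-job-id are placeholders:
# Sketch: list job log entries via the Logging REST API (entries.list).
# project-id and my-job-id are placeholders.
curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://logging.googleapis.com/v2/entries:list \
    -d '{
      "resourceNames": ["projects/project-id"],
      "filter": "resource.type=cloud_dataproc_job AND resource.labels.job_id=my-job-id",
      "pageSize": 10
    }'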
Dataproc cluster logs in Logging
Dataproc exports the following Apache Hadoop, Spark, Hive, Zookeeper, and other Dataproc cluster logs to Cloud Logging.
Log Type | Log Name | Description |
---|---|---|
Master daemon logs | hadoop-hdfs<br>hadoop-hdfs-namenode<br>hadoop-hdfs-secondarynamenode<br>hadoop-hdfs-zkfc<br>hadoop-yarn-resourcemanager<br>hadoop-yarn-timelineserver<br>hive-metastore<br>hive-server2<br>mapred-mapred-historyserver<br>zookeeper | Journal node<br>HDFS namenode<br>HDFS secondary namenode<br>Zookeeper failover controller<br>YARN resource manager<br>YARN timeline server<br>Hive metastore<br>Hive server2<br>Mapreduce job history server<br>Zookeeper server |
Worker daemon logs | hadoop-hdfs-datanode<br>hadoop-yarn-nodemanager | HDFS datanode<br>YARN nodemanager |
System logs | autoscaler<br>google.dataproc.agent<br>google.dataproc.startup | Dataproc autoscaler log<br>Dataproc agent log<br>Dataproc startup script log + initialization action log |
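For example, to read only the Dataproc startup script log for a cluster, you can filter on its log name from the table above. This is a sketch; project-id and cluster-name are placeholders:
gcloud logging read \
    "resource.type=cloud_dataproc_cluster \
    resource.labels.cluster_name=cluster-name \
    logName=projects/project-id/logs/google.dataproc.startup"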
Access cluster logs in Cloud Logging
You can access Dataproc cluster logs using the Logs Explorer, the gcloud logging command, or the Logging API.
Console
Make the following query selections to view cluster logs in the Logs Explorer:
- Resource: Cloud Dataproc Cluster
- Log name: log name
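As with job logs, here is a minimal sketch of the resulting Logs Explorer query, using the Dataproc agent log from the table above as the log name (project-id is a placeholder):
resource.type="cloud_dataproc_cluster"
logName="projects/project-id/logs/google.dataproc.agent"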

gcloud
You can read cluster log entries using the gcloud logging read command. The resource arguments must be enclosed in quotes ("..."). The following command uses cluster labels to filter the returned log entries.
gcloud logging read \
    "resource.type=cloud_dataproc_cluster \
    resource.labels.region=cluster-region \
    resource.labels.cluster_name=cluster-name \
    resource.labels.cluster_uuid=cluster-uuid"
Sample output (partial):
jsonPayload:
  class: org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService
  filename: hadoop-yarn-resourcemanager-cluster-name-m.log
  ...
logName: projects/project-id/logs/hadoop-yarn-resourcemanager
---
jsonPayload:
  class: org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService
  filename: hadoop-yarn-resourcemanager-component-gateway-cluster-m.log
  ...
logName: projects/project-id/logs/hadoop-yarn-resourcemanager
REST API
You can use the Logging REST API to list log entries (see entries.list).
Permissions
To write logs to Logging, the Dataproc VM service
account must have the logging.logWriter
role
IAM role. The default Dataproc service account has this role. If you use
a custom service account,
you must assign this role to the service account.
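For example, here is a minimal sketch of granting the role with gcloud; the project ID and service account email are placeholders:
# Sketch: grant the Logs Writer role to a custom Dataproc VM service account.
# project-id and the service account email are placeholders.
gcloud projects add-iam-policy-binding project-id \
    --member="serviceAccount:custom-sa@project-id.iam.gserviceaccount.com" \
    --role="roles/logging.logWriter"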
Protecting the logs
By default, logs in Logging are encrypted at rest. You can enable customer-managed encryption keys (CMEK) to encrypt the logs. For more information on CMEK support, see Manage the keys that protect Log Router data and Manage the keys that protect Logging storage data.
What's next
- Explore Google Cloud's operations suite