Stackdriver Logging

Cloud Dataproc job and cluster logs can be viewed, searched, filtered, and archived in Stackdriver Logging.

Cloud Dataproc job logs in Stackdriver

When you run a Cloud Dataproc job, job driver output is streamed to the GCP Console, displayed in the command terminal window (for jobs submitted from the command line), and stored in Cloud Storage (see Accessing job driver output). This section explains how to enable Cloud Dataproc to also save job driver logs in Logging.

Enabling job driver logs in Stackdriver Logging

To enable job driver logs in Logging, set the following cluster property when creating the cluster:

dataproc:dataproc.logging.stackdriver.job.driver.enable=true

The following cluster properties are also required, and are set by default when a cluster is created:

dataproc:dataproc.logging.stackdriver.enable=true
dataproc:jobs.file-backed-output.enable=true
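
As a minimal sketch, the property can be passed through the --properties flag of gcloud dataproc clusters create. The helper name create_cluster_with_driver_logs and the cluster name and region in the example call are hypothetical placeholders, not values from this page.

```shell
# Cluster property that turns on job driver logs in Logging.
DRIVER_LOG_PROPERTY="dataproc:dataproc.logging.stackdriver.job.driver.enable=true"

# Hypothetical helper: $1 = cluster name, $2 = region (both placeholders).
create_cluster_with_driver_logs() {
  gcloud dataproc clusters create "$1" \
    --region="$2" \
    --properties="${DRIVER_LOG_PROPERTY}"
}

# Example call (creates a real cluster, so it is left commented out):
# create_cluster_with_driver_logs my-cluster us-central1
```

The two default properties listed above do not need to be passed unless they were explicitly disabled.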

Enabling YARN container logs in Stackdriver Logging

To enable job resourcing of YARN container logs, so that they are listed under the Cloud Dataproc Job resource in Logging, set the following cluster property when creating the cluster:

dataproc:dataproc.logging.stackdriver.job.yarn.container.enable=true

The following property is also required, and is set by default when a cluster is created:

dataproc:dataproc.logging.stackdriver.enable=true
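
If both job driver logs and YARN container logs are wanted, the two properties can be supplied together, since --properties accepts a comma-separated list. The sketch below assumes that usage; join_properties and create_cluster_with_job_logs are hypothetical helper names, and the cluster name and region are placeholders.

```shell
# Joins all arguments with commas, the separator that --properties expects.
# The subshell keeps the IFS change from leaking into the caller.
join_properties() {
  ( IFS=,; printf '%s' "$*" )
}

# Hypothetical helper: $1 = cluster name, $2 = region (both placeholders).
create_cluster_with_job_logs() {
  gcloud dataproc clusters create "$1" \
    --region="$2" \
    --properties="$(join_properties \
      "dataproc:dataproc.logging.stackdriver.job.driver.enable=true" \
      "dataproc:dataproc.logging.stackdriver.job.yarn.container.enable=true")"
}

# Example call (creates a real cluster, so it is left commented out):
# create_cluster_with_job_logs my-cluster us-central1
```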

Accessing job logs in Stackdriver Logging

You can access Logging using the Logging console, the gcloud logging command, or the Logging API.

Console

Cloud Dataproc job driver and YARN container logs are listed under the Cloud Dataproc Job resource.

Job driver log example: [screenshot not available]

YARN container log example: [screenshot not available]

gcloud

You can read job log entries using the gcloud logging read command. The following command uses job resource labels to filter the returned log entries.

gcloud logging read \
    'resource.type=cloud_dataproc_job
    resource.labels.region=cluster-region
    resource.labels.job_id=my-job-id'

Sample output (partial):

jsonPayload:
  class: org.apache.hadoop.hdfs.StateChange
  filename: hadoop-hdfs-namenode-test-dataproc-resize-cluster-20190410-38an-m-0.log
  ...
logName: projects/project-id/logs/hadoop-hdfs-namenode
---
jsonPayload:
  class: SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager
  filename: cluster-name-dataproc-resize-cluster-20190410-38an-m-0.log
  ...
logName: projects/google.com:hadoop-cloud-dev/logs/hadoop-hdfs-namenode
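
When the same query is run for many jobs, it can help to build the filter in one place and cap the number of returned entries. The following is a sketch only: job_log_filter and read_job_logs are hypothetical helper names, the region and job ID are placeholders, and the --limit and --format values are arbitrary choices.

```shell
# Builds a Logging filter for one Cloud Dataproc job.
# $1 = region, $2 = job ID (both placeholders supplied by the caller).
job_log_filter() {
  printf 'resource.type=cloud_dataproc_job AND resource.labels.region=%s AND resource.labels.job_id=%s' "$1" "$2"
}

# Reads at most 10 matching entries and prints them as JSON.
read_job_logs() {
  gcloud logging read "$(job_log_filter "$1" "$2")" \
    --limit=10 \
    --format=json
}

# Example call (requires an authenticated gcloud, so it is commented out):
# read_job_logs us-central1 my-job-id
```

The explicit AND operators are equivalent to the line breaks used in the multi-line filter above.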

REST API

You can use the Logging REST API to list log entries (see entries.list).
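
As a rough illustration, an entries.list request can be sent with curl, using an access token from gcloud auth print-access-token. The helper names entries_list_body and list_job_log_entries are hypothetical, the project ID, region, and job ID are placeholders, and the pageSize of 10 is an arbitrary choice.

```shell
# Builds the JSON request body for entries.list.
# $1 = project ID, $2 = region, $3 = job ID (all placeholders).
entries_list_body() {
  printf '{"resourceNames":["projects/%s"],"filter":"resource.type=cloud_dataproc_job AND resource.labels.region=%s AND resource.labels.job_id=%s","pageSize":10}' "$1" "$2" "$3"
}

# Sends the request to the Logging API v2 entries:list endpoint.
list_job_log_entries() {
  curl --silent --request POST \
    --header "Authorization: Bearer $(gcloud auth print-access-token)" \
    --header "Content-Type: application/json" \
    --data "$(entries_list_body "$1" "$2" "$3")" \
    "https://logging.googleapis.com/v2/entries:list"
}

# Example call (requires credentials, so it is commented out):
# list_job_log_entries my-project us-central1 my-job-id
```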

Cloud Dataproc cluster logs in Stackdriver

Cloud Dataproc exports the following Apache Hadoop, Spark, Hive, Zookeeper, and other Cloud Dataproc cluster logs to Stackdriver Logging.

Log Type             Log Name                        Description
Master daemon logs   hadoop-hdfs                     Journal node
                     hadoop-hdfs-namenode            HDFS namenode
                     hadoop-hdfs-secondarynamenode   HDFS secondary namenode
                     hadoop-hdfs-zkfc                Zookeeper failover controller
                     hive-metastore                  Hive metastore
                     hive-server2                    Hive server2
                     mapred-mapred-historyserver     Mapreduce job history server
                     yarn-yarn-resourcemanager       YARN resource manager
                     yarn-yarn-timelineserver        YARN timeline server
                     zookeeper                       Zookeeper server
Worker daemon logs   hadoop-hdfs-datanode            HDFS datanode
                     yarn-yarn-nodemanager           YARN nodemanager
System logs          autoscaler                      Cloud Dataproc autoscaler log
                     google.dataproc.agent           Cloud Dataproc agent log
                     google.dataproc.startup         Cloud Dataproc startup script log + initialization action log

Accessing cluster logs in Stackdriver Logging

You can access Logging using the Logging console, the gcloud logging command, or the Logging API.

Console

Cloud Dataproc cluster logs are listed under the Cloud Dataproc Cluster resource. Select a log type from the drop-down list.

gcloud

You can read cluster log entries using the gcloud logging read command. The following command uses cluster labels to filter the returned log entries.

gcloud logging read \
    'resource.type=cloud_dataproc_cluster
    resource.labels.region=cluster-region
    resource.labels.cluster_name=cluster-name
    resource.labels.cluster_uuid=cluster-uuid'

Sample output (partial):

jsonPayload:
  class: org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger
  filename: yarn-yarn-resourcemanager-cluster-name-m.log
  ...
logName: projects/project-id/logs/yarn-yarn-resourcemanager
---
jsonPayload:
  class: org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger
  filename: yarn-yarn-resourcemanager-component-gateway-cluster-m.log
  ...
logName: projects/project-name/logs/yarn-yarn-resourcemanager
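
To narrow the results to a single log from the table above, a logName comparison can be added to the filter. This is a sketch under assumptions: cluster_log_filter and read_cluster_log are hypothetical helper names, the project ID and cluster name are placeholders, and the --limit value is arbitrary.

```shell
# Builds a filter for one log of one cluster.
# $1 = project ID, $2 = cluster name, $3 = log name from the table above.
cluster_log_filter() {
  printf 'resource.type=cloud_dataproc_cluster AND resource.labels.cluster_name=%s AND logName="projects/%s/logs/%s"' "$2" "$1" "$3"
}

# Reads at most 10 entries from the selected log.
read_cluster_log() {
  gcloud logging read "$(cluster_log_filter "$1" "$2" "$3")" --limit=10
}

# Example call (requires an authenticated gcloud, so it is commented out):
# read_cluster_log my-project my-cluster yarn-yarn-resourcemanager
```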

REST API

You can use the Logging REST API to list log entries (see entries.list).

