Log and configuration information is useful when troubleshooting a
cluster or job. Unfortunately, there are many log and configuration files, and
gathering each one for investigation can be time-consuming. To address this
problem, Dataproc clusters support a special diagnose
command through the Cloud SDK. This command gathers
and archives important system, Spark/Hadoop, and Dataproc logs,
and then uploads the archive to the Cloud Storage
bucket attached to your cluster.
Using the Cloud SDK diagnose command
You can use the Cloud SDK diagnose
command on your Dataproc
clusters (see Dataproc and Cloud SDK).
Once the Cloud SDK is installed and configured, you can run the
gcloud dataproc clusters diagnose
command on your cluster as shown below. Replace cluster-name with the
name of your cluster and region with your cluster's region, for
example, --region=us-central1.
gcloud dataproc clusters diagnose cluster-name \
    --region=region \
    ... other args ...
The command outputs the name and location of the archive that contains your data.
...
Saving archive to cloud
Copying file:///tmp/tmp.FgWEq3f2DJ/diagnostic.tar ...
Uploading ...23db9-762e-4593-8a5a-f4abd75527e6/diagnostic.tar ...
Diagnostic results saved in: gs://bucket-name/.../cluster-uuid/.../job-id/diagnostic.tar
...
In this example, bucket-name is the Cloud Storage bucket attached to your cluster, cluster-uuid is the unique ID (UUID) of your cluster, and job-id is the UUID belonging to the system task that ran the diagnose command.
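If you run diagnose from a script, you can capture the archive location from the command output rather than copying it by hand. The sketch below parses a sample of the output format shown above; the bucket and path names are placeholders, not real values.

```shell
# Sketch: pull the gs:// URI of the archive out of the diagnose output so
# a script can fetch it automatically. The sample text below mimics the
# output shown above; the bucket and path segments are placeholders.
sample='Saving archive to cloud
Diagnostic results saved in: gs://example-bucket/cluster-uuid/job-id/diagnostic.tar'

uri=$(printf '%s\n' "$sample" | sed -n 's/^Diagnostic results saved in: //p')
echo "$uri"
# A real run would then download the archive, for example:
#   gsutil cp "$uri" .
```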
When you create a Dataproc cluster, Dataproc
creates a Cloud Storage bucket and attaches it to your cluster. The
diagnose command outputs the archive file to this bucket. To determine the
name of the bucket created by Dataproc, use the
Cloud SDK clusters describe
command. The bucket associated with your
cluster is listed next to configurationBucket.
gcloud dataproc clusters describe cluster-name \
    --region=region \
...
clusterName: cluster-name
clusterUuid: daa40b3f-5ff5-4e89-9bf1-bcbfec6e0eac
configuration:
  configurationBucket: dataproc-edc9d85f-...-us
...
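In a script, you can extract just the bucket name from the describe output. This sketch pipes sample YAML for illustration; the bucket name is a placeholder, and the exact field name may vary across gcloud/API versions, so check your own describe output first.

```shell
# Sketch: pull the bucket name out of `clusters describe` output. In
# practice you would pipe the real command, e.g.:
#   gcloud dataproc clusters describe cluster-name --region=region | awk ...
# The sample YAML and bucket name below are placeholders.
describe_output='clusterName: cluster-name
configuration:
  configurationBucket: dataproc-example-bucket'

bucket=$(printf '%s\n' "$describe_output" | awk '/configurationBucket:/ {print $2}')
echo "$bucket"
```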
Running the diagnostic script from the master node (optional)
The Cloud SDK diagnose command can fail or time out if a cluster is in an error state and cannot accept diagnose tasks from the Dataproc server. To avoid this issue, you can SSH into the master node, download the diagnostic script, then run the script locally on the master node:
gcloud compute ssh hostname
gsutil cp gs://dataproc-diagnostic-scripts/diagnostic-script.sh .
sudo bash diagnostic-script.sh
The diagnostic tarball is saved in a local temporary directory. If you want, you can follow the instructions in the command output to upload it to a Cloud Storage bucket and share it with Google Support.
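Once you have a diagnostic tarball, you can list its contents and extract individual files without unpacking everything. The sketch below builds a mock archive with an illustrative layout (the real folder names are listed in the tables later on this page); the file names here are placeholders.

```shell
# Sketch: inspect a diagnostic tarball without unpacking all of it.
# This builds a mock archive with an illustrative layout; substitute
# the path to your real diagnostic.tar.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/system" "$tmpdir/conf/spark"
echo "sample df output" > "$tmpdir/system/df.log"
echo "sample spark conf" > "$tmpdir/conf/spark/spark-defaults.conf"
tar -C "$tmpdir" -cf "$tmpdir/diagnostic.tar" system conf

# List the archive contents, then extract a single file of interest:
tar -tf "$tmpdir/diagnostic.tar"
tar -C "$tmpdir" -xf "$tmpdir/diagnostic.tar" conf/spark/spark-defaults.conf
```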
Sharing the data gathered by diagnose
You can share the archive generated by the diagnose
command in two ways:
- Download the file from Cloud Storage, then share the downloaded archive.
- Change the permissions on the archive to allow other Google Cloud Platform users or projects to access the file.
For example, the following command grants read permission on the diagnose archive to users in test-project:
gsutil -m acl ch -g test-project:R path-to-archive
Items included in diagnose command output
The diagnose
command includes the following configuration files, logs,
and outputs from your cluster in an archive file. The archive file is placed
in the Cloud Storage bucket associated with your Dataproc
cluster, as discussed above.
Daemons and services information
Command executed | Location in archive
---|---
yarn node -list -all | /system/yarn-nodes.log
hdfs dfsadmin -report -live -decommissioning | /system/hdfs-nodes.log
hdfs dfs -du -h | /system/hdfs-du.log
service --status-all | /system/service.log
systemctl --type service | /system/systemd-services.log
curl "http://${HOSTNAME}:8088/jmx" | /metrics/resource_manager_jmx
curl "http://${HOSTNAME}:8088/ws/v1/cluster/apps" | /metrics/yarn_app_info
curl "http://${HOSTNAME}:8088/ws/v1/cluster/nodes" | /metrics/yarn_node_info
curl "http://${HOSTNAME}:9870/jmx" | /metrics/namenode_jmx
JVM information
Command executed | Location in archive
---|---
jstack -l "${DATAPROC_AGENT_PID}" | jstack/agent_${DATAPROC_AGENT_PID}.jstack
jstack -l "${PRESTO_PID}" | jstack/agent_${PRESTO_PID}.jstack
jstack -l "${JOB_DRIVER_PID}" | jstack/driver_${JOB_DRIVER_PID}.jstack
jinfo "${DATAPROC_AGENT_PID}" | jinfo/agent_${DATAPROC_AGENT_PID}.jstack
jinfo "${PRESTO_PID}" | jinfo/agent_${PRESTO_PID}.jstack
jinfo "${JOB_DRIVER_PID}" | jinfo/agent_${JOB_DRIVER_PID}.jstack
Linux system information
Command executed | Location in archive
---|---
df -h | /system/df.log
ps aux | /system/ps.log
free -m | /system/free.log
netstat -anp | /system/netstat.log
sysctl -a | /system/sysctl.log
uptime | /system/uptime.log
cat /proc/sys/fs/file-nr | /system/fs-file-nr.log
ping -c 1 | /system/cluster-ping.log
Log files
Item(s) included | Location in archive
---|---
All logs in /var/log with the following prefixes in their filename: cloud-sql-proxy, dataproc, druid, gcdp, gcs, google, hadoop, hdfs, hive, knox, presto, spark, syslog, yarn, zookeeper | Files are placed in the archive logs folder, and keep their original filenames.
Dataproc node startup logs for each node (master and worker) in your cluster. | Files are placed in the archive node_startup folder, which contains separate sub-folders for each machine in the cluster.
Component gateway logs from journalctl -u google-dataproc-component-gateway | /logs/google-dataproc-component-gateway.log
Configuration files
Item(s) included | Location in archive
---|---
VM metadata | /conf/dataproc/metadata
Environment variables in /etc/environment | /conf/dataproc/environment
Dataproc properties | /conf/dataproc/dataproc.properties
All files in /etc/google-dataproc/ | /conf/dataproc/
All files in /etc/hadoop/conf/ | /conf/hadoop/
All files in /etc/hive/conf/ | /conf/hive/
All files in /etc/hive-hcatalog/conf/ | /conf/hive-hcatalog/
All files in /etc/knox/conf/ | /conf/knox/
All files in /etc/pig/conf/ | /conf/pig/
All files in /etc/presto/conf/ | /conf/presto/
All files in /etc/spark/conf/ | /conf/spark/
All files in /etc/tez/conf/ | /conf/tez/
All files in /etc/zookeeper/conf/ | /conf/zookeeper/