Diagnose Dataproc on GKE clusters

Dataproc provides the gcloud CLI dataproc clusters diagnose command to help you troubleshoot Dataproc on GKE cluster and job issues. This command gathers cluster-related configuration files, logs, and outputs into an archive file, and then uploads the archive to the Cloud Storage staging bucket you specified when you created your Dataproc on GKE cluster.
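As a minimal sketch of how the command might be driven from a script, the following Python builds the argument list for gcloud dataproc clusters diagnose and optionally runs it. The helper names build_diagnose_command and run_diagnose are hypothetical, as are the cluster name and region in the usage note; running the command assumes the gcloud CLI is installed and authenticated.

```python
import subprocess
from typing import List


def build_diagnose_command(cluster_name: str, region: str) -> List[str]:
    """Return the argv for `gcloud dataproc clusters diagnose` (hypothetical helper)."""
    return [
        "gcloud", "dataproc", "clusters", "diagnose",
        cluster_name,
        f"--region={region}",
    ]


def run_diagnose(cluster_name: str, region: str) -> str:
    """Run the command and return its combined stdout and stderr text.

    Assumes the gcloud CLI is installed and authenticated; raises
    subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        build_diagnose_command(cluster_name, region),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr
```

For example, build_diagnose_command("my-cluster", "us-central1") yields the same arguments you would type at a shell prompt. Keeping command construction separate from execution makes the invocation easy to inspect or log before anything runs.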

Diagnose archive file

The following tables list metrics and other information included in the dataproc clusters diagnose command archive file.

System information

Item | Archive location
GKE node metrics where virtual Dataproc on GKE pods run (CPU usage, memory usage) | /system/NODE_NAME.json
Network metrics and file system status of running pods (CPU usage, memory usage, network status, file system status) | /system/POD_NAME.json

Configuration information

Item | Archive location
Cluster configmap | /conf/configmap
Kubernetes deployment | /conf/deployment
Role-based access control (RBAC) | /conf/role, /conf/rolebind, /conf/serviceaccount

Logs

Item | Archive location
Agent log | /logs/agent.log
Spark engine log | /logs/sparkengine.log
Spark driver logs for running and completed jobs from the last 24 hours | /logs/DRIVER_ID

Job and pod information

Item | Archive location
JobAttempt object | /jobattempts
Kubernetes Pod object | /pods
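
When inspecting a downloaded archive, it can help to group its entries by the sections listed in the tables above. The following is a small sketch, not part of the Dataproc tooling: the names SECTIONS, section_for, and group_by_section are hypothetical, and it assumes entry paths are given relative to the archive root (a real archive may add a prefix directory).

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Top-level archive directories mapped to the table they appear in.
SECTIONS: Dict[str, str] = {
    "system": "System information",
    "conf": "Configuration information",
    "logs": "Logs",
    "jobattempts": "Job and pod information",
    "pods": "Job and pod information",
}


def section_for(path: str) -> str:
    """Return the section name for an archive path, or "Unknown"."""
    # Split on "/" and drop empty components so "/logs/agent.log"
    # and "logs/agent.log" are treated the same way.
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "Unknown"
    return SECTIONS.get(parts[0], "Unknown")


def group_by_section(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group archive paths by section, preserving input order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        grouped[section_for(path)].append(path)
    return dict(grouped)
```

For example, passing the names from an extracted archive listing to group_by_section separates log files from configuration dumps, so you can go straight to the part relevant to the issue you are troubleshooting.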
