Component HDFS failed to activate during cluster creation

Problem

Cluster creation fails with the following error:
    Failed to initialize node <cluster-name-m>: Component hdfs failed to
    activate. See output in:
    gs://<staging-bucket>/google-cloud-dataproc-metainfo/<cluster-uuid>/<cluster-name-m>/dataproc-startup-script_output
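
You can read that output directly from Cloud Storage, for example with the gsutil CLI (substitute the placeholder values from your own error message):

    # Read the startup script output referenced in the error message.
    gsutil cat gs://<staging-bucket>/google-cloud-dataproc-metainfo/<cluster-uuid>/<cluster-name-m>/dataproc-startup-script_output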

Examples of errors to look for in the startup script logs:

1.  <13>Jun 11 10:02:51 google-dataproc-startup[2043]: <13>Jun 11 10:02:51
    setup-hadoop-hdfs-namenode[3624]: mkdir: `/user/com?pute': No such file or
    directory

2.  <13>Nov 17 17:53:21 google-dataproc-startup[1397]: <13>Nov 17 17:53:21
    activate-component-hdfs[3545]: mkdir: Illegal file pattern:
    Illegal/unsupported escape sequence near index 10

Environment

  • Cloud Dataproc

Solution

  1. In the Google Cloud console, go to Compute Engine > Metadata > SSH keys and look for usernames that contain special characters; any such key is the likely culprit (see the sketch after this list for a command-line check).
  2. Remove the offending SSH key, following the Compute Engine documentation on removing SSH keys from project metadata.
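
The following command-line sketch flags usernames in the project-wide ssh-keys metadata that fall outside a conservative Linux username pattern. The pattern, and the assumption that entries use the usual username:ssh-rsa ... format, are illustrative starting points, not Dataproc's exact validation rules:

    # Sketch: flag SSH key usernames that contain special characters.
    # Assumes entries in the usual "username:ssh-rsa AAAA... comment" format
    # and that python3 is available.
    gcloud compute project-info describe --format=json |
        python3 -c 'import json,re,sys; items=json.load(sys.stdin).get("commonInstanceMetadata",{}).get("items",[]); keys=next((i["value"] for i in items if i["key"]=="ssh-keys"),""); [print("suspect username:",u) for u in (l.split(":",1)[0] for l in keys.splitlines()) if u and not re.match(r"^[a-z_][a-z0-9_-]*$",u)]'

Any username printed by this check is a candidate for removal in step 2.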

Cause

During cluster creation, the startup script creates an HDFS directory for each SSH key username in the project, using the Hadoop file system command hadoop fs -mkdir <dirname>.
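
The effect is similar to running the following for each username (illustrative commands, not the script's actual internals; the second path mirrors log example 1 above, and the third is a hypothetical username containing a backslash, matching log example 2):

    # Illustrative only: what the startup script effectively runs per username.
    hadoop fs -mkdir /user/compute      # well-formed username: succeeds
    hadoop fs -mkdir '/user/com?pute'   # "?" is parsed as a glob that matches
                                        #   nothing, so mkdir reports "No such
                                        #   file or directory" (error 1 above)
    hadoop fs -mkdir '/user/dom\jdoe'   # a "\" in the name can trigger "Illegal
                                        #   file pattern: Illegal/unsupported
                                        #   escape sequence" (error 2 above)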

A username that contains special characters (such as ? or \) causes this command to fail, which in turn causes cluster creation to fail.