Problem
Cluster creation fails with the following error:
Failed to initialize node <cluster-name-m>: Component hdfs failed to activate See output in: gs://<staging-bucket>/google-cloud-dataproc-metainfo/<cluster-uuid>/<cluster-name-m>/dataproc-startup-script_output
Examples of errors to look for in the startup script logs:
1. <13>Jun 11 10:02:51 google-dataproc-startup[2043]: <13>Jun 11 10:02:51 setup-hadoop-hdfs-namenode[3624]: mkdir: `/user/com?pute': No such file or directory
2. <13>Nov 17 17:53:21 google-dataproc-startup[1397]: <13>Nov 17 17:53:21 activate-component-hdfs[3545]: mkdir: Illegal file pattern: Illegal/unsupported escape sequence near index 10
Environment
- Cloud Dataproc
Solution
- In the Google Cloud console, check the usernames under Compute Engine > Metadata > SSH Keys for special characters to identify which key is the culprit (a command-line sketch of this check follows this list).
- Remove the offending key by editing the project metadata, following the Compute Engine documentation on adding and removing SSH keys.
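If the project holds many SSH keys, checking them by hand is tedious. Below is a minimal command-line sketch, assuming gcloud and jq are available and assuming that a safe username contains only letters, digits, hyphens, and underscores (adjust the pattern to your own policy):

    # Each line of the ssh-keys metadata value has the form
    # "username:ssh-rsa AAAA... comment"; print any username that
    # falls outside the assumed safe character set.
    gcloud compute project-info describe --format=json \
      | jq -r '.commonInstanceMetadata.items[]? | select(.key == "ssh-keys") | .value' \
      | cut -d: -f1 \
      | grep -vE '^[A-Za-z0-9_-]+$'

Any username this command prints is a candidate cause of the failures shown above.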
Cause
During cluster creation, the startup script initializes an HDFS home directory for each username found in the project's SSH keys, using the Hadoop file system command hadoop fs -mkdir <dirname>.
Usernames that contain special characters (for example, the glob character ? in /user/com?pute above) make this command fail, which in turn makes cluster creation fail.
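As an illustration, here is a simplified sketch of what the startup script does (the real script is more involved, and the usernames below are made up):

    # Create an HDFS home directory for each SSH-key username.
    # The Hadoop shell treats path arguments as file patterns, so a
    # username containing '?' or a bad escape sequence makes mkdir
    # fail with the errors shown in the logs above.
    for user in alice bob 'com?pute'; do
        hadoop fs -mkdir "/user/${user}"
    done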