gcloud alpha dataproc clusters create

NAME
gcloud alpha dataproc clusters create - create a cluster
SYNOPSIS
gcloud alpha dataproc clusters create (CLUSTER : --region=REGION) [--async] [--autoscaling-policy=AUTOSCALING_POLICY] [--bucket=BUCKET] [--dataproc-metastore=DATAPROC_METASTORE] [--driver-pool-accelerator=[type=TYPE,[count=COUNT],…]] [--driver-pool-boot-disk-size=DRIVER_POOL_BOOT_DISK_SIZE] [--driver-pool-boot-disk-type=DRIVER_POOL_BOOT_DISK_TYPE] [--driver-pool-id=DRIVER_POOL_ID] [--driver-pool-local-ssd-interface=DRIVER_POOL_LOCAL_SSD_INTERFACE] [--driver-pool-machine-type=DRIVER_POOL_MACHINE_TYPE] [--driver-pool-min-cpu-platform=PLATFORM] [--driver-pool-size=DRIVER_POOL_SIZE] [--enable-component-gateway] [--initialization-action-timeout=TIMEOUT; default="10m"] [--initialization-actions=CLOUD_STORAGE_URI,[…]] [--labels=[KEY=VALUE,…]] [--master-accelerator=[type=TYPE,[count=COUNT],…]] [--master-boot-disk-size=MASTER_BOOT_DISK_SIZE] [--master-boot-disk-type=MASTER_BOOT_DISK_TYPE] [--master-local-ssd-interface=MASTER_LOCAL_SSD_INTERFACE] [--master-machine-type=MASTER_MACHINE_TYPE] [--master-min-cpu-platform=PLATFORM] [--max-idle=MAX_IDLE] [--min-secondary-worker-fraction=MIN_SECONDARY_WORKER_FRACTION] [--node-group=NODE_GROUP] [--num-driver-pool-local-ssds=NUM_DRIVER_POOL_LOCAL_SSDS] [--num-master-local-ssds=NUM_MASTER_LOCAL_SSDS] [--num-masters=NUM_MASTERS] [--num-secondary-worker-local-ssds=NUM_SECONDARY_WORKER_LOCAL_SSDS] [--num-worker-local-ssds=NUM_WORKER_LOCAL_SSDS] [--optional-components=[COMPONENT,…]] [--private-ipv6-google-access-type=PRIVATE_IPV6_GOOGLE_ACCESS_TYPE] [--properties=[PREFIX:PROPERTY=VALUE,…]] [--secondary-worker-accelerator=[type=TYPE,[count=COUNT],…]] [--secondary-worker-boot-disk-size=SECONDARY_WORKER_BOOT_DISK_SIZE] [--secondary-worker-boot-disk-type=SECONDARY_WORKER_BOOT_DISK_TYPE] [--secondary-worker-local-ssd-interface=SECONDARY_WORKER_LOCAL_SSD_INTERFACE] [--secondary-worker-machine-types=type=MACHINE_TYPE[,type=MACHINE_TYPE…][,rank=RANK]] [--secondary-worker-standard-capacity-base=SECONDARY_WORKER_STANDARD_CAPACITY_BASE] [--shielded-integrity-monitoring] [--shielded-secure-boot] [--shielded-vtpm] [--temp-bucket=TEMP_BUCKET] [--worker-accelerator=[type=TYPE,[count=COUNT],…]] [--worker-boot-disk-size=WORKER_BOOT_DISK_SIZE] [--worker-boot-disk-type=WORKER_BOOT_DISK_TYPE] [--worker-local-ssd-interface=WORKER_LOCAL_SSD_INTERFACE] [--worker-machine-type=WORKER_MACHINE_TYPE] [--worker-min-cpu-platform=PLATFORM] [--zone=ZONE, -z ZONE] [--expiration-time=EXPIRATION_TIME     | --max-age=MAX_AGE] [--gce-pd-kms-key=GCE_PD_KMS_KEY : --gce-pd-kms-key-keyring=GCE_PD_KMS_KEY_KEYRING --gce-pd-kms-key-location=GCE_PD_KMS_KEY_LOCATION --gce-pd-kms-key-project=GCE_PD_KMS_KEY_PROJECT] [--metadata=KEY=VALUE,[KEY=VALUE,…] --scopes=SCOPE,[SCOPE,…] --service-account=SERVICE_ACCOUNT --tags=TAG,[TAG,…] --network=NETWORK     | --subnet=SUBNET --reservation=RESERVATION --reservation-affinity=RESERVATION_AFFINITY; default="any"] [--image=IMAGE     | --image-version=VERSION] [--kerberos-config-file=KERBEROS_CONFIG_FILE     | --enable-kerberos --kerberos-root-principal-password-uri=KERBEROS_ROOT_PRINCIPAL_PASSWORD_URI [--kerberos-kms-key=KERBEROS_KMS_KEY : --kerberos-kms-key-keyring=KERBEROS_KMS_KEY_KEYRING --kerberos-kms-key-location=KERBEROS_KMS_KEY_LOCATION --kerberos-kms-key-project=KERBEROS_KMS_KEY_PROJECT]] [--kms-key=KMS_KEY : --kms-keyring=KMS_KEYRING --kms-location=KMS_LOCATION --kms-project=KMS_PROJECT] [--no-address     | --public-ip-address] [--single-node     | --min-num-workers=MIN_NUM_WORKERS --num-secondary-workers=NUM_SECONDARY_WORKERS --num-workers=NUM_WORKERS --secondary-worker-type=TYPE; default="preemptible"] [GCLOUD_WIDE_FLAG]
DESCRIPTION
(ALPHA) Create a cluster.
EXAMPLES
To create a cluster, run:
gcloud alpha dataproc clusters create my-cluster --region=us-central1
POSITIONAL ARGUMENTS
Cluster resource - The name of the cluster to create. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.

To set the project attribute:

  • provide the argument cluster on the command line with a fully specified name;
  • provide the argument --project on the command line;
  • set the property core/project.

This must be specified.

CLUSTER
ID of the cluster or fully qualified identifier for the cluster.

To set the cluster attribute:

  • provide the argument cluster on the command line.

This positional argument must be specified if any of the other arguments in this group are specified.

--region=REGION
Dataproc region for the cluster. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Overrides the default dataproc/region property value for this command invocation.

To set the region attribute:

  • provide the argument cluster on the command line with a fully specified name;
  • provide the argument --region on the command line;
  • set the property dataproc/region.
FLAGS
--async
Return immediately, without waiting for the operation in progress to complete.
--autoscaling-policy=AUTOSCALING_POLICY
ID of the autoscaling policy or fully qualified identifier for the autoscaling policy.

To set the autoscaling_policy attribute:

  • provide the argument --autoscaling-policy on the command line.
--bucket=BUCKET
The Google Cloud Storage bucket to use by default to stage job dependencies, miscellaneous config files, and job driver console output when using this cluster.
--dataproc-metastore=DATAPROC_METASTORE
Specify the name of a Dataproc Metastore service to be used as an external metastore in the format: "projects/{project-id}/locations/{region}/services/{service-name}".
--driver-pool-accelerator=[type=TYPE,[count=COUNT],…]
Attaches accelerators, such as GPUs, to the driver-pool instance(s).
type
The specific type of accelerator to attach to the instances, such as nvidia-tesla-k80 for NVIDIA Tesla K80. Use gcloud compute accelerator-types list to display available accelerator types.
count
The number of accelerators to attach to each instance. The default value is 1.
--driver-pool-boot-disk-size=DRIVER_POOL_BOOT_DISK_SIZE
The size of the boot disk. The value must be a whole number followed by a size unit of KB for kilobyte, MB for megabyte, GB for gigabyte, or TB for terabyte. For example, 10GB will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB.
--driver-pool-boot-disk-type=DRIVER_POOL_BOOT_DISK_TYPE
The type of the boot disk. The value must be pd-balanced, pd-ssd, or pd-standard.
--driver-pool-id=DRIVER_POOL_ID
Custom identifier for the DRIVER Node Group being created. If not provided, a random string is generated.
--driver-pool-local-ssd-interface=DRIVER_POOL_LOCAL_SSD_INTERFACE
Interface to use to attach local SSDs to cluster driver pool node(s).
--driver-pool-machine-type=DRIVER_POOL_MACHINE_TYPE
The type of machine to use for the cluster driver pool nodes. Defaults to server-specified.
--driver-pool-min-cpu-platform=PLATFORM
When specified, the VM is scheduled on the host with a specified CPU architecture or a more recent CPU platform that's available in that zone. To list available CPU platforms in a zone, run:
gcloud compute zones describe ZONE

CPU platform selection may not be available in a zone. Zones that support CPU platform selection provide an availableCpuPlatforms field, which contains the list of available CPU platforms in the zone (see Availability of CPU platforms for more information).

--driver-pool-size=DRIVER_POOL_SIZE
The size of the cluster driver pool.
--enable-component-gateway
Enable access to the web UIs of selected components on the cluster through the component gateway.
--initialization-action-timeout=TIMEOUT; default="10m"
The maximum duration of each initialization action. See $ gcloud topic datetimes for information on duration formats.
--initialization-actions=CLOUD_STORAGE_URI,[…]
A list of Google Cloud Storage URIs of executables to run on each node in the cluster.
--labels=[KEY=VALUE,…]
List of label KEY=VALUE pairs to add.

Keys must start with a lowercase character and contain only hyphens (-), underscores (_), lowercase characters, and numbers. Values must contain only hyphens (-), underscores (_), lowercase characters, and numbers.

--master-accelerator=[type=TYPE,[count=COUNT],…]
Attaches accelerators, such as GPUs, to the master instance(s).
type
The specific type of accelerator to attach to the instances, such as nvidia-tesla-k80 for NVIDIA Tesla K80. Use gcloud compute accelerator-types list to display available accelerator types.
count
The number of accelerators to attach to each instance. The default value is 1.
--master-boot-disk-size=MASTER_BOOT_DISK_SIZE
The size of the boot disk. The value must be a whole number followed by a size unit of KB for kilobyte, MB for megabyte, GB for gigabyte, or TB for terabyte. For example, 10GB will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB.
--master-boot-disk-type=MASTER_BOOT_DISK_TYPE
The type of the boot disk. The value must be pd-balanced, pd-ssd, or pd-standard.
--master-local-ssd-interface=MASTER_LOCAL_SSD_INTERFACE
Interface to use to attach local SSDs to master node(s) in a cluster.
--master-machine-type=MASTER_MACHINE_TYPE
The type of machine to use for the master. Defaults to server-specified.
--master-min-cpu-platform=PLATFORM
When specified, the VM is scheduled on the host with a specified CPU architecture or a more recent CPU platform that's available in that zone. To list available CPU platforms in a zone, run:
gcloud compute zones describe ZONE

CPU platform selection may not be available in a zone. Zones that support CPU platform selection provide an availableCpuPlatforms field, which contains the list of available CPU platforms in the zone (see Availability of CPU platforms for more information).

--max-idle=MAX_IDLE
The duration before cluster is auto-deleted after last job completes, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats.
--min-secondary-worker-fraction=MIN_SECONDARY_WORKER_FRACTION
Minimum fraction of secondary worker nodes required to create the cluster. If it is not met, cluster creation will fail. Must be a decimal value between 0 and 1. The number of required secondary workers is calculated by ceil(min-secondary-worker-fraction * num_secondary_workers). Defaults to 0.0001.
--node-group=NODE_GROUP
The name of the sole-tenant node group to create the cluster on. Can be a short name ("node-group-name") or in the format "projects/{project-id}/zones/{zone}/nodeGroups/{node-group-name}".
--num-driver-pool-local-ssds=NUM_DRIVER_POOL_LOCAL_SSDS
The number of local SSDs to attach to each cluster driver pool node.
--num-master-local-ssds=NUM_MASTER_LOCAL_SSDS
The number of local SSDs to attach to the master in a cluster.
--num-masters=NUM_MASTERS
The number of master nodes in the cluster.
Number of Masters Cluster Mode
1 Standard
3 High Availability
--num-secondary-worker-local-ssds=NUM_SECONDARY_WORKER_LOCAL_SSDS
The number of local SSDs to attach to each preemptible worker in a cluster.
--num-worker-local-ssds=NUM_WORKER_LOCAL_SSDS
The number of local SSDs to attach to each worker in a cluster.
--optional-components=[COMPONENT,…]
List of optional components to be installed on cluster machines.

The following page documents the optional components that can be installed: https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/optional-components.

--private-ipv6-google-access-type=PRIVATE_IPV6_GOOGLE_ACCESS_TYPE
The private IPv6 Google access type for the cluster. PRIVATE_IPV6_GOOGLE_ACCESS_TYPE must be one of: inherit-subnetwork, outbound, bidirectional.
--properties=[PREFIX:PROPERTY=VALUE,…]
Specifies configuration properties for installed packages, such as Hadoop and Spark.

Properties are mapped to configuration files by specifying a prefix, such as "core:io.serializations". The following are supported prefixes and their mappings:

Prefix File Purpose of file
capacity-scheduler capacity-scheduler.xml Hadoop YARN Capacity Scheduler configuration
core core-site.xml Hadoop general configuration
distcp distcp-default.xml Hadoop Distributed Copy configuration
hadoop-env hadoop-env.sh Hadoop specific environment variables
hdfs hdfs-site.xml Hadoop HDFS configuration
hive hive-site.xml Hive configuration
mapred mapred-site.xml Hadoop MapReduce configuration
mapred-env mapred-env.sh Hadoop MapReduce specific environment variables
pig pig.properties Pig configuration
spark spark-defaults.conf Spark configuration
spark-env spark-env.sh Spark specific environment variables
yarn yarn-site.xml Hadoop YARN configuration
yarn-env yarn-env.sh Hadoop YARN specific environment variables
See https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties for more information.
--secondary-worker-accelerator=[type=TYPE,[count=COUNT],…]
Attaches accelerators, such as GPUs, to the secondary-worker instance(s).
type
The specific type of accelerator to attach to the instances, such as nvidia-tesla-k80 for NVIDIA Tesla K80. Use gcloud compute accelerator-types list to display available accelerator types.
count
The number of accelerators to attach to each instance. The default value is 1.
--secondary-worker-boot-disk-size=SECONDARY_WORKER_BOOT_DISK_SIZE
The size of the boot disk. The value must be a whole number followed by a size unit of KB for kilobyte, MB for megabyte, GB for gigabyte, or TB for terabyte. For example, 10GB will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB.
--secondary-worker-boot-disk-type=SECONDARY_WORKER_BOOT_DISK_TYPE
The type of the boot disk. The value must be pd-balanced, pd-ssd, or pd-standard.
--secondary-worker-local-ssd-interface=SECONDARY_WORKER_LOCAL_SSD_INTERFACE
Interface to use to attach local SSDs to each secondary worker in a cluster.
--secondary-worker-machine-types=type=MACHINE_TYPE[,type=MACHINE_TYPE…][,rank=RANK]
Types of machines with optional rank for secondary workers to use. Defaults to server-specified.eg. --secondary-worker-machine-types="type=e2-standard-8,type=t2d-standard-8,rank=0"
--secondary-worker-standard-capacity-base=SECONDARY_WORKER_STANDARD_CAPACITY_BASE
The number of standard VMs in the Spot and Standard Mix feature.
--shielded-integrity-monitoring
Enables monitoring and attestation of the boot integrity of the cluster's VMs. vTPM (virtual Trusted Platform Module) must also be enabled. A TPM is a hardware module that can be used for different security operations, such as remote attestation, encryption, and sealing of keys.
--shielded-secure-boot
The cluster's VMs will boot with secure boot enabled.
--shielded-vtpm
The cluster's VMs will boot with the TPM (Trusted Platform Module) enabled. A TPM is a hardware module that can be used for different security operations, such as remote attestation, encryption, and sealing of keys.
--temp-bucket=TEMP_BUCKET
The Google Cloud Storage bucket to use by default to store ephemeral cluster and jobs data, such as Spark and MapReduce history files.
--worker-accelerator=[type=TYPE,[count=COUNT],…]
Attaches accelerators, such as GPUs, to the worker instance(s).
type
The specific type of accelerator to attach to the instances, such as nvidia-tesla-k80 for NVIDIA Tesla K80. Use gcloud compute accelerator-types list to display available accelerator types.
count
The number of accelerators to attach to each instance. The default value is 1.
--worker-boot-disk-size=WORKER_BOOT_DISK_SIZE
The size of the boot disk. The value must be a whole number followed by a size unit of KB for kilobyte, MB for megabyte, GB for gigabyte, or TB for terabyte. For example, 10GB will produce a 10 gigabyte disk. The minimum size a boot disk can have is 10 GB. Disk size must be a multiple of 1 GB.
--worker-boot-disk-type=WORKER_BOOT_DISK_TYPE
The type of the boot disk. The value must be pd-balanced, pd-ssd, or pd-standard.
--worker-local-ssd-interface=WORKER_LOCAL_SSD_INTERFACE
Interface to use to attach local SSDs to each worker in a cluster.
--worker-machine-type=WORKER_MACHINE_TYPE
The type of machine to use for workers. Defaults to server-specified.
--worker-min-cpu-platform=PLATFORM
When specified, the VM is scheduled on the host with a specified CPU architecture or a more recent CPU platform that's available in that zone. To list available CPU platforms in a zone, run:
gcloud compute zones describe ZONE

CPU platform selection may not be available in a zone. Zones that support CPU platform selection provide an availableCpuPlatforms field, which contains the list of available CPU platforms in the zone (see Availability of CPU platforms for more information).

--zone=ZONE, -z ZONE
The compute zone (e.g. us-central1-a) for the cluster. If empty and --region is set to a value other than global, the server will pick a zone in the region. Overrides the default compute/zone property value for this command invocation.
At most one of these can be specified:
--expiration-time=EXPIRATION_TIME
The time when cluster will be auto-deleted, such as "2017-08-29T18:52:51.142Z." See $ gcloud topic datetimes for information on time formats.
--max-age=MAX_AGE
The lifespan of the cluster before it is auto-deleted, such as "2h" or "1d". See $ gcloud topic datetimes for information on duration formats.
Key resource - The Cloud KMS (Key Management Service) cryptokey that will be used to protect the cluster. The 'Compute Engine Service Agent' service account must hold permission 'Cloud KMS CryptoKey Encrypter/Decrypter'. The arguments in this group can be used to specify the attributes of this resource.
--gce-pd-kms-key=GCE_PD_KMS_KEY
ID of the key or fully qualified identifier for the key.

To set the kms-key attribute:

  • provide the argument --gce-pd-kms-key on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--gce-pd-kms-key-keyring=GCE_PD_KMS_KEY_KEYRING
The KMS keyring of the key.

To set the kms-keyring attribute:

  • provide the argument --gce-pd-kms-key on the command line with a fully specified name;
  • provide the argument --gce-pd-kms-key-keyring on the command line.
--gce-pd-kms-key-location=GCE_PD_KMS_KEY_LOCATION
The Google Cloud location for the key.

To set the kms-location attribute:

  • provide the argument --gce-pd-kms-key on the command line with a fully specified name;
  • provide the argument --gce-pd-kms-key-location on the command line.
--gce-pd-kms-key-project=GCE_PD_KMS_KEY_PROJECT
The Google Cloud project for the key.

To set the kms-project attribute:

  • provide the argument --gce-pd-kms-key on the command line with a fully specified name;
  • provide the argument --gce-pd-kms-key-project on the command line;
  • set the property core/project.
At most one of these can be specified:
Compute Engine options for Dataproc clusters.
--metadata=KEY=VALUE,[KEY=VALUE,…]
Metadata to be made available to the guest operating system running on the instances
--scopes=SCOPE,[SCOPE,…]
Specifies scopes for the node instances. Multiple SCOPEs can be specified, separated by commas. Examples:
gcloud alpha dataproc clusters create example-cluster --scopes https://www.googleapis.com/auth/bigtable.admin
gcloud alpha dataproc clusters create example-cluster --scopes sqlservice,bigquery

The following minimum scopes are necessary for the cluster to function properly and are always added, even if not explicitly specified:

https://www.googleapis.com/auth/devstorage.read_write
https://www.googleapis.com/auth/logging.write

If the --scopes flag is not specified, the following default scopes are also included:

https://www.googleapis.com/auth/bigquery
https://www.googleapis.com/auth/bigtable.admin.table
https://www.googleapis.com/auth/bigtable.data
https://www.googleapis.com/auth/devstorage.full_control

If you want to enable all scopes use the 'cloud-platform' scope.

SCOPE can be either the full URI of the scope or an alias. Default scopes are assigned to all instances. Available aliases are:

Alias URI
bigquery https://www.googleapis.com/auth/bigquery
cloud-platform https://www.googleapis.com/auth/cloud-platform
cloud-source-repos https://www.googleapis.com/auth/source.full_control
cloud-source-repos-ro https://www.googleapis.com/auth/source.read_only
compute-ro https://www.googleapis.com/auth/compute.readonly
compute-rw https://www.googleapis.com/auth/compute
datastore https://www.googleapis.com/auth/datastore
default https://www.googleapis.com/auth/devstorage.read_only
https://www.googleapis.com/auth/logging.write
https://www.googleapis.com/auth/monitoring.write
https://www.googleapis.com/auth/pubsub
https://www.googleapis.com/auth/service.management.readonly
https://www.googleapis.com/auth/servicecontrol
https://www.googleapis.com/auth/trace.append
gke-default https://www.googleapis.com/auth/devstorage.read_only
https://www.googleapis.com/auth/logging.write
https://www.googleapis.com/auth/monitoring
https://www.googleapis.com/auth/service.management.readonly
https://www.googleapis.com/auth/servicecontrol
https://www.googleapis.com/auth/trace.append
logging-write https://www.googleapis.com/auth/logging.write
monitoring https://www.googleapis.com/auth/monitoring
monitoring-read https://www.googleapis.com/auth/monitoring.read
monitoring-write https://www.googleapis.com/auth/monitoring.write
pubsub https://www.googleapis.com/auth/pubsub
service-control https://www.googleapis.com/auth/servicecontrol
service-management https://www.googleapis.com/auth/service.management.readonly
sql (deprecated) https://www.googleapis.com/auth/sqlservice
sql-admin https://www.googleapis.com/auth/sqlservice.admin
storage-full https://www.googleapis.com/auth/devstorage.full_control
storage-ro https://www.googleapis.com/auth/devstorage.read_only
storage-rw https://www.googleapis.com/auth/devstorage.read_write
taskqueue https://www.googleapis.com/auth/taskqueue
trace https://www.googleapis.com/auth/trace.append
userinfo-email https://www.googleapis.com/auth/userinfo.email
DEPRECATION WARNING: https://www.googleapis.com/auth/sqlservice account scope and sql alias do not provide SQL instance management capabilities and have been deprecated. Please, use https://www.googleapis.com/auth/sqlservice.admin or sql-admin to manage your Google SQL Service instances.
--service-account=SERVICE_ACCOUNT
The Google Cloud IAM service account to be authenticated as.
--tags=TAG,[TAG,…]
Specifies a list of tags to apply to the instance. These tags allow network firewall rules and routes to be applied to specified VM instances. See gcloud compute firewall-rules create(1) for more details.

To read more about configuring network tags, read this guide: https://cloud.google.com/vpc/docs/add-remove-network-tags

To list instances with their respective status and tags, run:

gcloud compute instances list --format='table(name,status,tags.list())'

To list instances tagged with a specific tag, tag1, run:

gcloud compute instances list --filter='tags:tag1'
At most one of these can be specified:
--network=NETWORK
The Compute Engine network that the VM instances of the cluster will be part of. This is mutually exclusive with --subnet. If neither is specified, this defaults to the "default" network.
--subnet=SUBNET
Specifies the subnet that the cluster will be part of. This is mutally exclusive with --network.
Specifies the reservation for the instance.
--reservation=RESERVATION
The name of the reservation, required when --reservation-affinity=specific.
--reservation-affinity=RESERVATION_AFFINITY; default="any"
The type of reservation for the instance. RESERVATION_AFFINITY must be one of: any, none, specific.
At most one of these can be specified:
--image=IMAGE
The custom image used to create the cluster. It can be the image name, the image URI, or the image family URI, which selects the latest image from the family.
--image-version=VERSION
The image version to use for the cluster. Defaults to the latest version.
Specifying these flags will enable Kerberos for the cluster.

At most one of these can be specified:

--kerberos-config-file=KERBEROS_CONFIG_FILE
Path to a YAML (or JSON) file containing the configuration for Kerberos on the cluster. If you pass - as the value of the flag the file content will be read from stdin.

The YAML file is formatted as follows:

  # Optional. Flag to indicate whether to Kerberize the cluster.
  # The default value is true.
  enable_kerberos: true

  # Optional. The Google Cloud Storage URI of a KMS encrypted file
  # containing the root principal password.
  root_principal_password_uri: gs://bucket/password.encrypted

  # Optional. The URI of the Cloud KMS key used to encrypt
  # sensitive files.
  kms_key_uri:
    projects/myproject/locations/global/keyRings/mykeyring/cryptoKeys/my-key

  # Configuration of SSL encryption. If specified, all sub-fields
  # are required. Otherwise, Dataproc will provide a self-signed
  # certificate and generate the passwords.
  ssl:
    # Optional. The Google Cloud Storage URI of the keystore file.
    keystore_uri: gs://bucket/keystore.jks

    # Optional. The Google Cloud Storage URI of a KMS encrypted
    # file containing the password to the keystore.
    keystore_password_uri: gs://bucket/keystore_password.encrypted

    # Optional. The Google Cloud Storage URI of a KMS encrypted
    # file containing the password to the user provided key.
    key_password_uri: gs://bucket/key_password.encrypted

    # Optional. The Google Cloud Storage URI of the truststore
    # file.
    truststore_uri: gs://bucket/truststore.jks

    # Optional. The Google Cloud Storage URI of a KMS encrypted
    # file containing the password to the user provided
    # truststore.
    truststore_password_uri:
      gs://bucket/truststore_password.encrypted

  # Configuration of cross realm trust.
  cross_realm_trust:
    # Optional. The remote realm the Dataproc on-cluster KDC will
    # trust, should the user enable cross realm trust.
    realm: REMOTE.REALM

    # Optional. The KDC (IP or hostname) for the remote trusted
    # realm in a cross realm trust relationship.
    kdc: kdc.remote.realm

    # Optional. The admin server (IP or hostname) for the remote
    # trusted realm in a cross realm trust relationship.
    admin_server: admin-server.remote.realm

    # Optional. The Google Cloud Storage URI of a KMS encrypted
    # file containing the shared password between the on-cluster
    # Kerberos realm and the remote trusted realm, in a cross
    # realm trust relationship.
    shared_password_uri:
      gs://bucket/cross-realm.password.encrypted

  # Optional. The Google Cloud Storage URI of a KMS encrypted file
  # containing the master key of the KDC database.
  kdc_db_key_uri: gs://bucket/kdc_db_key.encrypted

  # Optional. The lifetime of the ticket granting ticket, in
  # hours. If not specified, or user specifies 0, then default
  # value 10 will be used.
  tgt_lifetime_hours: 1

  # Optional. The name of the Kerberos realm. If not specified,
  # the uppercased domain name of the cluster will be used.
  realm: REALM.NAME
--enable-kerberos
Enable Kerberos on the cluster.
--kerberos-root-principal-password-uri=KERBEROS_ROOT_PRINCIPAL_PASSWORD_URI
Google Cloud Storage URI of a KMS encrypted file containing the root principal password. Must be a Cloud Storage URL beginning with 'gs://'.
Key resource - The Cloud KMS (Key Management Service) cryptokey that will be used to protect the password. The 'Compute Engine Service Agent' service account must hold permission 'Cloud KMS CryptoKey Encrypter/Decrypter'. The arguments in this group can be used to specify the attributes of this resource.
--kerberos-kms-key=KERBEROS_KMS_KEY
ID of the key or fully qualified identifier for the key.

To set the kms-key attribute:

  • provide the argument --kerberos-kms-key on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--kerberos-kms-key-keyring=KERBEROS_KMS_KEY_KEYRING
The KMS keyring of the key.

To set the kms-keyring attribute:

  • provide the argument --kerberos-kms-key on the command line with a fully specified name;
  • provide the argument --kerberos-kms-key-keyring on the command line.
--kerberos-kms-key-location=KERBEROS_KMS_KEY_LOCATION
The Google Cloud location for the key.

To set the kms-location attribute:

  • provide the argument --kerberos-kms-key on the command line with a fully specified name;
  • provide the argument --kerberos-kms-key-location on the command line.
--kerberos-kms-key-project=KERBEROS_KMS_KEY_PROJECT
The Google Cloud project for the key.

To set the kms-project attribute:

  • provide the argument --kerberos-kms-key on the command line with a fully specified name;
  • provide the argument --kerberos-kms-key-project on the command line;
  • set the property core/project.
Key resource - The Cloud KMS (Key Management Service) cryptokey that will be used to protect the cluster. The 'Compute Engine Service Agent' service account must hold permission 'Cloud KMS CryptoKey Encrypter/Decrypter'. The arguments in this group can be used to specify the attributes of this resource.
--kms-key=KMS_KEY
ID of the key or fully qualified identifier for the key.

To set the kms-key attribute:

  • provide the argument --kms-key on the command line.

This flag argument must be specified if any of the other arguments in this group are specified.

--kms-keyring=KMS_KEYRING
The KMS keyring of the key.

To set the kms-keyring attribute:

  • provide the argument --kms-key on the command line with a fully specified name;
  • provide the argument --kms-keyring on the command line.
--kms-location=KMS_LOCATION
The Google Cloud location for the key.

To set the kms-location attribute:

  • provide the argument --kms-key on the command line with a fully specified name;
  • provide the argument --kms-location on the command line.
--kms-project=KMS_PROJECT
The Google Cloud project for the key.

To set the kms-project attribute:

  • provide the argument --kms-key on the command line with a fully specified name;
  • provide the argument --kms-project on the command line;
  • set the property core/project.
At most one of these can be specified:
--no-address
If provided, the instances in the cluster will not be assigned external IP addresses.

If omitted, then the Dataproc service will apply a default policy to determine if each instance in the cluster gets an external IP address or not.

Note: Dataproc VMs need access to the Dataproc API. This can be achieved without external IP addresses using Private Google Access (https://cloud.google.com/compute/docs/private-google-access).

--public-ip-address
If provided, cluster instances are assigned external IP addresses.

If omitted, the Dataproc service applies a default policy to determine whether or not each instance in the cluster gets an external IP address.

Note: Dataproc VMs need access to the Dataproc API. This can be achieved without external IP addresses using Private Google Access (https://cloud.google.com/compute/docs/private-google-access).

At most one of these can be specified:
--single-node
Create a single node cluster.

A single node cluster has all master and worker components. It cannot have any separate worker nodes. If this flag is not specified, a cluster with separate workers is created.

Multi-node cluster flags
--min-num-workers=MIN_NUM_WORKERS
Minimum number of primary worker nodes to provision for cluster creation to succeed.
--num-secondary-workers=NUM_SECONDARY_WORKERS
The number of secondary worker nodes in the cluster.
--num-workers=NUM_WORKERS
The number of worker nodes in the cluster. Defaults to server-specified.
--secondary-worker-type=TYPE; default="preemptible"
The type of the secondary worker group. TYPE must be one of: preemptible, non-preemptible, spot.
GCLOUD WIDE FLAGS
These flags are available to all commands: --access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.

Run $ gcloud help for details.

NOTES
This command is currently in alpha and might change without notice. If this command fails with API permission errors despite specifying the correct project, you might be trying to access an API with an invitation-only early access allowlist. These variants are also available:
gcloud dataproc clusters create
gcloud beta dataproc clusters create