Dataproc Auto Zone placement

When you create a Dataproc cluster, cluster resources use a regional endpoint, and cluster nodes are placed in a Compute Engine zone within that region. When you choose a region, you can select a zone within that region, or you can omit the zone to have the Dataproc Auto Zone feature select a zone for you in the region you choose. Once a zone is selected, all nodes for that cluster are deployed to that zone.

Auto Zone and resource reservations

Auto Zone prioritizes creating a cluster in a zone with resource reservations, as follows:

  • If the requested cluster resources can be fully satisfied in a zone by reserved resources plus, if necessary, on-demand resources, Auto Zone consumes those reserved and on-demand resources and creates the cluster in that zone.

  • Auto Zone prioritizes zones for selection according to total CPU core (vCPU) reservations in a zone.

    Example: A cluster creation request specifies 20 n2-standard-2 and 1 n2-standard-64 machines (40 + 64 = 104 vCPUs requested). Auto Zone will prioritize the following zones for selection according to the total vCPU reservations available in each zone:

    1. zone-c available reservations: 3 n2-standard-2 and 1 n2-standard-64 (70 vCPUs)
    2. zone-b available reservations: 1 n2-standard-64 (64 vCPUs)
    3. zone-a available reservations: 25 n2-standard-2 (50 vCPUs)

      Assuming each of the above zones has additional on-demand vCPU and other resources sufficient to satisfy the cluster request, Auto Zone will select zone-c for cluster creation because it has the largest total vCPU reservation (70 vCPUs).

  • If requested cluster resources cannot be fully satisfied by reserved plus on-demand resources in a zone, Auto Zone will create the cluster in a zone that is most likely to satisfy the request using on-demand resources.
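
The ranking in the example above can be reproduced with a short Python sketch. This is a simplified model of the documented ordering only, not Dataproc's actual implementation: the VCPUS table and the total_vcpus and rank_zones helpers are hypothetical names introduced for illustration, and the sketch ignores on-demand capacity and non-CPU resources.

```python
# Simplified model of the prioritization described above.
# Not Dataproc's implementation; all names here are illustrative.

# vCPUs per machine type used in the example.
VCPUS = {"n2-standard-2": 2, "n2-standard-64": 64}


def total_vcpus(machines):
    """Sum vCPUs over a {machine_type: count} mapping."""
    return sum(VCPUS[mtype] * count for mtype, count in machines.items())


def rank_zones(reservations):
    """Return zone names ordered by total reserved vCPUs, highest first."""
    return sorted(
        reservations,
        key=lambda zone: total_vcpus(reservations[zone]),
        reverse=True,
    )


# The cluster request and per-zone reservations from the example.
request = {"n2-standard-2": 20, "n2-standard-64": 1}
reservations = {
    "zone-a": {"n2-standard-2": 25},
    "zone-b": {"n2-standard-64": 1},
    "zone-c": {"n2-standard-2": 3, "n2-standard-64": 1},
}

ranking = rank_zones(reservations)
print(total_vcpus(request))  # 104
print(ranking)               # ['zone-c', 'zone-b', 'zone-a']
```

The first entry in the ranking, zone-c, matches the zone selected in the example, on the stated assumption that every zone has enough on-demand capacity to cover the remainder of the request.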

Using Auto Zone placement

Console

To create a Dataproc cluster that uses Auto Zone placement:

  • In the Google Cloud console, open the Dataproc "Create a Dataproc cluster on Compute Engine" page. The "Set up cluster" panel is selected.
  • In the Location section:
    • Select a Region for your cluster.
    • Under Zone, select "Any".

gcloud command

To create a Dataproc cluster that uses Auto Zone placement, use the gcloud dataproc clusters create command. Set the --region flag to a region, and omit the --zone flag (or leave the flag empty: --zone= or --zone="").

gcloud dataproc clusters create cluster-name \
    --region=region \
    --zone="" \
    other args ...

REST API

To create a Dataproc cluster that uses Auto Zone placement, construct a JSON clusters.create API request, leaving the gceClusterConfig.zoneUri field empty. In the REST endpoint URL, https://dataproc.googleapis.com/v1/projects/projectId/regions/region/clusters, replace region with a region name. Dataproc Auto Zone will choose a zone for the cluster within the specified region.

Use short resource names with Auto Zone placement: When specifying a resource URI, such as machineTypeUri or acceleratorTypeUri, in an Auto Zone placement REST API cluster creation request, use a short resource name without a zone specification, for example, "n1-standard-2" or "nvidia-tesla-k80".
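
As an illustration of the two points above, the following Python sketch assembles the endpoint URL and a minimal JSON request body with an empty zoneUri and short machine type names. It only builds and prints the request; sending it (with authentication) is left out. The build_create_request helper, the project, region, and cluster names, and the instance counts are placeholders chosen for this example, not values from this page.

```python
import json


def build_create_request(project_id, region, cluster_name):
    """Return (url, body) for a clusters.create request that leaves
    zone selection to Auto Zone. Hypothetical helper for illustration."""
    url = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/clusters"
    )
    body = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            # Empty zoneUri: Auto Zone picks a zone in the region.
            "gceClusterConfig": {"zoneUri": ""},
            # Short machine type names, with no zone in the URI.
            "masterConfig": {
                "numInstances": 1,
                "machineTypeUri": "n1-standard-2",
            },
            "workerConfig": {
                "numInstances": 2,
                "machineTypeUri": "n1-standard-2",
            },
        },
    }
    return url, body


# Placeholder project, region, and cluster names.
url, body = build_create_request("my-project", "us-central1", "my-cluster")
print(url)
print(json.dumps(body, indent=2))
```

Because no zone is known when the request is built, a full zonal URI for machineTypeUri would have nothing valid to reference, which is why the short names are used here.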