Regional endpoints

Cloud Dataproc supports both a single "global" endpoint and regional endpoints based on Compute Engine zones.

Global Endpoint: The "global" endpoint is a special multi-region namespace that can interact with Cloud Dataproc resources in any user-specified Compute Engine zone.

Regional Endpoints: Each Cloud Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Specifically, you can specify a region, such as "us-east1" or "europe-west1", to isolate the resources (including VM instances and Cloud Storage) and metadata storage locations used by Cloud Dataproc within that region. This is possible because the underlying Cloud Dataproc infrastructure, including the control plane, is deployed in each region. The regional namespace corresponds to the /regions/<region> segment of the Cloud Dataproc resource URIs being referenced.
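
For example, a cluster created in the us-east1 regional namespace is referenced by a resource URI of this form (the project and cluster names here are placeholders):

projects/my-project/regions/us-east1/clusters/my-cluster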

Benefits of regional endpoints:

  • If you use Cloud Dataproc in multiple regions, specifying a regional endpoint can provide better regional isolation and protection.
  • You may see better performance with a regional endpoint, particularly one that is geographically close to you, than with the "global" multi-region namespace.
  • If you specify a regional endpoint when you create a cluster, you do not need to specify a zone within the region; Cloud Dataproc Auto Zone Placement will choose the zone for you (see the example after this list).
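
For example, the following command (the cluster name and region are placeholders) omits a zone, so Auto Zone Placement selects one within us-east1:

gcloud dataproc clusters create my-cluster --region us-east1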

Regional endpoint semantics

Regional endpoint names follow a standard naming convention based on Compute Engine regions. For example, the name for the Central US region is us-central1, and the name of the Western Europe region is europe-west1. Run the gcloud compute regions list command to see a listing of available regions.
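
For example, to list only the region names (the --format flag shown here is optional):

gcloud compute regions list --format="value(name)"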

Using regional endpoints

gcloud

Use the --region flag with the gcloud command-line tool to specify a region or the "global" multi-region endpoint.

gcloud dataproc clusters create cluster-name --region region ...
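
For example, to create a cluster through the "global" multi-region endpoint instead, pass --region global; the cluster name and zone below are placeholders, and a zone is supplied because, as noted in the REST API section, the global endpoint requires one:

gcloud dataproc clusters create my-cluster --region global --zone us-central1-a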

REST API

Use the region URL parameter in a clusters.create request to specify the region or the "global" multi-region endpoint for your cluster. If you use the global endpoint, the zoneUri parameter must be specified in the request body. With a regional endpoint, you can specify a zone or leave zoneUri empty to let Cloud Dataproc Auto Zone Placement select the zone for your cluster.
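
For illustration, a minimal clusters.create request against the us-central1 region might look like the following (the project ID and cluster name are placeholders); zoneUri is left empty so that Auto Zone Placement selects the zone:

POST https://dataproc.googleapis.com/v1/projects/my-project/regions/us-central1/clusters

{
  "projectId": "my-project",
  "clusterName": "my-cluster",
  "config": {
    "gceClusterConfig": {
      "zoneUri": ""
    }
  }
}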

gRPC

The default gRPC endpoint accesses the global multi-region namespace. To use a regional endpoint, set the endpoint address on the client's transport, using the following pattern:

region-dataproc.googleapis.com
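
For example, the address for the us-east1 regional endpoint would be us-east1-dataproc.googleapis.com; the client examples below append :443 for the TLS port.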

Python (google-cloud-python) example:

from google.cloud import dataproc_v1
from google.cloud.dataproc_v1.gapic.transports import cluster_controller_grpc_transport

# Point the gRPC transport at the us-central1 regional endpoint.
transport = cluster_controller_grpc_transport.ClusterControllerGrpcTransport(
    address='us-central1-dataproc.googleapis.com:443')
client = dataproc_v1.ClusterControllerClient(transport)

project_id = 'my-project'
region = 'us-central1'
cluster = {...}  # placeholder for the cluster definition

# With the cluster definition filled in, create the cluster through the
# regional endpoint; this returns a long-running operation.
operation = client.create_cluster(project_id, region, cluster)
result = operation.result()

Java (google-cloud-java) example:

// Configure the client to call the us-central1 regional endpoint.
ClusterControllerSettings settings =
    ClusterControllerSettings.newBuilder()
        .setEndpoint("us-central1-dataproc.googleapis.com:443")
        .build();
try (ClusterControllerClient clusterControllerClient = ClusterControllerClient.create(settings)) {
  String projectId = "my-project";
  String region = "us-central1";
  Cluster cluster = Cluster.newBuilder().build();
  // createClusterAsync returns a long-running operation; get() waits for it to complete.
  Cluster response =
      clusterControllerClient.createClusterAsync(projectId, region, cluster).get();
}

Console

When you use the Google Cloud Platform Console, you can specify a Cloud Dataproc region on the Create a cluster page.
