Regional endpoints

Dataproc supports both a single "global" endpoint and regional endpoints based on Compute Engine zones.

Global Endpoint: The "global" endpoint is a special multi-region namespace that is capable of interacting with Dataproc resources in any user-specified Compute Engine zone.

Regional Endpoints: Each Dataproc region is an independent resource namespace that is constrained to deploying clusters into Compute Engine zones within that region. You can specify a distinct region, such as "us-east1" or "europe-west1", to isolate the resources (including VM instances and Cloud Storage) and metadata storage locations that Dataproc uses within the specified region. This isolation is possible because Dataproc's underlying infrastructure, including its control plane, is deployed in each region. The regional namespace corresponds to the /regions/<region> segment of Dataproc resource URIs.
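As an illustration of the namespace, a cluster's resource URI embeds its region. The small helper below is an illustrative sketch (not part of the Dataproc client library) that builds such a URI:

```python
def cluster_uri(project_id, region, cluster_name):
    """Build the resource URI for a Dataproc cluster in a given
    regional (or the "global") namespace."""
    return f"projects/{project_id}/regions/{region}/clusters/{cluster_name}"

# A cluster in the us-east1 regional namespace:
print(cluster_uri("my-project", "us-east1", "my-cluster"))
# projects/my-project/regions/us-east1/clusters/my-cluster
```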

Benefits of regional endpoints:

  • If you use Dataproc in multiple regions, specifying a regional endpoint can provide better regional isolation and protection.
  • You may see better performance by selecting a regional endpoint that is geographically close to you, compared to the "global" multi-region namespace.
  • If you specify a regional endpoint when you create a cluster, you do not need to specify a zone within the region; Dataproc Auto Zone Placement will choose the zone for you.
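To illustrate the last point, a cluster request can omit the zone and let Auto Zone Placement choose one. The minimal sketch below (using hypothetical project and cluster names; field names follow the v1 Cluster resource, in Python snake_case form) simply leaves zone_uri unset:

```python
# Minimal cluster spec sketch: leaving "zone_uri" unset lets
# Dataproc Auto Zone Placement choose a zone in the request's region.
cluster = {
    "project_id": "my-project",    # illustrative project ID
    "cluster_name": "my-cluster",  # illustrative cluster name
    "config": {
        "gce_cluster_config": {
            # "zone_uri" intentionally omitted: Auto Zone Placement applies
        }
    },
}
```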

Regional endpoint semantics

Regional endpoint names follow a standard naming convention based on Compute Engine regions. For example, the name for the Central US region is us-central1, and the name of the Western Europe region is europe-west1. Run the gcloud compute regions list command to see a listing of available regions.

Using regional endpoints

gcloud

Specify a region or the "global" multi-region endpoint using the gcloud command-line tool's --region flag.

gcloud dataproc clusters create cluster-name --region region ...

REST API

Use the region URL parameter in a clusters.create request to specify the region or "global" multi-region endpoint for your cluster. The zoneUri parameter must be specified in the request body when using the global endpoint. With a regional endpoint, you can specify the zone or leave it empty to allow Dataproc Auto Zone Placement to select the zone for your cluster.
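For reference, the clusters.create request URL embeds the region path parameter. This sketch (illustrative, with hypothetical project and region values) shows how the request URL is formed:

```python
def clusters_create_url(project_id, region):
    """Build the REST URL for a Dataproc clusters.create request.

    The `region` path parameter selects the regional (or "global")
    endpoint namespace."""
    return (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/clusters"
    )

print(clusters_create_url("my-project", "us-central1"))
# https://dataproc.googleapis.com/v1/projects/my-project/regions/us-central1/clusters
```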

gRPC

The default gRPC endpoint accesses the global multi-region namespace. To use a regional endpoint instead, set the address on the client's transport using the following pattern:

region-dataproc.googleapis.com
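The pattern can be expressed as a one-line helper (an illustrative sketch; port 443 is the standard TLS port used in the client examples below):

```python
def regional_endpoint(region, port=443):
    """Return the regional Dataproc gRPC endpoint address for a region."""
    return f"{region}-dataproc.googleapis.com:{port}"

print(regional_endpoint("us-central1"))
# us-central1-dataproc.googleapis.com:443
```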

Python (google-cloud-python) example:

from google.cloud import dataproc_v1
from google.cloud.dataproc_v1.gapic.transports import cluster_controller_grpc_transport

transport = cluster_controller_grpc_transport.ClusterControllerGrpcTransport(
    address='us-central1-dataproc.googleapis.com:443')
client = dataproc_v1.ClusterControllerClient(transport)

project_id = 'my-project'
region = 'us-central1'
cluster = {...}  # cluster configuration

# Create the cluster through the regional endpoint configured above.
response = client.create_cluster(project_id, region, cluster)

Java (google-cloud-java) example:

ClusterControllerSettings settings =
    ClusterControllerSettings.newBuilder()
        .setEndpoint("us-central1-dataproc.googleapis.com:443")
        .build();
try (ClusterControllerClient clusterControllerClient = ClusterControllerClient.create(settings)) {
  String projectId = "my-project";
  String region = "us-central1";
  Cluster cluster = Cluster.newBuilder().build();
  Cluster response =
      clusterControllerClient.createClusterAsync(projectId, region, cluster).get();
}

Console

When you use the Google Cloud Console, you specify a Dataproc region from the Create a cluster page.
