Regional endpoints

Cloud Dataproc supports both a single "global" endpoint and "regional" endpoints based on Compute Engine regions. Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside that region. Specifically, you can specify a distinct region, such as us-east1 or europe-west1, to isolate resources (including VM instances and Cloud Storage) and the metadata storage locations used by Cloud Dataproc within the user-specified region. This is possible because the underlying infrastructure for Cloud Dataproc, including its control plane, is deployed in each region. The region parameter corresponds to the /regions/<region> segment of the Cloud Dataproc resource URIs being referenced.
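
For example, a cluster in the us-east1 region is referenced by a resource URI of the following form (the project and cluster names here are placeholders):

projects/my-project/regions/us-east1/clusters/my-cluster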

Unless specified, Cloud Dataproc defaults to the "global" region. The "global" region is a special multi-region namespace that can interact with Cloud Dataproc resources in any user-specified Compute Engine zone.

There are some situations where specifying a regional endpoint may be useful:

  • If you use Cloud Dataproc in multiple regions, specifying an explicit regional endpoint may provide better regional isolation and protection.
  • You may see better performance with a specific region, especially one chosen for geographic proximity, than with the default "global" namespace.

Regional endpoint semantics

  • Regional endpoint names follow a standard naming convention based on Compute Engine regions. For example, the name for the Central US region is us-central1, and the name of the Western Europe region is europe-west1. You can run the gcloud compute regions list command to see a list of available regions (an example invocation follows this list).
  • When new regions are added to Compute Engine, they will also become available for use with Cloud Dataproc.
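
For reference, here is one way to print just the region names, using the standard gcloud --format flag (output abbreviated; the regions shown are examples):

gcloud compute regions list --format="value(name)"

asia-east1
europe-west1
us-central1
us-east1
...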

Using regional endpoints

gcloud

You can specify a region when using the gcloud command-line tool by passing the --region flag with your commands.

gcloud dataproc clusters create cluster-name --region region ...

Unless specified, the gcloud command assumes a default of --region global.
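
For example, the following command (the cluster name is a placeholder) creates a cluster whose resources and metadata are isolated to us-east1:

gcloud dataproc clusters create my-cluster --region us-east1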

REST API

You can specify a Cloud Dataproc region through the region path parameter of the Cloud Dataproc REST API. The region parameter is required; set it to the region you want to use, or to global.
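
For example, a clusters.create request against a hypothetical project my-project in the us-central1 region is sent to:

POST https://dataproc.googleapis.com/v1/projects/my-project/regions/us-central1/clusters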

gRPC

The default gRPC endpoint accesses the global multi-region namespace. To use a regional endpoint, set the address on the client's transport to an endpoint that follows this pattern:

region-dataproc.googleapis.com

Python (google-cloud-python) example:

from google.cloud import dataproc_v1
from google.cloud.dataproc_v1.gapic.transports import cluster_controller_grpc_transport

# Point the gRPC transport at the us-central1 regional endpoint.
transport = cluster_controller_grpc_transport.ClusterControllerGrpcTransport(
    address='us-central1-dataproc.googleapis.com:443')
client = dataproc_v1.ClusterControllerClient(transport=transport)

project_id = 'my-project'
region = 'us-central1'  # Must match the region in the endpoint address.
cluster = {...}  # Cluster definition (contents elided).

# Create the cluster through the regional endpoint.
operation = client.create_cluster(project_id, region, cluster)

Java (google-cloud-java) example:

// Point the client at the us-central1 regional endpoint.
ClusterControllerSettings settings =
    ClusterControllerSettings.newBuilder()
        .setEndpoint("us-central1-dataproc.googleapis.com:443")
        .build();
try (ClusterControllerClient clusterControllerClient = ClusterControllerClient.create(settings)) {
  String projectId = "my-project";
  String region = "us-central1"; // Must match the region in the endpoint address.
  Cluster cluster = Cluster.newBuilder().build(); // Cluster definition (details elided).
  // Create the cluster through the regional endpoint and wait for the result.
  Cluster response =
      clusterControllerClient.createClusterAsync(projectId, region, cluster).get();
}

Console

When you use the Google Cloud Platform Console, you specify a Cloud Dataproc region from the Create a cluster page.

