Dataproc supports both a single "global" endpoint and regional endpoints based on Compute Engine regions.
Global Endpoint: The "global" endpoint is a special multi-region namespace that is capable of interacting with Dataproc resources in any user-specified Compute Engine zone.
Regional Endpoints: Each Dataproc region constitutes an independent resource namespace constrained to deploying instances into Compute Engine zones inside the region. Specifically, you can specify distinct regions, such as "us-east1" or "europe-west1", to isolate resources (including VM instances and Cloud Storage) and metadata storage locations used by Dataproc within the user-specified region. This is possible because the underlying infrastructure for Dataproc, including its control plane, is deployed in each region. The regional namespace corresponds to the /regions/&lt;region&gt; segment of Dataproc resource URIs.
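For example, a cluster created through the us-central1 regional endpoint is referenced by a URI of the following form ("my-project" and "my-cluster" are placeholder names):

projects/my-project/regions/us-central1/clusters/my-cluster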
Benefits of regional endpoints:
- If you use Dataproc in multiple regions, specifying a regional endpoint can provide better regional isolation and protection.
- You may see better performance with a regional endpoint than with the "global" multi-region namespace, particularly when the region is geographically close to you.
- If you specify a regional endpoint when you create a cluster, you do not need to specify a zone within the region. Dataproc Auto Zone Placement will choose the zone for you.
Regional endpoint semantics
Regional endpoint names follow a standard naming convention based on Compute Engine regions. For example, the name for the Central US region is us-central1, and the name of the Western Europe region is europe-west1. Run the gcloud compute regions list command to see a listing of available regions.
Using regional endpoints
gcloud
Specify a region or the "global" multi-region endpoint using the gcloud command-line tool with the --region flag.

```
gcloud dataproc clusters create cluster-name \
    --region=region \
    other args ...
```
REST API
Use the region URL parameter in a clusters.create request to specify the region or "global" multi-region endpoint for your cluster. The zoneUri parameter must be specified in the request body for a global endpoint. For a regional endpoint, you can specify the zone or leave it empty to allow Dataproc Auto Zone Placement to select the zone for your cluster.
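For illustration, here is a minimal sketch of such a request using the google-auth Python library (this example is not part of the source docs; the project, region, and cluster names are placeholders, and zoneUri is left empty to demonstrate Auto Zone Placement):

```python
# Minimal sketch of a clusters.create REST request; assumes application
# default credentials are configured (e.g. via `gcloud auth application-default login`).
import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
session = AuthorizedSession(credentials)

project_id = "my-project"   # placeholder project
region = "us-central1"      # the `region` URL parameter

body = {
    "projectId": project_id,
    "clusterName": "my-cluster",                       # placeholder name
    "config": {"gceClusterConfig": {"zoneUri": ""}},   # empty: Auto Zone Placement
}

# clusters.create: POST /v1/projects/{projectId}/regions/{region}/clusters
response = session.post(
    f"https://dataproc.googleapis.com/v1/projects/{project_id}"
    f"/regions/{region}/clusters",
    json=body,
)
print(response.json())
```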
gRPC
The default gRPC endpoint accesses the global multi-region namespace. To use a regional endpoint, set the endpoint address on the client's transport, using the following pattern:
region-dataproc.googleapis.com
Python (google-cloud-python) example:
```python
from google.cloud import dataproc_v1
from google.cloud.dataproc_v1.gapic.transports import cluster_controller_grpc_transport

# Point the gRPC transport at the us-central1 regional endpoint.
transport = cluster_controller_grpc_transport.ClusterControllerGrpcTransport(
    address='us-central1-dataproc.googleapis.com:443')
client = dataproc_v1.ClusterControllerClient(transport)

project_id = 'my-project'
region = 'us-central1'
cluster = {...}  # cluster config (elided here)
```
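With the regional transport configured, the cluster is created the same way as with the default endpoint. A minimal sketch of the create step, assuming the cluster placeholder above is replaced with a valid cluster config:

```python
# Kick off cluster creation through the regional endpoint; create_cluster
# returns a long-running operation.
operation = client.create_cluster(project_id, region, cluster)
result = operation.result()  # blocks until the cluster is ready
print(result.cluster_name)
```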
Java (google-cloud-java) example:
```java
ClusterControllerSettings settings =
    ClusterControllerSettings.newBuilder()
        .setEndpoint("us-central1-dataproc.googleapis.com:443")
        .build();
try (ClusterControllerClient clusterControllerClient = ClusterControllerClient.create(settings)) {
  String projectId = "my-project";
  String region = "us-central1";
  Cluster cluster = Cluster.newBuilder().build();
  Cluster response =
      clusterControllerClient.createClusterAsync(projectId, region, cluster).get();
}
```
Console
Specify a Dataproc region in the Location section of the Set up cluster panel of the Dataproc Create a cluster page in the Cloud Console.
What's next
- Geography and Regions
- Compute Engine→Regions and Zones
- Compute Engine→Global, Regional, and Zonal Resources
- Dataproc Auto Zone Placement