Dataproc supports regional endpoints based on Compute Engine regions. You must specify a region, such as "us-east1" or "europe-west1", when you create a Dataproc cluster. Dataproc isolates cluster resources, such as VM instances, Cloud Storage, and metadata storage, within a zone in the specified region.
You can optionally specify a zone within the cluster region, such as "us-east1-a" or "europe-west1-b", when you create a cluster. If you do not specify a zone, Dataproc Auto Zone Placement chooses a zone within the specified region in which to locate cluster resources.
The regional namespace corresponds to the /regions/REGION segment of Dataproc resource URIs (see, for example, the cluster networkUri).
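As a minimal sketch of how the /regions/REGION segment can be read back out of a resource URI, the following Python helper extracts the region with a regular expression. The function name region_from_uri and the sample project and cluster names are hypothetical and used only for illustration.

```python
import re
from typing import Optional

# Matches the "/regions/REGION" segment of a resource URI. The leading slash
# is optional so that relative names such as "projects/p/regions/r/..." match.
_REGION_SEGMENT = re.compile(r"(?:^|/)regions/([a-z0-9-]+)(?:/|$)")


def region_from_uri(resource_uri: str) -> Optional[str]:
    """Return the region named in the URI, or None if there is no such segment."""
    match = _REGION_SEGMENT.search(resource_uri)
    return match.group(1) if match else None


# Hypothetical resource name used only to exercise the helper.
print(region_from_uri("projects/my-project/regions/us-central1/clusters/my-cluster"))
```

A URI without a /regions/ segment, such as one for a global resource, yields None rather than raising an error.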
Regional endpoint semantics
Regional endpoint names follow a standard naming convention based on Compute Engine regions. For example, the name for the Central US region is us-central1, and the name of the Western Europe region is europe-west1. Run the gcloud compute regions list command to see a listing of available regions.
Create a cluster
gcloud
When you create a cluster, specify a region using the required --region flag.
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    other args ...
REST API
Use the REGION URL parameter in a clusters.create request to specify the cluster region.
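To illustrate where the REGION parameter sits in the request URL, here is a small Python sketch that assembles a clusters.create URL. The v1 path layout shown is an assumption based on the public REST reference, and the function name and sample project ID are hypothetical. The sketch only builds the string and does not send a request or handle authentication.

```python
def clusters_create_url(project_id: str, region: str) -> str:
    """Build the URL that a clusters.create POST request would target.

    Assumes the v1 REST layout projects/{project}/regions/{region}/clusters.
    """
    return (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/clusters"
    )


# Hypothetical project ID used only for illustration.
print(clusters_create_url("my-project", "us-central1"))
```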
gRPC
Set the client transport address to the regional endpoint using the following pattern:
REGION-dataproc.googleapis.com
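The pattern can be assembled from a region name with a one-line helper. This is only a sketch: the function name regional_endpoint is hypothetical, and the default port of 443 is taken from the client examples on this page.

```python
def regional_endpoint(region: str, port: int = 443) -> str:
    """Return the host:port address for a region, following REGION-dataproc.googleapis.com."""
    return f"{region}-dataproc.googleapis.com:{port}"


print(regional_endpoint("us-central1"))
```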
Python (google-cloud-python) example:
from google.cloud import dataproc_v1
from google.cloud.dataproc_v1.gapic.transports import cluster_controller_grpc_transport

# Point the gRPC transport at the regional endpoint.
transport = cluster_controller_grpc_transport.ClusterControllerGrpcTransport(
    address='us-central1-dataproc.googleapis.com:443')
client = dataproc_v1.ClusterControllerClient(transport)

# The region passed with requests should match the endpoint's region.
project_id = 'my-project'
region = 'us-central1'
cluster = {...}
Java (google-cloud-java) example:
ClusterControllerSettings settings =
    ClusterControllerSettings.newBuilder()
        .setEndpoint("us-central1-dataproc.googleapis.com:443")
        .build();
try (ClusterControllerClient clusterControllerClient = ClusterControllerClient.create(settings)) {
  String projectId = "my-project";
  String region = "us-central1";
  Cluster cluster = Cluster.newBuilder().build();
  Cluster response =
      clusterControllerClient.createClusterAsync(projectId, region, cluster).get();
}
Console
Specify a Dataproc region in the Location section of the Set up cluster panel on the Dataproc Create a cluster page in the Google Cloud console.
What's next
- Geography and Regions
- Compute Engine→Regions and Zones
- Compute Engine→Global, Regional, and Zonal Resources
- Dataproc Auto Zone Placement