You can create a Dataproc Metastore service using the Google Cloud Console,
the Cloud SDK gcloud command-line tool in a local terminal window or in
Cloud Shell, or the Dataproc Metastore API.
When you create a service, you're required to specify the region for it. See Cloud locations for information on which locations support Dataproc Metastore.
Additional fields include metastore version, network, port, and service tier.
Note that if you don't specify a network, the
default network in
the Dataproc Metastore service project is used.
Dataproc Metastore uses private IP, so only VMs on the same
network can access the Dataproc Metastore service.
Before you begin
gcloud metastore commands require a location. You can specify the location by using the
--location flag or by setting the default location.
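As a minimal sketch of the second option, the function below sets a default location through the metastore/location gcloud configuration property. The function name set_default_metastore_location is a hypothetical helper, not part of the gcloud tool.

```shell
# Set the default location used by `gcloud metastore` commands.
# The function name is hypothetical; metastore/location is the gcloud
# configuration property that stores the default location.
set_default_metastore_location() {
  if [ -z "$1" ]; then
    echo "usage: set_default_metastore_location LOCATION" >&2
    return 1
  fi
  gcloud config set metastore/location "$1"
}
```

For example, `set_default_metastore_location us-central1` would let later gcloud metastore commands omit the --location flag.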
Do not set the org-policy constraint to restrict VPC peering. Specifying
constraints/compute.restrictVpcPeering will cause your creation request to fail with an
INVALID_ARGUMENT error. If you must set the constraint, use the following command to allow
under:folders/270204312590:
gcloud resource-manager org-policies allow compute.restrictVpcPeering under:folders/270204312590 --organization ORGANIZATION_ID
For more information, see Organization policy constraints.
If you'd like to enable Kerberos for your Hive metastore instance, you must:
- Host your own Kerberos Key Distribution Center (KDC).
- Set up IP connectivity between the VPC network and your KDC.
- Set up a Secret Manager secret that contains the contents of a Hive Keytab.
- Specify a principal that is in both the KDC and the Hive Keytab.
- Specify a krb5.conf file in a Google Cloud Storage bucket.
For more information, see Configuring Kerberos.
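The two storage-related prerequisites above could be staged roughly as follows. This is a sketch under assumptions: the function name, the argument order, the automatic replication policy, and the krb5.conf object name in the bucket are illustrative choices, and the KDC and network connectivity still have to be set up separately.

```shell
# Stage the Kerberos inputs for a Dataproc Metastore service.
# Arguments (all hypothetical names): SECRET_ID KEYTAB_FILE KRB5_CONF_FILE BUCKET
stage_kerberos_inputs() {
  if [ "$#" -ne 4 ]; then
    echo "usage: stage_kerberos_inputs SECRET_ID KEYTAB_FILE KRB5_CONF_FILE BUCKET" >&2
    return 1
  fi
  # Store the keytab contents as a new Secret Manager secret.
  gcloud secrets create "$1" \
      --replication-policy=automatic \
      --data-file="$2" || return 1
  # Copy the krb5.conf file into the Cloud Storage bucket.
  gsutil cp "$3" "gs://$4/krb5.conf"
}
```

A call such as `stage_kerberos_inputs hive-keytab ./hive.keytab ./krb5.conf my-bucket` would create the secret and upload the config file.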
To create a Dataproc Metastore service that is accessible in a network belonging to a different project than the one the service belongs to, you must grant
roles/metastore.serviceAgent to the service project's Dataproc Metastore service agent (
service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com) in the network project's IAM policy.
gcloud projects add-iam-policy-binding NETWORK_PROJECT_ID \
    --role "roles/metastore.serviceAgent" \
    --member "serviceAccount:service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com"
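The command above needs the service project's numeric project number rather than its ID. One possible way to wire the lookup and the grant together is sketched below; the two function names are hypothetical helpers, and the sketch assumes you have permission to describe the service project.

```shell
# Print the Dataproc Metastore service agent for a service project number.
metastore_service_agent() {
  echo "service-$1@gcp-sa-metastore.iam.gserviceaccount.com"
}

# Grant roles/metastore.serviceAgent in the network project's IAM policy.
# Arguments: NETWORK_PROJECT_ID SERVICE_PROJECT_ID
grant_metastore_service_agent() {
  # Look up the numeric project number of the service project.
  project_number="$(gcloud projects describe "$2" \
      --format="value(projectNumber)")" || return 1
  gcloud projects add-iam-policy-binding "$1" \
      --role "roles/metastore.serviceAgent" \
      --member "serviceAccount:$(metastore_service_agent "$project_number")"
}
```

Usage would look like `grant_metastore_service_agent my-network-project my-service-project`.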
To create a service, you must be granted an IAM role containing the
metastore.services.create IAM permission. The Dataproc Metastore-specific role
roles/metastore.editor can be used to grant create permission.
You can also give create permission to users or groups by using the
For more information, see Dataproc Metastore IAM and access control.
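As an illustration, granting roles/metastore.editor at the project level might look like the sketch below. The function name is a hypothetical helper, and binding the role on the project (rather than at another level of the resource hierarchy) is an assumption.

```shell
# Grant roles/metastore.editor, which carries the create permission.
# Arguments: PROJECT_ID MEMBER (for example user:alice@example.com)
grant_metastore_editor() {
  if [ "$#" -ne 2 ]; then
    echo "usage: grant_metastore_editor PROJECT_ID MEMBER" >&2
    return 1
  fi
  gcloud projects add-iam-policy-binding "$1" \
      --member="$2" \
      --role="roles/metastore.editor"
}
```

For instance, `grant_metastore_editor my-project group:data-eng@example.com` would grant the role to a group.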
Creating a Dataproc Metastore service
The following instructions demonstrate how to create a Dataproc Metastore
service using the Google Cloud Console, the
gcloud tool, or the
Dataproc Metastore API.
In the Cloud Console, open the Dataproc Metastore page:
At the top of the Dataproc Metastore page, click the Create button. The Create service page opens.
Enter a unique name for your service in the Service name field. For information on the naming convention, see Resource naming convention.
Select the Data location.
Select the Hive Metastore version. If not specified, Hive version
2.3.6 is used. For more information, see Version policy.
Select the Release channel. If not specified,
Stable is used. For more information, see Release channel.
Enter the Port. This is the TCP port at which the Dataproc Metastore Thrift interface is available. If not provided, port number 9083 is used.
Select the Service tier. This influences the capacity of the service.
Developer is the default tier. It suits low-cost proof-of-concept work, but provides limited scalability and no fault tolerance.
The Enterprise tier provides flexible scalability, fault tolerance, and multi-zone high availability. It can handle heavy Dataproc Metastore workloads.
Select the Network. The service must be attached to the same network as the Metastore clients that need to access it, such as a Dataproc cluster. If not provided, the
default network is used.
Optional: Click to Use shared VPC network and enter the Project ID and VPC network name of the shared VPC network. For more information, see VPC Service Controls with Dataproc Metastore.
Optional: Enable Data Catalog sync to sync the Dataproc Metastore service to Data Catalog. For more information, see Dataproc Metastore to Data Catalog sync.
Optional: Select the Day of week and Hour of day for the service's maintenance window. For more information, see Maintenance windows.
Optional: Enable a Kerberos keytab file:
Click the toggle to enable Kerberos.
Select or enter your secret resource ID.
Choose the latest secret version or select an older one.
Enter the Kerberos principal. This is the principal allocated for this Dataproc Metastore service.
Browse to the krb5 config file.
Optional: Click + Add Overrides to apply a mapping to the Hive Metastore.
Optional: Click + Add Labels to add additional metadata to the metastore service resource.
Click the Submit button to create and start the service.
Verify that you have returned to the Dataproc Metastore page, and that your new service appears in the list.
Use the following
gcloud metastore services create command to create a service:
gcloud metastore services create SERVICE \
    --location=LOCATION \
    --labels=k1=v1,k2=v2,k3=v3 \
    --network=NETWORK \
    --port=PORT \
    --tier=TIER \
    --hive-metastore-version=HIVE_METASTORE_VERSION \
    --release-channel=RELEASE_CHANNEL \
    --hive-metastore-configs=K1=V1,K2=V2 \
    --kerberos-principal=KERBEROS_PRINCIPAL \
    --krb5-config=KRB5_CONFIG \
    --keytab=CLOUD_SECRET
Replace the following:
SERVICE: The name of the new service.
LOCATION: Refers to a Google Cloud region.
k1=v1,k2=v2,k3=v3: The labels used.
NETWORK: The name of the VPC network on which the service can be accessed. When using a VPC network belonging to a different project than the service, the entire relative resource name must be provided, for example
PORT: The TCP port at which the metastore Thrift interface is available. Default: 9083.
TIER: The tier capacity of the new service.
HIVE_METASTORE_VERSION: The versions of Hive metastore that can be used when creating a new metastore service in this location. The server guarantees that exactly one
HiveMetastoreVersion in the list will set
RELEASE_CHANNEL: The release channel of the service.
K1=V1,K2=V2: Optional: The Hive metastore configs used.
KERBEROS_PRINCIPAL: Optional: A Kerberos principal that exists in both the keytab and the KDC. A typical principal is of the form "primary/instance@REALM", but there is no exact format.
KRB5_CONFIG: Optional: The krb5.conf file, which specifies the KDC and Kerberos realm information, including the locations of KDCs and defaults for the realm and Kerberos applications.
CLOUD_SECRET: Optional: The relative resource name of a Secret Manager secret version.
Verify that the creation was successful.
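One way to combine creation and verification is sketched below. The function name create_metastore_service is hypothetical; any flags after the location are passed through unchanged to the create command. The sketch assumes that printing the service's state field is an adequate check, with ACTIVE indicating that the service is ready.

```shell
# Create a service, then print its state as a basic verification step.
# Arguments: SERVICE LOCATION [extra flags for `services create`...]
create_metastore_service() {
  if [ "$#" -lt 2 ]; then
    echo "usage: create_metastore_service SERVICE LOCATION [FLAGS...]" >&2
    return 1
  fi
  service="$1"
  location="$2"
  shift 2
  # Remaining arguments are forwarded as-is, for example --port=9083.
  gcloud metastore services create "$service" \
      --location="$location" "$@" || return 1
  # Print only the state field of the newly created service.
  gcloud metastore services describe "$service" \
      --location="$location" \
      --format="value(state)"
}
```

For example, `create_metastore_service my-service us-central1 --port=9083` would create the service with an explicit port and then print its state.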
Follow the API instructions to create a service by using the APIs Explorer.
Using non-RFC 1918 private IP address ranges
The provided VPC network may run out of available RFC 1918 addresses required by Dataproc Metastore services. If that happens, Dataproc Metastore will attempt to reserve private IP address ranges outside of the RFC 1918 ranges for service creation. Please see Valid ranges in the VPC network documentation for a list of supported non-RFC 1918 private ranges.
Non-RFC 1918 private IP addresses used in Dataproc Metastore may conflict with a range in an on-premises network that is connected to the provided VPC network. To check the list of RFC 1918 and non-RFC 1918 private IP addresses reserved by Dataproc Metastore:
gcloud compute addresses list \
    --project NETWORK_PROJECT_ID \
    --filter="purpose:VPC_PEERING AND name ~ cluster|resourcegroup"
If you find a conflict that cannot be mitigated by reconfiguring the on-premises network, delete the offending Dataproc Metastore service and re-create it after 2 hours.
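Deciding whether a reserved range conflicts with an on-premises range comes down to an interval-overlap test on the two CIDR blocks. The sketch below shows that test in plain shell arithmetic for IPv4. The function names are hypothetical, and the ranges in the usage note are arbitrary examples rather than values taken from a real network.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs="$IFS"
  IFS=.
  set -- $1            # split the address into its four octets
  IFS="$old_ifs"
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address (as integers) of a CIDR block.
cidr_bounds() {
  base="$(ip_to_int "${1%/*}")"
  prefix="${1#*/}"
  size=$(( 1 << (32 - prefix) ))
  first=$(( base / size * size ))   # align the base to the block boundary
  echo "$first $(( first + size - 1 ))"
}

# Succeed (exit status 0) when the two CIDR blocks share any address.
cidrs_overlap() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  # $1/$2 bound the first block, $3/$4 bound the second.
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For example, `cidrs_overlap 240.0.0.0/4 240.10.0.0/16 && echo conflict` prints conflict, while the same check for 10.0.0.0/8 and 192.168.0.0/16 prints nothing.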
After you create a Dataproc Metastore service
- Learn more about attaching a Dataproc cluster.
- Learn more about updating and deleting a service.
- Learn more about importing metadata into a service.