To create a Dataproc Metastore service, enter the service parameters on the
Create service page in the Google Cloud Console, use the
gcloud tool, or
call the Dataproc Metastore API services.create method.
When you create a service, you're required to specify the region for it. See Cloud locations for information on which locations support Dataproc Metastore.
Additional fields include metastore version, network, port, and service tier.
If you don't specify a network, Dataproc Metastore uses the
default network in the service project.
Dataproc Metastore uses private IP, so only VMs on the same
network can access the Dataproc Metastore service.
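The network rules above can be sketched as a small helper that decides which network value to use for a new service. This is a minimal sketch, not part of any Google Cloud library: the function name network_argument is hypothetical, and the projects/PROJECT/global/networks/NETWORK form for a network in another project is an assumption based on the standard Compute Engine relative resource name.

```python
from typing import Optional


def network_argument(network: Optional[str] = None,
                     network_project: Optional[str] = None) -> str:
    """Return the network value for a new service (hypothetical helper).

    - No network given: fall back to the service project's "default" network.
    - Network in another project (Shared VPC): use the full relative resource
      name, assumed here to be projects/PROJECT/global/networks/NETWORK.
    - Otherwise: the plain network name in the service project.
    """
    name = network or "default"
    if network_project:
        return f"projects/{network_project}/global/networks/{name}"
    return name


print(network_argument())                          # default
print(network_argument("prod-vpc"))                # prod-vpc
print(network_argument("shared", "host-project"))  # projects/host-project/global/networks/shared
```

Whichever network you end up with, only VMs attached to that network can reach the service, because the service uses private IP addresses.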
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
- Enable the Dataproc Metastore API.
gcloud metastore commands require a location. You can specify the location by using the
--location flag or by setting the default location.
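The lookup order can be modeled as follows: an explicit --location flag wins, otherwise the default location is used, otherwise the command can't proceed. The sketch below is a hypothetical model of that rule, not gcloud's own code; the function name resolve_location is illustrative. In practice, the default is set with gcloud config set metastore/location LOCATION.

```python
from typing import Optional


def resolve_location(flag_value: Optional[str],
                     default_location: Optional[str]) -> str:
    """Model of how a gcloud metastore command picks its location (hypothetical)."""
    if flag_value:
        # --location=LOCATION on the command line takes precedence.
        return flag_value
    if default_location:
        # For example, set with: gcloud config set metastore/location LOCATION
        return default_location
    raise ValueError("No location: pass --location or set a default location")


print(resolve_location("us-central1", "europe-west1"))  # us-central1
print(resolve_location(None, "europe-west1"))           # europe-west1
```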
Don't set the org-policy constraint that restricts VPC peering. Specifying
constraints/compute.restrictVpcPeering causes your creation request to fail with an
INVALID_ARGUMENT error. If you must set the constraint, use the following command to allow
under:folders/270204312590:
gcloud resource-manager org-policies allow compute.restrictVpcPeering under:folders/270204312590 --organization ORGANIZATION_ID
For more information, see Organization policy constraints.
If you'd like to enable Kerberos for your Hive metastore instance, you must:
- Host your own Kerberos Key Distribution Center (KDC).
- Set up IP connectivity between the VPC network and your KDC.
- Set up a Secret Manager secret that contains the contents of a Hive Keytab.
- Specify a principal that is in both the KDC and the Hive Keytab.
- Specify a krb5.conf file in a Google Cloud Storage bucket.
For more information, see Configuring Kerberos.
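Before you create the service, it can help to sanity-check the principal you plan to specify. Kerberos principals commonly take the form primary/instance@REALM, but the instance and realm parts aren't always present. The hypothetical parse_principal helper below only splits a principal into those parts; it doesn't check that the principal actually exists in your KDC or in the Hive keytab.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: Optional[str]


def parse_principal(principal: str) -> Principal:
    """Split a principal of the common form primary/instance@REALM (hypothetical helper).

    The instance and realm are optional because principals have no single
    mandatory format.
    """
    name, _, realm = principal.partition("@")
    primary, _, instance = name.partition("/")
    if not primary:
        raise ValueError(f"missing primary component: {principal!r}")
    return Principal(primary, instance or None, realm or None)


print(parse_principal("hive/metastore.example.com@EXAMPLE.COM"))
```

A check such as parse_principal(p).realm is not None is one inexpensive way to catch a principal that is missing its realm before the create request is sent.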
Set up Shared VPC
To create a Dataproc Metastore service that is accessible in a network belonging to a different project than the one the service belongs to, you must grant
roles/metastore.serviceAgent to the service project's Dataproc Metastore service agent
(service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com) in the network project's IAM policy:
gcloud projects add-iam-policy-binding NETWORK_PROJECT_ID \
    --role "roles/metastore.serviceAgent" \
    --member "serviceAccount:service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com"
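The member in that binding is derived from the service project's number, not its project ID. The sketch below builds the service agent address and the same binding command as an argument list. The function names and example values are hypothetical, and the command is only built here, not run.

```python
from typing import List


def metastore_service_agent(service_project_number: int) -> str:
    """Email of the service project's Dataproc Metastore service agent."""
    return f"service-{service_project_number}@gcp-sa-metastore.iam.gserviceaccount.com"


def iam_binding_command(network_project_id: str, service_project_number: int) -> List[str]:
    """argv for the add-iam-policy-binding command shown above (hypothetical helper)."""
    member = f"serviceAccount:{metastore_service_agent(service_project_number)}"
    return [
        "gcloud", "projects", "add-iam-policy-binding", network_project_id,
        "--role", "roles/metastore.serviceAgent",
        "--member", member,
    ]


print(" ".join(iam_binding_command("my-network-project", 123456789012)))
```

Assuming gcloud is installed and authenticated, subprocess.run(iam_binding_command(...), check=True) would run the command. Passing an argument list rather than a single string avoids shell quoting issues around the member value.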
To create a service, you must be granted an IAM role containing the
metastore.services.create IAM permission. Dataproc Metastore-specific roles such as
roles/metastore.editor include the create permission.
You can give create permission to users or groups by using the
For more information, see Dataproc Metastore IAM and access control.
Creating a Dataproc Metastore service
The following instructions demonstrate how to create a Dataproc Metastore service.
In the Cloud Console, open the Dataproc Metastore page.
At the top of the Dataproc Metastore page, click the Create button. The Create service page opens.
In the Service name field, enter a unique name for your service. For information on the naming convention, see Resource naming convention.
Select the Data location.
Select the Hive Metastore version. If not specified, Hive version
3.1.2 is used. For more information, see Version policy.
Select the Release channel. If not specified,
Stable is used. For more information, see Release channel.
Enter the Port. This is the TCP port at which the Dataproc Metastore Thrift interface is available. If not provided, port number 9083 is used.
Select the Service tier. The tier determines the capacity of the service.
Developer is the default tier. It's suited to low-cost proofs of concept, but it provides limited scalability and no fault tolerance.
The Enterprise tier provides flexible scalability, fault tolerance, and multi-zone high availability, and it can handle heavy Dataproc Metastore workloads.
Select the Network. The service must be attached to the same network as the other metastore clients that need to access it, such as a Dataproc cluster. If not provided, the
default network is used.
Optional: Click to Use shared VPC network and enter the Project ID and VPC network name of the shared VPC network. For more information, see VPC Service Controls with Dataproc Metastore.
Optional: Enable Data Catalog sync. For more information, see Dataproc Metastore to Data Catalog sync.
Optional: Select the Day of week and Hour of day for the service's maintenance window. For more information, see Maintenance windows.
Optional: Enable a Kerberos keytab file:
Click the toggle to enable Kerberos.
Select or enter your secret resource ID.
Choose to use the latest secret version, or select an older version.
Enter the Kerberos principal. This is the principal allocated for this Dataproc Metastore service.
Browse to the krb5 config file.
Optional: Click to Use a customer managed encryption key (CMEK) and select a customer-managed key. For more information, see Using customer-managed encryption keys.
Optional: To apply a mapping to the Hive Metastore, click + Add Overrides.
Optional: To add additional metadata to the metastore service resource, click + Add Labels.
To create and start the service, click the Submit button.
Verify that you have returned to the Dataproc Metastore page, and that your new service appears in the list.
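The defaults stated in these steps can be summarized as data. This hypothetical sketch records only the defaults given above (Hive 3.1.2, the Stable release channel, port 9083, the Developer tier, and the default network); the field names and the with_defaults helper are illustrative, not how the Console is implemented.

```python
from typing import Any, Dict

# Defaults applied when a field on the Create service page is left empty,
# as described in the steps above. Field names are illustrative.
CONSOLE_DEFAULTS: Dict[str, Any] = {
    "hive_metastore_version": "3.1.2",
    "release_channel": "Stable",
    "port": 9083,
    "tier": "Developer",
    "network": "default",
}


def with_defaults(**chosen: Any) -> Dict[str, Any]:
    """Merge the fields you set with the documented defaults (hypothetical helper)."""
    unknown = set(chosen) - set(CONSOLE_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    # A field passed as None is treated as "not specified".
    explicit = {k: v for k, v in chosen.items() if v is not None}
    return {**CONSOLE_DEFAULTS, **explicit}


print(with_defaults(tier="Enterprise"))
```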
Run the following
gcloud metastore services create command to create a service:
gcloud metastore services create SERVICE \
    --location=LOCATION \
    --labels=k1=v1,k2=v2,k3=v3 \
    --network=NETWORK \
    --port=PORT \
    --tier=TIER \
    --hive-metastore-version=HIVE_METASTORE_VERSION \
    --release-channel=RELEASE_CHANNEL \
    --hive-metastore-configs=K1=V1,K2=V2 \
    --kerberos-principal=KERBEROS_PRINCIPAL \
    --krb5-config=KRB5_CONFIG \
    --keytab=CLOUD_SECRET
Replace the following:
SERVICE: The name of the new service.
LOCATION: The Google Cloud region in which to create the service.
k1=v1,k2=v2,k3=v3: Optional: The labels to attach to the service.
NETWORK: The name of the VPC network on which the service can be accessed. When the VPC network belongs to a different project than the service, you must provide the network's full relative resource name.
PORT: The TCP port at which the metastore Thrift interface is available. Default: 9083.
TIER: The service tier, which determines the capacity of the new service.
HIVE_METASTORE_VERSION: The Hive metastore version for the new service. The versions available depend on the location, and exactly one of them is the default, which is used if you don't specify a version.
RELEASE_CHANNEL: The release channel of the service.
K1=V1,K2=V2: Optional: The Hive metastore configs used.
KERBEROS_PRINCIPAL: Optional: A Kerberos principal that exists in both the keytab and the KDC. A typical principal has the form "primary/instance@REALM", but there is no exact format.
KRB5_CONFIG: Optional: The Cloud Storage location of the krb5.conf file, which specifies the Kerberos realm information, including the locations of KDCs and defaults for the realm and Kerberos applications.
CLOUD_SECRET: Optional: The relative resource name of a Secret Manager secret version.
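When only some of the optional flags apply, the command above can be assembled programmatically. This sketch uses only the flags shown in that command and omits any flag whose value is None, so the service falls back to its defaults. The helper name create_service_command is hypothetical, and it builds the argument list without running it.

```python
from typing import Dict, List, Optional


def create_service_command(
    service: str,
    location: str,
    network: Optional[str] = None,
    port: Optional[int] = None,
    tier: Optional[str] = None,
    hive_metastore_version: Optional[str] = None,
    release_channel: Optional[str] = None,
    labels: Optional[Dict[str, str]] = None,
    hive_metastore_configs: Optional[Dict[str, str]] = None,
    kerberos_principal: Optional[str] = None,
    krb5_config: Optional[str] = None,
    keytab: Optional[str] = None,
) -> List[str]:
    """Build argv for `gcloud metastore services create`, skipping unset options."""
    cmd = ["gcloud", "metastore", "services", "create", service,
           f"--location={location}"]
    # Single-valued flags from the command above; None means "use the default".
    scalar_flags = {
        "--network": network,
        "--port": port,
        "--tier": tier,
        "--hive-metastore-version": hive_metastore_version,
        "--release-channel": release_channel,
        "--kerberos-principal": kerberos_principal,
        "--krb5-config": krb5_config,
        "--keytab": keytab,
    }
    for flag, value in scalar_flags.items():
        if value is not None:
            cmd.append(f"{flag}={value}")
    # Key/value flags are joined as K1=V1,K2=V2.
    for flag, pairs in (("--labels", labels),
                        ("--hive-metastore-configs", hive_metastore_configs)):
        if pairs:
            cmd.append(flag + "=" + ",".join(f"{k}={v}" for k, v in pairs.items()))
    return cmd


print(create_service_command("my-service", "us-central1", port=9083, labels={"env": "dev"}))
```

The service name, location, and label values in the usage line are placeholders.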
Verify that the creation was successful.
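One way to verify creation from a script is to poll the service's state. This sketch assumes that the service reports CREATING while it is being provisioned and ACTIVE once it's ready. The state source is passed in as a function, so the polling logic stays independent of how you call gcloud or the API; wait_until_active is a hypothetical helper.

```python
import time
from typing import Callable


def wait_until_active(get_state: Callable[[], str],
                      poll_seconds: float = 30.0,
                      timeout_seconds: float = 3600.0) -> str:
    """Poll a service's state until it leaves CREATING (hypothetical helper).

    get_state is any callable that returns the current state string, for
    example one that runs:
      gcloud metastore services describe SERVICE --location=LOCATION --format="value(state)"
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state().strip()
        if state == "ACTIVE":
            return state
        if state != "CREATING":
            raise RuntimeError(f"service creation did not succeed: state={state}")
        if time.monotonic() >= deadline:
            raise TimeoutError("service is still CREATING")
        time.sleep(poll_seconds)
```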
Follow the API instructions to create a service by using the APIs Explorer.
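Outside the APIs Explorer, the same services.create method can be called over REST. The sketch below only constructs the HTTP request. It assumes the v1 endpoint https://metastore.googleapis.com/v1, a serviceId query parameter, and the JSON field names tier, port, and hiveMetastoreConfig.version, so confirm them against the API reference before use. The project, service ID, and token values are placeholders.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

# Assumed v1 REST endpoint for Dataproc Metastore.
API_ROOT = "https://metastore.googleapis.com/v1"


def build_create_request(project: str, location: str, service_id: str,
                         service: Dict[str, Any],
                         access_token: str) -> urllib.request.Request:
    """Build (but don't send) a services.create request (hypothetical helper)."""
    parent = f"projects/{project}/locations/{location}"
    query = urllib.parse.urlencode({"serviceId": service_id})
    return urllib.request.Request(
        url=f"{API_ROOT}/{parent}/services?{query}",
        data=json.dumps(service).encode("utf-8"),
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


body = {"tier": "DEVELOPER", "port": 9083, "hiveMetastoreConfig": {"version": "3.1.2"}}
req = build_create_request("my-project", "us-central1", "my-service", body, "TOKEN")
print(req.get_method(), req.full_url)
```

To send it, you could substitute a token from gcloud auth print-access-token and call urllib.request.urlopen(req). The response describes a long-running operation rather than the finished service.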
Using non-RFC 1918 private IP address ranges
The provided VPC network may run out of available RFC 1918 addresses required by Dataproc Metastore services. If that happens, Dataproc Metastore will attempt to reserve private IP address ranges outside of the RFC 1918 ranges for service creation. For a list of supported non-RFC 1918 private ranges, see Valid ranges in the VPC network documentation.
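To tell which reserved ranges fall outside RFC 1918, you can compare each CIDR block against the three RFC 1918 blocks (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16). The following minimal sketch uses Python's standard ipaddress module. The is_rfc1918 helper is hypothetical, and the example CIDRs are made up.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_RANGES = [ipaddress.ip_network(c)
                  for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918(cidr: str) -> bool:
    """True if the whole CIDR block lies inside one RFC 1918 range."""
    net = ipaddress.ip_network(cidr, strict=False)
    # Check the version first: subnet_of() raises TypeError across IP versions.
    return net.version == 4 and any(net.subnet_of(r) for r in RFC1918_RANGES)


print(is_rfc1918("10.8.0.0/24"))    # True
print(is_rfc1918("100.64.0.0/20"))  # False: shared address space, not RFC 1918
```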
Non-RFC 1918 private IP addresses used by Dataproc Metastore may conflict with a range in an on-premises network that is connected to the provided VPC network. To list the RFC 1918 and non-RFC 1918 private IP address ranges reserved by Dataproc Metastore, run:
gcloud compute addresses list \
    --project NETWORK_PROJECT_ID \
    --filter="purpose:VPC_PEERING AND name ~ cluster|resourcegroup"
If you find a conflict that you can't resolve by reconfiguring the on-premises network, delete the affected Dataproc Metastore service and re-create it after 2 hours.
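Given the reserved ranges from the gcloud compute addresses list command and the ranges used on premises, a pairwise overlap test finds the conflicts. The following sketch uses Python's ipaddress module. Turning the command output into CIDR strings (for example, by adding --format="value(address,prefixLength)", assuming those field names) is left as an assumption, and find_conflicts is a hypothetical helper.

```python
import ipaddress
from typing import Iterable, List, Tuple


def find_conflicts(reserved: Iterable[str],
                   on_prem: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (reserved, on_prem) CIDR pairs that overlap."""
    on_prem_nets = [ipaddress.ip_network(c, strict=False) for c in on_prem]
    conflicts = []
    for cidr in reserved:
        net = ipaddress.ip_network(cidr, strict=False)
        for other in on_prem_nets:
            # Only compare networks of the same IP version.
            if net.version == other.version and net.overlaps(other):
                conflicts.append((str(net), str(other)))
    return conflicts


print(find_conflicts(["10.20.0.0/24", "100.64.0.0/22"],
                     ["100.64.0.0/16", "192.168.0.0/16"]))
# [('100.64.0.0/22', '100.64.0.0/16')]
```

An empty result means that none of the reserved ranges collide with the on-premises ranges you supplied.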
After you create a Dataproc Metastore service
- Learn more about attaching a Dataproc cluster.
- Learn more about updating and deleting a service.
- Learn more about importing metadata into a service.