Creating a service

To create a Dataproc Metastore service, use the Create service page in the Google Cloud Console, the gcloud command-line tool, or the Dataproc Metastore API services.create method.

When you create a service, you must specify its region. See Cloud locations for the locations that support Dataproc Metastore.

Additional fields include metastore version, network, port, and service tier. If you don't specify a network, Dataproc Metastore uses the default network in the service project. Because Dataproc Metastore uses private IP addresses, only VMs on the same network can access the service.

Before you begin

  • Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  • In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.

  • Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

  • Enable the Dataproc Metastore API.

  • Most gcloud metastore commands require a location. You can specify it with the --location flag or set a default location with the metastore/location property.

  • Don't set the org-policy constraint that restricts VPC peering. If constraints/compute.restrictVpcPeering is set, your creation request fails with an INVALID_ARGUMENT error. If you must set the constraint, run the following command to allow under:folders/270204312590:

    gcloud resource-manager org-policies allow compute.restrictVpcPeering under:folders/270204312590 --organization ORGANIZATION_ID
    

    For more information, see Organization policy constraints.

  • If you'd like to enable Kerberos for your Hive metastore instance, you must:

    • Host your own Kerberos Key Distribution Center (KDC).
    • Set up IP connectivity between the VPC network and your KDC.
    • Set up a Secret Manager secret that contains a Hive keytab.
    • Specify a principal that exists in both the KDC and the Hive keytab.
    • Provide a krb5.conf file in a Cloud Storage bucket.

    For more information, see Configuring Kerberos.
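
The setup steps above can be scripted. The following is a minimal bash sketch, assuming an authenticated gcloud CLI; PROJECT_ID, LOCATION, and the helpers run, prereqs, and kerberos_prereqs are hypothetical placeholders, not part of any Google tool. By default each command is printed rather than executed, so you can review it first; set DRY_RUN=0 to run the commands.

```shell
#!/usr/bin/env bash
# Sketch of the "Before you begin" setup. PROJECT_ID and LOCATION are
# placeholders; run, prereqs, and kerberos_prereqs are hypothetical helpers.
set -euo pipefail

PROJECT_ID="${PROJECT_ID:-my-project}"
LOCATION="${LOCATION:-us-central1}"

# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

prereqs() {
  # Enable the Dataproc Metastore API.
  run gcloud services enable metastore.googleapis.com --project="$PROJECT_ID"
  # Default location for gcloud metastore commands, so --location can be omitted.
  run gcloud config set metastore/location "$LOCATION"
  # Show the effective VPC peering constraint for the project, if any.
  run gcloud resource-manager org-policies describe compute.restrictVpcPeering \
    --project="$PROJECT_ID" --effective
}

# Optional Kerberos inputs.
# kerberos_prereqs SECRET_ID KEYTAB_FILE BUCKET KRB5_CONF_FILE
kerberos_prereqs() {
  local secret_id=$1 keytab_file=$2 bucket=$3 krb5_file=$4
  # Store the Hive keytab as a Secret Manager secret version.
  run gcloud secrets create "$secret_id" --replication-policy=automatic --project="$PROJECT_ID"
  run gcloud secrets versions add "$secret_id" --data-file="$keytab_file" --project="$PROJECT_ID"
  # Upload krb5.conf to Cloud Storage.
  run gsutil cp "$krb5_file" "gs://$bucket/krb5.conf"
}
```

For example, run prereqs to review the commands, then DRY_RUN=0 prereqs to execute them.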

Set up Shared VPC

  • To make a Dataproc Metastore service accessible on a network that belongs to a different project than the service, grant roles/metastore.serviceAgent to the service project's Dataproc Metastore service agent (service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com) in the network project's IAM policy:

    gcloud projects add-iam-policy-binding NETWORK_PROJECT_ID \
        --role "roles/metastore.serviceAgent" \
        --member "serviceAccount:service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com"
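
The binding command above needs the service project's number, not its ID. A minimal bash sketch, assuming an authenticated gcloud CLI, that derives the service agent email and prints the binding command; metastore_agent, project_number, and grant_shared_vpc_access are hypothetical helper names. Commands are printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: grant roles/metastore.serviceAgent in the network (host) project.
# metastore_agent, project_number, and grant_shared_vpc_access are hypothetical helpers.
set -euo pipefail

# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Service agent email for a given service project number.
metastore_agent() {
  echo "service-$1@gcp-sa-metastore.iam.gserviceaccount.com"
}

# Look up the numeric project number of a project ID (calls gcloud).
project_number() {
  gcloud projects describe "$1" --format='value(projectNumber)'
}

# grant_shared_vpc_access SERVICE_PROJECT_NUMBER NETWORK_PROJECT_ID
grant_shared_vpc_access() {
  local number=$1 network_project=$2
  run gcloud projects add-iam-policy-binding "$network_project" \
    --role=roles/metastore.serviceAgent \
    --member="serviceAccount:$(metastore_agent "$number")"
}
```

For example: DRY_RUN=0 grant_shared_vpc_access "$(project_number SERVICE_PROJECT_ID)" NETWORK_PROJECT_ID.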
    

Access control

  • To create a service, you need an IAM role that contains the metastore.services.create permission. The Dataproc Metastore-specific roles roles/metastore.admin and roles/metastore.editor include this permission.

  • The basic (legacy) roles roles/owner and roles/editor also grant create permission to users or groups.

For more information, see Dataproc Metastore IAM and access control.
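
As a minimal bash sketch, assuming an authenticated gcloud CLI with permission to change the project's IAM policy, the following grants one of the Metastore-specific roles named above; grant_metastore_role is a hypothetical helper, and commands are printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: grant a principal a Metastore role that includes metastore.services.create.
# grant_metastore_role is a hypothetical helper.
set -euo pipefail

# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# grant_metastore_role PROJECT_ID MEMBER [ROLE]
# MEMBER uses IAM member syntax, for example user:alice@example.com.
grant_metastore_role() {
  local project=$1 member=$2 role=${3:-roles/metastore.editor}
  # Accept only the two Metastore-specific roles, not the broad basic roles.
  case "$role" in
    roles/metastore.admin | roles/metastore.editor) ;;
    *)
      echo "unsupported role: $role" >&2
      return 1
      ;;
  esac
  run gcloud projects add-iam-policy-binding "$project" \
    --member="$member" --role="$role"
}
```

The helper deliberately rejects roles/owner and roles/editor: those basic roles also grant create permission, but they grant far more than a Metastore user needs.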

Creating a Dataproc Metastore service

The following instructions demonstrate how to create a Dataproc Metastore service.

Console

  1. In the Cloud Console, open the Dataproc Metastore page.

  2. At the top of the Dataproc Metastore page, click Create. The Create service page opens.

  3. In the Service name field, enter a unique name for your service. For information on the naming convention, see Resource naming convention.

  4. Select the Data location.

  5. Select the Hive Metastore version. If not specified, Hive version 3.1.2 is used. For more information, see Version policy.

  6. Select the Release channel. If not specified, Stable is used. For more information, see Release channel.

  7. Enter the Port. This is the TCP port at which the Dataproc Metastore Thrift interface is available. If not provided, port number 9083 is used.

  8. Select the Service tier, which determines the capacity of the service. Developer, the default tier, suits low-cost proofs of concept: it provides limited scalability and no fault tolerance. Enterprise provides flexible scalability, fault tolerance, and multi-zone high availability, and can handle heavy Dataproc Metastore workloads.

  9. Select the Network. The service must be attached to the same network as its clients, such as Dataproc clusters, so that they can reach it. If you don't select a network, the default network is used.

    Optional: Select Use shared VPC network, then enter the project ID and VPC network name of the Shared VPC network. For more information, see VPC Service Controls with Dataproc Metastore.

  10. Optional: Enable Data Catalog sync. For more information, see Dataproc Metastore to Data Catalog sync.

  11. Optional: Select the Day of week and Hour of day for the service's maintenance window. For more information, see Maintenance windows.

  12. Optional: Enable Kerberos:

    1. Click the toggle to enable Kerberos.

    2. Select or enter your secret resource ID.

    3. Use the latest secret version, or select an older version.

    4. Enter the Kerberos principal. This is the principal allocated for this Dataproc Metastore service.

    5. Browse to the krb5.conf file.

  13. Optional: Select Use a customer-managed encryption key (CMEK), then select a customer-managed key. For more information, see Using customer-managed encryption keys.

  14. Optional: To override Hive metastore configuration properties, click + Add Overrides and enter the key-value pairs to apply to the Hive metastore.

  15. Optional: To attach labels (key-value metadata) to the service, click + Add Labels.

  16. To create and start the service, click the Submit button.

  17. Verify that you have returned to the Dataproc Metastore page, and that your new service appears in the list.

gcloud

  1. Run the following gcloud metastore services create command to create a service:

    gcloud metastore services create SERVICE \
        --location=LOCATION \
        --labels=k1=v1,k2=v2,k3=v3 \
        --network=NETWORK \
        --port=PORT \
        --tier=TIER \
        --hive-metastore-version=HIVE_METASTORE_VERSION \
        --release-channel=RELEASE_CHANNEL \
        --hive-metastore-configs=K1=V1,K2=V2 \
        --kerberos-principal=KERBEROS_PRINCIPAL \
        --krb5-config=KRB5_CONFIG \
        --keytab=CLOUD_SECRET
    

    Replace the following:

    • SERVICE: The name of the new service.
    • LOCATION: The Google Cloud region in which to create the service.
    • k1=v1,k2=v2,k3=v3: Optional: Labels to attach to the service, as key-value pairs.
    • NETWORK: The name of the VPC network on which the service can be accessed. If the VPC network belongs to a different project than the service, provide the full relative resource name, for example projects/HOST_PROJECT/global/networks/NETWORK_ID. If omitted, the default network is used.
    • PORT: The TCP port at which the metastore Thrift interface is available. Default: 9083.
    • TIER: The service tier, Developer or Enterprise. Default: Developer.
    • HIVE_METASTORE_VERSION: The Hive metastore version to run, for example 3.1.2. If omitted, the default version is used.
    • RELEASE_CHANNEL: The release channel of the service, for example STABLE or CANARY. Default: STABLE.
    • K1=V1,K2=V2: Optional: Hive metastore configuration overrides, as key-value pairs.
    • KERBEROS_PRINCIPAL: Optional: A Kerberos principal that exists in both the keytab and the KDC. A typical principal has the form "primary/instance@REALM", but there is no exact format.
    • KRB5_CONFIG: Optional: The Cloud Storage URI of the krb5.conf file. This file specifies the Kerberos realm information, including the locations of KDCs and defaults for the realm and Kerberos applications.
    • CLOUD_SECRET: Optional: The relative resource name of the Secret Manager secret version that contains the keytab, for example projects/PROJECT_ID/secrets/SECRET_ID/versions/VERSION.
  2. Verify that the service was created and that its state is ACTIVE.
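
The create-and-verify steps above can also be scripted. This is a minimal bash sketch, assuming an authenticated gcloud CLI; create_service, get_state, and wait_for_active are hypothetical helper names. create_service passes the Kerberos flags only when all three values are given, because Kerberos needs a principal, a krb5.conf file, and a keytab together. wait_for_active polls gcloud metastore services describe until the service's state field reads ACTIVE. The create command is printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: create a service with optional Kerberos flags, then wait for ACTIVE.
# create_service, get_state, and wait_for_active are hypothetical helpers.
set -euo pipefail

# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# create_service SERVICE LOCATION TIER [PRINCIPAL KRB5_GCS_URI SECRET_VERSION]
create_service() {
  local service=$1 location=$2 tier=$3
  shift 3
  local args=(gcloud metastore services create "$service"
    --location="$location" --tier="$tier")
  # Kerberos values are all-or-nothing.
  if [ "$#" -eq 3 ]; then
    args+=(--kerberos-principal="$1" --krb5-config="$2" --keytab="$3")
  elif [ "$#" -ne 0 ]; then
    echo "pass all three Kerberos values or none" >&2
    return 1
  fi
  run "${args[@]}"
}

# Prints the current state of a service, for example CREATING or ACTIVE.
get_state() {
  gcloud metastore services describe "$1" --location="$2" --format='value(state)'
}

# wait_for_active SERVICE LOCATION [MAX_TRIES] [INTERVAL_SECONDS]
wait_for_active() {
  local service=$1 location=$2 tries=${3:-60} interval=${4:-30} state="" i
  for ((i = 1; i <= tries; i++)); do
    state=$(get_state "$service" "$location")
    case "$state" in
      ACTIVE) echo "ACTIVE"; return 0 ;;
      ERROR) echo "service entered ERROR state" >&2; return 1 ;;
    esac
    sleep "$interval"
  done
  echo "timed out; last state: ${state:-unknown}" >&2
  return 1
}
```

For example: DRY_RUN=0 create_service my-service us-central1 developer && wait_for_active my-service us-central1.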

REST

Follow the API instructions to create a service by using the APIs Explorer.
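
Outside the APIs Explorer, the same method can be called with curl. This is a minimal bash sketch, assuming the v1 REST endpoint, the serviceId query parameter, and the tier, port, and hiveMetastoreConfig.version fields of the Service resource; services_url, service_body, access_token, and create_service_rest are hypothetical helpers. The request returns a long-running operation rather than the finished service. The curl command is printed, with a placeholder token, unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: call the services.create REST method with curl.
# All helper names are hypothetical; the body omits network, so the default is used.
set -euo pipefail

# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# services_url PROJECT_ID LOCATION SERVICE_ID
services_url() {
  echo "https://metastore.googleapis.com/v1/projects/$1/locations/$2/services?serviceId=$3"
}

# service_body TIER PORT HIVE_VERSION: minimal JSON body for the Service resource.
service_body() {
  printf '{"tier": "%s", "port": %d, "hiveMetastoreConfig": {"version": "%s"}}\n' "$1" "$2" "$3"
}

# Access token from the gcloud CLI; a placeholder is used in dry-run mode.
access_token() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "ACCESS_TOKEN"; else gcloud auth print-access-token; fi
}

# create_service_rest PROJECT_ID LOCATION SERVICE_ID [TIER] [PORT] [HIVE_VERSION]
create_service_rest() {
  local tier=${4:-DEVELOPER} port=${5:-9083} version=${6:-3.1.2}
  run curl -X POST \
    -H "Authorization: Bearer $(access_token)" \
    -H "Content-Type: application/json" \
    -d "$(service_body "$tier" "$port" "$version")" \
    "$(services_url "$1" "$2" "$3")"
}
```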

Using non-RFC 1918 private IP address ranges

If the VPC network you provide runs out of the RFC 1918 addresses that Dataproc Metastore services require, Dataproc Metastore attempts to reserve private IP address ranges outside the RFC 1918 ranges for service creation. For a list of supported non-RFC 1918 private ranges, see Valid ranges in the VPC network documentation.

Non-RFC 1918 private IP addresses used by Dataproc Metastore may conflict with a range in an on-premises network that is connected to the VPC network. To list the RFC 1918 and non-RFC 1918 private IP address ranges reserved by Dataproc Metastore, run:

gcloud compute addresses list \
    --project NETWORK_PROJECT_ID \
    --filter="purpose:VPC_PEERING AND name ~ cluster|resourcegroup"
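
To check the ranges listed by the gcloud compute addresses list command above against your on-premises ranges, a minimal pure-bash sketch follows. ip_to_int, cidr_bounds, ranges_overlap, and is_rfc1918 are hypothetical helpers. They take ranges in ADDRESS/PREFIX form, handle IPv4 only, and treat a bare address as /32.

```shell
#!/usr/bin/env bash
# Sketch: compare reserved ranges with on-premises ranges using bash arithmetic.
# All helper names are hypothetical; IPv4 only.
set -euo pipefail

# ip_to_int A.B.C.D -> 32-bit integer (10# forces base 10 for each octet)
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# cidr_bounds ADDRESS/PREFIX -> "FIRST LAST" as integers (a bare address is /32)
cidr_bounds() {
  local cidr=$1 base size
  case "$cidr" in */*) ;; *) cidr="$cidr/32" ;; esac
  base=$(ip_to_int "${cidr%/*}")
  size=$(( 1 << (32 - ${cidr#*/}) ))
  base=$(( base & ~(size - 1) ))  # clear host bits
  echo "$base $(( base + size - 1 ))"
}

# ranges_overlap CIDR_A CIDR_B: exit status 0 if the ranges share any address
ranges_overlap() {
  local a_lo a_hi b_lo b_hi
  read -r a_lo a_hi <<< "$(cidr_bounds "$1")"
  read -r b_lo b_hi <<< "$(cidr_bounds "$2")"
  [ "$a_lo" -le "$b_hi" ] && [ "$b_lo" -le "$a_hi" ]
}

# is_rfc1918 CIDR: exit status 0 if the range lies inside 10/8, 172.16/12, or 192.168/16
is_rfc1918() {
  local lo hi block b_lo b_hi
  read -r lo hi <<< "$(cidr_bounds "$1")"
  for block in 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16; do
    read -r b_lo b_hi <<< "$(cidr_bounds "$block")"
    if [ "$lo" -ge "$b_lo" ] && [ "$hi" -le "$b_hi" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, adding --format="value(address,prefixLength)" to the list command prints each reserved range's address and prefix length. You can then pass "$address/$prefix_length" and each on-premises range to ranges_overlap.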

If you find a conflict that you can't resolve by reconfiguring the on-premises network, delete the conflicting Dataproc Metastore service and re-create it after 2 hours.

After you create a Dataproc Metastore service

After you create a service, you can create and attach a Dataproc cluster, or a self-managed Apache Hive, Apache Spark, or Presto cluster, that uses the service as its Hive metastore.
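
The following is a minimal bash sketch of both options, assuming an authenticated gcloud CLI. For a Dataproc cluster, it passes the service's full resource name to the --dataproc-metastore flag of gcloud dataproc clusters create. For a self-managed Hive or Spark client, it reads the service's endpointUri and renders a hive.metastore.uris entry for hive-site.xml. service_name, create_attached_cluster, endpoint_uri, and hive_site_property are hypothetical helpers. Commands are printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: attach a service to a new Dataproc cluster, or get its Thrift
# endpoint for a self-managed cluster. All helper names are hypothetical.
set -euo pipefail

# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# service_name PROJECT_ID LOCATION SERVICE -> full relative resource name
service_name() {
  echo "projects/$1/locations/$2/services/$3"
}

# create_attached_cluster CLUSTER REGION PROJECT_ID LOCATION SERVICE
create_attached_cluster() {
  run gcloud dataproc clusters create "$1" \
    --region="$2" \
    --dataproc-metastore="$(service_name "$3" "$4" "$5")"
}

# endpoint_uri SERVICE LOCATION: prints the service's Thrift URI
endpoint_uri() {
  run gcloud metastore services describe "$1" --location="$2" --format='value(endpointUri)'
}

# hive_site_property URI: hive-site.xml entry pointing a client at the metastore
hive_site_property() {
  printf '<property>\n  <name>hive.metastore.uris</name>\n  <value>%s</value>\n</property>\n' "$1"
}
```

For example: hive_site_property "$(DRY_RUN=0 endpoint_uri my-service us-central1)".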

What's next