Create a basic Dataproc Metastore service

This page shows you how to create a Dataproc Metastore service and connect to it from a Dataproc cluster.

Dataproc Metastore provides you with a fully compatible Hive Metastore (HMS), which is the established standard in the open source big data ecosystem for managing technical metadata. This service helps you manage the metadata of your data lakes and provides interoperability between the various data processing tools you're using.

To start, you create a Dataproc Metastore service, which provides the core functionality of your HMS. Next, you create a Dataproc cluster and connect it to your metastore. Your Dataproc cluster then uses the Dataproc Metastore service as its HMS.




Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.

  4. Enable the Dataproc Metastore and Dataproc APIs.

    Enable the APIs

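If you prefer the command line to the console button, the two APIs can also be enabled with gcloud services enable. The following is a minimal sketch, not an official procedure: enable_metastore_apis is a hypothetical helper name and my-project is a placeholder project ID.

```shell
# Sketch: enable the Dataproc Metastore and Dataproc APIs for one project.
# enable_metastore_apis is a hypothetical wrapper, not part of the gcloud CLI.
enable_metastore_apis() {
  local project_id="$1"
  gcloud services enable metastore.googleapis.com dataproc.googleapis.com \
    --project="$project_id"
}

# Example call (my-project is a placeholder project ID):
# enable_metastore_apis my-project
```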

Required roles

To get the permissions that you need to create a Dataproc Metastore and a Dataproc cluster, ask your administrator to grant you the following IAM roles:

  • To grant full access to all Dataproc Metastore resources, including setting IAM permissions: Dataproc Metastore Admin (roles/metastore.admin) on the user account or service account
  • To grant full control of Dataproc Metastore resources: Dataproc Metastore Editor (roles/metastore.editor) on the user account or service account
  • To create a Dataproc cluster: Dataproc Worker (roles/dataproc.worker) on the service account

For more information about granting roles, see Manage access.

These predefined roles contain the permissions required to create a Dataproc Metastore and a Dataproc cluster. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

  • To create a Dataproc Metastore service: metastore.services.create on the user account or service account
  • To create a Dataproc cluster: Dataproc Worker (roles/dataproc.worker) on the service account

You might also be able to get these permissions with custom roles or other predefined roles.

For more information about specific Dataproc Metastore roles and permissions, see Dataproc Metastore IAM overview.
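If you administer the project yourself, the roles listed above can be granted at the project level with gcloud projects add-iam-policy-binding. This is a sketch under assumptions: grant_project_role is a hypothetical helper, and the project ID and member values in the example calls are placeholders that you would replace with your own.

```shell
# Sketch: grant one IAM role to one member on a project.
# grant_project_role is a hypothetical wrapper, not part of the gcloud CLI.
grant_project_role() {
  local project_id="$1" member="$2" role="$3"
  gcloud projects add-iam-policy-binding "$project_id" \
    --member="$member" --role="$role"
}

# Example calls (all project and account names are placeholders):
# grant_project_role my-project user:alex@example.com roles/metastore.editor
# grant_project_role my-project \
#   serviceAccount:my-sa@my-project.iam.gserviceaccount.com roles/dataproc.worker
```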

Create a Dataproc Metastore service

The following instructions show you how to create a basic Dataproc Metastore service using the provided default settings.

Console

  1. In the Google Cloud console, go to the Dataproc Metastore page.

    Go to Dataproc Metastore

  2. In the navigation bar, click +Create.

  3. In the Service name field, enter example-service.

  4. In the Data location field, select us-central1.

  5. For the remaining service configuration options, use the provided defaults.

  6. To create and start the service, click Submit.

    Your new metastore service appears on the Dataproc Metastore page. The status displays Creating until the service is ready to use. When it's ready, the status changes to Active. Provisioning the service might take a couple of minutes.

The following screenshot shows an example of the Create service page using some of the provided defaults.

The Create service page.

gcloud CLI

To create a basic metastore service using the provided defaults, run the following gcloud metastore services create command:

 gcloud metastore services create example-service \
     --location=LOCATION

Replace LOCATION with us-central1.

This is the default region that a Dataproc Metastore service is created in.
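The service stays in the Creating state until provisioning finishes, so a script that goes on to create a cluster may want to wait for Active first. The following sketch polls the state field returned by gcloud metastore services describe. The helper name wait_for_metastore_active and the default retry count and delay are assumptions chosen for illustration, not recommended values.

```shell
# Sketch: poll a Dataproc Metastore service until its state is ACTIVE.
# wait_for_metastore_active is a hypothetical helper, not a gcloud command.
# Arguments: SERVICE LOCATION [ATTEMPTS] [DELAY_SECONDS]
wait_for_metastore_active() {
  local service="$1" location="$2" attempts="${3:-30}" delay="${4:-20}"
  local state="" i=1
  while [ "$i" -le "$attempts" ]; do
    # Ask the API for the current state of the service.
    state="$(gcloud metastore services describe "$service" \
      --location="$location" --format='value(state)')"
    echo "Attempt ${i}: state=${state}"
    if [ "$state" = "ACTIVE" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Service ${service} is not ACTIVE after ${attempts} attempts" >&2
  return 1
}

# Example call for the service created in this tutorial:
# wait_for_metastore_active example-service us-central1
```

The function returns 0 as soon as the state is ACTIVE and 1 if the attempts run out, so it can gate later steps with `&&`.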

REST

Follow the API instructions to create a service by using the APIs Explorer.

Create a Dataproc cluster and connect to Dataproc Metastore

Next, you create a Dataproc cluster and connect to your metastore from the cluster. After that, your cluster uses the metastore service as its HMS. The cluster you create here uses the default provided settings.

Console

  1. In the Google Cloud console, go to the Dataproc Clusters page.

    Go to Dataproc Clusters

  2. In the navigation bar, select +Create cluster.

    The Create a cluster dialog opens and lists the available infrastructure options.

  3. In the Cluster on Compute Engine row, select Create.

    The Create a Dataproc cluster on Compute Engine page opens.

  4. In the Cluster Name field, enter example-cluster.

  5. In the Region and Zone menus, select us-central1.

  6. For the remaining Set up cluster options, use the provided defaults.

  7. In the navigation menu, click the Customize cluster (optional) tab.

  8. In the Dataproc Metastore section, select the metastore service you created earlier.

    If you followed this tutorial as-is, it's named example-service.

  9. For the remaining service configuration options, use the provided defaults.

  10. To create the cluster, click Create.

    Your new cluster appears in the Clusters list. The cluster status displays Provisioning until the cluster is ready to use. When it's ready, the status changes to Running. Provisioning the cluster might take a couple of minutes.

gcloud CLI

To create a cluster using the provided default settings, run the following gcloud dataproc clusters create command:

 gcloud dataproc clusters create example-cluster \
    --dataproc-metastore=projects/PROJECT_ID/locations/LOCATION/services/example-service \
    --region=LOCATION

Replace PROJECT_ID with the project ID of the project that you created your Dataproc Metastore service in.

Replace LOCATION with us-central1.

This is the default region that a Dataproc Metastore service is created in. It is also the same region you used in the previous steps.
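The --dataproc-metastore flag takes the full resource name of the service, in the form shown in the command above. As a small sketch, the name can be assembled from its three parts so that the project and location are set in one place. metastore_resource_name is a hypothetical helper and my-project is a placeholder project ID.

```shell
# Sketch: build the full resource name of a Dataproc Metastore service.
# metastore_resource_name is a hypothetical helper, not part of the gcloud CLI.
metastore_resource_name() {
  local project_id="$1" location="$2" service="$3"
  printf 'projects/%s/locations/%s/services/%s\n' \
    "$project_id" "$location" "$service"
}

# Example use (my-project is a placeholder project ID):
# gcloud dataproc clusters create example-cluster \
#     --dataproc-metastore="$(metastore_resource_name my-project us-central1 example-service)" \
#     --region=us-central1
```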

REST

Follow the API instructions to create a cluster by using the APIs Explorer.
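After the cluster is created, you might want to confirm which metastore service it is attached to. The following sketch reads the metastore setting from the cluster configuration with gcloud dataproc clusters describe. cluster_metastore_service is a hypothetical helper name, and the sketch assumes that the setting is exposed as config.metastoreConfig.dataprocMetastoreService in the describe output.

```shell
# Sketch: print the Dataproc Metastore service attached to a cluster.
# cluster_metastore_service is a hypothetical helper, not a gcloud command.
cluster_metastore_service() {
  local cluster="$1" region="$2"
  gcloud dataproc clusters describe "$cluster" --region="$region" \
    --format='value(config.metastoreConfig.dataprocMetastoreService)'
}

# Example call for the cluster created in this tutorial:
# cluster_metastore_service example-cluster us-central1
```

If the cluster is connected as described in this tutorial, the value printed should be the full resource name of example-service.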

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. If the project that you plan to delete is attached to an organization, expand the Organization list in the Name column.
  3. In the project list, select the project that you want to delete, and then click Delete.
  4. In the dialog, type the project ID, and then click Shut down to delete the project.

Alternatively, you can delete the resources used in this tutorial:

  1. Delete the Dataproc Metastore service.

    Console

    1. In the Google Cloud console, open the Dataproc Metastore page:

      Go to Dataproc Metastore

    2. In the service list, select example-service.

    3. In the navigation bar, click Delete.

      The Delete service dialog opens.

    4. In the dialog, click Delete.

      Your service no longer appears in the Service list.

    gcloud CLI

    To delete a service, run the following gcloud metastore services delete command:

     gcloud metastore services delete example-service \
         --location=LOCATION

    Replace LOCATION with the region where the service was created. In the context of this tutorial, enter us-central1.

    REST

    Follow the API instructions to delete a service by using the APIs Explorer.

    All deletions succeed immediately.

  2. Delete the Cloud Storage bucket for the Dataproc Metastore service.

  3. Delete the Dataproc cluster that used the Dataproc Metastore service.
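The last two steps can also be done from the command line. The following is a sketch under assumptions, not an official cleanup procedure: cleanup_cluster_and_bucket is a hypothetical helper, and BUCKET_NAME stands for the Cloud Storage bucket associated with your service, which this page does not name. The --quiet flag skips the confirmation prompt, so check the arguments before you run it.

```shell
# Sketch: delete the tutorial's Dataproc cluster and a Cloud Storage bucket.
# cleanup_cluster_and_bucket is a hypothetical helper, not a gcloud command.
# Arguments: CLUSTER REGION BUCKET_NAME
cleanup_cluster_and_bucket() {
  local cluster="$1" region="$2" bucket="$3"
  # Delete the cluster without an interactive confirmation prompt.
  gcloud dataproc clusters delete "$cluster" --region="$region" --quiet
  # Remove the bucket and everything stored in it.
  gcloud storage rm --recursive "gs://${bucket}"
}

# Example call (BUCKET_NAME is a placeholder for your bucket):
# cleanup_cluster_and_bucket example-cluster us-central1 BUCKET_NAME
```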

What's next