This page shows you how to create a Dataproc Metastore service in the Google Cloud Console and create a Dataproc cluster that uses the service as its Hive metastore.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
- Enable the Dataproc Metastore API.
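The API can also be enabled from the command line. The following is a minimal sketch, assuming the gcloud tool is installed and authenticated and that the Dataproc Metastore API is exposed under the service name metastore.googleapis.com. The function name, the DRY_RUN variable, and the project ID are illustrative and not part of gcloud.

```shell
# Enable the Dataproc Metastore API for a project.
# With DRY_RUN=1 (the default in this sketch) the command is only printed.
enable_metastore_api() {
  # $1 = project ID
  set -- gcloud services enable metastore.googleapis.com --project="$1"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

enable_metastore_api "my-project-id"   # hypothetical project ID
```

Set DRY_RUN=0 to run the command instead of printing it.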
To create a service, you must be granted an IAM role containing the
metastore.services.create IAM permission. The Dataproc Metastore specific roles,
such as roles/metastore.editor, can be used to grant create permission.
You can also give create permission to users or groups by using the
For more information, see Dataproc Metastore IAM and access control.
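As an illustration of granting the role mentioned above, the sketch below wraps the gcloud projects add-iam-policy-binding command. It assumes you have permission to change the project's IAM policy. The function name and the example member are hypothetical.

```shell
# Grant roles/metastore.editor on a project to a single member.
grant_metastore_editor() {
  # $1 = project ID
  # $2 = member, for example user:alice@example.com (hypothetical)
  gcloud projects add-iam-policy-binding "$1" \
    --member="$2" \
    --role="roles/metastore.editor"
}

# Example call (not run here):
#   grant_metastore_editor my-project-id user:alice@example.com
```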
Creating a Dataproc Metastore service
The following instructions demonstrate how to create a Dataproc Metastore
service using the Google Cloud Console, the
gcloud tool, or the
Dataproc Metastore API.
In the Cloud Console, open the Create service page.
In the Service name field, enter example-service.
Select the Data location. For information on selecting a region, see Available regions.
For other service configuration options, use the provided defaults.
To create and start the service, click the Submit button.
Your new service appears in the Service list.
Use the following
gcloud metastore services create
command to create a service:
gcloud metastore services create example-service \
    --location=LOCATION
Replace LOCATION with the Compute Engine region where the service is
to be created. Make sure that the location you specify is one where
Dataproc Metastore is available.
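If the service was created asynchronously or from the Cloud Console, a script may need to wait until it is ready. The sketch below polls gcloud metastore services describe and assumes the service resource reports its readiness in a state field whose ready value is ACTIVE. The function name and the POLL_SECONDS variable are illustrative.

```shell
# Poll the service state until it reports ACTIVE or the attempts run out.
# $1 = service name, $2 = location, $3 = maximum attempts (default 30)
wait_for_service_active() {
  attempts="${3:-30}"
  state=""
  while [ "$attempts" -gt 0 ]; do
    state="$(gcloud metastore services describe "$1" \
      --location="$2" --format="value(state)")"
    if [ "$state" = "ACTIVE" ]; then
      echo "Service $1 is ACTIVE"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "${POLL_SECONDS:-20}"
  done
  echo "Service $1 did not become ACTIVE (last state: $state)" >&2
  return 1
}

# Example call (not run here):
#   wait_for_service_active example-service LOCATION
```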
Follow the API instructions to create a service by using the APIs Explorer.
Creating a Dataproc cluster that uses the service
After you create a service, you can create and attach a Dataproc cluster that uses the service as its Hive metastore.
The Dataproc image and Dataproc Metastore Hive version must be compatible:
Dataproc 2.x images require Dataproc Metastore services created with Hive 3.1.2.
Dataproc 1.x images require Dataproc Metastore services created with either Hive 2.3.6 or 3.1.2, but perform optimally with 2.3.6.
For more information on Dataproc image versions and to find out which Hive version is used by a Dataproc image, see Dataproc Versioning.
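The compatibility rules above can be captured in a small helper. This is only a sketch of the two rules as stated on this page, and the function name is hypothetical.

```shell
# Map a Dataproc image version to the Hive version suggested by the
# compatibility rules above.
hive_version_for_image() {
  case "$1" in
    2.*) echo "3.1.2" ;;
    1.*) echo "2.3.6" ;;   # 3.1.2 is also accepted; 2.3.6 is described as optimal
    *)   echo "unknown image version: $1" >&2; return 1 ;;
  esac
}

hive_version_for_image "2.0"   # prints 3.1.2
hive_version_for_image "1.5"   # prints 2.3.6
```

The result could then be supplied when creating the service, for example through a Hive version option of gcloud metastore services create, assuming your gcloud version provides one.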
In the Cloud Console, open the Dataproc Create a cluster page.
In the Cluster Name field, enter example-cluster.
On the Region and Zone menus, select a region and zone for the cluster. You can select a distinct region to isolate resources and metadata storage locations within that region. If you select a distinct region, you can select "No preference" for the zone to let Dataproc pick a zone within the selected region for your cluster (see Dataproc Auto zone placement).
Use the provided defaults for all the other options.
Click on the Customize cluster tab.
In the Network configuration section, select the same network specified during the metastore service creation.
In the Dataproc Metastore section, select the service you created, example-service.
Click Create to create the cluster.
Your new cluster appears in the Clusters list. Cluster status is listed as "Provisioning" until the cluster is ready to use, then changes to "Running."
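The cluster status can also be checked from the command line. The sketch below uses gcloud dataproc clusters describe and assumes the API reports the state shown as "Running" in the Cloud Console as RUNNING in the status.state field. The function names are illustrative.

```shell
# Print the current state of a Dataproc cluster.
# $1 = cluster name, $2 = region
cluster_state() {
  gcloud dataproc clusters describe "$1" \
    --region="$2" --format="value(status.state)"
}

# Succeed only when the cluster reports RUNNING.
cluster_is_running() {
  [ "$(cluster_state "$1" "$2")" = "RUNNING" ]
}

# Example call (not run here):
#   cluster_is_running example-cluster LOCATION && echo "ready"
```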
Use the following
gcloud dataproc clusters create command to create a cluster:
gcloud dataproc clusters create example-cluster \
    --dataproc-metastore=projects/PROJECT_ID/locations/LOCATION/services/example-service \
    --region=LOCATION
Replace PROJECT_ID with the project ID of the project you created your
Dataproc Metastore service in.
Replace LOCATION with the same region you specified above for
the Dataproc Metastore service.
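The value passed to --dataproc-metastore follows a fixed pattern, so it can be assembled from its parts. The sketch below builds the resource name and prints the resulting command without running it. The helper name and the project ID and region values are hypothetical.

```shell
# Build the service resource name expected by --dataproc-metastore.
metastore_resource_name() {
  # $1 = project ID, $2 = location, $3 = service name
  echo "projects/$1/locations/$2/services/$3"
}

# Hypothetical values, for illustration only.
PROJECT_ID="my-project-id"
LOCATION="us-central1"

# Print the command; remove the leading echo to run it.
echo gcloud dataproc clusters create example-cluster \
  --dataproc-metastore="$(metastore_resource_name "$PROJECT_ID" "$LOCATION" example-service)" \
  --region="$LOCATION"
```

Using the same LOCATION variable for both flags keeps the cluster in the region of the service, as the instructions above require.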
Follow the API instructions to create a cluster by using the APIs Explorer.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this quickstart, follow these steps.
- In the Cloud Console, go to the Manage resources page.
- If the project that you plan to delete is attached to an organization, expand the Organization list in the Name column.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
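The project can also be shut down with the gcloud tool. The sketch below wraps gcloud projects delete and refuses to run without a project ID. The function name is hypothetical, and gcloud itself asks for confirmation before deleting.

```shell
# Shut down a project, refusing to run when no project ID is given.
delete_project() {
  if [ -z "$1" ]; then
    echo "usage: delete_project PROJECT_ID" >&2
    return 2
  fi
  gcloud projects delete "$1"
}

# Example call (not run here):
#   delete_project my-project-id
```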
Alternatively, you can delete the resources used in this quickstart:
Delete the Dataproc Metastore service.
In the Cloud Console, open the Dataproc Metastore page.
On the left of the service name, select example-service by checking the box.
At the top of the Dataproc Metastore page, click Delete to delete the service.
On the dialog, click Delete to confirm the deletion.
Your service no longer appears in the Service list.
Use the following
gcloud metastore services delete command to delete a service:
gcloud metastore services delete example-service \
    --location=LOCATION
Replace LOCATION with the Compute Engine region where the service was created.
Follow the API instructions to delete a service by using the APIs Explorer.
All deletions succeed immediately.
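To confirm from a script that the service is gone, you can list the services in the location and look for its name. The sketch below assumes gcloud metastore services list prints full resource names ending in /services/SERVICE_NAME when asked for the name field. The function name is illustrative.

```shell
# Succeed when a service with the given name is listed in the location.
# $1 = service name, $2 = location
service_exists() {
  gcloud metastore services list --location="$2" --format="value(name)" \
    | grep -q "/services/$1\$"
}

# Example call (not run here):
#   service_exists example-service LOCATION || echo "service deleted"
```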
Delete the Cloud Storage bucket for the Dataproc Metastore service.
Delete the Dataproc cluster that used the Dataproc Metastore service.
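The last two steps can be sketched with the gcloud tool as follows. This assumes you already know the name of the Cloud Storage bucket used by the service; this page does not state it, so the bucket argument is a placeholder. The function name is hypothetical.

```shell
# Delete the cluster, then remove the bucket and everything in it.
# $1 = cluster name, $2 = region, $3 = bucket name (without gs://)
cleanup_cluster_and_bucket() {
  gcloud dataproc clusters delete "$1" --region="$2" --quiet &&
    gcloud storage rm --recursive "gs://$3"
}

# Example call (not run here):
#   cleanup_cluster_and_bucket example-cluster LOCATION BUCKET_NAME
```

The --quiet flag skips the confirmation prompt, so check the arguments before running the function.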