Deploy a Dataproc Metastore service with a Dataproc cluster
This page shows you how to create a Dataproc Metastore service and a Dataproc cluster that uses the service as its Hive metastore.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.
- Enable the Dataproc Metastore API.
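If you work from the command line, the API can also be enabled with the gcloud CLI. The sketch below assumes the gcloud CLI is installed and authenticated; the helper name enable_metastore_api is hypothetical. It also enables the Dataproc API, which the cluster-creation steps later on this page depend on.

```shell
# Hypothetical helper: enables the APIs used on this page for one project.
# Assumes the gcloud CLI is installed and authenticated.
enable_metastore_api() {
  local project_id="$1"
  # metastore.googleapis.com is the Dataproc Metastore API;
  # dataproc.googleapis.com is needed to create the Dataproc cluster later.
  gcloud services enable metastore.googleapis.com dataproc.googleapis.com \
    --project="${project_id}"
}
```

Usage: enable_metastore_api PROJECT_ID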
Required roles
To get the permission that you need to create a Dataproc Metastore service, ask your administrator to grant you the following IAM roles on your project. You might not need them all, depending on the level of access you require:
- Full control of Dataproc Metastore resources (roles/metastore.editor)
- Full access to all Dataproc Metastore resources, including IAM policy administration (roles/metastore.admin)
For more information about granting roles, see Manage access.
These predefined roles contain the metastore.services.create permission, which is required to create a Dataproc Metastore service. You might also be able to get this permission with custom roles or other predefined roles.
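An administrator could grant one of these roles from the command line with gcloud projects add-iam-policy-binding. This is a sketch under stated assumptions: the helper name grant_metastore_role and its role allow-list check are illustrative, the member is assumed to be a user account, and the caller must be allowed to change the project's IAM policy.

```shell
# Hypothetical helper: grants one of the Dataproc Metastore roles listed above
# to a user account. Rejects any other role as a simple safety check.
grant_metastore_role() {
  local project_id="$1" user_email="$2" role="$3"
  case "${role}" in
    roles/metastore.editor|roles/metastore.admin) ;;
    *) echo "unexpected role: ${role}" >&2; return 1 ;;
  esac
  gcloud projects add-iam-policy-binding "${project_id}" \
    --member="user:${user_email}" \
    --role="${role}"
}
```

Usage: grant_metastore_role PROJECT_ID USER_EMAIL roles/metastore.editor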
Create a Dataproc Metastore service
The following instructions demonstrate how to create a Dataproc Metastore service using the Google Cloud console, the gcloud CLI, or the Dataproc Metastore API.
Console
In the console, open the Create service page.
In the Service name field, enter example-service.
Select the Data location. For information on selecting a region, see Cloud locations.
For other service configuration options, use the provided defaults.
To create and start the service, click Submit.
Your new service appears in the Service list.
gcloud
Run the following gcloud metastore services create command to create a service:
gcloud metastore services create example-service \
    --location=LOCATION
Replace LOCATION with the Compute Engine region where the service is to be created. Make sure that Dataproc Metastore is available in the location.
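Before attaching a cluster, you may want to confirm that the service has finished provisioning. The sketch below wraps the create command and polls the service's state field until it reports ACTIVE. The helper names, the default retry count, and the delay are assumptions chosen for illustration.

```shell
# Hypothetical helpers around the create command above.
# Assumes gcloud is installed and authenticated.
create_metastore_service() {
  local service="$1" location="$2"
  gcloud metastore services create "${service}" --location="${location}"
}

# Prints the service state, for example CREATING or ACTIVE.
metastore_service_state() {
  local service="$1" location="$2"
  gcloud metastore services describe "${service}" \
    --location="${location}" --format="value(state)"
}

# Polls until the service reports ACTIVE; gives up after max_tries checks.
wait_for_active() {
  local service="$1" location="$2" max_tries="${3:-30}" delay="${4:-20}"
  local i=1 state=""
  while [ "${i}" -le "${max_tries}" ]; do
    state="$(metastore_service_state "${service}" "${location}")"
    if [ "${state}" = "ACTIVE" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "${delay}"
  done
  echo "${service} is not ACTIVE (last state: ${state})" >&2
  return 1
}
```

Usage: create_metastore_service example-service LOCATION && wait_for_active example-service LOCATION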
REST
Follow the API instructions to create a service by using the API Explorer.
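As an alternative to the API Explorer, the same request can be sent with curl against the Dataproc Metastore v1 REST endpoint. This is a sketch, not a definitive call: it assumes gcloud is used only to obtain an access token, the helper names are hypothetical, and the Hive version in the request body is an example value that you should match to your Dataproc image. The response is a long-running operation, not the finished service.

```shell
# Builds the services.create URL; serviceId becomes the service name.
metastore_create_url() {
  local project_id="$1" location="$2" service="$3"
  printf 'https://metastore.googleapis.com/v1/projects/%s/locations/%s/services?serviceId=%s\n' \
    "${project_id}" "${location}" "${service}"
}

# Sends the create request. The Hive version below is an assumed example value.
create_service_rest() {
  local project_id="$1" location="$2" service="$3"
  curl -sS -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d '{"hiveMetastoreConfig": {"version": "3.1.2"}}' \
    "$(metastore_create_url "${project_id}" "${location}" "${service}")"
}
```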
Create a Dataproc cluster that uses the service
After you create a service, you can create and attach a Dataproc cluster that uses the service as its Hive metastore.
The Dataproc image and Dataproc Metastore Hive version must be compatible. Check the following image versioning pages to ensure that the Hive version is compatible:
- Dataproc 2.0.x release versions
- Dataproc 1.5.x release versions
- Dataproc 1.4.x release versions
For more information, see Dataproc Image version list.
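To compare against those pages, you can read the Hive version that the service was created with. The helper below is a hypothetical convenience wrapper around gcloud metastore services describe; SERVICE and LOCATION are placeholders.

```shell
# Prints the Hive metastore version configured on the service, so it can be
# compared with the Hive version listed for your Dataproc image version.
metastore_hive_version() {
  local service="$1" location="$2"
  gcloud metastore services describe "${service}" \
    --location="${location}" \
    --format="value(hiveMetastoreConfig.version)"
}
```

Once you have picked a compatible image, you can pin it when creating the cluster with the --image-version flag of gcloud dataproc clusters create (for example, --image-version=2.0), so the cluster keeps the Hive version you checked.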
Console
In the console, open the Dataproc Create a cluster page.
In the Cluster Name field, enter example-cluster.
On the Region and Zone menus, select a region and zone for the cluster. You can select a distinct region to isolate resources and metadata storage locations within that region. If you select a distinct region, you can select "No preference" for the zone to let Dataproc pick a zone within the selected region for your cluster (see Dataproc Auto zone placement).
Use the provided defaults for all the other options.
Click the Customize cluster tab.
In the Network configuration section, select the same network that you specified when you created the Dataproc Metastore service.
In the Dataproc Metastore section, select example-service.
Click Create to create the cluster.
Your new cluster appears in the Clusters list. Cluster status is listed as "Provisioning" until the cluster is ready to use. Its status then changes to "Running."
gcloud
Run the following gcloud dataproc clusters create command to create a cluster:
gcloud dataproc clusters create example-cluster \
    --dataproc-metastore=projects/PROJECT_ID/locations/LOCATION/services/example-service \
    --region=LOCATION
Replace PROJECT_ID with the ID of the project that you created your Dataproc Metastore service in.
Replace LOCATION with the region you specified for the Dataproc Metastore service.
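The --dataproc-metastore flag takes the service's full resource name, so it can help to build that name once from the same variables used everywhere else. A minimal sketch; both helper names are hypothetical, and it assumes the cluster is created in the same region as the service, as in the command above.

```shell
# Builds the full resource name that --dataproc-metastore expects.
metastore_resource_name() {
  local project_id="$1" location="$2" service="$3"
  printf 'projects/%s/locations/%s/services/%s\n' \
    "${project_id}" "${location}" "${service}"
}

# Creates a cluster in the service's region that uses the service as its Hive metastore.
create_cluster_with_metastore() {
  local cluster="$1" project_id="$2" location="$3" service="$4"
  gcloud dataproc clusters create "${cluster}" \
    --dataproc-metastore="$(metastore_resource_name "${project_id}" "${location}" "${service}")" \
    --region="${location}"
}
```

Usage: create_cluster_with_metastore example-cluster PROJECT_ID LOCATION example-service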
REST
Follow the API instructions to create a cluster by using the API Explorer.
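Outside the API Explorer, the same cluster can be requested with curl against the Dataproc v1 clusters.create endpoint, attaching the service through config.metastoreConfig.dataprocMetastoreService. This sketch assumes gcloud is used only for the access token and leaves every other cluster setting at its default; the helper names are hypothetical.

```shell
# Builds a minimal Cluster request body that attaches the Metastore service.
cluster_request_body() {
  local project_id="$1" location="$2" cluster="$3" service="$4"
  printf '{"projectId": "%s", "clusterName": "%s", "config": {"metastoreConfig": {"dataprocMetastoreService": "projects/%s/locations/%s/services/%s"}}}\n' \
    "${project_id}" "${cluster}" "${project_id}" "${location}" "${service}"
}

# Sends the create request to the regional Dataproc endpoint.
create_cluster_rest() {
  local project_id="$1" location="$2" cluster="$3" service="$4"
  curl -sS -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "$(cluster_request_body "${project_id}" "${location}" "${cluster}" "${service}")" \
    "https://dataproc.googleapis.com/v1/projects/${project_id}/regions/${location}/clusters"
}
```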
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
- In the console, go to the Manage resources page.
- If the project that you plan to delete is attached to an organization, expand the Organization list in the Name column.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Alternatively, you can delete the resources used in this tutorial:
Delete the Dataproc Metastore service.
Console
In the console, open the Dataproc Metastore page.
Select the checkbox to the left of example-service.
At the top of the Dataproc Metastore page, click Delete to delete the service.
In the dialog, click Delete to confirm the deletion.
Your service no longer appears in the Service list.
gcloud
Run the following gcloud metastore services delete command to delete a service:
gcloud metastore services delete example-service \
    --location=LOCATION
Replace LOCATION with the Compute Engine region where the service was created.
REST
Follow the API instructions to delete a service by using the API Explorer.
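The delete call can likewise be sent with curl against the Dataproc Metastore v1 endpoint. A sketch assuming gcloud is used only for the access token; the helper name is hypothetical.

```shell
# Sends a services.delete request for one Dataproc Metastore service.
delete_service_rest() {
  local project_id="$1" location="$2" service="$3"
  curl -sS -X DELETE \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://metastore.googleapis.com/v1/projects/${project_id}/locations/${location}/services/${service}"
}
```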
All deletions succeed immediately.
Delete the Cloud Storage bucket for the Dataproc Metastore service.
Delete the Dataproc cluster that used the Dataproc Metastore service.
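The three cleanup steps can be scripted together. This sketch makes a few assumptions that are not stated on this page: that the Cloud Storage bucket to delete is the one named in the service's artifactGcsUri field, and that deleting the cluster first is acceptable (so nothing still uses the metastore when it is removed). The bucket name is read before the service is deleted, because the service can no longer be described afterwards. Helper names are hypothetical, and --quiet skips the confirmation prompts.

```shell
# Extracts the bucket name from a gs:// URI such as gs://my-bucket/hive-warehouse.
bucket_from_gcs_uri() {
  local uri="${1#gs://}"
  printf '%s\n' "${uri%%/*}"
}

cleanup_metastore_tutorial() {
  local cluster="$1" service="$2" location="$3"
  local artifact_uri bucket
  # Read the artifacts location first; it cannot be described after deletion.
  artifact_uri="$(gcloud metastore services describe "${service}" \
    --location="${location}" --format="value(artifactGcsUri)")" || return 1
  bucket="$(bucket_from_gcs_uri "${artifact_uri}")"
  gcloud dataproc clusters delete "${cluster}" --region="${location}" --quiet || return 1
  gcloud metastore services delete "${service}" --location="${location}" --quiet || return 1
  # Removes every object in the bucket and then the bucket itself.
  if [ -n "${bucket}" ]; then
    gcloud storage rm --recursive "gs://${bucket}"
  fi
}
```

Usage: cleanup_metastore_tutorial example-cluster example-service LOCATION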