Quickstart for migrating to Dataproc Metastore

This page shows you how to migrate your external self-managed MySQL metastore to Dataproc Metastore by creating a MySQL dump file and importing the metadata into an existing Dataproc Metastore service.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

  4. Enable the Dataproc Metastore API.

    Enable the API
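If you prefer the command line, the API can likely also be enabled with gcloud. The following is a minimal sketch, not an official procedure: the run helper and the enable_metastore_api function are hypothetical names introduced here, and my-project in the usage line is a placeholder. The helper prints each command by default so that you can review it, and runs it only when DRY_RUN=0 is set.

```shell
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0 is set.
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Enables the Dataproc Metastore API for the given project.
# Argument: PROJECT_ID
enable_metastore_api() {
  run gcloud services enable metastore.googleapis.com --project="$1"
}
```

For example, DRY_RUN=0 enable_metastore_api my-project would run the command against a project named my-project.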

Access control

  • If you're using VPC Service Controls, then you can only import data from a Cloud Storage bucket that resides in the same service perimeter as the Dataproc Metastore service.

  • To create a service, you must request an IAM role containing the metastore.services.create IAM permission. To import metadata, you must request an IAM role containing the metastore.imports.create IAM permission. The Dataproc Metastore-specific roles roles/metastore.admin and roles/metastore.editor include both the create and import permissions.

  • You can give create and import permissions to users or groups by using the roles/owner and roles/editor legacy roles.

  • The Dataproc Metastore service agent (service-CUSTOMER_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com) and the user importing the metadata must have storage.objects.get permission on the Cloud Storage object (SQL dump file) used for the import.

For more information, including how to get and set IAM policies, see Dataproc Metastore IAM and access control.
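The permissions listed above can be sketched as gcloud commands. This is an illustration under assumptions rather than a required setup: the function names and the run helper are hypothetical, and roles/storage.objectViewer is used only as one predefined role that includes storage.objects.get. The service agent address follows the format given in the list above. Commands are printed by default and run only when DRY_RUN=0 is set.

```shell
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0 is set.
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Builds the Dataproc Metastore service agent address from a project number.
metastore_service_agent() {
  printf 'service-%s@gcp-sa-metastore.iam.gserviceaccount.com\n' "$1"
}

# Grants roles/metastore.editor on the project to a user.
# Arguments: PROJECT_ID USER_EMAIL
grant_metastore_editor() {
  run gcloud projects add-iam-policy-binding "$1" \
    --member="user:$2" --role="roles/metastore.editor"
}

# Lets the service agent read objects in the bucket that holds the dump file.
# Arguments: PROJECT_NUMBER BUCKET_NAME
grant_dump_read_access() {
  run gcloud storage buckets add-iam-policy-binding "gs://$2" \
    --member="serviceAccount:$(metastore_service_agent "$1")" \
    --role="roles/storage.objectViewer"
}
```

The user who performs the import needs the same read access to the dump file; a narrower, object-level grant may suit your security policy better than a bucket-wide role.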

Create a Dataproc Metastore service

The following instructions demonstrate how to create a Dataproc Metastore service that you can then migrate to:

Console

  1. In the Cloud Console, open the Create service page:

    Open the Create service page in the Cloud Console

  2. In the Service name field, enter example-service.

  3. Select the Data location. For information on selecting a region, see Cloud locations.

  4. For other service configuration options, use the provided defaults.

  5. To create and start the service, click the Submit button.

Your new service appears in the Service list.

gcloud

Run the following gcloud metastore services create command to create a service:

 gcloud metastore services create example-service \
     --location=LOCATION
 

Replace LOCATION with the Compute Engine region where you plan to create the service. Make sure Dataproc Metastore is available in the region.
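Service creation can take some time, so it may help to check the service state before you import anything. The sketch below assumes the describe command and its state field; the service_state function and the run helper are hypothetical names, and us-central1 in the usage line is only an example region. Commands are printed by default and run only when DRY_RUN=0 is set.

```shell
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0 is set.
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Prints the current state of a Dataproc Metastore service.
# Arguments: SERVICE_NAME LOCATION
service_state() {
  run gcloud metastore services describe "$1" \
    --location="$2" --format="value(state)"
}
```

For example, DRY_RUN=0 service_state example-service us-central1 should print ACTIVE once the service is ready to use.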

REST

Follow the API instructions to create a service by using the API Explorer.

Prepare for migration

You must now prepare the metadata stored in your Hive metastore database for import by making a MySQL dump file and placing it into a Cloud Storage bucket.

See Prepare the import for steps to prepare for migration.
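As a rough illustration of that preparation, the sketch below dumps a self-managed MySQL database with mysqldump and copies the file to Cloud Storage. The function names, the run helper, and every argument value are hypothetical, and the mysqldump options shown are an assumption; the Prepare the import page remains the authority on which options the dump file needs. Commands are printed by default and run only when DRY_RUN=0 is set.

```shell
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0 is set.
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Dumps the Hive metastore database to a local SQL file.
# --password with no value makes mysqldump prompt for the password.
# Arguments: DB_HOST DB_USER DB_NAME OUTPUT_FILE
dump_hive_metastore() {
  run mysqldump --host="$1" --user="$2" --password \
    --single-transaction --result-file="$4" "$3"
}

# Copies the dump file into a Cloud Storage bucket.
# Arguments: DUMP_FILE BUCKET_NAME
upload_dump() {
  run gcloud storage cp "$1" "gs://$2/"
}
```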

Import the metadata

Now that you've prepared the dump file, import it into your Dataproc Metastore service.

See Perform the import for steps to import your metadata into your example-service service.
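For orientation, an import through gcloud might look like the sketch below. The import_metadata function, the run helper, and the example argument values are hypothetical; confirm the flags against the Perform the import page before relying on them. Commands are printed by default and run only when DRY_RUN=0 is set.

```shell
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0 is set.
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Imports a MySQL dump file from Cloud Storage into a service.
# Arguments: SERVICE_NAME LOCATION IMPORT_ID DUMP_URI (must start with gs://)
import_metadata() {
  run gcloud metastore services import gcs "$1" \
    --location="$2" --import-id="$3" \
    --dump-type=mysql --database-dump="$4"
}
```

For example, DRY_RUN=0 import_metadata example-service us-central1 first-import gs://my-bucket/dump.sql would start an import identified as first-import.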

Create and attach a Dataproc cluster

After you import your metadata into your Dataproc Metastore example-service service, create and attach a Dataproc cluster that uses the service as its Hive metastore.
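One way to attach a cluster at creation time is sketched below. It assumes the cluster is created in the same region as the service and that the service is referenced by its full resource name; the create_attached_cluster function, the run helper, and the example-cluster name are hypothetical. Commands are printed by default and run only when DRY_RUN=0 is set.

```shell
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0 is set.
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Creates a Dataproc cluster that uses the service as its Hive metastore.
# Arguments: CLUSTER_NAME PROJECT_ID LOCATION SERVICE_NAME
create_attached_cluster() {
  run gcloud dataproc clusters create "$1" \
    --region="$3" \
    --dataproc-metastore="projects/$2/locations/$3/services/$4"
}
```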

Clean up

To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.

  1. In the Cloud Console, go to the Manage resources page.

    Go to Manage resources

  2. If the project that you plan to delete is attached to an organization, expand the Organization list in the Name column.
  3. In the project list, select the project that you want to delete, and then click Delete.
  4. In the dialog, type the project ID, and then click Shut down to delete the project.
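The same project deletion can be sketched with gcloud. The delete_project function and the run helper are hypothetical names. Because deleting a project removes everything in it, the command is printed by default and runs only when DRY_RUN=0 is set.

```shell
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0 is set.
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Shuts down a project and schedules it for deletion.
# Argument: PROJECT_ID
delete_project() {
  run gcloud projects delete "$1"
}
```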

Alternatively, you can delete the resources used in this tutorial:

  1. Delete the Dataproc Metastore service.

    Console

    1. In the Cloud Console, open the Dataproc Metastore page:

      Open Dataproc Metastore in the Cloud Console

    2. Select the checkbox to the left of example-service.

    3. At the top of the Dataproc Metastore page, click Delete to delete the service.

    4. In the dialog, click Delete to confirm the deletion.

    Your service no longer appears in the Service list.

    gcloud

    Run the following gcloud metastore services delete command to delete a service:

     gcloud metastore services delete example-service \
         --location=LOCATION
     

    Replace LOCATION with the Compute Engine region where you created the service.

    REST

    Follow the API instructions to delete a service by using the API Explorer.

    All deletions succeed immediately.

  2. Delete the Cloud Storage bucket for the Dataproc Metastore service.
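A bucket and its contents can be removed from the command line as sketched below. The delete_bucket function, the run helper, and the bucket name are hypothetical. The command is printed by default and runs only when DRY_RUN=0 is set, because a recursive delete cannot be undone.

```shell
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0 is set.
run() { if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else printf '%s\n' "$*"; fi; }

# Deletes a Cloud Storage bucket together with every object in it.
# Argument: BUCKET_NAME
delete_bucket() {
  run gcloud storage rm --recursive "gs://$1"
}
```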

What's next