Exporting metadata from a service

An export takes the metadata stored in a Dataproc Metastore service and writes it to a Cloud Storage folder as either a set of Avro files or a MySQL dump file. For an Avro export, Dataproc Metastore creates a <table-name>.avro file for each table. Avro-based exports are supported for Hive versions 2.3.6 and 3.1.2.

This page explains how to export metadata from an existing Dataproc Metastore service.

Before you begin

  • Most gcloud metastore commands require a location. You can specify the location by using the --location flag or by setting the default location, as in the sketch below.
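
    As a minimal sketch, assuming the metastore/location gcloud config property and the us-central1 region, you can set the default location once and omit --location in later commands:

    # Set a default Dataproc Metastore region (assumed config property name).
    gcloud config set metastore/location us-central1

    # Subsequent metastore commands can then omit the --location flag.
    gcloud metastore services list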

Access control

  • To export metadata, you must be granted an IAM role containing the metastore.services.export IAM permission. The Dataproc Metastore-specific roles roles/metastore.admin, roles/metastore.editor, and roles/metastore.metadataOperator include this permission.

  • You can give export permission to users or groups by using the roles/owner and roles/editor legacy roles.

  • The Dataproc Metastore service agent (service-CUSTOMER_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com) must have storage.objects.create permission on the Cloud Storage bucket destination for your export. See the sample grant commands after this list.

    • The user creating the export must also have storage.objects.create permission on the bucket.
  • If you're using VPC Service Controls, then you can only export data to a Cloud Storage bucket that resides in the same service perimeter as the Dataproc Metastore service.
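
The following sketch grants object-create access on the destination bucket to both the service agent and the exporting user. The bucket name and user email are placeholders, and roles/storage.objectCreator is used here because it includes storage.objects.create:

    # Allow the Dataproc Metastore service agent to write export files to the bucket.
    gcloud storage buckets add-iam-policy-binding gs://my-export-bucket \
        --member="serviceAccount:service-CUSTOMER_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com" \
        --role="roles/storage.objectCreator"

    # Allow the user who runs the export to write to the same bucket.
    gcloud storage buckets add-iam-policy-binding gs://my-export-bucket \
        --member="user:USER_EMAIL" \
        --role="roles/storage.objectCreator"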

For more information, see Dataproc Metastore IAM and access control.

Exporting metadata from a service

To export metadata from a service, select the export destination on the Service detail page in the Cloud Console, use the gcloud tool, or call the Dataproc Metastore API method services.exportMetadata.

While an export is running, you can't make updates to the service, but you can still use it.

To export metadata from a Dataproc Metastore service, complete the following steps:

Console

  1. In the Cloud Console, open the Dataproc Metastore page:

    Open Dataproc Metastore in the Cloud Console

  2. On the Dataproc Metastore page, click the name of the service you'd like to export metadata from. The Service detail page opens.

  3. At the top of the page, click the Export button. The Export metadata page opens.

  4. Select the Destination.

  5. Browse to and select the Cloud Storage URI where you'd like the export to be stored.

  6. Click the Submit button to start the export.

  7. Verify that you have returned to the Service detail page, and that your export appears under Export history on the Import/Export tab.

gcloud

  1. Run the following gcloud metastore services export gcs command to export metadata from a service:

    gcloud metastore services export gcs SERVICE \
        --location=LOCATION \
        --destination-folder=gs://bucket-name/path/to/folder \
        --dump-type=DUMP_TYPE

    Replace the following:

    • SERVICE: The name of the service.
    • LOCATION: The Google Cloud region in which the service resides.
    • bucket-name/path/to/folder: The Cloud Storage destination folder for the export.
    • DUMP_TYPE: The type of database dump, either mysql or avro. Defaults to mysql.
  2. Verify that the export was successful, for example by listing the destination folder as shown below.
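
    A quick check, using the same placeholder values, is to list the exported files in the destination folder:

    # List the export output written to the destination folder.
    gcloud storage ls gs://bucket-name/path/to/folder/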

REST

Follow the API instructions to export metadata from a service by using the APIs Explorer.
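
As a rough sketch of the underlying REST call, assuming the v1 API surface and the request field names destinationGcsFolder and databaseDumpType, with placeholder project, location, service, and bucket values:

    # Call services.exportMetadata directly (placeholder values throughout).
    curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        -d '{
              "destinationGcsFolder": "gs://bucket-name/path/to/folder",
              "databaseDumpType": "MYSQL"
            }' \
        "https://metastore.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/services/SERVICE:exportMetadata"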

When the export completes, the service automatically returns to the active state, regardless of whether the export succeeded.
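
To confirm the state from the command line, a minimal sketch with placeholder values is:

    # Print the current state of the service; it should read ACTIVE.
    gcloud metastore services describe SERVICE \
        --location=LOCATION \
        --format="value(state)"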

To view a service's export history, refer to the Import/Export tab on the Service detail page in the Cloud Console.

Export caveats

  • Avro-based exports are supported for Hive versions 2.3.6 and 3.1.2.

  • A history of past exports is available in the Cloud Console. Deleting a service deletes all export history under that service.

Common failures

  • The Dataproc Metastore service agent (service-CUSTOMER_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com) doesn't have storage.objects.create permission on the Cloud Storage bucket used for the Avro or MySQL dump files. You can inspect the bucket's IAM policy with the sketch after this list.

    • The user creating the export doesn't have storage.objects.create permission on the bucket.
  • Your database is too large, so the export takes more than one hour to complete.
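
To see which principals currently hold roles on the bucket, a quick inspection sketch with a placeholder bucket name is:

    # Show the bucket's IAM policy; confirm that the service agent and the exporting
    # user hold a role that includes storage.objects.create.
    gcloud storage buckets get-iam-policy gs://my-export-bucket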

What's next