An export takes metadata stored in a Dataproc Metastore service and
returns either a folder of Avro files or a MySQL dump file in a Cloud Storage
folder. For an Avro export, Dataproc Metastore creates a <table-name>.avro
file for each table. Avro-based exports are supported for Hive versions 2.3.6
and 3.1.2.
This page explains how to export metadata from an existing Dataproc Metastore service.
Before you begin
- Most gcloud metastore commands require a location. You can specify the location by using the --location flag or by setting the default location.
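For example, you can set a default location once so that later commands can omit the --location flag. This is a minimal sketch; us-central1 is a placeholder region, and metastore/location is the gcloud configuration property assumed here:

```
# Set a default location for gcloud metastore commands.
gcloud config set metastore/location us-central1
```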
Access control
To export metadata, you must request an IAM role containing the metastore.services.export IAM permission. The Dataproc Metastore specific roles roles/metastore.admin, roles/metastore.editor, and roles/metastore.metadataOperator include export permission. You can give export permission to users or groups by using the roles/owner and roles/editor legacy roles.

- The Dataproc Metastore service agent (service-CUSTOMER_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com) must have storage.objects.create permission on the Cloud Storage bucket destination for your export; an example grant follows this section.
- The user creating the export must also have storage.objects.create permission on the bucket.

If you're using VPC Service Controls, then you can only export data to a Cloud Storage bucket that resides in the same service perimeter as the Dataproc Metastore service.
For more information, see Dataproc Metastore IAM and access control.
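One way to grant the service agent the required permission is to bind a role that includes storage.objects.create, such as roles/storage.objectCreator, on the destination bucket. This is a sketch, not the only option; BUCKET_NAME and CUSTOMER_PROJECT_NUMBER are placeholders:

```
# Allow the Dataproc Metastore service agent to write export files to the bucket.
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
  --member="serviceAccount:service-CUSTOMER_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"
```

The same command with a user: member can grant the exporting user's permission on the bucket.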
Export metadata from a service
To export metadata from a service, use the Service detail page in the Google Cloud console, the gcloud CLI, or the Dataproc Metastore API method services.exportMetadata.
While an export is running, no updates can be made to the service, but you can still use the service itself.
To export metadata from a Dataproc Metastore service, complete the following steps:
Console
In the Google Cloud console, open the Dataproc Metastore page.
On the Dataproc Metastore page, click the name of the service you'd like to export metadata from. The Service detail page opens.
At the top of the page, click the Export button. The Export metadata page opens.
Select the Destination.
Browse to and select the Cloud Storage URI where you'd like the export to be stored.
Click the Submit button to start the export.
Verify that you have returned to the Service detail page, and that your export appears under Export history on the Import/Export tab.
gcloud
Run the following gcloud metastore services export gcs command to export metadata from a service:

```
gcloud metastore services export gcs SERVICE \
    --location=LOCATION \
    --destination-folder=gs://bucket-name/path/to/folder \
    --dump-type=DUMP_TYPE
```
Replace the following (a filled-in example follows the list):

- SERVICE: The name of the service.
- LOCATION: Refers to a Google Cloud region.
- bucket-name/path/to/folder: Refers to the Cloud Storage destination folder.
- DUMP_TYPE: The type of database dump. Defaults to mysql.
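For example, a hypothetical invocation that writes an Avro export might look like the following; my-service, us-central1, and my-bucket are placeholder values, and avro is the non-default --dump-type value corresponding to Avro-based exports:

```
gcloud metastore services export gcs my-service \
    --location=us-central1 \
    --destination-folder=gs://my-bucket/metastore-exports \
    --dump-type=avro
```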
Verify that the export was successful.
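One way to verify, sketched below using the placeholder path from the example above, is to list the destination folder and confirm that the dump files were written:

```
# For an Avro export, expect one <table-name>.avro file per table.
gcloud storage ls --recursive gs://my-bucket/metastore-exports
```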
REST
Follow the API instructions to export metadata from a service by using the API Explorer.
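As an illustrative sketch of the underlying REST call, you can invoke services.exportMetadata with curl. The JSON field names below follow the ExportMetadataRequest message, and PROJECT_ID, LOCATION, and SERVICE are placeholders:

```
# Start an export; the call returns a long-running operation.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
        "destinationGcsFolder": "gs://bucket-name/path/to/folder",
        "databaseDumpType": "MYSQL"
      }' \
  "https://metastore.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/services/SERVICE:exportMetadata"
```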
When the export completes, the service automatically returns to the active state, regardless of whether the export succeeded.
To view a service's export history, refer to the Import/Export tab on the Service detail page in the Cloud console.
Export caveats
- Avro-based exports are supported for Hive versions 2.3.6 and 3.1.2.
- A history of past exports is available in the UI. Deleting the service itself deletes all export history under that service.
Common failures
- The Dataproc Metastore service agent (service-CUSTOMER_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com) doesn't have storage.objects.create permission on the Cloud Storage bucket used for the Avro or MySQL dump files.
- The user creating the export doesn't have storage.objects.create permission on the bucket.
- Your database file is too large, and the export process takes more than one hour to complete.
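To diagnose the two permission failures, one option (a sketch; BUCKET_NAME is a placeholder) is to inspect the destination bucket's IAM policy and confirm that both the service agent and the exporting user hold a role that includes storage.objects.create:

```
# Show which members hold roles on the destination bucket.
gcloud storage buckets get-iam-policy gs://BUCKET_NAME
```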