Troubleshooting

This page describes common error scenarios and the troubleshooting steps to resolve them.

Connectivity and networking error scenarios

If your service experiences connectivity or networking issues, check the scenarios in the following sections to see if one of them is causing the problem.

Service creation fails due to constraint to restrict VPC peering

Don't set the org-policy constraint to restrict VPC peering. Specifying constraints/compute.restrictVpcPeering causes your creation request to fail with an INVALID_ARGUMENT error. If you must set the constraint, use the following command to allow under:folders/270204312590:

gcloud resource-manager org-policies allow compute.restrictVpcPeering under:folders/270204312590 --organization ORGANIZATION_ID

For more information, see Organization policy constraints.

Cross-project deployment fails where service account may not exist

To create a Dataproc Metastore service that is accessible in a network belonging to a different project than the one the service belongs to, you must grant roles/metastore.serviceAgent to the service project's Dataproc Metastore service agent (service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com) in the network project's IAM policy.

gcloud projects add-iam-policy-binding NETWORK_PROJECT_ID \
    --role "roles/metastore.serviceAgent" \
    --member "serviceAccount:service-SERVICE_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com"
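To confirm that the binding took effect, you can list the members that hold the role in the network project. The following is a minimal sketch rather than an official procedure: the helper names `metastore_service_agent` and `list_service_agent_members` are hypothetical, and it assumes that the gcloud CLI is installed and authenticated.

```shell
# Hypothetical helper: build the Dataproc Metastore service agent address
# from the service project's number (address format taken from the text above).
metastore_service_agent() {
    echo "service-$1@gcp-sa-metastore.iam.gserviceaccount.com"
}

# Hypothetical helper: print every member bound to roles/metastore.serviceAgent
# in the given network project, one member per line.
list_service_agent_members() {
    gcloud projects get-iam-policy "$1" \
        --flatten="bindings[].members" \
        --filter="bindings.role:roles/metastore.serviceAgent" \
        --format="value(bindings.members)"
}

# Usage (replace the placeholders):
#   number=$(gcloud projects describe SERVICE_PROJECT_ID --format="value(projectNumber)")
#   list_service_agent_members NETWORK_PROJECT_ID \
#       | grep -F "serviceAccount:$(metastore_service_agent "$number")"
```

If the final `grep` prints nothing, the service agent is probably not bound to the role in the network project.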

For more information, see Set up a cross-project deployment.

Multi-region Dataproc Metastore service fails to create

When creating a multi-regional Dataproc Metastore service, you might see the following error:

The Dataproc Service Agent does not have permission to GET Dataproc Metastore service
'projects/<dpmsproject>/locations/nam7/services/<service>'. If using a service from the
cluster's project, please ensure the service agent has the Cloud IAM role
'roles/dataproc.serviceAgent'. Please see
https://cloud.google.com/dataproc/docs/concepts/iam/dataproc-principals#service_agent_control_plane_identity
for information on the Dataproc Service Agent, and
https://cloud.google.com/iam/docs/granting-changing-revoking-access#granting-gcloud-manual
for information on granting access.

This error occurs if your Dataproc cluster is in a different project than your Dataproc Metastore service, but you haven't granted the correct permissions to your project.

To resolve this error, you must grant the metastore.services.get permission to the Dataproc service account contained in your Dataproc Metastore project.
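One possible way to grant only that permission is through a custom role. The sketch below makes several assumptions that are not stated on this page: that the account in question is the Dataproc service agent of the cluster's project, that a custom role is acceptable in your organization, and that the role ID `metastoreServiceGetter` is free to use. The helper names are hypothetical.

```shell
# Dataproc service agent address for a given project number.
dataproc_service_agent() {
    echo "service-$1@dataproc-accounts.iam.gserviceaccount.com"
}

# Hypothetical helper: create a custom role that holds only
# metastore.services.get in the metastore project, then bind it.
#   $1 = Dataproc Metastore project ID
#   $2 = project number of the Dataproc cluster's project (assumption)
grant_metastore_get() {
    gcloud iam roles create metastoreServiceGetter \
        --project="$1" \
        --title="Metastore service getter" \
        --permissions=metastore.services.get &&
    gcloud projects add-iam-policy-binding "$1" \
        --member="serviceAccount:$(dataproc_service_agent "$2")" \
        --role="projects/$1/roles/metastoreServiceGetter"
}

# Usage (replace the placeholders):
#   grant_metastore_get METASTORE_PROJECT_ID CLUSTER_PROJECT_NUMBER
```

A predefined role that already contains the permission would also work. The custom role is used here only to keep the grant as narrow as the text describes.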

Private IP is required for network connectivity

Dataproc Metastore uses private IP only, so no public IP is exposed. This means that only VMs on the provided Virtual Private Cloud (VPC) network or on-premises (connected through Cloud VPN or Cloud Interconnect) can access the Dataproc Metastore service.

For more information, see Service access.

Required ingress-allow firewall rule for Kerberos

Kerberos requires Dataproc Metastore to initiate connections to hosts in your project network. You must create a firewall rule that allows TCP and UDP ingress on all ports from the /17 IP block that contains the Dataproc Metastore IP.
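As an illustration, the /17 block can be derived from the Dataproc Metastore IP address by keeping the first two octets and the top bit of the third. The sketch below is not an official procedure: `containing_block_17` and `allow_metastore_ingress` are hypothetical helper names, and the rule name and network are values you supply.

```shell
# Print the /17 block that contains an IPv4 address. A /17 prefix keeps the
# first two octets and only the highest bit of the third octet.
containing_block_17() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo "$1.$2.$(( $3 & 128 )).0/17"
}

# Hypothetical helper: create an ingress rule that allows TCP and UDP on all
# ports from the /17 block that contains the metastore IP.
#   $1 = firewall rule name, $2 = VPC network name, $3 = Dataproc Metastore IP
allow_metastore_ingress() {
    gcloud compute firewall-rules create "$1" \
        --network="$2" \
        --direction=INGRESS \
        --action=ALLOW \
        --rules=tcp,udp \
        --source-ranges="$(containing_block_17 "$3")"
}

# Usage (replace the placeholders):
#   allow_metastore_ingress RULE_NAME NETWORK_NAME METASTORE_IP
```

For example, an address whose third octet is below 128 falls in the block ending in `.0.0/17`, and one whose third octet is 128 or higher falls in the block ending in `.128.0/17`.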

For more information, see Firewall rules on your KDC and Firewall rules for your services.

Connection error caused by resources provisioned in Shared VPC networks

If your Dataproc Metastore service uses a network belonging to a different project, and Compute Engine is protected by the service perimeter, then the metastore project and network project must be in the same perimeter.

To add existing Dataproc Metastore projects to the perimeter, follow the instructions in Updating a service perimeter.

For more information, see VPC Service Controls with Dataproc Metastore.

Allocated IP range is exhausted

The provided VPC network may run out of available RFC 1918 addresses required by Dataproc Metastore services. If that happens, Dataproc Metastore will attempt to reserve private IP address ranges outside of the RFC 1918 ranges for service creation. For a list of supported non-RFC 1918 private ranges, see Valid ranges in the VPC network documentation.

Non-RFC 1918 private IP addresses used in Dataproc Metastore may conflict with a range in an on-premises network that is connected to the provided VPC network. To check the list of RFC 1918 and non-RFC 1918 private IP addresses reserved by Dataproc Metastore:

gcloud compute addresses list \
    --project NETWORK_PROJECT_ID \
    --filter="purpose:VPC_PEERING AND name ~ cluster|resourcegroup"

If you find a conflict that can't be mitigated by reconfiguring the on-premises network, delete the offending Dataproc Metastore service, wait 2 hours, and then re-create it.
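One way to look for such a conflict is to compare each reserved range against your on-premises CIDR blocks. The sketch below is an illustration under stated assumptions, not an official tool: `ip_to_int` and `cidrs_overlap` are hypothetical helper names, the shell is assumed to provide 64-bit arithmetic, and the ranges in the usage comment are made-up sample values.

```shell
# Convert a dotted IPv4 address to its 32-bit integer value.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) if two CIDR blocks overlap. Two blocks overlap
# exactly when they agree on the bits covered by the shorter prefix.
cidrs_overlap() {
    a_ip=${1%/*}; a_len=${1#*/}
    b_ip=${2%/*}; b_len=${2#*/}
    if [ "$a_len" -lt "$b_len" ]; then len=$a_len; else len=$b_len; fi
    mask=$(( (4294967295 << (32 - len)) & 4294967295 ))
    [ $(( $(ip_to_int "$a_ip") & mask )) -eq $(( $(ip_to_int "$b_ip") & mask )) ]
}

# Usage with sample values (a reserved range and an on-premises range):
#   if cidrs_overlap 240.10.0.0/24 240.0.0.0/8; then echo "conflict"; fi
```

The reserved ranges themselves come from the `gcloud compute addresses list` command shown earlier in this section.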

For more information, see IP address range exhaustion.

Operation timeout error scenarios

The following error scenarios result in an unresponsive service or operation timeouts.

Use Audit logs to troubleshoot operation timeouts

To troubleshoot service operation timeouts, use the Logs Explorer in the Google Cloud console to retrieve your audit log entries for your Cloud project.

In the Query builder pane, select Audited Resource or audited_resource as the Google Cloud resource type, followed by Dataproc Metastore or metastore.googleapis.com as the service. Selecting a method is optional.
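The same selection can be approximated from the command line. The following sketch assumes that the Query builder choices above map to the `resource.type` and `resource.labels` fields of the audit log entries. The helper names are hypothetical, and the one-day window and 20-entry limit are arbitrary sample values.

```shell
# Build a Logging filter for Dataproc Metastore audit log entries.
#   $1 (optional) = method name used to narrow the results
metastore_audit_filter() {
    filter='resource.type="audited_resource" AND resource.labels.service="metastore.googleapis.com"'
    if [ -n "${1:-}" ]; then
        filter="$filter AND resource.labels.method=\"$1\""
    fi
    echo "$filter"
}

# Hypothetical helper: read recent matching entries.
#   $1 = project ID, $2 (optional) = method name
read_metastore_audit_logs() {
    gcloud logging read "$(metastore_audit_filter "${2:-}")" \
        --project="$1" \
        --freshness=1d \
        --limit=20
}

# Usage (replace the placeholder):
#   read_metastore_audit_logs PROJECT_ID
```

You can also paste the output of `metastore_audit_filter` into the Logs Explorer query field instead of running the command.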

For more information, see Viewing logs.

Import and export error scenarios

This section outlines common Dataproc Metastore import and export issues.

Import fails because the Hive versions don't match

When you import metadata, the Hive metastore and Dataproc Metastore versions must be compatible. Your import may fail if the two don't match.

For more information on compatibility, see version policy.

You can check Cloud Logging to see if you encountered this error. The following message is an example of this error.

Metastore schema version is not compatible. Hive Version: 3.1.0, Database Schema Version: 2.3.0
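To compare the two versions quickly, you can pull them out of the log text and check them against the version of your service. This is a rough sketch that assumes the message format matches the example above exactly. `schema_mismatch_versions` and `service_hive_version` are hypothetical helper names.

```shell
# Read log text on stdin and, for each mismatch message, print
# "<Hive version> <database schema version>".
schema_mismatch_versions() {
    sed -n 's/.*Hive Version: \([0-9.]*\), Database Schema Version: \([0-9.]*\).*/\1 \2/p'
}

# Hypothetical helper: print the Hive version of an existing service.
#   $1 = service ID, $2 = location
service_hive_version() {
    gcloud metastore services describe "$1" \
        --location="$2" \
        --format="value(hiveMetastoreConfig.version)"
}

# Usage (replace the placeholders):
#   schema_mismatch_versions < exported-log.txt
#   service_hive_version SERVICE_ID LOCATION
```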

Import fails because there are missing Hive table files

When you prepare an Avro import, provide a file for each Hive table, even if the table is empty. Otherwise, the import fails.

You can use the Dataproc Metastore Toolkit as a reference to check the correct Hive metadata schemas.

Service agent or user doesn't have necessary permissions

The Dataproc Metastore service agent (service-CUSTOMER_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com) and the user importing the metadata must have the following read permissions on the Cloud Storage resources used for the import:

  • For MySQL, they must have storage.objects.get permission on the Cloud Storage object (SQL dump file) used for the import.

  • For Avro, they must have storage.objects.get permission on the Cloud Storage bucket used for the import.

For exports, the Dataproc Metastore service agent (service-CUSTOMER_PROJECT_NUMBER@gcp-sa-metastore.iam.gserviceaccount.com) and the user creating the export must have storage.objects.create permission on the bucket.
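As a sketch of one way to grant these permissions to the service agent, the helpers below bind a predefined Cloud Storage role at the bucket level. This is an assumption-laden illustration: `roles/storage.objectViewer` contains storage.objects.get and `roles/storage.objectCreator` contains storage.objects.create, but a bucket-level grant is broader than the per-object permission that a MySQL import needs. The helper names are hypothetical, and the user running the import or export needs an equivalent grant.

```shell
# Map an operation to a predefined role that contains the needed permission.
storage_role_for() {
    case "$1" in
        import) echo "roles/storage.objectViewer" ;;    # storage.objects.get
        export) echo "roles/storage.objectCreator" ;;   # storage.objects.create
        *) echo "usage: storage_role_for import|export" >&2; return 1 ;;
    esac
}

# Hypothetical helper: grant the role to the Dataproc Metastore service agent.
#   $1 = import|export, $2 = bucket name, $3 = customer project number
grant_bucket_access() {
    role=$(storage_role_for "$1") || return 1
    gcloud storage buckets add-iam-policy-binding "gs://$2" \
        --member="serviceAccount:service-$3@gcp-sa-metastore.iam.gserviceaccount.com" \
        --role="$role"
}

# Usage (replace the placeholders):
#   grant_bucket_access import BUCKET_NAME CUSTOMER_PROJECT_NUMBER
```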

Job fails because the database file is too big

If your database file is too large, the import or export process can exceed the job timeout limit. If an import can't complete within the 3-hour job timeout limit, you can split your metadata across multiple Dataproc Metastore instances.

Backup and restore error scenarios

This section outlines common Dataproc Metastore backup and restore issues.

Unable to create a new backup for a service

If there are already 7 backups in a service, then you must first manually delete a backup before creating a new one. You can delete existing backups from the Backup/Restore tab.
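From the command line, a comparable cleanup could look like the sketch below. It assumes that `createTime` values share one timestamp format so that they sort lexicographically. `list_backups` and `oldest_backup` are hypothetical helper names.

```shell
# Hypothetical helper: list backups as "<name><TAB><createTime>" lines.
#   $1 = service ID, $2 = location
list_backups() {
    gcloud metastore services backups list \
        --service="$1" \
        --location="$2" \
        --format="value(name,createTime)"
}

# Read "<name> <createTime>" lines on stdin and print the name of the
# backup with the earliest creation time.
oldest_backup() {
    LC_ALL=C sort -k2,2 | awk 'NR == 1 { print $1 }'
}

# Usage (replace the placeholders, and review the result before deleting):
#   list_backups SERVICE_ID LOCATION | oldest_backup
#   gcloud metastore services backups delete BACKUP_ID \
#       --service=SERVICE_ID --location=LOCATION
```

Deleting the oldest backup is only one possible policy. Choose the backup to remove based on your own retention needs.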

User doesn't have necessary permissions

To back up metadata, you must request an IAM role containing the metastore.backups.create IAM permission.

To restore metadata, you must request an IAM role containing the metastore.services.restore and metastore.backups.use IAM permissions.

Job fails because the service metadata file is too big

If your service metadata file is too large, it can take more than the one hour job timeout limit to complete the backup or restore process.

Dataproc cluster error scenarios

The following sections outline common Dataproc and self-managed cluster issues.

Dataproc cluster attachment fails due to mismatched versions

When you attach a Dataproc cluster, the Dataproc image and Dataproc Metastore Hive version must be compatible. To confirm that the Hive versions are compatible, check the Dataproc image versioning pages.

For more information, see Dataproc Image version list.

Dataproc cluster creation fails due to insufficient roles

When creating a Dataproc cluster, the cluster creation fails with the following error message:

(gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Default Service Account '**********-compute@developer.gserviceaccount.com' is missing required permissions: [dataproc.agents.create, dataproc.agents.delete, dataproc.agents.get, dataproc.agents.update, dataproc.tasks.lease, dataproc.tasks.listInvalidatedLeases, dataproc.tasks.reportStatus]. Service Accounts must have either 'Dataproc/Dataproc Worker' role or all permissions granted by the role.

This can occur when you try to create a Dataproc cluster and attach it to your Dataproc Metastore.

To fix this issue, grant the roles/dataproc.worker role to your service account. The following example shows how to do this, using sample values.

gcloud projects add-iam-policy-binding my_project \
    --member=serviceAccount:219315-compute@developer.gserviceaccount.com \
    --role=roles/dataproc.worker

Connection to metastore fails due to time zone issues

Your Hive file returns the following error: The server time zone value 'CDT' is unrecognized or represents more than one time zone.

If you can't connect to your metastore from Dataproc, then you must delete your current cluster and update your gcloud dataproc clusters create command or automation tool to include the following time zone parameter:

--metadata "hive-metastore-uri=jdbc:mysql://10.207.26.77:3306/hive_metastore?serverTimezone=CST6CDT"

Troubleshooting Google Cloud CLI issues

If you encounter an issue where a gcloud CLI command is unavailable, or if the command behaves differently from how it is documented, try updating the gcloud CLI:

gcloud components update
