Troubleshooting Dataproc Metastore Connectivity

This page provides guidance on diagnosing and resolving common connectivity issues when connecting Dataproc clusters or Dataproc Serverless workloads to a managed Dataproc Metastore service.

Common Symptoms and Error Messages

When Dataproc encounters connectivity problems with Dataproc Metastore, you might see errors such as:

  • Unable to connect to Hive Metastore
  • Connection refused
  • Host unreachable
  • javax.jdo.JDOException or similar database connection errors
  • Timeout errors when attempting to list databases or tables, or when submitting Spark or Hive jobs that interact with the Metastore

Common Causes and Troubleshooting Steps

This section outlines common reasons why Dataproc Metastore connectivity issues occur and provides specific troubleshooting steps for each.

1. Network Configuration Issues

Network misconfigurations are the most frequent cause of connectivity failures between Dataproc workloads and Dataproc Metastore.

  • Virtual Private Cloud Network Peering or Private Service Access:

    • Dataproc Metastore instances are typically reached at a private IP address through a Virtual Private Cloud network peering connection established by Private Service Access.
    • Verify Peering Status: Verify the Virtual Private Cloud peering connection between your Dataproc workload's Virtual Private Cloud network and the service producer network for Dataproc Metastore is active and healthy. You can check this in the Google Cloud console under VPC Network > VPC Network Peering.
    • IP Range Allocation: Confirm that a sufficient IP range has been allocated for Private Service Access in your Virtual Private Cloud network.
  • Firewall Rules:

    • Verify that firewall rules in your Dataproc workload's Virtual Private Cloud network allow outbound traffic on the port used by Dataproc Metastore (default is 9083).
    • Verify there are no overly restrictive ingress rules on the service producer network side that would block traffic from your Dataproc workload.
  • DNS Resolution:

    • Confirm that the Metastore endpoint hostname (e.g., your-metastore-endpoint.us-central1.dataproc.cloud.google.com) resolves correctly to a private IP address from your Dataproc cluster or Dataproc Serverless environment.
    • Issues with Cloud DNS private zones or DNS forwarding can cause resolution failures.
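As a quick check from a cluster node or a VM in the same subnet, a short Python sketch can confirm that the endpoint hostname resolves and that the resolved address is in a private range (this is an illustrative sketch, not a Google-provided tool; "localhost" below is a stand-in for your real endpoint hostname):

```python
import ipaddress
import socket

def resolve_and_classify(host: str) -> list[tuple[str, bool]]:
    """Resolve a hostname and report whether each IPv4 address is private."""
    results = []
    for info in socket.getaddrinfo(host, None, family=socket.AF_INET):
        ip = ipaddress.ip_address(info[4][0])
        results.append((str(ip), ip.is_private))
    return results

# Substitute the host from your instance's Endpoint URI for "localhost".
for ip, private in resolve_and_classify("localhost"):
    print(f"{ip} ({'private' if private else 'PUBLIC - check your DNS setup'})")
```

A public (or missing) result here usually points at a Cloud DNS private zone or DNS forwarding problem rather than at the Metastore itself.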

Troubleshooting Steps (Network):

  1. Check Dataproc Metastore Connectivity Information:
    • In the Google Cloud console, navigate to Dataproc Metastore and select your instance.
    • Note the Endpoint URI and the Network it's connected to.
  2. Verify Virtual Private Cloud Peering or Private Service Access:
    • Go to VPC Network > VPC Network Peering. Confirm the peering connection to servicenetworking-googleapis-com is ACTIVE.
  3. Use Connectivity Tests: Use Google Cloud's Connectivity Tests to diagnose the network path from a Compute Engine VM in your Dataproc workload's subnet to the Dataproc Metastore endpoint IP address and port.
  4. Check Firewall Logs: If firewall rules are suspected, analyze Cloud Firewall logs for denied connections.
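Steps 1 through 3 can also be approximated directly from a VM in the workload's subnet with a minimal Python sketch that resolves the endpoint and attempts a TCP connection on the Thrift port (the hostname below is a placeholder; 9083 is the default port noted earlier):

```python
import socket

def check_metastore(host: str, port: int = 9083, timeout: float = 5.0) -> bool:
    """Resolve `host` and attempt a TCP connection to `port` (Thrift default 9083)."""
    try:
        ip = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4][0]
    except socket.gaierror as err:
        print(f"DNS resolution failed for {host}: {err}")
        return False
    print(f"{host} resolved to {ip}")
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            print(f"TCP connection to {ip}:{port} succeeded")
            return True
    except OSError as err:
        # A timeout or refusal here suggests a firewall or peering problem.
        print(f"TCP connection to {ip}:{port} failed: {err}")
        return False

# Substitute the Endpoint URI host from your Metastore instance details;
# with this placeholder value the check reports a DNS failure.
check_metastore("your-metastore-endpoint.us-central1.dataproc.cloud.google.com")
```

A DNS failure points at step 1 territory (endpoint URI and DNS), while a resolved address with a connection timeout points at firewall rules or peering (steps 2 and 4).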

2. IAM Permissions

The service account used by your Dataproc workload needs appropriate IAM roles to access Dataproc Metastore.

  • Required Role: The service account must have the Dataproc Metastore Metadata User role (roles/metastore.metadataUser) on the Dataproc Metastore instance or project.
  • Service Agent Permissions: Verify the Dataproc service agent has sufficient permissions if Dataproc is implicitly accessing the Metastore.

Troubleshooting Steps (IAM):

  1. Identify Service Account: Determine the service account used by your Dataproc cluster or Dataproc Serverless batch.
  2. Verify IAM Roles: Go to IAM & Admin > IAM in the Google Cloud console. Check the roles assigned to the service account on the Dataproc Metastore project or instance. Grant roles/metastore.metadataUser if it is missing.
  3. Review Service Account Configuration: For more details on configuring service accounts for Dataproc workloads, see the Dataproc service accounts documentation.
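One way to verify the role binding without clicking through the console is to export the project's IAM policy with `gcloud projects get-iam-policy PROJECT_ID --format=json` and scan it for the required binding. A minimal sketch, assuming that JSON export (the role and service account names are examples):

```python
import json

# Role assumed to be required for Metastore access; adjust to your setup.
REQUIRED_ROLE = "roles/metastore.metadataUser"

def has_role(policy: dict, member: str, role: str) -> bool:
    """Return True if `member` is granted `role` in an IAM policy document."""
    for binding in policy.get("bindings", []):
        if binding.get("role") == role and member in binding.get("members", []):
            return True
    return False

# In practice: policy = json.load(open("policy.json"))
policy = {
    "bindings": [
        {"role": "roles/metastore.metadataUser",
         "members": ["serviceAccount:my-cluster-sa@my-project.iam.gserviceaccount.com"]}
    ]
}
print(has_role(policy,
               "serviceAccount:my-cluster-sa@my-project.iam.gserviceaccount.com",
               REQUIRED_ROLE))  # → True
```

Note that this only inspects project-level bindings; roles granted directly on the Metastore instance or inherited from a folder or organization will not appear in this export.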

3. Incorrect Endpoint Configuration

The Dataproc workload must be configured with the correct Dataproc Metastore endpoint URI.

Troubleshooting Steps (Endpoint):

  1. Verify Endpoint URI: Double-check the hive.metastore.uris Spark property, or any other configuration used to specify the Dataproc Metastore endpoint in your workload submission, and confirm that it matches the Endpoint URI shown in your Dataproc Metastore instance details.
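As a sanity check, the value you pass for hive.metastore.uris can be validated programmatically before submitting the workload; a small sketch (the property accepts a comma-separated list of thrift:// URIs, and the hostnames below are examples):

```python
from urllib.parse import urlparse

def validate_metastore_uri(uris: str) -> list[str]:
    """Return a list of problems found in a hive.metastore.uris value."""
    problems = []
    for part in uris.split(","):  # the property may list several URIs
        part = part.strip()
        parsed = urlparse(part)
        if parsed.scheme != "thrift":
            problems.append(f"{part}: scheme should be 'thrift', got '{parsed.scheme or 'none'}'")
        if not parsed.hostname:
            problems.append(f"{part}: missing hostname")
        if parsed.scheme == "thrift" and parsed.port is None:
            problems.append(f"{part}: missing port (Dataproc Metastore default is 9083)")
    return problems

print(validate_metastore_uri("thrift://metastore.example.com:9083"))  # → []
print(validate_metastore_uri("thrift://metastore.example.com"))       # reports the missing port
```

An empty list only confirms the URI is well-formed; the host still has to match the Endpoint URI from the instance details page.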

4. Other Considerations

  • Metastore Status: Verify that your Dataproc Metastore instance is in the ACTIVE state in the Google Cloud console. If it is in another state, such as ERROR, address the Metastore's own issues first.
  • Version Compatibility: While rare, verify there are no known compatibility issues between your Dataproc image version and the Dataproc Metastore version.
  • Cloud SQL Proxy versus Managed Service: If you are using Cloud SQL as a Hive metastore through the cloud-sql-proxy.sh initialization action, refer to the troubleshooting guidance in the Cloud SQL Proxy initialization action README instead.

What's next