Configuring Network Access for Dataproc Metastore

This page provides detailed guidance on configuring network access for your Dataproc Metastore instances. Correct network setup is essential for Dataproc clusters and Dataproc Serverless workloads to securely and privately communicate with your managed Dataproc Metastore service.

Key Networking Concepts

Dataproc Metastore instances typically reside within a Google-managed service producer network and communicate with your Virtual Private Cloud network using private connectivity. Understanding the following concepts is crucial for a successful setup:

  • Shared VPC: If your Dataproc clusters or Dataproc Serverless workloads run in a service project that uses a Shared VPC network from a host project, apply the network configuration described on this page (Private Service Access, firewall rules, and DNS) in the host project. For more information, see Shared VPC overview.
  • Private Google Access: This per-subnet setting lets Virtual Machine (VM) instances that have only internal IP addresses reach Google APIs and services. It matters when your Dataproc workloads run without external IP addresses. The private path to the Dataproc Metastore endpoint itself is provided by Private Service Access, described under Configuration Steps. For more information, see Private Google Access.
  • VPC Network Peering: This mechanism enables private IP connectivity between two Virtual Private Cloud networks, allowing resources in one network to communicate with resources in the other using internal IP addresses. Dataproc Metastore establishes a managed VPC Network Peering connection to your Virtual Private Cloud network as part of its setup. For more information, see VPC Network Peering.
  • Firewall Rules: Firewall rules must permit traffic between your Dataproc workloads and the Dataproc Metastore instance, in particular outbound TCP traffic from the workloads to the metastore on port 9083.
  • Cloud DNS Resolution: Verify that DNS resolution is correctly configured within your Virtual Private Cloud network to resolve the Dataproc Metastore endpoint URI to its private IP address.

Configuration Steps

To set up network access for your Dataproc Metastore instance, follow these steps:

1. Configure Private Service Access

Dataproc Metastore uses Private Service Access to establish a private connection between your Virtual Private Cloud network and the Google-managed service producer network where your Dataproc Metastore instance resides.

  • Verify Private Service Access Connection:
    1. In the Google Cloud console, go to VPC network > VPC network peering.
    2. Verify that a peering connection named servicenetworking-googleapis-com exists and its state is ACTIVE.
    3. If this connection is missing or not active, follow the instructions in Configuring Private Service Access. This includes allocating an IP address range for the service producer network.
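
The check in the steps above can also be scripted. The following is a minimal sketch, assuming you have already exported the peerings of your network as a list of objects with `name` and `state` fields (the shape used by the `peerings` entries of a Compute Engine network resource). The function name and the sample data are hypothetical and only illustrate the logic.

```python
# Name of the peering that Private Service Access creates (from the steps above).
PSA_PEERING_NAME = "servicenetworking-googleapis-com"


def psa_peering_is_active(peerings):
    """Return True if the Private Service Access peering exists and is ACTIVE.

    `peerings` is assumed to be an iterable of dicts, each with at least
    the keys "name" and "state".
    """
    for peering in peerings:
        if peering.get("name") == PSA_PEERING_NAME:
            return peering.get("state") == "ACTIVE"
    # No peering with the expected name was found.
    return False


# Hypothetical sample data for illustration only.
sample_peerings = [
    {"name": "some-other-peering", "state": "ACTIVE"},
    {"name": "servicenetworking-googleapis-com", "state": "ACTIVE"},
]
print(psa_peering_is_active(sample_peerings))
```

If the function returns False, the peering is either missing or not in the ACTIVE state, and the Private Service Access setup in step 3 applies.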

2. Configure Firewall Rules

Make sure that firewall rules in your Virtual Private Cloud network (or in the Shared VPC host project, if applicable) allow the required traffic.

  • Egress Rule from Workload to Metastore:
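    • Verify that outbound TCP traffic from your Dataproc cluster or Dataproc Serverless workloads to the IP address range of your Dataproc Metastore instance is allowed on port 9083, the default port for Hive Metastore. Virtual Private Cloud networks include an implied rule that allows all egress, so an explicit allow rule is needed only if you have added rules that deny or restrict egress.
    • With Private Service Access, this traffic travels over the peering connection using internal IP addresses.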
  • Ingress Rules (less common for client-to-Metastore):
    • Generally, you don't need to configure ingress rules on your Virtual Private Cloud network for traffic from the Dataproc Metastore instance to your workload, because connections originate from the workload. Virtual Private Cloud firewall rules are stateful, so responses to connections that the workload initiates are allowed automatically.
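
One way to confirm that the firewall rules above behave as intended is to attempt a TCP connection to the metastore on port 9083 from a machine in the workload's subnet. The following is a minimal sketch using only the Python standard library. The function name is hypothetical, and the address in the usage comment is a placeholder for your instance's private IP address.

```python
import socket


def can_reach_metastore(host, port=9083, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    This only checks that the TCP handshake succeeds. It does not
    speak the Hive Metastore Thrift protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage (placeholder address, replace with your metastore's IP):
# print(can_reach_metastore("10.0.0.5"))
```

A False result does not identify the cause. A blocked egress rule, a missing peering connection, or an incorrect IP address can all produce it, so treat it as a prompt to recheck the earlier steps.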

3. Verify DNS Resolution

Your Dataproc workloads need to resolve the Dataproc Metastore endpoint URI to its private IP address.

  • DNS Peering or Private Zones: If you are using custom DNS servers or private Cloud DNS zones, verify that DNS queries for the Dataproc Metastore endpoint (e.g., your-metastore-endpoint.us-central1.dataproc.cloud.google.com) are correctly forwarded or resolved to the private IP range used by Private Service Access.
  • Testing DNS Resolution: From a VM within the same subnet as your Dataproc workload, use nslookup or dig to verify that the Dataproc Metastore endpoint resolves to a private IP address.
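
As an alternative to nslookup or dig, the same check can be sketched in Python with the standard library. The function names below are hypothetical. Note that `ipaddress` treats loopback and link-local addresses as private too, so this is a coarse check rather than a test against your specific allocated range.

```python
import ipaddress
import socket


def resolve_addresses(hostname):
    """Return the sorted, de-duplicated IP addresses that hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """Return True if the list is non-empty and every address is private."""
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )


# Example usage (replace the placeholder with your metastore endpoint host):
# addresses = resolve_addresses("your-metastore-endpoint.example")
# print(addresses, all_private(addresses))
```

Run this from a VM in the same subnet as the workload, since DNS answers can differ depending on which network the query comes from.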

Troubleshooting Network Connectivity

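If you encounter connectivity issues after configuring network access, recheck each of the configuration steps above: the state of the Private Service Access peering connection, the firewall rules for outbound TCP traffic on port 9083, and DNS resolution of the Dataproc Metastore endpoint.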

What's next