This page provides detailed guidance on configuring network access for your Dataproc Metastore instances. Correct network setup is essential for Dataproc clusters and Dataproc Serverless workloads to securely and privately communicate with your managed Dataproc Metastore service.
Key Networking Concepts
Dataproc Metastore instances typically reside within a Google-managed service producer network and communicate with your Virtual Private Cloud network using private connectivity. Understanding the following concepts is crucial for a successful setup:
- Shared Virtual Private Cloud: If your Dataproc clusters or Dataproc Serverless workloads are in a service project that uses a Shared Virtual Private Cloud network from a host project, verify the appropriate network configurations are made in the host project. For more information, see Shared Virtual Private Cloud overview.
- Private Google Access: Dataproc Metastore instances often rely on Private Google Access for private communication with your Virtual Private Cloud network. This allows Virtual Machine (VM) instances in your Virtual Private Cloud to connect to Google APIs and services using internal IP addresses. For more information, see Private Google Access.
- VPC Network Peering: This mechanism enables private IP connectivity between two Virtual Private Cloud networks, allowing resources in one network to communicate with resources in the other using internal IP addresses. Dataproc Metastore establishes a managed VPC Network Peering connection to your Virtual Private Cloud network as part of its setup. For more information, see VPC Network Peering.
- Firewall Rules: Proper firewall rules are necessary to permit traffic between your Dataproc workloads and the Dataproc Metastore instance.
- Cloud DNS Resolution: Verify that DNS resolution is correctly configured within your Virtual Private Cloud network to resolve the Dataproc Metastore endpoint URI to its private IP address.
Configuration Steps
To set up and verify network access for your Dataproc Metastore instance, follow these steps:
1. Configure Private Service Access
Dataproc Metastore uses Private Service Access to establish a private connection between your Virtual Private Cloud network and the Google-managed service producer network where your Dataproc Metastore instance resides.
- Verify Private Service Access Connection:
- In the Google Cloud console, go to VPC network > VPC network peering.
- Verify that a peering connection named servicenetworking-googleapis-com exists and that its state is ACTIVE.
- If this connection is missing or not active, follow the instructions in Configuring Private Service Access. This includes allocating an IP address range for the service producer network.
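The console check above can also be scripted. The following is a minimal sketch, assuming you have exported your network's description as JSON (for example with `gcloud compute networks describe NETWORK --format=json`) and that each entry under `peerings` carries `name` and `state` fields, as in the Compute Engine network resource. The helper names and the sample data are hypothetical.

```python
from typing import Any, Mapping, Optional

# Peering name that Private Service Access creates, as named in the step above.
PSA_PEERING_NAME = "servicenetworking-googleapis-com"


def find_peering(
    network: Mapping[str, Any], name: str = PSA_PEERING_NAME
) -> Optional[Mapping[str, Any]]:
    """Return the peering entry with the given name, or None if it is absent."""
    for peering in network.get("peerings", []):
        if peering.get("name") == name:
            return peering
    return None


def psa_peering_is_active(network: Mapping[str, Any]) -> bool:
    """Return True only if the Private Service Access peering exists and is ACTIVE."""
    peering = find_peering(network)
    return peering is not None and peering.get("state") == "ACTIVE"


# Hypothetical sample of a parsed network description (illustrative values only).
sample_network = {
    "name": "my-vpc",
    "peerings": [
        {"name": "servicenetworking-googleapis-com", "state": "ACTIVE"},
    ],
}

if __name__ == "__main__":
    print(psa_peering_is_active(sample_network))
```

If the function returns False, the peering is either missing or not active. In both cases the remedy is the same as in the step above: configure Private Service Access.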
2. Configure Firewall Rules
Verify that firewall rules in your Virtual Private Cloud network (or the Shared Virtual Private Cloud host project, if applicable) allow necessary traffic.
- Egress Rule from Workload to Metastore:
- Verify that an egress firewall rule allows outbound TCP traffic from your Dataproc cluster or Dataproc Serverless workloads to the IP address range of your Dataproc Metastore instance on port 9083. This is the default port for Hive Metastore.
- If using Private Service Access, this traffic will be routed privately.
- Ingress Rules (less common for client-to-Metastore):
- Generally, you don't need to configure ingress rules on your Virtual Private Cloud for traffic from the Dataproc Metastore instance to your workload, as communication typically originates from the workload. However, verify no overly restrictive ingress rules are inadvertently blocking necessary responses.
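One way to confirm that the firewall rules permit the traffic is to attempt a TCP connection on port 9083 from a VM in the workload's subnet. The following is a minimal sketch using only the Python standard library. The function name is hypothetical, and the host you pass in is whatever private IP address or endpoint host your instance exposes.

```python
import socket


def can_connect(host: str, port: int = 9083, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    A False result only shows that the connection failed. It does not say
    whether a firewall rule, routing, DNS, or the service itself is the cause.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_connect("10.0.0.5")` tests the default Hive Metastore port against a placeholder address. Replace the address with the one shown for your own instance.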
3. Verify DNS Resolution
Your Dataproc workloads need to resolve the Dataproc Metastore endpoint URI to its private IP address.
- DNS Peering or Private Zones: If you are using custom DNS servers or private Cloud DNS zones, verify that DNS queries for the Dataproc Metastore endpoint (e.g., your-metastore-endpoint.us-central1.dataproc.cloud.google.com) are correctly forwarded or resolved to the private IP range used by Private Service Access.
- Testing DNS Resolution: From a VM within the same subnet as your Dataproc workload, use nslookup or dig to verify that the Dataproc Metastore endpoint resolves to a private IP address.
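The same check that nslookup or dig performs can be scripted. The following is a minimal sketch, assuming the allocated Private Service Access range falls inside the RFC 1918 private blocks. If your allocation uses a different range, adjust the network list accordingly. The function names are hypothetical.

```python
import ipaddress
import socket
from typing import Set

# RFC 1918 private IPv4 blocks. This is an assumption about the allocated range.
RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(ip: str) -> bool:
    """Return True if the address lies in one of the RFC 1918 private blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918_NETWORKS)


def resolve_ipv4(hostname: str) -> Set[str]:
    """Resolve a hostname to its set of IPv4 addresses using the system resolver."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return {info[4][0] for info in infos}


def resolves_privately(hostname: str) -> bool:
    """Return True if the hostname resolves and every address is RFC 1918 private."""
    try:
        ips = resolve_ipv4(hostname)
    except socket.gaierror:
        return False
    return bool(ips) and all(is_rfc1918(ip) for ip in ips)
```

Run `resolves_privately(...)` with your instance's endpoint host from a VM in the same subnet as the workload. A False result means the name either did not resolve or resolved to an address outside the assumed private blocks.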
Troubleshooting Network Connectivity
If you encounter connectivity issues after configuring network access, consider the following troubleshooting steps:
- Review Dataproc Metastore Status: Verify that your Dataproc Metastore instance is in a HEALTHY state in the Google Cloud console.
- Check Cloud Logging: Examine Cloud Logging for your Dataproc Metastore instance and related Dataproc workloads for network-related error messages or connection timeouts.
- Use Network Intelligence Center Connectivity Tests: Use Google Cloud's Connectivity Tests to diagnose the network path from your Dataproc workload's VMs to the Dataproc Metastore endpoint.
- Refer to General Troubleshooting: For more detailed network diagnostics, refer to:
What's next
- Learn more about Dataproc Metastore.
- Review Dataproc networking options.
- Understand VPC Network Peering.