Dataproc Metastore networking overview

Dataproc Metastore services use private IPs, which provide many benefits. Private IPs provide lower network latency than public IPs. You can connect through private IP from any region. You can also connect through Shared VPCs between projects.

You can also use private IPs to connect hosts in your VPC network with a Dataproc Metastore service by peering your VPC network with the Dataproc Metastore service producer VPC network. Google allocates IP address ranges from your VPC network's address space for use by the Dataproc Metastore service producer project.

Example

In the following example, Google allocates the 10.100.0.0/17 and 10.200.0.0/20 address ranges in the customer VPC network for Google services and uses the address ranges in a peered VPC network.

Figure: Customer, service, and Cloud SQL projects networking diagram.

  • On the Google services side of the VPC peering, Google creates a project for the customer. The project is isolated, meaning no other customers share it and the customer is billed for only the resources the customer provisions.
  • When you create the first Dataproc Metastore service in a region, Dataproc Metastore allocates a /17 range and a /20 range in the customer's network for use by all future Dataproc Metastore services in that region and network. Dataproc Metastore further subdivides these ranges to create subnetworks and address ranges in the service producer project.
  • VM instances in the customer's network can access Dataproc Metastore service resources in any region if the Google Cloud service supports it. Some Google Cloud services might not support cross-region communication.
  • Egress costs for cross-regional traffic, where a VM instance communicates with resources in a different region, still apply.
  • Google assigns the Dataproc Metastore service the IP address 10.100.0.100. In the customer VPC network, requests with a destination of 10.100.0.100 are routed through the VPC peering to the service producer's network. After a request reaches the service network, routes in that network direct it to the correct resource.
  • Traffic between VPC networks travels internally within Google's network, not through the public internet.
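
To make the address arithmetic in this example concrete, the following minimal sketch uses Python's standard ipaddress module with the ranges and the service address quoted above. The variable names are illustrative only and aren't part of any Dataproc Metastore API.

```python
import ipaddress

# Ranges and service address taken from the example above.
large_range = ipaddress.ip_network("10.100.0.0/17")
small_range = ipaddress.ip_network("10.200.0.0/20")
service_ip = ipaddress.ip_address("10.100.0.100")

# Number of addresses that each prefix length covers.
print(large_range.num_addresses)  # 32768
print(small_range.num_addresses)  # 4096

# The service address lies inside the /17 range and outside the /20 range.
print(service_ip in large_range)  # True
print(service_ip in small_range)  # False
```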

Network issues

Dataproc Metastore allocates a /17 range and a /20 range from the address space for each region. For example, placing Dataproc Metastore services in two regions requires that the allocated IP address range contains at least two unused address blocks of size /17 and two unused address blocks of size /20.
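
As a rough illustration of this sizing requirement, the sketch below computes how many addresses a given number of regions needs and checks whether a candidate address space is large enough. The helper names `addresses_needed` and `could_fit` are hypothetical, and the check is only a necessary condition: it assumes a single IPv4 CIDR block and doesn't verify that suitably aligned, unused /17 and /20 blocks actually exist.

```python
import ipaddress

def addresses_needed(regions: int) -> int:
    """Addresses consumed by one /17 plus one /20 block per region."""
    per_region = 2 ** (32 - 17) + 2 ** (32 - 20)  # 32768 + 4096
    return regions * per_region

def could_fit(space: str, regions: int) -> bool:
    """Size-only check: is the address space at least as large as required?"""
    return ipaddress.ip_network(space).num_addresses >= addresses_needed(regions)

print(addresses_needed(2))            # 73728
print(could_fit("10.0.0.0/16", 2))    # False: a /16 holds only 65536 addresses
print(could_fit("10.0.0.0/15", 2))    # True: a /15 holds 131072 addresses
```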

Connections to a Dataproc Metastore service using a private IP address use RFC 1918 address ranges. If RFC 1918 address blocks aren't found, then Dataproc Metastore finds suitable non-RFC 1918 address blocks instead. Note that the allocation of non-RFC 1918 blocks doesn't take into account whether or not those addresses are in use in your VPC network or on-premises.
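
Because non-RFC 1918 allocations aren't checked against addresses that you already use, you might want to check for overlaps yourself. The following sketch shows one way to do that with Python's ipaddress module. The helper names and the sample ranges are hypothetical; substitute the allocated block and the ranges that are in use in your own VPC network or on-premises environment.

```python
import ipaddress

# The three private address blocks defined by RFC 1918.
RFC_1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def is_rfc1918(block: str) -> bool:
    """Return True if the IPv4 block lies entirely inside an RFC 1918 range."""
    net = ipaddress.ip_network(block)
    return any(net.subnet_of(private) for private in RFC_1918)

def conflicts(block: str, in_use: list[str]) -> list[str]:
    """Return the in-use ranges that overlap the given block."""
    net = ipaddress.ip_network(block)
    return [r for r in in_use if net.overlaps(ipaddress.ip_network(r))]

# Hypothetical ranges already in use in a VPC network or on-premises.
my_ranges = ["100.64.32.0/20", "192.168.1.0/24"]

print(is_rfc1918("10.100.0.0/17"))            # True
print(is_rfc1918("100.64.0.0/17"))            # False
print(conflicts("100.64.0.0/17", my_ranges))  # ['100.64.32.0/20']
```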

Security

Traffic over VPC Network Peering is provided with a certain level of encryption. For more information, see Google Cloud's virtual network encryption and authentication.

Creating one VPC network for each service with a private IP address provides better network isolation than putting all services in the "default" VPC network.

Quick reference for networking topics

  • Shared VPC networks: You can create Dataproc Metastore services in a Shared VPC network. To set up shared VPC when creating a service, see Set up shared VPC.
  • Legacy networks: You can't connect to a Dataproc Metastore service from a legacy network. Legacy networks don't support VPC Network Peering.
  • Existing Dataproc Metastore services: You can't change the network that a Dataproc Metastore service is connected to.
  • Static IP addresses: The private IP address of the Dataproc Metastore service doesn't change.
  • VPC Network Peering: You don't create the VPC Network Peering explicitly, because the peering is internal to Google Cloud. After you create a Dataproc Metastore service, you can see its underlying VPC Network Peering on the VPC Network Peering page in the Google Cloud console. The peering is shared by services in the same region and network. Don't delete it. For more information, see VPC Network Peering.
  • VPC Service Controls: VPC Service Controls improve your ability to mitigate the risk of data exfiltration. With VPC Service Controls, you create perimeters around the Dataproc Metastore service. VPC Service Controls restrict access to resources within the perimeter from the outside. Only clients and resources within the perimeter can interact with one another. For more information, see Overview of VPC Service Controls. Also review Dataproc Metastore limitations when using VPC Service Controls. To use VPC Service Controls with Dataproc Metastore, see VPC Service Controls with Dataproc Metastore.
  • Transitive peering: Only directly peered networks can communicate. Transitive peering is not supported. In other words, if VPC network N1 is peered with N2 and N3, but N2 and N3 are not directly connected, VPC network N2 can't communicate with VPC network N3 over VPC Network Peering. In addition, VMs in networks peered with your Dataproc Metastore project network can't reach Dataproc Metastore. Only hosts on the VPC network can reach Dataproc Metastore.

Clients in multiple projects can connect to a Dataproc Metastore service in one project by using Shared VPC networks.
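
The transitive peering rule described above can be modeled as a lookup over direct peerings only. The following is a toy sketch, not a representation of how Google Cloud implements peering. The `can_communicate` helper is hypothetical, and the network names N1, N2, and N3 come from the example in the rule.

```python
def can_communicate(peerings: set[frozenset[str]], a: str, b: str) -> bool:
    """Two networks communicate only if they are identical or directly peered."""
    return a == b or frozenset((a, b)) in peerings

# N1 is peered with N2 and with N3; N2 and N3 have no direct peering.
peerings = {frozenset(("N1", "N2")), frozenset(("N1", "N3"))}

print(can_communicate(peerings, "N1", "N2"))  # True
print(can_communicate(peerings, "N3", "N1"))  # True
print(can_communicate(peerings, "N2", "N3"))  # False: no transitive path through N1
```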

What's next