Private Service Connect with Dataproc Metastore

This page explains what Private Service Connect is and how to use it with Dataproc Metastore as a networking alternative to VPC peering.

Dataproc Metastore service without VPC peering

Dataproc Metastore protects access to its metadata by exposing only private IP endpoints. It also uses VPC peering to restrict connectivity to VMs in the VPC network that you provide.

With VPC peering, Dataproc Metastore requires peering quota and a reserved IP address block per region for each VPC network.

Setting up VPC peering and reserving IP addresses can be difficult in crowded VPC networks. Similarly, a VPC network might not have enough peering quota to accommodate additional peering requests. Either limitation can prevent you from creating new Dataproc Metastore services.

You can create a Dataproc Metastore service without VPC peering and address block reservations by using Private Service Connect to expose the Dataproc Metastore endpoint. Private Service Connect allows a private connection to Dataproc Metastore metadata across VPC networks.

With Private Service Connect, Dataproc Metastore requires a single address reservation in the subnetwork and a forwarding rule that targets the service attachment exposing the Dataproc Metastore endpoint. The address reservation and forwarding rule are created as part of the Dataproc Metastore service create call.

Create a Dataproc Metastore service with Private Service Connect

The following instructions demonstrate how to configure Private Service Connect during service creation.

gcloud

  1. Run the following gcloud beta metastore services create command to create a service with Private Service Connect:

    gcloud beta metastore services create SERVICE \
       --location=LOCATION \
       --consumer-subnetworks="projects/PROJECT_ID/regions/LOCATION/subnetworks/SUBNET1,projects/PROJECT_ID/regions/LOCATION/subnetworks/SUBNET2"
    
  2. Verify that the creation was successful.
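
When a service needs several consumer subnetworks, the flag value in step 1 is easier to get right if it is assembled from its parts. The following is a minimal sketch, not part of gcloud: `subnet_path` and `join_subnets` are hypothetical helper names, and the project, region, and subnet arguments are placeholders you supply.

```shell
# Hypothetical helpers (not part of gcloud) that build the value of
# --consumer-subnetworks from a project, a region, and subnet names.

# Print the full resource path of one subnetwork.
# Usage: subnet_path PROJECT_ID LOCATION SUBNET
subnet_path() {
  printf 'projects/%s/regions/%s/subnetworks/%s' "$1" "$2" "$3"
}

# Print a comma-separated list (no spaces) of subnetwork paths.
# Usage: join_subnets PROJECT_ID LOCATION SUBNET1 [SUBNET2 ...]
join_subnets() {
  js_project="$1"
  js_region="$2"
  shift 2
  js_out=""
  for js_subnet in "$@"; do
    # Prepend a comma only when the list already has an entry.
    js_out="${js_out:+$js_out,}$(subnet_path "$js_project" "$js_region" "$js_subnet")"
  done
  printf '%s\n' "$js_out"
}
```

The result could then be passed to the create command as `--consumer-subnetworks="$(join_subnets PROJECT_ID LOCATION SUBNET1 SUBNET2)"`.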

curl

Create a service using the create method:

    curl -X POST -s -i \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      -d '{"network_config":{"consumers": [{"subnetwork": "projects/PROJECT_ID/regions/LOCATION/subnetworks/SUBNET1"}]}}' \
      "https://metastore.googleapis.com/v1beta/projects/PROJECT_ID/locations/LOCATION/services?service_id=SERVICE_ID"

Dataproc Metastore reserves addresses and creates forwarding rules in each of the specified subnetworks. Each subnetwork gets a Thrift endpoint URI that you can use to access the Dataproc Metastore metadata endpoint.
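
One way to look up those endpoint URIs after creation is to describe the service and project only the relevant field. The sketch below is an illustration under assumptions: `endpoint_uris` is a hypothetical wrapper name, and the field path `networkConfig.consumers[].endpointUri` is assumed to match what the describe output reports for your service, so check the full describe output if the projection comes back empty.

```shell
# Hypothetical wrapper: print the Thrift endpoint URIs that a service
# reports for its consumer subnetworks.
# Assumes the describe output exposes them under
# networkConfig.consumers[].endpointUri.
# Usage: endpoint_uris SERVICE LOCATION
endpoint_uris() {
  gcloud beta metastore services describe "$1" \
    --location="$2" \
    --format="value(networkConfig.consumers[].endpointUri)"
}
```

For a service with a single consumer subnetwork, the output could be captured with `ENDPOINT_URI="$(endpoint_uris SERVICE LOCATION)"` and reused when attaching a cluster.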

Attach a Dataproc cluster

You can attach a Dataproc cluster that uses the Dataproc Metastore service with Private Service Connect as its Hive metastore.

There are two ways you can attach a Dataproc cluster:

  • Option 1: Set the following Hive properties when you create the Dataproc cluster.

    1. Run the following gcloud dataproc clusters create command:

       gcloud dataproc clusters create CLUSTER_NAME \
         --properties="hive:hive.metastore.uris=$ENDPOINT_URI,hive:hive.metastore.warehouse.dir=$WAREHOUSE_DIR/hive-warehouse"

  • Option 2: Update the hive-site.xml on the Dataproc cluster with the endpoint URI listed in NetworkConfig.

    1. SSH into the Dataproc cluster's master instance and perform the following:

      1. Modify /etc/hive/conf/hive-site.xml on the Dataproc cluster:

        <property>
          <name>hive.metastore.uris</name>
          <!-- Update this value. -->
          <value>ENDPOINT_URI</value>
        </property>
        <!-- Add this property entry. -->
        <property>
          <name>hive.metastore.warehouse.dir</name>
          <value>WAREHOUSE_DIR</value>
        </property>
        
      2. Restart HiveServer2:

        sudo systemctl restart hive-server2.service
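
After editing the file by hand, a quick check can confirm that the expected endpoint URI actually made it into hive-site.xml. This is a minimal sketch: `has_metastore_uri` is a hypothetical helper name, and it only looks for a matching `<value>` element on a single line, so it does not verify which property that value belongs to.

```shell
# Hypothetical helper: succeed if the given file contains a <value>
# element holding exactly the expected metastore endpoint URI.
# Note: this does not check the enclosing <name>, only the value text.
# Usage: has_metastore_uri FILE ENDPOINT_URI
has_metastore_uri() {
  grep -qF -- "<value>$2</value>" "$1"
}
```

On the master instance, it could be used as `has_metastore_uri /etc/hive/conf/hive-site.xml "$ENDPOINT_URI" && echo "endpoint configured"`.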
        

Private Service Connect caveats for Dataproc Metastore

  • Dataproc Metastore service endpoints that use Private Service Connect support access only from subnetworks in the same region as the service.
  • Reverse connectivity is not possible, so Kerberos configuration is not supported with a Private Service Connect setup.
  • You can't add subnetworks to or remove subnetworks from an existing Dataproc Metastore service. To change the subnetworks, you must recreate the service.
  • You can't switch a Dataproc Metastore service from a Private Service Connect setup to a peering setup, or vice versa.
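
Because subnetworks can't be changed after creation and must be in the service's region, it can help to check each subnetwork path before running the create command. The following is a sketch under assumptions: `subnet_region` and `same_region` are hypothetical helper names, and the path is assumed to follow the `projects/PROJECT_ID/regions/REGION/subnetworks/SUBNET` form used earlier on this page.

```shell
# Hypothetical helper: print the region segment of a subnetwork path of
# the form projects/PROJECT_ID/regions/REGION/subnetworks/SUBNET.
# Prints nothing if the path does not match that form.
subnet_region() {
  printf '%s\n' "$1" |
    sed -n 's|^projects/[^/]*/regions/\([^/]*\)/subnetworks/[^/]*$|\1|p'
}

# Succeed only if the subnetwork is in the given service location.
# Usage: same_region SUBNETWORK_PATH LOCATION
same_region() {
  [ "$(subnet_region "$1")" = "$2" ]
}
```

For example, `same_region "projects/PROJECT_ID/regions/LOCATION/subnetworks/SUBNET1" LOCATION || echo "subnetwork is in a different region"` flags a mismatch before the service is created.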
