This document explains how to specify a network and a subnetwork when you create BigQuery Engine for Apache Flink deployments. Virtual Private Cloud (VPC) networks, subnets, and firewall rules allow your deployments and jobs to communicate with your resources while maintaining a secure system. Configuring these elements correctly helps to ensure that your jobs run as expected and that your jobs and resources remain secure.
This document requires that you know how to create Google Cloud networks and subnetworks. This document also requires familiarity with the network terms discussed in the next section.
Google Cloud network terminology
VPC network. A VPC network is a virtual version of a physical network that is implemented inside of Google's production network. Sometimes called a network, a VPC provides connectivity for resources in a project.
For more information, see VPC network overview.
Subnetwork or subnet. The terms subnetwork and subnet are synonymous. Each VPC network consists of one or more IP address ranges called subnets. Subnets are regional resources. A network must have at least one subnet before you can use it.
For more information, see Subnets.
Firewall rules. VPC firewall rules allow or deny connections to and from virtual machine (VM) instances in your VPC network. You define the firewall rules at the network level, but connections are allowed or denied on a per-instance basis.
For more information, see Virtual Private Cloud (VPC) firewall rules.
Network and subnetwork for deployments
When you create a deployment, you must specify a network and a subnetwork. For your job to run, the resources that your job uses need to be in the same VPC network that you specify in your deployment. These resources include both Google Cloud resources and external resources.
For example, if your job reads from a standalone Apache Kafka cluster, the Apache Kafka cluster needs to be in the same VPC network as the BigQuery Engine for Apache Flink deployment.
The deployment and the resources don't need to be in the same subnetwork.
Guidelines for specifying a network parameter
You can select an auto mode VPC network in your project with the network parameter.
You can specify a network using only its name and not the complete URL.
You can't use a Shared VPC network or VPC Service Controls.
Although the default network has configurations that allow deployments to run jobs, for security reasons, we recommend that you create a separate network for BigQuery Engine for Apache Flink. The default network is not secure, because it is pre-populated with firewall rules that allow incoming connections to instances.
Guidelines for specifying a subnetwork parameter
Specify a subnet by using the subnet name.
You must select a subnet in the same region as your deployment.
The subnet must reside inside the VPC that you specify when you create the deployment.
The subnet must have Private Google Access enabled. For more information, see Configure Private Google Access.
Each subnet defines a range of IPv4 addresses. IPv6 subnet ranges are not supported.
You can provide at most one subnet for each deployment.
The IP address range in the subnet must be large enough to support all of your task slots.
The IP address range 172.16.0.0/14 (172.16.0.0 - 172.19.255.255) is reserved. You can't use this range in your subnet.
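Several of the subnet guidelines above can be checked before you create a deployment. The following Python sketch is a minimal illustration that uses only the standard library. The function name check_subnet is hypothetical. The sizing check relies on two assumptions that the guidelines don't state: one IP address is needed for each task slot, and four addresses in the primary range are reserved by Google Cloud. Treat the result as a rough pre-check, not as the service's own validation.

```python
import ipaddress

# Reserved range named in the guidelines; a deployment subnet must not use it.
RESERVED_RANGE = ipaddress.ip_network("172.16.0.0/14")

# Assumption: four addresses in each primary IPv4 subnet range are unusable.
RESERVED_PER_SUBNET = 4


def check_subnet(cidr: str, task_slots: int) -> list[str]:
    """Return the problems found for a subnet range; an empty list means none were detected."""
    net = ipaddress.ip_network(cidr, strict=True)
    if net.version != 4:
        return ["IPv6 subnet ranges are not supported"]

    problems = []
    if net.overlaps(RESERVED_RANGE):
        problems.append(f"{net} overlaps the reserved range {RESERVED_RANGE}")

    # Assumption: one address per task slot (the guidelines give no exact ratio).
    usable = net.num_addresses - RESERVED_PER_SUBNET
    if usable < task_slots:
        problems.append(f"{net} has {usable} usable addresses, fewer than {task_slots} task slots")
    return problems


if __name__ == "__main__":
    for cidr, slots in [("10.0.0.0/24", 100), ("172.17.0.0/16", 10), ("10.0.0.0/29", 10)]:
        print(cidr, slots, check_subnet(cidr, slots) or "no problems detected")
```

The check for Private Google Access and for the subnet's region is left out, because both require a call to Google Cloud rather than local arithmetic.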
Create a deployment with the network specified
The following example shows how to specify a network and subnetwork when you create your deployment.
gcloud
To create a deployment by using the gcloud CLI, use the gcloud alpha managed-flink deployments create command.
gcloud alpha managed-flink deployments create DEPLOYMENT_ID \
--project=PROJECT_ID \
--location=REGION \
--network-config-vpc=NETWORK_NAME \
--network-config-subnetwork=SUBNET_NAME \
--max-slots=TASK_SLOTS
Replace the following:
DEPLOYMENT_ID: the name of your deployment
PROJECT_ID: your project ID
REGION: a BigQuery Engine for Apache Flink region, like us-central1
NETWORK_NAME: the name of your network. To use the default network, enter default.
SUBNET_NAME: the name of your subnet. To use the default subnet, enter default.
TASK_SLOTS: the number of task slots to assign to the deployment
Firewall rules
Access management between BigQuery Engine for Apache Flink jobs, Google Cloud resources, and user-owned resources is supported by VPC firewall rules. Firewall rules allow or deny traffic to and from the resources that your jobs use.
When you assign a VPC network and subnet to your deployment, the deployment follows all of the rules that you define in your VPC firewall rules. Therefore, you can use firewall ingress and egress rules to control which resources your deployment and job can access.
For more information, see Use VPC firewall rules.
Example firewall ingress rule
The following example creates a firewall ingress rule.
A project owner, editor, or security administrator can use the gcloud compute firewall-rules create command to create an ingress allow rule that permits traffic between the deployment and a resource used by a job, such as a standalone Apache Kafka cluster or Cloud SQL.
gcloud compute firewall-rules create FIREWALL_RULE_NAME_INGRESS \
--network=NETWORK \
--action=allow \
--direction=ingress \
--source-ranges=SOURCE_RANGE \
--destination-ranges=DESTINATION_RANGE \
--priority=PRIORITY_NUM
Replace the following:
FIREWALL_RULE_NAME_INGRESS: a name for the firewall rule
NETWORK: the name of your VPC network
SOURCE_RANGE: the range of IPv4 addresses for the subnet of the deployment, as a comma-delimited list in CIDR format
DESTINATION_RANGE: the range of IPv4 addresses for the resource that the job needs to access, as a comma-delimited list in CIDR format. Include the selected subnetwork's primary IP address range.
PRIORITY_NUM: the priority of the firewall rule. Lower numbers have higher priorities, and 0 is the highest priority.
For more information and examples, see Create VPC firewall rules.
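If you script the creation of ingress rules, it can help to validate the inputs before calling the gcloud CLI. The following Python sketch assembles the same command shown in the example above and only prints it. The function name ingress_rule_command and the sample rule name, network name, and address ranges are hypothetical. The 0 to 65535 priority bounds reflect the range that VPC firewall rules accept. The sketch mirrors only the flags in the example, so your own rule might need additional flags, such as protocols and ports.

```python
import ipaddress
import shlex


def ingress_rule_command(name, network, source_ranges, destination_ranges, priority):
    """Build the gcloud argument list for an ingress allow rule; raise ValueError on bad input."""
    # VPC firewall rule priorities range from 0 (highest) to 65535 (lowest).
    if not 0 <= priority <= 65535:
        raise ValueError("priority must be between 0 and 65535")

    # Each range must parse as an IPv4 network in CIDR format.
    for cidr in [*source_ranges, *destination_ranges]:
        if ipaddress.ip_network(cidr).version != 4:
            raise ValueError(f"{cidr} is not an IPv4 range")

    return [
        "gcloud", "compute", "firewall-rules", "create", name,
        f"--network={network}",
        "--action=allow",
        "--direction=ingress",
        f"--source-ranges={','.join(source_ranges)}",
        f"--destination-ranges={','.join(destination_ranges)}",
        f"--priority={priority}",
    ]


if __name__ == "__main__":
    # Hypothetical values: a deployment subnet and a Kafka cluster range.
    cmd = ingress_rule_command(
        "allow-flink-to-kafka", "flink-network", ["10.0.0.0/24"], ["10.1.0.0/24"], 1000
    )
    print(shlex.join(cmd))  # Review the printed command before running it.
```

Printing the command instead of running it keeps the sketch free of side effects. To run it, you could pass the returned list to subprocess.run.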