This page explains how to control communication between your cluster's Pods and Services using GKE's network policy enforcement.
You can also control Pods' egress traffic to any endpoint or Service outside of the cluster using fully qualified domain name (FQDN) network policies. For more information, see Control communication between Pods and Services using FQDNs.
About GKE network policy enforcement
Network policy enforcement lets you create Kubernetes Network Policies in your cluster. By default, all Pods within a cluster can communicate with each other freely. Network policies create Pod-level firewall rules that determine which Pods and Services can access one another inside your cluster.
Defining network policies helps you implement defense in depth when your cluster serves a multi-level application. For example, you can create a network policy to ensure that a compromised front-end service in your application cannot communicate directly with a billing or accounting service several levels down.
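As a minimal sketch of the multi-level scenario, the following manifest restricts ingress to billing Pods so that only Pods from a middle tier can reach them. The policy name, the labels (app: billing, tier: backend), and port 8443 are hypothetical and would need to match your own workloads.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: billing-allow-backend-only   # hypothetical name
spec:
  # Applies to Pods labeled app=billing (assumed label).
  podSelector:
    matchLabels:
      app: billing
  policyTypes:
  - Ingress
  ingress:
  # Only Pods labeled tier=backend in the same namespace may connect.
  - from:
    - podSelector:
        matchLabels:
          tier: backend
    ports:
    - protocol: TCP
      port: 8443   # assumed service port
```

Because the policy selects the billing Pods for ingress, traffic from any Pod that does not match the rule, including a front-end Pod, is denied.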
Network policy can also make it easier for your application to host data from multiple users simultaneously. For example, you can provide secure multi-tenancy by defining a tenant-per-namespace model. In such a model, network policy rules can ensure that Pods and Services in a given namespace cannot access other Pods or Services in a different namespace.
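One possible shape for a tenant-per-namespace rule is sketched below. It assumes a namespace named tenant-a (hypothetical) and would be repeated in every tenant namespace. It allows ingress only from Pods in the same namespace, so it also blocks traffic from outside the cluster unless another policy allows it.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only   # hypothetical name
  namespace: tenant-a               # hypothetical tenant namespace
spec:
  podSelector: {}          # every Pod in tenant-a
  policyTypes:
  - Ingress
  ingress:
  - from:
    # A podSelector without a namespaceSelector matches only Pods
    # in the policy's own namespace.
    - podSelector: {}
```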
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
Requirements and limitations
The following requirements and limitations apply to both Autopilot and Standard clusters:
- You must allow egress to the metadata server.
- If you specify an endPort field in a Network Policy on a cluster that has GKE Dataplane V2 enabled, it might not take effect starting in GKE version 1.22. For more information, see Network Policy port ranges don't take effect. For Autopilot clusters, GKE Dataplane V2 is always enabled.
- You must allow egress to the metadata server if you use network policy with Workload Identity Federation for GKE.
- Enabling network policy enforcement increases the memory footprint of the kube-system process by approximately 128 MB, and requires approximately 300 millicores of CPU. This means that if you enable network policies for an existing cluster, you might need to increase the cluster's size to continue running your scheduled workloads.
- Enabling network policy enforcement requires that your nodes be re-created. If your cluster has an active maintenance window, your nodes are not automatically re-created until the next maintenance window. If you prefer, you can manually upgrade your cluster at any time.
- The required minimum cluster size to run network policy enforcement is three e2-medium instances or one machine type instance with more than 1 allocatable vCPU. See GKE known issues for more details.
- Network policy is not supported for clusters whose nodes are f1-micro or g1-small instances, as the resource requirements are too high.
For more information about node machine types and allocatable resources, see Standard cluster architecture - Nodes.
Enable network policy enforcement
Network policy enforcement is enabled by default for Autopilot clusters, so you can skip to Create a network policy.
You can enable network policy enforcement in Standard by using the gcloud CLI, the Google Cloud console, or the GKE API.
Network policy enforcement is built into GKE Dataplane V2. You do not need to enable network policy enforcement in clusters that use GKE Dataplane V2.
This change requires recreating the nodes, which can cause disruption to your running workloads. For details about this specific change, find the corresponding row in the manual changes that recreate the nodes using a node upgrade strategy and respecting maintenance policies table. To learn more about node updates, see Planning for node update disruptions.
gcloud
- In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
To enable network policy enforcement when creating a new cluster, run the following command:
gcloud container clusters create CLUSTER_NAME --enable-network-policy
Replace CLUSTER_NAME with the name of the new cluster.
To enable network policy enforcement for an existing cluster, perform the following tasks:
Run the following command to enable the add-on:
gcloud container clusters update CLUSTER_NAME --update-addons=NetworkPolicy=ENABLED
Replace CLUSTER_NAME with the name of the cluster.
Run the following command to enable network policy enforcement on your cluster, which in turn re-creates your cluster's node pools with network policy enforcement enabled:
gcloud container clusters update CLUSTER_NAME --enable-network-policy
Console
To enable network policy enforcement when creating a new cluster:
Go to the Google Kubernetes Engine page in the Google Cloud console.
Click Create.
In the Create cluster dialog, for GKE Standard, click Configure.
Configure your cluster as needed.
From the navigation pane, under Cluster, click Networking.
Select the Enable network policy checkbox.
Click Create.
To enable network policy enforcement for an existing cluster:
Go to the Google Kubernetes Engine page in the Google Cloud console.
In the cluster list, click the name of the cluster you want to modify.
Under Networking, in the Network policy field, click Edit network policy.
Select the Enable network policy for master checkbox and click Save Changes.
Wait for your changes to apply, and then click Edit network policy again.
Select the Enable network policy for nodes checkbox.
Click Save Changes.
API
To enable network policy enforcement, perform the following:
Specify the networkPolicy object inside the cluster object that you provide to projects.zones.clusters.create or projects.zones.clusters.update.
The networkPolicy object requires an enum that specifies which network policy provider to use, and a boolean value that specifies whether to enable network policy. If you enable network policy but do not set the provider, the create and update commands return an error.
Disable network policy enforcement in a Standard cluster
You can disable network policy enforcement by using the gcloud CLI, the Google Cloud console, or the GKE API. You cannot disable network policy enforcement in Autopilot clusters or clusters that use GKE Dataplane V2.
This change requires recreating the nodes, which can cause disruption to your running workloads. For details about this specific change, find the corresponding row in the manual changes that recreate the nodes using a node upgrade strategy and respecting maintenance policies table. To learn more about node updates, see Planning for node update disruptions.
gcloud
- In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
To disable network policy enforcement, perform the following tasks:
- Disable network policy enforcement on your cluster:
gcloud container clusters update CLUSTER_NAME --no-enable-network-policy
Replace CLUSTER_NAME with the name of the cluster.
After you run this command, GKE re-creates your cluster node pools with network policy enforcement disabled.
Verify that all your nodes were re-created:
kubectl get nodes -l projectcalico.org/ds-ready=true
If the operation is successful, the output is similar to the following:
No resources found
If the output is similar to the following, then you must wait for GKE to finish updating the node pools:
NAME                                             STATUS                     ROLES    AGE     VERSION
gke-calico-cluster2-default-pool-bd997d68-pgqn   Ready,SchedulingDisabled   <none>   15m     v1.22.10-gke.600
gke-calico-cluster2-np2-c4331149-2mmz            Ready                      <none>   6m58s   v1.22.10-gke.600
When you disable network policy enforcement, GKE might not update the nodes immediately if your cluster has a configured maintenance window or exclusion. For more information, see Cluster slow to update.
After all of the nodes are re-created, disable the add-on:
gcloud container clusters update CLUSTER_NAME --update-addons=NetworkPolicy=DISABLED
Console
To disable network policy enforcement for an existing cluster, perform the following:
Go to the Google Kubernetes Engine page in the Google Cloud console.
In the cluster list, click the name of the cluster you want to modify.
Under Networking, in the Network policy field, click Edit network policy.
Clear the Enable network policy for nodes checkbox and click Save Changes.
Wait for your changes to apply, and then click Edit network policy again.
Clear the Enable network policy for master checkbox.
Click Save Changes.
API
To disable network policy enforcement for an existing cluster, do the following:
Update your cluster to use networkPolicy.enabled: false using the setNetworkPolicy API.
Verify that all your nodes were re-created using the gcloud CLI:
kubectl get nodes -l projectcalico.org/ds-ready=true
If the operation is successful, the output is similar to the following:
No resources found
If the output is similar to the following, then you must wait for GKE to finish updating the node pools:
NAME                                             STATUS                     ROLES    AGE     VERSION
gke-calico-cluster2-default-pool-bd997d68-pgqn   Ready,SchedulingDisabled   <none>   15m     v1.22.10-gke.600
gke-calico-cluster2-np2-c4331149-2mmz            Ready                      <none>   6m58s   v1.22.10-gke.600
When you disable network policy enforcement, GKE might not update the nodes immediately if your cluster has a configured maintenance window or exclusion. For more information, see Cluster slow to update.
Update your cluster to use update.desiredAddonsConfig.NetworkPolicyConfig.disabled: true using the updateCluster API.
Create a network policy
You can create a network policy using the Kubernetes Network Policy API.
For further details on creating a network policy, see the Network Policies topics in the Kubernetes documentation.
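As a starting point, many setups begin with a default-deny policy and then add narrower allow rules. The following is a minimal sketch of such a policy for ingress. The policy name is hypothetical, and the manifest applies to whichever namespace you create it in.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # hypothetical name
spec:
  # An empty podSelector selects every Pod in the namespace.
  podSelector: {}
  # Listing Ingress with no ingress rules denies all inbound traffic
  # to the selected Pods.
  policyTypes:
  - Ingress
```

You could save a manifest like this to a file and create it with kubectl apply -f, then layer additional policies on top to allow specific traffic.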
Network policy and Workload Identity Federation for GKE
If you use network policy with Workload Identity Federation for GKE, you must allow egress to the following IP addresses so your Pods can communicate with the GKE metadata server.
- For clusters running GKE version 1.21.0-gke.1000 and later, allow egress to 169.254.169.252/32 on port 988.
- For clusters running GKE versions earlier than 1.21.0-gke.1000, allow egress to 127.0.0.1/32 on port 988.
- For clusters running GKE Dataplane V2, allow egress to 169.254.169.254/32 on port 80.
If you don't allow egress to these IP addresses and ports, you might experience disruptions during auto-upgrades.
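The egress requirements above could be expressed roughly as follows. This is a sketch, not a definitive configuration: the policy name is hypothetical, the rule for versions earlier than 1.21.0-gke.1000 is omitted, and TCP is assumed as the protocol because the metadata server is reached over HTTP. Keep only the rule that matches your cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-metadata-server-egress   # hypothetical name
spec:
  podSelector: {}          # every Pod in the namespace
  policyTypes:
  - Egress
  egress:
  # Clusters running GKE version 1.21.0-gke.1000 and later.
  - to:
    - ipBlock:
        cidr: 169.254.169.252/32
    ports:
    - protocol: TCP        # assumed protocol
      port: 988
  # Clusters running GKE Dataplane V2.
  - to:
    - ipBlock:
        cidr: 169.254.169.254/32
    ports:
    - protocol: TCP        # assumed protocol
      port: 80
```

Network policies are additive, so this policy only adds the metadata server to the allowed destinations. Other egress that your Pods need, such as DNS, still has to be allowed by this or another policy.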
Migrating from Calico to GKE Dataplane V2
If you migrate your network policies from Calico to GKE Dataplane V2, consider the following limitations:
You cannot use a Pod or Service IP address in the ipBlock.cidr field of a NetworkPolicy manifest. You must reference workloads using labels. For example, the following configuration is invalid:
    - ipBlock:
        cidr: 10.8.0.6/32
You cannot specify an empty ports.port field in a NetworkPolicy manifest. If you specify a protocol, you must also specify a port. For example, the following configuration is invalid:
    ingress:
    - ports:
      - protocol: TCP
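For contrast with the invalid fragments above, the following sketch shows one way to write an equivalent rule that respects both limitations: the peer is selected by label instead of by IP address, and the protocol is paired with a port. The policy name, labels, and port number are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-labeled-client   # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: backend        # hypothetical label
  policyTypes:
  - Ingress
  ingress:
  - from:
    # Select the peer by label instead of an ipBlock with a Pod IP.
    - podSelector:
        matchLabels:
          app: client     # hypothetical label
    ports:
    # The protocol is paired with an explicit port.
    - protocol: TCP
      port: 8080          # assumed port
```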
Working with Application Load Balancers
When an Ingress is applied to a Service to build an Application Load Balancer, you must configure the network policy applied to Pods behind that Service to allow the appropriate Application Load Balancer health check IP ranges. If you are using an internal Application Load Balancer, you must also configure the network policy to allow the proxy-only subnet.
If you are not using container-native load balancing with network endpoint groups, node ports for a Service might forward connections to Pods on other nodes unless they are prevented from doing so by setting externalTrafficPolicy to Local in the Service definition. If externalTrafficPolicy is not set to Local, the network policy must also allow connections from other node IPs in the cluster.
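A policy that admits load balancer health checks might look like the following sketch. The two CIDR ranges are the health check probe ranges commonly listed in the Google Cloud load balancing documentation. Treat them as an assumption and verify them for your load balancer type. The policy name, the app: web label, port 8080, and the PROXY_ONLY_SUBNET_RANGE placeholder are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-lb-health-checks   # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: web                   # hypothetical label
  policyTypes:
  - Ingress
  ingress:
  - from:
    - ipBlock:
        cidr: 35.191.0.0/16      # assumed health check range; verify
    - ipBlock:
        cidr: 130.211.0.0/22     # assumed health check range; verify
    # Internal Application Load Balancer only: the proxy-only subnet.
    - ipBlock:
        cidr: PROXY_ONLY_SUBNET_RANGE
    ports:
    - protocol: TCP
      port: 8080                 # assumed serving port
```

Replace PROXY_ONLY_SUBNET_RANGE with the range of your proxy-only subnet, or remove that entry if you are not using an internal Application Load Balancer.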
Inclusion of Pod IP ranges in ipBlock rules
To control traffic for specific Pods, always select Pods by their namespace or Pod labels by using namespaceSelector and podSelector fields in your NetworkPolicy ingress or egress rules. Don't use the ipBlock.cidr field to intentionally select Pod IP address ranges, which are ephemeral in nature.
The Kubernetes project doesn't explicitly define the behavior of the ipBlock.cidr field when it includes Pod IP address ranges. Specifying broad CIDR ranges in this field, like 0.0.0.0/0 (which includes the Pod IP address ranges), might have unexpected results in different implementations of NetworkPolicy.
ipBlock behavior in GKE Dataplane V2
With the GKE Dataplane V2 implementation of NetworkPolicy, Pod traffic is never covered by an ipBlock rule. Therefore, even if you define a broad rule such as cidr: '0.0.0.0/0', it does not include Pod traffic. This is useful because it lets you, for example, allow Pods in a namespace to receive traffic from the internet without also allowing traffic from Pods. To also include Pod traffic, select Pods explicitly using an additional Pod or namespace selector in the ingress or egress rule definitions of the NetworkPolicy.
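To illustrate the last point, the following sketch combines a broad ipBlock with an explicit namespace selector, so that both non-Pod sources and Pods are admitted on GKE Dataplane V2. The policy name is hypothetical, and selecting all namespaces is only one possible choice. A narrower selector is usually preferable.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-internet-and-pods   # hypothetical name
spec:
  podSelector: {}                 # every Pod in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    # On GKE Dataplane V2, this entry does not match Pod traffic.
    - ipBlock:
        cidr: '0.0.0.0/0'
    # Separate entry that explicitly admits Pods from all namespaces.
    - namespaceSelector: {}
```

The entries under from are alternatives, so traffic is allowed if it matches either the ipBlock or the namespace selector.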
ipBlock behavior in Calico
For the Calico implementation of NetworkPolicy, the ipBlock rules do cover Pod traffic. With this implementation, to configure a broad CIDR range without allowing Pod traffic, explicitly exclude the cluster's Pod CIDR range, like in the following example:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-non-pod-traffic
spec:
  podSelector: {}  # apply to every Pod in the namespace
  ingress:
  - from:
    - ipBlock:
        cidr: '0.0.0.0/0'
        except: ['POD_IP_RANGE']
In this example, POD_IP_RANGE is your cluster's Pod IPv4 address range, for example 10.95.0.0/17. If you have multiple IP ranges, these can be included individually in the array, for example ['10.95.0.0/17', '10.108.128.0/17'].
Troubleshooting
Pods can't communicate with control plane on clusters that use Private Service Connect
Pods on GKE clusters that use Private Service Connect might experience a communication issue with the control plane if the Pod's egress to the control plane's internal IP address is restricted in egress network policies.
To mitigate this issue:
Confirm that your cluster uses Private Service Connect. On clusters that use Private Service Connect, if you use the master-ipv4-cidr flag when creating the subnet, GKE assigns each control plane an internal IP address from the values you defined in master-ipv4-cidr. Otherwise, GKE uses the cluster node subnet to assign each control plane an internal IP address.
Configure your cluster's egress policy to allow traffic to the control plane's internal IP address.
To find the control plane's internal IP address:
gcloud
To look for privateEndpoint, run the following command:
gcloud container clusters describe CLUSTER_NAME
Replace CLUSTER_NAME with the name of the cluster.
This command retrieves the privateEndpoint of the specified cluster.
Console
Go to the Google Kubernetes Engine page in the Google Cloud console.
From the navigation pane, under Clusters, click the cluster whose internal IP address you want to find.
Under Cluster basics, navigate to Internal endpoint, where the internal IP address is listed.
After you locate the privateEndpoint or Internal endpoint, configure your cluster's egress policy to allow traffic to the control plane's internal IP address. For more information, see Create a network policy.
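An egress rule for the control plane could be sketched as follows. CONTROL_PLANE_IP is a placeholder for the internal IP address that you located, the policy name is hypothetical, and port 443 is assumed to be the port on which the control plane serves the Kubernetes API.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-control-plane-egress   # hypothetical name
spec:
  podSelector: {}                    # every Pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: CONTROL_PLANE_IP/32    # placeholder; use your endpoint
    ports:
    - protocol: TCP
      port: 443                      # assumed API server port
```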
Cluster slow to update
When you enable or disable network policy enforcement on an existing cluster, GKE might not update the nodes immediately if the cluster has a configured maintenance window or exclusion.
You can manually upgrade a node pool by setting the --cluster-version flag to the same GKE version that the control plane is running. You must use the Google Cloud CLI to perform this operation. For more information, see caveats for maintenance windows.
Manually deployed Pods unscheduled
When you enable network policy enforcement on the control plane of an existing cluster, GKE unschedules any ip-masquerade-agent or calico node Pods that you manually deployed.
GKE does not reschedule these Pods until network policy enforcement is enabled on the cluster nodes and the nodes are recreated.
If you have configured a maintenance window or exclusion, this might cause an extended disruption.
To minimize the duration of this disruption, you can manually assign the following labels to the cluster nodes:
node.kubernetes.io/masq-agent-ds-ready=true
projectcalico.org/ds-ready=true
Network policy not taking effect
If a NetworkPolicy is not taking effect, you can troubleshoot using the following steps:
Confirm that network policy enforcement is enabled. The command that you use depends on whether your cluster has GKE Dataplane V2 enabled.
If your cluster has GKE Dataplane V2 enabled, run the following command:
kubectl -n kube-system get pods -l k8s-app=cilium
If the output is empty, network policy enforcement is not enabled.
If your cluster does not have GKE Dataplane V2 enabled, run the following command:
kubectl get nodes -l projectcalico.org/ds-ready=true
If the output is empty, network policy enforcement is not enabled.
Check the Pod labels:
kubectl describe pod POD_NAME
Replace POD_NAME with the name of the Pod.
The output is similar to the following:
Labels:       app=store
              pod-template-hash=64d9d4f554
              version=v1
Confirm that the labels on the policy match the labels on the Pod:
kubectl describe networkpolicy
The output is similar to the following:
PodSelector: app=store
In this output, the app=store labels match the app=store labels from the previous step.
Check if there are any network policies selecting your workloads:
kubectl get networkpolicy
If the output is empty, no NetworkPolicy was created in the namespace and nothing is selecting your workloads. If the output is not empty, check if the policy selects your workloads:
kubectl describe networkpolicy
The output is similar to the following:
...
PodSelector:     app=nginx
Allowing ingress traffic:
  To Port: <any> (traffic allowed to all ports)
  From:
    PodSelector: app=store
Not affecting egress traffic
Policy Types: Ingress
Known issues
StatefulSet pod termination with Calico
GKE clusters with Calico network policy enabled might experience an issue where a StatefulSet pod drops existing connections when the pod is deleted. After a pod enters the Terminating state, the terminationGracePeriodSeconds configuration in the pod spec is not honored and causes disruptions for other applications that have an existing connection with the StatefulSet pod. For more information about this issue, see Calico issue #4710.
This issue affects the following GKE versions:
- 1.18
- 1.19 to 1.19.16-gke.99
- 1.20 to 1.20.11-gke.1299
- 1.21 to 1.21.4-gke.1499
To mitigate this issue, upgrade your GKE control plane to one of the following versions:
- 1.19.16-gke.100 or later
- 1.20.11-gke.1300 or later
- 1.21.4-gke.1500 or later
Pod stuck in containerCreating state
GKE clusters with Calico network policy enabled might experience an issue where Pods get stuck in the containerCreating state.
Under the Pod Events tab, you see a message similar to the following:
plugin type="calico" failed (add): ipAddrs is not compatible with configured IPAM: host-local
To mitigate this issue, use host-local IPAM for Calico instead of calico-ipam in GKE clusters.