Using network policy logging

This page explains how to use network policy logging for Google Kubernetes Engine (GKE).

Overview

Network policies specify network traffic that Pods are allowed to send and receive. Network policy logging lets you record when a connection is allowed or denied by a network policy. Using network policy logging, you can:

  • Verify that your network policies are working as expected.
  • Understand which Pods in your cluster are communicating with the internet.
  • Understand which namespaces are communicating with each other.
  • Recognize a Denial of Service attack.

If Cloud Logging is enabled, network policy logs are uploaded to Cloud Logging for storage, search, analysis, and alerting. Cloud Logging is enabled by default in new clusters. For more information, see Installing Cloud Operations for GKE support.

Network policy logging is only available for clusters that use Dataplane V2.

Pricing

  • There are no log generation charges for Network Policy Logging during beta.
  • If you store your logs in Cloud Logging, standard Cloud Logging charges apply.
  • Logs can be further exported to Pub/Sub, Cloud Storage, or BigQuery. Pub/Sub, Cloud Storage, or BigQuery charges may apply. For more information on exporting logs, see Overview of logs export.

About Dataplane V2

Dataplane V2 is based on eBPF and allows Linux nodes to flexibly and performantly process network packets in-kernel. Dataplane V2 includes built-in network policy enforcement and network policy logging without any third-party add-ons.

To use network policy logging, you must enable Dataplane V2 on your GKE cluster. For instructions, see the Creating a cluster with Dataplane V2 section.

Limitations

  • Dataplane V2 can only be enabled in new clusters. Existing clusters cannot be upgraded to use Dataplane V2.
  • Windows nodes do not support Dataplane V2.

Beta limitations

  • While Dataplane V2 is in beta, backwards compatibility is not guaranteed. You might have to recreate a cluster using Dataplane V2 when a new version of Dataplane V2 becomes available.
  • Some Kubernetes and GKE features are known not to work in beta:
    • Kubernetes Services that set externalTrafficPolicy: Local and are backed by Pods running with hostNetwork: true cannot receive traffic from clients outside the cluster.
    • Kubernetes network policies that use FromCIDR with CIDR ranges that select some but not all node IPs don't work.
    • Some features including NodeLocal DNSCache are not supported.
  • There is no Google Cloud Console interface for Dataplane V2.

Creating a GKE cluster with Dataplane V2

You can enable Dataplane V2 when creating new clusters with GKE 1.17.9 and later.

gcloud

To create a new cluster with Dataplane V2, use the following command:

gcloud beta container clusters create cluster-name \
    --enable-dataplane-v2 \
    --enable-ip-alias \
    --cluster-version version \
    --release-channel channel-name \
    {--region region-name | --zone zone-name}

Replace the following:

  • cluster-name: the name of your new cluster.
  • version: your cluster version, which must be GKE 1.17.9 or later.
  • channel-name: a release channel that includes GKE version 1.17.9 or later.
  • region-name or zone-name: the location of the cluster. These arguments are mutually exclusive. See Types of clusters for more information.

API

To create a new cluster with Dataplane V2, specify the datapathProvider field in the networkConfig object in your cluster create request.

The following JSON snippet shows the configuration needed to enable Dataplane V2:

"cluster":{
  "initialClusterVersion":"version",
  "ipAllocationPolicy":{
     "useIpAliases":true
  },
  "networkConfig":{
     "datapathProvider":"ADVANCED_DATAPATH"
  },
  "releaseChannel":{
     "channel":"channel-name"
  }
}

Replace the following:

  • version: your cluster version, which must be GKE 1.17.9 or later.
  • channel-name: a release channel that includes GKE version 1.17.9 or later.
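
The request fragment above can also be assembled programmatically. The following Python sketch is illustrative only: the helper names, the version string 1.17.9-gke.600, and the channel value RAPID are assumptions made for the example, and the sketch only builds the JSON fragment shown on this page without sending a request.

```python
import json
import re

MIN_VERSION = (1, 17, 9)  # minimum GKE version stated on this page


def parse_gke_version(version: str) -> tuple:
    """Extract (major, minor, patch) from strings such as '1.17.9-gke.600'."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def build_cluster_body(version: str, channel: str) -> dict:
    """Build the 'cluster' fragment with Dataplane V2 enabled (hypothetical helper)."""
    if parse_gke_version(version) < MIN_VERSION:
        raise ValueError(f"{version} is older than 1.17.9")
    return {
        "cluster": {
            "initialClusterVersion": version,
            "ipAllocationPolicy": {"useIpAliases": True},
            "networkConfig": {"datapathProvider": "ADVANCED_DATAPATH"},
            "releaseChannel": {"channel": channel},
        }
    }


if __name__ == "__main__":
    # Example values only; substitute your own version and channel.
    print(json.dumps(build_cluster_body("1.17.9-gke.600", "RAPID"), indent=2))
```

Checking the version before building the body surfaces an unsupported version locally, before a create request is sent.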

Configuring network policy logging

You configure network policy logging settings by editing the NetworkLogging object in your cluster. GKE automatically creates a NetworkLogging object named default in new Dataplane V2 clusters. There can only be one NetworkLogging object per cluster and it can't be renamed.

You can configure the logging of allowed connections and the logging of denied connections separately. You can also selectively enable logging for some network policies. The following is an example of the NetworkLogging specification, with settings specified to log all allowed and denied connections:

kind: NetworkLogging
apiVersion: networking.gke.io/v1alpha1
metadata:
  name: default
spec:
  cluster:
    allow:
      log: true
      delegate: false
    deny:
      log: true
      delegate: false

Use kubectl to edit your configuration:

kubectl edit networklogging default

NetworkLogging spec

The NetworkLogging object specification is in YAML format. The fields are described below:

  • cluster.allow (struct): Settings for logging allowed connections.
    • log (bool): If set to true, allowed connections in the cluster are logged; otherwise, allowed connections are not logged. Network policies that select the Pod and have a rule that matches the connection are listed in the log message.
    • delegate (bool): If false, all allowed connections are logged. If multiple network policies allow a connection, all matching policies are listed in the log message. If true, allowed connections are only logged if they are allowed by a network policy with the logging annotation policy.network.gke.io/enable-logging: "true". If multiple network policies allow a connection, all matching policies with the enable-logging annotation are listed in the log message. A configuration error occurs if you set spec.cluster.allow.delegate to true and spec.cluster.allow.log to false.
  • cluster.deny (struct): Settings for logging denied connections.
    • log (bool): If set to true, denied connections in the cluster are logged; otherwise, denied connections are not logged.
    • delegate (bool): If false, all denied connections are logged. If true, denied connections are only logged if the Pod where the connection was denied is in a namespace with the annotation policy.network.gke.io/enable-deny-logging: "true". A configuration error occurs if you set spec.cluster.deny.delegate to true and spec.cluster.deny.log to false.
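
The interaction between log and delegate can be easier to follow as code. The following Python sketch models the rules described above; it is not the controller's implementation. The function names are hypothetical, and treating a missing field as false is an assumption.

```python
ALLOW_ANNOTATION = "policy.network.gke.io/enable-logging"
DENY_ANNOTATION = "policy.network.gke.io/enable-deny-logging"


def validate_spec(spec: dict) -> list:
    """Return one error message per action where delegate is true but log is false."""
    errors = []
    cluster = spec.get("cluster", {})
    for action in ("allow", "deny"):
        settings = cluster.get(action, {})
        # Assumption: a missing field is treated as false.
        if settings.get("delegate", False) and not settings.get("log", False):
            errors.append(f"cluster {action}: delegate cannot be true when log is false")
    return errors


def logs_allowed(settings: dict, policy_annotations: list) -> bool:
    """Decide whether an allowed connection is logged.

    policy_annotations holds one annotations dict per matching network policy.
    """
    if not settings.get("log", False):
        return False
    if not settings.get("delegate", False):
        return True
    return any(a.get(ALLOW_ANNOTATION) == "true" for a in policy_annotations)


def logs_denied(settings: dict, namespace_annotations: dict) -> bool:
    """Decide whether a denied connection is logged for a Pod in the given namespace."""
    if not settings.get("log", False):
        return False
    if not settings.get("delegate", False):
        return True
    return namespace_annotations.get(DENY_ANNOTATION) == "true"
```

For example, validate_spec({"cluster": {"allow": {"log": False, "delegate": True}}}) returns a single error for the allow action, which is the combination the specification rejects.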

Accessing logs

The network policy logs generated on each cluster node are available locally at /var/log/network/policy_action.log*. A new numbered log file is created when the current log file reaches 10 MB. Up to five previous log files are stored.

Network policy logs are automatically uploaded to Cloud Logging. You can access logs through the Logs Viewer or with the gcloud command-line tool. You can also export logs from Cloud Logging to the sink of your choice.

gcloud

gcloud logging read --project "project-name" 'resource.type="k8s_node" AND
    resource.labels.location="cluster-location" AND
    resource.labels.cluster_name="cluster-name" AND
    logName="projects/project-name/logs/policy-action"'

Replace the following:

  • project-name: The name of your Google Cloud project.
  • cluster-location: The zone your cluster is in.
  • cluster-name: The name of your cluster.

Cloud Logging

  1. Go to the Google Cloud navigation menu and select Logging > Logs Viewer:
    Go to the Logs Viewer
  2. Use the following query to find all network policy log records:

    resource.type="k8s_node"
    resource.labels.location="cluster-location"
    resource.labels.cluster_name="cluster-name"
    logName="projects/project-name/logs/policy-action"
    

    Replace the following:

    • cluster-location: The zone your cluster is in.
    • cluster-name: The name of your cluster.
    • project-name: The name of your Google Cloud project.

See Using Logs Viewer (Preview) to learn how to use the Logs Viewer.

You can add further conditions to filter the results. For example:

  • Show logs in a certain time frame:

    timestamp>="2020-06-22T06:30:51.128Z"
    timestamp<="2020-06-23T06:30:51.128Z"
    
  • Show logs for denied connections:

    jsonPayload.disposition="deny"
    
  • Show logs for connections to a Deployment named "redis":

    jsonPayload.dest.pod_name=~"redis"
    jsonPayload.dest.pod_namespace="default"
    
  • Show logs for cluster-external connections:

    jsonPayload.dest.instance != ""
    
  • Show logs that match a certain network policy, in this case "allow-frontend-to-db":

    jsonPayload.policies.name="allow-frontend-to-db"
    jsonPayload.policies.namespace="default"
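
If you build these queries in a script, the base query and the extra conditions can be combined with a small helper. The following Python sketch is one possible approach; the function name and the example project, location, and cluster values are hypothetical, and the helper does not escape special characters in its inputs.

```python
def build_policy_log_filter(project: str, location: str, cluster: str, extra=None) -> str:
    """Join the base network policy log query with optional extra conditions."""
    conditions = [
        'resource.type="k8s_node"',
        f'resource.labels.location="{location}"',
        f'resource.labels.cluster_name="{cluster}"',
        f'logName="projects/{project}/logs/policy-action"',
    ]
    conditions.extend(extra or [])
    # Explicit AND keeps the filter valid when passed as a single line.
    return " AND ".join(conditions)


if __name__ == "__main__":
    # Example values only.
    print(build_policy_log_filter(
        "my-project", "us-central1-a", "my-cluster",
        extra=['jsonPayload.disposition="deny"'],
    ))
```

The resulting string can be pasted into the query editor or passed as the filter argument of gcloud logging read.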
    

Log format

Network policy log records are in JSON format. The fields are described below:

  • connection (struct): Connection information.
    • src_ip (string): Source IP address of the connection.
    • src_port (int): Source port of the connection.
    • dest_ip (string): Destination IP address of the connection.
    • dest_port (int): Destination port of the connection.
    • protocol (string): Protocol of the connection, which can be tcp, udp, or icmp.
    • direction (string): Direction of the connection, which can be ingress or egress.
  • src (struct): Endpoint information of the source.
    • pod_name (string): Name of the Pod, if the source is a Pod.
    • pod_namespace (string): Namespace of the Pod, if the source is a Pod.
    • instance (string): IP address of the source, if the source is not a Pod.
  • dest (struct): Endpoint information of the destination.
    • pod_name (string): Name of the Pod, if the destination is a Pod.
    • pod_namespace (string): Namespace of the Pod, if the destination is a Pod.
    • instance (string): IP address of the destination, if the destination is not a Pod.
  • disposition (string): Disposition of the connection, which can be allow or deny.
  • policies (list of structs): Matched policies for allowed connections from the enforced Pod's view. For an ingress connection, the enforced Pod is the destination Pod. For an egress connection, the enforced Pod is the source Pod. Multiple policies are logged if a connection is matched by all of them. This field is only included in logs of allowed connections.
    • name (string): Name of the matching network policy.
    • namespace (string): Namespace of the matching network policy.
  • count (int): Used for log aggregation of denied connections. The value is always 1 for allowed connections.
  • node_name (string): The node that runs the Pod that generated this log message.
  • timestamp (string): When the connection attempt occurred.

Definition of connection

For connection-oriented protocols like TCP, a log is created for each allowed or denied connection. For protocols like UDP and ICMP that aren't connection-oriented, packets are grouped into time-window based connections.

Policy logs for denied connections

The log records for denied connections do not include the policies field because the Kubernetes network policy API does not have explicit deny policies. A connection is denied if a Pod is covered by one or more network policies, but none of the policies allow the connection. This means that no policy is individually responsible for a blocked connection.
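
The following Python sketch illustrates why a denied connection has no responsible policy. It is a deliberately simplified model of ingress evaluation, not the Dataplane V2 implementation: it matches only on Pod labels and ignores namespaces, ports, IP blocks, and egress. The Policy class and function names are hypothetical, and the labels are taken from the allow-green example later on this page.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    namespace: str
    pod_selector: dict   # labels of the Pods the policy applies to
    allowed_from: list   # one label dict per permitted source selector


def matches(selector: dict, labels: dict) -> bool:
    """Return True if every selector label is present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def evaluate_ingress(policies: list, dest_labels: dict, src_labels: dict) -> tuple:
    """Return (disposition, matching policies) for one ingress connection."""
    selecting = [p for p in policies if matches(p.pod_selector, dest_labels)]
    if not selecting:
        # The Pod is not covered by any policy, so traffic is not restricted.
        # Whether such a connection is logged is not modeled here.
        return "allow", []
    allowing = [
        p for p in selecting
        if any(matches(sel, src_labels) for sel in p.allowed_from)
    ]
    if allowing:
        return "allow", [{"name": p.name, "namespace": p.namespace} for p in allowing]
    # Covered by at least one policy, but none allows the connection:
    # there is no single policy to name in the log record.
    return "deny", []
```

With a single policy that selects app: test-service and permits app: client-green, a connection from a Pod labeled app: client-red returns ("deny", []), which mirrors the missing policies field in denied log records.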

Log aggregation for denied connections

It is common for a client to retry a connection that was denied. To prevent excessive logging, repeated denied connections within a five-second window are aggregated into a single log message using the count field.

Subsequent denied connections are aggregated with a previous log message if the connection's src_ip, dest_ip, dest_port, protocol, and direction match the first denied connection. Note that the src_port of subsequent connections does not have to match because retried connections might come from a different port. The aggregated log message includes the src_port of the first denied connection at the beginning of the aggregation window.
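
The aggregation rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than the actual implementation: it assumes events arrive sorted by time, that a window starts at the first denied connection for a key, and that a connection exactly five seconds later starts a new window. The names DeniedLog and aggregate_denied are hypothetical.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 5.0  # aggregation window described above


@dataclass
class DeniedLog:
    src_ip: str
    src_port: int      # source port of the first denied connection in the window
    dest_ip: str
    dest_port: int
    protocol: str
    direction: str
    first_seen: float  # timestamp (seconds) that opened the window
    count: int = 1


def aggregate_denied(events) -> list:
    """Aggregate denied connections into log records.

    events: iterable of (timestamp_seconds, src_ip, src_port, dest_ip,
    dest_port, protocol, direction) tuples, sorted by timestamp.
    """
    open_windows = {}
    records = []
    for ts, src_ip, src_port, dest_ip, dest_port, protocol, direction in events:
        # src_port is deliberately not part of the key: retries may use a new port.
        key = (src_ip, dest_ip, dest_port, protocol, direction)
        current = open_windows.get(key)
        if current is not None and ts - current.first_seen < WINDOW_SECONDS:
            current.count += 1
        else:
            current = DeniedLog(src_ip, src_port, dest_ip, dest_port,
                                protocol, direction, ts)
            open_windows[key] = current
            records.append(current)
    return records
```

Three retries within five seconds from different source ports produce one record with count 3 and the source port of the first attempt; a fourth attempt after the window closes starts a new record.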

Example log records

The following example network policy named allow-green applied to test-service allows connections to test-service from a Pod named client-green. Implicitly, this policy denies all other ingress traffic to test-service including from the Pod client-red.

  apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: allow-green
    namespace: default
    annotations:
      policy.network.gke.io/enable-logging: "true"
  spec:
    podSelector:
      matchLabels:
        app: test-service
    ingress:
    - from:
      - podSelector:
          matchLabels:
            app: client-green
    policyTypes:
    - Ingress

The allow-green policy affects two connections to test-service as follows. The policy allows the connection from client-green. Because no policy allows the connection from client-red, the connection is denied.

The log for the allowed connection from client-green looks like this:

{
   "connection":{
      "src_ip":"10.84.0.252",
      "dest_ip":"10.84.0.165",
      "src_port":52648,
      "dest_port":8080,
      "protocol":"tcp",
      "direction":"ingress"
   },
   "disposition":"allow",
   "policies":[
      {
         "name":"allow-green",
         "namespace":"default"
      }
   ],
   "src":{
      "pod_name":"client-green-7b78d7c957-68mv4",
      "pod_namespace":"default"
   },
   "dest":{
      "pod_name":"test-service-745c798fc9-sfd9h",
      "pod_namespace":"default"
   },
   "count":1,
   "node_name":"gke-demo-default-pool-5dad52ed-k0h1",
   "timestamp":"2020-06-16T03:10:37.993712906Z"
}

The log for the denied connection from client-red looks like this:

{
   "connection":{
      "src_ip":"10.84.0.180",
      "dest_ip":"10.84.0.165",
      "src_port":39610,
      "dest_port":8080,
      "protocol":"tcp",
      "direction":"ingress"
   },
   "disposition":"deny",
   "src":{
      "pod_name":"client-red-5689846f5b-b5ccx",
      "pod_namespace":"default"
   },
   "dest":{
      "pod_name":"test-service-745c798fc9-sfd9h",
      "pod_namespace":"default"
   },
   "count":3,
   "node_name":"gke-demo-default-pool-5dad52ed-k0h1",
   "timestamp":"2020-06-15T22:38:32.189649531Z"
}

Note that the denied connection log does not include the policies field. This is described in the preceding section, Policy logs for denied connections.

The denied connection log includes a count field for aggregating denied connections.
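
When processing exported records in a script, the optional fields need defensive handling. The following Python sketch condenses one record into a single line; the summarize function and its output layout are hypothetical choices made for this example, and the field names follow the log format described on this page.

```python
def endpoint_name(endpoint: dict, fallback_ip: str) -> str:
    """Prefer the Pod name, then the instance address, then the connection IP."""
    return endpoint.get("pod_name") or endpoint.get("instance") or fallback_ip


def summarize(record: dict) -> str:
    """Condense one network policy log record into a single line."""
    conn = record["connection"]
    src = endpoint_name(record.get("src", {}), conn["src_ip"])
    dest = endpoint_name(record.get("dest", {}), conn["dest_ip"])
    # Denied records carry no policies field, so default to an empty list.
    policies = ",".join(p["name"] for p in record.get("policies", [])) or "-"
    return (
        f'{record["disposition"]} {conn["direction"]} {conn["protocol"]} '
        f'{src} -> {dest}:{conn["dest_port"]} '
        f'count={record.get("count", 1)} policies={policies}'
    )
```

Applied to the denied record above, the helper reports count=3 and a dash in place of the policy names.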

Troubleshooting

Use the procedures in this section to inspect your cluster.

Dataplane V2

  1. Check the state of the system Pods:

    kubectl -n kube-system get pods
    

    If Dataplane V2 is running, you will see Pods with the prefix anetd- running in the HEALTHY state. anetd is the networking controller for Dataplane V2.

  2. If the issue is with services or network policy enforcement, check the anetd Pod logs:

    kubectl -n kube-system describe pod anetd-pod
    kubectl -n kube-system logs anetd-pod
    

    Replace anetd-pod with the name of an anetd Pod identified previously.

  3. If Pod creation is failing, check the kubelet logs for clues:

    gcloud compute ssh node -- sudo journalctl -u kubelet
    

    Replace node with the name of the VM instance.

Network policy logging

  1. Check for error events in the NetworkLogging object:

    kubectl describe networklogging default
    

    If the logging configuration is invalid, the configuration will not take effect and an error will be reported in the events section:

    Name:         default
    Namespace:
    Labels:       addonmanager.kubernetes.io/mode=EnsureExists
    Annotations:  API Version:  networking.gke.io/v1alpha1
    Kind:         NetworkLogging
    Metadata:
      Creation Timestamp:  2020-06-20T05:54:08Z
      Generation:          8
      Resource Version:    187864
      Self Link:           /apis/networking.gke.io/v1alpha1/networkloggings/default
      UID:                 0f1ddd6e-4193-4295-9172-baa6a52aa6e6
    Spec:
      Cluster:
        Allow:
          Delegate:  true
          Log:       false
        Deny:
          Delegate:  false
          Log:       false
    Events:
      Type     Reason                 Age                From                                                               Message
      ----     ------                 ----               ----                                                               -------
      Warning  InvalidNetworkLogging  16s (x3 over 11h)  network-logging-controller, gke-anthos-default-pool-cee49209-0t09  cluster allow log action is invalid: delegate cannot be true when log is false
      Warning  InvalidNetworkLogging  16s (x3 over 11h)  network-logging-controller, gke-anthos-default-pool-cee49209-80fx  cluster allow log action is invalid: delegate cannot be true when log is false
    
  2. A node can log up to 500 connections per second. See if there are dropped policy logs by checking if any error counters are incrementing:

    kubectl exec anetd-xyz -n kube-system -- curl -s http://localhost:9990/metrics |grep policy_logging
    

    Replace anetd-xyz with the name of an anetd Pod. Check each node.
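
To tell whether a counter is incrementing, you can compare two scrapes taken some time apart. The following Python sketch assumes the endpoint returns Prometheus text format with lines of the form "name{labels} value" and no trailing timestamps. The metric name used in the comment is invented for illustration; use the names that your cluster actually reports.

```python
def parse_policy_logging_metrics(text: str) -> dict:
    """Collect sample lines whose name contains 'policy_logging'."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "policy_logging" not in line:
            continue
        # The value is the last space-separated token on the line.
        name, _, value = line.rpartition(" ")
        try:
            values[name] = float(value)
        except ValueError:
            continue  # skip lines that do not end in a number
    return values


def incremented(before: dict, after: dict) -> dict:
    """Return the increase for every counter that grew between two scrapes."""
    return {
        name: after[name] - before.get(name, 0.0)
        for name in after
        if after[name] > before.get(name, 0.0)
    }


# Hypothetical usage with the output of two curl runs saved as strings:
#   first  = parse_policy_logging_metrics(scrape_1)
#   second = parse_policy_logging_metrics(scrape_2)
#   incremented(first, second)
#   might contain an entry such as 'example_policy_logging_dropped_total'
```

A non-empty result for an error or drop counter suggests that the node is producing more log records than it can emit.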

What's next