About GKE Dataplane V2 observability


GKE Dataplane V2 observability provides GKE Dataplane V2 metrics and insights into Kubernetes workloads. It is available in GKE version 1.28 and later.

GKE Dataplane V2 observability offers the following troubleshooting tools:

  • A Kubernetes cluster Network Topology
  • A Kubernetes Network Policy verdict table with live traffic flows and connection information
  • Command-line tooling for troubleshooting Kubernetes traffic flows

GKE Dataplane V2 metrics

GKE Dataplane V2 metrics provide traffic flow information for the following:

  • Traffic flows: insights about how GKE handles flows between Pods and Services.
  • Network policy enforcement: information about how GKE enforces Kubernetes Network Policies.

You can use GKE Dataplane V2 metrics to monitor and troubleshoot Kubernetes workloads using the following tools:

  • Google Cloud Managed Service for Prometheus to view and analyze your GKE Dataplane V2 metrics. You can modify its configuration to add or remove the metrics that it ingests.
  • Cloud Monitoring Metrics Explorer to view Pod-level traffic flow details.
  • Cloud Monitoring to explore and use any metric. For example, you can create alerts that trigger when GKE Dataplane V2 metrics exceed certain thresholds.
  • Self-managed Grafana to visualize metrics collected by Google Cloud Managed Service for Prometheus.

When you enable Google Cloud Managed Service for Prometheus:

  • GKE creates a PodMonitoring resource
  • GKE exposes the metrics endpoint

To consume metrics with Google Cloud Managed Service for Prometheus, you must enable it on the cluster so that GKE can create a PodMonitoring resource. If you don't enable Google Cloud Managed Service for Prometheus, GKE still exposes the metrics endpoint but does not create a PodMonitoring resource.

When you enable GKE Dataplane V2 metrics for a cluster, Google Cloud Managed Service for Prometheus ingests the following GKE Dataplane V2 metrics:

GKE Dataplane V2 metric | Type | Description
prometheus.googleapis.com/pod_flow_egress_flows_count/counter | cumulative | Total number of flows from a Pod.
prometheus.googleapis.com/pod_flow_ingress_flows_count/counter | cumulative | Total number of flows to a Pod.

Enabling GKE Dataplane V2 metrics opens the metrics port on each Kubernetes node.
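Because both metrics are cumulative counters, a threshold is usually applied to the rate of change rather than to the raw value. The following is a minimal sketch of that calculation, assuming two scrapes of the same counter. The names `Sample`, `flows_per_second`, and `exceeds_threshold`, and the sample values, are hypothetical and only illustrate the arithmetic; in practice a query language function such as PromQL `rate()` performs this step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One scrape of a cumulative counter (hypothetical helper type)."""
    timestamp_s: float  # scrape time, in seconds
    value: float        # cumulative flow count at that time


def flows_per_second(earlier: Sample, later: Sample) -> float:
    """Average flow rate between two scrapes of a cumulative counter.

    If the counter went backwards, assume it was reset (for example, the
    exporting Pod restarted) and count only the flows seen since the reset.
    """
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = later.value - earlier.value
    if delta < 0:
        delta = later.value
    return delta / elapsed


def exceeds_threshold(earlier: Sample, later: Sample,
                      max_flows_per_second: float) -> bool:
    """Return True when the flow rate is above the alerting threshold."""
    return flows_per_second(earlier, later) > max_flows_per_second


# Hypothetical samples of pod_flow_egress_flows_count for one Pod.
first = Sample(timestamp_s=0.0, value=1200.0)
second = Sample(timestamp_s=60.0, value=1500.0)
print(flows_per_second(first, second))        # 5.0
print(exceeds_threshold(first, second, 4.0))  # True
```

The reset handling is a simplification: flows counted between the last scrape and the reset are lost, which is the usual trade-off with cumulative counters.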

Additional metrics are also available, including metrics from the open source observability platform Hubble. By default, Google Cloud Managed Service for Prometheus doesn't ingest these additional metrics, but you can configure it to collect them by creating a PodMonitoring custom resource (CR).

The following table describes additional Hubble metrics:

Hubble metric | Type | Description
hubble_flows_processed_total | cumulative | Total number of flows processed.
hubble_drop_total | cumulative | Total number of flows dropped.
hubble_port_distribution_total | cumulative | Total number of flows processed, aggregated by port number.
hubble_tcp_flags_total | cumulative | Total number of flows processed with given TCP flags set.
hubble_icmp_total | cumulative | Total number of ICMP flows processed.
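As an illustration of what such a PodMonitoring resource might look like, the following sketch builds a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. The metric names come from the table above and port 9965 is the metrics port described in this document. The resource name, the `kube-system` namespace, the `k8s-app: cilium` selector label, and the 60s interval are assumptions for illustration; check them against your cluster before using anything like this.

```python
import json
import re

# Hubble metric names taken from the table above.
HUBBLE_METRICS = [
    "hubble_flows_processed_total",
    "hubble_drop_total",
    "hubble_port_distribution_total",
    "hubble_tcp_flags_total",
    "hubble_icmp_total",
]


def keep_regex(metric_names):
    """Build an alternation that matches exactly the given metric names."""
    return "|".join(metric_names)


def build_pod_monitoring(metric_names, name="hubble-extra-metrics"):
    """Return a PodMonitoring manifest that keeps only the listed metrics.

    The name, namespace, and selector label are assumptions, not values
    confirmed by this document.
    """
    return {
        "apiVersion": "monitoring.googleapis.com/v1",
        "kind": "PodMonitoring",
        "metadata": {"name": name, "namespace": "kube-system"},  # assumed
        "spec": {
            "selector": {"matchLabels": {"k8s-app": "cilium"}},  # assumed
            "endpoints": [{
                "port": 9965,        # metrics port described in this document
                "interval": "60s",   # assumed scrape interval
                "metricRelabeling": [{
                    "sourceLabels": ["__name__"],
                    "regex": keep_regex(metric_names),
                    "action": "keep",
                }],
            }],
        },
    }


manifest = build_pod_monitoring(HUBBLE_METRICS)
print(json.dumps(manifest, indent=2))
```

Using a `keep` rule with an explicit list means that any metric not named in the list is dropped before ingestion, which keeps the set of collected metrics deliberate.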

GKE Dataplane V2 observability tools

GKE Dataplane V2 observability provides a Managed Hubble solution with network observability and security insights for Kubernetes workloads deployed with GKE Dataplane V2.

When enabled, GKE Dataplane V2 observability deploys the following components to your cluster:

  • Hubble Relay: a service that collects network telemetry data about your Pods from each node.

  • Hubble CLI: a command-line interface tool providing live traffic information within the cluster.

You can deploy the following component after you enable GKE Dataplane V2 observability on your cluster:

  • Hubble UI: a web-based tool that you can use to view and analyze the network telemetry data that is collected by Hubble Relay. You must enable GKE Dataplane V2 observability to deploy Hubble UI.

How GKE Dataplane V2 metrics and observability work

GKE Dataplane V2 observability uses the following components and tools to collect metrics and provide insights into your network traffic:

  • GKE Dataplane V2: GKE Dataplane V2 metrics and observability use the eBPF-based GKE Dataplane V2 datapath to collect metrics about traffic flows and network policy enforcement for the Pods of a given workload.

  • Google Cloud Managed Service for Prometheus: GKE Dataplane V2 metrics configures the Google Cloud Managed Service for Prometheus agent to ingest aggregated metrics into Google Cloud Managed Service for Prometheus, a scalable monitoring solution that can ingest and store large amounts of data and that lets you build on Google Cloud Observability.

  • Hubble: GKE Dataplane V2 observability uses Hubble, an open source observability project. Hubble enables network observability and security insights for Kubernetes workloads deployed with an eBPF Dataplane.

    Hubble flow events occur when:

    • A network connection is first established

    • A TCP flag is first seen, which indicates the state of the TCP connection

    • A packet is transmitted after at least five seconds have passed since the last flow event

    Hubble metrics count the number of flow events in a Kubernetes cluster. You can use them to identify which Pods are communicating with each other.

  • Enabling metrics and observability: You can enable GKE Dataplane V2 metrics and observability tools independently of each other. To use the network topology visualization feature in the open source Hubble UI, you must enable Network inspection.

  • Autopilot clusters:

    • Metrics are enabled by default

    • Observability tools are disabled by default

    • You must create the ClusterPodMonitoring resource to gather metrics in Google Cloud Managed Service for Prometheus

  • Standard clusters:

    • Metrics are disabled by default

    • Observability tools are disabled by default

    • If you have Google Cloud Managed Service for Prometheus enabled, a PodMonitoring resource is created automatically

    • The PodMonitoring resource is marked as ensure exists. To stop sending metrics to Google Cloud Managed Service for Prometheus, edit the PodMonitoring resource to disable all metrics

GKE Dataplane V2 observability endpoints

GKE Dataplane V2 observability components expose the following two observability endpoints:

  • Metrics endpoint: an HTTP endpoint that exposes traffic metrics in Prometheus format. The anetd Pod exposes the metrics endpoint on each cluster node on port 9965.

  • Flows port: a gRPC endpoint. The hubble-relay Pod exposes the flows port as a Kubernetes ClusterIP Service on port 443. The hubble-relay Pod is the backend for this Service, so all requests to the Service are forwarded to the hubble-relay Pod. You can access the flows port using the Hubble CLI or the Hubble UI.
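To make the metrics endpoint more concrete, the following sketch reads Prometheus-format text and sums the samples for each metric name. It assumes the endpoint is served at the conventional `/metrics` path, which this document does not state, and that the node IP address is reachable from where the script runs. The helper names and the sample text, including its label names, are hypothetical.

```python
import urllib.request
from typing import Dict

METRICS_PORT = 9965  # port exposed by the anetd Pod on each node


def metrics_url(node_ip: str, path: str = "/metrics") -> str:
    """Build the scrape URL; the /metrics path is an assumption."""
    return f"http://{node_ip}:{METRICS_PORT}{path}"


def fetch_metrics(node_ip: str, timeout_s: float = 5.0) -> str:
    """Fetch the raw Prometheus-format text from one node."""
    with urllib.request.urlopen(metrics_url(node_ip), timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")


def sum_by_metric(exposition_text: str) -> Dict[str, float]:
    """Sum sample values per metric name, ignoring labels and comments."""
    totals: Dict[str, float] = {}
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name = line[:line.index("{")]
            fields = line[line.rindex("}") + 1:].split()
        else:
            parts = line.split()
            name, fields = parts[0], parts[1:]
        totals[name] = totals.get(name, 0.0) + float(fields[0])
    return totals


# Hypothetical sample text; real output and label names may differ.
SAMPLE = """\
# TYPE hubble_drop_total counter
hubble_drop_total{protocol="TCP",reason="POLICY_DENIED"} 3
hubble_drop_total{protocol="UDP",reason="POLICY_DENIED"} 2
hubble_flows_processed_total{verdict="FORWARDED"} 40
"""

print(metrics_url("10.0.0.5"))
print(sum_by_metric(SAMPLE))
```

Calling `sum_by_metric(fetch_metrics(node_ip))` would combine the two steps against a live node. The parser is deliberately small and does not handle every feature of the exposition format, such as escaped characters in label values.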

Limitations

  • GKE Dataplane V2 observability has a cluster-wide limit of 5000 nodes.
  • GKE Dataplane V2 observability only works in clusters with GKE Dataplane V2 enabled.
  • GKE Dataplane V2 metrics are similar to Hubble metrics in that they are implemented as flow-based metrics that provide connection information. These metrics count neither the amount of data nor the number of packets transmitted, so they don't accurately represent the amount of data transmitted in a network flow.
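The gap between flow events and packets can be seen by applying the three flow event rules described earlier (connection first established, TCP flag first seen, and at least five seconds since the last flow event) to a packet sequence. The following is a simplified model of a single connection, not Hubble's implementation. The `Packet` type, the one-flag-per-packet assumption, and the timings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

FLOW_EVENT_INTERVAL_S = 5.0  # "at least five seconds" rule from the text


@dataclass(frozen=True)
class Packet:
    """A packet on one connection (simplified: at most one TCP flag)."""
    time_s: float
    tcp_flag: Optional[str] = None  # for example "SYN", "ACK", "FIN"


def count_flow_events(packets: List[Packet]) -> int:
    """Count the flow events that one connection's packets would generate."""
    events = 0
    seen_flags: Set[str] = set()
    last_event_s: Optional[float] = None
    for packet in packets:
        # Rule 1: the connection is first established.
        first_packet = last_event_s is None
        # Rule 2: a TCP flag is seen for the first time.
        new_flag = (packet.tcp_flag is not None
                    and packet.tcp_flag not in seen_flags)
        # Rule 3: at least five seconds have passed since the last event.
        interval_elapsed = (last_event_s is not None and
                            packet.time_s - last_event_s >= FLOW_EVENT_INTERVAL_S)
        if packet.tcp_flag is not None:
            seen_flags.add(packet.tcp_flag)
        if first_packet or new_flag or interval_elapsed:
            events += 1
            last_event_s = packet.time_s
    return events


# Hypothetical packet trace for a single TCP connection.
trace = [
    Packet(0.0, "SYN"), Packet(0.1, "ACK"), Packet(1.0, "ACK"),
    Packet(2.0, "ACK"), Packet(3.0, "ACK"), Packet(6.0, "ACK"),
    Packet(7.0, "ACK"), Packet(7.5, "FIN"),
]
print(f"{len(trace)} packets, {count_flow_events(trace)} flow events")
# prints "8 packets, 4 flow events"
```

In this trace, eight packets produce four flow events, which is why a flow-based counter cannot be read as a packet or byte count.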

What's next