GKE Dataplane V2 observability provides GKE Dataplane V2 metrics and insights into workloads on GKE clusters on Google Cloud. With GKE Dataplane V2 observability, in GKE version 1.28 and later, you can:
- Capture, observe, and alert on network metrics using Google Cloud Managed Service for Prometheus and Cloud Monitoring with Metrics Explorer
- Understand traffic flows for a particular Service in a cluster
- Understand and identify issues with the network health of a Kubernetes workload
- Verify Kubernetes Network Policies
GKE Dataplane V2 observability offers the following troubleshooting tools:
- A Kubernetes cluster Network Topology
- A Kubernetes Network Policy verdict table with live traffic flows and connection information
- Command-line tooling for troubleshooting Kubernetes traffic flows
GKE Dataplane V2 metrics
GKE Dataplane V2 metrics provide information about the following:
- Traffic flows: insights into how GKE handles flows between Pods and Services.
- Network policy enforcement: information about how GKE enforces Kubernetes Network Policies.
You can use GKE Dataplane V2 metrics to monitor and troubleshoot Kubernetes workloads using the following tools:
- Google Cloud Managed Service for Prometheus to view and analyze your GKE Dataplane V2 metrics. You can modify the Google Cloud Managed Service for Prometheus configuration to choose which metrics it ingests.
- Cloud Monitoring Metrics Explorer to view Pod-level traffic flow details.
- Cloud Monitoring to explore and use any metric. For example, you can create alerts that trigger when GKE Dataplane V2 metrics exceed certain thresholds.
- Self-managed Grafana to visualize metrics collected by Google Cloud Managed Service for Prometheus.
When you enable Google Cloud Managed Service for Prometheus:
- GKE creates a PodMonitoring resource.
- GKE exposes the metrics endpoint.
To consume metrics with Google Cloud Managed Service for Prometheus and to be able to create a PodMonitoring resource, you must enable Google Cloud Managed Service for Prometheus on the cluster. If you don't enable Google Cloud Managed Service for Prometheus, GKE exposes the metrics endpoint but doesn't create a PodMonitoring resource.
When you enable GKE Dataplane V2 metrics for a cluster, Google Cloud Managed Service for Prometheus ingests the following GKE Dataplane V2 metrics:
GKE Dataplane V2 metric | Type | Description |
---|---|---|
prometheus.googleapis.com/pod_flow_egress_flows_count/counter | cumulative | Total number of flows from a Pod. |
prometheus.googleapis.com/pod_flow_ingress_flows_count/counter | cumulative | Total number of flows to a Pod. |
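Both metrics are cumulative counters, so a single sample only gives the running total since the counter started. To see how many new flows a Pod handled over an interval, you compare samples. The following Python sketch shows that arithmetic. It treats any decrease between samples as a counter reset (for example, after the exporting process restarts), which is the same convention Prometheus uses. The `Sample` type and the sample values are hypothetical. In practice, PromQL functions such as `rate()` or `increase()` do this for you.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One observation of a cumulative counter (hypothetical structure)."""
    timestamp_s: float  # time of the scrape, in seconds
    value: float        # counter value at that time


def flows_per_second(samples: list[Sample]) -> float:
    """Average per-second increase of a cumulative flow counter.

    Assumes samples are sorted by timestamp. A decrease between two
    consecutive samples is treated as a counter reset: the counter is
    assumed to have restarted from zero, so the new value is the
    increase for that interval.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    increase = 0.0
    for prev, cur in zip(samples, samples[1:]):
        delta = cur.value - prev.value
        # On a reset, count everything since the restart.
        increase += delta if delta >= 0 else cur.value
    elapsed = samples[-1].timestamp_s - samples[0].timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return increase / elapsed


# Hypothetical scrapes of pod_flow_egress_flows_count for one Pod,
# with a counter reset between t=30 and t=60.
egress = [Sample(0, 100), Sample(30, 160), Sample(60, 10), Sample(90, 70)]
print(flows_per_second(egress))
```

Here the increases are 60, then 10 after the reset, then 60, for 130 new flows over 90 seconds.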
Enabling GKE Dataplane V2 metrics opens the metrics port on each Kubernetes node.
Additional metrics are also available, including metrics from the open source observability platform Hubble. By default, Google Cloud Managed Service for Prometheus doesn't ingest these additional metrics, but you can configure it to collect them by creating a PodMonitoring custom resource (CR).
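As a sketch of such a CR, the following Python builds a PodMonitoring manifest that keeps only the Hubble metrics listed in the table in this section. It prints the manifest as JSON, which `kubectl apply -f -` accepts. The manifest uses the Managed Service for Prometheus `PodMonitoring` schema (`spec.selector`, `spec.endpoints[].metricRelabeling`).

Several values are assumptions rather than documented settings:
- the resource name
- the `kube-system` namespace
- the `k8s-app: cilium` label used to select the `anetd` Pods
- that the Hubble metrics are served on the same port 9965 as the other GKE Dataplane V2 metrics

Check these against the Pods in your cluster before you apply the manifest.

```python
import json
from typing import Optional

# Hubble metrics from the table in this section that aren't ingested by default.
HUBBLE_METRICS = [
    "hubble_flows_processed_total",
    "hubble_drop_total",
    "hubble_port_distribution_total",
    "hubble_tcp_flags_total",
    "hubble_icmp_total",
]


def hubble_podmonitoring(
    name: str = "hubble-extra-metrics",  # hypothetical resource name
    namespace: str = "kube-system",      # assumption: namespace of the anetd Pods
    pod_labels: Optional[dict] = None,
    port: int = 9965,                    # assumption: Hubble metrics share this port
    interval: str = "60s",
) -> dict:
    """Build a PodMonitoring manifest that keeps only the listed Hubble metrics."""
    if pod_labels is None:
        pod_labels = {"k8s-app": "cilium"}  # assumption: label on the anetd Pods
    # Prometheus relabeling regexes are fully anchored, so this alternation
    # matches only the exact metric names above.
    keep_regex = "|".join(HUBBLE_METRICS)
    return {
        "apiVersion": "monitoring.googleapis.com/v1",
        "kind": "PodMonitoring",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": pod_labels},
            "endpoints": [
                {
                    "port": port,
                    "interval": interval,
                    # Keep a series only if its metric name (__name__) matches.
                    "metricRelabeling": [
                        {
                            "action": "keep",
                            "sourceLabels": ["__name__"],
                            "regex": keep_regex,
                        }
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    # For example: python hubble_pm.py | kubectl apply -f -
    print(json.dumps(hubble_podmonitoring(), indent=2))
```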
The following table describes additional Hubble metrics:
Hubble metric | Type | Description |
---|---|---|
hubble_flows_processed_total | cumulative | Total number of flows processed. |
hubble_drop_total | cumulative | Total number of flows dropped. |
hubble_port_distribution_total | cumulative | Total number of flows processed aggregated by port number. |
hubble_tcp_flags_total | cumulative | Total number of flows processed with given TCP flags set. |
hubble_icmp_total | cumulative | Total number of ICMP flows processed. |
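If you read these counters directly from the metrics endpoint instead of through Google Cloud Managed Service for Prometheus, they arrive in the Prometheus text format. The following Python sketch sums one counter across all of its series and can optionally group the result by one label.

This is a deliberately simplified parser. It handles only plain `name{label="value",...} number` lines and skips comments. It doesn't handle escaped characters, commas inside label values, or timestamps.

The sample text is made up. The label names and values in it are illustrative and aren't a statement of the exact labels that Hubble attaches.

```python
from collections import defaultdict

# Made-up sample of Prometheus text-format output (illustrative labels/values).
SAMPLE = """\
# TYPE hubble_drop_total counter
hubble_drop_total{protocol="TCP",reason="POLICY_DENIED"} 12
hubble_drop_total{protocol="UDP",reason="POLICY_DENIED"} 3
hubble_drop_total{protocol="TCP",reason="STALE_OR_UNROUTABLE_IP"} 5
hubble_flows_processed_total{type="L3_L4",verdict="FORWARDED"} 940
"""


def parse_line(line):
    """Split 'name{k="v",...} value' into (name, labels, value). Simplified."""
    series, value = line.rsplit(" ", 1)
    labels = {}
    if "{" in series:
        name, raw = series.split("{", 1)
        for pair in raw.rstrip("}").split(","):
            if pair:
                key, val = pair.split("=", 1)
                labels[key] = val.strip('"')
    else:
        name = series
    return name, labels, float(value)


def sum_by(text, metric, label=None):
    """Sum a counter across its series, optionally grouped by one label."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, labels, value = parse_line(line)
        if name == metric:
            key = labels.get(label, "") if label else ""
            totals[key] += value
    return dict(totals)


print(sum_by(SAMPLE, "hubble_drop_total"))            # {'': 20.0}
print(sum_by(SAMPLE, "hubble_drop_total", "reason"))
```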
GKE Dataplane V2 observability tools
GKE Dataplane V2 observability provides a Managed Hubble solution with network observability and security insights for Kubernetes workloads deployed with GKE Dataplane V2.
When enabled, GKE Dataplane V2 observability deploys the following components to your cluster:
- Hubble Relay: a service that collects network telemetry data about your Pods from each node.
- Hubble CLI: a command-line interface tool that provides live traffic information within the cluster.
You can deploy the following component after you enable GKE Dataplane V2 observability on your cluster:
- Hubble UI: a web-based tool that you can use to view and analyze the network telemetry data that is collected by Hubble Relay. You must enable GKE Dataplane V2 observability to deploy Hubble UI.
How GKE Dataplane V2 metrics and observability work
GKE Dataplane V2 observability uses the following components and tools to collect metrics and provide insights into your network traffic:
GKE Dataplane V2: GKE Dataplane V2 metrics and observability use the eBPF-based GKE Dataplane V2 datapath to collect metrics about traffic flows and network policy enforcement for the Pods of a given workload.
Google Cloud Managed Service for Prometheus: GKE Dataplane V2 metrics configure the Google Cloud Managed Service for Prometheus agent to send aggregated metrics to Google Cloud Managed Service for Prometheus. Google Cloud Managed Service for Prometheus is a scalable monitoring solution that can ingest and store large amounts of data, and it lets you build on Google Cloud Observability.
Hubble: GKE Dataplane V2 observability uses Hubble, an open source observability project. Hubble enables network observability and security insights for Kubernetes workloads deployed with an eBPF Dataplane.
Hubble flow events occur when:
- A network connection is first established
- A TCP flag is first seen, which indicates the state of the TCP connection
- A packet is transmitted after at least five seconds have passed since the last flow event
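You can model these three conditions as a small per-connection state machine. The following Python sketch models those rules only and isn't Hubble's implementation. For each connection, it tracks which TCP flags it has already seen and emits an event in three cases:
- the connection is new
- a packet carries a flag not seen before on that connection
- at least five seconds have passed since the connection's last event

The connection key format and the packet sequence are hypothetical.

```python
from dataclasses import dataclass, field

IDLE_INTERVAL_S = 5.0  # "at least five seconds" since the last flow event


@dataclass
class ConnState:
    last_event_s: float                           # time of the last emitted event
    seen_flags: set = field(default_factory=set)  # TCP flags already observed


class FlowEventModel:
    """Simplified model of when a flow event is emitted for a connection."""

    def __init__(self):
        self.conns = {}

    def observe(self, conn_id, t_s, tcp_flags=()):
        """Return why this packet triggers a flow event, or None if it doesn't."""
        state = self.conns.get(conn_id)
        if state is None:
            # Rule 1: first packet of a connection.
            self.conns[conn_id] = ConnState(t_s, set(tcp_flags))
            return "new connection"
        new_flags = set(tcp_flags) - state.seen_flags
        if new_flags:
            # Rule 2: a TCP flag seen for the first time on this connection.
            state.seen_flags |= new_flags
            state.last_event_s = t_s
            return "new TCP flag: " + ",".join(sorted(new_flags))
        if t_s - state.last_event_s >= IDLE_INTERVAL_S:
            # Rule 3: a packet after the idle interval has elapsed.
            state.last_event_s = t_s
            return "idle interval elapsed"
        return None


model = FlowEventModel()
conn = "10.0.0.5:41000->10.0.1.7:80/TCP"  # hypothetical connection key
packets = [(0.0, ["SYN"]), (0.1, ["SYN", "ACK"]), (0.2, ["ACK"]),
           (3.0, ["ACK"]), (6.0, ["ACK"]), (6.5, ["ACK", "FIN"])]
for t, flags in packets:
    print(t, model.observe(conn, t, flags))
```

In this trace, only four of the six packets produce events: the SYN, the first ACK, the ACK at 6.0 seconds, and the FIN.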
Hubble metrics: Hubble metrics count the flow events in a Kubernetes cluster. You can use these counts to identify which Pods are communicating with each other.
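Each flow event has a source and a destination, so counting events per pair is enough to show which Pods talk to each other. The following Python sketch shows only that aggregation step. It uses made-up events with hypothetical Pod names and doesn't reflect the actual label set of Hubble metrics.

```python
from collections import Counter

# Hypothetical flow events as (source Pod, destination Pod) pairs.
events = [
    ("frontend-7d9f", "backend-5c2a"),
    ("frontend-7d9f", "backend-5c2a"),
    ("backend-5c2a", "db-0"),
    ("frontend-7d9f", "backend-5c2a"),
]


def talkers(events):
    """Count flow events per (source, destination) Pod pair."""
    return Counter(events)


# List pairs from most to least active.
for (src, dst), n in talkers(events).most_common():
    print(f"{src} -> {dst}: {n} flow events")
```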
Enabling metrics and observability: You can enable GKE Dataplane V2 metrics and GKE Dataplane V2 observability independently of each other. To use the network topology visualization feature in the open source Hubble UI, you must enable Network inspection.
Autopilot clusters:
- Metrics are enabled by default
- Observability tools are disabled by default
- You must create the Cluster PodMonitoring resource to gather metrics in Google Cloud Managed Service for Prometheus
Standard clusters:
- Metrics are disabled by default
- Observability tools are disabled by default
- If you have Google Cloud Managed Service for Prometheus enabled, a PodMonitoring resource is created automatically
- The PodMonitoring resource is marked as ensure exists, which means that GKE re-creates it if it's missing but doesn't overwrite changes that you make to it. You can stop sending metrics to Google Cloud Managed Service for Prometheus by editing the PodMonitoring resource to disable all metrics
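One way to disable all metrics without deleting the resource is to add a metric relabeling rule that drops every series. The following Python sketch adds such a rule to every endpoint of a PodMonitoring manifest that has been loaded as a dict. It assumes the Managed Service for Prometheus `PodMonitoring` schema (`spec.endpoints[].metricRelabeling`) and a `drop` action on `__name__`.

The input manifest is hypothetical. Inspect the resource that GKE created in your cluster, for example with `kubectl get podmonitoring -A -o json`, before you edit it.

```python
import copy
import json

# Relabeling rule that drops every series: all metric names are non-empty,
# and the anchored regex ".+" matches any non-empty name.
DROP_ALL = {"action": "drop", "sourceLabels": ["__name__"], "regex": ".+"}


def disable_all_metrics(podmonitoring: dict) -> dict:
    """Return a copy of a PodMonitoring manifest whose endpoints drop all series."""
    edited = copy.deepcopy(podmonitoring)  # leave the caller's dict untouched
    for endpoint in edited.get("spec", {}).get("endpoints", []):
        rules = endpoint.setdefault("metricRelabeling", [])
        if DROP_ALL not in rules:  # idempotent: add the rule only once
            rules.append(dict(DROP_ALL))
    return edited


# Hypothetical existing manifest; field values are placeholders.
existing = {
    "apiVersion": "monitoring.googleapis.com/v1",
    "kind": "PodMonitoring",
    "metadata": {"name": "example", "namespace": "kube-system"},
    "spec": {
        "selector": {"matchLabels": {"k8s-app": "cilium"}},
        "endpoints": [{"port": 9965, "interval": "60s"}],
    },
}
print(json.dumps(disable_all_metrics(existing), indent=2))
```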
GKE Dataplane V2 observability endpoints
GKE Dataplane V2 observability components expose the following two observability endpoints:
- Metrics endpoint: an HTTP endpoint that exposes traffic metrics in Prometheus format. The anetd Pod exposes the metrics endpoint on each cluster node on port 9965.
- Flows port: a gRPC endpoint. The hubble-relay Pod exposes the flows port endpoint as a Kubernetes ClusterIP Service on port 443. The hubble-relay Pod is the backend for this ClusterIP Service, and all requests to the Service are forwarded to the hubble-relay Pod. You can access the flows port using Hubble CLI or Hubble UI.
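For a quick manual check of the metrics endpoint, you can fetch it over HTTP from somewhere that can reach the node, such as another Pod in the cluster. The following Python sketch builds the URL for port 9965 and reads the response with the standard library.

A few assumptions apply:
- The `/metrics` path follows the common Prometheus convention and isn't taken from this page.
- The node IP is a placeholder.
- The `pod_flow_` prefix assumes that the endpoint exposes the raw metric names without the `prometheus.googleapis.com/.../counter` wrapping that appears in Cloud Monitoring.

Firewall rules or network policies might block the request.

```python
import sys
import urllib.request

METRICS_PORT = 9965        # documented anetd metrics port
METRICS_PATH = "/metrics"  # assumption: conventional Prometheus path


def metrics_url(node_ip: str, port: int = METRICS_PORT, path: str = METRICS_PATH) -> str:
    """Build the HTTP URL of the metrics endpoint on one node."""
    host = f"[{node_ip}]" if ":" in node_ip else node_ip  # bracket IPv6 literals
    return f"http://{host}:{port}{path}"


def fetch_metrics(node_ip: str, timeout_s: float = 5.0) -> str:
    """Fetch the Prometheus text-format response from one node's metrics endpoint."""
    with urllib.request.urlopen(metrics_url(node_ip), timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scrape.py NODE_IP   (NODE_IP is a placeholder)
    for line in fetch_metrics(sys.argv[1]).splitlines():
        # Print only the flow metrics discussed on this page (assumed raw names).
        if line.startswith(("hubble_", "pod_flow_")):
            print(line)
```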
Limitations
- GKE Dataplane V2 observability has a cluster-wide limit of 5000 nodes.
- GKE Dataplane V2 metrics and observability only work in clusters on Google Cloud with GKE Dataplane V2 enabled.
- GKE Dataplane V2 metrics are similar to Hubble metrics in that they are implemented as flow-based metrics to provide connection information. These metrics don't count the amount of data or the number of packets transmitted. Because the metrics are flow-based, they don't accurately represent the amount of data transmitted in a network flow.