GKE system metrics

This document lists the metrics available in Cloud Monitoring when Google Kubernetes Engine (GKE) system metrics are enabled.

  • For a general explanation of the entries in the tables, including information about values like DELTA and GAUGE, see Metric types.

    To chart or monitor metric types with values of type STRING, you must use Monitoring Query Language (MQL), and you must convert the value into a numeric value. For information about MQL string-conversion methods, see String.

  • For information about the units used in the metric lists, see the unit field in the MetricDescriptor reference.
  • For information about statements of the form “Sampled every x seconds” and “After sampling, data is not visible for up to y seconds”, see Additional information: metadata.
  • The resource-hierarchy level tells you if the metric is written at the project, organization, or folder level(s). When the level is not specified in the metric descriptor, the metric writes at the project level by default.
  • For information about the meaning of launch stages such as GA (General Availability) and BETA (Preview), see Product launch stages.
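The sampling statements in the descriptions below follow a fixed wording, so they can be extracted mechanically. The following is a minimal sketch, not an official tool: the names `Sampling`, `parse_sampling`, and `worst_case_age_s` are hypothetical, and treating "period plus visibility delay" as an upper bound on the age of the newest visible point is an assumption for illustration, not a guarantee from Cloud Monitoring.

```python
import re
from typing import NamedTuple, Optional


class Sampling(NamedTuple):
    period_s: int               # x in "Sampled every x seconds"
    max_delay_s: Optional[int]  # y in "not visible for up to y seconds", if stated


_PERIOD = re.compile(r"Sampled every (\d+) seconds")
_DELAY = re.compile(r"data is not visible for up to (\d+) seconds")


def parse_sampling(description: str) -> Optional[Sampling]:
    """Extract the sampling statements from a metric description string."""
    period = _PERIOD.search(description)
    if period is None:
        return None  # some entries carry no sampling statement
    delay = _DELAY.search(description)
    return Sampling(int(period.group(1)), int(delay.group(1)) if delay else None)


def worst_case_age_s(sampling: Sampling) -> Optional[int]:
    """Assumed rough bound: one sampling period plus the stated visibility delay."""
    if sampling.max_delay_s is None:
        return None  # the description states no delay, so no bound is derived
    return sampling.period_s + sampling.max_delay_s
```

A bound like this might help when choosing an alerting window that is long enough to contain at least one visible point.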

Kubernetes metrics

Metrics from Google Kubernetes Engine.

The following list was last generated at 2024-09-12 02:25:45 UTC. For more information about this process, see About the lists.

kubernetes

Metrics for Google Kubernetes Engine. For information on viewing these metrics, go to View observability metrics. Launch stages of these metrics: BETA, GA

The "metric type" strings in this table must be prefixed with kubernetes.io/. That prefix has been omitted from the entries in the table. When querying a label, use the metric.labels. prefix; for example, metric.labels.LABEL="VALUE".
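As an illustration of the prefixing rule, the sketch below assembles a Cloud Monitoring filter string from a table entry. The helper name `gke_filter` and its parameters are hypothetical; it only concatenates strings, calls no client library, and assumes label values contain no double quotes.

```python
from typing import Mapping, Optional


def gke_filter(
    metric_suffix: str,
    resource_type: Optional[str] = None,
    labels: Optional[Mapping[str, str]] = None,
    prefix: str = "kubernetes.io/",
) -> str:
    """Build a filter from a table entry such as 'container/memory/used_bytes'."""
    # The table omits the prefix, so add it back to form the full metric type.
    parts = [f'metric.type="{prefix}{metric_suffix}"']
    if resource_type:
        parts.append(f'resource.type="{resource_type}"')
    # Labels are addressed through the metric.labels. prefix.
    for key, value in sorted((labels or {}).items()):
        parts.append(f'metric.labels.{key}="{value}"')
    return " AND ".join(parts)


print(gke_filter(
    "container/memory/used_bytes",
    resource_type="k8s_container",
    labels={"memory_type": "non-evictable"},
))
```

For entries in the nginx table, the same helper could be called with `prefix="kubernetes.io/nginx/"`.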

Metric type Launch stage(Resource hierarchy levels)
Display name
Kind, Type, Unit
Monitored resources
Description
Labels
GAUGE, DOUBLE, {cpu}
k8s_scale
Number of CPU cores for the recommended CPU request for a single replica of the workload. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
container_name: Name of the container.
GAUGE, INT64, By
k8s_scale
Recommended memory request for a single replica of the workload, in bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
container_name: Name of the container.
container/accelerator/duty_cycle BETA(project)
Accelerator duty cycle
GAUGE, INT64, %
k8s_container
Percent of time over the past sample period (10s) during which the accelerator was actively processing. Values are integers between 0 and 100. Sampled every 60 seconds.
make: Make of the accelerator (e.g. nvidia)
accelerator_id: ID of the accelerator.
model: Model of the accelerator (e.g. 'Tesla P100')
container/accelerator/memory_bandwidth_utilization BETA(project)
Memory bandwidth utilization
GAUGE, DOUBLE, percent
k8s_container
Current percentage of the accelerator memory bandwidth that is being used. Computed by dividing the memory bandwidth used over a sample period by the maximum supported bandwidth over the same sample period. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
container/accelerator/memory_total BETA(project)
Accelerator memory total
GAUGE, INT64, By
k8s_container
Total accelerator memory in bytes. Sampled every 60 seconds.
make: Make of the accelerator (e.g. nvidia)
accelerator_id: ID of the accelerator.
model: Model of the accelerator (e.g. 'Tesla P100')
container/accelerator/memory_used BETA(project)
Accelerator memory used
GAUGE, INT64, By
k8s_container
Total accelerator memory allocated in bytes. Sampled every 60 seconds.
make: Make of the accelerator (e.g. nvidia)
accelerator_id: ID of the accelerator.
model: Model of the accelerator (e.g. 'Tesla P100')
container/accelerator/request BETA(project)
Request accelerators
GAUGE, INT64, {devices}
k8s_container
Number of accelerator devices requested by the container. Sampled every 60 seconds.
resource_name: Name of the requested accelerator resource.
container/accelerator/tensorcore_utilization BETA(project)
Tensorcore utilization
GAUGE, DOUBLE, percent
k8s_container
Current percentage of the Tensorcore that is utilized. Computed by dividing the Tensorcore operations that were performed over a sample period by the supported number of Tensorcore operations over the same sample period. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
container/cpu/core_usage_time GA(project)
CPU usage time
CUMULATIVE, DOUBLE, s{CPU}
k8s_container
Cumulative CPU usage on all cores used by the container in seconds. Sampled every 60 seconds.
container/cpu/limit_cores GA(project)
Limit cores
GAUGE, DOUBLE, {cpu}
k8s_container
CPU cores limit of the container. Sampled every 60 seconds.
container/cpu/limit_utilization GA(project)
CPU limit utilization
GAUGE, DOUBLE, 1
k8s_container
The fraction of the CPU limit that is currently in use on the instance. This value can be greater than 1 as a container might be allowed to exceed its CPU limit for extended periods of time. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
container/cpu/request_cores GA(project)
Request cores
GAUGE, DOUBLE, {cpu}
k8s_container
Number of CPU cores requested by the container. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
container/cpu/request_utilization GA(project)
CPU request utilization
GAUGE, DOUBLE, 1
k8s_container
The fraction of the requested CPU that is currently in use on the instance. This value can be greater than 1 as usage can exceed the request. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
container/ephemeral_storage/limit_bytes GA(project)
Ephemeral storage limit
GAUGE, INT64, By
k8s_container
Local ephemeral storage limit in bytes. Sampled every 60 seconds.
container/ephemeral_storage/request_bytes GA(project)
Ephemeral storage request
GAUGE, INT64, By
k8s_container
Local ephemeral storage request in bytes. Sampled every 60 seconds.
container/ephemeral_storage/used_bytes GA(project)
Ephemeral storage usage
GAUGE, INT64, By
k8s_container
Local ephemeral storage usage in bytes. Sampled every 60 seconds.
container/memory/limit_bytes GA(project)
Memory limit
GAUGE, INT64, By
k8s_container
Memory limit of the container in bytes. Sampled every 60 seconds.
container/memory/limit_utilization GA(project)
Memory limit utilization
GAUGE, DOUBLE, 1
k8s_container
The fraction of the memory limit that is currently in use on the instance. This value cannot exceed 1 as usage cannot exceed the limit. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
memory_type: Either `evictable` or `non-evictable`. Evictable memory is memory that can be easily reclaimed by the kernel, while non-evictable memory cannot.
container/memory/page_fault_count GA(project)
Page faults
CUMULATIVE, INT64, 1
k8s_container
Number of page faults, broken down by type: major and minor.
fault_type: Fault type, either 'major' or 'minor'. A 'major' fault indicates that the page had to be loaded from disk.
container/memory/request_bytes GA(project)
Memory request
GAUGE, INT64, By
k8s_container
Memory request of the container in bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
container/memory/request_utilization GA(project)
Memory request utilization
GAUGE, DOUBLE, 1
k8s_container
The fraction of the requested memory that is currently in use on the instance. This value can be greater than 1 as usage can exceed the request. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
memory_type: Either `evictable` or `non-evictable`. Evictable memory is memory that can be easily reclaimed by the kernel, while non-evictable memory cannot.
container/memory/used_bytes GA(project)
Memory usage
GAUGE, INT64, By
k8s_container
Memory usage in bytes. Sampled every 60 seconds.
memory_type: Either `evictable` or `non-evictable`. Evictable memory is memory that can be easily reclaimed by the kernel, while non-evictable memory cannot.
container/multislice/accelerator/device_to_host_transfer_latencies BETA(project)
Device to Host transfer latencies
DELTA, DISTRIBUTION, us
k8s_container
Distribution of device to host transfer latency for each chunk of data for multislice traffic. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
buffer_size: Size of the buffer.
make: Make of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
container/multislice/accelerator/host_to_device_transfer_latencies BETA(project)
Host to Device transfer latencies
DELTA, DISTRIBUTION, us
k8s_container
Distribution of host to device transfer latency for each chunk of data for multislice traffic. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
buffer_size: Size of the buffer.
make: Make of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
container/multislice/network/collective_end_to_end_latencies BETA(project)
Collective latencies
DELTA, DISTRIBUTION, us
k8s_container
Distribution of end-to-end collective latency for multislice traffic. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
collective_type: Collective operation type.
input_size: Size of the message.
make: Make of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
container/multislice/network/dcn_transfer_latencies BETA(project)
DCN (Data Center Network) transfer latencies
DELTA, DISTRIBUTION, us
k8s_container
Distribution of network transfer latencies for multislice traffic. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
buffer_size: Size of the buffer.
make: Make of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
type: Protocol Type.
container/restart_count GA(project)
Restart count
CUMULATIVE, INT64, 1
k8s_container
Number of times the container has restarted. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
container/uptime GA(project)
Uptime
GAUGE, DOUBLE, s
k8s_container
Time in seconds that the container has been running. Sampled every 60 seconds.
node/accelerator/duty_cycle BETA(project)
Accelerator duty cycle with node
GAUGE, DOUBLE, percent
k8s_node
Percent of time over the past sample period (10s) during which the accelerator was actively processing. Sampled every 60 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
node/accelerator/memory_bandwidth_utilization BETA(project)
Memory bandwidth utilization
GAUGE, DOUBLE, percent
k8s_node
Current percentage of the accelerator memory bandwidth that is being used. Computed by dividing the memory bandwidth used over a sample period by the maximum supported bandwidth over the same sample period. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
node/accelerator/memory_total BETA(project)
Accelerator memory total with node
GAUGE, INT64, bytes
k8s_node
Total accelerator memory in bytes. Sampled every 60 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
node/accelerator/memory_used BETA(project)
Accelerator memory used with node
GAUGE, INT64, bytes
k8s_node
Total accelerator memory allocated in bytes. Sampled every 60 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
node/accelerator/tensorcore_utilization BETA(project)
Tensorcore utilization
GAUGE, DOUBLE, percent
k8s_node
Current percentage of the Tensorcore that is utilized. Computed by dividing the Tensorcore operations that were performed over a sample period by the supported number of Tensorcore operations over the same sample period. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
node/cpu/allocatable_cores GA(project)
Allocatable cores
GAUGE, DOUBLE, {cpu}
k8s_node
Number of allocatable CPU cores on the node. Sampled every 60 seconds.
node/cpu/allocatable_utilization GA(project)
CPU allocatable utilization
GAUGE, DOUBLE, 1
k8s_node
The fraction of the allocatable CPU that is currently in use on the instance. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
node/cpu/core_usage_time GA(project)
CPU usage time
CUMULATIVE, DOUBLE, s{CPU}
k8s_node
Cumulative CPU usage on all cores used on the node in seconds. Sampled every 60 seconds.
node/cpu/total_cores GA(project)
Total cores
GAUGE, DOUBLE, {cpu}
k8s_node
Total number of CPU cores on the node. Sampled every 60 seconds.
node/ephemeral_storage/allocatable_bytes GA(project)
Allocatable ephemeral storage
GAUGE, INT64, By
k8s_node
Local ephemeral storage bytes allocatable on the node. Sampled every 60 seconds.
node/ephemeral_storage/inodes_free GA(project)
Free inodes
GAUGE, INT64, 1
k8s_node
Free number of inodes on local ephemeral storage. Sampled every 60 seconds.
node/ephemeral_storage/inodes_total GA(project)
Total inodes
GAUGE, INT64, 1
k8s_node
Total number of inodes on local ephemeral storage. Sampled every 60 seconds.
node/ephemeral_storage/total_bytes GA(project)
Total ephemeral storage
GAUGE, INT64, By
k8s_node
Total ephemeral storage bytes on the node. Sampled every 60 seconds.
node/ephemeral_storage/used_bytes GA(project)
Ephemeral storage usage
GAUGE, INT64, By
k8s_node
Local ephemeral storage bytes used by the node. Sampled every 60 seconds.
node/logs/input_bytes BETA(project)
Logging throughput
DELTA, INT64, By
k8s_node
Volume of log bytes generated on the node by user and system workloads. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
type: Type is either 'system' or 'workload'. 'system' indicates the logging throughput of GKE system components. 'workload' indicates the throughput of logs generated by non-system containers running on user nodes.
node/memory/allocatable_bytes GA(project)
Allocatable memory
GAUGE, INT64, By
k8s_node
Number of bytes of memory that can be allocated for workloads on the node. Sampled every 60 seconds.
node/memory/allocatable_utilization GA(project)
Memory allocatable utilization
GAUGE, DOUBLE, 1
k8s_node
The fraction of the allocatable memory that is currently in use on the instance. This value cannot exceed 1 as usage cannot exceed allocatable memory bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
memory_type: Either `evictable` or `non-evictable`. Evictable memory is memory that can be easily reclaimed by the kernel, while non-evictable memory cannot.
component: Name of the respective system daemon.
node/memory/total_bytes GA(project)
Total memory
GAUGE, INT64, By
k8s_node
Total number of bytes of memory on the node. Sampled every 60 seconds.
node/memory/used_bytes GA(project)
Memory usage
GAUGE, INT64, By
k8s_node
Cumulative memory bytes used by the node. Sampled every 60 seconds.
memory_type: Either `evictable` or `non-evictable`. Evictable memory is memory that can be easily reclaimed by the kernel, while non-evictable memory cannot.
node/network/received_bytes_count GA(project)
Bytes received
CUMULATIVE, INT64, By
k8s_node
Cumulative number of bytes received by the node over the network. Sampled every 60 seconds.
node/network/sent_bytes_count GA(project)
Bytes transmitted
CUMULATIVE, INT64, By
k8s_node
Cumulative number of bytes transmitted by the node over the network. Sampled every 60 seconds.
node/pid_limit GA(project)
PID capacity
GAUGE, INT64, 1
k8s_node
The maximum PID of the OS on the node. Sampled every 60 seconds.
node/pid_used GA(project)
PID usage
GAUGE, INT64, 1
k8s_node
The number of running processes in the OS on the node. Sampled every 60 seconds.
node_daemon/cpu/core_usage_time GA(project)
CPU usage time
CUMULATIVE, DOUBLE, s{CPU}
k8s_node
Cumulative CPU usage on all cores used by the node-level system daemon in seconds. Sampled every 60 seconds.
component: Name of the respective system daemon.
node_daemon/memory/used_bytes GA(project)
Memory usage
GAUGE, INT64, By
k8s_node
Memory usage by the system daemon in bytes. Sampled every 60 seconds.
component: Name of the respective system daemon.
memory_type: Either `evictable` or `non-evictable`. Evictable memory is memory that can be easily reclaimed by the kernel, while non-evictable memory cannot.
pod/ephemeral_storage/used_bytes BETA(project)
Ephemeral pod storage usage
GAUGE, INT64, By
k8s_pod
Pod ephemeral storage usage in bytes. Sampled every 60 seconds.
pod/network/policy_event_count BETA(project)
Network policy event count
DELTA, INT64, 1
k8s_pod
Change in the number of network policy events seen in the dataplane. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
verdict: Policy verdict, possible values: [allow, deny].
workload_kind: Kind of the workload that the policy-enforced pod belongs to, for example "Deployment", "ReplicaSet", "StatefulSet", "DaemonSet", "Job", or "CronJob".
workload_name: Name of the workload that the policy-enforced pod belongs to.
direction: Direction of the traffic from the point of view of the policy-enforced pod, possible values: [ingress, egress].
pod/network/received_bytes_count GA(project)
Bytes received
CUMULATIVE, INT64, By
k8s_pod
Cumulative number of bytes received by the pod over the network. Sampled every 60 seconds.
pod/network/sent_bytes_count GA(project)
Bytes transmitted
CUMULATIVE, INT64, By
k8s_pod
Cumulative number of bytes transmitted by the pod over the network. Sampled every 60 seconds.
pod/volume/total_bytes GA(project)
Volume capacity
GAUGE, INT64, By
k8s_pod
Total number of disk bytes available to the pod. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
volume_name: The name of the volume (e.g. `/dev/sda1`).
persistentvolumeclaim_name: The name of the referenced Persistent Volume Claim.
persistentvolumeclaim_namespace: The namespace of the referenced Persistent Volume Claim.
pod/volume/used_bytes GA(project)
Volume usage
GAUGE, INT64, By
k8s_pod
Number of disk bytes used by the pod. Sampled every 60 seconds.
volume_name: The name of the volume (e.g. `/dev/sda1`).
persistentvolumeclaim_name: The name of the referenced Persistent Volume Claim.
persistentvolumeclaim_namespace: The namespace of the referenced Persistent Volume Claim.
pod/volume/utilization GA(project)
Volume utilization
GAUGE, DOUBLE, 1
k8s_pod
The fraction of the volume that is currently being used by the instance. This value cannot be greater than 1 as usage cannot exceed the total available volume space. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
volume_name: The name of the volume (e.g. `/dev/sda1`).
persistentvolumeclaim_name: The name of the referenced Persistent Volume Claim.
persistentvolumeclaim_namespace: The namespace of the referenced Persistent Volume Claim.
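The table mixes CUMULATIVE counters (such as cpu/core_usage_time) with GAUGE ratios (such as cpu/request_utilization). The sketch below shows one way the two kinds relate: it turns consecutive cumulative CPU-second samples into average cores used, then divides by a request to get a fraction that can exceed 1. This illustrates the arithmetic only; it is not a statement of how GKE computes its utilization metrics, and the function names and the sample values are hypothetical.

```python
from typing import List, Tuple

# (timestamp in seconds, cumulative CPU-seconds), as for a core_usage_time series
Sample = Tuple[float, float]


def average_cores_used(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Convert cumulative CPU-seconds into average cores used per interval."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0 or v1 < v0:
            continue  # skip out-of-order points and counter resets
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


def utilization(cores_used: float, reference_cores: float) -> float:
    """Fraction of a request, limit, or allocatable amount that is in use."""
    if reference_cores <= 0:
        raise ValueError("reference_cores must be positive")
    return cores_used / reference_cores


# Made-up samples taken 60 seconds apart.
samples = [(0.0, 10.0), (60.0, 40.0), (120.0, 130.0)]
for timestamp, cores in average_cores_used(samples):
    print(timestamp, cores, utilization(cores, reference_cores=1.0))
```

With these made-up samples, the second interval averages 1.5 cores against a 1-core request, which is why a request utilization above 1 is possible.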

nginx

Metrics exported from the NGINX Prometheus Exporter. Launch stages of these metrics: ALPHA

The "metric type" strings in this table must be prefixed with kubernetes.io/nginx/. That prefix has been omitted from the entries in the table. When querying a label, use the metric.labels. prefix; for example, metric.labels.LABEL="VALUE".

Metric type Launch stage(Resource hierarchy levels)
Display name
Kind, Type, Unit
Monitored resources
Description
Labels
connections_accepted ALPHA(project)
Nginx connections_accepted
CUMULATIVE, INT64, {connection}
k8s_container
Accepted client connections. Sampled every 60 seconds.
connections_active ALPHA(project)
Nginx connections_active
GAUGE, INT64, {connection}
k8s_container
Active client connections. Sampled every 60 seconds.
connections_handled ALPHA(project)
Nginx connections_handled
CUMULATIVE, INT64, {connection}
k8s_container
Handled client connections. Sampled every 60 seconds.
connections_reading ALPHA(project)
Nginx connections_reading
GAUGE, INT64, {connection}
k8s_container
Connections where NGINX is reading the request header. Sampled every 60 seconds.
connections_waiting ALPHA(project)
Nginx connections_waiting
GAUGE, INT64, {connection}
k8s_container
Idle client connections. Sampled every 60 seconds.
connections_writing ALPHA(project)
Nginx connections_writing
GAUGE, INT64, {connection}
k8s_container
Connections where NGINX is writing the response back to the client. Sampled every 60 seconds.
http_requests_total ALPHA(project)
Nginx http_requests_total
CUMULATIVE, INT64, {request}
k8s_container
Total HTTP requests. Sampled every 60 seconds.
nginxexporter_build_info ALPHA(project)
Nginx nginxexporter_build_info
GAUGE, INT64, 1
k8s_container
Exporter build information. Sampled every 60 seconds.
gitCommit: Commit hash of the build, which might be abbreviated.
version: Build version.
up ALPHA(project)
Nginx up
GAUGE, INT64, 1
k8s_container
Status of the last metric scrape. Indicates if the server is up or not. Sampled every 60 seconds.
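The counters in this table can be combined into derived values. The sketch below is an illustration under stated assumptions: it relies on the NGINX stub_status convention that handled connections normally equal accepted connections unless a resource limit was reached, and the function names are hypothetical rather than part of the exporter.

```python
from typing import Optional


def dropped_connections(accepted: int, handled: int) -> int:
    """Connections accepted but not handled, from two cumulative counters."""
    # Clamp at zero in case the two counters were read at slightly different times.
    return max(accepted - handled, 0)


def requests_per_connection(requests_total: int, handled: int) -> Optional[float]:
    """Average number of HTTP requests served per handled connection."""
    if handled <= 0:
        return None  # nothing handled yet, so the ratio is undefined
    return requests_total / handled


# Made-up counter readings for one container.
print(dropped_connections(accepted=1000, handled=990))
print(requests_per_connection(requests_total=2970, handled=990))
```

Because connections_accepted, connections_handled, and http_requests_total are CUMULATIVE, these values describe the whole lifetime of the counters; differences between two readings would describe a single interval instead.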
