GKE system metrics

This page lists the metrics available in Cloud Monitoring when Google Kubernetes Engine (GKE) system metrics are enabled.

  • For a general explanation of the entries in the tables, including information about values like DELTA and GAUGE, see Metric types.

  • For information about the units used in the metric lists, see the unit field in the MetricDescriptor reference.

  • For information about statements of the form “Sampled every x seconds” and “After sampling, data is not visible for up to y seconds”, see Additional information: metadata.

  • For a set of complete, current lists of supported metric types, see Metrics list.

  • For information about the meaning of launch stages such as GA (General Availability) and BETA (Preview), see Product launch stages.
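The two timing statements can be combined into a rough freshness estimate for a metric. The Python sketch below assumes, as a simplification that this page does not state, that the sampling interval and the visibility delay simply add. The function name `worst_case_age_seconds` is hypothetical.

```python
def worst_case_age_seconds(sample_period_s: int, visibility_delay_s: int = 0) -> int:
    """Rough upper bound on the age of the newest visible data point.

    Assumption: the newest visible point may have been sampled one full
    period ago, on top of the maximum visibility delay.
    """
    if sample_period_s <= 0 or visibility_delay_s < 0:
        raise ValueError("period must be positive and delay must be non-negative")
    return sample_period_s + visibility_delay_s


# "Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds."
print(worst_case_age_seconds(60, 240))  # 300
```

An estimate like this can help when choosing an alerting window, which probably should not be shorter than the worst-case age of the data.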

Kubernetes metrics

Metrics from Google Kubernetes Engine.

The following list was last generated at 2024-03-14 21:32:40 UTC. For more information about this process, see About the lists.

kubernetes

Metrics for Google Kubernetes Engine. For information on viewing these metrics, go to View observability metrics. Launch stages of these metrics: BETA GA

The "metric type" strings in this table must be prefixed with kubernetes.io/. That prefix has been omitted from the entries in the table. When querying a label, use the metric.labels. prefix; for example, metric.labels.LABEL="VALUE".
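As an illustration of the prefix and label rules above, the following Python sketch assembles a filter string from a table entry. It only builds text and does not call any Cloud Monitoring API. The helper name `build_filter` is hypothetical, and joining clauses with `AND` is an assumption about how the filter is used.

```python
from typing import Mapping, Optional

KUBERNETES_PREFIX = "kubernetes.io/"


def build_filter(short_type: str, labels: Optional[Mapping[str, str]] = None) -> str:
    """Build a filter string for a metric type as it is listed in the table."""
    # Restore the prefix that the table omits.
    clauses = [f'metric.type="{KUBERNETES_PREFIX}{short_type}"']
    # Each label is addressed as metric.labels.LABEL="VALUE".
    for name, value in sorted((labels or {}).items()):
        clauses.append(f'metric.labels.{name}="{value}"')
    return " AND ".join(clauses)


print(build_filter("container/memory/used_bytes", {"memory_type": "non-evictable"}))
# metric.type="kubernetes.io/container/memory/used_bytes" AND metric.labels.memory_type="non-evictable"
```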

Metric type Launch stage
Display name
Kind, Type, Unit
Monitored resources
Description
Labels
GAUGE, DOUBLE, {cpu}
k8s_scale
Number of CPU cores for the recommended CPU request for a single replica of the workload. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
container_name: Name of the container.
GAUGE, INT64, By
k8s_scale
Recommended memory request for a single replica of the workload, in bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
container_name: Name of the container.
container/accelerator/duty_cycle BETA
Accelerator duty cycle
GAUGE, INT64, %
k8s_container
Percent of time over the past sample period (10s) during which the accelerator was actively processing. Values are integers between 0 and 100. Sampled every 60 seconds.
make: Make of the accelerator (e.g. nvidia)
accelerator_id: ID of the accelerator.
model: Model of the accelerator (e.g. 'Tesla P100')
container/accelerator/memory_bandwidth_utilization BETA
Memory bandwidth utilization
GAUGE, DOUBLE, percent
k8s_container
Current percentage of the accelerator memory bandwidth that is being used. Computed by dividing the memory bandwidth used over a sample period by the maximum supported bandwidth over the same sample period. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
container/accelerator/memory_total BETA
Accelerator memory total
GAUGE, INT64, By
k8s_container
Total accelerator memory in bytes. Sampled every 60 seconds.
make: Make of the accelerator (e.g. nvidia)
accelerator_id: ID of the accelerator.
model: Model of the accelerator (e.g. 'Tesla P100')
container/accelerator/memory_used BETA
Accelerator memory used
GAUGE, INT64, By
k8s_container
Total accelerator memory allocated in bytes. Sampled every 60 seconds.
make: Make of the accelerator (e.g. nvidia)
accelerator_id: ID of the accelerator.
model: Model of the accelerator (e.g. 'Tesla P100')
container/accelerator/request BETA
Request accelerators
GAUGE, INT64, {devices}
k8s_container
Number of accelerator devices requested by the container. Sampled every 60 seconds.
resource_name: Name of the requested accelerator resource.
container/accelerator/tensorcore_utilization BETA
Tensorcore utilization
GAUGE, DOUBLE, percent
k8s_container
Current percentage of the Tensorcore that is utilized. Computed by dividing the Tensorcore operations that were performed over a sample period by the supported number of Tensorcore operations over the same sample period. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
container/cpu/core_usage_time GA
CPU usage time
CUMULATIVE, DOUBLE, s{CPU}
k8s_container
Cumulative CPU usage on all cores used by the container in seconds. Sampled every 60 seconds.
container/cpu/limit_cores GA
Limit cores
GAUGE, DOUBLE, {cpu}
k8s_container
CPU cores limit of the container. Sampled every 60 seconds.
container/cpu/limit_utilization GA
CPU limit utilization
GAUGE, DOUBLE, 1
k8s_container
The fraction of the CPU limit that is currently in use on the instance. This value can be greater than 1 as a container might be allowed to exceed its CPU limit for extended periods of time. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
container/cpu/request_cores GA
Request cores
GAUGE, DOUBLE, {cpu}
k8s_container
Number of CPU cores requested by the container. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
container/cpu/request_utilization GA
CPU request utilization
GAUGE, DOUBLE, 1
k8s_container
The fraction of the requested CPU that is currently in use on the instance. This value can be greater than 1 as usage can exceed the request. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
container/ephemeral_storage/limit_bytes GA
Ephemeral storage limit
GAUGE, INT64, By
k8s_container
Local ephemeral storage limit in bytes. Sampled every 60 seconds.
container/ephemeral_storage/request_bytes GA
Ephemeral storage request
GAUGE, INT64, By
k8s_container
Local ephemeral storage request in bytes. Sampled every 60 seconds.
container/ephemeral_storage/used_bytes GA
Ephemeral storage usage
GAUGE, INT64, By
k8s_container
Local ephemeral storage usage in bytes. Sampled every 60 seconds.
container/memory/limit_bytes GA
Memory limit
GAUGE, INT64, By
k8s_container
Memory limit of the container in bytes. Sampled every 60 seconds.
container/memory/limit_utilization GA
Memory limit utilization
GAUGE, DOUBLE, 1
k8s_container
The fraction of the memory limit that is currently in use on the instance. This value cannot exceed 1 as usage cannot exceed the limit. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
memory_type: Either `evictable` or `non-evictable`. Evictable memory is memory that can be easily reclaimed by the kernel, while non-evictable memory cannot.
container/memory/page_fault_count GA
Page faults
CUMULATIVE, INT64, 1
k8s_container
Number of page faults, broken down by type: major and minor.
fault_type: Fault type - either 'major' or 'minor', with the former indicating that the page had to be loaded from disk.
container/memory/request_bytes GA
Memory request
GAUGE, INT64, By
k8s_container
Memory request of the container in bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
container/memory/request_utilization GA
Memory request utilization
GAUGE, DOUBLE, 1
k8s_container
The fraction of the requested memory that is currently in use on the instance. This value can be greater than 1 as usage can exceed the request. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
memory_type: Either `evictable` or `non-evictable`. Evictable memory is memory that can be easily reclaimed by the kernel, while non-evictable memory cannot.
container/memory/used_bytes GA
Memory usage
GAUGE, INT64, By
k8s_container
Memory usage in bytes. Sampled every 60 seconds.
memory_type: Either `evictable` or `non-evictable`. Evictable memory is memory that can be easily reclaimed by the kernel, while non-evictable memory cannot.
container/multislice/accelerator/device_to_host_transfer_latencies BETA
Device to Host transfer latencies
DELTA, DISTRIBUTION, us
k8s_container
Distribution of device to host transfer latency for each chunk of data for multislice traffic. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
buffer_size: Size of the buffer.
make: Make of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
container/multislice/accelerator/host_to_device_transfer_latencies BETA
Host to Device transfer latencies
DELTA, DISTRIBUTION, us
k8s_container
Distribution of host to device transfer latency for each chunk of data for multislice traffic. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
buffer_size: Size of the buffer.
make: Make of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
container/multislice/network/collective_end_to_end_latencies BETA
Collective latencies
DELTA, DISTRIBUTION, us
k8s_container
Distribution of end to end collective latency for multislice traffic. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
collective_type: Collective operation type.
input_size: Size of the message.
make: Make of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
container/multislice/network/dcn_transfer_latencies BETA
DCN (Data Center Network) transfer latencies
DELTA, DISTRIBUTION, us
k8s_container
Distribution of network transfer latencies for multislice traffic. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
buffer_size: Size of the buffer.
make: Make of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
type: Protocol Type.
container/restart_count GA
Restart count
CUMULATIVE, INT64, 1
k8s_container
Number of times the container has restarted. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
container/uptime GA
Uptime
GAUGE, DOUBLE, s
k8s_container
Time in seconds that the container has been running. Sampled every 60 seconds.
node/accelerator/duty_cycle BETA
Accelerator duty cycle with node
GAUGE, DOUBLE, percent
k8s_node
Percent of time over the past sample period (10s) during which the accelerator was actively processing. Sampled every 60 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
node/accelerator/memory_bandwidth_utilization BETA
Memory bandwidth utilization
GAUGE, DOUBLE, percent
k8s_node
Current percentage of the accelerator memory bandwidth that is being used. Computed by dividing the memory bandwidth used over a sample period by the maximum supported bandwidth over the same sample period. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
node/accelerator/memory_total BETA
Accelerator memory total with node
GAUGE, INT64, bytes
k8s_node
Total accelerator memory in bytes. Sampled every 60 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
node/accelerator/memory_used BETA
Accelerator memory used with node
GAUGE, INT64, bytes
k8s_node
Total accelerator memory allocated in bytes. Sampled every 60 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
node/accelerator/tensorcore_utilization BETA
Tensorcore utilization
GAUGE, DOUBLE, percent
k8s_node
Current percentage of the Tensorcore that is utilized. Computed by dividing the Tensorcore operations that were performed over a sample period by the supported number of Tensorcore operations over the same sample period. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
make: Make of the accelerator.
accelerator_id: ID of the accelerator.
model: Model of the accelerator.
tpu_topology: Topology of the TPU accelerator.
node/cpu/allocatable_cores GA
Allocatable cores
GAUGE, DOUBLE, {cpu}
k8s_node
Number of allocatable CPU cores on the node. Sampled every 60 seconds.
node/cpu/allocatable_utilization GA
CPU allocatable utilization
GAUGE, DOUBLE, 1
k8s_node
The fraction of the allocatable CPU that is currently in use on the instance. Sampled every 60 seconds. After sampling, data is not visible for up to 240 seconds.
node/cpu/core_usage_time GA
CPU usage time
CUMULATIVE, DOUBLE, s{CPU}
k8s_node
Cumulative CPU usage on all cores used on the node in seconds. Sampled every 60 seconds.
node/cpu/total_cores GA
Total cores
GAUGE, DOUBLE, {cpu}
k8s_node
Total number of CPU cores on the node. Sampled every 60 seconds.
node/ephemeral_storage/allocatable_bytes GA
Allocatable ephemeral storage
GAUGE, INT64, By
k8s_node
Local ephemeral storage bytes allocatable on the node. Sampled every 60 seconds.
node/ephemeral_storage/inodes_free GA
Free inodes
GAUGE, INT64, 1
k8s_node
Number of free inodes on local ephemeral storage. Sampled every 60 seconds.
node/ephemeral_storage/inodes_total GA
Total inodes
GAUGE, INT64, 1
k8s_node
Total number of inodes on local ephemeral storage. Sampled every 60 seconds.
node/ephemeral_storage/total_bytes GA
Total ephemeral storage
GAUGE, INT64, By
k8s_node
Total ephemeral storage bytes on the node. Sampled every 60 seconds.
node/ephemeral_storage/used_bytes GA
Ephemeral storage usage
GAUGE, INT64, By
k8s_node
Local ephemeral storage bytes used by the node. Sampled every 60 seconds.
node/logs/input_bytes BETA
Logging throughput
DELTA, INT64, By
k8s_node
Volume of log bytes generated on the node by user and system workloads. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
type: Type is either 'system' or 'workload'. 'system' indicates the logging throughput of GKE system components. 'workload' indicates the throughput of logs generated by non-system containers running on user nodes.
node/memory/allocatable_bytes GA
Allocatable memory
GAUGE, INT64, By
k8s_node
Number of bytes of memory that can be allocated for workloads on the node. Sampled every 60 seconds.
node/memory/allocatable_utilization GA
Memory allocatable utilization
GAUGE, DOUBLE, 1
k8s_node
The fraction of the allocatable memory that is currently in use on the instance. This value cannot exceed 1 as usage cannot exceed allocatable memory bytes. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
memory_type: Either `evictable` or `non-evictable`. Evictable memory is memory that can be easily reclaimed by the kernel, while non-evictable memory cannot.
component: Name of the respective system daemon.
node/memory/total_bytes GA
Total memory
GAUGE, INT64, By
k8s_node
Total number of bytes of memory on the node. Sampled every 60 seconds.
node/memory/used_bytes GA
Memory usage
GAUGE, INT64, By
k8s_node
Cumulative memory bytes used by the node. Sampled every 60 seconds.
memory_type: Either `evictable` or `non-evictable`. Evictable memory is memory that can be easily reclaimed by the kernel, while non-evictable memory cannot.
node/network/received_bytes_count GA
Bytes received
CUMULATIVE, INT64, By
k8s_node
Cumulative number of bytes received by the node over the network. Sampled every 60 seconds.
node/network/sent_bytes_count GA
Bytes transmitted
CUMULATIVE, INT64, By
k8s_node
Cumulative number of bytes transmitted by the node over the network. Sampled every 60 seconds.
node/pid_limit GA
PID capacity
GAUGE, INT64, 1
k8s_node
The maximum PID of the OS on the node. Sampled every 60 seconds.
node/pid_used GA
PID usage
GAUGE, INT64, 1
k8s_node
The number of running processes in the OS on the node. Sampled every 60 seconds.
node_daemon/cpu/core_usage_time GA
CPU usage time
CUMULATIVE, DOUBLE, s{CPU}
k8s_node
Cumulative CPU usage on all cores used by the node level system daemon in seconds. Sampled every 60 seconds.
component: Name of the respective system daemon.
node_daemon/memory/used_bytes GA
Memory usage
GAUGE, INT64, By
k8s_node
Memory usage by the system daemon in bytes. Sampled every 60 seconds.
component: Name of the respective system daemon.
memory_type: Either `evictable` or `non-evictable`. Evictable memory is memory that can be easily reclaimed by the kernel, while non-evictable memory cannot.
pod/ephemeral_storage/used_bytes BETA
Ephemeral pod storage usage
GAUGE, INT64, By
k8s_pod
Pod ephemeral storage usage in bytes. Sampled every 60 seconds.
pod/network/policy_event_count BETA
Network policy event count
DELTA, INT64, 1
k8s_pod
Change in the number of network policy events seen in the dataplane. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
verdict: Policy verdict, possible values: [allow, deny].
workload_kind: Kind of the workload that the policy-enforced pod belongs to, for example, "Deployment", "ReplicaSet", "StatefulSet", "DaemonSet", "Job", or "CronJob".
workload_name: Name of the workload that the policy-enforced pod belongs to.
direction: Direction of the traffic from the point of view of the policy-enforced pod, possible values: [ingress, egress].
pod/network/received_bytes_count GA
Bytes received
CUMULATIVE, INT64, By
k8s_pod
Cumulative number of bytes received by the pod over the network. Sampled every 60 seconds.
pod/network/sent_bytes_count GA
Bytes transmitted
CUMULATIVE, INT64, By
k8s_pod
Cumulative number of bytes transmitted by the pod over the network. Sampled every 60 seconds.
pod/volume/total_bytes GA
Volume capacity
GAUGE, INT64, By
k8s_pod
Total number of disk bytes available to the pod. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
volume_name: The name of the volume (e.g. `/dev/sda1`).
persistentvolumeclaim_name: The name of the referenced Persistent Volume Claim.
persistentvolumeclaim_namespace: The namespace of the referenced Persistent Volume Claim.
pod/volume/used_bytes GA
Volume usage
GAUGE, INT64, By
k8s_pod
Number of disk bytes used by the pod. Sampled every 60 seconds.
volume_name: The name of the volume (e.g. `/dev/sda1`).
persistentvolumeclaim_name: The name of the referenced Persistent Volume Claim.
persistentvolumeclaim_namespace: The namespace of the referenced Persistent Volume Claim.
pod/volume/utilization GA
Volume utilization
GAUGE, DOUBLE, 1
k8s_pod
The fraction of the volume that is currently being used by the instance. This value cannot be greater than 1 as usage cannot exceed the total available volume space. Sampled every 60 seconds. After sampling, data is not visible for up to 120 seconds.
volume_name: The name of the volume (e.g. `/dev/sda1`).
persistentvolumeclaim_name: The name of the referenced Persistent Volume Claim.
persistentvolumeclaim_namespace: The namespace of the referenced Persistent Volume Claim.
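Several entries in this table, such as container/cpu/core_usage_time and the network byte counts, are CUMULATIVE: each point is a running total rather than a rate. The following Python sketch shows one way to turn such totals into per-second rates. The sample values are invented for illustration, the function name `rates_from_cumulative` is hypothetical, and treating a decrease as a counter reset is an assumption rather than documented behavior.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, cumulative value)


def rates_from_cumulative(samples: List[Sample]) -> List[float]:
    """Convert CUMULATIVE samples into per-second rates between adjacent points."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order timestamps
        # Assumption: a drop in the total means the counter restarted from
        # zero, so the new value is the whole increase for this interval.
        increase = v1 - v0 if v1 >= v0 else v1
        rates.append(increase / elapsed)
    return rates


# Hypothetical CPU usage totals (in CPU-seconds) sampled 60 seconds apart.
samples = [(0.0, 120.0), (60.0, 150.0), (120.0, 210.0), (180.0, 30.0)]
print(rates_from_cumulative(samples))  # [0.5, 1.0, 0.5]
```

For a total measured in CPU-seconds, the resulting rate can be read as the average number of cores in use over each interval.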

nginx

Metrics exported from the NGINX Prometheus Exporter. Launch stages of these metrics: ALPHA

The "metric type" strings in this table must be prefixed with kubernetes.io/nginx/. That prefix has been omitted from the entries in the table. When querying a label, use the metric.labels. prefix; for example, metric.labels.LABEL="VALUE".

Metric type Launch stage
Display name
Kind, Type, Unit
Monitored resources
Description
Labels
connections_accepted ALPHA
Nginx connections_accepted
CUMULATIVE, INT64, {connection}
k8s_container
Accepted client connections. Sampled every 60 seconds.
connections_active ALPHA
Nginx connections_active
GAUGE, INT64, {connection}
k8s_container
Active client connections. Sampled every 60 seconds.
connections_handled ALPHA
Nginx connections_handled
CUMULATIVE, INT64, {connection}
k8s_container
Handled client connections. Sampled every 60 seconds.
connections_reading ALPHA
Nginx connections_reading
GAUGE, INT64, {connection}
k8s_container
Connections where NGINX is reading the request header. Sampled every 60 seconds.
connections_waiting ALPHA
Nginx connections_waiting
GAUGE, INT64, {connection}
k8s_container
Idle client connections. Sampled every 60 seconds.
connections_writing ALPHA
Nginx connections_writing
GAUGE, INT64, {connection}
k8s_container
Connections where NGINX is writing the response back to the client. Sampled every 60 seconds.
http_requests_total ALPHA
Nginx http_requests_total
CUMULATIVE, INT64, {request}
k8s_container
Total HTTP requests. Sampled every 60 seconds.
nginxexporter_build_info ALPHA
Nginx nginxexporter_build_info
GAUGE, INT64, 1
k8s_container
Exporter build information. Sampled every 60 seconds.
gitCommit: Commit hash of the build, which may be abbreviated.
version: Build version.
up ALPHA
Nginx up
GAUGE, INT64, 1
k8s_container
Status of the last metric scrape. Indicates if the server is up or not. Sampled every 60 seconds.
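The connections_accepted and connections_handled totals are often compared to estimate how many connections NGINX accepted but did not handle. The Python sketch below shows that arithmetic. Reading the difference as "dropped" connections is a common interpretation and not something this page states, and the function name and input values are hypothetical.

```python
def dropped_connections(accepted: int, handled: int) -> int:
    """Estimate connections that were accepted but never handled."""
    if accepted < 0 or handled < 0:
        raise ValueError("counter values must be non-negative")
    # Clamp at zero in case the two totals were sampled at slightly
    # different moments and handled briefly appears larger.
    return max(accepted - handled, 0)


# Hypothetical totals read at the same sample time.
print(dropped_connections(accepted=1050, handled=1000))  # 50
```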
