# Using Prometheus metrics

Last updated (UTC): 2025-08-12.
This page covers the basics of using
[Prometheus](https://prometheus.io/) metrics for
availability and latency SLIs in Cloud Monitoring, and using those
metrics to create an SLO.

The basics of Prometheus
------------------------

[Prometheus](https://prometheus.io/) is a leading open-source monitoring solution for metrics
and alerting.

Prometheus supports dimensional data with key-value identifiers for metrics,
provides the PromQL query language, and supports many integrations by providing
[exporters](https://prometheus.io/docs/instrumenting/exporters) for other products.

To start using Prometheus with Monitoring, we recommend using
[Google Cloud Managed Service for Prometheus](/stackdriver/docs/managed-prometheus).

### Metrics

Prometheus supports the following types of metrics:

- Counter: a single value that can only increase monotonically or be reset to 0 on restart.
- Gauge: a single numeric value that can be arbitrarily set.
- Histogram: a group of
configurable buckets for sampling observations and recording values in ranges; it also provides a sum of all observed values.
- Summary: like a histogram, but it also calculates configurable quantiles over a sliding time window.

For more information, see
[Metric types](https://prometheus.io/docs/concepts/metric_types/).

Creating metrics for SLIs
-------------------------

If your application emits Prometheus metrics, you can use them for SLIs:

- For availability SLIs on request and error counts, you can start with Prometheus counter metrics.
- For latency SLIs, you can use Prometheus histogram or summary metrics.

To collect Prometheus metrics with Google Cloud Managed Service for Prometheus, refer to the
documentation for setting up [managed](/stackdriver/docs/managed-prometheus/setup-managed) or
[self-deployed](/stackdriver/docs/managed-prometheus/setup-unmanaged) metric collection.

When you [create an SLO](/stackdriver/docs/solutions/slo-monitoring/ui/create-slo) in the Google Cloud console, the default
availability and latency SLO types do not include Prometheus metrics. To use a
Prometheus metric, create a custom SLO and then choose a Prometheus metric for
the SLI.

Metric types for Prometheus metrics begin with the prefix `prometheus.googleapis.com/`.

### Metrics for GKE

Managed collection of metrics by Google Cloud Managed Service for Prometheus is
[enabled by default](/stackdriver/docs/managed-prometheus/setup-managed#enable-mgdcoll-gke) for GKE.
If you are running in a GKE environment that does not enable managed
collection by default, you can
[enable managed collection manually](/stackdriver/docs/managed-prometheus/setup-managed#enable-mgdcoll-gke-manual).
When managed collection is enabled, the in-cluster components run, but no
metrics are generated until you deploy a
[PodMonitoring](/stackdriver/docs/managed-prometheus/setup-managed#gmp-pod-monitoring) resource that scrapes a valid metrics endpoint
or enable one of the managed metrics packages.

The [control plane metrics](/stackdriver/docs/solutions/gke/control-plane-metrics) package
includes metrics that are useful indicators of system health.
[Enable collection](/stackdriver/docs/solutions/gke/managing-metrics#control-plane-metrics) of control plane metrics to use
these metrics for availability, latency, and other SLIs.

- Use [API server metrics](/stackdriver/docs/solutions/gke/control-plane-metrics#api-server) to track API server load, the fraction of API server requests that return errors, and the response latency for requests received by the API server.
- Use [scheduler metrics](/stackdriver/docs/solutions/gke/control-plane-metrics#scheduler-metrics) to respond proactively to scheduling issues when there aren't enough resources for pending Pods.

### Metrics for availability SLIs

You express a request-based availability SLI in the Cloud Monitoring API by
using the [`TimeSeriesRatio`](/monitoring/api/ref_v3/rest/v3/services.serviceLevelObjectives#TimeSeriesRatio) structure to set
up a ratio of "good" or "bad" requests to total requests. This ratio is used
in the `goodTotalRatio` field of a [`RequestBasedSli`](/monitoring/api/ref_v3/rest/v3/services.serviceLevelObjectives#RequestBasedSli)
structure.

Your application must emit Prometheus metrics that can be used to construct
this ratio. The application must emit at least two of the following:
1.  A metric that counts total events; use this metric in the ratio's
    `totalServiceFilter`.

    You can use a Prometheus counter that's incremented for every event.

2.  A metric that counts "bad" events; use this metric in the ratio's
    `badServiceFilter`.

    You can use a Prometheus counter that's incremented for
    every error or other "bad" event.

3.  A metric that counts "good" events; use this metric in the ratio's
    `goodServiceFilter`.

    You can use a Prometheus counter that's incremented for
    every successful or other "good" event.

### Metrics for latency SLIs

You express a request-based latency SLI in the Cloud Monitoring API by
creating a [`DistributionCut`](/monitoring/api/ref_v3/rest/v3/services.serviceLevelObjectives#DistributionCut) structure. This
structure is used in the `distributionCut` field of a
[`RequestBasedSli`](/monitoring/api/ref_v3/rest/v3/services.serviceLevelObjectives#RequestBasedSli) structure.

Your application must emit a Prometheus metric that can be used to construct
the distribution-cut value. You can use a Prometheus histogram or summary
for this purpose. To determine how to define your buckets so that they
accurately measure whether your responses fall within your SLO, see
[Metric types](https://prometheus.io/docs/concepts/metric_types/) in the Prometheus documentation.

Example
-------

The following JSON example uses the GKE control plane metric
`prometheus.googleapis.com/apiserver_request_duration_seconds` to
create a latency SLO for a service. The SLO requires 98% of response latencies
to be less than 50 seconds in a calendar month.
    {
      "displayName": "98% Calendar month - Request Duration Under 50s",
      "goal": 0.98,
      "calendarPeriod": "MONTH",
      "serviceLevelIndicator": {
        "requestBased": {
          "distributionCut": {
            "distributionFilter": "metric.type=\"prometheus.googleapis.com/apiserver_request_duration_seconds/histogram\" resource.type=\"prometheus_target\"",
            "range": {
              "min": "-Infinity",
              "max": 50
            }
          }
        }
      }
    }

What's next
-----------

- [Create an SLO](/stackdriver/docs/solutions/slo-monitoring/ui/create-slo)
- Learn more about [Google Cloud Managed Service for Prometheus](/stackdriver/docs/managed-prometheus).
- Learn more about [control plane metrics](/stackdriver/docs/solutions/gke/control-plane-metrics).
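To make the ratio and goal arithmetic concrete, here is a minimal plain-Python sketch of how a `goodTotalRatio`-style SLI relates to an SLO goal like the 0.98 used in the JSON example. All counter values are hypothetical, chosen only for illustration; in Cloud Monitoring this evaluation is performed by the service itself from your Prometheus counters.

```python
# Hypothetical counter samples over an SLO window, mimicking the
# Prometheus counters described above (values are illustrative only).
total_requests = 1_000_000  # counter referenced by totalServiceFilter
bad_requests = 1_500        # counter referenced by badServiceFilter

# Availability SLI: the good/total ratio expressed by TimeSeriesRatio.
good_total_ratio = (total_requests - bad_requests) / total_requests

# SLO goal, matching the "goal" field in the JSON example above.
goal = 0.98

# Error budget: the fraction of requests allowed to be bad, and how
# much of that budget this window has consumed.
error_budget = 1 - goal
budget_consumed = (bad_requests / total_requests) / error_budget

print(f"SLI (good/total): {good_total_ratio:.4f}")      # 0.9985
print(f"error budget consumed: {budget_consumed:.1%}")  # 7.5%
```

The same arithmetic explains why a tight goal is expensive: raising the goal from 98% to 99.5% halves-plus the error budget, so the same 1,500 bad requests would consume 30% of it instead of 7.5%.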