Migrating to GKE workload metrics from the Stackdriver Prometheus Sidecar

This tutorial describes how to migrate a metric collection pipeline based on the Stackdriver Prometheus Sidecar to fully managed Google Kubernetes Engine (GKE) workload metrics. There are many benefits to using GKE workload metrics instead of the Stackdriver Prometheus Sidecar:

  • Easy setup: Deploy a PodMonitor custom resource with a single kubectl command to start collecting metrics.
  • Highly configurable: Adjust scrape endpoints, frequency, and other parameters.
  • Fully managed: Google maintains the pipeline, lowering total cost of ownership.
  • Control costs: Easily manage Cloud Monitoring costs through flexible metric filtering.
  • Open standard: Configure workload metrics using the PodMonitor custom resource, which is modeled after the Prometheus Operator's PodMonitor resource.
  • HPA support: Compatible with the Stackdriver Custom Metrics Adapter to enable horizontal auto-scaling on custom metrics.
  • Better pricing: More intuitive, predictable, and lower pricing.
  • Autopilot support: GKE workload metrics is available for both GKE Standard and GKE Autopilot clusters.

Confirm you are using the Stackdriver Prometheus Sidecar

The Stackdriver Prometheus Sidecar was previously the recommended approach for collecting Prometheus-style metrics from GKE clusters and ingesting them into Cloud Monitoring. In this approach, an existing Prometheus installation, typically running as a StatefulSet or Deployment in the GKE cluster, is patched with a sidecar container that exports all the metrics it scrapes to Cloud Monitoring. The Prometheus Helm chart provides a convenient way to configure this integration.
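
One way to confirm this is to list the container images running in your cluster and look for the sidecar. The command below is a sketch: it assumes the sidecar was deployed with its default image name, which contains stackdriver-prometheus-sidecar.

$ kubectl get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}' \
    | grep stackdriver-prometheus-sidecar

Any rows returned identify the Prometheus pods that carry the sidecar, and therefore the scrape jobs you need to migrate.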

Determine whether GKE workload metrics covers your usage

The following checklist can help determine whether you use any features of the Stackdriver Prometheus Sidecar approach that can't be reproduced by using GKE workload metrics.

Feature: Is Prometheus being used to monitor anything other than pods in the same cluster? To determine this, look for usage of the following:
  • static_config
  • any service discovery plugin other than kubernetes_sd_config
  • any kubernetes_sd_config role other than endpoints or pod
  • api_server or kubeconfig being used to point to a different Kubernetes cluster
Support/workaround: The pod role is supported out of the box. The endpoints role can be simulated by exposing and naming the scrape port on the pod, as shown in the example after this checklist.

Feature: Are pods being selected based on fields rather than labels?
Support/workaround: Use an equivalent label selector, which might require adding new labels to the desired pods.

Feature: Is Prometheus configured with any form of HTTP authorization or mutual TLS for scraping?
Support/workaround: This feature is not supported in GKE workload metrics.

Feature: Is Prometheus configured with metric_relabel_configs that use actions other than keep and drop?
Support/workaround: This feature is not supported in GKE workload metrics.

Feature: Do you use the Counter Aggregator or metric renaming features of the Stackdriver Prometheus Sidecar?
Support/workaround: These features are not supported in GKE workload metrics.

Feature: Does your Prometheus configuration include alerting rules?
Support/workaround: Convert them into alerting policies in Cloud Monitoring.

Feature: Does your sidecar configuration include static_metadata?
Support/workaround: GKE workload metrics collects your metrics but disregards this metadata.
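
For example, the endpoints-role workaround in the first row relies on the scrape port being exposed and named on the pod itself. The Deployment fragment below is a sketch under that assumption; the prom-example name, the app=prom-example label, the gke-workload-metrics namespace, and the metrics-port port name are illustrative and match the PodMonitor examples later on this page.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prom-example
  namespace: gke-workload-metrics
spec:
  selector:
    matchLabels:
      app: prom-example
  template:
    metadata:
      labels:
        app: prom-example                     # matched by the PodMonitor label selector
    spec:
      containers:
      - name: prom-example
        image: example.com/prom-example:1.0   # illustrative image
        ports:
        - name: metrics-port                  # named port that the PodMonitor scrapes
          containerPort: 8080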

Migrate Prometheus configuration to PodMonitor custom resources

For each job (item in the scrape_configs array) defined in the Prometheus configuration, create a corresponding PodMonitor custom resource. Here's an illustrative example:

Prometheus config:

scrape_configs:
- job_name: example
  metrics_path: /metrics
  scheme: http
  scrape_interval: 20s
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - gke-workload-metrics
    selectors:
    - role: endpoints
      label: "app=prom-example"
      field: "port=metrics-port"

Equivalent PodMonitor CRD:

apiVersion: monitoring.gke.io/v1alpha1
kind: PodMonitor
metadata:
  name: example
spec:
  namespaceSelector:
    matchNames:
    - gke-workload-metrics
  selector:
    matchLabels:
      app: prom-example
  podMetricsEndpoints:
  - port: metrics-port
    path: /metrics
    scheme: http
    interval: 20s
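
After translating each job, deploy the resulting resources with kubectl. The file name below is illustrative, and the get command assumes the PodMonitor CRD registers the plural resource name podmonitors, which is the usual convention for a kind named PodMonitor.

$ kubectl apply -f example-podmonitor.yaml
$ kubectl get podmonitors --all-namespaces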

Migrate Sidecar configuration to PodMonitor custom resources

Filters configured at the sidecar level apply to all scrape jobs. As a result, you must add the equivalent metricRelabelings entries to every PodMonitor CRD that you created in the previous step.

Sidecar CLI:

$ stackdriver-prometheus-sidecar \
    --include='metric_name{label="foo"}'

Equivalent PodMonitor CRD:

apiVersion: monitoring.gke.io/v1alpha1
kind: PodMonitor
metadata:
  name: example
spec:
  namespaceSelector:
    matchNames:
    - gke-workload-metrics
  selector:
    matchLabels:
      app: prom-example
  podMetricsEndpoints:
  - port: metrics-port
    path: /metrics
    scheme: http
    interval: 20s
    metricRelabelings:
      - sourceLabels: [__name__, label]
        regex: "^metric_name;foo$"
        action: keep
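
The same mechanism also works in the other direction: because keep and drop are the only supported actions, you can control costs by dropping metrics you don't want to ingest. The fragment below, which slots into a PodMonitor endpoint entry, is a sketch in which noisy_metric_.* is a placeholder for the metric names you want to exclude.

  podMetricsEndpoints:
  - port: metrics-port
    metricRelabelings:
      - sourceLabels: [__name__]
        regex: "noisy_metric_.*"
        action: drop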

View metrics in Google Cloud Monitoring

Use Metrics Explorer to verify that your metrics are now being ingested through the GKE workload metrics pipeline.

Note that if the metric was previously named external.googleapis.com/prometheus/metric_name, it's now named workload.googleapis.com/metric_name. Remember to modify any dashboards or alerts that depend on these metrics to use the new naming scheme instead.
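
When querying the Cloud Monitoring API directly or writing alerting policy filters, the standard monitoring filter syntax can be used to spot-check the migration; metric_name below is a placeholder for one of your own metrics.

Match a single migrated metric:

  metric.type = "workload.googleapis.com/metric_name"

Match everything ingested through the workload metrics pipeline:

  metric.type = starts_with("workload.googleapis.com/")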