Using Prometheus

Prometheus is an optional monitoring tool often used with Kubernetes. If you configure Stackdriver Kubernetes Engine Monitoring with Prometheus support, then the metrics that services expose in the Prometheus data model can be exported from the cluster and made visible as external metrics in Stackdriver.

This page presents a mechanism for Stackdriver to collect data from Prometheus clients that works with Stackdriver Kubernetes Engine Monitoring. The source code for the integration is publicly available.

Before you begin

The Prometheus support described here does not work with the Legacy Stackdriver support that is described in Stackdriver Monitoring.

Installing the collector

Stackdriver provides a collector that needs to be deployed as a sidecar in the same Kubernetes pod as your Prometheus server. Use the following shell commands to install the Stackdriver collector in a new cluster using Stackdriver Kubernetes Engine Monitoring.

Log in to your cluster, and then run the script below after setting the required environment variables (an example of setting them follows the list):

  • KUBE_NAMESPACE: namespace to run the script against
  • KUBE_CLUSTER: cluster name parameter for the sidecar
  • GCP_REGION: GCP region parameter for the sidecar
  • GCP_PROJECT: GCP project parameter for the sidecar
  • DATA_DIR: data directory for the sidecar
  • DATA_VOLUME: name of the volume that contains Prometheus's data
  • SIDECAR_IMAGE_TAG: Docker image version for the Prometheus sidecar. We recommend using the latest release from the Container Registry.
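
For example, you might set the variables like this before running the script (the values shown are placeholders for illustration, not defaults):

export KUBE_NAMESPACE=default
export KUBE_CLUSTER=my-cluster        # placeholder cluster name
export GCP_REGION=us-central1         # placeholder region
export GCP_PROJECT=my-project         # placeholder project ID
export DATA_DIR=/data
export DATA_VOLUME=data-volume
export SIDECAR_IMAGE_TAG=0.3.2        # pick the latest release tag available

Then run the following script:
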
#!/bin/sh

set -e
set -u

usage() {
  printf 'Usage: %s <deployment|statefulset> <name>\n\n' "$0"
}

if [ "$#" -le 1 ]; then
  usage
  exit 1
fi

# Override to use a different Docker image name for the sidecar.
export SIDECAR_IMAGE_NAME=${SIDECAR_IMAGE_NAME:-'gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar'}

kubectl -n "${KUBE_NAMESPACE}" patch "$1" "$2" --type strategic --patch "
spec:
  template:
    spec:
      containers:
      - name: sidecar
        image: ${SIDECAR_IMAGE_NAME}:${SIDECAR_IMAGE_TAG}
        imagePullPolicy: Always
        args:
        - \"--stackdriver.project-id=${GCP_PROJECT}\"
        - \"--prometheus.wal-directory=${DATA_DIR}/wal\"
        - \"--stackdriver.kubernetes.location=${GCP_REGION}\"
        - \"--stackdriver.kubernetes.cluster-name=${KUBE_CLUSTER}\"
        #- \"--stackdriver.generic.location=${GCP_REGION}\"
        #- \"--stackdriver.generic.namespace=${KUBE_CLUSTER}\"
        ports:
        - name: sidecar
          containerPort: 9091
        volumeMounts:
        - name: ${DATA_VOLUME}
          mountPath: ${DATA_DIR}
"

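For example, after setting the variables you might save the script as patch.sh (any file name works) and run it against your Prometheus workload; the workload name below is a placeholder:

# Patch a Deployment named "prometheus":
sh ./patch.sh deployment prometheus

# Or patch a StatefulSet named "prometheus":
sh ./patch.sh statefulset prometheus
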
Validating the configuration

After configuring Prometheus, run the following command to validate the installation:

kubectl -n "${KUBE_NAMESPACE}" get <deployment|statefulset> <name> -o=go-template='{{$output := "stackdriver-prometheus-sidecar does not exist."}}{{range .spec.template.spec.containers}}{{if eq .name "sidecar"}}{{$output = (print "stackdriver-prometheus-sidecar exists. Image: " .image)}}{{end}}{{end}}{{printf $output}}{{"\n"}}'

If the Prometheus sidecar is successfully installed, the output of the script shows:

stackdriver-prometheus-sidecar exists. Image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.3.2

If the Prometheus sidecar is not successfully installed, the output of the script shows:

stackdriver-prometheus-sidecar does not exist.

To verify that the workload is up to date and available, run:

kubectl -n "${KUBE_NAMESPACE}" get <deployment|statefulset> <name>
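
For example, for a Deployment named prometheus (a placeholder name), the command would be:

kubectl -n "${KUBE_NAMESPACE}" get deployment prometheus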

Updating the configuration

After verifying that the collector is installed successfully, update your cluster configuration to make the changes permanent (a consolidated example follows these steps):

  1. Make sure Prometheus Server is writing to a shared volume:

    1. Ensure that there is a shared volume in the Prometheus pod:

      volumes:
        - name: data-volume
          emptyDir: {}
      
    2. Have Prometheus mount the volume under /data:

      volumeMounts:
      - name: data-volume
        mountPath: /data
      
    3. Instruct Prometheus Server to write to the shared volume in /data. Add the following to its container args:

      --storage.tsdb.path=/data
      
  2. Add the collector container as a sidecar.

    - name: sidecar
      image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:[SIDECAR_IMAGE_TAG]
      args:
      - "--stackdriver.project-id=[GCP_PROJECT]"
      - "--prometheus.wal-directory=/data/wal"
      - "--stackdriver.kubernetes.location=[GCP_REGION]"
      - "--stackdriver.kubernetes.cluster-name=[KUBE_CLUSTER]"
      ports:
      - name: sidecar
        containerPort: 9091
      volumeMounts:
      - name: data-volume
        mountPath: /data
    

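The following is a minimal sketch of how these pieces fit together in a single manifest, assuming a Deployment named prometheus, the stock prom/prometheus image, and the same placeholder variable values used earlier; adapt it to your own Prometheus image, configuration, and resource names:

kubectl -n "${KUBE_NAMESPACE}" apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      # Prometheus Server writes its TSDB, including the WAL, to the shared volume.
      - name: prometheus
        image: prom/prometheus:v2.11.1
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/data"
        ports:
        - name: web
          containerPort: 9090
        volumeMounts:
        - name: data-volume
          mountPath: /data
      # The Stackdriver collector reads the WAL from the same volume.
      - name: sidecar
        image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:${SIDECAR_IMAGE_TAG}
        args:
        - "--stackdriver.project-id=${GCP_PROJECT}"
        - "--prometheus.wal-directory=/data/wal"
        - "--stackdriver.kubernetes.location=${GCP_REGION}"
        - "--stackdriver.kubernetes.cluster-name=${KUBE_CLUSTER}"
        ports:
        - name: sidecar
          containerPort: 9091
        volumeMounts:
        - name: data-volume
          mountPath: /data
      volumes:
      - name: data-volume
        emptyDir: {}
EOF

With this layout, the Prometheus server writes its write-ahead log under /data/wal, and the sidecar reads it from the shared volume and forwards the samples to Stackdriver.
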
For additional configuration details for the collector, refer to the Stackdriver Prometheus sidecar documentation.

Viewing metrics

The Prometheus software you installed is configured to begin exporting metrics to Monitoring as external metrics. You can see them in Stackdriver under Resources > Metrics Explorer.

Look in the monitored resource type Kubernetes Container (k8s_container) for metrics named external/prometheus/.... A metric that should have some interesting data is external/prometheus/go_memstats_alloc_bytes. If you have more than one cluster in your Workspace, then you might want to filter the chart on the cluster name, as shown in the following screenshot:

Prometheus chart
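
If you prefer the command line, one way to confirm that the metric descriptors were created is to list them through the Monitoring API. This is a sketch that assumes the gcloud CLI is installed and authorized and that GCP_PROJECT is set as before; the underlying metric type for these metrics uses the prefix external.googleapis.com/prometheus/:

# List the Prometheus-derived external metric descriptors in the project.
curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode 'filter=metric.type = starts_with("external.googleapis.com/prometheus/")' \
  "https://monitoring.googleapis.com/v3/projects/${GCP_PROJECT}/metricDescriptors"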

Prometheus integration issues

No data shows up in Stackdriver.

If no data shows up in Stackdriver after you have gone through the installation steps, check the collector logs for error messages that could indicate the problem.

If the logs don't contain any obvious failure messages, pass the --log.level=debug flag to the collector to turn on debug logging. After restarting the collector, the new messages might point to the source of the problem.
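
For example, assuming the sidecar was installed with the patch script above into a Deployment named prometheus (a placeholder name), you could inspect its logs and enable debug logging like this:

# View the collector's logs; the container is named "sidecar" in the patch above.
kubectl -n "${KUBE_NAMESPACE}" logs deployment/prometheus -c sidecar

# To turn on debug logging, add the flag below to the sidecar container's args
# and reapply or re-patch the workload:
#   --log.level=debug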

I'm using recording rules and the metrics don't appear in Stackdriver.

Recording rules require special handling. Where possible, we recommend ingesting the raw metric into Stackdriver and using Stackdriver Monitoring's features to aggregate the data at query time (in a chart, dashboard, and so on).

If ingesting the raw metric isn't an option, you can add a static_metadata entry in the collector's config (docs). This option requires you to preserve the job and instance labels. For example, the following configuration is valid:

In your Prometheus Server configuration:

groups:
- name: my-groups
  rules:
  - record: backlog_avg_10m
    expr: avg_over_time(backlog_k8s[10m])
  - record: backlog_k8s
    expr: sum(total_lag) by (app, job, instance)

In your Stackdriver Prometheus collector configuration:

static_metadata:
  - metric: backlog_avg_10m
    type: gauge

Currently, recording rules that change or remove either the job or instance label are not supported.

My metrics are missing the job and instance Prometheus labels.

The Stackdriver Prometheus collector constructs a Stackdriver MonitoredResource for your Kubernetes objects from well-known Prometheus labels. If you accidentally change the label descriptors, the collector isn't able to write the metrics to Stackdriver.

I see "duplicate time series" or "out-of-order writes" errors in the logs.

These errors can be caused by writing metric data twice to the same time series. They can occur if your Prometheus endpoints expose the same metric data—the same set of metric label values—twice from a single Stackdriver monitored resource.

For example, a Kubernetes container might expose Prometheus metrics on multiple ports. Since the Stackdriver k8s_container monitored resource doesn't differentiate resources based on port, Stackdriver detects that you are writing two points to the same time series. A workaround is to add a metric label in Prometheus that differentiates the time series; for example, you might use the label __meta_kubernetes_pod_annotation_prometheus_io_port, because it should remain constant across container restarts.
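
For example, the scrape configuration sketch below, which assumes you already use Kubernetes pod service discovery, copies that annotation into a metric label; the label name metrics_port is arbitrary:

scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # ... your existing relabeling rules ...
  # Turn the scrape-port annotation into a target label so that the same metric
  # exposed on different ports lands in distinct time series.
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    target_label: metrics_port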

I see "metric kind must be X, but is Y" errors in the logs.

These errors can be caused by changing the type of a Prometheus metric in the source code, for example between gauge and counter. Stackdriver metrics are strictly typed, and Stackdriver enforces the type because the data semantics vary with it.

If you want to change a metric's type, you have to delete the corresponding metric descriptor, which makes its existing time series data inaccessible.
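
One way to delete a descriptor is through the Monitoring API's metricDescriptors.delete method. The following is a sketch that assumes the gcloud CLI is authorized and uses go_memstats_alloc_bytes purely as an example metric name; double-check the descriptor name before deleting, because the existing data becomes inaccessible:

# Delete the descriptor for one Prometheus-derived external metric (example name).
curl -s -X DELETE \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/${GCP_PROJECT}/metricDescriptors/external.googleapis.com/prometheus/go_memstats_alloc_bytes"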

I'm sure I saw Prometheus metric types before, but now I can't find them!

The Prometheus software you installed is pre-configured to export metrics to Stackdriver Monitoring as external metrics. When data is exported, Monitoring creates the appropriate metric descriptor for the external metric. If no data of that metric type is subsequently written for at least 6 weeks, the metric descriptor is subject to deletion.

There is no guarantee that unused metric descriptors will be deleted after 6 weeks, but Monitoring reserves the right to delete any Prometheus metric descriptor that hasn't been used in the previous 6 weeks.

Deprecation policy

The Stackdriver-Prometheus integration is subject to the Stackdriver agents deprecation policy.
