Using Prometheus

Prometheus is a monitoring tool often used with Kubernetes. If you configure Stackdriver Kubernetes Engine Monitoring and include Prometheus support, then the metrics that are generated by services using the Prometheus exposition format can be exported from the cluster and made visible as external metrics in Stackdriver.

This page describes how to configure and use Prometheus with Stackdriver Kubernetes Engine Monitoring. The source code for the integration is publicly available.

Before you begin

You cannot configure clusters using Legacy Stackdriver with Prometheus. For more information about Legacy Stackdriver, go to Legacy Stackdriver how-to guides.

This page doesn't contain instructions for installing a Prometheus server or for creating a GKE cluster using Stackdriver Kubernetes Engine Monitoring.

Before you install the collector, carefully review the preceding requirements.

Installing the collector

To deploy the Stackdriver collector, do the following:

  1. Identify the object to be updated by its controller type and name. Only the deployment and statefulset controller types are supported.

  2. Set the following environment variables:

    • KUBE_NAMESPACE: Namespace to run the script against.
    • KUBE_CLUSTER: Sidecar's cluster name parameter.
    • GCP_REGION: Sidecar's GCP region parameter.
    • GCP_PROJECT: Sidecar's GCP project parameter.
    • DATA_DIR: Sidecar's data directory. This is the directory that houses the shared volume that your Prometheus server writes to. In the subsequent instructions, this variable is set to the value /data.
    • DATA_VOLUME: Name of the shared volume in the DATA_DIR that contains Prometheus's data. In the subsequent instructions, this variable is set to data-volume.
    • SIDECAR_IMAGE_TAG: Docker image version for the Prometheus sidecar. You can find the latest release in Container Registry.
  3. Run the following script, supplying the two parameters identified in step 1 of this procedure:

    #!/bin/sh
    
    set -e
    set -u
    
    usage() {
      echo -e "Usage: $0 <deployment|statefulset> <name>\n"
    }
    
    if [  $# -le 1 ]; then
      usage
      exit 1
    fi
    
    # Override to use a different Docker image name for the sidecar.
    export SIDECAR_IMAGE_NAME=${SIDECAR_IMAGE_NAME:-'gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar'}
    
    kubectl -n "${KUBE_NAMESPACE}" patch "$1" "$2" --type strategic --patch "
    spec:
      template:
        spec:
          containers:
          - name: sidecar
            image: ${SIDECAR_IMAGE_NAME}:${SIDECAR_IMAGE_TAG}
            imagePullPolicy: Always
            args:
            - \"--stackdriver.project-id=${GCP_PROJECT}\"
            - \"--prometheus.wal-directory=${DATA_DIR}/wal\"
            - \"--stackdriver.kubernetes.location=${GCP_REGION}\"
            - \"--stackdriver.kubernetes.cluster-name=${KUBE_CLUSTER}\"
            #- \"--stackdriver.generic.location=${GCP_REGION}\"
            #- \"--stackdriver.generic.namespace=${KUBE_CLUSTER}\"
            ports:
            - name: sidecar
              containerPort: 9091
            volumeMounts:
            - name: ${DATA_VOLUME}
              mountPath: ${DATA_DIR}
    "
    

After the script runs successfully, the Stackdriver collector is added as a sidecar to the pods of the object identified in step 1 of the procedure. The two commented-out lines in the script aren't relevant to collecting metric data from GKE clusters; however, they are relevant when you want to populate a generic MonitoredResource.
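
For example, assuming the script is saved as patch-sidecar.sh and your Prometheus server runs as a Deployment named my-prometheus in the default namespace, a complete invocation might look like the following sketch. All of the values, including the script name, deployment name, project, region, and image tag, are illustrative placeholders:

    # Example values only; substitute your own cluster and project details.
    export KUBE_NAMESPACE=default
    export KUBE_CLUSTER=my-cluster
    export GCP_REGION=us-central1
    export GCP_PROJECT=my-project
    export DATA_DIR=/data
    export DATA_VOLUME=data-volume
    export SIDECAR_IMAGE_TAG=0.4.3

    # Patch the hypothetical Deployment so that the sidecar is injected.
    ./patch-sidecar.sh deployment my-prometheus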

There are additional steps you must take to make the configuration changes permanent. These steps are described in subsequent sections.

Validating the installation

To validate the Stackdriver collector installation, run the following command:

kubectl -n "${KUBE_NAMESPACE}" get <deployment|statefulset> <name> -o=go-template='{{$output := "stackdriver-prometheus-sidecar does not exist."}}{{range .spec.template.spec.containers}}{{if eq .name "stackdriver-prometheus-sidecar"}}{{$output = (print "stackdriver-prometheus-sidecar exists. Image: " .image)}}{{end}}{{end}}{{printf $output}}{{"\n"}}'

  • When the Prometheus sidecar is successfully installed, the output of the command lists the image used from Container Registry. In the following example, the image version is 0.4.3. In your installation, the version might be different:

    stackdriver-prometheus-sidecar exists. Image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.4.3
    
  • Otherwise, the output of the command shows:

    stackdriver-prometheus-sidecar does not exist.
    

To determine if your workload is up-to-date and available, run:

kubectl -n "${KUBE_NAMESPACE}" get <deployment|statefulset> <name>
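
For a Deployment, healthy output looks similar to the following sketch; my-prometheus is a placeholder name, and your counts and age will differ:

    NAME            READY   UP-TO-DATE   AVAILABLE   AGE
    my-prometheus   1/1     1            1           5m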

Making the configuration change permanent

After verifying that the collector is successfully installed, update your cluster configuration to make the changes permanent:

  1. Configure the Prometheus server to write to a shared volume. In the following example steps, it is assumed that DATA_DIR was set to /data and DATA_VOLUME was set to data-volume:

    1. Ensure that there is a shared volume in the Prometheus pod:

      volumes:
        - name: data-volume
          emptyDir: {}
      
    2. Have Prometheus mount the volume under /data:

      volumeMounts:
      - name: data-volume
        mountPath: /data
      
    3. Instruct the Prometheus server to write to the shared volume in /data by adding the following to its container args:

      --storage.tsdb.path=/data
      
  2. Using the tools you typically use to manage your workload configuration, re-apply the configuration to the cluster, and include the Stackdriver collector container as a sidecar in the new configuration:

    - name: sidecar
      image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:[SIDECAR_IMAGE_TAG]
      args:
      - "--stackdriver.project-id=[GCP_PROJECT]"
      - "--prometheus.wal-directory=/data/wal"
      - "--prometheus.api-address=[API_ADDRESS]"
      - "--stackdriver.kubernetes.location=[GCP_REGION]"
      - "--stackdriver.kubernetes.cluster-name=[KUBE_CLUSTER]"
      ports:
      - name: sidecar
        containerPort: 9091
      volumeMounts:
      - name: data-volume
        mountPath: /data
    

    In the preceding configuration, [API_ADDRESS] refers to Prometheus's API address, which is typically http://127.0.0.1:9090.
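
Putting these pieces together, a minimal Deployment manifest might look like the following sketch. The object name, project (my-project), region (us-central1), cluster name (my-cluster), and both image versions are illustrative assumptions, not required values:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: prometheus
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus
      template:
        metadata:
          labels:
            app: prometheus
        spec:
          containers:
          # The Prometheus server writes its TSDB (and WAL) to the shared volume.
          - name: prometheus
            image: prom/prometheus:v2.19.3   # example image and version
            args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/data"
            volumeMounts:
            - name: data-volume
              mountPath: /data
          # The Stackdriver collector reads the WAL from the same volume.
          - name: sidecar
            image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.4.3   # example version
            args:
            - "--stackdriver.project-id=my-project"
            - "--prometheus.wal-directory=/data/wal"
            - "--prometheus.api-address=http://127.0.0.1:9090"
            - "--stackdriver.kubernetes.location=us-central1"
            - "--stackdriver.kubernetes.cluster-name=my-cluster"
            ports:
            - name: sidecar
              containerPort: 9091
            volumeMounts:
            - name: data-volume
              mountPath: /data
          volumes:
          - name: data-volume
            emptyDir: {}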

For additional configuration details for the collector, refer to the Stackdriver Prometheus sidecar documentation.

Viewing metrics

Prometheus is configured to export metrics to Monitoring as external metrics.

To view these metrics:

  1. From the GCP Console, go to Stackdriver > Resources > Metrics Explorer.

  2. In the Find resource type and metric menu:

    • Select Kubernetes Container (k8s_container) for the Resource type.
    • For the Metric field, select one with the prefix external/prometheus/. For example, you might select external/prometheus/go_memstats_alloc_bytes.

    In the following example, a filter was added to display the metrics for a specific cluster. Filtering by cluster name is useful when you have multiple clusters in one Workspace:

    Sample Prometheus metric for a Kubernetes container.
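
When you query the Monitoring API directly (for example, with the timeSeries.list method), a filter equivalent to the selection above might look like the following sketch. The metric name and my-cluster are examples; Prometheus-derived metrics appear under the external.googleapis.com/prometheus/ metric type prefix on the k8s_container resource:

    metric.type = "external.googleapis.com/prometheus/go_memstats_alloc_bytes" AND
    resource.type = "k8s_container" AND
    resource.label.cluster_name = "my-cluster"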

Managing costs for Prometheus-derived metrics

Typically, Prometheus is configured to collect all the metrics exported by your application, and, by default, the Stackdriver collector sends these metrics to Stackdriver. This collection includes metrics exported by libraries that your application depends on. For instance, the Prometheus client library exports many metrics about the application environment.

You can configure filters in the Stackdriver collector to select which metrics are ingested into Stackdriver. For example, to import only the metrics generated by kubernetes-pods and kubernetes-service-endpoints, add the following --include statement when starting the stackdriver-prometheus-sidecar:

 --include={job=~"kubernetes-pods|kubernetes-service-endpoints"}
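
For example, in the sidecar container spec from the earlier procedure, the filter might be added as one more argument, as in this sketch (the other arguments are the unchanged placeholders from that procedure):

    args:
    - "--stackdriver.project-id=[GCP_PROJECT]"
    - "--prometheus.wal-directory=/data/wal"
    - "--stackdriver.kubernetes.location=[GCP_REGION]"
    - "--stackdriver.kubernetes.cluster-name=[KUBE_CLUSTER]"
    # Ingest only series whose Prometheus job label matches this filter.
    - '--include={job=~"kubernetes-pods|kubernetes-service-endpoints"}'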

For more information, see the Stackdriver Prometheus sidecar documentation.

You can also estimate how much these metrics contribute to your bill.

Prometheus integration issues

No data shows up in Stackdriver.

If no data shows up in Stackdriver after you complete the installation steps, search the collector logs for error messages.

If the logs don't contain any obvious failure messages, turn on debug logging by passing the --log.level=debug flag to the collector. You must restart the collector for the logging change to take effect. After restarting the collector, search the collector logs for error messages.

To verify that data is sent to Stackdriver Monitoring, you can write the requests to files by using the --stackdriver.store-in-files-directory command-line parameter and then inspect the files in that directory.
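
As a sketch, both debugging options can be set as extra sidecar arguments; /data/debug is an example path and must be on a volume the sidecar can write to:

    args:
    # Keep the existing arguments from your configuration in place.
    # Verbose logging for troubleshooting.
    - "--log.level=debug"
    # Write the outgoing Stackdriver requests to files for inspection.
    # Example directory only; mount a writable volume at this path.
    - "--stackdriver.store-in-files-directory=/data/debug"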

I'm using recording rules and the metrics don't appear in Stackdriver.

When you use recording rules, if possible, ingest the raw metric into Stackdriver and use Stackdriver Monitoring's features to aggregate the data when you create a chart or dashboard.

If ingesting the raw metric isn't an option, add a static_metadata entry in the collector's config. This option requires you to preserve the job and instance labels. For instance, the following configuration is valid:

  • Your Prometheus server configuration:

    groups:
    - name: my-groups
      rules:
      - record: backlog_avg_10m
        expr: avg_over_time(backlog_k8s[10m])
      - record: backlog_k8s
        expr: sum(total_lag) by (app, job, instance)
    
  • Your Stackdriver Prometheus collector configuration:

    static_metadata:
      - metric: backlog_avg_10m
        type: gauge
    

Recording rules that change or remove either the job or instance labels aren't supported.

My metrics are missing the job and instance Prometheus labels.

The Stackdriver Prometheus collector constructs a Stackdriver MonitoredResource for your Kubernetes objects from well-known Prometheus labels. If you change the label descriptors, the collector can't write the metrics to Stackdriver.

I see "duplicate time series" or "out-of-order writes" errors in the logs.

These errors are caused by writing metric data twice to the same time series. They can occur when your Prometheus endpoints export the same metric twice for a single Stackdriver monitored resource.

For example, a Kubernetes container might expose Prometheus metrics on multiple ports. Because the Stackdriver k8s_container monitored resource doesn't differentiate resources based on port, Stackdriver detects that you are writing two points to the same time series. To avoid this situation, add a metric label in Prometheus that differentiates the time series. For example, you might use the __meta_kubernetes_pod_annotation_prometheus_io_port label, because it remains constant across container restarts.
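
One way to add such a label (a sketch, not the only option) is a relabeling rule in the Prometheus scrape configuration that copies the pod's port annotation into a target label; metrics_port is an arbitrary label name chosen for this example:

    scrape_configs:
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      # Copy the pod's prometheus.io/port annotation into a label so that
      # series scraped from different ports become distinct time series.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: metrics_port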

I see "metric kind must be X, but is Y" errors in the logs.

These errors are caused by changing the Prometheus metric type for an existing metric descriptor. Stackdriver metrics are strictly typed and don't support changing a metric's type between gauge, counter, and others.

To change a metric's type, you must delete the corresponding metric descriptor and create a new one. Deleting a metric descriptor makes the existing time series data inaccessible.

I'm sure I saw Prometheus metric types before, but now I can't find them!

Prometheus is pre-configured to export metrics to Stackdriver Monitoring as external metrics. When data is exported, Monitoring creates the appropriate metric descriptor for the external metric. If no data of that metric type is written for at least 6 weeks, the metric descriptor is subject to deletion.

There is no guarantee that unused metric descriptors are deleted after 6 weeks, but Monitoring reserves the right to delete any Prometheus metric descriptor that hasn't been used in the previous 6 weeks.

Deprecation policy

The Stackdriver-Prometheus integration is subject to the Stackdriver agents deprecation policy.
