Prometheus is an optional monitoring tool often used with Kubernetes. If you configure Stackdriver Kubernetes Monitoring with Prometheus support, then services that expose metrics in the Prometheus data model can be exported from the cluster and made visible as external metrics in Stackdriver.
This page presents a basic configuration for Prometheus that works with Stackdriver Kubernetes Monitoring.
Before you begin
You must be an Owner of the project containing your cluster. For more information on Owner privileges, see IAM Role Types.
You must have already installed Stackdriver Kubernetes Monitoring in your cluster. For instructions, see Installing Stackdriver Kubernetes Monitoring.
Configuration
Use the following kubectl commands to install the basic Prometheus configuration in a new cluster that uses Stackdriver Kubernetes Monitoring. If you want to customize your own configuration, see the section Using your own configuration, later on this page.
Log in to your cluster.
Download the Kubernetes auth configuration (YAML):
curl -sSO "https://storage.googleapis.com/stackdriver-prometheus-documentation/rbac-setup.yml"
Have a cluster admin (who could be you) run the following command to set up a Kubernetes service account (named "prometheus") for the collector:
kubectl apply -f rbac-setup.yml --as=admin --as-group=system:masters
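The following is a hedged sketch of the kind of objects rbac-setup.yml creates; the downloaded file is authoritative, the role and binding names here are illustrative, and the stackdriver namespace is assumed from the validation step later on this page:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus            # the collector's service account named above
  namespace: stackdriver      # assumed namespace; see the validation step
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus            # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus            # illustrative; the real role is in rbac-setup.yml
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: stackdriver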
Download the default basic Prometheus configuration (YAML):
curl -sSO "https://storage.googleapis.com/stackdriver-prometheus-documentation/prometheus-service.yml"
Edit the default basic configuration for your cluster. Look for the following labels, and modify their values to identify your cluster:
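In prometheus-service.yml these cluster-identifying values are typically carried as Prometheus external labels. Treat the snippet below as a hedged sketch; the label names follow the collector's conventions, and you should confirm the exact names in the file you downloaded:

global:
  external_labels:
    _stackdriver_project_id: 'my-project-id'   # illustrative: your project ID
    _kubernetes_location: 'us-central1-a'      # illustrative: your cluster's zone or region
    _kubernetes_cluster_name: 'my-cluster'     # illustrative: your cluster's name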
Run the following command to start the Prometheus server using your modified configuration:
kubectl apply -f prometheus-service.yml
Validating the configuration
After configuring Prometheus, run the following command to validate the installation:
kubectl get deployment,service -n stackdriver
The output of this command shows that the prometheus Deployment is available and the Service is deployed:

NAME                               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.extensions/prometheus   1         1         1            1           48s

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/prometheus   ClusterIP   10.15.253.135   <none>        9090/TCP   49s
The Prometheus software you installed is preconfigured to begin exporting metrics to Monitoring as external metrics. You can see them in Stackdriver > Resources > Metrics Explorer. Look in the monitored resource type Kubernetes Container (k8s_container) for metrics named external/prometheus/.... A metric that should have some interesting data is external/prometheus/go_memstats_alloc_bytes. If you have more than one cluster in your Workspace, you might want to filter the chart on the cluster name.
Using your own configuration
If you have an existing Prometheus configuration, you can use it with Stackdriver Kubernetes Monitoring after you make the following changes:
Copy the three Stackdriver-specific stanzas from the provided basic configuration (YAML) into your own configuration; a sketch of where they fit appears after these steps.
Follow the instructions in the preceding Configuration section, substituting your own configuration for the basic configuration.
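As a hedged sketch of where the copied stanzas fit in an existing prometheus.yml (the stanza contents shown as ... come from prometheus-service.yml and are not reproduced here):

global:
  external_labels:
    ...              # cluster-identifying labels copied from prometheus-service.yml
remote_write:
- url: ...           # Stackdriver write endpoint copied from prometheus-service.yml
  write_relabel_configs:
  - ...              # label rewrites copied from prometheus-service.yml
scrape_configs:
- ...                # your existing scrape jobs stay as they are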
You can annotate your pods before or after you configure Prometheus.
The basic configuration assumes that the pods you want to monitor carry the standard Prometheus scrape annotation, sketched below.
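The following is a minimal sketch assuming the conventional prometheus.io/scrape annotation; confirm the exact annotation against prometheus-service.yml:

apiVersion: v1
kind: Pod
metadata:
  name: my-app                       # hypothetical pod name
  annotations:
    prometheus.io/scrape: 'true'     # assumed conventional annotation; verify in prometheus-service.yml
spec:
  containers:
  - name: my-app                     # hypothetical container
    image: gcr.io/my-project/my-app  # hypothetical image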
This and other annotations are documented in the configuration file prometheus-service.yml.
Prometheus integration issues
My metrics are missing the job and instance Prometheus labels.
The Prometheus job and instance labels might appear in the Stackdriver monitored resource associated with the metric data under other names. If you need to change this, look for the write_relabel_config section in the default configuration, prometheus-service.yml.
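As a hedged illustration of what such a change looks like in standard Prometheus configuration (the stanza in prometheus-service.yml is authoritative, prometheus_job below is an illustrative name, and note the warning later on this page about modifying the default labels):

remote_write:
- url: ...                           # keep the URL from prometheus-service.yml
  write_relabel_configs:
  # Copy the Prometheus "job" label into a differently named label
  # before the data is written to Stackdriver.
  - source_labels: [job]
    target_label: prometheus_job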
The up metric has no data points for the times when the endpoint is not up.
This is a deviation from the typical Prometheus behavior. If you rely on this metric for alerting, you can use the metric absence alerting condition in your Stackdriver alert policy.
We made this change to avoid other problems, described in more detail in the duplicate time series entry below.
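A hedged sketch of creating such a policy through the Monitoring API v3 (alertPolicies.create); the display names, duration, and metric type are illustrative assumptions:

# Create an alerting policy that fires when the "up" metric stops arriving.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "Prometheus endpoint down",
    "combiner": "OR",
    "conditions": [{
      "displayName": "up metric is absent",
      "conditionAbsent": {
        "filter": "metric.type=\"external.googleapis.com/prometheus/up\" AND resource.type=\"k8s_container\"",
        "duration": "300s"
      }
    }]
  }' \
  "https://monitoring.googleapis.com/v3/projects/PROJECT_ID/alertPolicies"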
I modified your default configuration and things stopped working.
The Stackdriver Prometheus collector constructs a Stackdriver MonitoredResource for your Kubernetes objects from well-known Prometheus labels. If you accidentally change the label descriptors, the collector isn't able to write the metrics to Stackdriver.
I see "duplicate time series" or "out-of-order writes" errors in the logs.
These errors can be caused by writing metric data twice to the same time series. They can occur if your Prometheus endpoints expose the same metric data (the same set of metric label values) twice from a single Stackdriver monitored resource.
For example, a Kubernetes container might expose Prometheus metrics on multiple ports. Because the Stackdriver k8s_container monitored resource doesn't differentiate resources based on port, Stackdriver detects that you are writing two points to the same time series. A workaround is to add a metric label in Prometheus that differentiates the time series. For example, you might use the label __meta_kubernetes_pod_annotation_prometheus_io_port, because it should remain constant across container restarts.
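A hedged sketch, in standard Prometheus configuration, of a scrape job that copies that discovery label into a metric label (the job name and target label name are illustrative):

scrape_configs:
- job_name: kubernetes-pods          # illustrative job name
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Copy the pod's prometheus.io/port annotation into a "port" label so
  # that two endpoints in the same container produce distinct time series.
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port]
    target_label: port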
I see "metric kind must be X, but is Y" errors in the logs.
These errors can be caused by changing the Prometheus metric type in the source code between gauge, counter, and others. Stackdriver metrics are strictly typed, and Stackdriver enforces this because the data semantics vary with the type.
If you want to change a metric's type, you have to delete the corresponding metric descriptor, which makes its existing time series data inaccessible. A sketch of one way to do this follows.
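A hedged sketch using the Monitoring API v3 method projects.metricDescriptors.delete; the project and metric type are illustrative, and the slashes in the metric type must be percent-encoded in the URL:

# Delete the descriptor for one external Prometheus metric (irreversible).
curl -X DELETE \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/PROJECT_ID/metricDescriptors/external.googleapis.com%2Fprometheus%2Fgo_memstats_alloc_bytes"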