Google Cloud Managed Service for Prometheus supports Prometheus-compatible rule evaluation and alerting. This document describes how to set up self-deployed rule evaluation, including the standalone rule-evaluator component.
You only need to follow these instructions if you want to execute rules and alerts against the global data store.
Rule evaluation for self-deployed collection
After you have deployed Managed Service for Prometheus, you can continue to evaluate rules locally in each deployed instance by using the rule_files field of your Prometheus configuration file. However, the maximum query window for the rules is constrained by how long the server keeps local data.
Most rules execute only over the last few minutes of data, so running rules on each local server is often a valid strategy. In that case, no further setup is necessary.
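As a minimal sketch of this local setup, the following shell snippet writes a Prometheus configuration that loads rule files through rule_files, together with one recording rule and one alerting rule. The file names, rule directory, group name, rule names, and expressions are illustrative assumptions, not values required by Managed Service for Prometheus.

```shell
# Write a hypothetical Prometheus config that loads local rule files.
cat > prometheus.yaml <<'EOF'
global:
  evaluation_interval: 30s
rule_files:
- /etc/prometheus/rules/*.yaml
EOF

# Write a hypothetical rule file with one recording and one alerting rule.
cat > example-rules.yaml <<'EOF'
groups:
- name: example
  rules:
  # Recording rule: number of targets that are up, per job.
  - record: job:up:sum
    expr: sum by (job) (up)
  # Alerting rule: fires when a target has been down for 5 minutes.
  - alert: TargetDown
    expr: up == 0
    for: 5m
    labels:
      severity: warning
EOF
```

Both rules look only at recent samples, so they stay within what a local server retains. If promtool is installed, running promtool check rules example-rules.yaml is one way to validate the rule file before you load it.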
However, sometimes it's useful to be able to evaluate rules against the global metric backend, for example, when all data for a rule is not co-located on a given Prometheus instance. For these cases, Managed Service for Prometheus also provides a rule-evaluator component.
Before you begin
This section describes the configuration needed for the tasks described in this document.
Configure your environment
To avoid repeatedly entering your project ID or cluster name, perform the following configuration:
Configure the command-line tools as follows:
Configure the gcloud CLI to refer to the ID of your Google Cloud project:
gcloud config set project PROJECT_ID
Configure the kubectl CLI to use your cluster:
kubectl config set-cluster CLUSTER_NAME
Set up a namespace
Create the NAMESPACE_NAME Kubernetes namespace for resources you create as part of the example application:
kubectl create ns NAMESPACE_NAME
Verify service account credentials
You can skip this section if your Kubernetes cluster has Workload Identity Federation for GKE enabled.
When running on GKE, Managed Service for Prometheus automatically retrieves credentials from the environment based on the Compute Engine default service account. The default service account has the necessary permissions, monitoring.metricWriter and monitoring.viewer, by default. If you don't use Workload Identity Federation for GKE, and you have previously removed either of those roles from the default node service account, you will have to re-add those missing permissions before continuing.
If you are not running on GKE, see Provide credentials explicitly.
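On GKE, one way to check which roles the node service account currently holds is to filter the project's IAM policy. The sketch below assumes that your nodes use the Compute Engine default service account, whose name is derived from the numeric project number; PROJECT_NUMBER, the RUN variable, and the list_sa_roles helper are illustrative names. The snippet only prints the command unless you set RUN to an empty value.

```shell
# Dry-run by default: prints the command. Set RUN= (empty) to execute it.
RUN=${RUN-echo}

# PROJECT_NUMBER is a placeholder for your numeric project number.
SA="PROJECT_NUMBER-compute@developer.gserviceaccount.com"

list_sa_roles() {
  # Lists the project-level roles bound to the service account.
  $RUN gcloud projects get-iam-policy PROJECT_ID \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:${SA}" \
    --format="value(bindings.role)"
}

list_sa_roles
```

When executed for real, the output should include roles/monitoring.metricWriter and roles/monitoring.viewer, unless those roles are granted indirectly through a broader role.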
Configure a service account for Workload Identity Federation for GKE
You can skip this section if your Kubernetes cluster does not have Workload Identity Federation for GKE enabled.
Managed Service for Prometheus captures metric data by using the Cloud Monitoring API. If your cluster is using Workload Identity Federation for GKE, you must grant your Kubernetes service account permission to the Monitoring API. This section describes the following:
- Creating a dedicated Google Cloud service account, gmp-test-sa.
- Binding the Google Cloud service account to the default Kubernetes service account in a test namespace, NAMESPACE_NAME.
- Granting the necessary permission to the Google Cloud service account.
Create and bind the service account
This step appears in several places in the Managed Service for Prometheus documentation. If you have already performed this step as part of a prior task, then you don't need to repeat it. Skip ahead to Authorize the service account.
The following command sequence creates the gmp-test-sa service account and binds it to the default Kubernetes service account in the NAMESPACE_NAME namespace:
gcloud config set project PROJECT_ID \
&& \
gcloud iam service-accounts create gmp-test-sa \
&& \
gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE_NAME/default]" \
  gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
&& \
kubectl annotate serviceaccount \
  --namespace NAMESPACE_NAME \
  default \
  iam.gke.io/gcp-service-account=gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com
If you are using a different GKE namespace or service account, adjust the commands appropriately.
Authorize the service account
Groups of related permissions are collected into roles, and you grant the roles to a principal. In this example, the principal is the Google Cloud service account. For more information about Monitoring roles, see Access control.
The following command grants the Google Cloud service account, gmp-test-sa, the Monitoring API roles it needs to read and write metric data.
If you have already granted the Google Cloud service account a specific role as part of a prior task, then you don't need to do it again.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/monitoring.viewer \
&& \
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/monitoring.metricWriter
Debug your Workload Identity Federation for GKE configuration
If you are having trouble getting Workload Identity Federation for GKE to work, see the documentation for verifying your Workload Identity Federation for GKE setup and the Workload Identity Federation for GKE troubleshooting guide.
As typos and partial copy-pastes are the most common sources of errors when configuring Workload Identity Federation for GKE, we strongly recommend using the editable variables and clickable copy-paste icons embedded in the code samples in these instructions.
Workload Identity Federation for GKE in production environments
The example described in this document binds the Google Cloud service account to the default Kubernetes service account and gives the Google Cloud service account all necessary permissions to use the Monitoring API.
In a production environment, you might want to use a finer-grained approach, with a service account for each component, each with minimal permissions. For more information on configuring service accounts for workload-identity management, see Using Workload Identity Federation for GKE.
Deploy the standalone rule evaluator
The Managed Service for Prometheus rule evaluator evaluates Prometheus alerting and recording rules against the Managed Service for Prometheus HTTP API and writes the results back to Monarch. It accepts the same configuration-file format and rule-file format as Prometheus. The flags are mostly identical, as well.
Create an example deployment of the rule evaluator that is pre-configured to evaluate an alerting and a recording rule:
kubectl apply -n NAMESPACE_NAME -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.13.0/manifests/rule-evaluator.yaml
Verify that the pods for the rule-evaluator deployed successfully:
kubectl -n NAMESPACE_NAME get pod
If the deployment was successful, then you see output similar to the following:
NAME                              READY   STATUS    RESTARTS   AGE
...
rule-evaluator-64475b696c-95z29   2/2     Running   0          1m
After you verify that the rule-evaluator deployed successfully, you can make adjustments to the installed manifests to do the following:
- Add your custom rules files.
- Configure the rule-evaluator to send alerts to a self-deployed Prometheus Alertmanager by using the alertmanager_config field of the configuration file.
If your Alertmanager is located in a different cluster than your rule-evaluator, then you might need to set up an Endpoints resource. For example, if your OperatorConfig specifies that Alertmanager endpoints can be found in Endpoints object ns=alertmanager/name=alertmanager, then you can manually or programmatically create this object yourself and populate it with reachable IPs from the other cluster.
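A rough sketch of both adjustments follows. The first file shows an Alertmanager section in the standard Prometheus configuration format, and the second shows a manually maintained Endpoints object that matches the ns=alertmanager/name=alertmanager example. The target address, IP addresses, port 9093, port name, and file names are assumptions for illustration; substitute values that are reachable from the cluster that runs the rule-evaluator.

```shell
# Hypothetical fragment for the rule-evaluator configuration file: send
# alerts to an Alertmanager reachable at a fixed address.
cat > alertmanager-config-fragment.yaml <<'EOF'
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager.alertmanager.svc:9093
EOF

# Hypothetical Endpoints object listing Alertmanager IPs that live in
# another cluster. The IPs below are placeholders.
cat > alertmanager-endpoints.yaml <<'EOF'
apiVersion: v1
kind: Endpoints
metadata:
  namespace: alertmanager
  name: alertmanager
subsets:
- addresses:
  - ip: 10.0.0.10
  - ip: 10.0.0.11
  ports:
  - name: web
    port: 9093
    protocol: TCP
EOF
```

You could then apply the second file with kubectl apply -f alertmanager-endpoints.yaml after the alertmanager namespace exists. Because nothing updates a hand-written Endpoints object automatically, you would need to refresh the IPs whenever the remote Alertmanager pods change.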
Provide credentials explicitly
When running on GKE, the rule-evaluator automatically retrieves credentials from the environment based on the node's service account or the Workload Identity Federation for GKE setup. In non-GKE Kubernetes clusters, credentials must be explicitly provided to the rule-evaluator by using flags or the GOOGLE_APPLICATION_CREDENTIALS environment variable.
Set the context to your target project:
gcloud config set project PROJECT_ID
Create a service account:
gcloud iam service-accounts create gmp-test-sa
This step creates the service account that you might have already created in the Workload Identity Federation for GKE instructions.
Grant the required permissions to the service account:
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/monitoring.viewer \
&& \
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/monitoring.metricWriter
Create and download a key for the service account:
gcloud iam service-accounts keys create gmp-test-sa-key.json \
  --iam-account=gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com
Add the key file as a secret to your non-GKE cluster:
kubectl -n NAMESPACE_NAME create secret generic gmp-test-sa \
  --from-file=key.json=gmp-test-sa-key.json
Open the rule-evaluator Deployment resource for editing:
kubectl -n NAMESPACE_NAME edit deploy rule-evaluator
Add the two credentials-file arguments, the gmp-sa volume mount, and the gmp-sa volume to the resource, as in the following excerpt (... marks elided fields):
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: NAMESPACE_NAME
  name: rule-evaluator
spec:
  template:
    spec:
      containers:
      - name: evaluator
        args:
        - --query.credentials-file=/gmp/key.json
        - --export.credentials-file=/gmp/key.json
        ...
        volumeMounts:
        - name: gmp-sa
          mountPath: /gmp
          readOnly: true
        ...
      volumes:
      - name: gmp-sa
        secret:
          secretName: gmp-test-sa
      ...
Save the file and close the editor. After the change is applied, the pods are re-created and start authenticating to the metric backend with the given service account.
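If you prefer the GOOGLE_APPLICATION_CREDENTIALS environment variable over the two credentials-file flags, the container spec could look roughly like the following. This is a sketch that reuses the gmp-test-sa secret, the gmp-sa volume, and the /gmp mount path from the previous step; the output file name is arbitrary.

```shell
# Hypothetical Deployment fragment that passes the key file path through
# an environment variable instead of command-line flags.
cat > rule-evaluator-env-fragment.yaml <<'EOF'
spec:
  template:
    spec:
      containers:
      - name: evaluator
        env:
        # Google client libraries read the key file path from this variable.
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /gmp/key.json
        volumeMounts:
        - name: gmp-sa
          mountPath: /gmp
          readOnly: true
      volumes:
      - name: gmp-sa
        secret:
          secretName: gmp-test-sa
EOF
```

You would merge these fields into the Deployment in the same editing session instead of adding the two flags. The variable applies to both querying and exporting, whereas the flags let you set each credential separately.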
Multi-project and global rule evaluation
We recommend that you run one instance of the rule evaluator in each Google Cloud project and region rather than running one instance that evaluates against many projects and regions. However, we do support multi-project rule evaluation for scenarios that require it.
When deployed on Google Kubernetes Engine, the rule evaluator uses the Google Cloud project associated with the cluster, which it automatically detects. To evaluate rules that span projects, you can override the queried project by using the --query.project-id flag and specifying a project with a multi-project metrics scope. If your metrics scope contains all your projects, then your rules evaluate globally. For more information, see Metrics scopes.
You must also update the permissions of the service account used by the rule evaluator so the service account can read from the scoping project and write to all monitored projects in the metrics scope.
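The permission update might be scripted along the following lines. This sketch assumes that read access maps to roles/monitoring.viewer and write access to roles/monitoring.metricWriter, the same roles used earlier in this document. The project IDs, the RUN variable, and the grant_multi_project_roles helper are hypothetical. The snippet only prints the commands unless you set RUN to an empty value.

```shell
# Dry-run by default: prints each command. Set RUN= (empty) to execute.
RUN=${RUN-echo}

# All project IDs below are hypothetical placeholders.
SCOPING_PROJECT="SCOPING_PROJECT_ID"
MONITORED_PROJECTS="MONITORED_PROJECT_A MONITORED_PROJECT_B"
MEMBER="serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com"

grant_multi_project_roles() {
  # Read access on the scoping project, which hosts the metrics scope.
  $RUN gcloud projects add-iam-policy-binding "$SCOPING_PROJECT" \
    --member="$MEMBER" --role=roles/monitoring.viewer

  # Write access on every monitored project in the metrics scope.
  for project in $MONITORED_PROJECTS; do
    $RUN gcloud projects add-iam-policy-binding "$project" \
      --member="$MEMBER" --role=roles/monitoring.metricWriter
  done
}

grant_multi_project_roles

# The rule-evaluator itself would then get an extra argument such as:
#   --query.project-id=SCOPING_PROJECT_ID
```

Keeping the monitored projects in a single list makes it easier to keep the grants aligned with the metrics scope as projects are added or removed.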
Preserve labels when writing rules
For data the evaluator writes back to Managed Service for Prometheus, the evaluator supports the same --export.* flags and external_labels-based configuration as the Managed Service for Prometheus server binary. We strongly recommend that you write rules so that the project_id, location, cluster, and namespace labels are preserved appropriately for their aggregation level; otherwise, query performance might decline and you might encounter cardinality limits.
The project_id and location labels are mandatory. If these labels are missing, then the values in rule-evaluation results are set based on the configuration of the rule evaluator. Missing cluster or namespace labels are not given values.
Self-observability
The rule-evaluator emits Prometheus metrics on a configurable port using the --web.listen-address flag.
For example, if the pod rule-evaluator-64475b696c-95z29 is exposing these metrics on port 9092, the metrics can be viewed manually by using kubectl:
# Port forward the metrics endpoint.
kubectl -n NAMESPACE_NAME port-forward rule-evaluator-64475b696c-95z29 9092
# Then query in a separate terminal.
curl localhost:9092/metrics
You can configure your Prometheus stack to collect these metrics so that you have visibility into the performance of the rule-evaluator.
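One possible scrape configuration for a self-deployed Prometheus server is sketched below. It assumes that the rule-evaluator pod names start with rule-evaluator, that the pod declares container port 9092 (matching the port in the example above), and that Prometheus has permission to discover pods in the namespace. The job name and file name are arbitrary.

```shell
# Hypothetical scrape_configs fragment for collecting rule-evaluator metrics.
cat > rule-evaluator-scrape-fragment.yaml <<'EOF'
scrape_configs:
- job_name: rule-evaluator
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
      - NAMESPACE_NAME
  relabel_configs:
  # Keep only pods whose name starts with "rule-evaluator".
  - source_labels: [__meta_kubernetes_pod_name]
    regex: rule-evaluator.*
    action: keep
  # Keep only the container port that serves metrics (9092 assumed here).
  - source_labels: [__meta_kubernetes_pod_container_port_number]
    regex: "9092"
    action: keep
EOF
```

If the pod does not declare the metrics port, the second relabel rule drops every target; in that case a static target that points at a Service in front of the rule-evaluator may be the simpler choice.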
High-availability deployments
The rule evaluator can run in a highly available setup by following the same approach as documented for the Prometheus server.
Alerting using Cloud Monitoring metrics
You can configure the rule evaluator to alert on Google Cloud system metrics using PromQL. For instructions on how to create a valid query, see PromQL for Cloud Monitoring metrics.
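As an illustration, the rule file below alerts on the Compute Engine metric compute.googleapis.com/instance/cpu/utilization. It assumes the PromQL naming convention for Cloud Monitoring metrics, in which the first slash becomes a colon and the remaining special characters become underscores. The alert name, threshold, durations, and file name are hypothetical choices.

```shell
# Hypothetical rule file that alerts on a Google Cloud system metric.
cat > cloud-monitoring-alert-rules.yaml <<'EOF'
groups:
- name: cloud-monitoring-system-metrics
  rules:
  # compute.googleapis.com/instance/cpu/utilization, written in the PromQL
  # naming form for Cloud Monitoring metrics.
  - alert: HighVmCpuUtilization
    expr: avg_over_time(compute_googleapis_com:instance_cpu_utilization[10m]) > 0.9
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: VM CPU utilization has stayed above 90%.
EOF
```

Before you rely on a rule like this, run the expression as an ad hoc query to confirm that the metric name resolves and returns data in your project.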