This document describes how to set up Google Cloud Managed Service for Prometheus with managed collection. The setup is a minimal example of working ingestion, using a Prometheus deployment that monitors an example application and stores collected metrics in Monarch.
This document shows you how to do the following:
- Set up your environment and command-line tools.
- Set up managed collection for your cluster.
- Configure a resource for target scraping and metric ingestion.
- Migrate existing prometheus-operator custom resources.
We recommend that you use managed collection; it reduces the complexity of deploying, scaling, sharding, configuring, and maintaining the collectors. Managed collection is supported for GKE and all other Kubernetes environments.
Managed collection runs Prometheus-based collectors as a DaemonSet and ensures scalability by scraping only targets on nodes colocated with each collector. You configure the collectors with lightweight custom resources to scrape exporters using pull collection; the collectors then push the scraped data to the central data store Monarch. Google Cloud never directly accesses your cluster to pull or scrape metric data; your collectors push data to Google Cloud. For more information about managed and self-deployed data collection, see Data collection with Managed Service for Prometheus and Ingestion and querying with managed and self-deployed collection.
Before you begin
This section describes the configuration needed for the tasks described in this document.
Set up projects and tools
To use Google Cloud Managed Service for Prometheus, you need the following resources:
A Google Cloud project with the Cloud Monitoring API enabled.
If you don't have a Google Cloud project, then do the following:
1. In the Google Cloud console, go to New Project.
2. In the Project Name field, enter a name for your project and then click Create.
3. Go to Billing.
4. Select the project you just created if it isn't already selected at the top of the page. You are prompted to choose an existing payments profile or to create a new one.
The Monitoring API is enabled by default for new projects.
If you already have a Google Cloud project, then ensure that the Monitoring API is enabled:
1. Go to APIs & services.
2. Select your project.
3. Click Enable APIs and Services.
4. Search for "Monitoring".
5. In the search results, click through to "Cloud Monitoring API".
6. If "API enabled" is not displayed, then click the Enable button.
A Kubernetes cluster. If you do not have a Kubernetes cluster, then follow the instructions in the Quickstart for GKE.
You also need the following command-line tools:
- gcloud
- kubectl
The gcloud and kubectl tools are part of the Google Cloud CLI. For information about installing them, see Managing Google Cloud CLI components. To see the gcloud CLI components you have installed, run the following command:
gcloud components list
Configure your environment
To avoid repeatedly entering your project ID or cluster name, configure the command-line tools as follows:
1. Configure the gcloud CLI to refer to the ID of your Google Cloud project:
gcloud config set project PROJECT_ID
2. Configure the kubectl CLI to use your cluster:
kubectl config set-cluster CLUSTER_NAME
For more information about these tools, see the gcloud CLI and kubectl reference documentation.
Set up a namespace
Create the NAMESPACE_NAME Kubernetes namespace for resources you create as part of the example application:
kubectl create ns NAMESPACE_NAME
Set up managed collection
You can use managed collection on both GKE and non-GKE Kubernetes clusters.
After you enable managed collection, the in-cluster components are running, but no metrics are generated yet. You must deploy a PodMonitoring resource that scrapes a valid metrics endpoint to see any data in the Query UI. For troubleshooting information, see Ingestion-side problems.
Enabling managed collection installs the following components in your cluster:
- The gmp-operator Deployment, which deploys the Kubernetes operator for Managed Service for Prometheus.
- The rule-evaluator Deployment, which is used to configure and run alerting and recording rules.
- The collector DaemonSet, which horizontally scales collection by scraping metrics only from pods running on the same node as each collector.
- The alertmanager StatefulSet, which is configured to send triggered alerts to your preferred notification channels.
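After you enable managed collection, you can confirm that these components were installed by listing the workloads in the cluster. The following check is a minimal sketch; it assumes the default installation, which places the components in the gmp-system namespace:
# List the in-cluster components installed by managed collection.
kubectl get deployments,daemonsets,statefulsets -n gmp-system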
For reference documentation about the Managed Service for Prometheus operator, see the manifests page.
Enable managed collection: GKE
If you are running in a GKE environment, then you can enable managed collection by using the following:
- The GKE Clusters dashboard in Cloud Monitoring.
- The Kubernetes Engine page in the Google Cloud console.
- The Google Cloud CLI. To use the gcloud CLI, you must be running GKE version 1.21.4-gke.300 or newer.
- Terraform for Google Kubernetes Engine. To use Terraform to enable Managed Service for Prometheus, you must be running GKE version 1.21.4-gke.300 or newer.
Managed collection is on by default in GKE Autopilot clusters running GKE version 1.25 or greater.
Managed collection on GKE gets automatically upgraded when new in-cluster component versions are released.
Managed collection on GKE uses permissions granted to the default
Compute Engine service account. If you have a policy that modifies the
standard permissions on the default node service account, you might need to add
the Monitoring Metric Writer
role
to continue.
GKE Clusters dashboard
You can do the following by using the GKE Clusters dashboard in Cloud Monitoring:
- Determine whether Managed Service for Prometheus is enabled on your clusters and whether you are using managed or self-deployed collection.
- Enable managed collection on clusters in your project.
- View other information about your clusters.
To view the GKE Clusters dashboard, do the following:
1. In the Google Cloud console, select Monitoring.
2. Select the GCP dashboard category, and then select GKE Clusters.
To enable managed collection on one or more GKE clusters by using the GKE Clusters dashboard, do the following:
1. Select the checkbox for each GKE cluster on which you want to enable managed collection.
2. Select Enable Selected.
Kubernetes Engine UI
You can do the following by using the Google Cloud console:
- Enable managed collection on an existing GKE cluster.
- Create a new GKE cluster with managed collection enabled.
To update an existing cluster, do the following:
1. In the Google Cloud console, select Kubernetes Engine.
2. Select Clusters.
3. Click the name of the cluster.
4. In the Features list, locate the Managed Service for Prometheus option. If it is listed as disabled, click Edit, and then select Enable Managed Service for Prometheus.
5. Click Save changes.
To create a cluster with managed collection enabled, do the following:
1. In the Google Cloud console, select Kubernetes Engine.
2. Select Clusters.
3. Click Create.
4. Click Configure for the Standard option.
5. In the navigation panel, click Features.
6. In the Operations section, select Enable Managed Service for Prometheus.
7. Click Save.
gcloud CLI
You can do the following by using the gcloud CLI:
- Enable managed collection on an existing GKE cluster.
- Create a new GKE cluster with managed collection enabled.
These commands might take up to 5 minutes to complete.
First, set your project:
gcloud config set project PROJECT_ID
To update an existing cluster, run one of the following update commands based on whether your cluster is zonal or regional:
gcloud container clusters update CLUSTER_NAME --enable-managed-prometheus --zone ZONE
gcloud container clusters update CLUSTER_NAME --enable-managed-prometheus --region REGION
To create a cluster with managed collection enabled, run the following command:
gcloud container clusters create CLUSTER_NAME --zone ZONE --enable-managed-prometheus
GKE Autopilot
Managed collection is on by default in GKE Autopilot clusters running GKE version 1.25 or greater. You can't turn off managed collection.
If your cluster fails to enable managed collection automatically when upgrading to 1.25, you can manually enable it by running the update command in the gcloud CLI section.
Terraform
For instructions on configuring managed collection using Terraform, see the Terraform registry for google_container_cluster.
For general information about using Google Cloud with Terraform, see Terraform with Google Cloud.
Enable managed collection: non-GKE Kubernetes
If you are running in a non-GKE environment, then you can enable managed collection by using the following:
- The kubectl CLI.
- The bundled solution included in Anthos deployments running version 1.12 or newer.
kubectl CLI
To install managed collectors when you are using a non-GKE Kubernetes cluster, run the following commands to install the setup and operator manifests:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.6.1/manifests/setup.yaml
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.6.1/manifests/operator.yaml
Anthos
For information about configuring managed collection for Anthos clusters, see the documentation for your distribution.
Deploy the example application
Below is a manifest for an example application that emits the example_requests_total counter metric and the example_random_numbers histogram metric on its metrics port. The application uses three replicas.
To deploy the example application, run the following command:
kubectl -n NAMESPACE_NAME apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.6.1/examples/example-app.yaml
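To verify that the example application is up before you configure scraping, you can check its pods. This is an optional sketch; it assumes that the example manifest labels its pods with app: prom-example, which is the label the PodMonitoring resource in the next section selects on:
# Expect three replicas in the Running state.
kubectl -n NAMESPACE_NAME get pods -l app=prom-example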
Configure a PodMonitoring resource
To ingest the metric data emitted by the example application, you use target scraping. Target scraping and metrics ingestion are configured using Kubernetes custom resources. The managed service uses PodMonitoring custom resources (CRs).
A PodMonitoring CR scrapes targets only in the namespace the CR is deployed in. To scrape targets in multiple namespaces, deploy the same PodMonitoring CR in each namespace. You can verify that the PodMonitoring resource is installed in the intended namespace by running kubectl get podmonitoring -A.
For reference documentation about all the Managed Service for Prometheus CRs, see the prometheus-engine/doc/api reference.
The following manifest defines a PodMonitoring resource, prom-example, in the NAMESPACE_NAME namespace. The resource uses a Kubernetes label selector to find all pods in the namespace that have the label app with the value prom-example. The matching pods are scraped on a port named metrics, every 30 seconds, on the /metrics HTTP path.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: prom-example
spec:
  selector:
    matchLabels:
      app: prom-example
  endpoints:
  - port: metrics
    interval: 30s
To apply this resource, run the following command:
kubectl -n NAMESPACE_NAME apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.6.1/examples/pod-monitoring.yaml
Your managed collector is now scraping the matching pods. You can view the status of your scrape target by enabling the target status feature.
To configure horizontal collection that applies to a range of pods across all namespaces, use the ClusterPodMonitoring resource. The ClusterPodMonitoring resource provides the same interface as the PodMonitoring resource but does not limit discovered pods to a given namespace.
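For example, the following sketch of a ClusterPodMonitoring resource scrapes pods labeled app: prom-example in every namespace. The name prom-example-all is hypothetical; the selector and endpoints fields mirror the PodMonitoring example above, and the resource itself is cluster-scoped, so its metadata has no namespace:
apiVersion: monitoring.googleapis.com/v1
kind: ClusterPodMonitoring
metadata:
  name: prom-example-all
spec:
  selector:
    matchLabels:
      app: prom-example
  endpoints:
  - port: metrics
    interval: 30s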
If you are running on GKE, then you can do the following:
- To query the metrics ingested by the example application, see Query data from the Prometheus service.
- To learn about filtering exported metrics and adapting your prom-operator resources, see Additional topics for managed collection.
If you are running outside of GKE, then you need to create a service account and authorize it to write your metric data, as described in the following section.
Provide credentials explicitly
When running on GKE, the collecting Prometheus server automatically retrieves credentials from the environment based on the node's service account. In non-GKE Kubernetes clusters, credentials must be explicitly provided through the OperatorConfig resource in the gmp-public namespace.
1. Set the context to your target project:
gcloud config set project PROJECT_ID
2. Create a service account:
gcloud iam service-accounts create gmp-test-sa
3. Grant the required permissions to the service account:
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/monitoring.metricWriter
4. Create and download a key for the service account:
gcloud iam service-accounts keys create gmp-test-sa-key.json \
  --iam-account=gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com
5. Add the key file as a secret to your non-GKE cluster:
kubectl -n gmp-public create secret generic gmp-test-sa \
  --from-file=key.json=gmp-test-sa-key.json
6. Open the OperatorConfig resource for editing:
kubectl -n gmp-public edit operatorconfig config
7. Add the collection.credentials section shown in the following to the resource:
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
collection:
  credentials:
    name: gmp-test-sa
    key: key.json
Make sure that you also add these credentials to the rules section so that managed rule evaluation works (see the sketch after this procedure).
8. Save the file and close the editor. After the change is applied, the pods are re-created and start authenticating to the metric backend with the given service account.
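The rules section takes the same credentials block as the collection section, as in the following sketch; the complete resource appears later in Run managed collection outside of GKE:
rules:
  credentials:
    name: gmp-test-sa
    key: key.json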
Additional topics for managed collection
This section describes how to do the following:
- Enable the target status feature for easier debugging.
- Configure target scraping using Terraform.
- Filter the data you export to the managed service.
- Scrape Kubelet and cAdvisor metrics.
- Convert your existing prom-operator resources for use with the managed service.
- Run managed collection outside of GKE.
Enabling the target status feature
You can check the status of your targets in your PodMonitoring or ClusterPodMonitoring resources by setting the features.targetStatus value within the OperatorConfig resource to true, as shown in the following:
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
features:
  targetStatus: true
If you have a PodMonitoring resource with the name prom-example in the NAMESPACE_NAME namespace, then you can check the status by running the following command:
kubectl -n NAMESPACE_NAME describe podmonitorings/prom-example
When the Endpoint Statuses field shows a Collectors Fraction value of 1 (meaning 100%), all of the managed collectors are reachable. The Sample Groups field shows sample targets grouped by common labels, which is useful for debugging situations where your targets are not discovered. For more information about debugging target discovery issues, see Ingestion-side problems in the troubleshooting documentation.
Configuring target scraping using Terraform
You can automate the creation and management of PodMonitoring and ClusterPodMonitoring resources by using the kubernetes_manifest Terraform resource type or the kubectl_manifest Terraform resource type, either of which lets you specify arbitrary custom resources.
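The following is a minimal sketch of a kubernetes_manifest resource that creates the PodMonitoring resource from this guide; it assumes you have the Terraform kubernetes provider configured against your cluster, and the resource name prom_example is hypothetical:
resource "kubernetes_manifest" "prom_example" {
  # Embeds the PodMonitoring custom resource as a Terraform-managed object.
  manifest = {
    apiVersion = "monitoring.googleapis.com/v1"
    kind       = "PodMonitoring"
    metadata = {
      name      = "prom-example"
      namespace = "NAMESPACE_NAME"
    }
    spec = {
      selector = {
        matchLabels = {
          app = "prom-example"
        }
      }
      endpoints = [
        {
          port     = "metrics"
          interval = "30s"
        }
      ]
    }
  }
}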
For general information about using Google Cloud with Terraform, see Terraform with Google Cloud.
Filter exported metrics
If you collect a lot of data, you might want to prevent some time series from being sent to Managed Service for Prometheus to keep down costs. You can do this by using Prometheus relabeling rules with a keep action for an allowlist or a drop action for a denylist. For managed collection, this rule goes in the metricRelabeling section of your PodMonitoring or ClusterPodMonitoring resource.
For example, the following metric relabeling rule filters out any metric that begins with foo_bar_, foo_baz_, or foo_qux_:
metricRelabeling:
- action: drop
  regex: foo_(bar|baz|qux)_.+
  sourceLabels: [__name__]
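Conversely, a keep action drops every metric that does not match the regex. As a sketch, the following rule would keep only the metrics emitted by the example application in this guide (the trailing .* covers the series that the histogram expands into, such as _bucket, _sum, and _count) and discard everything else:
metricRelabeling:
- action: keep
  regex: example_(requests_total|random_numbers.*)
  sourceLabels: [__name__]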
For additional suggestions on how to lower your costs, see Cost controls and attribution.
Scraping Kubelet and cAdvisor metrics
The Kubelet exposes metrics about itself as well as cAdvisor metrics about containers running on its node. You can configure managed collection to scrape Kubelet and cAdvisor metrics by editing the OperatorConfig resource. For instructions, see the exporter documentation for Kubelet and cAdvisor.
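As a sketch, enabling Kubelet and cAdvisor scraping in the OperatorConfig might look like the following; treat the kubeletScraping field and its interval as assumptions to verify against the exporter documentation:
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
collection:
  kubeletScraping:
    interval: 30s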
Convert existing prometheus-operator resources
You can usually convert your existing prometheus-operator resources to Managed Service for Prometheus managed collection PodMonitoring and ClusterPodMonitoring resources.
For example, the ServiceMonitor resource defines monitoring for a set of services. The PodMonitoring resource serves a subset of the fields served by the ServiceMonitor resource. You can convert a ServiceMonitor CR to a PodMonitoring CR by mapping the fields as described in the following table:
| monitoring.coreos.com/v1 ServiceMonitor | Compatibility | monitoring.googleapis.com/v1 PodMonitoring |
| --- | --- | --- |
| .ServiceMonitorSpec.Selector | Identical | .PodMonitoringSpec.Selector |
| .ServiceMonitorSpec.Endpoints[] | .TargetPort maps to .Port; .Path maps directly | .PodMonitoringSpec.Endpoints[] |
| .ServiceMonitorSpec.TargetLabels | PodMonitoring must specify: .FromPod[].From (pod label) and .FromPod[].To (target label) | .PodMonitoringSpec.TargetLabels |
The following is a sample ServiceMonitor CR; the targetPort and targetLabels fields are replaced in the conversion, while the selector, path, and interval fields map directly:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - targetPort: web
    path: /stats
    interval: 30s
  targetLabels:
  - foo
The following is the analogous PodMonitoring CR, assuming that your service and its pods are labeled with app=example-app. If this assumption does not apply, then you need to use the label selectors of the underlying Service resource. The port and targetLabels fields have been replaced in the conversion:
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web
    path: /stats
    interval: 30s
  targetLabels:
    fromPod:
    - from: foo # pod label from example-app Service pods.
      to: foo
You can always continue to use your existing prometheus-operator resources and deployment configs by using self-deployed collectors instead of managed collectors. You can query metrics sent from both collector types, so you might want to use self-deployed collectors for your existing Prometheus deployments while using managed collectors for new Prometheus deployments.
Reserved labels
Managed Service for Prometheus automatically adds the following labels to all metrics collected:
- project_id: The identifier of the Google Cloud project associated with your metric.
- location: The physical location (Google Cloud region) where the data is stored. This value is typically the region of your GKE cluster. If data is collected from an AWS or on-premises deployment, then the value might be the closest Google Cloud region.
- cluster: The name of the Kubernetes cluster associated with your metric.
- namespace: The name of the Kubernetes namespace associated with your metric.
- job: The job label of the Prometheus target, if known; might be empty for rule-evaluation results.
- instance: The instance label of the Prometheus target, if known; might be empty for rule-evaluation results.
While not recommended when running on Google Kubernetes Engine, you can override the project_id, location, and cluster labels by adding them as args to the Deployment resource within operator.yaml. If you use any reserved labels as metric labels, Managed Service for Prometheus automatically relabels them by adding the prefix exported_. This behavior matches how upstream Prometheus handles conflicts with reserved labels.
Teardown
To disable managed collection deployed using gcloud or the GKE UI, you can do either of the following:
- Run the following command:
gcloud container clusters update CLUSTER_NAME --disable-managed-prometheus
- Use the GKE UI:
1. Select Kubernetes Engine in the Google Cloud console, then select Clusters.
2. Locate the cluster for which you want to disable managed collection, and click its name.
3. On the Details tab, scroll down to Features and change the state to Disabled by using the edit button.
To disable managed collection deployed by using Terraform, specify enabled = false in the managed_prometheus section of the google_container_cluster resource.
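A minimal sketch of this change follows; it assumes the managed_prometheus block sits under monitoring_config in your existing google_container_cluster definition:
resource "google_container_cluster" "example" {
  name     = "CLUSTER_NAME"
  location = "ZONE"
  # ... other cluster settings ...

  monitoring_config {
    managed_prometheus {
      enabled = false  # Turns off managed collection for this cluster.
    }
  }
}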
To disable managed collection deployed by using kubectl, run the following command:
kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.6.1/manifests/operator.yaml
Disabling managed collection causes your cluster to stop sending new data to Managed Service for Prometheus. Taking this action does not delete any existing metrics data already stored in the system.
Disabling managed collection also deletes the gmp-public namespace and any resources within it, including any exporters installed in that namespace.
Run managed collection outside of GKE
In GKE environments, you can run managed collection without further configuration. In other Kubernetes environments, you need to explicitly provide credentials, a project-id value to contain your metrics, a location value (Google Cloud region) where your metrics will be stored, and a cluster value to save the name of the cluster in which the collector is running.
As gcloud does not work outside of Google Cloud environments, you need to deploy using kubectl instead. Unlike with gcloud, deploying managed collection using kubectl does not automatically upgrade your cluster when a new version is available. Remember to watch the releases page for new versions and manually upgrade by re-running the kubectl commands with the new version.
You can provide a service account key by modifying the OperatorConfig resource within operator.yaml as described in Provide credentials explicitly. You can provide project-id, location, and cluster values by adding them as args to the Deployment resource within operator.yaml.
We recommend choosing project-id based on your planned tenancy model for reads. Pick a project to store metrics in based on how you plan to organize reads later via metrics scopes. If you don't care, you can put everything into one project.
For location, we recommend choosing the nearest Google Cloud region to your deployment. The further the chosen Google Cloud region is from your deployment, the more write latency you'll have and the more you'll be affected by potential networking issues. You might want to consult this list of regions across multiple clouds. If you don't care, you can put everything into one Google Cloud region. You can't use global as your location.
For cluster, we recommend choosing the name of the cluster in which the operator is deployed.
When properly configured, your OperatorConfig should look like the following:
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
collection:
  credentials:
    name: gmp-test-sa
    key: key.json
rules:
  credentials:
    name: gmp-test-sa
    key: key.json
And your Deployment resource should look like the following:
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      ...
      containers:
      - name: operator
        ...
        args:
        - ...
        - "--project-id=PROJECT_ID"
        - "--cluster=CLUSTER_NAME"
        - "--location=REGION"
This example assumes that you have set the REGION variable to a value like us-central1.
Running Managed Service for Prometheus outside of Google Cloud incurs data ingress fees and might incur data egress fees if running on another cloud. In versions 0.5.0 and later, you can minimize these costs by enabling gzip compression through the OperatorConfig. Add the compression setting shown in the following to the resource:
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
collection:
  compression: gzip
  ...
Further reading on managed collection custom resources
For reference documentation about all the Managed Service for Prometheus custom resources, see the prometheus-engine/doc/api reference.
What's next
- Query the Prometheus metrics collected by using the service.
- Set up managed rule evaluation.
- Set up commonly used exporters.