Google Cloud Managed Service for Prometheus


Google Cloud Managed Service for Prometheus is Google Cloud's fully managed multi-cloud solution for Prometheus metrics. It lets you globally monitor and alert on your workloads, using Prometheus, without having to manually manage and operate Prometheus at scale.

Managed Service for Prometheus collects metrics from Prometheus exporters and lets you query the data globally using PromQL, meaning that you can keep using any existing Grafana dashboards, PromQL-based alerts, and workflows. It is hybrid- and multi-cloud compatible, can monitor both Kubernetes and VM workloads, retains data for 24 months, and maintains portability by staying compatible with upstream Prometheus. You can also supplement your Prometheus monitoring by querying over 1,500 free metrics in Cloud Monitoring, including free GKE system metrics, using PromQL.

This document gives an overview of the managed service; related documents describe how to set up and run the service. To receive regular updates about new features and releases, submit the optional sign-up form.

Hear how The Home Depot uses Managed Service for Prometheus to get unified observability across 2,200 stores running on-prem Kubernetes clusters.

System overview

Managed Service for Prometheus gives you access to features of Prometheus and Cloud Monitoring.

Managed Service for Prometheus is built on top of Monarch, the same globally scalable data store used for Google's own monitoring. Because it uses the same backend and APIs as Cloud Monitoring, all Cloud Monitoring metric data is queryable using PromQL, and all Managed Service for Prometheus data is queryable using Cloud Monitoring.

In a standard Prometheus deployment, data collection, query evaluation, rule and alert evaluation, and data storage are all handled within a single Prometheus server. Managed Service for Prometheus splits responsibilities for these functions into multiple components:

  • Data collection is handled by either managed or self-deployed collectors, which scrape local exporters and forward the collected data to Monarch. These collectors can be used for both Kubernetes and traditional workloads and can run everywhere, including other clouds and on-prem deployments.
  • Query evaluation is handled by Monarch, which executes queries and unions results across all Google Cloud regions and across up to 1,000 Google Cloud projects.
  • Rule and alert evaluation is handled by locally run, locally configured rule evaluator components, which execute rules and alerts against the global Monarch data store and forward any fired alerts to Prometheus Alertmanager.
  • Data storage is handled by Monarch, which stores all Prometheus data for 24 months at no additional cost.

Grafana connects to the global Monarch data store instead of connecting to individual Prometheus servers. If you have Managed Service for Prometheus collectors configured in all your deployments, then this single Grafana instance gives you a unified view of all your metrics across all your clouds.

Data collection

You can use Managed Service for Prometheus in one of two modes: with managed data collection or with self-deployed data collection.

Managed Service for Prometheus offers an operator for managed data collection in Kubernetes environments. We recommend managed collection; it eliminates the complexity of deploying, scaling, sharding, configuring, and maintaining Prometheus servers. Managed collection is supported for both GKE and non-GKE Kubernetes environments.

With self-deployed data collection, you manage your Prometheus installation as you always have. The only difference from upstream Prometheus is that you run the Managed Service for Prometheus drop-in replacement binary instead of the upstream Prometheus binary.
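For example, a Kubernetes Deployment that runs the drop-in replacement binary differs from an upstream deployment mainly in the image and a few export flags. The following fragment is an illustrative sketch, not a complete manifest: the release tag is a placeholder, and the `PROJECT_ID` and location values must be replaced with your own. Check the current documentation for the exact image and flags for your release.

```yaml
# Illustrative container fragment for the drop-in replacement binary.
# Image tag, project, and location values below are placeholders.
containers:
- name: prometheus
  image: gke.gcr.io/prometheus-engine/prometheus:RELEASE_TAG
  args:
  - --config.file=/etc/prometheus/prometheus.yml
  # Labels attached to exported data; required when running outside Google Cloud.
  - --export.label.project-id=PROJECT_ID
  - --export.label.location=us-central1
```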

You can run either collection option in on-prem deployments and on any cloud. Collectors running outside of Google Cloud send data to Monarch for long-term storage and global querying.

When choosing between managed and self-deployed collection, consider the following:

  • Managed collection:

    • Google's recommended approach for all Kubernetes environments.
    • Deployed by using the GKE UI, the gcloud CLI, the kubectl CLI, or Terraform.
    • Operation of Prometheus—generating scrape configurations, scaling ingestion, scoping rules to the right data, and so forth—is fully handled by the Kubernetes operator.
    • Scraping and rules are configured by using lightweight custom resources (CRs).
    • Good for those who want a more hands-off, fully managed experience.
    • Intuitive migration from prometheus-operator configs.
    • Supports most current Prometheus use cases.
  • Self-deployed collection:

    • A drop-in replacement for the upstream Prometheus binary.
    • You can use your preferred deployment mechanism, like prometheus-operator or manual deployment.
    • Scraping is configured by using your preferred methods, like annotations or prometheus-operator.
    • Scaling and functional sharding are done manually.
    • Good for quick integration into more complex existing setups. You can reuse your existing configs and run upstream Prometheus and Managed Service for Prometheus side by side.
    • Rules and alerts typically run within individual Prometheus servers, which might be preferable for edge deployments as local rule evaluation does not incur any network traffic.
    • Might support long-tail use cases that aren't yet supported by managed collection, such as local aggregations to reduce cardinality.
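
With managed collection, scrape targets are declared with lightweight custom resources rather than a central scrape config. The following sketch shows a minimal PodMonitoring resource; the name, label selector, and port are illustrative and assume an application that exposes metrics on a port named `metrics`.

```yaml
# Minimal PodMonitoring resource for managed collection (illustrative values).
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app   # pods to scrape, matched by label
  endpoints:
  - port: metrics        # named container port exposing /metrics
    interval: 30s        # scrape interval
```

The operator watches for these resources and generates the corresponding scrape configuration for the managed collectors.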

To get started, see Get started with managed collection or Get started with self-deployed collection.

If you use the managed service outside of Google Kubernetes Engine or Google Cloud, some additional configuration might be necessary; see Run managed collection outside of Google Cloud or Run self-deployed collection outside of Google Cloud.

Query evaluation

Managed Service for Prometheus supports any query UI that can call the Prometheus query API, including Grafana and the Cloud Monitoring UI. Existing Grafana dashboards continue to work when switching from local Prometheus to Managed Service for Prometheus, and you can continue using PromQL found in popular open-source repositories and on community forums.
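As a sketch of how this looks in practice, a Grafana provisioning file can point a standard Prometheus data source at the service's Prometheus-compatible query endpoint. The URL pattern below follows the Cloud Monitoring Prometheus API; authentication setup (for example, through the data source syncer) is omitted here, and `PROJECT_ID` is a placeholder.

```yaml
# Illustrative Grafana data source provisioning fragment.
# Authentication configuration is not shown; see the query setup guide.
apiVersion: 1
datasources:
- name: Managed Service for Prometheus
  type: prometheus
  access: proxy
  url: https://monitoring.googleapis.com/v1/projects/PROJECT_ID/location/global/prometheus
```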

You can use PromQL to query over 1,500 free metrics in Cloud Monitoring, even without sending data to Managed Service for Prometheus. You can also use PromQL to query free Kubernetes metrics, custom metrics, and log-based metrics. Cloud Monitoring metric names map to PromQL-compatible names; for example, the metric kubernetes.io/container/cpu/core_usage_time is queryable in PromQL as kubernetes_io:container_cpu_core_usage_time.

For information on how to configure Grafana to query Managed Service for Prometheus data, see Configure a query user interface.

For information on how to query Cloud Monitoring metrics using PromQL, see PromQL for Cloud Monitoring metrics.

Rule and alert evaluation

Managed Service for Prometheus provides a stand-alone rule evaluator that evaluates recording and alerting rules against all Monarch data accessible in a metrics scope. Evaluating rules against a multi-project metrics scope eliminates the need to co-locate all data of interest on a single Prometheus server or within a single Google Cloud project, and it lets you set IAM permissions on groups of projects.

Because the rule evaluator accepts the standard Prometheus rule_files format, you can easily migrate to Managed Service for Prometheus by copying existing rules or rules found in popular open-source repositories. If you use self-deployed collectors, you can continue to evaluate recording rules locally in your collectors. The results of recording and alerting rules are stored in Monarch, just like directly collected metric data.
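
For reference, a rule file in the standard Prometheus format looks like the following; the group name, metric names, and thresholds are illustrative.

```yaml
# Standard Prometheus rule_files format (illustrative names and thresholds).
groups:
- name: example-rules
  interval: 30s
  rules:
  # Recording rule: precompute the per-job request rate.
  - record: job:http_requests:rate5m
    expr: sum by (job) (rate(http_requests_total[5m]))
  # Alerting rule: fire when the 5xx error ratio exceeds 5% for 10 minutes.
  - alert: HighErrorRate
    expr: |
      sum by (job) (rate(http_requests_total{code=~"5.."}[5m]))
        / sum by (job) (rate(http_requests_total[5m])) > 0.05
    for: 10m
    labels:
      severity: warning
```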

For rule evaluation with managed collection, see Managed rule evaluation and alerting.

For rule evaluation with self-deployed collection, see Self-deployed rule evaluation and alerting.

For information on reducing cardinality using recording rules on self-deployed collectors, see Cost controls and attribution.

Data storage

All Managed Service for Prometheus data is stored for 24 months at no additional cost.

Managed Service for Prometheus supports a minimum scrape interval of 5 seconds. Data is stored at full granularity for 1 week, then is downsampled to 1-minute points for the next 5 weeks, then is downsampled to 10-minute points and stored for the remainder of the retention period.

Managed Service for Prometheus has no limit on the number of active time series or total time series.

For more information, see Quotas and limits within the Cloud Monitoring documentation.

Billing and quotas

Managed Service for Prometheus is a Google Cloud product, and billing and usage quotas apply.

Billing

Billing for the service is based primarily on the number of metric samples ingested into storage. There is also a nominal charge for read API calls. Managed Service for Prometheus does not charge for storage or retention of metric data.

Quotas

Managed Service for Prometheus shares ingest and read quotas with Cloud Monitoring. The default ingest quota is 500 QPS per project with up to 200 samples in a single call, equivalent to 100,000 samples per second. The default read quota is 100 QPS per metrics scope.

You can increase these quotas to support your metric and query volumes. For information about managing quotas and requesting quota increases, see Working with quotas.

Terms of Service and compliance

Managed Service for Prometheus is part of Cloud Monitoring and therefore inherits certain agreements and certifications from Cloud Monitoring, including (but not limited to):

What's next