Google Cloud Managed Service for Prometheus

Google Cloud Managed Service for Prometheus lets you monitor and alert on your workloads using Prometheus, without having to manually manage and operate Prometheus at scale.

This document describes some of the characteristics of the managed service. To receive updates as the product reaches General Availability, submit the sign-up form. You can also request help by sending mail to gmp-support@google.com.

Managed Service for Prometheus is Google Cloud's fully managed storage and query service for Prometheus metrics. This service is built on top of Monarch, the same globally scalable data store as Cloud Monitoring. A thin fork of Prometheus replaces existing Prometheus deployments and sends data to the managed service with no user intervention. This data can then be queried by using PromQL through the Prometheus Query API supported by the managed service and by using the existing Cloud Monitoring query mechanisms.

Managed Service for Prometheus gives you access to features of Prometheus and Cloud Monitoring.

The service is intended to be a drop-in replacement for Prometheus that eliminates the need for Thanos, letting you keep your existing Grafana dashboards, PromQL-based alerts, and workflows. You can use the managed Prometheus binary anywhere that you use upstream Prometheus today. This collector retains all regular Prometheus functionality, such as local storage and rule evaluation.

Data collection with Managed Service for Prometheus

You can use Managed Service for Prometheus in one of two modes: with managed data collection or with self-deployed data collection. This section describes the differences between the two modes.

Managed Service for Prometheus offers an operator for managed data collection in Kubernetes environments. We recommend that you use managed collection; it reduces the complexity of deploying, scaling, sharding, configuring, and maintaining the Prometheus servers. Managed collection is supported for both GKE and non-GKE Kubernetes environments.
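With managed collection, you declare scrape targets with lightweight custom resources instead of a prometheus.yml file. The following is a minimal sketch of a PodMonitoring resource; the resource name, labels, and port are hypothetical, and the apiVersion can vary by release, so check Get started with managed collection for the current schema:

```yaml
# Minimal sketch: tells the managed collection operator to scrape
# pods labeled app=example-app on their "metrics" port every 30s.
# Names and apiVersion are illustrative.
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: example-app
  namespace: default
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: metrics
    interval: 30s
```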

With self-deployed data collection, you manage your Prometheus installation as you always have. The only difference from upstream Prometheus is that you run the Managed Service for Prometheus drop-in replacement binary instead of the upstream Prometheus binary.

When choosing between managed and self-deployed collection, consider the following:

  • Managed collection:

    • The recommended approach for Kubernetes environments.
    • Deployed using the kubectl tool.
    • Operation of Prometheus—generating scrape configurations, scaling ingestion, scoping rules to the right data, and so forth—is fully handled by the Kubernetes operator.
    • Scraping and rules are configured by using lightweight custom resources (CRs).
    • Good for those who want a more hands-off, fully managed experience.
    • Intuitive migration from prometheus-operator configs.
    • Supports most current Prometheus use cases.
  • Self-deployed collection:

    • The Managed Service for Prometheus binary is a drop-in replacement for the upstream Prometheus binary.
    • You can use your preferred deployment mechanism, like prometheus-operator or manual deployment.
    • You can configure scraping by using your preferred methods, like annotations or prometheus-operator.
    • Scaling and functional sharding are done manually.
    • Good for quick integration into more complex existing setups. You can reuse your existing configs and run upstream Prometheus and Managed Service for Prometheus side by side.
    • Might support use cases that aren't yet supported by managed collection.

    Streaming data to Managed Service for Prometheus consumes additional resources. If you are manually deploying collectors, then we recommend increasing CPU and memory limits by 5x and adjusting them based on actual usage.
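For example, if your upstream collector previously ran with limits of 1 CPU and 1 GiB of memory (hypothetical values), a starting point for the replacement binary might look like the following sketch, to be tuned against observed usage:

```yaml
# Hypothetical starting point: 5x the previous limits of
# 1 CPU / 1Gi. Adjust downward once actual usage is known.
resources:
  requests:
    cpu: "1"
    memory: 1Gi
  limits:
    cpu: "5"
    memory: 5Gi
```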

Managed Service for Prometheus and Google Cloud

Managed Service for Prometheus is a Google Cloud product, and certain billing and quotas apply.

Billing

Billing for the service is based primarily on the number of metric samples ingested into storage. There is also a nominal charge for read API calls. Managed Service for Prometheus does not charge for storage or retention of metric data.

Quotas

Managed Service for Prometheus shares ingest and read quotas with Cloud Monitoring. The default ingest quota is 500 QPS per project, with up to 200 samples in a single call; at those defaults, a project can ingest up to 500 × 200 = 100,000 samples per second. The default read quota is 100 QPS per metrics scope.

You can request increases to these quotas to support your metric and query volumes. For information about managing quotas and requesting quota increases, see Working with quotas.

Interoperability with upstream Prometheus

The following sections describe some common Prometheus use cases and how Managed Service for Prometheus fits into them.

Existing Prometheus deployments

Managed Service for Prometheus adds managed collectors for Kubernetes environments to upstream Prometheus. Managed collection simplifies the setup and maintenance of Prometheus deployments, as described in Data collection with Managed Service for Prometheus. For setup instructions, see Get started with managed collection.

You can also run the managed service with self-deployed collection. You can duplicate and use your existing deployment configuration with the Managed Service for Prometheus container image or binary. All of your existing configurations and workflows continue to work, and your data is stored in Monarch. For setup instructions, see Get started with self-deployed collection.

If you use the managed service outside of Google Kubernetes Engine, some additional configuration might be necessary; see Provide credentials explicitly.
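For a self-deployed server, the switch can be as small as replacing the container image in your existing manifest. The following sketch is illustrative: the image path and tag shown are assumptions that change with each release, and the credentials flag (per Provide credentials explicitly) is needed only outside Google Cloud:

```yaml
# Sketch of the container spec change for self-deployed collection.
# Image path/tag are illustrative; use the current release from the
# self-deployed setup guide.
containers:
- name: prometheus
  image: gke.gcr.io/prometheus-engine/prometheus:v2.35.0-gmp.2-gke.0
  args:
  - --config.file=/etc/prometheus/prometheus.yml
  # Outside Google Cloud, point the exporter at a service account key:
  # - --export.credentials-file=/secrets/gmp-key.json
```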

Recording rules

You can continue to evaluate recording rules locally in your collectors. The results of recording rules are stored in Monarch, just like directly collected metric data.
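Locally evaluated rules use the standard Prometheus rule-file format; the rule below is an illustrative example, not part of the service:

```yaml
# Standard Prometheus recording rule. The collector evaluates it
# locally, and the resulting series is sent to Monarch like any
# scraped metric. Names are illustrative.
groups:
- name: example
  interval: 30s
  rules:
  - record: job:http_requests:rate5m
    expr: sum by (job) (rate(http_requests_total[5m]))
```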

Managed Service for Prometheus also provides a stand-alone rule evaluator that evaluates recording and alerting rules against all the data accessible in a metrics scope. Evaluating rules against a multi-project metrics scope eliminates the need to co-locate all the data of interest on a single Prometheus server or within a single Google Cloud project.
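When you use managed collection, rules for the stand-alone evaluator are also declared as custom resources. The following sketch assumes a GlobalRules kind scoped to the whole metrics scope; the kind and apiVersion are assumptions that can vary by release, so check the rule evaluation documentation for the current schema:

```yaml
# Sketch of a rule evaluated against an entire metrics scope.
# Kind and apiVersion are assumptions; verify before use.
apiVersion: monitoring.googleapis.com/v1
kind: GlobalRules
metadata:
  name: example-global-rules
spec:
  groups:
  - name: example
    interval: 30s
    rules:
    - record: job:up:sum
      expr: sum without (instance) (up)
```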

Limiting exported data

For high-volume data, you might want to prevent some time series from being sent to Managed Service for Prometheus to keep costs down. You can use filtering to limit the data that is exported. For more information, see filtering for managed collection or filtering for self-deployed collection.
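As a sketch, with managed collection the export filter lives in the operator's configuration: the matchOneOf field below keeps only series matching at least one of the listed selectors, and everything else stays local. Field names follow the managed collection filtering documentation but should be verified against your release; for self-deployed collectors, the binary accepts equivalent selectors through the --export.match flag.

```yaml
# Sketch of an OperatorConfig export filter for managed collection.
# Only series matching one of the selectors are sent to Monarch.
# Selectors shown are hypothetical.
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
collection:
  filter:
    matchOneOf:
    - '{job="important-app"}'
    - '{__name__=~"http_requests_.*"}'
```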

Federation servers

We do not recommend using the Managed Service for Prometheus binary in federation servers. The managed service provides a global view of all your data through its globally scalable storage. Prometheus federation typically attempts to work around the lack of such scalable storage, so federation and the managed service represent two different approaches to achieving a global view of your metric data.

Managed Service for Prometheus supports filtering to limit the metrics exported to Monarch. The export filtering has the same configuration semantics as federation, so you can replace federation with export filtering to achieve the same goal.

High-availability Prometheus

Managed Service for Prometheus doesn't support high-availability collection. Using managed collectors eliminates the need for high-availability collectors, because the managed collectors run as node agents: each collector scrapes only the targets on its own node, so no single collector is a cluster-wide point of failure.

Autoscaling

You can use the same technique for autoscaling that is used with GKE workload metrics, but instead of using the metric prefix workload.googleapis.com, use prometheus.googleapis.com. For more information, see Autoscaling deployments with GKE workload metrics.
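As a sketch, a HorizontalPodAutoscaler backed by an external-metrics adapter (such as the Custom Metrics Stackdriver Adapter) can reference a Managed Service for Prometheus metric by replacing the / characters in the metric type with |. The metric name and target value below are hypothetical:

```yaml
# Hypothetical HPA scaling on a metric ingested through
# Managed Service for Prometheus. Assumes an external-metrics
# adapter is installed in the cluster.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: prometheus.googleapis.com|http_requests_total|counter
      target:
        type: AverageValue
        averageValue: "100"
```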

What's next