Collect AWS CloudWatch metrics by using the Prometheus CloudWatch exporter

This document describes how to use the open source Prometheus CloudWatch exporter and the Ops Agent running on a Compute Engine instance to collect AWS CloudWatch metrics and store them in a Google Cloud project. This document is intended for the following audiences:

  • Developers and system administrators who need to collect AWS CloudWatch metrics. For these users, this document describes how to set up the Prometheus CloudWatch exporter and the Ops Agent to collect those metrics.
  • Users of the legacy AWS CloudWatch metrics with AWS Connector projects who are migrating to the Prometheus CloudWatch exporter. For these users, this document also includes information about migrating from the legacy metrics collected by using Connector projects.

With Cloud Monitoring, you can view your AWS metrics in the same context as your Google Cloud metrics. For example, you can create a dashboard with charts that display CPU utilization for your Amazon EC2 instances and for your Compute Engine instances. You can also create alerting policies to monitor your AWS metrics. For more information, see View metrics in Cloud Monitoring and Alert on metric behavior in this document.

Before you begin

To collect AWS CloudWatch metrics by using the Prometheus CloudWatch exporter, you need the following:

  • A Google Cloud project with permissions to do the following:
    • Create a VM
    • Write logs to Cloud Logging
    • Write metrics to Cloud Monitoring
  • An AWS account with AWS credentials that can be used by the Prometheus exporter to fetch metrics. For more information, see Run the Prometheus exporter.

Create a Compute Engine VM

We recommend creating a Linux Compute Engine VM to use specifically for running the Ops Agent and the Prometheus CloudWatch exporter. This VM acts as the collection site for all AWS metrics.

  1. To create a Debian Linux VM named aws-exporter-test in a zone that you specify, run the following command:

    gcloud compute instances create \
      --image-project debian-cloud \
      --image-family debian-11 \
      --zone ZONE \
      aws-exporter-test
    

    Configure the command as follows:

    • Replace ZONE with the zone for your new VM
    • Optional: Replace aws-exporter-test with a different name for your VM.

    For more information about this command, see the gcloud compute instances create reference.

  2. To access your VM so that you can install the Prometheus CloudWatch exporter and the Ops Agent, you can use the following command:

    gcloud compute ssh --zone ZONE  --project PROJECT_ID  aws-exporter-test
    

    Configure the command as follows:

    • Replace ZONE with the zone in which you created your VM
    • Replace PROJECT_ID with the ID of your Google Cloud project
    • Replace aws-exporter-test if you created your VM with a different name

    For more information about this command, see the gcloud compute ssh reference.

Set up the Prometheus CloudWatch exporter

The following sections describe the procedure for downloading, installing, and configuring the Prometheus CloudWatch exporter on your Compute Engine VM.

Download the Prometheus exporter and the JRE

To run the Prometheus CloudWatch exporter, you need to download the exporter and the Java Runtime Environment (JRE), version 11 or newer.

  1. To download the JAR file containing the Prometheus CloudWatch exporter, run the following command on your Compute Engine instance:

    curl -sSLO https://github.com/prometheus/cloudwatch_exporter/releases/download/v0.15.1/cloudwatch_exporter-0.15.1-jar-with-dependencies.jar
    
  2. To install the JRE, you can use a command like the following:

    sudo apt install default-jre
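
    To verify the installation, you can check the reported Java version:

    java -version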
    

Configure the Prometheus exporter

To configure the Prometheus CloudWatch exporter, you create a configuration file for the AWS service or services from which you want to collect metrics. For general information, see the Prometheus CloudWatch exporter configuration documentation.

  • Migrating users: If you are migrating your existing AWS CloudWatch metrics to the Prometheus CloudWatch exporter, then you can use the configuration files in Prometheus CloudWatch exporter configurations for migration. These configuration files are designed to replicate the legacy metrics as closely as possible, but they don't collect all of the metrics that the Prometheus CloudWatch exporter can collect for those AWS services.

  • New users: If you are not migrating existing metrics, we recommend that you don't use the migration configurations. See the AWS CloudWatch service documentation for information about how to define exporter configurations for other services. You can also find additional samples in the Prometheus CloudWatch exporter GitHub repository.

You can combine the configurations for multiple AWS services into one configuration file. The examples in this document assume that your configuration file is named config.yml.
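
For example, the following sketch shows the general shape of an exporter configuration file that collects a few Amazon EC2 and Elastic Load Balancing metrics. The region and the metric selections shown here are illustrative, not a recommendation; adjust them to match the AWS services that you want to monitor:

region: us-east-1
metrics:
  # Amazon EC2 CPU utilization, collected per instance.
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_statistics: [Average, Sum]
  # Classic Elastic Load Balancing request count, collected per load balancer.
  - aws_namespace: AWS/ELB
    aws_metric_name: RequestCount
    aws_dimensions: [AvailabilityZone, LoadBalancerName]
    aws_statistics: [Sum]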

Run the Prometheus exporter

Before you can run the Prometheus CloudWatch exporter, you must provide the exporter with credentials and authorization. The Prometheus CloudWatch exporter uses the AWS Java SDK, which offers several ways to provide credentials, including the AWS_ACCESS_KEY, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables used in this document.

For more information about providing credentials to the SDK, see AWS SDK for Java 2.x.

You must also have permission to use the CloudWatch API to retrieve metrics. You need the following AWS IAM CloudWatch permissions:

  • cloudwatch:ListMetrics
  • cloudwatch:GetMetricStatistics
  • cloudwatch:GetMetricData

Using the aws_tag_select feature also requires the tag:GetResources AWS IAM permission.

For more information about authorizing access to AWS services, see AWS Identity and Access Management.

To run the Prometheus CloudWatch exporter, do the following:

  1. To provide credentials for the exporter, set the access-key environment variables:

    export AWS_ACCESS_KEY=YOUR_ACCESS_KEY
    export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
    export AWS_SESSION_TOKEN=YOUR_SESSION_TOKEN
    

    Replace the YOUR_ACCESS_KEY, YOUR_SECRET_ACCESS_KEY, and YOUR_SESSION_TOKEN values with your AWS credentials. You need to set the AWS_SESSION_TOKEN environment variable only if you are using temporary credentials.

  2. To test your configuration, start the exporter and load your configuration file by running the following command:

    java -jar cloudwatch_exporter-0.15.1-jar-with-dependencies.jar 9106 config.yml
    

    Change the port (9106) and configuration-file (config.yml) values if necessary.
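
    To verify that the exporter is serving metrics, you can query its metrics endpoint from another terminal session on the VM:

    curl -s localhost:9106/metrics | head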

    If you modify your config.yml file while the exporter is running, then you can reload the exporter by running the following command:

    curl -X POST localhost:9106/-/reload
    

    For use in a production environment, you can configure the exporter to restart if the VM restarts. For example, on Debian systems, you can use the system and service manager, systemd.
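
    One way to do this is with a systemd service unit. The following unit file is a minimal sketch, not part of the exporter: it assumes that the JAR file and config.yml are in /opt/cloudwatch_exporter and that your AWS credentials are defined in an environment file at /opt/cloudwatch_exporter/credentials.env; adjust the paths to match your setup. Save the unit as /etc/systemd/system/cloudwatch-exporter.service:

    [Unit]
    Description=Prometheus CloudWatch exporter
    After=network-online.target

    [Service]
    # Loads AWS_ACCESS_KEY, AWS_SECRET_ACCESS_KEY, and related variables.
    EnvironmentFile=/opt/cloudwatch_exporter/credentials.env
    ExecStart=/usr/bin/java -jar /opt/cloudwatch_exporter/cloudwatch_exporter-0.15.1-jar-with-dependencies.jar 9106 /opt/cloudwatch_exporter/config.yml
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

    Then reload systemd and enable the service so that it starts at boot:

    sudo systemctl daemon-reload
    sudo systemctl enable --now cloudwatch-exporter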

Set up the Ops Agent

The following sections describe how to install, configure, and start the Ops Agent. These sections provide minimal setup information for the Ops Agent for use with the Prometheus CloudWatch exporter. For more information about these topics, see Ops Agent overview.

Install the Ops Agent

To install the Ops Agent, use the following commands to download and run the agent's installation script:

curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install

Configure the Ops Agent

To configure the Ops Agent, you add configuration elements to the agent's user-configuration file. On Linux, the user-configuration file is /etc/google-cloud-ops-agent/config.yaml.

When you configure the Ops Agent to collect AWS metrics from the Prometheus CloudWatch exporter, you use the agent's Prometheus receiver. This document describes two basic configurations for the Ops Agent. Select one of the configurations and add it to the user-configuration file:

sudo vim /etc/google-cloud-ops-agent/config.yaml

Basic configuration for the Ops Agent

The following configuration provides a minimal configuration for the Ops Agent. This configuration does the following:

  • Creates a receiver named aws of type prometheus. The receiver is configured to scrape metrics from the aws_exporter job. The port specified must match the port on which the Prometheus CloudWatch exporter is exporting metrics; see Run the Prometheus exporter.

  • Creates a pipeline named aws_pipeline that uses the aws metrics receiver.

metrics:
  receivers:
    aws:
      type: prometheus
      config:
        scrape_configs:
          - job_name: 'aws_exporter'
            scrape_interval: 10s
            static_configs:
              - targets: ['localhost:9106']
  service:
    pipelines:
      aws_pipeline:
        receivers:
          - aws

Configuration that drops JVM metrics

The following configuration does everything that the basic configuration does, but it also adds a relabeling config that drops the JVM metrics generated by the exporter. Dropping these metrics reduces the ingested metric data, but it can make problems with the exporter more difficult to debug, because you aren't getting the exporter's own metrics:

metrics:
  receivers:
    aws:
      type: prometheus
      config:
        scrape_configs:
          - job_name: 'aws_exporter'
            scrape_interval: 10s
            static_configs:
              - targets: ['localhost:9106']
            # Drop the exporter's own JVM metrics to reduce noise.
            metric_relabel_configs:
              - source_labels: [ __name__ ]
                regex: 'jvm_.*'
                action: drop
  service:
    pipelines:
      aws_pipeline:
        receivers:
          - aws

You can create much more complex configurations for the Ops Agent.

Restart the Ops Agent

To apply configuration changes to the Ops Agent, you must restart the agent.

  1. To restart the agent, run the following command on your instance:
    sudo service google-cloud-ops-agent restart
    
  2. To confirm that the agent restarted, run the following command and verify that the components "Metrics Agent" and "Logging Agent" started:
    sudo systemctl status google-cloud-ops-agent"*"
    

View metrics in Cloud Monitoring

In Cloud Monitoring, you can query your AWS CloudWatch metrics and create charts like you do for any other metrics. From the Metrics Explorer interface, you can use PromQL, Monitoring Query Language (MQL), or a query-builder interface. For more information, see Create charts with Metrics Explorer.

If you have created charts that you want to keep, then you can save them to custom dashboards. For more information, see Dashboards overview.

The following chart shows a PromQL query for the aws_ec2_cpuutilization_sum metric for AWS VMs:

The chart shows the result of fetching the aws_ec2_cpuutilization_sum statistic for AWS VMs by using PromQL.
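
For example, a query like the following charts one line per Amazon EC2 instance by grouping on the instance_id metric label; the grouping is an illustrative choice, and the bare metric name also works:

avg by (instance_id) (aws_ec2_cpuutilization_sum)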

You can query any metric in Cloud Monitoring by using PromQL. For information, see Mapping Cloud Monitoring metrics to PromQL.

You can query Prometheus metrics by using PromQL or by using Cloud Monitoring tools like Monitoring Query Language (MQL). When Prometheus metrics are ingested into Cloud Monitoring, each metric is transformed by using the standard OpenTelemetry-to-Prometheus transformation and mapped to the Cloud Monitoring prometheus_target monitored-resource-type. The transformation includes the following changes:

  • The metric name is prefixed with the string prometheus.googleapis.com/.
  • Any non-alphanumeric characters, such as periods (.), in the metric name are replaced by underscores (_).
  • The metric name is postfixed with a string that indicates the metric kind, like /gauge or /counter.

To query the Prometheus aws_ec2_cpuutilization_sum metric by using MQL, refer to the metric as prometheus.googleapis.com/aws_ec2_cpuutilization_sum/gauge, and specify the prometheus_target monitored-resource type:

fetch prometheus_target :: 'prometheus.googleapis.com/aws_ec2_cpuutilization_sum/gauge'

The following chart shows the result of the MQL query for the Prometheus aws_ec2_cpuutilization_sum metric:

The chart shows the result of fetching the aws_ec2_cpuutilization_sum statistic for AWS VMs by using MQL.

Alert on metric behavior

In Cloud Monitoring, you can create alerting policies to monitor your AWS CloudWatch metrics and notify you of spikes, dips, or trends in metric values.

Monitor multiple regions

The configuration of the Prometheus CloudWatch exporter supports the use of only one region per configuration file. If you need to monitor multiple regions, then we recommend that you run multiple instances of the Prometheus exporter, one configured for each region you want to monitor. You can run multiple exporters on a single VM, or you can distribute them across VMs. The Prometheus exporter Docker images might be useful in this situation.

You can configure the Ops Agent running on the Compute Engine VM to scrape multiple Prometheus endpoints. When you configure multiple instances of the Prometheus exporter, we recommend that you use a different job name in the scrape config for each instance, as shown in the following sketch, so that you can distinguish the exporter instances if you need to troubleshoot them.
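
For example, the following sketch configures the Ops Agent to scrape two exporter instances running on the same VM, one per AWS region. The job names and the second port (9107) are illustrative; use the ports on which you started your exporters:

metrics:
  receivers:
    aws:
      type: prometheus
      config:
        scrape_configs:
          - job_name: 'aws_exporter_us_east_1'
            scrape_interval: 10s
            static_configs:
              - targets: ['localhost:9106']
          - job_name: 'aws_exporter_eu_west_1'
            scrape_interval: 10s
            static_configs:
              - targets: ['localhost:9107']
  service:
    pipelines:
      aws_pipeline:
        receivers:
          - aws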

For information about configuring the Ops Agent and the Prometheus receiver, see Configure the Ops Agent.

Cost

Amazon charges for every CloudWatch API request or for every CloudWatch metric you request; for current pricing, see Amazon CloudWatch Pricing. The Prometheus CloudWatch exporter has the following query characteristics, which can affect your Amazon costs:

  • When using the GetMetricStatistics method (default), every metric requires one API request. Each request can include multiple statistics.
  • When using aws_dimensions, the exporter must perform API requests to determine which metrics to request. The number of dimensions requests is typically negligible in comparison to the number of metric requests.

Cloud Monitoring charges for AWS CloudWatch metrics from the Prometheus exporter by the number of samples ingested. For current pricing, see Monitoring pricing summary.

Migration guide

This section provides additional information for customers migrating from the legacy AWS CloudWatch metrics with AWS Connector projects to the Prometheus CloudWatch exporter solution.

If you are not migrating to the Prometheus CloudWatch exporter from the legacy solution, then you can skip this section.

Mapping legacy AWS CloudWatch metrics to Prometheus CloudWatch exporter metrics

This section describes how the legacy AWS CloudWatch metrics map to the metrics collected by the Prometheus CloudWatch exporter, using the AWS CloudWatch metric CPUUtilization as an example.

The CPUUtilization metric measures the percentage of physical CPU time that Amazon EC2 uses to run the instance, including time running user code and Amazon EC2 code. In general terms, the metric value is the sum of the guest CPU utilization and the hypervisor CPU utilization.

The legacy solution reports this data to Cloud Monitoring by using the following metrics:

  • aws.googleapis.com/EC2/CPUUtilization/Average
  • aws.googleapis.com/EC2/CPUUtilization/Maximum
  • aws.googleapis.com/EC2/CPUUtilization/Minimum
  • aws.googleapis.com/EC2/CPUUtilization/SampleCount
  • aws.googleapis.com/EC2/CPUUtilization/Sum

The metrics for values like "average" and "maximum" represent the CloudWatch statistics that are meaningful for the metric; each reports a different aggregation of the AWS CPUUtilization metric. These metrics are written against the aws_ec2_instance monitored-resource type, and the value of the instance_id resource label is the identifier for the Amazon EC2 instance writing the metric.

When you use the Prometheus CloudWatch exporter and the Ops Agent, the metrics are reported as the following:

  • aws_ec2_cpuutilization_average
  • aws_ec2_cpuutilization_maximum
  • aws_ec2_cpuutilization_minimum
  • aws_ec2_cpuutilization_samplecount
  • aws_ec2_cpuutilization_sum

These metrics correspond to the aws.googleapis.com/EC2/CPUUtilization/Statistic metrics collected by the legacy solution. These metrics are written against the prometheus_target monitored-resource type.

The values of the labels on the prometheus_target resource reflect the Compute Engine VM on which the Prometheus CloudWatch exporter is running, not those of the Amazon EC2 instance. The values of the labels on the metric are set by the Prometheus exporter. The aws_ec2_cpuutilization_statistic metrics preserve the Amazon EC2 instance's instance ID in the instance_id metric label. The following screenshot shows a PromQL query that charts the aws_ec2_cpuutilization_sum metric; the table shows the values of selected labels:

The table shows the value of the `instance_id` label for an EC2 metric.
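
For example, to chart the metric for a single Amazon EC2 instance, you can filter on the instance_id metric label; replace INSTANCE_ID with the instance's ID:

aws_ec2_cpuutilization_sum{instance_id="INSTANCE_ID"}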

If you are using one of the provided Prometheus CloudWatch exporter configurations for migration but want to collect additional dimensions or statistics for the metrics, then you can modify the configuration. For more information, see the Prometheus CloudWatch exporter configuration documentation.

Rebuild your dashboards and alerting policies

Existing dashboards and alerting policies that use the legacy AWS CloudWatch metrics will not work for metrics ingested by using the Prometheus CloudWatch exporter. This is a breaking change.

To get the same observability into your AWS systems, you must rebuild your dashboards and alerting policies to use the metrics collected by the Prometheus exporter.

Metadata loss

In addition to collecting AWS CloudWatch metrics, the legacy solution also collected metadata from the legacy Monitoring agent and the legacy Logging agent running on Amazon Elastic Compute Cloud (Amazon EC2) instances. That resource metadata was joined to the metrics in Cloud Monitoring and appeared as system or user metadata labels like Instance Name, Availability Zone, Machine Type, and others.

The Prometheus CloudWatch exporter might not collect the same metadata. If you are using either of the legacy agents on Amazon EC2 VM instances, the following sections describe how you can use the Prometheus exporter to collect the missing metadata and join it with the metrics collected by the Prometheus exporter:

After August 21, 2024, when the deprecation of the legacy AWS CloudWatch solution is complete, these metadata labels will no longer be available. Metric and aws_ec2_instance resource labels are unaffected.

For users of the legacy Monitoring agent on Amazon EC2 instances

If you use the AWS metadata in your charts and queries and want to maintain it, you can collect it by using the Prometheus CloudWatch exporter. Create a Compute Engine VM, and set up the Prometheus CloudWatch exporter and Ops Agent as described in this document. Use the ec2.yml configuration file when configuring the Prometheus CloudWatch exporter.

The ec2.yml configuration file uses the aws_tag_select feature. When the aws_tag_select feature is used in the configuration, the Prometheus CloudWatch exporter exports a metric called aws_resource_info. The aws_resource_info metric reports a time series with metadata of the AWS resource in the metric labels. This metadata includes all the Amazon EC2 instance tags, including Instance Name, in the label tag_Name.
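
The relevant part of such a configuration has roughly the following shape. This is an abbreviated sketch, not the full ec2.yml file, and the region is illustrative:

region: us-east-1
metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_statistics: [Average]
    # Exports the aws_resource_info metric with the instance tags as labels.
    aws_tag_select:
      resource_type_selection: "ec2:instance"
      resource_id_dimension: InstanceId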

If you want to collect additional metadata, you can add that metadata by using instance tags on the Amazon EC2 instance; see Add or remove EC2 instance tags. The aws_resource_info metric reported by the Prometheus exporter includes the additional metadata.

You can then join the metadata labels from the aws_resource_info metric with the self metrics from the legacy Monitoring agent or any Amazon EC2 metrics by using MQL or PromQL.

MQL join

For example, the following MQL query joins the agent self metric agent.googleapis.com/agent/uptime, written against the aws_ec2_instance resource type, with the Prometheus CloudWatch exporter aws_resource_info metric, written against the prometheus_target resource type:

{
    aws_ec2_instance :: 'agent.googleapis.com/agent/uptime'
    | align next_older()
    | group_by [instance_id: resource.instance_id, resource.project_id, resource.region, resource.aws_account, metric.version]
    ;
    prometheus_target :: 'prometheus.googleapis.com/aws_resource_info/gauge'
    | align next_older()
    | group_by [instance_id: metric.instance_id, resource.project_id, aws_account: re_extract(metric.arn, "arn:aws:ec2:[^:]+:([^:]+):instance/.*"), region: concatenate("aws:", re_extract(metric.arn, "arn:aws:ec2:([^:]+):[^:]+:instance/.*")), name: metric.tag_Name]
}
| join
| val(0)

The two metrics are joined on the instance_id label, so the name of the VM (the value of the metric.tag_Name label in the aws_resource_info metric) can be added to the result of the join. The agent uptime self metric includes the resource label region; the region label is also part of the join because AWS doesn't specify whether instance IDs are unique within a region or globally.

For more information about MQL, see Monitoring Query Language overview.

PromQL join

The following example shows a PromQL query that joins the aws_ec2_cpuutilization_average metric from the Prometheus CloudWatch exporter with the aws_resource_info metadata metric. The metrics are joined on the instance_id label to add the VM name, taken from the tag_Name label of the metadata metric, to the query result.

  aws_ec2_cpuutilization_average
* on(instance_id) group_left(tag_Name)
  aws_resource_info

For users of the legacy Logging agent on Amazon EC2 instances

The legacy Logging agent, google-fluentd, reports its metadata directly to Cloud Logging, so the deprecation of the AWS CloudWatch metrics solution using the legacy Monitoring agent has no effect on the logs collected by the Logging agent.

The legacy Logging agent does, however, report some metrics about itself. If you want to add metadata to those self metrics, you can create a Compute Engine VM, and set up the Prometheus CloudWatch exporter and Ops Agent as described in this document. Use the ec2.yml configuration file when configuring the Prometheus CloudWatch exporter.

You might also need to modify the configuration of your legacy Logging agent. The output plugin for the legacy Logging agent supports the use_aws_availability_zone option for AWS. This option must be set to false so that the agent writes the region label rather than the availability_zone label. For information about the location of the plugin configuration file, see Google Cloud fluentd output plugin configuration.

The ec2.yml configuration file uses the aws_tag_select feature. When the aws_tag_select feature is used in the configuration, the Prometheus CloudWatch exporter exports a metric called aws_resource_info. The aws_resource_info metric reports a time series with metadata of the AWS resource in the metric labels. This metadata includes all the Amazon EC2 instance tags, including Instance Name, in the label tag_Name.

If you want to collect additional metadata, you can add that metadata by using instance tags on the Amazon EC2 instance; see Add or remove EC2 instance tags. The aws_resource_info metric reported by the Prometheus exporter includes the additional metadata.

You can then join the metadata labels from the aws_resource_info metric with the self metrics from the legacy Logging agent by using MQL. For example, the following MQL query joins the agent self metric agent.googleapis.com/agent/uptime, written against the aws_ec2_instance resource type, with the Prometheus CloudWatch exporter aws_resource_info metric, written against the prometheus_target resource type:

{
    aws_ec2_instance :: 'agent.googleapis.com/agent/uptime'
    | align next_older()
    | group_by [instance_id: resource.instance_id, resource.project_id, resource.region, resource.aws_account, metric.version]
    ;
    prometheus_target :: 'prometheus.googleapis.com/aws_resource_info/gauge'
    | align next_older()
    | group_by [instance_id: metric.instance_id, resource.project_id, aws_account: re_extract(metric.arn, "arn:aws:ec2:[^:]+:([^:]+):instance/.*"), region: concatenate("aws:", re_extract(metric.arn, "arn:aws:ec2:([^:]+):[^:]+:instance/.*")), name: metric.tag_Name]
}
| join
| val(0)

The two metrics are joined on the instance_id label, so the name of the VM (the value of the metric.tag_Name label in the aws_resource_info metric) can be added to the result of the join. The agent uptime self metric includes the resource label region; the region label is also part of the join because AWS doesn't specify whether instance IDs are unique within a region or globally.

For more information about MQL, see Monitoring Query Language overview.

Turn off the legacy metrics in your AWS account

When you created the AWS Connector project in your Google Cloud project, you created an AWS IAM role that granted Google Cloud read-only access to your AWS account. To turn off the legacy AWS CloudWatch metrics in your AWS console, remove this role. For more information, see Deleting an IAM role (console).

Prometheus CloudWatch exporter configurations for migration

This section provides configurations that replicate, as closely as possible, the AWS service metrics documented in the AWS metrics list. These configuration files are intended for use by customers who are migrating to the Prometheus CloudWatch exporter from the legacy solution. If you are setting up the Prometheus CloudWatch exporter as a new user rather than as a migrating user and you use these configurations, then you won't collect all of the AWS metrics that the Prometheus CloudWatch exporter makes available.

To view a sample configuration file for AWS CloudWatch metrics, expand one of the following sections.