Collect AWS CloudWatch metrics by using the Prometheus CloudWatch exporter
This document describes how to use the open source Prometheus CloudWatch exporter
and the Ops Agent running on a Compute Engine instance to collect AWS
CloudWatch metrics and store them in a Google Cloud project. This document is
intended for the following audiences:
Developers and system administrators who need to collect AWS CloudWatch
metrics. This document describes how to set up the Prometheus CloudWatch exporter
to collect AWS CloudWatch metrics.
Users of AWS CloudWatch metrics with
AWS Connector projects who are migrating to the
Prometheus CloudWatch exporter. This document also includes information about
migrating from the legacy collection of AWS CloudWatch metrics in Connector projects.
With Cloud Monitoring, you can view your AWS metrics in the same context
as your Google Cloud metrics. For example, you can create a dashboard with
charts that display CPU utilization for your Amazon EC2 instances
and for your Compute Engine instances. You can also create alerting
policies to monitor your AWS metrics. For more information, see the
View metrics in Cloud Monitoring and Alert on metric behavior sections
later in this document.
To collect AWS CloudWatch metrics by using the Prometheus CloudWatch exporter, you
need the following:
A Google Cloud project with permissions to do the following:
Create a VM
Write logs to Cloud Logging
Write metrics to Cloud Monitoring
An AWS account with AWS credentials that can be used by the
Prometheus exporter to fetch metrics.
For more information, see
Run the Prometheus exporter.
Create a Compute Engine VM
We recommend creating a Linux Compute Engine VM to use specifically
for running the Ops Agent and the Prometheus CloudWatch exporter. This VM acts as
the collection site for all AWS metrics.
To create a Debian Linux VM named aws-exporter-test in a zone that you
specify, run the following command:
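A command like the following creates such a VM; the Debian image family shown is an example, and you can substitute any supported image:
gcloud compute instances create aws-exporter-test \
    --image-project=debian-cloud \
    --image-family=debian-12 \
    --zone=ZONE
Replace ZONE with the zone in which you want to create the VM.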
The following sections describe the procedure for downloading, installing,
and configuring the Prometheus CloudWatch exporter on your Compute Engine VM.
Download the Prometheus exporter and the JRE
To run the Prometheus CloudWatch exporter, you need to download the exporter and
the Java Runtime Environment (JRE), version 11 or newer.
To download the JAR file containing the Prometheus CloudWatch exporter, run the
following command on your Compute Engine instance:
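For example, a command like the following downloads a release JAR from Maven Central; the version shown is illustrative, so check the exporter's releases for the current version:
wget https://repo1.maven.org/maven2/io/prometheus/cloudwatch/cloudwatch_exporter/0.16.0/cloudwatch_exporter-0.16.0-jar-with-dependencies.jar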
To install the JRE, you can use a command like the following:
sudo apt install default-jre
Configure the Prometheus exporter
To configure the Prometheus CloudWatch exporter, you create a configuration file for
the AWS service or services from which you want to collect metrics.
For general information, see the Prometheus CloudWatch exporter configuration
documentation.
Migrating users: If you are migrating your existing AWS CloudWatch
metrics to the Prometheus CloudWatch exporter, then you can use the configuration files
in Prometheus CloudWatch exporter configurations for migration.
These configuration files are designed to replicate the existing metrics as
closely as possible, but they do not collect all the metrics available
by using the Prometheus CloudWatch exporter for the AWS services.
New users: If you are not migrating existing metrics, we recommend that
you don't use the migration configurations. See the AWS CloudWatch service
documentation for information
about how to define exporter configurations for other services. You can also
find additional samples in the Prometheus CloudWatch exporter
GitHub repository.
You can combine configuration for multiple AWS services into one
configuration file. The examples in this document assume that your
configuration file is named config.yml.
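For example, a minimal config.yml that collects the EC2 CPUUtilization metric might look like the following; the region, dimensions, and statistics shown are illustrative:
region: us-east-1
metrics:
  - aws_namespace: AWS/EC2
    aws_metric_name: CPUUtilization
    aws_dimensions: [InstanceId]
    aws_statistics: [Average, Maximum, Minimum, SampleCount, Sum]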
Run the Prometheus exporter
Before you can run the Prometheus CloudWatch exporter, you must provide the
exporter with credentials and authorization.
The Prometheus CloudWatch exporter uses the AWS Java
SDK, which offers ways to provide credentials
by using the following environment variables:
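For example, you might export the variables in the shell that runs the exporter; YOUR_TOKEN is a placeholder, analogous to YOUR_KEY:
# Standard AWS SDK credential variables, read by the exporter at startup.
export AWS_ACCESS_KEY_ID=YOUR_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_KEY
# Needed only if you use temporary credentials.
export AWS_SESSION_TOKEN=YOUR_TOKEN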
Replace the YOUR_KEY variables with your access keys.
You need to set the AWS_SESSION_TOKEN environment variable only
if you are using temporary credentials.
To test your configuration, start the exporter and load your configuration
file by running the following command:
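A typical invocation looks like the following; the JAR file name depends on the version you downloaded:
java -jar cloudwatch_exporter-0.16.0-jar-with-dependencies.jar 9106 config.yml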
Change the port (9106) and configuration-file (config.yml)
values if necessary.
If you modify your config.yml file while the exporter is running,
then you can reload the exporter by running the following command:
curl -X POST localhost:9106/-/reload
For use in a production environment, you can configure the exporter to
restart if the VM restarts. For example, on Debian systems, you
can use the system and service manager,
systemd.
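The following is a minimal sketch of such a systemd unit, assuming the JAR and config.yml are stored in /opt/cloudwatch-exporter and the AWS credentials are provided in an environment file; adjust the paths and file names for your setup:
# /etc/systemd/system/cloudwatch-exporter.service (example paths)
[Unit]
Description=Prometheus CloudWatch exporter
After=network-online.target

[Service]
# File that sets AWS_ACCESS_KEY_ID and related variables.
EnvironmentFile=/opt/cloudwatch-exporter/aws-credentials.env
ExecStart=/usr/bin/java -jar /opt/cloudwatch-exporter/cloudwatch_exporter-0.16.0-jar-with-dependencies.jar 9106 /opt/cloudwatch-exporter/config.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
After creating the file, enable and start the service with sudo systemctl enable --now cloudwatch-exporter.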
Set up the Ops Agent
The following sections describe how to install, configure, and start
the Ops Agent. These sections provide minimal set-up information for
the Ops Agent for use with the Prometheus CloudWatch exporter.
For more information about these topics, see
Ops Agent overview.
Install the Ops Agent
To install the Ops Agent, use the following commands to download and
run the agent's installation script:
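The installation commands typically look like the following; see the Ops Agent installation documentation for the current script:
curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install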
To configure the Ops Agent, you add configuration elements to the
agent's user-configuration file. On Linux, the user-configuration file
is /etc/google-cloud-ops-agent/config.yaml.
When you configure the Ops Agent to collect AWS metrics from the
Prometheus CloudWatch exporter, you use the agent's Prometheus receiver. This
document describes two basic configurations for the Ops Agent.
Select one of the configurations and add it to the
user-configuration file:
sudo vim /etc/google-cloud-ops-agent/config.yaml
Basic configuration for the Ops Agent
The following configuration provides a minimal configuration for the
Ops Agent. This configuration does the following:
Creates a receiver named aws of type prometheus. The receiver is
configured to scrape metrics from the aws_exporter job. The
port specified must match the port on which the Prometheus CloudWatch exporter is
exporting metrics; see Run the Prometheus exporter.
Creates a pipeline named aws_pipeline that uses the aws metrics receiver.
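Such a configuration might look like the following; the 10-second scrape interval matches the example later in this section and can be adjusted:
metrics:
  receivers:
    aws:
      type: prometheus
      config:
        scrape_configs:
          - job_name: 'aws_exporter'
            scrape_interval: 10s
            static_configs:
              - targets: ['localhost:9106']
  service:
    pipelines:
      aws_pipeline:
        receivers:
          - aws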
The following configuration does everything that the basic configuration does,
but it also adds a relabeling config that drops
the JVM metrics generated by the exporter. Dropping these metrics reduces the
ingested metric data, but it can make problems with the exporter more
difficult to debug, because you aren't getting the exporter's own metrics:
metrics:
  receivers:
    aws:
      type: prometheus
      config:
        scrape_configs:
          - job_name: 'aws_exporter'
            scrape_interval: 10s
            static_configs:
              - targets: ['localhost:9106']
            # Drop the exporter's own JVM metrics to reduce noise.
            metric_relabel_configs:
              - source_labels: [ __name__ ]
                regex: 'jvm_.*'
                action: drop
  service:
    pipelines:
      aws_pipeline:
        receivers:
          - aws
You can create much more complex configurations for the Ops Agent.
For general information about configuring the Ops Agent, see
Configuration model.
Restart the Ops Agent
To apply configuration changes to the Ops Agent, you must restart the agent.
To restart the agent, run the following command on your instance:
sudo service google-cloud-ops-agent restart
To confirm that the agent restarted, run the following command and
verify that the components "Metrics Agent" and "Logging Agent" started:
sudo systemctl status google-cloud-ops-agent"*"
View metrics in Cloud Monitoring
In Cloud Monitoring, you can query your AWS CloudWatch metrics and
create charts like you do for any other metrics. From the Metrics Explorer
interface, you can use PromQL, Monitoring Query Language (MQL), or a query-builder
interface. For more information, see Create charts with
Metrics Explorer.
If you have created charts that you want to keep, then you can save them to
custom dashboards. For more information, see Dashboards overview.
The following chart shows a PromQL query for the
aws_ec2_cpuutilization_sum metric for AWS VMs:
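In PromQL, the query can be as simple as the metric name:
aws_ec2_cpuutilization_sum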
You can query Prometheus metrics by using PromQL or by using
Cloud Monitoring tools like Monitoring Query Language (MQL). When
Prometheus metrics are ingested into Cloud Monitoring, each metric is
transformed by using the standard
OpenTelemetry-to-Prometheus transformation
and mapped to the Cloud Monitoring
prometheus_target monitored-resource-type.
The transformation includes the following changes:
The metric name is prefixed with the string prometheus.googleapis.com/.
Any non-alphanumeric characters, such as periods (.), in the metric name
are replaced by underscores (_).
The metric name is postfixed with a string that indicates the metric kind,
like /gauge or /counter.
To query the Prometheus aws_ec2_cpuutilization_sum metric by using
MQL, refer to the metric as
prometheus.googleapis.com/aws_ec2_cpuutilization_sum/gauge, and
specify the prometheus_target monitored-resource type:
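A query along the following lines retrieves the metric; this is a sketch, and the alignment period is illustrative:
fetch prometheus_target
| metric 'prometheus.googleapis.com/aws_ec2_cpuutilization_sum/gauge'
| every 1m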
The following chart shows the result of the MQL query for the
Prometheus aws_ec2_cpuutilization_sum metric:
Alert on metric behavior
In Cloud Monitoring, you can create alerting policies to
monitor your AWS CloudWatch metrics and notify you of spikes, dips, or trends
in metric values.
For information about using the query-builder interface to create alerting
policies, see Create alerting policies.
Monitor multiple regions
The configuration of the Prometheus CloudWatch exporter supports the use of only one
region per configuration file. If you need to monitor multiple regions, then
we recommend that you run multiple instances of the Prometheus exporter,
one configured for each region you want to monitor. You can run multiple
exporters on a single VM, or you can distribute them across VMs. The
Prometheus exporter Docker images
might be useful in this situation.
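For example, the following sketch runs one containerized exporter per region, assuming the community prom/cloudwatch-exporter image, which reads its configuration from /config/config.yml, and assuming per-region configuration files; you also need to pass AWS credentials, for example with --env-file:
# One container per region; the host ports must be distinct.
docker run -d -p 9106:9106 \
    -v "$PWD/config-us-east-1.yml:/config/config.yml" \
    prom/cloudwatch-exporter
docker run -d -p 9107:9106 \
    -v "$PWD/config-eu-west-1.yml:/config/config.yml" \
    prom/cloudwatch-exporter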
You can configure the Ops Agent running on the Compute Engine VM
to scrape multiple Prometheus endpoints. We recommend that, when you
configure multiple instances of the Prometheus exporter, you
use a different job name in the scrape config for each, so that you
can distinguish the exporter instances if you need to troubleshoot them.
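For example, the scrape configuration inside the Ops Agent's Prometheus receiver might list the exporter instances like the following sketch; the job names and ports are illustrative:
scrape_configs:
  - job_name: 'aws_exporter_us_east_1'
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9106']
  - job_name: 'aws_exporter_eu_west_1'
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9107']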
For information about configuring the Ops Agent and the Prometheus
receiver, see Configure the Ops Agent.
Cost
Amazon charges for every CloudWatch API request or for every CloudWatch
metric you request; for current pricing, see Amazon CloudWatch
Pricing. The Prometheus CloudWatch exporter
has the following query
characteristics,
which can affect your Amazon costs:
When using the GetMetricStatistics method (default), every metric
requires one API request. Each request can include multiple statistics.
When using aws_dimensions, the exporter must perform API requests
to determine which metrics to request. The number of dimensions requests
is typically negligible in comparison to the number of metric requests.
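As a rough illustration, if you configure 50 metrics with the default GetMetricStatistics method and the Ops Agent scrapes the exporter every 60 seconds, the exporter makes about 50 API requests per minute, or roughly 2.2 million CloudWatch API requests per 30-day month; the actual volume depends on your configuration and scrape interval.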
This section provides additional information for customers migrating from
the legacy AWS CloudWatch metrics with
AWS Connector projects
to the Prometheus CloudWatch exporter solution.
If you are not migrating to the Prometheus CloudWatch exporter from the
legacy solution, then you can skip this section.
Mapping legacy AWS CloudWatch metrics to Prometheus CloudWatch exporter metrics
This section describes how the legacy AWS CloudWatch metrics map to the
metrics collected by the Prometheus CloudWatch exporter, using the AWS CloudWatch
metric CPUUtilization as an example.
The CPUUtilization metric
measures the percentage of physical CPU time that Amazon EC2
uses to run the instance, including time running user code and
Amazon EC2 code. In general terms, the metric value is
the sum of the guest CPU utilization and the hypervisor CPU utilization.
The legacy solution reports this data to Cloud Monitoring by using the
following metrics:
aws.googleapis.com/EC2/CPUUtilization/Average
aws.googleapis.com/EC2/CPUUtilization/Maximum
aws.googleapis.com/EC2/CPUUtilization/Minimum
aws.googleapis.com/EC2/CPUUtilization/SampleCount
aws.googleapis.com/EC2/CPUUtilization/Sum
The metrics for values like "average" and "maximum" represent the
CloudWatch statistics
that are meaningful for the metric; each reports a different aggregation
of the AWS CPUUtilization metric. These metrics are written against the
aws_ec2_instance monitored-resource type, and
the value of the instance_id resource label is the identifier for the
Amazon EC2 instance writing the metric.
When you use the Prometheus CloudWatch exporter and the Ops Agent, the metrics
are reported as the following:
aws_ec2_cpuutilization_average
aws_ec2_cpuutilization_maximum
aws_ec2_cpuutilization_minimum
aws_ec2_cpuutilization_samplecount
aws_ec2_cpuutilization_sum
These metrics correspond to the
aws.googleapis.com/EC2/CPUUtilization/Statistic metrics
collected by the legacy solution. These metrics are written against the
prometheus_target monitored-resource type.
The values of the labels on the prometheus_target resource
reflect the Compute Engine VM on which the Prometheus CloudWatch exporter is running,
not those of the Amazon EC2 instance. The values of the labels on
the metric are set by the Prometheus exporter. The
aws_ec2_cpuutilization_statistic metrics preserve the
Amazon EC2 instance's Instance ID in the instance_id
metric label. The following screenshot shows a PromQL query
that charts the aws_ec2_cpuutilization_sum metric; the table
shows the values of selected labels:
Existing dashboards and alerting policies that use the legacy
AWS CloudWatch metrics will not work for metrics ingested by
using the Prometheus CloudWatch exporter. This is a breaking change.
To get the same observability into your AWS systems, you must
rebuild your dashboards and alerting policies to use the metrics
collected by the Prometheus exporter.
Metadata loss
In addition to collecting AWS CloudWatch metrics, the legacy solution
also collected metadata from the legacy Monitoring agent and the legacy
Logging agent running on Amazon Elastic Compute Cloud (Amazon EC2) instances.
That resource metadata was joined to the metrics in Cloud Monitoring
and appeared as system or user metadata labels like Instance Name, Availability
Zone, Machine Type, and others.
The Prometheus CloudWatch exporter might not collect the same metadata. If you are using
either of the legacy agents on Amazon EC2 VM instances, the following
sections describe how you can use the Prometheus exporter to collect the
missing metadata and join it with the metrics collected by the
Prometheus exporter:
After August 21, 2024, when the deprecation of the
legacy AWS CloudWatch solution is complete, these metadata labels will no
longer be available. Metric and
aws_ec2_instance resource labels are unaffected.
For users of the legacy Monitoring agent on Amazon EC2 instances
If you use the AWS metadata in your charts and queries and want to maintain
it, you can collect it by using the Prometheus CloudWatch exporter. Create a
Compute Engine VM, and set up the
Prometheus CloudWatch exporter and Ops Agent
as described in this document. Use the ec2.yml configuration file
when configuring the Prometheus CloudWatch exporter.
The ec2.yml configuration file uses the aws_tag_select feature.
When the aws_tag_select feature is used in the configuration,
the Prometheus CloudWatch exporter exports a metric called aws_resource_info.
The aws_resource_info metric reports a time series with metadata of
the AWS resource in the metric labels. This metadata includes all the
Amazon EC2 instance tags, including Instance Name, in the
label tag_Name.
If you want to collect additional metadata, you can add that metadata
by using instance tags on the Amazon EC2 instance; see
Add or remove EC2 instance tags.
The aws_resource_info metric reported by the Prometheus exporter
includes the additional metadata.
You can then join the metadata labels from the aws_resource_info
metric with the self metrics from the legacy Monitoring agent or any
Amazon EC2 metrics by using MQL or PromQL.
MQL join
For example, the following MQL query joins the agent self metric
agent.googleapis.com/agent/uptime, written
against the aws_ec2_instance resource type, with the
Prometheus CloudWatch exporter aws_resource_info metric, written against the
prometheus_target resource type:
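The following is a rough sketch of such a query, assuming the instance IDs can be mapped to a common label on both tables; the exact alignment and label handling might need adjustment for your data:
{ fetch aws_ec2_instance
    | metric 'agent.googleapis.com/agent/uptime'
    | align rate(1m)
    | map add[instance_id: resource.instance_id]
; fetch prometheus_target
    | metric 'prometheus.googleapis.com/aws_resource_info/gauge'
    | map add[instance_id: metric.instance_id]
}
| join
| map add[vm_name: metric.tag_Name]
| value [val(0)]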
The two metrics are joined on the instance_id label, so the name of the
VM—the value of the metric.tag_Name label in the aws_resource_info
metric—can be added to the result of the join. The agent uptime self
metric includes the resource label region; the join with the region
label works because AWS doesn't specify whether instance IDs must be unique
regionally or globally.
PromQL join
The following example shows a PromQL query that joins the
aws_ec2_cpuutilization_average metric from the Prometheus CloudWatch exporter
with the aws_resource_info metadata metric. The metrics are joined
on the instance_id label to add the VM name, taken from the tag_Name
label of the metadata metric, to the query result.
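A sketch of such a query, assuming that aws_resource_info has a constant value of 1, which is the usual convention for info-style metrics, looks like the following:
aws_ec2_cpuutilization_average
  * on (instance_id) group_left (tag_Name)
  aws_resource_info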
For users of the legacy Logging agent on Amazon EC2 instances
The legacy Logging agent, google-fluentd, reports its metadata directly
to Cloud Logging, so the deprecation of the AWS CloudWatch metrics
solution using the legacy Monitoring agent has no effect on the logs collected
by the Logging agent.
The legacy Logging agent does, however, report some metrics about itself.
If you want to add metadata to those self-metrics, you can create a
Compute Engine VM, and set up the
Prometheus CloudWatch exporter and Ops Agent
as described in this document. Use the ec2.yml configuration file
when configuring the Prometheus CloudWatch exporter.
You might also need to modify the configuration of your legacy Logging agent.
The output plugin for the legacy Logging agent supports the
use_aws_availability_zone option for AWS. This option must be set to false
so that the agent writes the region label rather than the
availability_zone label. For information about the location of
the plugin configuration file, see Google Cloud fluentd output plugin
configuration.
The ec2.yml configuration file uses the aws_tag_select feature.
When the aws_tag_select feature is used in the configuration,
the Prometheus CloudWatch exporter exports a metric called aws_resource_info.
The aws_resource_info metric reports a time series with metadata of
the AWS resource in the metric labels. This metadata includes all the
Amazon EC2 instance tags, including Instance Name, in the
label tag_Name.
If you want to collect additional metadata, you can add that metadata
by using instance tags on the Amazon EC2 instance; see
Add or remove EC2 instance tags.
The aws_resource_info metric reported by the Prometheus exporter
includes the additional metadata.
You can then join the metadata labels from the aws_resource_info
metric with the self metrics from the legacy Logging agent by using MQL.
For example, the following MQL query joins the agent self metric
agent.googleapis.com/agent/uptime, written
against the aws_ec2_instance resource type, with the
Prometheus CloudWatch exporter aws_resource_info metric, written against the
prometheus_target resource type:
The two metrics are joined on the instance_id label, so the name of the
VM—the value of the metric.tag_Name label in the aws_resource_info
metric—can be added to the result of the join. The agent uptime self
metric includes the resource label region; the join with the region
label works because AWS doesn't specify whether instance IDs must be unique
regionally or globally.
When you created the AWS Connector project
in your Google Cloud project, you created an AWS IAM role that granted
Google Cloud read-only access to your AWS account.
To turn off the legacy AWS CloudWatch metrics in your AWS console,
remove this role. For more information, see Deleting an IAM role
(console).
Prometheus CloudWatch exporter configurations for migration
This section provides configurations that replicate, as closely as
possible, the AWS service metrics documented in the
AWS metrics list. These configuration
files are intended for use by customers who are migrating to the
Prometheus CloudWatch exporter from the legacy solution. If you are setting up
the Prometheus CloudWatch exporter as a new user rather than a migrating user and
you use these configurations, then you are not collecting all the AWS
metrics that the Prometheus CloudWatch exporter makes available.
To view a sample configuration file for AWS CloudWatch metrics, expand one
of the following sections.