Monitoring with Stackdriver Monitoring

This document describes how to use the Stackdriver Monitoring console to monitor your Cloud Spanner instances.

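The Stackdriver Monitoring console provides several monitoring tools for Cloud Spanner:

  • A curated dashboard that summarizes your Cloud Spanner instances
  • Custom charts for Cloud Spanner metrics
  • Alerts for Cloud Spanner metrics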

If you prefer to monitor Cloud Spanner programmatically, use the Cloud Client Libraries for Stackdriver Monitoring to retrieve metrics.
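As a minimal sketch of the programmatic route, the following Python builds a Monitoring filter for one Cloud Spanner metric and, if the `google-cloud-monitoring` package is installed, lists recent time series for it. The helper names `build_filter` and `fetch_cpu_utilization`, the project ID, and the instance ID are placeholders chosen for illustration; adapt them to your environment.

```python
import time


def build_filter(metric_type: str, instance_id: str) -> str:
    """Build a Monitoring filter for one metric on one Cloud Spanner instance."""
    return (
        f'metric.type = "{metric_type}" '
        f'AND resource.labels.instance_id = "{instance_id}"'
    )


def fetch_cpu_utilization(project_id: str, instance_id: str, minutes: int = 60):
    """List recent CPU utilization time series.

    Requires the google-cloud-monitoring package and application credentials.
    """
    # Imported lazily so build_filter works without the client library.
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {
            "start_time": {"seconds": now - minutes * 60},
            "end_time": {"seconds": now},
        }
    )
    return client.list_time_series(
        request={
            "name": f"projects/{project_id}",
            "filter": build_filter(
                "spanner.googleapis.com/instance/cpu/utilization", instance_id
            ),
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
```

Calling `fetch_cpu_utilization("my-project", "my-instance")` (both hypothetical IDs) would return an iterable of time series whose points can be inspected one by one.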

Before you begin

To use Stackdriver Monitoring, you must have a Stackdriver account. If you do not have a Stackdriver account, you can create a new account.

Using the Stackdriver Monitoring curated dashboard

Stackdriver Monitoring provides you with a curated dashboard that summarizes key information about your Cloud Spanner instances, including:

  • Incidents: User-created monitoring alerts that are open, active, or resolved
  • Events: A list of Cloud Spanner audit logs (if enabled and available)
  • Instances: A high-level summary of your Cloud Spanner instances, including node count, database count, and instance health
  • Aggregated charts of throughput and storage use

Use the following link to access the dashboard:

Go to the dashboard

Viewing instance and database details

When you open the curated dashboard for Cloud Spanner, it shows aggregated data for all of your instances. You can view more details about a specific instance by clicking the instance's name under Instances.

The dashboard displays information such as instance metadata, databases in the instance, and charts of various metrics broken down by region.

From the instance dashboard page, you can also see charts for a specific database in the instance:

  1. On the right-hand side, above the instance metrics charts, click Database metrics.

  2. In the Select a breakdown drop-down list, select the database that you want to examine.

    The Stackdriver Monitoring console displays charts for the database.

Creating custom charts for Cloud Spanner metrics

You can use Stackdriver Monitoring to create custom charts for Cloud Spanner metrics. You can use the Metrics Explorer to create temporary, ad-hoc charts, or you can create charts that appear on custom dashboards.

In particular, Stackdriver Monitoring enables you to create a custom chart that shows whether two or more metrics are correlated with each other. For example, you can check for a correlation between CPU utilization and latency in a Cloud Spanner instance, which might indicate that your instance needs more nodes or that some of your queries are causing high CPU utilization.

To get started with this example, follow these steps:

  1. Go to the Metrics Explorer in the Stackdriver Monitoring console.

    Go to the Metrics Explorer

  2. Click the View options tab, then select the Log scale on Y-axis checkbox. This option helps you compare multiple metrics when one metric has much larger values than the others.

  3. In the drop-down list above the right pane, select Line.

  4. Click the Metrics tab. You can now add metrics to the chart.

To add latency metrics to the chart, follow these steps:

  1. In the Find resource type and metric box, enter the value spanner.googleapis.com/api/request_latencies, then click the row that appears below the box.
  2. In the Filter box, enter the value instance_id, then enter the instance ID you want to examine and click Apply.
  3. In the Aggregator drop-down list, click max.
  4. Optional: Change the latency percentile:

    1. Click Show advanced options.
    2. Click the Aligner drop-down list, then click the latency percentile that you want to view.

      In most cases, you should look at either the 50th percentile latency, to understand the typical amount of latency, or the 99th percentile latency, to understand the latency for the slowest 1% of requests.
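To illustrate why the 50th and 99th percentiles tell different stories, here is a small Python sketch that computes both from a list of made-up latency samples using the nearest-rank method. Stackdriver Monitoring computes its percentiles from the metric's recorded data rather than from a raw list like this, so treat the helper `percentile` and the sample values as illustration only.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds; one request is very slow.
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 15, 14, 480]

p50 = percentile(latencies_ms, 50)  # typical request
p99 = percentile(latencies_ms, 99)  # slowest requests
print(f"p50={p50} ms, p99={p99} ms")
```

With these samples the median stays low while the 99th percentile exposes the single slow request, which is the kind of gap the two aligners help you spot.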

To add CPU utilization metrics to the chart, follow these steps:

  1. Click Add metric.
  2. In the Find resource type and metric box, enter the value spanner.googleapis.com/instance/cpu/utilization, then click the row that appears below the box.
  3. In the Filter box, enter the value instance_id, then enter the instance ID you want to examine and click Apply.
  4. In the Aggregator drop-down list, click max.

You now have a chart that shows the CPU utilization and latency metrics for a Cloud Spanner instance. If both metrics are higher than expected at the same time, your instance might need more nodes, or some of your queries might be causing high CPU utilization.
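If you export the two series from the chart, one rough way to quantify the relationship is a Pearson correlation coefficient. The sketch below assumes both series are sampled at the same timestamps; the function name `pearson` and the data points are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical aligned samples: CPU utilization (fraction) and latency (ms).
cpu_utilization = [0.30, 0.35, 0.50, 0.70, 0.85]
latency_ms = [20, 22, 31, 48, 60]

r = pearson(cpu_utilization, latency_ms)
print(f"correlation: {r:.3f}")
```

A value close to 1 suggests that latency rises together with CPU utilization; it does not by itself show which one causes the other.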

For more information about creating custom charts, see the Stackdriver Monitoring documentation.

Creating alerts for Cloud Spanner metrics

When you create a Cloud Spanner instance, you choose the number of nodes that provide compute resources for the instance. As the instance's workload changes, Cloud Spanner does not automatically adjust the number of nodes in the instance. As a result, you need to set up several alerts to ensure that the instance stays within the recommended maximums for CPU utilization and the recommended limit for storage per node.

To create the recommended alerts:

  1. Create an alerting policy in the Stackdriver Monitoring console:

    Create an alerting policy

  2. In the Find resource type and metric box, enter the value shown below, then click the row that appears below the box:

    High-priority CPU

    Enter the value spanner.googleapis.com/instance/cpu/utilization_by_priority.

    24 hour rolling average CPU

    Enter the value spanner.googleapis.com/instance/cpu/smoothed_utilization.

    Storage

    Enter the value spanner.googleapis.com/instance/storage/used_bytes.

  3. Click Show advanced options, then enter the following recommended values for the alerting policy's target and configuration:

    High-priority CPU

    Section               Field name             Value
    Filter                instance_id            YOUR_INSTANCE_ID
                          priority               high
    Aggregator                                   max
    Advanced Aggregation  Aligner                mean
                          Alignment Period       10 m
    Configuration         Condition triggers if  Any time series violates
                          Condition              is above
                          Threshold              45% for multi-region instances;
                                                 65% for regional instances
                          For                    10 minutes

    24 hour rolling average CPU

    Section               Field name             Value
    Filter                instance_id            YOUR_INSTANCE_ID
    Aggregator                                   sum
    Advanced Aggregation  Aligner                mean
                          Alignment Period       10 m
    Configuration         Condition triggers if  Any time series violates
                          Condition              is above
                          Threshold              90%
                          For                    10 minutes

    Storage

    Section               Field name             Value
    Filter                instance_id            YOUR_INSTANCE_ID
    Aggregator                                   sum
    Advanced Aggregation  Aligner                max
                          Alignment Period       10 m
    Configuration         Condition triggers if  Any time series violates
                          Condition              is above
                          Threshold              1649267441664*, multiplied by the
                                                 number of nodes in your instance

    * Equal to 1.5 TB, or 75% of the 2 TB limit per node

    Your settings should look similar to the following examples:

    High-priority CPU: [Screenshots of the alert's Target settings and Configuration settings]

    24 hour rolling average CPU: [Screenshots of the alert's Target settings and Configuration settings]

    Storage: [Screenshots of the alert's Target settings and Configuration settings]

  4. Click Save.

  5. Optional: To configure your notification settings, click Add Notification Channel.

    You can elect to receive notifications by email, SMS, and several other options.

  6. Optional: Enter a notification message in the Documentation section.

  7. Name your policy and click Save.

  8. Repeat these steps for each of the metrics shown above.
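The recommended thresholds above involve a little arithmetic, especially for storage, where the per-node value is multiplied by the node count. The following Python sketch restates the three thresholds from the tables; the function and constant names are hypothetical, and expressing the CPU thresholds as fractions (0.45 rather than 45%) is an assumption made for this example.

```python
# Per-node storage alert value from the Storage table: 1.5 * 2**40 bytes,
# which is 75% of the 2 TB per-node limit stated above.
STORAGE_ALERT_BYTES_PER_NODE = 1649267441664

# Threshold for the 24 hour rolling average CPU alert (90%).
ROLLING_AVERAGE_CPU_THRESHOLD = 0.90


def storage_threshold_bytes(node_count: int) -> int:
    """Storage alert threshold for an instance with node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return STORAGE_ALERT_BYTES_PER_NODE * node_count


def high_priority_cpu_threshold(multi_region: bool) -> float:
    """High-priority CPU alert threshold: 45% multi-region, 65% regional."""
    return 0.45 if multi_region else 0.65


# Example: a hypothetical 3-node regional instance.
print(storage_threshold_bytes(3))
print(high_priority_cpu_threshold(multi_region=False))
```

Because Cloud Spanner does not adjust the node count automatically, the storage threshold needs to be recomputed whenever you add or remove nodes.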
