Monitor the replica states of regional Persistent Disk volumes


The disk replica state of a regional Persistent Disk volume shows how up to date each zonal replica is relative to the content of the disk. You can check the replica states of your attached regional Persistent Disk volumes at any time in Cloud Monitoring by using the Regional disk replica state metric.

This document provides an overview of the Regional disk replica state metric and then explains how to do the following:

  • Check the current and historical replica state data of zonal replicas.
  • Understand the replica state data on the metric charts.
  • Use the metric data to determine the replication state of your regional Persistent Disk volume.

For information about the metric definition, see the Compute Engine Monitoring metrics section.

Regional disk replica state metric overview

The Regional disk replica state metric collects and reports the replica states of the zonal replicas every minute. As a result, you can see both the current and historical disk replica states of your regional Persistent Disk zonal replicas. However, if a zonal outage affects the VM to which your regional Persistent Disk volume is attached, the Regional disk replica state metric data isn't available for either zonal replica.
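Because the metric is reported every minute, a stretch of missing samples is itself a signal worth noticing. The following is a minimal Python sketch, not an official tool, that scans a list of sample timestamps for gaps. The function name `find_gaps`, the 30-second tolerance, and the sample timestamps are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps,
              expected_interval=timedelta(minutes=1),
              tolerance=timedelta(seconds=30)):
    """Return (previous, current) pairs of consecutive samples that are
    further apart than the expected reporting interval plus a tolerance."""
    ordered = sorted(timestamps)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > expected_interval + tolerance:
            gaps.append((previous, current))
    return gaps


# Hypothetical samples: minutes 0-2 are present, 3-6 are missing, 7-8 are present.
start = datetime(2024, 1, 1, 12, 0)
samples = [start + timedelta(minutes=m) for m in (0, 1, 2, 7, 8)]
gaps = find_gaps(samples)
for gap_start, gap_end in gaps:
    print(f"No data between {gap_start} and {gap_end}")
```

A gap found this way only shows that no data was reported. It doesn't by itself tell you why the data is missing.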

The following are all the possible values of the Regional disk replica state metric. Your regional Persistent Disk zonal replicas are always in one of these disk replica states.

  • Synced: The replica is available, synchronously receives all the writes performed to the disk, and is up to date with all the data on the disk.
  • CatchingUp: The replica is available but is still catching up with the data on the disk from the other replica.
  • OutOfSync: The replica is temporarily unavailable and out of sync with the data on the disk.

You can use the Regional disk replica state metric data to do the following:

  • Determine the replication state of your regional Persistent Disk.
  • Review the replica state history of your regional Persistent Disk volume to understand whether your failover architecture works as intended and take necessary action in case the state of your regional Persistent Disk volume changes.
  • Create alerts based on the Regional disk replica state metric data, detect any changes in your regional Persistent Disk replica states, and take the necessary actions. For more information about how to create metric-based alerts, see How to add an alerting policy.

Check the Regional disk replica state metric data

You can check the state of your regional Persistent Disk zonal replicas by configuring a temporary chart with the time series data on the Metrics explorer page of the Cloud Monitoring dashboard. You can configure the time series that you want to see by using the Configuration, MQL, or PromQL tab, as described in the following sections.

To learn more about how to configure temporary charts for your metrics, see Select metrics when using Metrics Explorer.

Configuration tab

To build a query using menu selections and to display the time series for the replica states of your regional Persistent Disk zonal replicas, do the following:

  1. In the Google Cloud console, go to the Monitoring page.

  2. In the navigation pane, select Metrics Explorer.

  3. Select the Configuration tab.

  4. In the Resource & Metric section, expand the Select a metric menu and use the following menus to select the Regional disk replica state metric.

    1. In the Active resources menu, select Disk.
    2. In the Active metric categories menu, select Disk.
    3. In the Active metrics menu, select Regional disk replica state.
    4. Click Apply.

    A chart appears with all the active time series for the Regional disk replica state metric for the zonal replicas of all the regional Persistent Disks in your project. You can track the disk replica states by checking the state and value columns.

    • The state column displays the possible disk replica states for a zonal replica: Synced, CatchingUp, and OutOfSync. The chart displays each of these states as a time series for every zonal replica of every regional Persistent Disk in your project.
    • The value column indicates whether the zonal replica is in a specific disk replica state. This column shows a binary value (either 0 or 1) for every value of state, for every zonal replica of every regional Persistent Disk in your project.

    To understand what the disk replica states and their values mean, see Understand the Regional disk replica state metric data.

  5. Optional: To specify a subset of data to display, select Add filter and do the following:

    1. Click Label and then select an entry from the menu.
    2. Click Comparison and then select an entry from the menu.
    3. Click Value and then select the value from the menu or enter a value.
    4. Click Done.

    The chart shows the time series only for the filtered labels. You can add multiple filters.

    For example, you can view disk replica data for a specific regional Persistent Disk by applying a filter with the name or disk_id labels.

    For more information about how to add filters, see Filter charted data.

  6. Optional: To group the data that you want to see, click the text in the Group by text box, and then select the label that you want to use for grouping. You can group by multiple labels. The chart displays one time series for each combination of label values. By default, the chart is grouped by the replica_zone label.

    For example, if you group the data by using the state and replica_zone labels, the resulting chart displays one time series for each combination of zone and replica state for every zonal replica.

    As another example, if you want to check the average Synced status of your regional Persistent Disk replicas over a period of time, you must apply the following filters and grouping:

    1. Select Add filter and do the following:
      1. Click Label and then select state.
      2. Click Comparison and then select = (equals).
      3. Click Value and then select Synced.
      4. Click Done.
    2. Click the text in the Group by text box and select the state and replica_zone labels.
    3. In the Aggregator menu, select mean.

    The resulting chart displays one time series for each replica with the average Synced state of that replica over time. If the value for the Synced state shows as 1 throughout that time period, then the replica was always up to date with the latest data. Any drops in the value of the Synced state indicate that the replica was not synced at that time.

  7. Optional: To specify the time period over which you want to see the metric data, select your desired time period on the panel above the generated chart. The following options are available:

    • 1H: the preceding hour.
    • 6H: the preceding 6 hours.
    • 1D: the preceding day.
    • 1W: the preceding week.
    • 1M: the preceding month.
    • 6W: the preceding 6 weeks.
    • Custom: a specific time period of your choice.
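The averaging example in step 6 can be reproduced offline to see why a mean below 1 indicates time spent out of the Synced state. The following Python sketch assumes you have already exported the 0 or 1 values of the Synced series for each replica. The function name `synced_fraction` and the sample values are hypothetical.

```python
def synced_fraction(points):
    """Return the mean of a list of 0/1 samples of the Synced series.

    A result of 1.0 means the replica reported Synced at every sample;
    anything lower means it was in another state for part of the period.
    """
    if not points:
        raise ValueError("no data points to average")
    return sum(points) / len(points)


# Hypothetical one-minute samples of the Synced series, keyed by replica_zone.
series_by_zone = {
    "us-central1-a": [1, 1, 1, 1],
    "us-central1-b": [1, 0, 0, 1],
}

averages = {zone: synced_fraction(values)
            for zone, values in series_by_zone.items()}
for zone, average in averages.items():
    print(f"{zone}: Synced for {average:.0%} of the sampled period")
```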

MQL tab

To build a query using MQL to display the time series for the replica states of your regional Persistent Disk zonal replicas, do the following:

  1. In the Google Cloud console, go to the Monitoring page.

  2. In the navigation pane, select Metrics Explorer.

  3. Select the MQL tab.

  4. Enter your query.

  5. Click Run query.

PromQL tab

To build a query using PromQL to display the time series for the replica states of your regional Persistent Disk zonal replicas, do the following:

  1. In the Google Cloud console, go to the Monitoring page.

  2. In the navigation pane, select Metrics Explorer.

  3. Select the PromQL tab.

  4. Enter your query.

  5. Click Run query.
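The steps above don't include a sample query, so the following Python sketch shows one way to assemble a PromQL selector for this metric. It rests on two assumptions that you should verify before use. First, it assumes the metric type is `compute.googleapis.com/disk/regional/replica_state`, so check the Compute Engine Monitoring metrics list. Second, it assumes the Cloud Monitoring convention for PromQL metric names: replace the first `/` with `:` and every other special character with `_`. The helper names are hypothetical.

```python
import re

# Assumed metric type; confirm it in the Compute Engine Monitoring metrics list.
METRIC_TYPE = "compute.googleapis.com/disk/regional/replica_state"


def promql_metric_name(metric_type):
    """Convert a Cloud Monitoring metric type to a PromQL metric name.

    Assumed rule: the first "/" becomes ":", and every other character
    that is not a letter, digit, or underscore becomes "_".
    """
    head, separator, tail = metric_type.partition("/")
    if not separator:
        raise ValueError(f"unexpected metric type: {metric_type!r}")

    def clean(part):
        return re.sub(r"[^A-Za-z0-9_]", "_", part)

    return f"{clean(head)}:{clean(tail)}"


def build_query(metric_type, **labels):
    """Build a PromQL instant-vector selector with optional label matchers."""
    name = promql_metric_name(metric_type)
    if not labels:
        return name
    matchers = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{name}{{{matchers}}}"


query = build_query(METRIC_TYPE, state="Synced")
print(query)
```

You can paste the printed string into the query editor as a starting point. The `state` matcher corresponds to the state label described in this document.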

Understand the Regional disk replica state metric data

To understand the Regional disk replica state metric data for your regional Persistent Disk volume, check the state and value columns for the zonal replicas in your generated chart. If you don't add any filters to your query, the chart shows the following:

  • The state column displays the possible disk replica states for a zonal replica: Synced, CatchingUp, and OutOfSync. The chart displays each of these states as a time series for every zonal replica of every regional Persistent Disk in your project.
  • The value column indicates whether the zonal replica is in a specific disk replica state. This column shows a binary value (either 0 or 1) for every value of state, for every zonal replica of every regional Persistent Disk in your project.

For any zonal replica of a regional Persistent Disk, if the value column shows 1 for a specific disk replica state, then that zonal replica is in that specific state. If the value column shows 0 for a specific state, then that replica is not in that specific state. At any given time, for a specific regional Persistent Disk and a zonal replica, exactly one of the disk replica states has 1 in the value column. The other two disk replica states have 0 in their respective value columns.
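The rule that exactly one state has the value 1 can be expressed as a small decoding step. The following Python sketch takes rows of (replica_zone, state, value), as they might be copied from the chart legend, and returns the active state for each replica. The function name `current_states` and the row format are assumptions for illustration.

```python
VALID_STATES = ("Synced", "CatchingUp", "OutOfSync")


def current_states(rows):
    """Map each replica zone to the single state whose value is 1.

    rows: iterable of (replica_zone, state, value) tuples.
    Raises ValueError if a zone doesn't have exactly one state set to 1.
    """
    active = {}
    for zone, state, value in rows:
        if state not in VALID_STATES:
            raise ValueError(f"unknown state: {state!r}")
        if value not in (0, 1):
            raise ValueError(f"value must be 0 or 1, got {value!r}")
        active.setdefault(zone, [])
        if value == 1:
            active[zone].append(state)

    result = {}
    for zone, states in active.items():
        if len(states) != 1:
            raise ValueError(f"{zone}: expected exactly one active state, got {states}")
        result[zone] = states[0]
    return result


# Hypothetical rows for one disk with replicas in two zones.
rows = [
    ("us-central1-a", "Synced", 1),
    ("us-central1-a", "CatchingUp", 0),
    ("us-central1-a", "OutOfSync", 0),
    ("us-central1-b", "Synced", 0),
    ("us-central1-b", "CatchingUp", 1),
    ("us-central1-b", "OutOfSync", 0),
]
states = current_states(rows)
print(states)
```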

Determine the replication state of regional Persistent Disk

At any given time, you can use the replica states of your zonal replicas to determine the replication state of your regional Persistent Disk volume in the following way:

  • If both the zonal replicas have 1 as the value for the Synced state, then your regional Persistent Disk volume is fully replicated.
  • If one of the zonal replicas has 1 as the value for the Synced state and the other zonal replica has 1 as the value for the CatchingUp state, then your regional Persistent Disk volume is catching up.
  • If one of the zonal replicas has 1 as the value for the Synced state and the other zonal replica has 1 as the value for the OutOfSync state, then your regional Persistent Disk volume is degraded.
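The three rules above can be sketched as a small Python function that takes the active state of each zonal replica and returns the replication state of the volume. The function name `replication_state` and the returned strings are hypothetical. Combinations that the rules don't cover are reported as "unknown" rather than guessed.

```python
def replication_state(state_a, state_b):
    """Classify a regional disk from the active states of its two replicas."""
    pair = {state_a, state_b}
    if pair == {"Synced"}:
        return "fully replicated"
    if pair == {"Synced", "CatchingUp"}:
        return "catching up"
    if pair == {"Synced", "OutOfSync"}:
        return "degraded"
    # Any other combination is not described by the rules in this document.
    return "unknown"


print(replication_state("Synced", "Synced"))      # fully replicated
print(replication_state("Synced", "CatchingUp"))  # catching up
print(replication_state("OutOfSync", "Synced"))   # degraded
```

The order of the two arguments doesn't matter, because the rules depend only on which states are present.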

For example, consider a regional Persistent Disk volume, my-disk1, that has replicas in us-central1-a and us-central1-b. The following scenarios show the values of the state and value columns for the zonal replicas for each possible replication state of my-disk1:

Fully replicated

In this scenario, the replica in us-central1-a and the replica in us-central1-b are both updated with the latest data on the disk. The chart displays the following values for each disk replica state for the zonal replicas of my-disk1:

replica_zone     state        value
us-central1-a    Synced       1
us-central1-a    CatchingUp   0
us-central1-a    OutOfSync    0
us-central1-b    Synced       1
us-central1-b    CatchingUp   0
us-central1-b    OutOfSync    0

Catching up

In this scenario, the replica in us-central1-a is updated with the data on the disk and the replica in us-central1-b is catching up with the data on the disk. The chart displays the following values for each disk replica state for the zonal replicas of my-disk1:

replica_zone     state        value
us-central1-a    Synced       1
us-central1-a    CatchingUp   0
us-central1-a    OutOfSync    0
us-central1-b    Synced       0
us-central1-b    CatchingUp   1
us-central1-b    OutOfSync    0

Degraded

In this scenario, the replica in us-central1-a is updated with the data on the disk and the replica in us-central1-b is out of sync. The chart displays the following values for each disk replica state for the zonal replicas of my-disk1:

replica_zone     state        value
us-central1-a    Synced       1
us-central1-a    CatchingUp   0
us-central1-a    OutOfSync    0
us-central1-b    Synced       0
us-central1-b    CatchingUp   0
us-central1-b    OutOfSync    1

What's next