Compute Engine maintains copies of each regional disk in two Google Cloud zones. Each copy is called a zonal replica. When you write data to your disk, Compute Engine synchronously replicates that data to both replicas to ensure high availability (HA). At any given time, the disk replication status of the regional disk tells you whether the disk can synchronously write to both replicas. The disk's replication status is determined by the replica states of the disk's zonal replicas. The replica state for a zone tells you the state of an individual zonal replica in comparison to the latest data on the disk. If a zonal replica contains the latest disk data, then that replica is considered to be synced with the latest disk data. If both zonal replicas are synced, then your Regional Persistent Disk or Hyperdisk Balanced High Availability disk is considered to be fully replicated.
This document explains how you can monitor the replica states of your regional disks and their disk replication status over a period of time. You can use this document to do the following:
- Check the current and historical replica states of your regional disks.
- To only verify whether the zonal replicas for a specific regional disk are synced or not, monitor using the Google Cloud console.
- To check the exact zonal replica state for replicas of all the disks in a project, monitor using the Cloud Monitoring dashboard.
- Use the replica state information from a specific point in time to determine if your disk was fully replicated.
To learn more about replica state and disk replication status, see About synchronous disk replication.
Required roles
To get the permissions that you need to view replication states using Cloud Monitoring, ask your administrator to grant you the following IAM roles:
- To view regional disk metrics (one of the following):
  - Monitoring viewer (roles/monitoring.viewer) on the project
  - Monitoring editor (roles/monitoring.editor) on the project
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Monitor using the Google Cloud console
This section explains how you can monitor the replica states and disk replication status of a Hyperdisk Balanced High Availability or Regional Persistent Disk volume using the Google Cloud console.
Check if zonal replicas are synced for a single disk
You can use the Google Cloud console to check whether the zonal replicas of a regional disk are synced with the latest disk data.
To see detailed information about the exact zonal replica states for all regional disks in a project, check the zonal replica states using the Cloud Monitoring dashboard.
Console
To monitor the zonal replica states for your regional disks, do the following:
In the Google Cloud console, go to the Disks page.
On the Disks page, in the Name column, select the disk for which you want to check the replica states.
The Manage disk page opens for the selected disk and displays the Details tab for that disk.
Click the Observability tab.
The Manage disk page displays the monitoring information for the disk.
To see the historical replica state information for your disk, on the Observability tab, navigate to the Regional Persistent Disk Replication State graph.
The graph displays the replica state values for your zonal replicas over the preceding hour in the form of two separate graph lines.
The replica state value can be one of the following:
- 0: The replica is not in sync with the latest disk data.
- 1: The replica is synced with the latest disk data.
To check the replica state value for your zonal replicas at a specific point in time, do the following:
- Hold the pointer on the graph for the time value at which you want to check the replica state.
- To see the replica state values for your zonal replicas, navigate to the bottom of the graph.
- Optional: To see the name and replica state value denoted by a graph line, hold the pointer over the graph line for any specific time value. The graph highlights the name and time-specific state of that replica inside a tooltip.
Optional: To modify the time period over which you want to see the replica state data, select a time period at the top of the Observability tab. The following options are available:
- 1 hour: the preceding hour. This is the default value.
- 6 hours: the preceding 6 hours.
- 1 day: the preceding day.
- 1 week: the preceding week.
- 1 month: the preceding month.
- 6 weeks: the preceding 6 weeks.
- Custom: a specific time period of your choice. To specify a custom monitoring time period, click Custom and then do the following:
  - In the Start date and time field, specify the beginning of your monitoring time period. You must specify a time in the past.
  - In the End date and time field, specify the end of your monitoring time period. You must specify a time in the past.
  - To save your custom monitoring time period, click Apply.
Determine if the disk is fully replicated
After you determine whether or not your zonal replicas are synced with the latest disk data, you can use that information to determine whether or not your disk is fully replicated.
At any given time, the disk was fully replicated if the replica state value for both zonal replicas was 1. If that was not the case, check the exact replica states at that time to know whether your disk was degraded or catching up. For more information, see Monitor using Cloud Monitoring metrics.
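The rule above can be sketched in a few lines of Python. This is only an illustration: the function name and the zone-to-value mapping are hypothetical, and the values are assumed to be the 0 or 1 readings that you note from the console graph.

```python
def is_fully_replicated(replica_values):
    """Return True when both zonal replicas report the synced value 1.

    replica_values: hypothetical mapping of replica zone -> value read from
    the console graph (0 = not in sync, 1 = synced).
    """
    if len(replica_values) != 2:
        raise ValueError("a regional disk has exactly two zonal replicas")
    return all(value == 1 for value in replica_values.values())


# Values noted from the graph at two different points in time.
print(is_fully_replicated({"us-central1-a": 1, "us-central1-b": 1}))
print(is_fully_replicated({"us-central1-a": 1, "us-central1-b": 0}))
```

A False result only tells you that the disk was not fully replicated. To distinguish degraded from catching up, you still need the exact replica states from Cloud Monitoring.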
Monitor using Cloud Monitoring metrics
You can check detailed information about the exact zonal replica states for all your regional disks by using the Regional disk replica state metric in Cloud Monitoring.
About the Regional disk replica state metric
You can see the current and historical disk replica states of your zonal
replicas on the Cloud Monitoring dashboard.
Compute Engine captures the replica states of your disks every minute and reports them using the Regional disk replica state metric. However, if there is a zonal outage that impacts the compute instance to which a zonal replica is attached, you won't see any Regional disk replica state metric data for either zonal replica.
The following are the possible values of the Regional disk replica state metric. Your zonal replicas are always in one of these disk replica states.
- Synced: The replica is available, synchronously receives all the writes performed to the disk, and is up to date with all the data on the disk.
- CatchingUp: The replica is available but is still catching up with the data on the disk from the other replica.
- OutOfSync: The replica is temporarily unavailable and out of sync with the data on the disk.
For information about the metric definition, see the Compute Engine Monitoring metrics section.
You can use the Regional disk replica state metric data to do the following:
- Determine the replication status of your regional disk.
- Review the replica state history of your regional disk to understand whether your failover architecture works as intended and take necessary action in case the state of your regional disk changes.
- Create alerts based on the Regional disk replica state metric data, detect any changes in your replica states, and take the necessary actions. For more information about how to create metric-based alerts, see How to add an alerting policy.
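As an illustration of what detecting changes in replica states means, the following Python sketch scans a chronological list of samples for one zonal replica and reports every transition. The function name, the sample data, and the (timestamp, state) tuple shape are assumptions made for this example; this is not how Cloud Monitoring alerting policies are configured.

```python
def find_state_changes(samples):
    """Return a list of (timestamp, old_state, new_state) transitions.

    samples: hypothetical list of (timestamp, state) tuples for a single
    zonal replica, in chronological order, one entry per minute.
    """
    changes = []
    # Compare each sample with the one immediately before it.
    for (_, prev_state), (timestamp, state) in zip(samples, samples[1:]):
        if state != prev_state:
            changes.append((timestamp, prev_state, state))
    return changes


history = [
    ("10:00", "Synced"),
    ("10:01", "Synced"),
    ("10:02", "OutOfSync"),
    ("10:03", "CatchingUp"),
    ("10:04", "Synced"),
]
for timestamp, old, new in find_state_changes(history):
    print(f"{timestamp}: {old} -> {new}")
```

A transition away from Synced is the kind of event you would typically want an alert for, while a transition back to Synced indicates that the replica has recovered.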
Check the Regional disk replica state metric data
To see the status of the zonal replicas of an attached regional disk, build a query and create a temporary chart for the Regional disk replica state metric. You can do this on Metrics Explorer by using the menu-driven interface, Monitoring Query Language (MQL), or PromQL.
Menu-driven interface
In the Google Cloud console, go to the Metrics explorer page.
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
The Metrics explorer page opens and displays the Queries tab.
To see the replica state data for each zonal replica in a project, select the time series data for the Regional disk replica state metric and then remove the aggregation filter by doing the following in the toolbar of the query pane:
- In the Metric menu, click Select a metric and then select Disk > Disk > Regional disk replica state.
- Click Apply.
- In the Aggregation menu, select Unaggregated by None.
A chart appears and displays the metric data from the preceding hour for each replica as a time series. You see the metric data only for zonal replicas of attached disks.
For more information about selecting time series for a metric, see Select metrics when using Metrics Explorer.
To view chart and table views simultaneously, at the top of the chart, click Both.
To view data for all available regional disk properties, at the top of the table view, click Column display options, select all the columns, and then click OK.
The dashboard displays the following fields for every row in the table, along with their current values:
- disk_id: ID of the disk
- zone: The region where the regional disk was created
- replica_zone: Replica zone
- state: Replica state
- storage_type: Storage type of the disk
- value: Value for the replica state
To view this data on the corresponding time series in the chart view, hold the pointer on the chart at the current time. The chart displays these values inside a tooltip.
To check the historical replica states at a specific point in time, do the following:
Hold the pointer over the chart at a specific time value of your choice. The dashboard displays the metric data for all replica states of all the zonal replicas in your project at that specific point in time.
In the chart view, this information appears inside a tooltip.
In the table view, this information appears as individual rows.
Note the replica states and their corresponding values. At any given time, if a particular state has a value of 1, then the replica was in that state.
- In the chart view, check the replica states and values inside the tooltip for the disk IDs and replica zones that you want.
- In the table view, check the state and value columns for the specific disk IDs and replica zones that you want.
To learn more about what the replica states and their values mean, see Understand the Regional disk replica state metric data.
Optional: To view the replica state information for a specific label, in the Filter menu, select the label for which you want to view the data and then complete the dialog. You can add multiple filters.
The dashboard displays the metric data only for the filtered labels. For more information about filters, see Filter charted data.
For example, to view the replica state data for a specific disk, do the following:
- In the Filter menu, select the name label.
- In the Comparator menu, select = (equals).
- In the Value menu, select the name of the disk that you want.
Optional: To determine what percentage of the time a specific disk's replicas were synced, filter the data for the specific disk and state and then use the aggregation menu:
- In the Filter menu, select the name label.
- In the Comparator menu, select = (equals).
- In the Value menu, select the name of the disk.
- In the Filter menu, select the state label.
- In the Comparator menu, select = (equals).
- In the Value menu, select Synced.
- In the Aggregation menu, select Mean by replica_zone.
- Select the time period for which you want to see the data.
The dashboard displays the data about the average synced status for your disk's replicas over the specified time period. Multiply this value by 100 to determine the percentage of the time for which the replicas were synced. If the average value is 1 for that time period, then the replica was always up to date with the latest data. An average value that is less than 1 indicates that the replica was not synced at some point during the specified time period.
For more information about grouping and alignment, see Choose how to display charted data.
Optional: To modify the time period over which you want to monitor the metric data, at the top of the dashboard, click Last 1 hour and then select the time period that you want.
You can select a relative time period to the current time, or specify start and end times of your choice. By default, you see the metric data for the preceding hour.
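The percentage calculation described in the steps above (the mean of the binary Synced values, multiplied by 100) can be sketched as follows. The function name and the sample list are hypothetical. The sketch assumes one 0 or 1 sample per minute for the Synced state of a single zonal replica.

```python
def percent_time_synced(synced_values):
    """Return the percentage of samples in which the replica was synced.

    synced_values: hypothetical list of 0/1 values of the Synced state for
    one zonal replica, one sample per minute over the selected time period.
    """
    if not synced_values:
        raise ValueError("no samples in the selected time period")
    # The mean of binary values is the fraction of time spent in the state.
    mean = sum(synced_values) / len(synced_values)
    return 100.0 * mean


# Four one-minute samples; the replica was not synced during the last one.
print(percent_time_synced([1, 1, 1, 0]))  # 75.0
```

A result of 100.0 corresponds to an average value of 1, which means that the replica was synced for the whole time period.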
MQL
In the Google Cloud console, go to the Metrics explorer page.
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
The Metrics explorer page opens and displays the Queries tab.
In the toolbar of your query pane, click the button whose name starts with < >.
In the Language field, select MQL as your query language. This field is in the same toolbar that lets you format your query.
Optional: Disable the Auto-run toggle.
Enter your query and then click Run query.
When the Auto-run toggle is enabled, the Run query button isn't displayed.
For example, to view the replica state data for a disk called disk-1, run the following query:
    fetch gce_disk
    | metric 'compute.googleapis.com/disk/regional/replica_state'
    | filter (metadata.system_labels.name == 'disk-1')
    | group_by 1m, [value_replica_state_mean: mean(value.replica_state)]
    | every 1m
As another example, to determine what percentage of the time the replicas were synced for a disk called disk-1, run the following query:
    fetch gce_disk
    | metric 'compute.googleapis.com/disk/regional/replica_state'
    | filter (metadata.system_labels.name == 'disk-1') && (metric.state == 'Synced')
    | group_by 1m, [value_replica_state_mean: mean(value.replica_state)]
    | every 1m
    | group_by [metric.replica_zone], [value_replica_state_mean_mean: mean(value_replica_state_mean)]
To modify the time period over which you want to monitor the metric data, at the top of the dashboard, click Last 1 hour and then select the time period and time zone that you want.
You can select a relative time period to the current time, or specify start and end times of your choice. By default, you see the metric data for the preceding hour.
PromQL
In the Google Cloud console, go to the Metrics explorer page.
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
The Metrics explorer page opens and displays the Queries tab.
In the toolbar of your query pane, click the button whose name starts with < >.
In the Language field, select PromQL as your query language. This field is in the same toolbar that lets you format your query.
Optional: Disable the Auto-run toggle.
Enter your query and then click Run query.
When the Auto-run toggle is enabled, the Run query button isn't displayed.
For example, to view the replica state data for a disk called disk-1, run the following query:
    avg_over_time(compute_googleapis_com:disk_regional_replica_state{monitored_resource="gce_disk",metadata_system_name="disk-1"}[${__interval}])
As another example, to determine what percentage of the time the replicas were synced for a disk called disk-1, run the following query:
    avg by (replica_zone)(avg_over_time(compute_googleapis_com:disk_regional_replica_state{monitored_resource="gce_disk",state="Synced",metadata_system_name="disk-1"}[${__interval}]))
To modify the time period over which you want to monitor the metric data, at the top of the dashboard, click Last 1 hour and then select the time period and time zone that you want.
You can select a relative time period to the current time, or specify start and end times of your choice. By default, you see the metric data for the preceding hour.
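If you run these PromQL queries for many disks, you might prefer to generate them instead of editing the disk name by hand. The following Python sketch builds the two example queries shown above for any disk name. The function name is hypothetical, and the sketch only does string construction: it assumes that the disk name contains no quote characters, and it does not send the query anywhere.

```python
METRIC = "compute_googleapis_com:disk_regional_replica_state"


def replica_state_query(disk_name, state=None):
    """Build a PromQL query string for the replica state of one disk.

    With state=None, the query returns the replica state data for the disk.
    With a state such as "Synced", the query averages that state's value
    per replica zone, as in the percentage example.
    """
    labels = ['monitored_resource="gce_disk"']
    if state is not None:
        labels.append('state="%s"' % state)
    labels.append('metadata_system_name="%s"' % disk_name)
    selector = METRIC + "{" + ",".join(labels) + "}"
    inner = "avg_over_time(" + selector + "[${__interval}])"
    if state is None:
        return inner
    return "avg by (replica_zone)(" + inner + ")"


print(replica_state_query("disk-1"))
print(replica_state_query("disk-1", state="Synced"))
```

You can paste the printed strings into the query editor in Metrics Explorer.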
Determine the exact zonal replica states using metric data
To understand the Regional disk replica state metric data for a regional disk, you must check the state and value columns for the zonal replicas in your generated chart. If you don't add any filters to your query, the following things happen:
- The state column displays all the possible disk replica states for a zonal replica: Synced, CatchingUp, and OutOfSync. The chart displays each of these states in the form of a time series for all zonal replicas of all regional disks in your project.
- The value column indicates whether the zonal replica is in a specific disk replica state or not. This column shows a corresponding binary value (either 0 or 1) for every value of state for all zonal replicas of all regional disks in your project.
For any zonal replica, if the value column shows 1 for a specific disk replica state, then that zonal replica is in that specific state. If the value column shows 0 for a specific state, then that replica is not in that specific state. At any given time, a zonal replica has exactly one of the disk replica states with 1 in the value column. The other two disk replica states have 0 in their respective value columns.
For every zonal replica, the chart and table display a separate entry for each disk replica state: Synced, CatchingUp, and OutOfSync. The value column for each entry is a binary value (either 0 or 1) that indicates whether or not the replica is in that state. At any given time, a zonal replica has exactly one replica state with its value as 1.
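Because exactly one state per replica has the value 1, you can reduce the table rows to a single state per zonal replica. The following Python sketch shows one way to do that. The function name and the (replica_zone, state, value) tuple shape are assumptions for illustration, and the rows stand in for what you would read from the table view.

```python
def current_states(rows):
    """Map each replica zone to the one state whose value is 1.

    rows: hypothetical iterable of (replica_zone, state, value) tuples,
    mirroring the replica_zone, state, and value columns of the table view.
    """
    states = {}
    for zone, state, value in rows:
        if value == 1:
            if zone in states:
                # Exactly one state per replica is expected to have value 1.
                raise ValueError(f"{zone} reports more than one active state")
            states[zone] = state
    return states


rows = [
    ("us-central1-a", "Synced", 1),
    ("us-central1-a", "CatchingUp", 0),
    ("us-central1-a", "OutOfSync", 0),
    ("us-central1-b", "Synced", 0),
    ("us-central1-b", "CatchingUp", 1),
    ("us-central1-b", "OutOfSync", 0),
]
print(current_states(rows))
```

For these rows, the replica in us-central1-a maps to Synced and the replica in us-central1-b maps to CatchingUp.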
Determine the exact disk replication status
You can use the replica states of your zonal replicas to determine the replication status of your regional disks in the following way:
- If both the zonal replicas have 1 as the value for the Synced state, then the disk is fully replicated.
- If one of the zonal replicas has 1 as the value for the Synced state and the other zonal replica has 1 as the value for the CatchingUp state, then the disk is catching up.
- If one of the zonal replicas has 1 as the value for the Synced state and the other zonal replica has 1 as the value for the OutOfSync state, then the disk is degraded.
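The three rules above can be written as a small lookup. In this Python sketch, the function name and the returned strings are hypothetical, and each argument is the state that currently has the value 1 for one zonal replica. Combinations that the rules don't cover return "unknown", which is an assumption of this sketch rather than a documented status.

```python
def replication_status(state_a, state_b):
    """Return the disk replication status for a pair of replica states.

    state_a, state_b: the state with value 1 for each of the two zonal
    replicas ("Synced", "CatchingUp", or "OutOfSync").
    """
    pair = {state_a, state_b}
    if pair == {"Synced"}:
        return "fully replicated"
    if pair == {"Synced", "CatchingUp"}:
        return "catching up"
    if pair == {"Synced", "OutOfSync"}:
        return "degraded"
    # Not covered by the rules above; treated as unknown in this sketch.
    return "unknown"


print(replication_status("Synced", "Synced"))
print(replication_status("Synced", "CatchingUp"))
print(replication_status("OutOfSync", "Synced"))
```

The order of the arguments doesn't matter, because the rules only depend on which two states are present.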
For example, consider a disk named my-disk1 that has replicas in us-central1-a and us-central1-b. The following scenarios show the values of the state and value columns for the zonal replicas for each possible replication status of my-disk1:
Fully replicated
In this scenario, the replica in us-central1-a and the replica in us-central1-b are both updated with the latest data on the disk. The chart displays the following values for each disk replica state for the zonal replicas of my-disk1:
replica_zone | state | value |
---|---|---|
us-central1-a | Synced | 1 |
us-central1-a | CatchingUp | 0 |
us-central1-a | OutOfSync | 0 |
us-central1-b | Synced | 1 |
us-central1-b | CatchingUp | 0 |
us-central1-b | OutOfSync | 0 |
Catching up
In this scenario, the replica in us-central1-a is updated with the data on the disk and the replica in us-central1-b is catching up with the data on the disk. The chart displays the following values for each disk replica state for the zonal replicas of my-disk1:
replica_zone | state | value |
---|---|---|
us-central1-a | Synced | 1 |
us-central1-a | CatchingUp | 0 |
us-central1-a | OutOfSync | 0 |
us-central1-b | Synced | 0 |
us-central1-b | CatchingUp | 1 |
us-central1-b | OutOfSync | 0 |
Degraded
In this scenario, the replica in us-central1-a is updated with the data on the disk and the replica in us-central1-b is out of sync. The chart displays the following values for each disk replica state for the zonal replicas of my-disk1:
replica_zone | state | value |
---|---|---|
us-central1-a | Synced | 1 |
us-central1-a | CatchingUp | 0 |
us-central1-a | OutOfSync | 0 |
us-central1-b | Synced | 0 |
us-central1-b | CatchingUp | 0 |
us-central1-b | OutOfSync | 1 |