To monitor your SAP systems on Google Cloud, you can set up Cloud Monitoring alerting policies that notify your SAP administrators about potential misconfigurations or resource failures.
This document describes some common high-availability (HA) issues and shows how you can create alerting policies for them, or use the example alerts as a reference to create your own custom alerts. The example alerts described in this document use the Monitoring Query Language (MQL) to query metrics generated by Google Cloud's Agent for SAP. Although by default these alerts apply to all SAP systems in a given Google Cloud project, you can customize the alerts to filter for the required SAP system IDs (SIDs) or to adjust the elapsed time that triggers the alert.
For information about how Cloud Monitoring alerts work, see Alerting overview.
Before you begin
Ensure that you're familiar with the general concepts of Monitoring alerting policies. For information about alerting policies, see Alerting overview.
On each instance that hosts the SAP system that you want to monitor, make sure that Google Cloud's Agent for SAP is installed and configured to collect the Process Monitoring metrics.
To get the permissions that you need to create and modify alerting policies by using the Google Cloud console, ask your administrator to grant you the following IAM role on your project:
- Monitoring Editor (roles/monitoring.editor)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
To receive the alerts, create the required notification channels. For redundancy purposes, we recommend that you create multiple notification channels. For more information, see Create and manage notification channels.
Import predefined alert policies
Google Cloud provides predefined alert policies that you can import to set up alerting for some common HA issues. For more information, see the following sections:
- Import alerting for location constraint detection
- Import alerting for resource failure detection
- Import alerting for SAP HANA replication errors
Import alerting for location constraint detection
When you manually move a resource in a Pacemaker cluster by using the cluster commands, that resource gains a location constraint, or a client preference is set, that favors a particular node. Such a constraint can prevent the resource from failing over in the event of a system outage. For more information, see the Moving One Resource section of the ClusterLabs documentation.
To get notified about such a situation in your SAP HA system running on Google Cloud, you can import the predefined alert policy Pacemaker: Location constraint detected.
This alert policy notifies when a preference-based constraint is detected and refers your SAP administrators to the "Unintentional node affinity that favors a particular node" section of the Troubleshooting high-availability configurations for SAP guide. This policy uses the Process Monitoring metric workload.googleapis.com/sap/validation/pacemaker, which is collected by Google Cloud's Agent for SAP.
To import this alerting policy in your Google Cloud project by using the Google Cloud console, complete the following steps:
- In the Google Cloud console, go to the Integrations page.
  If you use the search bar to find this page, then select the result whose subheading is Monitoring.
- Filter for Google Cloud Agent for SAP, and then click View Details.
- Navigate to the Alerts tab.
- Select Pacemaker: Location constraint detected, and then click Show Options > Customise Alert Policy.
- (Optional) To configure alerting for one or more specific SAP systems instead of all SAP systems in your Google Cloud project, update the filter statement in the Query editor as follows:
  - Remove the # character that precedes the sid variable.
  - Specify the required SIDs. To specify multiple SIDs, separate the SIDs using the | character. The following is an example of such a filter statement:
    sid=~"ABC|HDB|XYZ"
    In this example, ABC, HDB, and XYZ are SIDs.
- (Optional) To customize the elapsed time before triggering an alert, update the window statement in the Query editor to specify your preferred unit of measure. For example, to set a time limit of 3 minutes, set:
    | window 3m
- Under Alert Details, navigate to the Notifications and name tab.
- Select the required notification channels.
- Review the alert and click Create Policy.
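The two optional edits in the preceding steps are plain text substitutions in the query. As a minimal sketch, the following Python helpers produce the filter and window fragments in the form shown above. The function names sid_filter and window_clause are hypothetical and not part of any Google Cloud tooling, and the SID check assumes the usual SAP naming rule of three characters that start with a letter.

```python
import re


def sid_filter(sids):
    """Build a filter fragment such as sid=~"ABC|HDB|XYZ" (hypothetical helper)."""
    cleaned = []
    for sid in sids:
        sid = sid.strip().upper()
        # Assumption: a SID is three characters, a letter followed by
        # two letters or digits.
        if not re.fullmatch(r"[A-Z][A-Z0-9]{2}", sid):
            raise ValueError(f"not a valid SID: {sid!r}")
        if sid not in cleaned:
            cleaned.append(sid)
    if not cleaned:
        raise ValueError("at least one SID is required")
    # Multiple SIDs are separated with the | character, as in the steps above.
    return 'sid=~"' + "|".join(cleaned) + '"'


def window_clause(minutes):
    """Build a window statement such as "| window 3m" (hypothetical helper)."""
    if not isinstance(minutes, int) or minutes < 1:
        raise ValueError("minutes must be a positive integer")
    return f"| window {minutes}m"


if __name__ == "__main__":
    print(sid_filter(["ABC", "HDB", "XYZ"]))
    print(window_clause(3))
```

Paste the generated fragments into the Query editor in place of the commented-out sid filter and the existing window statement.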
Import alerting for resource failure detection
In your HA system, if a running resource agent fails, then Pacemaker attempts to stop that agent and restart it. If the restart operation fails for any reason, then Pacemaker sets that resource agent's failcount value to INFINITY (if start-failure-is-fatal is set to true, which is the default) and then attempts to start the agent on a different node. If the resource agent fails to start on all nodes, then the resource agent remains in the Stopped status. To restore this resource agent to an operational state, an SAP administrator must manually clear the resource agent's failcount. For more information about the failcount behavior of Pacemaker, see the ClusterLabs documentation.
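The failover sequence described above can be illustrated with a toy model. The following Python sketch is not Pacemaker code and does not call any cluster API. It only mirrors the described default behavior (start-failure-is-fatal set to true) under simplifying assumptions. The function names, node names, and the string used for INFINITY are hypothetical.

```python
INFINITY = "INFINITY"  # stand-in label, not Pacemaker's internal representation


def start_after_failure(nodes, startable):
    """Toy model of the restart sequence after a resource agent fails.

    nodes: node names in the order in which a start is attempted.
    startable: dict that maps a node name to True if a start would succeed.
    Returns a tuple (status, node, failcounts).
    """
    failcounts = {}
    for node in nodes:
        if startable.get(node, False):
            return "Started", node, failcounts
        # A failed start sets the failcount for that node to INFINITY,
        # and the next node is tried.
        failcounts[node] = INFINITY
    # No node could start the resource, so it stays stopped.
    return "Stopped", None, failcounts


def clear_failcounts(failcounts):
    """Model the manual cleanup that an administrator must perform."""
    failcounts.clear()
    return failcounts


if __name__ == "__main__":
    status, node, counts = start_after_failure(
        ["node-a", "node-b"], {"node-a": False, "node-b": False}
    )
    print(status, node, counts)
```

In this model, the resource ends in the Stopped status only when every node has an INFINITY failcount, which is the situation that the alert policy in this section is designed to report.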
To get notified about such a situation in your SAP HA system running on Google Cloud, you can import the predefined alert policy Pacemaker: Resource failed to start.
This alert policy notifies when a resource agent fails to start and remains in the Stopped status for more than 3 minutes. This policy refers your SAP administrator to the "Resource agent is stopped" section of the Troubleshooting high-availability configurations for SAP guide. This policy uses the Process Monitoring metric workload.googleapis.com/sap/cluster/failcounts, which is collected by Google Cloud's Agent for SAP.
To import this alerting policy in your Google Cloud project by using the Google Cloud console, complete the following steps:
- In the Google Cloud console, go to the Integrations page.
  If you use the search bar to find this page, then select the result whose subheading is Monitoring.
- Filter for Google Cloud Agent for SAP, and then click View Details.
- Navigate to the Alerts tab.
- Select Pacemaker: Resource failed to start, and then click Show Options > Customise Alert Policy.
- (Optional) To configure alerting for one or more specific SAP systems instead of all SAP systems in your Google Cloud project, update the filter statement in the Query editor as follows:
  - Remove the # character that precedes the sid variable.
  - Specify the required SIDs. To specify multiple SIDs, separate the SIDs using the | character. The following is an example of such a filter statement:
    sid=~"ABC|HDB|XYZ"
    In this example, ABC, HDB, and XYZ are SIDs.
- (Optional) To customize the elapsed time before triggering an alert, update the window statement in the Query editor to specify your preferred unit of measure. For example, to set a time limit of 3 minutes, set:
    | window 3m
- Under Alert Details, navigate to the Notifications and name tab.
- Select the required notification channels.
- Review the alert and click Create Policy.
Import alerting for SAP HANA replication errors
In the event of an outage on the SAP HANA primary site, an automated failover from the primary to the secondary system isn't possible if the secondary is not in sync with the primary.
To get notified about such a situation in your SAP HANA HA system running on Google Cloud, you can import the predefined alert policy SAP HANA Replication is not in sync.
This alert policy notifies when the replication status of a highly available SAP HANA system is not in sync for more than a minute. This policy uses the Process Monitoring metric workload.googleapis.com/sap/hana/ha/replication, which is derived from the systemReplication.py script. This policy directs the SAP administrator to check the status and network connectivity of the primary and secondary SAP HANA systems.
To import this alerting policy in your Google Cloud project by using the Google Cloud console, complete the following steps:
- In the Google Cloud console, go to the Integrations page.
  If you use the search bar to find this page, then select the result whose subheading is Monitoring.
- Filter for Google Cloud Agent for SAP, and then click View Details.
- Navigate to the Alerts tab.
- Select SAP HANA Replication is not in sync, and then click Show Options > Customise Alert Policy.
- (Optional) To configure alerting for one or more specific SAP systems instead of all SAP systems in your Google Cloud project, update the filter statement in the Query editor as follows:
  - Remove the # character that precedes the sid variable.
  - Specify the required SIDs. To specify multiple SIDs, separate the SIDs using the | character. The following is an example of such a filter statement:
    sid=~"ABC|HDB|XYZ"
    In this example, ABC, HDB, and XYZ are SIDs.
- (Optional) To customize the elapsed time before triggering an alert, update the window statement in the Query editor to specify your preferred unit of measure. For example, to set a time limit of 3 minutes, set:
    | window 3m
- Under Alert Details, navigate to the Notifications and name tab.
- Select the required notification channels.
- Review the alert and click Create Policy.
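To reason about what "not in sync for more than a minute" means before you change the window, it can help to model the condition offline. The following Python sketch is a conceptual illustration only. It does not reflect how Cloud Monitoring evaluates MQL internally, and the function name and the (timestamp, in_sync) sample format are assumptions made for this example.

```python
from datetime import datetime, timedelta


def out_of_sync_longer_than(samples, threshold):
    """Return True if the latest run of out-of-sync samples exceeds threshold.

    samples: list of (timestamp, in_sync) tuples sorted by timestamp.
    threshold: timedelta, for example timedelta(minutes=1).
    """
    run_start = None
    for timestamp, in_sync in samples:
        if in_sync:
            # Replication recovered, so any earlier out-of-sync run ends here.
            run_start = None
        elif run_start is None:
            run_start = timestamp
    if run_start is None:
        return False
    return samples[-1][0] - run_start > threshold


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, 0)
    samples = [
        (start, True),
        (start + timedelta(seconds=30), False),
        (start + timedelta(seconds=120), False),
    ]
    print(out_of_sync_longer_than(samples, timedelta(minutes=1)))
```

A shorter threshold reports replication problems sooner but is more likely to fire during brief, self-recovering interruptions.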
Create a custom alerting policy
In addition to importing predefined alert policies, you can update them to create custom policies that suit your requirements. To do so, you can use the Google Cloud console, the Cloud Monitoring API, the Google Cloud CLI, or Terraform.
As a starting point, we recommend that you review the summary of example alerting policies as well as the preconfigured alerting policies described in this document.
For information about how to manage or modify alerting policies, see Manage alerting policies.
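If you create a custom policy outside the Google Cloud console, the policy is typically expressed as a JSON document. The following Python sketch builds such a document in the shape of the Cloud Monitoring API's AlertPolicy resource with an MQL condition. The function name, the display name, the project and channel IDs, and the query placeholder are all hypothetical. Replace the placeholder with the query copied from the Query editor of an imported policy.

```python
import json


def build_alert_policy(display_name, mql_query, duration_seconds, channels):
    """Return a dict shaped like an AlertPolicy with one MQL condition."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    return {
        "displayName": display_name,
        "combiner": "OR",
        "conditions": [
            {
                "displayName": display_name,
                "conditionMonitoringQueryLanguage": {
                    "query": mql_query,
                    # Durations are written as a number of seconds plus "s".
                    "duration": f"{duration_seconds}s",
                    "trigger": {"count": 1},
                },
            }
        ],
        # Full resource names of existing notification channels.
        "notificationChannels": list(channels),
    }


if __name__ == "__main__":
    policy = build_alert_policy(
        "Custom SAP HA alert (example)",
        "PASTE_QUERY_FROM_QUERY_EDITOR",  # placeholder, not a valid query
        180,
        ["projects/PROJECT_ID/notificationChannels/CHANNEL_ID"],
    )
    print(json.dumps(policy, indent=2))
```

You can use the resulting JSON as the request body for the API's alertPolicies.create method. Verify the field names against the current AlertPolicy reference before you rely on this sketch.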