Create metrics-based alert policies for SAP on Google Cloud

To monitor your SAP systems on Google Cloud, you can set up Cloud Monitoring alerting policies that notify your SAP administrators about potential misconfigurations or resource failures.

This document describes some common high-availability (HA) issues and shows how you can create alerting policies for them, or use the example alerts as a reference to create your own custom alerts. The example alerts described in this document use the Monitoring Query Language (MQL) to query metrics generated by Google Cloud's Agent for SAP. Although these alerts apply by default to all SAP systems in a given Google Cloud project, you can customize them to filter for specific SIDs or to adjust the elapsed time before an alert is triggered.

For information about how Cloud Monitoring alerts work, see Alerting Overview.

Before you begin

Import predefined alert policies

Google Cloud provides predefined alert policies that you can import to set up alerting for some common HA issues. For more information, see the following sections:

Import alerting for location constraint detection

When you manually move a resource in a Pacemaker cluster by using the cluster commands, that resource gains a location constraint, or client preference, that favors a particular node. Such a constraint can prevent the resource from failing over in the event of a system outage. For more information, see the Moving One Resource section of the ClusterLabs documentation.

To get notified about such a situation in your SAP HA system running on Google Cloud, you can import the predefined alert policy Pacemaker: Location constraint detected.

This alerting policy sends a notification when a preference-based constraint is detected, and refers your SAP administrators to the "Unintentional node affinity that favors a particular node" section of the Troubleshooting high-availability configurations for SAP guide. This policy uses the Process Monitoring metric workload.googleapis.com/sap/validation/pacemaker, which is collected by Google Cloud's Agent for SAP.
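To see the kind of constraint that this policy detects, you can inspect the cluster configuration yourself. The following Python sketch parses the constraints section of a Pacemaker CIB, such as the output of cibadmin --query --scope constraints, and lists location constraints whose IDs start with cli-prefer- or cli-ban-, the prefixes that Pacemaker's move and ban commands give the constraints they create. The function name, the sample XML, and the resource and node names in it are hypothetical, and this sketch is not how Google Cloud's Agent for SAP computes its metric.

```python
import xml.etree.ElementTree as ET

# ID prefixes that Pacemaker's move (cli-prefer-) and ban (cli-ban-) commands
# use for the location constraints they create.
CLI_PREFIXES = ("cli-prefer-", "cli-ban-")


def find_cli_location_constraints(cib_xml: str) -> list[dict[str, str]]:
    """Return rsc_location constraints that look like leftovers of a manual move or ban."""
    root = ET.fromstring(cib_xml.strip())
    found = []
    # iter() walks the whole tree, so this works for the full CIB
    # or for just the <constraints> section.
    for loc in root.iter("rsc_location"):
        cid = loc.get("id", "")
        if cid.startswith(CLI_PREFIXES):
            found.append({
                "id": cid,
                "resource": loc.get("rsc", ""),
                "node": loc.get("node", ""),
                "score": loc.get("score", ""),
            })
    return found


# Hypothetical output of: cibadmin --query --scope constraints
SAMPLE_CIB = """
<constraints>
  <rsc_location id="cli-prefer-rsc_SAPHana_HDB" rsc="rsc_SAPHana_HDB"
                node="hana-node-2" score="INFINITY"/>
  <rsc_location id="loc_vip_prefers_primary" rsc="rsc_vip"
                node="hana-node-1" score="100"/>
</constraints>
"""

if __name__ == "__main__":
    for c in find_cli_location_constraints(SAMPLE_CIB):
        print(f"{c['id']}: {c['resource']} pinned to {c['node']} (score {c['score']})")
```

Matching on ID prefixes is a heuristic: a location constraint that an administrator created by hand under a different ID is not reported.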

To import this alerting policy in your Google Cloud project by using the Google Cloud console, complete the following steps:

  1. In the Google Cloud console, go to the Integrations page:

    Go to Integrations

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. Filter for Google Cloud Agent for SAP, and then click View Details.
  3. Navigate to the Alerts tab.
  4. Select Pacemaker: Location constraint detected, and then click Show Options > Customize Alert Policy.
  5. (Optional) To configure alerting for one or more specific SAP systems instead of all SAP systems in your Google Cloud project, update the filter statement in the Query editor as follows:
    1. Remove the # character that precedes the sid variable.
    2. Specify the required SIDs. To specify multiple SIDs, separate the SIDs using the | character. The following is an example of what such a filter statement looks like:
      sid=~"ABC|HDB|XYZ"

      In this example, ABC, XYZ, and HDB are SIDs.

  6. (Optional) To customize the elapsed time before an alert is triggered, update the window statement in the Query editor to specify your preferred duration:
    1. For example, to set a time limit of 3 minutes, set:
      | window 3m
  7. Under Alert Details, navigate to the Notifications and name tab.
  8. Select the required notification channels.
  9. Review the alert and click Create Policy.
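If you manage many SAP systems, you might generate the SID filter value in step 5 instead of typing it. The following Python sketch builds a sid=~"..." fragment from a list of SIDs. It assumes the SAP naming rule for SIDs (three characters, uppercase letters or digits, starting with a letter); the function name is hypothetical.

```python
import re

# Assumption: an SAP SID is three characters long, starts with an uppercase
# letter, and contains only uppercase letters and digits.
SID_PATTERN = re.compile(r"[A-Z][A-Z0-9]{2}")


def build_sid_filter(sids: list[str]) -> str:
    """Build an MQL filter fragment such as sid=~"ABC|HDB|XYZ"."""
    cleaned: list[str] = []
    for sid in sids:
        sid = sid.strip().upper()
        if not SID_PATTERN.fullmatch(sid):
            raise ValueError(f"not a valid SAP SID: {sid!r}")
        if sid not in cleaned:  # drop duplicates, keep the caller's order
            cleaned.append(sid)
    if not cleaned:
        raise ValueError("at least one SID is required")
    joined = "|".join(cleaned)
    return f'sid=~"{joined}"'
```

For example, build_sid_filter(["abc", "HDB", "XYZ"]) returns sid=~"ABC|HDB|XYZ", which matches the example filter statement in step 5.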

Import alerting for resource failure detection

In your HA system, if a running resource agent fails, Pacemaker attempts to stop that agent and restart it. If the restart fails for any reason, and start-failure-is-fatal is set to true (the default), Pacemaker sets that resource agent's failcount value to INFINITY and then attempts to start the agent on a different node. If the resource agent fails to start on all nodes, it remains in the Stopped status. To restore the resource agent to an operational state, an SAP administrator must manually clear its failcount. For more information about the failcount behavior of Pacemaker, see the ClusterLabs documentation.

To get notified about such a situation in your SAP HA system running on Google Cloud, you can import the predefined alert policy Pacemaker: Resource failed to start.

This alerting policy sends a notification when a resource agent fails to start and remains in the Stopped status for more than 3 minutes. The policy refers your SAP administrators to the "Resource agent is stopped" section of the Troubleshooting high-availability configurations for SAP guide. This policy uses the Process Monitoring metric workload.googleapis.com/sap/cluster/failcounts, which is collected by Google Cloud's Agent for SAP.
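The recovery sequence that leads to this alert can be modeled as a short loop. The following Python sketch is a deliberately simplified model, not Pacemaker's actual scheduler: it covers only the default start-failure-is-fatal=true case, ignores settings such as migration-threshold and failure-timeout, and uses a hypothetical start_succeeds callback to decide whether a start attempt on a given node works.

```python
import math
from typing import Callable

INFINITY = math.inf  # stands in for Pacemaker's INFINITY failcount


def recover_after_failure(
    nodes: list[str],
    failed_node: str,
    start_succeeds: Callable[[str], bool],
) -> tuple[str, dict[str, float]]:
    """Return (node the resource ends up on, or "Stopped"; failcount per node)."""
    failcounts: dict[str, float] = {n: 0 for n in nodes}
    failcounts[failed_node] += 1  # the original failure of the running resource
    # First stop and restart in place, then try the remaining nodes.
    for node in [failed_node] + [n for n in nodes if n != failed_node]:
        if start_succeeds(node):
            return node, failcounts
        # With start-failure-is-fatal=true, a failed start bans the node.
        failcounts[node] = INFINITY
    # The resource could not start anywhere and stays Stopped until an
    # administrator clears the failcounts.
    return "Stopped", failcounts
```

For example, recover_after_failure(["node-1", "node-2"], "node-1", lambda node: False) returns "Stopped" with both failcounts set to inf, which is the situation this alerting policy is designed to catch.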

To import this alerting policy in your Google Cloud project by using the Google Cloud console, complete the following steps:

  1. In the Google Cloud console, go to the Integrations page:

    Go to Integrations

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. Filter for Google Cloud Agent for SAP, and then click View Details.
  3. Navigate to the Alerts tab.
  4. Select Pacemaker: Resource failed to start, and then click Show Options > Customize Alert Policy.
  5. (Optional) To configure alerting for one or more specific SAP systems instead of all SAP systems in your Google Cloud project, update the filter statement in the Query editor as follows:
    1. Remove the # character that precedes the sid variable.
    2. Specify the required SIDs. To specify multiple SIDs, separate the SIDs using the | character. The following is an example of what such a filter statement looks like:
      sid=~"ABC|HDB|XYZ"

      In this example, ABC, XYZ, and HDB are SIDs.

  6. (Optional) To customize the elapsed time before an alert is triggered, update the window statement in the Query editor to specify your preferred duration:
    1. For example, to set a time limit of 3 minutes, set:
      | window 3m
  7. Under Alert Details, navigate to the Notifications and name tab.
  8. Select the required notification channels.
  9. Review the alert and click Create Policy.
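The value in the window statement is an MQL duration, written as a number followed by a unit such as s, m, or h. If you compute the threshold in a script, the following Python sketch converts a datetime.timedelta into a window statement, using the largest of those units that represents the duration exactly. The helper name is hypothetical; check the result in the Query editor before you save the policy.

```python
from datetime import timedelta


def window_clause(duration: timedelta) -> str:
    """Format a timedelta as an MQL window statement, for example '| window 3m'."""
    total = duration.total_seconds()
    if total <= 0 or total != int(total):
        raise ValueError("window must be a positive whole number of seconds")
    seconds = int(total)
    # Prefer the largest unit that represents the duration exactly.
    for unit, size in (("h", 3600), ("m", 60)):
        if seconds % size == 0:
            return f"| window {seconds // size}{unit}"
    return f"| window {seconds}s"
```

For example, window_clause(timedelta(minutes=3)) returns | window 3m, and a 90-second duration returns | window 90s.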

Import alerting for SAP HANA replication errors

In the event of an outage on the SAP HANA primary site, an automated failover from the primary system to the secondary system isn't possible if the secondary system is not in sync with the primary.

To get notified about such a situation in your SAP HANA HA system running on Google Cloud, you can import the predefined alert policy SAP HANA Replication is not in sync.

This alerting policy sends a notification when the replication status of a highly available SAP HANA system is not in sync for more than one minute. The policy directs your SAP administrators to check the status and network connectivity of the primary and secondary SAP HANA systems. This policy uses the Process Monitoring metric workload.googleapis.com/sap/hana/ha/replication, which is derived from the systemReplication.py script.
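The policy is meant to fire only when the unsynchronized state persists, not on a single bad sample. The following Python sketch illustrates that duration logic on hypothetical (timestamp, in_sync) samples: it reports an alert when replication has been continuously out of sync for longer than one minute as of the latest sample. How the real metric encodes the replication status, and how Cloud Monitoring aligns samples into windows, are not described in this document, so both the sample format and this evaluation are assumptions.

```python
from datetime import datetime, timedelta


def out_of_sync_too_long(
    samples: list[tuple[datetime, bool]],
    threshold: timedelta = timedelta(minutes=1),
) -> bool:
    """Return True if replication has been continuously out of sync for longer
    than threshold, as of the latest sample.

    samples: (timestamp, in_sync) pairs sorted by timestamp (hypothetical format).
    """
    run_start = None  # timestamp of the first sample in the current out-of-sync run
    for ts, in_sync in samples:
        if in_sync:
            run_start = None  # replication recovered, so reset the timer
        elif run_start is None:
            run_start = ts
    if run_start is None:
        return False  # no samples, or the latest sample is in sync
    latest = samples[-1][0]
    return latest - run_start > threshold
```

A single out-of-sync sample between in-sync samples resets the timer, so short replication hiccups don't trigger a notification.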

To import this alerting policy in your Google Cloud project by using the Google Cloud console, complete the following steps:

  1. In the Google Cloud console, go to the Integrations page:

    Go to Integrations

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. Filter for Google Cloud Agent for SAP, and then click View Details.
  3. Navigate to the Alerts tab.
  4. Select SAP HANA Replication is not in sync, and then click Show Options > Customize Alert Policy.
  5. (Optional) To configure alerting for one or more specific SAP systems instead of all SAP systems in your Google Cloud project, update the filter statement in the Query editor as follows:
    1. Remove the # character that precedes the sid variable.
    2. Specify the required SIDs. To specify multiple SIDs, separate the SIDs using the | character. The following is an example of what such a filter statement looks like:
      sid=~"ABC|HDB|XYZ"

      In this example, ABC, XYZ, and HDB are SIDs.

  6. (Optional) To customize the elapsed time before an alert is triggered, update the window statement in the Query editor to specify your preferred duration:
    1. For example, to set a time limit of 3 minutes, set:
      | window 3m
  7. Under Alert Details, navigate to the Notifications and name tab.
  8. Select the required notification channels.
  9. Review the alert and click Create Policy.
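If you keep a copy of a policy's query under version control, you can apply steps 5 and 6 to the text with a script instead of editing it by hand. The following Python sketch assumes that the query contains a commented-out line that starts with # and holds a sid=~"..." expression, plus a | window line. The helper name and the sample query are hypothetical; the sample is not the text of the predefined policy.

```python
import re

# Matches a regular-expression filter on the sid label, for example sid=~"SID".
SID_EXPR = re.compile(r'sid\s*=~\s*"[^"]*"')
# Matches a window statement and captures everything up to the duration.
WINDOW_EXPR = re.compile(r"(\|\s*window\s+)\S+")


def customize_query(query: str, sids: list[str], window: str) -> str:
    """Uncomment the SID filter, set its SIDs, and replace the window duration."""
    sid_filter = 'sid=~"' + "|".join(sids) + '"'
    out = []
    for line in query.splitlines():
        body = line.lstrip()
        indent = line[: len(line) - len(body)]
        if body.startswith("#") and SID_EXPR.search(body):
            # Step 5: remove the '#' and substitute the requested SIDs.
            body = SID_EXPR.sub(lambda _m: sid_filter, body[1:].lstrip())
        # Step 6: replace whatever duration follows "| window".
        body = WINDOW_EXPR.sub(lambda m: m.group(1) + window, body)
        out.append(indent + body)
    return "\n".join(out)


# Hypothetical query text; the predefined policy's actual query may differ.
SAMPLE_QUERY = """fetch gce_instance
| metric 'workload.googleapis.com/sap/hana/ha/replication'
# | filter metric.sid=~"SID"
| window 1m"""
```

Calling customize_query(SAMPLE_QUERY, ["ABC", "HDB"], "3m") enables the filter line as | filter metric.sid=~"ABC|HDB" and changes the window statement to | window 3m.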

Create a custom alerting policy

In addition to importing predefined alert policies, you can modify them to create custom alerting policies that suit your requirements. To do so, you can use the Google Cloud console, the Cloud Monitoring API, the Google Cloud CLI, or Terraform.

As a starting point, we recommend that you review the summary of example alerting policies as well as the preconfigured alerting policies described in this document.
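If you create policies with the Cloud Monitoring API, the request body is an AlertPolicy resource. The following Python sketch assembles a minimal body with a single MQL condition (the conditionMonitoringQueryLanguage field) and prints it as JSON, which you could then send to the projects.alertPolicies.create method. The function name, display name, project ID, and channel ID are placeholders, and the query is expected to be one that you copied from the Query editor; verify the fields against the AlertPolicy API reference before you use this.

```python
import json


def build_mql_alert_policy(
    display_name: str,
    query: str,
    duration_seconds: int,
    notification_channels: list[str],
    documentation: str = "",
) -> dict:
    """Return an AlertPolicy request body (REST/JSON form) with one MQL condition."""
    policy = {
        "displayName": display_name,
        "combiner": "OR",  # only one condition, so the combiner has no practical effect
        "conditions": [
            {
                "displayName": display_name,
                "conditionMonitoringQueryLanguage": {
                    "query": query,
                    # Durations in the JSON API are seconds with an "s" suffix.
                    "duration": f"{duration_seconds}s",
                },
            }
        ],
        "notificationChannels": notification_channels,
    }
    if documentation:
        policy["documentation"] = {"content": documentation, "mimeType": "text/markdown"}
    return policy


if __name__ == "__main__":
    body = build_mql_alert_policy(
        display_name="Custom: SAP HANA replication not in sync",  # placeholder
        query="<paste the MQL query from the Query editor here>",
        duration_seconds=60,
        notification_channels=["projects/PROJECT_ID/notificationChannels/CHANNEL_ID"],
    )
    print(json.dumps(body, indent=2))
```

Keeping the query as an input, rather than hard-coding it, lets you start from a predefined policy's query and apply only the SID and window changes described earlier.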

For information about how to manage or modify alerting policies, see Manage alerting policies.