Create metrics-based alerting policies for SAP on Google Cloud

To monitor your SAP systems on Google Cloud, you can set up Cloud Monitoring alerting policies that notify your SAP administrators about potential misconfigurations or resource failures.

This document describes some common high-availability (HA) issues and shows how you can create alerting policies for them. You can also use the example alerts as a reference to create your own custom alerts. The example alerts described in this document use the Monitoring Query Language (MQL) to query metrics generated by Google Cloud's Agent for SAP. By default, these alerts apply to all SAP systems in a given Google Cloud project, but you can customize them to filter for specific SAP system IDs (SIDs) or to adjust the elapsed time before the alert is triggered.

For information about how Cloud Monitoring alerts work, see Alerting overview.

Before you begin

  • Ensure that you're familiar with the general concepts of Monitoring alerting policies. For information about alerting policies, see Alerting overview.

  • On each instance that hosts the SAP system that you want to monitor, make sure that Google Cloud's Agent for SAP is installed and configured to collect the Process Monitoring metrics.

  • To get the permissions that you need to create and modify alerting policies by using the Google Cloud console, ask your administrator to grant you the following IAM roles on your project:

    For more information about granting roles, see Manage access.

    You might also be able to get the required permissions through custom roles or other predefined roles.

  • To receive the alerts, create the required notification channels. For redundancy purposes, we recommend that you create multiple notification channels. For more information, see Create and manage notification channels.

Import predefined alert policies

Google Cloud provides predefined alerting policies that you can import to set up alerting for some common HA issues. For more information, see the following sections:

Import alerting for location constraint detection

When you manually move a resource in a Pacemaker cluster by using cluster commands, Pacemaker adds a location constraint (a client preference) to that resource that favors a particular node. This constraint can prevent the resource from failing over during a system outage. For more information, see the Moving One Resource section of the ClusterLabs documentation.
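
If you want to check a cluster for leftover constraints of this kind by hand, the following Python sketch scans the constraints section of the cluster information base (CIB) for location constraints (rsc_location elements) whose IDs start with cli-prefer- or cli-ban-. It assumes that your Pacemaker version uses these ID prefixes for constraints created by manual move and ban operations; verify this on your own cluster. The function name, sample XML, resource names, and node names are hypothetical. On a cluster node, you can typically obtain the real XML by running cibadmin --query --scope constraints.

```python
import xml.etree.ElementTree as ET

# Assumed ID prefixes for constraints created by manual move (prefer) and
# ban operations. Verify them against your Pacemaker version.
MANUAL_PREFIXES = ("cli-prefer-", "cli-ban-")


def find_manual_location_constraints(cib_xml: str) -> list:
    """Return rsc_location constraints whose IDs match MANUAL_PREFIXES."""
    root = ET.fromstring(cib_xml.strip())
    found = []
    # iter() includes the root element and all descendants, so this works on
    # both the full CIB and the <constraints> section alone.
    for loc in root.iter("rsc_location"):
        constraint_id = loc.get("id", "")
        if constraint_id.startswith(MANUAL_PREFIXES):
            found.append(
                {
                    "id": constraint_id,
                    "resource": loc.get("rsc", ""),
                    "node": loc.get("node", ""),
                    "score": loc.get("score", ""),
                }
            )
    return found


# Hypothetical output of: cibadmin --query --scope constraints
SAMPLE_CONSTRAINTS = """
<constraints>
  <rsc_location id="cli-prefer-rsc_SAPHana_HDB" rsc="rsc_SAPHana_HDB"
                role="Started" node="hana-node-1" score="INFINITY"/>
  <rsc_location id="loc_fence_hana-node-2" rsc="STONITH-hana-node-2"
                node="hana-node-2" score="-INFINITY"/>
</constraints>
"""

if __name__ == "__main__":
    for c in find_manual_location_constraints(SAMPLE_CONSTRAINTS):
        print(f"{c['id']}: resource={c['resource']} node={c['node']} score={c['score']}")
```

Only the first sample constraint matches, because the second one has an ID that does not come from a manual move or ban.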

To get notified about such a situation in your SAP HA system running on Google Cloud, you can import the predefined alert policy Pacemaker: Location constraint detected.

This alerting policy sends a notification when a preference-based constraint is detected, and refers your SAP administrators to the "Unintentional node affinity that favors a particular node" section of the Troubleshooting high-availability configurations for SAP guide. This policy uses the Process Monitoring metric workload.googleapis.com/sap/validation/pacemaker, which is collected by Google Cloud's Agent for SAP.

To import this alerting policy in your Google Cloud project by using the Google Cloud console, complete the following steps:

  1. In the Google Cloud console, go to the Integrations page:

    Go to Integrations

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. Filter for Google Cloud Agent for SAP, and then click View Details.
  3. Navigate to the Alerts tab.
  4. Select Pacemaker: Location constraint detected, and then click Show Options > Customize Alert Policy.
  5. (Optional) To configure alerting for one or more specific SAP systems instead of all SAP systems in your Google Cloud project, update the filter statement in the Query editor as follows:
    1. Remove the # character that comments out the filter statement.
    2. Specify the required SIDs. To specify multiple SIDs, separate them with the | character. The following example shows such a filter statement:
      | filter (metric.sid =~ 'ABC|XYZ|HDB')

      In this example, ABC, XYZ, and HDB are SIDs.

  6. (Optional) To customize how long a condition must persist before an alert is triggered, update the window statement in the Query editor with your preferred duration. For example, to set a time limit of 3 minutes, use the following statement:
      | window 3m
  7. Under Alert Details, navigate to the Notifications and name tab.
  8. Select the required notification channels.
  9. Review the alert and click Create Policy.
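
If you maintain several variants of these policies, you might generate the optional filter and window statements instead of typing them. The following Python sketch builds both statements. It assumes that a SID is three characters, starts with an uppercase letter, and contains only uppercase letters and digits, and it accepts only durations in whole seconds, minutes, or hours (for example, 3m); MQL supports other units that this sketch deliberately rejects. The function names are hypothetical.

```python
import re

# Assumption: an SAP SID is three characters, starts with an uppercase
# letter, and contains only uppercase letters and digits.
SID_PATTERN = re.compile(r"[A-Z][A-Z0-9]{2}")

# This sketch accepts only whole seconds, minutes, or hours, such as "3m".
DURATION_PATTERN = re.compile(r"[1-9][0-9]*[smh]")


def sid_filter_statement(sids: list) -> str:
    """Build an MQL filter statement that matches any of the given SIDs."""
    if not sids:
        raise ValueError("specify at least one SID")
    for sid in sids:
        if not SID_PATTERN.fullmatch(sid):
            raise ValueError(f"not a valid SAP SID: {sid!r}")
    # =~ is a regular-expression match, so alternatives are joined with "|".
    pattern = "|".join(sids)
    return f"| filter (metric.sid =~ '{pattern}')"


def window_statement(duration: str) -> str:
    """Build an MQL window statement such as '| window 3m'."""
    if not DURATION_PATTERN.fullmatch(duration):
        raise ValueError(f"unsupported duration: {duration!r}")
    return f"| window {duration}"


if __name__ == "__main__":
    print(sid_filter_statement(["ABC", "XYZ", "HDB"]))  # | filter (metric.sid =~ 'ABC|XYZ|HDB')
    print(window_statement("3m"))  # | window 3m
```

Validating the SIDs before joining them keeps stray regular-expression characters out of the filter, where they would silently change which systems match.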

Import alerting for resource failure detection

In your HA system, if a running resource agent fails, then Pacemaker attempts to stop that agent and restart it. If the restart operation fails for any reason, then Pacemaker sets that resource agent's failcount value to INFINITY (if start-failure-is-fatal is set to true, which is the default) and then attempts to start the agent on a different node. If the resource agent fails to start on all nodes, then the resource agent remains in the Stopped status. To restore this resource agent to an operational state, an SAP administrator must manually clear the resource agent's failcount. For more information about the failcount behavior of Pacemaker, see the ClusterLabs documentation.
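
To make this sequence concrete, the following Python sketch models it for the default case, where start-failure-is-fatal is true. It is a simplified illustration, not Pacemaker's actual scheduler: it tracks only start failures, tries the remaining nodes in list order instead of by constraint scores, and the function name, node names, and outcomes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

INFINITY = float("inf")  # stands in for Pacemaker's INFINITY failcount


def recover_after_failure(
    current_node: str, nodes: List[str], start_succeeds: Dict[str, bool]
) -> Tuple[str, Optional[str], Dict[str, float]]:
    """Simplified model of recovery when start-failure-is-fatal is true.

    Returns the final status, the node the resource runs on (or None), and
    the start-failure counts per node.
    """
    failcounts = {node: 0.0 for node in nodes}
    # First restart in place, then try the remaining nodes in list order.
    candidates = [current_node] + [n for n in nodes if n != current_node]
    for node in candidates:
        if start_succeeds.get(node, False):
            return "Started", node, failcounts
        # A failed start sets the failcount to INFINITY, which keeps the
        # resource off this node until the failcount is cleared.
        failcounts[node] = INFINITY
    # No node could start the resource, so it stays Stopped.
    return "Stopped", None, failcounts


if __name__ == "__main__":
    # Hypothetical outcomes: the start fails on both nodes.
    status, node, counts = recover_after_failure(
        "hana-node-1", ["hana-node-1", "hana-node-2"], {}
    )
    print(status, node, counts)
```

In the example, every node ends with an INFINITY failcount, which is the state that requires the manual cleanup described above.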

To get notified about such a situation in your SAP HA system running on Google Cloud, you can import the predefined alert policy Pacemaker: Resource failed to start.

This alerting policy sends a notification when a resource agent fails to start and remains in the Stopped status for more than 3 minutes. This policy refers your SAP administrators to the "Resource agent is stopped" section of the Troubleshooting high-availability configurations for SAP guide. This policy uses the Process Monitoring metric workload.googleapis.com/sap/cluster/failcounts, which is collected by Google Cloud's Agent for SAP.
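
The "for more than 3 minutes" part of this condition means that a notification is sent only when the failure persists, not on a single bad sample. The following Python sketch illustrates that idea with a simplified check over timestamped samples. Cloud Monitoring's actual evaluation uses alignment windows and its own retest logic, so treat this as an approximation; the function name and the sampled status values are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


def held_longer_than(
    samples: List[Tuple[datetime, str]],
    predicate: Callable[[str], bool],
    threshold: timedelta,
) -> bool:
    """Return True if the trailing run of samples matching predicate spans
    more than threshold. Samples must be sorted by timestamp."""
    run_start = None
    for timestamp, value in samples:
        if predicate(value):
            if run_start is None:
                run_start = timestamp  # a new matching run begins
        else:
            run_start = None  # any non-matching sample resets the run
    if run_start is None:
        return False
    return samples[-1][0] - run_start > threshold


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    # Hypothetical statuses sampled once per minute.
    statuses = ["Started", "Stopped", "Stopped", "Stopped", "Stopped", "Stopped"]
    samples = [(t0 + timedelta(minutes=i), s) for i, s in enumerate(statuses)]
    print(held_longer_than(samples, lambda s: s == "Stopped", timedelta(minutes=3)))  # True
```

Requiring the condition to persist filters out short-lived states, such as a resource that is briefly stopped while Pacemaker restarts it.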

To import this alerting policy in your Google Cloud project by using the Google Cloud console, complete the following steps:

  1. In the Google Cloud console, go to the Integrations page:

    Go to Integrations

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. Filter for Google Cloud Agent for SAP, and then click View Details.
  3. Navigate to the Alerts tab.
  4. Select Pacemaker: Resource failed to start, and then click Show Options > Customize Alert Policy.
  5. (Optional) To configure alerting for one or more specific SAP systems instead of all SAP systems in your Google Cloud project, update the filter statement in the Query editor as follows:
    1. Remove the # character that comments out the filter statement.
    2. Specify the required SIDs. To specify multiple SIDs, separate them with the | character. The following example shows such a filter statement:
      | filter (metric.sid =~ 'ABC|XYZ|HDB')

      In this example, ABC, XYZ, and HDB are SIDs.

  6. (Optional) To customize how long a condition must persist before an alert is triggered, update the window statement in the Query editor with your preferred duration. For example, to set a time limit of 3 minutes, use the following statement:
      | window 3m
  7. Under Alert Details, navigate to the Notifications and name tab.
  8. Select the required notification channels.
  9. Review the alert and click Create Policy.

Create a custom alerting policy

In addition to importing predefined alerting policies, you can modify them or create your own to suit your requirements. To do so, you can use the Google Cloud console, the Cloud Monitoring API, the Google Cloud CLI, or Terraform.
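
If you use the Cloud Monitoring API, a custom policy is an AlertPolicy resource that you send in a POST request to https://monitoring.googleapis.com/v3/projects/PROJECT_ID/alertPolicies. The following Python sketch builds such a request body with a single MQL condition. It only assembles JSON and does not call the API. The function name, query, project ID, and channel IDs are hypothetical placeholders; in particular, the query is not the one used by the predefined policies, and the gce_instance resource type is an assumption. Check the field names against the AlertPolicy reference before use.

```python
import json
from typing import List, Optional


def build_mql_alert_policy(
    display_name: str,
    query: str,
    duration_seconds: int,
    channel_names: List[str],
    documentation: Optional[str] = None,
) -> dict:
    """Assemble an AlertPolicy request body (JSON form) with one MQL condition."""
    if not channel_names:
        raise ValueError("specify at least one notification channel")
    policy = {
        "displayName": display_name,
        # With a single condition, OR and AND behave the same.
        "combiner": "OR",
        "conditions": [
            {
                "displayName": display_name,
                "conditionMonitoringQueryLanguage": {
                    "query": query,
                    # How long the condition must be met before an incident
                    # opens; JSON encodes durations as strings such as "180s".
                    "duration": f"{duration_seconds}s",
                },
            }
        ],
        # Full resource names: projects/PROJECT_ID/notificationChannels/CHANNEL_ID
        "notificationChannels": list(channel_names),
    }
    if documentation:
        policy["documentation"] = {"content": documentation, "mimeType": "text/markdown"}
    return policy


# Hypothetical query; it is not the query that the predefined policies use.
EXAMPLE_QUERY = "\n".join(
    [
        "fetch gce_instance",
        "| metric 'workload.googleapis.com/sap/cluster/failcounts'",
        "| filter (metric.sid =~ 'ABC|XYZ|HDB')",
        "| window 3m",
        "| condition val() > 0",
    ]
)

if __name__ == "__main__":
    body = build_mql_alert_policy(
        "Pacemaker: resource failcount (custom)",
        EXAMPLE_QUERY,
        180,
        [
            # Two channels, following the redundancy recommendation.
            "projects/PROJECT_ID/notificationChannels/CHANNEL_ID_1",
            "projects/PROJECT_ID/notificationChannels/CHANNEL_ID_2",
        ],
        documentation="See Troubleshooting high-availability configurations for SAP.",
    )
    print(json.dumps(body, indent=2))
```

The builder refuses an empty channel list so that a policy cannot be created without anyone being notified.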

As a starting point, we recommend that you review the summary of example alerting policies as well as the preconfigured alerting policies described in this document.

For information about how to manage or modify alerting policies, see Manage alerting policies.