SAP on Google Cloud

Analyze Pacemaker events in Cloud Logging - Part 3

April 4, 2022
Cherry Legler

Senior Technical Solution Engineer

Customers deploying SAP on Google Cloud often use Pacemaker for high availability to support their most critical systems. Let’s take a look at how you can use Cloud Logging to conduct root cause analysis of Pacemaker clusters.

When multiple Pacemaker clusters run on Google Cloud, a central logging location can store the Pacemaker logs and offers an easy way to analyze Pacemaker events such as fencing or resource failover.

The Ops Agent is the primary agent for collecting telemetry from your Compute Engine instances. Combining logging and metrics into a single agent, the Ops Agent uses Fluent Bit for logs, which supports high-throughput logging, and the OpenTelemetry Collector for metrics.

Install the Agent

Follow this guide to install the Ops Agent on a single VM via command line or using the Google Cloud Console. To install the agent to multiple VMs, use gcloud or automation tools. Ensure your VM doesn't have the legacy Cloud Logging Agent or Cloud Monitoring Agent installed on it.

Configure the Agent

By default, the Ops Agent’s built-in configuration collects file-based syslog logs. Pacemaker resource agents such as SAPHana write their logs to the system log /var/log/messages on the SAP-certified operating systems SUSE and Red Hat.

Add the configuration elements below to the user configuration file /etc/google-cloud-ops-agent/config.yaml to stream Pacemaker logs to Cloud Logging. The paths defined below cover all default log files that Pacemaker writes to on SUSE and Red Hat.

pacemaker-log is the receiver ID, which defines the logName "projects/[PROJECT_ID]/logs/pacemaker-log" of the log entries streamed to Cloud Logging.

Note: If the logging section already contains other configuration, add only the new elements (the pacemaker-log receiver and the pacemaker-pipeline pipeline) under the existing keys.


logging:
  receivers:
    pacemaker-log:
      type: files
      include_paths:
      - /var/log/pacemaker.log
      - /var/log/cluster/corosync.log
      - /var/log/pacemaker/pacemaker.log
  service:
    pipelines:
      pacemaker-pipeline:
        receivers: [pacemaker-log]


Restart the agent 

Restart the agent to apply the user-specified configuration, for example with sudo systemctl restart google-cloud-ops-agent.

Validate the agent

Check the logging module log /var/log/google-cloud-ops-agent/subagents/logging-module.log to confirm that the Pacemaker logs are being collected. You should see entries that list the Pacemaker log files. Follow the troubleshooting guide if any issues occur.
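As a rough aid for this check, the sketch below reports which of the configured Pacemaker log paths are mentioned anywhere in the logging module log. It assumes that the agent names each file it tails in that log; the helper name paths_mentioned is hypothetical and not part of the Ops Agent. Only the paths that exist on your operating system are expected to show up.

```python
from pathlib import Path
from typing import Dict, Iterable

# The include_paths from the pacemaker-log receiver in config.yaml.
PACEMAKER_LOG_PATHS = (
    "/var/log/pacemaker.log",
    "/var/log/cluster/corosync.log",
    "/var/log/pacemaker/pacemaker.log",
)


def paths_mentioned(log_text: str,
                    paths: Iterable[str] = PACEMAKER_LOG_PATHS) -> Dict[str, bool]:
    """Map each configured path to whether it appears in the given log text."""
    return {path: path in log_text for path in paths}


if __name__ == "__main__":
    module_log = Path("/var/log/google-cloud-ops-agent/subagents/logging-module.log")
    # Only read the file when it is present (it exists on VMs running the Ops Agent).
    if module_log.is_file():
        text = module_log.read_text(errors="replace")
        for path, seen in paths_mentioned(text).items():
            print(f"{path}: {'mentioned' if seen else 'not mentioned'}")
```

Reading the log may require root privileges. A path reported as not mentioned is not necessarily an error, because each distribution writes to only some of the three locations.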

Validate Cloud Logging

In the Logs Explorer of Cloud Logging, filter on logName="projects/PROJECT_ID/logs/pacemaker-log" (replace PROJECT_ID with your project ID) to validate that the Pacemaker logs are being streamed there.
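If you manage several projects, the validation filter can be generated rather than typed. The following is a minimal sketch: the function name pacemaker_log_filter and the project ID "my-sap-project" are placeholders, and the resource.type line assumes the logs come from Compute Engine VMs.

```python
def pacemaker_log_filter(project_id: str) -> str:
    """Build a Logs Explorer filter for the pacemaker-log stream of one project."""
    if not project_id:
        raise ValueError("project_id must not be empty")
    # Each line is one comparison; Logs Explorer combines the lines with AND.
    return "\n".join([
        'resource.type="gce_instance"',  # assumption: the cluster nodes are Compute Engine VMs
        # The log name follows from the receiver ID "pacemaker-log" in config.yaml.
        f'logName="projects/{project_id}/logs/pacemaker-log"',
    ])


# "my-sap-project" is a placeholder project ID.
print(pacemaker_log_filter("my-sap-project"))
```

Paste the printed text into the query field of the Logs Explorer.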

Now you can use the Logs Explorer in Cloud Logging to analyze Pacemaker events. A sample log filter for this purpose restricts the entries to the two cluster nodes (INSTANCE_NAME1 and INSTANCE_NAME2, replaced with the actual instance names) and captures the critical Pacemaker actions and events:

  • Actions of the cluster nodes and cluster resources, such as start, stop or promote

  • Failed resource operations, such as start, stop or promote

  • Fencing actions, their reasons (loss of cluster nodes, resource failure etc.) and their results

  • Corosync communication errors

  • Cluster membership changes, such as members joining or leaving
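The filter described above can be sketched as follows. This is not the original sample filter: the function name pacemaker_event_filter is hypothetical and the keyword list is an illustrative assumption that you should tune to the messages your Pacemaker and Corosync versions actually write. The sketch also assumes that the Ops Agent stores each log line in jsonPayload.message and labels entries with the instance name under compute.googleapis.com/resource_name.

```python
from typing import Sequence

# Illustrative keywords only (assumption), not an exhaustive or official list.
DEFAULT_KEYWORDS = ("stonith", "fence", "promote", "demote", "membership")


def pacemaker_event_filter(project_id: str,
                           instance_names: Sequence[str],
                           keywords: Sequence[str] = DEFAULT_KEYWORDS) -> str:
    """Build a Logs Explorer filter for Pacemaker events on the given cluster nodes."""
    if not instance_names:
        raise ValueError("at least one instance name is required")
    if not keywords:
        raise ValueError("at least one keyword is required")
    for keyword in keywords:
        # Keep keywords plain so that no regex or quote escaping is needed.
        if not keyword.isalnum():
            raise ValueError(f"keyword must be alphanumeric: {keyword!r}")

    label = 'labels."compute.googleapis.com/resource_name"'
    nodes = " OR ".join(f'{label}="{name}"' for name in instance_names)
    pattern = "|".join(keywords)
    # Each line is one condition; Logs Explorer combines the lines with AND.
    return "\n".join([
        f'logName="projects/{project_id}/logs/pacemaker-log"',
        f"({nodes})",
        f'jsonPayload.message=~"({pattern})"',  # =~ is a regular expression match
    ])


# Placeholder project ID and node names.
print(pacemaker_event_filter("my-sap-project", ["INSTANCE_NAME1", "INSTANCE_NAME2"]))
```

Generating the filter from a node list keeps it consistent across clusters; narrowing or widening the keyword list trades noise against coverage.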

Now that the Pacemaker logs from all your clusters are stored in Cloud Logging, you can analyze Pacemaker events from any of your clusters in one central place. If you need further support from the Google Cloud Customer Care team, you also save the time and effort of collecting the logs and transferring them to the support agent.

Learn more about running SAP on Google Cloud in our public documentation. If you are interested in learning more about running SAP on Google Cloud with Pacemaker, read the other blogs in this series.

Watch these two tutorial videos to get hands-on instructions.
