Analyze Pacemaker events in Cloud Logging - Part 3
Cherry Legler
Senior Technical Solution Engineer
Customers deploying SAP on Google Cloud often leverage Pacemaker for high availability to support their most critical systems. Let’s take a look at how you can use Cloud Logging to easily conduct root cause analysis of Pacemaker clusters.
When multiple Pacemaker clusters run on Google Cloud, a central logging location stores the Pacemaker logs in one place and offers an easy way to analyze Pacemaker events such as fencing or resource failover.
The Ops Agent is the primary agent for collecting telemetry from your Compute Engine instances. Combining logging and metrics into a single agent, the Ops Agent uses Fluent Bit for logs, which supports high-throughput logging, and the OpenTelemetry Collector for metrics.
Install the Agent
Follow this guide to install the Ops Agent on a single VM via the command line or the Google Cloud Console. To install the agent on multiple VMs, use gcloud or automation tools. Ensure your VM doesn't have the legacy Cloud Logging agent or Cloud Monitoring agent installed.
Configure the Agent
By default, the Ops Agent’s built-in configuration collects file-based syslog logs. Pacemaker resource agents such as SAPHana write logs to the system log /var/log/messages on the SAP-certified operating systems SUSE and Red Hat.
Add the configuration elements below to the user configuration file /etc/google-cloud-ops-agent/config.yaml to stream Pacemaker logs to Cloud Logging. The paths defined below cover all default log files that Pacemaker writes to on SUSE and Red Hat.
pacemaker-log is the receiver ID, which defines the logName "projects/[PROJECT_ID]/logs/pacemaker-log" of the log entries streamed to Cloud Logging.
Note: If the logging section already contains other configuration, add only the new elements (the pacemaker-log receiver and the pacemaker-pipeline pipeline) to the existing sections.
logging:
  receivers:
    pacemaker-log:
      type: files
      include_paths:
        - /var/log/pacemaker.log
        - /var/log/cluster/corosync.log
        - /var/log/pacemaker/pacemaker.log
  service:
    pipelines:
      pacemaker-pipeline:
        receivers: [pacemaker-log]
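When config.yaml already defines other receivers or pipelines, the Pacemaker entries have to be merged in rather than pasted over the file. The following is a minimal sketch of that merge, assuming the file has already been parsed into a Python dict (for example with a YAML library, which is not shown here). The helper name add_pacemaker_logging is hypothetical; the receiver ID, pipeline ID, and paths are the ones from the configuration above.

```python
import copy

# Receiver definition taken from the configuration shown in this post.
PACEMAKER_RECEIVER = {
    "type": "files",
    "include_paths": [
        "/var/log/pacemaker.log",
        "/var/log/cluster/corosync.log",
        "/var/log/pacemaker/pacemaker.log",
    ],
}


def add_pacemaker_logging(config):
    """Return a copy of `config` with the Pacemaker receiver and pipeline added.

    Hypothetical helper: `config` is the parsed content of config.yaml as a
    dict. Existing receivers and pipelines are left untouched, and the input
    dict is not modified.
    """
    merged = copy.deepcopy(config) if config else {}
    logging_section = merged.setdefault("logging", {})
    receivers = logging_section.setdefault("receivers", {})
    receivers["pacemaker-log"] = copy.deepcopy(PACEMAKER_RECEIVER)
    service = logging_section.setdefault("service", {})
    pipelines = service.setdefault("pipelines", {})
    pipelines["pacemaker-pipeline"] = {"receivers": ["pacemaker-log"]}
    return merged


if __name__ == "__main__":
    # Illustrative existing configuration with one unrelated receiver.
    existing = {
        "logging": {
            "receivers": {
                "app-log": {"type": "files", "include_paths": ["/var/log/app.log"]}
            },
            "service": {"pipelines": {"app-pipeline": {"receivers": ["app-log"]}}},
        }
    }
    print(add_pacemaker_logging(existing))
```

The sketch only adds keys, so an existing receiver or pipeline with a different ID keeps working as before.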
Restart the agent
Restart the agent to apply the user-specified configuration, for example with sudo systemctl restart google-cloud-ops-agent on systemd-based distributions.
Validate the agent
Check the logging module log, /var/log/google-cloud-ops-agent/subagents/logging-module.log, to confirm that the Pacemaker log files are being collected. It should contain entries that list the Pacemaker log paths. Follow the troubleshooting guide for any issues.
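The check can also be scripted. The following is a minimal sketch that scans the logging module log for the configured Pacemaker paths. It makes no assumption about the exact format of the entries and only looks for the path strings. The function name paths_mentioned is hypothetical. On any one distribution only some of the three paths exist, so a "missing" result for the others is expected.

```python
from pathlib import Path

# Log file of the Ops Agent logging module, as named in this post.
MODULE_LOG = "/var/log/google-cloud-ops-agent/subagents/logging-module.log"

# Paths from the pacemaker-log receiver configuration.
PACEMAKER_PATHS = [
    "/var/log/pacemaker.log",
    "/var/log/cluster/corosync.log",
    "/var/log/pacemaker/pacemaker.log",
]


def paths_mentioned(log_text, paths=PACEMAKER_PATHS):
    """Map each path to True if it appears anywhere in the log text."""
    return {path: path in log_text for path in paths}


if __name__ == "__main__":
    log_file = Path(MODULE_LOG)
    if log_file.exists():
        text = log_file.read_text(errors="replace")
        for path, seen in paths_mentioned(text).items():
            print(f"{'found' if seen else 'missing'}: {path}")
    else:
        print(f"{MODULE_LOG} not found; is the Ops Agent installed?")
```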
Validate cloud logging
In the Cloud Logging Logs Explorer, filter on logName="projects/PROJECT_ID/logs/pacemaker-log" (replace PROJECT_ID with your project ID) to validate that the Pacemaker logs are being streamed there.
Now you can use the Logs Explorer to analyze Pacemaker events. The sample log filter for this step isolates the critical Pacemaker actions and events. Replace INSTANCE_NAME1 and INSTANCE_NAME2 in it with the actual instance names of the two cluster nodes. The filter captures:
Actions of the cluster nodes, cluster resources such as start, stop or promote
Failed resource operations, such as start, stop or promote
Fencing actions, reasons (loss of cluster nodes, resource failure etc.) and results
Corosync communication errors
Cluster membership changes, member joins or leaves
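A filter of this kind can be assembled programmatically, which helps when the same query is reused across many clusters. The following is a sketch under stated assumptions, not the exact filter from this post: it assumes that the Ops Agent stores each log line in jsonPayload.message and the instance name in the label compute.googleapis.com/resource_name, and the message patterns in the usage example are illustrative placeholders to adapt to the messages in your own Pacemaker logs. The function names are hypothetical.

```python
def log_name_filter(project_id):
    """Filter that selects entries written by the pacemaker-log receiver."""
    return f'logName="projects/{project_id}/logs/pacemaker-log"'


def event_filter(project_id, instance_names, patterns):
    """Build a Logs Explorer filter for selected events on the given nodes.

    Assumptions: the instance name is in the label
    compute.googleapis.com/resource_name and the log line is in
    jsonPayload.message. Patterns are regular expressions and must not
    contain unescaped double quotes.
    """
    instances = " OR ".join(
        f'labels."compute.googleapis.com/resource_name"="{name}"'
        for name in instance_names
    )
    messages = " OR ".join(
        f'jsonPayload.message=~"{pattern}"' for pattern in patterns
    )
    # Lines in a Logging query are combined with an implicit AND.
    return "\n".join(
        [log_name_filter(project_id), f"({instances})", f"({messages})"]
    )


if __name__ == "__main__":
    # Placeholder patterns for illustration only; adjust to your log messages.
    example_patterns = ["fenc", "stonith", "TOTEM", "membership"]
    print(event_filter("PROJECT_ID", ["INSTANCE_NAME1", "INSTANCE_NAME2"], example_patterns))
```

The printed text can be pasted into the Logs Explorer query box after replacing the placeholders.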
Now that Pacemaker logs from all your clusters are stored in Cloud Logging, you can analyze Pacemaker events on any of your clusters in one central place. If you need further support from the Google Cloud Customer Care team, you also save the time and effort of collecting and transferring logs to the support agent.
Learn more about running SAP on Google Cloud in our public documentation. If you are interested in learning more about running SAP on Google Cloud with Pacemaker, read the other blogs in this series here:
Using Pacemaker for SAP high availability on Google Cloud - Part 1
What’s happening in your SAP systems? Find out with Pacemaker Alerts - Part 2
Watch these two tutorial videos to get hands-on instructions.