High-availability planning guide for SAP NetWeaver on Google Cloud

This guide provides an overview of the options, recommendations, and general concepts that you need to know before you deploy a high-availability (HA) SAP NetWeaver system on Google Cloud.

This guide assumes that you already have an understanding of the concepts and practices that are generally required to implement an SAP NetWeaver high-availability system. Therefore, the guide focuses primarily on what you need to know to implement such a system on Google Cloud.

If you need to know more about the general concepts and practices that are required to implement an SAP NetWeaver HA system, see:

This planning guide focuses solely on HA for SAP NetWeaver and does not cover HA for database systems. For information about HA for SAP HANA, see the SAP HANA high availability planning guide.

Deployment architecture

The following diagram shows a basic Linux HA cluster that uses Pacemaker cluster software.

The cluster includes two hosts: a primary host and a secondary host. Each host is located in a different zone within the same region.

The active Central Services instance and the inactive Enqueue Replication Server (ERS) instance are on the primary host. The active ERS instance and the inactive Central Services instance are on the secondary host. Each Central Services and ERS pair has its own virtual IP address (VIP). In the diagram, "Central Services" represents either ABAP SAP Central Services or, for a Java stack, SAP Central Services.

A basic HA setup for SAP NetWeaver on Google Cloud with two hosts, each in a different zone

The high availability of Google Cloud infrastructure

Google Cloud is highly available by design, with a redundant infrastructure of data centers around the world that contain zones designed to be independent from each other. Zones usually have power, cooling, networking, and control planes that are isolated from other zones. If a single failure event were to occur, in most cases, it would affect only a single zone.

In some cases, you might meet your availability requirements without implementing all of the traditional, on-premises safeguards against hardware, storage, and networking failures, which could save you both time and money.

Before you design and implement your high-availability strategy on Google Cloud, review the Google Cloud Service Level Agreements.

For general information about the reliability, privacy, and security of Google Cloud, see Trusted Infrastructure.

HA clustering options for SAP systems on Google Cloud

You define a high-availability (HA) cluster for SAP NetWeaver on Google Cloud by using the same types of third-party HA cluster software that you might use in an on-premises installation. The HA cluster software monitors the health of the systems and manages the failover when problems occur.

You can use a number of different HA cluster software solutions, such as the following:

  • Red Hat Enterprise Linux (RHEL) for SAP Solutions
  • SUSE Linux Enterprise Server (SLES) for SAP Applications
  • Windows Server Failover Clustering

Linux HA clustering software

Recent versions of both RHEL and SLES include integrated HA support that is enabled specifically for Google Cloud. To check if your Linux version includes Google Cloud-enabled HA support, look for "GCP-HA" in the table in Operating system support for SAP NetWeaver on Google Cloud.

Windows HA clustering software

On Windows Server, you use Windows Server Failover Clustering (WSFC) to create the HA cluster, as described in Running Windows Server Failover Clustering.

On Google Cloud, the routing of incoming traffic to the active node in a WSFC cluster is managed by Cloud Load Balancing, which does not require an alias IP or static route VIP implementation.

Cloud Load Balancing uses health checks to determine the active node.

Google Cloud zones, regions, and SAP NetWeaver HA deployments

Deploy the nodes of your HA cluster across two or more Compute Engine zones within the same region. Deploying the nodes in different zones ensures that they are on different physical machines and also protects against the very unlikely possibility of a zonal failure.

Keeping the zones within the same region ensures that the nodes are close enough geographically to meet SAP latency requirements for high-availability systems.
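The placement rule above (two or more zones, one region) can be checked mechanically. The following is a minimal sketch, assuming zone names follow the usual Compute Engine pattern in which the region is the zone name without its final suffix (for example, `us-central1-a` is in `us-central1`). The node names and the `check_placement` helper are hypothetical and used only for illustration.

```python
def region_of(zone: str) -> str:
    """Return the region part of a zone name, e.g. 'us-central1-a' -> 'us-central1'."""
    region, _, _suffix = zone.rpartition("-")
    return region


def check_placement(node_zones: dict[str, str]) -> list[str]:
    """Return a list of problems with the zone placement of the HA cluster nodes."""
    problems = []
    zones = set(node_zones.values())
    regions = {region_of(zone) for zone in zones}
    if len(zones) < 2:
        problems.append("all nodes are in the same zone")
    if len(regions) > 1:
        problems.append("nodes span more than one region")
    return problems


# Hypothetical two-node cluster: different zones, same region.
print(check_placement({"nw-ha-vm-1": "us-central1-a", "nw-ha-vm-2": "us-central1-b"}))
```

An empty list means the placement satisfies both conditions described above.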

Compute Engine virtual machines and SAP NetWeaver HA deployments

To support high availability, Compute Engine VMs are backed by live migration and automatic restart.

Compute Engine live migration

Compute Engine monitors the state of the underlying infrastructure. When an infrastructure maintenance event occurs, Compute Engine automatically migrates your instance away from the event and, if possible, keeps your instance running during the migration. No user intervention is required.

In the case of major outages, there might be a slight delay between when the instance goes down and when it is available again.

In most cases, live migration events occur without impacting the HA cluster. However, test your HA cluster by simulating a live migration of the active host after your HA cluster is set up and the systems are running, especially if your HA cluster monitor is configured with a low failover threshold. For more information about simulating a live migration event, see Testing your availability policies.
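As a sketch of how such a test might be scripted, the following helper builds the `gcloud compute instances simulate-maintenance-event` command line for one VM. The instance name and zone are placeholders, and the helper itself is hypothetical. It only assembles and prints the command, so you can review it before running it against the active host.

```python
import shlex


def simulate_maintenance_command(instance: str, zone: str) -> list[str]:
    """Build the gcloud argument list that simulates a maintenance event on one VM."""
    return [
        "gcloud", "compute", "instances", "simulate-maintenance-event",
        instance, "--zone", zone,
    ]


# Placeholder instance name and zone; replace with the active host of your cluster.
command = simulate_maintenance_command("nw-ha-vm-1", "us-central1-a")
print(shlex.join(command))
```

While the simulated event runs, watch the cluster monitor to confirm that the migration does not trigger an unintended failover.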

A migrated instance is identical to the original instance, including the instance ID, private IP address, and all instance metadata and storage.

By default, standard instances are set to live migrate. We recommend not changing this setting.

For more information, see Live migrate.

Compute Engine automatic restart

If your instance is set to terminate when there is a maintenance event, or if your instance crashes because of an underlying hardware issue, you can set up Compute Engine to automatically restart the instance. By default, instances are set to automatically restart. We recommend not changing this setting.

For more information about automatic restart, see Automatic restart.
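To confirm that a VM still uses the recommended defaults for live migration and automatic restart, you can inspect the `scheduling` block of its instance description. The sketch below assumes you have the instance description as a dictionary (for example, parsed from JSON output) and that it uses the Compute Engine API field names `onHostMaintenance` and `automaticRestart`. The `scheduling_deviations` helper is hypothetical.

```python
def scheduling_deviations(instance: dict) -> list[str]:
    """Return the settings that differ from the recommended defaults."""
    scheduling = instance.get("scheduling", {})
    deviations = []
    # Recommended: live migrate during host maintenance events.
    if scheduling.get("onHostMaintenance") != "MIGRATE":
        deviations.append("onHostMaintenance is not MIGRATE")
    # Recommended: restart automatically after a crash or termination.
    if scheduling.get("automaticRestart") is not True:
        deviations.append("automaticRestart is not enabled")
    return deviations


# Illustrative instance description with the default settings.
example = {"scheduling": {"onHostMaintenance": "MIGRATE", "automaticRestart": True}}
print(scheduling_deviations(example))
```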

Storage options for HA SAP systems on Google Cloud

The SAP NetWeaver global file system is a single point of failure that needs to be available to all of the SAP NetWeaver instances in your HA system. To ensure the availability of the global file system on Google Cloud, you can use either highly available shared storage or replicated zonal persistent disks.

For a highly available shared storage solution, you can use third-party file-sharing solutions, such as NetApp Cloud Volumes. Google Cloud provides an NFS file server solution, Filestore, but Filestore does not currently provide a file server that is highly available across zones.

For replication of zonal persistent disks for Linux systems, you can use a Distributed Replicated Block Device (DRBD) to replicate the persistent disks that contain the SAP global file system between the nodes in your HA cluster.

Although Compute Engine regional persistent disks offer synchronously replicated block storage across zones, they are not currently supported for SAP NetWeaver HA systems.

For more information about storage options on Google Cloud, see:

Networking options for HA SAP systems

When you set up your network for your HA cluster, in addition to completing the steps in Creating a network, you need to complete the following HA-specific tasks:

  • Choose your VIP implementation for Linux systems, as described in the following section. Windows systems use an internal load balancer, which doesn't require the same VIP solutions as Linux systems.
  • Define the communication path between the SAP Central Services instance and the Enqueue Replication Server instance.
  • Define firewall rules to support your defined communication paths.

Virtual IP implementation on Google Cloud

A high-availability cluster uses a floating or virtual IP address (VIP) to move its workload from one cluster node to another in the event of an unexpected failure or for scheduled maintenance. The IP address of the VIP doesn't change, so client applications are unaware that the work is being served by a different node.

A VIP is also referred to as a floating IP address.

On Google Cloud, VIPs are implemented slightly differently than in on-premises installations: when a failover occurs, gratuitous ARP requests cannot be used to announce the change. Instead, you can implement a VIP address for an SAP HA cluster by using one of the following methods, each of which is described in its own section below: Internal TCP/UDP Load Balancing failover support, static routes, or alias IP addresses.

Internal TCP/UDP Load Balancing VIP implementations

A load balancer typically distributes user traffic across multiple instances of your applications, both to distribute the workload across multiple active systems and to protect against a processing slowdown or failure on any one instance.

The Internal TCP/UDP Load Balancing service also provides failover support that you can use with Compute Engine health checks to detect failures, trigger failover, and reroute traffic to a new primary SAP system in an OS-native HA cluster.

Internal TCP/UDP Load Balancing failover support is the recommended VIP implementation for a variety of reasons, including:

  • Load balancing on Compute Engine offers a 99.99% availability SLA.
  • Load balancing supports multi-zone high-availability clusters, which protects against zone failures with predictable cross-zone failover times.
  • Using load balancing reduces the time required to detect and trigger a failover, usually within seconds of the failure. Overall failover times are dependent on the failover times of each of the components in the HA system, which can include the hosts, database systems, application systems, and more.
  • Using load balancing simplifies cluster configuration and reduces dependencies.
  • Unlike a VIP implementation that uses routes, with load balancing, you can use IP ranges from your own VPC network, allowing you to reserve and configure them as needed.
  • Load balancing can easily be used to reroute traffic to a secondary system for planned maintenance outages.
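To make the point about failover times concrete, the following rough estimate separates the time the health check needs to detect a failure from the time the other components need to recover. It is a simplified model, not an official formula: it assumes a failure is detected after `unhealthy_threshold` consecutive failed probes, spaced `check_interval_sec` apart, with the last probe taking up to `timeout_sec` to fail. All numbers and component names are made up for illustration.

```python
def estimated_detection_seconds(
    check_interval_sec: int, timeout_sec: int, unhealthy_threshold: int
) -> int:
    """Rough upper estimate of the time a health check needs to mark a host unhealthy."""
    if timeout_sec > check_interval_sec:
        raise ValueError("timeout must not exceed the check interval")
    return check_interval_sec * unhealthy_threshold + timeout_sec


def estimated_failover_seconds(detection_sec: int, component_secs: dict[str, int]) -> int:
    """Add the (assumed sequential) recovery times of the other HA components."""
    return detection_sec + sum(component_secs.values())


# Illustrative values only; measure the real figures in your own cluster.
detection = estimated_detection_seconds(check_interval_sec=10, timeout_sec=10, unhealthy_threshold=2)
total = estimated_failover_seconds(detection, {"cluster resource move": 20, "Central Services start": 45})
print(detection, total)
```

Lower intervals and thresholds shorten detection but make the cluster more sensitive to brief interruptions, so treat the estimate as a starting point for testing rather than a target.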

When you create a health check for a load balancer implementation of a VIP, you specify the host port that the health check probes to determine the health of the host. For an SAP HA cluster, specify a target host port that is in the private range, 49152-65535, to avoid clashing with other services. On the host VM, configure the target port with a secondary helper service, such as the socat utility or HAProxy.
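The following sketch shows the two ideas in this paragraph: validating that the chosen health check port is in the private range, and running a small helper that accepts TCP connections on that port. It is a stand-in for socat or HAProxy, written only to illustrate the mechanism, and it assumes a plain TCP health check that treats a successful connection as healthy. The names `is_private_port` and `start_helper` are hypothetical. In a real cluster, the cluster software would start the helper only on the node that currently hosts the VIP.

```python
import socketserver
import threading

# Private (dynamic) port range recommended for the health check target port.
PRIVATE_PORT_RANGE = range(49152, 65536)


def is_private_port(port: int) -> bool:
    """Return True if the port is in the private range 49152-65535."""
    return port in PRIVATE_PORT_RANGE


class AcceptAndClose(socketserver.BaseRequestHandler):
    """Accept the probe connection and close it without sending data."""

    def handle(self) -> None:
        pass


def start_helper(port: int, bind_address: str = "0.0.0.0") -> socketserver.TCPServer:
    """Start a background listener that answers health check probes on the given port."""
    if not is_private_port(port):
        raise ValueError(f"port {port} is outside the private range 49152-65535")
    server = socketserver.ThreadingTCPServer((bind_address, port), AcceptAndClose)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_helper(60000)` on the active node would make that node pass the health check; calling `shutdown()` and `server_close()` on the returned server stops the listener so that probes fail.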

For database clusters in which the secondary, standby system remains online, the health check and helper service enable load balancing to direct traffic to the online system that is currently serving as the primary system in the cluster.

Using the helper service and port redirection, you can trigger a failover for planned software maintenance on your SAP systems.

You can change the default routing behavior for HA cluster nodes by removing the VIP from the local Linux OS routing tables on each node in the cluster. After you remove the entry, messages sent from a cluster node to the VIP are directed to the default gateway first and then to the VIP. The load balancer then treats the messages like any other front-end traffic and forwards them to the node that is currently hosting the active primary system.

For more information about the failover support of Internal TCP/UDP Load Balancing, see Configuring failover for Internal TCP/UDP Load Balancing.

To deploy an HA cluster with a load-balancer VIP implementation, see:

Static route VIP implementations

The static route implementation also provides protection against zone failures, but requires you to use a VIP outside of the IP ranges of your existing VPC subnets where the VMs reside. Consequently, you also need to make sure that the VIP does not conflict with any external IP addresses in your extended network.

Static route implementations can also introduce complexity when used with shared VPC configurations, which are intended to segregate network configuration to a host project.

If you use a static route implementation for your VIP, consult with your network administrator to determine a suitable IP address for a static route implementation.
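The constraint that a static-route VIP must fall outside the IP ranges of your existing VPC subnets can be checked with the Python standard library before you raise the address with your network administrator. This is a minimal sketch. The subnet ranges and candidate addresses are made-up examples, and `vip_conflicts` is a hypothetical helper.

```python
import ipaddress


def vip_conflicts(vip: str, subnet_ranges: list[str]) -> list[str]:
    """Return the subnet ranges that contain the candidate VIP address."""
    address = ipaddress.ip_address(vip)
    return [cidr for cidr in subnet_ranges if address in ipaddress.ip_network(cidr)]


# Example subnet ranges of the VPC network where the cluster VMs reside.
subnets = ["10.1.0.0/24", "10.2.0.0/24"]

print(vip_conflicts("10.1.0.50", subnets))   # inside a subnet: not usable for a static route VIP
print(vip_conflicts("10.9.9.9", subnets))    # outside all listed subnets
```

An empty result only shows that the address is outside the listed subnets. It does not rule out conflicts elsewhere in your extended network, which is why the consultation described above is still needed.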

Alias IP VIP implementations

Alias IP VIP implementations are not recommended for multi-zone HA deployments because, if a zone fails, the reallocation of the alias IP to a node in a different zone can be delayed. Implement your VIP by using Internal TCP/UDP Load Balancing with failover support instead.

If you are deploying all nodes of your SAP HA cluster in the same zone, you can use an alias IP to implement a VIP for the HA cluster.

If you have existing multi-zone SAP HA clusters that use an alias IP implementation for the VIP, you can migrate to an Internal TCP/UDP Load Balancing implementation without changing your VIP address. Both alias IP and Internal TCP/UDP Load Balancing use IP ranges from your VPC network.

While alias IP addresses are not recommended for VIP implementations in multi-zone HA clusters, they have other use cases in SAP deployments. For example, they can be used to provide a logical host name and IP assignments for flexible SAP deployments, such as those managed by SAP Landscape Management.

General best practices for VIPs on Google Cloud

For more information about VIPs on Google Cloud, see Best Practices for Floating IP Addresses.