Known issues

This page describes known issues that you might run into while using SAP on Google Cloud. The page includes only those issues that have come to the attention of the SAP specialists on the Cloud Customer Care team.

Other issues that can affect SAP systems might be listed in the documentation for other Google Cloud products and services. For example, issues related to Compute Engine VMs, persistent disks, or OS images are listed on the Compute Engine known issues page.

Changes to the default fencing method can cause fencing timeout in RHEL 8.4

If you are using RHEL 8.4 with versions 4.2.1-65 to 4.2.1-69 of the fence agent fence-agents-gce, then a fencing timeout might occur.

Versions 4.2.1-65 to 4.2.1-69 of fence-agents-gce do not define cycle as the default fencing method. As a result, the default fencing method falls back to onoff, and the fence agent makes a stop API call followed by a start API call instead of a single reset API call. The two calls take longer to complete than a single reset, which can lead to a fencing timeout.

Resolution

To resolve this issue, try the following options:

Change the default fencing method to cycle using the following command:
```
pcs resource update <STONITH_device_name> method=cycle
```
Check your fence-agents-gce version and make sure that you are using version 4.2.1-70 or later:
- To check your fence agent version, run the following command:
```
yum info fence-agents-gce
```
- To update your fence agent, run the following command:
```
yum --releasever=8.6 update fence-agents-gce
```
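The version check can also be scripted. The following is a minimal sketch, assuming that the version-release string has the form 4.2.1-NN followed by an optional distribution suffix such as .el8. The function name fence_agent_affected is hypothetical and is not part of fence-agents-gce.

```shell
# Hypothetical helper: succeeds (exit 0) if the given fence-agents-gce
# version-release string falls in the affected range 4.2.1-65 to 4.2.1-69.
fence_agent_affected() {
  version="${1%%-*}"        # text before the first "-", for example "4.2.1"
  release="${1#*-}"         # text after the first "-", for example "65.el8"
  release="${release%%.*}"  # leading release number, for example "65"
  case "$release" in
    ''|*[!0-9]*) return 2 ;;  # unexpected format: release number is not numeric
  esac
  [ "$version" = "4.2.1" ] && [ "$release" -ge 65 ] && [ "$release" -le 69 ]
}
```

One possible way to feed it the installed version is `fence_agent_affected "$(rpm -q --qf '%{VERSION}-%{RELEASE}' fence-agents-gce)"`, which succeeds only when the installed package is in the affected range.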

StorageException for Cloud Storage can cause corrupted Backint agent backup

Under certain conditions, if a StorageException occurs when the Cloud Storage Backint agent for SAP HANA stores a backup in Cloud Storage, the Backint agent might append duplicate data to the backup file, which makes the backup file unusable for recovery.

If you try to recover the database from a backup file with duplicated data, you receive the following error:

  exception 3020043: Wrong checksum

Users affected

SAP HANA users that use the Cloud Storage Backint agent for SAP HANA to store backups in Cloud Storage.

Resolution

To resolve this issue, install version 1.0.13 or later of the Backint agent, and then check whether any of your existing backups are affected.

For instructions on upgrading the Backint agent, see Updating the Backint agent to a new version.

To see if you have been affected by this issue, check the Backint agent logs:

As the sidadm user on the SAP HANA host, search the logs for the StorageException message:

```
grep 'com.google.cloud.storage.StorageException' \
  /usr/sap/$SAPSYSTEMNAME/SYS/global/hdb/opt/backint/backint-gcs/logs/*.log.*
```

If you find the error message, verify the status of the associated backup:
```
hdbbackupcheck -e EBID \
  --backintParamFile /usr/sap/$SAPSYSTEMNAME/SYS/global/hdb/opt/backint/backint-gcs/parameters.txt \
  /usr/sap/$SAPSYSTEMNAME/SYS/global/hdb/backint/DB_$SAPSYSTEMNAME/BACKUP_FILE_NAME
```
In the example, replace the following placeholder values:
- EBID with the external backup ID of the backup.
- BACKUP_FILE_NAME with the file name of the backup file.
If you receive a checksum error, contact Cloud Customer Care.
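To narrow down which backups to verify, it can help to list only the log files that contain the exception. The following is a minimal sketch that reuses the search string and the `*.log.*` file pattern from the grep command in this section. The function name logs_with_storage_exception is hypothetical, and the log directory is passed in as an argument.

```shell
# Hypothetical helper: prints the Backint agent log files in the given
# directory that contain a StorageException, one path per line.
# Returns non-zero if no file matches.
logs_with_storage_exception() {
  grep -l 'com.google.cloud.storage.StorageException' "$1"/*.log.* 2>/dev/null
}
```

For example, `logs_with_storage_exception /usr/sap/$SAPSYSTEMNAME/SYS/global/hdb/opt/backint/backint-gcs/logs` prints the matching files, whose timestamps can then be compared against your backup catalog.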

In addition to the preceding check, to detect this and other issues before your backups are needed, make the following actions a regular part of your backup process:

- As recommended by SAP, run the SAP hdbbackupcheck tool regularly against your backups to verify their logical consistency. For more information, see SAP Note 1869119.
- Test your disaster recovery procedures regularly.
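As one way to make the hdbbackupcheck step repeatable, the following sketch generates the verification command for each backup from a list. It is only an illustration: the function name print_backupcheck_cmds and the input format (one "EBID BACKUP_FILE_NAME" pair per line) are assumptions, and the paths and options are the ones used in the verification command in this section. The function prints the commands and does not run them.

```shell
# Hypothetical helper: reads "EBID BACKUP_FILE_NAME" pairs from standard input
# and prints one hdbbackupcheck command per pair for the given SID.
# It only prints the commands; it does not run them.
print_backupcheck_cmds() {
  sid="$1"
  base="/usr/sap/$sid/SYS/global/hdb"
  while read -r ebid file; do
    # Skip blank or incomplete lines.
    if [ -z "$ebid" ] || [ -z "$file" ]; then
      continue
    fi
    printf 'hdbbackupcheck -e %s --backintParamFile %s %s\n' \
      "$ebid" \
      "$base/opt/backint/backint-gcs/parameters.txt" \
      "$base/backint/DB_$sid/$file"
  done
}
```

Reviewing the printed commands before running them keeps the sketch safe to try on a host where SAP HANA is not installed.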

SAP HANA scale-out deployment fails due to a Python error

If you are installing SAP HANA 2.0 SPS 5 Revision 56 or later for an SAP HANA scale-out system with host auto-failover, the deployment fails due to a Python error in the storage manager for SAP HANA. The SAP HANA trace log files show the following Python error:

  failed with python error: _sap_hana_forbid() got an unexpected keyword argument 'stdout'
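To confirm that a failed deployment matches this issue, you can search the trace files for the error text. The following is a minimal sketch; the function name find_forbid_error is hypothetical, and because the location of the trace files is not given here, the directory is passed in as an argument.

```shell
# Hypothetical helper: lists files under the given trace directory that
# contain the Python error reported for this issue.
# -r searches recursively, -l prints file names only, -F treats the
# pattern as a fixed string so that the parentheses are matched literally.
find_forbid_error() {
  grep -rlF "_sap_hana_forbid() got an unexpected keyword argument 'stdout'" "$1"
}
```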

Resolution

Use version 2.2 or later of the storage manager for SAP HANA. Version 2.2 adds support for SAP HANA 2.0 SPS 5 Revision 56 and later. For more information about the storage manager for SAP HANA, see SAP HANA host auto-failover on Google Cloud.

High-availability cluster failover issue due to a Corosync communication delay

In a high-availability (HA) cluster for SAP HANA on Google Cloud, a failover can be triggered incorrectly by a temporary delay in the transmission of Corosync messages between the cluster nodes.

This issue occurs on both SUSE and Red Hat high-availability Linux distributions.

This issue is not specific to Google Cloud, but is described here because it has impacted SAP on Google Cloud users.

Resolution

The resolution depends on your operating system.

SUSE

SUSE provided a Corosync maintenance update that solves the problem. To apply the fix, update your Corosync software to one of the versions that are listed in the following table.

| SUSE version | Corosync version |
|---|---|
| SLES 12 - all SP releases | `corosync-2.3.6-9.19.1` |
| SLES 15 | `corosync-2.4.5-5.13.1` |
| SLES 15 SP1 | `corosync-2.4.5-9.16.1` |
| SLES 15 SP2 | `corosync-2.4.5-10.14.6.1` |
| SLES 15 SP3 | `corosync-2.4.5-12.3.1` |
| SLES 15 SP4 | `corosync-2.4.5-12.7.1` |

Red Hat

Red Hat provided a Corosync maintenance update that solves the problem. To apply the fix, update your Corosync software to one of the versions that are listed in the following table.

| Red Hat version | Corosync version |
|---|---|
| RHEL 7 | `corosync-2.4.5-7.el7_9.2` |
| RHEL 8 | `corosync-3.1.5-2.el8` |
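To check an installed Corosync version against the versions in these tables, a version-aware comparison is needed because a plain string comparison would rank 2.4.5-9.16.1 above 2.4.5-10.14.6.1. The following is a minimal sketch, assuming GNU `sort -V` is available. The function name corosync_at_least is hypothetical, and GNU version ordering only approximates the comparison that RPM itself performs.

```shell
# Hypothetical helper: succeeds if INSTALLED sorts at or after MINIMUM in
# GNU version order. Pass version-release strings without the "corosync-"
# package name prefix, for example "2.4.5-12.3.1".
corosync_at_least() {
  installed="$1"
  minimum="$2"
  # The lower of the two versions sorts first; if that is the minimum,
  # then the installed version is at least the minimum.
  lowest="$(printf '%s\n%s\n' "$installed" "$minimum" | sort -V | head -n 1)"
  [ "$lowest" = "$minimum" ]
}
```

For example, on SLES 15 SP3 you might run `corosync_at_least "$(rpm -q --qf '%{VERSION}-%{RELEASE}' corosync)" "2.4.5-12.3.1"` and update Corosync if the command fails.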

gVNIC reset on RHEL causes failover in HA configuration

If you are using the gVNIC network driver with a version of RHEL prior to 8.7, then a gVNIC reset might occur that causes the affected VM to lose network connectivity for a couple of seconds, which can trigger unwanted failovers in your HA cluster.

You might see a kernel call trace in the OS messages log file, for example:

  Feb  4 06:58:33  kernel: ------------[ cut here ]------------
  Feb  4 06:58:33  kernel: NETDEV WATCHDOG: eth0 (gvnic): transmit queue 0 timed out
  Feb  4 06:58:33  kernel: WARNING: CPU: 51 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x272/0x280
  Feb  4 06:58:33  kernel: Modules linked in: falcon_lsm_serviceable(PE) falcon_nf_netcontain(PE) falcon_kal(E) falcon_lsm_pinned_16206(E) binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set falcon_lsm_pinned_16108(E) nf_tables nfnetlink intel_rapl_msr intel_rapl_common nfit libnvdimm vfat fat dm_mod gve crct10dif_pclmul crc32_pclmul i2c_piix4 ghash_clmulni_intel rapl pcspkr auth_rpcgss sunrpc xfs libcrc32c crc32c_intel serio_raw nvme nvme_core t10_pi [last unloaded: falcon_kal]
  Feb  4 06:58:33  kernel: CPU: 51 PID: 0 Comm: swapper/51 Kdump: loaded Tainted: P            E    --------- -  - 4.18.0-305.82.1.el8_4.x86_64 #1
  Feb  4 06:58:33  kernel: Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/13/2023
  Feb  4 06:58:33  kernel: RIP: 0010:dev_watchdog+0x272/0x280
  ...
  Feb  4 06:58:33  kernel: ---[ end trace d6c7c7cb653cce9a ]---
  Feb  4 06:58:33  kernel: gvnic 0000:00:03.0: Performing reset
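To check whether a VM has already experienced this reset, you can count the matching kernel messages. The following is a minimal sketch whose patterns are taken from the example log lines above. The function name count_gvnic_events is hypothetical, and the log file path is passed in as an argument because its location can vary (on RHEL it is commonly /var/log/messages, which is an assumption here).

```shell
# Hypothetical helper: prints the number of gVNIC transmit-queue timeouts
# and the number of gVNIC resets recorded in the given log file.
count_gvnic_events() {
  # grep -c prints 0 and exits non-zero when nothing matches, so "|| true"
  # keeps the function usable under "set -e".
  timeouts="$(grep -c 'NETDEV WATCHDOG: .*(gvnic): transmit queue .* timed out' "$1" || true)"
  resets="$(grep -c 'gvnic .*: Performing reset' "$1" || true)"
  printf 'timeouts=%s resets=%s\n' "$timeouts" "$resets"
}
```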

Cause

RHEL versions prior to 8.7 contain an earlier build of the gVNIC driver that lacks the required enhancements and stability patches.
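To find VMs that run a RHEL version prior to 8.7, the version number can be compared numerically. The following is a minimal sketch, assuming a version string of the form MAJOR.MINOR such as the VERSION_ID value in /etc/os-release. The function name rhel_prior_to_8_7 is hypothetical.

```shell
# Hypothetical helper: succeeds if the given RHEL version (for example "8.4")
# is earlier than 8.7.
rhel_prior_to_8_7() {
  major="${1%%.*}"
  case "$1" in
    *.*) minor="${1#*.}"; minor="${minor%%.*}" ;;  # "8.4" gives minor 4
    *)   minor=0 ;;                                # no minor component, e.g. "8"
  esac
  # Compare numerically so that 8.10 is not treated as earlier than 8.7.
  [ "$major" -lt 8 ] || { [ "$major" -eq 8 ] && [ "$minor" -lt 7 ]; }
}
```

For example, `. /etc/os-release; rhel_prior_to_8_7 "$VERSION_ID" && echo "gVNIC driver build predates RHEL 8.7"` flags a VM that this issue can affect.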

Resolution

Use an SAP-certified version of RHEL that is later than 8.7 in combination with the gVNIC driver. This is particularly important if you're using a third-generation Compute Engine machine type, such as M3, because these machine types don't support the VirtIO driver and therefore require the gVNIC driver. For the full list of machine types that default to gVNIC, see the Machine series comparison table.
