This page describes known issues that you might run into while using SAP on Google Cloud. The page includes only those issues that have come to the attention of the SAP specialists on the Cloud Customer Care team.
Other issues that can impact SAP systems might be listed in the documentation for other Google Cloud products and services. For example, issues that are related to Compute Engine VMs, persistent disks, or OS images are listed on the Compute Engine known issues page.
Changes to the default fencing method can cause a fencing timeout in RHEL 8.4
If you are using RHEL 8.4 with the fence agent fence-agents-gce versions 4.2.1-65 to 4.2.1-69, then a fencing timeout might occur.
The fence agent fence-agents-gce versions 4.2.1-65 to 4.2.1-69 do not define cycle as the default fencing method. As a result, the default fencing method falls back to the onoff method, which causes the fence agent to make a stop API call and a start API call instead of a single reset API call. Because the fencing process makes two API calls instead of one, it takes longer to complete, which can lead to a fencing timeout.
Resolution
To resolve this issue, try the following options:
- Change the default fencing method to cycle by using the following command:
  pcs resource update <STONITH_device_name> method=cycle
- Check your fence-agents-gce version and make sure that you are using version 4.2.1-70 or later:
  - To check your fence agent version, run the following command:
    yum info fence-agents-gce
  - To update your fence agent, run the following command:
    yum --releasever=8.6 update fence-agents-gce
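The version check in the preceding step can also be scripted, for example when you check several cluster nodes. The following is a minimal sketch, assuming that the installed version is read with `rpm -q --qf '%{VERSION}-%{RELEASE}'`, which returns a string such as `4.2.1-69.el8`. The function name `fence_agent_affected` is hypothetical. It flags only the releases 4.2.1-65 through 4.2.1-69 that this section describes as affected.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 (true) if the given fence-agents-gce
# VERSION-RELEASE string is in the affected range 4.2.1-65 to 4.2.1-69.
fence_agent_affected() {
    local full version release release_num
    full="$1"                           # for example "4.2.1-69.el8"
    version="${full%%-*}"               # part before the first "-": "4.2.1"
    release="${full#*-}"                # part after the first "-": "69.el8"
    release_num="${release%%[!0-9]*}"   # leading digits only: "69"
    [ "$version" = "4.2.1" ] || return 1
    [ -n "$release_num" ] || return 1
    [ "$release_num" -ge 65 ] && [ "$release_num" -le 69 ]
}
```

For example, on a cluster node: `fence_agent_affected "$(rpm -q --qf '%{VERSION}-%{RELEASE}' fence-agents-gce)" && echo "Update fence-agents-gce to 4.2.1-70 or later"`.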
StorageException for Cloud Storage can cause a corrupted Backint agent backup
Under certain conditions, if a StorageException occurs when the Cloud Storage Backint agent for SAP HANA stores a backup in Cloud Storage, the Backint agent might append duplicate data to the backup file, which makes the backup file unusable for recovery.
If you try to recover the database from a backup file with duplicated data, you receive the following error:
exception 3020043: Wrong checksum
Users affected
SAP HANA users that use the Cloud Storage Backint agent for SAP HANA to store backups in Cloud Storage.
Resolution
To resolve this issue, first install version 1.0.13 or later of the Backint agent and then check the Backint agent logs for any StorageException errors to see if you have been affected by this issue.
For instructions on upgrading the Backint agent, see Updating the Backint agent to a new version.
To see if you have been affected by this issue, check the Backint agent logs:
As the sidadm user on the SAP HANA host, search the logs for the StorageException message:
grep 'com.google.cloud.storage.StorageException' \
    /usr/sap/$SAPSYSTEMNAME/SYS/global/hdb/opt/backint/backint-gcs/logs/*.log.*
If you find the error message, verify the status of the associated backup:
hdbbackupcheck -e EBID \
    --backintParamFile /usr/sap/$SAPSYSTEMNAME/SYS/global/hdb/opt/backint/backint-gcs/parameters.txt \
    /usr/sap/$SAPSYSTEMNAME/SYS/global/hdb/backint/DB_$SAPSYSTEMNAME/BACKUP_FILE_NAME
In the example, replace the following placeholder values:
- EBID with the external backup ID of the backup.
- BACKUP_FILE_NAME with the file name of the backup file.
If you receive a checksum error, contact Cloud Customer Care.
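When you check several hosts, it can help to list which log files contain the exception rather than every matching line. The following is a sketch built on the same search as the `grep` command above. The function name `find_backint_storage_errors` is hypothetical, and the log directory is passed as an argument so that you can point it at the `logs` directory shown in that command.

```shell
#!/bin/sh
# Hypothetical helper: prints the names of Backint agent log files in the
# given directory that contain a Cloud Storage StorageException.
# Exit status is grep's: 0 if at least one file matches, non-zero otherwise.
find_backint_storage_errors() {
    local log_dir="$1"
    # -l prints only matching file names; -F matches the text literally.
    # stderr is discarded in case the glob matches no files.
    grep -lF 'com.google.cloud.storage.StorageException' "$log_dir"/*.log.* 2>/dev/null
}
```

For example: `find_backint_storage_errors "/usr/sap/$SAPSYSTEMNAME/SYS/global/hdb/opt/backint/backint-gcs/logs"`. For each file it reports, identify the associated backup and run `hdbbackupcheck` as described above.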
In addition to the preceding check, to detect this and other issues before your backups are needed, make the following actions a regular part of your backup process:
- As SAP recommends, run the SAP hdbbackupcheck tool regularly against your backups to verify their logical consistency. For more information, see SAP Note 1869119.
- Test your disaster recovery procedures regularly.
SAP HANA scale-out deployment fails due to a Python error
If you install SAP HANA 2.0 SPS 5 Revision 56 or later on an SAP HANA scale-out system with host auto-failover, the deployment fails due to a Python error in the storage manager for SAP HANA.
The SAP HANA trace log files show the following Python error for this failure:
failed with python error: _sap_hana_forbid() got an unexpected keyword argument 'stdout'
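To confirm that a failed deployment hit this issue, you can search the trace files for the error text above. The following is a minimal sketch; the function name `find_storage_manager_python_error` is hypothetical. The trace directory is passed as an argument because its location depends on your system; `/usr/sap/<SID>/HDB<instance number>/<host>/trace` is a common layout, but treat that path as an assumption to verify.

```shell
#!/bin/sh
# Hypothetical helper: lists files under the given directory that contain
# the storage manager Python error described in this section.
find_storage_manager_python_error() {
    local trace_dir="$1"
    # -r searches recursively, -l prints only file names,
    # -F matches the error text literally (it contains parentheses).
    grep -rlF "_sap_hana_forbid() got an unexpected keyword argument 'stdout'" "$trace_dir"
}
```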
Resolution
Use version 2.2 or later of the storage manager for SAP HANA. Version 2.2 adds support for SAP HANA 2.0 SPS 5 Revision 56 and later. For more information about the storage manager for SAP HANA, see SAP HANA host auto-failover on Google Cloud.
High-availability cluster failover issue due to a Corosync communication delay
For your high-availability (HA) cluster for SAP HANA on Google Cloud, failover can be incorrectly triggered due to a temporary delay in the transmission of Corosync messages between the cluster nodes.
This issue occurs on both SUSE and Red Hat high-availability Linux distributions.
This issue is not specific to Google Cloud, but is described here because it has impacted SAP on Google Cloud users.
Resolution
The resolution depends on your operating system.
SUSE
SUSE provided a Corosync maintenance update that solves the problem. To apply the fix, update your Corosync software to one of the versions that are listed in the following table.
SUSE version | Corosync version |
---|---|
SLES 12 - all SP releases | corosync-2.3.6-9.19.1 |
SLES 15 | corosync-2.4.5-5.13.1 |
SLES 15 SP1 | corosync-2.4.5-9.16.1 |
SLES 15 SP2 | corosync-2.4.5-10.14.6.1 |
SLES 15 SP3 | corosync-2.4.5-12.3.1 |
SLES 15 SP4 | corosync-2.4.5-12.7.1 |
Red Hat
Red Hat provided a Corosync maintenance update that solves the problem. To apply the fix, update your Corosync software to one of the versions that are listed in the following table.
Red Hat version | Corosync version |
---|---|
RHEL 7 | corosync-2.4.5-7.el7_9.2 |
RHEL 8 | corosync-3.1.5-2.el8 |
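To check whether a node already runs a fixed Corosync build, you can compare the installed version-release with the minimum from the preceding tables. The following is a sketch under several assumptions: the helper names `min_corosync_for` and `corosync_is_fixed` are hypothetical; the lookup key combines the `ID` and `VERSION_ID` fields of `/etc/os-release` (assumed to be `sles` with `15.4` for SLES 15 SP4, and `rhel` with `8.6` for RHEL 8.6); and GNU `sort -V` is used as an approximation of RPM version ordering, which is adequate for version strings like these but is not a full implementation of RPM's comparison rules.

```shell
#!/bin/sh
# Hypothetical lookup: prints the minimum fixed Corosync version-release
# for an OS given as "<ID>-<VERSION_ID>", using the values from the
# SUSE and Red Hat tables above. Returns 1 for releases not in the tables.
min_corosync_for() {
    case "$1" in
        sles-12|sles-12.*) echo "2.3.6-9.19.1" ;;
        sles-15)           echo "2.4.5-5.13.1" ;;
        sles-15.1)         echo "2.4.5-9.16.1" ;;
        sles-15.2)         echo "2.4.5-10.14.6.1" ;;
        sles-15.3)         echo "2.4.5-12.3.1" ;;
        sles-15.4)         echo "2.4.5-12.7.1" ;;
        rhel-7|rhel-7.*)   echo "2.4.5-7.el7_9.2" ;;
        rhel-8|rhel-8.*)   echo "3.1.5-2.el8" ;;
        *)                 return 1 ;;
    esac
}

# Hypothetical check: returns 0 if the installed version-release ($1) is at
# least the minimum ($2), approximating RPM ordering with GNU sort -V.
corosync_is_fixed() {
    local installed="$1" minimum="$2"
    [ "$(printf '%s\n%s\n' "$minimum" "$installed" | sort -V | head -n 1)" = "$minimum" ]
}
```

For example: `. /etc/os-release && min=$(min_corosync_for "$ID-$VERSION_ID") && corosync_is_fixed "$(rpm -q --qf '%{VERSION}-%{RELEASE}' corosync)" "$min" && echo "Corosync fix installed"`.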
gVNIC reset on RHEL causes failover in HA configuration
If you are using the gVNIC network driver with a version of RHEL earlier than 8.7, then a gVNIC reset might cause the affected VM to lose network connectivity for a few seconds, which might result in unwanted failovers in your HA cluster.
You might see a kernel call stack in the messages log file of the OS, for example:
Feb 4 06:58:33 kernel: ------------[ cut here ]------------
Feb 4 06:58:33 kernel: NETDEV WATCHDOG: eth0 (gvnic): transmit queue 0 timed out
Feb 4 06:58:33 kernel: WARNING: CPU: 51 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x272/0x280
Feb 4 06:58:33 kernel: Modules linked in: falcon_lsm_serviceable(PE) falcon_nf_netcontain(PE) falcon_kal(E) falcon_lsm_pinned_16206(E) binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set falcon_lsm_pinned_16108(E) nf_tables nfnetlink intel_rapl_msr intel_rapl_common nfit libnvdimm vfat fat dm_mod gve crct10dif_pclmul crc32_pclmul i2c_piix4 ghash_clmulni_intel rapl pcspkr auth_rpcgss sunrpc xfs libcrc32c crc32c_intel serio_raw nvme nvme_core t10_pi [last unloaded: falcon_kal]
Feb 4 06:58:33 kernel: CPU: 51 PID: 0 Comm: swapper/51 Kdump: loaded Tainted: P E --------- - - 4.18.0-305.82.1.el8_4.x86_64 #1
Feb 4 06:58:33 kernel: Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/13/2023
Feb 4 06:58:33 kernel: RIP: 0010:dev_watchdog+0x272/0x280
...
Feb 4 06:58:33 kernel: ---[ end trace d6c7c7cb653cce9a ]---
Feb 4 06:58:33 kernel: gvnic 0000:00:03.0: Performing reset
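To check whether a node has experienced these resets, you can search the OS messages log for the two lines that bracket the event in the preceding example: the `NETDEV WATCHDOG` transmit queue timeout and the `Performing reset` message from the gvnic driver. The following is a minimal sketch; the function name `find_gvnic_resets` is hypothetical, the patterns are derived only from the sample log above, and the log file path is passed as an argument (typically `/var/log/messages` on RHEL).

```shell
#!/bin/sh
# Hypothetical helper: prints log lines that indicate a gVNIC transmit
# queue timeout or a gVNIC driver reset, based on the sample log text.
find_gvnic_resets() {
    local log_file="$1"
    # -E enables extended regular expressions so that "|" separates the
    # two patterns; \( and \) match the literal parentheses in "(gvnic)".
    grep -E 'NETDEV WATCHDOG: .*\(gvnic\): transmit queue [0-9]+ timed out|gvnic [0-9a-f:.]+: Performing reset' "$log_file"
}
```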
Cause
The cause of this issue is that RHEL versions prior to 8.7 contain an earlier build of the gVNIC driver that doesn't have the required enhancements and stability patches.
Resolution
Use an SAP-certified version of RHEL that is later than 8.7, in combination with the gVNIC driver. This is particularly important if you use a third-generation Compute Engine machine series, such as M3, because these machine series don't support the VirtIO driver and therefore require the gVNIC driver. For the full list of machine types that default to gVNIC, see the Machine series comparison table.
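The conditions from the Cause section can be combined into a quick per-node check: a RHEL release earlier than 8.7 with a network interface that uses the gVNIC driver. The following is a sketch under assumptions: the function name `gvnic_reset_at_risk` is hypothetical, the RHEL release is taken from `VERSION_ID` in `/etc/os-release`, and the driver name is taken from `ethtool -i`, where the gVNIC driver is assumed to report itself as `gve` (the module name that appears in the sample log). Version ordering again relies on GNU `sort -V`.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if a node matches the at-risk combination
# described in this section: RHEL earlier than 8.7 using the gVNIC driver.
#   $1: RHEL VERSION_ID, for example "8.4"
#   $2: NIC driver name, for example "gve"
gvnic_reset_at_risk() {
    local rhel_version="$1" driver="$2"
    [ "$driver" = "gve" ] || return 1
    # The release is earlier than 8.7 if it differs from 8.7 and sorts
    # first in version order.
    [ "$rhel_version" != "8.7" ] &&
        [ "$(printf '%s\n%s\n' "$rhel_version" "8.7" | sort -V | head -n 1)" = "$rhel_version" ]
}
```

For example, assuming the interface is named `eth0` as in the sample log: `. /etc/os-release && gvnic_reset_at_risk "$VERSION_ID" "$(ethtool -i eth0 | awk '/^driver:/ {print $2}')" && echo "RHEL release earlier than 8.7 with gVNIC"`.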