Last update: April 13, 2023
What you need to know
Recent versions of the Linux kernel introduced an incompatibility with Google Cloud Backup and DR. This could impact your production and backup operations.
What is affected?
This issue applies only to configurations where both of the following are true:
- Linux servers have the Backup and DR agent installed, and
- One or more of the following databases, running on Linux, are protected using LVM snapshots with Changed Block Tracking (which runs incremental backups by comparing the latest backup with the current workload data and writing only the changed blocks to a new backup):
  - IBM Db2
  - MariaDB
  - MySQL
  - PostgreSQL
  - SAP ASE
  - SAP HANA
  - SAP IQ
  - SAP MaxDB
Impact
For servers that meet the above conditions, the impact is as follows:
- RHEL kernel version higher than 4.18.0-425.3.1:
  - Production servers fail to start after a reboot.
  - The system may freeze if it reboots.
  - The system may freeze if it enables Changed Block Tracking functionality on the incompatible kernels.
- SLES kernel version higher than 5.14.21-150400.22.1:
  - Backup jobs fail.
How do I know if my servers are impacted?
Is the Backup and DR agent installed?
To determine which of your Linux application VMs have the Backup and DR agent installed, log in to each Linux VM and run the following command:
sudo systemctl status udsagent
If the agent is installed and running, the command output contains the following:
active (running)
Is my kernel an affected version?
RHEL
Check the post-reboot kernel version. You are impacted if the kernel version is higher than 4.18.0-425.3.1.
To check the post-reboot kernel version, run the following command in the shell:
sudo grubby --grub2 --default-title
The output is similar to the following:
Red Hat Enterprise Linux (4.18.0-425.13.1.el8.x86_64) 8.7 (Ootpa)
This indicates that your post-reboot kernel version is 4.18.0-425.13.1.el8.x86_64.
SLES
Check the post-reboot kernel version. You are impacted if the kernel version is higher than 5.14.21-150400.22.1.
To check the post-reboot kernel version, run the following command in the shell:
sudo grep -e "menuentry " -e submenu -e linux /boot/grub2/grub.cfg
The output is similar to the following:
menuentry 'SLES15-SP4' --class sles15_sp4 --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-9d30aeb5-d035-4732-b8c2-145c907808ff' {
$linux /boot/vmlinuz-5.14.21-150400.24.11-default root=LABEL=ROOT console=ttyS0,38400n8 net.ifnames=0 dis_ucode_ldr multipath=off
This indicates that your post-reboot kernel version is 5.14.21-150400.24.11.
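The version checks above can be scripted. The following is a minimal sketch, not an official tool: it pulls the kernel release out of a line of grubby or grub.cfg output and compares it with the thresholds stated in this notice. The function names kernel_release and version_gt are hypothetical, and the comparison assumes that GNU sort with its -V (version sort) option is available. It approximates the package manager's own version ordering rather than reproducing it.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the Backup and DR agent.

# Print the first kernel release (for example 4.18.0-425.13.1) found in the
# text passed as the first argument. Distro suffixes such as .el8.x86_64 or
# -default are dropped.
kernel_release() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+-[0-9]+(\.[0-9]+)*' | head -n 1
}

# Succeed if version $1 sorts strictly higher than version $2.
# Assumes GNU sort, whose -V option orders version strings numerically.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Thresholds taken from this notice.
RHEL_LAST_GOOD=4.18.0-425.3.1
SLES_LAST_GOOD=5.14.21-150400.22.1

# Example using the RHEL sample output shown in this notice.
title='Red Hat Enterprise Linux (4.18.0-425.13.1.el8.x86_64) 8.7 (Ootpa)'
if version_gt "$(kernel_release "$title")" "$RHEL_LAST_GOOD"; then
  echo "affected"
else
  echo "not affected"
fi
```

For SLES, pass the linux line from grub.cfg to kernel_release and compare the result against SLES_LAST_GOOD instead.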
When will my servers freeze?
Because of the kernel incompatibility, a system with Changed Block Tracking functionality enabled may freeze in the following cases:
- Your system is running an affected kernel but has not been rebooted yet. It may freeze if you:
  - Reboot the system.
  - Run sudo /opt/act/cbt/bin/cbt_deactivate.sh
- Your system is running an affected kernel and has already been rebooted. It may freeze at any time.
- Your system is running an older kernel that is not affected. It may freeze if:
  - The current kernel is upgraded to an affected kernel and the system is rebooted.
What you should do
If you have systems that may be impacted, including those that have OS auto-updates turned on, take these immediate proactive measures to mitigate the issue:
If your system is running an affected kernel
- Avoid rebooting these servers before making these changes.
- Do not run /opt/act/cbt/bin/cbt_deactivate.sh
- If you can, turn off the OS auto-update functionality on your Linux VMs.
- Disable backup jobs for all workloads on these servers. This prevents production impact by stopping all backup jobs on these servers.
- Denylist the Changed Block Tracking kernel module and enable Degraded Capture Mode:
  - Log in to the Linux VMs.
  - Follow the steps in How to denylist the Changed Block Tracking kernel module.
  - Reboot the system. If the system freezes while shutting down, forcibly stop it.
  - Set Enable Degraded Capture Mode to YES for the applications by changing the policy settings on the Manage Backup Plan page. For more details, see Policy Settings.
- Do not change the backup method in any other way before the Changed Block Tracking module is denylisted.
If your system is running an unaffected kernel
Turn off the OS auto-update functionality on your Linux VMs. If you cannot turn off auto-update, consider the following options to disable Changed Block Tracking:
Manually disable the Changed Block Tracking functionality and enable Degraded Capture Mode:
- Log in to the Linux VMs.
- Run sudo /opt/act/cbt/bin/cbt_deactivate.sh
- Set Enable Degraded Capture Mode to YES for the applications by changing the policy settings on the Manage Backup Plan page. For more details, see Policy Settings.
From the management console, for applications using Linux Changed Block Tracking:
- Change the backup method from Use Volume Level Backup to Use full+incremental Backup.
- For faster RTO, turn on Force Full Filesystem Backup.
- For more information on this step, see Check the backup method to be used for this database or instance.
What if you encounter a restart failure or system freeze?
If no mitigation steps are taken and a server is restarted, it can enter a soft lockup (the system may freeze). Reach out to the Google Cloud Support Center for help with resolving this issue.
To recover from a soft lockup state
You can denylist the Changed Block Tracking kernel module so that your systems do not experience the freeze issue. Because a frozen system is unavailable for login, you must create a new boot disk to rescue the crashed boot disk. You can use gce-rescue to do this:
- Use Cloud Shell or another Linux shell to install gce-rescue.
- Run gce-rescue with the following command:
sudo $(which gce-rescue) --zone instance-zone --project instance-project --name instance-name
- When the command finishes, connect to the same instance by using the SSH tool in the Cloud Console.
- After connecting to the instance, make sure the old boot disk is mounted at /mnt/sysroot
- Run sudo chroot /mnt/sysroot
- Run mount -a
- Follow the steps in How to denylist the Changed Block Tracking kernel module.
- Restore the original boot disk by running the same rescue command again. The command automatically detects that the instance is in rescue mode and restores it to the original boot disk.
sudo $(which gce-rescue) --zone instance-zone --project instance-project --name instance-name
- Wait for the instance to reboot, then connect to it over SSH. Make sure that the following command outputs nothing:
lsmod | grep act
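The final check can be made stricter than a bare grep for act, which would also match any unrelated module whose name contains those letters. The following is a minimal sketch, assuming that the Changed Block Tracking modules all carry the act_cbt_ prefix used in the denylist rules in this notice. The function name cbt_module_loaded is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads lsmod output on stdin and succeeds only if a
# loaded module name starts with act_cbt_ (the prefix assumed from the
# denylist rules in this notice).
cbt_module_loaded() {
  grep -q '^act_cbt_'
}

# Typical use on the recovered instance:
#   if lsmod | cbt_module_loaded; then echo "CBT module still loaded"; fi
```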
How to denylist the Changed Block Tracking kernel module
Edit or create the file /etc/modprobe.d/blacklist.conf and add the following denylist rules:
blacklist act_cbt_1_14
blacklist act_cbt_1_15
blacklist act_cbt_1_14_0
blacklist act_cbt_1_15_0
Then run the following commands:
sudo sh -c "echo 'install act_cbt_1_14 /bin/true' > /etc/modprobe.d/act_cbt_1_14.conf"
sudo sh -c "echo 'install act_cbt_1_15 /bin/true' > /etc/modprobe.d/act_cbt_1_15.conf"
sudo sh -c "echo 'install act_cbt_1_14_0 /bin/true' > /etc/modprobe.d/act_cbt_1_14_0.conf"
sudo sh -c "echo 'install act_cbt_1_15_0 /bin/true' > /etc/modprobe.d/act_cbt_1_15_0.conf"
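The rules and commands above repeat the same pattern for four module names, so they can be generated in a loop. The following is a minimal sketch under stated assumptions, not a supported script: the function name write_cbt_denylist and its optional directory argument are hypothetical, and the module names are copied from the rules above. It appends each blacklist rule only if it is missing, so it can be run more than once.

```shell
#!/bin/sh
# Hypothetical helper: write the denylist rules for the Changed Block Tracking
# modules into the given modprobe directory (default: /etc/modprobe.d).
write_cbt_denylist() {
  dir="${1:-/etc/modprobe.d}"
  for mod in act_cbt_1_14 act_cbt_1_15 act_cbt_1_14_0 act_cbt_1_15_0; do
    # Append the blacklist rule unless the exact line is already present.
    grep -qxF "blacklist $mod" "$dir/blacklist.conf" 2>/dev/null ||
      echo "blacklist $mod" >> "$dir/blacklist.conf"
    # Map any load attempt to /bin/true, as the commands above do.
    echo "install $mod /bin/true" > "$dir/$mod.conf"
  done
}
```

Calling write_cbt_denylist with no argument targets /etc/modprobe.d and therefore needs root privileges. Passing a scratch directory first lets you inspect the generated files before touching the system.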
When is this issue getting resolved?
Backup and DR Service will provide an update by April 18 that mitigates the issue and prevents the soft lockup. Follow the instructions on the Backup and DR Service page in Cloud Console and perform the update at the earliest opportunity.
Backup and DR Service will follow up with a more detailed update in May that makes the Backup and DR Service kernel module compatible with the newer versions of the Red Hat and SLES kernel packages.