Strategies to migrate IBM Db2 to Compute Engine

This document describes best practices for a homogeneous Db2 migration to Compute Engine. It is intended for database admins, system admins and software, database, and ops engineers who are migrating Db2 environments to Google Cloud. Migrations from Db2 to other types of databases are outside the scope of this document.

Terminology

IBM Db2: An enterprise grade relational database management system (RDBMS), with replication and failover capabilities.
High availability disaster recovery (HADR): A capability for Db2 that uses database-logged activities to replicate data from the primary database to the standby. This feature enables a manual failover from the primary database to the standby database.
Primary: The machine hosting Db2 that accepts writes as well as read requests. This machine is the source of the replication to the standby machines.
Principal standby: A standby machine that can accept read requests only. This machine supports any of the synchronization modes that Db2 allows and is designated to be a failback instance for HADR purposes. IBM Db2 allows only one such machine in a cluster.
Auxiliary standby: A standby machine that can accept read requests only. This machine supports only the SUPERASYNC synchronization mode and resides in a different data center than the primary machine in case of a manual failover if the main data center fails.
Tivoli System Automation for Multiplatforms (TSA/MP): Cluster management software that facilitates automatic resource failover from the primary database to the principal standby. This software includes network, storage, and compute resources that are defined as part of the cluster. Db2 enterprise edition comes with TSA/MP entitlement for HADR included.
Automatic client reroute (ACR): A Db2 feature that redirects client apps from a failed server to an alternate server so that the apps can continue working with minimal interruption.
Change data capture (CDC): A set of techniques or tools to detect changes in a database, for purposes such as synchronizing data with another database or creating an audit trail.

Architecture

A Db2 cluster usually consists of at least a primary node and a principal standby node with HADR between them. In newer versions of Db2, you can also add auxiliary standby nodes that serve DR purposes.

The following diagram depicts a source environment.

Architecture of typical source environment in two data centers.

In this environment, the primary and the principal standby are in one data center and the auxiliary standbys are in different data centers.

A migration goal is to recreate this environment on Google Cloud as shown in the following diagram.

Architecture of source environment recreated on Google Cloud.

The following table compares aspects of each type of migration.

	Migrate to Virtual Machines	Q replication	SQL Replication	HADR
Sources	VMware, Amazon Web Services (AWS) VMs	Any Db2 environment, based on licensing	Any Db2 environment	Any Db2 environment
What is replicated?	Block-level replication of disks	Tables in the database	Tables in the database	Entire database
Cutover	Requires a few minutes for a VM to launch in Google Cloud	Point apps and DNS to the Compute Engine instances	Point apps and DNS to the cloud instances	Point apps and DNS to the cloud instances
DDL change replication	Yes (disk writes being replicated)	Yes	Yes	Yes
Synchronous data replication	N/A	No	Yes	Yes
Asynchronous data replication	N/A	Yes	Yes	Yes
Point-in-time data replication	No	Yes	Yes	No

Use the preceding table to match your system availability requirements against the effort that is needed to set up the target system, set up replication, and maintain and test the replication over time. The table shows that Migrate to VMs is the easiest approach to implement, but the least flexible in terms of system availability. HADR, Q replication, and SQL Replication have a lower impact on system availability, in exchange for a higher level of effort to set up and maintain the replication in a parallel model.
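As an illustration, the comparison can be encoded as data so that the trade-offs are easy to filter. The following Python sketch is not part of any Db2 or Google Cloud tooling: the `APPROACHES` dictionary, its keys, and the `candidates` function are hypothetical names, and the values are copied from the preceding table (`None` stands for "N/A").

```python
# Capabilities copied from the comparison table; None means "N/A".
APPROACHES = {
    "Migrate to Virtual Machines": {
        "synchronous": None, "asynchronous": None, "point_in_time": False,
    },
    "Q replication": {
        "synchronous": False, "asynchronous": True, "point_in_time": True,
    },
    "SQL Replication": {
        "synchronous": True, "asynchronous": True, "point_in_time": True,
    },
    "HADR": {
        "synchronous": True, "asynchronous": True, "point_in_time": False,
    },
}


def candidates(**required):
    """Return the approaches whose table entries equal every required value."""
    return [
        name
        for name, caps in APPROACHES.items()
        if all(caps.get(key) == value for key, value in required.items())
    ]


print(candidates(synchronous=True))
print(candidates(point_in_time=True, synchronous=False))
```

The filter only narrows the list by capability. Effort, licensing, and source platform still need to be weighed separately.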

Migration types

There are two ways to migrate Db2 to Compute Engine:

Migrations that involve modifying an existing cluster configuration or topology.
Migrations that replicate data into completely new clusters.

Modifying an existing cluster doesn't require launching a completely new cluster in the cloud and, therefore, can be faster. The other way to migrate requires that you deploy a new cluster to Google Cloud, but it has a smaller impact on the existing cluster because the replication is out-of-band. This method is also handy if you want to replicate only a part of the database or perform transformations on the data before it lands in the target.

The following sections discuss what to consider before you move your Db2 instances to Google Cloud. Some commonly used capabilities might not work as-is on Google Cloud or might need some configuration changes.

Floating (virtual) IP addresses

In a highly available Db2 cluster, TSA/MP can assign a virtual IP address to the primary node. This address is also called a floating IP address and means that traffic is always routed to the primary node and not the standby.

Compute Engine uses a virtualized network stack in a Virtual Private Cloud (VPC) network, so typical implementation mechanisms might not work. For example, the VPC network handles Address Resolution Protocol (ARP) requests based on the configured routing topology, and ignores gratuitous ARP frames. In addition, it's impossible to directly modify the VPC network routing table with standard routing protocols such as Open Shortest Path First Protocol (OSPF) or Border Gateway Protocol (BGP). Therefore, you must either implement an alternative to floating IP addresses or use ACR.

If you are moving some or all of the nodes in a Db2 cluster, make sure to disable floating IP addresses for your cluster before you move any nodes.

ACR

If your Db2 environment uses ACR, you might need to change the catalog on your clients if the DNS names change or if your clients connect by using IP addresses.
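The following Python sketch shows one way to generate the Db2 command line processor (CLP) commands involved. It only builds command strings and doesn't run them. The node name, host names, and port are placeholders, and the function names are hypothetical. Check the exact commands against the documentation for your Db2 version before you use them.

```python
def recatalog_commands(node: str, new_host: str, port: int) -> list[str]:
    """Commands to run on a client to point a cataloged node at a new host."""
    return [
        f"db2 uncatalog node {node}",
        f"db2 catalog tcpip node {node} remote {new_host} server {port}",
        "db2 terminate",  # refresh the directory cache in the CLP session
    ]


def alternate_server_command(db_alias: str, host: str, port: int) -> str:
    """Command to run on the server to set the ACR alternate server."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return (
        f"db2 update alternate server for database {db_alias} "
        f"using hostname {host} port {port}"
    )


# Placeholder values for illustration only.
for command in recatalog_commands("db2prod", "db2-primary.example.internal", 50000):
    print(command)
print(alternate_server_command("SAMPLE", "db2-standby.example.internal", 50000))
```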

Tiebreakers

TSA/MP requires that the majority of the cluster nodes are online to start automation actions. If the cluster consists of an even number of nodes, there's a chance that exactly half of the nodes of the cluster are online, and there's a chance for a split-brain scenario. In this case, TSA/MP uses a tiebreaker to decide the quorum (the majority group) state, which determines whether automation actions can be started.
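The quorum rule can be sketched in a few lines of Python. This is a simplified model of the rule described above, not the TSA/MP implementation. The function name, the `tiebreaker_won` parameter, and the returned strings are hypothetical.

```python
def quorum_state(online: int, total: int, tiebreaker_won: bool = False) -> str:
    """Decide whether a group of online nodes can start automation actions."""
    if total <= 0 or not 0 <= online <= total:
        raise ValueError("online must be between 0 and total, and total > 0")
    if 2 * online > total:
        return "quorum"  # a strict majority is online
    if 2 * online == total:
        # Exactly half of the nodes are online: the tiebreaker decides.
        return "quorum" if tiebreaker_won else "no quorum"
    return "no quorum"  # a minority can never act, even with the tiebreaker


print(quorum_state(online=2, total=3))
print(quorum_state(online=1, total=2, tiebreaker_won=True))
```

In a two-node cluster, each half of a partitioned cluster evaluates this rule on its own, so only the side that wins the tiebreaker can proceed.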

Consider the following tiebreakers that your Db2 environment might use:

Storage or disk tiebreaker. IBM Db2 uses disk reservations in order to break the tie. Because reservations aren't available on Google Cloud, you must choose a different type of tiebreaker.
Network tiebreaker. Uses an external (to the cluster) IP address to resolve a tie situation. In a hybrid deployment, your network tiebreaker might not need to move to Google Cloud initially as long as it is reachable from the cluster nodes. After your cluster runs on Google Cloud, however, a good practice is to create the tiebreaker in a different zone or use the Google Cloud metadata server as the tiebreaker.
NFS tiebreaker. The NFS tiebreaker resolves tie situations based on reserve files that are stored on an NFS v4 server. As with the network tiebreaker, the NFS tiebreaker and the NFS v4 server can also remain in their original location in a hybrid deployment. Later, a better practice is to deploy your own NFS server or use partners like Elastifile as the NFS tiebreaker targets on Google Cloud.

Migrating using Migrate to VMs

If both of the following are true for your environment, Migrate to VMs is your recommended option:

You have a VMware vCenter environment or virtual machines on Amazon Elastic Compute Cloud (Amazon EC2).
You have a private connection from Google Cloud to your environment such as Cloud VPN or Cloud Interconnect.

Migrate to VMs is for migrating virtual machines from on-premises and cloud environments to Google Cloud. It lets you migrate a virtual machine to Google Cloud in a few minutes. The data is copied in the background while the virtual machines remain completely operational. You must have a private connection between your source environment and your Google Cloud project, such as Cloud VPN, Cloud Interconnect, or Partner Interconnect.

With Migrate to VMs, you need to reevaluate the database configuration on the cloud VMs. Some configurations might not be optimized for Google Cloud, such as registry variables, buffer pools, database manager configuration or database configuration. You can use the AUTOCONFIGURE utility to start with a baseline.
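As a starting point, the following Python sketch builds an AUTOCONFIGURE command string without running it. The function name and the memory percentage are hypothetical. The sketch assumes that you run the resulting command from a CLP session that is already connected to the database, and it defaults to `APPLY NONE` so that Db2 only reports recommendations instead of changing the configuration.

```python
APPLY_MODES = ("DB ONLY", "DB AND DBM", "NONE")


def autoconfigure_command(mem_percent: int, apply: str = "NONE") -> str:
    """Build a Db2 AUTOCONFIGURE command for a connected CLP session."""
    if not 1 <= mem_percent <= 100:
        raise ValueError("mem_percent must be between 1 and 100")
    mode = apply.upper()
    if mode not in APPLY_MODES:
        raise ValueError(f"apply must be one of {APPLY_MODES}")
    return f'db2 "autoconfigure using mem_percent {mem_percent} apply {mode.lower()}"'


# Review the recommendations first, then apply them in a second run.
print(autoconfigure_command(80))
print(autoconfigure_command(80, apply="DB AND DBM"))
```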

The Migrate to VMs operational methodology is detailed in VM migration lifecycle.

The following sections outline how to apply this methodology for a Db2 environment.

Test clones

Test clones are available only on vCenter environments.

Migrate to VMs can take a snapshot of your VM and create a ready-to-go compute instance on Google Cloud based on that snapshot. You can recreate your Db2 environment on Google Cloud, try configuration changes, test, and benchmark the deployment without any consequences to your source environment.

The following diagram shows your Db2 environment with the side-by-side environment on Google Cloud after a Migrate to VMs test clone.

Architecture of side-by-side environment after a test cloning.

After you benchmark and test the test clones on Google Cloud, you can delete the test clones.

Run-in-cloud

When you activate run-in-cloud, Migrate to VMs shuts down your source cluster and starts the VMs on Google Cloud, fetching data only as needed instead of streaming the entire storage to Google Cloud. Run-in-cloud supports write-back, which is enabled by default. Migrate to VMs helps you test your environment before actively streaming the storage. You can also move the VM back to your source environment by using the move back feature. In cloud-to-cloud migrations, you cannot replicate writes back to the source.

The following diagram shows the run-in-cloud phase, if you set all your nodes to run in the cloud. You can decide to gradually move cluster nodes instead of the entire cluster at once.

Architecture of run-in-cloud phase with all nodes set to run in the cloud.

Migration

The migration phase is similar to the run-in-cloud phase, but Migrate to VMs also actively streams the storage to Google Cloud. During the run-in-cloud phase, Migrate to VMs brings data only on demand to save bandwidth, because you haven't yet indicated that you are ready to move the VM completely.

Detach

During this phase, Migrate to VMs syncs the data from its cache and object store to the native data disks on Google Cloud, and then attaches the disks to the VM. This phase requires that you shut down the VM on Google Cloud. For Db2, we recommend detaching one node of the cluster at a time.

Using replication

For Db2, replication is the process of capturing changes from the transaction log by using a program called the capture program, and then applying them to a different cluster using the apply program. The way the capture program captures the changes and the type of communication channel used to transmit the changes to the apply program differ between the replication types.

The following diagram shows the logical flow of information in Db2 replication.

Architecture of flow of information in Db2 replication.

The capture app captures changes from the database and sends the changes to the apply app. The apply app writes those changes to the target database. There are some transformations that the apps can do on the data itself. The capture and apply applications don't necessarily need to run on the database server itself.
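The flow can be modeled with a toy example. The following Python sketch is a conceptual illustration only and doesn't use any Db2 API: the log record layout, the `capture_changes` and `apply_changes` functions, and the in-memory queue that stands in for the communication channel are all assumptions made for the example.

```python
from collections import deque


def capture_changes(log, from_lsn):
    """Toy capture program: return committed log records newer than from_lsn."""
    return [rec for rec in log if rec["lsn"] > from_lsn and rec["committed"]]


def apply_changes(changes, target):
    """Toy apply program: replay the captured changes on the target table."""
    for rec in changes:
        if rec["op"] == "delete":
            target.pop(rec["key"], None)
        else:  # insert or update
            target[rec["key"]] = rec["value"]
    return target


log = [
    {"lsn": 1, "op": "insert", "key": "a", "value": 1, "committed": True},
    {"lsn": 2, "op": "update", "key": "a", "value": 2, "committed": True},
    {"lsn": 3, "op": "insert", "key": "b", "value": 9, "committed": False},
    {"lsn": 4, "op": "delete", "key": "a", "value": None, "committed": True},
    {"lsn": 5, "op": "insert", "key": "c", "value": 3, "committed": True},
]

# The queue stands in for the channel between the capture and apply programs.
channel = deque(capture_changes(log, from_lsn=0))
target = apply_changes(channel, {})
print(target)  # {'c': 3}
```

The uncommitted record with LSN 3 never reaches the target, which mirrors the point that only committed transactional data is replicated.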

SQL Replication

SQL Replication captures changes to source tables and views and uses staging tables to store committed transactional data. The changes are then read from the staging tables and replicated to corresponding target tables. At the time of writing this document, when you install Db2, SQL Replication is available to you.

A migration process leveraging SQL Replication would look like this:

Deploy Db2 on Google Cloud.
Configure SQL Replication.
Start SQL Replication.
Verify that the deployments are in sync.
Point your apps to the Google Cloud instance.
Stop the replication.
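For the verification step, one coarse check is to compare row counts per table on both sides. The following Python sketch assumes that you have already collected the counts yourself (for example, with `SELECT COUNT(*)` queries) into two dictionaries. The function name and the table names are hypothetical. Matching row counts don't prove that the data is identical, so treat this as a first sanity check only.

```python
def out_of_sync(source_counts, target_counts):
    """Return {table: (source, target)} for tables whose row counts differ.

    A table that is missing on one side is reported with None for that side.
    """
    tables = sorted(set(source_counts) | set(target_counts))
    return {
        table: (source_counts.get(table), target_counts.get(table))
        for table in tables
        if source_counts.get(table) != target_counts.get(table)
    }


# Hypothetical counts gathered from the source and the Google Cloud target.
source = {"ORDERS": 1200, "CUSTOMERS": 300, "ITEMS": 50}
target = {"ORDERS": 1198, "CUSTOMERS": 300}
print(out_of_sync(source, target))  # {'ITEMS': (50, None), 'ORDERS': (1200, 1198)}
```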

The following diagram is an example of SQL Replication.

SQL replication of source environment on Google Cloud.

Your production environment works as usual while the captured changes are replicated to the new cluster that you create on Google Cloud. In the preceding diagram, the replication process runs on the primary instance, but there are different ways to deploy it that are outside of the scope of this document.

Q replication

Q replication is a newer and more efficient way than SQL Replication to replicate data from one Db2 instance to another. This method uses IBM MQ to deliver data change entries, which means that you have to deploy an instance of IBM MQ in both the source environment and the target environment. Q replication is faster than SQL Replication because it works in memory, but it is usually more difficult to set up because you need to deploy IBM MQ. Depending on your Db2 license, you might have to acquire a license for Q replication.

When you start the Db2 Q replication, you can choose between the following two methods:

Automatic loading. The Q replication processes perform the initial load, which means restoring the target database from a backup of the source.
Manual loading. You perform the initial load, and then start the replication from the point in the log.

A migration process looks like this:

Deploy IBM MQ on Google Cloud and in your source environment.
Deploy Db2 on Google Cloud.
Configure Q replication.
Start Q replication (either with manual loading or automatic loading).
Verify that the two deployments are in sync.
Point your applications to the Google Cloud instance.
Stop the replication.

The following diagram shows a possible Q replication solution.

Architecture of Q replication of source environment on Google Cloud.

The source environment uses IBM Q replication to send the database changes to IBM MQ and the target environment.

Extending a Db2 cluster to Compute Engine

In this approach, you gradually move your existing Db2 cluster to Compute Engine and rely on HADR for the data transfer between nodes.

Use this approach if you meet the following conditions:

You don't want to deploy an entirely new cluster on Compute Engine.
You cannot leverage Migrate to VMs.
You cannot use one of the replication options.
You don't want to or cannot use a partner product (because of licensing, costs, or compliance, to name a few reasons).

If your Db2 version doesn't support auxiliary standby

You can do the following:

Deploy a Db2 instance on Compute Engine.
Take a backup from your primary instance.
Restore the Db2 instance on Compute Engine from backup.
Remove the standby instance from the HADR setup.
Attach the Compute Engine Db2 instance as a standby (you can choose your sync mode, but due to possible higher latency, ASYNC or NEARSYNC might be preferable).
Fail over to the Compute Engine Db2 instance and make it the HADR primary.
Create another Compute Engine Db2 instance, restore it from backup, and set up as HADR standby.
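The standby-side part of these steps can be sketched as a list of Db2 CLP commands. The following Python example only builds command strings and doesn't run them. The function names, database name, host names, ports, instance name, and backup path are placeholders. The sketch leaves out the primary-side HADR configuration and the backup itself, and you should verify each command against the documentation for your Db2 version.

```python
SYNC_MODES = ("SYNC", "NEARSYNC", "ASYNC", "SUPERASYNC")


def standby_commands(db, local_host, local_port, remote_host, remote_port,
                     remote_inst, backup_dir, syncmode="ASYNC"):
    """Commands to run on the new Compute Engine instance to join HADR."""
    if syncmode not in SYNC_MODES:
        raise ValueError(f"syncmode must be one of {SYNC_MODES}")
    return [
        # Restore from the backup that was taken on the primary.
        f"db2 restore db {db} from {backup_dir}",
        # Point the standby at itself (local) and at the primary (remote).
        f"db2 update db cfg for {db} using "
        f"HADR_LOCAL_HOST {local_host} HADR_LOCAL_SVC {local_port} "
        f"HADR_REMOTE_HOST {remote_host} HADR_REMOTE_SVC {remote_port} "
        f"HADR_REMOTE_INST {remote_inst} HADR_SYNCMODE {syncmode}",
        f"db2 start hadr on db {db} as standby",
    ]


def takeover_command(db):
    """Command to run on the standby to make it the HADR primary."""
    return f"db2 takeover hadr on db {db}"


# Placeholder values for illustration only.
for command in standby_commands("SAMPLE", "db2-gce-1", 51000, "db2-onprem-1",
                                51000, "db2inst1", "/backup"):
    print(command)
print(takeover_command("SAMPLE"))
```

Keeping the commands as data makes it easier to review the plan before anyone runs it against production.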

The first step in the following diagram shows the newly created Db2 instance on Google Cloud set up as the principal standby of the source Db2 primary.

Architecture of Db2 instance on Google Cloud set up as principal standby.

In the preceding diagram, the Google Cloud instance becomes the HADR primary. Then you remove the source principal standby and attach another Db2 instance on Compute Engine as the principal standby instance.

If your Db2 version supports auxiliary standby

One option is to follow the same steps as when your Db2 version doesn't support auxiliary standby, and at the end, move the auxiliary standby instances as well.

Another option is to leverage the auxiliary replicas for a more fault-tolerant move to Google Cloud, because at no point is one of the primary and principal standby in your source environment while the other is on Google Cloud. The following list outlines the steps for this second option:

Deploy Db2 instances on Compute Engine (primary, principal, auxiliaries if needed) to their locations.
Remove the auxiliary standby nodes from the source cluster.
Configure the nodes that will become the primary and principal as auxiliary standbys of the source cluster.
Perform a takeover of one of the Compute Engine instances. This instance becomes the primary instance.
Configure one of the other Compute Engine instances as principal standby of the primary instance.
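In a multiple-standby setup, each database lists its standbys in the HADR_TARGET_LIST configuration parameter as `host:port` entries separated by `|`, with the principal standby first. The following Python sketch builds that value and the matching CLP command. The function names, host names, and ports are placeholders, and the sketch doesn't check version-specific limits such as the maximum number of standbys.

```python
def target_list(principal, auxiliaries):
    """Build a HADR_TARGET_LIST value; the principal standby comes first."""
    entries = [principal, *auxiliaries]
    if len(set(entries)) != len(entries):
        raise ValueError("each standby must appear only once")
    return "|".join(f"{host}:{port}" for host, port in entries)


def target_list_command(db, principal, auxiliaries):
    """Build the CLP command; quotes keep the shell from interpreting '|'."""
    value = target_list(principal, auxiliaries)
    return f'db2 "update db cfg for {db} using HADR_TARGET_LIST {value}"'


# Placeholder hosts: one principal standby and two auxiliary standbys.
print(target_list_command(
    "SAMPLE",
    principal=("db2-gce-2", 51000),
    auxiliaries=[("db2-gce-3", 51000), ("db2-gce-4", 51000)],
))
```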

The first step depicted in the following diagram shows two of the newly created Db2 instances on Compute Engine.

Architecture of auxiliary Db2 instances on Google Cloud.

The instances are set up as auxiliary standbys of the source Db2 primary instance instead of the auxiliary instances in the source environment. Then, after invoking takeover to one of the Compute Engine instances, that instance becomes the HADR primary and one other instance is configured as principal standby. In the last step, two other instances are configured as auxiliary standbys.

Partner products

Google has several partners who have products to help with such a migration. Most of these products leverage CDC to replicate data between the source Db2 and the target. These products aren't Google Cloud products, and you need to check licensing and pricing for each product or service. Usually, these products replicate data from an existing cluster to a different cluster that you create on Google Cloud, and the overall approach is similar to the replication scenarios described in this document.

The following are a few partner products:

What's next

IBM Db2 for SAP planning guide.
Google Cloud database migration center.
Explore reference architectures, diagrams, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.