IBM Db2 Warehouse deployment strategies on Google Cloud

This document is the first part of a multi-part series that discusses options for deploying IBM Db2 Warehouse on Google Cloud. This part provides an overview of what Db2 Warehouse is and what it's used for, important points to consider when planning a deployment, and the available deployment options.
The series consists of these parts:

Introduction

IBM Db2 Warehouse is an analytics data warehouse that's deployed using Docker containers. Db2 Warehouse offers scalable in-database analytics, in-memory processing, and support for data applications.

Terminology

The following terms are important for understanding a Db2 Warehouse deployment on Google Cloud.

Data node
Any IBM Db2 Warehouse cluster member.
Head node
A designation for a data node that runs the admin web console and that functions as the cluster manager. There is exactly one head node in each cluster.
IBM Db2 Warehouse cluster
A group of machines that function as one IBM Db2 Warehouse instance and that are configured for connectivity, leader election, failover, and scaling. The cluster consists of one head node and two or more additional data nodes.
Shared file system
A file system that's mounted as a read-write resource on all cluster nodes. This is required by a Db2 Warehouse cluster.
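
The relationships between these terms can be sketched as a small data model. The following Python is illustrative only: the class and field names (`Node`, `Cluster`, `shared_fs_mount`) are hypothetical and are not part of any Db2 Warehouse API. It checks the two rules stated in the definitions above: exactly one head node, and two or more additional data nodes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """A cluster member. Every member is a data node."""
    hostname: str
    is_head: bool = False  # the head node also runs the admin web console


@dataclass
class Cluster:
    """Hypothetical model of a Db2 Warehouse cluster, for illustration."""
    nodes: List[Node]
    shared_fs_mount: str  # path that is mounted read-write on all nodes

    def validate(self) -> List[str]:
        """Return a list of rule violations; an empty list means valid."""
        problems = []
        heads = [n for n in self.nodes if n.is_head]
        if len(heads) != 1:
            problems.append(
                f"expected exactly 1 head node, found {len(heads)}")
        others = len(self.nodes) - len(heads)
        if others < 2:
            problems.append(
                f"expected 2 or more additional data nodes, found {others}")
        return problems
```

For example, a layout with one head node and only one additional data node would produce a single violation, because the definition calls for at least two additional data nodes.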

When to deploy IBM Db2 Warehouse on Google Cloud

You might consider deploying Db2 Warehouse on Google Cloud under the following conditions:

  • You are already operating a Db2 Warehouse on-premises or on another cloud provider, and you want to move ("lift and shift") that workload to Google Cloud without migrating to a different data warehouse product.
  • You are considering a hybrid deployment, such as for disaster recovery or availability purposes.
  • You are already using Db2 Warehouse, and want to use your team's skills and experience to deploy a data warehouse for a new analytics workload.

High-level architecture of a Db2 Warehouse cluster

The following diagram illustrates a typical Db2 Warehouse deployment. The deployment consists of a head node, two data nodes, and a shared file system, as well as an interactive user login and an application connection.

High-level architecture of a Db2 Warehouse deployment on GCP

For all Db2 Warehouse deployments:

  • Db2 Warehouse Docker images are obtained from the Docker Store. These images are required when deploying new instances (for example, scaling) as well as when you need to upgrade existing instances to a new version of Db2 Warehouse.
  • An appropriate license is required beyond any available free trial period to run workloads in production.
  • A Db2 Warehouse cluster assumes a fixed network topology; that is, the nodes' IP addresses and DNS names do not change over time. To change the cluster configuration, you must take down the cluster and make the configuration updates manually.
  • The shared file system that you're using must be mounted in read-write mode on all cluster nodes.
  • Cluster nodes require a wide range of ports to be open in order to facilitate communications.
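
The read-write mount requirement in the list above is something you can verify on each node before starting the cluster. The following is a minimal sketch, assuming Linux nodes that expose their mount table in the `/proc/mounts` layout. The function names and the mount path in the usage note are hypothetical.

```python
from typing import Optional, Set


def mount_options(mount_table: str, mount_point: str) -> Optional[Set[str]]:
    """Return the option set for mount_point, or None if it is not mounted.

    mount_table is text in the /proc/mounts layout:
    device mount_point fs_type options dump pass
    """
    found = None
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry for the same mount point shadows an earlier one.
            found = set(fields[3].split(","))
    return found


def is_mounted_read_write(mount_table: str, mount_point: str) -> bool:
    """True only if mount_point is mounted and carries the rw option."""
    options = mount_options(mount_table, mount_point)
    return options is not None and "rw" in options
```

On a node, you might call `is_mounted_read_write(open("/proc/mounts").read(), "/mnt/clusterfs")`, where `/mnt/clusterfs` stands in for whatever path your shared file system uses. Taking the table as a string keeps the check easy to test without a real mount.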

Deployment considerations

This section outlines the different options for deploying and running Db2 Warehouse on Google Cloud.

Compute resources

The tutorials in this series demonstrate how to deploy Db2 Warehouse on Google Cloud in two different ways: on Compute Engine and on Google Kubernetes Engine (GKE).

When to choose Compute Engine

Compute Engine is a service at the IaaS level. Choose Compute Engine when you need full control over the virtualized computing resources and network configuration. The building blocks of your Db2 Warehouse deployment are virtual machines. IBM Db2 Warehouse is an enterprise-grade data warehouse. Therefore, it's a workload that typically fits with this IaaS approach.

When to choose GKE

GKE takes care of managing Kubernetes clusters for you. Data warehouse workloads like Db2 Warehouse are not typically deployed on GKE, so deploying IBM Db2 Warehouse on GKE requires an advanced understanding of Kubernetes. If your DevOps team is committed to using GKE for deployments in general, it can make sense to use GKE for your Db2 Warehouse deployment.

Storage options

You can choose different shared storage systems for Db2 Warehouse. For example, you can choose NFS volumes, IBM Spectrum Scale File System (GPFS), or GlusterFS. Each of these options requires a fast and reliable network connection between the Db2 Warehouse data nodes and the shared storage system.

You should consider the following when choosing a shared storage system:

  • Licensing and costs. For example, GlusterFS is an open source solution, while IBM Spectrum Scale requires you to purchase a license.
  • Service management. If you require a fully managed solution, you might opt for Filestore. On the other hand, if you need to be able to fine-tune your storage solution, rolling out your own NFS servers or using a distributed file system might fit your needs better.
  • Network. Shared file systems require a fast and reliable network for communication between nodes in the cluster. The bandwidth requirements are a function of the required read and write throughput. The storage solutions differ in where that traffic flows. For example, when you use remote NFS storage, it's imperative to have a fast connection from the nodes to the NFS servers. If the distributed file system is deployed on the nodes themselves, inter-node communications require more bandwidth compared to the remote NFS case.
  • Team experience and skills. If your team is experienced with a given technology or solution, it's often cheaper in terms of time and resources to stick with that technology.
  • Current tooling. If you manage your existing storage solution with an established toolchain, it could be expensive and time-consuming to adopt a separate toolchain just to manage the newly deployed storage components.
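
The network consideration above says that bandwidth is a function of read and write throughput. As a rough illustration, the following sketch converts a throughput target into a network bandwidth figure. The function name and the `write_copies` parameter are hypothetical, and the model is a simplifying assumption: it treats each replicated write as one extra copy on the network and ignores protocol overhead.

```python
def required_gbps(read_mb_per_s: float,
                  write_mb_per_s: float,
                  write_copies: int = 1) -> float:
    """Estimate network bandwidth in Gbit/s for a storage throughput target.

    write_copies models a replicated file system in which every write is
    sent to several replicas (an assumption; 1 means one remote copy).
    """
    if write_copies < 1:
        raise ValueError("write_copies must be at least 1")
    megabytes_per_s = read_mb_per_s + write_mb_per_s * write_copies
    # 1 MB/s is 8 Mbit/s, and 1000 Mbit/s is 1 Gbit/s.
    return megabytes_per_s * 8 / 1000
```

Under these assumptions, a target of 500 MB/s of reads and 250 MB/s of writes needs about 6 Gbit/s with a single copy and about 10 Gbit/s with three write copies. Treat the result as a starting point for sizing, not as a measured requirement.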

In the tutorials in this series, you use two different storage types:

  • NFS. You use Filestore, which is a high-performance, fully managed file storage service that has a file system interface and can be used to share data. Data volumes managed with Filestore are accessible by using the Network File System (NFS) protocol.
  • GlusterFS. GlusterFS is a distributed file system designed for scalability and replication. It offers multiple access interfaces, including GlusterFS native client, a FUSE-based client, and NFS. With GlusterFS, you can also fine-tune the storage backend configuration (for example, the number of replicas) for your environment. A GlusterFS server can run either on Compute Engine or in a Kubernetes cluster (in this case, created and managed in GKE). When you configure GlusterFS, you should also provision dedicated block storage devices that are managed exclusively by the GlusterFS daemons.
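
The number of replicas affects how much block storage you need to provision for GlusterFS. The following sketch estimates the usable capacity of a replicated volume, assuming equally sized storage devices (bricks) and a device count that is a multiple of the replica count. The function name and the example sizes are hypothetical.

```python
def replicated_volume_capacity_gb(brick_count: int,
                                  brick_size_gb: float,
                                  replicas: int) -> float:
    """Usable capacity of a replicated volume built from equal bricks.

    Every file is stored once per replica, so the usable capacity is the
    raw capacity divided by the replica count.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if brick_count < replicas or brick_count % replicas != 0:
        raise ValueError("brick_count must be a positive multiple of replicas")
    return brick_count * brick_size_gb / replicas
```

For example, six 500 GB devices with three replicas give 3,000 GB of raw capacity and 1,000 GB of usable capacity.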

What's next