Deploying IBM Db2 Warehouse on Google Kubernetes Engine with GlusterFS storage

This tutorial shows how to create an IBM Db2 Warehouse cluster on Google Kubernetes Engine (GKE) with a GlusterFS file system running in the Kubernetes cluster as the storage layer. GlusterFS is an open-source, scalable network file system.

This tutorial is useful if you are a sysadmin, developer, engineer, or database administrator and you want to deploy an IBM Db2 Warehouse cluster on Google Cloud.

For an overview of IBM Db2 Warehouse and deployment options on Google Cloud, see the series overview.

In this tutorial, you use the following software:

  • Ubuntu Server 16.04
  • IBM Db2 Warehouse Enterprise Edition
  • IBM Db2 Warehouse Client
  • GlusterFS file system

Objectives

  • Get access to the IBM Db2 Warehouse Docker images in the Docker Store.
  • Provision a custom service account that has only the permissions required for this architecture.
  • Launch your GKE cluster.
  • Verify that the cluster is operational.
  • Initialize Docker Store authentication in the Kubernetes cluster.
  • Deploy and run GlusterFS in the cluster.
  • Deploy and run IBM Db2 Warehouse containers in the cluster.
  • Upload sample data in IBM Db2 Warehouse.
  • Connect to the IBM Db2 administration console and test the deployment.

Costs

This tutorial uses billable components of Google Cloud, including Google Kubernetes Engine (GKE), Compute Engine, Cloud NAT, and Cloud Load Balancing.

Use the Pricing Calculator to generate a cost estimate based on your projected usage.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

  4. Enable the GKE API.

    Enable the API

  5. If you don't have a Docker ID, create one in the Docker Store.

In this tutorial, you use IBM Db2 Warehouse Enterprise Edition. If you don't already have a license for this software, you might be able to use a free trial version for this tutorial.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Cleaning up.

Architecture

In this tutorial, you deploy a Kubernetes cluster using GKE in three different Google Cloud zones. In the cluster, you deploy three instances of IBM Db2 Warehouse:

  • An instance named db2wh-1 is initially designated as the head node.
  • Instances named db2wh-2 and db2wh-3 are initially designated as data nodes.

The role (head or data node) of individual instances can change if the head node fails over.

You also deploy GlusterFS on the three nodes in the cluster; it serves as the shared file system for the IBM Db2 Warehouse instances.

The architecture is shown in the following diagram:

Architecture

Getting access to the IBM Db2 Warehouse Docker images

In this tutorial, you use your Docker Store account to get access to a free trial version of IBM Db2 Warehouse Enterprise Edition in the Docker Store. This involves downloading two separate images: a server image and a client image.

  1. In your browser, go to the IBM Db2 Warehouse EE Docker image.
  2. Sign in using your Docker username and password.
  3. Click Proceed to checkout.
  4. Fill in your details.
  5. If you agree to the terms, select the I agree ... and I acknowledge ... checkboxes on the right.
  6. Click Get Content.

    This takes you to the Setup page. You don't need to follow those instructions, because you'll perform those steps later in the tutorial.

  7. Repeat the process for the IBM Db2 Warehouse Client image.

Preparing your environment

In this tutorial, you use us-central1 as the default region and us-central1-b as the default zone. To save time typing your Compute Engine zone options in the gcloud command-line tool, you set the region and zone as defaults.

You perform most of the steps for this tutorial in Cloud Shell. When you open Cloud Shell, you can also automatically clone the GitHub repo that's associated with this tutorial.

  1. Open Cloud Shell and clone the GitHub repo for this tutorial:

    OPEN Cloud Shell

  2. Set the default region and zone:

    gcloud config set compute/region us-central1
    gcloud config set compute/zone us-central1-b
    
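If you want to confirm the defaults before you continue, you can read them back. This optional check assumes you run it in the same Cloud Shell session in which you set the values:

    # Print the region and zone that later gcloud commands will use by default.
    gcloud config get-value compute/region
    gcloud config get-value compute/zone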

Provisioning a service account to manage GKE clusters

In this tutorial, you create a service account to manage Compute Engine instances. The GKE cluster nodes will use this service account instead of the default service account. It's a best practice to limit the service account to just the roles and access permissions that are required in order to run the application.

For this tutorial, the following roles are needed:

  • The Compute Admin role (roles/compute.admin). This role provides full control of all Compute Engine resources. The service account needs this role in order to manage persistent disks for GlusterFS volumes.
  • The Service Account User role (roles/iam.serviceAccountUser). This role provides access to all service accounts in the project, including service accounts that might be created in the future. The service account needs this role in order to attach persistent disks for GlusterFS volumes.
  1. In Cloud Shell, create an environment variable that stores the service account name:

    export GKE_SERVICE_ACCOUNT_NAME=db2dw-gke-service-account
    
  2. Create a service account:

    gcloud iam service-accounts create $GKE_SERVICE_ACCOUNT_NAME \
        --display-name=$GKE_SERVICE_ACCOUNT_NAME
    
  3. Create an environment variable that stores the service account's email address:

    export GKE_SERVICE_ACCOUNT_EMAIL=$(gcloud iam service-accounts list \
        --format='value(email)' \
        --filter=displayName:"$GKE_SERVICE_ACCOUNT_NAME")
    
  4. Bind the roles/compute.admin role to the service account:

    gcloud projects add-iam-policy-binding $(gcloud config get-value project 2> /dev/null) \
        --member serviceAccount:$GKE_SERVICE_ACCOUNT_EMAIL \
        --role roles/compute.admin
    
  5. Bind the roles/iam.serviceAccountUser role to the service account:

    gcloud projects add-iam-policy-binding $(gcloud config get-value project 2> /dev/null) \
        --member serviceAccount:$GKE_SERVICE_ACCOUNT_EMAIL \
        --role roles/iam.serviceAccountUser
    

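Optionally, you can confirm that both role bindings are in place before you create the cluster. This check is a sketch that assumes your active gcloud configuration points to the project you used above:

    # List the roles granted to the new service account in the current project.
    gcloud projects get-iam-policy $(gcloud config get-value project 2> /dev/null) \
        --flatten="bindings[].members" \
        --filter="bindings.members:serviceAccount:$GKE_SERVICE_ACCOUNT_EMAIL" \
        --format="table(bindings.role)"
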
Preparing the GKE cluster

In this section, you launch the GKE cluster, grant permissions, and finish the cluster configuration.

Launch the GKE cluster

You can now create and launch the GKE cluster.

  • In Cloud Shell, create a regional, private GKE cluster with a single node in each zone:

    gcloud container clusters create ibm-db2dw-demo \
        --enable-ip-alias \
        --image-type=ubuntu \
        --machine-type=n1-standard-16 \
        --metadata disable-legacy-endpoints=true \
        --node-labels=app=db2wh \
        --node-locations us-central1-a,us-central1-b,us-central1-c \
        --no-enable-basic-auth \
        --no-issue-client-certificate \
        --num-nodes=1 \
        --region us-central1 \
        --service-account=$GKE_SERVICE_ACCOUNT_EMAIL \
        --enable-private-nodes \
        --enable-master-authorized-networks \
        --master-authorized-networks=0.0.0.0/0 \
        --master-ipv4-cidr "172.18.0.0/28"
    

    This creates a cluster named ibm-db2dw-demo.

    Note that you set the IP address range for the master (control plane) nodes with this option:

    --master-ipv4-cidr "172.18.0.0/28"

    If this range is already in use in your project, change it accordingly.

Because you're creating this cluster with just one node pool (the default one), all of the cluster's nodes carry the app=db2wh label and are therefore eligible to run IBM Db2 Warehouse workloads; only labeled nodes are eligible to host IBM Db2 Warehouse pods. If you want more separation (for example, dedicated nodes for IBM Db2 Warehouse), you can create either a new node pool or a dedicated cluster.

Master nodes of this cluster will accept connections from all networks, because you set the --master-authorized-networks option to 0.0.0.0/0. This is useful for managing the cluster from Cloud Shell. In a production environment, consider a more locked-down approach, like using a bastion host.

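One of the objectives is to verify that the cluster is operational. A quick way to do that from Cloud Shell (which can reach the control plane because of the 0.0.0.0/0 authorized network) is to fetch the cluster credentials and list the labeled nodes:

    # Configure kubectl for the new cluster. The create command usually does this
    # already in Cloud Shell, so re-running it is harmless.
    gcloud container clusters get-credentials ibm-db2dw-demo --region us-central1

    # You should see three nodes carrying the app=db2wh label, one per zone.
    kubectl get nodes -l app=db2wh
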
Configuring a Cloud NAT gateway for external internet access

Because you configured a private GKE cluster, you also have to configure a Cloud Router and a Cloud NAT gateway so that the GKE nodes can reach the internet.

  1. In Cloud Shell, create a Cloud Router:

    gcloud compute routers create nat-router --network default
    
  2. Create a Cloud NAT gateway:

    gcloud compute routers nats create nat-config \
        --router nat-router \
        --nat-all-subnet-ip-ranges \
        --auto-allocate-nat-external-ips
    

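To confirm that the NAT gateway is in place, you can describe it. This optional check assumes the default us-central1 region that you configured earlier:

    # Show the Cloud NAT configuration that the private GKE nodes use for egress.
    gcloud compute routers nats describe nat-config \
        --router nat-router \
        --region us-central1
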
Manage Docker Store authentication

In this tutorial, you create a Kubernetes secret to store your Docker Store credentials so that your GKE cluster can download the IBM Db2 Warehouse Docker images from the Docker Store. For more details, see the relevant section of the Kubernetes documentation. This approach is also valid for private Docker registry instances.

  1. In Cloud Shell, log in to the Docker Store (the Docker registry instance you're going to use):

    docker login
    
  2. Create a Kubernetes secret with your Docker Store credentials:

    kubectl create secret generic dockerstore \
        --type=kubernetes.io/dockerconfigjson \
        --from-file=.dockerconfigjson="$HOME"/.docker/config.json
    

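Pods that pull images from the Docker Store need to reference this secret through the imagePullSecrets field of their pod specification; the manifests in the tutorial repo are expected to do that already. The following optional check only confirms that the secret exists and has the expected type:

    # The output should be kubernetes.io/dockerconfigjson.
    kubectl get secret dockerstore --output=jsonpath='{.type}{"\n"}'
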
Grant Cluster Admin privileges to the user

You need to grant your user the ability to create new roles in GKE, as described in the GKE documentation.

  • In Cloud Shell, grant the permission to create new roles to your user:

    kubectl create clusterrolebinding cluster-admin-binding \
        --clusterrole cluster-admin \
        --user $(gcloud config list \
        --format 'value(core.account)')
    

Deploy GlusterFS

The next task is to deploy GlusterFS in the cluster. As part of this task, you deploy Heketi, which provides a RESTful API for managing GlusterFS volumes.

  1. In Cloud Shell, download the GlusterFS distribution package:

    wget -q https://github.com/gluster/gluster-kubernetes/archive/master.zip
    
  2. Extract GlusterFS deployment scripts from the package:

    unzip master.zip
    
  3. Create a ConfigMap to hold the node initialization script:

    kubectl apply -f solutions-db2wh/gluster/cm-entrypoint.yaml
    
  4. Deploy the node initialization DaemonSet:

    kubectl apply -f solutions-db2wh/gluster/daemon-set.yaml
    
  5. Verify that the node initialization has completed:

    kubectl get ds --watch
    

    Wait for the DaemonSet to be reported as ready and up to date, as in this output:

    NAME              DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    node-initializer   3         3         3         3            3          <none>          2h
    
  6. Deploy the Heketi LoadBalancer service:

    kubectl apply -f solutions-db2wh/gluster/heketi-lb.yaml
    
  7. Wait for the load balancer service heketi-lb to be assigned an external IP address:

    kubectl get services --watch
    

    In the output, you see an IP address for CLUSTER-IP and EXTERNAL-IP:

    NAME        TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
    heketi-lb   LoadBalancer   yy.yy.yy.yy  xx.xx.xx.xx   8080:30973/TCP   7s
    
  8. Populate the Heketi topology file:

    kubectl get nodes -o=jsonpath='{range .items[?(@.metadata.labels.app=="db2wh")]}{.metadata.name}{" "}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}' | xargs -n 2 sh -c 'jq ".clusters[0].nodes += [{\"node\": {\"hostnames\": {\"manage\": [\"$0\"], \"storage\": [\"$1\"]}, \"zone\": 1}, \"devices\": [\"/dev/sdb\"]}]" solutions-db2wh/gluster/topology.json | sponge solutions-db2wh/gluster/topology.json'
    
  9. Deploy GlusterFS in the cluster:

    ./gluster-kubernetes-master/deploy/gk-deploy -gvy solutions-db2wh/gluster/topology.json
    
  10. Create an environment variable that stores the external IP address of the heketi-lb service:

    HEKETI_SERVICE_IP="$(kubectl get svc heketi-lb -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')"
    
  11. Configure the Heketi URL in the StorageClass descriptor:

    sed -i "s/heketi.default.svc.cluster.local/$HEKETI_SERVICE_IP/g" solutions-db2wh/gluster/storage-class.yaml
    

    This is a workaround for issue GH-42306 in Kubernetes.

  12. Create a Storage Class to back PersistentVolumeClaims with GlusterFS volumes:

    kubectl apply -f solutions-db2wh/gluster/storage-class.yaml
    

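Before you move on, you can check that GlusterFS, Heketi, and the new StorageClass are in place. The exact pod and StorageClass names depend on the gk-deploy scripts and on the repo's storage-class.yaml, so treat this as a loose sanity check rather than an exact expected output:

    # The GlusterFS and Heketi pods created by gk-deploy should be Running.
    kubectl get pods

    # The StorageClass defined in storage-class.yaml should be listed here.
    kubectl get storageclass
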
Create the nodes file

You can now create a configuration file that IBM Db2 Warehouse needs in order to bootstrap each instance.

  1. In Cloud Shell, create the nodes file:

    kubectl get nodes -o=jsonpath="{range \
    .items[?(@.metadata.labels.app=='db2wh')]}\
    {.metadata.name}{':'}{.status.addresses[?(@.type=='InternalIP')]\
    .address}{\"\n\"}{end}" | sed '1s/^/head_node=/' | \
    sed -e '2,$ s/^/data_node=/' > nodes
    
  2. Create a ConfigMap that contains the nodes file:

    kubectl create configmap db2wh-nodes --from-file=nodes
    

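You can optionally inspect the generated file and the ConfigMap before you deploy the pods. Based on the command above, the first line of the file starts with head_node= and the remaining lines start with data_node=, each followed by a node name and its internal IP address (the values differ in every cluster):

    # Show the generated nodes file and the ConfigMap built from it.
    cat nodes
    kubectl describe configmap db2wh-nodes
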
Deploying IBM Db2 Warehouse pods

You now create all the GKE pods that are required in order to run IBM Db2 Warehouse.

  1. In Cloud Shell, create a PersistentVolumeClaim; the StorageClass you created earlier dynamically provisions a GlusterFS-backed PersistentVolume to satisfy the claim:

    kubectl apply -f solutions-db2wh/persistent-volume-claim.yaml
    
  2. Run a job that copies the nodes file to the GlusterFS volume:

    kubectl apply -f solutions-db2wh/nodes-file-deploy-job.yaml
    
  3. Verify that the nodes file deployment job has run:

    kubectl get jobs --watch
    

    The job has completed when the output shows 1 in the SUCCESSFUL column for nodes-config:

    NAME           DESIRED   SUCCESSFUL   AGE
    nodes-config   1         1            19s
    
  4. Deploy a LoadBalancer Service to allow access to the IBM Db2 Warehouse administration console:

    kubectl apply -f solutions-db2wh/service.yaml
    
  5. Wait for the load balancer service named db2wh-ext to be assigned an external IP address:

    kubectl get services --watch
    

    In the output, you see an IP address for CLUSTER-IP and EXTERNAL-IP:

    NAME       TYPE          CLUSTER-IP   EXTERNAL-IP  PORT(S)                         AGE
    db2wh-ext  LoadBalancer  yy.yy.yy.yy  xx.xx.xx.xx  8443:30973/TCP,50000:30613/TCP  7s
    
  6. Deploy the StatefulSet to start the IBM Db2 Warehouse pods:

    kubectl apply -f solutions-db2wh/statefulset.yaml
    
  7. Verify that the IBM Db2 Warehouse pods (db2wh-0, db2wh-1, and db2wh-2) are running:

    kubectl get pods --watch
    

    This might take a few minutes.

    The pods are running when you see the status Running for all of the pods:

    db2wh-1   0/1       Running   0         3m
    db2wh-2   0/1       Running   0         3m
    db2wh-0   0/1       Running   0         3m
    
  8. Create an environment variable that stores the IP address of the node that's running the IBM Db2 Warehouse head node:

    HEAD_NODE_IP=$(grep "head_node" nodes | awk -F ':' '{print $2}')
    
  9. Create an environment variable that stores the head node pod name:

    HEAD_NODE_POD_NAME=$(kubectl get pods \
    --field-selector=status.phase=Running -o=jsonpath="{range \
    .items[?(@.metadata.labels.app=='db2wh')]} \
    {.metadata.name}{':'}{.status.\
    hostIP}{'\n'}{end}" | grep $HEAD_NODE_IP | awk -F ':' '{print $1}')
    
  10. Check the startup status of the head node pod to ensure that the bootstrap process is running without problems:

    kubectl exec -it $HEAD_NODE_POD_NAME -- status --check-startup
    

    This might take 40 to 60 minutes, during which you might see some errors reported. You can ignore them for this tutorial.

    The bootstrap process has run correctly when you see the following status messages in the output:

    HA Management up and running successfully!
    Successfully started IBM Db2 Warehouse service stack!
    
  11. Set up the administration console password:

    DB2_ADMIN_PASSWORD=$(openssl rand -hex 8)
    kubectl exec -it $HEAD_NODE_POD_NAME -- setpass ${DB2_ADMIN_PASSWORD}
    

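As an optional check before you test the deployment, you can confirm that the administration console answers on port 8443. This sketch assumes that the db2wh-ext load balancer already has an external IP address and that the console presents a self-signed certificate, which is why the check skips certificate validation:

    # Probe the administration console through the external load balancer.
    DB2WH_EXTERNAL_IP="$(kubectl get svc db2wh-ext \
        -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')"
    curl --insecure --silent --output /dev/null \
        --write-out "HTTP status: %{http_code}\n" \
        "https://$DB2WH_EXTERNAL_IP:8443/"

Any HTTP status other than 000 means that the console endpoint is reachable; the exact status code returned for the root path isn't guaranteed here.
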
Testing your deployment

You've finished configuring the pods, so you can now test the deployment.

Deploy the IBM Db2 Warehouse Client container

In order to upload data to IBM Db2 Warehouse, you now deploy the Client container and map the sample data to it using a Kubernetes ConfigMap.

  1. In Cloud Shell, create a ConfigMap that contains the sample data:

    kubectl create configmap sample-data \
        --from-file=solutions-db2wh/sample-data/nyc-wifi-locs.csv \
        --from-file=solutions-db2wh/sample-data/sample-table.sql
    
  2. Create the Deployment to start the IBM Db2 Warehouse Client container:

    kubectl apply -f solutions-db2wh/client.yaml
    
  3. Verify that the IBM Db2 Warehouse Client pod is running:

    kubectl get pods --watch
    

    This might take a few minutes.

    The pod is running when you see Running in the status:

    db2wh-client-xxxxx-xxxx   1/1       Running   0         3m
    

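You can optionally confirm that both sample files made it into the ConfigMap that the Client pod consumes:

    # The Data section of the output should list nyc-wifi-locs.csv and sample-table.sql.
    kubectl describe configmap sample-data
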
Upload sample data

To help you test the deployment, you upload sample data to the IBM Db2 Warehouse server.

  1. In Cloud Shell, display the password created earlier:

    echo $DB2_ADMIN_PASSWORD
    
  2. Create an environment variable that stores the IBM Db2 Warehouse Client container name:

    CLIENT_CONTAINER_NAME=$(kubectl get pods -l app=db2wh-client -o=jsonpath='{.items[0].metadata.name}')
    
  3. Open a shell window in the Client container:

    kubectl exec -it $CLIENT_CONTAINER_NAME -- cli
    
  4. Create an environment variable that stores the password, where [PASSWORD] is the password you got earlier in this procedure:

    DB_PASSWORD=[PASSWORD]
    
  5. Create an environment variable that stores the database alias:

    DB_ALIAS=BLUDB
    

    BLUDB is the default database name in IBM Db2 Warehouse.

  6. Create an environment variable that stores the database hostname:

    DB_HOST=db2wh-ext.default.svc.cluster.local
    
  7. Set up the database catalog:

    db_catalog --add $DB_HOST --alias $DB_ALIAS
    
  8. Create a table to hold sample data in IBM Db2 Warehouse server:

    dbsql -f /sample-table.sql -d $DB_ALIAS -h $DB_HOST -u bluadmin -W $DB_PASSWORD
    
  9. Upload data to the IBM Db2 Warehouse server:

    dbload -verbose -host $DB_HOST -u bluadmin \
    -pw $DB_PASSWORD -db $DB_ALIAS -schema BLUADMIN \
    -t NYC_FREE_PUBLIC_WIFI -df /nyc-wifi-locs.csv -delim ',' \
    -quotedValue DOUBLE -timeStyle 12HOUR -skipRows 1
    
  10. Close the IBM Db2 Warehouse Client shell:

    exit
    

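If you want a quick sanity check from the command line before you open the console, you can reopen the Client shell and count the rows you just loaded. This sketch reuses only the dbsql options shown above; it assumes that the catalog entry you added persists in the Client container and that the shell opened by cli accepts ordinary redirection, and it uses /tmp/count.sql as a throwaway file name. Replace [PASSWORD] as before:

    # Reopen a shell in the Client container.
    kubectl exec -it $CLIENT_CONTAINER_NAME -- cli

    # Inside the Client shell: write a one-line query to a temporary file and run it.
    echo "SELECT COUNT(*) FROM BLUADMIN.NYC_FREE_PUBLIC_WIFI;" > /tmp/count.sql
    dbsql -f /tmp/count.sql -d BLUDB -h db2wh-ext.default.svc.cluster.local \
        -u bluadmin -W [PASSWORD]

    # Leave the Client shell again.
    exit

The count should be 2871, which matches the total row count that you verify in the console in the next section.
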
Validate the data using the administration console

You now connect to the IBM Db2 Warehouse administration console and verify the data that you uploaded.

  1. In Cloud Shell, find the service's external IP address:

    kubectl get svc db2wh-ext
    
  2. Open a browser and go to the following URL, where [EXTERNAL_IP] is the IP address from the preceding step:

    https://[EXTERNAL_IP]:8443
    

    You can bypass the security warning, which appears because IBM Db2 Warehouse uses a self-signed certificate.

  3. Log in with the following credentials:

    • Username: bluadmin
    • Password: (the password you created in the previous procedure)
  4. If you accept the IBM Db2 Warehouse EULA, click Accept.

  5. On the left-hand side, open the menu and then select Administer > Tables:

    IBM Db2 Warehouse administration UI, showing the Tables folder

  6. Close the Quick Tour popup.

  7. Click NYC_FREE_PUBLIC_WIFI:

    Listing of available wi-fi hotspots

  8. Click the Data Distribution tab and make sure the table is populated:

    Data Distribution tab in the Tables listing

    You see 2871 rows in total, which is the entire dataset.

  9. Click Generate SQL.

  10. Select SELECT statement.

  11. Click OK.

    The Generate SQL tab opens and is pre-populated with an auto-generated SELECT statement.

  12. Add a LIMIT clause to the auto-generated SELECT statement to limit the results to the first five records:

    SELECT "THE_GEOM", "OBJECTID", "BORO", "TYPE", "PROVIDER", "NAME", "LOCATION",
           "LAT", "LON", "X", "Y", "LOCATION_T", "REMARKS", "CITY", "SSID",
           "SOURCEID", "ACTIVATED", "BOROCODE", "BORONAME", "NTACODE", "NTANAME",
           "COUNDIST", "POSTCODE", "BOROCD", "CT2010", "BOROCT2010", "BIN", "BBL", "DOITT_ID"
      FROM "BLUADMIN"."NYC_FREE_PUBLIC_WIFI"
      LIMIT 5;
    
  13. Click Run and then select Run All.

    A listing of records is displayed in the Result Set tab, showing that you successfully uploaded the sample data.

    Results of running SQL statement showing 5 records from the uploaded data

Cleaning up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

  1. In the Cloud Console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next