Deploying IBM Db2 Warehouse on Compute Engine with GlusterFS storage

This tutorial shows how to create an IBM Db2 Warehouse cluster on Compute Engine with a GlusterFS file system as the storage layer. GlusterFS is an open-source, scalable network file system.

You will find this tutorial useful if you are a sysadmin, developer, engineer, or database administrator and you want to deploy an IBM Db2 Warehouse cluster on Google Cloud.

For an overview about IBM Db2 Warehouse and deployment options on Google Cloud, see the series overview.

In this tutorial, you use the following software:

  • Ubuntu-server 16.04
  • Docker Community Edition
  • IBM Db2 Warehouse Enterprise Edition (trial)
  • IBM Db2 Warehouse Client
  • GlusterFS file system

Objectives

  • Get access to the IBM Db2 Warehouse Docker images in the Docker Store.
  • Provision a custom service account that has only the permissions required for this architecture.
  • Launch your cluster VMs.
  • Configure the GlusterFS file system.
  • Initialize Docker Store authentication.
  • Deploy the IBM Db2 Warehouse containers to the cluster.
  • Run IBM Db2 Warehouse.
  • Upload sample data in IBM Db2 Warehouse.
  • Connect to the IBM Db2 administration console and test the deployment.

Costs

This tutorial uses billable components of Google Cloud, including Compute Engine (virtual machine instances and persistent disks).

Use the Pricing Calculator to generate a cost estimate based on your projected usage.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.

  4. Enable the Compute Engine API.

    Enable the API

  5. If you don't have a Docker ID, create one in the Docker Store.

In this tutorial, you use IBM Db2 Warehouse Enterprise Edition. If you don't already have a license for this software, you might be able to use a free trial version for this tutorial.

When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Cleaning up.

Architecture

In this tutorial, you deploy three Compute Engine instances in three different zones:

  • An instance named db2wh-1 is initially designated as the head node.
  • Instances named db2wh-2 and db2wh-3 are initially designated as data nodes.

The role (head or data node) of individual instances can change if the head node fails over.

You also deploy GlusterFS on the three nodes; it serves as the shared file system for the IBM Db2 Warehouse cluster.

The architecture is shown in the following diagram:

Architecture

Getting access to the IBM Db2 Warehouse Edition Docker images

In this tutorial, you use your Docker Store account to get access to a free trial version of IBM Db2 Warehouse Edition in the Docker Store. This involves two separate images: a server image and a client image.

  1. In your browser, go to the IBM Db2 Warehouse EE Docker image.
  2. Sign in using your Docker username and password.
  3. Click Proceed to checkout.
  4. Fill in your details.
  5. If you agree to the terms, select the I agree ... and I acknowledge ... checkboxes on the right.
  6. Click Get Content.

    This takes you to the Setup page. You don't need to follow those instructions, because you'll perform those steps later in the tutorial.

  7. Repeat the process for the IBM Db2 Warehouse Client image.

Preparing your environment

You perform most of the steps for this tutorial in Cloud Shell. When you open Cloud Shell, you can also automatically clone the GitHub repo that's associated with this tutorial.

  • Open Cloud Shell and clone the GitHub repo for this tutorial:

    OPEN Cloud Shell

Provisioning a service account to manage Compute Engine instances

In this tutorial, you create a service account to manage Compute Engine instances. It's a best practice to limit the service account to just the roles and access permissions that are required in order to run the application.

For this tutorial, the only role required for the service account is the Compute Viewer role (roles/compute.viewer). This role provides read-only access to Compute Engine resources.

  1. In Cloud Shell, create a service account named db2wh-sa:

    gcloud iam service-accounts create db2wh-sa --display-name=db2wh-sa
    
  2. Create an environment variable that stores the full service account email address:

    export GCE_SERVICE_ACCOUNT_EMAIL="db2wh-sa@$(gcloud config get-value project 2> /dev/null).iam.gserviceaccount.com"
    
  3. Bind the roles/compute.viewer role to the service account:

    gcloud projects add-iam-policy-binding $(gcloud config get-value project 2> /dev/null) \
        --member serviceAccount:$GCE_SERVICE_ACCOUNT_EMAIL \
        --role roles/compute.viewer
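The email address built in step 2 follows the standard service-account naming pattern. The following sketch reproduces that composition locally with a hypothetical project ID (my-project stands in for the value that `gcloud config get-value project` returns in the tutorial):

```shell
#!/bin/sh
# Hypothetical project ID; in the tutorial this comes from
# `gcloud config get-value project`.
PROJECT_ID="my-project"

# A service account email is always <name>@<project-id>.iam.gserviceaccount.com.
GCE_SERVICE_ACCOUNT_EMAIL="db2wh-sa@${PROJECT_ID}.iam.gserviceaccount.com"

echo "$GCE_SERVICE_ACCOUNT_EMAIL"
```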
    

Creating firewall rules

You create firewall rules to allow internal cluster communications and access from the management console to the cluster. In addition, you configure rules to allow internal GlusterFS communication.

  1. In Cloud Shell, create the firewall rules that are required for inter-node communications:

    gcloud compute firewall-rules create db2wh-internal \
        --description="db2wh" \
        --action=ALLOW \
        --rules=tcp:50000-50001,tcp:60000-60060,tcp:25000-25999,tcp:50022,tcp:9929,tcp:9300,tcp:8998,tcp:5000,tcp:2379-2380,tcp:389 \
        --source-tags=db2wh \
        --target-tags=db2wh
    
  2. Create the firewall rules for the administration console:

    gcloud compute firewall-rules create db2wh-admin \
        --description="db2wh admin console" \
        --action=ALLOW \
        --rules=tcp:8443 \
        --source-ranges=0.0.0.0/0
    
  3. Create the firewall rules for GlusterFS:

    gcloud compute firewall-rules create db2wh-glusterfs \
        --description="db2wh glusterfs" \
        --action=ALLOW \
        --rules=tcp:22,tcp:111,tcp:139,tcp:24007,tcp:24009-24108,tcp:49152-49251 \
        --source-tags=db2wh \
        --target-tags=db2wh
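The three rules differ only in name, allowed ports, and source. If you script this deployment, the rules can be generated from a small table. The following sketch is a dry run: it prints the gcloud commands it would issue rather than executing them, with port lists mirroring the three rules above.

```shell
#!/bin/sh
# Each entry: rule name|allowed TCP ports|source/target flags.
RULES='db2wh-internal|tcp:50000-50001,tcp:60000-60060,tcp:25000-25999,tcp:50022,tcp:9929,tcp:9300,tcp:8998,tcp:5000,tcp:2379-2380,tcp:389|--source-tags=db2wh --target-tags=db2wh
db2wh-admin|tcp:8443|--source-ranges=0.0.0.0/0
db2wh-glusterfs|tcp:22,tcp:111,tcp:139,tcp:24007,tcp:24009-24108,tcp:49152-49251|--source-tags=db2wh --target-tags=db2wh'

# Dry run: print each firewall-rule command instead of running it.
echo "$RULES" | while IFS='|' read -r name ports source; do
  echo "gcloud compute firewall-rules create $name --action=ALLOW --rules=$ports $source"
done
```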
    

Initializing the Compute Engine environment

In this section, you create persistent disks for GlusterFS volumes, launch Compute Engine instances, grant permissions, and finish the environment initialization.

Create persistent disks for GlusterFS

In this tutorial, you create additional persistent disks to host the GlusterFS file system.

  1. In Cloud Shell, create a disk named db2wh-1-glusterfs:

    gcloud compute disks create db2wh-1-glusterfs \
        --size=1024 \
        --zone=us-central1-c
    
  2. Create a second disk named db2wh-2-glusterfs:

    gcloud compute disks create db2wh-2-glusterfs \
        --size=1024 \
        --zone=us-central1-b
    
  3. Create a third disk named db2wh-3-glusterfs:

    gcloud compute disks create db2wh-3-glusterfs \
        --size=1024 \
        --zone=us-central1-f
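Because the three commands differ only in disk name and zone, they can also be expressed as a loop. The sketch below is a dry run that builds and prints the commands instead of executing them:

```shell
#!/bin/sh
# node-number:zone pairs used in this tutorial.
CMDS=""
for pair in 1:us-central1-c 2:us-central1-b 3:us-central1-f; do
  n=${pair%%:*}
  zone=${pair#*:}
  CMDS="${CMDS}gcloud compute disks create db2wh-${n}-glusterfs --size=1024 --zone=${zone}
"
done

# Dry run: print the commands instead of executing them.
printf '%s' "$CMDS"
```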
    

Launch Compute Engine instances

Using the startup script in the GitHub repository, you create three Compute Engine instances for the cluster nodes. (The instances are in different zones to support high availability.) This includes assigning an internal IP address to each instance.

  1. In Cloud Shell, create an instance named db2wh-1:

    gcloud compute instances create db2wh-1 \
        --image-family=ubuntu-1604-lts \
        --image-project=ubuntu-os-cloud \
        --tags=db2wh --boot-disk-size=50GB \
        --machine-type=n1-standard-16 \
        --zone=us-central1-c \
        --private-network-ip=10.128.0.100 \
        --metadata=storagetype=gluster \
        --metadata-from-file startup-script=solutions-db2wh/gce-db2wh-startup.sh \
        --service-account=$GCE_SERVICE_ACCOUNT_EMAIL \
        --disk=device-name=sdb,name=db2wh-1-glusterfs
    
  2. Create instance db2wh-2:

    gcloud compute instances create db2wh-2 \
        --image-family=ubuntu-1604-lts \
        --image-project=ubuntu-os-cloud \
        --tags=db2wh --boot-disk-size=50GB \
        --machine-type=n1-standard-16 \
        --zone=us-central1-b \
        --private-network-ip=10.128.0.101 \
        --metadata=storagetype=gluster \
        --metadata-from-file startup-script=solutions-db2wh/gce-db2wh-startup.sh \
        --service-account=$GCE_SERVICE_ACCOUNT_EMAIL \
        --disk=device-name=sdb,name=db2wh-2-glusterfs
    
  3. Create instance db2wh-3:

    gcloud compute instances create db2wh-3 \
        --image-family=ubuntu-1604-lts \
        --image-project=ubuntu-os-cloud \
        --tags=db2wh --boot-disk-size=50GB \
        --machine-type=n1-standard-16 \
        --zone=us-central1-f \
        --private-network-ip=10.128.0.102 \
        --metadata=storagetype=gluster \
        --metadata-from-file startup-script=solutions-db2wh/gce-db2wh-startup.sh \
        --service-account=$GCE_SERVICE_ACCOUNT_EMAIL \
        --disk=device-name=sdb,name=db2wh-3-glusterfs
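The three instance commands are identical except for the name, zone, and internal IP address, so they lend themselves to a loop. This sketch is a dry run that prints the commands (the service-account variable is echoed literally rather than expanded):

```shell
#!/bin/sh
# name:zone:internal-IP triples for the three cluster nodes.
NODES="db2wh-1:us-central1-c:10.128.0.100
db2wh-2:us-central1-b:10.128.0.101
db2wh-3:us-central1-f:10.128.0.102"

# Dry run: build the gcloud commands as strings.
CMDS=$(echo "$NODES" | while IFS=: read -r name zone ip; do
  echo "gcloud compute instances create $name" \
       "--image-family=ubuntu-1604-lts --image-project=ubuntu-os-cloud" \
       "--tags=db2wh --boot-disk-size=50GB --machine-type=n1-standard-16" \
       "--zone=$zone --private-network-ip=$ip" \
       "--metadata=storagetype=gluster" \
       "--metadata-from-file startup-script=solutions-db2wh/gce-db2wh-startup.sh" \
       "--service-account=\$GCE_SERVICE_ACCOUNT_EMAIL" \
       "--disk=device-name=sdb,name=${name}-glusterfs"
done)
echo "$CMDS"
```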
    

Log in to the Docker Store from Compute Engine instances

You now supply your credentials on each of the three cluster members to enable communication with the Docker Store.

  1. In Cloud Shell, log in to the Docker Store on db2wh-1:

    gcloud compute ssh db2wh-1 \
        --zone=us-central1-c \
        --command="sudo docker login" \
        -- -t
    
  2. Log in to the Docker Store on db2wh-2:

    gcloud compute ssh db2wh-2 \
        --zone=us-central1-b \
        --command="sudo docker login" \
        -- -t
    
  3. Log in to the Docker Store on db2wh-3:

    gcloud compute ssh db2wh-3 \
        --zone=us-central1-f \
        --command="sudo docker login" \
        -- -t
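Steps like this one, where the same command runs on every node, recur throughout this tutorial. A small helper function can cut the repetition; this is a sketch, shown as a dry run that prints the gcloud invocations instead of executing them:

```shell
#!/bin/sh
# Run a command on all three cluster nodes over SSH.
# DRY_RUN=1 prints the gcloud invocations instead of executing them.
DRY_RUN=1

run_on_all_nodes() {
  cmd=$1
  for pair in db2wh-1:us-central1-c db2wh-2:us-central1-b db2wh-3:us-central1-f; do
    host=${pair%%:*}
    zone=${pair#*:}
    if [ "$DRY_RUN" = 1 ]; then
      echo "gcloud compute ssh $host --zone=$zone --command=\"$cmd\" -- -t"
    else
      gcloud compute ssh "$host" --zone="$zone" --command="$cmd" -- -t
    fi
  done
}

OUT=$(run_on_all_nodes "sudo docker login")
echo "$OUT"
```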
    

Create the nodes file

You now create the nodes file, which IBM Db2 Warehouse uses to discover the nodes in the cluster. Each line specifies the initial type of the node, its DNS name, and its IP address.

  • In Cloud Shell, create the nodes file by connecting to the db2wh-1 instance through SSH:

    gcloud compute ssh db2wh-1 --zone=us-central1-c \
        --command='sudo bash -c "cat <<EOF >  /mnt/clusterfs/nodes
    head_node=db2wh-1:10.128.0.100
    data_node=db2wh-2:10.128.0.101
    data_node=db2wh-3:10.128.0.102
    EOF"'
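The nodes file must contain exactly one head_node entry, with every line in the role=hostname:IP format. As a sanity check, this sketch recreates the same content locally and verifies its shape:

```shell
#!/bin/sh
# Recreate the nodes file content locally and validate its shape.
NODES_FILE=$(cat <<'EOF'
head_node=db2wh-1:10.128.0.100
data_node=db2wh-2:10.128.0.101
data_node=db2wh-3:10.128.0.102
EOF
)

HEAD_COUNT=$(echo "$NODES_FILE" | grep -c '^head_node=')
DATA_COUNT=$(echo "$NODES_FILE" | grep -c '^data_node=')
echo "head nodes: $HEAD_COUNT, data nodes: $DATA_COUNT"
```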
    

Launching the IBM Db2 Warehouse containers

You now start the IBM Db2 Warehouse containers on the three instances.

  1. In Cloud Shell, start the Docker container on db2wh-1:

    gcloud compute ssh db2wh-1 \
        --zone=us-central1-c \
        --command='sudo docker run -d -it \
        --privileged=true \
        --net=host \
        --name=db2wh -v /mnt/clusterfs:/mnt/bludata0 -v /mnt/clusterfs:/mnt/blumeta0 store/ibmcorp/db2wh_ee:v2.12.0-db2wh-linux'
    
  2. Start the Docker container on db2wh-2:

    gcloud compute ssh db2wh-2 \
        --zone=us-central1-b \
        --command='sudo docker run -d -it \
        --privileged=true \
        --net=host \
        --name=db2wh -v /mnt/clusterfs:/mnt/bludata0 -v /mnt/clusterfs:/mnt/blumeta0 store/ibmcorp/db2wh_ee:v2.12.0-db2wh-linux'
    
  3. Start the Docker container on db2wh-3:

    gcloud compute ssh db2wh-3 \
        --zone=us-central1-f \
        --command='sudo docker run -d -it \
        --privileged=true \
        --net=host \
        --name=db2wh -v /mnt/clusterfs:/mnt/bludata0 -v /mnt/clusterfs:/mnt/blumeta0 store/ibmcorp/db2wh_ee:v2.12.0-db2wh-linux'
    
  4. Verify that startup is complete:

    gcloud compute ssh db2wh-1 \
        --zone=us-central1-c \
        --command='sudo docker exec -it db2wh status --check-startup' \
        -- -t
    

    This might take 40 to 60 minutes, during which time you might see some errors reported. You can ignore them for this tutorial.

    The configuration is complete when you see the following status messages in the output:

    HA Management up and running successfully!
    Successfully started IBM Db2 Warehouse service stack!
    
  5. Use SSH to connect to db2wh-1 and reset the admin password:

    DB2_ADMIN_PASSWORD=$(openssl rand -hex 8)
    gcloud compute ssh db2wh-1 \
        --zone=us-central1-c \
        --command="sudo docker exec db2wh setpass ${DB2_ADMIN_PASSWORD}" \
        -- -t
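The password in step 5 comes from openssl rand -hex 8, which emits 8 random bytes encoded as 16 hexadecimal characters. You can verify that shape locally (this assumes openssl is installed, as it is in Cloud Shell):

```shell
#!/bin/sh
# 8 random bytes, hex-encoded: always 16 characters from [0-9a-f].
DB2_ADMIN_PASSWORD=$(openssl rand -hex 8)
echo "$DB2_ADMIN_PASSWORD"
```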
    

Testing your deployment

You've finished configuring the server containers, so you can now test the deployment.

Deploy the IBM Db2 Warehouse Client container

To upload data to IBM Db2 Warehouse, you now deploy the Client container.

  1. In Cloud Shell, copy sample data to the Compute Engine instance that will run the Client container:

    gcloud compute scp solutions-db2wh/sample-data/nyc-wifi-locs.csv db2wh-1:/tmp \
        --zone=us-central1-c
    
  2. Copy the sample table schema to that Compute Engine instance:

    gcloud compute scp solutions-db2wh/sample-data/sample-table.sql db2wh-1:/tmp \
        --zone=us-central1-c
    
  3. Start the Docker container on db2wh-1:

    gcloud compute ssh db2wh-1 \
        --zone=us-central1-c \
        --command='sudo docker run -d -it \
        --privileged=true \
        --net=host \
        --name=db2wh-client -v /tmp/nyc-wifi-locs.csv:/nyc-wifi-locs.csv -v /tmp/sample-table.sql:/sample-table.sql store/ibmcorp/db2wh_ce:v3.0.1-db2wh_client-linux'
    

Upload sample data

To help you test the deployment, you upload sample data to the IBM Db2 Warehouse server.

  1. In Cloud Shell, display the password created earlier:

    echo $DB2_ADMIN_PASSWORD
    
  2. Open a shell window in the IBM Db2 Warehouse Client container:

    gcloud compute ssh db2wh-1 \
        --zone=us-central1-c \
        --command='sudo docker exec -it db2wh-client cli' \
        -- -t
    
  3. Create an environment variable that stores the password, where [PASSWORD] is the password you got earlier in the procedure:

    DB_PASSWORD=[PASSWORD]
    
  4. Create an environment variable that stores the database alias:

    DB_ALIAS=BLUDB
    

    BLUDB is the default database name in IBM Db2 Warehouse.

  5. Create an environment variable that stores the database hostname:

    DB_HOST=db2wh-1
    
  6. Set up the database catalog:

    db_catalog --add $DB_HOST --alias $DB_ALIAS
    
  7. Create a table to hold the sample data in the IBM Db2 Warehouse server:

    dbsql -f /sample-table.sql -d $DB_ALIAS -h $DB_HOST -u bluadmin -W $DB_PASSWORD
    
  8. Upload data to the IBM Db2 Warehouse server:

    dbload -verbose -host $DB_HOST -u bluadmin -pw $DB_PASSWORD -db $DB_ALIAS -schema BLUADMIN -t NYC_FREE_PUBLIC_WIFI -df /nyc-wifi-locs.csv -delim ',' -quotedValue DOUBLE -timeStyle 12HOUR -skipRows 1
    
  9. Close the IBM Db2 Warehouse Client shell:

    exit
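Inside the client shell, steps 3 through 8 can be collected into one small script. The following is a sketch that builds the commands as strings (a dry run) rather than executing db_catalog, dbsql, and dbload; [PASSWORD] remains a placeholder for the password you displayed in step 1.

```shell
#!/bin/sh
# Placeholders mirroring the interactive steps above.
DB_PASSWORD='[PASSWORD]'
DB_ALIAS=BLUDB          # default database name in IBM Db2 Warehouse
DB_HOST=db2wh-1

CATALOG_CMD="db_catalog --add $DB_HOST --alias $DB_ALIAS"
CREATE_CMD="dbsql -f /sample-table.sql -d $DB_ALIAS -h $DB_HOST -u bluadmin -W $DB_PASSWORD"
LOAD_CMD="dbload -verbose -host $DB_HOST -u bluadmin -pw $DB_PASSWORD -db $DB_ALIAS -schema BLUADMIN -t NYC_FREE_PUBLIC_WIFI -df /nyc-wifi-locs.csv -delim ',' -quotedValue DOUBLE -timeStyle 12HOUR -skipRows 1"

# Dry run: print the commands in the order they would run.
printf '%s\n' "$CATALOG_CMD" "$CREATE_CMD" "$LOAD_CMD"
```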
    

Validate the data using the administration console

You can now connect to the IBM Db2 Warehouse administration console and verify the data that you uploaded.

  1. In Cloud Shell, find the service's external IP address:

    gcloud compute instances describe db2wh-1 \
        --zone=us-central1-c \
        --format='value(networkInterfaces[0].accessConfigs[0].natIP)'
    
  2. Open a browser and go to the following URL, where [EXTERNAL_IP] is the IP address from the preceding step:

    https://[EXTERNAL_IP]:8443
    

    You can bypass the security warning.

  3. Log in with the following credentials:

    • Username: bluadmin
    • Password: (the password you created in the previous procedure)
  4. If you accept the IBM Db2 Warehouse EULA, click Accept.

  5. On the left-hand side, open the menu and then select Administer > Tables:

    IBM Db2 Warehouse administration UI, showing the Tables folder

  6. Close the Quick Tour popup.

  7. Click NYC_FREE_PUBLIC_WIFI:

    Listing of available wi-fi hotspots

  8. Click the Data Distribution tab and make sure the table is populated:

    Data Distribution tab in the Tables listing

    You see 2871 rows in total, which is the entire dataset.

  9. Click Generate SQL.

  10. Select SELECT statement.

  11. Click OK.

    The Generate SQL tab opens and is pre-populated with an auto-generated SELECT statement.

  12. Add a LIMIT clause to the auto-generated SELECT statement to limit the results to the first five records:

    SELECT "THE_GEOM", "OBJECTID", "BORO", "TYPE", "PROVIDER", "NAME", "LOCATION",
           "LAT", "LON", "X", "Y", "LOCATION_T", "REMARKS", "CITY", "SSID",
           "SOURCEID", "ACTIVATED", "BOROCODE", "BORONAME", "NTACODE", "NTANAME",
           "COUNDIST", "POSTCODE", "BOROCD", "CT2010", "BOROCT2010", "BIN", "BBL", "DOITT_ID"
      FROM "BLUADMIN"."NYC_FREE_PUBLIC_WIFI"
      LIMIT 5;
    
  13. Click Run and then select Run All.

    A listing of records is displayed in the Result Set tab, showing that you successfully uploaded the sample data.

    Results of running SQL statement showing 5 records from the uploaded data

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

  1. In the Cloud Console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next