Creating Custom Interactive Dashboards with Bokeh and BigQuery

In this tutorial, you learn how to build a custom interactive dashboard app on Google Cloud Platform (GCP) by using the Bokeh library to visualize data from publicly available BigQuery datasets. You also learn how to deploy this app with both security and scalability in mind.

Powerful software-as-a-service (SaaS) solutions like Google Data Studio are extremely useful when building interactive data visualization dashboards. However, these types of solutions might not provide sufficient customization options. For those scenarios, you can use open source libraries like D3.js, Chart.js, or Bokeh to create custom dashboards. While these libraries offer a lot of flexibility for building dashboards with tailored features and visualizations, a major challenge remains: deploying these dashboards to the internet to enable easy stakeholder access.

Objectives

  • Understand the architecture of a custom dashboard app built with Bokeh and BigQuery.
  • Deploy the app backend (the Bokeh and Memcached services) to Kubernetes Engine (GKE).
  • Expose the app through an HTTPS load balancer, with static assets served by Cloud Storage and Cloud CDN.
  • Control access to the app by using Cloud IAP.

Costs

This tutorial uses the following billable components of Google Cloud Platform:

  • BigQuery
  • Compute Engine
  • Cloud Storage
  • Cloud CDN
  • Load Balancer
  • Static IP address

You can use the pricing calculator to generate a cost estimate based on your projected usage. New GCP users might be eligible for a free trial.

Before you begin

  1. Select or create a Google Cloud Platform project.

    Go to the Manage resources page

  2. Make sure that billing is enabled for your Google Cloud Platform project.

    Learn how to enable billing

  3. Enable the Google Kubernetes Engine (GKE) and BigQuery APIs.

    Enable the APIs

  4. Start a Cloud Shell instance. You'll use Cloud Shell throughout this tutorial.

    OPEN Cloud Shell

  5. Clone the Git repository by running the following command in Cloud Shell:

    git clone https://github.com/GoogleCloudPlatform/bigquery-bokeh-dashboard
  6. In Cloud Shell, set the default Compute Engine zone to the zone where you are going to create your GKE cluster. This tutorial uses the us-central1-a zone.

    gcloud config set compute/zone us-central1-a
  7. To allow the demo app to access BigQuery, create a service account and an associated key by running the following commands:

    export PROJECT_ID=$(gcloud info --format='value(config.project)')
    export SERVICE_ACCOUNT_NAME="dashboard-demo"
    cd bigquery-bokeh-dashboard
    gcloud iam service-accounts create $SERVICE_ACCOUNT_NAME \
        --display-name="Bokeh/BigQuery Dashboard Demo"
    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member="serviceAccount:${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
        --role roles/bigquery.user
    gcloud iam service-accounts keys create --iam-account \
        "${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
        service-account-key.json

    For more information, see the service accounts documentation.

Understanding the demo app

The demo app displays a dashboard that processes and visualizes publicly available data for all 50 U.S. states. The dashboard provides a simple level of interaction: the user can select any of the 50 states from a dropdown widget to drill down and display information about that state. The dashboard contains four modules, each of which displays a different type of information about the selected state.

A screenshot of the running demo follows:

Dashboard view of the app with graphs.

Overall architecture

At a high level, the demo's architecture consists of six main elements. The following diagram shows how these components interact:

Architecture of the demo app.

These elements include:

  • Client web browsers on the desktop machines and mobile devices of the dashboard users.
  • An authentication proxy managed by Cloud IAP, responsible for controlling access to the app.
  • An HTTPS Load Balancer responsible both for effectively distributing incoming web traffic to the appropriate backends and for handling SSL decryption/encryption of data coming in and out of the system.
  • The app backend responsible for serving the dashboard's dynamic content (HTML and plot data).
  • The static assets backend serving the static JavaScript and CSS files needed for client web browser rendering.
  • BigQuery, which hosts the public datasets that the app backend queries.

App backend

The app backend receives incoming requests from the load balancer and then fetches and transforms the appropriate data from BigQuery. The backend returns the requested dynamic content, in the form of HTML over HTTP, and returns plotting data through WebSockets.

The following diagram shows the architecture:

Architecture of the app's backend.

The app backend contains two main subcomponents:

  • The Bokeh service. The app's centerpiece, this service contains pods running the Bokeh servers that query data from BigQuery and generate the dynamic content consumed by the client web browsers.
  • The Memcached service. Contains pods running Memcached servers that are responsible for caching frequently requested data.

These subcomponents are packaged in Docker containers and deployed as Kubernetes pods to Kubernetes Engine to enable horizontal scalability.

Bokeh service

In this demo, the Bokeh service contains two Bokeh pods. Each pod runs a separate Bokeh server instance, which is a wrapper around Tornado's non-blocking web server. This server can serve information synchronously over HTTP for HTML content and asynchronously through WebSockets for the dashboard's plotting data.
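
To make this concrete, here is a minimal, hypothetical sketch of a Bokeh server app with a state dropdown. It is not the repository's dashboard.py, which additionally wires the dropdown to the data-fetching and plotting modules described later in this section:

# Minimal, hypothetical sketch of a Bokeh server app (not the repository's
# dashboard.py). Run it with: bokeh serve sketch.py
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import Select

states = ['CA', 'NY', 'TX']  # the real app offers all 50 states

def on_state_change(attr, old, new):
    # The real app fetches data for the new state and updates each module.
    print('Selected state:', new)

dropdown = Select(title='State', options=states, value='CA')
dropdown.on_change('value', on_state_change)

curdoc().add_root(column(dropdown))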

For this tutorial, each Bokeh server is configured to fork into four worker subprocesses, reflected by the --num-procs parameter in the CMD statement of the container's Dockerfile:

CMD bokeh serve --disable-index-redirect --num-procs=4 --port=5006 --address=0.0.0.0 --allow-websocket-origin=$DASHBOARD_DEMO_DOMAIN dashboard.py

The Bokeh server automatically load balances incoming traffic between the subprocesses. This approach increases the performance and resilience of each pod. The number of subprocesses used in this tutorial is arbitrary. In a real-world scenario, you would adjust this number based on actual production traffic and the memory and CPU resources available in the cluster.

BigQuery is accessed directly from the Bokeh app by using the read_gbq method. For more details, see the documentation about the read_gbq method, available in the Pandas library's gbq extension. This method has a very simple interface. It accepts a query as a parameter and submits it over the network to BigQuery. Other required parameters include the project ID and the service account key, both of which can be passed through environment variables loaded from GKE secrets. The read_gbq method then transparently handles the pagination of results returned by BigQuery, and consolidates them into a single Pandas DataFrame object, which Bokeh can directly process.

As an illustration, here is an example of a query submitted with read_gbq to collect the top 100 most populated ZIP Codes in California (state code 'CA'):

query = """
    SELECT
      A.zipcode,
      Population,
      City,
      State_code
    FROM
      `bigquery-public-data.census_bureau_usa.population_by_zip_2010` AS A
    JOIN
      `bigquery-public-data.utility_us.zipcode_area` AS B
    ON
     A.zipcode = B.zipcode
    WHERE
     gender = ''
    AND
     state_code = '%(state)s'
    ORDER BY
     population DESC
    LIMIT
     100
"""

import os
import pandas

dataframe = pandas.io.gbq.read_gbq(
    # The query uses a named placeholder, so pass a dict of substitutions.
    query % {'state': 'CA'},
    project_id=os.environ['GOOGLE_PROJECT_ID'],
    private_key=os.environ['GOOGLE_APPLICATION_CREDENTIALS'],
    dialect='standard'
)

Note that, because the resulting data needs to travel over the network, offloading as much pre-processing to BigQuery as possible is highly recommended. For example, do any necessary downsampling at the query level (by using the LIMIT clause, targeting fields in the SELECT clause, pre-segmenting tables, or pre-grouping results using the GROUP BY clause) in order to reduce the overall network traffic and speed up response times.
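
For example, a query along the following lines returns one pre-aggregated row per year instead of thousands of raw daily measurements. The table and column names here are illustrative and assume the public EPA historical air quality dataset:

SELECT
  EXTRACT(YEAR FROM date_local) AS year,
  AVG(arithmetic_mean) AS avg_pm10
FROM
  `bigquery-public-data.epa_historical_air_quality.pm10_daily_summary`
WHERE
  state_name = 'California'
GROUP BY
  year
ORDER BY
  year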

Recall that the demo dashboard displays four different plot modules. When the user selects a U.S. state, the app sends four separate queries (one for each module) to BigQuery, processing a combined total of 7 gigabytes of data. Each query takes an average of 1 to 3 seconds to run, during which the app would otherwise sit idle waiting for results to return from BigQuery. To optimize the dashboard's overall response time, the four queries run in parallel on separate threads, as shown in the following code:

import concurrent.futures
import time

def fetch_data(state):
    """
    Fetch data from BigQuery for the given US state by running
    the queries for all dashboard modules in parallel.
    """
    t0 = time.time()
    # Collect fetch methods for all dashboard modules.
    # ('modules' and 'timer' are globals defined elsewhere in dashboard.py.)
    fetch_methods = {module.id: getattr(module, 'fetch_data') for module in modules}
    # Create a thread pool: one separate thread for each dashboard module
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(fetch_methods)) as executor:
        # Prepare the thread tasks
        tasks = {}
        for key, fetch_method in fetch_methods.items():
            task = executor.submit(fetch_method, state)
            tasks[task] = key
        # Run the tasks and collect results as they arrive
        results = {}
        for task in concurrent.futures.as_completed(tasks):
            key = tasks[task]
            results[key] = task.result()
    # Return results once all tasks have been completed
    t1 = time.time()
    timer.text = '(Execution time: %s seconds)' % round(t1 - t0, 4)
    return results

When all the results have been received from BigQuery, Bokeh's plotting library renders the data. For example, the following code plots the evolution of air pollutant levels over time:

from bokeh.layouts import column
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.models.widgets import Paragraph
from bokeh.palettes import all_palettes
from bokeh.plotting import figure

# Method of a dashboard module class; TITLE is a module-level constant.
def make_plot(self, dataframe):
    self.source = ColumnDataSource(data=dataframe)
    palette = all_palettes['Set2'][6]
    hover_tool = HoverTool(tooltips=[
        ("Value", "$y"),
        ("Year", "@year"),
    ])
    self.plot = figure(
        plot_width=600, plot_height=300, tools=[hover_tool],
        toolbar_location=None)
    columns = {
        'pm10': 'PM10 Mass (µg/m³)',
        'pm25_frm': 'PM2.5 FRM (µg/m³)',
        'pm25_nonfrm': 'PM2.5 non FRM (µg/m³)',
        'lead': 'Lead (¹/₁₀₀ µg/m³)',
    }
    for i, (code, label) in enumerate(columns.items()):
        self.plot.line(
            x='year', y=code, source=self.source, line_width=3,
            line_alpha=0.6, line_color=palette[i], legend=label)

    self.title = Paragraph(text=TITLE)
    return column(self.title, self.plot)

Deploying the Bokeh service

To facilitate horizontal scalability, all components of the app backend are deployed as Docker containers to GKE. The following steps show how to deploy these components.

  1. First, create a GKE cluster with two nodes (an arbitrary size that you can adjust to your needs) by running the following command in Cloud Shell:

    gcloud container clusters create dashboard-demo-cluster \
      --tags dashboard-demo-node \
      --num-nodes=2
    
  2. Create GKE secrets to securely store the project ID and the service account key that you created previously.

    export PROJECT_ID=$(gcloud info --format='value(config.project)')
    
    kubectl create secret generic project-id \
      --from-literal project-id=$PROJECT_ID
    
    kubectl create secret generic service-account-key \
      --from-file service-account-key=service-account-key.json
    
  3. Create a static IP address:

    gcloud compute addresses create dashboard-demo-static-ip --global
    
  4. Define a domain by using xip.io, a free DNS service that automatically resolves domains containing an IP address back to that same IP address (for example, 203.0.113.10.xip.io resolves to 203.0.113.10):

    export STATIC_IP=$(gcloud compute addresses describe \
      dashboard-demo-static-ip --global --format="value(address)")
    
    export DOMAIN="${STATIC_IP}.xip.io"
    
    echo "My domain: ${DOMAIN}"
    
  5. Create a GKE secret with the domain:

    kubectl create secret generic domain --from-literal domain=$DOMAIN
    
  6. The Bokeh service is now ready to be deployed with this command:

    kubectl create -f kubernetes/bokeh.yaml
    

The last step deploys an arbitrary number of pods, in this case two, each running a separate instance of the demo app. It also exposes the three GKE secrets created earlier as environment variables through the env.valueFrom.secretKeyRef entry, so that the app can retrieve this information to access BigQuery.
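
As an illustration, the mapping between secrets and environment variables in bokeh.yaml looks roughly like the following excerpt. This is a hypothetical sketch; the environment variable names match those used elsewhere in this tutorial, but see the actual file in the repository for the exact definition:

# Hypothetical excerpt of a container spec mapping GKE secrets to
# environment variables; not the literal contents of bokeh.yaml.
env:
- name: GOOGLE_PROJECT_ID
  valueFrom:
    secretKeyRef:
      name: project-id
      key: project-id
- name: GOOGLE_APPLICATION_CREDENTIALS
  valueFrom:
    secretKeyRef:
      name: service-account-key
      key: service-account-key
- name: DASHBOARD_DEMO_DOMAIN
  valueFrom:
    secretKeyRef:
      name: domain
      key: domain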

For details, see bokeh.yaml.

Memcached service

When you don't expect the data displayed on a dashboard to change for a given period of time, you can make significant performance and cost optimizations. By default, BigQuery internally caches the results of duplicate queries for approximately 24 hours, which greatly speeds up response times and avoids unnecessary billing charges. For more information, see the section on using cached query results in the BigQuery documentation.

This caching mechanism can be further improved by caching frequently requested data within the app backend itself. This approach takes BigQuery out of the loop and avoids unnecessary network traffic to collect the cached data. This tutorial uses Memcached to handle this type of app-level caching. Memcached is extremely fast; in the case of this tutorial, it reduces response times from a few seconds to a few milliseconds.
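
One way to implement this is a cache-aside pattern. The following sketch is hypothetical (the helper name, key scheme, and TTL are not taken from the repository) and assumes a Memcached client object such as the pymemcache HashClient sketched after the next paragraph:

# Hypothetical cache-aside helper: serve repeated queries from Memcached,
# and fall back to BigQuery only on a cache miss.
import hashlib
import pickle

def cached_fetch(memcached_client, query, run_query, ttl=3600):
    # Derive a stable cache key from the query text.
    key = hashlib.sha256(query.encode('utf-8')).hexdigest()
    cached = memcached_client.get(key)
    if cached is not None:
        # Cache hit: skip BigQuery entirely.
        return pickle.loads(cached)
    # Cache miss: run the query (e.g. through read_gbq) and cache the result.
    dataframe = run_query(query)
    memcached_client.set(key, pickle.dumps(dataframe), expire=ttl)
    return dataframe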

One peculiarity of Memcached is that it does not have a built-in load-balancing mechanism. Instead, load balancing across multiple Memcached pods must be performed by the Memcached client, which is included in each Bokeh pod, as illustrated in the following diagram:

Architecture of the Memcached pods.

For the Memcached pods to be discoverable by the Memcached client, the Memcached service must be declared as headless, which is achieved by using the clusterIP: None entry in the GKE configuration file (see memcached.yaml). The Memcached client can then retrieve IP addresses of the individual pods by querying kube-dns for the memcached.default.svc.cluster.local domain name.
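
As an illustration, the following sketch shows how a client can resolve the headless service and shard keys across the discovered pods. It assumes the pymemcache library; the actual demo app may use a different Memcached client:

# Hypothetical sketch: discover Memcached pod IPs through the headless
# service's DNS record, then shard keys across them client-side.
import socket

from pymemcache.client.hash import HashClient

# The headless service (clusterIP: None) resolves to one DNS A record
# per Memcached pod.
_, _, pod_ips = socket.gethostbyname_ex(
    'memcached.default.svc.cluster.local')

# HashClient consistently hashes keys across the discovered pods (port 11211).
client = HashClient([(ip, 11211) for ip in pod_ips])

client.set('some-key', 'some-value', expire=60)
print(client.get('some-key'))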

Deploying the Memcached service

Like the Bokeh service, the Memcached service is deployed to GKE as Docker containers running in Kubernetes pods.

Deploy the Memcached pods and headless service by running the following command:

kubectl create -f kubernetes/memcached.yaml

Adding the load balancer

At this point, the dashboard app is running inside a private service. To expose the app to the internet so that users can access the dashboard, you need to introduce an HTTPS load balancer. You will do this in the following steps:

  1. Create an SSL certificate.
  2. Create a load balancer.
  3. Extend the connection timeout.

Creating an SSL certificate

In real-world scenarios, data displayed on dashboards can be sensitive or private. You should use SSL to encrypt this data before it gets sent to the users over the internet. The first step is to create an SSL certificate.

In a production setting, you would obtain a certificate from an official certificate authority (CA). For the purposes of this tutorial, you can issue a self-signed certificate by running the commands that follow:

export STATIC_IP=$(gcloud compute addresses describe \
  dashboard-demo-static-ip --global --format="value(address)")

export DOMAIN="${STATIC_IP}.xip.io"

mkdir /tmp/dashboard-demo-ssl

cd /tmp/dashboard-demo-ssl

openssl genrsa -out ssl.key 2048

openssl req -new -key ssl.key -out ssl.csr -subj "/CN=${DOMAIN}"

openssl x509 -req -days 365 -in ssl.csr -signkey ssl.key -out ssl.crt

cd -

Creating the HTTPS load balancer

The HTTPS load balancer comprises multiple resources:

  • A global forwarding rule.
  • A health check.
  • A backend service.
  • A URL map.
  • A target proxy.

To create all of those resources at once, run the create-lb.sh script included in the GitHub source repository:

./create-lb.sh
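
For reference, the following is a simplified sketch of what the script does, using the resource names that also appear in the cleanup section at the end of this tutorial. The actual script additionally attaches the GKE cluster's instance groups as backends; see the script itself for the authoritative version:

# Simplified sketch of create-lb.sh (resource names from the cleanup section).
gcloud compute ssl-certificates create dashboard-demo-ssl-cert \
    --certificate=/tmp/dashboard-demo-ssl/ssl.crt \
    --private-key=/tmp/dashboard-demo-ssl/ssl.key
gcloud compute health-checks create http dashboard-demo-basic-check \
    --port=5006
gcloud compute backend-services create dashboard-demo-service \
    --global --protocol=HTTP \
    --health-checks=dashboard-demo-basic-check
# (The real script also adds the cluster's instance groups as backends.)
gcloud compute url-maps create dashboard-demo-urlmap \
    --default-service=dashboard-demo-service
gcloud compute target-https-proxies create dashboard-demo-https-proxy \
    --url-map=dashboard-demo-urlmap \
    --ssl-certificates=dashboard-demo-ssl-cert
gcloud compute forwarding-rules create dashboard-demo-gfr \
    --global --address=dashboard-demo-static-ip \
    --target-https-proxy=dashboard-demo-https-proxy \
    --ports=443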

Verify that the dashboard is live by visiting the URL returned by this command:

echo https://${STATIC_IP}.xip.io/dashboard

If your browser returns an error with a 502 code, the load balancer is still being deployed. This deployment can take a few minutes to complete. Wait a moment, and then refresh the page; repeat until the dashboard appears.

At this point, the HTTPS load balancer terminates the encryption. Traffic coming in and out of the system uses HTTPS and Secure WebSockets, while traffic inside the system remains unencrypted using regular HTTP and WebSockets. This level of encryption is sufficient in most scenarios. If you require extra security, consider enforcing SSL encryption in the GCP project's Virtual Private Cloud network. Setting up SSL encryption in the VPC network is out of scope for this tutorial.

Extending the connection timeout

By default, the HTTPS load balancer closes all connections that are open for longer than 30 seconds.

To allow the dashboard's WebSocket connections to stay open for a longer period of time, you need to increase the connection timeout. The following gcloud command sets the timeout to one day (86,400 seconds):

gcloud compute backend-services update dashboard-demo-service \
    --global --timeout=86400

Static assets backend

The Bokeh pods can directly serve all the static JavaScript and CSS files required to render the dashboard. However, letting those pods handle this task has a number of drawbacks:

  • It places an unnecessary load on the Bokeh pods, potentially reducing their capacity for serving other types of resources, such as the dynamic plotting data.
  • It forces clients to download assets from the same origin server, regardless of where the clients are geographically located.

To alleviate those drawbacks, create a separate dedicated backend to efficiently serve static assets.

This static assets backend uses two main components:

  • A bucket hosted in Cloud Storage, responsible for holding the source JavaScript and CSS assets.
  • Cloud CDN, responsible for globally distributing these assets closer to potential dashboard users, resulting in lower latency and faster page loads.

The following diagram shows the architecture:

Architecture of the static assets backend.

Creating the static assets backend

  1. Extract the static assets from the Bokeh container and store them locally in a temporary directory by running the following commands:

    gcloud auth configure-docker --quiet
    
    export DOCKER_IMAGE="gcr.io/cloud-solutions-images/dashboard-demo"
    
    export STATIC_PATH=$(docker run $DOCKER_IMAGE bokeh info --static)
    
    export TMP_PATH="/tmp/dashboard-demo-static"
    
    mkdir $TMP_PATH
    
    docker run $DOCKER_IMAGE tar Ccf $(dirname $STATIC_PATH) - \
        static | tar Cxf $TMP_PATH -
    
  2. Create the bucket and upload all the assets to that bucket using the gsutil command included in the Cloud SDK:

    export BUCKET="${PROJECT_ID}-dashboard-demo"
    
    # Create the bucket
    gsutil mb gs://$BUCKET
    
    # Upload the assets to the bucket
    gsutil -m cp -z js,css -a public-read -r $TMP_PATH/static gs://$BUCKET
    

    The -z option applies gzip encoding to files that have the js and css filename extensions, which saves network bandwidth and space in Cloud Storage, reducing storage costs and speeding up page loads. The -a option sets the permissions to public-read so that the uploaded assets are publicly accessible to all clients.

  3. Create the backend bucket, including Cloud CDN support using the --enable-cdn option:

    gcloud compute backend-buckets create dashboard-demo-backend-bucket \
        --gcs-bucket-name=$BUCKET --enable-cdn
    
  4. Connect the backend bucket to the HTTPS load balancer for all traffic requesting assets inside the /static/* directory:

    gcloud compute url-maps add-path-matcher dashboard-demo-urlmap \
        --default-service dashboard-demo-service \
        --path-matcher-name bucket-matcher \
        --backend-bucket-path-rules="/static/*=dashboard-demo-backend-bucket"
    
  5. This configuration change can take a few minutes to propagate. Wait a moment, then verify that Cloud CDN is working by checking that the response headers for static assets contain x-goog-* parameters:

    curl -sD - -o /dev/null "https://${DOMAIN}/static/js/bokeh.min.js" | grep x-goog
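
    The output should contain headers similar to the following (exact values vary):

    x-goog-generation: [GENERATION_NUMBER]
    x-goog-metageneration: 1
    x-goog-stored-content-encoding: gzip
    x-goog-hash: crc32c=[HASH]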
    

Authentication

While the SSL encryption previously enabled at the load balancer level protects data in transit, we recommend adding an authentication layer to control access to the app and prevent potential abuse. One option is to enable Cloud IAP.

To enable Cloud IAP, follow these steps:

  1. Set up your OAuth consent screen:

    CONFIGURE THE CONSENT SCREEN

    1. Under Email address, select the email address that you want to display as a public contact. This must be your email address or a Google Group that you own.
    2. Enter "Dashboard Demo" as the Product name.
    3. Click Save.
  2. Under Credentials, click Create credentials > OAuth client ID.

  3. Under Application type, select Web application, then enter "Dashboard demo" as the Name, and specify Authorized redirect URIs in the format:

    https://[STATIC_IP].xip.io/_gcp_gatekeeper/authenticate
    

    Replace [STATIC_IP] with the actual static IP address created for this tutorial.

  4. When you're finished entering details, click Create and make note of the client ID and client secret shown in the OAuth client window.

  5. Enable Cloud IAP by running the following commands in the Cloud Shell. Replace the [CLIENT_ID] and [CLIENT_SECRET] values with the actual values noted in the previous step:

    gcloud beta compute backend-services update dashboard-demo-service --global \
        --iap=enabled,oauth2-client-id=[CLIENT_ID],oauth2-client-secret=[CLIENT_SECRET]
    
  6. Finally, add a member to allow access to the app. Replace the [EMAIL] value with your Google account email address:

    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --role roles/iap.httpsResourceAccessor \
        --member user:[EMAIL]
    

This configuration change can take a few minutes to propagate. When done, the app is protected by Cloud IAP, and only the user with the email address provided above can access it. You can add more users by using the same process as in the last step above.

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this tutorial:

Deleting the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the GCP Console, go to the Projects page.

    Go to the Projects page

  2. In the project list, select the project you want to delete and click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Deleting individual resources

Run the following commands to delete individual resources, instead of deleting the whole project:

export PROJECT_ID=$(gcloud info --format='value(config.project)')

# Delete the load balancer resources
gcloud compute forwarding-rules \
    delete dashboard-demo-gfr --global --quiet
gcloud compute target-https-proxies \
    delete dashboard-demo-https-proxy --quiet
gcloud compute ssl-certificates \
    delete dashboard-demo-ssl-cert --quiet
gcloud compute url-maps \
    delete dashboard-demo-urlmap --quiet
gcloud compute backend-services \
    delete dashboard-demo-service --global --quiet
gcloud compute health-checks delete \
    dashboard-demo-basic-check --quiet
gcloud compute firewall-rules delete \
    gke-dashboard-demo-lb7-fw --quiet

# Delete the Kubernetes Engine cluster
gcloud container clusters delete dashboard-demo-cluster --quiet

# Delete the service account
export SERVICE_ACCOUNT_NAME="dashboard-demo"
gcloud iam service-accounts delete \
    "${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" --quiet

# Delete the backend bucket
gcloud compute backend-buckets delete dashboard-demo-backend-bucket --quiet

# Delete the Cloud Storage bucket
export BUCKET="${PROJECT_ID}-dashboard-demo"
gsutil -m rm -r gs://$BUCKET

# Delete the static IP address
gcloud compute addresses delete dashboard-demo-static-ip --global --quiet

# Delete the local Docker image
docker rmi -f gcr.io/cloud-solutions-images/dashboard-demo

Finally, delete the OAuth client ID:

  1. Open the Credentials page.

    OPEN CREDENTIALS

  2. Click the "Dashboard demo" entry.

  3. Click DELETE, then DELETE again to confirm.

What's next

  • Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.