Powerful software-as-a-service (SaaS) solutions like Google Data Studio are extremely useful when building interactive data visualization dashboards. However, these types of solutions might not provide sufficient customization options. For those scenarios, you can use open source libraries like D3.js, Chart.js, or Bokeh to create custom dashboards. While these libraries offer a lot of flexibility for building dashboards with tailored features and visualizations, a major challenge remains: deploying these dashboards to the internet to enable easy stakeholder access.
Objectives
- Learn how to connect to BigQuery from a custom app built using Bokeh.
- Implement optimization techniques, including parallel queries and app-level caching using Memcached.
- Deploy the dashboard demo to Google Kubernetes Engine (GKE).
- Create an HTTPS Load Balancer.
- Deploy a CDN-enabled backend for serving static assets.
- Secure the app by using Identity-Aware Proxy authentication and SSL encryption.
Costs
This tutorial uses the following billable components of Google Cloud:
- BigQuery
- Compute Engine
- Cloud Storage
- Cloud CDN
- Load Balancer
- Static IP address
To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.
Before you begin
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to confirm that billing is enabled for your project.
- Enable the GKE and BigQuery APIs.
Start a Cloud Shell instance. You'll use Cloud Shell throughout this tutorial.
Clone the Git repository by running the following command in Cloud Shell:
git clone https://github.com/GoogleCloudPlatform/bigquery-bokeh-dashboard
In Cloud Shell, set the default Compute Engine zone to the zone where you are going to create your GKE cluster. This tutorial uses the us-central1-a zone:
gcloud config set compute/zone us-central1-a
To allow the demo app to access BigQuery, create a service account key by running the following commands:
export PROJECT_ID=$(gcloud info --format='value(config.project)')
export SERVICE_ACCOUNT_NAME="dashboard-demo"
cd bigquery-bokeh-dashboard
gcloud iam service-accounts create $SERVICE_ACCOUNT_NAME \
    --display-name="Bokeh/BigQuery Dashboard Demo"
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role roles/bigquery.user
gcloud iam service-accounts keys create --iam-account \
    "${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
    service-account-key.json
For more information, see the documentation on service accounts.
Understanding the demo app
The demo app displays a dashboard that processes and visualizes publicly available data for all 50 U.S. states. The dashboard provides a simple level of interaction: the user can select any of the fifty states from a dropdown widget to drill down and display information about that state. The dashboard contains four different modules, and each module displays different types of information:
- A line plot shows the evolution of various air pollutant levels over the past several decades. Data comes from the EPA Historical Air Quality Dataset.
- A bar plot displays precipitation levels for the year 2016. Data comes from the NOAA Global Historical Climatology Network Weather Dataset.
- A line plot indicates the magnitude and average temperature levels for the year 2016. Data comes from the NOAA Global Surface Summary of the Day Weather Dataset.
- A table lists the top 100 most populated ZIP Codes. Data comes from the United States Census Bureau Dataset.
A screenshot of the running demo follows:

Overall architecture
At a high level, the demo's architecture consists of five main elements. The following diagram shows how these components interact:
These elements include:
- Client web browsers on the desktop machines and mobile devices of the dashboard users.
- An authentication proxy managed by IAP, responsible for controlling access to the app.
- An HTTPS Load Balancer responsible both for effectively distributing incoming web traffic to the appropriate backends and for handling SSL decryption/encryption of data coming in and out of the system.
- The app backend responsible for serving the dashboard's dynamic content (HTML and plot data). This app backend uses a Docker image that is built with a Dockerfile that is available on GitHub. You can build your own image, but that step is out of scope for this tutorial.
- The static assets backend serving the static JavaScript and CSS files needed for client web browser rendering.
App backend
The app backend receives incoming requests from the load balancer and then fetches and transforms the appropriate data from BigQuery. The backend returns the requested dynamic content, in the form of HTML over HTTP, and returns plotting data through WebSockets.
The following diagram shows the architecture:
The app backend contains two main subcomponents:
- The Bokeh service. The app's centerpiece, this service contains pods running the Bokeh servers that query data from BigQuery and generate the dynamic content consumed by the client web browsers.
- The Memcached service. Contains pods running Memcached servers that are responsible for caching frequently requested data.
These subcomponents run in Docker containers and are deployed as Kubernetes pods to GKE to enable horizontal scalability.
Bokeh service
In this demo, the Bokeh service contains two Bokeh pods. Each pod runs a separate Bokeh server instance, which is a wrapper around Tornado's non-blocking web server. This server can serve information synchronously over HTTP for HTML content and asynchronously through WebSockets for the dashboard's plotting data.
For this tutorial, each Bokeh server has been configured to fork into four worker subprocesses, reflected by the --num-procs parameter in the container Dockerfile's CMD statement:
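The CMD statement looks roughly like the following sketch; the app path, port, and other flags here are illustrative, so check the repository's Dockerfile for the exact command:

```dockerfile
# Illustrative sketch -- see the repository's Dockerfile for the actual command.
# --num-procs 4 forks the Bokeh server into four worker subprocesses.
CMD bokeh serve dashboard \
    --address 0.0.0.0 \
    --port 5006 \
    --num-procs 4 \
    --allow-websocket-origin "$DOMAIN"
```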
The Bokeh server automatically load balances incoming traffic between the subprocesses. This approach increases the performance and resilience of each pod. The number of subprocesses used in this tutorial is arbitrary. In a real-world scenario, you would adjust this number based on actual production traffic and the memory and CPU resources available in the cluster.
BigQuery is accessed directly from the Bokeh app by using the read_gbq method, which is available in the Pandas library's gbq extension; for more details, see the documentation about the read_gbq method. This method has a very simple interface: it accepts a query as a parameter and submits it over the network to BigQuery. Other required parameters include the project ID and the service account key, both of which can be passed through environment variables loaded from GKE secrets. The read_gbq method then transparently handles the pagination of results returned by BigQuery and consolidates them into a single Pandas DataFrame object, which Bokeh can directly process.
As an illustration, here is an example of a query submitted with read_gbq to collect the top 100 most populated ZIP Codes in California (state code 'CA'):
query = """
SELECT
  A.zipcode,
  Population,
  City,
  State_code
FROM
  `bigquery-public-data.census_bureau_usa.population_by_zip_2010` AS A
JOIN
  `bigquery-public-data.utility_us.zipcode_area` AS B
ON
  A.zipcode = B.zipcode
WHERE
  gender = ''
AND
  state_code = '%(state)s'
ORDER BY
  population DESC
LIMIT
  100
"""

import os
import pandas

dataframe = pandas.io.gbq.read_gbq(
    query % {'state': 'CA'},
    project_id=os.environ['GOOGLE_PROJECT_ID'],
    private_key=os.environ['GOOGLE_APPLICATION_CREDENTIALS'],
    dialect='standard'
)
Note that, because the resulting data needs to travel over the network, offloading as much pre-processing to BigQuery as possible is highly recommended. For example, do any necessary downsampling at the query level (by using the LIMIT clause, targeting fields in the SELECT clause, pre-segmenting tables, or pre-grouping results using the GROUP BY clause) in order to reduce the overall network traffic and speed up response times.
Recall that the demo dashboard displays four different plot modules. When the user selects a U.S. state, the app sends four separate queries (one for each module) to BigQuery, processing a combined total of 7 gigabytes of data. Each of these BigQuery queries takes an average of 1 to 3 seconds to run. During this processing lapse, the app remains idle, waiting for the results to return from BigQuery. To optimize the dashboard's overall response time, these four queries run in parallel on separate threads, as shown in the following code:
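The fan-out can be sketched with Python's concurrent.futures module; the fetch_* functions below are hypothetical stand-ins for the app's per-module read_gbq calls, not the demo's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the four per-module read_gbq queries.
def fetch_air(state):
    return f"air data for {state}"

def fetch_precip(state):
    return f"precipitation data for {state}"

def fetch_temp(state):
    return f"temperature data for {state}"

def fetch_population(state):
    return f"population data for {state}"

def fetch_all(state):
    """Submit the four module queries on separate threads and wait for all results."""
    fetchers = [fetch_air, fetch_precip, fetch_temp, fetch_population]
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(f, state) for f in fetchers]
        # Results come back in submission order, one per module.
        return [future.result() for future in futures]

results = fetch_all('CA')
```

Because each thread spends most of its time waiting on the network, the four queries overlap almost completely, so the dashboard's response time approaches that of the slowest single query rather than the sum of all four.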
When all the results have been received from BigQuery, Bokeh's graph library plots the data. For example, the following code plots the evolution of air pollutant levels over time:
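A minimal sketch of such a plot follows, using a small hypothetical DataFrame in place of the EPA query results and illustrative pollutant columns:

```python
import pandas as pd
from bokeh.plotting import figure

# Hypothetical pollutant data; the real values come from the EPA dataset query.
df = pd.DataFrame({
    'year': [1990, 2000, 2010],
    'co':   [1.2, 0.9, 0.5],
    'o3':   [0.06, 0.055, 0.05],
})

def make_air_plot(df):
    """Draw each pollutant as its own line on a single figure."""
    p = figure(title='Air pollutant levels over time',
               x_axis_label='Year', y_axis_label='Level')
    for pollutant in ['co', 'o3']:
        p.line(df['year'], df[pollutant], legend_label=pollutant)
    return p

plot = make_air_plot(df)
```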
Deploying the Bokeh service
To facilitate horizontal scalability for the app, all components of the app backend are deployed with Docker containers using GKE. The steps for deploying these components to GKE follow.
First, create a GKE cluster with two nodes (the number is arbitrary) by running the following command in Cloud Shell:
gcloud container clusters create dashboard-demo-cluster \
    --tags dashboard-demo-node \
    --num-nodes=2
Create GKE secrets to securely store the project ID and the service account key that you created previously.
export PROJECT_ID=$(gcloud info --format='value(config.project)')
kubectl create secret generic project-id \
    --from-literal project-id=$PROJECT_ID
kubectl create secret generic service-account-key \
    --from-file service-account-key=service-account-key.json
Create a static IP address:
gcloud compute addresses create dashboard-demo-static-ip --global
Define a domain using xip.io, a free DNS service that automatically resolves domains containing an IP address back to that same IP address:
export STATIC_IP=$(gcloud compute addresses describe \
    dashboard-demo-static-ip --global --format="value(address)")
export DOMAIN="${STATIC_IP}.xip.io"
echo "My domain: ${DOMAIN}"
Create a GKE secret with the domain:
kubectl create secret generic domain --from-literal domain=$DOMAIN
The Bokeh service is now ready to be deployed with this command:
kubectl create -f kubernetes/bokeh.yaml
The last step deploys an arbitrary number of pods, in this case two, each one running a separate instance of the demo app. It also exposes as environment variables the three GKE secrets created earlier through the env.valueFrom.secretKeyRef entry, so that this information can be retrieved by the app to access BigQuery. For details, see bokeh.yaml.
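As a sketch, the secretKeyRef entries look roughly like the following; the secret names match those created earlier, and the environment variable names match the ones the app code reads, but see kubernetes/bokeh.yaml for the actual manifest (the domain secret follows the same pattern):

```yaml
# Illustrative sketch -- see kubernetes/bokeh.yaml for the real manifest.
env:
  - name: GOOGLE_PROJECT_ID
    valueFrom:
      secretKeyRef:
        name: project-id
        key: project-id
  - name: GOOGLE_APPLICATION_CREDENTIALS
    valueFrom:
      secretKeyRef:
        name: service-account-key
        key: service-account-key
```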
Memcached service
When you don't expect the data displayed on a dashboard to change for a given period of time, it's important to consider performance and cost optimizations. For this purpose, and by default, BigQuery internally caches the results of duplicate queries for 24 hours. This greatly speeds up response times and avoids unnecessary billing charges. For more information, see the BigQuery documentation on cached query results.
This caching mechanism can be further improved by caching frequently used data locally inside the dashboard app's memory. This approach takes BigQuery out of the loop and avoids unnecessary network traffic to collect the cached data. This tutorial uses Memcached to handle this type of app-level caching. Memcached is extremely fast and reduces response times from a few seconds to a few milliseconds in the case of this tutorial.
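The cache-aside pattern the app relies on can be sketched as follows; the helper names are illustrative rather than the demo's actual code, and a dictionary-backed stand-in takes the place of a real memcached client so the sketch is self-contained:

```python
def cached_query(cache, key, fetch, ttl=3600):
    """Return the cached result for key, or fetch it, store it, and return it.

    `cache` is any client exposing get/set (for example, a memcached client);
    `fetch` is a zero-argument callable that runs the BigQuery query.
    """
    result = cache.get(key)
    if result is None:
        result = fetch()          # cache miss: go to BigQuery
        cache.set(key, result, ttl)
    return result

class DictCache:
    """Minimal in-memory stand-in for a memcached client."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value, ttl):
        self._data[key] = value   # TTL ignored in this stand-in

calls = []
def run_query():
    calls.append(1)               # count how often BigQuery is actually hit
    return 'rows for CA'

cache = DictCache()
first = cached_query(cache, 'population-CA', run_query)
second = cached_query(cache, 'population-CA', run_query)  # served from cache
```

The second call returns the cached rows without invoking run_query again, which is the behavior that turns multi-second BigQuery round trips into millisecond cache hits.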
One peculiarity of Memcached is that it does not have a built-in load-balancing mechanism. Instead, load-balancing between multiple Memcached pods must be done by the Memcached client, which is included in each Bokeh pod, as illustrated in the following diagram:
For the Memcached pods to be discoverable by the Memcached client, the Memcached service must be declared as headless, which is achieved by using the clusterIP: None entry in the GKE configuration file (see memcached.yaml). The Memcached client can then retrieve the IP addresses of the individual pods by querying kube-dns for the memcached.default.svc.cluster.local domain name.
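A headless service declaration along these lines achieves that; this is a sketch, so see kubernetes/memcached.yaml for the actual manifest:

```yaml
# Illustrative sketch -- see kubernetes/memcached.yaml for the real manifest.
apiVersion: v1
kind: Service
metadata:
  name: memcached
spec:
  clusterIP: None        # headless: DNS returns the individual pod IPs
  selector:
    app: memcached
  ports:
    - port: 11211        # default memcached port
```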
Deploying the Memcached service
Similar to the Bokeh service, the Memcached service is deployed to GKE as a Docker container.
Deploy the Memcached pods and headless service by running the following command:
kubectl create -f kubernetes/memcached.yaml
Adding the load balancer
At this point, the dashboard app is running inside a private service. You need to introduce an HTTPS Load Balancer to expose the app to the internet so users can access the dashboard. You will do this in the following steps:
- Create an SSL certificate.
- Create a load balancer.
- Extend the connection timeout.
Creating an SSL certificate
In real-world scenarios, data displayed on dashboards can be sensitive or private. You should use SSL to encrypt this data before it gets sent to the users over the internet. The first step is to create an SSL certificate.
In a production setting, you would have to issue a certificate through an official Certificate Authority. For the purpose of this tutorial, you can issue a self-signed certificate by running the commands that follow:
export STATIC_IP=$(gcloud compute addresses describe \
dashboard-demo-static-ip --global --format="value(address)")
export DOMAIN="${STATIC_IP}.xip.io"
mkdir /tmp/dashboard-demo-ssl
cd /tmp/dashboard-demo-ssl
openssl genrsa -out ssl.key 2048
openssl req -new -key ssl.key -out ssl.csr -subj "/CN=${DOMAIN}"
openssl x509 -req -days 365 -in ssl.csr -signkey ssl.key -out ssl.crt
cd -
Creating the HTTPS load balancer
The HTTPS load balancer comprises multiple resources:
- A global forwarding rule.
- A health check.
- A backend service.
- A URL map.
- A target proxy.
To create all of those resources at once, run the create-lb.sh script included in the GitHub source repository:
./create-lb.sh
Verify that the dashboard is live by visiting the URL returned by this command:
echo https://${STATIC_IP}.xip.io/dashboard
If your browser returns a 502 error, the load balancer is still being deployed; it can take a few minutes for the deployment to complete. Wait a moment, then refresh the page, and repeat until the dashboard appears.
At this point, the HTTPS load balancer terminates the encryption. Traffic coming in and out of the system uses HTTPS and Secure WebSockets, while traffic inside the system remains unencrypted using regular HTTP and WebSockets. This level of encryption is sufficient in most scenarios. If you require extra security, consider enforcing SSL encryption in the Google Cloud project's Virtual Private Cloud network. Setting up SSL encryption in the VPC network is out of scope for this tutorial.
Extending the connection timeout
By default, the HTTPS load balancer closes all connections that are open for longer than 30 seconds.
To allow the dashboard's WebSocket connections to stay open for a longer period of time, you need to increase the connection timeout. The following gcloud command sets the timeout to one day (86,400 seconds):
gcloud compute backend-services update dashboard-demo-service \
--global --timeout=86400
Static assets backend
The Bokeh pods can directly serve all the static JavaScript and CSS files required to render the dashboard. However, letting those pods handle this task has a number of drawbacks:
- It places an unnecessary load on the Bokeh pods, potentially reducing their capacity for serving other types of resources, such as the dynamic plotting data.
- It forces clients to download assets from the same origin server, regardless of where the clients are geographically located.
To alleviate those drawbacks, create a separate dedicated backend to efficiently serve static assets.
This static assets backend uses two main components:
- A bucket hosted in Cloud Storage, responsible for holding the source JavaScript and CSS assets.
- Cloud CDN, responsible for globally distributing these assets closer to potential dashboard users, resulting in lower latency and faster page loads.
The following diagram shows the architecture:
Creating the static assets backend
Extract the static assets from the Bokeh container and store them locally in a temporary directory by running the following commands:
gcloud auth configure-docker --quiet
export DOCKER_IMAGE="gcr.io/cloud-solutions-images/dashboard-demo"
export STATIC_PATH=$(docker run $DOCKER_IMAGE bokeh info --static)
export TMP_PATH="/tmp/dashboard-demo-static"
mkdir $TMP_PATH
docker run $DOCKER_IMAGE tar Ccf $(dirname $STATIC_PATH) - \
    static | tar Cxf $TMP_PATH -
Create the bucket and upload all the assets to it by using the gsutil command included in the Cloud SDK:
export BUCKET="${PROJECT_ID}-dashboard-demo"
# Create the bucket
gsutil mb gs://$BUCKET
# Upload the assets to the bucket
gsutil -m cp -z js,css -a public-read -r $TMP_PATH/static gs://$BUCKET
The -z option applies gzip encoding to files with the js and css filename extensions, which saves network bandwidth and space in Cloud Storage, reducing storage costs and speeding up page loads. The -a option sets the permissions to public-read so that the uploaded assets are publicly accessible to all clients.
Create the backend bucket, including Cloud CDN support, by using the --enable-cdn option:
gcloud compute backend-buckets create dashboard-demo-backend-bucket \
    --gcs-bucket-name=$BUCKET --enable-cdn
Connect the backend bucket to the HTTPS load balancer for all traffic requesting assets under the /static/* path:
gcloud compute url-maps add-path-matcher dashboard-demo-urlmap \
    --default-service dashboard-demo-service \
    --path-matcher-name bucket-matcher \
    --backend-bucket-path-rules="/static/*=dashboard-demo-backend-bucket"
This configuration change can take a few minutes to propagate. Wait a moment, then verify that Cloud CDN is working by checking that the response headers for static assets contain x-goog-* entries:
curl -Iks https://${DOMAIN}/static/js/bokeh.min.js | grep x-goog
Authentication
While the SSL encryption previously enabled at the load balancer level protects the transport, we recommend adding an authentication layer in order to control access to the app and to prevent potential abuse. One option is to enable IAP.
First, set up the OAuth consent screen:
- For User type, select External, and then click Create.
- For Application name, enter Dashboard demo.
- Under Support email, select the email address you want to display as a public contact. This must be your email address or a Google Group that you own.
- Click Save.
Under Credentials, click Create credentials > OAuth client ID.
Under Application type, select Web application, and then enter Dashboard demo as the Name value.
When you're finished entering details, click Create and take note of the client ID and client secret shown in the OAuth client window.
Click the "Dashboard demo" credential.
For Authorized redirect URIs, add an entry with the following format, replacing [CLIENT_ID] with the client ID value that you noted in the earlier step:
https://iap.googleapis.com/v1/oauth/clientIds/[CLIENT_ID]:handleRedirect
Enable IAP by running the following commands in Cloud Shell. Replace the [CLIENT_ID] and [CLIENT_SECRET] values with the values that you noted in the earlier step:
gcloud beta compute backend-services update dashboard-demo-service --global \
    --iap=enabled,oauth2-client-id=[CLIENT_ID],oauth2-client-secret=[CLIENT_SECRET]
Finally, add a member to allow access to the app. Replace the [EMAIL] value with your Google account email address:
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --role roles/iap.httpsResourceAccessor \
    --member user:[EMAIL]
This configuration change can take a few minutes to propagate. When it's done, the app is protected by IAP, and only the user with the email address provided above can access it. You can add more users by repeating the previous step.
Cleaning up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Deleting the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Cloud Console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Deleting individual resources
Run the following commands to delete individual resources, instead of deleting the whole project:
export PROJECT_ID=$(gcloud info --format='value(config.project)')
# Delete the load balancer resources
gcloud compute forwarding-rules \
delete dashboard-demo-gfr --global --quiet
gcloud compute target-https-proxies \
delete dashboard-demo-https-proxy --quiet
gcloud compute ssl-certificates \
delete dashboard-demo-ssl-cert --quiet
gcloud compute url-maps \
delete dashboard-demo-urlmap --quiet
gcloud compute backend-services \
delete dashboard-demo-service --global --quiet
gcloud compute health-checks delete \
dashboard-demo-basic-check --quiet
gcloud compute firewall-rules delete \
gke-dashboard-demo-lb7-fw --quiet
# Delete the Kubernetes Engine cluster
gcloud container clusters delete dashboard-demo-cluster --quiet
# Delete the service account
export SERVICE_ACCOUNT_NAME="dashboard-demo"
gcloud iam service-accounts delete \
"${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" --quiet
# Delete the backend bucket
gcloud compute backend-buckets delete dashboard-demo-backend-bucket --quiet
# Delete the Cloud Storage bucket
export BUCKET="${PROJECT_ID}-dashboard-demo"
gsutil -m rm -r gs://$BUCKET
# Delete the static IP address
gcloud compute addresses delete dashboard-demo-static-ip --global --quiet
# Delete the local Docker image
docker rmi -f gcr.io/cloud-solutions-images/dashboard-demo
Last, delete the OAuth client ID:
Open the Credentials page.
Click the "Dashboard demo" entry.
Click DELETE, then DELETE again to confirm.
What's next
- If you want to process and upload your own data to BigQuery, see the documentation on loading data into BigQuery.
- If you're interested in implementing real-time dashboards, check out Bokeh's stream method and the documentation on streaming data into BigQuery.
- Try out other Google Cloud features for yourself. Have a look at our tutorials.