Creating Custom Interactive Dashboards with Bokeh and BigQuery

In this tutorial, you learn how to build a custom interactive dashboard app on Google Cloud by using the Bokeh library to visualize data from publicly available BigQuery datasets. You also learn how to deploy this app with both security and scalability in mind.

Powerful software-as-a-service (SaaS) solutions like Looker and Looker Studio are extremely useful when building interactive data visualization dashboards. However, these types of solutions might not provide sufficient customization options. For those scenarios, you can use open source libraries like D3.js, Chart.js, or Bokeh to create custom dashboards. While these libraries offer a lot of flexibility for building dashboards with tailored features and visualizations, a major challenge remains: deploying these dashboards to the internet to enable easy stakeholder access.



Costs

This tutorial uses the following billable components of Google Cloud:

  • BigQuery
  • Google Kubernetes Engine
  • Artifact Registry
  • Cloud Storage
  • Cloud CDN
  • Load Balancer
  • Static IP address

To generate a cost estimate based on your projected usage, use the pricing calculator. New Google Cloud users might be eligible for a free trial.

Before you begin

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  2. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.

  3. Enable the GKE, Cloud Build, Artifact Registry, and BigQuery APIs.

    Enable the APIs

  4. Start a Cloud Shell instance. You'll use Cloud Shell throughout this tutorial.

    Open Cloud Shell

  5. Clone the Git repository by running the following command in Cloud Shell:

    git clone https://github.com/GoogleCloudPlatform/bigquery-bokeh-dashboard
    cd bigquery-bokeh-dashboard
  6. In Cloud Shell, set the default Compute Engine zone to the zone where you are going to create your GKE cluster. This tutorial uses the us-central1-a zone.

    gcloud config set compute/zone us-central1-a
  7. Create a service account:

    export PROJECT_ID=$(gcloud info --format='value(config.project)')
    export SERVICE_ACCOUNT_NAME="dashboard-demo"
    gcloud iam service-accounts create $SERVICE_ACCOUNT_NAME \
        --display-name="Bokeh/BigQuery Dashboard Demo"
  8. Give access to Artifact Registry to the service account:

    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member="serviceAccount:${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
        --role roles/artifactregistry.reader
  9. To allow the demo app to access BigQuery, give the service account the bigquery.user role:

    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member="serviceAccount:${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
        --role roles/bigquery.user

    For more information, see the service accounts documentation.

Understanding the demo app

The demo app displays a dashboard that processes and visualizes publicly available data for all 50 U.S. states. The dashboard provides a simple level of interaction: the user can select any of the 50 states from a dropdown widget to drill down and display information about that state. The dashboard contains four modules, each displaying a different type of information.

A screenshot of the running demo follows:

Dashboard view of the app with graphs.

Overall architecture

At a high level, the demo's architecture consists of five main elements. The following diagram shows how these components interact:

Architecture of the demo app.

These elements include:

  • Client web browsers on the desktop machines and mobile devices of the dashboard users.
  • An authentication proxy managed by IAP, responsible for controlling access to the app.
  • An HTTPS Load Balancer responsible both for effectively distributing incoming web traffic to the appropriate backends and for handling SSL decryption/encryption of data coming in and out of the system.
  • The app backend responsible for serving the dashboard's dynamic content (HTML and plot data). This app backend uses a Docker image that you build with a Dockerfile that is available on GitHub.
  • The static assets backend serving the static JavaScript and CSS files needed for client web browser rendering.

App backend

The app backend receives incoming requests from the load balancer and then fetches and transforms the appropriate data from BigQuery. The backend returns the requested dynamic content as HTML over HTTP and streams plotting data through WebSockets.

The following diagram shows the architecture:

Architecture of the app's backend.

The app backend contains two main subcomponents:

  • The Bokeh service. The app's centerpiece, this service contains pods running the Bokeh servers that query data from BigQuery and generate the dynamic content consumed by the client web browsers.
  • The Memcached service. Contains pods running Memcached servers that are responsible for caching frequently requested data.

Both subcomponents run in Docker containers and are deployed as Kubernetes pods to GKE to enable horizontal scalability.

Bokeh service

In this demo, the Bokeh service contains two Bokeh pods. Each pod runs a separate Bokeh server instance, which is a wrapper around Tornado's non-blocking web server. This server can serve information synchronously over HTTP for HTML content and asynchronously through WebSockets for the dashboard's plotting data.

For this tutorial, each Bokeh server has been configured to fork into four worker subprocesses, reflected by the --num-procs parameter in the CMD statement in the container Dockerfile:

CMD bokeh serve --disable-index-redirect --num-procs=4 --port=5006 --address=0.0.0.0 --allow-websocket-origin=$DASHBOARD_DEMO_DOMAIN dashboard

The Bokeh server automatically load balances incoming traffic between the subprocesses. This approach increases the performance and resilience of each pod. The number of subprocesses used in this tutorial is arbitrary. In a real-world scenario, you would adjust this number based on actual production traffic and the memory and CPU resources available in the cluster.
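In practice, the subprocess count could be derived from the CPUs available to the pod rather than hard-coded. The following is a minimal standard-library sketch; the NUM_PROCS environment variable is a hypothetical override convention, not part of the tutorial's code:

```python
import os

def choose_num_procs(default=4):
    """Pick a value for Bokeh's --num-procs based on available CPUs.

    Honors an optional NUM_PROCS override (a hypothetical convention),
    otherwise falls back to the CPU count reported by the OS.
    """
    override = os.environ.get("NUM_PROCS")
    if override and override.isdigit():
        return max(1, int(override))
    return max(1, os.cpu_count() or default)

print(choose_num_procs())
```

The computed value could then be passed to `bokeh serve --num-procs` at container startup.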

BigQuery is accessed directly from the Bokeh app by using the BigQuery Python client library. The client's query() method submits a SQL query to BigQuery. The to_dataframe() method then transparently handles the pagination of results returned by BigQuery, and consolidates them into a Pandas DataFrame object, which Bokeh can directly process.

As an illustration, here is an example of a query to collect the top 100 most populated ZIP Codes in California (state code 'CA'):

query = """
    SELECT
      A.zipcode,
      population
    FROM
      `bigquery-public-data.census_bureau_usa.population_by_zip_2010` AS A
    JOIN
      `bigquery-public-data.utility_us.zipcode_area` AS B
    ON
      A.zipcode = B.zipcode
    WHERE
      gender = ''
      AND state_code = 'CA'
    ORDER BY
      population DESC
    LIMIT 100
"""

from google.cloud import bigquery
bigquery_client = bigquery.Client()
dataframe = bigquery_client.query(query).to_dataframe()

Note that, because the resulting data needs to travel over the network, offloading as much pre-processing to BigQuery as possible is highly recommended. For example, do any necessary downsampling at the query level (by using the LIMIT clause, targeting fields in the SELECT clause, pre-segmenting tables, or pre-grouping results using the GROUP BY clause) in order to reduce the overall network traffic and speed up response times.
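For instance, a query that pushes the aggregation down to BigQuery might look like the following sketch. The table and column names here are hypothetical, not taken from the tutorial's datasets:

```python
# Aggregation pushed down to BigQuery: the GROUP BY and LIMIT clauses
# ensure only a small, pre-summarized result set leaves the warehouse,
# instead of row-level data that would be aggregated in Python.
QUERY = """
SELECT
  year,
  AVG(value) AS avg_value
FROM
  `my-project.my_dataset.measurements`  -- hypothetical table
WHERE
  state_code = 'CA'
GROUP BY
  year
ORDER BY
  year
LIMIT 100
"""

# The same client call pattern as elsewhere in this tutorial would apply:
#   dataframe = bigquery_client.query(QUERY).to_dataframe()
```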

Recall that the demo dashboard displays four different plot modules. When the user selects a U.S. state, the app sends four separate queries (one for each module) to BigQuery, processing a combined total of 7 gigabytes of data. Each of these BigQuery queries takes an average of 1 to 3 seconds to run. While these queries run, the app remains idle, waiting for the results to return from BigQuery. To optimize the dashboard's overall response time, these four queries run in parallel on separate threads, as shown in the following code:

def fetch_data(state):
    """Fetch data from BigQuery for the given US state by running
    the queries for all dashboard modules in parallel."""
    t0 = time.time()
    # Collect the fetch methods for all dashboard modules, keyed by module ID
    fetch_methods = {module.id: getattr(module, 'fetch_data') for module in modules}
    # Create a thread pool: one separate thread for each dashboard module
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(fetch_methods)) as executor:
        # Prepare the thread tasks
        tasks = {}
        for key, fetch_method in fetch_methods.items():
            task = executor.submit(fetch_method, state)
            tasks[task] = key
        # Run the tasks and collect results as they arrive
        results = {}
        for task in concurrent.futures.as_completed(tasks):
            key = tasks[task]
            results[key] = task.result()
    # Return results once all tasks have been completed
    t1 = time.time()
    timer.text = '(Execution time: %s seconds)' % round(t1 - t0, 4)
    return results

When all the results have been received from BigQuery, Bokeh's graph library plots the data. For example, the following code plots the evolution of air pollutant levels over time:

def make_plot(self, dataframe):
    self.source = ColumnDataSource(data=dataframe)
    palette = all_palettes['Set2'][6]
    hover_tool = HoverTool(tooltips=[
        ("Value", "$y"),
        ("Year", "@year"),
    ])
    self.plot = figure(
        plot_width=600, plot_height=300, tools=[hover_tool])
    columns = {
        'pm10': 'PM10 Mass (µg/m³)',
        'pm25_frm': 'PM2.5 FRM (µg/m³)',
        'pm25_nonfrm': 'PM2.5 non FRM (µg/m³)',
        'lead': 'Lead (¹/₁₀₀ µg/m³)',
    }
    for i, (code, label) in enumerate(columns.items()):
        self.plot.line(
            x='year', y=code, source=self.source, line_width=3,
            line_alpha=0.6, line_color=palette[i], legend=label)

    self.title = Paragraph(text=TITLE)
    return column(self.title, self.plot)

Deploying the Bokeh service

To facilitate horizontal scalability for the app, all components of the app backend are deployed with Docker containers using GKE. The steps for deploying these components to GKE follow.

  1. First, create a GKE cluster with two nodes (an arbitrary number chosen for this demo) by running the following command in Cloud Shell:

    gcloud container clusters create dashboard-demo-cluster \
      --tags dashboard-demo-node \
      --service-account ${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
      --num-nodes 2
  2. Create a static IP address:

    gcloud compute addresses create dashboard-demo-static-ip --global
  3. Define a domain by using nip.io, a free DNS service that automatically resolves domains containing an IP address back to that same IP address:

    export STATIC_IP=$(gcloud compute addresses describe \
      dashboard-demo-static-ip --global --format="value(address)")
    export DOMAIN="${STATIC_IP}.nip.io"
    echo "My domain: ${DOMAIN}"
  4. Create a GKE secret with the domain:

    kubectl create secret generic domain --from-literal domain=$DOMAIN
  5. Create a Docker repository in Artifact Registry:

    gcloud artifacts repositories create dashboard-demo \
      --location=us-central1 \
      --repository-format=docker
  6. Build the Docker image using Cloud Build:

    gcloud builds submit --config app/cloudbuild.yaml
  7. Edit kubernetes/bokeh.yaml to replace [PROJECT_ID] with your actual project ID:

    sed -i "s/\[PROJECT_ID\]/$PROJECT_ID/" kubernetes/bokeh.yaml
  8. Deploy the Bokeh service:

    kubectl create -f kubernetes/bokeh.yaml

    This last step deploys an arbitrary number of pods, in this case two, each one running a separate instance of the demo app. For details, see bokeh.yaml.

Memcached service

When you don't expect the data displayed on a dashboard to change for a given period of time, it's important to consider performance and cost optimizations. For this purpose, and by default, BigQuery internally caches the results from duplicated queries for a period of 24 hours. This greatly speeds up response times and avoids unnecessary billing charges. For more information, see Using cached query results.

This caching mechanism can be further improved by caching frequently used data locally inside the dashboard app's memory. This approach takes BigQuery out of the loop and avoids unnecessary network traffic to collect the cached data. This tutorial uses Memcached to handle this type of app-level caching. Memcached is extremely fast and reduces response times from a few seconds to a few milliseconds in the case of this tutorial.
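The app-level caching described here follows a classic cache-aside pattern: check Memcached first, and query BigQuery only on a miss. The following sketch illustrates that logic with an in-memory stand-in for the Memcached client; in a real deployment a client library such as pymemcache would take its place, and the key and function names are illustrative assumptions:

```python
import time

class FakeMemcachedClient:
    """In-memory stand-in for a Memcached client, for illustration only."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]  # expired, behave like a cache miss
            return None
        return value

    def set(self, key, value, expire):
        self._store[key] = (value, time.time() + expire)

def cached_fetch(cache, key, fetch_fn, ttl=3600):
    """Cache-aside: return cached data if present, else fetch and cache it."""
    data = cache.get(key)
    if data is not None:
        return data
    data = fetch_fn()          # in the real app, a BigQuery query
    cache.set(key, data, expire=ttl)
    return data

# Usage: the second call is served from the cache, skipping the fetch.
cache = FakeMemcachedClient()
calls = []
def slow_fetch():
    calls.append(1)
    return {"state": "CA", "population": 37253956}

first = cached_fetch(cache, "population:CA", slow_fetch)
second = cached_fetch(cache, "population:CA", slow_fetch)
```

The ttl parameter bounds how stale cached dashboard data can get, mirroring the trade-off BigQuery's own 24-hour result cache makes.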

One peculiarity of Memcached is that it does not have a built-in load-balancing mechanism. Instead, load-balancing between multiple Memcached pods must be done by the Memcached client, which is included in each Bokeh pod, as illustrated in the following diagram:

Architecture of the Memcached pod.

For the Memcached pods to be discoverable by the Memcached client, the Memcached service must be declared as headless, which is achieved by using the clusterIP: None entry in the GKE configuration file (see memcached.yaml). The Memcached client can then retrieve IP addresses of the individual pods by querying kube-dns for the memcached.default.svc.cluster.local domain name.
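The client-side load balancing amounts to hashing each cache key to pick one of the discovered pod addresses. The following is a minimal sketch of that idea; real Memcached clients such as pymemcache use more robust consistent-hashing schemes, and the pod addresses below are illustrative:

```python
import hashlib

def pick_server(key, servers):
    """Deterministically map a cache key to one of the Memcached pods."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# Pod addresses as they might be resolved from kube-dns (illustrative).
pods = ["10.0.1.4:11211", "10.0.2.7:11211"]
server = pick_server("population:CA", pods)
```

Because the mapping is deterministic, every Bokeh pod sends a given key to the same Memcached pod, so cached entries are not duplicated across the cache tier.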

Deploying the Memcached service

Similar to the Bokeh service, the Memcached service is deployed to GKE as a Docker container.

Deploy the Memcached pods and headless service by running the following command:

kubectl create -f kubernetes/memcached.yaml

Adding the load balancer

At this point, the dashboard app is running inside a private service. You need to introduce an HTTPS Load Balancer to expose the app to the internet so users can access the dashboard. You will do this in the following steps:

  1. Create an SSL certificate.
  2. Create a load balancer.
  3. Extend the connection timeout.

Creating an SSL certificate

In real-world scenarios, data displayed on dashboards can be sensitive or private. You should use SSL to encrypt this data before it gets sent to the users over the internet. The first step is to create an SSL certificate.

In a production setting, you would have to issue a certificate through an official Certificate Authority. For the purpose of this tutorial, you can issue a self-signed certificate by running the commands that follow:

export STATIC_IP=$(gcloud compute addresses describe \
  dashboard-demo-static-ip --global --format="value(address)")

export DOMAIN="${STATIC_IP}.nip.io"

mkdir /tmp/dashboard-demo-ssl

cd /tmp/dashboard-demo-ssl

openssl genrsa -out ssl.key 2048

openssl req -new -key ssl.key -out ssl.csr -subj "/CN=${DOMAIN}"

openssl x509 -req -days 365 -in ssl.csr -signkey ssl.key -out ssl.crt

cd -

Creating the HTTPS load balancer

The HTTPS load balancer comprises multiple resources:

  • A global forwarding rule.
  • A health check.
  • A backend service.
  • A URL map.
  • A target proxy.

To create all of those resources at once, run the create-lb.sh script that is included in the GitHub source repository:

./create-lb.sh

Verify that the dashboard is live by visiting the URL returned by this command:

echo https://${STATIC_IP}

If your browser returns an error with a 502 code, the load balancer is still being deployed. This deployment can take a few minutes to complete. Wait a few moments, and then refresh the page. Repeat until the dashboard appears.

At this point, the HTTPS load balancer terminates the encryption. Traffic coming in and out of the system uses HTTPS and Secure WebSockets, while traffic inside the system remains unencrypted using regular HTTP and WebSockets. This level of encryption is sufficient in most scenarios. If you require extra security, consider enforcing SSL encryption in the Google Cloud project's Virtual Private Cloud network. Setting up SSL encryption in the VPC network is out of scope for this tutorial.

Extending the connection timeout

By default, the HTTPS load balancer closes all connections that are open for longer than 30 seconds.

To allow the dashboard's WebSocket connections to stay open for a longer period of time, you need to increase the connection timeout. The following gcloud command sets the timeout to one day (86,400 seconds):

gcloud compute backend-services update dashboard-demo-service \
    --global --timeout=86400

Static assets backend

The Bokeh pods can directly serve all the static JavaScript and CSS files required to render the dashboard. However, letting those pods handle this task has a number of drawbacks:

  • It places an unnecessary load on the Bokeh pods, potentially reducing their capacity for serving other types of resources, such as the dynamic plotting data.
  • It forces clients to download assets from the same origin server, regardless of where the clients are geographically located.

To alleviate those drawbacks, create a separate dedicated backend to efficiently serve static assets.

This static assets backend uses two main components:

  • A bucket hosted in Cloud Storage, responsible for holding the source JavaScript and CSS assets.
  • Cloud CDN, responsible for globally distributing these assets closer to potential dashboard users, resulting in lower latency and faster page loads.

The following diagram shows the architecture:

Architecture of the static assets backend.

Creating the static assets backend

  1. Extract the static assets from the Bokeh container and store them locally in a temporary directory by running the following commands:

    gcloud auth configure-docker us-central1-docker.pkg.dev --quiet
    export DOCKER_IMAGE="us-central1-docker.pkg.dev/$PROJECT_ID/dashboard-demo/dashboard-demo:latest"
    export STATIC_PATH=$(docker run $DOCKER_IMAGE bokeh info --static)
    export TMP_PATH="/tmp/dashboard-demo-static"
    mkdir $TMP_PATH
    docker run $DOCKER_IMAGE tar Ccf $(dirname $STATIC_PATH) - \
        static | tar Cxf $TMP_PATH -
  2. Create the bucket and upload all the assets to that bucket using the gsutil command included in the Google Cloud CLI:

    export BUCKET="${PROJECT_ID}-dashboard-demo"
    # Create the bucket
    gsutil mb gs://$BUCKET
    # Upload the assets to the bucket
    gsutil -m cp -z js,css -a public-read -r $TMP_PATH/static gs://$BUCKET

    The -z option applies gzip encoding to files with the js and css filename extensions, which saves network bandwidth and space in Cloud Storage, reducing storage costs and speeding up page loads. The -a option sets the permissions to public-read so that the uploaded assets can be publicly accessible by all clients.

  3. Create the backend bucket, including Cloud CDN support using the --enable-cdn option:

    gcloud compute backend-buckets create dashboard-demo-backend-bucket \
        --gcs-bucket-name=$BUCKET --enable-cdn
  4. Connect the backend bucket to the HTTPS load balancer for all traffic requesting assets inside the /static/* directory:

    gcloud compute url-maps add-path-matcher dashboard-demo-urlmap \
        --default-service dashboard-demo-service \
        --path-matcher-name bucket-matcher \
        --backend-bucket-path-rules="/static/*=dashboard-demo-backend-bucket"
  5. This configuration change can take a few minutes to propagate. Wait a moment, then verify that Cloud CDN is working by checking that the response headers for static assets contain x-goog-* parameters:

    curl -Iks https://${DOMAIN}/static/js/bokeh.min.js | grep x-goog
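As a side note on the -z option used in step 2, you can reproduce the gzip savings with the standard library. The sample content below is illustrative, not an actual Bokeh asset:

```python
import gzip

# Repetitive text compresses well, which is typical of JS and CSS assets.
asset = b"function render(state){return state.toUpperCase();}\n" * 200
compressed = gzip.compress(asset)

ratio = len(compressed) / len(asset)
print(f"original: {len(asset)} bytes, gzipped: {len(compressed)} bytes "
      f"({ratio:.0%} of original)")
```

Minified JavaScript and CSS typically shrink to a fraction of their original size, which is why serving gzip-encoded assets reduces both storage costs and transfer times.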


Adding user authentication

While the SSL encryption previously enabled at the load balancer level protects the transport, we recommend adding an authentication layer to control access to the app and to prevent potential abuse. One option is to enable IAP.

The first step is to set up an OAuth consent screen:

  1. Set up your OAuth consent screen:


    1. For User type, select External, and then click Create.
    2. Enter Dashboard demo for App name.
    3. Under both User support email and Developer contact information, select the email address you want to display as a public contact. This must be your email address, or a Google Group you own.
    4. Click Save.
  2. Under Credentials, click Create credentials > OAuth client ID.

  3. Under Application type, select Web application, and then enter Dashboard demo as the Name value.

  4. When you're finished entering details, click Create and take note of the client ID and client secret shown in the OAuth client window.

  5. Click the "Dashboard demo" credential.

  6. For Authorized redirect URIs, add an entry with the following format, and then click Save. Replace [CLIENT_ID] with the client ID value that you noted in the earlier step:

    https://iap.googleapis.com/v1/oauth/clientIds/[CLIENT_ID]:handleRedirect
  7. In Cloud Shell, enable IAP by running the following command. Replace the [CLIENT_ID] and [CLIENT_SECRET] values with the values that you noted in the earlier step:

    gcloud compute backend-services update dashboard-demo-service --global \
        --iap=enabled,oauth2-client-id=[CLIENT_ID],oauth2-client-secret=[CLIENT_SECRET]
  8. Finally, add a member to allow access to the app. Replace the [EMAIL] value with your Google account email address:

    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --role roles/iap.httpsResourceAccessor \
        --member user:[EMAIL]

This configuration change can take a few minutes to propagate. When done, the app is protected by IAP, and only the user with the email address provided above can access it. You can add more users by using the same process as in the last step above.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Deleting the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

Deleting individual resources

Run the following commands to delete individual resources, instead of deleting the whole project:

export PROJECT_ID=$(gcloud info --format='value(config.project)')

# Delete the load balancer resources
gcloud compute forwarding-rules \
    delete dashboard-demo-gfr --global --quiet
gcloud compute target-https-proxies \
    delete dashboard-demo-https-proxy --quiet
gcloud compute ssl-certificates \
    delete dashboard-demo-ssl-cert --quiet
gcloud compute url-maps \
    delete dashboard-demo-urlmap --quiet
gcloud compute backend-services \
    delete dashboard-demo-service --global --quiet
gcloud compute health-checks delete \
    dashboard-demo-basic-check --quiet
gcloud compute firewall-rules delete \
    gke-dashboard-demo-lb7-fw --quiet

# Delete the Kubernetes Engine cluster
gcloud container clusters delete dashboard-demo-cluster --quiet

# Delete the service account
export SERVICE_ACCOUNT_NAME="dashboard-demo"
gcloud iam service-accounts delete \
    "${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" --quiet

# Delete the backend bucket
gcloud compute backend-buckets delete dashboard-demo-backend-bucket --quiet

# Delete the Cloud Storage bucket
export BUCKET="${PROJECT_ID}-dashboard-demo"
gsutil -m rm -r gs://$BUCKET

# Delete the static IP address
gcloud compute addresses delete dashboard-demo-static-ip --global --quiet

# Delete the Docker repository
gcloud artifacts repositories delete dashboard-demo \
    --location=us-central1 --quiet

Last, delete the OAuth client ID:

  1. Open the Credentials page.


  2. Click the "Dashboard demo" entry.

  3. Click DELETE, then DELETE again to confirm.

What's next

  • Explore reference architectures, diagrams, tutorials, and best practices about Google Cloud. Take a look at our Cloud Architecture Center.