
Locally connected microcontrollers and real-time analytics (part 2 of 2)

Author(s): @markku, @varundhussa | Published: 2019-03-20

Varun Dhussa and Markku Lepisto | Solutions Architects | Google

Contributed by Google employees.

This two-part tutorial demonstrates how to control an Arduino microcontroller with a Raspberry Pi, connect the devices to IoT Core, post sensor data from the devices, and analyze the data in real time. Part 1 of the tutorial showed how to build a hybrid device that combines the strengths of a Linux-based microprocessor, which provides internet connectivity and a TLS stack, with a constrained microcontroller for analog I/O.

Part 2 objectives

  • Process sensor data from Pub/Sub using Dataflow.
  • Store processed sensor data in BigQuery.
  • Create a report dashboard using Google Data Studio.
  • Create a notebook on Datalab.

[Architecture diagram: devices → IoT Core → Pub/Sub → Dataflow → BigQuery → Data Studio and Datalab]

Before you begin

This tutorial assumes that you already have a Google Cloud account set up and have Part 1 of the tutorial working.


This tutorial uses billable components of Google Cloud, including the following:

  • IoT Core
  • Pub/Sub
  • Dataflow
  • BigQuery
  • Datalab

This tutorial should not generate any usage that would not be covered by the free tier, but you can use the Pricing Calculator to generate a cost estimate based on your projected production usage.

Enable Dataflow for your project

Perform all of the steps in the "Before you begin" section of the Dataflow Quickstart—through creating a Cloud Storage bucket—on your local development environment (e.g., laptop).

Enable BigQuery for your project

Perform all of the steps in the "Before you begin" section of the BigQuery Quickstart.

Install environment dependencies and the Cloud SDK

  1. Clone the source repository:

    $ git clone https://github.com/GoogleCloudPlatform/community.git
  2. Change to the directory for this tutorial:

    $ cd community/tutorials/ardu-pi-serial-part-2
  3. Create and activate the virtual environment:

    $ virtualenv my_virtual_env
    $ . ./my_virtual_env/bin/activate
    $ pip install -r beam-requirements.txt
  4. Follow the steps in this guide to install the Cloud SDK.

Create a BigQuery dataset

BigQuery is Google's fully managed, serverless, highly scalable enterprise data warehouse.

A BigQuery dataset contains tables and views, stored either in a single region or in a multi-region geography. Follow the instructions to create a dataset in your project. Note that the dataset location can only be set at creation time; see the BigQuery documentation on dataset locations for details.
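As an alternative to the web UI, the dataset can also be created from the command line with the `bq` tool (the project ID, dataset name, and location below are placeholders; adjust them to your project):

```shell
# Create a dataset in a chosen location.
# The location cannot be changed after creation.
$ bq --location=US mk --dataset [my_project_id]:[my_dataset]
```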

Start the Dataflow job

Dataflow is a fully managed service for transforming and enriching data in stream (real-time) and batch (historical) modes with equal reliability and expressiveness using the Apache Beam SDK.

Select your preferred Dataflow service region.

Run the command below to start the Apache Beam pipeline on the Dataflow runner.

$ python beam-solarwind.py --project [project_name] \
--topic [pub_sub_topic_name (e.g., projects/my-project/topics/my-topic)] \
--temp_location gs://[cloud_storage_bucket]/tmp \
--setup_file ./setup.py \
--region [your_preferred_region] \
--runner DataflowRunner \
--output "[bigquery_table_dataset].[table_name]" \
--output_avg "[bigquery_average_table_dataset].[table_avg]" 

Go to the Dataflow interface in the GCP Console and select your newly created Dataflow job to see your pipeline.

The following diagram shows an example Dataflow pipeline:

[Screenshot: Dataflow job graph]

The first part of the Dataflow job sets up the pipeline options from the command-line parameters shown above. Streaming mode is enabled, and the save_main_session flag is set so that modules loaded in the main session are available to the workers. After this, the Beam pipeline object is created.

args, pipeline_args = parser.parse_known_args(argv)
options = PipelineOptions(pipeline_args)
options.view_as(SetupOptions).save_main_session = True
options.view_as(StandardOptions).streaming = True
p = beam.Pipeline(options=options)

The first two steps of the Dataflow pipeline read incoming events from Pub/Sub and then parse the JSON text:

records = (p | 'Read from PubSub' >> beam.io.ReadFromPubSub(
    topic=args.topic) | 'Parse JSON to Dict' >> beam.Map(
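The parsed messages are plain JSON dictionaries. The standalone sketch below illustrates the parse step outside the pipeline; the field names follow the tutorial, the values are illustrative, and `json.loads` stands in for the pipeline's parse function:

```python
import json

# Example Pub/Sub message payload, shaped like the sensor records
# published by the device in Part 1 (values are illustrative).
payload = b'{"clientid": "my_client_name", "timestamp": 1553040000, "temperature": 21.5, "pressure": 1013.2}'

# Produces the same kind of dict that the 'Parse JSON to Dict' step
# hands to the rest of the pipeline.
record = json.loads(payload)
print(record["clientid"], record["temperature"])
```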

There are two branches at the next step. The one on the right in the figure above writes the incoming stream of events to the raw BigQuery table. The table is created if it does not exist.

# Write to the raw table
records | 'Write to BigQuery' >> beam.io.WriteToBigQuery(

The one on the left aggregates the events and writes them to the BigQuery average table.

  1. Use the timestamp in the event object and emit it with the object. This timestamp is then used to assign each event to sliding windows of 300 seconds that start every 60 seconds.

    records | 'Add timestamp' >> beam.ParDo(AddTimestampToDict()) |
         'Window' >> beam.WindowInto(beam.window.SlidingWindows(
             300, 60, offset=0))
  2. At the next stage, the record is emitted as a key-value tuple, in which the clientid is the key and the object is the value.

    'Dict to KeyValue' >> beam.ParDo(AddKeyToDict())
  3. The elements are grouped by the clientid key, and the averages of all the metrics (temperature, pressure, etc.) are calculated.

    'Group by Key' >> beam.GroupByKey() |
    'Average' >> beam.ParDo(CountAverages())
  4. The calculated average values are written to the BigQuery average table. The table is created if it does not exist.

    'Write Avg to BigQuery' >> beam.io.WriteToBigQuery(
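The windowed aggregation above can be approximated in plain Python. The sketch below mimics a 300-second window advancing every 60 seconds and averages one metric per clientid; it is an illustration of the windowing logic, not the Beam code itself (field names follow the tutorial):

```python
from collections import defaultdict

def window_averages(events, size=300, period=60):
    """Group timestamped events into sliding windows and average the
    'temperature' metric per clientid, mimicking the pipeline's
    SlidingWindows(300, 60) + GroupByKey + average steps."""
    windows = defaultdict(list)  # (window_start, clientid) -> temperatures
    for e in events:
        ts = e["timestamp"]
        # An event at time ts falls into every window whose start is
        # aligned to the period and lies in (ts - size, ts].
        start = (ts // period) * period
        while start > ts - size:
            windows[(start, e["clientid"])].append(e["temperature"])
            start -= period
    return {key: sum(vals) / len(vals)
            for key, vals in sorted(windows.items())}

events = [
    {"clientid": "dev1", "timestamp": 0, "temperature": 20.0},
    {"clientid": "dev1", "timestamp": 60, "temperature": 22.0},
]
print(window_averages(events))
```

The window whose start time is 0 covers both events (average 21.0), while the window starting at 60 covers only the second one.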

View results in BigQuery

  1. Ensure that the client from Part 1 is running and posting data to Pub/Sub through IoT Core.
  2. Go to the BigQuery UI in the GCP Console.
  3. In the BigQuery menu, select your project my_project_id.
  4. Select the dataset my_dataset.
  5. Select the table my_table:
    1. View the table schema.
    2. See table details.
  6. Run the following queries to see the latest data:

    1. Select the latest 20 records from the raw table:

      SELECT * FROM [my_dataset].[my_table]
      ORDER BY timestamp DESC
      LIMIT 20;
    2. The average table adds a single row for each time window. Run the query below to select the latest 20 records.

      SELECT * FROM [my_dataset].[my_avg_table]
      ORDER BY timestamp DESC
      LIMIT 20;

BigQuery table schema: [Screenshot: table schema]

BigQuery table preview: [Screenshot: table data]

Create a Data Studio report

Data Studio is a managed tool for creating and sharing dashboards and reports.

  1. Go to the Data Studio interface.
  2. Click the + button to create a new blank report.
  3. Add a new Data Source:


    1. Click the + Create New Data Source button.
    2. Select the BigQuery by Google connector.
    3. Select the BigQuery project my_project_id, dataset my_dataset, and table my_table, and then click Connect.
    4. All the schema fields (clientid, temperature, pressure, etc.) are selected automatically.
    5. Click Add to report and confirm by clicking the button in the popup.
  4. Create a new chart:


    1. Click Add a Chart in the menu bar.
    2. Select a line chart.
    3. Select Date and Time range dimensions as the timestamp column.
    4. Select the clientid field as the breakdown dimension.
    5. Select a metric (e.g., temperature) and aggregation (e.g., AVG).
    6. Add a Text Box and label the chart.
    7. Repeat the steps above for additional metrics.

Data Studio report: [Screenshot: report]

Create a Datalab notebook

Datalab is an interactive tool for data exploration that is built on Jupyter. The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.

  1. Go to the Datalab quickstart and perform all of the steps in the "Before you begin" section.
  2. Go to the notebooks page.
  3. Click the Upload button to add community/tutorials/ardu-pi-serial-part-2/solarwindreport.ipynb to Datalab.
  4. Click the notebook to open and edit it.
    1. Set the project ID (e.g., my_project_id).
    2. Set the dataset name (e.g., my_dataset).
    3. Set the raw table name (e.g., my_table).
    4. Set the average table name (e.g., my_avg_table).
    5. Set the location (e.g., my_location). Important: The location must match that of the datasets referenced in the query.
    6. Set the client id (e.g., my_client_name).
  5. From the Kernel menu in the menu bar, select python3.
  6. Click Run in the menu bar to execute the notebook.
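The queries the notebook runs live in solarwindreport.ipynb; as a standalone illustration, the query for the latest windowed averages of a single device can be parameterized in plain Python (the helper name below is hypothetical, and the dataset, table, and clientid values are the placeholders used throughout this tutorial):

```python
def latest_avg_query(dataset, table, client_id, limit=20):
    """Build a BigQuery Standard SQL query for the latest windowed
    averages of one device. All names are placeholders."""
    return (
        f"SELECT * FROM `{dataset}.{table}` "
        f"WHERE clientid = '{client_id}' "
        f"ORDER BY timestamp DESC LIMIT {limit}"
    )

print(latest_avg_query("my_dataset", "my_avg_table", "my_client_name"))
```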

Clean up

  1. Clean up the Datalab environment.
  2. Delete the Data Studio report:
    1. Go to the Data Studio interface.
    2. In the menu section, click the three-dot menu next to the report name.
    3. Select Remove.
  3. Stop the Dataflow job.
  4. Delete the Cloud Storage bucket:

    $ gsutil rm -r gs://[cloud_storage_bucket]
  5. Delete the BigQuery dataset:

    $ bq rm -r [my_dataset]
    rm: remove dataset '[my_project_id]:[my_dataset]'? (y/N) y
  6. To delete a project, do the following:

    1. In the GCP Console, go to the Projects page.
    2. In the project list, select the project you want to delete and click Delete project.
    3. In the dialog, type the project ID, and then click Shut down to delete the project.

    [Screenshot: deleting the project]


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Site Policies. Java is a registered trademark of Oracle and/or its affiliates.