Overview of Internet of Things

Internet of Things (IoT) is a sprawling set of technologies and use cases that has no clear, single definition. One workable view frames IoT as the use of network-connected devices, embedded in the physical environment, to improve some existing process or to enable a new scenario not previously possible.

These devices, or things, connect to the network to provide information they gather from the environment through sensors, or to allow other systems to reach out and act on the world through actuators. They could be connected versions of common objects you might already be familiar with, or new and purpose-built devices for functions not yet realized. They could be devices that you own personally and have on your person or in your home, or they could be embedded in factory equipment, or part of the fabric of the city you live in. Each of them is able to convert valuable information from the real world into digital data that provides increased visibility into how your users interact with your products, services, or applications.

The specific use cases and opportunities across different industries are numerous, and in many ways the world of IoT is just getting started. What emerges from these scenarios is a set of common challenges and patterns. IoT projects have additional dimensions that increase their complexity when compared to other cloud-centric technology applications, including:

  • Diverse hardware
  • Diverse operating systems and software on the devices
  • Different network gateway requirements

This guide explains the elements you can combine with Google Cloud Platform to build a robust, maintainable, end-to-end IoT solution.

Overview of the top-level components

Here we divide the system into three basic components: the device, the gateway, and the cloud.

[Figure: the three top-level components: device, gateway, and cloud]

A device includes hardware and software that directly interact with the world. Devices connect to a network to communicate with each other, or to centralized applications. Devices might be directly or indirectly connected to the Internet.

A gateway enables devices that are not directly connected to the Internet to reach cloud services. Although the term gateway has a specific function in networking, it is also used to describe a class of device that processes data on behalf of a group or cluster of devices. Finally, the data from each device is sent to the cloud component, Cloud Platform, where it is processed and combined with data from other devices, and potentially with other business-transactional data.

Types of information

Each device can provide or consume various types of information. Each form of information might best be handled by a different backend system, and each system should be specialized around the data rate, volume, and preferred API. This section lists and describes common categories of information found in IoT scenarios.

Device metadata

Metadata contains information about a device. Most metadata is immutable or rarely changes. Examples of metadata fields include:

  • Identifier (ID) - An identifier that uniquely identifies a device. The device ID should never change for the lifespan of a deployed device.
  • Class or type
  • Model
  • Revision
  • Date manufactured
  • Hardware serial number

State information

State information describes the current status of the device, not of the environment. This information can be both read and written. It is updated, but usually not frequently.

Telemetry

Data collected by the device is called telemetry. This is the eyes-and-ears data that IoT devices provide to applications. Telemetry is read-only data about the environment, usually collected through sensors.

Each source of telemetry results in a channel. Telemetry data might be preserved as a stateful variable on the device or in the cloud.

Although each device might send only a single data point every minute, when you multiply that data by a large number of devices, you quickly need to apply big data strategies and patterns.
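To make the scaling argument concrete, the following sketch multiplies out telemetry volume for a fleet. The fleet size and payload size are illustrative assumptions, not figures from this guide.

```python
# Back-of-the-envelope telemetry volume for a hypothetical fleet.
MINUTES_PER_DAY = 24 * 60


def daily_messages(devices: int, messages_per_minute: float) -> float:
    """Total telemetry messages the fleet produces per day."""
    return devices * messages_per_minute * MINUTES_PER_DAY


def daily_bytes(devices: int, messages_per_minute: float, bytes_per_message: int) -> float:
    """Raw payload volume per day, ignoring protocol and transport overhead."""
    return daily_messages(devices, messages_per_minute) * bytes_per_message


if __name__ == "__main__":
    fleet = 100_000  # hypothetical number of devices
    rate = 1         # one data point per minute, as in the text
    size = 200       # hypothetical payload size in bytes
    print(f"{daily_messages(fleet, rate):,.0f} messages/day")
    print(f"{daily_bytes(fleet, rate, size) / 1e9:.1f} GB/day")
```

Under these assumptions a single reading per minute adds up to 144 million messages, or about 28.8 GB of raw payload, every day.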

Commands

Commands are actions performed by a device. Commands often have traits that constrain the choices available in your implementation. These traits include:

  • Commands are not easily represented as state data.

  • Commands are often not idempotent, which means that a duplicate delivery of the same message can change the outcome: applying "Increase the rate by ten percent" twice raises the rate by 21 percent, not 10. As in messaging systems, the implementation of a command function determines the delivery semantics, such as "at least once" or "exactly once". The command mechanism can include a return value, or might rely on the confirmation being made through a separate return message or by reflecting the expected change in the state data.

  • Commands might be of limited temporal relevance, so they should include a time-to-live (TTL) or other expiration value.

Examples of commands include:

  • Spin 360 degrees to the right.
  • Run self cleaning cycle.
  • Increase the rate by ten percent.
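The traits above can be combined in a small device-side handler. The following is a minimal sketch under assumed conventions. Each command carries a hypothetical unique `command_id` that the device uses to discard duplicate deliveries. This makes at-least-once delivery safe for a non-idempotent action such as increasing the rate by ten percent. Each command also carries a hypothetical `expires_at` timestamp that implements the TTL.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Command:
    command_id: str    # hypothetical unique ID assigned by the sender
    action: str
    expires_at: float  # Unix time after which the command is stale (its TTL)


@dataclass
class Device:
    rate: float = 100.0
    # IDs of commands already applied. A real device would bound this set,
    # for example by forgetting IDs whose commands have already expired.
    seen_ids: set = field(default_factory=set)

    def handle(self, cmd: Command, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        if now > cmd.expires_at:
            return "expired"    # no longer relevant; drop without acting
        if cmd.command_id in self.seen_ids:
            return "duplicate"  # redelivered; applying it again would compound
        self.seen_ids.add(cmd.command_id)
        if cmd.action == "increase_rate_10pct":
            self.rate *= 1.10   # not idempotent: each application compounds
            return "applied"
        return "unsupported"
```

Without the `seen_ids` check, a redelivered command would raise the rate from 100 to 121 instead of 110.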

Operational information

Operational information is data that's most relevant to the operation of the device as opposed to the business application. This might include things such as CPU operating temperature and battery state. This kind of data might not have long-term analytical value, but it has short-term value to help maintain the operating state, such as responding to breakages and correcting performance degradation of software after updates.

Operational information can be transmitted as telemetry or state data.

Devices

It's not always clear what constitutes a device. Many physical things are modular, which means it can be hard to decide whether the whole machine is the device, or each module is a discrete device. There's no single, right answer to this question. As you design your IoT project, you'll need to think about the various levels of abstraction in your design and make decisions about how to represent the physical things and their relationships to each other. The specific requirements of your application will help you understand whether something that generates information should be treated as a device, and therefore deserves its own ID, or is simply a channel or state detail of another device.

As an example, consider a project that has the goal of monitoring the temperature of rooms in a hotel. In each room there might be three sensors: one at the floor by the door, one on the ceiling, and one next to the bed. You can model this setup by representing each sensor as a device:

{"deviceID": "dh28dslkja", "location": "floor", "room": 128, "temp": 22}
{"deviceID": "8d3kiuhs8a", "location": "ceiling", "room": 128, "temp": 24}
{"deviceID": "kd8s8hh3o", "location": "bedside", "room": 128, "temp": 23}

You could also model the entire room as a device. While you usually wouldn't consider a room to be a device, in IoT the device abstraction is really about what you manage and record from as a unit; it isn't always limited to a single gizmo you can hold in your hand. Viewed that way, you could model the hotel room as a device that contains three sensors:

{"deviceID": "dh28dslkja", "room": 128, "temp_floor": 22, "temp_ceiling": 24, "temp_bedside": 23, "average_temp": 23}

Depending on the goals, one of these two data representations might be more correct than the other. Note the average temperature field in the second example. This might be what the hotel is looking for. Is metadata from each sensor most valuable on its own, or do the separate pieces of metadata make more sense applied to the room as a whole? What if the room were a suite and the three locations were the bathroom, lounge, and bedroom? These are the sorts of questions you'd need to ask yourself when deciding how to model the data. The domain model of the connected application defines the exact boundary of what constitutes the device.
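Suppose the sensors report individually but the hotel cares about rooms. In that case, you can derive the room-level model from the per-sensor records, for example in a gateway or a cloud pipeline. The following is a minimal, illustrative sketch that uses the sample readings above. It is not a prescribed method. The room record it produces omits a `deviceID`, which you would assign to the room itself.

```python
from collections import defaultdict
from statistics import mean

# Per-sensor records, matching the first model above.
sensor_readings = [
    {"deviceID": "dh28dslkja", "location": "floor", "room": 128, "temp": 22},
    {"deviceID": "8d3kiuhs8a", "location": "ceiling", "room": 128, "temp": 24},
    {"deviceID": "kd8s8hh3o", "location": "bedside", "room": 128, "temp": 23},
]


def to_room_records(readings):
    """Collapse per-sensor readings into one record per room."""
    rooms = defaultdict(dict)
    for r in readings:
        room = rooms[r["room"]]
        room["room"] = r["room"]
        # Each sensor location becomes a channel of the room "device".
        room[f"temp_{r['location']}"] = r["temp"]
    for room in rooms.values():
        temps = [v for k, v in room.items() if k.startswith("temp_")]
        room["average_temp"] = mean(temps)
    return list(rooms.values())
```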

Device hardware

General considerations when choosing hardware

When choosing hardware, consider the following factors, which are affected by how the hardware is deployed:

  • Cost. Given the value of the data provided, think about what cost can be supported for each device.
  • I/O roles. The device might be primarily a sensor, an actuator, or some combination of the two roles.
  • Power budget. The device might have access to electricity, or power might be scarce. Think about whether the device will require battery or solar power.
  • Networking environment. Consider whether the device can connect directly to the Internet with a TCP/IP-routable address. Some types of connections, such as cellular, can be expensive with high traffic. Think about the reliability of the network, and the impact of that reliability on latency and throughput. If the connection is wireless, consider the range that the transmission power achieves and the added energy cost.

Functional inputs and outputs

The devices used to interact with the physical world contain components, or are connected to peripherals, that enable sensor input or actuator output. The specific hardware you choose for these hardware I/O components should be based on the functional requirements. For example, the sensitivity or complexity of the motion you need to detect will determine what kind of accelerometer you choose, or whether you need a gyro instead. If you are doing gas detection, the type of gases that the sensor can accurately detect matters. When using a device to produce output, you must consider requirements such as how loud a buzzer needs to sound, how fast a motor needs to turn, or how many amps a relay needs to carry.

Beyond the performance requirements imposed by the environment, your choice of I/O components or peripherals might also depend on the type of information they are associated with. For example, you can set a stepper motor to a specific position, which might be represented in device state data. A microphone, by contrast, might sample audio continuously, and that data is best transmitted as telemetry. These components are connected to the logic systems of the device through a hardware interface.

Device platforms

There is an incredible amount of diversity in the specific hardware available to you for building IoT applications. This diversity starts with the options for hardware platforms. Common examples of platforms include single-board computers such as the BeagleBone and Raspberry Pi, as well as microcontroller platforms such as the Arduino series, boards from Particle, and the Adafruit Feather.

Each of these platforms lets you connect multiple types of sensor and actuator modules through a hardware interface.

These platforms interface with the modules using a layered approach similar to those used in general-purpose computing. If you think about the common, everyday computer mouse, you can consider the layers of peripheral, interface, driver and application. On a typical operating system, such as Linux or Windows, the hardware input is interpreted by a driver, which in turn relies on OS services, and might be part of the kernel. For simplicity, the following diagram omits the operating system.

[Figure: layers of peripheral, hardware interface, driver, and application]

Hardware Interfaces

Most hardware interfaces are serial interfaces. Serial interfaces generally use multiple wires to control the flow and timing of binary information along the primary data wire. Each type of hardware interface defines a method of communicating between a peripheral and the central processor.

IoT hardware platforms use a number of common interfaces. Sensor and actuator modules can support one or more of these interfaces:

  • USB. Universal Serial Bus is in common use for a wide array of plug-and-play capable devices.
  • GPIO. General-purpose input/output pins are connected directly to the processor. As their name implies, these pins are provided by the manufacturer to enable custom usage scenarios that the manufacturer didn't design for. GPIO pins can be designed to carry digital or analog signals, and digital pins have only two states: HIGH or LOW.

    Digital GPIO can support Pulse Width Modulation (PWM). PWM lets you very quickly switch a power source on and off, with each "on" phase being a pulse of a particular duration, or width. The effect in the device can be a lower or higher power level. For example, you can use PWM to change the brightness of an LED; the wider the "on" pulses, the brighter the LED glows.

    Analog pins might have access to an onboard analog-to-digital converter (ADC) circuit. An ADC periodically samples a continuous, analog waveform, such as an analog audio signal. It gives each sample a digital value proportional to the sample's voltage relative to a reference voltage, typically the system voltage.

    When you read the value of a digital I/O pin in code, the value must be either HIGH or LOW. An analog input pin, by contrast, could have any value in a range at any given moment. The range depends on the resolution of the ADC. For example, an 8-bit ADC can produce digital values from 0 to 255, while a 10-bit ADC can yield a wider range of values, from 0 to 1023. More values mean higher resolution and thus a more faithful digital representation of any given analog signal.

    The ADC sampling rate determines the frequency range that an ADC can reproduce. A higher sampling rate results in a higher maximum frequency in the digital data. For example, an audio signal sampled at 44,100 Hz produces a digital audio file with a frequency response up to 22.05 kHz, half the sampling rate, ignoring typical filtering and other processing. The bit precision dictates the resolution of the amplitude of the signal.

  • I2C. Inter-Integrated Circuit serial bus uses a protocol that enables multiple modules to be assigned a discrete address on the bus. I2C is sometimes pronounced "I two C", "I-I-C", or "I squared C".

  • SPI. Serial Peripheral Interface Bus devices employ a master-slave architecture, with a single master and full-duplex communication. SPI specifies four logic signals:

    • SCLK: Serial Clock, which is output from the master
    • MOSI: Master Output Slave Input, which is output from the master
    • MISO: Master Input Slave Output, which is output from a slave
    • SS: Slave Select, which is an active-low signal output from the master
  • UART. Universal Asynchronous Receiver/Transmitter devices translate data between the parallel form the processor works with and the serial form sent over the wire. Because UART is asynchronous, the two ends share no clock signal and must agree on a baud rate in advance.
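The PWM and ADC arithmetic described under GPIO can be captured in a few helper functions. These are illustrative helpers, not part of any hardware library. One assumption matters: ADC conventions vary between datasheets. This sketch maps the largest code to the reference voltage, while some parts divide by 2^bits instead.

```python
def pwm_on_time_us(frequency_hz: float, duty_cycle: float) -> float:
    """Width of each "on" pulse, in microseconds, for a PWM signal.

    duty_cycle is the fraction of each period spent on (0.0 to 1.0);
    a wider pulse delivers more average power, e.g. a brighter LED.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty cycle must be between 0 and 1")
    period_us = 1_000_000 / frequency_hz
    return period_us * duty_cycle


def adc_max_code(bits: int) -> int:
    """Largest value an ADC with the given resolution can report."""
    return 2 ** bits - 1


def adc_to_volts(code: int, bits: int, v_ref: float) -> float:
    """Convert a raw ADC reading to volts (assumes full scale maps to v_ref)."""
    return code / adc_max_code(bits) * v_ref


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent: half the rate."""
    return sample_rate_hz / 2
```

For example, at 1 kHz a 25 percent duty cycle produces 250 µs pulses. A 10-bit ADC referenced to an assumed 3.3 V reports 1023 at full scale, and 44,100 Hz sampling gives a 22,050 Hz ceiling.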

Hardware abstraction in software

An operating system abstracts common computing resources such as memory and file I/O. The OS also provides very low-level support for the different hardware interfaces. Generally these abstractions are not easy to use directly, and frequently the OS does not provide abstractions for the wide range of sensor and actuator modules you might encounter in building IoT solutions.

You can take advantage of libraries that abstract hardware interfaces across platforms. These libraries enable you to work with a device, such as a motion detector, in a more straightforward way. Using a library lets you focus on collecting the information the module provides to your application instead of on the low-level details of working directly with hardware.

Some libraries provide abstractions that represent peripherals in the form of lightweight drivers on top of the hardware interfaces. Examples of these libraries include the Johnny-Five JavaScript framework; MRAA, which supports multiple languages; the EMBD Go library; Arduino Wiring; and Firmata.

Computing environment

The computing environment of your platform executes the software. Based on the hardware constraints of power and cost, the capabilities of the processor will vary. Some computing environments consist of a full system on a chip (SoC), which can support an embedded Linux operating system. Microcontroller-based devices might be more constrained, and your application code could run directly on the processor without the support of an operating system.

These computing environments are the bridge between the logic of your application code and the physical hardware of the platform. The software they run might be entirely loaded during boot up from read-only memory (ROM). Alternatively, the environment might result from a staged boot process. This process loads a small program called a bootloader from ROM, which then loads a full operating system from onboard flash or a connected SD card.

On-device processing

After data is collected from a sensor, the device can provide data processing functionality before sending the data to the cloud. Multiple devices might handle the data before it gets to the cloud, and each might perform some amount of processing.

Processing includes things like:

  • Converting data to another format
  • Packaging data in a way that's secure and combines the data into a practical batch
  • Validating data to ensure it meets a set of rules
  • Sorting data to create a preferred sequence
  • Enhancing data to decorate the core value with additional related information
  • Summarizing data to reduce the volume and eliminate unneeded or unwanted detail
  • Combining data into aggregate values

On-device analysis can combine multiple processing tasks to provide an intermediate, synthesized interpretation that enables more information to be transmitted in a smaller data footprint.
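Several of the processing tasks listed above can be chained on the device. The following minimal sketch applies validation, sorting, enhancement, summarization, and conversion to one batch of readings. The sensor range and the output fields are hypothetical assumptions. The sketch only serializes the batch compactly. It assumes that security is provided separately, for example by TLS on the connection.

```python
import json
from statistics import mean

VALID_RANGE = (-40.0, 85.0)  # hypothetical sensor operating range, in °C


def process_batch(device_id, readings):
    """Turn a batch of (timestamp, value) readings into one compact summary."""
    # Validate: keep only readings inside the sensor's plausible range.
    valid = [(ts, v) for ts, v in readings if VALID_RANGE[0] <= v <= VALID_RANGE[1]]
    if not valid:
        return None
    # Sort: order by timestamp so the batch has a predictable sequence.
    valid.sort(key=lambda r: r[0])
    values = [v for _, v in valid]
    # Summarize and aggregate: send a few statistics instead of every sample.
    summary = {
        "deviceID": device_id,  # enhance: decorate the values with identity
        "start": valid[0][0],
        "end": valid[-1][0],
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": round(mean(values), 2),
        "rejected": len(readings) - len(valid),
    }
    # Convert: serialize compactly for transmission.
    return json.dumps(summary, separators=(",", ":"))
```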

Device Management

Device management is similar to other IT asset management: the main concerns are provisioning, operating, and updating the devices. These concerns apply to all devices, including gateways.

Provisioning

Provisioning is the process of setting up a new device and making it ready for use. Provisioning includes:

  • Bootstrapping with basic device information. At a minimum, a device needs an ID and basic metadata.
  • Credentials and authentication required for secure communications. For example, the device can be provided a token or key that can be used for ongoing communications. Such credentials might have an expiration time.
  • Authorizing the device. Authorization establishes the permissions of the device to interact with the application or other services, relying on the authentication credentials above. It binds the device ID and its credentials to the specific resources the device is allowed to use.
  • Setting up the network connection. A device needs a network connection to be able to communicate with other services and to transmit data.
  • Registering the device. Applications need to know which devices are available. A device registry keeps track of which devices are in use, manages the cloud side of authentication and associates devices with specific data and resources (such as telemetry topics and state storage).
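The registration, authentication, and authorization steps can be modeled with a toy registry. This is a hypothetical, in-memory sketch, not the API of any Cloud Platform service. For simplicity it issues a shared secret and stores only that secret's hash. Managed services such as IoT Core instead authenticate devices with per-device public keys, and this sketch omits credential expiration and network setup.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


@dataclass
class DeviceRecord:
    device_id: str
    metadata: dict
    credential_hash: str
    allowed_topics: set = field(default_factory=set)


class DeviceRegistry:
    """Toy registry covering registration, authentication, and authorization."""

    def __init__(self):
        self._devices = {}

    def provision(self, device_id, metadata, topics):
        """Register a device and return a newly issued secret for it."""
        if device_id in self._devices:
            raise ValueError(f"device {device_id!r} already registered")
        secret = secrets.token_hex(16)
        self._devices[device_id] = DeviceRecord(
            device_id=device_id,
            metadata=dict(metadata),
            # Store only a hash of the secret, never the secret itself.
            credential_hash=hashlib.sha256(secret.encode()).hexdigest(),
            allowed_topics=set(topics),
        )
        return secret

    def authenticate(self, device_id, secret):
        record = self._devices.get(device_id)
        if record is None:
            return False
        candidate = hashlib.sha256(secret.encode()).hexdigest()
        # Constant-time comparison avoids leaking information through timing.
        return secrets.compare_digest(candidate, record.credential_hash)

    def authorize(self, device_id, secret, topic):
        """Allow access only if the device authenticates and holds that topic."""
        return (self.authenticate(device_id, secret)
                and topic in self._devices[device_id].allowed_topics)
```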

Operations

The daily operation of an IoT system requires that you collect the right information about what's going on. Similar to any IT-hardware deployment, the logging of various events and the monitoring of key status metrics through dashboards and alert mechanisms can help you keep things running smoothly. Cloud Platform provides features that you can use to your advantage for daily operations:

  • Stackdriver Logging collects and stores logs. Key device lifecycle events are logged for auditing. A subset of telemetry events can be relayed into Stackdriver Logging for analysis and reporting. Using Stackdriver Logging can save you a lot of time and effort compared to building a custom logging solution.

Over-the-air updates

The sheer scale of a typical IoT deployment means that updating individual devices on site is not practical. Because devices already have some sort of network connection by design, updating devices can be made simpler by pushing updates across the network. In cellular phone parlance, this is an over-the-air (OTA) update, and the same idea applies in IoT. Some options include:

  • Android Things. If you use Android-Things-based hardware, OTA updating is built in.
  • Setting up your own Debian package repository (APT) on Cloud Platform.
  • Resin.io. Based on the Yocto Project, resin.io lets you use familiar tools such as Docker and git to push container image updates.
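Whichever option you choose, the device-side logic usually follows the same pattern. The device compares its installed version against an advertised one, then verifies the downloaded image before installing it. The following is a minimal sketch that assumes a hypothetical manifest with `version` and `sha256` fields. A production updater would also check a cryptographic signature and keep a rollback image.

```python
import hashlib


def parse_version(v: str) -> tuple:
    """'1.10.2' -> (1, 10, 2), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in v.split("."))


def should_update(installed: str, manifest: dict) -> bool:
    """True if the manifest advertises a newer version than the installed one."""
    return parse_version(manifest["version"]) > parse_version(installed)


def verify_image(image: bytes, manifest: dict) -> bool:
    """Refuse an image whose SHA-256 digest doesn't match the manifest."""
    return hashlib.sha256(image).hexdigest() == manifest["sha256"]
```

Comparing parsed tuples matters. As plain strings, "1.9.3" sorts after "1.10.0", which would block a legitimate update.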

Gateway

A gateway manages traffic between networks that use different protocols. A gateway is responsible for protocol translation and other interoperability tasks. An IoT gateway device is sometimes employed to provide the connection and translation between devices and the cloud. Because some devices don't contain the network stack required for Internet connectivity, a gateway device acts as a proxy, receiving data from devices and packaging it for transmission over TCP/IP.

A dedicated gateway device might be a requirement if devices in the deployment:

  • Don’t have routable connectivity to the Internet, for example, Bluetooth devices.
  • Don’t have processing capability needed for transport-layer security (TLS), and as such can't communicate with Google APIs.
  • Don't have the electrical power to perform required network transmission.

A gateway device might be used even when the participating devices are capable of communicating without one. In this scenario, the gateway adds value because it provides processing of the data across multiple devices before it is sent to the cloud. In that case, the direct inputs would be other devices, not individual sensors. The following tasks could be delegated to a gateway device:

  • Condensing data to maximize the amount that can be sent to the cloud over a single link
  • Storing data in a local database, and then forwarding it when the connection to the cloud is intermittent
  • Providing a real-time clock, with a battery backup, used to provide a consistent timestamp for devices that can't manage timestamps well or keep them well synchronized
  • Performing IPv6 to IPv4 translation
  • Ingesting and uploading other flat-file-based data from the local network that is relevant and associated with the IoT data
  • Acting as a local cache for firmware updates
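The store-and-forward task can be sketched as a small buffer on the gateway. This minimal sketch uses an in-memory `deque` in place of the local database. That is an assumption: a real gateway would persist to disk so that buffered data survives a reboot. The `send` callable is a hypothetical uplink function that returns `True` on success. When the buffer is full, the oldest messages are dropped first, which is one policy among several.

```python
from collections import deque


class StoreAndForward:
    """Buffer messages while the uplink is down; flush them in order when it returns."""

    def __init__(self, send, max_items=10_000):
        self._send = send                       # hypothetical uplink; True on success
        self._queue = deque(maxlen=max_items)   # oldest items drop when full

    def publish(self, message):
        self._queue.append(message)
        self.flush()

    def flush(self):
        while self._queue:
            if not self._send(self._queue[0]):
                return False  # link still down; keep the remaining messages
            self._queue.popleft()
        return True

    def pending(self):
        return len(self._queue)
```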

Cloud Platform

After your IoT project is up and running, many devices will be producing lots of data. You need an efficient, scalable, affordable way to both manage those devices and handle all that information and make it work for you. When it comes to storing, processing, and analyzing data, especially big data, it's hard to beat the cloud.

The following diagram shows the various stages of IoT data management in Cloud Platform:

[Figure: stages of IoT data management in Cloud Platform (https://cloud.google.com/iot-core/images/benefits-diagram.png)]

Device Management

IoT Core Device Manager

Google Cloud IoT Core provides a fully managed service for managing devices. This includes registration, authentication, and authorization inside the Cloud Platform resource hierarchy, as well as device metadata stored in the cloud and the ability to send device configuration from the service to devices.

Ingestion

Ingestion is the process of importing information from devices into Cloud Platform services. Cloud Platform provides different ingestion services, depending on whether the data is telemetry or operational information about the devices and the IoT infrastructure.

IoT Core MQTT

Google Cloud IoT Core provides a secure MQTT (Message Queue Telemetry Transport) broker for devices managed by IoT Core. MQTT is an efficient, binary, industry-standard protocol. It lets constrained devices send real-time telemetry. Through the configuration management feature, devices can also immediately receive messages sent from the cloud. The IoT Core MQTT broker connects directly with Cloud Pub/Sub.
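A device connecting to the IoT Core MQTT broker identifies itself and chooses topics by following naming conventions. The sketch below builds these strings as IoT Core documents them: a full resource path as the client ID, plus per-device topics for telemetry events, state, and configuration. The project, region, registry, and device IDs are placeholders. Check the current IoT Core documentation before relying on these formats. You would pass the strings to an MQTT client library that connects over TLS to `mqtt.googleapis.com` on port 8883, with a JSON Web Token (JWT) as the password.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IotCoreDevice:
    project_id: str
    region: str
    registry_id: str
    device_id: str

    @property
    def client_id(self) -> str:
        # Full resource path of the device, used as the MQTT client ID.
        return (f"projects/{self.project_id}/locations/{self.region}"
                f"/registries/{self.registry_id}/devices/{self.device_id}")

    @property
    def telemetry_topic(self) -> str:
        # Device to cloud; messages are forwarded to Cloud Pub/Sub.
        return f"/devices/{self.device_id}/events"

    @property
    def state_topic(self) -> str:
        # Device to cloud; last known state kept by IoT Core.
        return f"/devices/{self.device_id}/state"

    @property
    def config_topic(self) -> str:
        # Cloud to device; the device subscribes to receive configuration.
        return f"/devices/{self.device_id}/config"
```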

Cloud Pub/Sub

Google Cloud Pub/Sub provides a globally durable message ingestion service. By creating topics for streams or channels, you can enable different components of your application to subscribe to specific streams of data without needing to construct subscriber-specific channels on each device. Cloud Pub/Sub also natively connects to other Cloud Platform services, helping you to connect ingestion, data pipelines, and storage systems.

Cloud Pub/Sub can act like a shock absorber and rate leveller for both incoming data streams and application architecture changes. Many devices have limited ability to store and retry sending telemetry data. Cloud Pub/Sub scales to handle data spikes that can occur when swarms of devices respond to events in the physical world, and buffers these spikes to help isolate them from applications monitoring the data.
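Fan-out and buffering are the two behaviors described above. The following toy, in-memory model illustrates both. It is not the Cloud Pub/Sub client library, and it ignores acknowledgements, redelivery, and persistence. Each subscription keeps its own backlog. A publisher therefore needs no knowledge of its consumers, and each consumer drains a burst at its own pace.

```python
from collections import deque


class Topic:
    """Toy topic: every subscription receives its own copy of each message."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, name):
        # Only messages published after this point reach the new subscription.
        self._subscriptions.setdefault(name, deque())

    def publish(self, message):
        # The publishing device doesn't know or care who is subscribed.
        for backlog in self._subscriptions.values():
            backlog.append(message)

    def pull(self, name, max_messages):
        # Each subscriber drains its own backlog at its own rate, so a burst
        # of publishes is absorbed instead of hitting consumers all at once.
        backlog = self._subscriptions[name]
        batch = []
        while backlog and len(batch) < max_messages:
            batch.append(backlog.popleft())
        return batch
```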

Stackdriver Monitoring and Stackdriver Logging

As noted in previous sections, Stackdriver Monitoring and Stackdriver Logging can provide operational advantages. Operational information is ingested by these services through their provided interfaces.

Pipeline processing tasks

Pipelines manage data after it arrives on Cloud Platform, similar to how parts are managed on a factory line. This includes tasks such as:

  • Transforming data. You can convert the data into another format, for example, converting a captured device signal voltage to a calibrated unit measure of temperature.
  • Aggregating data and computing. By combining data you can add checks, such as averaging data across multiple devices to avoid acting on readings from a single malfunctioning device. Aggregation also helps ensure that you still have actionable data if a single device goes offline. By adding computation to your pipeline, you can apply streaming analytics to data while it is still in the processing pipeline.
  • Enriching data. You can combine the device-generated data with other metadata about the device, or with other datasets, such as weather or traffic data, for use in subsequent analysis.
  • Moving data. You can store the processed data in one or more final storage locations.
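The following sketch shows the first three tasks on a small batch of records; the returned list is what you would then move to storage. It is a plain-Python illustration, not Cloud Dataflow code. The voltage-to-temperature calibration (10 mV per °C with a 0.5 V offset) and the record fields are hypothetical assumptions. For the aggregation step, the sketch swaps in the median instead of the average, because a median is not pulled off course by one spurious device.

```python
from statistics import median


def volts_to_celsius(volts: float) -> float:
    """Transform: hypothetical linear calibration, 10 mV per °C, 0.5 V offset."""
    return (volts - 0.5) * 100.0


def run_pipeline(readings, device_metadata, weather):
    """readings: list of {"deviceID", "volts"} dicts; returns one record per site."""
    by_site = {}
    for r in readings:
        meta = device_metadata.get(r["deviceID"])
        if meta is None:
            continue  # unknown device: drop the reading rather than guess
        by_site.setdefault(meta["site"], []).append(volts_to_celsius(r["volts"]))
    results = []
    for site, temps in sorted(by_site.items()):
        results.append({
            "site": site,
            "temp_c": median(temps),            # aggregate across devices
            "devices_reporting": len(temps),
            "outside_temp_c": weather.get(site),  # enrich with external data
        })
    return results
```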

Cloud Dataflow

Google Cloud Dataflow provides the open Apache Beam programming model as a managed service for processing data in multiple ways, including batch operations, extract-transform-load (ETL) patterns, and continuous, streaming computation. Cloud Dataflow can be particularly useful for managing the high-volume data processing pipelines required for IoT scenarios. Cloud Dataflow is also designed to integrate seamlessly with the other Cloud Platform services you choose for your pipeline.

Data storage

Data from the physical world comes in various shapes and sizes. Cloud Platform offers a range of storage solutions from unstructured blobs of data, such as images or video streams, to structured entity storage of devices or transactions, and high-performance key-value databases for event and telemetry data.

Storing state in IoT Core

Some device state might be directly connected to the hardware. For example, when you check the state of a digital GPIO pin, it reads as HIGH or LOW, based on the physical reading of the voltage on the pin.

Other device state might exist at the application layer. For example, a recording-audio flag might be true or false, depending on whether the application is sampling from the mic or writing to disk. At the hardware level, the mic itself might be left on.

From the software perspective, the application code running on the device maintains the source of truth. It is often valuable, even required, for other software to read the device's last known state. Given that IoT devices can spend some time in low-power sleep mode and might exist on particularly unreliable networks, it's often useful to store some of a device's state in the cloud. That way, state data can be made available even when the devices themselves are temporarily offline.

The last known device state can be reported and stored in IoT Core for retrieval by applications. State information sent over MQTT or HTTP is persisted in IoT Core and is available in the cloud, even if the device has disconnected or gone offline.
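The idea of a last-known-state store can be shown in a few lines. The following is a toy model, not IoT Core's API. It keeps the most recent report for each device, so applications can read state while the device is asleep or offline. It also ignores reports that arrive out of order, which can happen on unreliable networks. The field names are hypothetical.

```python
from typing import Optional


class LastKnownState:
    """Cloud-side cache of each device's most recently reported state."""

    def __init__(self):
        self._states = {}

    def report(self, device_id: str, state: dict, reported_at: float) -> bool:
        current = self._states.get(device_id)
        # Ignore a report older than the one already held (out-of-order delivery).
        if current is not None and reported_at <= current[0]:
            return False
        self._states[device_id] = (reported_at, dict(state))
        return True

    def get(self, device_id: str) -> Optional[dict]:
        """Last known state, available even if the device is currently offline."""
        entry = self._states.get(device_id)
        return None if entry is None else {"reported_at": entry[0], **entry[1]}
```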

Storing application data in Datastore and Firebase

When you need to make state or telemetry data available to mobile or web apps, you can store processed or raw data in structured but schemaless databases, such as Cloud Datastore and Firebase Realtime Database, where you can represent IoT device data as domain or application level objects.

Rule processing and streaming analytics in Cloud Functions and Cloud Dataflow

IoT events and data can be sent to the cloud at a high rate and need to be processed quickly. For many IoT applications, the decision to place the device into the physical environment is made in order to provide faster access to data. For example, produce exposed to high temperatures during shipping can be flagged and disposed of immediately.

Being able to process and act on this information quickly is key. Google Cloud Functions allows you to write custom logic that can be applied to each event as it arrives. This can be used to trigger alerts, filter invalid data, or invoke other APIs. Cloud Functions can operate on each published event individually.
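The following per-event rule uses the produce-shipping example from above. The entry point follows the Pub/Sub-triggered background function signature for Python, `(event, context)`, in which the message payload arrives base64-encoded in `event["data"]`. The threshold and the payload fields (`temp_c`, `shipment_id`) are hypothetical assumptions. The rule logic sits in a separate function so you can test it without deploying anything. The return value serves only that testing purpose.

```python
import base64
import json

MAX_SAFE_TEMP_C = 8.0  # hypothetical threshold for refrigerated produce


def evaluate(reading: dict) -> str:
    """Per-event rule: classify one telemetry reading."""
    temp = reading.get("temp_c")
    if isinstance(temp, bool) or not isinstance(temp, (int, float)):
        return "invalid"  # filter malformed data
    if temp > MAX_SAFE_TEMP_C:
        return "alert"    # e.g. flag the shipment for disposal
    return "ok"


def on_telemetry(event, context):
    """Entry point shaped like a Pub/Sub-triggered background function."""
    reading = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    result = evaluate(reading)
    if result == "alert":
        # In practice this might call an alerting or ticketing API instead.
        print(f"ALERT: shipment {reading.get('shipment_id')} at {reading['temp_c']} °C")
    return result
```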

If you need to process data and events with more sophisticated analytics, including time windowing techniques or converging data from multiple streams, Cloud Dataflow provides a highly capable analytics tool that can be applied to streaming and batch data.
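Time windowing is the main technique that separates this kind of analysis from per-event rules. In Cloud Dataflow you would typically express it with Apache Beam's fixed windows (`FixedWindows`) followed by a combine step. The following pure-Python sketch is not Beam code. It only shows what fixed, or tumbling, windows compute: a per-device mean over non-overlapping intervals. It ignores late data and watermarks.

```python
from collections import defaultdict


def tumbling_window_means(events, window_s):
    """Mean value per (device_id, window_start) over fixed, non-overlapping windows.

    events: iterable of (timestamp_s, device_id, value) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for ts, device_id, value in events:
        window_start = ts - (ts % window_s)  # align to the start of the window
        acc = sums[(device_id, window_start)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```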

Analytics

Performing analytics on data obtained through IoT sources is often the entire purpose of instrumenting the physical world. After streaming data has been analyzed in a processing pipeline, it will begin to accumulate. Over time, this data provides a rich source of information for looking at trends, and can be combined with other data, including data from sources outside of your IoT devices.

BigQuery and Cloud Datalab

Google BigQuery provides a fully managed data warehouse with a familiar SQL interface, so you can store your IoT data alongside any of your other enterprise analytics and logs. The performance and cost of BigQuery mean you might keep your valuable data longer, instead of deleting it just to save disk space.

Cloud Datalab is an interactive tool for large-scale data exploration, analysis, and visualization. IoT data can end up being useful for multiple use cases, depending on which other data it's combined with. Cloud Datalab lets you interactively explore, transform, analyze, and visualize your data using a hosted online data workbench environment based on the open source Jupyter project.

Machine Learning

IoT data is often multidimensional and noisy. These attributes can make it hard to extract insight by using conventional analytics techniques. However, this nuance and complexity is often where deep neural networks excel. TensorFlow is a leading open source machine learning framework, and on Cloud Platform you can apply TensorFlow in a distributed and managed training service through Cloud Machine Learning Engine.

Time Series dashboards with Cloud Bigtable

Certain types of data need to be quickly sliceable along known indexes and dimensions for updating core visualizations and user interfaces. Cloud Bigtable provides a low-latency and high-throughput database for NoSQL data. Cloud Bigtable provides a good place to drive heavily used visualizations and queries, where the questions are already well understood and you need to absorb or serve at high volumes.

Compared to BigQuery, Cloud Bigtable works better for queries that act on rows or groups of consecutive rows, because Cloud Bigtable stores data by using a row-based format. Compared to Cloud Bigtable, BigQuery is a better choice for queries that require data aggregation.
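Because Cloud Bigtable keeps rows sorted by row key, the key design determines which queries read consecutive rows. The sketch below uses a common time-series pattern with an assumed key format: the device ID, a separator, then a zero-padded reversed timestamp. With this format, each device's readings form one contiguous range, and the newest reading sorts first within it. The 13-digit millisecond bound is a hypothetical choice. Whether to reverse timestamps at all depends on your query patterns.

```python
# Upper bound for millisecond timestamps (13 digits), so the reversed value
# always fits the same zero-padded width. Hypothetical choice.
MAX_TS_MS = 9_999_999_999_999


def row_key(device_id: str, timestamp_ms: int) -> bytes:
    """Row key: device ID first, then a reversed, zero-padded timestamp."""
    if not 0 <= timestamp_ms <= MAX_TS_MS:
        raise ValueError("timestamp out of range")
    reversed_ts = MAX_TS_MS - timestamp_ms
    return f"{device_id}#{reversed_ts:013d}".encode("utf-8")


def scan_prefix(sorted_keys, device_id):
    """Simulate a prefix scan: one device's rows form a contiguous, sorted range."""
    prefix = f"{device_id}#".encode("utf-8")
    return [k for k in sorted_keys if k.startswith(prefix)]
```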

Archival Storage in Cloud Storage Nearline

The accumulation of data from the world never stops, and the data might not always be structured. Cloud Storage provides a single API for both current-use object storage, and archival data that is used infrequently. If your IoT device captures media data, Cloud Storage can store virtually unlimited quantities durably and economically.

Conclusion

Building Internet of Things solutions involves solving challenges across a wide range of domains. Cloud Platform brings device management, scale of infrastructure, networking, and a range of storage and analytics products you can use to make the most of device-generated data.
